Idea Collection #1

Open
mousyball opened this issue Jan 13, 2021 · 2 comments

mousyball commented Jan 13, 2021

Intro.

The architecture is similar to PyTorch Lightning.

Arch

  • Trainer (see Main Trainer #3)
    • Trainer Setup
    • Data Setup
    • Training Loop
      • Train
      • Eval
      • Callback Hook (logger, tensorboard, ...)
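
One possible shape for the Trainer / Training Loop / Callback Hook split above is sketched below. It is a minimal, framework-free sketch under assumptions: the class names `Trainer`, `Callback` and `LossLogger`, the hook names `before_epoch` / `after_train_iter` / `after_eval`, and the injected `train_step` / `eval_step` callables are all hypothetical and are not taken from this project or from PyTorch Lightning's API. The loop only knows about hooks, so loggers and TensorBoard writers stay decoupled from training logic.

```python
from typing import Any, Callable, Iterable, List, Optional


class Callback:
    """Hypothetical base class for hooks; subclasses override only what they need."""

    def before_epoch(self, trainer: "Trainer") -> None:
        pass

    def after_train_iter(self, trainer: "Trainer", loss: float) -> None:
        pass

    def after_eval(self, trainer: "Trainer", metric: float) -> None:
        pass


class Trainer:
    """Hypothetical trainer: train/eval logic is injected, side effects live in hooks."""

    def __init__(
        self,
        train_step: Callable[[Any], float],
        eval_step: Callable[[Any], float],
        callbacks: Optional[List[Callback]] = None,
    ) -> None:
        self.train_step = train_step  # runs one optimisation step on a batch, returns the loss
        self.eval_step = eval_step  # evaluates the whole eval set, returns a metric
        self.callbacks = list(callbacks or [])
        self.epoch = 0
        self.iteration = 0

    def _call_hook(self, name: str, *args: Any) -> None:
        # Every callback receives the same hook; callbacks never call each other.
        for callback in self.callbacks:
            getattr(callback, name)(self, *args)

    def fit(self, train_data: Iterable[Any], eval_data: Any, max_epochs: int) -> None:
        # train_data must be re-iterable (e.g. a list or a DataLoader).
        for epoch in range(max_epochs):
            self.epoch = epoch
            self._call_hook("before_epoch")
            for batch in train_data:  # Train
                loss = self.train_step(batch)
                self.iteration += 1
                self._call_hook("after_train_iter", loss)
            metric = self.eval_step(eval_data)  # Eval
            self._call_hook("after_eval", metric)


class LossLogger(Callback):
    """Example callback standing in for a logger or TensorBoard writer."""

    def __init__(self) -> None:
        self.losses: List[float] = []
        self.metrics: List[float] = []

    def after_train_iter(self, trainer: Trainer, loss: float) -> None:
        self.losses.append(loss)

    def after_eval(self, trainer: Trainer, metric: float) -> None:
        self.metrics.append(metric)


logger = LossLogger()
trainer = Trainer(
    train_step=lambda batch: float(sum(batch)),  # toy "loss", for illustration only
    eval_step=lambda data: float(len(data)),  # toy "metric", for illustration only
    callbacks=[logger],
)
trainer.fit(train_data=[[1, 2], [3]], eval_data=[0, 0, 0], max_epochs=2)
```

With a real model, `train_step` would wrap the forward pass, `loss.backward()` and `optimizer.step()`, while the loop and hook dispatch stay unchanged.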

Components

ToyNet and ToyDataset are needed for development.

  • Registry modules: [ref] mmcv

    • Transform
    • Network
    • Loss
    • Callback
  • Logger

    • Design a logger class to be called from the hooks.
  • Config: [ref] fvcore

    • Use yacs to construct the config with inheritance.
  • Loss

    • Design a class that includes multiple meters for monitoring loss.
    • Support task-dependent losses for the examples and common scenarios.
    • Support
      • OD (object detection) samplers, which could be referenced from mmdetection.
      • OHEM for detection/segmentation/other tasks.
  • LR Scheduler

    • Plain
    • Stepwise
    • Cosine Decay Restart
  • Optimizer

    • SGD, Adam, Ranger, ...
  • Loader for model weights

    • Exception handling
    • Layer by layer loading
  • Network

    • SyncBN
    • custom_freeze / freeze_bn / eval mode
    • lr_group and group weights
    • Builder?
  • Tensorboard

    • loss
    • lr
    • inference result of an image
    • confused prediction
  • Dataloader / Dataset

    • Support public datasets
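
For the registry modules (Transform, Network, Loss, Callback), a minimal sketch of the registry pattern is shown below. It is loosely modelled on the idea behind mmcv's `Registry` but is not mmcv's code or exact API: the `Registry` class, its `register_module` / `get` / `build` methods, the `NETWORKS` instance and the `"type"` config key are assumptions for illustration. `ToyNet` is the toy network mentioned above, reduced to a plain class.

```python
from typing import Any, Callable, Dict, Optional


class Registry:
    """Hypothetical registry mapping string names to classes, so components
    can be built from plain config dicts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self, name: Optional[str] = None) -> Callable[[type], type]:
        # Used as a decorator: @REGISTRY.register_module()
        def _register(cls: type) -> type:
            key = name or cls.__name__
            if key in self._modules:
                raise KeyError(f"{key} is already registered in {self.name}")
            self._modules[key] = cls
            return cls

        return _register

    def get(self, key: str) -> type:
        if key not in self._modules:
            raise KeyError(f"{key} is not registered in {self.name}")
        return self._modules[key]

    def build(self, cfg: Dict[str, Any]) -> Any:
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self.get(args.pop("type"))  # "type" selects the class, the rest are kwargs
        return cls(**args)


NETWORKS = Registry("network")


@NETWORKS.register_module()
class ToyNet:
    def __init__(self, num_classes: int = 10) -> None:
        self.num_classes = num_classes


net = NETWORKS.build({"type": "ToyNet", "num_classes": 3})
```

Because `build` only needs a dict, the same call works on a node loaded from a yacs config once it is converted to a plain dict.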

Features

  • Resume checkpoints

  • Integrate with MLFlow

  • Utility functions

    • Cython helper functions
  • Distributed training on GPUs

  • lr_finder

  • Visualizer

    • Transform
    • Output (may depend on tasks)
  • Evaluation for test set

  • Demo Script

  • Support ONNX
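
For the lr_finder item, the common LR range test heuristic could be sketched as follows. This is a sketch under assumptions, not a settled design: the function names `lr_range_test` and `suggest_lr`, the callable `train_step_at_lr` (assumed to run one training step at the given LR and return a positive loss), the default sweep bounds, the divergence factor of 4 and the "steepest drop" selection rule are all illustrative choices.

```python
import math
from typing import Callable, List, Tuple


def lr_range_test(
    train_step_at_lr: Callable[[float], float],
    start_lr: float = 1e-7,
    end_lr: float = 10.0,
    num_steps: int = 100,
    diverge_factor: float = 4.0,
) -> Tuple[List[float], List[float]]:
    """Sweep the LR geometrically from start_lr to end_lr, one training step per LR.

    Stops early once the loss exceeds diverge_factor times the best loss so far.
    Assumes losses are positive.
    """
    if num_steps < 2 or not 0 < start_lr < end_lr:
        raise ValueError("need num_steps >= 2 and 0 < start_lr < end_lr")
    ratio = (end_lr / start_lr) ** (1.0 / (num_steps - 1))  # constant multiplicative step
    lrs: List[float] = []
    losses: List[float] = []
    best = math.inf
    for i in range(num_steps):
        lr = start_lr * ratio**i
        loss = train_step_at_lr(lr)
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > diverge_factor * best:
            break  # the loss is blowing up; larger LRs are not useful
    return lrs, losses


def suggest_lr(lrs: List[float], losses: List[float]) -> float:
    """Heuristic: return the LR at the start of the steepest loss decrease."""
    if len(losses) < 2:
        raise ValueError("need at least two recorded losses")
    # LRs are log-spaced, so per-step differences approximate d(loss)/d(log lr).
    drops = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(drops)), key=drops.__getitem__)
    return lrs[steepest]
```

In practice the model and optimizer state would be saved before the sweep and restored afterwards, since every step of the test really updates the weights.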

Others

  • Testing

    • pytest
  • Environment

    • .gitmessage
    • docker
    • pipenv
      • requirements.txt
  • Documentation

    • Readme
    • Describe the reasons behind the design decisions.
    • Dev SOP

mousyball self-assigned this Jan 13, 2021

mousyball commented Jan 14, 2021

These components should be considered first

mousyball commented Jan 15, 2021

NOTE

  • Make sure that functions in the Callback are decoupled.
  • The runner has 2 modes: iteration-based and epoch-based.
  • Should the scheduler be rewritten, or should we use the PyTorch built-in schedulers?
  • get_10x_lr_params checker
    • use module.parameters() to check
  • batch_size finder
    • iteratively increase the batch size and run the training loop
  • distributed training
    • sync buffers between machines
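
The batch_size finder idea above could look roughly like this. It is a minimal sketch under assumptions: the name `find_max_batch_size`, the `try_batch` callable (assumed to run a short training loop at the given batch size) and the doubling strategy are hypothetical. `MemoryError` is only a stand-in; with PyTorch you would pass whatever exception your version raises on GPU out-of-memory, and probably release cached memory (e.g. `torch.cuda.empty_cache()`) between trials.

```python
from typing import Callable, Tuple, Type


def find_max_batch_size(
    try_batch: Callable[[int], None],
    start: int = 1,
    limit: int = 4096,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> int:
    """Double the batch size until try_batch raises an OOM-type error or limit
    is exceeded; return the largest batch size that ran successfully."""
    if start < 1:
        raise ValueError("start must be >= 1")
    best = 0
    size = start
    while size <= limit:
        try:
            try_batch(size)  # e.g. run a few training iterations at this batch size
        except oom_errors:
            break  # this size does not fit; keep the previous one
        best = size
        size *= 2
    if best == 0:
        raise RuntimeError(f"even batch size {start} does not fit")
    return best
```

Doubling finds a power-of-two bound quickly; a binary search between `best` and `2 * best` could refine it if the extra trial runs are worth the time.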
