Sample Tasks

This document contains two sample tasks for the classification and segmentation pipelines.

The document walks through the steps in Training Steps, with specific examples for each task. Before trying to train these models, you should have followed the steps to set up an environment and AzureML.

Sample classification task: Glaucoma Detection on OCT volumes

This example is based on the paper A feature agnostic approach for glaucoma detection in OCT volumes.

Downloading and preparing the dataset

The dataset is available here [1].

After downloading and extracting the zip file, run the create_glaucoma_dataset_csv.py script on the extracted folder.

python create_glaucoma_dataset_csv.py /path/to/extracted/folder

This converts the dataset to CSV form and creates a file dataset.csv.

Finally, upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account, see Setting up AzureML. The dataset should go into a container called datasets, with a folder name of your choice (name_of_your_dataset_on_azure in the description below).
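
The upload step can also be scripted. The following is a minimal sketch, not part of InnerEye: the helper names plan_blob_names and upload_dataset are hypothetical, and the upload function assumes that the azure-storage-blob package is installed and that you have a connection string for your storage account. It maps every local file to a blob name below the chosen dataset folder and uploads it to the datasets container.

```python
from pathlib import Path
from typing import Dict


def plan_blob_names(local_folder: Path, dataset_name: str) -> Dict[Path, str]:
    """Map every file under local_folder to a blob name below dataset_name."""
    mapping: Dict[Path, str] = {}
    for path in sorted(local_folder.rglob("*")):
        if path.is_file():
            # Blob names use forward slashes, regardless of the local OS.
            relative = path.relative_to(local_folder).as_posix()
            mapping[path] = f"{dataset_name}/{relative}"
    return mapping


def upload_dataset(local_folder: Path, dataset_name: str, connection_string: str) -> None:
    """Upload the folder into the 'datasets' container (hypothetical helper)."""
    # Imported lazily so that the planning step works without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client("datasets")
    for path, blob_name in plan_blob_names(local_folder, dataset_name).items():
        with path.open("rb") as stream:
            container.upload_blob(name=blob_name, data=stream, overwrite=True)
```

Calling plan_blob_names first lets you inspect the resulting blob names before anything is uploaded. The dataset_name argument plays the role of name_of_your_dataset_on_azure.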

Creating the model configuration and starting training

Next, you need to create a configuration file InnerEye/ML/configs/MyGlaucoma.py which extends the GlaucomaPublic class like this:

from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        # Folder in the "datasets" container that holds the uploaded data
        self.azure_dataset_id = "name_of_your_dataset_on_azure"

The value for self.azure_dataset_id should match the dataset upload location, called name_of_your_dataset_on_azure above.

Once that config is in place, you can start training in AzureML via

python InnerEye/ML/runner.py --model=MyGlaucomaModel --azureml

As an alternative to working with a fork of the repository, you can use InnerEye-DeepLearning via a submodule. Please check here for details.

Sample segmentation task: Segmentation of Lung CT

This example is based on the Lung CT Segmentation Challenge 2017 [2].

Downloading and preparing the dataset

The dataset [3][4] can be downloaded here.

You need to convert the dataset from DICOM-RT to NIFTI. Before doing so, place the downloaded dataset folder inside a parent folder, which we will call datasets. The conversion tool expects this file structure.

Next, use the InnerEye-CreateDataset command-line tools to create a NIFTI dataset from the downloaded (DICOM) files. After installing the tool, run

InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to the 'datasets' folder> --niftiDatasetDirectory=<output folder name for converted dataset> --dicomDatasetDirectory=<name of downloaded folder inside 'datasets'> --geoNorm 1;1;3

You should now have another folder under datasets that contains the converted NIFTI files. The --geoNorm option tells the tool to normalize the voxel sizes during conversion.
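
If you prefer to drive the conversion from a script, the command above can be assembled as an argument list. This is a minimal sketch under stated assumptions: the function names build_conversion_command and run_conversion are hypothetical, the flags are copied from the command shown above, and the executable is assumed to be on your PATH. Passing the arguments as a list also avoids quoting the value 1;1;3 in shells where ; separates commands.

```python
import subprocess
from pathlib import Path
from typing import List


def build_conversion_command(datasets_root: Path, dicom_folder: str,
                             nifti_folder: str, geo_norm: str = "1;1;3") -> List[str]:
    """Assemble the CreateDataset command line as a list of arguments."""
    return [
        "InnerEye.CreateDataset.Runner.exe",
        "dataset",
        f"--datasetRootDirectory={datasets_root}",
        f"--niftiDatasetDirectory={nifti_folder}",
        f"--dicomDatasetDirectory={dicom_folder}",
        "--geoNorm",
        geo_norm,
    ]


def run_conversion(datasets_root: Path, dicom_folder: str, nifti_folder: str) -> None:
    """Check the expected folder layout, then run the tool (hypothetical helper)."""
    dicom_path = datasets_root / dicom_folder
    if not dicom_path.is_dir():
        # The tool expects the downloaded folder inside the 'datasets' folder.
        raise FileNotFoundError(f"Expected DICOM folder at {dicom_path}")
    command = build_conversion_command(datasets_root, dicom_folder, nifti_folder)
    subprocess.run(command, check=True)
```

run_conversion raises an error before starting the tool if the downloaded folder is not located inside the datasets folder.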

Finally, upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account, see Setting up AzureML. All files should go into a folder in the datasets container, for example my_lung_dataset. This folder name will need to go into the azure_dataset_id field of the model configuration, see below.
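
Before uploading, it can help to confirm that the converted folder really contains dataset.csv alongside the image files. The sketch below is illustrative only: summarize_dataset_folder is a hypothetical helper, and it makes no assumption about which columns dataset.csv contains. It only reports the header, the number of data rows, and the number of other files in the folder.

```python
import csv
from pathlib import Path
from typing import Any, Dict


def summarize_dataset_folder(folder: Path) -> Dict[str, Any]:
    """Report the header and row count of dataset.csv and count the other files."""
    csv_path = folder / "dataset.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected {csv_path} to exist")
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first line is treated as the header
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    other_files = [p for p in folder.rglob("*") if p.is_file() and p != csv_path]
    return {"columns": header, "rows": row_count, "other_files": len(other_files)}
```

A row count of zero, or a folder without any other files, suggests that the conversion did not finish as expected.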

Creating the model configuration and starting training

You can then create a new model configuration based on the template Lung.py. To do this, create a file InnerEye/ML/configs/segmentation/MyLungModel.py that defines a subclass of the template Lung model and sets the azure_dataset_id field (that is, the name of the folder that contains the uploaded data from above), so that it looks like:

from InnerEye.ML.configs.segmentation.Lung import Lung


class MyLungModel(Lung):
    def __init__(self) -> None:
        super().__init__()
        # Folder in the "datasets" container that holds the uploaded data
        self.azure_dataset_id = "my_lung_dataset"

If you are using InnerEye as a submodule, please add this configuration in your private configuration folder, as described for the Glaucoma model here.

You can now run the following command to start a job on AzureML:

python InnerEye/ML/runner.py --azureml --model=MyLungModel

See Model Training for details on training outputs, resuming training, testing models and model ensembles.

References

[1] Ishikawa, Hiroshi. (2018). OCT volumes for glaucoma detection (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1481223

[2] Yang, J., Veeraraghavan, H., Armato, S. G., Farahani, K., Kirby, J. S., Kalpathy-Cramer, J., van Elmpt, W., Dekker, A., Han, X., Feng, X., Aljabar, P., Oliveira, B., van der Heyden, B., Zamdborg, L., Lam, D., Gooding, M. and Sharp, G. C. (2018), Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017. Med. Phys. doi:10.1002/mp.13141

[3] Yang, Jinzhong; Sharp, Greg; Veeraraghavan, Harini; van Elmpt, Wouter; Dekker, Andre; Lustberg, Tim; Gooding, Mark. (2017). Data from Lung CT Segmentation Challenge. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2017.3r3fvz08

[4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. (paper)