Paper links: https://arxiv.org/abs/2311.15719 and https://proceedings.bmvc2023.org/699/
Exploration of the LIDC-IDRI lung lesion dataset (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254).
- Converts the CT scans from DICOM format to numpy arrays of pixel intensities in Hounsfield Units (HU) and crops the images to a 64x64-pixel region of interest (ROI). Includes the calculations for placing the cropping bounding box based on the centre of the region annotated in the segmentation masks.
- Views the metadata provided with the LIDC-IDRI dataset and saves the malignancy labels to a numpy array. Also removes the labels of slices that were excluded.
- Finds the region of interest (ROI) size for the lesions based on the convex hull and minimum bounding box of the segmentation masks.
- Splits the patients into train/validation/test sets and produces the two files also saved here, meta_mal_ben.csv and meta_mal_nonmal.csv. These hold the metadata for the two labelling schemes: malignant vs benign (mal_ben), with ambiguous lesions excluded, and malignant vs non-malignant (mal_nonmal).
- Extracts the latent vectors from the VAE model using the saved model parameters and saves them.
- Gaussian VAE with hyperparameter tuning, combined with an MLP predictor to assess the classification quality of the latent vectors. Note: slices are split at the patient level.
- VAE with a Dirichlet latent space. Note: this produces more disentangled latent vectors, which may make latent exploration easier because each latent dimension is encouraged to encode a different feature.
- Gaussian VAE for malignant vs non-malignant classification, trained with a joint VAE and classifier loss.
- Gaussian VAE for malignant vs benign classification, trained with a joint VAE and classifier loss.
- Dirichlet VAE trained with a joint VAE and classifier loss.
- Explores clustering of the latent vectors, including latent vector extraction, visualisation with PCA and t-SNE, and k-means clustering.
- Grid search for the best clustering with K-Means and CLASSIX (https://github.com/nla-group/classix).
- Latent space exploration and code to generate latent traversal figures. Five example GIFs of latent traversals are also included under main.
- Runs a larger random hyperparameter search than the other random-search files (in VAE), using cross-validation on the latent vectors to find the best classifier results.
- Runs the same larger random hyperparameter search for the Dirichlet VAE.
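For the DICOM-to-HU conversion and 64x64 cropping described above, a minimal NumPy sketch might look like the following. It assumes the standard DICOM `RescaleSlope` and `RescaleIntercept` attributes have already been read (e.g. with pydicom), takes the ROI centre as the midpoint of the mask's bounding box, and shifts the window inward at image borders rather than padding. `to_hounsfield` and `crop_roi` are hypothetical names, not functions from this repo.

```python
import numpy as np


def to_hounsfield(pixel_array, slope, intercept):
    """Convert stored DICOM pixel values to Hounsfield Units (HU).

    HU = stored_value * RescaleSlope + RescaleIntercept
    """
    return pixel_array.astype(np.float32) * slope + intercept


def crop_roi(image, mask, size=64):
    """Crop a size x size window centred on the annotated region of `mask`.

    ASSUMPTION: the centre is the midpoint of the mask's bounding box.
    The window is shifted (not padded) when it would run past the border.
    """
    if image.shape[0] < size or image.shape[1] < size:
        raise ValueError("image is smaller than the crop size")
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    centre_r = (rows.min() + rows.max()) // 2
    centre_c = (cols.min() + cols.max()) // 2
    half = size // 2
    # Clamp the top-left corner so the whole window stays inside the image.
    r0 = int(np.clip(centre_r - half, 0, image.shape[0] - size))
    c0 = int(np.clip(centre_c - half, 0, image.shape[1] - size))
    return image[r0:r0 + size, c0:c0 + size]
```

Shifting instead of padding keeps every crop filled with real tissue, at the cost of the lesion not being exactly centred near the image edge.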
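To illustrate the ROI-size step, the sketch below measures a single 2-D lesion mask by its axis-aligned bounding box and by the diameter of its convex hull (computed with Andrew's monotone chain). Reading "minimum bounding box" as axis-aligned is an assumption, and `convex_hull` and `roi_size` are hypothetical helpers; taking the maximum over all lesions is one way to check that a 64x64 crop is large enough.

```python
import numpy as np


def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b (>0 means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def roi_size(mask):
    """Return (bbox_height, bbox_width, hull_diameter) for a 2-D binary mask.

    Bounding-box sizes are in pixels; the hull diameter is measured
    between pixel centres.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    height = int(rows.max() - rows.min() + 1)
    width = int(cols.max() - cols.min() + 1)
    hull = np.array(convex_hull(np.column_stack([rows, cols])), dtype=float)
    # Largest distance between any two hull vertices.
    diffs = hull[:, None, :] - hull[None, :, :]
    diameter = float(np.sqrt((diffs ** 2).sum(-1)).max())
    return height, width, diameter
```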
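The malignancy labels and the two labelling schemes (mal_ben and mal_nonmal) could be derived roughly as follows. LIDC-IDRI radiologists rate malignancy from 1 to 5; the median-based thresholds here (above 3 malignant, below 3 benign, exactly 3 ambiguous) and the choice to count ambiguous lesions as non-malignant in mal_nonmal are assumptions, not necessarily the rules used to build the CSV files. `malignancy_labels` is a hypothetical helper.

```python
import numpy as np


def malignancy_labels(ratings, scheme="mal_ben"):
    """Map per-lesion LIDC malignancy ratings (1-5) to binary labels.

    ASSUMPTION: consensus is the median rating across radiologists:
    median > 3 -> malignant (1), median < 3 -> benign (0), median == 3 -> ambiguous.
    "mal_ben" drops ambiguous lesions; "mal_nonmal" labels them non-malignant (0).
    Returns (labels, keep), where keep is a boolean mask over the input lesions.
    """
    medians = np.array([np.median(r) for r in ratings], dtype=float)
    labels = (medians > 3).astype(int)
    if scheme == "mal_ben":
        keep = medians != 3
    elif scheme == "mal_nonmal":
        keep = np.ones(len(medians), dtype=bool)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return labels[keep], keep
```

Returning the `keep` mask alongside the labels makes it straightforward to drop the matching slices from the image arrays as well.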
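Splitting at the patient level, as noted above, keeps every slice from a given patient in a single split, so near-identical neighbouring slices cannot leak between train and test. A minimal sketch, with hypothetical 70/15/15 fractions and a hypothetical `split_by_patient` helper:

```python
import numpy as np


def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign each slice to train/val/test so no patient spans two splits.

    `patient_ids` holds one patient ID per slice; returns boolean slice masks.
    """
    patient_ids = np.asarray(patient_ids)
    unique = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)  # shuffle patients, not slices
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    groups = {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],
    }
    return {name: np.isin(patient_ids, ids) for name, ids in groups.items()}
```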
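The joint VAE and classifier loss used by the Gaussian VAE variants can be sketched as a reconstruction term plus a KL term plus a classification term. This NumPy version is illustrative only: the squared-error reconstruction, the binary cross-entropy classifier term and the weights `beta` and `alpha` are assumptions, and a real training loop would compute the same quantities in an autodiff framework so the loss can be backpropagated.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) per sample, summed over latent dims."""
    return -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1)


def binary_cross_entropy(p, y, eps=1e-7):
    """Per-sample BCE between predicted probabilities p and labels y in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def joint_loss(x, x_recon, mu, logvar, y_prob, y, beta=1.0, alpha=1.0):
    """Batch mean of reconstruction + beta * KL + alpha * classifier BCE.

    beta and alpha are HYPOTHETICAL weights; the repo's weighting may differ.
    """
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    kl = gaussian_kl(mu, logvar)
    clf = binary_cross_entropy(y_prob, y)
    return float(np.mean(recon + beta * kl + alpha * clf))
```

Raising `alpha` pushes the latent space to separate the classes at the expense of reconstruction quality; `beta` trades reconstruction against how closely the posterior matches the prior.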
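For the clustering exploration and grid search, the sketch below runs k-means (Lloyd's algorithm with k-means++ seeding and a few restarts) over a range of k and keeps the k with the highest mean silhouette score. Using silhouette as the selection criterion is an assumption, the helper names are hypothetical, and the sketch does not reproduce CLASSIX; scikit-learn's `KMeans` and `silhouette_score` provide equivalent, faster implementations.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distances between rows of X and rows of C, shape (n, k)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: later centroids are drawn far from earlier ones."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centroids)).min(1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids, dtype=float)


def kmeans(X, k, n_iter=100, n_init=5, seed=0):
    """Return (labels, centroids, inertia) of the best of n_init runs."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            labels = _sq_dists(X, centroids).argmin(1)
            # Move each centroid to its cluster mean (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = _sq_dists(X, centroids)
        labels, inertia = d2.argmin(1), float(d2.min(1).sum())
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def silhouette(X, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    D = np.sqrt(_sq_dists(X, X))
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def grid_search_kmeans(X, ks=range(2, 9), seed=0):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette(X, kmeans(X, k, seed=seed)[0]) for k in ks}
    return max(scores, key=scores.get), scores
```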
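A latent traversal decodes copies of one latent vector in which a single dimension is swept over a range while the others stay fixed; the decoded images can then be saved as figure panels or GIF frames. In this sketch, `decode` is a hypothetical stand-in for the trained VAE decoder and `latent_traversal` is a hypothetical helper.

```python
import numpy as np


def latent_traversal(decode, z, dim, values):
    """Decode copies of latent vector z in which only dimension `dim` varies.

    `decode` is a HYPOTHETICAL callable mapping an (n, latent_dim) array to
    n decoded images; `values` are the settings to sweep dimension `dim` over.
    """
    z = np.asarray(z, dtype=float)
    batch = np.repeat(z[None, :], len(values), axis=0)
    batch[:, dim] = values  # every other dimension keeps its original value
    return decode(batch)
```

For a Gaussian VAE, sweeping over a few standard deviations of the prior (e.g. `np.linspace(-3, 3, 7)`) is a common choice; a Dirichlet latent space would instead need values that keep the vector on the simplex.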
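The random hyperparameter search with cross-validation follows the general pattern below: sample a configuration, score it with k-fold cross-validation, and keep the configuration with the best mean score. The search space, the `evaluate` callback and the fold count are hypothetical. Because the latent vectors come from patients, the folds in practice should also be grouped by patient, which this sketch omits for brevity.

```python
import numpy as np

# HYPOTHETICAL search space, not the repo's actual ranges.
SEARCH_SPACE = {
    "lr": lambda rng: float(10 ** rng.uniform(-5, -2)),   # log-uniform learning rate
    "hidden": lambda rng: int(rng.choice([32, 64, 128, 256])),
    "dropout": lambda rng: float(rng.uniform(0.0, 0.5)),
}


def kfold_indices(n, k, rng):
    """Yield (train_idx, val_idx) pairs for k shuffled folds over n samples."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def random_search(evaluate, n_samples, n_trials=20, k=5, seed=0):
    """Return (best_params, best_mean_score); higher scores are better.

    `evaluate(params, train_idx, val_idx)` is a HYPOTHETICAL callback that
    trains the classifier on train_idx and returns a score on val_idx.
    """
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        scores = [evaluate(params, tr, va) for tr, va in kfold_indices(n_samples, k, rng)]
        mean = float(np.mean(scores))
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```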