Cryo-EMMAE (Cryo-Electron Microscopy Masked AutoEncoder) is a self-supervised method designed to mitigate the need for manually annotated data. Cryo-EMMAE leverages the representation space of a masked autoencoder (MAE) to sequentially denoise an input micrograph based on clusters of datapoints with different noise levels.
The Cryo-EMMAE pipeline starts with an input micrograph and follows these steps:
a. Pre-processing: The micrograph's background noise is normalized to minimize correlation with experimental parameters, and the micrograph is filtered to enhance particle contrast.
b. Micrograph representation: Patches are extracted from the pre-processed micrograph and used to map it onto the MAE representation space.
c. Denoising: The resulting embeddings form a smaller image in which a k-means model trained on the training set identifies the pixels with the lowest noise levels; these images are then denoised further through micrograph-specific hierarchical clustering.
d. Post-processing: Convolution-based smoothing is applied to the predictions to localize particle centres with greater accuracy (see the sketch after this list).
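For intuition, here is a minimal sketch in Python of steps c-d, under stated assumptions: emb is the (H, W, D) image of MAE embeddings for one micrograph, kmeans is a scikit-learn KMeans model fitted on the training set, and the cluster whose centroid has the smallest norm is treated as the lowest-noise one. Names and thresholds are illustrative, not the repository's exact implementation.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.ndimage import gaussian_filter

def pick_centres(emb, kmeans, sigma=2.0):
    # emb: (H, W, D) MAE embeddings; kmeans: fitted sklearn KMeans (assumed).
    h, w, d = emb.shape
    flat = emb.reshape(-1, d)
    # c. Denoising: keep pixels assigned to the lowest-noise k-means cluster
    # (here, the cluster with the smallest-norm centroid, which is an assumption).
    labels = kmeans.predict(flat)
    low_noise = np.argmin(np.linalg.norm(kmeans.cluster_centers_, axis=1))
    coords = np.argwhere((labels == low_noise).reshape(h, w))
    if len(coords) < 2:
        return coords
    # Micrograph-specific hierarchical clustering groups the kept pixels.
    groups = AgglomerativeClustering(
        n_clusters=None, distance_threshold=3.0).fit_predict(coords)
    # d. Post-processing: convolution-based (Gaussian) smoothing of the
    # prediction map, then one centre per cluster at the smoothed maximum.
    pred = np.zeros((h, w))
    pred[coords[:, 0], coords[:, 1]] = 1.0
    smooth = gaussian_filter(pred, sigma=sigma)
    centres = [coords[groups == g][np.argmax(smooth[tuple(coords[groups == g].T)])]
               for g in np.unique(groups)]
    return np.array(centres)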
1. Create the emmae environment by running the following commands in your terminal:
# create env
conda create -n emmae python=3.10
conda activate emmae
# pytorch
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# remaining libraries
pip install huggingface-hub==0.23.3 scipy==1.11.4 opencv-python==4.10.0.82 scikit-learn==1.2.2 timm==1.0.3 tqdm==4.66.4 jupyter wandb==0.16.5
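Optionally, a quick check that PyTorch imports and detects your GPU:
# verify the installation (optional)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"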
2. Whenever you want to work on the project, activate the emmae environment by executing the following command in the terminal:
conda activate emmae
3. Download the model checkpoints and the datasets used in this work from Zenodo: https://doi.org/10.5281/zenodo.11659477. Extract the files into the main directory of the project.
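If you prefer to script the download, the record's files can be listed through the Zenodo REST API. A minimal, standard-library-only sketch; the response layout ("files", "key", "links") is assumed from the current Zenodo API and may change:

import json, urllib.request

# Fetch the metadata of Zenodo record 11659477 (the DOI above resolves to it)
# and print each attached file with its direct download link.
with urllib.request.urlopen('https://zenodo.org/api/records/11659477') as r:
    record = json.load(r)
for f in record['files']:
    print(f['key'], '->', f['links']['self'])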
To preprocess your data (both image and MRC formats are supported), provide the micrograph directory, the particle diameter of your protein in pixels of the original micrograph, and the output directory in which to save the preprocessed micrographs.
python preprocess.py --md 'input_directory_path' \
--t 'True for MRC files, False for image files; default is True' \
--pd 'particle diameter as an integer,
in pixels of the original image, e.g. 224; default is 200' \
--od 'output_directory_path/' \
--id 'an identifier for your dataset'
# Example
python preprocess.py --md 'test/mrcs/' --od 'test/images/' --id 'test'
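The prediction step below expects the micrographs to be listed in an .npy file. A minimal sketch for building one, assuming the list holds plain file names (compare with the example lists under "./datasets/data_lists/" to confirm the expected contents):

import os
import numpy as np

# List the preprocessed micrographs and save them as a prediction set.
# The stored format (file names vs. full paths) is an assumption; check
# an existing list such as ./datasets/data_lists/10291_validation.npy.
images = sorted(os.listdir('test/images/'))
np.save('datasets/data_lists/test.npy', np.array(images))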
To pick particles, provide the yaml file, the experiment number, the corresponding checkpoint epoch, and the path to the .npy file listing the images to be predicted. This step also writes a star file with the predicted coordinates to the directory "./results/star_files/".
python predict.py -c runs/example.yaml \
--ec 'experiment checkpoint number to load' \
--e 'epoch number to load' \
--ip 'path to your preprocessed micrographs' \
--id 'the identifier of your dataset'
# Example
python predict.py -c runs/82.yaml --ec '82' --e '400' --ip 'test/images/' --id 'test'
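The generated star files can be inspected with a few lines of Python. A minimal sketch, assuming the common RELION-style single-table layout with _rlnCoordinateX/_rlnCoordinateY columns (inspect a file under "./results/star_files/" to confirm the actual column names):

def read_star_coords(path):
    # Map column names such as "rlnCoordinateX" to their positions, then
    # collect (x, y) pairs from the data rows. Column names are assumed.
    headers, coords = {}, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("_"):  # e.g. "_rlnCoordinateX #1"
                name, _, idx = line.partition("#")
                headers[name.strip().lstrip("_")] = int(idx) - 1
            elif line and not line.startswith(("data_", "loop_")) and headers:
                parts = line.split()
                coords.append((float(parts[headers["rlnCoordinateX"]]),
                               float(parts[headers["rlnCoordinateY"]])))
    return coords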
To train the model, prepare your yaml file from the given example:
python train.py -c runs/example.yaml
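To derive a new run configuration, you can copy the example and inspect its fields before editing. A minimal sketch, assuming PyYAML is available in the environment (it is installed as a dependency of wandb and huggingface-hub):

import shutil
import yaml

# Copy the example config, then load it to see which fields are available.
# The keys themselves are defined by runs/example.yaml, not assumed here.
shutil.copy('runs/example.yaml', 'runs/my_run.yaml')
with open('runs/my_run.yaml') as fh:
    cfg = yaml.safe_load(fh)
print(cfg)  # edit runs/my_run.yaml, then train with: python train.py -c runs/my_run.yaml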
Evaluate the picked particles against the ground-truth data, available at the Zenodo link above in the results.zip file. Provide the same experiment, prediction-description, and prediction-set-path as in the prediction step; the ground-truth-path defaults to "./results/target_512_20_npy/".
python evaluate.py --experiment 'example' \
--prediction-description 'example_prediction_set' \
--prediction-set-path 'path to prediction list in npy format,
e.g. ./datasets/data_lists/10291_validation.npy' \
--ground-truth-path './results/target_512_20_npy/'
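For intuition about the evaluation, picked particles are typically scored by matching predicted centres to ground-truth centres within a distance threshold. A minimal sketch of such a precision/recall computation follows; the greedy matching and the threshold are assumptions for illustration, not necessarily the exact metric evaluate.py implements.

import numpy as np

def precision_recall(pred, gt, radius=20.0):
    # pred, gt: arrays of (x, y) particle centres; radius: match threshold
    # in pixels (an assumption for illustration).
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    unmatched, tp = set(range(len(gt))), 0
    for p in pred:
        if not unmatched:
            break
        # greedily match each prediction to its nearest unmatched ground truth
        i = min(unmatched, key=lambda j: np.linalg.norm(gt[j] - p))
        if np.linalg.norm(gt[i] - p) <= radius:
            unmatched.remove(i)
            tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(gt), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1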
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have suggestions for improvement, please create an issue on GitHub. We appreciate your contribution!
For queries and suggestions, please contact: [email protected]
If you use this code in your research, please cite the original work.