Cryo-EMMAE (Cryo-Electron Microscopy Masked AutoEncoder) is a self-supervised method designed to mitigate the need for manually annotated data. Cryo-EMMAE leverages the representation space of a masked autoencoder (MAE) to sequentially denoise an input micrograph based on clusters of datapoints with different noise levels.
The Cryo-EMMAE pipeline starts with an input micrograph and follows these steps:
a. Pre-processing: The micrograph's background noise is normalized to minimize correlation with experimental parameters, and the micrograph is filtered to enhance particle contrast.
b. Micrograph representation: Patches are extracted from the pre-processed micrograph and used to map it onto the MAE representation space.
c. Denoising: The resulting embeddings form a smaller image in which a k-means model trained on the training set identifies the pixels with the lowest noise levels; these images are then denoised further through micrograph-specific hierarchical clustering.
d. Post-processing: Convolution-based smoothing is applied to the predictions to localize particle centres with greater accuracy (see the sketch after this list).
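For intuition, here is a minimal sketch in Python of steps c-d, under stated assumptions: emb is the (H, W, D) image of MAE embeddings for one micrograph, kmeans is a scikit-learn KMeans model fitted on the training set, and the cluster whose centroid has the smallest norm is treated as the lowest-noise one. Names and thresholds are illustrative, not the repository's exact implementation.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.ndimage import gaussian_filter

def pick_centres(emb, kmeans, sigma=2.0):
    # emb: (H, W, D) MAE embeddings; kmeans: fitted sklearn KMeans (assumed).
    h, w, d = emb.shape
    flat = emb.reshape(-1, d)
    # c. Denoising: keep pixels assigned to the lowest-noise k-means cluster
    # (here, the cluster with the smallest-norm centroid, which is an assumption).
    labels = kmeans.predict(flat)
    low_noise = np.argmin(np.linalg.norm(kmeans.cluster_centers_, axis=1))
    coords = np.argwhere((labels == low_noise).reshape(h, w))
    if len(coords) < 2:
        return coords
    # Micrograph-specific hierarchical clustering groups the kept pixels.
    groups = AgglomerativeClustering(
        n_clusters=None, distance_threshold=3.0).fit_predict(coords)
    # d. Post-processing: convolution-based (Gaussian) smoothing of the
    # prediction map, then one centre per cluster at the smoothed maximum.
    pred = np.zeros((h, w))
    pred[coords[:, 0], coords[:, 1]] = 1.0
    smooth = gaussian_filter(pred, sigma=sigma)
    centres = [coords[groups == g][np.argmax(smooth[tuple(coords[groups == g].T)])]
               for g in np.unique(groups)]
    return np.array(centres)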
1. Create the emmae environment by running the following commands in your terminal:
# create env
conda create -n emmae python=3.10
conda activate emmae
# pytorch
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# remaining libraries
pip install huggingface-hub==0.23.3 scipy==1.11.4 opencv-python==4.10.0.82 scikit-learn==1.2.2 timm==1.0.3 tqdm==4.66.4 jupyter wandb==0.16.5
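Optionally, a quick check that PyTorch imports and detects your GPU:
# verify the installation (optional)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"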
2. Whenever you want to work on the project, activate the emmae environment by executing the following command in the terminal:
conda activate emmae
3. Download the model checkpoints and the datasets used in this work from Zenodo: https://doi.org/10.5281/zenodo.11659477. Extract the files into the main directory of the project.
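If you prefer to script the download, the record's files can be listed through the Zenodo REST API. A minimal, standard-library-only sketch; the response layout ("files", "key", "links") is assumed from the current Zenodo API and may change:

import json, urllib.request

# Fetch the metadata of Zenodo record 11659477 (the DOI above resolves to it)
# and print each attached file with its direct download link.
with urllib.request.urlopen('https://zenodo.org/api/records/11659477') as r:
    record = json.load(r)
for f in record['files']:
    print(f['key'], '->', f['links']['self'])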
To preprocess your data (both image and MRC formats are supported), provide the micrograph directory, the particle diameter of your protein in pixels of the original micrograph, and the output directory in which to save the preprocessed micrographs.
python preprocess.py --md 'input_directory_path' \
--t 'True for MRC files, False for image files; default is True' \
--pd 'particle diameter as an integer,
in pixels of the original image, e.g. 224; default is 200' \
--od 'output_directory_path/' \
--id 'an identifier for your dataset'
# Example
python preprocess.py --md 'test/mrcs/' --od 'test/images/' --id 'test'
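The prediction step below expects the micrographs to be listed in an .npy file. A minimal sketch for building one, assuming the list holds plain file names (compare with the example lists under "./datasets/data_lists/" to confirm the expected contents):

import os
import numpy as np

# List the preprocessed micrographs and save them as a prediction set.
# The stored format (file names vs. full paths) is an assumption; check
# an existing list such as ./datasets/data_lists/10291_validation.npy.
images = sorted(os.listdir('test/images/'))
np.save('datasets/data_lists/test.npy', np.array(images))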
To pick particles, provide the yaml file, the experiment number, the corresponding checkpoint epoch, and the path to the .npy file listing the images to be predicted. This step also writes a star file with the predicted coordinates to the directory "./results/star_files/".
python predict.py -c runs/example.yaml \
--ec 'experiment checkpoint number to load' \
--e 'epoch number to load' \
--ip 'path to your preprocessed micrographs' \
--id 'the identifier of your dataset'
# Example
python predict.py -c runs/82.yaml --ec '82' --e '400' --ip 'test/images/' --id 'test'
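The generated star files can be inspected with a few lines of Python. A minimal sketch, assuming the common RELION-style single-table layout with _rlnCoordinateX/_rlnCoordinateY columns (inspect a file under "./results/star_files/" to confirm the actual column names):

def read_star_coords(path):
    # Map column names such as "rlnCoordinateX" to their positions, then
    # collect (x, y) pairs from the data rows. Column names are assumed.
    headers, coords = {}, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("_"):  # e.g. "_rlnCoordinateX #1"
                name, _, idx = line.partition("#")
                headers[name.strip().lstrip("_")] = int(idx) - 1
            elif line and not line.startswith(("data_", "loop_")) and headers:
                parts = line.split()
                coords.append((float(parts[headers["rlnCoordinateX"]]),
                               float(parts[headers["rlnCoordinateY"]])))
    return coords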
To train the model, prepare your yaml file from the given example:
python train.py -c runs/example.yaml
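To derive a new run configuration, you can copy the example and inspect its fields before editing. A minimal sketch, assuming PyYAML is available in the environment (it is installed as a dependency of wandb and huggingface-hub):

import shutil
import yaml

# Copy the example config, then load it to see which fields are available.
# The keys themselves are defined by runs/example.yaml, not assumed here.
shutil.copy('runs/example.yaml', 'runs/my_run.yaml')
with open('runs/my_run.yaml') as fh:
    cfg = yaml.safe_load(fh)
print(cfg)  # edit runs/my_run.yaml, then train with: python train.py -c runs/my_run.yaml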
Evaluate the picked particles against the ground-truth data, available at the Zenodo link above in the results.zip file. Provide the same experiment, prediction-description, and prediction-set-path as in the prediction step; the ground-truth-path defaults to "./results/target_512_20_npy/".
python evaluate.py --experiment 'example' \
--prediction-description 'example_prediction_set' \
--prediction-set-path 'path to prediction list in npy format,
e.g. ./datasets/data_lists/10291_validation.npy' \
--ground-truth-path './results/target_512_20_npy/'
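For intuition about the evaluation, picked particles are typically scored by matching predicted centres to ground-truth centres within a distance threshold. A minimal sketch of such a precision/recall computation follows; the greedy matching and the threshold are assumptions for illustration, not necessarily the exact metric evaluate.py implements.

import numpy as np

def precision_recall(pred, gt, radius=20.0):
    # pred, gt: arrays of (x, y) particle centres; radius: match threshold
    # in pixels (an assumption for illustration).
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    unmatched, tp = set(range(len(gt))), 0
    for p in pred:
        if not unmatched:
            break
        # greedily match each prediction to its nearest unmatched ground truth
        i = min(unmatched, key=lambda j: np.linalg.norm(gt[j] - p))
        if np.linalg.norm(gt[i] - p) <= radius:
            unmatched.remove(i)
            tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(gt), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1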
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have suggestions for improvement, please create an issue on GitHub. We appreciate your contribution!
For queries and suggestions, please contact: [email protected]
If you use this code in your research, please cite the original work.