Skip to content

Tutorials

Artsemi Yushkevich edited this page Jun 17, 2022 · 56 revisions

This page contains TomoBEAR tutorials which should showcase the capabilities of the processing pipeline. These tutorials assume that you have already cloned the TomoBEAR github repository, installed all the required software and setup TomoBEAR.

Ribosome (EMPIAR-10064)

As a first data set to showcase the capabilities of TomoBEAR we have chosen the Ribosome data set with the number 10064 in the EMPIAR database.

You can get the data set from here. In our case we used just the mixedCTEM data and achieved 11.25 Å in resolution with ~4k particles which is similar to the resolution achieved by the original researchers. If you want you can additionally use the CTEM data to be able to pick even more particles.

After downloading the data extract it in a folder of your choice. One thing one should note about this data is that

  • the data is already motion corrected
  • the stacks are already assembled
  • the pixel size is not in the header
  • the tilt angles are not provided

Because of such circumstances which sometimes occur TomoBEAR is able to inject this data along with the JSON file which describes the processing pipeline.

If you have already cloned the TomoBEAR github repository to your local machine you can find in the configurations folder a file called ribosome_empiar_10064_dynamo.json. This file describes the processing pipeline which should be setup by TomoBEAR to process this data set.

The following paragraphs will explain the variables contained in the JSON file and the needed changes to be able to run TomoBEAR on your local machine. In the end of this chapter the whole JSON file is shown.

First of all and most importantly you need to show TomoBEAR the path to the data and the processing folder. This must be done in the section "general": {} of the JSON file.

    "general": {
        "project_name": "Ribosome",
        "project_description": "Ribosome EMPIAR 10064",
        "data_path": "/path/to/ribosome/data/*.mrc",
        "processing_path": "/path/to/processing/folder",
        "expected_symmetrie": "C1",
        "apix": 2.62,
        "tilt_angles": [-60.0, -58.0, -56.0, -54.0, -52.0, -50.0, -48.0, -46.0, -44.0, -42.0, -40.0, -38.0, -36.0, -34.0, -32.0, -30.0, -28.0, -26.0, -24.0, -22.0, -20.0, -18.0, -16.0, -14.0, -12.0, -10.0, -8.0, -6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0, 46.0, 48.0, 50.0, 52.0, 54.0, 56.0],
        "rotation_tilt_axis":-5,
        "gold_bead_size_in_nm": 9,
        "template_matching_binning": 8,
        "binnings": [2, 4, 8],
        "reconstruction_thickness": 1400,
        "as_boxes": false
  },

Everything else should be fine for now and the processing can be started. To run the TomoBEAR on the Ribosome data set you need to type in the following command in the command window of MATLAB

runTomoBear("local", "/path/to/ribosome_empiar_10064_dynamo.json")

or if you are using a compiled version of TomoBEAR and have everything set up properly type in the following command on the command line from the TomoBEAR folder

./run_tomoBEAR local /path/to/ribosome_empiar_10064_dynamo.json /path/to/defaults.json

When you followed all the steps thoroughly TomoBEAR should run up to the first appearence of StopPipeline. That means the following modules will be executed.

    "MetaData": {
    },
    "CreateStacks": {
    },
    "DynamoTiltSeriesAlignment": {
    },
    "DynamoCleanStacks": {
    },
    "BatchRunTomo": {
        "skip_steps": [4],
        "ending_step": 6
    },
    "StopPipeline": {
    },

This can take a while, as the result of this segment TomoBEAR will create a folder structure with subfolders for the individual steps. You can monitor the progress of the execution in shell and by inspecting the contents of the folders. Upon success of an operation a file SUCCESS is written inside each folder. If you want to rerun a step you can terminate the process, change parameters, remove the SUCCESS file (or the entire subfolder) and restart the process. Here the stacks have already been assembled, so neither "Motioncorr2": {}, not "SortFiles": {} modules were not needed. Here the key functionality is performed by "DynamoTiltSeriesAlignment": {} (a recommended tutorial can be found here) after which the projections containing low number of tracked gold beads are excluded by "DynamoCleanStacks": {}. Finally, the output is converted into an IMOD project.

The running time depends on your infrastructure and setup. After TomoBEAR stops you can inspect the fiducial model in the folder of "BatchRunTomo": {} which you can find in your processing folder.

cd /path/to/your/processing/folder/5_BatchRunTomo_1

Now you can inspect the alignment of every tilt stack one after the other and can possibly refine it if needed. For that you can use the following command. Please replace xxx with the tomogram number(s) that you want to inspect.

etomo tomogram_xxx/*.edf

When etomo starts just chose the fine alignment step which should be Lila if everything went fine for that tomogram and then click on edit/view fiducial model to start 3dmod with the right options to be able to refine the gold beads. Before you start to refine just press the arrow up button in the top left corner of the window with the view port. To refine the gold beads click on Go to next big residual in the window with the stacked buttons from top to bottom and the view in the view port window should change immediately to the location of a gold bead with a big residual. Now see if you can center the marker better on the gold bead with the right mouse button. It is important that you don't put it on the peak of the red arrow but center it on the gold bead. When you are finished with this gold bead just press again on the Go to next big residual button. After you are finished with re-centering the marker on the gold beads you need to press the Save and run tiltalign button.

After you finished the inspection of all the alignments you can start TomoBEAR again as previously and it will continue from where it stopped up to the next StopPipeline section.

To continue running TomoBEAR on the Ribosome data set you need to type in as previously the following command in the command window of MATLAB

runTomoBear("local", "/path/to/ribosome_empiar_10064_dynamo.json")

or if you are using a compiled version of TomoBEAR and have everything set up properly type in the following command on the command line from the TomoBEAR folder

./run_tomoBEAR local /path/to/ribosome_empiar_10064_dynamo.json /path/to/defaults.json

TomoBEAR should now detect that it has stopped at the previous step StopPipeline and continue from where it stopped. The following excerpt from the ribosome_empiar_10064_dynamo.json file is describing what TomoBEAR needs to do next.

    "BatchRunTomo": {
        "starting_step": 8,
        "ending_step": 8
    },
    "GCTFCtfphaseflipCTFCorrection": {
    },
    "BatchRunTomo": {
        "starting_step": 10,
        "ending_step": 13
    },
    "BinStacks": {
    },
    "Reconstruct": {
    },
    "DynamoImportTomograms": {
    },
    "EMDTemplateGeneration": {
        "template_emd_number": "3420",
        "flip_handedness": true
    },
    "DynamoTemplateMatching": {
    },
    "TemplateMatchingPostProcessing": {
        "cc_std": 2.5
    },

This segment performs estimation of defocus, and hence, of the Contrast Transfer Function (CTF) using GCTF and subsequent CTF-correction using Ctfphaseflip from IMOD ("GCTFCtfphaseflipCTFCorrection": {}). You can inspect the quality of fitting by going into the folder 8_GCTFCtfphaseflipCTFCorrection_1 and typing imod tomogram_xxx/slices/*.ctf and making sure that the Thon rings match the estimation. If not - play with the parameters of the GCTFCtfphaseflipCTFCorrection module.

Then binned aligned CTF-corrected stacks are produced by "BinStacks": {} and tomographic reconstructions are generated for the binnings specified in the section "general": {}. In this example the particles are picked using template matching. First a template from EMDB is produced at a proper voxel size, then "DynamoTemplateMatching": {} creates cross-correlation (CC) volumes which can be inspected. Finally, highest cross-correlation peaks, over 2.5 standard deviations above the mean value in the cross-correlation volume are selected for extraction to 3D particle files, the initial coordinates are stored in the particles_table folder as a file in the dynamo table format.

In the section below you will find subtomogram classification projects that should produce you a reasonable structure. They first use multi-reference alignment projects with a true class and so-called noise trap classes to first classify out false-positive particles produced by template matching, this happens at the binning which was used for template matching. In the end of the segment you should have a reasonable set of particles in the best class.

    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 3,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1],
        "box_size": 1.10,
        "binning": 4
    },
    "StopPipeline": {
    },

After subtomogram classification projects done, you should have a reasonable set of particles in the best class which you should select. To select the best class you need to go into the last DynamoAlignmentProject folder before the last produced StopPipeline folder, and then go to alignment_project_1_bin_y/mraProject_bin_y/results/iteQQQQ/averages (where iteQQQQ corresponds to the pre-last iteration folder) and type imod average_ref_CCC_ite_QQQQ.em to open produced average for each class CCC to identify the best class to use further.

Variables binning y, pre-last iteration number QQQQ and class numbers CCC can depend on parameters used in DynamoAlignmentProject. But if you repeat instructions provided in this tutorial, this should be the folder 23_DynamoAlignmentProject_1/alignment_project_1_bin_4/mraProject_bin_4/results/ite0012/averages where you may find files average_ref_001_ite_0012.em, average_ref_002_ite_0012.em, and average_ref_003_ite_0012.em corresponding to the produced averages for 3 classes, from which you should choose the best one.

Once you have selected the best class, insert corresponding class number in the list [] as a value of the parameter "selected_classes" to the following section to be executed by TomoBEAR:

    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [3],
        "binning": 4,
        "threshold":0.8
    },

The section above is called single reference project, which will split the particles of the previously selected best class into two equally sized classes (called even/odd halves) with subsequent alignment of the particles in those halves to produce corresponding averages. This division will be needed further when unbinned data will be produced to be able to calculate the resolution of the resulting averaged map using Fourier Shell Correlation (FSC) curve.

After the first single reference project introduced above you will need to process tomograms by similar projects but at lower binnings in order to reduce the voxel size up to unbinned data to get the information corresponding to the highest possible resolution to be achieved using the current dataset. At this point automated workflow is finished as the user needs to play with the masks, particle sets, etc.

You may want to try to use the following example of the end section of JSON file in order to have experience of processing tomograms at lower binnings to produce unbinned data to finally be able to calculate resolution of your ribosome electron-density map as a result of the first experience with TomoBEAR!

    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [1,2],
        "binning": 2,
        "threshold":0.9
    },
    "BinStacks":{
        "binnings": [1],
        "use_ctf_corrected_aligned_stack": false,
        "run_ctf_phaseflip": true
    },
    "Reconstruct": {
        "reconstruct": "unbinned"
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [1,2],
        "binning": 1,
        "threshold":1
    }

Consider, that after performing first single reference project you need to select both halves from the previous step by setting "selected_classes": [1,2] (in order to keep all particles) while producing one class by setting "classes": 1 at the subsequent steps of binning reduction using "DynamoAlignmentProject": {} module.

If you get out of memory error while running some of "DynamoAlignmentProject": {} at lower binnings (especially the last one), you may put additional parameter "dt_crop_in_memory": 0 to the corresponding "DynamoAlignmentProject": {} sections in order to prevent keeping the whole tomogram in memory for processing. For example, in this tutorial size of the one of unbinned tomograms is ~72Gb, while for binning 2 it is near 9Gb.

Finally, to estimate resolution of produced by TomoBEAR results, you need to use the following Dynamo command in MATLAB:

fsc = dfsc(path_to_half1, path_to_half2, 'apix', 2.62, 'mask', path_to_mask, 'show', 'on')

where path_to_half1 and path_to_half2 are paths to the prelast iteration results of the last DynamoAlignmentProject folder, which in this tutorial are located in 29_DynamoAlignmentProject_1/alignment_project_1_bin_1/mraProject_bin_1_eo/results/ite0006/averages, where you may find files average_ref_001_ite_0006.em and average_ref_002_ite_0006.em corresponding to the averages made from halves of the resulting particles set. You also need to use a mask to filter averages for FSC calculation, and the accuracy of the used mask have impact on the resolution estimation. Appropriate mask to use for the initial resolution estimation you may find in the last DynamoAlignmentProject folder in a file called mask.em (in this tutorial path_to_mask is 29_DynamoAlignmentProject_1/mask.em).

After that you should get a similar FSC curve to the following one:

Ribosome EMPIAR-10064 FSC

where in Red we added a so-called "gold-standard" criterion of FSC = 0.143 to estimate the final map resolution, which in our case for the final set of ~4k ribosome particles reached 11.25Å.

Here the Ribosome data set-based tutorial is finished. We thank you for trying out TomoBEAR and hope you have enjoyed it!

The full JSON file to setup the processing pipeline in TomoBEAR and process the data you may find

here (expand to see)
{
    "general": {
        "project_name": "Ribosome",
        "project_description": "Ribosome EMPIAR 10064",
        "data_path": "/path/to/ribosome/data/*.mrc",
        "processing_path": "/path/to/processing/folder",
        "expected_symmetrie": "C1",
        "apix": 2.62,
        "tilt_angles": [-60.0, -58.0, -56.0, -54.0, -52.0, -50.0, -48.0, -46.0, -44.0, -42.0, -40.0, -38.0, -36.0, -34.0, -32.0, -30.0, -28.0, -26.0, -24.0, -22.0, -20.0, -18.0, -16.0, -14.0, -12.0, -10.0, -8.0, -6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0, 46.0, 48.0, 50.0, 52.0, 54.0, 56.0],
        "rotation_tilt_axis":-5,
        "gold_bead_size_in_nm": 9,
        "template_matching_binning": 8,
        "binnings": [2, 4, 8],
        "reconstruction_thickness": 1400,
        "as_boxes": false
    },
    "MetaData": {
    },
    "CreateStacks": {
    },
    "DynamoTiltSeriesAlignment": {
    },
    "DynamoCleanStacks": {
    },
    "BatchRunTomo": {
        "skip_steps": [4],
        "ending_step": 6
    },
    "StopPipeline": {
    },
    "BatchRunTomo": {
        "starting_step": 8,
        "ending_step": 8
    },
    "GCTFCtfphaseflipCTFCorrection": {
    },
    "BatchRunTomo": {
        "starting_step": 10,
        "ending_step": 13
    },
    "BinStacks": {
    },
    "Reconstruct": {
    },
    "DynamoImportTomograms": {
    },
    "EMDTemplateGeneration": {
        "template_emd_number": "3420",
        "flip_handedness": true
    },
    "DynamoTemplateMatching": {
    },
    "TemplateMatchingPostProcessing": {
        "cc_std": 2.5
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 4,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1]
    },
    "DynamoAlignmentProject": {
        "iterations": 3,
        "classes": 3,
        "use_noise_classes": true,
        "use_symmetrie": false,
        "selected_classes": [1],
        "box_size": 1.10,
        "binning": 4
    },
    "StopPipeline": {
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [3],
        "binning": 4,
        "threshold": 0.8
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [1,2],
        "binning": 2,
        "threshold": 0.9
    },
    "BinStacks":{
        "binnings": [1],
        "use_ctf_corrected_aligned_stack": false,
        "run_ctf_phaseflip": true
    },
    "Reconstruct": {
        "reconstruct": "unbinned"
    },
    "DynamoAlignmentProject": {
        "classes": 1,
        "iterations": 1,
        "use_noise_classes": false,
        "swap_particles": false,
        "use_symmetrie": false,
        "selected_classes": [1,2],
        "binning": 1,
        "threshold": 1
    }
}

or download it here

HIV1 (EMPIAR-10164)

As the second data set to showcase the capabilities of TomoBEAR we have chosen the HIV-1 data set with the number 10164 in the EMPIAR database.

You can get the data set from here. In our case we use just the tomograms with the numbers 1, 3, 26, 28, 37 of the data and achieve 5.4Å in resolution with ~15.5k particles which is by now 1.5Å less than the resolution achieved by the original researchers.

After downloading the data extract it in a folder of your choice. One thing one should note about this data is that it is raw data. It is in the original form you acquire it from the microscope by SerialEM.

Following processing steps need to be applied to get tomograms

  • the data needs to be motion corrected
  • the tilt stacks need to be assembled assembled
  • the tilt stacks need to be aligned
  • the tomograms need to be reconstructed
Clone this wiki locally