This repository was archived by the owner on Apr 26, 2023. It is now read-only.

Creating a Protocol

Pablo Conesa edited this page Dec 2, 2015 · 13 revisions

In Scipion, a Protocol is a processing task that involves the execution of several steps. Each step can execute Python code or call external programs to perform specific sub-tasks. When designing a new protocol, we should provide a clear definition of its inputs and outputs. The developer of a protocol also needs to take care of any conversions between Scipion objects and the files and parameters that the programs expect. Moreover, the results of the protocol execution should be registered back as outputs in the form of Scipion objects.

We are going to use a 2D classification protocol (maximum likelihood in Xmipp) as an example to illustrate the development of a new protocol. This short guide covers the basics of creating a new protocol. Each section provides links to more detailed information where needed.

Protocol Definition

Overview

The first step when developing a protocol is to choose the protocol class name. In this case it is XmippProtML2D, following the convention that all Xmipp protocol names start with XmippProt (we recommend a similar approach for naming protocols from other EM packages). Our protocol inherits from the base class ProtClassify2D, which reflects the operation that this protocol performs.

The Python documentation string following the protocol class line serves as help for users. A short but descriptive help message lets users quickly get an idea of what the protocol does. If the protocol defines a _label class property ('ml2d' in this example), it is used as the label that displays the protocol in menus. If it is not provided, the protocol class name is used, but this name is probably less meaningful to end users.

The protocol initialization function should receive the keyword arguments (**kwargs). The arguments should also be passed to the base class initialization method. This function is the right place to initialize variables or do similar setup.

class XmippProtML2D(ProtClassify2D):
    """
    Perform (multi-reference) 2D-alignment using
    a maximum-likelihood ( *ML* ) target function.

    """
    _label = 'ml2d'

    def __init__(self, **kwargs):
        # Forward the keyword arguments to the base class initializer
        ProtClassify2D.__init__(self, **kwargs)

    #--------------------------- DEFINE param functions --------------------------------------------

    def _defineParams(self, form):
        pass

    #--------------------------- INSERT steps functions --------------------------------------------

    def _insertAllSteps(self):
        pass

    #--------------------------- STEPS functions --------------------------------------------

    def convertInputStep(self):
        pass

    def runMLStep(self, params):
        pass

    def createOutputStep(self):
        pass

    #--------------------------- INFO functions --------------------------------------------

    def _validate(self):
        return []

    def _citations(self):
        return []

    def _summary(self):
        return []

    def _methods(self):
        return []

    #--------------------------- UTILS functions --------------------------------------------

    ...

The code above illustrates the skeleton of a protocol class. There are five main parts in the code:

  • Parameters definition: we should define all the parameters that will appear in the GUI and will be attributes of the protocol instance.

  • Steps list: prepares the list of steps that will be executed in order to get the protocol job done.

  • Steps functions: these functions contain the code that will be executed (Python code or calls to external programs).

  • Validation and info functions: these functions complement the protocol class by providing parameter validation and some useful information to the user.

  • Other utility functions: this section varies from protocol to protocol; it contains helper functions used throughout the protocol code.

In the following sections we explain each of these parts in more detail in order to develop a fully functional protocol.

Defining input params

In the _defineParams(form) method, the form of the protocol is populated with the input parameters (which are also rendered graphically). All these parameters are available as attributes of the protocol and can be used in the protocol steps.

All parameters must have a type and a name that is unique within the protocol. There are two groups of parameters:

  • Simple parameters: these cover the basic input parameter types.

    • StringParam: a basic string input (a textbox in the GUI)

    • FloatParam: a floating-point input value (a textbox in the GUI, but the value should have a floating-point format)

    • IntParam: an integer number (a textbox in the GUI, but the value should have an integer format)

    • BooleanParam: a boolean value, either True or False (a Yes/No question in the GUI)

    • EnumParam: also an integer input, but with a small number of possible choices (a combobox or a list in the GUI)

  • Complex parameters:

    • PointerParam: this param serves to select objects from the database (a textbox with a search button in the GUI)

    • RelationParam: similar to PointerParam, but it selects relations instead of objects (mainly used for CTF browsing)

    • ProtocolClassParam: similar to PointerParam, but it selects protocol classes (used for Workflows, under development)

Parameters can be added with the form.addParam(paramName, paramClass, **kwargs) method. paramClass should be one of the classes listed above, and the kwargs are passed to its constructor. Valid options in the **kwargs dictionary are:

  • default : the default param value

  • condition : a string containing an expression (parameter names are substituted by their values later) that determines whether the param appears or not.

  • label : a label message that will be displayed in the GUI

  • help : usually a longer help message that pops up after clicking on a help icon.

  • choices : a list of strings with the display values for the combobox (only valid for EnumParam)

  • display : can be EnumParam.DISPLAY_LIST or EnumParam.DISPLAY_COMBO, and defines the preferred display mode for the GUI (only valid for EnumParam)

  • pointerClass : the class of the objects that will be selected from the database (only valid for PointerParam)

  • pointerCondition : a string expression used to filter the objects selected from the database, such as aligned=True (only valid for PointerParam)

  • allowsNull : a boolean. If true, this parameter is not required (only valid for PointerParam)
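
To illustrate how these options fit together, the following sketch records a few parameter definitions and evaluates a condition string against the current values. It is only a simplified model: ParamSpec, is_visible and the searchMode parameter are hypothetical names invented for this example and are not part of Scipion, whose form classes do considerably more.

```python
from dataclasses import dataclass, field


@dataclass
class ParamSpec:
    """Hypothetical stand-in for one parameter definition of a form."""
    name: str
    kind: str                      # e.g. 'BooleanParam', 'IntParam', 'EnumParam'
    default: object = None
    label: str = ''
    help: str = ''
    condition: str = ''            # expression over other parameter names
    choices: list = field(default_factory=list)  # only meaningful for enums


def is_visible(spec, values):
    """Evaluate the condition with parameter names bound to their values."""
    if not spec.condition:
        return True
    # eval is tolerable here only because the expression is written by the
    # protocol developer, never by an end user.
    return bool(eval(spec.condition, {'__builtins__': {}}, dict(values)))


specs = [
    ParamSpec('doGenerateReferences', 'BooleanParam', default=True,
              label='Generate references?'),
    ParamSpec('numberOfReferences', 'IntParam', default=3,
              condition='doGenerateReferences',
              label='Number of references:'),
    # 'searchMode' is an invented enum used only for this illustration.
    ParamSpec('searchMode', 'EnumParam', default=0,
              choices=['exhaustive', 'fast'],
              condition='not doGenerateReferences and numberOfReferences > 1',
              label='Search mode'),
]

# Start from the defaults, as a freshly opened form would.
values = {spec.name: spec.default for spec in specs}
visible = [spec.name for spec in specs if is_visible(spec, values)]
print(visible)
```

With the default values only the first two parameters are visible; switching doGenerateReferences to False hides numberOfReferences and reveals the enum instead.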

To improve the organization of the input parameters, they can be grouped into sections, groups or lines.

  • Section: The function form.addSection(sectionName) creates a new section (displayed as a new tab in the GUI), and all further calls to form.addParam add parameters to that section.

  • Group: The function form.addGroup(groupName) returns a Group object to which parameters can also be added. The group is displayed as a labeled frame in the GUI.

  • Line: Another way of grouping is through form.addLine(lineLabel), which returns a Line object that can also contain other parameters. It simply displays those parameters in the same row.

    def _defineParams(self, form):
        form.addSection(label='Params')
        group = form.addGroup('Input')
        group.addParam('inputParticles', PointerParam, pointerClass='SetOfParticles',
                       label="Input particles", important=True,
                       help='Select the input images from the project.')
        group.addParam('doGenerateReferences', BooleanParam, default=True,
                       label='Generate references?',
                       help='If you set to *No*, you should provide reference images. '
                            'If *Yes*, the default generation is done by averaging '
                            'subsets of the input images (less bias introduced).')
        group.addParam('numberOfReferences', IntParam, default=3, condition='doGenerateReferences',
                       label='Number of references:',
                       help='Number of references to be generated.')
        group.addParam('inputReferences', PointerParam, condition='not doGenerateReferences',
                       label="Reference image(s)",
                       pointerClass='SetOfParticles',
                       help='Image(s) that will serve as initial 2D references')

        form.addParam('doMlf', BooleanParam, default=False, important=True,
                      label='Use MLF2D instead of ML2D?')

        group = form.addGroup('ML-Fourier', condition='doMlf')
        ...
        form.addParallelSection(threads=2, mpi=4)

The line form.addParallelSection(threads=2, mpi=4) specifies the default number of threads and MPI processes that this protocol will use. If it is not set, both the number of threads and the number of MPI processes are equal to 1. Setting threads or mpi to 0 here means that the option cannot be used, and it will be hidden in the GUI. More about the parallelization of protocols can be found in Developers - Protocol Parallelization.

The above definition will generate a desktop GUI as shown in the following figure:

ml2d form

Defining Steps

Another important function is _insertAllSteps, in which the steps that will be executed are defined. This function is invoked only just before a protocol starts to run, and the following actions take place:

  • The method protocol.run() is called.

  • The method protocol._insertAllSteps() is called and a list of steps is populated (depending on the current parameter selection).

  • The steps list is compared with the steps list stored in the database (if a previous execution exists).

  • In RESUME mode, the protocol tries to continue from the last step that was completed successfully. In RESTART mode, it starts from the first step and the output directory is cleaned.
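
The comparison between the stored and the new steps list can be pictured with the following sketch. It is a simplified model written for this guide, not Scipion's actual implementation: the function name first_step_to_run and the representation of a step as a tuple of its name and arguments are assumptions.

```python
def first_step_to_run(previous, current, resume=True):
    """Return the index of the first step that has to be executed.

    previous: list of (signature, finished) pairs stored by an earlier run.
    current:  list of signatures produced by the new steps definition.
    """
    if not resume:
        return 0                       # RESTART: run everything again
    for index, signature in enumerate(current):
        if index >= len(previous):
            return index               # new step that never ran before
        old_signature, finished = previous[index]
        if old_signature != signature or not finished:
            return index               # changed or incomplete step
    return len(current)                # nothing left to do


# An earlier run finished the first two steps only.
previous = [
    (('convertInputStep', 12), True),
    (('runJobStep', '--nref 3'), True),
    (('createOutputStep',), False),
]
current = [
    ('convertInputStep', 12),
    ('runJobStep', '--nref 3'),
    ('createOutputStep',),
]
changed = [
    ('convertInputStep', 12),
    ('runJobStep', '--nref 5'),        # the user changed a parameter
    ('createOutputStep',),
]

print(first_step_to_run(previous, current))                # 2
print(first_step_to_run(previous, changed))                # 1
print(first_step_to_run(previous, current, resume=False))  # 0
```

In this model a step is repeated as soon as its arguments differ from those of the stored run, which is why the arguments passed when inserting a step matter.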

It is important to note that no computing task should be performed in the _insertAllSteps function (this should be done in the steps; see the next section). This function only DEFINES what is to be done; it does not do it at that moment.

The Step class represents the smallest execution unit that composes a Protocol. The most commonly used subclasses of Step are:

  • FunctionStep : this type of step is inserted using the function protocol._insertFunctionStep. Any accessible function can be inserted (it can be a function of the protocol or an external function). Changes in the parameters passed to the function are used to detect step changes, so it is useful to pass a parameter even when the function does not strictly need it, because a change in its value marks the step as modified.

  • RunJobStep : this step wraps a call to an external program and builds the necessary command-line arguments. It can be inserted using protocol._insertRunJobStep.
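
The separation between defining steps and executing them, and the role of the step arguments in change detection, can be sketched as follows. FunctionStepSketch and DemoProtocol are hypothetical classes written only for this illustration; they are much simpler than the real Step classes.

```python
class FunctionStepSketch:
    """Hypothetical, simplified model of a function step."""

    def __init__(self, func_name, *args):
        self.func_name = func_name
        self.args = args

    def signature(self):
        # Two steps count as the same only if name and arguments match.
        return (self.func_name,) + self.args

    def run(self, owner):
        # The function is looked up by name and called at execution time.
        return getattr(owner, self.func_name)(*self.args)


class DemoProtocol:
    def __init__(self, input_id):
        self.input_id = input_id
        self.steps = []
        self.log = []

    def insert_all_steps(self):
        # Only describe the work; nothing is computed here.
        self.steps = [
            FunctionStepSketch('convert_input_step', self.input_id),
            FunctionStepSketch('create_output_step'),
        ]

    def convert_input_step(self, input_id):
        self.log.append('converted input %d' % input_id)

    def create_output_step(self):
        self.log.append('created output')


first = DemoProtocol(input_id=12)
first.insert_all_steps()
second = DemoProtocol(input_id=57)
second.insert_all_steps()

print(first.log)   # still empty: the steps are defined, not executed
# The input id is part of the signature, so a different input is detected.
print(first.steps[0].signature() == second.steps[0].signature())

for step in first.steps:
    step.run(first)
print(first.log)
```

Passing the input id to the conversion step is what makes the two definitions differ, even though the step function could read the input from the protocol itself.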

In our example protocol, the _insertAllSteps function looks like this:

    def _insertAllSteps(self):
        self._insertFunctionStep('convertInputStep', self.inputParticles.get().getObjId())
        program = self._getMLProgram()
        params = self._getMLParams()
        self._insertRunJobStep(program, params)
        self._insertFunctionStep('createOutputStep')

This is a relatively simple (but also common) case where only three steps are inserted: convertInputStep, runJobStep and createOutputStep. In this case, the steps run in the same order in which they were inserted, but it is also possible to define a more complex dependency graph between steps that can be executed in parallel (through threads or MPI). You can read more about defining steps to be executed in parallel in Developers - Protocol Parallelization.

Even when a protocol runs its steps without parallelization, an individual step can take advantage of multiple processors by using MPI or threads in the command line of a particular program.

Execution

Converting inputs

It is common that one of the first steps in a protocol is convertInputStep, whose main task is to convert the input Scipion objects into files with the format that a particular program requires. In our example, we should convert the input SetOfParticles object into the metadata STAR file that is required by all Xmipp programs that operate on particles. In this classification protocol, it is also possible to provide a set of reference images. The convertInputStep function takes this into account and also writes a metadata file for the references if needed.

    def convertInputStep(self, inputId):
        """ Write the input images as a Xmipp metadata file. """
        writeSetOfParticles(self.inputParticles.get(), self._getFileName('input_particles'))
        # If input references, also convert to xmipp metadata
        if not self.doGenerateReferences:
            writeSetOfParticles(self.inputReferences.get(), self._getFileName('input_references'))

The writeSetOfParticles function iterates over each individual image in the input SetOfParticles and adds a line to a valid STAR file using the Xmipp MetaData class in Python. With the same logic, any other file format could be generated when writing a convertInputStep function. Read more about iterating over a SetOfParticles and querying its attributes in Developers - Using Sets.
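
As a rough picture of what such a conversion does, the sketch below writes a list of particles as one STAR loop block using only the standard library. It does not use the Xmipp MetaData class; the function write_star, the label names and the image locations are illustrative assumptions, and real Xmipp metadata files may contain additional header lines.

```python
import io


def write_star(particles, labels, out, block_name='particles'):
    """Write a list of dicts as a single STAR loop block to a text stream."""
    out.write('data_%s\n' % block_name)
    out.write('loop_\n')
    for label in labels:
        out.write(' _%s\n' % label)
    # One row per particle, with the columns in the same order as the labels.
    for particle in particles:
        row = ' '.join(str(particle[label]) for label in labels)
        out.write(' ' + row + '\n')


# Label names and image locations below are illustrative only.
particles = [
    {'image': '000001@stack.stk', 'enabled': 1},
    {'image': '000002@stack.stk', 'enabled': 1},
]
buffer = io.StringIO()
write_star(particles, ['image', 'enabled'], buffer)
print(buffer.getvalue())
```

Writing to a stream rather than to a file name keeps the function easy to test; a convert step would open the target file and pass the file object instead.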

Executing programs

The second step in this example is a runJobStep. In this case the program is xmipp_ml_align2d (or the mlf variant in the Fourier case). The command-line arguments for calling the program are prepared in the _getMLParams function.

    def _getMLParams(self):
        """ Prepare the command line for calling the ml(f)2d program. """
        params = ' -i %s --oroot %s' % (self._getFileName('input_particles'), self._getOroot())
        if self.doGenerateReferences:
            params += ' --nref %d' % self.numberOfReferences.get()
            self.inputReferences.set(None)
        else:
            params += ' --ref %s' % self._getFileName('input_references')
            self.numberOfReferences.set(self.inputReferences.get().getSize())

        ...

        if self.doMirror:
            params += ' --mirror'

        if self.doNorm:
            params += ' --norm'

        return params

As you can see, this function concatenates the arguments passed to the program on the command line. The arguments vary depending on the current selection of input parameters in the Scipion GUI. The same approach can be followed when executing a program from any other software package.
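
The same idea can be written as a plain function that is easy to test without a project. This is only a sketch: build_ml_args and its parameter names are hypothetical, the flags are the ones used in the example above, and quoting each argument with shlex.quote from the standard library is an addition of this sketch (the protocol code builds the string directly).

```python
import shlex


def build_ml_args(input_fn, oroot, n_refs=None, refs_fn=None,
                  mirror=False, norm=False):
    """Assemble the argument string from the current option values."""
    args = ['-i', input_fn, '--oroot', oroot]
    if refs_fn is None:
        # No references given: ask the program to generate n_refs of them.
        args += ['--nref', str(n_refs)]
    else:
        args += ['--ref', refs_fn]
    if mirror:
        args.append('--mirror')
    if norm:
        args.append('--norm')
    # Quote every argument so that paths with spaces survive the shell.
    return ' '.join(shlex.quote(arg) for arg in args)


print(build_ml_args('tmp/input_particles.xmd', 'ml2d_', n_refs=3, mirror=True))
print(build_ml_args('my data/input.xmd', 'ml2d_', refs_fn='tmp/refs.xmd'))
```

Collecting the arguments in a list and joining them at the end avoids having to keep track of leading spaces in each fragment.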

If we take a look at the output log files after executing this protocol, we can see a command line similar to the following:

mpirun -np 2 -bynode `which xmipp_mpi_ml_align2d`
-i Runs/000194_XmippProtML2D/tmp/input_particles.xmd
--oroot Runs/000194_XmippProtML2D/ml2d_ --ref Runs/000194_XmippProtML2D/tmp/input_references.xmd
--fast --thr 2 --iter 3 --mirror

Creating outputs

At the end of a protocol execution we want to register the results in the Scipion project. This is the job of the createOutputStep method. It is roughly the inverse of convertInputStep: it should read the files produced by the protocol and create the Scipion objects that represent the output of the protocol. It should also define the relations between the newly created output objects and the inputs.

In our case, the result of the protocol is a SetOfClasses2D, which is created by the following code:

    def createOutputStep(self):
        imgSet = self.inputParticles.get()
        classes2DSet = self._createSetOfClasses2D(imgSet)
        readSetOfClasses2D(classes2DSet, self._getFileName('output_classes'))
        self._defineOutputs(outputClasses=classes2DSet)
        self._defineSourceRelation(imgSet, classes2DSet)
        if not self.doGenerateReferences:
            self._defineSourceRelation(self.inputReferences.get(), classes2DSet)

Here the job is done by the functions _createSetOfClasses2D and readSetOfClasses2D. The first one just creates an empty set of classes, while the second is specific to Xmipp and populates the set by reading the class information from the Xmipp metadata outputs (STAR files). More information about creating Scipion set objects can be found in Developers - Using Sets.
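
To give an idea of the kind of work a reading function performs, the sketch below parses a single STAR loop block and groups the images by class. It is a toy reader written for this guide, not the readSetOfClasses2D implementation; the label names image and ref and the sample content are assumptions.

```python
from collections import defaultdict


def read_star_loop(text):
    """Parse one STAR loop block into a list of dicts (values kept as str)."""
    labels, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(('data_', 'loop_', '#')):
            continue                    # skip block markers and comments
        if line.startswith('_'):
            labels.append(line[1:])     # column label
        else:
            rows.append(dict(zip(labels, line.split())))
    return rows


def group_by_class(rows, class_label='ref'):
    """Map each class number to the list of images assigned to it."""
    classes = defaultdict(list)
    for row in rows:
        classes[int(row[class_label])].append(row['image'])
    return dict(classes)


# Illustrative content; a real output file would have more columns.
star_text = """data_classes
loop_
 _image
 _ref
 000001@stack.stk 1
 000002@stack.stk 2
 000003@stack.stk 1
"""
print(group_by_class(read_star_loop(star_text)))
```

In a real createOutputStep the grouped information would be used to fill the Scipion set object instead of a dictionary.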

Additional functions

Some functions are not strictly required when implementing a protocol. Nevertheless, they can provide useful information to the end user. All these functions return a list of strings, whose meaning is different in each case.

Validate and warnings

The _validate and _warnings methods are called just before a protocol is executed. Both return a list of string messages describing errors (or possible errors) in the input parameters. An empty list means that everything is fine and the protocol can run. The messages from _warnings are shown to the user, who can choose whether to continue or not. If _validate returns any errors, the protocol will not run. This saves users time by catching simple mistakes that would prevent the protocol from running properly.
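
The decision described above can be summarized in a few lines. The function can_launch and the confirm callback are hypothetical names used only in this sketch; the way Scipion actually asks the user is not shown here.

```python
def can_launch(errors, warnings, confirm):
    """Decide whether a run may start.

    errors, warnings: lists of message strings.
    confirm: callable that receives the warnings and returns True to continue.
    """
    if errors:
        return False               # validation errors always block the run
    if warnings:
        return bool(confirm(warnings))
    return True                    # nothing to report


errors = ['Input particles do not have CTF information.']
warnings = ['Only a few iterations were requested.']  # invented message

print(can_launch(errors, [], lambda msgs: True))       # False
print(can_launch([], warnings, lambda msgs: False))    # False
print(can_launch([], warnings, lambda msgs: True))     # True
print(can_launch([], [], lambda msgs: False))          # True
```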

In our example the _validate function is very simple. It checks that the input particles have a CTF estimation when using maximum likelihood in Fourier space. The _warnings method can be implemented in a similar way.

    def _validate(self):
        errors = []
        if self.doMlf:
            if not self.inputParticles.get().hasCTF():
                errors.append('Input particles do not have CTF information.\n'
                              'This is required when using ML in Fourier space.')
        return errors

Citations, summary and methods

The _citations function is the way to provide references to the methods used in the protocol. The returned list should contain the keys of the citation references. All the references for a specific software package are listed in BibTeX format in a file called bibtex.py. Read more about this file in Developers - How to add a new package.

In this case there is one reference for the whole protocol, and some extra references are added depending on which variants are activated. The citations are displayed in the GUI as links to each publication. They can be shown using the cite icon in the protocol header of the form GUI, or in the Methods tab of the selected protocol in the project window.

    def _citations(self):
        cites = ['Scheres2005a']

        if self.doMlf:
            cites.append('Scheres2007b')

        elif self.doFast:
            cites.append('Scheres2005b')

        if self.doNorm:
            cites.append('Scheres2009b')

        return cites

The _summary function should provide a quick overview of a particular protocol execution. It should check whether the protocol has finished its execution and, once it has, provide some brief information about the steps performed, outputs, quality or any other relevant information.

    def _summary(self):
        summary = []
        summary.append('Number of input images: *%d*' % self.inputParticles.get().getSize())
        summary.append('Classified into *%d* classes' % self.numberOfReferences.get())

        if self.doMlf:
            summary.append('- Used a ML in _Fourier-space_')
        elif self.doFast:
            summary.append('- Used _fast_, reduced search-space approach')

        if self.doNorm:
            summary.append('- Refined _normalization_ for each experimental image')

        return summary
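
The check for an unfinished run is not part of the code above. A minimal way to add it, written here as a standalone function so that it can be tried outside a project, could look like the following sketch. The function summary_lines, its arguments and the pending message are assumptions made for this illustration.

```python
def summary_lines(n_input, n_classes, has_output, do_mlf=False, do_norm=False):
    """Build the summary; report a pending state when no output exists yet."""
    if not has_output:
        # The protocol has not registered its output, so it is not finished.
        return ['Output classes not ready yet.']
    lines = ['Number of input images: *%d*' % n_input,
             'Classified into *%d* classes' % n_classes]
    if do_mlf:
        lines.append('- Used a ML in _Fourier-space_')
    if do_norm:
        lines.append('- Refined _normalization_ for each experimental image')
    return lines


print(summary_lines(1000, 3, has_output=False))
print(summary_lines(1000, 3, has_output=True, do_mlf=True))
```

Inside a protocol, the has_output flag would typically be derived from whether the output attribute has already been defined.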

The _methods function should be implemented in a similar way to _summary, but it should provide a more descriptive account of the execution. The text should be suitable for use as a template when writing the Materials and Methods section of a paper.

Extra actions

These actions should probably be done not at the end, but while developing and testing your protocol.

Make the protocol available

If you want your protocol to appear in the project GUI (and you probably do), you may need to do some configuration. The protocol classes that are available in Scipion are discovered dynamically using Python reflection tools, so when a new protocol class is added it is automatically available to the whole system. Configuration is needed only if you want your protocol to appear in a specific position in the protocols tree (in the left pane of the project GUI).
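
One possible reflection mechanism is sketched below: it walks the subclass tree of a base class and builds a mapping from menu label to class. This only illustrates the general idea; Scipion's own discovery code may work differently, and the stand-in classes and the functions all_subclasses and menu_labels are written for this example.

```python
class Protocol:
    """Stand-in base class for this illustration."""
    _label = None


class ProtClassify2D(Protocol):
    pass


class XmippProtML2D(ProtClassify2D):
    _label = 'ml2d'


def all_subclasses(base):
    """Return every direct and indirect subclass of base."""
    found = []
    for cls in base.__subclasses__():
        found.append(cls)
        found.extend(all_subclasses(cls))
    return found


def menu_labels(base):
    """Map the display label (or the class name as fallback) to each class."""
    labels = {}
    for cls in all_subclasses(base):
        # Look only at the class itself so inherited labels are not reused.
        label = vars(cls).get('_label') or cls.__name__
        labels[label] = cls
    return labels


print(sorted(menu_labels(Protocol)))
```

No registration call is needed in this model: defining a new subclass is enough for it to be found the next time the tree is walked.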

The appearance of this tree is specified in the configuration file ~/.config/scipion/menu.conf. This file contains the tree structure but does not list every protocol. Some slots are set as 'protocol_base', which tags protocol base classes: all protocols that inherit from that base class are added at that point in the tree. So, if your newly implemented protocol inherits from a base class that is already in the configuration file, no additional action is required. If not, you can edit the menu.conf file, add an entry for your protocol (or for its base class if you expect more protocols there) and execute:

 scipion config

Implement a viewer

The Viewer class is the base for implementing the visualization of different kinds of objects. The same applies to visualizing protocols. Like protocols, viewers are discovered dynamically. They should specify a _target property with a list of the object classes that the viewer is able to handle.

The details for developing a new viewer are described in Developers - How to develop Viewers.

Writing Tests for your Protocol

Writing tests from the beginning is the best way to develop. Tests help to cover different use cases of your functions (or protocols in this case). If they are run automatically, they help to detect bugs introduced by future changes.

Here is the test for this protocol:

class TestXmippML2D(TestXmippBase):
    """This class checks whether the protocol to classify with ML2D in Xmipp works properly."""
    @classmethod
    def setUpClass(cls):
        setupTestProject(cls)
        TestXmippBase.setData('mda')
        cls.protImport = cls.runImportParticles(cls.particlesFn, 3.5)

    def test_ml2d(self):
        print("Run ML2D")
        protML2D = self.newProtocol(XmippProtML2D,
                                   numberOfReferences=2, maxIters=3,
                                   numberOfMpi=2, numberOfThreads=2)
        protML2D.inputParticles.set(self.protImport.outputParticles)
        self.launchProtocol(protML2D)

        self.assertIsNotNone(protML2D.outputClasses, "There was a problem with ML2D")