Architecture

Scipion is an image processing framework for obtaining 3D models of macromolecular complexes using Electron Microscopy (3DEM). It is designed to integrate several software packages in the field and present a unified interface for both biologists and developers. Scipion allows users to execute workflows that combine different software tools, while taking care of formats and conversions. Additionally, all steps are tracked and can be reproduced later. The following diagram gives an overview of the Scipion framework.

Scipion's design and implementation are driven by its main goals:
- Integration: Scipion should provide a common framework where different EM software packages can be used in the same project. The conversions needed to move from one package to another will run behind the scenes, without bothering users.
- Reproducibility: It is very important to allow published results to be reproduced and validated. To that end, Scipion tracks the whole processing, storing all the parameters used and the workflow followed. In that way, it will be possible to repeat a previous processing project starting from the original input data.
- High Throughput: With the development of new devices, some steps, such as data acquisition, are becoming more automated. Scipion should be able to execute workflows in an automated manner, which will allow better integration with acquisition systems.
- Easy to use: Changing to a new system always requires some effort from users, which is one of the reasons they tend to stay in their known environment. Scipion should not be more complex to use than existing software packages, while providing extra functionality. We have taken care with the graphical interfaces in order to improve the user experience.
- Extensible: Scipion should also be a framework for developers, where they can find tools that facilitate the integration of new protocols. In order to succeed, we must engage developers from other groups to collaborate with us. We have tried to lower the learning curve for adding new protocols, even for people without a pure computer science background.
Some desired features are:
- use existing file formats: No new image format should be introduced. The framework should use the existing formats and ensure that the input for each operation is in the proper format, converting it if needed.
- be workflow based: Each operation (protocol) should have a set of well-defined inputs and outputs. This makes it possible to create a pipeline of logical operations that follow one another.
- share data and backup: Users should be able to work in the same project and share results with collaborators. The system should also provide capabilities to perform intelligent incremental backups (copying only those files that differ from the previous backed-up state).
- be maintainable: The framework should be written in a modular way, divided into functional components. Each component should have a clear definition of its functionality and of how it will interact with other components. The design should make it possible to replace a particular component without affecting the whole system.
The following diagram represents the main components that build up Scipion. In the next subsections each component will be explained separately.
The Graphical User Interface (GUI) is an important part of any application. A nice, easy-to-use GUI helps end users focus on their tasks. In Scipion, the GUI presents a more intuitive and consistent way to launch programs and analyze results than dealing directly with the command line. We have developed two GUIs for interacting with Scipion: (1) a desktop one, built with the Python Tkinter library, and (2) a web-based one, developed with the Python Django framework.
The GUI has five main pieces:
- Manager window: here the user can see the list of all projects and also select, create or delete projects. See screenshot
- Project window: this is probably the window where users will spend most of their time. Here users launch new protocol executions (runs) and manage the existing ones. For a selected run, more information is available, such as the inputs/outputs, the protocol citations and the log files produced by the execution. The runs can be displayed both as a list and as a tree, which better represents the relations between runs. See screenshot
- Form window: this is the second most important window. It is dynamically generated for each protocol from its parameter definitions. This is an advantage for developers creating new protocols, since they only need to define the input parameters and do not have to do any GUI programming. See screenshot
- Wizards: the "wizards" are simple, specific GUIs that assist users in selecting some of the parameters. This normally saves the user time, since they can get a better idea of what the results will look like before launching the whole job. See screenshot
- Data viewers: visualization of data is essential to analyze the results. In Scipion we have re-used some of the tools developed in the existing EM software packages to visualize data, such as xmipp_showj and Eman picking. In addition, we have developed a web tool similar to xmipp_showj, which is very useful for displaying tables, galleries of images or volumes. See screenshot
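The idea behind the Form window, building the form from a declarative parameter definition, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Param` class, the `build_form` and `parse_values` functions and the example parameter names are hypothetical and are not Scipion's actual API. To stay runnable without a display, the sketch produces plain widget descriptions instead of real Tkinter or Django widgets.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical parameter definition (not Scipion's actual class)."""
    name: str
    label: str
    kind: type          # int, float, bool or str
    default: object = None


# Maps a parameter type to the kind of widget a form could create for it.
WIDGET_FOR_KIND = {bool: "checkbox", int: "entry", float: "entry", str: "entry"}


def build_form(params):
    """Turn parameter definitions into a list of widget descriptions."""
    return [
        {
            "widget": WIDGET_FOR_KIND[p.kind],
            "label": p.label,
            "name": p.name,
            "value": p.default,
        }
        for p in params
    ]


def parse_values(params, raw_values):
    """Convert the strings typed by the user into typed values."""
    values = {}
    for p in params:
        raw = raw_values.get(p.name)
        if raw is None:
            values[p.name] = p.default  # field left untouched
        elif p.kind is bool:
            values[p.name] = raw.strip().lower() in ("1", "true", "yes")
        else:
            values[p.name] = p.kind(raw)
    return values


# A protocol developer would only write this declarative part.
FILTER_PARAMS = [
    Param("lowFreq", "Low frequency cutoff", float, 0.02),
    Param("highFreq", "High frequency cutoff", float, 0.35),
    Param("normalize", "Normalize output?", bool, True),
]
```

With this split, the same definitions could in principle feed both a desktop and a web front end, since neither the protocol nor its parameters refer to a specific GUI toolkit.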
The work inside Scipion is organized into Projects. Each project has its own folder (inside the $SCIPION_USER_DATA/projects directory) and a separate database (project.sqlite). Each time the user executes a new protocol inside the project, it is registered as a new run in the database. In that way, the user can check at any time which operations have been done so far and the exact parameters used in each step. Each run also has its own folder (named after the protocol class name and the run id) where all the outputs of the run should be placed.
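As a rough illustration of this organization, the sketch below registers runs in a per-project SQLite database and creates one folder per run. It uses only Python's standard `sqlite3` module. The `runs` table, its columns, the `Runs` subfolder and the exact folder naming are assumptions made for the example; they do not describe the real layout of project.sqlite.

```python
import json
import sqlite3
from pathlib import Path


def open_project(project_dir):
    """Create the project folder and its database if they do not exist."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(project_dir / "project.sqlite"))
    # Hypothetical table: one row per protocol run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "protocol TEXT NOT NULL, "
        "params TEXT NOT NULL)"
    )
    return conn


def register_run(conn, project_dir, protocol_class_name, params):
    """Record a run with its parameters and create its output folder."""
    cur = conn.execute(
        "INSERT INTO runs (protocol, params) VALUES (?, ?)",
        (protocol_class_name, json.dumps(params, sort_keys=True)),
    )
    conn.commit()
    run_id = cur.lastrowid
    # Folder named from the run id and the protocol class name (assumed format).
    run_dir = Path(project_dir) / "Runs" / f"{run_id:06d}_{protocol_class_name}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_id, run_dir


def list_runs(conn):
    """Return (id, protocol, params) for every run, oldest first."""
    rows = conn.execute("SELECT id, protocol, params FROM runs ORDER BY id")
    return [(i, name, json.loads(p)) for i, name, p in rows]
```

Because every run stores its parameters alongside its identifier, listing the table is enough to reconstruct which operations were executed and with which settings.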
Scipion will have a user management system. Users will have permission to perform certain operations on each project. It will also be possible to define groups of users, where some properties and roles can be centralized.
In order to model the EM domain (i.e. the data entities and relations involved) we have created a basic object model (the base classes are defined in $SCIPION_HOME/pyworkflow/object.py). Object is the base class of all objects used in the model. There are two main types of objects: Scalars and Compounds. Scalars are objects that hold a single value (such as String, Integer, Float, Boolean…). Compound objects are those that contain other objects. This basic model provides a layer that wraps the basic Python types and also facilitates the development of the Mapper layer, which automatically stores any type of object (derived from Object).
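The Scalar/Compound distinction can be sketched in a few lines of Python. This is a simplified illustration written for this page, not the code in pyworkflow/object.py: the method names (`get`, `set`, `get_attributes`), the `flatten` helper and the example `Micrograph` attributes are all assumptions.

```python
class Object:
    """Base class of the sketch (hypothetical, not pyworkflow's Object)."""

    def get_attributes(self):
        """Yield (name, value) for every attribute that is itself an Object."""
        for name, value in vars(self).items():
            if isinstance(value, Object):
                yield name, value


class Scalar(Object):
    """Wraps a single Python value of type `python_type`."""
    python_type = str

    def __init__(self, value=None):
        self.set(value)

    def set(self, value):
        # Cast on assignment so the stored value always has the wrapped type.
        self._value = None if value is None else self.python_type(value)

    def get(self):
        return self._value


class Integer(Scalar):
    python_type = int


class Float(Scalar):
    python_type = float


class String(Scalar):
    python_type = str


class Micrograph(Object):
    """A compound object: its attributes are other Objects."""

    def __init__(self, filename, sampling_rate):
        self.filename = String(filename)
        self.sampling_rate = Float(sampling_rate)


def flatten(obj, prefix=""):
    """Walk an object tree and return {dotted.name: scalar value}."""
    if isinstance(obj, Scalar):
        return {prefix: obj.get()}
    result = {}
    for name, child in obj.get_attributes():
        key = f"{prefix}.{name}" if prefix else name
        result.update(flatten(child, key))
    return result
```

Since every attribute is an `Object`, a generic routine such as `flatten` can traverse any class without knowing it in advance, which is the property a storage layer needs.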
On top of this base, the objects related to EM were developed. Again, there are two main types of objects: Data and Protocols. Data objects are the inputs and outputs of the operations, and they hide the underlying files and formats used by each EM package. Examples of Data objects are Image, Micrograph, Volume, CTF, SetOfImages, SetOfMicrographs, etc. Protocols are wrappers around the logical operations (such as filtering, alignment, classification, refinement, etc.), which usually involve calls to one or several command line programs. Protocols are in charge of making the needed conversions and preparing the files for calling the programs. A protocol execution can be structured into steps, which are more atomic operations, so that it can be resumed if the whole process stops for any reason.
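One way to make a step-based execution resumable is to persist the names of the finished steps and skip them on the next execution. The sketch below shows that technique under stated assumptions: the `StepProtocol` class, the `steps_done.json` file and the step names are hypothetical and do not reproduce how Scipion stores step state.

```python
import json
from pathlib import Path


class StepProtocol:
    """Hypothetical protocol that runs named steps and can be resumed."""

    def __init__(self, run_dir):
        self.run_dir = Path(run_dir)
        self.run_dir.mkdir(parents=True, exist_ok=True)
        self._state_file = self.run_dir / "steps_done.json"
        self._steps = []  # list of (name, function, args)

    def add_step(self, name, func, *args):
        self._steps.append((name, func, args))

    def _load_done(self):
        if self._state_file.exists():
            return json.loads(self._state_file.read_text())
        return []

    def run(self):
        """Execute pending steps in order; return the names executed now."""
        done = self._load_done()
        executed = []
        for name, func, args in self._steps:
            if name in done:
                continue  # finished in a previous execution, skip it
            func(*args)  # an exception here leaves later steps pending
            done.append(name)
            # Persist progress after every step so a crash loses little work.
            self._state_file.write_text(json.dumps(done))
            executed.append(name)
        return executed
```

If a step raises an exception, the steps completed before it remain recorded, so calling `run()` again on a protocol with the same run folder continues from the failed step.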
The diagrams of Data and Protocols objects can be found here
The object model is designed to make it easy to integrate wrappers for the existing software packages. There is a folder ($SCIPION_HOME/pyworkflow/em/packages/) that contains the integrated packages. The idea is that this folder contains submodules, one per package. To add a new package, we need to create a folder with the following structure (not all files are mandatory):
- __init__.py: file required by Python for a submodule. In this file we should import the
- bibtex.py: defines the references to the package and its protocols in a BibTeX string.
- constants.py: definition of the constants needed by the protocols.
- convert.py: functions to convert Scipion objects into files that are valid for the programs, and to convert the results back into Scipion objects.
- viewer.py: creates the tools to visualize results.
- wizard.py: creates the wizards used.
- protocol*.py: we recommend prefixing the file names of all protocols implemented in the package with protocol.
The Mapper layer is in charge of storing and retrieving objects. Our main requirement for this module is to avoid a very complex database schema that would be hard to maintain and extend. Since we aim for easy integration of new packages and protocols, the Mapper keeps the developer from dealing directly with databases or other types of storage. It provides an interface for storing, updating and retrieving objects while hiding the implementation details and the underlying storage.
Currently we have implemented two mappers based on SQLite ($SCIPION_HOME/pyworkflow/mapper/sqlite.py). One of them is designed to store object relations and to insert new objects easily, without the need to create new SQL tables. In fact, there are only two tables: Objects and Relations. The first stores one row for each Scalar object and several rows for each Compound one. The other mapper implemented so far (although not used as much) is based on XML; it is less efficient for querying and iteration, but very convenient for configuration files.
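The two-table approach can be illustrated with a small sketch built on the standard `sqlite3` module. It is only an approximation of the idea: the `FlatMapper` class, the column names and the use of nested dictionaries to stand in for Compound objects are assumptions, and only `int`, `float`, `str` and `dict` values are handled.

```python
import sqlite3


class FlatMapper:
    """Hypothetical mapper storing nested objects in one generic table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Objects ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "parent_id INTEGER, name TEXT, classname TEXT, value TEXT)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Relations ("
            "name TEXT, parent_id INTEGER, child_id INTEGER)"
        )

    def store(self, name, obj, parent_id=None):
        """Insert obj; a dict (compound) adds one extra row per member."""
        is_compound = isinstance(obj, dict)
        value = None if is_compound else str(obj)
        cur = self.conn.execute(
            "INSERT INTO Objects (parent_id, name, classname, value) "
            "VALUES (?, ?, ?, ?)",
            (parent_id, name, type(obj).__name__, value),
        )
        obj_id = cur.lastrowid
        if is_compound:
            for child_name, child in obj.items():
                self.store(child_name, child, obj_id)
        return obj_id

    def load(self, obj_id):
        """Rebuild the object stored under obj_id."""
        classname, value = self.conn.execute(
            "SELECT classname, value FROM Objects WHERE id = ?", (obj_id,)
        ).fetchone()
        if classname != "dict":
            casts = {"int": int, "float": float, "str": str}
            return casts[classname](value)
        rows = self.conn.execute(
            "SELECT id, name FROM Objects WHERE parent_id = ? ORDER BY id",
            (obj_id,),
        ).fetchall()
        return {name: self.load(child_id) for child_id, name in rows}

    def relate(self, name, parent_id, child_id):
        """Record a named relation between two stored objects."""
        self.conn.execute(
            "INSERT INTO Relations VALUES (?, ?, ?)", (name, parent_id, child_id)
        )

    def children_of(self, name, parent_id):
        """Return the ids related to parent_id through the named relation."""
        rows = self.conn.execute(
            "SELECT child_id FROM Relations WHERE name = ? AND parent_id = ? "
            "ORDER BY child_id",
            (name, parent_id),
        )
        return [r[0] for r in rows]
```

The point of the design is that storing a new kind of object never requires a schema change: a new class only produces new rows, with its class name recorded alongside each value.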
A more detailed explanation of the Mapper implementation can be found here
Another added value of Scipion is the configuration of execution hosts and environments. The idea is that a project can have a set of execution hosts, each with its own capabilities. Then, when executing a Protocol, the user can choose the host that best fits the needs of the job. The data transfer to and from the execution host will be done by Scipion on demand, so the user will not need to copy or move files manually. Currently, we have implemented the execution logic for submitting jobs to a queue. Since queue configuration varies between systems, Scipion allows it to be configured for each execution host. The configuration of the queue and other settings should be done only once, while installing Scipion (probably with the help of system administrators), and the user will no longer need to set up and edit submission scripts.
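A per-host queue configuration of this kind can be sketched as a submission-script template that is filled in for each job. The example below uses `string.Template` from the Python standard library. The `HOSTS` dictionary, its keys, the job fields and the use of Slurm-style `#SBATCH` directives are assumptions chosen for illustration; they are not Scipion's actual configuration format.

```python
from string import Template

# Hypothetical per-host settings; the keys and the templates are assumptions.
HOSTS = {
    "cluster": {
        "submit_command": "sbatch $script",
        "script_template": (
            "#!/bin/bash\n"
            "#SBATCH --job-name=$job_name\n"
            "#SBATCH --ntasks=$n_mpi\n"
            "#SBATCH --time=$hours:00:00\n"
            "$command\n"
        ),
    },
    "localhost": {
        "submit_command": "bash $script",
        "script_template": "#!/bin/bash\n$command\n",
    },
}


def render_submission(host_name, job, script_path):
    """Return (script_text, submit_command) for the given host and job."""
    host = HOSTS[host_name]
    # Placeholders missing from `job` raise KeyError; extra keys are ignored.
    script = Template(host["script_template"]).substitute(job)
    submit = Template(host["submit_command"]).substitute(script=script_path)
    return script, submit
```

The templates are written once per host, typically at installation time, and the same job description can then be rendered for any configured host without the user editing a submission script.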