Summary
This case study shows the use of EnFuzion with MODELLER, a popular protein modeling package. EnFuzion supports the execution of a single program with multiple input parameters. Bioinformatics programs are well suited to EnFuzion because they are typically executed many times with different sets of input parameters. EnFuzion saves time and simplifies the execution:
- It speeds up the execution of jobs by distributing the computational load over a network of workstations;
- It manages jobs by providing an easy-to-use graphical interface for generating the input parameters and for controlling the execution;
- It requires little or no code change to MODELLER and most other popular bioinformatics applications.
What is MODELLER?
MODELLER is a protein modeling package; it is used for homology or comparative modeling of the three-dimensional structures of proteins. You can also use MODELLER to perform the following tasks:
- multiple comparison of protein sequences, structures or both,
- clustering,
- searching of sequence databases.
MODELLER was used to model the brain lipid-binding protein shown in the following image. (The image is provided courtesy of Roberto Sánchez from the Rockefeller University.) The structural model was calculated using the similarity between the brain lipid-binding protein and other proteins from the family of fatty acid-binding proteins. A more complete description of the protein can be found here.
How did using EnFuzion help the development of MODELLER?
Because EnFuzion handles the execution of a large number of jobs over multiple computers, it simplifies and speeds up computational experiments with MODELLER. EnFuzion was easy to use with existing programs because no code changes were required.
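The general idea of running many independent jobs on several workers can be sketched in a few lines of Python. This is only an illustration, not EnFuzion's interface: a local thread pool stands in for the network of workstations, and run_job is a hypothetical placeholder for one program run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id: int) -> str:
    """Hypothetical stand-in for one independent program run."""
    return f"result-{job_id}"


def run_all(job_ids, workers=4):
    """Run every job on a pool of workers and collect the results.

    The jobs do not depend on each other, so the pool may execute them
    in any order; map() still returns the results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, job_ids))


results = run_all(range(1, 11))
print(results[0], results[-1])
```

Because the jobs share no state, adding workers shortens the total run time without any change to the job itself, which is the property the case study relies on.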
How was EnFuzion used?
To use MODELLER with EnFuzion, two files were required: a template file for MODELLER and a plan file for EnFuzion. The template file contains the input parameters. The plan file describes how to run MODELLER, including the necessary input and output files.
EnFuzion was installed on all computers in the network. The installations were accomplished easily with the EnFuzion installation program, and no root privileges were required.
The MODELLER template file needed only minimal changes to be used with EnFuzion: the developer replaced the actual value of each parameter that varies between runs with an EnFuzion parameter. In the following example, these parameters are RAND_SEED, STARTING_MODEL and ENDING_MODEL. This template file for EnFuzion has been provided courtesy of Dr. Andrej Sali, the principal developer of MODELLER:
# Testing homology modeling by the MODELLER TOP routine 'model'.
INCLUDE
SET ALNFILE = 'alignment.ali'
SET KNOWNS = '5fd1'
SET SEQUENCE = '1fdx'
SET ATOM_FILES_DIRECTORY = './../atom_files'
SET RAND_SEED = $rand_seed
SET STARTING_MODEL = $starting_model
SET ENDING_MODEL = $ending_model
CALL ROUTINE = 'very_fast'
CALL ROUTINE = 'model'
In the template, RAND_SEED, STARTING_MODEL and ENDING_MODEL are set to placeholders of the form $parameter_name instead of real numbers. At runtime, EnFuzion replaces these placeholders with actual parameter values.
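The substitution step can be illustrated with a minimal Python sketch. It uses Python's standard string.Template class, whose $name syntax happens to match the placeholders in the template file; this is an analogy for what the substitution does, not EnFuzion's own mechanism, and the values shown are made up for the example.

```python
from string import Template

# The three varying lines of the MODELLER template file.
TEMPLATE_TEXT = """\
SET RAND_SEED = $rand_seed
SET STARTING_MODEL = $starting_model
SET ENDING_MODEL = $ending_model
"""


def substitute(template_text: str, values: dict) -> str:
    """Replace every $name placeholder with its value.

    Template.substitute raises KeyError when a placeholder has no
    value, which catches a misspelled parameter name early.
    """
    return Template(template_text).substitute(values)


# Illustrative values for a single run (assumed, not from the case study).
rendered = substitute(
    TEMPLATE_TEXT,
    {"rand_seed": -12345, "starting_model": 1, "ending_model": 6},
)
print(rendered)
```

Each run gets its own rendered copy of the file, so the original template is never modified.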
The plan file for EnFuzion specifies data that are necessary to execute MODELLER on a network of computers. These data include actual values for the RAND_SEED, STARTING_MODEL and ENDING_MODEL parameters, input files, a command to execute MODELLER, and output files. You can create a plan file using the intuitive graphical user interface or your favorite standard editor.
The following file is the complete plan file to use with the template file:
# this line specifies that MODELLER should be run 500 times
parameter starting_model integer range from 1 to 500 step 1;
# the value of ending_model should be starting_model+5
parameter ending_model integer compute starting_model+5;
# a random value should be generated for each run
parameter rand_seed integer random from -50000 to -1000;
# 5fd1.atm, alignment.ali, model-fast.tmpl are required input files
# so they are copied to each machine before the MODELLER is run
task nodestart
    copy 5fd1.atm node:.
    copy alignment.ali node:.
    copy model-fast.tmpl node:.
endtask
task main
    # parameters in the template file are replaced with real values
    node:substitute model-fast.tmpl model-fast_$starting_model.top
    # MODELLER is executed
    node:execute mod4 model-fast_$starting_model
    # output files are copied back to the main computer
    copy node:model-fast_$starting_model.top .
    copy node:model-fast_$starting_model.log .
    copy node:1fdx.B* .
    copy node:1fdx.D* .
    copy node:1fdx.rsr .
endtask
Using this plan file, EnFuzion generates 500 jobs and executes them on a network of computers. EnFuzion distributes the jobs and collects the results transparently to the user. By using multiple computers, EnFuzion reduces the execution time significantly.
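To make the three parameter declarations concrete, the following Python sketch generates the same kind of job list: one job per starting_model value from 1 to 500, ending_model computed as starting_model+5, and a random rand_seed per job. It is a rough illustration under stated assumptions, not EnFuzion's implementation: the function name generate_jobs is hypothetical, and the random range is assumed to include both endpoints.

```python
import random


def generate_jobs(count=500, seed=None):
    """Build one parameter set per job, mirroring the plan file.

    Assumption: rand_seed is drawn uniformly from -50000 to -1000,
    both endpoints included.
    """
    rng = random.Random(seed)
    jobs = []
    for starting_model in range(1, count + 1):
        jobs.append({
            "starting_model": starting_model,
            # Same rule as 'compute starting_model+5' in the plan file.
            "ending_model": starting_model + 5,
            "rand_seed": rng.randint(-50000, -1000),
            # Per-job input file name used by the main task.
            "top_file": f"model-fast_{starting_model}.top",
        })
    return jobs


jobs = generate_jobs(seed=0)
print(len(jobs))          # 500
print(jobs[0]["top_file"])  # model-fast_1.top
```

Passing a fixed seed makes the list reproducible, which is convenient when checking a sweep before submitting it.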
Further Information
This case study describes some basic capabilities of EnFuzion. You can find out more about using EnFuzion from the EnFuzion User Manual.