Cluster Specification¶
When running on a cluster, BioQueue cooperates with the cluster's distributed resource manager (DRM) to allocate appropriate resources for jobs. This page covers:
- How to use BioQueue on clusters
- How to develop new cluster plugins for BioQueue
- Problems you may encounter
How to use BioQueue on clusters¶
Using BioQueue on a cluster is almost identical to using it on a local machine or in the cloud. The only extra step is to tell BioQueue that it is running on a cluster. Here is the protocol.
- Install BioQueue following the protocol mentioned before.
- Run the BioQueue web server.
- Log in to BioQueue and open Settings.
- Click Cluster Settings on the page and fill in the form. By default, the value for Cluster engine is Run on local / cloud and all cluster options are disabled. Once you choose a cluster engine (for example, TorquePBS), the cluster mode of BioQueue will be activated. To turn it off, change the cluster engine back to Run on local / cloud.
- Click Save changes to save your changes.
- Start the queue.
The Cluster Settings section provides several options. Here is a more detailed explanation of each.
Option | Default | Description |
---|---|---|
CPU cores for single job | 1 | Specify the number of virtual processors (a physical core on the node or an “execution slot”) per node requested for this job. |
Physical memory | No limit | Maximum amount of physical memory used by any single process of the job. |
Virtual memory | No limit | Maximum amount of virtual memory used by all concurrent processes in the job. |
Destination | Default server | Defines the destination of the job. The destination names a queue, a server, or a queue at a server. |
Wall-time | No limit | Maximum amount of real time during which the job can be in the running state. |
For example, when BioQueue submits a job to a cluster managed by TorquePBS, the options defined above are translated into Torque parameters like this:
- -l ppn: the number of CPU cores BioQueue predicts the job will take; if the prediction model has not been generated, the ppn option is equal to CPU cores for single job.
- -l mem: the physical memory BioQueue predicts the job will use; if the prediction model has not been generated, the mem option is equal to Physical memory.
- -l vmem: the virtual memory BioQueue predicts the job will use; if the prediction model has not been generated, the vmem option is equal to Virtual memory.
- -q: the destination defined in Destination. For example, if the cluster has five queues (high, middle, low, FAT_HIGH and BATCH), the destination should be one of them.
- -l walltime: the maximum amount of real time during which the job can be in the running state, defined in Wall-time. For example, 24:00:00.
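As an illustration, this translation can be sketched as a small Python helper that turns the Cluster Settings values into a qsub argument list. The function name and structure here are hypothetical, not BioQueue's actual code:

```python
def build_qsub_args(cpu=1, mem='', vmem='', queue='', wall_time=''):
    """Translate cluster options into a qsub argument list (illustrative)."""
    args = ['qsub', '-l', 'nodes=1:ppn=%d' % cpu]
    if mem:                       # Physical memory -> -l mem
        args += ['-l', 'mem=%s' % mem]
    if vmem:                      # Virtual memory -> -l vmem
        args += ['-l', 'vmem=%s' % vmem]
    if queue:                     # Destination -> -q
        args += ['-q', queue]
    if wall_time:                 # Wall-time -> -l walltime
        args += ['-l', 'walltime=%s' % wall_time]
    return args


print(build_qsub_args(cpu=4, mem='8G', queue='BATCH', wall_time='24:00:00'))
# → ['qsub', '-l', 'nodes=1:ppn=4', '-l', 'mem=8G', '-q', 'BATCH', '-l', 'walltime=24:00:00']
```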
How to develop new cluster plugins for BioQueue¶
Support for clusters depends on the corresponding plugins. Every Python file
in the folder worker>>cluster_models
is treated as a plugin. BioQueue can
work with these plugins directly, and users do not need to modify any BioQueue
code. However, limited by the clusters we can access, BioQueue currently
provides built-in plugins only for TorquePBS (carefully tested in a production
environment) and HTCondor (not tested in a production environment). We therefore
hope that users who have access to other types of DRMs will get involved in
the development of cluster plugins. Here we provide detailed documentation for
developing new plugins for BioQueue.
1. Coding conventions¶
- Required: All plugin files must be written in Python, or at least provide a wrapper written in Python. The file name should be identical to the DRM’s name, and any supplementary files should use the DRM’s name with a different suffix. Taking TorquePBS as an example, the plugin file should be named TorquePBS.py, and if you use an extra file as the template for the job script, it should be named TorquePBS.tpl.
- Suggested: Function and variable names should follow the style guide for Python code stated in PEP 8. In brief, both function names and variable names should be lowercase.
- Suggested: Two blank lines are expected between function blocks.
2. Functions should be implemented in the plugin¶
The plugin must provide three functions for BioQueue to call. When you develop a new plugin, remember to follow the parameter lists provided below.
submit_job¶
This function enables BioQueue to submit a job to the cluster via a DRM. The prototype of the function is:
submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='', queue='', log_file='', wall_time='', workspace='')
param protocol: | string, the command a job needs to run, like “wget http://www.a.com/b.txt” |
---|---|
param job_id: | int, job id in BioQueue, like 1 |
param job_step: | int, step order in the protocol, like 0 |
param cpu: | int, number of CPU cores the job will use |
param mem: | string, allocated physical memory, e.g. 64G |
param vrt_mem: | string, allocated virtual memory, e.g. 64G |
param queue: | string, job queue |
param log_file: | string, path to store the log file |
param wall_time: | string, wall time, e.g. 24:00:00 |
param workspace: | string, the initial directory of the job; all output files should be stored in this folder, or users will not be able to see them |
return: | int, if successful, return the job id in the cluster; otherwise return 0 |
Note: BioQueue will assign ‘’ to both mem and vrt_mem if the user does not define the maximum amount of physical or virtual memory a job can use and there is no prediction model to estimate the amount of resources the job will occupy.
Note: protocol here is just a single step of the protocol defined in BioQueue.
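As a minimal sketch, a TorquePBS-style submit_job might look like the following, assuming qsub is on the PATH and accepts the job script on stdin. This illustrates the required interface; it is not BioQueue's shipped plugin:

```python
import subprocess


def submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='',
               queue='', log_file='', wall_time='', workspace=''):
    """Submit one protocol step via qsub; return the cluster job id, or 0."""
    cmd = ['qsub', '-N', 'bq_%s_%s' % (job_id, job_step)]
    if cpu:
        cmd += ['-l', 'nodes=1:ppn=%d' % cpu]
    if mem:
        cmd += ['-l', 'mem=%s' % mem]
    if vrt_mem:
        cmd += ['-l', 'vmem=%s' % vrt_mem]
    if queue:
        cmd += ['-q', queue]
    if wall_time:
        cmd += ['-l', 'walltime=%s' % wall_time]
    if workspace:
        cmd += ['-d', workspace]
    if log_file:
        cmd += ['-o', log_file, '-j', 'oe']
    try:
        # qsub reads the job script from stdin and prints an id like "12345.master"
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        out, _ = proc.communicate(protocol.encode())
        if proc.returncode != 0:
            return 0
        return int(out.decode().strip().split('.')[0])
    except (OSError, ValueError):
        return 0
```

Note that the function returns 0 on any failure (qsub missing, submission rejected, or unparsable output), matching the return convention above.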
query_job_status¶
This function provides BioQueue an interface to query the status of a job. The prototype of the function is:
query_job_status(job_id)
param job_id: | int, job id in the cluster |
---|---|
return: | int, job status |
If the job has completed, the function should return 0. If the job is running, it should return 1. If the job is queuing, it should return 2. If an error occurs during the execution of a job, it should return a negative number.
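Under the same TorquePBS assumptions, query_job_status could be sketched as below. The state letters and the qstat output layout follow Torque conventions, and the helper parse_state is a hypothetical name:

```python
import subprocess

# Torque qstat state letters mapped to BioQueue status codes
_STATE_MAP = {'C': 0, 'R': 1, 'E': 1, 'Q': 2, 'H': 2, 'W': 2}


def parse_state(state):
    """Map a Torque job state letter to a BioQueue status code."""
    return _STATE_MAP.get(state, -1)


def query_job_status(job_id):
    """Return 0 (completed), 1 (running), 2 (queuing) or a negative number."""
    try:
        out = subprocess.check_output(['qstat', str(job_id)])
    except (OSError, subprocess.CalledProcessError):
        # Torque purges finished jobs from qstat, so an unknown id is
        # treated here as completed; a real plugin may want to distinguish
        return 0
    # the last line looks like: "12345.master  name  user  00:01:02  R  BATCH"
    fields = out.decode().strip().splitlines()[-1].split()
    return parse_state(fields[4]) if len(fields) > 4 else -1
```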
cancel_job¶
The function allows BioQueue to terminate the execution of a job. The prototype of the function is:
cancel_job(job_id)
param job_id: | int, job id |
---|---|
return: | if success, return 1, else return 0 |
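A matching cancel_job sketch, assuming qdel is available on the submission host:

```python
import subprocess


def cancel_job(job_id):
    """Ask the DRM to terminate the job; return 1 on success, 0 otherwise."""
    try:
        subprocess.check_call(['qdel', str(job_id)])
        return 1
    except (OSError, subprocess.CalledProcessError):
        return 0
```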
Problems you may encounter¶
1. Installing Python 2.7 or higher and pip without root privileges¶
Cluster users usually do not have root privileges, and the system Python installation may be out of date, so it can be hard for biologists to configure the Python environment for BioQueue. We therefore provide a helper script in deploy/python_pip_non_root.sh. This shell script downloads the source code of Python 2.7.13 from Python.org and compiles it on the machine. You can run it as follows:
cd deploy
chmod +x python_pip_non_root.sh
./python_pip_non_root.sh
If the compilation fails, you can download a pre-built binary from ActiveState instead. NOTICE: the pre-built binary from ActiveState cannot be used in a production environment.
After installation, you can add the bin directory to your PATH environment variable for quicker access. For example, if you use the Bash shell on Unix, you could place this in your ~/.bash_profile file (assuming you installed into /home/your_name/bin):
export PATH=/home/your_name/bin:$PATH
then save the .bash_profile file and run:
source ~/.bash_profile
Now you should be able to run BioQueue with the new Python.
2. Cannot run BioQueue with SQLite on clusters¶
Before answering this question, we highly recommend that all users use MySQL rather than SQLite. When running BioQueue on a cluster with a Network File System (NFS), you may get an error message like:
django.db.utils.OperationalError: disk I/O error
The reason for this error is that SQLite uses reader/writer locks to control access to the database, and those locks are not implemented on many NFS implementations (including recent versions of Mac OS X). The only solution is therefore to use a database server such as MySQL.
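Since the error above comes from Django, switching to MySQL amounts to pointing Django's DATABASES setting at a MySQL server. The values below are placeholders; consult BioQueue's own configuration mechanism for where these settings actually live:

```python
# Hypothetical Django-style database configuration; every value below
# (database name, user, password, host) is a placeholder.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'bioqueue',        # database created beforehand with CREATE DATABASE
        'USER': 'bioqueue_user',   # placeholder account
        'PASSWORD': 'change-me',
        'HOST': '127.0.0.1',       # MySQL server reachable from all cluster nodes
        'PORT': '3306',
    }
}
```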