Link objects
The basic building block of an analysis pipeline is a Link object. In general
a Link is a single application that can be called from the command line.
The fermipy.jobs package implements five types of Link objects, and the idea
is that users can make sub-classes to perform the steps of their analysis.
Every link sub-class has a small required header block, for example:
    class AnalyzeROI(Link):
        """Small class that wraps an analysis script.

        This particular script does baseline fitting of an ROI.
        """
        appname = 'fermipy-analyze-roi'
        linkname_default = 'analyze-roi'
        usage = '%s [options]' % (appname)
        description = "Run analysis of a single ROI"

        default_options = dict(config=defaults.common['config'],
                               roi_baseline=defaults.common['roi_baseline'],
                               make_plots=defaults.common['make_plots'])

        __doc__ += Link.construct_docstring(default_options)

The various pieces of the header are:
appname: The unix command that will invoke this link.
linkname_default: The default name that links of this type will be given when they are put into an analysis pipeline.
usage, description: These are passed to the argument parser and used to build the help string.
default_options: The set of options and default values for this link.
The __doc__ += Link.construct_docstring(default_options) line ensures that the default options will be included in the class’s docstring.
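
To make the roles of the header fields more concrete, here is a minimal, self-contained sketch of how usage, description and default_options could feed an argument parser and a docstring. It does not use fermipy itself: build_parser and options_docstring are hypothetical helpers, and the (default value, help text, type) layout of each option is an assumption made for this illustration.

```python
import argparse

# Hypothetical stand-in for entries such as defaults.common['config'].
# Assumed layout for this sketch: (default value, help text, type).
default_options = dict(
    config=(None, "Path to the configuration file", str),
    roi_baseline=("fit_baseline", "Key for the ROI snapshot", str),
    make_plots=(False, "Make the standard validation plots", bool),
)


def build_parser(usage, description, options):
    """Build an argparse parser from a header-style options dictionary."""
    parser = argparse.ArgumentParser(usage=usage, description=description)
    for name, (default, helpstr, otype) in options.items():
        if otype is bool:
            # Boolean options become simple on/off flags.
            parser.add_argument("--" + name, action="store_true",
                                default=default, help=helpstr)
        else:
            parser.add_argument("--" + name, type=otype,
                                default=default, help=helpstr)
    return parser


def options_docstring(options):
    """Format the options as text that can be appended to a docstring."""
    lines = ["", "Parameters", "----------"]
    for name, (default, helpstr, _) in options.items():
        lines.append("%s : %s [default: %r]" % (name, helpstr, default))
    return "\n".join(lines)


appname = "fermipy-analyze-roi"
parser = build_parser("%s [options]" % appname,
                      "Run analysis of a single ROI", default_options)
args = parser.parse_args(["--config", "config.yaml", "--make_plots"])
doc_text = options_docstring(default_options)
```

Keeping the options in a single dictionary means the command-line help and the class documentation are generated from the same source and cannot drift apart.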
Link sub-classes
There are five types of Link sub-classes implemented here.
Link: This is the sub-class to use for a user-defined function. In this case, in addition to providing the header material above, the sub-class will need to implement the run_analysis() method to perform that function.

    def run_analysis(self, argv):
        """Run this analysis"""
        args = self._parser.parse_args(argv)
        # do stuff

Gtlink: This is the sub-class to use to invoke a Fermi ScienceTools gt-tool, such as gtsrcmaps or gtexpcube2. In this case the user only needs to provide the header content to make the options they want available to the interface.

AppLink: This is the sub-class to use to invoke a pre-existing unix command. In this case the user only needs to provide the header content to make the options they want available to the interface.

ScatterGather: This is the sub-class to use to send a set of similar jobs to a computing batch farm. In this case the user needs to provide the standard header content and a couple of additional things. Here is an example:

    class AnalyzeROI_SG(ScatterGather):
        """Small class to generate configurations for the `AnalyzeROI` class.

        This loops over all the targets defined in the target list.
        """
        appname = 'fermipy-analyze-roi-sg'
        usage = "%s [options]" % (appname)
        description = "Run analyses on a series of ROIs"
        clientclass = AnalyzeROI

        job_time = 1500

        default_options = dict(ttype=defaults.common['ttype'],
                               targetlist=defaults.common['targetlist'],
                               config=defaults.common['config'],
                               roi_baseline=defaults.common['roi_baseline'],
                               make_plots=defaults.common['make_plots'])

        __doc__ += Link.construct_docstring(default_options)

        def build_job_configs(self, args):
            """Hook to build job configurations
            """
            job_configs = {}

            ttype = args['ttype']

            # do stuff

            return job_configs

The job_time class parameter should be an estimate of the time the average job managed by this class will take. It is used to decide which batch farm resources to use to run the job, and how often to check for job completion.

The user-defined build_job_configs() function should build a dictionary of dictionaries that contains the parameters to use for each instance of the command that will run. E.g., if you want to analyze a set of 3 ROIs, using different config files and making different roi_baseline output files, build_job_configs should return a dictionary of 3 dictionaries, something like this:

    job_configs = {"ROI_000000": dict(config="ROI_000000/config.yaml",
                                      roi_baseline="baseline",
                                      make_plots=True),
                   "ROI_000001": dict(config="ROI_000001/config.yaml",
                                      roi_baseline="baseline",
                                      make_plots=True),
                   "ROI_000002": dict(config="ROI_000002/config.yaml",
                                      roi_baseline="baseline",
                                      make_plots=True)}

Chain: This is the sub-class to use to run multiple Link objects in sequence. For Chain sub-classes, in addition to the standard header material, the user should provide a _map_arguments() method that builds up the chain and sets the options of the component Link objects using the _set_link() method. Here is an example:

    def _map_arguments(self, input_dict):
        """Map from the top-level arguments to the arguments provided to
        the individual links """
        config_yaml = input_dict['config']
        config_dict = load_yaml(config_yaml)

        data = config_dict.get('data')
        comp = config_dict.get('comp')
        sourcekeys = config_dict.get('sourcekeys')

        mktimefilter = config_dict.get('mktimefilter')

        self._set_link('expcube2', Gtexpcube2wcs_SG,
                       comp=comp, data=data,
                       mktimefilter=mktimefilter)

        self._set_link('exphpsun', Gtexphpsun_SG,
                       comp=comp, data=data,
                       mktimefilter=mktimefilter)

        self._set_link('suntemp', Gtsuntemp_SG,
                       comp=comp, data=data,
                       mktimefilter=mktimefilter,
                       sourcekeys=sourcekeys)

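
Returning to ScatterGather sub-classes, the dictionary of dictionaries returned by build_job_configs() can be sketched as a plain function. This is a minimal illustration rather than fermipy code: the function signature, the target names, and the "<target>/config.yaml" path layout are assumptions modeled on the example in the ScatterGather description.

```python
def build_job_configs(targets, roi_baseline="baseline", make_plots=True):
    """Return one dictionary of job options per target.

    `targets` is a hypothetical list of ROI names; in a real
    ScatterGather sub-class these would come from the target list.
    """
    job_configs = {}
    for target in targets:
        job_configs[target] = dict(
            # Assumed layout: each ROI has its own directory and config file.
            config="{}/config.yaml".format(target),
            roi_baseline=roi_baseline,
            make_plots=make_plots,
        )
    return job_configs


job_configs = build_job_configs(["ROI_000000", "ROI_000001", "ROI_000002"])
for name, options in sorted(job_configs.items()):
    print(name, options["config"])
```

Each key names one job, and each inner dictionary holds the options for one invocation of the client class, so the number of top-level keys is the number of jobs sent to the batch farm.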
Using Links and sub-classes in python
The main aspects of the Link python interface are:
Building the Link and setting the parameters. By way of example we will build a Link of type AnalyzeROI and configure it to do a standard analysis of the ROI using the file ‘config.yaml’, write the resulting ROI snapshot to ‘baseline’, and make the standard validation plots.

    link = AnalyzeROI.create()
    link.update_args(dict(config='config.yaml',
                          roi_baseline='baseline',
                          make_plots=True))

Seeing the equivalent command line task:

    link.formatted_command()

Running the Link:

    link.run()

Seeing the status of the Link:

    link.check_job_status()

Seeing the jobs associated to this Link:

    link.jobs

Setting the arguments used to run this Link:

    link.update_args(dict(option_name=option_value,
                          option_name2=option_value2,
                          ...))
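
As a rough illustration of how a dictionary of options relates to an equivalent command line, the sketch below formats an application name and its options into a shell command. format_command is a hypothetical helper, not part of fermipy, and the "--name value" layout is an assumption; the string returned by link.formatted_command() may be formatted differently.

```python
import shlex


def format_command(appname, options):
    """Format an application name and an options dictionary as a command."""
    parts = [appname]
    for name, value in options.items():
        if value is None:
            # Skip options that have not been set.
            continue
        parts.append("--%s" % name)
        # Quote each value so that spaces or shell characters are safe.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


command = format_command("fermipy-analyze-roi",
                         dict(config="config.yaml",
                              roi_baseline="baseline",
                              make_plots=True))
print(command)
```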