.. _fermipy_jobs_multiple_ROIs:

Using fermipy.jobs to analyze multiple ROIS
===========================================

The fermipy.jobs sub-package includes a few scripts and tools
that can be used to parallelize standard analysis of region
of interest as well as both positive and negative control tests.


Overview
=========

This package implements an analysis pipeline to perform as standard analysis on 
multiple ROIs.  This involves a lot of bookkeeping and loops over various things. 
It is probably easiest to first describe this with a bit of pseudo-code that represents 
the various analysis steps.


The various loop variable are:

* rosters

    The list of all lists of targets to analyze.   This might seem a bit like overkill, 
    but it is a way to allow you to define multiple versions of the same target, e.g.,
    to test different models.

   
* targets

    The list of all the analysis targets.   This is generated by merging all the targets 
    from the input rosters.


* target.profiles

    The list of all the spatial profiles to analyze for a particular targeet.
    This is generated by finding all the versions of a target from all the input rosters.


* sims

    This is of all the simulation scenarios to analyze.  This is provided by the user.


* first, last

    The first and last seeds to use the random number genreator (for simulations), or the first and last random
    directions to use, (for random direction control studies).


.. code-block:: python

    # Initialization, prepare the analysis directories and build the list of targets	
    PrepareTargets(rosters)
  
    # Data analysis

    # Loop over targets
    for target in targets:
        AnalyzeROI(target)

        for profile in target.profiles:
            AnalyzeSED(target, profile)
            PlotCastro(target, profile)
	   

    # Simulation analysis

    # Loop over simulation scenarios
    for sim in sims:
  
        # Loop over targets
        for target in targets:
            CopyBaseROI(sim, target)
    
            for profile in target.profiles:
                SimulateROI(sim, target, profile) # This loops over simulation seeds and produces SEDS	
                CollectSED(sim, target, profile)
       

    # Random direction control analysis

    # Loop over targets
    for target in targets:
        CopyBaseROI(target)
        RandomDirGen(target)
     
        for profile in target.profiles:
            for seed in range(first, last)
                AnalyzeSED(target, profile, seed)

            CollectSED('random', target, profile)


Master Configuration File
=========================

fermipy.jobs uses `YAML <http://yaml.org/>`_ files to read and write its
configuration in a persistent format.  The configuration file has a
hierarchical structure that groups parameters into dictionaries that
are keyed to a section name.


.. code-block:: yaml

   :caption: Sample Configuration

   ttype: dSphs
   rosters : ['test']
   spatial_models: ['point']
   
   sim_defaults:
       seed : 0
       nsims : 20
       profile : ['point']

   sims:
       'null' : {}
       'pl2_1em9' : {}

   random: {}

   data_plotting:
       plot-castro : {}


Options at the top level apply to all parts of the analysis pipeline

.. code-block:: yaml
   :caption: Sample *top level* Configuration
                
     # Top level
     ttype : 'dSphs'     
     rosters : ['test']  
     spatial_models: ['point']     

* ttype: str
  Target tpye.  This is used for bookkeeping mainly, to give the name of the top-level directory, and to
  call out specfic configuration files.

* rosters: list
  List of rosters of targets to analyze.   Each roster represents a self-consistent set of targets.  Different
  versions of the same target can be on several different rosters.  But no target should appear on a single
  roster more than once.

* spatial_models: : list
  List of types of spatial model to use when fitting the DM.  Options are
  * point : A point source

  
.. note::  
  If multiple rosters include the same target and profile, that target will only be analyzed once,
  and those results will be re-used when combining each roster.   


Simulation configuration
------------------------

The *sim_defaults*, *sims* and *random* sections can be used to define
analysis configurations for control studies with simulations and
random sky directions.

.. code-block:: yaml
   :caption: Sample *simulation* Configuration

   sim_defaults:
       seed : 0
       nsims : 20
       profile : point

   sims:
       'null' : {}
       'pl2_1em9' : {}

   random: {}

   
* sim_defaults : dict
  This is a dictionary of the parameters to use for simulations.
  This can be overridden for specific type of simulation.

  * seed : int
     Random number seed to use for the first simulation

  * nsims : int
     Number of simulations

  * profile : str
     Name of the spatial profile to use for simulations.  This must match a profile defined in the roster for each target.
     The 'alias_dict' file can be used to remap longer profile names, or to define a common name for all the profiles in a roster.

     
* sims : dict
  This is a dictionary of the simulation scenarious to consider, and
  of any option overrides for some of those scenarios.

  Each defined simulation needs a 'config/sim_{sim_name}.yaml' to define the injected source to use for that simulation.

* random: dict
  This is a dictionary of the options to use for random sky direction control studies.


Plotting configuration
----------------------

.. code-block:: yaml
   :caption: Sample *plotting* Configuration

   data_plotting:
       plot-castro : {}
  
* data_plotting : dict

  Dictionaries of which types of plots to make for data, simulations and random direction controls.   These dictionaries
  can be used to override the default set of channels for any particular set of plots.

  The various plot types are:

  * plot-castro :
    SED plots of a particular target, assuming a particular spatial profile.


Additional Configuration files
==============================

In addition to the master configuration file, the pipeline needs a few additional files.


Fermipy Analysis Configuration Yaml
-----------------------------------

This is simply a template of the `fermipy` configuration file to be used for the baseline analysis and SED fitting
in each ROI.  Details of the syntax and options are `here <https://fermipy.readthedocs.io/en/latest/config.html>` _
The actual direction and name of the target source in this file will be over written for each target.


Profile Alias Configuration Yaml
--------------------------------

This is an optional small file that remaps the target profile names to shorter names (without
underscores in them).  Removing the underscores helps keep the file name fields more logical, and
fermipy.jobs generally uses underscores as a field seperator.  This also keeps file names shorter, 
and allows us to use roster with a mixed set of profile versions to do simulations.  Here is an example:

  .. code-block:: yaml

    ackermann2016_photoj_0.6_nfw : ack2016
    geringer-sameth2015_nfw : gs2015


Simulation Scenario Configuration Yaml
--------------------------------------

This file specifies the signal to inject in the analysis (if any).  Here is a example, note
that everything inside the 'injected_source' tag is in the format that `fermipy` expects to see
source defintions.

  .. code-block:: yaml

    # For positive control tests we with injected source.
    # In this case it is a powerlaw specturm
    injected_source:
      name : testpl
      source_model :
        SpatialModel : PointSource
        SpectrumType : Powerlaw
      Prefactor : 
        value : 1e-9
      index : 
        value: 2.
      scale : 
        value : 1000.


For null simulations, you should include the 'injected_source' tag, but leave it blank
        
  .. code-block:: yaml

    # For positive control tests we with injected source.
    # In this case it is a DM annihilation spectrum.
    injected_source:


Random Direction Control Sample Configuration Yaml
--------------------------------------------------

The file define how we select random directions for the random direction control studies.  Here is an example:

  .. code-block:: yaml

    # These are the parameters for the random direction selection
    # The algorithm picks points on a grid 

    # File key for the first direction
    seed : 0
    # Number of directions to select
    nsims : 20

    # Step size between grid points (in deg)
    step_x : 1.0
    step_y : 1.0
    # Max distance from ROI center (in deg)
    max_x : 3.0
    max_y : 3.0


Preparing the analysis areas
----------------------------

The initial setup can be done either by using the `PrepareTargets` link directly 
from python, or by running the fermipy-prepare-targets executable.   This will
produce a number of analysis directories and populate them with the needed configuration
files.

  .. code-block:: python

    link = PrepareTargets()
    link.update_args(dict(ttype=dSphs,
                          rosters=['dsph_roster.yaml'],
                          spatial_models=['point'],
   	  		  sims=['null', 'random', 'pl2_1em9']))
    link.run()

  .. code-block:: shell

    fermipy-prepare-targets --ttype dSphs --rosters dsph_roster.yaml --spatial_models point --sims random --sims pl2_1em9 --sims null 

* Additional Arguments

  * alias_dict [None] 
    Optional path to a file that remaps the target profile name to shorter names. 


Baseline target analysis
------------------------

The first step of the analysis chain is to perform a baseline re-opimization of each ROI.
This is done by using `AnalyzeROI_SG` to run the `AnalyzeROI` link on each ROI in the
target list generated by `PrepareTargets`.  This can be done directly from python, or
from the shell using the fermipy-analyze-roi-sg executable.


.. code-block:: python

    link = AnalyzeROI_SG()
    link.update_args(dict(ttype=dSphs,
                          targetlist='dSphs/target_list.yaml'))
    link.run()

  .. code-block:: shell

    fermipy-analyze-roi-sg --ttype dSphs --targetlist dSphs/target_list.yaml --config config.yaml 


* Additional Arguments

  * config ['config.yaml']
    Name of the fermipy configuration file to use.
  
  * roi_baseline ['fit_baseline']
    Prefix to use for the output files from the baseline fit to the ROI

  * make_plots [False]
    Produce the standard plots for an ROI analysis.  


Target SED analysis
-------------------

The next step of the analysis chain is to perform extract the SED each spatial profile 
for each target.  This is done by using `AnalyzeSED_SG` to run the `AnalyzeSED` link on each 
ROI in the target list.  This uses the baseline fits as a starting point for the SED fits.
This can be done directly from python, or from the shell using the fermipy-analyze-sed-sg executable.

.. code-block:: python

    link = AnalyzeSED_SG()
    link.update_args(dict(ttype=dSphs,
                          targetlist='dSphs/target_list.yaml'))
    link.run()

  .. code-block:: shell

    fermipy-analyze-sed-sg --ttype dSphs --targetlist dSphs/target_list.yaml --config config.yaml


* Additional Arguments

  * config ['config.yaml']
    Name of the fermipy configuration file to use.
  
  * roi_baseline ['fit_baseline']
    Prefix to use for the output files from the baseline fit to the ROI

  * make_plots [False]
    Produce the standard plots for a SED analysis.  

  * non_null_src [False]
    If set to True, the analysis will zero out the source before computing the SED.  This
    is needed for positive control simulations.

  * skydirs [None]
    Optional file with a set of directions to build SEDs for.  This is used from random direction 
    control samples.


Simulated realizations of ROI analysis
--------------------------------------

This module provides tools to perform simulated realizations of the ROIs.  This
is done by copying the baseline ROI, using it as a starting point, and simulating
realizations of the analysis by throwing Poisson fluctuations on the expected models
counts of the ROI and then fitting those simulated data.  These simulations are done
for each target and can optionally include injecting a signal source. 
This can be done directly from python, or from the shell using executables.  Here is an
example of how to generate negative control ("null") simulations.  This requires 
having 'config/sim_null.yaml' consist of just a single empty tag 'injected_source'.
To run positive control sample you would just change "null" to, for example "pl2_1em9", 
where 'config/sim_pl2_1em9.yaml' is the yaml file with the spectral model described above.

  .. code-block:: python

    # Copy the base line ROI
    copy_link = CopyBaseROI_SG()
    copy_link.update_args(dict(ttype=dSphs,
                               targetlist='dSphs_sim/sim_null/target_list.yaml',
                               sim='null'))
    copy_link.run()

    # Run simulations of the ROI
    sim_link = SimulateROI_SG()
    sim_link.update_args(dict(ttype=dSphs,
                              targetlist='dSphs_sim/sim_null/target_list.yaml',
                              sim='null'))
    sim_link.run()

    # Collect the results of the simulations
    col_link = CollectSED_SG()
    col_link.update_args(dict(ttype=dSphs,
                              targetlist='dSphs_sim/sim_null/target_list.yaml',
                              sim='null'))
    col_link.run()

  .. code-block:: shell

    fermipy-copy-base-roi-sg --ttype dSphs --targetlist dSphs_sim/sim_null/target_list.yaml --sim null
    fermipy-simulate-roi-sg --ttype dSphs --targetlist dSphs_sim/sim_null/target_list.yaml --sim null 
    fermipy-collect-sed-sg --ttype dSphs --targetlist dSphs_sim/sim_null/target_list.yaml --sim null


* Additional Arguments

  * extracopy [] 
    Extra files to copy from basline fit directory.

  * config ['config.yaml']
    Name of the fermipy configuration file to use.
  
  * roi_baseline ['fit_baseline']
    Prefix to use for the output files from the baseline fit to the ROI

  * non_null_src [False]
    If set to True, the analysis will zero out the source before computing the SED.  This
    is needed for positive control simulations.

  * do_find_src [False]
    Do an additional setup of source finding in the ROI.  

  * sim_profile ['default']
    Name of the profile to use to produce the simulations

  * nsims [20]
    Number of simulations to run

  * seed [0]
    Starting random number seed.  Also used as in bookkeeping.

  * nsims_job [0]
    Number of simulations per job.  0 means to run all the simulations in a single job.


Random Direction Control Studies
--------------------------------

This module also provides tools to perform analyses of random directions in the ROI
as a control sample.  This done by copying the baseline ROI, using it as a starting point, 
and then picked directions away from the center of the ROI and treating them as the 
target.  Here is an example of how to generate random direction control simulations.   Defined
by having 'config/sim_null.yaml' consist of just a single empty tag 'injected_source'.
To run positive control sample you would just change "null" to, for example "pl2_1em9", 
where 'config/sim_pl2_1em9.yaml' is the yaml file with the spectral model described above.


  .. code-block:: python

    # Copy the base line ROI
    copy_link = CopyBaseROI_SG()
    copy_link.update_args(dict(ttype=dSphs,
                               targetlist='dSphs_sim/sim_random/target_random.yaml',
                               sim='random'))
    copy_link.run()

    # Make a set of random directions
    dir_link = RandomDirGen_SG()
    dir_link.update_args(dict(ttype=dSphs,
                              targetlist='dSphs_sim/sim_random/target_list.yaml',
                              sim='random',
			      rand_config='config/random_dSphs.yaml'))
    dir_link.run()

    # Construct the SED for each random direction
    sed_link = AnalyzeSED_SG()
    sed_link.update_args(dict(ttype=dSphs, 
                              targetlist='dSphs_sim/sim_random/target_list.yaml',
			      skydirs='skydirs.yaml'))
    sed_link.run()

    # Collect the results for the random directions
    col_link = CollectSED_SG()
    col_link.update_args(dict(ttype=dSphs,
                              targetlist='dSphs_sim/sim_random/target_list.yaml',
                              sim='random'))
    col_link.run()


  .. code-block:: shell

    fermipy-copy-base-roi-sg --ttype dSphs --targetlist dSphs_sim/sim_random/target_list.yaml --sim random
    fermipy-random-dir-gen-sg --ttype dSphs --targetlist dSphs_sim/sim_random/target_list.yaml --sim random --rand_config config/random_dSphs.yaml
    fermipy-analyze-sed-sg --ttype dSphs --targetlist dSphs_sim/sim_random/target_list.yaml --skydirs skydirs.yaml
    fermipy-collect-sed-sg --ttype dSphs --targetlist dSphs_sim/sim_random/target_list.yaml --sim random 


* Additional Arguments

  * extracopy [] 
    Extra files to copy from basline fit directory.

  * config ['config.yaml']
    Name of the fermipy configuration file to use.
  
  * roi_baseline ['fit_baseline']
    Prefix to use for the output files from the baseline fit to the ROI

  * non_null_src [False]
    If set to True, the analysis will zero out the source before computing the SED.  This
    is needed for positive control simulations.

  * do_find_src [False]
    Do an additional setup of source finding in the ROI.  

  * write_full [False
    Write a full description of all the collected SED results

  * write_summary [False]
    

Plotting Results
----------------

The module also includes code to plot the SED for each target.   Note that
this can also be done with the make_plots=True option in `AnalyzeSED_SG`.

  .. code-block:: python

    link = PlotCastro_SG()
    link.update_args(dict(ttype=dSphs,
                          targetlist='dSphs/target_list.yaml'))
    link.run()
  
  .. code-block:: shell

    fermipy-plot-castro-sg --ttype dSphs --targetlist dSphs/target_list.yaml 


One of the resulting plots would look something like this:

  .. image:: sed_default_point.png