Proof Usage

Compiling and loading your analysis applications for Proof

Depending on the complexity of your analysis setup and on how you want to run Proof, different steps are needed to compile and load your code properly.


Single-file macro

If your application is self-contained in a single file (or a single .C source file and .h header file), you can simply specify the file name when you call Process, e.g.

 
dset->Process("MyAnalysis.C"); // execute as CINT macro
dset->Process("MyAnalysis.C+"); // compile and load as C++ library

More complex analysis requiring multiple source/header files

This depends on how your Proof cluster is set up:

Homogeneous cluster

All nodes run the same operating system version with the same bit-length and the same ROOT version (Proof-lite or Proof@LRZ, for example), and they share a file system for the code.

In this case you can compile your application locally and load the resulting .so library on all Proof slaves.

 
TProof *proof = TProof::Open(gSystem->GetFromPipe("pod-info -c")); // get connection string for PoD and create Proof session
gROOT->ProcessLine(".L D3PDSelector.C+"); // load main source file in ROOT -> produces .so lib
// Load resulting .so lib on all slaves:
gProof->Exec("gSystem->Load(\"/full/path/of/your/so/lib/D3PDSelector_C.so\")"); // watch out for the quotes-in-quotes syntax \" !

proof->Process("/default/muellert/user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1#susy", "D3PDSelector"); // only class name

When calling Process(..), specify only the class name of your TSelector, not the file name! You also have to give the name of the tree you want to analyse by appending it to the dataset name after a #, e.g. #susy.
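The two strings used above (the escaped load command and the dataset name with its tree suffix) are easy to get wrong when typed by hand. The following is a minimal sketch in plain C++ of how they could be assembled; the helper names makeLoadCommand and makeDatasetSpec are hypothetical and not part of ROOT or Proof.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a ROOT function): builds the command string that
// is handed to gProof->Exec(...) so that every slave loads the library.
std::string makeLoadCommand(const std::string& libPath) {
    assert(!libPath.empty());  // an empty path would give a useless command
    // The inner quotes must appear literally in the command, hence the \" escapes.
    return "gSystem->Load(\"" + libPath + "\")";
}

// Hypothetical helper: appends the tree name to a dataset name using the
// "#" separator described above.
std::string makeDatasetSpec(const std::string& dataset, const std::string& tree) {
    return dataset + "#" + tree;
}
```

In a ROOT session the results would be passed on as C strings, for example gProof->Exec(makeLoadCommand("/full/path/of/your/so/lib/D3PDSelector_C.so").c_str()).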

Inhomogeneous cluster

Nodes with different operating systems, a mix of 32/64 bit, or no shared file system. In this case your code needs to be compiled separately on each Proof slave. The recommended way to achieve this is to use Proof packages.

Simple recipe:

  • Create a sub-directory and put all the .C and .h files needed for building into it.
  • Within that sub-directory, provide a macro PROOF-INF/SETUP.C which contains the instructions for what to load.
 
mkdir MySusyD3PD
# copy all source files into it
cd MySusyD3PD
# make further sub-dir
mkdir PROOF-INF
cd PROOF-INF
# create setup file
cat > SETUP.C << 'EOF'
Int_t SETUP()
{
  return( gROOT->ProcessLine(".L D3PDSelector.C+") );
}
EOF
  • create .par package (tar.gz archive)
 
cd ../.. # parent dir of MySusyD3PD
tar czf MySusyD3PD.par MySusyD3PD # pack the whole directory into the .par archive
rm -rf MySusyD3PD # remove the directory, it confuses ROOT/Proof
  • upload the package in your Proof session
 
gProof->UploadPackage("MySusyD3PD");
gProof->EnablePackage("MySusyD3PD"); // compiles and builds package on each slave
  • You only have to upload once; the package will still be there in your next session,
  • but you need to call gProof->EnablePackage("..") in each session.
  • When calling Process(..), specify only the class name of your TSelector, not the file name!
dset->Process("D3PDSelector"); // only class name

More details in Proof working with par files


General tips for working with Proof

How to manage output objects (histos, trees, ...) within a PROOF analysis

The following is an example. Such objects can be declared as attributes of the analysis class, instantiated in SlaveBegin(), and filled in Process(). To be able to retrieve them after processing, they can be 'booked' in SlaveBegin() as output objects by adding them to fOutput.

// Header file
#include "TH1F.h"
#include "TSelector.h"

class MyAnalysisClass : public TSelector {
// ...
TH1F* htest ;
// ...
} ;

// Source file
#include <iostream>
#include "TDirectory.h"
#include "TList.h"
using namespace std ;

void MyAnalysisClass::SlaveBegin(TTree * /*tree*/) {

// Instantiate objects

htest = new TH1F("htest", "htest", 100, 0, 100000) ;

// Book all objects defined in current TDirectory

TList* obj_list = (TList*) gDirectory->GetList() ;
TIter next_object((TList*) obj_list) ;
TObject* obj ;

cout << "-- Booking objects:" << endl;
while ((obj = next_object())) {
  TString objname = obj->GetName() ;
  cout << " " << objname << endl ;
  fOutput->Add(obj) ;
}

}

Bool_t MyAnalysisClass::Process(Long64_t entry) {
// Load entry
  Long64_t ientry = fChain->GetTree()->LoadTree(entry);
  if (ientry < 0) return kTRUE ;
  // GetEntry is already defined in the header file as
  // virtual Int_t GetEntry(Long64_t entry, Int_t getall = 0) { return fChain ? fChain->GetTree()->GetEntry(entry, getall) : 0; }
  int nb = GetEntry(entry, 0) ;

htest->Fill(nb) ;

return kTRUE ;
}

To retrieve the objects after processing (i.e. once TProof::Process() has finished) and store them in a ROOT file, the following code can be used:

// Define output file
TFile* output_file = new TFile("output.root", "recreate") ;

// Retrieve objects
TList* list = proof->GetOutputList() ;
TIter next_object((TList*) list);
TObject* obj ;
cout << "-- Retrieved objects:" << endl ;
output_file->cd() ;
while ((obj = next_object())) {
  TString objname = obj->GetName() ;
  cout << " " << objname << endl ;
  obj->Write() ; // write each object into the current directory (the output file)
}

// Write and close output file
output_file->Write() ;
output_file->Close() ;

How to conveniently write the log file of each worker to one text file:

// get proof manager (if not already available)
TProofMgr* mgr = proof->GetManager() ;

// get proof logs
TProofLog *log = mgr->GetSessionLogs() ;

// log file name (set prefix)
TString log_file_name = "log_all-workers.txt" ;

// save log
int flag = log->Save("*", log_file_name) ;

How to register and use PROOF datasets

Datasets are kept separately for each user, but you can freely choose your username on the Proof server (etpopt02). Your username must not contain a dot if you use datasets! The files must be accessible from the PROOF master node!

TFileCollection * fc = new TFileCollection();
fc->Add("/path/to/file/file1.root");
fc->Add("/path/to/file/file2.root");
...
proof->RegisterDataSet("MyDataSet", fc, "OV"); // O means overwrite previous; V means verify files
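Since the username can be chosen freely but must not contain a dot, one option is to derive a dot-free name before connecting. The following is a minimal sketch in plain C++; the helper names are hypothetical, replacing dots with underscores is only one possible convention, and only the single rule stated above is checked (the server may impose others).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true if the name satisfies the rule stated above
// (not empty and no dot). Other server-side rules are NOT checked.
bool isUsableDatasetUser(const std::string& user) {
    return !user.empty() && user.find('.') == std::string::npos;
}

// Hypothetical helper: replaces every dot by an underscore so that the
// result can be used as a Proof username together with datasets.
std::string sanitizeDatasetUser(std::string user) {
    assert(!user.empty());  // nothing sensible can be derived from an empty name
    std::replace(user.begin(), user.end(), '.', '_');
    return user;
}
```

The sanitized name could then be used in the user part of the connection string given to TProof::Open, assuming a user@host style connection is used at your site.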