Register a Dataset in Proof
You can register a list of files as a Dataset in Proof. More information about Datasets is available at http://root.cern.ch/drupal/content/working-data-sets.
- If a Dataset is registered, you don't have to validate it every time you use it.
- You can access the dataset by its name, which allows easy access from different locations (e.g. LRZ and local).
- The registered datasets are stored in ~/.proof.
- Proof Dataset names may not contain a "/". To register a container as a dataset, include the "/" in the name you pass to the script; the resulting dataset name will not contain the "/".
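The naming rule in the last bullet can be illustrated with a small sketch. The helper name toProofDatasetName is hypothetical, and it is an assumption that the registration script simply drops every "/" from the container name; the script itself may handle this differently.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of registerdataset.py): derive a name
// without "/" from a container name by removing every "/" character.
// That the script behaves exactly like this is an assumption.
std::string toProofDatasetName(std::string name) {
    // Erase-remove idiom: shift all non-'/' characters forward, then trim.
    name.erase(std::remove(name.begin(), name.end(), '/'), name.end());
    return name;
}
```

For a container passed as "some.container.name/", this sketch yields "some.container.name", which is the form you would then use when accessing the dataset in Proof.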
You can use the script
/project/etpsw/Common/bin/registerdataset.py
to register datasets.
By default the script uses the LRZ site and the dcap protocol, but it supports other sites and protocols as well (call it with option '-h' for more information).
Before running the script you need the DQ2-client setup (which at the moment does not run on Ubuntu) and a Grid proxy, which you can create with:
voms-proxy-init -voms atlas
Pass the names of the datasets you want to register as arguments:
/project/etpsw/Common/bin/registerdataset.py dataset1 dataset2 ...
Use the dataset names as shown by dq2-ls (e.g. user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1).
You can also supply a file containing a list of datasets, with one dataset per line:
/project/etpsw/Common/bin/registerdataset.py -f <filelist>
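The "one dataset per line" format of such a list file can be sketched as follows. The function parseDatasetList is hypothetical and only illustrates the expected layout; trimming surrounding whitespace and skipping blank lines are assumptions, not documented behaviour of registerdataset.py.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the text of a list file into dataset names,
// one per line. Whitespace trimming and skipping of blank lines are
// assumptions made for this sketch.
std::vector<std::string> parseDatasetList(const std::string& text) {
    std::vector<std::string> names;
    std::istringstream in(text);
    std::string line;
    const std::string whitespace = " \t\r\n";
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(whitespace);
        if (first == std::string::npos) {
            continue;  // line is empty or contains only whitespace
        }
        const std::size_t last = line.find_last_not_of(whitespace);
        names.push_back(line.substr(first, last - first + 1));
    }
    return names;
}
```

A file with the two lines "dataset1" and "dataset2" would give a vector with exactly those two names, in the same order.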
You can ignore messages that say "problems calculationg old checksum" and "problems notifiying update with 'NotifyUpdate'".
To see the list of registered datasets, open a Root session and run:
TProof * proof = TProof::Open("");
proof->ShowDataSets();
To process the tree "susy" of a registered dataset with Proof, append "#susy" to the dataset name:
.L D3PDSelector.C+
proof->Process("/default/yourusername/user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1#susy", "D3PDSelector.C+")
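The string passed to Process in the example has the shape /group/username/datasetname#treename. A minimal sketch of assembling it is shown below; the helper datasetPath is hypothetical, and the group "default" is taken from the example above and may differ on other setups.

```cpp
#include <string>

// Hypothetical helper: build "/<group>/<user>/<dataset>#<tree>".
// The default group name "default" is copied from the example in the text;
// treat it as an assumption for other installations.
std::string datasetPath(const std::string& user,
                        const std::string& dataset,
                        const std::string& tree,
                        const std::string& group = "default") {
    return "/" + group + "/" + user + "/" + dataset + "#" + tree;
}
```

In a Root session the result could be handed to Proof as proof->Process(datasetPath("yourusername", "yourdataset", "susy").c_str(), "D3PDSelector.C+").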
If you only want to create file-lists without registering the datasets in Proof, call the script with option '-n':
/project/etpsw/Common/bin/registerdataset.py -n dataset1 dataset2 ...
This creates the file-lists 'dataset1.list', 'dataset2.list', ... which contain the physical file-names of the datasets. Such lists can be used with copy scripts, or directly in Root with TFile::Open or TChain, e.g.:
TChain * chain = new TChain("physics","");
TFileCollection* fc = new TFileCollection("mylist", "mylist", "datasetXX.list");
chain->AddFileInfoList((TCollection*)fc->GetList());