Register a Dataset in Proof

From Etp

You can register a list of files as a dataset in Proof. (More information about datasets is available at http://root.cern.ch/drupal/content/working-data-sets.)

  • Once a dataset is registered, you do not have to validate it every time you use it.
  • You can access the dataset by its name, which makes it easy to use from different locations (e.g. LRZ and a local machine).
  • The registered datasets are stored in ~/.proof.
  • Proof dataset names may not contain a "/". To register a container as a dataset, include the "/" in the name; the resulting dataset name will not contain the "/".
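A minimal Python sketch of the naming rule in the last point. It assumes the Proof name is obtained by simply removing every "/" from the DQ2 name; `proof_dataset_name` is a hypothetical helper, and registerdataset.py may implement this differently.

```python
# Hypothetical helper illustrating the naming rule: Proof dataset names may
# not contain '/', so every '/' of a DQ2 (container) name is dropped.
def proof_dataset_name(dq2_name):
    return dq2_name.replace("/", "")
```

For a container name such as "user.someone.mycontainer/" this yields "user.someone.mycontainer".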

To register datasets, use the script:

   /project/etpsw/Common/bin/registerdataset.py <datasetname>

By default the script assumes the LRZ site and the dcap protocol, but it also supports other sites and protocols (run it with the option '-h' for details).

To run, the script requires the DQ2-client setup (which currently does not run on Ubuntu) and a valid Grid proxy, which you can create with:

   voms-proxy-init -voms atlas

Pass the names of the datasets you want to register as arguments:

  /project/etpsw/Common/bin/registerdataset.py dataset1 dataset2 ...

Use the dataset names exactly as listed by dq2-ls (e.g. user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1). You can also supply a file that lists one dataset per line:

   /project/etpsw/Common/bin/registerdataset.py -f <filelist>
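If you prefer to generate such a list file programmatically, a minimal Python sketch could look as follows. `dataset_list_text` is a hypothetical helper, and the only format assumed is the one stated above: one dataset name per line.

```python
# Hypothetical helper (not part of registerdataset.py): build the contents of
# a dataset list file for the '-f' option, one dataset name per line.
def dataset_list_text(names):
    # Strip surrounding whitespace and drop empty entries
    cleaned = [name.strip() for name in names if name.strip()]
    return "".join(name + "\n" for name in cleaned)
```

Writing the returned text to a file (for example a hypothetical mydatasets.txt) and passing that file with '-f' should register all listed datasets in one call.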
  

You can ignore messages that say "problems calculating old checksum" and "problems notifying update with 'NotifyUpdate'".

To list the registered datasets, open a ROOT session and run:

  TProof * proof = TProof::Open("");
  proof->ShowDataSets();

To process the tree "susy" of a registered dataset with Proof (the tree name follows the "#" in the dataset name):

  .L D3PDSelector.C+
  proof->Process("/default/yourusername/user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1#susy", "D3PDSelector.C+");
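The dataset string passed to Process follows the pattern /<group>/<user>/<dataset>#<tree> seen in the example above. A minimal Python sketch that assembles it, assuming "default" as the group (as in the example); `proof_dataset_uri` is a hypothetical helper:

```python
# Hypothetical helper: assemble the dataset string passed to Process,
# following the pattern /<group>/<user>/<dataset>#<tree> used above.
def proof_dataset_uri(user, dataset, tree, group="default"):
    # Proof dataset names may not contain '/', so reject them early
    if "/" in dataset:
        raise ValueError("Proof dataset names may not contain '/'")
    return "/{}/{}/{}#{}".format(group, user, dataset, tree)
```

If PyROOT is available, the returned string could be passed as the first argument of proof.Process together with "D3PDSelector.C+".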


FileList creation

If you only want to create file lists without registering the datasets with Proof, call the script with the option '-n':

  /project/etpsw/Common/bin/registerdataset.py -n dataset1 dataset2 ...

This creates the file lists 'dataset1.list', 'dataset2.list', ..., which contain the physical file names of the datasets. Such lists can be used with copy scripts, with ROOT's TFile::Open, or directly with a TChain, e.g.:

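   // Chain over the tree "physics" in the files of a list created with '-n'
   TChain * chain = new TChain("physics","");
   // Read the physical file names from the list file
   TFileCollection* fc = new TFileCollection("mylist", "mylist", "datasetXX.list");
   // Add all files of the collection to the chain
   chain->AddFileInfoList(fc->GetList());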
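Outside of TFileCollection, such a list can also be read by hand. A minimal Python sketch, assuming one physical file name per line and skipping blank lines; `read_file_list` is a hypothetical helper:

```python
# Hypothetical helper: read a file list created with '-n', assuming one
# physical file name per line; surrounding whitespace and blank lines are dropped.
def read_file_list(lines):
    return [line.strip() for line in lines if line.strip()]
```

Since an open file iterates over its lines, `read_file_list(open("datasetXX.list"))` returns the file names; with PyROOT, each of them could then be added to a chain via `chain.Add(name)`, or handed to a copy script.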