Using DQ2


Warning: this page is obsolete. Use Distributed Data Management (DDM) instead.




Preparation (once):

Before you download a dataset, you have to set the port range, because the firewall opens only a limited set of ports. In ~/.srmconfig/config.xml, change the line for the TCP ports to:

    <globus_tcp_port_range> 3000,3090 </globus_tcp_port_range>
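
To confirm that the one-time change is in place, a quick check along the following lines may help. This is only a sketch: has_port_range is a hypothetical helper name, and it assumes the element sits on a single line as shown above.

```shell
# Hypothetical helper: succeeds if the SRM config contains the port range 3000,3090.
# The default path is the one named in the text; pass another file as $1 to override.
has_port_range() {
    config="${1:-$HOME/.srmconfig/config.xml}"
    # A missing file counts as "not configured".
    [ -f "$config" ] || return 1
    # Tolerate optional whitespace around the value inside the element.
    grep -Eq '<globus_tcp_port_range>[[:space:]]*3000,3090[[:space:]]*</globus_tcp_port_range>' "$config"
}
```

For example, `has_port_range || echo "port range not set"` prints the message when the line is missing or carries a different range.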

Transfer datasets via DQ2 (every time):

If you have a machine where /afs is mounted (true for every SL5 machine of the group), you can simply run the following setup :

    source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

Or (recommended) you can use CVMFS (see also Athena Setup with CVMFS ) :

    export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
    source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh --quiet
    localSetupDQ2Client --skipConfirm

This sets up a proper Grid environment and the DQ2 tools.

(Note that only version 2.5.0 of the DQ2 tools can handle the Rucio protocol correctly. If you see errors like "Unknown File Catalog implementation [rucio://atlas-rucio.cern.ch:/grid/atlas]" with the AFS version (currently 2.4.1), use CVMFS instead.)
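
The choice between the two setups can be sketched as a small check that prefers CVMFS and falls back to AFS. dq2_setup_source is a hypothetical helper name; the default paths are the ones given above, and the two optional arguments exist only so the check can be tried against other locations.

```shell
# Hypothetical helper: report which DQ2 setup is available on this machine.
# Prints "cvmfs", "afs" or "none"; returns non-zero when neither is found.
dq2_setup_source() {
    cvmfs_base="${1:-/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase}"
    afs_setup="${2:-/afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh}"
    if [ -d "$cvmfs_base" ]; then
        echo cvmfs      # recommended variant
    elif [ -f "$afs_setup" ]; then
        echo afs        # older client, see the note above
    else
        echo none
        return 1
    fi
}
```

The printed word only tells you which of the two setup recipes above to run; the helper does not source anything itself.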

Then create a voms proxy :

    voms-proxy-init -voms atlas

You can request a longer-lived proxy with the option :

    -valid 96:00 

(where 96:00 = 96 hours is the upper limit for a voms proxy). Then define the following variable :

    export DQ2_LOCAL_SITE_ID=ROAMING
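
The 96-hour ceiling on the proxy lifetime mentioned above can be folded into a small helper that builds the argument for -valid. This is a sketch only: proxy_valid_arg is a hypothetical name, and it assumes the lifetime is given in whole hours.

```shell
# Hypothetical helper: turn a number of hours into an HH:MM value for -valid,
# capped at the 96-hour limit stated in the text.
proxy_valid_arg() {
    hours="$1"
    max_hours=96
    if [ "$hours" -gt "$max_hours" ]; then
        hours="$max_hours"
    fi
    printf '%d:00\n' "$hours"
}
```

It could then be used as `voms-proxy-init -voms atlas -valid "$(proxy_valid_arg 48)"`; a request for 120 hours is reduced to 96:00.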

You then have access to all DQ2 commands. The ones you will most likely need are:

dq2-ls to list available datasets and containers. e.g. :

    dq2-ls 'mc08.*5200*AOD*r*'

lists all datasets matching this pattern (the quotes keep the shell from expanding the wildcards), and

    dq2-ls -f mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579_tid028664

lists all the files belonging to this dataset.

dq2-get to get a dataset. e.g. :

    dq2-get mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579_tid028664

gets the whole dataset, and

    dq2-get -f AOD.028664._01117.pool.root.1 mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579_tid028664

gets a single file.

dq2-put to upload files to the Grid (for instance if you want to run a small MC production). e.g. :

    dq2-put -s mydirectory -L LRZ-LMU_LOCALGROUPDISK mydatasetname 

This uploads all the files in mydirectory to LRZ-LMU_LOCALGROUPDISK and registers them in the LFC and DQ2. Important: you can upload only to SCRATCHDISK areas and to LRZ-LMU_LOCALGROUPDISK.

dq2-list-dataset-replicas to list the locations of a dataset. e.g. :

    dq2-list-dataset-replicas mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579_tid028664 --all

dq2-list-dataset-replicas-container to list the locations of the datasets belonging to a container (a container is a collection of datasets; its name differs from a dataset name by a "/" at the end). e.g. :

    dq2-list-dataset-replicas-container mc08.105200.T1_McAtNlo_Jimmy.recon.AOD.e357_s462_r579/
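
Since a container is recognised only by the trailing "/", the choice between the two replica commands can be sketched as below. replica_command is a hypothetical helper name; it only prints which of the two commands named above applies and does not run anything on the Grid.

```shell
# Hypothetical helper: pick the replica-listing command for a given name.
# Names ending in "/" are containers, everything else is treated as a dataset.
replica_command() {
    case "$1" in
        */) echo dq2-list-dataset-replicas-container ;;
        *)  echo dq2-list-dataset-replicas ;;
    esac
}
```

With a name stored in a variable, `$(replica_command "$name") "$name"` would then run the matching command.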

Further information can be found here and here.

Physical paths

To resolve a dataset name to the physical file paths on a given site use:

    dq2-ls -L LRZ-LMU_LOCALGROUPDISK -fp user.mann.ntuple_slim_SUSY_2L_2012_v011_00017_01002_data12_00200913.physics_JetTauEtmiss.2013-09-05/

This will give output containing the path on the storage element for all files in the specified dataset (if it exists on that site):

    srm://lcg-lrz-srm.grid.lrz.de/pnfs/lrz-muenchen.de/data/atlas/dq2/atlaslocalgroupdisk/rucio/user/mann/72/5d/user.mann.017267._00004.used.config

Perhaps more interesting is the reverse: given a physical path, which no longer contains the dataset name since Rucio was introduced, how do you find the dataset the file belongs to? For this, use dq2-list-parent-datasets:

    dq2-list-parent-datasets srm://lcg-lrz-srm.grid.lrz.de/pnfs/lrz-muenchen.de/data/atlas/dq2/atlaslocalgroupdisk/rucio/user/mann/72/5d/user.mann.017267._00004.used.config

gives

    user.mann.ntuple_slim_SUSY_2L_2012_v011_00017_01002_data12_00200913.physics_JetTauEtmiss.2013-09-05.130905131427
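
When several physical paths need to be resolved, the reverse lookup can be wrapped in a loop. The following is a sketch under assumptions: map_paths_to_datasets and the LOOKUP_CMD variable are hypothetical names introduced here (LOOKUP_CMD is not a DQ2 setting, it only allows a dry run with another command), and the lookup is assumed to print one dataset name per path.

```shell
# Hypothetical helper: read physical paths from stdin, one per line,
# and print "path<TAB>dataset" for each of them.
map_paths_to_datasets() {
    # Illustrative override; defaults to the command described in the text.
    lookup="${LOOKUP_CMD:-dq2-list-parent-datasets}"
    while IFS= read -r pfn; do
        [ -n "$pfn" ] || continue        # skip blank lines
        dataset=$("$lookup" "$pfn")
        printf '%s\t%s\n' "$pfn" "$dataset"
    done
}
```

A file with one srm:// path per line could then be processed with `map_paths_to_datasets < paths.txt`, where paths.txt is a placeholder file name.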