Data storage and transfers
Data and storage
Accessing the storage areas
Every storage area of Adastra is based on the LUSTRE filesystem.
On Adastra, we distinguish four main storage areas:
Home: Long-term data (with a need for backups for the project's duration, ~12 months); for scripts and tools related to a project.
Scratch: Short-term data (<= 30 days); for fast, batch-job access. Purged based on the age of the files (see below).
Work: Mid-term data (a few months); to store data one wishes to postprocess.
Store: Long-term data (with a need for cold storage and backups); data to keep in a secure manner.
The table below shows the environment variables set by the unique login tool and gives additional details regarding the characteristics of the storage areas.
Area | Sub-area | Paths or environment variables | Permissions | Default block quota | Default inode quota | Backups | Purged | Retention | Via compute nodes
---|---|---|---|---|---|---|---|---|---
Home | Personal | ${OWN_HOMEDIR} | umask 0022 (u=rwx, g=rx, o=rx) | 20 Gio | 30000 | Yes (bacula) | Never | > 6 months | Read-write
Home | Project personal | ${HOMEDIR} | | 100 Gio | 300000 | | | |
Home | Project shared | | | | | | | |
Store | Project personal | ${STOREDIR} | | 1.5 Tio [4] | 9000 | Yes (HSM) | | | Not mounted
Scratch | Project personal | ${SCRATCHDIR} | | 20 Tio | 1000000 | No | Yes (30 days) [1] | | Read-write
Scratch | Project shared | | | | | | | |
Work | Project personal | ${WORKDIR} | | 15 Tio | 250000 | | Never | |
Work | Project shared | | umask | | | | | |
Footnotes
Warning
Quotas are always defined per group, except for the personal home. This means that whether you are 2 or 10 users in a GENCI/DARI project, you get the same amount of space. Contact svp@cines.fr to ask for a quota reevaluation.
Note
The CCFR and ALL_CCFR prefixed variables, while being a bit of a misnomer, have the benefit of being available (under the same name) on all three of the French national computing sites while also carrying a similar meaning.
Important
Files within the scratch area are not backed up and are purged on a regular basis.
Important
Each storage area has its own specific use; try to respect the usage concept of the area. The store space may have long latencies, thus it is not intended to be used during computations; in fact, store is not available on the compute nodes.
We now describe the specificities of a personal, project personal and project shared sub-area.
Personal or simply, non-project related - Thanks to the unique login functionalities you can work on multiple projects through a single user account. This account has a dedicated home (${OWN_HOMEDIR}), the only strictly personal area, which is neither shared with other users nor specific to a project. It has its own quota counters. This is the home you see when doing an scp and also the one which contains the .Xauthority file. Note that the ${HOME} environment variable should never be set to point to this personal home by the unique login tool, unless the user does not have a project.
Project personal - A user account is attached to a project and is given a personal area using the project's quota counters. You can change where this environment variable points to when manipulating the unique login tool.
Project shared - To ease data movements between users of a same project (say, to avoid UNIX permission issues), the project shared sub-area can be used. It shares the same quota counters as the project personal sub-area. You can change where this environment variable points to when manipulating the unique login tool.
We also give the user a way to exchange data between projects via an inter-project work sub-area.
Note
Project personal and Project shared have a similar meaning to what used to exist on Occigen (CINES’ previous machine).
Note
If a user is not linked to any project, their ${HOMEDIR} will point to their personal home.
Note
You may find complementary information in Layout of common files and directories.
Warning
Be careful of what ends up in your personal home instead of your project personal home. For instance your .ssh
is expected by some tools to be in your personal home directory. You can work around these issues using links via the ln -s
command.
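As a minimal sketch of such a workaround (assuming ${HOMEDIR} points to your project personal home; the exact layout may differ on your account), you could link the .ssh directory of your personal home into the project personal home so both paths resolve to the same directory:
$ ln -s -- "${OWN_HOMEDIR}/.ssh" "${HOMEDIR}/.ssh"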
Additional storage characteristics
Some more advanced technical details are given in two tables below:
Mount point | MDT count | OST count | Cumulated OST space | SSD backed | Maximal practical throughput (read / write) in Gio/s
---|---|---|---|---|---
 | 2 | 2 | 125 Tio | OST/MDT | 77 / 34
 | 2 | 40 (4 backup) | 1.9 Pio | OST/MDT | 1310 / 786
 | 2 | 16 | 9.1 Pio | MDT | -
 | 2 | 24 | 13.6 Pio | MDT | 175 / ~100
The acronyms used in the table above are LUSTRE jargon and are defined in the table below. It is largely optional for the typical user to know any of them.
Name | Acronym | Meaning
---|---|---
MetaData | MD | Data attached to a file that a POSIX filesystem should expose (say, permissions, dates, etc.).
Object Storage | OS | Object Storage is used to store the data contained in a file (if any, or if large enough).
MD Server | MDS | Handles and distributes the requests to the MDTs.
OS Server | OSS | Handles and distributes the read, write, etc. requests to the OSTs.
MD Target | MDT | Block device used to store MetaData.
OS Target | OST | Block device representing the Object Storage.
MD Client | MDC | Queries the MDS.
OS Client | OSC | Queries the OSS.
The store area exposes a LUSTRE filesystem with a Hierarchical Storage Management (HSM) functionality. With this HSM filesystem, data left untouched for a long time might migrate to tapes for long-term storage. Accessing this long-term data stored on tapes will take a long time.
Data storage good practices
User home directories
Users should note that the home filesystem's performance will not be as high as that of the scratch filesystem.
Permissions
Users have the ability to change permissions on their home directories. It is recommended that permissions be set to be as restrictive as possible (without interfering with your work).
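For instance, a minimal sketch showing how to inspect and, if needed, tighten the permissions on your personal home (adjust the mode to your own needs):
$ ls -ld -- "${OWN_HOMEDIR}"
$ chmod 750 -- "${OWN_HOMEDIR}"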
Lustre-specific lfs find command
When looking for a file, instead of using the traditional find
command, prefer the lfs find
variant.
Some examples are given below:
$ lfs find ./ -type f -print
.//.python_history-69887.tmp
$ ls -la ./.python_history-69887.tmp
-rw-------. 1 user dci 124 Dec 13 16:26 ./.python_history-69887.tmp
$ lfs find ./ -type f -size +1M -print
LUSTRE file striping
Supposing one cares about performing IO (read/write) operations (say via MPI-IO, parallel HDF5, parallel NetCDF, tarballing, etc.) on large files and at high throughput, then one should consider striping one's files. This operation spreads the data and the load across multiple drives (Object Storage Targets in LUSTRE's terminology) which can then be used in parallel. A file is then described as striped. So if you have multiple threads or ranks accessing the same file at different offsets, they may in fact access data on different drives. Once defined, the striping is transparent to the user; the benefits are provided by the filesystem.
This is how you obtain the striping of a <directory/file> and how you set the stripe count for all the upcoming new files in a <directory> (note that a --stripe-count of -1 means striping over all the available OSTs):
$ lfs getstripe <directory/file>
$ lfs setstripe --stripe-count <stripe-count> --stripe-index <starting-OST-index> --stripe-size <stripe-size> <directory>
Where some <stripe-count> values are given in the table below, <starting-OST-index> is highly recommended to be -1 and <stripe-size> is highly recommended to be 1M or 2M.
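For instance, following the recommendations above, a directory meant to hold files of a few Gio could be striped like so (a sketch; adapt the stripe count to your file sizes):
$ lfs setstripe --stripe-count 8 --stripe-index -1 --stripe-size 2M <directory>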
You should stripe your directories before creating files inside them, else you won't benefit from the feature. Some recommended stripe counts are given below as a function of the file's size:
File size in Gio | Recommended stripe count
---|---
[0, 1) | 1
[1, 8) | 8
[8, 128) | 24
[128, infinity) | 24+
The above table can be expressed as a single command, which the user is recommended to use on his simulation result folder:
$ lfs setstripe -E 1G --stripe-count 1 \
-E 8G --stripe-count 8 \
-E -1 --stripe-count 24 \
<directory>
Note
Enabling striping on small files may degrade performance.
Note
Striping over all the OSTs may not lead to the best performance, so you may want to adjust the stripe count downward. Also, if your code assumes one or more files per process and the IO of a process does not extend past its own files, we do not recommend striping but only making sure the files will be spread across all the OSTs (default behavior). Indeed, the files will already reside on multiple OSTs and striping would only add overhead (again, assuming all the processes read or write at the same time).
Other
Before copying data to the store, make sure you compact your data in, say, a tarball (see the example after this list). This will reduce your data storage and retrieval times (only big files are suited for HSM);
The scratch area is a temporary storage; move the important files produced by your jobs to one of the other areas (work or store);
Repatriate your files to a remote site (i.e., your lab) using, say, scp, rsync, etc. See Tools to transfer Data for more details;
Delete unnecessary files;
Prefer lfs find over find and lfs df over df.
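For instance, a minimal sketch of packing a result directory into a tarball before copying it to the store (the directory and file names are purely illustrative):
$ tar -cf my_results.tar my_results_directory/
$ cp my_results.tar "${STOREDIR}/"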
Note
Compressing your data (say, gzip'ing it) in the store is not necessary as the HSM will do it for you. That being said, your file sizes won't appear smaller as you would expect after a compression; that is because the compression is transparent to the user. We would prefer that the user asks for more store quota rather than compressing the data themselves. While user-side compression is not recommended, packing your data (say, in a tarball) into as few files as possible is good practice and will largely decrease the retrieval time of old files. If you need more store area quota, ask svp@cines.fr.
Datasets
In order to avoid duplication of datasets, CINES provides the ${DSDIR}
environment variable. If you do not find a dataset in what we currently provide, you can ask svp@cines.fr to add it to the set.
To list the provided datasets, simply do:
$ ls -- "${DSDIR}"
AlphaFold-2.3.1 cc12m coco GQA kinetics nuscenes ONCE openimage sbu visual_genome
AudioSet cc3m freesound ImageNet1k MSRVTT once ONCE.tar pile VideoCC WebVid
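Assuming the dataset you need appears in this listing (the internal layout of each dataset may vary), you can use it directly from ${DSDIR} rather than copying it into your own areas, for instance:
$ ls -- "${DSDIR}/ImageNet1k"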
Note
The list above is subject to change. Notably, we will add new datasets.
Data policies
Quotas
You can check your quota usage for the current project via the myproject -s <project_identifier> command. To show the state of all your projects, use myproject -S.
If you need an exception to the limits listed in the table above, such as temporary higher quotas in your home, work, scratch and store areas or a delay before a purge, contact svp@cines.fr with a summary of the exception that you need.
In case you start getting close to the hard quota, you should be notified via email. If you try to go past your inode or block quota you will receive an error message similar to:
Disk quota exceeded
If you do not wish to use the CINES specific myproject
command, you can use lfs quota
like so:
$ lfs quota -gh -- "grp_${USER}" "${HOMEDIR}"
$ lfs quota -gh -- "grp_${USER}" "${WORKDIR}"
$ lfs quota -gh -- "grp_${USER}" "${SCRATCHDIR}"
$ lfs quota -gh -- "grp_${USER}" "${STOREDIR}"
Note
We do not guarantee that the above (lfs quota
) commands will always work. You should prefer myproject
.
Purge
To keep enough space for big jobs on the scratch filesystem, files that have not been accessed (i.e., read) or modified in the project and user areas are purged at the intervals shown in the Accessing the storage areas section above. Please make sure that valuable data are moved off the scratch filesystem regularly.
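For instance, a minimal sketch moving a (hypothetical) result directory from the scratch area to the work area:
$ mv -- "${SCRATCHDIR}/my_results" "${WORKDIR}/"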
Data retention
After a user account deactivation or when a project ends, the user and project data in non-purged areas will be retained for 6 months. After this time frame, CINES reserves the right to delete the data. Data in purged areas remain subject to normal purge policies.
Retrieving from backups
If you accidentally delete files from your home or store directory, you may be able to retrieve them. Online backups are performed at regular intervals; daily backups are kept for the last 10 days. You can ask svp@cines.fr to retrieve your lost files.
Data transfer
Note
The first thing the user must understand is that outbound connections are not allowed (with some exceptions, see below). For instance, you can't execute an scp or git clone command from an Adastra login node towards your laboratory's storage machine unless it was previously allowed. See Authorizing an outbound connection for more details.
Between Adastra filesystems
wdcp is a tool based on mpiFileUtils that allows you to copy a directory using a dedicated kind of node called a transfer node.
Between CINES and your laboratory
Taking into account the note just above means that you must initiate the data transfer from your laboratory towards Adastra. You may be able to use it the other way around if your laboratory machine's IP is allowed.
Between computing site (CCFR)
Communication between French computing sites can rely on the Centres de Calcul FRançais (CCFR) network. This network is relatively fast (10 Gb/s between CINES, IDRIS and TGCC, 100 Gb/s Soon™). Using CCFR is NOT mandatory; you can use the traditional FQDN endpoints, but this kind of transfer will be slower.
To use this network, the user needs to do these things:
first, ask both the source and destination computing site for the authorization to use the CCFR resources (node and network);
second, connect to the site’s login node (slight twist at CINES and IDRIS, see below);
and third, initiate the transfer using the right tools.
The first part is a per-site operation and is summarized in the table below, assuming you want to connect to the site shown in the first column.
Site to connect to | Process specificities
---|---
CINES | The user requests at svp@cines.fr that his account be allowed to use CCFR resources.
IDRIS | The user fills the
TGCC | The user requests at hotline.tgcc@cea.fr that his account be allowed to use CCFR resources.
Note
To check if you have access to the CINES CCFR node, you may connect to Adastra, use the id
command and look for the cinesccfr
group. If you have it then you are good to go on CINES’ side. You still have to make sure that the other site’s CCFR node access request was processed.
The second part’s procedure is then as follows:
You
ssh
to the computing site which will initiate the transfer. You land on a login node.Then, at CINES and IDRIS, you
ssh
again, to the initiating CCFR node FQDN (table below) associated to the computing site you have just logged on.At TGCC, you stay on the login node.
You are now on a CCFR node, ready to initiate a transfer towards the receiving computing site’s CCFR node FQDN (table below).
Site | Sending CCFR node FQDN | Receiving CCFR node FQDN
---|---|---
CINES | adastra-ccfr.cines.fr (aka login1) | adastra-ccfr.cines.fr
IDRIS | jean-zay-ccfr.idris.fr | jean-zay-ccfr.idris.fr
TGCC | Any login node. | irene-fr-ccfr-gw.ccc.cea.fr
Now, as an example, say you want to transfer 50 Tio of data from the Adastra machine (at CINES, Montpellier) to the Jean-Zay machine (at IDRIS, Paris). The user needs to have an account on both Adastra and Jean-Zay and to have requested the use of the CCFR network. Then, to initiate a connection from Adastra towards another site, you first need to connect to a site's login node (say, for Adastra, adastra.cines.fr) and then connect from the login node to the sending CCFR node adastra-ccfr.cines.fr endpoint using the ssh <login>@adastra-ccfr.cines.fr command. Finally, you can initiate connections towards the other sites which, following our previous Adastra and Jean-Zay example, would look like so:
$ scp -r my_adastra_directory <login>@jean-zay-ccfr.idris.fr:/path/to/my_directory
The traffic is automatically routed through the CCFR network and you can benefit from the faster interconnections.
In addition to using the CCFR endpoints explicitly, the national sites (but not CINES as of 2023/01/01) provide a module named ccfr. It exposes the following commands: ccfr_cp, ccfr_ssh, ccfr_sync and ccfr_mycert. The first three commands respectively wrap rsync, ssh and rsync to simplify the authentication via a process called Single Sign-On (which is different from the unique login). This authentication is set up using certificates and ccfr_mycert. More details on these commands can be found in this slightly outdated document. You can use the CCFR network without this module.
Tools to transfer Data
Note
Standard File Transfer Protocol (FTP) and Remote Copy (RCP) should not be used to transfer files due to security concerns.
Warning
Before reading this subsection, understand that, due to the unique login functionality's implementation, the remote home directory may not be what you expect. Indeed, it may point to your personal home instead of your project personal home. You are thus recommended to work with absolute paths.
If moving many small files, it can be beneficial to compress them into a single archive file, then transfer just the one archive file. When using command-line tools, you should use the login nodes.
wdcp
can be used to copy data across the Adastra filesystems. It takes the source directory and a destination directory:
$ wdcp <source_directory> <destination_directory>
To get information about your ongoing copies, use the --show
flag. To get information about all your previous transfer jobs, use the --list-all
flag.
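For instance, to monitor your transfers:
$ wdcp --show
$ wdcp --list-all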
Command-line tools such as parallel-sftp
, scp
and rsync
can be used to transfer data to and from CINES (via the internet).
psftp or parallel-sftp - parallel-sftp can be used to saturate the bandwidth when one copying thread is not enough (typically on the CCFR network). Its usage is identical to sftp, except for the option -n which lets you choose the number of SSH connections used for the parallel transfer. Sending a file from CINES to IDRIS:
$ parallel-sftp -n 5 <login>@jean-zay-ccfr.idris.fr
Once logged in, retrieving a file or directory from the remote host (IDRIS in this example) to the local host is done like so:
sftp> get [options] <remote_path> <local_path>
And to transfer data from the local host to the remote host (IDRIS in this example):
sftp> put [options] <local_path> <remote_path>
If you haven’t used the
-r
(recursive) option with thesftp
command, you can use it directly with get and put. For more information, typehelp
from thesftp
prompt, orman sftp
from the command line.
scp - secure copy, a remote file-copying tool.
Sending a local file my_file to /path/to/ on Adastra:
$ scp my_file <login>@adastra.cines.fr:/path/to/
Retrieving a remote file at /path/to/my_file on Adastra to the local directory ./:
$ scp <login>@adastra.cines.fr:/path/to/my_file ./
Sending a local directory my_directory to /path/to/ on Adastra:
$ scp -r my_directory <login>@adastra.cines.fr:/path/to/
rsync - a fast, versatile, remote (and local) file-copying tool.
Synchronize a remote directory /path/to/ on Adastra with a local directory my_directory (readonly):
$ rsync -avz my_directory <login>@adastra.cines.fr:/path/to/
Note
Here, a is for archive mode, v is for verbose mode and z is for compressed mode. There is no / after my_directory. This will produce a my_directory folder at adastra.cines.fr:/path/to/ (so we end up with adastra.cines.fr:/path/to/my_directory).
Synchronize a local directory ./ with a remote directory /path/to/my_directory on Adastra (readonly):
$ rsync -avz <login>@adastra.cines.fr:/path/to/my_directory ./
Synchronize a remote directory /path/to/ on Adastra with a local directory my_directory (readonly) and show progress:
$ rsync -avz --progress my_directory <login>@adastra.cines.fr:/path/to/
Synchronize a remote directory /path/to/ on Adastra with a local directory my_directory (readonly), show progress, include files or directories starting with T and exclude all others:
$ rsync -avz --progress --include 'T*' --exclude '*' my_directory <login>@adastra.cines.fr:/path/to/
Synchronize a local directory ./ with a remote directory /path/to/my_directory on Adastra (readonly); if a file or directory exists at the target (local) but not on the source (CINES), then delete it:
$ rsync -avz --delete <login>@adastra.cines.fr:/path/to/my_directory ./
Synchronize a remote directory /path/to/ on Adastra with a local directory my_directory (readonly), transferring only the files that are smaller than 1 Mio:
$ rsync -avz --max-size='1m' my_directory <login>@adastra.cines.fr:/path/to/
If you want to verify the behavior is as intended, execute a dry-run:
$ rsync -avz --dry-run my_directory <login>@adastra.cines.fr:/path/to/
See the manual pages for more information:
$ man sftp
$ man scp
$ man rsync
Differences:
scp carries semantics similar to the traditional cp, with extended remote exchange functionalities. It cannot continue where it left off if the transfer is interrupted.
rsync allows you to restart interrupted transfers (see the --partial or -P flags). By default, rsync checks that the transfer of the data was successful (no data corruption).
FileZilla starts multiple sftp processes and uses them in parallel. It can be used to reach high throughput.
parallel-sftp is similar to FileZilla but uses multiple SSH channels instead of multiple sftp processes. It does not come with a GUI.
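As an illustration of the rsync restart capability mentioned above (a minimal sketch reusing the placeholder paths from the previous examples), re-running an interrupted transfer with --partial keeps partially transferred files and resumes from them:
$ rsync -avz --partial --progress my_directory <login>@adastra.cines.fr:/path/to/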
Chaining wdcp data transfer and job
You simply need to start an asynchronous wdcp
copy, and chain your work to the copy job (note that wdcp
prints the job identifier).
$ wdcp <source_directory> <destination_directory>
Dest dir doesn't exist, creating it
source path is <source_directory>
dest path is <destination_directory>
sbatch: INFO : As you didn't ask threads_per_core in your request: 2 was taken as default
sbatch: INFO : As you didn't ask cpus_per_task in your request: 2 was taken as default
Submitted batch job 2512458
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2512458 transfert dcp_tran <user> R 0:03 1 login1
$ sbatch --dependency=afterok:2512458 my_first_job.sh
Submitted batch job 2512459
Also, note that wdcp can be executed from inside a batch script. This means that you could have a workflow where you work on the scratch area and, if everything goes well, after the last srun, you launch an asynchronous wdcp to copy your result files from the scratch area into, say, the work area.
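A minimal sketch of such a batch script (the run directory, Slurm directives and binary name are purely illustrative):
#!/bin/bash
#SBATCH --job-name=my_simulation
# ... your usual Slurm directives ...

# Work on the scratch area (hypothetical run directory).
cd "${SCRATCHDIR}/my_run" || exit 1

# Run the simulation; if it succeeds, launch an asynchronous wdcp copy of
# the result files from the scratch area to the work area.
if srun ./my_simulation; then
    wdcp "${SCRATCHDIR}/my_run/results" "${WORKDIR}/my_run_results"
fi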