FAQ new config
Version of 20 April 2016, 07:49
Contents
- 1 General
- 2 Job submission
- 2.1 How do I submit a job ?
- 2.2 How to choose the node type and properties ?
- 2.3 How do I reserve GPU resources ?
- 2.4 What is the format of a submission script ?
- 2.5 What are the available queues ?
- 2.6 How are the jobs scheduled and prioritized ?
- 2.7 How do I submit a job in the "big" queue ?
- 2.8 How do I submit a besteffort job ?
- 2.9 How do I reserve resources in advance ?
- 2.10 How much memory (RAM) is allocated to my job ?
- 2.11 How can I change the memory (RAM) allocated to my job ?
- 2.12 How can I check the resources really used by a running or terminated job ?
- 2.13 How can I submit hundreds/thousands of jobs ?
- 2.14 How can I pass command line arguments to my job ?
- 3 How to Interact with a job when it is running
- 4 Software
- 4.1 How to run an OpenMPI application?
- 4.2 How to run an Intel MPI application?
- 4.3 How can I run caffe ?
- 4.4 How can I use spark ?
- 4.5 How can I run a graphical application on a node ?
- 4.6 Can I use tensorflow?
- 4.7 What are the Matlab licences available ?
- 4.8 What are the best practices for Matlab jobs ?
- 5 Troubleshooting
- 6 Disks and filesystems
General
Who can have an account on the cluster ?
- Inria users : nef is a platform of the Inria Sophia Antipolis - Méditerranée research center, open to anyone with an Inria account for the validity period of the account.
- Academic and industrial partners of Inria, under agreement.
To apply for an account, or to extend or renew one, please follow the first steps procedure.
When does my cluster account expire ?
Type nef-user -l your_nef_login on nef-devel2 or nef-frontal. The Expire date is the first day on which the account will be deactivated.
What is OAR ?
OAR is a versatile resource and task manager (also called a batch scheduler) for HPC clusters and other computing infrastructures (like distributed computing experimental testbeds, where versatility is key).
OAR is the way you reserve resources (nodes, cores) on the cluster by submitting a job.
- The official User Documentation is here : http://oar.imag.fr/docs/2.5/#ref-user-docs
- The Inria Rennes Tutorial : http://igrida.gforge.inria.fr/tutorial.html
What are the most commonly used OAR commands ? (see official docs)
- oarsub : to submit a job
- oarstat : to see the state of the queues (Running/Waiting jobs)
- oardel : to cancel a job
- oarpeek : to show the stdout of a job while it is running
- oarhold : to hold a job while it is Waiting
- oarresume : to resume jobs in the states Hold or Suspended
Can I use a web interface rather than command line ?
Yes, connect to the Kali web portal : if you have an Inria account just Sign in/up with CAS ; if you have no Inria account use Sign in/up.
Job submission
How do I submit a job ?
Use the oarsub command.
With OAR you can directly use a binary as a submission argument on the command line, or even an inline script. You can also create a submission script. The script includes the command line to execute and the resources needed for the job. Do not forget to use the -S option of oarsub if you want the OAR parameters in the script to be parsed and honored (oarsub -S ./myscript.sh).
How to choose the node type and properties ?
The cluster has several kinds of nodes.
To view all defined OAR properties :
- graphical : connect to Monika and click on the node name.
- command line : use oarnodes, for example for nef085 :
oarnodes nef085.inria.fr
If you want all the cores from a single node :
oarsub -l /nodes=1
If you want 48 cores from any type and any number of nodes :
oarsub -l /core=48
In this case, the 48 cores can be spread over several nodes; your application must handle this case (using MPI or another framework). A multithreaded application cannot use all the reserved cores if they are spread over several nodes.
If you need to reserve a given amount of cores from a single node, use :
oarsub -l /nodes=1/core=2
If you want all the cores of 2 xeon nodes with more than 80GB of RAM each for 10 hours :
oarsub -p "cputype='xeon' and mem > 80000" -l /nodes=2,walltime=10:00:00
If you want 96 cores as 12 cores from each of 8 xeon nodes :
oarsub -p "cputype='xeon'" -l /nodes=8/core=12
You can make more specific reservations using additional resource tags. This job reserves a total of 16 cores, as 8 cores from a single node on each of 2 different InfiniBand network switches :
oarsub -l /ibswitch=2/node=1/core=8
Reserve either 6 cores for 1 hour or 3 cores for 2 hours (moldable job, with either-or alternatives) :
oarsub -l /core=6,walltime=1 -l /core=3,walltime=2
How do I reserve GPU resources ?
To reserve a single GPU, do :
oarsub -p "gpu='YES'" -l /gpunum=1
Several cores may be attached to a GPU; for example, on nefgpu05/06 you will get 3 cores and 1 GPU, and on nefgpu03/04 you will get one or two cores and 1 GPU.
If you want more GPUs on a single node, say 4 :
oarsub -p "gpu='YES'" -l /nodes=1/gpunum=4
If you want all the GPUs on a node for 4 hours :
oarsub -p "gpu='YES'" -l /nodes=1,walltime=4
If you reserve a single core (-l /nodes=1/core=1), you will NOT have exclusive access to the GPU attached to it.
Remember: to check the available gpus and monitor them, use nvidia-smi
What is the format of a submission script ?
The script should include not only the path to the program to execute, but also information about the needed resources (you can also specify resources using oarsub options on the command line). Simple example : helloWorld.sh
# Submission script for the helloWorld program
#
# Comments starting with #OAR are used by the resource manager if using "oarsub -S"
#
# The job reserves 8 nodes with one processor (core) per node,
# only on xeon nodes, job duration is less than 10min
#OAR -l /nodes=8/core=1,walltime=00:10:00
#OAR -p cputype='xeon'
#
# The job is submitted to the default queue
#OAR -q default
#
# Path to the binary to run
./helloWorld
You can mix parameters in the submission script and on the command line, but take care about how they combine. In this example the -p on the command line takes precedence over the one in the script, while the -l from the script and the -l from the command line are combined (moldable job, as when using multiple -l options) :
oarsub -p "cputype='opteron'" -l /nodes=4/core=2 -S ./helloWorld.sh
What are the available queues ?
The limits and parameters of the queues are listed below :
queue name | max user resources | max duration (days) | priority | max user (hours*resources) |
default | 256 | 30 | 10 | 21504 |
big | 1024 | 30 | 5 | 2000 |
besteffort | - | 30 | 0 | - |
This means all jobs of a user running in the default queue at a given time can use at most 256 resources (e.g. 256 cores, or 128 cores with twice the default memory per core), with a cumulative reservation of at most 21504 hours*resources. The maximum walltime of each job is 30 days.
In other words, at any given time the running jobs of a user in the default queue can hold a cumulative resource reservation of at most :
- 32 cores for 28 days with the default memory per core ;
- or 128 cores for 7 days with the default memory per core ;
- or 128 cores for 3 and a half days with twice the default memory per core ;
- or 256 cores for 3 and a half days with the default memory per core ;
- etc.
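The figures in this list all follow from the 21504 hours*resources cap of the default queue. Here is a minimal sketch of the arithmetic, assuming one resource is one core with the default memory per core; the max_hours function and its mem_factor argument are illustrative names, not part of OAR :

```shell
#!/bin/bash
# Illustrative helper (not an OAR command): longest cumulative duration, in
# hours, that fits in the default queue for a given reservation size.
# Both limits are taken from the queue table above.
MAX_RESOURCES=256
MAX_HOURS_RESOURCES=21504

max_hours() {   # usage: max_hours cores [mem_factor]
    local cores=$1
    local mem_factor=${2:-1}   # 2 means twice the default memory per core
    local resources=$(( cores * mem_factor ))
    if (( resources > MAX_RESOURCES )); then
        echo "over the ${MAX_RESOURCES} resources limit" >&2
        return 1
    fi
    echo $(( MAX_HOURS_RESOURCES / resources ))
}

max_hours 32      # 672 hours, i.e. 28 days
max_hours 128     # 168 hours, i.e. 7 days
max_hours 128 2   # 84 hours, i.e. 3 and a half days
max_hours 256     # 84 hours, i.e. 3 and a half days
```

The 30 days maximum walltime of each job still applies on top of this budget.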
How are the jobs scheduled and prioritized ?
Jobs are scheduled :
- based on queue priority (jobs in higher priority queues are served first),
- and then based on the user Karma (for jobs of equal queue priority, jobs with lower Karma users are served first).
The user's fair share scheduling Karma measures his/her recent resource consumption during the last 30 days in a given queue. Resource consumption takes into account both the used resources and the requested (but unused) resources in a given queue, with the same formula as detailed here. When you request or consume resources on the cluster, your priority relative to other users decreases (as your Karma increases).
Jobs in the default and big queues wait until the requested resources can be reserved.
Jobs in the besteffort queue run without resource reservation : they are allowed to run as soon as there is available resource on the cluster (they are not subject to per user limits, etc.) but can be killed by the scheduler at any time when running if a non-besteffort job requests the used resource.
Using the besteffort queue enables a user to use more resources at a time than the per user limits and permits efficient cluster resource usage. Thus using the besteffort queue is encouraged for short jobs, jobs that can easily be resubmitted, etc.
How do I submit a job in the "big" queue ?
Use oarsub -q big
or use the equivalent option in your submission script (see submission script examples).
How do I submit a besteffort job ?
To submit a job to the besteffort queue just use oarsub -t besteffort
or use the equivalent option in your submission script (see submission script examples).
Your jobs will be rescheduled automatically with the same behaviour if you additionally use the idempotent mode : oarsub -t besteffort -t idempotent
OAR checkpoint facility may be useful for besteffort jobs but requires support by the running code.
How do I reserve resources in advance ?
Submit a job with oarsub -r "YYYY-MM-DD HH:MM:SS". A user can have at most 2 scheduled advance reservations at a given time.
How much memory (RAM) is allocated to my job ?
OAR takes the total amount of RAM of a node and divides it by the number of cores (minus a small amount for the system).
So for instance, if a node has 96GB of RAM and 12 cores, each reserved core will have ~8GB of RAM allocated by OAR. If you reserve only one core on this type of node, your job will be limited to ~8GB of RAM. RAM is counted for RSS (physical memory really used) not for VSZ (virtual memory allocated).
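The same arithmetic can be sketched in a few lines of shell. This is only an illustration of the rule described here: mem_per_core_gb and cores_needed are hypothetical helper names, and the real per-core value is slightly lower because of the amount kept for the system.

```shell
#!/bin/bash
# Illustrative arithmetic only, not an OAR tool.
mem_per_core_gb() {   # usage: mem_per_core_gb node_ram_gb node_cores
    echo $(( $1 / $2 ))
}

cores_needed() {      # usage: cores_needed wanted_ram_gb ram_per_core_gb
    echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
}

per_core=$(mem_per_core_gb 96 12)
echo "about ${per_core} GB of RAM per core"
echo "cores to reserve for a 20 GB job: $(cores_needed 20 "$per_core")"
```

With the 96GB / 12 cores node of the example, a job needing 20GB has to reserve 3 cores on that node type.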
How can I change the memory (RAM) allocated to my job ?
If you need a single core, but more than the amount of RAM dedicated per core, you need to reserve more than one core. Since our cluster is heterogeneous (memory per core is not the same on each sub-cluster), it is not easy to have a single syntax to get the needed amount of memory.
You can explicitly use the mem_core property of OAR. If you want cores with a minimum amount of RAM per core, you can do (at least 8GB per core in this example) :
oarsub -l '{mem_core > 8000}/nodes=1/core=3'
In this case, you will have 3 cores on the same node with at least 3x8GB = 24GB of RAM.
In this example you reserve a full node with at least 150GB of RAM :
oarsub -p 'mem > 150000' -l /nodes=1
For simple use cases (need to reserve a given amount of RAM, whatever the number of cores, on a single node), we have written a small wrapper around oarsub, called oarsub_mem (warning : still alpha, works only with simple cases). This wrapper understands a mem=XXg syntax. You can use it like this :
oarsub_mem -l mem=20g,walltime=1:0:0
How can I check the resources really used by a running or terminated job ?
Use the Colmet tool to view the CPU and RAM usage profile of your job during or after its execution.
- warning : bug in Colmet, it crashes if you request one point per 5 seconds or more (i.e. request no more than 5 points per 30 seconds)
- warning : bug in Colmet, we observed that the reported RSS (RAM) is sometimes wrong
Alternatively, connect to a node while your job is running and check your process physical memory (RSS) usage and virtual memory (VSZ) usage with :
ps -o pid,command,vsz,rss -u yourlogin
How can I submit hundreds/thousands of jobs ?
You can have up to 500 jobs submitted at a time (includes jobs in all states : Waiting, Running, etc.).
OAR provides a feature called array job which allows the creation of multiple, similar jobs with one oarsub command.
Please consider using array jobs when submitting a large number of similar jobs to the queueing system. The obvious but inefficient way to do this would be to prepare a prototype job script and write a shell loop that calls oarsub on this (possibly modified) job script the required number of times.
To submit an array comprised of array_number jobs use :
oarsub --array array_number
To submit an array comprised of array_number jobs with distinct parameters passed to each job use :
oarsub --array-param-file param_file
where param_file is a text file with array_number lines. Each line contains the arguments passed to the job with the corresponding index in the array, using shell syntax. Example for an array of 3 jobs :
foo 'a b'             # First job receives 2 arguments : 'foo', 'a b'
bar $HOME y           # Second job receives 3 args : 'bar', the path to your homedir, y
hi `hostname` $MYVAR  # Third job receives 3 args : 'hi', result of hostname command, value of $MYVAR variable
Variables and commands are evaluated when launching the job not when running the oarsub command (thus in the user's context on the execution node, not on the submission frontend).
Don't use a parameter file with only one single line: the parameters in this line will be ignored. In other words OAR doesn't like arrays of size 1 :-(
When using a submission script, array job can be specified with a directive in the script :
#OAR --array array_number
##OR
#OAR --array-param-file param_file
OAR creates one different job per member in the array, with the following environment variables :
- $OAR_JOB_ID : unique jobid for each member of the array
- $OAR_ARRAY_ID : common value for all members of the array (equal to the jobid of the first array member)
- $OAR_ARRAY_INDEX : unique index for each member of the array (first job has index 1, second job has index 2, etc.)
Example :
nef-devel2$ oarsub --array 2 ./runme
Generate a job key...
Generate a job key...
OAR_JOB_ID=235542
OAR_JOB_ID=235543
OAR_ARRAY_ID=235542
nef-devel2$ oarstat --array 235542
Job id    A. id     A. index  Name       User     Submission Date     S Queue
--------- --------- --------- ---------- -------- ------------------- - --------
235542    235542    1                    mvesin   2016-04-01 15:49:27 R default
235543    235542    2                    mvesin   2016-04-01 15:49:27 R default
nef-devel2$
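A common use of $OAR_ARRAY_INDEX is to let each array member pick its own input. The following is a sketch of what a job script such as ./runme could look like, not the actual script used above; inputs.txt, INPUT_LIST and pick_input are hypothetical names chosen for the illustration :

```shell
#!/bin/bash
# Sketch of a job script for "oarsub --array N ./runme".
# The list file (hypothetical) holds one input name per line;
# array member k processes line k.
pick_input() {   # usage: pick_input index list_file
    sed -n "${1}p" "$2"
}

# Outside OAR the variable is unset, so default to 1 for a dry run.
index=${OAR_ARRAY_INDEX:-1}
list=${INPUT_LIST:-inputs.txt}

if [ -r "$list" ]; then
    input=$(pick_input "$index" "$list")
    echo "array member $index (job ${OAR_JOB_ID:-none}) processes: $input"
fi
```

Because $OAR_ARRAY_INDEX starts at 1, it maps directly onto line numbers of the list file.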
When using oarsub -t besteffort -t idempotent jobs with arrays, a job in the array may be killed while running and automatically resubmitted. In this case, in the resubmitted job, $OAR_JOB_ID is the new jobid, while $OAR_ARRAY_INDEX and $OAR_ARRAY_ID are unchanged.
Example of besteffort array member automatic resubmission with $OAR_ARRAY_ID = 235524, and job 235525 (array index 2) killed by OAR and resubmitted as 235527 :
nef-devel2$ oarstat --array 235524
Job id    A. id     A. index  Name       User     Submission Date     S Queue
--------- --------- --------- ---------- -------- ------------------- - --------
235524    235524    1                    mvesin   2016-04-01 14:07:38 R besteffo
235525    235524    2                    mvesin   2016-04-01 14:07:38 E besteffo
235527    235524    2                    mvesin   2016-04-01 14:15:55 R besteffo
nef-devel2$ oarstat -fj235527 | grep resubmit
resubmit_job_id = 235525
How can I pass command line arguments to my job ?
oarsub does not have a command line option for this but you can pass parameters directly to your job, eg :
oarsub [-S] "./mycode abcde xyzt"
and then in ./mycode check $1 (abcde) and $2 (xyzt) variables, in the language specific syntax. Example :
# Submission script ./mycode
#
# Comments starting with #OAR are used by the resource manager if "oarsub -S"
#OAR -p cputype='xeon'
# pick first argument (abcde) in VAR1
VAR1=$1
# pick second argument (xyzt) in VAR2
VAR2=$2
# Place here your submission script body
echo "var1=$VAR1 var2=$VAR2"
Another syntax for that :
oarsub [-S] "./mycode --VAR1 abcde --VAR2 xyzt"
and then in ./mycode use options parsing in the language specific syntax.
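If ./mycode is itself a bash script, the option parsing could look like the sketch below. It assumes exactly the two options of the example, each followed by a value; parse_args is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of ./mycode parsing "--VAR1 abcde --VAR2 xyzt".
parse_args() {
    VAR1=""
    VAR2=""
    while [ $# -gt 0 ]; do
        # Every option is expected to be followed by its value.
        if [ $# -lt 2 ]; then
            echo "missing value for $1" >&2
            return 1
        fi
        case "$1" in
            --VAR1) VAR1=$2 ;;
            --VAR2) VAR2=$2 ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
}

parse_args "$@" || exit 1
echo "var1=$VAR1 var2=$VAR2"
```

Named options are easier to extend than positional $1 and $2, and the order in which they are given no longer matters.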
If you do not use the -S option of oarsub then you may prefer to use shell environment variables, eg :
oarsub -l /nodes=2/core=4 "env VAR1=abcde VAR2=xyzt ./myscript.sh"
How to Interact with a job when it is running
How do I connect to the nodes of my running job ?
Use oarsub -C jobid to start an interactive shell on the master node of the job jobid, or use OAR_JOB_ID=jobid oarsh hostname to connect to any node of the job.
To get the list of the job nodes, run cat $OAR_NODE_FILE and then use oarsh hostname to connect to the other job nodes.
Other useful commands : oarcp to copy files between the nodes' local filesystems, and oarprint to query the resources allocated to the job (e.g. oarprint host for the list of the hostnames your job is running on).
Please note ssh to the nodes is not allowed, but oarsh is a wrapper around ssh.
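The node file can also be summarised from inside a job. The sketch below assumes, as in standard OAR setups, that the file holds one line per reserved core, so a node name is repeated once per core; cores_per_node is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list each node of the job with the number of cores reserved on it.
cores_per_node() {   # usage: cores_per_node node_file
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# $OAR_NODE_FILE is only set inside a job, hence the guard.
if [ -n "${OAR_NODE_FILE:-}" ] && [ -r "$OAR_NODE_FILE" ]; then
    cores_per_node "$OAR_NODE_FILE"
fi
```

Each output line gives a hostname that can be passed to oarsh, followed by its core count.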
In which state is my job ?
The oarstat -j jobid command shows the state of job jobid and the queue in which it has been scheduled.
Example for jobid 1839 :
nef-frontal$ oarstat -j 1839
Job id     Name           User           Submission Date     S Queue
---------- -------------- -------------- ------------------- - ----------
1839       TEST_OAR       rmichela       2015-08-21 17:49:08 T default
- the S column gives the current state (Waiting, Running, Launching, Terminating).
- the Queue column shows the job's queue.
- the -f option gives full information about the job ; --array prints information for a whole array.
You can use SQL syntax for advanced queries, example :
oarstat --sql "job_user='rmichela' and state='Terminated'"
When will my job be executed ?
oarstat -fj jobid | grep scheduledStart gives an estimate of when your job will start.
How can I get the stderr or stdout of my job during its execution ?
oarpeek jobid shows the stdout of jobid and oarpeek -e jobid shows the stderr.
How can I cancel a job ?
oardel jobid cancels job jobid.
How to know my Karma priorities ?
To see the Karma associated to one of your currently running jobs :
- use oarstat -f -j jobid | grep Karma
- or use Monika and click on jobid to view the job details
This gives your Karma for this job's queue at the time of the job submission.
If you want more details, the command oarstat -u login --accounting "YYYY-MM-DD, yyyy-mm-dd" shows your resource consumption between two dates. The indicated Karma is the one of your last submitted job. To see the details of your resource consumption for a given queue, use oarstat -u login --sql "queue_name = 'queue' " --accounting "YYYY-MM-DD, yyyy-mm-dd"
To see your time window used for Karma calculation use :
- yyyy-mm-dd = tomorrow
- YYYY-MM-DD = ( yyyy-mm-dd - 30 days )
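The two dates of this window can be computed rather than typed by hand. This is a sketch assuming GNU date (its -d option is not available in BSD or macOS date); karma_window is a hypothetical helper name, and the last line only prints the oarstat command instead of running it.

```shell
#!/bin/bash
# Sketch, assuming GNU date.
karma_window() {   # usage: karma_window [reference_day as YYYY-MM-DD]
    local today=${1:-$(date +%F)}
    local end start
    end=$(date -d "$today + 1 day" +%F)     # yyyy-mm-dd = tomorrow
    start=$(date -d "$end - 30 days" +%F)   # YYYY-MM-DD = tomorrow - 30 days
    echo "$start, $end"
}

window=$(karma_window)
echo "oarstat -u ${USER:-login} --accounting \"$window\""
```

The optional argument makes it possible to compute the window for another reference day, for instance the submission day of a past job.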
Software
How to run an OpenMPI application?
The mpirun binary included in openmpi runs the application using the resources reserved by the job.
The openmpi 1.10.1 version installed on nef is patched to discover the resources of your job automatically, so you don't have to specify a machinefile.
Submission script for OpenMPI : monAppliOpenMPI.sh
#!/bin/bash
# File : monAppliOpenMPI.sh
#OAR -l /nodes=3/core=1
source /etc/profile.d/modules.sh
module load mpi/openmpi-1.10.1-gcc
mpirun --prefix $MPI_HOME monAppliOpenMPI
In this case, mpirun will start the MPI application on 3 nodes with a single core per node.
If you are using the main openmpi module (mpi/openmpi-x86_64) you have to add -machinefile $OAR_NODEFILE :
module load mpi/openmpi-x86_64
mpirun --prefix $MPI_HOME -machinefile $OAR_NODEFILE monAppliOpenMPI
How to run an Intel MPI application?
The Intel compiler and MPI implementation are installed on nef. To run an MPI job :
#!/bin/bash
#OAR -l /nodes=3/core=1
source /etc/profile.d/modules.sh
module load mpi/intel64-5.1.1.109
mpirun -machinefile $OAR_NODEFILE monAppliIntelMPI
How can I run caffe ?
First you have to use a node with a GPU (caffe should run much faster with a GPU), for example :
oarsub -I -p "gpu='YES'" -l /nodes=1
Then you have to load the cuda and caffe modules :
source /etc/profile.d/modules.sh
module load cuda/7.5
module load caffe/caffe-0.13
$CAFFE_HOME/build/tools/caffe
How can I use spark ?
Let's say you want to use spark on 4 nodes :
oarsub -I -l /nodes=4,walltime=3:0:0
This will reserve 4 nodes and start a shell on the first one (say nef107)
Then start the master:
./sbin/start-master.sh
Then you can start the slaves on three other nodes using oarsh ( the server URL in this case is spark://nef107.inria.fr:7077 ), like this:
for i in `uniq $OAR_NODEFILE | grep -v nef107`; do oarsh $i $HOME/spark-1.6.0-bin-hadoop2.6/sbin/start-slave.sh spark://nef107.inria.fr:7077 ; done
Then you can use spark, for ex. to run the sparkPi example:
export MASTER=spark://nef107.inria.fr:7077
./bin/run-example SparkPi
To connect remotely to the WebUI you need to start Inria VPN (with vpn.inria.fr/all) or use SSH tunneling through nef-frontal.inria.fr
How can I run a graphical application on a node ?
First, connect to the nef frontend with ssh using the -X option, then submit an interactive job like this (OAR takes care of setting up X11 forwarding) :
oarsub -I ...
You can also use VirtualGL on GPU nodes, see this blog post
Can I use tensorflow?
Yes, you can install the CPU version of tensorflow (the current GPUs on nef are too old and are not compatible with the GPU version of tensorflow). There is a conflict with the protobuf library, so you have to use virtualenv :
virtualenv ~/tensorflow
cd ~/tensorflow
source bin/activate
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
What are the Matlab licences available ?
Matlab community licenses from Inria Sophia can be used on the cluster. They are shared with all the site's desktops and laptops. Please find here the complete licenses list.
What are the best practices for Matlab jobs ?
If launching many Matlab jobs at the same time, please launch them on as few nodes as possible. Matlab uses one floating licence per {node,user} pair. E.g. :
- 10 jobs for user foo on 10 different cores of the nef012 node use 1 floating licence,
- 1 job for user foo on each of the nef01[0-9] nodes uses 10 floating licences.
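The licence count in these two cases is simply the number of distinct {node,user} pairs. A small sketch of that counting rule follows; count_licenses is a hypothetical helper, not a Matlab or OAR tool, and it reads one "node user" pair per line on its standard input :

```shell
#!/bin/bash
# Sketch: count distinct "node user" pairs, one pair per input line.
count_licenses() {
    sort -u | wc -l | tr -d ' '
}

# 10 jobs of user foo on the nef012 node
for i in $(seq 1 10); do echo "nef012 foo"; done | count_licenses

# 1 job of user foo on each of nef010 .. nef019
for i in $(seq 0 9); do echo "nef01$i foo"; done | count_licenses
```

The first call prints 1 and the second prints 10, matching the two cases listed above.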
Troubleshooting
Why is my job rejected at submission ?
The job system may refuse a job submission because of the admission rules; an explicit error message is then displayed. If in doubt, contact the cluster admin team.
Most of the time it indicates that the requested resources are not available, which may be caused by a typo (e.g. -p "cluster='dell6220'" instead of -p "cluster='dellc6220'").
Sometimes it may also be caused by some nodes being temporarily out of service. This can be checked by typing oarnodes -s to list all the nodes in service.
Another cause may be that the job requested more resources than the total resources existing on the cluster.
Why is my job still Waiting while other jobs go Running ?
Many possible (normal) explanations include :
- the other job may have a higher priority : queue priority, user Karma
- your job requests currently unavailable resources (eg : only dellc6220 nodes while the other job accepts any node type)
- your job requests more resources than currently available and a lower priority job can be run before without delaying your job (best fit). Eg : you requested 4 nodes, only 2 are currently available, the 2 others will be available in 3 hours. A job requesting 2 nodes during at most 3 hours can be run before yours.
- the other job made an advance reservation of resources
- etc.
Why is my job still Waiting while there are some unused resources ?
Many possible (normal) explanations include :
- you have reached maximum resource reservation per user at a given time and your job is not besteffort
- resources are reserved for a higher priority job. Eg: a higher priority job requests 3 nodes, 2 are currently available, 1 will be available in 1 hour. Your job requests 1 node during 2 hours. Running your job would result in delaying a higher priority job.
- resources are reserved by an advance reservation (same example as above).
- etc.
I see several nodes in the StandBy state in Monika, are they available ?
Yes; it's because we have enabled the Energy Savings feature of OAR.
It means that when no jobs are waiting, OAR can decide to shut down nodes to save energy. As soon as a new job is queued, OAR automatically restarts some nodes if not enough nodes are alive. Usually the nodes boot in 2 minutes, so the job will wait at most a few minutes before starting.
Disks and filesystems
How can I access files on the cluster using sshfs ?
With sshfs you can access files on the cluster as a mounted filesystem on your client laptop/desktop.
Example for a Fedora 23 machine connected on Inria Sophia network (with user privileges) :
mylaptop$ mkdir $XDG_RUNTIME_DIR/nef
mylaptop$ sshfs -o transform_symlinks nef-devel2.inria.fr: $XDG_RUNTIME_DIR/nef
mylaptop$ fusermount -u $XDG_RUNTIME_DIR/nef
For a machine outside of Inria network :
- configure ssh tunneling through nef-frontal
- or mount on nef-frontal instead of nef-devel2 (lower performance)
Why do the /data quota usages for users and groups not match ?
- The group numbers indicate the long term storage quota usage by all the members of a group.
- The user numbers indicate the total disk usage of a user, long term storage plus scratch storage.
There is currently no simple way to get the long term storage quota usage by a single user.
Example :
- the semir group is currently using 128.810 GiB out of its 1024 GiB long term storage quota, which is the default quota for a team.
- user mvesin from group semir currently uses 210 GiB (mix of long term storage and scratch storage).
nef-devel2$ sudo nef-getquota -g semir
Group quotas under /data, restricted to the given groups (sizes in GiB):
Group   Used      Hard      Declared
semir   128.810   1024.000  1024.000 $default_data_quota
Disk usage by user under /data for the semir group (sizes in GiB):
User    Used
mvesin  210.000
fm      44.100
What is the performance of the different filesystems ?
This is a complex question that needs to be considered case by case :
- depends on the type of access (read/write/mix, long sequential/short random chunks, etc.)
- for /home and /data : overall performance is shared between jobs on all nodes of the cluster
- for /tmp and /dev/shm : overall performance is shared between jobs on the node
- etc.
Results of a test for big sequential write access (with caching disabled) :
- ~140 MB/s for /home access (shared between jobs on all nodes)
- ~250 MB/s for /data access (shared between jobs on all nodes) -- we expect to increase this by system tuning
- ~100-200 MB/s for /tmp access (shared between jobs on the node)
- ~2-3 GB/s for /dev/shm access (shared between jobs on the node)
What size can I use on the RAM filesystem ?
One limit is that the RAM filesystem (/dev/shm) space used by a job can be at most the RAM allocated to the job on this node (it is part of the resources allocated to the job).
The other limit is that the system of each node is configured with a total limit for the RAM filesystem (around 50% of the node RAM).
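The usable size is therefore the smaller of the two limits. A rough sketch of that rule, taking "around 50%" as exactly half for the sake of the illustration; shm_limit_gb is a hypothetical helper name and the node sizes are example values :

```shell
#!/bin/bash
# Sketch: approximate /dev/shm space usable by a job on one node, in GB.
shm_limit_gb() {   # usage: shm_limit_gb job_ram_gb node_ram_gb
    local job_ram=$1
    local node_cap=$(( $2 / 2 ))   # node-wide limit, assumed to be half the RAM
    if (( job_ram < node_cap )); then
        echo "$job_ram"
    else
        echo "$node_cap"
    fi
}

shm_limit_gb 8 96    # one core on a 96GB node: limited by the job RAM
shm_limit_gb 96 96   # full 96GB node: limited by the node-wide cap
```

The node-wide limit is shared, so other jobs using /dev/shm on the same node reduce what is actually available.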