User Guide new config
Front-end
The cluster is composed of front-end servers and a large number of compute nodes. Two front-end servers are available:
- nef-frontal.inria.fr (alias nef.inria.fr): ssh from the Internet, job submission
- nef-devel2.inria.fr: compilation, job submission, ssh from nef-frontal and from the Inria Sophia local network
To use the cluster, connect to one of the front-ends with an SSH client, then access the computing resources through the OAR job manager.
The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.
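As a quick illustration, a minimal session might look like the following (jdoe and my_job.sh are placeholders; account creation and the OAR options are detailed in the sections below):
# connect to the front-end from the Internet
ssh jdoe@nef.inria.fr
# submit a job to OAR: one node for one hour, running a (hypothetical) script
oarsub -l /nodes=1,walltime=01:00:00 ./my_job.sh
# check the state of your jobs
oarstat -u jdoe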
First steps
Web access
Web access with Kali is the preferred mode for simple usage:
- Connect to the Kali web portal: if you have an Inria account, just Sign in/up with CAS; if you have no Inria account, use Sign in/up
- From Kali, apply for an account on nef through the Clusters > Overview page
- Follow the Kali online help to prepare and launch your jobs; please submit a ticket if you need assistance getting started with Kali
- You can view the currently running jobs with the Monika web interface. You can also see jobs in Gantt format with DrawGantt and get a system load view with Ganglia.
Command line access
Command line access with ssh enables advanced usage:
- First you need to apply for an account on the nef cluster. You must provide your SSH public key to get an account.
- Then connect to the front-end nef.inria.fr using ssh
- From nef you can then connect to nef-devel2.inria.fr using ssh
- Then you can import your data / source files to the nef front-end with rsync or scp and compile your programs on nef-devel2.inria.fr (see the example after this list)
- Use the job manager to start your tasks
- You can view the currently running jobs with the Monika web interface. You can also see jobs in Gantt format with DrawGantt and get a system load view with Ganglia.
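For example, a typical command-line sequence could look like this (jdoe, my_project and run.sh are placeholders for your own login, directory and script):
# from your workstation: copy sources and data to your cluster home
rsync -avz ./my_project/ jdoe@nef.inria.fr:my_project/
# compile on the compilation front-end (reached through nef)
ssh jdoe@nef.inria.fr
ssh nef-devel2.inria.fr
cd ~/my_project && make
# submit the resulting program through the OAR job manager
oarsub -l /nodes=1,walltime=00:30:00 ~/my_project/run.sh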
Disk space management
Data stored on the cluster is NOT backed up! It is LOST and NOT RECOVERABLE in case of accidental deletion or server hardware failure.
Each user has a dedicated directory in the /home storage.
- A quota system is activated on the shared storage server:
  - The soft limit is 150GB
  - The hard limit is 600GB
  - The grace delay is 4 weeks
- You can use up to 150GB of data without restrictions; as soon as the soft limit is reached, you have 4 weeks to delete files and go back under the soft limit. You can never use more than the hard limit.
- A warning message is sent by mail every Sunday when a limit is reached.
- You can check your current disk occupation with the quota -s command
/dfs (and the symbolic link /home/<username>/workspace) will be removed on 17 April 2016. Running jobs still using this path at that date will fail.
A distributed scalable file system is available under /data for several usages:
- long term storage: 1TB quota per team, shared among the team members; teams may buy additional quota (please contact the cluster administrators)
  - additional disk quota is available for purchase: the price (late 2015) is 1kEuro for 5 TiB * 7 years
- scratch storage (for transient / short term storage): variable total size, no quota is currently applied per user or per team, but data may be periodically purged by administrators
- data is tagged as long term storage or scratch storage based on the Unix group of the files:
  - use chgrp my_team_unix_group my_file to tag my_file as long term storage
  - use chgrp scratch my_file to tag my_file as scratch storage
- check your quota with sudo beegfs-ctl --getquota --uid my_logname or your team quota with sudo beegfs-ctl --getquota --gid my_team_unix_group
- alternatively, check your quota with sudo nef-getquota -u <usernames> or your team quota with sudo nef-getquota -g <groupnames> (see the combined example below)
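Putting these commands together, tagging a results directory and checking quotas might look like this (my_team_unix_group, jdoe and the directory layout under /data are only placeholders):
# tag an existing directory tree as long term storage (group = your team Unix group)
chgrp -R my_team_unix_group /data/my_team_unix_group/jdoe/results
# tag transient data as scratch storage
chgrp -R scratch /data/my_team_unix_group/jdoe/tmp_runs
# check your personal quota, then your team quota
sudo nef-getquota -u jdoe
sudo nef-getquota -g my_team_unix_group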
There is also temporary disk space available on each node for jobs' transient files (see the sketch after this list):
- /tmp : node local hard disk
- /dev/shm : RAM filesystem
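A job script can stage its transient files on the node-local disk, for example (a minimal sketch; my_program and the result paths are placeholders, and /dev/shm can be used instead of /tmp for RAM-speed I/O):
#!/bin/bash
# create a private temporary directory on the node local disk
TMPDIR=$(mktemp -d /tmp/${USER}_job.XXXXXX)
# run the computation with its temporary files on the local disk
./my_program --tmpdir "$TMPDIR"
# copy the results back to shared storage, then clean up the node
cp -r "$TMPDIR"/results ~/results/
rm -rf "$TMPDIR"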
Legacy storage (hardware out of maintenance) is available to members of the ASCLEPIOS, MORPHEME and TROPICS teams in /epi/<teamname>/<username>.
Software
All nodes are installed with a Linux CentOS 7 64-bit distribution.
Main software available on the cluster:
- 3.10.0 linux x86_64 kernel
- PGI 14.10 compilers
- GCC (C, C++, Fortran77, Fortran95 compilers) 4.8.3 (/usr/bin/gcc) and 5.3.0 (/misc/opt/gcc-5.3.0/bin)
- OpenMPI 1.10.1 / 1.6.4
- Paraview 4.2.0
- MPICH2 1.9a2
- Java 1.8 + Java3D 1.5.2
- Matlab 2015a (matlab2015a, matlab)
- Maple 2015 (maple)
- Intel Parallel Studio 2015 (/misc/opt/intel2015/)
- Intel Parallel Studio 2016 (/misc/opt/intel2016/)
- CUDA 7.5 (/usr/local/cuda) + CUDA SDK (/usr/local/cuda/samples)
- DDT debugger (/opt/allinea/ddt, see Documentation)
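As an illustration, the CUDA SDK samples listed above can usually be built from a personal copy (a sketch, assuming the default CUDA 7.5 layout under /usr/local/cuda):
# copy the CUDA samples to your home directory and build them
cp -r /usr/local/cuda/samples ~/cuda-samples
cd ~/cuda-samples
# put the CUDA toolchain on your PATH (path from the list above)
export PATH=/usr/local/cuda/bin:$PATH
# -k keeps building the remaining samples if an optional dependency is missing
make -k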
Environment modules make it easier to use managed software:
- module avail: shows all available modules
- module load module_name: properly configures your current session for using module_name
- module list: lists the loaded modules
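For instance, a compilation session using modules could look like the following (the exact module names shown by module avail on nef will differ; mpi/openmpi-1.10.1 is only a placeholder):
# list the modules installed on the cluster
module avail
# load a (hypothetical) OpenMPI module and check what is loaded
module load mpi/openmpi-1.10.1
module list
# compile an MPI program with the toolchain provided by the module
mpicc hello.c -o hello
# quick local test on the front-end before submitting through OAR
mpirun -np 4 ./hello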
Other tools/libraries available:
- blas, atlas
- openblas
- gmsh 2.8.5
- Python (including scipy, numpy, pycuda, pip)
- R 3.2.1
- Erlang
- GDB & DDD
- Valgrind
- GSL & GLPK
- boost 1.53.0 and 1.58.0
Slides
Slides presenting the cluster are also available.