User Guide new config
From ClustersSophia
Revision as of 16 March 2016, 15:58


Front-end

The cluster is composed of several front-end servers and many compute nodes.

Two front-end servers are available:

  • nef-frontal.inria.fr alias nef.inria.fr: main front-end, ssh and submission front-end
  • nef-devel2.inria.fr: compilation and submission front-end

Currently, the only way to use the cluster is to connect to one of the front-ends with ssh. You then have access to the available computing resources through the OAR job manager.

The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.

First steps

Web access

Web access with Kali is the preferred mode for simple usage:

  1. Connect to the Kali web portal: if you have an Inria account, just Sign in/up with CAS; if you have no Inria account, use Sign in/up
  2. From Kali, apply for an account on nef through the Clusters > Overview page
  3. Follow the Kali online help to prepare and launch your jobs; please submit a ticket if you need assistance getting started with Kali
  4. You can view the currently running jobs using the Monika web interface. You can also see jobs in Gantt format with DrawGantt.

Command line access

Command line access with ssh enables advanced usage:

  1. First, apply for an account on the nef cluster. You must provide your ssh public key to get an account.
  2. Then connect to the front-end nef.inria.fr using ssh
  3. From nef you can then connect to nef-devel2.inria.fr using ssh
  4. Import your data / source files using rsync or scp on the nef front-end, and compile your programs on nef-devel2.inria.fr
  5. Use the job manager to start your tasks
  6. You can view the currently running jobs using the Monika web interface. You can also see jobs in Gantt format with DrawGantt.
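The steps above can be sketched as a short terminal session. This is a hedged illustration only: the hostnames and file locations come from this guide, while the login name `mylogin` and the `my_project` directory are placeholders you must replace with your own.

```shell
# Step 2: connect from your workstation to the main front-end
ssh mylogin@nef.inria.fr

# Step 3: from nef, hop to the compilation/submission front-end
ssh nef-devel2.inria.fr

# Step 4 (run from your workstation): copy sources to your cluster home
rsync -avz ./my_project/ mylogin@nef.inria.fr:~/my_project/
# scp works as well for single files:
scp my_file.c mylogin@nef.inria.fr:~/my_project/
```

Once the sources are on the cluster, compile on nef-devel2.inria.fr and submit jobs through the job manager (step 5).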


 Disk space management

Data stored on the cluster is NOT backed up! It is LOST and NOT RECOVERABLE in case of accidental deletion or server hardware failure.


Each user has a dedicated directory in the /home storage.

  • A quota system is active on the shared storage server:
    • The soft limit is 150GB
    • The hard limit is 600GB
    • The grace delay is 4 weeks
  • You can use up to 150GB of data without restriction; as soon as the soft limit is reached, you have 4 weeks to delete files and go back under the soft limit. You can never use more than the hard limit.
  • A warning message is sent by mail every Sunday while a limit is exceeded.
  • You can check your current disk usage with the quota -s command
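A quick way to monitor your usage against these limits, run on a front-end (the exact columns of the report depend on the quota version installed there):

```shell
# Print your quota report in human-readable units (-s)
quota -s

# Cross-check: total size of your home directory
du -sh "$HOME"
```

If the `quota` output shows an asterisk next to your usage, you are over the soft limit and the 4-week grace period is running.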


A distributed scalable file system is available under /data for several usages:

  • long term storage: 1TB quota per team, shared among the team members; teams may buy additional quota, please contact the cluster administrators
    • disk quota is available for sale! The price (late 2015) is 1kEuro for 5 TiB * 7 years
  • scratch storage (for transient / short term storage): variable total size; no quota is currently applied per user or per team, but data may be periodically purged by administrators
  • data is tagged as long term or scratch storage based on the Unix group of the files:
    • use chgrp my_team_unix_group my_file to tag my_file as long term storage
    • use chgrp scratch my_file to tag my_file as scratch storage
  • check your quota with sudo beegfs-ctl --getquota --uid my_logname or your team quota with sudo beegfs-ctl --getquota --gid my_team_unix_group
  • or check your quota with sudo nef-getquota -u <usernames> or your team quota with sudo nef-getquota -g <groupnames>
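The group-based tagging above can be tried out safely anywhere. In this sketch the current user's primary group stands in for the hypothetical my_team_unix_group; on the cluster you would use your real team group or scratch instead.

```shell
# Create a file; it initially belongs to your primary group
touch results.dat

# Tag it as "long term" storage: assign your team group
# (here we use the current primary group as a stand-in)
chgrp "$(id -gn)" results.dat

# Verify: the group column now shows the assigned group
ls -l results.dat

# On the cluster, tagging as scratch would be:
#   chgrp scratch results.dat

rm results.dat
```

The quota accounting follows the group ownership, so re-tagging a file moves its size from one quota bucket to the other.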

More (legacy) storage is available for the ABS, ASCLEPIOS, MORPHEME, NEUROMATHCOMP and TROPICS team members in /epi/<teamname>/<username>.


There is also temporary disk space available on each node, in the /tmp directory.

 Software

All nodes are installed with a Linux CentOS 7 64-bit distribution.

Main software available on the cluster:

  • 3.10.0 linux x86_64 kernel
  • PGI 14.10 compilers
  • GCC (C, C++, Fortran77, Fortran95 compilers) 4.8.3 (/usr/bin/gcc) and 5.3.0 (/misc/opt/gcc-5.3.0/bin)
  • OpenMPI 1.10.1 / 1.6.4
  • Paraview 4.2.0
  • MPICH2 1.9a2
  • Java 1.8 + java3D 1.5.2
  • Matlab 2015a (matlab2015a matlab)
  • Maple 2015 (maple)
  • Intel Parallel Studio 2015 (/misc/opt/intel2015/ )
  • Intel Parallel Studio 2016 (/misc/opt/intel2016/ )
  • CUDA 7.0 (/usr/local/cuda) + CUDA SDK (/usr/local/cuda/samples)
  • DDT debugger (/opt/allinea/ddt, see Documentation)

Environment modules make it easier to use managed software:

  • module avail lists all available modules
  • module load module_name properly configures your environment for using module_name
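A typical module session might look like the following. The module name gcc/5.3.0 is taken from this guide's software list; what is actually available on a given node may differ, so check module avail first.

```shell
# List everything that can be loaded
module avail

# Configure this shell to use GCC 5.3.0
module load gcc/5.3.0
gcc --version          # now resolves to the module's gcc

# Show what is currently loaded, then undo
module list
module unload gcc/5.3.0
```

Loads only affect the current shell session, so jobs must load their modules again in the submission script.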

Other tools/libraries available:

  • blas, atlas
  • openblas
  • gmsh 2.8.5
  • Python (including scipy, numpy, pycuda, pip)
  • R 3.2.1
  • Erlang
  • GDB & DDD
  • Valgrind
  • GSL & GLPK
  • boost 1.53.0 and 1.58.0