User Guide new config

From ClustersSophia

Version of 9 July 2018, 17:29


Front-end

The cluster consists of several front-end servers and many compute nodes. The following front-ends are available:

  • nef-frontal.inria.fr : ssh from the Internet, job submission
  • nef-devel.inria.fr (alias nef.inria.fr) and nef-devel2.inria.fr : compilation, job submission, ssh from nef-frontal and from the Inria Sophia local network

To use the cluster, connect to one of the front-ends using an SSH client.

Then use the OAR job manager to access the computing resources.

The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.


First steps

Please make sure you have read the cluster usage policy.

The nef cluster uses SSH keys in the OpenSSH format. If you use another SSH client, convert your public key before submitting an account request (e.g. export a PuTTY public key in OpenSSH format).

Command line access

Command line access with ssh enables advanced usage :

  1. First, apply for an account on the nef cluster. You must provide your SSH public key (OpenSSH format) to get an account. When renewing an account, please mention it in the description field.
  2. Then connect to the front-end nef-frontal.inria.fr using ssh
  3. From nef-frontal you can then connect to nef-devel.inria.fr or nef-devel2.inria.fr using ssh
  4. Then you can import your data and source files using rsync on the nef-frontal front-end and compile your programs on nef-devel.inria.fr and nef-devel2.inria.fr
  5. Use the job manager to start your tasks
  6. You can view the currently running jobs using the Monika web interface. You can also view jobs as a Gantt chart with DrawGantt and see the system load with Ganglia.
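
As a rough illustration of steps 2 to 4, the shell helpers below wrap the ssh and rsync calls. This is only a sketch: the login jdoe, the NEF_USER variable and the function names are hypothetical, and only the host name comes from this page.

```shell
# Hypothetical login name: replace jdoe with the login of your nef account.
NEF_USER="${NEF_USER:-jdoe}"
NEF_FRONTAL="nef-frontal.inria.fr"   # front-end reachable from the Internet

# Step 2: open a shell on the Internet-facing front-end.
# Step 3 is then done by hand from that shell: ssh nef-devel.inria.fr
nef_login() {
    ssh "${NEF_USER}@${NEF_FRONTAL}"
}

# Build the rsync destination for a path relative to your home directory.
nef_dest() {
    printf '%s@%s:%s\n' "$NEF_USER" "$NEF_FRONTAL" "$1"
}

# Step 4: copy a local directory to the cluster through nef-frontal.
# Usage: nef_push ./my_project/ my_project/
nef_push() {
    rsync -av "$1" "$(nef_dest "$2")"
}
```

Nothing is executed until one of the functions is called, so the snippet can be pasted into a shell startup file and adjusted there.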

Web access

Support for Kali is now discontinued. Kali is still partly functional.

Web access with Kali is a mode for simple usage :

  1. Connect to Kali web portal : if you have an Inria account just Sign in/up with CAS ; if you have no Inria account use Sign in/up
  2. From Kali, apply for an account on nef through the Clusters > Overview page
  3. Follow the Kali online help to prepare and launch your jobs ; please submit a ticket if you need assistance to get started with Kali
  4. You can view the currently running jobs using the Monika web interface. You can also view jobs as a Gantt chart with DrawGantt and see the system load with Ganglia.


Acknowledgment

By using the cluster you accept the following term of use : all publications about work conducted or results obtained using the cluster must acknowledge use of the platform. A suggested formulation is given below :

The authors are grateful to Inria Sophia Antipolis - Méditerranée
"Nef" computation cluster for providing resources and support.

Disk space management

Data stored on the cluster is NOT backed up ! It is LOST and NOT RECOVERABLE in case of accidental deletion or server hardware failure.


/home

Each user has a home directory in the /home storage.

  • A quota system is activated on the shared storage server:
    • The soft limit is 150GB
    • The hard limit is 600GB
    • The grace delay is 4 weeks
  • You can use 150GB of data without restrictions; as soon as the soft limit is reached, you have 4 weeks in which to delete files and go back under the soft limit. You can never use more than the hard limit.
  • A warning message will be sent by mail every Sunday when a limit is reached.
  • You can check your current disk usage with the quota -s command
  • Files are removed at account expiration after a grace delay (currently : 8 months)
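
The quota rules above can be summarised by a small shell function. This is a sketch under one assumption not stated on this page: that writes are refused once the 4-week grace delay has expired, as with standard Linux disk quotas. The function name and the state labels are hypothetical; the real figures for your account come from quota -s.

```shell
SOFT_GB=150     # soft limit from this page
HARD_GB=600     # hard limit from this page
GRACE_DAYS=28   # grace delay: 4 weeks

# Usage: home_quota_state <used_GB> <days_spent_over_the_soft_limit>
# Prints ok, grace or blocked (labels chosen for this example only).
home_quota_state() {
    used=$1
    days_over=$2
    if [ "$used" -le "$SOFT_GB" ]; then
        echo "ok"        # at or under the soft limit: no restriction
    elif [ "$used" -ge "$HARD_GB" ] || [ "$days_over" -gt "$GRACE_DAYS" ]; then
        echo "blocked"   # hard limit reached, or grace delay expired (assumed)
    else
        echo "grace"     # over the soft limit, still within the 4 weeks
    fi
}
```

For instance, home_quota_state 200 10 describes a user holding 200GB for 10 days, who is still inside the grace delay.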

/data

A distributed scalable file system is available under /data for several usages :

  • long term storage : 1TB quota per team, shared among the team members ; teams may buy additional quota (please contact the cluster administrators)
    • additional disk quota can be purchased : the 2018 price is 1 kEuro for 7.5 TiB for 7 years (tax excluded).
  • scratch storage (for transient / short term storage) : variable total size, no quota is currently applied per user or per team, but data may be periodically purged by administrators
  • data is tagged as long term storage or scratch storage based on the Unix group of the files, using standard Unix group commands and rules
    • check your quota with sudo nef-getquota -u <usernames> or your team quota with sudo nef-getquota -g <groupnames>
    • or check your quota with sudo beegfs-ctl --getquota --uid my_logname or your team quota with sudo beegfs-ctl --getquota --gid my_team_unix_group
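
To compare the purchase price of additional /data quota with other storage offers, it can help to express it per TiB and per year. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical and the figures are the 2018 ones quoted above.

```shell
# Cost of purchased quota per TiB and per year.
# Usage: data_cost_per_tib_year <price_eur> <size_tib> <years>
data_cost_per_tib_year() {
    awk -v price="$1" -v size="$2" -v years="$3" \
        'BEGIN { printf "%.2f\n", price / (size * years) }'
}

# 1000 / (7.5 * 7) = 1000 / 52.5
data_cost_per_tib_year 1000 7.5 7   # prints 19.05
```

That is roughly 19 euros per TiB per year, tax excluded.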


local storage

Additional local storage exists on specific nodes under /local, typically an SSD (check the node description for details).

  • on common nodes, /local/tmp is a scratch filesystem for all users, files older than 90 days are automatically deleted,
  • on dedicated nodes, /local is reserved for the users with privileged access to the node. Disks may be mounted under /local or under subdirectories whose names hint at the disk type: /local/mixed (mixed-use SSD), /local/read (read-intensive SSD), etc.
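
Because files older than 90 days are deleted from /local/tmp on common nodes, it can be useful to list your files that are getting close to that age. The sketch below assumes the purge is based on modification time, which this page does not specify; the function name and the 80-day default threshold are illustrative only.

```shell
# List your own files older than a given number of days (by modification time).
# Usage: soon_purged [directory] [age_in_days]
soon_purged() {
    dir=${1:-/local/tmp}
    days=${2:-80}
    # -user accepts a numeric user ID, so id -u is enough here.
    find "$dir" -type f -user "$(id -u)" -mtime +"$days" -print
}
```

Run it from within a job on the node, since local storage is only reachable from the node itself.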


There is also temporary disk space available on each node for transient job files :

  • /tmp : node local hard disk
  • /dev/shm : RAM (memory) filesystem

Local storage cannot be accessed from other nodes : you need to be running a job on a node to access its local storage.


Software

All nodes run a 64-bit CentOS 7 Linux distribution.

Main software available on the cluster:

  • 3.10.0 linux x86_64 kernel
  • PGI 14.10 compilers
  • GCC (C, C++, Fortran77, Fortran95 compilers) 4.8.5 (default), 5.3.0 and 6.2.0 (via modules)
  • OpenMPI 1.10.7 / 2.0.0
  • Paraview 4.4.1
  • MPICH2 2.0
  • Java 1.8 + java3D 1.5.2
  • Matlab 2017a (matlab2017a), Matlab 2015a (matlab2015a, matlab)
  • Scilab 5.5.2 (via modules)
  • Maple 2015 (maple)
  • Intel Parallel Studio 2015, 2016 and 2018 (via modules)
  • CUDA 7.5 (/usr/local/cuda), 8.0 and 9.1 (via modules)
  • DDT debugger 5.1, 6.1 and 7.0 (via modules), see Documentation

Environment modules make it easier to use managed software :

  • module avail : shows all available modules
  • module load module_name : properly configures your current session for using module_name
  • module list : lists the loaded modules
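
The module commands above can be combined into a small helper that loads a module and then shows what is loaded. This is only a sketch: the function name is hypothetical, and the module name passed to it must be one that module avail actually reports on the cluster.

```shell
# Load a module, then list the loaded modules to confirm it is active.
# Usage: load_and_check <module_name>
load_and_check() {
    name=$1
    # The module command is normally a shell function set up at login.
    if ! command -v module >/dev/null 2>&1; then
        echo "environment modules are not available in this shell" >&2
        return 1
    fi
    module load "$name" && module list
}
```

For example, load_and_check gcc/6.2.0 would select that compiler, assuming a module with that exact name exists; the name here is a guess based on the GCC versions listed above.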

Other tools/libraries available:

  • blas, atlas
  • openblas
  • gmsh 2.8.5
  • Python (including scipy, numpy, pycuda, pip)
  • R 3.4.3
  • Erlang
  • GDB & DDD
  • Valgrind
  • GSL & GLPK
  • boost 1.58.0

Slides

Slides presenting the cluster are also available.