User Guide new config

The nef new configuration is currently a BETA VERSION with no warranty concerning jobs and data: the service may be interrupted, the configuration may be modified, and data spaces may be cleared.

Front-end

The cluster is made of several front-end servers and many compute nodes.

Two front-end servers are available:

  • nef-frontal.inria.fr (alias nef.inria.fr): main front-end, for ssh access and job submission
  • nef-devel2.inria.fr: compilation front-end, also for ssh access and job submission

Currently, the only way to use the cluster is to connect to one of the front-ends with ssh. You can then access the available computing resources through the Oar job manager.

The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.

First steps

  1. First you need to apply for an account on the nef cluster. You must provide your ssh public key to get an account.
  2. Then connect to one of the front-ends, nef.inria.fr or nef-devel2.inria.fr, using ssh.
  3. You can then import your data and source files with rsync or scp on the nef front-end, and compile your programs on nef-devel2.inria.fr.
  4. Use the job manager to start your tasks.
  5. You can view the currently running jobs with the Monika web interface, or see them as a Gantt chart with DrawGantt.
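
As an illustration, a first session could look like the sketch below (the login jdoe, the project directory and the resource request are placeholders, not actual values from the cluster):

  # connect to the main front-end with your ssh key
  ssh jdoe@nef.inria.fr

  # from your workstation: copy sources and data to your nef home directory
  rsync -av my_project/ jdoe@nef.inria.fr:my_project/

  # compile on the compilation front-end
  ssh jdoe@nef-devel2.inria.fr
  cd my_project && make

  # submit a batch job through Oar (1 node for 2 hours, illustrative request)
  oarsub -l /nodes=1,walltime=2:00:00 ./my_project/run.sh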

Disk space management

Data stored on the cluster is NOT backed up! It is LOST and NOT RECOVERABLE in case of accidental deletion or server hardware failure.


Each user has a dedicated directory in the /home storage.

  • A quota system is activated on the shared storage server:
    • The soft limit is 40GB
    • The hard limit is 350GB
    • The grace delay is 4 weeks
  • You can use up to 40GB of data without restriction; as soon as the soft limit is reached, you have 4 weeks to delete files and get back under the soft limit. You can never use more than the hard limit.
  • A warning message will be sent by mail every Sunday when a limit is reached.
  • You can check your current disk usage with the quota -s command
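
For example, a quota check from a front-end could look roughly as follows (file system name and figures are purely illustrative):

  $ quota -s
  Disk quotas for user jdoe (uid 12345):
       Filesystem   space   quota   limit   grace   files   quota   limit   grace
  storage:/home       12G     40G    350G             85k       0       0

In this output the 'quota' column is the 40GB soft limit, 'limit' is the 350GB hard limit, and 'grace' shows the time left to clean up once the soft limit is exceeded.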


A scalable distributed file system is available under /data for several usages:

  • long term storage: 1TB quota per team, shared among the team members; teams may buy additional quota (please contact the cluster administrators)
  • scratch storage (for transient / short term storage): variable total size, no quota is currently applied per user or per team, but data may be periodically purged by administrators
  • data is tagged as long term storage or scratch storage based on the Unix group of the files:
    • use chgrp my_team_unix_group my_file to tag my_file as long term storage
    • use chgrp scratch my_file to tag my_file as scratch storage
  • check your quota with beegfs-ctl --getquota --uid my_logname, or your team quota with beegfs-ctl --getquota --gid my_team_unix_group
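
For example, assuming your team's Unix group is mygroup and your login is jdoe (both placeholders, as are the paths):

  # tag a result directory for long term storage (counted against the team quota)
  chgrp -R mygroup /data/mygroup/jdoe/results

  # tag temporary files as scratch storage (no quota, but may be purged)
  chgrp -R scratch /data/mygroup/jdoe/tmp_run

  # check your personal and your team quota on /data
  beegfs-ctl --getquota --uid jdoe
  beegfs-ctl --getquota --gid mygroup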


More (legacy) storage is available for the ABS, ASCLEPIOS, MORPHEME, NEUROMATHCOMP and TROPICS team members in /epi/<teamname>/<username>.


There is also temporary disk space available on each node, in the local /tmp directory.
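
Since /tmp is local to each node and not shared, a common pattern is to copy working data there at the beginning of a job and clean it up at the end, as in the sketch below (program and file names are placeholders):

  #!/bin/bash
  # illustrative Oar job script using the node-local /tmp
  WORKDIR=/tmp/$USER.$OAR_JOB_ID     # OAR_JOB_ID is set by Oar inside a job
  mkdir -p "$WORKDIR"
  cp ~/my_project/input.dat "$WORKDIR"
  cd "$WORKDIR"
  ~/my_project/my_program input.dat > output.dat
  cp output.dat ~/my_project/
  rm -rf "$WORKDIR"                  # /tmp is neither shared nor kept, clean up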


Software

All nodes are installed with a 64-bit Linux CentOS 7 distribution.

Main software available on the cluster:

  • Linux kernel 3.10.0 (x86_64)
  • PGI 14.10 compilers
  • GCC 4.8.3 (C, C++, Fortran77, Fortran95 compilers)
  • OpenMPI 1.8.8
  • ParaView 3.14.1
  • MVAPICH2 1.9a2
  • Java 1.6.0_24 + java3D 1.5.2
  • Petsc 3.4.5
  • Matlab 2015a (/opt/matlab2015a/bin/matlab), Matlab 2011b (/usr/local/matlab2011b/bin/matlab)
  • CUDA 5.0 (/usr/local/cuda) + CUDA SDK (/usr/local/cuda/samples)
  • DDT debugger (/opt/allinea/ddt, see Documentation)

Other tools/libraries available:

  • blas, atlas
  • blacs and scalapack (in /usr/local/lib64)
  • openblas 0.2.8 (in /opt/openblas/)
  • gmsh 2.6.1 (in /opt/gmsh)
  • Python (including scipy, numpy, pycuda)
  • Trilinos (/opt/Trilinos)
  • Mesa 7.7.1 (/opt/mesa)
  • Erlang
  • GDB & DDD
  • Valgrind
  • CGAL
  • GSL & GLPK
  • boost 1.58
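
As an example of using the GCC / OpenMPI stack listed above, an MPI program could be built and run roughly as follows (program name and resource request are placeholders):

  # compile with the OpenMPI wrapper around GCC
  mpicc -O2 -o hello_mpi hello_mpi.c

  # get an interactive Oar allocation (2 nodes, 4 cores each, illustrative)
  oarsub -I -l /nodes=2/core=4,walltime=0:30:00

  # run on the allocated cores, using the node file provided by Oar
  mpirun -machinefile $OAR_NODEFILE ./hello_mpi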