User Guide
Current version dated 18 March 2016 at 17:29
This documentation describes the obsolete "legacy nef" configuration, which will be definitively shut down on 17 April 2016: please use the new nef configuration instead
What will become of my data on "legacy nef" when the cluster stops?
- /home/<username> and /epi/<teamname>/<username> : still exist on the "new nef" under the same paths
- /home/<username>/workspace : the symbolic link to /dfs/workspace/<username> will be removed
- /dfs : will be removed entirely
  - /dfs/workspace/<username> : a temporary copy will be made on the "new nef" under /data/<teamname>/user/<username>/TEMPO-OLD-DFS after the legacy cluster stops.
    - If you wish to switch from /dfs to /data before 17 April 2016, please submit a ticket to the helpdesk
  - /dfs/<teamname> : a temporary copy will be made on the "new nef" under /data/<teamname>/share/TEMPO-OLD-DFS after the legacy cluster stops
  - Important note: the temporary copy belongs to the "scratch" Unix group and is subject to the scratch space purge policy. To move the temporary copy to your team's long-term storage, please follow the new nef guidelines.
- other data will be lost (node-local files)
What will become of the nodes from legacy nef when the cluster stops?
- Hardware from the legacy cluster will be re-installed in the new cluster
Front-end
The cluster consists of several front-end servers and many compute nodes.
3 servers are available:
- nef-frontal.inria.fr : main front-end, ssh and submission front-end
- nef-devel.inria.fr : compilation, ssh and submission front-end
- A storage server (no direct access for users)
Currently, the only way to use the cluster is to connect to one of the front-ends with ssh. You can then access the available computing resources through the Torque job manager.
The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.
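For reference, a minimal sketch of the Torque commands used most often, run on a front-end once you have an account; <username>, the script name and the job identifier are placeholders, not real objects on the cluster:

```bash
qsub myscript.pbs     # submit a job script; prints the job identifier
qstat -u <username>   # list your queued and running jobs
qdel <job_id>         # cancel one of your jobs by its identifier
```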
First steps
- First you need to apply for an account on the nef cluster. You must give your ssh public key to have an account.
- Then connect to one of the main front-ends, nef.inria.fr or nef-devel.inria.fr, using ssh
- Then you can import your data and source files using scp on the nef front-end and compile your programs on nef-devel.inria.fr
- Use the job manager to start your tasks (example commands are sketched after this list)
- You can view the current running jobs using the Monika web interface. You can also view the system activity on nodes using ganglia
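As a concrete illustration of steps 2 to 5 (every file and directory name below is a placeholder, not something that exists on the cluster):

```bash
# from your workstation: copy sources and data to your nef home directory
scp -r my_project <username>@nef.inria.fr:~/

# compile on the compilation front-end
ssh <username>@nef-devel.inria.fr
cd ~/my_project && make

# submit a Torque job script and check its state in the queue
qsub job.pbs
qstat -u <username>
```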
Disk space management
Each user has a dedicated home directory on the storage server. The total available disk space for users is 15TB. All nodes have access to this storage using NFS.
Data stored on the cluster IS NOT backed up!
A quota system is activated on the shared storage server:
- The soft limit is 150GB
- The hard limit is 600GB
- The delay is 4 weeks
You can use 150GB of data without restrictions; as soon as the soft limit is reached, you have 4 weeks in which to delete files and go back under the soft limit. You can never use more than the hard limit.
A warning message will be sent by mail every Sunday when a limit is reached.
You can check your current disk usage with the quota -s command.
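If you are approaching the soft limit, the following sketch (standard GNU tools, nothing cluster-specific) helps find what to clean up:

```bash
quota -s              # current usage and limits, in human-readable units
du -sh ~/* | sort -h  # size of each top-level directory in your home, largest last
```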
More storage is available for the ABS, ASCLEPIOS, MORPHEME, NEUROMATHCOMP and TROPICS team members in /epi/<teamname>/<username>. Teams needing more storage should contact the cluster administrators.
A 19TB experimental distributed scratch space is available under /dfs for all users. It is primarily intended for short-term storage; quotas or time limits may be applied in the future if it becomes permanently saturated. It performs poorly for metadata-intensive workloads (e.g. compilation, reading/writing many small files).
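For example, a large input dataset can be staged to the scratch space before a run and removed afterwards. This sketch assumes rsync is available on the front-end; the dataset name is a placeholder and /dfs/workspace/<username> is the per-user area mentioned in the alert at the top of this page:

```bash
# stage data to the distributed scratch space
rsync -a ~/big_dataset/ /dfs/workspace/<username>/big_dataset/

# ... run jobs reading from /dfs/workspace/<username>/big_dataset ...

# remove it when the runs are finished
rm -rf /dfs/workspace/<username>/big_dataset
```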
There is also temporary disk space available on each node, in the /tmp directory (a job-script sketch using it follows this list):
- 1.1TB on Dell R815 nodes
- 100GB on Dell PE1950 nodes
- 420GB on HP nodes
- 110GB on Carri nodes
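A hedged sketch of a Torque job script that uses the node-local /tmp space for intermediate files and copies the results back to the NFS home before cleaning up; the program name, output file and resource requests are placeholders to adapt:

```bash
#!/bin/bash
#PBS -l nodes=1:ppn=1        # one core on one node (adjust to your needs)
#PBS -l walltime=02:00:00    # adjust to your expected run time

# per-job scratch directory on the node-local disk
SCRATCH=/tmp/$USER.$PBS_JOBID
mkdir -p "$SCRATCH"

# run the computation from the local disk so intermediate files stay off NFS
cd "$SCRATCH"
"$PBS_O_WORKDIR/my_program" > result.dat   # placeholder program and output name

# keep only the results: copy them back to the directory qsub was run from
cp result.dat "$PBS_O_WORKDIR/"

# free the node-local space for other users
cd / && rm -rf "$SCRATCH"
```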
Software
All nodes are installed with a 64-bit Fedora 16 Linux distribution
Main software available on the cluster (a compilation sketch follows the lists below):
- Linux 3.4.7 x86_64 kernel
- PGI 13.5 compilers
- GCC 4.6.3 (C, C++, Fortran 77, Fortran 95 compilers)
- OpenMPI 1.6.3
- Paraview 3.14.1
- MvaMpich2 1.9a2
- Java 1.6.0_24 + java3D 1.5.2
- Petsc 3.4.2
- Matlab 2015a (/opt/matlab2015a/bin/matlab), Matlab 2011b (/usr/local/matlab2011b/bin/matlab)
- CUDA 5.0 (/usr/local/cuda) + CUDA SDK (/usr/local/cuda/samples)
- DDT debugger (/opt/allinea/ddt, see Documentation)
Other tools/libraries available:
- blas, atlas
- blacs and scalapack (in /usr/local/lib64)
- openblas 0.2.8 (in /opt/openblas/)
- gmsh 2.6.1 (in /opt/gmsh)
- Python (including scipy, numpy, pycuda)
- Trilinos (/opt/Trilinos)
- mesa 7.7.1 (/opt/mesa)
- Erlang
- GDB & DDD
- Valgrind
- CGAL
- GSL & GLPK
- boost 1.47
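To illustrate how the MPI stack and the optional libraries listed above are typically combined, here is a minimal sketch of compiling and running an MPI program with the OpenMPI wrappers. The source file names are placeholders, and the include/lib layout under /opt/openblas is an assumption about the install, so adapt the paths if they differ:

```bash
# compile an MPI program with the OpenMPI compiler wrapper (hello.c is a placeholder)
mpicc -O2 -o hello hello.c

# optionally link against openblas (directory layout assumed; check /opt/openblas)
mpicc -O2 -o solver solver.c -I/opt/openblas/include -L/opt/openblas/lib -lopenblas

# inside a Torque job script: launch on the cores allocated by the scheduler
mpirun -machinefile "$PBS_NODEFILE" -np "$(wc -l < "$PBS_NODEFILE")" ./hello
```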