User Guide: Difference between revisions
Revision as of 5 December 2014, 17:47
Front-end
The cluster consists of several front-end servers and many compute nodes.
Three servers are available:
- nef-frontal.inria.fr : main front-end, ssh and submission front-end
- nef-devel.inria.fr : compilation, ssh and submission front-end
- A storage server (no direct access for users)
Currently, the only way to use the cluster is to connect to one of the front-ends with ssh. You can then access the available computing resources through the Torque job manager.
The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.
First steps
- First, apply for an account on the nef cluster. You must provide your ssh public key to obtain an account.
- Then connect to the main front-end nef.inria.fr using ssh. During the first login, a dedicated ssh key is created; its usage is restricted to the nef cluster. Use an empty passphrase for this key, and only for this key (just press Enter when asked for a passphrase).
- Then import your data and source files to the nef front-end using scp, and compile your programs on nef-devel.inria.fr.
- Use the job manager to start your tasks.
- You can view the currently running jobs using the Monika web interface, and the system activity on nodes using Ganglia.
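As an illustration of the copy and login steps above, here is a minimal Python sketch that only builds the scp and ssh command lines (it does not run them). The user name jdoe, the local directory my_project and the remote directory src/ are placeholders, not real accounts or paths on the cluster; the host names are the ones given in this guide.

```python
import shlex

FRONT_END = "nef.inria.fr"         # main front-end named in the steps above
DEVEL_HOST = "nef-devel.inria.fr"  # compilation front-end


def scp_upload_command(local_path, user, remote_dir, host=FRONT_END):
    """Return the argv list that copies local_path (recursively) to the front-end."""
    return ["scp", "-r", local_path, f"{user}@{host}:{remote_dir}"]


def ssh_command(user, host=DEVEL_HOST):
    """Return the argv list that opens an ssh session on the given host."""
    return ["ssh", f"{user}@{host}"]


if __name__ == "__main__":
    # "jdoe", "my_project" and "src/" are hypothetical placeholders.
    print(shlex.join(scp_upload_command("my_project", "jdoe", "src/")))
    print(shlex.join(ssh_command("jdoe")))
```

The returned lists can be passed to subprocess.run if you prefer to launch the transfer from a script rather than typing the commands by hand.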
Disk space management
Each user has a dedicated home directory on the storage server. The available disk space for users is 7TB. All nodes have access to this storage using NFS.
Data stored on the cluster IS NOT backed up!
A quota system is activated on the shared storage server:
- The soft limit is 40GB
- The hard limit is 350GB
- The grace period is 4 weeks
You can use 40GB of data without restrictions; as soon as the soft limit is reached, you have 4 weeks in which to delete files and go back under the soft limit. You can never use more than the hard limit.
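The rule above can be sketched in a few lines of Python. This is only an illustration of the policy as described here, not the storage server's actual implementation; the function name quota_status and its arguments are assumptions made for the example, and what happens once the 4 weeks have elapsed is deliberately left unspecified because this guide does not state it.

```python
from datetime import datetime, timedelta

SOFT_LIMIT_GB = 40
HARD_LIMIT_GB = 350
GRACE_PERIOD = timedelta(weeks=4)


def quota_status(usage_gb, over_soft_since=None, now=None):
    """Classify a user's disk usage against the limits given in this guide.

    over_soft_since: datetime at which the soft limit was first exceeded,
    or None if unknown (hypothetical argument for this sketch).
    """
    now = now or datetime.now()
    if usage_gb >= HARD_LIMIT_GB:
        return "hard limit reached"
    if usage_gb <= SOFT_LIMIT_GB:
        return "ok"
    if over_soft_since is None:
        return "over soft limit"
    remaining = over_soft_since + GRACE_PERIOD - now
    if remaining > timedelta(0):
        return f"over soft limit: {remaining.days} days left to clean up"
    return "grace period expired"
```

For example, a user who went over 40GB one week ago still has three weeks to go back under the soft limit.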
A warning message will be sent by mail every Sunday when a limit is reached.
You can check your current disk usage with the quota -s command.
More storage is available for the ASCLEPIOS, OPALE, and TROPICS team members in /epi/<teamname>/<username>. Teams needing more storage should contact the cluster administrators.
More storage is also available for the NEUROMATHCOMP, ABS, ODYSSEE and GEOMETRICA team members in /epi/<teamname>.
There is also disk space available on each node, in the /tmp directory:
- 1.1TB on Dell R815 nodes
- 100GB on Dell PE1950 nodes
- 420GB on HP nodes
- 110GB on Carri nodes
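Since the local /tmp capacity differs from one node type to another, a job may want to check the free space before writing large temporary files. The following is a minimal sketch using Python's standard shutil.disk_usage; the helper names free_scratch_gb and fits_in_scratch are illustrative, and GB is taken here as 10**9 bytes.

```python
import shutil
import tempfile


def free_scratch_gb(path="/tmp"):
    """Return the free space under path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def fits_in_scratch(required_gb, path="/tmp"):
    """Return True if at least required_gb of free space is available under path."""
    return free_scratch_gb(path) >= required_gb


if __name__ == "__main__":
    # tempfile.gettempdir() usually resolves to /tmp on Linux nodes.
    scratch = tempfile.gettempdir()
    print(f"{free_scratch_gb(scratch):.1f} GB free in {scratch}")
```

The value reported is the space free at the moment of the call; other jobs running on the same node may consume it afterwards.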
Software
All nodes run a 64-bit Fedora 16 Linux distribution.
Main software available on the cluster:
- Linux kernel 3.4.7 (x86_64)
- PGI 13.5 compilers
- GCC 4.6.3 (C, C++, Fortran 77, Fortran 95 compilers)
- OpenMPI 1.6.3
- Paraview 3.14.1
- MVAPICH2 1.9a2
- Java 1.6.0_24 + Java 3D 1.5.2
- PETSc 3.4.2
- Matlab 2011b
- CUDA 5.0 (/usr/local/cuda) + CUDA SDK (/usr/local/cuda/samples)
- DDT debugger (/opt/allinea/ddt, see Documentation)
Other tools/libraries available:
- blas, atlas
- blacs and scalapack (in /usr/local/lib64)
- openblas 0.2.8 (in /opt/openblas/)
- gmsh 2.6.1 (in /opt/gmsh)
- Python (including scipy, numpy, pycuda)
- Trilinos (/opt/Trilinos)
- mesa 7.7.1 (/opt/mesa)
- Erlang
- GDB & DDD
- Valgrind
- CGAL
- GSL & GLPK
- boost 1.47
Energy savings
Energy saving is currently disabled.
Because the cluster uses a significant amount of electricity (for the nodes and for the cooling system), unused compute nodes are automatically shut down at night and on weekends:
- unused nodes are powered off at 22:00 every night
- nodes are powered on every morning at 08:00 (except Saturdays and Sundays)
If you really need to start new jobs at night or on weekends, you can manually power on nodes using a dedicated command (to be executed on nef.inria.fr): wakeup-nodes
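To make the schedule above concrete, here is a small Python sketch that computes when unused nodes would next be powered on automatically, and whether a given moment falls in a period where unused nodes may be off. It only encodes the times stated in this section and applies only when energy saving is enabled; the function names are illustrative and not part of any cluster tool.

```python
from datetime import datetime, time, timedelta

POWER_ON = time(8, 0)    # nodes are powered on at 08:00 on weekdays
POWER_OFF = time(22, 0)  # unused nodes are powered off at 22:00 every night


def next_power_on(now):
    """Return the next automatic power-on strictly after now (weekdays only)."""
    candidate = datetime.combine(now.date(), POWER_ON)
    if candidate <= now:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def may_be_powered_off(now):
    """Return True if unused nodes may currently be off under this schedule."""
    return now.weekday() >= 5 or now.time() >= POWER_OFF or now.time() < POWER_ON
```

If may_be_powered_off returns True and you cannot wait until next_power_on, the wakeup-nodes command mentioned above is the manual alternative.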