User Guide new config
Front-end
The cluster is composed of several front-end servers and many compute nodes. The following front-end servers are available:
- nef-frontal.inria.fr : ssh from the Internet, job submission
- nef-devel.inria.fr and nef-devel2.inria.fr : compilation, job submission, ssh from nef-frontal and Inria Sophia local network
To use the cluster, please connect to one of the front-ends using an SSH client and public/private ssh keypair authentication (reminder: the private key stays on your desktop/laptop, the public key is used on Nef).
Then access the computing resources available by using the OAR job manager.
The nef cluster is not part of the internal (production) network of Inria; therefore Inria (iLDAP) accounts/passwords are not used.
First steps
Please make sure you have read the cluster usage policy.
nef cluster authentication uses an SSH RSA keypair in the OpenSSH format. Please convert your ssh public key before submitting an account request if you use another ssh client (e.g. export a Putty public key in openssh format).
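If you do not have a suitable keypair yet, here is a minimal sketch using standard OpenSSH tools; the file name and key comment below are only examples:

```bash
# generate an RSA keypair in OpenSSH format on your desktop/laptop
ssh-keygen -t rsa -b 4096 -C "nef access key" -f ~/.ssh/id_rsa_nef

# the public key to provide in the account request form is the .pub file
cat ~/.ssh/id_rsa_nef.pub
```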
Account
- First check that you are eligible for nef access.
- Then pass the administrative authorization step:
  - Inria users can skip this step
  - OPAL users need to get an OPAL accreditation from their local provider (e.g. people whose primary affiliation or project is UCA get accredited for OPAL by UCA)
  - other Inria academic and industrial partners need to sign an agreement with Inria
- Now you can apply for an account on the nef cluster. You must give your ssh public key (openssh format) to have an account. When requesting an account renewal, please mention it in the description field.
Command line access
Command line access with ssh:
- Connect to the front-end nef-frontal.inria.fr using ssh
- From nef you can then connect to nef-devel.inria.fr or nef-devel2.inria.fr using ssh
- Then you can import your data / source files using rsync on the nef-frontal front-end and compile your programs on nef-devel.inria.fr and nef-devel2.inria.fr (see the sketch after this list)
- Use the job manager to start your tasks
- You can view the current running jobs using the Monika web interface. You can also see jobs in gantt format DrawGantt and have a system load view with Ganglia.
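A minimal sketch of these steps, assuming a login mylogin and a project directory my_project (both placeholders); the OAR resource request is only an example, see the job manager page for the real syntax:

```bash
# 1. connect to the front-end (reachable from the Internet)
ssh mylogin@nef-frontal.inria.fr
# then hop to a compilation front-end
ssh nef-devel2.inria.fr

# 2. import data / source files (run from your workstation)
rsync -avz ./my_project/ mylogin@nef-frontal.inria.fr:my_project/

# 3. compile on nef-devel / nef-devel2
cd ~/my_project && make

# 4. submit a job with OAR (example: 4 cores for 2 hours)
oarsub -l /nodes=1/core=4,walltime=2:00:00 "./my_project/run.sh"
```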
Acknowledgment
By using the cluster you accept the acknowledgement policy for works that benefited from the platform (citation, HAL referencing).
Disk space management
All data stored on the cluster is NOT backed up! It is LOST and NOT RECOVERABLE in case of accidental deletion or server hardware failure.
/home
Each user has a home directory in the /home storage.
- A quota system is activated on the shared storage server:
  - The soft limit is 150GB
  - The hard limit is 600GB
  - The delay is 4 weeks
- You can use 150GB of data without restrictions; as soon as the soft limit is reached, you have 4 weeks in which to delete files and go back under the soft limit. You can never use more than the hard limit.
- A warning message will be sent by mail every Sunday when a limit is reached.
- You can check your current disk occupation with the `quota -s` command (see the example below)

Files in `/home/user` are removed after user account expiration and a grace delay (currently 8 months); please sort your data before account expiration.
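For example, to see where you stand with respect to these limits (both commands are standard Linux tools):

```bash
# show current usage and quota limits for your account
quota -s

# list the largest directories in your home to find what to clean up
du -sh ~/* | sort -h | tail
```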
/data
A distributed scalable file system is available under /data for several usages:
- long term storage: 1TB quota per team shared among the team members; teams may buy additional quota, please contact the cluster administrators
  - disk quota is available for sale: the price (2019) is 1kEuro for 5.0 TiB during 7 years (tax excluded)
- scratch storage (for transient / short term storage): variable total size, no quota is currently applied per user or per team, but data may be periodically purged by administrators
- data is tagged as long term storage or scratch storage based on the Unix group of files, using standard Unix group commands and rules (see the example below)
  - check your quota with `sudo nef-getquota -u <usernames>` or your team quota with `sudo nef-getquota -g <groupnames>`
  - or check your quota with `sudo beegfs-ctl --getquota --uid my_logname` or your team quota with `sudo beegfs-ctl --getquota --gid my_team_unix_group`

Files in `/data/team/user/user` are removed after user account expiration and a grace delay (currently 8 months); please sort your data before account expiration.
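As an illustration of the commands above; the team group name and paths are placeholders, and the exact group-to-storage mapping is described in the linked FAQ:

```bash
# check your personal and team quotas on /data
sudo nef-getquota -u $LOGNAME
sudo nef-getquota -g my_team

# re-tag files by changing their Unix group (placeholder group and path)
chgrp -R my_team /data/my_team/user/$LOGNAME/results

# start a shell whose newly created files get the team group
newgrp my_team
```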
local storage
Additional local storage exists on specific nodes under /local, typically an SSD disk (check the node description for node details).
- on common nodes, /local/tmp is a scratch filesystem for all users; files older than 90 days are automatically deleted
- on dedicated nodes, /local is reserved for the users with privileged access to the node. Disks may be mounted under /local or under subdirectories that hint at their type: /local/mixed (mixed-use SSD), /local/read (read-intensive SSD), etc.
There is also temporary disk space available on each node for jobs' transient files:
- /tmp: node local hard disk
- /dev/shm: RAM (memory) filesystem
Local storage cannot be accessed from other nodes: you need to be running a job on the node to access it (see the sketch below).
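A possible pattern for a job script that stages data on node-local space; the paths and program name are placeholders:

```bash
#!/bin/bash
# stage input data on the node-local disk, compute, then copy results back
WORKDIR=/tmp/$USER.$OAR_JOB_ID    # assumes OAR exports $OAR_JOB_ID; any unique suffix works
mkdir -p "$WORKDIR"
cp -r /data/my_team/user/$USER/input "$WORKDIR/"

cd "$WORKDIR" && ./my_solver input/    # placeholder for the real computation

# copy results back to shared storage and clean up the node-local space
cp -r results /data/my_team/user/$USER/
rm -rf "$WORKDIR"
```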
Software
All nodes are installed using a Linux CentOS 7 64-bit distribution.
Main software available on the cluster:
- 3.10.0 linux x86_64 kernel
- Nvidia HPC SDK 20.11, and legacy PGI 14.10 19.10 community edition compilers
- GCC (C, C++, Fortran77, Fortran95 compilers)
  - default: 4.8.5
  - via modules: 5.3.0 6.2.0 7.3.0 9.2.0
  - via Software Collection (devtoolset-8): 8.3.1
- devtoolset-8
- OpenMPI 1.10.7 / 2.0.0
- Paraview 4.4.1
- MPICH2 2.0
- Java 1.8 + java3D 1.5.2
- Matlab (`matlab2019a`, `matlab2018a`, `matlab2017a`, `matlab2015a`), see the batch example after this list
  - and Matlab runtime (`/opt/matlab2018a_runtime`, `/opt/matlab2017a_runtime` and `/opt/matlab2015a_runtime`)
- Scilab 5.5.2 (via modules)
- Maple 2015 (`maple`) and Maple 2021 (via modules)
- Intel Parallel Studio 2015 2016 2018 (via modules)
- CUDA 7.5 (/usr/local/cuda) 8.0 9.1 10.0 (via modules)
- DDT debugger 5.1 6.1 7.0 (via modules), see Documentation
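For instance, Matlab can be run in batch mode on a node with its standard command-line flags; the computation here is only a placeholder:

```bash
# run a short Matlab computation without a display
matlab2019a -nodisplay -nosplash -r "disp(magic(4)); exit"
```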
Environment modules make it easier to use managed software (see the example after this list):
- `module avail` : show all available modules
- `module load module_name` : properly configures your current session for using module_name
- `module list` : list loaded modules
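A typical module session might look like this; the module name is only an example, use `module avail` to see the real names on nef:

```bash
module avail                 # list everything that can be loaded
module load cuda/9.1         # example name: pick one shown by `module avail`
module list                  # confirm what is loaded in the current session
```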
Software Collection is another way to select software:
- `scl enable devtoolset-8 bash` : start using the software collection devtoolset-8 (e.g. gcc-8.3.1); see the example below
- Beware: do not reset the PATH variable nor other "list" variables in your .bashrc and .bash_profile, otherwise `scl enable` will fail to set up the wanted environment.
Ref: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-8/
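For example, to work with gcc 8 from devtoolset-8 and check that the collection is active:

```bash
# open a sub-shell with devtoolset-8 enabled, then verify the compiler in use
scl enable devtoolset-8 bash
gcc --version     # should now report gcc 8.3.x
which gcc         # typically points inside /opt/rh/devtoolset-8/
```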
Other tools/libraries available:
- blas, atlas
- openblas
- gmsh 2.8.5
- Python (including scipy, numpy, pycuda, pip); see the pip example after this list
- R 4.0.4 (2021-02-15)
- Erlang
- GDB & DDD
- Valgrind
- GSL & GLPK
- boost 1.58.0
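If a Python package you need is missing, a common approach is a per-user installation with pip; the package name below is only an example:

```bash
# install a package under ~/.local, without administrator rights
pip install --user requests

# check that the package is importable
python -c "import requests; print(requests.__version__)"
```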
Slides
Slides presenting the cluster are also available.