PGI: Difference between revisions
Revision as of 9 December 2014, 18:48
Features
The PGI Cluster Development Kit (CDK) version 11.1 from The Portland Group, Inc. is installed on the INRIA Sophia cluster. The PGI CDK 11.1 consists of the following components:
Two network-floating seats of the following compilers and tools:
- Floating multi-user seats for PGI's parallel Fortran, C, and C++ compilers for Linux -- industry-leading single-processor performance and integrated native support for all 3 popular parallel programming models: HPF, OpenMP, and MPI.
- Graphical MPI and OpenMP Linux Cluster debugging (PGDBG®) and parallel performance profiling (PGPROF®) tools.
- Pre-compiled/pre-configured MPICH message-passing libraries and utilities (including MVAPICH with InfiniBand support)
- Pre-compiled ScaLAPACK parallel math library
- Optimized BLAS and LAPACK serial math libraries
- Tutorial examples and programs to help you get your codes up and running quickly using HPF, OpenMP, and MPI messaging
A partial list of the technical features supported by the PGI compilers includes the following:
- PGHPF data parallel compiler with native Full HPF language support
- PGF95 OpenMP and auto-parallel Fortran 95 compiler
- PGF77 OpenMP and auto-parallel FORTRAN 77 compiler
- PGC++ OpenMP and auto-parallel ANSI and cfront-compatible C++ compiler
- PGCC OpenMP and auto-parallel ANSI/K&R C compiler
- PGDBG multi-process/multi-thread graphical debugger
- PGPROF multi-process/multi-thread graphical performance profiler
- Full 64-bit support on AMD Opteron, AMD Athlon 64 and Intel Pentium and Xeon with EM64T including full support for -mcmodel=medium and single data objects > 2GB
- Includes separate 32-bit x86 and 64-bit EM64T/AMD64 development environments and compilers
- Optimizing 64-bit code generators with automatic or manual platform selection
- Executables generated by PGI's 32-bit x86 compilers can run unchanged on AMD64 or EM64T processor-based systems
- AMD Opteron and Intel EM64T optimizations including SSE/SSE2, prefetching, use of extended register sets, and 64-bit addressing
- Intel Pentium II/III/4/Xeon and AMD Athlon XP/MP optimizations including SSE/SSE2 and prefetching where supported in hardware
- Large file (> 2GB) support in Fortran on 32-bit x86 systems
- -r8/-i8 compilation flags, 64-bit integers
- Full support for Fortran 95 extensions
- Optimized ACML version 2.5 math library supported on all targets
- Highly-tuned math intrinsics library routines
- One pass interprocedural analysis (IPA)
- Interprocedural optimization of libraries
- Profile feedback optimization
- Function inlining including library functions
- Vectorization, loop interchange, loop splitting
- Loop unrolling, loop fusion, and cache tiling
- Support for creation of shared objects on Linux and DLLs on Windows
- Cray/DEC/IBM compatibility (including Cray POINTERs)
- Support for SGI-compatible DOACROSS in PGF77 and PGF95, and for SGI-compatible parallelization pragmas in PGCC C and C++
- Byte-swapping I/O for RISC/UNIX interoperability
- Integrated cpp pre-processing
- Threads-based auto-parallelization using PGF77, PGF95, and PGCC C and C++
- Full support for OpenMP in PGF77, PGF95, and PGCC C and C++
- Process/CPU affinity support in SMP/OpenMP applications
- FORALL and F95 array assignment merging
- Re-use of communication schedules
- Complete implementation of the HPF Library
- Parallelization of irregular DO loops, FORALLs, and array assignments
- HPF parallelization using direct accesses to shared memory
- Fully upward compatible with PGHPF for high-end parallel systems
- Support for graphical HPF profiling and performance tuning
PGI 2010 New Features and Performance:
- PGI Accelerator™ x64+GPU native Fortran 95/03 and C99 compilers now support the full PGI Accelerator Programming Model v1.0 standard for directive-based GPU programming and optimization.
- Now supported on Linux, MacOS and Windows
- Device-resident data using MIRROR, REFLECTED, UPDATE directives
- COMPLEX and DOUBLE COMPLEX data, Fortran derived types, C structs
- Automatic GPU-side loop unrolling, support for the UNROLL clause
- Support for Accelerator regions nested within OpenMP parallel regions
- PGI CUDA Fortran extensions supported in the PGI 2010 Fortran 95/03 compiler enable explicit CUDA GPU programming
- Declare variables in CUDA GPU device, constant or shared memory
- Dynamically allocate page-locked pinned host memory, CUDA device main memory, constant memory and shared memory
- Move data between host and GPU with Fortran assignment statements
- Declare explicit CUDA grids/thread-blocks to launch GPU compute kernels
- Support for CUDA Runtime API functions and features
- Efficient host-side emulation for easy CUDA Fortran debugging
- PGI Fortran 2003 incremental features. See full list below.
- PGC++/PGCC enhancements include the latest EDG release 4.1 front-end with enhanced GNU and Microsoft compatibility, extern inline support, improved BOOST support, and thread-safe exception handling
- PGI Visual Fortran supports launching and debugging of MSMPI programs on Windows clusters from within Visual Studio, adds support for the PGI Accelerator Programming model and PGI CUDA Fortran on NVIDIA CUDA-enabled GPUs, and now includes the standalone PGPROF performance profiler with CCFF support.
- Compiler optimizations and enhancements include OpenMP support for up to 256 cores, support for AVX code generation, and C++ inlining and executable size improvements
- PGPROF parallel OpenMP performance analysis and tuning tool
- Uniform cross-platform performance profiling without re-compiling or any special software privileges on Linux, MacOS and Windows
- PGI Accelerator and CUDA Fortran GPU-side performance statistics
- Updated graphical user interface
- Latest Operating Systems supported including RHEL 5, Fedora 11, SLES 11, SuSE 11.1, Ubuntu 9, Windows 7 and Mac OS X Snow Leopard
- Updated Documentation including the PGI Users Guide, PGI Tools Guide and PVF Users Guide
Documentation
Documentation includes the following:
- PGI User's Guide, PGHPF User's Guide, PGHPF Reference Manual, and Release Notes
See also the documentation files in nef-devel:/usr/local/pgi/linux86-64/current/doc on the cluster.
Usage
Setup
To initialize your environment to use the PGI CDK, issue the following commands if you're using csh:
% source /usr/local/pgi/pgi_user.csh
or the following if you're using sh, ksh, or bash:
% . /usr/local/pgi/pgi_user.sh
It is recommended that you place these initialization commands in your shell startup files so that you'll have access to the PGI CDK compilers and tools at each future login.
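For sh, ksh, or bash startup files, the initialization can be guarded so that logins on machines without the PGI installation are not affected. The following is a minimal sketch; the function name `pgi_init` is hypothetical, and only the script path comes from this page.

```shell
# Sketch for ~/.profile or ~/.bashrc (sh-compatible).
# pgi_init is a hypothetical helper name; the default path is the
# setup script mentioned above.
pgi_init() {
    setup=${1:-/usr/local/pgi/pgi_user.sh}
    if [ -r "$setup" ]; then
        # Source the PGI environment into the current shell.
        . "$setup"
    else
        # Stay silent when PGI is not available on this machine.
        return 1
    fi
}

# Ignore the failure case so the login continues normally.
pgi_init || true
```

An optional argument lets you point the helper at another copy of the setup script, for example `pgi_init /some/other/pgi_user.sh`.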
You'll be able to use the PGI Fortran, C, and C++ compilers and tools on any Linux workstation networked to cluster.inria.fr (the licence server). The commands used to invoke the compilers are as follows:
- pgf77 - FORTRAN 77
- pgf90 - Fortran 90
- pgf95 - Fortran 95
- pghpf - High Performance Fortran
- pgcc - ANSI and K&R C
- pgCC - ANSI and cfront-compatible C++
- pgprof - Graphical Performance profiler
- pgdbg - Graphical debugger
Compiler options
After executing the commands above to initialize your environment, you should be able to bring up man pages for any of the above commands. If you aren't sure which options to use, PGI recommends:
-fast
for all of the Fortran compilers and the C compiler, and:
-fast -Minline=levels:10 --no_exceptions
for the C++ compiler.
By default, PGI compilers generate code optimized for the processor type of the compilation host. This can be a problem if you want to run your application on all the nodes of the cluster (Xeon and Opteron).
The PGI 11.1 compilers can produce PGI Unified Binary object or executable files containing code streams fully optimized and supported for both AMD and Intel x64 CPUs.
To generate code optimized for both architectures, use: -tp x64
To generate code optimized only for Xeon/quadcore, use:
-tp core2-64
To generate code optimized only for Opteron, use:
-tp k8-64
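The three target choices above can be wrapped in a small helper for build scripts. This is only a sketch: the function name `pgi_tp_flag`, the target keywords, and the file `myapp.f90` are hypothetical, while the `-tp` values and `-fast` are the ones listed on this page.

```shell
# Map a cluster target to the -tp option listed above.
# pgi_tp_flag and its keywords (both/xeon/opteron) are hypothetical names.
pgi_tp_flag() {
    case "$1" in
        both)    echo "-tp x64" ;;       # unified binary, Xeon and Opteron
        xeon)    echo "-tp core2-64" ;;  # Xeon/quadcore only
        opteron) echo "-tp k8-64" ;;     # Opteron only
        *)       echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Print (without running) a compile line for a hypothetical myapp.f90.
echo "pgf90 -fast $(pgi_tp_flag both) -o myapp myapp.f90"
```

Printing the command line instead of executing it makes it easy to check the options before launching a long build.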
To link with the MPICH libraries, add -Mmpi or -Mmpi2 (for MPICH2) to the link line for your Fortran applications.
PGI also includes an InfiniBand-aware version of MPI: mvapich1.
To use it, compile with the corresponding wrapper:
/usr/local/pgi/linux86-64/current/mpi/mvapich/bin/mpicc
(or mpif77, mpif90, mpiCC)
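As an illustration, the MPI variants described above lead to the following command lines for a Fortran source. This sketch only prints the commands. The helper name `mpi_build_cmd` and its keywords are hypothetical, and passing `-fast` through the mvapich wrapper assumes the wrapper forwards options to the underlying PGI compiler.

```shell
# Directory of the mvapich1 wrappers, as given above.
MVAPICH_BIN=/usr/local/pgi/linux86-64/current/mpi/mvapich/bin

# Print the build command for a given MPI flavor (hypothetical helper).
# Usage: mpi_build_cmd FLAVOR SOURCE OUTPUT
mpi_build_cmd() {
    flavor=$1; src=$2; out=$3
    case "$flavor" in
        mpich)   echo "pgf90 -fast -Mmpi -o $out $src" ;;
        mpich2)  echo "pgf90 -fast -Mmpi2 -o $out $src" ;;
        # Assumption: the wrapper passes -fast on to pgf90.
        mvapich) echo "$MVAPICH_BIN/mpif90 -fast -o $out $src" ;;
        *)       echo "unknown MPI flavor: $flavor" >&2; return 1 ;;
    esac
}

mpi_build_cmd mpich2 mpihello.f mpihello
```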
Job submission
To submit jobs using Torque, you must be logged in to host nef.inria.fr (or nef-devel.inria.fr). The following example shows an MPI "hello world" program, the script used to run it, and its submission:
% cat mpihello.f
      program hello
      include 'mpif.h'
      integer ierr, myproc
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myproc, ierr)
      print *, "Hello world! I'm node", myproc
      call mpi_finalize(ierr)
      end
% cat mpihello.pbs
#!/bin/sh
# Use the following command to go in your working directory (default is home)
cd
# The job
mpiexec -comm=mpich2 mpihello
# to run a binary compiled with mvapich1, use mpiexec --comm=ib
% qsub -l nodes=2:nef:ppn=2 mpihello.pbs
Hello world! I'm node 0
Hello world! I'm node 2
Hello world! I'm node 1
Hello world! I'm node 3
%
Debugger
To debug MPI applications, you must use the MPICH2 implementation provided by PGI (it will not work with Open MPI), i.e. compile with -Mmpi2.
- reserve a node interactively, for example eight cores on one node:
qsub -I -l nodes=1:ppn=8
- connect directly to the reserved node from your workstation: ssh -X nefXXX (you can also use qsub -X to get remote display from your interactive job, but it will be much slower)
- start mpd:
/usr/local/pgi/linux86-64/10.3/mpi2/mpich/bin/mpd &
(you must have a configuration file ~/.mpd.conf; its content should be something like: secretword=mysecretword)
- configure the PGI environment:
source /usr/local/pgi/pgi_user.sh
- run pgdbg with your application:
pgdbg -mpi:/usr/local/pgi/linux86-64/10.3/mpi2/mpich/bin/mpirun -n 8 ./myapp
- once pgdbg is started, click the 'resume' button to start your application.
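The mpd step in the list above needs a configuration file that only its owner can read, since mpd refuses to start when the file is accessible by other users. A minimal sketch for creating it follows; the helper name `write_mpd_conf` is hypothetical, and `mysecretword` is the placeholder from this page, to be replaced by your own value.

```shell
# Create an mpd configuration file with owner-only permissions.
# write_mpd_conf is a hypothetical helper name.
# Usage: write_mpd_conf FILE SECRETWORD
write_mpd_conf() {
    conf=$1; secret=$2
    if [ -e "$conf" ]; then
        # Never overwrite an existing configuration.
        echo "keeping existing $conf" >&2
        return 0
    fi
    # The subshell keeps the restrictive umask local to this write.
    ( umask 077; printf 'secretword=%s\n' "$secret" > "$conf" )
    chmod 600 "$conf"
}

# Typical call (commented out so that nothing is written by default):
# write_mpd_conf "$HOME/.mpd.conf" mysecretword
```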
If you need to run your MPI application on more than one node, instead of mpd, use mpdboot to start mpd on all nodes:
% NODES=`uniq < $PBS_NODEFILE | wc -l | tr -d ' '`
% /usr/local/pgi/linux86-64/11.1/mpi2/mpich/bin/mpdboot --rsh=ssh --totalnum=$NODES --file=$PBS_NODEFILE
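To see what the NODES computation yields, recall that Torque writes one line per reserved core into $PBS_NODEFILE, so identical host names appear on consecutive lines. The sketch below reproduces the pipeline on a sample file when run outside a job. The helper names and the host names nef001 and nef002 are made up for illustration.

```shell
# Number of distinct hosts: collapse consecutive duplicates, then count.
count_nodes() { uniq < "$1" | wc -l | tr -d ' '; }

# Number of reserved cores: one line per core.
count_slots() { wc -l < "$1" | tr -d ' '; }

# Outside a Torque job, fall back to a sample node file
# (hypothetical hosts, as for nodes=2:ppn=2).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' nef001 nef001 nef002 nef002 > "$PBS_NODEFILE"
fi

NODES=$(count_nodes "$PBS_NODEFILE")
SLOTS=$(count_slots "$PBS_NODEFILE")
echo "nodes=$NODES slots=$SLOTS"
```

NODES is the value passed to --totalnum, since one mpd runs per host, while the slot count is the natural upper bound for the -n option of mpirun.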
More documentation
All of the documentation for the PGI compilers and tools is on the cluster in nef-devel:/usr/local/pgi/linux86-64/current/doc.
For more information on the PGI compilers and tools, see the URLs:
For more information on HPF in general, see the High Performance Fortran homepage at: http://hpff.rice.edu/
For more information on OpenMP in general, see the OpenMP homepage at: http://www.openmp.org
For more information on the open source components of the PGI CDK, see the URLs: