PGI: Difference between revisions

<div class="alert">
 
This documentation describes the obsolete "legacy nef" configuration, which will be shut down permanently on 17 April 2016. Please use the [[User_Guide_new_config|new nef configuration]] instead.
 
</div>
 
 
 
{{entete}}
 
  

Revision as of 27 April 2016, 13:37


  Features


The PGI Cluster Development Kit (CDK) version 11.1 from The Portland Group, Inc. is installed on the INRIA Sophia cluster. The PGI CDK 11.1 consists of the following components:

Two network-floating seats of the following compilers and tools:

  • Floating multi-user seats for PGI's parallel Fortran, C, and C++ compilers for Linux -- industry-leading single-processor performance and integrated native support for all 3 popular parallel programming models: HPF, OpenMP, and MPI.
  • Graphical MPI and OpenMP Linux Cluster debugging (PGDBG®) and parallel performance profiling (PGPROF®) tools.
  • Pre-compiled/pre-configured MPICH message-passing libraries and utilities (including MVAPICH with InfiniBand support)
  • Pre-compiled ScaLAPACK parallel math library
  • Optimized BLAS and LAPACK serial math libraries
  • Tutorial examples and programs to help you get your codes up and running quickly using HPF, OpenMP, and MPI messaging


A partial list of the technical features supported by the PGI compilers includes the following:

  • PGHPF data parallel compiler with native Full HPF language support
  • PGF95 OpenMP and auto-parallel Fortran 95 compiler
  • PGF77 OpenMP and auto-parallel FORTRAN 77 compiler
  • PGC++ OpenMP and auto-parallel ANSI and cfront-compatible C++ compiler
  • PGCC OpenMP and auto-parallel ANSI/K&R C compiler
  • PGDBG multi-process/multi-thread graphical debugger
  • PGPROF multi-process/multi-thread graphical performance profiler
  • Full 64-bit support on AMD Opteron, AMD Athlon 64 and Intel Pentium and Xeon with EM64T including full support for -mcmodel=medium and single data objects > 2GB
  • Includes separate 32-bit x86 and 64-bit EM64T/AMD64 development environments and compilers
  • Optimizing 64-bit code generators with automatic or manual platform selection
  • Executables generated by PGI's 32-bit x86 compilers can run unchanged on AMD64 or EM64T processor-based systems
  • AMD Opteron and Intel EM64T optimizations including SSE/SSE2, prefetching, use of extended register sets, and 64-bit addressing
  • Intel Pentium II/III/4/Xeon and AMD Athlon XP/MP optimizations including SSE/SSE2 and prefetching where supported in hardware
  • Large file (> 2GB) support in Fortran on 32-bit x86 systems
  • -r8/-i8 compilation flags, 64-bit integers
  • Full support for Fortran 95 extensions
  • Optimized ACML version 2.5 math library supported on all targets
  • Highly-tuned math intrinsics library routines
  • One pass interprocedural analysis (IPA)
  • Interprocedural optimization of libraries
  • Profile feedback optimization
  • Function inlining including library functions
  • Vectorization, loop interchange, loop splitting
  • Loop unrolling, loop fusion, and cache tiling
  • Support for creation of shared objects on Linux and DLLs on Windows
  • Cray/DEC/IBM compatibility (including Cray POINTERs)
  • Support for SGI-compatible DOACROSS in PGF77 and PGF95, and for SGI-compatible parallelization pragmas in PGCC C and C++
  • Byte-swapping I/O for RISC/UNIX interoperability
  • Integrated cpp pre-processing
  • Threads-based auto-parallelization using PGF77, PGF95, and PGCC C and C++
  • Full support for OpenMP in PGF77, PGF95, and PGCC C and C++
  • Process/CPU affinity support in SMP/OpenMP applications
  • FORALL and F95 array assignment merging
  • Re-use of communication schedules
  • Complete implementation of the HPF Library
  • Parallelization of irregular DO loops, FORALLs, and array assignments
  • HPF parallelization using direct accesses to shared memory
  • Fully upward compatible with PGHPF for high-end parallel systems
  • Support for graphical HPF profiling and performance tuning


PGI 2010 New Features and Performance:

  • PGI Accelerator™ x64+GPU native Fortran 95/03 and C99 compilers now support the full PGI Accelerator Programming Model v1.0 standard for directive-based GPU programming and optimization.
    • Now supported on Linux, MacOS and Windows
    • Device-resident data using MIRROR, REFLECTED, UPDATE directives
    • COMPLEX and DOUBLE COMPLEX data, Fortran derived types, C structs
    • Automatic GPU-side loop unrolling, support for the UNROLL clause
    • Support for Accelerator regions nested within OpenMP parallel regions
  • PGI CUDA Fortran extensions supported in the PGI 2010 Fortran 95/03 compiler enable explicit CUDA GPU programming
    • Declare variables in CUDA GPU device, constant or shared memory
    • Dynamically allocate page-locked pinned host memory, CUDA device main memory, constant memory and shared memory
    • Move data between host and GPU with Fortran assignment statements
    • Declare explicit CUDA grids/thread-blocks to launch GPU compute kernels
    • Support for CUDA Runtime API functions and features
    • Efficient host-side emulation for easy CUDA Fortran debugging
  • PGI Fortran 2003 incremental features. See full list below.
  • PGC++/ PGCC enhancements include the latest EDG release 4.1 front-end with enhanced GNU and Microsoft compatibility, extern inline support, improved BOOST support, thread-safe exception handling
  • PGI Visual Fortran supports launching and debugging of MSMPI programs on Windows clusters from within Visual Studio, adds support for the PGI Accelerator Programming model and PGI CUDA Fortran on NVIDIA CUDA-enabled GPUs, and now includes the standalone PGPROF performance profiler with CCFF support.
  • Compiler optimizations and enhancements include OpenMP support for up to 256 cores, support for AVX code generation, and C++ inlining and executable size improvements
  • PGPROF parallel OpenMP performance analysis and tuning tool
    • Uniform cross-platform performance profiling without re-compiling or any special software privileges on Linux, MacOS and Windows
    • PGI Accelerator and CUDA Fortran GPU-side performance statistics
    • Updated graphical user interface
  • Latest Operating Systems supported including RHEL 5, Fedora 11, SLES 11, SuSE 11.1, Ubuntu 9, Windows 7 and Mac OS X Snow Leopard
  • Updated Documentation including the PGI Users Guide, PGI Tools Guide and PVF Users Guide


  Documentation

Documentation includes the following:

  • PGI User's Guide, PGHPF User's Guide, PGHPF Reference Manual, and Release Notes

See also the documentation files in nef-devel:/usr/local/pgi/linux86-64/current/doc on the cluster.

  Usage

  Setup

To initialize your environment to use the PGI CDK, issue the following command:

 % module load pgi/pgi-14.10 

If you also want the PGI version of MPI in your path, use this instead:

 % module load mpi/pgi-14.10

You'll be able to use the PGI Fortran, C, and C++ compilers and tools on any Linux workstation networked to cluster.inria.fr (the licence server). The commands used to invoke the compilers are as follows:

  • pgf77 - FORTRAN 77
  • pgf90 - Fortran 90
  • pgf95 - Fortran 95
  • pghpf - High Performance Fortran
  • pgcc - ANSI and K&R C
  • pgCC - ANSI and cfront-compatible C++
  • pgprof - Graphical Performance profiler
  • pgdbg - Graphical debugger
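
A quick way to confirm that the module load worked is to check that each command in the list above is on your PATH. The following is only a minimal sketch: check_pgi_tools is a hypothetical helper name, not something shipped with the PGI CDK.

```shell
#!/bin/sh
# Report which of the PGI commands listed above are on the current PATH.
# check_pgi_tools is a hypothetical helper name, not part of the PGI CDK.
check_pgi_tools() {
    for tool in pgf77 pgf90 pgf95 pghpf pgcc pgCC pgprof pgdbg; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not found"
        fi
    done
}

check_pgi_tools
```

If every line reads "not found", the module was probably not loaded in the current shell.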

  Compiler options

After executing the commands above to initialize your environment, you should be able to bring up man pages for any of the above commands. If you aren't sure which options to use, PGI recommends:

-fast  

for all of the Fortran compilers and the C compiler, and:

-fast -Minline=levels:10 --no_exceptions 

for the C++ compiler.
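
As an illustration, the recommendation above can be wrapped in a small shell function so that build scripts pick the right flags for each compiler. This is a sketch only: pgi_recommended_flags is a hypothetical helper, and solver.f90 and app.cpp are placeholder file names. The compile lines are printed rather than executed.

```shell
#!/bin/sh
# pgi_recommended_flags is a hypothetical helper; the flag strings are
# the ones recommended in the text above.
pgi_recommended_flags() {
    case "$1" in
        pgCC)
            # C++ compiler: extra inlining, exceptions disabled
            echo "-fast -Minline=levels:10 --no_exceptions" ;;
        pgf77|pgf90|pgf95|pghpf|pgcc)
            # Fortran compilers and the C compiler
            echo "-fast" ;;
        *)
            echo "unknown compiler: $1" >&2
            return 1 ;;
    esac
}

# Dry run: print the compile lines instead of executing them.
# solver.f90 and app.cpp are placeholder file names.
echo "pgf90 $(pgi_recommended_flags pgf90) -o solver solver.f90"
echo "pgCC $(pgi_recommended_flags pgCC) -o app app.cpp"
```

Note that --no_exceptions is only appropriate for C++ code that does not rely on exception handling.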

By default, PGI compilers generate code that is optimized for the type of processor on which compilation is performed (the compilation host). This can be a problem if you want to run your application on all the nodes of the cluster (Xeon and Opteron).

The PGI 11.1 compilers can produce PGI Unified Binary object or executable files containing code streams fully optimized and supported for both AMD and Intel x64 CPUs.

To generate code optimized for both architectures, use:

-tp x64 

To generate code optimized only for Xeon/quadcore, use:

-tp core2-64 

To generate code optimized only for Opteron, use:

-tp k8-64 
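
The three -tp choices above can be selected by name in a build script. The sketch below assumes a hypothetical helper, pgi_target_flag, and a placeholder source file app.c. It prints the resulting compile line instead of running it.

```shell
#!/bin/sh
# pgi_target_flag is a hypothetical helper that maps a node type to the
# -tp values given in the text above.
pgi_target_flag() {
    case "$1" in
        both)    echo "-tp x64" ;;       # unified binary, AMD and Intel
        xeon)    echo "-tp core2-64" ;;  # Xeon/quadcore only
        opteron) echo "-tp k8-64" ;;     # Opteron only
        *)
            echo "usage: pgi_target_flag both|xeon|opteron" >&2
            return 1 ;;
    esac
}

# Dry run: print the compile line for a binary usable on all nodes.
# app.c is a placeholder file name.
echo "pgcc -fast $(pgi_target_flag both) -o app app.c"
```

Choosing "both" is the safer default when a job may land on either kind of node.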

To link with the MPICH libraries, add -Mmpi or -Mmpi2 (for MPICH2) to the link line for your Fortran applications.

  Job submission

To submit jobs, you must be logged in to host nef.inria.fr (or nef-devel2.inria.fr). The following example shows an MPI "hello world" program, the script used to run it, and its submission with oarsub:

% cat mpihello.f
      program hello
      include 'mpif.h'
      integer ierr, myproc
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myproc, ierr)
      print *, "Hello world!  I'm node", myproc
      call mpi_finalize(ierr)
      end
% cat mpihello.sh
#!/bin/sh
# The job
# Make the module command available, then load PGI and its MPI
. /etc/profile.d/modules.sh
module load mpi/pgi-14.10
# Start one process per reserved core, using oarsh to reach the nodes
mpirun -machinefile $OAR_NODEFILE -launcher-exec oarsh ./mpihello


% oarsub -l /nodes=2/core=2 ./mpihello.sh

Hello world!  I'm node 0
Hello world!  I'm node 2
Hello world!  I'm node 1
Hello world!  I'm node 3
%
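
The example above does not show how mpihello was built. A possible build step, using only flags mentioned on this page, is sketched below. It rests on two assumptions: that pgf77 is the compiler you want and that -Mmpi selects the PGI-provided MPICH in your environment. The helper names mpihello_build_cmd and build_mpihello are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for building the mpihello example.
# Assumptions: pgf77 is the chosen compiler and -Mmpi selects the
# PGI-provided MPICH; adjust both if your environment differs.
mpihello_build_cmd() {
    echo "pgf77 -fast -Mmpi -o mpihello mpihello.f"
}

build_mpihello() {
    if command -v pgf77 >/dev/null 2>&1 && [ -f mpihello.f ]; then
        # Compiler and source are available: really build.
        $(mpihello_build_cmd)
    else
        # Otherwise only show what would be run.
        echo "would run: $(mpihello_build_cmd)"
    fi
}

build_mpihello
```

The job script probably also needs to be executable before submission (chmod +x mpihello.sh).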


  Debugger

To debug MPI applications, you have to use the MPICH2 implementation provided by PGI (debugging will not work with Open MPI), i.e. compile with -Mmpi2.

  • Reserve a node interactively, for example eight cores on one node: oarsub -I -l /nodes=1
  • Configure the PGI environment: module load pgi/pgi-14.10
  • Run pgdbg with your application: pgdbg -mpi:$PGI/linux86-64/14.10/mpi/mpich/bin/mpirun -n 8 ./myapp
  • Once pgdbg has started, click the 'resume' button to start your application.

  More documentation

All of the documentation for the PGI compilers and tools is on the cluster in nef-devel:/usr/local/pgi/linux86-64/current/doc.


For more information on the PGI compilers and tools, see the URLs:

For more information on HPF in general, see the High Performance Fortran homepage at: http://hpff.rice.edu/

For more information on OpenMP in general, see the OpenMP homepage at: http://www.openmp.org

For more information on the open source components of the PGI CDK, see the URLs: