PGI: Difference between revisions

Current revision as of 17:03, 21 December 2020

Contents

1 Features
2 Documentation
3 Usage

This software is obsolete; don't use it. This documentation will be unpublished soon.

Features

The PGI Cluster Development Kit (CDK) version 14.10 from The Portland Group, Inc. is installed on the INRIA Sophia cluster. The PGI CDK 14.10 consists of the following components:

Two network-floating seats of the following compilers and tools:

Floating multi-user seats for PGI's parallel Fortran, C, and C++ compilers for Linux, with industry-leading single-processor performance and integrated native support for all three popular parallel programming models: HPF, OpenMP, and MPI.
Graphical MPI and OpenMP Linux Cluster debugging (PGDBG®) and parallel performance profiling (PGPROF®) tools.
Pre-compiled/pre-configured MPICH message-passing libraries and utilities (including MVAPICH with InfiniBand support)
Pre-compiled ScaLAPACK parallel math library
Optimized BLAS and LAPACK serial math libraries
Tutorial examples and programs to help you get your codes up and running quickly using HPF, OpenMP, and MPI messaging

A partial list of the technical features supported by the PGI compilers includes the following:

PGHPF data-parallel compiler with native full HPF language support
PGF95 OpenMP and auto-parallel Fortran 95 compiler
PGF77 OpenMP and auto-parallel FORTRAN 77 compiler
PGC++ OpenMP and auto-parallel ANSI and cfront-compatible C++ compiler
PGCC OpenMP and auto-parallel ANSI/K&R C compiler
PGDBG multi-process/multi-thread graphical debugger
PGPROF multi-process/multi-thread graphical performance profiler
Full 64-bit support on AMD Opteron, AMD Athlon 64 and Intel Pentium and Xeon with EM64T including full support for -mcmodel=medium and single data objects > 2GB
Includes separate 32-bit x86 and 64-bit EM64T/AMD64 development environments and compilers
Optimizing 64-bit code generators with automatic or manual platform selection
Executables generated by PGI's 32-bit x86 compilers can run unchanged on AMD64 or EM64T processor-based systems
AMD Opteron and Intel EM64T optimizations including SSE/SSE2, prefetching, use of extended register sets, and 64-bit addressing
Intel Pentium II/III/4/Xeon and AMD Athlon XP/MP optimizations including SSE/SSE2 and prefetching where supported in hardware
Large file (> 2GB) support in Fortran on 32-bit x86 systems
-r8/-i8 compilation flags, 64-bit integers
Full support for Fortran 95 extensions
Optimized ACML version 2.5 math library supported on all targets
Highly-tuned math intrinsics library routines
One pass interprocedural analysis (IPA)
Interprocedural optimization of libraries
Profile feedback optimization
Function inlining including library functions
Vectorization, loop interchange, loop splitting
Loop unrolling, loop fusion, and cache tiling
Support for creation of shared objects on Linux and DLLs on Windows
Cray/DEC/IBM compatibility (including Cray POINTERs)
Support for SGI-compatible DOACROSS in PGF77 and PGF95, and for SGI-compatible parallelization pragmas in PGCC C and C++
Byte-swapping I/O for RISC/UNIX interoperability
Integrated cpp pre-processing
Threads-based auto-parallelization using PGF77, PGF95, and PGCC C and C++
Full support for OpenMP in PGF77, PGF95, and PGCC C and C++
Process/CPU affinity support in SMP/OpenMP applications
FORALL and F95 array assignment merging
Re-use of communication schedules
Complete implementation of the HPF Library
Parallelization of irregular DO loops, FORALLs, and array assignments
HPF parallelization using direct accesses to shared memory
Fully upward compatible with PGHPF for high-end parallel systems
Support for graphical HPF profiling and performance tuning

PGI 2010 New Features and Performance:

PGI Accelerator™ x64+GPU native Fortran 95/03 and C99 compilers now support the full PGI Accelerator Programming Model v1.0 standard for directive-based GPU programming and optimization.
- Now supported on Linux, MacOS and Windows
- Device-resident data using MIRROR, REFLECTED, UPDATE directives
- COMPLEX and DOUBLE COMPLEX data, Fortran derived types, C structs
- Automatic GPU-side loop unrolling, support for the UNROLL clause
- Support for Accelerator regions nested within OpenMP parallel regions
PGI CUDA Fortran extensions supported in the PGI 2010 Fortran 95/03 compiler enable explicit CUDA GPU programming
- Declare variables in CUDA GPU device, constant or shared memory
- Dynamically allocate page-locked pinned host memory, CUDA device main memory, constant memory and shared memory
- Move data between host and GPU with Fortran assignment statements
- Declare explicit CUDA grids/thread-blocks to launch GPU compute kernels
- Support for CUDA Runtime API functions and features
- Efficient host-side emulation for easy CUDA Fortran debugging
Incremental Fortran 2003 features in the PGI Fortran compilers.
PGC++/PGCC enhancements include the latest EDG release 4.1 front-end with enhanced GNU and Microsoft compatibility, extern inline support, improved Boost support, and thread-safe exception handling
PGI Visual Fortran supports launching and debugging of MSMPI programs on Windows clusters from within Visual Studio, adds support for the PGI Accelerator Programming model and PGI CUDA Fortran on NVIDIA CUDA-enabled GPUs, and now includes the standalone PGPROF performance profiler with CCFF support.
Compiler optimizations and enhancements include OpenMP support for up to 256 cores, support for AVX code generation, and C++ inlining and executable size improvements.
PGPROF parallel OpenMP performance analysis and tuning tool
- Uniform cross-platform performance profiling without re-compiling or any special software privileges on Linux, MacOS and Windows
- PGI Accelerator and CUDA Fortran GPU-side performance statistics
- Updated graphical user interface
Updated documentation, including the PGI User's Guide, PGI Tools Guide and PVF User's Guide

Documentation

Documentation includes the following:

PGI User's Guide, PGHPF User's Guide, PGHPF Reference Manual, and Release Notes

See also the documentation files in nef-devel:/misc/opt/pgi/linux86-64/14.10/doc/ on the cluster.

Usage

Setup

To initialize your environment to use the PGI CDK, issue the following command:

 % module load pgi/pgi-14.10

If you also want the PGI build of MPI in your path, use instead:

 % module load mpi/pgi-14.10

You'll be able to use the PGI Fortran, C, and C++ compilers and tools on any Linux workstation networked to cluster.inria.fr (the licence server). The commands used to invoke the compilers are as follows:

pgf77 - FORTRAN 77
pgf90 - Fortran 90
pgf95 - Fortran 95
pghpf - High Performance Fortran
pgcc - ANSI and K&R C
pgCC - ANSI and cfront-compatible C++
pgprof - Graphical Performance profiler
pgdbg - Graphical debugger
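After loading the module, a quick way to confirm that the commands listed above are actually in your PATH is a small check like the one below. This is a minimal sketch: `pgi_check_tools` is a hypothetical helper, not part of the PGI CDK, and its default list simply repeats the commands from this page.

```shell
#!/bin/sh
# Hypothetical sanity check: report which of the given commands are not
# in PATH. With no arguments, it checks the PGI commands listed on this page.
pgi_check_tools() {
    [ "$#" -gt 0 ] || set -- pgf77 pgf90 pgf95 pghpf pgcc pgCC pgprof pgdbg
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one command was not found
    return "$missing"
}
```

Typical use: `module load pgi/pgi-14.10 && pgi_check_tools`.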

Compiler options

After executing the commands above to initialize your environment, you should be able to bring up man pages for any of the above commands. If you aren't sure which options to use, PGI recommends:

-fast

for all of the Fortran compilers and the C compiler, and:

-fast -Minline=levels:10 --no_exceptions

for the C++ compiler.
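In a build script, the recommendation above can be encoded once and reused for each compiler. A minimal sketch: the function name `pgi_default_flags` is hypothetical, and it only returns the two flag sets given on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the default flags recommended on this page
# for a given PGI compiler command.
pgi_default_flags() {
    case "$1" in
        pgCC)
            # C++ compiler: deeper inlining, no exception handling
            echo "-fast -Minline=levels:10 --no_exceptions" ;;
        pgf77|pgf90|pgf95|pghpf|pgcc)
            # Fortran compilers and the C compiler
            echo "-fast" ;;
        *)
            echo "pgi_default_flags: unknown compiler '$1'" >&2
            return 1 ;;
    esac
}
```

For example, `pgCC $(pgi_default_flags pgCC) -o myapp myapp.cpp`, where `myapp.cpp` is a placeholder source file.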

By default, PGI compilers generate code that is optimized for the type of processor on which compilation is performed, the compilation host. Such code may not run on nodes with a different processor type, which is a problem if you want to run your application on all the nodes of the cluster (Xeon and Opteron).

The PGI 14.10 compilers can produce PGI Unified Binary object or executable files containing code streams fully optimized and supported for both AMD and Intel x64 CPUs.

To generate code optimized for both architectures, use:

-tp x64

To generate code optimized only for Xeon/quad-core, use:

-tp core2-64

To generate code optimized only for Opteron, use:

-tp k8-64
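If a build script has to target different parts of the cluster, the choice of `-tp` value can be kept in one place. This sketch only maps the three cases described above; the target names `all`, `xeon` and `opteron` and the function name `pgi_tp_flag` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a cluster target to the PGI -tp option
# described on this page.
pgi_tp_flag() {
    case "$1" in
        all)     echo "-tp x64" ;;       # unified binary for AMD and Intel x64
        xeon)    echo "-tp core2-64" ;;  # Xeon/quad-core nodes only
        opteron) echo "-tp k8-64" ;;     # Opteron nodes only
        *)
            echo "pgi_tp_flag: unknown target '$1'" >&2
            return 1 ;;
    esac
}
```

For example, `pgf95 -fast $(pgi_tp_flag all) -o myprog myprog.f90`, with `myprog.f90` as a placeholder.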

To link with the MPICH libraries, add -Mmpi (or -Mmpi2 for MPICH2) to the link line of your Fortran applications.
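As a sketch, building the MPI "hello world" program used in the job submission example could combine these flags with the options above. This assumes the MPICH2 flavour (`-Mmpi2`, which the debugger section also requires), a unified binary (`-tp x64`), and that `module load mpi/pgi-14.10` has been run; `build_mpihello` is a hypothetical name, and you should check the `pgf77` man page for the MPI flags your installation accepts.

```shell
#!/bin/sh
# Hypothetical build helper for the mpihello.f example on this page.
# Assumes "module load mpi/pgi-14.10" has already been run.
# FC defaults to pgf77; run "FC=echo build_mpihello" to print the
# compiler arguments instead of compiling.
build_mpihello() {
    ${FC:-pgf77} -fast -tp x64 -Mmpi2 -o mpihello mpihello.f
}
```

Run `build_mpihello` from the directory that contains `mpihello.f`.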

Job submission

To submit jobs using OAR, you must be logged in to host nef.inria.fr (or nef-devel2.inria.fr). The following example runs an MPI "hello world" program:

% cat mpihello.f
      program hello
      include 'mpif.h'
      integer ierr, myproc
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myproc, ierr)
      print *, "Hello world!  I'm node", myproc
      call mpi_finalize(ierr)
      end
% cat mpihello.sh
#!/bin/bash
# The job: load the PGI MPI environment, then run mpihello
# on the cores reserved by OAR.
source /etc/profile.d/modules.sh
module load mpi/pgi-14.10
mpirun -machinefile "$OAR_NODEFILE" -launcher-exec oarsh ./mpihello

% chmod +x mpihello.sh
% oarsub -l /nodes=2/core=2 ./mpihello.sh
% cat OAR.<jobid>.stdout
Hello world!  I'm node 0
Hello world!  I'm node 2
Hello world!  I'm node 1
Hello world!  I'm node 3
%
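OAR's `$OAR_NODEFILE` lists each reserved core on its own line (host names are repeated), which matches the four processes shown above for 2 nodes × 2 cores. If you prefer to pass the process count to `mpirun` explicitly, it can be derived from that file. A minimal sketch; the helper name `oar_nprocs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the MPI processes to start from an OAR
# node file (one line per reserved core). Blank lines are ignored.
# Reads the file given as argument, or $OAR_NODEFILE by default.
oar_nprocs() {
    grep -c . "${1:-$OAR_NODEFILE}"
}

# Possible use inside the job script:
#   mpirun -n "$(oar_nprocs)" -machinefile "$OAR_NODEFILE" \
#          -launcher-exec oarsh ./mpihello
```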

Debugger

To debug MPI applications, you must use the MPICH2 implementation provided by PGI (debugging does not work with Open MPI), i.e. compile with -Mmpi2.

Reserve a node interactively, for example one whole node with eight cores: oarsub -I -l /nodes=1
Configure the PGI environment: module load pgi/pgi-14.10
Run pgdbg with your application: pgdbg -mpi:$PGI/linux86-64/14.10/mpi/mpich/bin/mpirun -n 8 ./myapp
Once pgdbg has started, click the 'Resume' button to start your application.
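The pgdbg launch line above can be wrapped in a small helper so that the core count is not hard-coded. This is a sketch only: `pgdbg_mpi` is a hypothetical name, it assumes `module load pgi/pgi-14.10` has set `$PGI`, and it reuses the MPICH `mpirun` path given above.

```shell
#!/bin/sh
# Hypothetical wrapper around the pgdbg MPI launch line on this page.
# Usage: pgdbg_mpi NPROCS ./myapp [args...]
# Assumes $PGI is set by "module load pgi/pgi-14.10".
# Set PGDBG=echo to print the arguments instead of starting the debugger.
pgdbg_mpi() {
    np=$1
    shift
    ${PGDBG:-pgdbg} -mpi:"$PGI/linux86-64/14.10/mpi/mpich/bin/mpirun" -n "$np" "$@"
}
```

For example, `pgdbg_mpi 8 ./myapp` inside the interactive reservation.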
