Development Environment

Compilers

The IBM xlf Fortran compiler version 11.1 and the IBM xlc C/C++ compilers version 9.0 are installed. Each compiler can be invoked under several command names, which differ in their default options and in the libraries they link. Table 2 lists some frequently used compiler invocations.

Table 2: Summary of available compilers on the p-series machines
Compiler | Standard | MPI | Thread-Safe | Thread-Safe MPI | GNU
C compiler | xlc | mpcc | xlc_r | mpcc_r | gcc
C++ compiler | xlC | mpCC | xlC_r | mpCC_r | g++
Fortran-77 | xlf | mpxlf | xlf_r | mpxlf_r | g77
Fortran-90 | xlf90 | mpxlf90 | xlf90_r | mpxlf90_r | gfortran
Fortran-95 | xlf95 | mpxlf95 | xlf95_r | mpxlf95_r | gfortran

Note 1: All the IBM Fortran compilers assume that source files have the suffix .f. Source files that contain cpp preprocessor directives should end with .F. IBM’s C compiler (‘xlc’) and C++ compiler (‘xlC’) expect files with the suffixes .c and .C, respectively.

Note 2: The main difference between the ‘xlf’ and ‘xlf_r’ commands is that they use different default options, as set in the configuration file /etc/xlf.cfg53. The ‘_r’ suffix makes the linker use thread-safe versions of the Fortran/C/C++ libraries. Thread-safe compilers are recommended for all parallel code, whether or not threads are used explicitly.

Note 3: IBM Java compilers and JVM version 1.4.2 are installed on Ares. However, users are encouraged to run Java applications on Kronos, the Linux cluster at CCS.

Modules

The Modules package is installed on the Ares cluster and allows users to switch quickly between different programming environments. It sets up the appropriate environment variables, such as PATH, MANPATH and LD_LIBRARY_PATH, depending on the modules chosen. Table 3 lists commonly used module commands.


Table 3: Summary of modules package usage in Ares
Command | Usage | Purpose
module avail | module avail | list all available modules
module load [package] | module load ncarg | load a module, e.g. the ncarg package
module show [package] | module show ncarg | show the environment variables a module sets
module list | module list | list modules currently loaded
module switch [old] [new] | module switch ncarg gnuplot | replace old module with new
module purge | module purge | unload all modules

The modules currently available in Ares, as listed by the command “module avail”, are shown below:

[user@/nethome/user]:>module avail

---------- /nethome/apps/modules/aix575/Modules/default/modulefiles ----------
ferret    ftnchek   gmt/4.3.1-32  hdf5   netcdf/3.6.3  tcl
fftw      gaussian  gnuplot       mpfr   netcdf/4.0
freeware  gmp       grads         ncarg  null

Typically, modules are loaded as part of the login process by placing module commands in .bash_profile/.bashrc (for bash/sh/ksh users) or .login/.cshrc (for csh/tcsh users).

NOTE: For some users, the Modules feature might not be pre-configured by default. If that is the case, activate it with the following command:

[user@/nethome/user]: source /nethome/apps/modules/etc/profile.modules 
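
The login-time setup described above can be sketched as follows for bash/sh/ksh users. This is a minimal sketch and not a site-provided file: the initialization path is the one quoted in the note, the variable name MODULES_INIT is introduced here purely for illustration, and netcdf/4.0 is simply one of the modules shown in the “module avail” listing.

```shell
# Sketch for ~/.bashrc (bash/sh/ksh users); adapt before use.
# MODULES_INIT is a name chosen for this example, not a site variable.
MODULES_INIT=/nethome/apps/modules/etc/profile.modules

# Activate the Modules feature only if the "module" command is missing
# and the initialization file is readable.
if ! type module >/dev/null 2>&1 && [ -r "$MODULES_INIT" ]; then
    . "$MODULES_INIT"
fi

# Load a default environment only when Modules is actually available.
if type module >/dev/null 2>&1; then
    module load netcdf/4.0 || echo "could not load netcdf/4.0" >&2
fi
```

The guards keep the file harmless on machines where the Modules package is not installed, for example when the same home directory is shared with another cluster.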

Compiling Sequential Programs

Assume your Fortran 90 or C source program is stored in the file ‘program.f’ or ‘program.c’, respectively. It is compiled with the command:

Fortran:

 xlf90 program.f -o program

C:

 xlc program.c -o program 

The ‘-o program’ option instructs the compiler to name the executable ‘program’. If the ‘-o’ option is not used, the executable is called ‘a.out’ by default. If your source program is spread over multiple files, for example when a subroutine in ‘program.f’ (or ‘program.c’) calls a subroutine or function defined in ‘program2.f’ (or ‘program2.c’), you can compile in two steps:

Fortran:

 xlf90 -c program2.f
 xlf90 program.f program2.o -o program

C:

 xlc -c program2.c
 xlc program.c program2.o -o program

The ‘-c’ option instructs the compiler to create an object file (program2.o) instead of an executable file. This object file is used at a later step of the compilation. In the last step of the compilation, all object files (or one source file with other object files) are usually linked together to create the executable file. By default, the compiler links these object files together with several libraries that are present on the system and that contain implementations of the language standard (for example Fortran intrinsics like abs, sin, mod, …).

On many systems, Fortran 77, Fortran 90 and Fortran 95 files have the suffixes .f, .f90 and .f95, respectively. If you port your program from another platform where the source files have a suffix other than .f, for example .f90, you can either rename the source files or compile and link them as follows:

 xlf90 -qsuffix=f=f90 program.f90 

When your code consists of many source files, the venerable Unix make utility can automate the maintenance, update, compilation, and regeneration of object and executable files. By default, make requires a file called ‘Makefile’ in which you specify your compiler options, source/object files, and rules for compilation. After you have created the Makefile, you can build your program with the command ‘make’ or ‘gmake’ (GNU make). The /nethome/examples directory contains an example Makefile for GNU make (which we recommend) that can be used to build an example Fortran program. The sample Makefile provides a template and can be edited to suit your needs. Writing Makefiles is not always easy, but it can simplify your life considerably. Contact user support for help when needed.

Compiling OpenMP Programs

Most options for compiling sequential programs can also be used for compiling parallel OpenMP (or threaded) programs. The commands to compile OpenMP programs are xlf_r, xlf90_r and xlf95_r for Fortran and xlc_r for C. For example:

Fortran:

 xlf90_r -qsmp=omp:opt program.f

C:

 xlc_r -qsmp=omp:opt program.c 

NOTE: If you use xlf_r (F77 compiler) to compile your OpenMP program, you should always use the -qnosave option. For xlf90_r (F90 compiler), the -qnosave option is used by default and need not be set by the user.

Compiling MPI Programs

Most options for compiling sequential programs can also be used for compiling parallel MPI programs. The commands to compile MPI programs are mpxlf_r, mpxlf90_r and mpxlf95_r for Fortran and mpcc_r for C. For example:

Fortran:

 mpxlf90_r program.f

C:

 mpcc_r program.c 

The ‘mpxlf…’ commands are wrapper shell scripts that invoke the appropriate xlf compiler. In addition, the Partition Manager, Message Passing Interface (MPI), and/or Message Passing Library (MPL) are automatically linked in. Flags are passed by mpxlf to the xlf command, so any of the xlf options can be used on the mpxlf shell script. By default, the mpxlf scripts pass the proper MPI header files to the xlf compilers. It is therefore not necessary to specify the directory (normally with the ‘-I’ option) where the MPI header file is located. The MPI compilers also link in the MPI libraries by default (and linking with something like ‘-L… -lmpi’ is not needed).

NOTE: If your MPI program uses shared memory to communicate between processors, thread-safe MPI compilers should be used.

Frequently Used Compiler Options

The compilers accept many options, e.g. for debugging and for optimizing code size or performance. Use ‘xlf -help’ for an overview of the options available for the Fortran compilers and ‘xlc -help’ for the C/C++ compilers. Table 4 describes some commonly used options.


Table 4: Commonly used compiler options and their description
Option | Description
-qsuffix=f=f90 | Allows the .f90 extension to be used for Fortran source files
-c | Compile only, producing a “.o” file. Does not link object files
-g | Produce information required by debuggers and some profiler tools
-I | Names directories for additional include files
-L | Specifies a pathname where additional libraries reside. Directories are searched in the order of their occurrence on the command line
-l | Names additional libraries to be searched
-O0 (default) | Performs only quick local optimizations such as constant folding and elimination of local common subexpressions
-O2 | Performs optimizations that the compiler developers considered the best combination for compilation speed and runtime performance
-O3 | Performs some memory and compile-time intensive optimizations in addition to those executed with -O2
-O | Equivalent to specifying -O2
-p, -pg | Generate profiling support code. -p is required for use with the prof utility and -pg is required for use with the gprof utility
-q32, -q64 | Specifies generation of 32-bit or 64-bit objects
-qstrict | Turns off aggressive optimizations which have the potential to alter the semantics of your program. Only valid with -O2 or higher optimization levels. The default is -qnostrict at -O3 or higher, and -qstrict otherwise
-qhot | Determines whether or not to perform high-order transformations on loops and array language during optimization, and whether or not to pad array dimensions and data objects to avoid cache misses
-qautodbl=dbl | Promotes REALs to 64-bit double precision REALs
-qfullpath | Records the full path to source and include files in the output files for debugging
-qwarn64 | Produces warnings for 64-bit data size issues
-q | Produces warnings for 32-bit data size issues
-qmaxmem=[num] | Specifies the memory limit in kilobytes used by space-intensive optimizations. The special value -1 indicates unlimited memory for such optimizations
-bmaxdata:<bytes> | Specifies the maximum amount of space to reserve for the program data segment for programs where the size of these regions is a constraint. By default, combined data space is slightly less than 256MB, or lower, depending on the limits for the user ID
-bmaxstack:<bytes> | Specifies the maximum amount of space to reserve for the program stack segment for programs where the size of these regions is a constraint. By default, combined stack space is slightly less than 256MB, or lower, depending on the limits for the user ID

For basic performance and run-time optimization, a good starting point is to compile your code with the options

 -O3 -qarch=auto -qtune=auto -qcache=auto 

In some cases the compiler may generate messages saying that it needs more memory for additional optimization of a specific subroutine. To avoid this, use the option

 -qmaxmem=-1 

to give ‘unlimited’ memory to the compiler for space intensive optimization.

By default, the IBM compilers use 32-bit addressing. If your 32-bit executable uses more than several hundred Megabytes of memory, you must set (at link time) the maximum size allowed for the user data area (or user heap) when the executable is run. Use the linker option

 -bmaxdata:[bytes] 

where [bytes] sets the maximum size in bytes. For a 32-bit program, the maximum value allowed is hexadecimal 0x70000000 (1,879,048,192 bytes), which corresponds to roughly 2 Gbyte. For a 64-bit program (compiled with -q64), the option -bmaxdata should not be used.
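
The hexadecimal limit can be converted to decimal with shell arithmetic to see what the value means in bytes and megabytes. This is a small sketch: the variable names are illustrative, and the last line only prints a possible link command (using the file names from the earlier examples) rather than running it.

```shell
# Convert the 32-bit -bmaxdata ceiling from hexadecimal to decimal.
max_hex=0x70000000
max_bytes=$((0x70000000))              # 1879048192 bytes
max_mib=$((max_bytes / 1024 / 1024))   # 1792 MiB, i.e. 1.75 * 1024

echo "bytes: $max_bytes"
echo "MiB:   $max_mib"

# Print (do not run) a link line that requests this maximum.
echo "xlf90 program.f -bmaxdata:$max_hex -o program"
```

The result, 1792 MiB, shows that the ceiling sits a little below the 2 Gbyte address-space limit discussed in this section.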

The maximum stack space for 32-bit code is 256 Megabytes and can be set with the linker option

 -bmaxstack:256000000 

If a 256 Megabyte stack is not enough, allocate large arrays dynamically with ALLOCATE/DEALLOCATE instead of using automatic arrays.

NOTE: The available amount of memory for 32-bit executables is limited to 2 Gbyte for a sequential or OpenMP job and to 2 Gbyte per MPI process. If your job or processes require more memory, you must compile and link your whole code with 64-bit addressing. To do so, add the option -q64 to your compile and link commands. Also add the option -X64 to the ar command when you generate libraries.
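
A 64-bit build that includes a user library might look like the sketch below. The file names program.f and program2.f follow the earlier examples; the library name libprogram2.a, the run helper, and the DRY_RUN variable are assumptions made for this illustration. It also assumes that libraries are generated with the AIX archiver ar, which accepts -X64.

```shell
# run: echo a command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Build a 64-bit static library and link a 64-bit executable against it.
# Every step uses the same addressing mode (-q64 for xlf90, -X64 for ar).
build_64bit() {
    run xlf90 -q64 -c program2.f &&
    run ar -X64 -rv libprogram2.a program2.o &&
    run xlf90 -q64 program.f -L. -lprogram2 -o program
}

# Usage: preview the commands without running them:
#   DRY_RUN=1; build_64bit
```

Mixing 32-bit and 64-bit objects in one link step fails, which is why the sketch applies the 64-bit option at every stage.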

Basic Debugging

To debug code on Ares, you must compile and link your application with the options “-g -qnooptimize -qfullpath”. dbx is the standard AIX symbolic debugger. To run an executable interactively under the control of the debugger:

dbx ./program 

To analyze a core file:

dbx program core 

pdbx is a symbolic, textual, parallel debugger included with the IBM Parallel Environment. pdbx accepts the same options as poe. To use pdbx, compile the parallel program with the appropriate mpxx_r compiler script (e.g. mpcc_r, mpxlf90_r) and specify the “-g -qnooptimize -qfullpath” options. Then set up any environment variables and load the parallel program.

From the command line type:

pdbx ./program.exe -procs N 

The -procs N option specifies the total number of MPI tasks. After initialization, the prompt

pdbx(all) 

should be displayed. To display the state of the program instances in the debugger, type

pdbx(all) tasks long 

For example if 2 instances of the program were loaded the output should be something like:

 0:Debug ready l1f35 172.31.6.137 0
 1:Debug ready l1f35 172.31.6.137 0

To set a breakpoint at line 30 in the code for all instances, type:

 stop at 30 

To continue execution, type

cont 

To exit the debugger, type quit. For more details on the use of the pdbx debugger, refer to the “IBM Parallel Environment for AIX – Operation and Use, Volume 2”.

Basic Profiling

gprof is a simple text-based utility that provides procedural-level profiling of serial and parallel codes. This helps users to identify how much time is being spent in subroutines and functions. xprofiler generates a graphical display of the performance, and provides application profiling at the source statement level. Both gprof and xprofiler are very simple to use:

  • Compile the code with the -pg option, in addition to your optimization flags. If you use xprofiler, adding -g to the -pg option enables profiling at source line level; however, -g degrades performance and is incompatible with some optimization flags (e.g. inlining).
  • Run the parallel code as usual.
  • Each process writes an additional file to disk named gmon.out.pid.
  • Process the output with gprof or xprofiler. For gprof, where exec_file is the name of the compiled executable:

 gprof exec_file gmon.out.pid
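
For a serial Fortran program, the gprof steps above can be sketched as a small shell function. The file name program.f follows the earlier examples; the -O2 level, the run helper, and the DRY_RUN variable are assumptions made for this illustration, and a parallel code would be compiled with the matching mpxlf script and launched as usual instead.

```shell
# run: echo a command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Compile with profiling support, run once, then summarize each data file.
profile_program() {
    run xlf90 -O2 -pg program.f -o program &&
    run ./program &&
    for f in gmon.out*; do
        # One report per profile data file written by the run.
        run gprof program "$f"
    done
}

# Usage: preview the commands without running them:
#   DRY_RUN=1; profile_program
```

gprof writes its report to standard output, so in practice each report is usually redirected to a file for later inspection.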

If using xprofiler, start it and then use the File > Load Files dialogue box to load the executable file and the gmon.out.pid file. gprof and xprofiler facilitate analysis of CPU usage only. They cannot provide other types of profiling information, such as CPU idle time, I/O or communication. Additional information about gprof and xprofiler can be found in the man pages and in the “IBM Parallel Environment for AIX – Operation and Use, Volume 2”.