Development Environment
Compilers
The IBM xlf Fortran compiler version 11.1 and the IBM xlc C/C++ compilers version 9.0 are installed. Each compiler can be invoked under several command names. Table 2 lists some frequently used compiler invocations.
| Compiler | Standard | MPI | Thread-Safe | Thread-Safe MPI | GNU |
| C compiler | xlc | mpcc | xlc_r | mpcc_r | gcc |
| C++ compiler | xlC | mpCC | xlC_r | mpCC_r | g++ |
| Fortran-77 | xlf | mpxlf | xlf_r | mpxlf_r | g77 |
| Fortran-90 | xlf90 | mpxlf90 | xlf90_r | mpxlf90_r | gfortran |
| Fortran-95 | xlf95 | mpxlf95 | xlf95_r | mpxlf95_r | gfortran |
Note 1: All the IBM Fortran compilers assume that source files have the suffix .f. Source files that contain cpp preprocessor directives should end with .F. IBM’s C compiler (‘xlc’) and C++ compiler (‘xlC’) expect files with the suffixes .c and .C, respectively.
Note 2: The main difference between the ‘xlf’ and ‘xlf_r’ commands is that they use different default options as set in the configuration file /etc/xlf.cfg53. The ‘_r’ makes the linker use thread-safe versions of the Fortran/C/C++ libraries. Thread safe compilers are recommended for all parallel code whether threads are explicitly used or not.
Note 3: IBM Java compilers and JVM version 1.4.2 are installed on Ares. However, users are encouraged to run Java applications on Kronos, the Linux cluster in CCS.
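The naming scheme behind Table 2 can be summarised in a few lines of shell. The sketch below is only an illustration: pick_compiler is a hypothetical helper, not a command installed on Ares, and it simply reproduces the table entries (an mp prefix for the MPI wrappers, an _r suffix for the thread-safe variants).

```shell
# pick_compiler is a hypothetical helper that mirrors Table 2.
# Usage: pick_compiler <c|c++|f77|f90|f95> [mpi] [threadsafe]
pick_compiler() {
    lang=$1
    mpi=no
    safe=no
    for opt in "$@"; do
        case $opt in
            mpi)        mpi=yes ;;
            threadsafe) safe=yes ;;
        esac
    done
    # Base command name: xl* for standard builds, mp* for the MPI wrappers.
    case "$lang:$mpi" in
        c:no)    name=xlc ;;
        c:yes)   name=mpcc ;;
        c++:no)  name=xlC ;;
        c++:yes) name=mpCC ;;
        f77:no)  name=xlf ;;
        f77:yes) name=mpxlf ;;
        f90:no)  name=xlf90 ;;
        f90:yes) name=mpxlf90 ;;
        f95:no)  name=xlf95 ;;
        f95:yes) name=mpxlf95 ;;
        *) echo "unknown language: $lang" >&2; return 1 ;;
    esac
    # The _r suffix selects the thread-safe variant.
    if [ "$safe" = yes ]; then
        name=${name}_r
    fi
    echo "$name"
}
```

For example, ‘pick_compiler f90 mpi threadsafe’ prints mpxlf90_r, the thread-safe MPI Fortran 90 entry in the table.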
Modules
The Modules package is installed on the Ares cluster and allows users to switch quickly between different programming environments. It sets up the appropriate environment variables, such as PATH, MANPATH and LD_LIBRARY_PATH, depending on the modules chosen. Table 3 lists commonly used module commands.
| Command | Usage | Purpose |
| module avail | module avail | list of all available modules |
| module load [package] | module load ncarg | load a module e.g., the ncarg package |
| module show [package] | module show ncarg | show the environment variables set by a module |
| module list | module list | list modules currently loaded |
| module switch [old] [new] | module switch ncarg gnuplot | replace old module with new |
| module purge | module purge | unload all modules |
The modules currently available on Ares, as listed by the command “module avail”, are shown below:
[user@/nethome/user]:>module avail
---------- /nethome/apps/modules/aix575/Modules/default/modulefiles ----------
ferret    ftnchek   gmt/4.3.1-32  hdf5   netcdf/3.6.3  tcl
fftw      gaussian  gnuplot       mpfr   netcdf/4.0
freeware  gmp       grads         ncarg  null
Typically, modules are loaded as part of the login process by placing module commands in .bash_profile/.bashrc (for bash/sh/ksh users) or .login/.cshrc (for csh/tcsh users).
NOTE: For some users, the Modules feature might not be pre-configured by default. If that is the case, activate it with the following command:
[user@/nethome/user]: source /nethome/apps/modules/etc/profile.modules
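As a minimal sketch of the login setup described above, the fragment below could go into .bashrc or .bash_profile. It assumes the profile.modules path given in the note. MODULES_INIT is a hypothetical variable name, and the two modules loaded are only examples taken from the “module avail” listing.

```shell
# Fragment for ~/.bashrc or ~/.bash_profile (bash/sh/ksh users).
# MODULES_INIT is a hypothetical variable name; the path is the one quoted above.
MODULES_INIT=/nethome/apps/modules/etc/profile.modules

# Activate Modules only if the 'module' command is not defined yet.
if ! command -v module >/dev/null 2>&1 && [ -r "$MODULES_INIT" ]; then
    . "$MODULES_INIT"
fi

# Load the packages you use routinely (example names from 'module avail').
if command -v module >/dev/null 2>&1; then
    module load netcdf/4.0
    module load ncarg
fi
```

Both steps are guarded, so the fragment does nothing on a machine where the Modules package is not available.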
Compiling Sequential Programs
Assume your Fortran 90 or C source program is stored in the file ‘program.f’ or ‘program.c’, respectively. It is compiled with the command:
Fortran: xlf90 program.f -o program
C: xlc program.c -o program
The ‘-o program’ option instructs the compiler to name the executable program ‘program’. If the ‘-o’ option is not used, the executable program is called ‘a.out’ by default. Your source program may be spread over multiple files, for example when a routine in ‘program.f’ (or ‘program.c’) calls a subroutine or function defined in ‘program2.f’ (or ‘program2.c’). In that case you can compile in two steps:
Fortran:
xlf90 -c program2.f
xlf90 program.f program2.o -o program
C:
xlc -c program2.c
xlc program.c program2.o -o program
The ‘-c’ option instructs the compiler to create an object file (program2.o) instead of an executable file. This object file is used at a later step of the compilation. In the last step of the compilation, all object files (or one source file with other object files) are usually linked together to create the executable file. By default, the compiler links these object files together with several libraries that are present on the system and that contain implementations of the language standard (for example Fortran intrinsics like abs, sin, mod, …).
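The two-step pattern extends to any number of source files. The sketch below is an illustration only: build_all, run, FC, FFLAGS and DRY_RUN are hypothetical names, not part of the Ares setup. It compiles every source file to an object file with ‘-c’ and then links all object files in one final step. By default it only prints the commands it would run, so it can be tried on a machine without the IBM compilers.

```shell
# build_all is a hypothetical helper; FC, FFLAGS and DRY_RUN are assumed names.
FC=${FC:-xlf90}          # compiler command (see Table 2)
FFLAGS=${FFLAGS:-}       # extra compiler options, empty by default
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = also execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "$*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# Usage: build_all <executable> <source files...>
build_all() {
    exe=$1
    shift
    objs=
    for src in "$@"; do
        run $FC $FFLAGS -c "$src" || return 1    # compile step: source -> .o
        objs="$objs ${src%.*}.o"
    done
    run $FC $FFLAGS $objs -o "$exe"              # link step: all .o -> executable
}
```

Calling ‘build_all program program.f program2.f’ prints two compile commands followed by one link command. Setting DRY_RUN=0 executes them as well. Unlike make, this sketch recompiles every file on each call.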
On many systems, Fortran 77, Fortran 90 and Fortran 95 files have the suffixes .f, .f90 and .f95, respectively. If you port your program from another platform where the source files use a suffix other than .f, for example .f90, you can either rename the source files or compile and link them as follows:
xlf90 -qsuffix=f=f90 program.f90
When your code consists of many source files, the venerable Unix make utility can automate the maintenance, update, compilation, and regeneration of object and executable files. By default, the make utility requires a file called ‘Makefile’ in which you specify your compiler options, source/object files, and rules for compilation. After you have created the Makefile, you can build your program with the command ‘make’ or ‘gmake’ (GNU make). The /nethome/examples directory contains an example Makefile for GNU make (which we recommend) that can be used to build an example Fortran program. The sample Makefile provides a template and can be edited to suit your needs. Writing Makefiles is not always easy but can simplify your life considerably. Contact user support for help when needed.
Compiling OpenMP Programs
Most options for compiling sequential programs can also be used for compiling parallel OpenMP (or threaded) programs. The commands to compile OpenMP programs are xlf_r, xlf90_r and xlf95_r for Fortran and xlc_r for C. For example:
Fortran: xlf90_r -qsmp=omp:opt program.f
C: xlc_r -qsmp=omp:opt program.c
NOTE: If you use xlf_r (F77 compiler) to compile your OpenMP program, you should always use the -qnosave option. For xlf90_r (F90 compiler), the -qnosave option is used by default and need not be set by the user.
Compiling MPI Programs
Most options for compiling sequential programs can also be used for compiling parallel MPI programs. The commands to compile MPI programs are mpxlf_r, mpxlf90_r and mpxlf95_r for Fortran and mpcc_r for C. For example:
Fortran: mpxlf90_r program.f
C: mpcc_r program.c
The ‘mpxlf…’ commands are wrapper shell scripts that invoke the appropriate xlf compiler. In addition, the Partition Manager, Message Passing Interface (MPI), and/or Message Passing Library (MPL) are automatically linked in. Flags are passed by mpxlf to the xlf command, so any of the xlf options can be used on the mpxlf shell script. By default, the mpxlf scripts pass the proper MPI header files to the xlf compilers. It is therefore not necessary to specify the directory (normally with the ‘-I’ option) where the MPI header file is located. The MPI compilers also link in the MPI libraries by default (and linking with something like ‘-L… -lmpi’ is not needed).
NOTE: If your MPI program uses shared memory to communicate between processors, thread-safe MPI compilers should be used.
Frequently Used Compiler Options
The compilers accept many options, e.g. for debugging or for optimizing code size or performance. Use ‘xlf -help’ for an overview of the options available for the Fortran compilers and ‘xlc -help’ for the C/C++ compilers. Table 4 describes some commonly used options.
| Option | Description |
| -qsuffix=f=f90 | Allows .f90 extension to be used for fortran source files |
| -c | Compile only, producing a “.o” file. Does not link object files |
| -g | Produce information required by debuggers and some profiler tools |
| -I | Names directories for additional include files |
| -L | Specifies the pathname where additional libraries reside. Directories are searched in the order of their occurrence on the command line |
| -l | Names additional libraries to be searched |
| -O0 | (default) Performs only quick local optimizations such as constant folding and elimination of local common subexpressions. |
| -O2 | Performs optimizations that the compiler developers considered the best combination for compilation speed and runtime performance. |
| -O3 | Performs some memory and compile-time intensive optimizations in addition to those executed with -O2. |
| -O | Equivalent to specifying -O2 |
| -p -pg | Generate profiling support code. -p is required for use with the prof utility and -pg is required for use with the gprof utility |
| -q32, -q64 | Specifies generation of 32-bit or 64-bit objects |
| -qstrict | Turns off aggressive optimizations which have the potential to alter the semantics of your program. Only valid with -O2 or higher optimization levels. By default, -qnostrict at -O3 or higher, and -qstrict otherwise |
| -qhot | Determines whether or not to perform high-order transformations on loops and array language during optimization, and whether or not to pad array dimensions and data objects to avoid cache misses. |
| -qautodbl=dbl | Promotes REALs to 64 bit double precision REALs |
| -qfullpath | Includes the full path to source and include files in the output files, for debugging |
| -qwarn64 | Produces warnings for 64-bit data size issues |
| -q | produces warnings for 32 bit data size issues |
| -qmaxmem=[num] | Specifies the memory limit in kilobytes used by space intensive optimizations. The special value -1 is used to indicate unlimited memory for such optimizations |
| -bmaxdata:<bytes> | Specifies the maximum amount of space to reserve for the program data segment for programs where the size of these regions is a constraint. By default, combined data space is slightly less than 256MB, or lower, depending on the limits for the user ID |
| -bmaxstack:<bytes> | Specifies the maximum amount of space to reserve for the program stack segment for programs where the size of these regions is a constraint. By default, combined stack space is slightly less than 256MB, or lower, depending on the limits for the user ID |
For basic performance and run-time optimization a good starting point is to compile your code with the options
-O3 -qarch=auto -qtune=auto -qcache=auto
In some cases the compiler may generate messages saying that it needs more memory for additional optimization of a specific subroutine. To avoid this, use the option
-qmaxmem=-1
to give ‘unlimited’ memory to the compiler for space intensive optimization.
IBM compilers use by default 32-bit addressing. If your 32-bit executable program uses more than several hundred Megabytes of memory, then you must set (at link time) the maximum size allowed for the user data area (or user heap) when the executable is run. Use the linker option
-bmaxdata:[bytes]
where [bytes] sets the maximum size in bytes. For a 32-bit program, the maximum value allowed is (hexadecimal) 0x70000000, which corresponds to roughly 2 Gbyte. For a 64-bit program (compiled with -q64), the option -bmaxdata should not be used.
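The hexadecimal limit can be converted to decimal with shell arithmetic. This is a small worked example. The variable names are hypothetical, and the link line at the end is only printed, as an illustration of where the option goes.

```shell
# Worked arithmetic for the 32-bit -bmaxdata limit quoted above.
maxdata_hex=0x70000000
maxdata_bytes=$((0x70000000))                  # 7 * 16^7 = 1879048192 bytes
maxdata_mib=$((maxdata_bytes / 1024 / 1024))   # 1792 MiB, i.e. 1.75 * 1024 MiB

echo "-bmaxdata:$maxdata_hex = $maxdata_bytes bytes = $maxdata_mib MiB"

# Illustrative link line using this limit (file names as in the earlier examples):
echo "xlf90 program.f -o program -bmaxdata:$maxdata_hex"
```

The exact value is therefore a little under 2 Gbyte, which is why the limit is described as “roughly” 2 Gbyte.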
The maximum stack space for 32-bit code is 256 Megabytes and can be set by linker option
-bmaxstack:256000000
If 256 Megabyte stack is not enough, use ALLOCATE/DEALLOCATE instead of automatic arrays.
NOTE: The amount of memory available to 32-bit executables is limited to 2 Gbyte for a sequential or OpenMP job and to 2 Gbyte per MPI process. If your job or processes require more memory, you must compile and link your whole code with 64-bit addressing. To do so, add the option -q64 to your compiler command. Also add the option -X64 to the archiver command (e.g. ar) when you generate libraries.
Basic Debugging
In order to debug a code on Ares, you must compile and link your application with the options “-g -qnooptimize -qfullpath”. dbx is the standard AIX symbolic debugger. To run an interactive executable under the control of the debugger:
dbx ./program
To analyze a core file:
dbx program core
pdbx is a symbolic, textual, parallel debugger included with the IBM Parallel Environment. pdbx accepts the same options as poe. To use the parallel debugger, compile the parallel program with the appropriate mpxx_r compiler script (e.g. mpcc_r, mpxlf90_r) and specify the “-g -qnooptimize -qfullpath” options. Set up any required environment variables, then load the parallel program from the command line:
pdbx ./program.exe -procs N
The ‘-procs N’ option specifies the total number of MPI tasks. After initialization, the
pdbx(all)
prompt should be displayed. To trace the program instances in the debugger, type:
pdbx(all) tasks long
For example, if 2 instances of the program were loaded, the output should look something like:
0:Debug ready l1f35 172.31.6.137 0
1:Debug ready l1f35 172.31.6.137 0
To set a breakpoint at line 30 in the code for all instances, type:
stop at 30
To continue execution, type:
cont
To exit the debugger, type quit. For more details on the use of the pdbx debugger, refer to the “IBM Parallel Environment for AIX – Operation and Use, Volume 2”.
Basic Profiling
gprof is a simple text-based utility that provides procedural-level profiling of serial and parallel codes. This helps users to identify how much time is being spent in subroutines and functions. xprofiler generates a graphical display of the performance, and provides application profiling at the source statement level. Both gprof and xprofiler are very simple to use:
- Compile the code with the -pg option, in addition to optimisation flags. If you use xprofiler, adding -g to the -pg option will offer profiling at source line level. However, -g will degrade performance and is incompatible with some optimisation flags (e.g. inlining).
- Run the parallel code as usual.
- Each process will write an additional file to disk named gmon.out.pid
- Process the output with gprof or xprofiler where exec_file is the name of the compiled executable:
- gprof exec_file gmon.out.pid
If using xprofiler, after starting xprofiler use the File > Load Files dialogue box to load the executable file and the gmon.out.pid file. gprof and xprofiler facilitate analysis of CPU usage only. They cannot provide other types of profiling information, such as CPU idle, I/O or communication. Additional information about gprof and xprofiler can be found in the man pages and in the “IBM Parallel Environment for AIX – Operation and Use, Volume 2”.
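A parallel run leaves one gmon.out file per process, so the gprof step in the list above usually has to be repeated for each of them. The sketch below shows one way to do that. profile_all and the gprof.*.txt report names are hypothetical choices, not part of gprof itself.

```shell
# profile_all is a hypothetical helper built around the gprof command above.
# Usage: profile_all <exec_file>   (run it in the directory holding gmon.out.*)
profile_all() {
    exe=$1
    for gmon in gmon.out.*; do
        [ -e "$gmon" ] || continue       # no profile files: the pattern stays literal
        id=${gmon##*.}                   # text after the last dot in the file name
        gprof "$exe" "$gmon" > "gprof.$id.txt" || return 1
        echo "wrote gprof.$id.txt"
    done
}
```

After a profiled run, ‘profile_all ./program’ writes one text report per process. The reports can then be compared to see whether all processes spend their time in the same routines.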



