A typical program that uses these macros will have the following structure:
    ...
    MPUTIL_INIT(0);
    ...any code that is executed only once, independent of the number of cores
    MPUTIL_BEGIN;
    ...any code that each active core must execute, such as initialization
    MPUTIL_LABEL("label text for row, in printf format");
    MPUTIL_SYNC;
    ...benchmark code, including timer calls. Total time in tval
    MPUTIL_OUTAPP("\t%.2e\n",tval);
    MPUTIL_END;
    MPUTIL_FINALIZE;
The thread_level argument (the 0 passed to MPUTIL_INIT above) is used only when creating the parallel version, which uses MPI processes to execute the benchmark in parallel. To simplify the interface, thread_level does not use the MPI-defined thread-support constants, so the benchmark never needs to include mpi.h. The valid values are 0 for no threads used (MPI_THREAD_SINGLE), 1 for threads used in loops (MPI_THREAD_FUNNELED), and 2 for benchmarks that need MPI_THREAD_MULTIPLE. Most regular benchmarks will use 0, but OpenMP benchmarks will need to set thread_level to 1.
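
The mapping from thread_level to the MPI thread-support levels can be summarized in a small lookup. The following is only an illustrative sketch: the function thread_level_name is a hypothetical helper, not part of these macros, and it returns the level names as strings so that it compiles without mpi.h.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical helper (an assumption for illustration, not part of the
   macro interface): returns the name of the MPI thread-support level
   that a given thread_level value stands for, or NULL if the value is
   not one of the documented ones (0, 1, 2). */
const char *thread_level_name(int thread_level)
{
    switch (thread_level) {
    case 0:  return "MPI_THREAD_SINGLE";   /* no threads used */
    case 1:  return "MPI_THREAD_FUNNELED"; /* threads used in loops, e.g. OpenMP */
    case 2:  return "MPI_THREAD_MULTIPLE"; /* benchmark needs MPI_THREAD_MULTIPLE */
    default: return NULL;                  /* not a documented value */
    }
}
```

For example, thread_level_name(1) yields the string "MPI_THREAD_FUNNELED", which is the level an OpenMP benchmark would request by passing 1 as thread_level.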