When benchmarking on multiple cores, use the macros MPUTIL_LABEL and MPUTIL_OUTAPP together to output timing data. On a single core (the non-parallel option), each behaves like printf. In the parallel case, the output is laid out as a table row: the leftmost column holds the text given to MPUTIL_LABEL, followed by one column for each number of active cores, each containing the data specified by MPUTIL_OUTAPP.
The output from both MPUTIL_LABEL and MPUTIL_OUTAPP is not printed immediately; it is produced only when the timing block is closed with MPUTIL_END.