| Series Foreword | xv |
| Foreword | xvii |
| Preface | xix |
1 | Introduction | 1 |
1.1 | MPI-1 and MPI-2 | 1 |
1.2 | MPI-3 | 2 |
1.3 | Parallelism and MPI | 3 |
1.3.1 | Conway’s Game of Life | 4 |
1.3.2 | Poisson Solver | 5 |
1.4 | Passing Hints to the MPI Implementation with MPI_Info | 11 |
1.4.1 | Motivation, Description, and Rationale | 12 |
1.4.2 | An Example from Parallel I/O | 12 |
1.5 | Organization of This Book | 13 |
2 | Working with Large-Scale Systems | 15 |
2.1 | Nonblocking Collectives | 16 |
2.1.1 | Example: 2-D FFT | 16 |
2.1.2 | Example: Five-Point Stencil | 19 |
2.1.3 | Matching, Completion, and Progression | 20 |
2.1.4 | Restrictions | 22 |
2.1.5 | Collective Software Pipelining | 23 |
2.1.6 | A Nonblocking Barrier? | 27 |
2.1.7 | Nonblocking Allreduce and Krylov Methods | 30 |
2.2 | Distributed Graph Topologies | 31 |
2.2.1 | Example: The Petersen Graph | 37 |
2.2.2 | Edge Weights | 37 |
2.2.3 | Graph Topology Info Argument | 39 |
2.2.4 | Process Reordering | 39 |
2.3 | Collective Operations on Process Topologies | 40 |
2.3.1 | Neighborhood Collectives | 41 |
2.3.2 | Vector Neighborhood Collectives | 44 |
2.3.3 | Nonblocking Neighborhood Collectives | 45 |
2.4 | Advanced Communicator Creation | 48 |
2.4.1 | Nonblocking Communicator Duplication | 48 |
2.4.2 | Noncollective Communicator Creation | 50 |
3 | Introduction to Remote Memory Operations | 55 |
3.1 | Introduction | 57 |
3.2 | Contrast with Message Passing | 59 |
3.3 | Memory Windows | 62 |
3.3.1 | Hints on Choosing Window Parameters | 64 |
3.3.2 | Relationship to Other Approaches | 65 |
3.4 | Moving Data | 65 |
3.4.1 | Reasons for Using Displacement Units | 69 |
3.4.2 | Cautions in Using Displacement Units | 70 |
3.4.3 | Displacement Sizes in Fortran | 71 |
3.5 | Completing RMA Data Transfers | 71 |
3.6 | Examples of RMA Operations | 73 |
3.6.1 | Mesh Ghost Cell Communication | 74 |
3.6.2 | Combining Communication and Computation | 84 |
3.7 | Pitfalls in Accessing Memory | 88 |
3.7.1 | Atomicity of Memory Operations | 89 |
3.7.2 | Memory Coherency | 90 |
3.7.3 | Some Simple Rules for RMA | 91 |
3.7.4 | Overlapping Windows | 93 |
3.7.5 | Compiler Optimizations | 93 |
3.8 | Performance Tuning for RMA Operations | 95 |
3.8.1 | Options for MPI_Win_create | 95 |
3.8.2 | Options for MPI_Win_fence | 97 |
4 | Advanced Remote Memory Access | 101 |
4.1 | Passive Target Synchronization | 101 |
4.2 | Implementing Blocking, Independent RMA Operations | 102 |
4.3 | Allocating Memory for MPI Windows | 104 |
4.3.1 | Using MPI_Alloc_mem and MPI_Win_allocate from C | 104 |
4.3.2 | Using MPI_Alloc_mem and MPI_Win_allocate from Fortran 2008 | 105 |
4.3.3 | Using MPI_ALLOC_MEM and MPI_WIN_ALLOCATE from Older Fortran | 107 |
4.4 | Another Version of NXTVAL | 108 |
4.4.1 | The Nonblocking Lock | 110 |
4.4.2 | NXTVAL with MPI_Fetch_and_op | 110 |
4.4.3 | Window Attributes | 112 |
4.5 | An RMA Mutex | 115 |
4.6 | Global Arrays | 120 |
4.6.1 | Create and Free | 122 |
4.6.2 | Put and Get | 124 |
4.6.3 | Accumulate | 127 |
4.6.4 | The Rest of Global Arrays | 128 |
4.7 | A Better Mutex | 130 |
4.8 | Managing a Distributed Data Structure | 131 |
4.8.1 | A Shared-Memory Distributed List Implementation | 132 |
4.8.2 | An MPI Implementation of a Distributed List | 135 |
4.8.3 | Inserting into a Distributed List | 140 |
4.8.4 | An MPI Implementation of a Dynamic Distributed List | 143 |
4.8.5 | Comments on More Concurrent List Implementations | 145 |
4.9 | Compiler Optimization and Passive Targets | 148 |
4.10 | MPI RMA Memory Models | 149 |
4.11 | Scalable Synchronization | 152 |
4.11.1 | Exposure and Access Epochs | 152 |
4.11.2 | The Ghost-Point Exchange Revisited | 153 |
4.11.3 | Performance Optimizations for Scalable Synchronization | 155 |
4.12 | Summary | 156 |
5 | Using Shared Memory with MPI | 157 |
5.1 | Using MPI Shared Memory | 159 |
5.1.1 | Shared On-Node Data Structures | 159 |
5.1.2 | Communication through Shared Memory | 160 |
5.1.3 | Reducing the Number of Subdomains | 163 |
5.2 | Allocating Shared Memory | 163 |
5.3 | Address Calculation | 165 |
6 | Hybrid Programming | 169 |
6.1 | Background | 169 |
6.2 | Thread Basics and Issues | 170 |
6.2.1 | Thread Safety | 171 |
6.2.2 | Performance Issues with Threads | 172 |
6.2.3 | Threads and Processes | 173 |
6.3 | MPI and Threads | 173 |
6.4 | Yet Another Version of NXTVAL | 176 |
6.5 | Nonblocking Version of MPI_Comm_accept | 178 |
6.6 | Hybrid Programming with MPI | 179 |
6.7 | MPI Message and Thread-Safe Probe | 182 |
7 | Parallel I/O | 187 |
7.1 | Introduction | 187 |
7.2 | Using MPI for Simple I/O | 187 |
7.2.1 | Using Individual File Pointers | 187 |
7.2.2 | Using Explicit Offsets | 191 |
7.2.3 | Writing to a File | 194 |
7.3 | Noncontiguous Accesses and Collective I/O | 195 |
7.3.1 | Noncontiguous Accesses | 195 |
7.3.2 | Collective I/O | 199 |
7.4 | Accessing Arrays Stored in Files | 203 |
7.4.1 | Distributed Arrays | 204 |
7.4.2 | A Word of Warning about Darray | 206 |
7.4.3 | Subarray Datatype Constructor | 207 |
7.4.4 | Local Array with Ghost Area | 210 |
7.4.5 | Irregularly Distributed Arrays | 211 |
7.5 | Nonblocking I/O and Split Collective I/O | 215 |
7.6 | Shared File Pointers | 216 |
7.7 | Passing Hints to the Implementation | 219 |
7.8 | Consistency Semantics | 221 |
7.8.1 | Simple Cases | 224 |
7.8.2 | Accessing a Common File Opened with MPI_COMM_WORLD | 224 |
7.8.3 | Accessing a Common File Opened with MPI_COMM_SELF | 227 |
7.8.4 | General Recommendation | 228 |
7.9 | File Interoperability | 229 |
7.9.1 | File Structure | 229 |
7.9.2 | File Data Representation | 230 |
7.9.3 | Use of Datatypes for Portability | 231 |
7.9.4 | User-Defined Data Representations | 233 |
7.10 | Achieving High I/O Performance with MPI | 234 |
7.10.1 | The Four “Levels” of Access | 234 |
7.10.2 | Performance Results | 237 |
7.11 | An Example Application | 238 |
7.12 | Summary | 242 |
8 | Coping with Large Data | 243 |
8.1 | MPI Support for Large Data | 243 |
8.2 | Using Derived Datatypes | 243 |
8.3 | Example | 244 |
8.4 | Limitations of This Approach | 245 |
8.4.1 | Collective Reduction Functions | 245 |
8.4.2 | Irregular Collectives | 246 |
9 | Support for Performance and Correctness Debugging | 249 |
9.1 | The Tools Interface | 250 |
9.1.1 | Control Variables | 251 |
9.1.2 | Performance Variables | 257 |
9.2 | Info, Assertions, and MPI Objects | 263 |
9.3 | Debugging and the MPIR Debugger Interface | 267 |
9.4 | Summary | 269 |
10 | Dynamic Process Management | 271 |
10.1 | Intercommunicators | 271 |
10.2 | Creating New MPI Processes | 271 |
10.2.1 | Parallel cp: A Simple System Utility | 272 |
10.2.2 | Matrix-Vector Multiplication Example | 279 |
10.2.3 | Intercommunicator Collective Operations | 284 |
10.2.4 | Intercommunicator Point-to-Point Communication | 285 |
10.2.5 | Finding the Number of Available Processes | 285 |
10.2.6 | Passing Command-Line Arguments to Spawned Programs | 290 |
10.3 | Connecting MPI Processes | 291 |
10.3.1 | Visualizing the Computation in an MPI Program | 292 |
10.3.2 | Accepting Connections from Other Programs | 294 |
10.3.3 | Comparison with Sockets | 296 |
10.3.4 | Moving Data between Groups of Processes | 298 |
10.3.5 | Name Publishing | 299 |
10.4 | Design of the MPI Dynamic Process Routines | 302 |
10.4.1 | Goals for MPI Dynamic Process Management | 302 |
10.4.2 | What MPI Did Not Standardize | 303 |
11 | Working with Modern Fortran | 305 |
11.1 | The mpi_f08 Module | 305 |
11.2 | Problems with the Fortran Interface | 306 |
11.2.1 | Choice Parameters in Fortran | 307 |
11.2.2 | Nonblocking Routines in Fortran | 308 |
11.2.3 | Array Sections | 310 |
11.2.4 | Trouble with LOGICAL | 311 |
12 | Features for Libraries | 313 |
12.1 | External Interface Functions | 313 |
12.1.1 | Decoding Datatypes | 313 |
12.1.2 | Generalized Requests | 315 |
12.1.3 | Adding New Error Codes and Classes | 322 |
12.2 | Mixed-Language Programming | 324 |
12.3 | Attribute Caching | 327 |
12.4 | Using Reduction Operations Locally | 331 |
12.5 | Error Handling | 333 |
12.5.1 | Error Handlers | 333 |
12.5.2 | Error Codes and Classes | 335 |
12.6 | Topics Not Covered in This Book | 335 |
13 | Conclusions | 341 |
13.1 | MPI Implementation Status | 341 |
13.2 | Future Versions of the MPI Standard | 341 |
13.3 | MPI at Exascale | 342 |
| MPI Resources on the World Wide Web | 343 |
| References | 345 |
| Subject Index | 353 |
| Function and Term Index | 359 |