High Performance Computing Course Notes 2007-2008
Message Passing Programming I

Message Passing Programming
- Message passing is the most widely used parallel programming model
- Message passing works by creating a number of uniquely named tasks that interact by sending and receiving messages to and from one another (hence the name message passing)
- Generally, processes communicate by sending data from the address space of one process to that of another
  - Communication between processes (via files, pipes, sockets)
  - Communication between threads within a process (via a global data area)
- Programs based on message passing can be written in standard sequential languages (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages

Message Passing Interface (MPI)
- MPI is a specification, not a particular implementation
  - It does not specify process startup, error codes, the amount of system buffering, etc.
- MPI is a library, not a language
- The goals of MPI: functionality, portability and efficiency
- Message passing model -> MPI specification -> MPI implementation

OpenMP vs MPI
- In a nutshell: MPI is used on distributed-memory systems, while OpenMP is used for code parallelisation on shared-memory systems
- Both are explicit parallelism
- High-level control (OpenMP), lower-level control (MPI)

A little history
- Message-passing libraries were developed for a number of early distributed-memory computers
- By 1993 there were many vendor-specific implementations
- By 1994 MPI-1 came into being
- By 1996 MPI-2
was finalized

The MPI programming model
- MPI standards: MPI-1 (1.1, 1.2) and MPI-2 (2.0); forwards compatibility is preserved between versions
- Standard bindings exist for C, C++ and Fortran; MPI bindings have also been seen for Python, Java etc., but these are all non-standard
- We will stick to the C binding for the lectures and coursework
- More info on MPI: www.mpi-forum.org
- Implementations: for your laptop, pick up MPICH, a free portable implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)
- The coursework will use MPICH

MPI
- MPI is a complex system comprising 129 functions with numerous parameters and variants
- Six of them are indispensable, but a large number of useful programs can already be written with just those six
- The other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topologies)
- In the lectures, we are going to cover the most commonly encountered functions

The MPI programming model
- A computation comprises one or more processes that communicate via library routines, sending and receiving messages to other processes
- (Generally) a fixed set of processes is created at the outset, one process per processor (different from PVM)
Intuitive interfaces for sending and receiving messages
- Send(data, destination), Receive(data, source): the minimal interface
- This is not enough in some situations; we also need message matching
  - Add a message_id at both the send and receive interfaces, so they become Send(data, destination, msg_id) and Receive(data, source, msg_id)
- The message_id is expressed as an integer, termed the message tag
  - It allows the programmer to deal with the arrival of messages in an orderly fashion (queue them and then deal with them)

How to express the data in the send/receive interfaces
- Early stages: (address, length) for the send interface and (address, max_length) for the receive interface
- These are not always good enough
  - The data to be sent may not be in contiguous memory locations
  - The storage format of the data may not be the same, or known in advance, on a heterogeneous platform
- Eventually, a triple (address, count, datatype) was adopted to express the data to be sent, and (address, max_count, datatype) for the data to be received
  - This reflects the fact that a message contains much more structure than just a string of bits
  - For example: (vector_A, 300, MPI_REAL)
  - Programmers can also construct their own datatypes
- The interfaces now become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)

How to distinguish messages
- The message tag is necessary but not sufficient, so the communicator is introduced

Communicators
- Messages are put into contexts
  - Contexts are allocated at run time by the system, in response to programmer requests, and the system can guarantee that each generated context is unique
- Processes belong to groups
- The notions of context and group are combined in a single object called a communicator
  - A communicator identifies a group of processes and a communication context
- The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
- Messages from different process groups can have the same tag
- So the send interface becomes send(address, count, datatype, destination, tag, comm)

Status of the received messages
- A message status structure is added to the receive interface
- The status holds information about the source, the tag and the actual message size
  - In C, the source can be retrieved as status.MPI_SOURCE, the tag as status.MPI_TAG, and the actual message size by calling MPI_Get_count(&status, datatype, &count)
- The receive interface becomes receive(address, max_count, datatype, source, tag, communicator, status)

How to express source and destination
- The processes in a communicator (group) are identified by ranks
  - If a communicator contains n processes, the ranks are integers from 0 to n-1
- The source and destination in the send/receive interfaces are these ranks

Some other issues
- In the receive interface, the tag can be a wildcard, meaning any message will be received
- In the receive interface, the source can also be a wildcard, matching any source
MPI basics
First six functions (C bindings):

MPI_Send (buf, count, datatype, dest, tag, comm)
Send a message
  buf       address of the send buffer
  count     no. of elements to send (>= 0)
  datatype  datatype of the elements
  dest      process id (rank) of the destination
  tag       message tag
  comm      communicator (handle)

Calculating the size of the data to be sent:
  buf holds the address of the send buffer; the message occupies count * sizeof(datatype) bytes of data
MPI basics

MPI_Recv (buf, count, datatype, source, tag, comm, status)
Receive a message
  buf       address of the receive buffer (output parameter)
  count     max no. of elements in the receive buffer (>= 0)
  datatype  datatype of the receive buffer elements
  source    process id (rank) of the source, or MPI_ANY_SOURCE
  tag       message tag, or MPI_ANY_TAG
  comm      communicator
  status    status object

MPI_Init (int *argc, char ***argv)
Initiate a computation
  argc (number of arguments) and argv (argument vector) are the main program's arguments
  Must be called first, and once per process

MPI_Finalize ( )
Shut down a computation
  The last thing that happens

MPI_Comm_size (MPI_Comm comm, int *size)
Determine the number of processes in comm
  comm  communicator handle; MPI_COMM_WORLD is the default (including all MPI processes)
  size  holds the number of processes in the group

MPI_Comm_rank (MPI_Comm comm, int *pid)
Determine the id of the current (calling) process
  pid  holds the id (rank) of the current process

MPI basics - a basic example

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, nprocs;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
      printf("Hello, world. I am %d of %d\n", rank, nprocs);
      MPI_Finalize();
      return 0;
  }

  $ mpirun -np 4 myprog
  Hello, world. I am 1 of 4
  Hello, world. I am 3 of 4
  Hello, world. I am 0 of 4
  Hello, world. I am 2 of 4

MPI basics - send and recv example (1)

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, size, i;
      int buffer[10];
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

MPI basics - send and recv example (2)

      if (rank == 0) {
          for (i = 0; i < 10; i++)
              buffer[i] = i;
          MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
      }
      if (rank == 1) {
          for (i = 0; i < 10; i++)
              buffer[i] = -1;
          MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
      }
      MPI_Finalize();
      return 0;
  }
MPI language bindings
- Standard (accepted) bindings exist for Fortran, C and C++
- Java bindings are work in progress:
  - JavaMPI: Java wrappers to native calls
  - mpiJava: JNI wrappers
  - jmpi: a pure Java implementation of the MPI library
  - MPIJ: same idea
  - the Java Grande Forum is trying to sort it all out
- We will use the C bindings

High Performance Computing Course Notes 2007-2008
Message Passing Programming II

Modularity
- MPI supports modular programming via communicators
- They provide information hiding by encapsulating local communications and keeping local namespaces for processes
- All MPI communication operations specify a communicator (the process group that is engaged in the communication)

Forming new communicators - one approach

  MPI_Comm world, workers;
  MPI_Group world_group, worker_group;
  int ranks[1];

  MPI_Init(&argc, &argv);
  world = MPI_COMM_WORLD;
  MPI_Comm_group(world, &world_group);
  ranks[0] = 0;                      /* e.g. exclude process 0 from the new group */
  MPI_Group_excl(world_group, 1, ranks, &worker_group);
  MPI_Comm_create(world, worker_group, &workers);
  MPI_Group_free(&worker_group);

Forming new communicators - functions

  int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)
  int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
  int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
  int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)
  int MPI_Group_free(MPI_Group *group)
  int MPI_Comm_free(MPI_Comm *comm)

Forming new communicators - another approach (1)
MPI_Comm_split (comm, colour, key, newcomm)
- Creates one or more new communicators from the original comm
  comm     communicator (handle)
  colour   controls subset assignment (processes with the same colour end up in the same new communicator)
  key      controls rank assignment
  newcomm  the new communicator
- It is a collective communication operation (it must be executed by all processes in the process group of comm)
- It is used to (re-)allocate processes to communicators (groups)

Forming new communicators - another approach (2)

  MPI_Comm comm, newcomm;
  int myid, colour;
  MPI_Comm_rank(comm, &myid);
  colour = ...;                      /* each process chooses a colour */
  MPI_Comm_split(comm, colour, myid, &newcomm);

  Example with 8 processes:
    ranks in comm:     0  1  2  3  4  5  6  7
    colour supplied:   0  1  2  2  1  1  0  0
    new communicators: 0: {0, 6, 7}   1: {1, 4, 5}   2: {2, 3}

Forming new communicators - another approach (3)
MPI_Comm_split (comm, colour, key, newcomm)
- A new communicator is created for each distinct value of colour
- Each new communicator (sub-group) comprises those processes that specified that value of colour
- Within each new communicator, processes are assigned new identifiers (ranks, starting at zero), ordered by the value of key, or by their ranks in the old communicator in the event of ties

Communications
- Point-to-point communications involve exactly two processes, one sender and one receiver
  - For example, MPI_Send() and MPI_Recv()
- Collective communications involve a group of processes

Collective operations
- Coordinated communication operations involving multiple processes
- The programmer could build these by hand from point-to-point operations (tedious); instead MPI provides specialised collective communications:
  - barrier: synchronises all processes
  - broadcast: sends data from one process to all processes
  - gather: gathers data from all processes to one process
  - scatter: scatters data from one process to all processes
  - reduction operations: sums, multiplies etc. distributed data
- All are executed collectively (on all processes in the group, at the same time, with the same parameters)
Collective operations

MPI_Barrier (comm)
- Global synchronisation
  comm  communicator (handle)
- No process returns from the function until all processes have called it
- A good way of separating one computational phase from another

Barrier synchronisations
- You are only as quick as your slowest process

MPI_Bcast (buf, count, type, root, comm)
- Broadcast data from root to all processes
  buf    address of the buffer: input at the root, output at the other processes
  count  no. of entries in the buffer (>= 0)
  type   datatype of the buffer elements
  root   process id (rank) of the root process
  comm   communicator
- One-to-all broadcast: before the call only the root holds A0; after it, every process in the group holds a copy of A0

Example of MPI_Bcast
Broadcast 100 ints from process 0 to every process in the group:

  MPI_Comm comm;
  int array[100];
  int root = 0;
  MPI_Bcast(array, 100, MPI_INT, root, comm);

MPI_Gather (inbuf, incount, intype, outbuf, outcount, outtype, root, comm)
- Collective data movement function
  inbuf     address of the input buffer
  incount   no. of elements sent from each process (>= 0)
  intype    datatype of the input buffer elements
  outbuf    address of the output buffer (significant only at the root)
  outcount  no. of elements received from each process
  outtype   datatype of the output buffer elements
  root      process id (rank) of the root process
  comm      communicator
- All-to-one gather: each process i contributes Ai as the input to the gather; after the call, the receiving process (the root) holds A0, A1, A2, A3, ... in rank order as the output

MPI_Gather example
Gather 100 ints from every process in the group to the root:

  MPI_Comm comm;
  int gsize, sendarray[100];
  int root, myrank, *rbuf;
  ...
  MPI_Comm_rank(comm, &myrank);       /* find process id */
  if (myrank == root) {
      MPI_Comm_size(comm, &gsize);    /* find group size */
      rbuf = (int *)malloc(gsize * 100 * sizeof(int));
  }
  MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);