Running with MPI

Postby donmorton » Mon Jun 07, 2010 8:18 pm

Howdy,

After a fair amount of compilation struggles, I managed to compile the dmpar version of wrfpost.exe, and am now trying to run wrfpost.exe on a Cray XT5 by inserting the following command line in run_wrfpost:

aprun -n 8 ${POSTEXEC}/wrfpost.exe < itag > wrfpost_${domain}.$fhr.out 2>&1

Then, I have run_wrfpost called by a PBS script which allocates 8 cores, something like this minimal sketch (the mppwidth resource is how cores are requested on our XT5; exact names and limits are site-specific):
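
#!/bin/bash
# minimal PBS sketch; mppwidth is the XT5-specific core request
#PBS -N wrfpost
#PBS -l mppwidth=8
#PBS -l walltime=00:30:00
#PBS -j oe

cd $PBS_O_WORKDIR
./run_wrfpost

Although it does execute, what I get for output looks something like: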

we will try to run with 1 server groups
we will try to run with 1 server groups
*** you specified 0 I/O servers
we will try to run with 1 server groups
we will try to run with 1 server groups
CHKOUT will write a file
*** you specified 0 I/O servers
*** you specified 0 I/O servers
CHKOUT will write a file
CHKOUT will write a file
The Posting is using 1 MPI task
There are 0 I/O servers
The Posting is using 1 MPI task
The Posting is using 1 MPI task
There are 0 I/O servers
There are 0 I/O servers
*** you specified 0 I/O servers
CHKOUT will write a file
The Posting is using 1 MPI task
There are 0 I/O servers
0


So, the 8 tasks are launched, but

a) Task 7 does not appear to take on the role of an I/O server (the latest WRF-ARW user's guide seems to imply that it should?), and
b) each task appears to be aware only of itself, and not of the other tasks.

The code actually runs, but takes 9 minutes (on a 1049x1049x51 grid) whether I use 4 or 8 tasks.

There are plenty of things I might be doing wrong, and I'm preparing to jump into sorc/wrfpost/SETUP_SERVERS.f to start some tracing, but before I get in too deep, I'm just wondering if anyone else out there has experience in this area and is aware of any "gotchas" that might save me a day or two!

I'm literate in MPI and such, so don't really need a lesson in that aspect. If I have to, I'll try to figure out why the call to mpi_comm_size() seems to be returning 1 for npes, rather than 8.
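
For the record, here is the kind of standalone sanity check I have in mind, compiled with the same mpif90 and launched with aprun (just a sketch, separate from the WPP source):

      program checkmpi
c     If npes prints as 1 under "aprun -n 8", the problem is in the
c     link or launch environment rather than in the WPP code itself.
      implicit none
      include 'mpif.h'
      integer ierr, npes, mype
      call mpi_init(ierr)
      call mpi_comm_size(MPI_COMM_WORLD, npes, ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, mype, ierr)
      print *, 'task ', mype, ' of ', npes
      call mpi_finalize(ierr)
      end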

Thanks,

Don Morton
Arctic Region Supercomputing Center
Boreal Scientific Computing LLC
Fairbanks, Alaska USA
donmorton
 
Posts: 7
Joined: Wed Jul 30, 2008 2:16 pm
Location: Fairbanks, Alaska, USA

Re: Running with MPI

Postby donmorton » Thu Jul 01, 2010 4:38 pm

Here is a solution to the problem I posted - this is cut and pasted from my post to wrf-users:

Many thanks to Hui-Ya Chuang at EMC/NCEP for help with this. I will post some information here (and paste into WRF Users Forum) so that the next time somebody googles around, they may find something helpful.

1) After inserting a few debug statements, it became apparent that MPI_Init() simply wasn't working the way it should: each task was only aware of itself and not of any others. It seems that the default behavior of the WPP downloaded from DTC is to assume that users don't want MPI, so an MPI stubs library is compiled and linked in. To get around this, I went into WPPV3/sorc/wrfpost/makefile and removed $(MPILIB) from the LIBS line, so that mpif90 would link in its own libmpi.a. This fixed the initial problem, and Hui-Ya saved me many hours of work by pointing it out.
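
In makefile terms, the change amounts to something like this (I've elided the other entries on the LIBS line, since they vary by build):

# WPPV3/sorc/wrfpost/makefile
# before: the stubs library satisfies the MPI symbols at link time
#   LIBS = ... $(MPILIB) ...
# after: mpif90 supplies the real MPI library instead
#   LIBS = ...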

2) The other problem I ran into, after wrfpost.exe had run a while, was an out-of-bounds array reference in an argument to one of the MPI calls, in the source file WPPV3/sorc/wrfpost/EXCH.f. It turns out that someone had entered an IBM compiler directive, "!@PROCESS NOCHECK", at that location to get around the problem, but since I'm using PGI on a Linux system, the directive has no effect. There are two places in EXCH.f with that IBM directive, and adding the PGI equivalent, "cpgi$r nobounds", in both locations alleviated the problem.
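
For anyone on PGI who hits the same abort, the change looks roughly like this (surrounding code elided; I'm only showing the directive placement):

c     existing IBM directive, a no-op under PGI:
!@PROCESS NOCHECK
c     PGI equivalent, added at the same two spots in EXCH.f:
cpgi$r nobounds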

wrfpost.exe is now running on multiple cores on the Linux system, and it's running much faster!

I do need to go in and verify that the resulting GRIB file is a reasonable approximation of the one obtained by serial wrfpost.exe.


On Wed, Jun 30, 2010 at 3:52 PM, Don Morton <Don.Morton@alaska.edu> wrote:

The appended is a post I made to the WRF Users Forum on 08 June. The absence of replies there suggests nobody loves me on that forum, so I'll try another :)

Since the time of my post, I've also compiled this (using mpif90, etc.) on a Penguin Computing cluster of Opteron processors, and am running into the same problem. I've also removed the PBS script interface and am simply using PBS to grab an interactive node, then running ./run_wrfpost straight from the command line (see the sketch after my questions below). My questions are

1) Are any of you actually running wrfpost.exe in parallel?
2) Are there any "gotchas" I might want to be aware of before digging in deeper?
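
By "grabbing an interactive node" I mean something along these lines (the resource syntax is site-specific, so treat it as illustrative):

# request an interactive 8-core session, then run from the work directory
qsub -I -l nodes=1:ppn=8
cd $PBS_O_WORKDIR
./run_wrfpost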

Thanks for any help,

Don Morton
Arctic Region Supercomputing Center
Boreal Scientific Computing LLC
Fairbanks, Alaska USA
donmorton
 
Posts: 7
Joined: Wed Jul 30, 2008 2:16 pm
Location: Fairbanks, Alaska, USA

Re: Running with MPI

Postby wedgef5 » Fri Sep 17, 2010 8:50 am

I'd like to add a follow-up to Don's helpful post. I fought with WPP for over a week, trying to get it to compile and run with MPI. It would build cleanly, but would fail during the initial MPI setup. I found a symbolic link from sorc/wrfpost/mpif.h pointing to an mpif.h in lib/wrfmpi_stubs. Once I removed that link, I got a wrfpost that would run with MPI.
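
If you want to check for the same thing, something like this will show it (paths are relative to the WPPV3 top level):

# a link into lib/wrfmpi_stubs here means the stub header is being used
ls -l sorc/wrfpost/mpif.h
# remove it and rebuild; mpif90 should then find the real mpif.h
rm sorc/wrfpost/mpif.h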

Matt Foster
NWS Norman, OK
wedgef5
 
Posts: 8
Joined: Tue Mar 24, 2009 4:11 pm

