Running WRF off local storage on cluster

Postby alvaroafernandez » Tue Nov 06, 2018 12:33 pm

I am new to the forum, so please excuse any newbie mistakes...

I am running WRF 3.8 on a Red Hat cluster, and I need the file I/O to go to local drives on the nodes. Currently WRF is starting up on an NFS share, and I don't want to hit that with the I/O. (I know...)

So is there a way to tell WRF tasks to redirect their output to local directories? I expect only the master actually writes, so the question is likely better phrased as: how do I tell WRF to write its output to a local drive that I designate?

Previous answers to a similar question indicated there was a place to do this in namelist.input, but I don't find those tags in the current namelist...

Re: Running WRF off local storage on cluster

Postby kwthomas » Tue Nov 06, 2018 5:01 pm

Change the "history_outname" line in "namelist.input" to look like:

history_outname='/some_path/wrfout_d<domain>_<date>',

If it isn't in your "namelist.input" file, add it under "&time_control"; the exact location within that section doesn't matter.
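For example, a minimal sketch of the relevant &time_control entry (the path /local/scratch is only a placeholder for whatever node-local directory you designate; your other &time_control settings stay as they are):

&time_control
 ! send history (wrfout) files to node-local storage instead of the NFS share
 history_outname = '/local/scratch/wrfout_d<domain>_<date>',
 ! ... existing &time_control entries unchanged ...
/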

Note that this will break "ready" files if you are running PNETCDF; however, I'm probably the only
person in the WRF world running that combo.
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

Re: Running WRF off local storage on cluster

Postby alvaroafernandez » Wed Nov 07, 2018 2:13 am

kwthomas wrote:Change the "history_outname" line in "namelist.input" to look like:

history_outname='/some_path/wrfout_d<domain>_<date>',

If it isn't in your "namelist.input" file, add it under "&time_control"; the exact location within that section doesn't matter.

Note that this will break "ready" files if you are running PNETCDF; however, I'm probably the only
person in the WRF world running that combo.


I've tried that - jury's still out on whether it helped. Scaling still isn't good at all.

I'm confused as to why this tag isn't mentioned at https://esrl.noaa.gov/gsd/wrfportal/namelist_input_options.html. Is it some sort of undocumented feature?

In addition, are there any other settings that might affect where WRF writes its output?

Re: Running WRF off local storage on cluster

Postby kwthomas » Wed Nov 07, 2018 5:52 pm

File "run/README.namelist" in the release you are using is supposed to have everything, though
"history_outname" is missing there too. I must have learned about it by looking thru the code many
versions ago.

The biggest cause of slowness with WRF is I/O.

You are likely using "io_form_history=2". This is your basic netCDF; however, it can be horribly slow. It
does sequential I/O, but it also does a lot of seeking (searching to different locations in the file to write header
info). Like HDF4, this gets slower and slower as the file gets bigger.

You can get some improvement if I/O is on a Lustre filesystem that supports stripes. Most do, though I've been
on some that will scramble the file if you use stripes. If you do stripe, always use an even number: my testing shows
that odd stripe counts are slow, while even ones are fast. 2 or 4 are probably best.
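If your output directory is on Lustre, the stripe count can be set on the directory before the run, for example (assuming the standard Lustre lfs tool; the path is just a placeholder):

# stripe new files written in this directory across 4 OSTs
lfs setstripe -c 4 /lustre/scratch/wrf_run
# check the resulting layout
lfs getstripe /lustre/scratch/wrf_run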

io_form_history=102 is *very* fast; however, you are writing split files. This means that each processor
writes out its own small file in parallel. The catch is that you need software that knows how to read them.
I'm not aware of any software other than the CAPS ARPS software that knows what a split file is.

io_form_history=11, pnetcdf, is also fast. The catch is that you need to fix some bugs in the WRF code for
this to work. If you really need this, I can supply you with mine.

I've not been able to get HDF5/PHDF5 to do anything useful in terms of speed. Since other code we use
(GSI) doesn't support it anyway, it is of no value for us.
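As a quick reference, here is how the choices above would look in namelist.input (a sketch only; pick one value for io_form_history):

&time_control
 ! 2   = serial netCDF (the default; simple but slow for large files)
 ! 102 = split netCDF, one file per MPI task (fast, but needs a joiner program to recombine)
 ! 11  = parallel netCDF via pnetcdf (fast, but needs the code fixes mentioned above)
 io_form_history = 2,
/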

Did I answer your question? :-)
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

Re: Running WRF off local storage on cluster

Postby alvaroafernandez » Wed Nov 07, 2018 6:50 pm

This is a really useful answer. :) Do you know what WRF's going to do (if anything) now that Linux and Lustre will be less tightly integrated? I've read that there won't be kernel support for it in the future; it will still run, of course, but without the tight integration.

With respect to other file systems, have you heard of anyone using pNFS? That's the parallel implementation of NFS for Red Hat, and it has a bit of a Lustre flavor: it uses multiple servers and eliminates the NFS bottleneck.

Circling back to me: what I'm understanding is that I should put in io_form_history=102, but that this will create split files and I'll need to get another application or tool to stitch them back together? I can likely handle that if it helps speed this up. pNETCDF and refactoring WRF - I don't think I want to deal with that.

Re: Running WRF off local storage on cluster

Postby kwthomas » Thu Nov 08, 2018 7:20 pm

Lustre is a filesystem type and is independent of Linux outside of device drivers. WRF itself doesn't care about
the filesystem type.

I've not heard of pNFS. Checking the web page, it sounds interesting. All the supercomputers I run on use
Lustre.

I've not heard of any other software that handles WRF split files. Maybe there is something that does, or used
to, since the WRF people wrote the support into the code. On the other hand, standard meteorology software that uses WRF output
has no concept of a split file. That's why I switched to PNETCDF. Since we're running 60-hour forecasts in
support of HWT, there is a time deadline to prepare the data for the experiment. I can't afford to add two hours
of runtime for I/O, and that's not even counting the SU costs on the supercomputers.
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

Re: Running WRF off local storage on cluster

Postby dcvz » Thu Nov 08, 2018 9:13 pm

NCAR uses a slightly modified version of the CAPS joiner program for form-102. I don't know why they don't distribute it, but if you make a request on the new forum they will most likely provide it.

I did some testing of pnetcdf and PIO with WRF and found that the speed-up depends on the size of the domain and the computer architecture. Some systems really benefit from the enhanced I/O, others not so much.

If you have bug fixes for the WRF code then by all means post them to the new forum or modify a branch of the GitHub repository. They're making WRF accessible on GitHub so the community can contribute code more easily.

Re: Running WRF off local storage on cluster

Postby kwthomas » Fri Nov 09, 2018 4:53 pm

I run PNETCDF on STAMPEDE2 and BRIDGES. I *think* I ran it on the TACC LS5.

Best performance is when "nio_tasks_per_group" equals "nproc_y", or a multiple of "nproc_y".
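For illustration, a sketch of how that pairing looks in namelist.input (the counts are placeholders; the point is that nio_tasks_per_group matches nproc_y):

&domains
 nproc_x = 16,              ! MPI decomposition in x
 nproc_y = 8,               ! MPI decomposition in y
/
&namelist_quilt
 nio_tasks_per_group = 8,   ! equal to (or a multiple of) nproc_y
 nio_groups = 1,
/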

Does the old email "wrfhelp@ucar.edu" work?
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

