
MPI problem

Posted: Fri Feb 10, 2012 6:33 am
by kozokg
Hello,
With the help of a friend, we connected my two computers and compiled the model in parallel.
The problem is that when we run the model on one core it runs OK, but when we try to use more cores the run stops with an error.

The error that I get is:

kozokg@kozokg5:~/WRF/WRFV3/run$ mpirun --host kozokg5,kozokg5,kozokg5,kozokg5 -np 4 ./wrf.exe
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 16894 on
node kozokg5 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[kozokg5:16891] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[kozokg5:16891] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
kozokg@kozokg5:~/WRF/WRFV3/run$

This is the message that I get in the rsl.error file:

taskid: 2 hostname: kozokg5
starting wrf task 2 of 4
Quilting with 1 groups of 0 I/O tasks.
starting wrf task 2 of 4
Ntasks in X 2 , ntasks in Y 2
--- NOTE: num_soil_layers has been set to 5
WRF V3.3.1 MODEL
*************************************
Parent domain
ids,ide,jds,jde 0 0 0 0
ims,ime,jms,jme 0 0 0 0
ips,ipe,jps,jpe 0 -1 0 -1
*************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 121960 bytes allocated
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: set_timekeeping.F LINE: 102
WRFU_TimeSet(startTime) FAILED Routine returned error code = -1
-------------------------------------------
I don't have any idea how to resolve this problem.
Thanks in advance,
Gilad

Re: MPI problem

Posted: Sun Jun 10, 2018 6:43 pm
by cxr
In case this helps anyone else: in my case I encountered this error because I'd accidentally swapped start_month and start_day, making start_month out of bounds.
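
A quick way to catch this kind of mix-up before launching a run is to check that the start date in the namelist is a real calendar date. The following is only a minimal sketch, not part of WRF: the helper names read_values and check_times are made up for illustration, and it assumes the start time is given by the start_year, start_month, start_day, start_hour, start_minute and start_second entries, with one comma-separated value per domain on a single line.

```python
import re
from datetime import datetime

# Namelist suffixes, in the order datetime() expects them.
FIELDS = ("year", "month", "day", "hour", "minute", "second")


def read_values(namelist_text, key):
    """Return the comma-separated integers assigned to `key` (hypothetical helper)."""
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*=[ \t]*([^\n!/]*)"
    match = re.search(pattern, namelist_text, re.MULTILINE)
    if match is None:
        return []
    return [int(v) for v in match.group(1).split(",") if v.strip()]


def check_times(namelist_text, prefix="start"):
    """Return one message per domain whose <prefix>_* values do not form a valid date."""
    columns = {f: read_values(namelist_text, f"{prefix}_{f}") for f in FIELDS}
    problems = []
    for d in range(len(columns["year"])):
        try:
            # datetime() raises ValueError for e.g. month 15 or day 32.
            datetime(*[columns[f][d] for f in FIELDS])
        except IndexError:
            problems.append(f"domain {d + 1}: a {prefix}_* value is missing")
        except ValueError as exc:
            problems.append(f"domain {d + 1}: {exc}")
    return problems


if __name__ == "__main__":
    # Assumes the script is run from the directory that holds namelist.input.
    with open("namelist.input") as handle:
        text = handle.read()
    for message in check_times(text, "start") + check_times(text, "end"):
        print(message)
```

Run from the same directory as namelist.input, it prints nothing when the start and end dates are valid and one line per domain otherwise. Note that a swap only shows up here when it produces an impossible date; swapping a day and month that are both 12 or less would still pass.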