segmentation fault

Any issues with the actual running of WRF.

segmentation fault

Postby peter » Thu Nov 24, 2016 9:58 am

I am getting an annoying interruption of WRF. I have changed various options, but at some point I always get a segmentation fault (in each run at a different time and with different rsl* tails). I have modified the time step, the amount of memory (up to 32 GB with dmpar), the number of tasks, and the position and separation of the domains. A one-domain run finishes successfully, but a two-domain run never does. I also set ulimit -s unlimited. With debug_level = 9999 I got the following message and tails in my last run, which provide no useful clue for me; maybe there is some parameterization or topography problem I am not aware of, or some other namelist issue:

mpirun noticed that process rank 2 with PID 12367 on node g02 exited on signal 11 (Segmentation fault).

tail rsl.error.0003 (0001 and 0000 are nearly the same)
d02 2012-10-06_10:45:20 calling inc/HALO_EM_C2_inline.inc
d02 2012-10-06_10:45:20 calling inc/PERIOD_BDY_EM_B3_inline.inc
d02 2012-10-06_10:45:20 calling inc/PERIOD_BDY_EM_B_inline.inc
d02 2012-10-06_10:45:20 calling inc/HALO_EM_C_inline.inc
d02 2012-10-06_10:45:20 calling inc/HALO_EM_C2_inline.inc
d02 2012-10-06_10:45:20 calling inc/PERIOD_BDY_EM_B3_inline.inc
d02 2012-10-06_10:45:20 calling inc/PERIOD_BDY_EM_B_inline.inc
d02 2012-10-06_10:45:20 calling inc/HALO_EM_C_inline.inc
d02 2012-10-06_10:45:20 calling inc/HALO_EM_C2_inline.inc
d02 2012-10-06_10:45:20 calling inc/PERIOD_BDY_EM_B3_inline.inc

tail rsl.error.0002
d02 2012-10-06_10:45:20 --> TOP OF DIAGNOSTICS PACKAGE
d02 2012-10-06_10:45:20 --> CALL DIAGNOSTICS PACKAGE: NWP DIAGNOSTICS
d02 2012-10-06_10:45:20 call HALO_RK_E
d02 2012-10-06_10:45:20 calling inc/HALO_EM_E_5_inline.inc
d02 2012-10-06_10:45:20 call HALO_RK_MOIST
d02 2012-10-06_10:45:20 calling inc/HALO_EM_MOIST_E_5_inline.inc
d02 2012-10-06_10:45:20 call end of solve_em
d02 2012-10-06_10:45:20 DEBUG wrf_timetoa(): returning with str = [2012-10-06_10:45:20]
d02 2012-10-06_10:45:20 DEBUG wrf_timetoa(): returning with str = [2012-10-06_00:00:00]
d02 2012-10-06_10:45:20 DEBUG wrf_timetoa(): returning with str = [2012-10-07_06:00:00]

tail rsl.out.0002
d01 2012-10-06_10:46:00 Top of Radiation Driver
d01 2012-10-06_10:46:00 calling inc/HALO_PWP_inline.inc
d01 2012-10-06_10:46:00 call surface_driver
d01 2012-10-06_10:46:00 in SFCLAY
d01 2012-10-06_10:46:00 in NOAH DRV
d01 2012-10-06_10:46:00 call pbl_driver
d01 2012-10-06_10:46:00 in YSU PBL
d01 2012-10-06_10:46:00 call cumulus_driver
d01 2012-10-06_10:46:00 calling inc/HALO_CUP_G3_IN_inline.inc
d01 2012-10-06_10:46:00 in kf_eta_cps

tail rsl.out.0003 (0001 and 0000 are the same)
d01 2012-10-06_10:46:00 in SFCLAY
d01 2012-10-06_10:46:00 in NOAH DRV
d01 2012-10-06_10:46:00 call pbl_driver
d01 2012-10-06_10:46:00 in YSU PBL
d01 2012-10-06_10:46:00 call cumulus_driver
d01 2012-10-06_10:46:00 calling inc/HALO_CUP_G3_IN_inline.inc
d01 2012-10-06_10:46:00 in kf_eta_cps
d01 2012-10-06_10:46:00 returning from cumulus_driver
d01 2012-10-06_10:46:00 call shallow_cumulus_driver
d01 2012-10-06_10:46:00 calling inc/HALO_EM_FDDA_SFC_inline.inc

I am also including my namelist below. I would appreciate any suggestions.

&time_control
run_days = 0,
run_hours = 30,
run_minutes = 0,
run_seconds = 0,
start_year = 2012, 2012, 2012, 2012,
start_month = 10, 10, 10, 10,
start_day = 6, 6, 6, 6,
start_hour = 0, 0, 0, 0,
start_minute = 00, 00, 00,
start_second = 00, 00, 00,
end_year = 2012, 2012, 2012, 2012,
end_month = 10, 10, 10, 10,
end_day = 7, 7, 7, 7,
end_hour = 6, 6, 6, 6,
end_minute = 00, 00, 00, 0,
end_second = 00, 00, 00, 0,
interval_seconds = 21600,
input_from_file = .true., .true., .true., .true.,
history_interval = 360, 360, 60, 60,
frames_per_outfile = 1000, 1000, 1000, 1000,
restart = .false.,
restart_interval = 5000,
io_form_history = 2,
io_form_restart = 2,
io_form_input = 2,
io_form_boundary = 2,
debug_level = 9999,
/

&domains
eta_levels = 1.000, 0.9418, 0.8863, 0.8335, 0.7833,
0.7355, 0.6902, 0.6471, 0.6062, 0.5675,
0.5307, 0.496, 0.4631, 0.432, 0.4026,
0.3748, 0.3487, 0.324, 0.3008, 0.279,
0.2584, 0.2391, 0.2188, 0.2017, 0.186,
0.1714, 0.158, 0.1455, 0.1341, 0.1234,
0.1136, 0.1046, 0.0962, 0.0884, 0.0813,
0.0747, 0.0686, 0.0629, 0.0577, 0.0529,
0.0484, 0.0443, 0.0405, 0.037, 0.0338,
0.0308, 0.028, 0.0255, 0.0231, 0.021,
0.019, 0.0171, 0.0154, 0.0139, 0.0124,
0.011, 0.0098, 0.0087, 0.0076, 0.0066,
0.0057, 0.0048, 0.0041, 0.0033, 0.0027,
0.0021, 0.0015, 0.0009, 0.0005, 0.000,
time_step = 120,
time_step_fract_num = 0,
time_step_fract_den = 1,
max_dom = 2,
e_we = 80, 121, 106, 145,
e_sn = 80, 136, 133, 166,
e_vert = 70, 70, 70, 70,
p_top_requested = 1000,
num_metgrid_levels = 27,
num_metgrid_soil_levels = 4,
dx = 27000, 9000, 3000, 1000,
dy = 27000, 9000, 3000, 1000,
grid_id = 1, 2, 3, 4,
parent_id = 1, 1, 2, 3,
i_parent_start = 1, 18, 50, 27,
j_parent_start = 1, 19, 38, 36,
parent_grid_ratio = 1, 3, 3, 3,
parent_time_step_ratio = 1, 3, 3, 3,
feedback = 1,
smooth_option = 0,
/

&physics
mp_physics = 3, 3, 3, 3,
ra_lw_physics = 1, 1, 1, 1,
ra_sw_physics = 1, 1, 1, 1,
radt = 30, 30, 30, 30,
sf_sfclay_physics = 1, 1, 1, 1,
sf_surface_physics = 2, 2, 2, 2,
bl_pbl_physics = 1, 1, 1, 1,
bldt = 0, 0, 0, 0,
cu_physics = 1, 1, 0, 0,
cudt = 5, 5, 5, 5,
isfflx = 1,
ifsnow = 0,
icloud = 1,
surface_input_source = 1,
num_soil_layers = 4,
sf_urban_physics = 0, 0, 0, 0,
maxiens = 1,
maxens = 3,
maxens2 = 3,
maxens3 = 16,
ensdim = 144,
/

&fdda
/

&dynamics
w_damping = 0,
diff_opt = 1,
km_opt = 4,
diff_6th_opt = 0, 0, 0, 0,
diff_6th_factor = 0.12, 0.12, 0.12, 0.12,
base_temp = 290.,
damp_opt = 0,
zdamp = 5000., 5000., 5000., 5000.,
dampcoef = 0.2, 0.2, 0.2, 0.2,
khdif = 0, 0, 0, 0,
kvdif = 0, 0, 0, 0,
non_hydrostatic = .true., .true., .true., .true.,
moist_adv_opt = 1, 1, 1, 1,
scalar_adv_opt = 1, 1, 1, 1,
/

&bdy_control
spec_bdy_width = 5,
spec_zone = 1,
relax_zone = 4,
specified = .true., .false., .false., .false.,
nested = .false., .true., .true., .true.,
/

&grib2
/

&namelist_quilt
nio_tasks_per_group = 0,
nio_groups = 1,
/
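
The run itself is launched through mpirun with the stack limit raised, roughly like this (the task count is a placeholder, and depending on the MPI launcher the ulimit may have to go into the shell startup files so it reaches every node):

ulimit -s unlimited       # as mentioned above
mpirun -np 8 ./wrf.exe    # from the run directory; the number of tasks was varied between tests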

Re: segmentation fault

Postby kwthomas » Mon Nov 28, 2016 7:30 pm

You might try setting "w_damping" to 1 to see if it helps.
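
In the namelist above that is the existing entry in &dynamics (currently 0), i.e.:

w_damping = 1,

It applies damping to the vertical velocity only in columns where the vertical CFL criterion is about to be violated.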
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

Re: segmentation fault

Postby peter » Thu Dec 01, 2016 10:08 am

kwthomas wrote: You might try setting "w_damping" to 1 to see if it helps.

Thank you, but that is not an option for me: I want to study gravity waves, and the damping would wipe them out.

Re: segmentation fault

Postby kwthomas » Fri Dec 02, 2016 7:11 pm

Hi...

Any chance you did an OpenMP build and are using OMP_NUM_THREADS in your run?

If so, you may have tripped a code bug. I see it in 3.8.1 and 3.7.1, and I suspect it is in older versions too.

I finally tracked this one down on STAMPEDE-KNL by changing the debug build to use "-check all" instead of checking only a few things. Some interesting complaints then started showing up in the rsl.error* files that identified what was happening and what was being altered when debug mode was run.
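
For anyone wanting to reproduce that kind of check: assuming an Intel (ifort) build, one way is to reconfigure WRF in debug mode and add the full run-time checking flags to FCDEBUG in configure.wrf before recompiling, roughly:

./clean -a
./configure -d              # debug configure; then edit configure.wrf so that
                            # FCDEBUG includes -g -traceback -check all
./compile em_real >& compile.log
# rerun the failing case and watch rsl.error.* for the new run-time check messages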
Kevin W. Thomas
Center for Analysis and Prediction of Storms
University of Oklahoma

Re: segmentation fault

Postby peter » Mon Dec 05, 2016 9:29 am

Thank you. I am using WRF 3.8 with MPI, and the problem was finally solved by changing the cumulus parameterization (cu_physics) from 1 to 5.
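
For the record, that is the cu_physics entry in &physics; presumably the line now reads (other columns unchanged):

cu_physics = 5, 5, 0, 0,

Option 1 is Kain-Fritsch (hence the kf_eta_cps lines in the tails above) and option 5 is the Grell-3D ensemble scheme.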