elic Posted June 24, 2013
Back in 2011 I was experiencing problems using prboom+: the engine would randomly crash due to signal 11. I reported the bug here, but before I could post any useful information, the bug stopped appearing. I don't remember what I did to stop it (or if it just stopped on its own), but recently it reappeared. I switched to the newer prboom+ 2.5.1.4 and the crashes still occur. My OS is Windows 7.

stdout.txt:
M_LoadDefaults: Load system defaults.
default file: C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.cfg
found C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad
PrBoom-Plus v2.5.1.4 (http://prboom-plus.sourceforge.net/)
found C:\Users\radioshack\Desktop\wad\Doom.wad
IWAD found: C:\Users\radioshack\Desktop\wad\Doom.wad
PrBoom-Plus (built May 26 2013 23:39:20), playing: The Ultimate DOOM
PrBoom-Plus is released under the GNU General Public license v2.0.
You are welcome to redistribute it under certain conditions.
It comes with ABSOLUTELY NO WARRANTY. See the file COPYING for details.
V_Init: allocate screens.
V_InitMode: using 32 bit video mode
I_CalculateRes: trying to optimize screen pitch
test case for pitch=5120 is processed 5742 times for 100 msec
test case for pitch=5152 is processed 5740 times for 100 msec
optimized screen pitch is 5120
I_InitScreenResolution: Using resolution 1280x768
found C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad
found C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
found C:\Users\radioshack\Desktop\wad\Doom.wad
D_InitNetGame: Checking for network game.
W_Init: Init WADfiles.
adding C:\Users\radioshack\Desktop\wad\Doom.wad
adding C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad
adding C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
W_InitCache
Loading DEH lump from C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
Loading DEH file C:\Users\radioshack\Desktop\wad\Megawads\DTWID.deh
M_Init: Init miscellaneous info.
R_Init: Init DOOM refresh daemon -
R_LoadTrigTables: Endianness...ok.
R_InitData: Textures Flats Sprites
R_Init: R_InitPlanes R_InitLightTables R_InitSkyMap R_InitTranslationsTables R_InitPatches
P_Init: Init Playloop state.
I_Init: Setting up machine state.
I_InitSound: configured audio device with 1024 samples/slice
Fluidplayer: Fluidsynth version 1.1.3
fl_init: error loading soundfont SGM-V2.01.sf2
portmidiplayer device list:
MMSystem:Microsoft MIDI Mapper
MMSystem:Microsoft GS Wavetable Synth
MMSystem:Timidity++ Driver
MMSystem:BASSMIDI Driver
portmidiplayer: Opening device MMSystem:Microsoft MIDI Mapper for output
I_InitSound: sound module ready
S_Init: Setting up sound.
S_Init: default sfx volume 15
HU_Init: Setting up heads up display.
I_InitGraphics: 1280x768
I_UpdateVideoMode: 0xe0000000, SDL buffer, direct access
SetRatio: width/height parameters 1280x768
SetRatio: storage aspect ratio 5:3
SetRatio: assuming square pixels
SetRatio: display aspect ratio 5:3
SetRatio: overruled by user configuration setting
SetRatio: revised display aspect ratio 4:3
SetRatio: gl_ratio 1.600000
SetRatio: multiplier 1/1
ST_Init: Init status bar.
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
I_SignalHandler: Exiting on signal: signal 11
I_ShutdownSound:
entryway Posted June 24, 2013
Can you reproduce the crash with -devparm?
wesleyjohnson Posted June 25, 2013
First suggestion: HEAT. Clean off your heatsinks and check your CPU and video card fans.
kb1 Posted June 25, 2013
stdout.txt:
I_CalculateRes: trying to optimize screen pitch
test case for pitch=5120 is processed 5742 times for 100 msec
test case for pitch=5152 is processed 5740 times for 100 msec
optimized screen pitch is 5120
I_InitScreenResolution: Using resolution 1280x768

This is interesting... are you timing the memory cache system by increasing the horizontal resolution? If so, cool!
GreyGhost Posted June 26, 2013
It might be my old nemesis, SDL_mixer. See if disabling music makes a difference.
entryway Posted June 26, 2013
kb1 said: This is interesting... are you timing the memory cache system by increasing the horizontal resolution? If so, cool!

Sometimes it makes sense:

Core2 (1x)
test case for pitch=1024 is processed 28294 times for 100 msec
test case for pitch=1056 is processed 28896 times for 100 msec

AMD 64 X2 4200 (5x)
test case for pitch=1024 is processed 1618 times for 100 msec
test case for pitch=1056 is processed 8539 times for 100 msec

Pentium4 (16x)
test case for pitch=1024 is processed 1130 times for 100 msec
test case for pitch=1056 is processed 18550 times for 100 msec

IIRC, old versions of prboom run faster at 1600x1200 than at 1024x768.

PrBoom has a simple check:

    if (!(SCREENPITCH % 1024)) SCREENPITCH += 32;

PrBoom+ uses a test function instead, because 1024 is not the only pitch that is noticeably slower on some old hardware.
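The two approaches entryway describes can be sketched roughly as follows. This is a minimal illustration, not PrBoom+'s actual test function: `simple_pitch_fix` mirrors the one-line check quoted above, while `time_pitch` and `pick_pitch` are hypothetical names for a benchmark that counts how many passes of column-wise writes fit in a time budget and then keeps whichever pitch was faster. The 64-column workload and the use of `clock()` are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* The fixed rule quoted above: step off pitches that are multiples of 1024. */
int simple_pitch_fix(int pitch)
{
    if (!(pitch % 1024))
        pitch += 32;
    return pitch;
}

/* Hypothetical benchmark: count how many passes of column-wise writes
 * (one byte per row, rows `pitch` bytes apart) finish within budget_ms. */
long time_pitch(int pitch, int height, int budget_ms)
{
    const clock_t budget = (clock_t)budget_ms * CLOCKS_PER_SEC / 1000;
    volatile unsigned char *buf;  /* volatile so the writes are not optimized away */
    clock_t start;
    long passes = 0;
    int x, y;

    assert(pitch >= 64 && height > 0 && budget_ms > 0);
    buf = malloc((size_t)pitch * (size_t)height);
    if (buf == NULL)
        return -1;

    start = clock();
    do {
        for (x = 0; x < 64; x++)            /* 64 columns per pass (arbitrary) */
            for (y = 0; y < height; y++)    /* walk down the column */
                buf[(size_t)y * (size_t)pitch + (size_t)x] = (unsigned char)passes;
        passes++;
    } while (clock() - start < budget);

    free((void *)buf);
    return passes;
}

/* Measure the natural pitch against pitch + 32 and keep the faster one. */
int pick_pitch(int pitch, int height, int budget_ms)
{
    long base = time_pitch(pitch, height, budget_ms);
    long padded = time_pitch(pitch + 32, height, budget_ms);
    return padded > base ? pitch + 32 : pitch;
}
```

With numbers like those in the log from the first post (5742 passes at pitch 5120 against 5740 at 5152), a comparison of this kind keeps 5120, which matches the "optimized screen pitch is 5120" line. On hardware like the Pentium4 case, the padded pitch would win by a wide margin.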
entryway Posted June 26, 2013
GreyGhost said: It might be my old nemesis the SDL_mixer, see if disabling music makes a difference.

IIRC, SDL_mixer causes crashes even without a SIGSEGV message. If he got "signal 11", then something is wrong with prboom-plus, and I need the address of the crash (-devparm) and the exe+map that were used.
wesleyjohnson Posted June 26, 2013
Important:

Is it a random crash at unpredictable times that seems to happen only with PrBoom?

Is it predictable where it crashes when starting PrBoom, always at the same place?
- immediately restart in exactly the same way, and report if it fails in exactly the same way

Does it depend on which PWAD or IWAD is loaded, or which game is being run?
- does the failure change when a different PWAD or game is selected

Test another comparable program with many ptrs?
- video game
- Linux kernel compile
- another unrelated Doom port

Run PrBoom under a debugger to get exactly which instruction is segfaulting. Save the segfault location for three failures. If the location is not consistent, then suspect memory failure. So much data are memory ptrs that random memory errors will hit one sooner than you will notice an odd pixel or draw on the screen.

Signal 11: Segmentation fault
- random: overclocking, heat, low virtual memory, memory problem (bit flip hitting a ptr address)
- PrBoom only, for all wads: blame PrBoom
- PrBoom only on certain wads: blame the wad, and maybe PrBoom is not checking adequately for corrupt or ZDoom wads.

Almost every cause of Sig11 (it is about Linux kernel compiles, but it also covers sig11 problems rather well):
http://www.bitwizard.nl/sig11/
entryway Posted June 26, 2013
- it is random
- it does not depend on the IWAD/PWAD
- it happens only in software mode, probably in R_DrawColumn
[/vanga mode off]
Quasar Posted June 27, 2013
If you're still using that SDL_mixer postmix callback from PrBoom 2.x, then you're crashing because the channels[] array is not protected with a semaphore. Get some proper multithreading in there.
elic Posted June 27, 2013
Thanks for all the responses. I'll try playing with -devparm and will post an update when the game crashes again. I might also try cleaning my heat sinks.

Interestingly enough, while the crashes almost always occur in random places, a while back there was one place where the problem kept occurring. During the first trek into the central courtyard of Coffee Break Map11, the game crashed several times while I was fighting the cyberdemon. After a while this stopped happening, and I can now play the map without the bug occurring.

Quasar said: If you're still using that SDL_mixer postmix callback from PrBoom 2.x, then you're crashing because the channels[] array is not protected with a semaphore. Get some proper multithreading in there.

Are you directing this post at me? Honestly, I have no idea what any of this means.
wesleyjohnson Posted June 27, 2013
Still would help to narrow the possibilities. It is likely that more than one segfault source exists in PrBoom and SDL mixer. Two people getting a segfault does not mean it is the same cause. Unless you are exchanging information privately, there has not been enough to exclude these possibilities.

I find the report that it went away and then came back again later to be most suspicious. Software faults do not react that way without some environment change (changing your config settings would do it though, or selecting different options).

Tests for the SDL mixer suspect.
1. Faults in the mixer should vary with different music and sound (long vs. short) because that affects contention for mixer resources. The fault should vary with different wads.
2. Software draw mode should not affect software faults in the sound mixer.
3. Can be tested by turning off sound effects and music. Does the segfault stop?
4. If the SDL mixer faults this much, it should also fault the same with other SDL mixer programs. Try DoomLegacy, it uses SDL mixer too. (It might also segfault if there is a memory failure, just due to similarity to the PrBoom layout.) Try other SDL games too.
5. A debugger can verify the segfault is in SDL mixer code.

Tests for drawer faults.
1. Usually caused by releasing a texture from memory while status bar drawing is still using it. Some textures have multiple uses. In DoomLegacy, I had to resort to locking all status bar textures so they cannot be released.
2. Test by disabling in code all the releasing and purging of memory allocation. Mostly, this can be done by modifying Z_Free (or the equivalent). Does the segfault stop?
3. Some other user gets the same fault on a different machine.
4. Run the program in a debugger and record the exact failure location for three failures. Software failures will be consistent in some way, like the address, or the instruction that fails.

Tests for memory failure faults.
1. Unfortunately, the fact that only one program, or even just software draw, triggers the segfault does not prove anything. It is a matter of putting a memory pointer in the failing location while toggling the neighbor bits in a contrary way. Which particular program fails due to a particular memory fault is not related to its size, nor is it predictable.
2. Memory test programs cannot find all kinds of memory failure. I have more than once written a memory test program to try to find a fault. None of them ever found anything. In one case it was an operating system problem, and for the other I changed the memory chips.
3. If anyone can verify the same segfault on a different machine, then it cannot be memory failure. Just having segfaults (like those due to SDL mixer) does not by itself prove this segfault is not memory failure. It wastes time to try chasing down in software a fault that is hardware based. However, software modifications can move the fault location around and temporarily mask memory failure. Same fault, different machines, is the best discriminant.
4. Make sure all your memory is the same brand. With mixed brands you must adjust memory settings by hand (some BIOSes do not do this well).
5. Swap the memory chip locations and retest. Does the segfault change character?
6. Run with only one memory chip enabled at a time. Test for segfault.
7. Alter the BIOS memory settings, wait states, and retest.
8. Disable cache and retest.
9. Change all the memory out and retest. Most drastic. One person reported that all his chips had the same failure pattern, and that swapping positions did not affect the failure. Swapping out all chips did cure it.
10. Start another execution-intensive program first, leave it running, and retest PrBoom. This moves the PrBoom execution location (but not so much in cache). Does the segfault change in any way? If it is software, it should not be affected in any way. If it is memory failure, it should change. It might still segfault, but in a different location.
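The "record the exact failure location for three failures" step above can be sketched as a small comparison helper. This is only an illustration of the heuristic as stated in the post, under assumptions: `crash_record_t`, `verdict_t`, and `classify_crashes` are hypothetical names, and comparing offsets from the module load address (rather than raw addresses) is an added assumption, since the executable may not load at the same base on every run. An inconsistent result is a hint to keep hardware on the suspect list, not proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded crash, as read off a debugger session (hypothetical fields). */
typedef struct {
    uintptr_t fault_addr;   /* address of the faulting instruction */
    uintptr_t module_base;  /* where the executable was loaded for that run */
} crash_record_t;

typedef enum {
    VERDICT_NOT_ENOUGH_DATA, /* fewer than three crashes recorded */
    VERDICT_CONSISTENT,      /* same offset every time: look at the software */
    VERDICT_INCONSISTENT     /* offsets differ: keep memory failure on the list */
} verdict_t;

/* Compare fault offsets across at least three crashes. */
verdict_t classify_crashes(const crash_record_t *recs, size_t n)
{
    uintptr_t first;
    size_t i;

    assert(recs != NULL || n == 0);
    if (n < 3)
        return VERDICT_NOT_ENOUGH_DATA;

    first = recs[0].fault_addr - recs[0].module_base;
    for (i = 1; i < n; i++) {
        if (recs[i].fault_addr - recs[i].module_base != first)
            return VERDICT_INCONSISTENT;
    }
    return VERDICT_CONSISTENT;
}
```

Three crashes at the same offset would come back as `VERDICT_CONSISTENT` even if the load address moved between runs; one stray offset among them yields `VERDICT_INCONSISTENT`.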
Quasar Posted June 27, 2013
Processingcontrol said: Are you directing this post at me? Honestly I have no idea at what any of this means.

Nope, that was toward entryway. He earlier rebuffed my advice to add concurrency protection to the channel structs in PrBoom-Plus's i_sound.c, and I have never checked since then to see if it was ever implemented.

The SDL_audio core's audiospec callback is registered as the "run" function for an SDL thread. This means that, with respect to the main application, it runs asynchronously. SDL_mixer in turn calls its postmix from the function it registers as the SDL audiospec callback.

If the audiospec callback preempts the main application thread during I_StartSound or I_StopSound, or the main application thread preempts the audiospec callback in order to change any of the data in the channels structure, then a race condition is absolutely inevitable. Any amount of simply wishing this wasn't the case won't prevent it, and setting an affinity flag does not (and let me emphatically stress this as much as possible) prevent threads of the same application from preempting each other. Affinity only restricts those threads to all sharing the same CPU or core for scheduling.

wesleyjohnson said: Tests for memory failure faults.

Speculation about memory faults is useless. I've used machines with memory faults. They BSOD and reset constantly. This is not a memory fault. Never blame the end user's hardware when something is obviously a software error. Lazy and shameful to even think about it until everything else has been absolutely ruled out.
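The kind of protection Quasar is asking for can be sketched as below. This is a minimal stand-in under assumptions, not PrBoom-Plus's i_sound.c: `channel_t`, `start_sound`, `stop_sound`, and `mix_callback` are hypothetical names, and a POSIX mutex is used here so the sketch builds without SDL. In an SDL 1.2 program, bracketing the main-thread updates with `SDL_LockAudio()` and `SDL_UnlockAudio()` would be one comparable way to keep the audio callback out while the channel data changes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CHANNELS 8

/* Hypothetical stand-in for the mixer's per-channel state. */
typedef struct {
    const unsigned char *data; /* next 8-bit unsigned sample, NULL when idle */
    const unsigned char *end;  /* one past the last sample */
} channel_t;

channel_t channels[MAX_CHANNELS];
pthread_mutex_t channels_lock = PTHREAD_MUTEX_INITIALIZER;

/* Game thread: start a sound on a channel. */
void start_sound(int ch, const unsigned char *samples, size_t len)
{
    assert(ch >= 0 && ch < MAX_CHANNELS && samples != NULL && len > 0);
    pthread_mutex_lock(&channels_lock);
    channels[ch].data = samples;
    channels[ch].end = samples + len;
    pthread_mutex_unlock(&channels_lock);
}

/* Game thread: stop whatever is playing on a channel. */
void stop_sound(int ch)
{
    assert(ch >= 0 && ch < MAX_CHANNELS);
    pthread_mutex_lock(&channels_lock);
    channels[ch].data = NULL;
    channels[ch].end = NULL;
    pthread_mutex_unlock(&channels_lock);
}

/* Audio thread: mix all active channels into a signed 16-bit buffer.
 * Without the lock, stop_sound() could clear `data` between the NULL
 * check and the dereference below, which is one way to hit signal 11. */
void mix_callback(short *stream, size_t nsamples)
{
    size_t i;
    int ch;

    pthread_mutex_lock(&channels_lock);
    for (i = 0; i < nsamples; i++) {
        int sum = 0;
        for (ch = 0; ch < MAX_CHANNELS; ch++) {
            channel_t *c = &channels[ch];
            if (c->data == NULL)
                continue;
            sum += (*c->data - 128) * 256;  /* 8-bit unsigned to 16-bit signed */
            if (++c->data == c->end)        /* sound finished: free the channel */
                c->data = c->end = NULL;
        }
        if (sum > 32767) sum = 32767;       /* clamp to the 16-bit range */
        if (sum < -32768) sum = -32768;
        stream[i] = (short)sum;
    }
    pthread_mutex_unlock(&channels_lock);
}
```

Holding the lock for the whole mix keeps the sketch simple; a real mixer might prefer to copy the channel state under the lock and mix outside it, so the game thread is never blocked for a full audio buffer.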
wesleyjohnson Posted June 27, 2013
I have run PrBoom 2.5 on XP and Linux, with SDL, and have not seen segfault problems. PrBoom software faults should not be affected by the difference in OS so much that my machines remain unaffected.

I keep memory faults open as a possibility because they are difficult to differentiate from software faults, and the reported fault is suspiciously like memory failure. It is usually the first to get eliminated. The tests only have to show any characteristic that excludes the memory fault possibility.

Memory faults that do not BSOD have been reported frequently. I have had machines with them, and they did not BSOD. Linux has proven capable of finding faulty memory where Windows would not notice. It is easy to suspect that PrBoom could find faulty memory where Windows would not BSOD. It only requires that the failing bit be outside of the Windows OS itself, and that is most of memory.

It is impossible to exclude all software possibilities first, because no software is ever bug free. That time will never occur and no one could ever prove that it had arrived. To not check the memory possibility until software has been looked at leads to never checking the memory.

As far as I am concerned, you can do both in parallel. I usually do one test from one possibility and one from another until something important happens. You should make some attempt to exclude the memory failure possibility or confirm a connection. While a comprehensive memory test is impossible and there will be no absolute answer of memory perfection, it is worth at least a half hour's effort.
wesleyjohnson Posted June 28, 2013
The first time this happened was 2011 (what time of year was that?). Then it stopped. Now it is happening again. Does that correspond to summer, winter, and then summer again? You should give the HEAT possibility some more consideration (in addition to the above efforts).