
RealDOOM: DOOM Ported to 16-bit Real Mode


sqpat


14 hours ago, viti95 said:

Another idea to reduce memory usage: compress the IWADs better. @fraggle has updated wadptr and fixed old issues; this reduces the amount of memory used for certain graphics, and especially the size of the sidedefs.

 

Hmm, so my original intent was to make it compatible with the original commercial WADs. However, there's no reason I can't add some initialization code (in an overlay, which won't take up in-game runtime memory) that detects an uncompressed commercial WAD at startup and runs the wadptr code on it to compress the IWAD... so for now, I will probably just use the compressed WADs and make a TODO to port wadptr into some 16-bit code that can optionally run at startup.

 

Thanks for the suggestion. Some doom 2 sidedef lumps were over 64k and I didn't really want to write a kludge to deal with that.

 

 

Edited by sqpat


Well, I spent a week cleaning up bugs but also trying out some optimizations.

 

There are some notes in the comments of the codebase about precalculating lineopenings and I gave it a try. You can precalculate it in level setup but have to treat it as a cache and mark things dirty whenever platforms move and sector floors/ceilings change, basically. It wastes 6-7 KB and unfortunately it's hard to measure the improvement. I tried averaging ten runs' worth of timedemos and it's a tiny bit faster, maybe 0.25% on average or so? (Is there a better way?)
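A minimal sketch of the "treat it as a cache and mark things dirty" idea, for illustration only. The struct layouts, the 16-bit heights, the `dirty` flag and the `demo_*` data are all assumptions made for this example, not RealDOOM's actual definitions; the min/max logic follows the shape of vanilla's P_LineOpening.

```c
#include <stdint.h>

/* Hypothetical, simplified types (assumed 16-bit heights). */
typedef struct { int16_t floorheight, ceilingheight; } sector_t;
typedef struct { sector_t *front, *back; } line_t;

typedef struct {
    int16_t opentop, openbottom, openrange, lowfloor;
    uint8_t dirty;              /* 1 = cached values are stale */
} opening_t;

static unsigned recompute_count;    /* how often we had to do the real work */

/* Same min/max logic as a line opening calculation, stored into the cache. */
static void compute_opening(const line_t *ld, opening_t *o)
{
    const sector_t *f = ld->front, *b = ld->back;

    o->opentop = f->ceilingheight < b->ceilingheight ? f->ceilingheight
                                                     : b->ceilingheight;
    if (f->floorheight > b->floorheight) {
        o->openbottom = f->floorheight;
        o->lowfloor   = b->floorheight;
    } else {
        o->openbottom = b->floorheight;
        o->lowfloor   = f->floorheight;
    }
    o->openrange = (int16_t)(o->opentop - o->openbottom);
    o->dirty = 0;
    recompute_count++;
}

/* Recompute only when a mover has marked the entry dirty. */
static const opening_t *get_opening(const line_t *ld, opening_t *cache)
{
    if (cache->dirty)
        compute_opening(ld, cache);
    return cache;
}

/* Illustrative data: front sector 0..128, back sector 24..96. */
static sector_t  demo_front = { 0, 128 };
static sector_t  demo_back  = { 24, 96 };
static line_t    demo_line  = { &demo_front, &demo_back };
static opening_t demo_cache = { 0, 0, 0, 0, 1 };
```

Whatever code moves a floor or ceiling would have to set `dirty` on every line touching that sector, which is the bookkeeping cost that makes the gain hard to measure.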

 

I tried 8 bit validcounts for linedefs to save some space but it seems that's not enough uniqueness to avoid overlaps/collisions of values and bugs happen like a thousand tics into demos.
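To illustrate why 8 bits may not be enough, here is a small sketch of the wraparound. It assumes the usual validcount pattern (a global counter bumped once per traversal, each line stamped with the value it was last checked under); the names `line8_t`, `validcount8` and `visit_line` are made up for the example.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint8_t validcount; } line8_t;   /* hypothetical 8-bit stamp */

static uint8_t validcount8;     /* global traversal counter, wraps at 256 */

/* Returns true if the line still needs checking in the current traversal. */
static bool visit_line(line8_t *ld)
{
    if (ld->validcount == validcount8)
        return false;           /* treated as "already checked" */
    ld->validcount = validcount8;
    return true;
}

/* Stamp a line once, then count how many traversals that never touch it
   pass before the counter wraps back onto the stale stamp. */
static int traversals_until_collision(line8_t *ld)
{
    int elapsed = 0;

    validcount8 = 1;
    visit_line(ld);             /* line is now stamped with 1 */

    do {
        validcount8++;          /* 255 wraps to 0 */
        elapsed++;
    } while (ld->validcount != validcount8);

    return elapsed;
}
```

After 256 traversals the stale stamp matches again and the line is skipped even though it was never checked in that pass, which fits the symptom of bugs only showing up well into a demo.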

 

Removing the back references in mobj_t (for sector lists and block lists) is not hard to do  since they aren't used all that much. A few KB can be saved by rewriting a bit of code and removing sPrev and bPrev fields. I think they are ultimately a tiny bit measurably slower, so I may revert this later.

 

I'm realizing that if I can get the EMS 4.0 improvements I mentioned earlier to work, then I will have way more memory than I need during the physics portion of the code, so all these tricks to reduce sizes of certain fields won't really matter - what will matter is only speed during that phase, and memory usage becomes more important during rendering due to the amount of memory used by textures.

 

I'm going to clean up a couple of remaining broken features and known bugs, make a simple release build that can hopefully run all the shareware levels, then after that focus on allocating UMB space and EMS 4.0.


I've managed to fix a half dozen or so known bugs (corrupted palettes, timedemo desyncs, readthis rendering, level intermission crashes) and all the demos are running correctly again with all the improvements from the past few weeks. There is a weird random bug happening where the level begins to render in a mess - A lot of the walls suddenly rendering at the wrong angles among other garbage. I've managed to dump all the level data (sectors, lines, segs, etc etc etc) from memory to file after this has happened but all the data checked out as good. Automap also renders fine when this happens. I don't think this seems like a visplane bug either. I will probably check a few other things like if the trigonometry lookup tables are getting corrupted or something.

 

I think once this last bug is fixed, I will cut the 0.1 release based off of that. Basically, all the shareware content plays fine except for E1M6 which is so much bigger than all the other shareware levels - I would need to clear up another 30kb or so to make it fit. That'll be easy once I have UMB allocation working, which should be another easy 64k (at least)- I will probably update with another minor release after that, then begin on EMS 4.0 work, which feels like it will free up at least another 100k. Stability is as good as it's ever been, outside of the random render bug popping up every couple minutes and other known issues (savegames, sound, nightmare respawns) I don't see too much wrong anymore.

 


0.1 Release

 

OK, the game has been working pretty cleanly in DOSBox again, which for all its faults with compatibility at least makes development cycles so much faster since I don't have to mount images and stuff. Thanks to that I got tons of work done in the past few days - I cleaned up a dozen or so bugs and also converted lots and lots of physics and render code to 16-bit logic (then fixed the bugs that caused). Tomorrow I will figure out how GitHub releases work and update the project page there, but attached here is the exe I assume will go out as RealDOOM 0.1.

 

Basically this should work with all maps in shareware except E1M6. I don't *think* it will crash anymore - I haven't seen funny memory bugs recently, I think they are all fixed... There are some features not in there yet, like sound of course, savegames, and screen wiping. There are some render bugs like overdrawing in the intermission screen, a minor fuzz draw bug in timedemo3, and also some sprite masking bugs.

 

You need about 605-610k for this one, a little less than the previous one, but if you use a standard nearly blank MS-DOS 6.22 config with EMM386 loaded (sample config.sys in the zip file) it should work. The zip also has the dstrings.txt file, which is necessary. Don't forget to include DOOM1.WAD.

 

The 286-25 bench (screenblocks 5 hi quality Demo3) ran at 31821 realtics, compared to 36074 a month ago. That corresponds to about 12% faster, and the fps is around 2.37 now. This has pretty much just been from rewriting algorithms and refactoring code - there's still no ASM or anything. The pentium times are around 10% faster from the same period... it's hard to say for sure but I think some of the improvements aimed at the 286 specifically (like reducing shifts) have been beneficial.
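For anyone comparing timedemo numbers, a small sketch of the arithmetic. It assumes the usual convention that realtics are counted at 35 Hz, so fps = gametics * 35 / realtics; the function names are made up for this example. Note that "fewer realtics" and "higher fps" give slightly different percentages for the same pair of runs.

```c
#define TICRATE 35   /* Doom's timer rate; realtics are counted at this rate */

/* Frames per second for a timedemo run. */
static double timedemo_fps(long gametics, long realtics)
{
    return (double)gametics * TICRATE / (double)realtics;
}

/* Percentage reduction in realtics (run time) between two runs. */
static double percent_fewer_realtics(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}

/* Percentage increase in frame rate implied by the same two runs. */
static double percent_higher_fps(long before, long after)
{
    return 100.0 * ((double)before / (double)after - 1.0);
}
```

With the figures above, `percent_fewer_realtics(36074, 31821)` comes out a bit under 12%, while `percent_higher_fps(36074, 31821)` is a bit over 13%, since the same demo now finishes in less time.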

 

EDIT: 

I've run some extra benches, and it seems like on real hardware (vs 86box) I get 15% faster times on my 286 and 4-5% faster times on my 386 DX-40. My MMX meanwhile ran about 30% faster. Not really sure what the reason is, maybe memory access times or something, but it's a good sign anyway. I may try to get it to run on the turbo XT tomorrow for funsies... I wonder if it will beat 100,000 realtics...

 

realdoom_0.1.7z

Edited by sqpat


Cool stuff. I'm surprised it runs, slow as it may be. I tried running it normally as well as running it on the highest cycle speed in DOSBox for funsies, and you can very well brute-force your way to a stable-ish 20fps through emulation, although that defeats the point.

 

Not sure how helpful this will be, but I'll post some of my findings after a lot of messing around:

 

- Doom 2 does not run, no matter what. I slimmed the IWAD down massively - only the textures used in MAP01 (as well as only MAP01), removing all of the sprites for the Doom 2 enemies and any enemies that aren't in MAP01 (e.g Pinkies, Chaingunners), and removing some sounds. It's now at just 4,637 KB, and it runs fine in the original executable, but not here, giving a "ran out of refs" error, presumably something to do with memory.

 

- After beating E1M8 with the shareware IWAD, the text scroll doesn't show up. The game just hangs before it even appears, perhaps it's because of the screenwipe? I'm not super knowledgeable about Doom's inner workings.

 

- Compressing the truncated Doom 2 IWAD using wadptr didn't cause an "out of refs" error, but instead hung on Init Playloop State. The same happened with the shareware Doom IWAD. Both IWADs work just fine in the original executables, so it seems it has trouble with the compressed wadptr output.

11 minutes ago, realjohnmadden said:

Cool stuff. I'm surprised it runs, slow as it may be. I tried running it normally as well as running it on the highest cycle speed in DOSBox for funsies, and you can very well brute-force your way to a stable-ish 20fps through emulation, although that defeats the point.

 

You can definitely get faster fps on 86box by a factor of 2 or 3 or so, but yeah, DOSBox should at least be playable and is generally more convenient.

 

11 minutes ago, realjohnmadden said:

- Doom 2 does not run, no matter what. I slimmed the IWAD down massively - only the textures used in MAP01 (as well as only MAP01), removing all of the sprites for the Doom 2 enemies and any enemies that aren't in MAP01 (e.g Pinkies, Chaingunners), and removing some sounds. It's now at just 4,637 KB, and it runs fine in the original executable, but not here, giving a "ran out of refs" error, presumably something to do with memory.

 

Oh, yeah - sorry, I thought that was clear - only shareware doom is supported right now. Doom2 generally has larger levels and different content and will need more work. I haven't tested doom2 content at all and it may be buggy - especially viles and the last boss and stuff. Once I've freed up a lot more memory via some upcoming improvements, it'll make more sense to get things like doom 2 or sound working.

 

11 minutes ago, realjohnmadden said:

- After beating E1M8 with the shareware IWAD, the text scroll doesn't show up. The game just hangs before it even appears, perhaps it's because of the screenwipe? I'm not super knowledgeable about Doom's inner workings.

 

 

Oh yeah, the finale... I've not tested at all! I'm sure it's probably broken, haha. Maybe it is because of the wipe. Wipes were removed because they basically need 128KB of free space to run (two VGA screen buffers) which is a huge chunk of your 640 KB. Once EMS multitasking is in there I can probably make it happen.

That reminds me - level restarts after game over also don't work, perhaps because of the screen wipe. You have to go to the menu and select a new game in that case.
 

 

11 minutes ago, realjohnmadden said:

- Compressing the truncated Doom 2 IWAD using wadptr didn't cause an "out of refs" error, but instead hung on Init Playloop State. The same happened with the shareware Doom IWAD. Both IWADs work just fine in the original executables, so it seems it has trouble with the compressed wadptr output.

 

Wadptr-compressed WADs will crash without extra work on the codebase due to some of the codebase's optimizations. In order to save space on indexing the wad, I calculate and store the sizes in a certain way, and when wad entries overlap it creates 'negative size' entries, which leads to trouble. I think what's likely to happen down the road is that rather than using wadptr, I will basically generate the wadptr-style compressed wad at runtime by finding duplicate entries and filtering out duplicates.
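The post doesn't say exactly how the sizes are stored, so the following is only one plausible scheme that would show the 'negative size' symptom: keep just each lump's file offset and infer its size from the next lump's offset. The function name and both offset tables are invented for illustration.

```c
#include <stdint.h>

/* Assumed compact directory: only file offsets are kept, and a lump's size
   is inferred as the distance to the next lump's offset. The caller must
   make sure index i + 1 exists. */
static int32_t inferred_size(const int32_t *filepos, int i)
{
    return filepos[i + 1] - filepos[i];
}

/* Ordinary WAD: lumps laid out back to back, offsets only ever increase. */
static const int32_t plain_wad[]  = { 12, 112, 512, 600 };

/* wadptr-style WAD: lump 2 duplicates lump 0, so its directory entry
   points back at offset 12 instead of at fresh data. */
static const int32_t packed_wad[] = { 12, 112, 12, 512 };
```

In the plain layout every inferred size is positive. In the packed layout lump 1 appears to be 12 - 112 = -100 bytes long, and lump 2's inferred size is wrong as well, so any scheme along these lines needs offsets that never go backwards.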

 

Thanks for your testing and input!

 


Some updates - I got UMBs working, and now pull around 70k from there. I then went ahead and allocated a bunch more memory to level data allowing e1m6 to be playable. The conventional memory usage for the build went from 615818 -> 543738 -> 569802, which is still pretty comfortably low now.

 

I might fix a couple bugs and make a quick 0.11 release before going heavy on EMS 4.0 multitasking prep work. EMS 4.0 hardware is somewhat more difficult to come by, especially for a board that will be compatible with XT class systems. The lo-tech EMS boards won't work, and a lot of other ones are 16 bit only I think. Later, faster 286es have chipset support and aren't a problem. I'm not sure how emm386 will work out yet. 

 

Meanwhile, I benched the 0.1 release on a few pieces of hardware over the past day. Below is a full quality timedemo 3 playback on a 4.77 MHz 8088. It's a little under a million realtics, about a 7 hour video and 0.0888 fps :)

 

 

 

A V20 at 9.5 MHz finished the typical 5 screenblocks hi detail bench I tend to run at 162157 realtics, which is about 1/5th of the speed of a fast 286. I have a 16 MHz turbo V20 board I can try it on too at some point.

 


Now that is peak gaming content!

 

Funnily enough, my Lo-Tech EMS card had just arrived yesterday (and still sitting in the post office until I can pick it up) and I was going to do the exact same thing with my NuXT for shits and giggles, great to see the results regardless :D


The github readme has been updated with a basic roadmap (and more timedemo scores)

 

Release 0.11 will come soon - it's not really too different from 0.10. I've fixed a few render bugs and also added UMB support to free enough memory to make the last remaining shareware level playable (E1M6).

 

Upcoming releases will all have to do with varying levels of EMS 4.0 and multitasking support, so this will be the last EMS 3.2 playable version. This isn't a problem for late 286 machines whose chipsets should support these features or machines running EMM386, but earlier 286es without advanced ISA memory cards or XT machines dependent on lotech EMS cards and other simple EMS boards won't be able to run later versions. Maybe a repro card will come around that makes this easier to do, oh well.

 

 

  • 2 weeks later...

I tried to run it in 86box in 8088 mode with an Everex EMS card but it says it needs 64 KB of UMB and won't run.

 

config.sys is pretty bare, just loading the EMS driver and FILES=30

 

Nothing is being loaded in autoexec.

 

DOS 6.22


this project seems impressive and all, i don't know that much about how DOS works, but are you aware that this project shares a name with one of Eric Harris' WADs? like i don't want anything messy or generally awkward to happen as this project moves forward.

22 minutes ago, shroomie said:

this project seems impressive and all, i don't know that much about how DOS works, but are you aware that this project shares a name with one of Eric Harris' WADs? like i don't want anything messy or generally awkward to happen as this project moves forward.

Considering how obscure that WAD is (I hadn't even heard of it until now), especially since it was apparently never even released, I highly doubt anyone will confuse the two. This project gets its name from Real Mode, a CPU operating mode which is limited to 640k of RAM and is what anything designed to run on an 8088 runs in.

1 minute ago, Plerb said:

This project gets its name from Real Mode, a CPU operating mode which is limited to 640k of RAM and is what anything designed to run on an 8088 runs in.

i skimmed the original post, i know that much

1 hour ago, Mike Chambers said:

I tried to run it in 86box in 8088 mode with an Everex EMS card but it says it needs 64 KB of UMB and won't run.

 

config.sys is pretty bare, just loading the EMS driver and FILES=30

  

Nothing is being loaded in autoexec.

 

DOS 6.22

 

The 0.11 release needs UMBs - try the 0.10 release which didn't use them (basically the only difference between the two). You will probably have to make sure your config has as much free memory as absolutely possible - it's definitely easier if you have UMBs and DOSMAX or something like that available.

  • 2 weeks later...
On 12/23/2023 at 7:53 AM, Plerb said:

since it was apparently never even released

I'm guessing at least someone has it but they don't fess up because of probably the chance of being accused as an accomplice or the shame of it

You're telling me no one sent an email to him?

 

That being said, it's a good thing that name is being used for something better


Happy new year! Time for a big post on technicals and progress since it's been a while.

 

A lot of big work has been going into RealDOOM recently, mostly having to do with EMS 4.0 rework. It's somewhat feature complete, with a few big nasty bugs I have to track down, but I'm sure a meaningful release isn't too far away. I went ahead and created a diagram of how the previous versions work versus how the current version works. It's not 100% exact but it should get the point across. Diagrams are pretty big and hidden behind spoiler text.

 

Here is how RealDOOM up till version 0.11 has handled extra memory using the 64 KB EMS page frame. Compare this with vanilla doom, where all this extra data is just freely accessible in memory at any time. Things had to be paged in to the EMS page frame, and a cache system using an LRU (least-recently-used) algorithm replaced the most stale data with newer data as required. Any time new data was needed, it'd just get allocated to the next spot it fit in the logical EMS pages. This page frame system is basically the way EMS worked up until EMS version 3.2, and it was one of the main ways in which the 640k memory limit on IBM PCs and DOS was overcome.
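As a rough illustration of the page frame plus LRU idea (not RealDOOM's actual code), here is a toy cache over the four 16 KB slots of a 64 KB page frame. The hardware map call is only simulated, and all names and the timestamp-based eviction are assumptions made for the sketch.

```c
#include <stdint.h>

#define NUM_SLOTS 4        /* a 64 KB page frame holds four 16 KB pages */
#define NO_PAGE   (-1)

static int      slot_page[NUM_SLOTS] = { NO_PAGE, NO_PAGE, NO_PAGE, NO_PAGE };
static uint32_t slot_stamp[NUM_SLOTS];   /* last-use time per slot */
static uint32_t clock_now;
static uint32_t swap_count;              /* cache misses that needed a swap */

/* Make a logical EMS page visible and return the slot it lives in.
   A hit only refreshes the timestamp; a miss evicts the slot that was
   used least recently (empty slots have stamp 0, so they go first). */
static int access_page(int logical_page)
{
    int i, victim = 0;

    clock_now++;
    for (i = 0; i < NUM_SLOTS; i++) {
        if (slot_page[i] == logical_page) {
            slot_stamp[i] = clock_now;
            return i;
        }
    }
    for (i = 1; i < NUM_SLOTS; i++) {
        if (slot_stamp[i] < slot_stamp[victim])
            victim = i;
    }
    slot_page[victim]  = logical_page;   /* a real build would call the EMS driver here */
    slot_stamp[victim] = clock_now;
    swap_count++;
    return victim;
}
```

Every access pays for the lookup and the timestamp update even on a hit, which is the per-access overhead the later EMS 4.0 design tries to avoid.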


 

[Diagram (MVjvTO7.png): RealDOOM memory handling up to version 0.11, using the 64 KB EMS page frame]


 

 

 

 

EMS 4.0 eventually came along and made multi-tasking operating systems possible by allowing a big portion of main memory to be swapped, similar to the page frame. Multitasking operating systems would page out entire programs and large regions of memory (384 KB worth, the region from 256KB-640KB) at a time. Using this strategy in RealDOOM, it's no longer necessary for each variable to be paged in and out individually. Instead, we can have a fairly small number of memory setups we switch between.

 

[Diagram (qd3tzCu.png): RealDOOM memory handling with the EMS 4.0 rework]

 

 

In the old system, I had a bunch of code that had to figure out what data to page out to make room to page more data in, and this code originally ran thousands of times per game tic, but I got it down to several dozen eventually. In the last release (0.11), I think there were about 60,000 cache misses resulting in page swaps (out of 800,000 accesses to the page frame, a miss rate of about 7.5%) to run timedemo 3 on screenblocks 5 and high quality. Each of these cache misses was a relatively slow operation, mostly due to all the LRU updating and overhead. The EMS page swap isn't exactly fast, but it's not so terrible to do on its own either.

 

There are a few other cool aspects to this system - I am loading data for trig tables, strings, mobj states, and stuff like that dynamically from data files at runtime and placing them in memory. In fact, every variable in the 256-640 KB range is being placed at exact locations I determine at runtime. So once ASM code starts to get written, I can load that ASM code dynamically too, and then modify it in memory to use exact data addresses and immediates instead of loading pointers from variables, which should produce faster code. I can also place relevant data into the same segments. That ASM code can also be paged in and out as necessary. And now there's also the fact that there's very little static data and a lot of dynamically loaded data, leading to a very small binary. The binary is under 256k, and a lot of that is overlaid startup code. WLink reports the conventional memory usage is at under 190,000 bytes right now. Most of that is code - there is less than 64k of data in the binary now, which means additional memory models have become possible. Which may also mean another compiler?

 

Now in the new system, for the most part all the page swaps are being done in a predetermined fashion - we know what memory we want active in a given part of the runtime and call functions with preloaded arguments for the page swap interrupts. Where there used to be about 60,000 cache misses + page swaps on 800,000 calls to the EMS memory access code, there are now 50,000 page swaps in total that do not involve any sort of LRU maintenance. There is one exception, which basically has to do with all the texture stuff. Even in shareware doom, many levels need 500-750 KB of textures each, which won't fit in memory at once. So some space is allocated to a texture cache and we do run an LRU algorithm in there that manages the caches for patches, flats, composite textures, and sprites.
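A simplified sketch of what "predetermined" page swaps could look like: one fixed mapping table per phase, applied in a single call with no LRU bookkeeping. The phase names, page numbers and the early-out guard are assumptions for illustration, and the hardware call is simulated; on a real EMS 4.0 system the table would be handed to the driver's Map/Unmap Multiple Handle Pages function (INT 67h, AH=50h).

```c
#include <stdint.h>

typedef struct {
    uint16_t logical_page;    /* page within the EMS handle */
    uint16_t physical_page;   /* slot in mappable memory */
} page_map_t;

enum { PHASE_NONE = -1, PHASE_PHYSICS, PHASE_RENDER, NUM_PHASES };
#define PAGES_PER_PHASE 4     /* hypothetical; real layouts cover far more pages */

/* Fixed at build time: which logical pages each phase wants visible. */
static const page_map_t phase_table[NUM_PHASES][PAGES_PER_PHASE] = {
    { { 0, 0 }, { 1, 1 }, { 2, 2 }, { 3, 3 } },      /* PHASE_PHYSICS */
    { { 8, 0 }, { 9, 1 }, { 10, 2 }, { 11, 3 } }     /* PHASE_RENDER  */
};

static uint16_t      mapped[PAGES_PER_PHASE];  /* simulated hardware state */
static int           active_phase = PHASE_NONE;
static unsigned long swap_calls;

/* Switch to a phase's memory layout. There is no search and no eviction
   decision: the arguments are already sitting in the table. The early
   return is an optional guard added for this sketch. */
static void enter_phase(int phase)
{
    int i;

    if (phase == active_phase)
        return;
    for (i = 0; i < PAGES_PER_PHASE; i++)
        mapped[phase_table[phase][i].physical_page] =
            phase_table[phase][i].logical_page;
    active_phase = phase;
    swap_calls++;
}
```

The cost per swap is then just the driver call itself, which matches the point above that the swap on its own is not the expensive part.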

 

The current architecture has some empty space and room to grow, and I'm pretty sure it will be able to handle the biggest levels in DOOM 1 and DOOM 2. I have also completely removed any use of the page frame - so technically I have another 64 KB I am not using for anything. I am sort of imagining this will eventually be used for sound data, but maybe I will allocate it as extra texture cache in the meanwhile.


There's been a noticeable performance improvement - sort of. Funnily enough the performance improvement is more noticeable on faster machines. Pentiums are 5-10% faster and it's like 1% for slower machines like 286-386. I think that just boils down to the fact that pentiums do not waste a lot of time doing things like multiplications and divisions while 286 processors do, and there's not a ton of slow mult/div calls going on in this memory management code. So pentiums were wasting a comparatively higher amount of their cycles in this code.

 

Aside from this, I also got screen wipes working again, and the finale screen is working. I have to re-add dynamic visplanes, which were recently removed, but 60 is fine for all the timedemos so I haven't bothered to fix this yet. I'm also upping the memory requirement from 2 MB of EMS to 4 MB, which hopefully will stay there even for doom 2. Partly out of laziness I have not implemented cache eviction for the texture cache, which made the memory numbers balloon a bit. Texture cache clears after a level of course, but I don't bother clearing unused textures to save space mid-level.

 

I will probably do a release for version 0.15 with these EMS 4.0 features once bugfixing is complete.

 

The high level roadmap going forward:

 

Version 0.16 will mostly revolve around a medium memory model build. I may investigate another compiler, but openwatcom overlays save me 30KB right now...

Version 0.20 will mostly revolve around savegame support and commercial DOOM 1 support.

Version 0.21 will mostly revolve around commercial DOOM 2 support.

 

Heavy optimizations and ASM work will begin after that with some small tasks first, but eventually a full handwritten ASM rewrite of the render codepath (RenderPlayerView) being the goal.

 

The ASM stuff sounds like a lot of fun and I can't wait to get to it. I have a lot of ideas . . .

 


Wow, you've done a lot! Good to hear it's going so well.


Version 0.15 is now released. You can read the release notes for more details - a few features were re-implemented or fixed, but mostly it was an architectural rework. It makes doom1 and doom2 support much more straightforward, which should come up soon.

 

- EMS 4.0 support is mandatory of course now. I think you technically need 2.8 MB or so, but let's just say 4 MB going forward. I don't think DOOM 1 will require more. DOOM 2, maybe not either.

- The requirements on memory and UMBs are pretty tight. You need about 90 KB of UMB space in addition to the EMS page frame, so basically C800-EFFF all needs to be free. Until commercial doom1/doom2 support is added and I know exactly how much memory I need for certain fields, I'm going to leave things this way.

- You need about 604k or so free in conventional space: the full 384 KB region between 256K-640K, then about 220k free below that to fit the binary, stack, etc. This isn't a big deal on EMM386 machines, but it's tighter than before on a 16-bit machine with an EMS driver in low memory.

- There is a 286-optimized binary now - it's about 6-7k smaller than the 8088 binary. I'm not sure, it might be just because of shift instructions (the 8088 can only shift by an immediate count of one bit at a time, which is funny).

 

The conventional memory requirements should become less tight with 0.16, as the goal of that is a medium memory model compilation, which should naturally result in smaller code.

 

EDIT: I quickly tossed together a medium memory model build, which of course isn't really bugfixed or running right (will take some time, it seems I need to write far versions of fread/fwrite among other things) but immediately there is another 12 KB of memory savings in the binary. If a decent amount of memory frees up, I may be able to move the more critical items into the main data segment to speed things up down the road.

Edited by sqpat

On 1/7/2024 at 12:18 AM, sqpat said:

The ASM stuff sounds like a lot of fun and I can't wait to get to it. I have a lot of ideas . . .

 

I use some assembly in Doom8088, but it doesn't speed up the game that much compared to the gcc-ia16 generated code.

 

In most cases FixedDiv(a, b) can be replaced by FixedMul(a, FixedDiv(0x10000, b)) or actually FixedMul(a, 0xffffffff / b). (Except for P_InterceptVector() which needs FixedDiv() to keep demo 3 in sync.)

To calculate the reciprocal of a fixed_t value gcc-ia16 generates generic code that divides two int32_t values. To calculate the reciprocal one of the input values is always 0xffffffff, so I made my own reciprocal specific assembly code.

 

I got a speed boost of 4% by programming the loop in R_DrawColumn() in assembly. Then I noticed it could be faster if the segment registers could be set once at the start of the loop, instead of switching during every iteration between pointing to the texture source, video memory destination and the colormap. So I put the colormap in near memory. This change also sped up the C code variant, so now the assembly version is only 0.8% faster. :|

1 hour ago, Frenkel said:

In most cases FixedDiv(a, b) can be replaced by FixedMul(a, FixedDiv(0x10000, b)) or actually FixedMul(a, 0xffffffff / b). (Except for P_InterceptVector() which needs FixedDiv() to keep demo 3 in sync.)

 

I'm guessing this is a good approximation but not necessarily 100% accurate? There's a number of these 'good enough but not perfect' types of optimizations. (For example, all the hassle with sine table special cases for off-by-one errors). For now I'm leaving them out, but once the project is further along, I can imagine making a branch with a number of changes that break timedemo compatibility but aren't noticeable in normal gameplay, saving a bunch of memory and speeding things up. But maybe at the same time there is a way to do the FixedDiv thing in a lossless way that does not break timedemos?

 

Quote

I got a speed boost of 4% by programming the loop in R_DrawColumn() in assembly. Then I noticed it could be faster if the segment registers could be set once at the start of the loop, instead of switching during every iteration between pointing to the texture source, video memory destination and the colormap. So I put the colormap in near memory. This change also sped up the C code variant, so now the assembly version is only 0.8% faster. :|

 

Yes, I don't want to do too much assembly optimization too early myself because I think a better compiler and optimizer might do most of the work. But I think there is a lot of savings to be done with data locality within segments.

 

I also really want to investigate some other crazy ideas - for instance hacking the data segment at startup. For example, if we know all the initial values for the variables in the data segment at compile time, we can output that to a file and then say "let's set DS to 0x4000" - which is a pageable EMS region. And we load the default variable values into that segment from the file - but now it's in a pageable EMS segment and that wouldn't only address 64k of memory, but maybe 128k, or 192k of effective memory depending on paging and what code was running at a particular time. I have already done some things with far variables such as (not an exact example, but for illustrative purposes)

 

#define thinker_list ((mobj_t far *)  0x90000000)

 

This is possible because at run time I am placing this variable at that memory address, and paging the necessary EMS pages in whenever it needs to be accessed. However, if we change the location of this variable to segment 0x4000 offset 0x4000 maybe we could do:

 

#define thinker_list ((mobj_t near *)  0x4000)

 

Now, thinkers might be really useful in P_Ticker but not in D_Drawer, so in that case we page out the thinkers from there and instead put seg_t, sector_t, line_t, etc. in that region. Then we could at the same time have alternative __far versions of these defines, pointing to a different address range, and page these variables to those areas in regions of code where they are needed but less frequently accessed.
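For readers less used to real-mode addressing, a quick sketch of how the two defines above map to physical memory. A 32-bit far pointer value holds the segment in its high word and the offset in its low word, and the CPU forms the address as segment * 16 + offset. The helper names are made up for the example, and the code is plain portable C rather than anything compiler-specific.

```c
#include <stdint.h>

/* Real-mode address formation: segment shifted left 4 bits plus offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Split a 32-bit far pointer value into its two halves. */
static uint16_t far_segment(uint32_t far_bits) { return (uint16_t)(far_bits >> 16); }
static uint16_t far_offset(uint32_t far_bits)  { return (uint16_t)(far_bits & 0xFFFFu); }
```

So the far value 0x90000000 is segment 0x9000, offset 0, which is linear address 0x90000 (576 KB). The near version at offset 0x4000 with DS set to 0x4000 lands at linear 0x44000 (272 KB), inside the 256K-640K range that can be remapped. The near form needs only the 16-bit offset because the segment is implied by DS.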

 

This whole idea might require a custom compiler solution, but that's not out of the question either considering the toolsets are open source.

Edited by sqpat


I managed to save another 15-20k in conventional memory again today, by pulling the lumpinfo data (basically the WAD directory) into EMS pages and only pulling it in dynamically when necessary. In order to make this work I had to save a more accurate picture of what was currently in memory at a given time so I could page certain areas of memory back and forth like a stack. This was really necessary because with DOOM1/DOOM2 looming, this structure would grow from 16k or so to 25 or 30k or more. In practice it only increased EMS page swaps by a couple percent and speed wasn't affected.
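One way to picture "paging certain areas of memory back and forth like a stack" is a save/restore pair around any code that needs the lumpinfo pages. This is only a guess at the shape of it: the names, page numbers and stack depth are invented, and the map call is simulated.

```c
#define NUM_PHYS_PAGES 4   /* hypothetical: pages tracked for one region */
#define MAX_SAVED      8   /* hypothetical nesting limit */

typedef struct { int logical[NUM_PHYS_PAGES]; } page_state_t;

static page_state_t current_state;      /* what is mapped right now */
static page_state_t saved[MAX_SAVED];   /* earlier states, newest on top */
static int saved_top;
static int swap_calls;

/* Illustrative layouts only. */
static const page_state_t STATE_GAMEPLAY = { { 10, 11, 12, 13 } };
static const page_state_t STATE_LUMPINFO = { { 40, 41, 12, 13 } };

/* Stand-in for the real EMS map call. */
static void map_state(const page_state_t *want)
{
    current_state = *want;
    swap_calls++;
}

/* Remember what is mapped, then bring in the pages the callee needs. */
static int push_and_map(const page_state_t *want)
{
    if (saved_top >= MAX_SAVED)
        return -1;
    saved[saved_top++] = current_state;
    map_state(want);
    return 0;
}

/* Put back whatever was mapped before the matching push. */
static int pop_and_restore(void)
{
    if (saved_top <= 0)
        return -1;
    map_state(&saved[--saved_top]);
    return 0;
}
```

A WAD lookup would then be wrapped as `push_and_map(&STATE_LUMPINFO)`, the lookup itself, then `pop_and_restore()`, so the caller does not need to know what was mapped before.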

 

I will extend this idea of having a clearer picture of what's in memory at a given time, and see if I can reduce some instances where I page something that is already in memory, reducing some delay. (But I sort of have a feeling I'm not doing it too much.)

 

Additionally I'm going to keep hammering at the DGROUP/data segment, which now stands at about 20,000 bytes (including 3000 bytes allocated to stack). Moving DS to 0x4000 will be much easier if the actual near data fits in just one EMS page and I can dynamically move things in and out of the other three pages of the 0x4000 segment freely.

 

Some other crazy ideas:

 - openwatcom overlays have proven to be buggy, so that's like 40k of memory savings I have lost by taking them out. Basically, there's just a whole bunch of initialization code that runs once and not again (or other things like credits code, intermission code, etc. that in theory could be paged in and out). Anyway, if I can't use overlays, then what I'll try is just packing those game initialization functions near each other in the game binary, then at runtime once initialization is done, just zero out that memory region and use it for something else.

 - move sine tables back into low memory (< 0x4000, but not near allocations) if I can free up 48k down there, which doesn't look too hard. I'm already around 25-30k free and medium memory model already gives me 7k more. This will free up almost another 48k in EMS pageable memory, which can easily be used for more texture cache.

 - move a few more static data fields to files - things like sprnames, switchlists, animdefs... these add up to a few KB of the 20 KB I have allocated, and really they can be in files and loaded into temporary variables instead of always being in memory.

 

Otherwise, the medium memory model builds, but fails during certain file opens. I had to write a far version of fread, which works fine, but my far version of read is not working. I may be better off just replacing open/read/write with fopen/fread/fwrite anyway, but I have to examine its use more closely - I think maybe it is keeping the file handle to reopen the wad file over and over so it's a little more complicated.

 


I tried idea 1 above (zeroing out init code memory) and it worked fine, after I made sure to remove only functions that were truly "game setup" and not "level setup". This creates an empty area of about 13k. I haven't figured out what to put in there yet. While this is a low memory region it's still a far data segment. Maybe I can use it to reduce UMB usage.

 

However, I tried moving a few other regions (idea 3 above) into files and loading them, and it seems that because of the way the initialization code is ordered... once I do an EMS pagination involving segment 0x4000 (0x4400 is fine) all later fopens fail. It's not even a problem with fread or fwrite - the fopen itself fails. I think this is because far malloc returns an address barely in the 0x4000 segment area, and something internally must be using farmalloc when I call fopen in a large memory model, so I need to reduce memory usage a bit more. I think what's going on is, while my binary has like 190k or so used and 16k or so is from the data segment - the near data segment (at the end of the binary) is considered to extend to 64k, so the first far segment address for a far malloc is going to be after the near data segment is extended to its maximum, which pushes into that 0x4000 region. Maybe after I cut memory usage by several thousand more bytes, this problem will disappear. But I've actually had a lot of these weird errors pop up (including when I use overlays) where everything just breaks after I EMS paginate that memory area. I may have to look into the openwatcom source to understand things more.
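The segment arithmetic behind that guess can be sketched as below. This only models the theory in the post (code, then a near data segment padded out to a full 64k, then far allocations); it is not taken from the Open Watcom runtime, and the function names and the load segment used in the example are assumptions.

```c
#include <stdint.h>

/* Bytes -> 16-byte paragraphs, rounded up. */
static uint16_t paragraphs(uint32_t bytes)
{
    return (uint16_t)((bytes + 15u) >> 4);
}

/* Hypothetical estimate of the first segment a far allocation can land in:
   load segment + code paragraphs + a full 64 KB (0x1000 paragraphs)
   reserved for the near data segment. */
uint16_t first_far_segment(uint16_t load_seg, uint32_t code_bytes)
{
    return (uint16_t)(load_seg + paragraphs(code_bytes) + 0x1000u);
}

/* Nonzero if the segment falls at or above the 0x4000 area that is
   being used for EMS page swapping. */
int reaches_ems_window(uint16_t seg)
{
    return seg >= 0x4000u;
}
```

With roughly 174k of code (190k minus the 16k of data) and a made-up load segment of 0x0800, `first_far_segment(0x0800, 178176)` comes out at 0x4380, which is inside the problem area. The real load segment depends on what else is resident in DOS, so treat the number as illustrative only.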


Progress on the medium memory model stuff has felt a little slow. Some big bugs have been fixed, but I keep finding out weird new details (don't use alloca, as it sometimes crashes even when there's stack space left).

 

I took a break from that and converted automap code to 16 bits of precision the past couple days. The automap code is terrible to read, like a few other spots (menu and hud code among others) so first I took some time to rename everything and figure it all out. Anyway, 32 bit arithmetic code results in a lot of extra instructions on 16 bit cpus, and automap was filled with this stuff. I whittled down the am_map code from around 8kb to around 5kb, which is a pretty healthy reduction in size. (Maybe this can help out in Doom8088?)

 

I'm going to sort of continue down this path and change all the x/y/z fields on mobj_t, etc from int32_t (or fixed_t) to fixed_t_union which lets me access high or low bits as a 16 bit value. There's a lot of random places in the code where that's all you need, and avoiding a shift or 32 bit math and just accessing the 16 bit portion will reduce code size in a lot of spots.
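A minimal sketch of what such a union can look like. The `fixed_t_union` name is from the post, but the member names (`w`, `h.intbits`, `h.fracbits`) and the helper are guesses for illustration, not necessarily what the codebase uses. The half order shown is the little-endian x86 one; the `#if` only exists so the sketch also behaves on a big-endian host.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point, as in vanilla DOOM */

/* Sketch of a union that exposes the two 16-bit halves of a fixed_t.
   Member names are hypothetical. */
typedef union {
    fixed_t w;                 /* whole 32-bit value */
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        int16_t  intbits;      /* integer part (high word) */
        uint16_t fracbits;     /* fractional part (low word) */
#else
        uint16_t fracbits;     /* fractional part (low word) */
        int16_t  intbits;      /* integer part (high word) */
#endif
    } h;
} fixed_t_union;

/* Integer part via a plain 16-bit load instead of a 32-bit shift. */
static int16_t fixed_int(fixed_t_union v)
{
    return v.h.intbits;
}
```

On a 16-bit CPU, `x >> 16` on a 32-bit value compiles to a multi-instruction shift or a helper call, while reading `h.intbits` is a single word load, which is where the code size savings would come from.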

 

I've already reduced the conventional memory usage by 25k (over 10%) since the last release 10 days ago, which is pretty wild. There's another 3-4k worth of easy savings pending a certain bugfix. Medium memory model will be another 10k on top of that once that's done, and I have some other crazy ideas. There's also the 12k or so of space in the zeroed-out initialization code which I am not using. I think I'll be able to get sine tables into lower memory after all and free up a whole bunch of EMS region space for texture caching - or maybe I will just reduce UMB requirements. I'd like to get that under 64k rather than the current 90k or so.

 


OK, that was sudden. Everything was so buggy a few hours ago and I was really bummed out thinking it would take over a week to fix everything. A couple hours later and the medium memory model build is mostly working - all demos run fine. It just came down to finding a few pointers not marked far, a few memcpys that needed to be far, and a weird case of pointer math that worked fine on large but not medium. Things aren't perfect yet - there are some bugs I need to fish out (level intermission crashes... maybe some other bugs will come up).

 

So, this lowered code size by 6000 and increased speed by ~1%, which doesn't sound like a ton, but there's more to that story... right after I made the last post I discovered the "-zdp" flag in wcc, which I believe makes a large memory model program keep DS fixed. This is default behavior for medium memory model (as opposed to -zdf, I believe). I tried this flag, and it lowered code size by 2k and increased speed by 5%! So the previously mentioned gains are on top of these, but all this time there was this big performance gain just sitting there all these months.

 

Anyway, I will work to fish out the remaining bugs in medium, then try out gcc-ia16, which everyone says should be quite fast. I'm guessing it will take a little while to get that working though.

 

 

 


For funsies, I tried the 0.15 version in DOSBox (86box would probably fit better, but bleh), and when I pressed Escape it started spewing garbage columns onto the screen kind of like the melt effect, but without any actual melting. I accidentally left it running for a while and at some point it just white-screened.

[Attached DOSBox screenshot]

17 minutes ago, realjohnmadden said:

For funsies, I tried the 0.15 version in DOSBox (86box would probably fit better, but bleh), and when I pressed Escape it started spewing garbage columns onto the screen kind of like the melt effect, but without any actual melting. I accidentally left it running for a while and at some point it just white-screened.

 

That's funny. Yeah, DOSBox stopped running the game once I began using EMS 4.0 features (v0.11). DOSBox doesn't support mapping 0x4000-0xA000 at all, only the page frame. A proper machine or emulator with EMM386 will handle it fine. So in DOSBox, basically whatever got written to that address last will be there. There was a while there when it would still make it to the title screen. Maybe some "LIMulator" could make it work in DOSBox, albeit very slowly.
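To put a number on how much address space that is, here is a small sketch of the arithmetic. EMS pages are 16 KB and a segment value counts 16-byte paragraphs; the helper names are made up for illustration.

```c
#include <stdint.h>

#define EMS_PAGE_BYTES 16384u   /* EMS pages are 16 KB */

/* Number of whole 16 KB pages between two segment addresses. */
unsigned ems_pages_between(uint16_t start_seg, uint16_t end_seg)
{
    uint32_t bytes = ((uint32_t)end_seg - start_seg) * 16u;
    return (unsigned)(bytes / EMS_PAGE_BYTES);
}

/* Index of the 16 KB page containing a segment, counted from start_seg. */
unsigned ems_page_index(uint16_t start_seg, uint16_t seg)
{
    uint32_t offset = ((uint32_t)seg - start_seg) * 16u;
    return (unsigned)(offset / EMS_PAGE_BYTES);
}
```

`ems_pages_between(0x4000, 0xA000)` gives 24 pages (384 KB) of conventional memory that has to be mappable, compared with the 4 pages (64 KB) of a standard page frame.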


OK - I've gotten all the (known) medium memory bugfixes cleaned up on openwatcom. I started working on a gcc-ia-16 build and I've got a build but as expected it's not going to work right away.

 

1. The binary is way bigger than openwatcom's - 212KB or so instead of 170 KB or so (even with watcom optimized for time over size). This makes far mallocs reach into the 0x4000 segment range which spells trouble since I'm using that region for EMS page swapping. By adjusting build params and cutting out some code I can get this down to 190 KB or so, but it's not enough to get through R_Init/P_Init without issues. 

2. The memory usage in data is higher. This leads to data in a few places no longer fitting in their segments - see below, particularly how the render memory usage has grown in the 0x9000 and 0x8000 areas just a little bit so that it pushed over 65535 - but then also how the physics memory usage in 0x9000 has grown by over 8000 bytes.

 

[Two attached images: per-segment memory usage]

 

I would in general assume this is some stack/data/field alignment issue, but I'm not sure yet. I thought I had that all turned off. I'm still new to these compiler options and will have to work at it, but shuffling data into different segments in this model often leads to debugging and delays, which I want to avoid. I do want to word-align everything at some point, but not until way down the road, once I know doom2, etc. will run in the given memory space.
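To illustrate how field alignment alone can push an array past a segment boundary, here is a small sketch. The record is made up (it is not a real RealDOOM struct) and this is not a diagnosis of the gcc-ia-16 build, just the general effect: one byte of padding per record adds up fast when there are thousands of records.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, not a real RealDOOM struct. */
#pragma pack(push, 1)
typedef struct {
    uint8_t flags;
    int16_t height;
} rec_packed_t;                /* 3 bytes with byte packing */
#pragma pack(pop)

typedef struct {
    uint8_t flags;
    int16_t height;
} rec_aligned_t;               /* typically 4 bytes: 1 byte of padding */

/* Does an array of `count` records of `rec_size` bytes fit in one
   64 KB segment? */
int fits_in_segment(uint32_t count, uint32_t rec_size)
{
    return count * rec_size <= 65536ul;
}
```

With 20000 such records, the packed layout needs 60000 bytes and fits in a segment, while the padded layout needs 80000 bytes and does not. Comparing `sizeof` for a few of the larger structs between the two compilers would be one quick way to confirm or rule out this theory.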

 

I think the binary size issue will work itself out eventually. As some code starts to become hand-written, I will write that to a file and load that code at runtime via EMS pagination. This will naturally pull code out of the binary and make it smaller... I also don't know if I will be able to figure out a way to mix openwatcom and gcc-ia-16 object files. If I could, then there's a lot of code (namely the thinker code) which is big in code size, but doesn't run often. This could use the openwatcom compiler to make it smaller and slower, giving us way more memory but negligibly affecting performance. 

 

I might put the gcc-ia-16 stuff on the backburner, and prioritize getting doom 1 and 2 and remaining features working. I don't want to be in a back-and-forth of adding a feature then struggling to find memory again, over and over. One thing I have considered is, as I add back features like savegames and sound, to just do it in ASM from the beginning, so I can remove that code from the binary and load it elsewhere in memory so I'm not under so much pressure in that < 0x4000 region.

 

 

On 9/20/2023 at 10:12 PM, sqpat said:

RealDOOM is a port of vanilla DOOM (forked from PCDoomv2) made to run in Real Mode. (Coincidentally, Doom8088 was being worked on at the same time, with a similar goal but a different starting point.) It runs through the use of EMS to use memory beyond 640kb.
 

I've been working on this since June, and the project now sort-of runs in 16-bit mode. Timedemo 2 seems to work right, but demos 1 and 3 have desyncs or memory-related crashing bugs. It's kind of in an alpha state. It'll run pretty okay on 233 MHz hardware and up, using EMM386, whether on real hardware or on 86box. DOSBox isn't recommended; it seems to struggle with 16-bit applications or EMS. It will of course also run on 16-bit computers. You probably want close to 620 KB free and 3-4 MB of EMS minimum right now.

 

Basically, you can't really just take the original codebase and build for 16 bit for many reasons. Slowly, the code was rewritten with more and more 16-bit style restrictions, until it became possible to actually build the code with a 16-bit compiler. You can still build and run the code in 32-bit mode, and it will use a sort of EMS emulator, simulating what the 16 bit code is doing. The 32-bit codebase is a lot more stable than the 16-bit one, but eventually they should end up pretty equal.

 

The goal for RealDOOM is really to make the port run as fast as possible as a 16-bit executable with the same level of quality, etc. as the original game. It may turn out futile to try and get this to run at smooth speeds on 16 bit processors, but I'll try to take it as far as reasonably possible. Then once that's done, it can also always be forked and modified with some tradeoffs between quality and speed. I don't want to start making those tradeoffs earlier than necessary though.

 

I haven't made major efforts on optimizations yet as it took a few months just to get the game to work as a 16-bit executable at all. It only started working last week and I haven't worked out all the bugs yet. To be honest, I had hoped to clean up the 16-bit build a little bit more before posting this here - but I'm going on a month-long trip starting tomorrow and I might not have much time to work on this in the near future... oh well. 

 

Known (Major) Issues
- Savegames don't work
- No sound (need to find a 16 bit compatible library or write from scratch?)
- Untested outside of doom1 shareware for now.
- 16 bit mode has some desyncs and memory bugs, but it's almost there.

 

Work that has been done

 - Removal of some features (multiplayer, joystick...)

 - Lots of optimizations especially to lower conventional memory usage and size of the executable

 - Zone memory manager rewritten to use EMS, "MEMREFs" passed around between functions instead of pointers, lots and lots of code rewritten to support this.

 - Lots of changed types, explicitly declared bit sizes, etc. to make 16/32 bit both work off the same codebase.

 

 

RealDOOM on real hardware 286-20:

 

 

RealDOOM on 86box Pentium MMX 233

 

 

 

I want to shout out Viti95, who wrote FastDOOM of course, which I referenced for a lot of code removal and optimizations, and he personally contributed a couple of optimizations to RealDOOM as well.

Just watching the lag in this makes me wanna punch my computer. 

