
RealDOOM: DOOM Ported to 16-bit Real Mode


sqpat


I fixed enough bugs to get the gccia16 build to run a demo. The build seems to have a stack memory leak. It crashes after a bit of runtime - I eventually removed enough unused code to keep it from crashing for the duration of demo 3. (I'll need to print SP/BP to a file or something to track down where the leak is happening.)

 

In 86Box with my usual settings (high detail, screenblocks 5, timedemo 3):

 

286: ??? vs 30526 realtics. (I can't run the gccia16 build on a 286-25 - the EMS driver takes up too much extra memory. Maybe after I find the leak.)

386: 386DX-40, 10618 vs around 11713 realtics. Around 10% improvement.

486: DX2-66 VLB, 4112 vs 4569 realtics. Around 10% faster.

586: P133 ISA, 1598 vs 1608 realtics. About 1% faster (almost certainly bottlenecked on the ISA bus).

 

A 25% code size increase for a 10% speed increase is nice to have, but it's rough to fit this in memory in the near term. I haven't tested many of the optimization options yet; maybe it can do better. I want to get this wrapped up, so after I find the memory leak I'd like to cut a release, then move on to DOOM1/DOOM2 support. After those are done I'll have a better idea of how much memory I need in worst-case scenarios, and it'll be easier to make decisions on memory.

 

 

 

 

 


I have been busy with some other stuff (hardware projects, work, travel), but I really want to be done with this medium model transition and the gccia16 build. I've thought it over, and I won't prioritize that build too much right now. First, I'm kind of burnt out on hunting down memory bugs. Second, if it's only a 10% speed improvement now, and I'm eventually going to write all the performance-critical bits by hand anyway, that 10% difference will only continue to shrink. I think I need to keep the binary smaller than gccia16 will allow. Maybe it's something I can revisit later.

 

In other news, I took a crack at some drawing algorithm improvements and came up with some ideas. The drawcolumn algorithm will definitely fit in registers, and you never have to go back to memory except for the texture, colormap, and screen array lookups/writes. It's going to involve mixing a lot of data and code to make good use of CS to reference data. This should be fairly easy once this ASM is loaded at runtime, and since I'm already placing data wherever I want in memory, I know where data and code will be relative to each other.

 

Drawspan will "kind of" fit in registers. Basically, you can work the u, v coordinates at 16-bit precision, but you need to correct periodically. So the idea is: you run 16-32 iterations of the function in an unrolled loop off 16-bit u/v, then when you loop back to the start of the unrolled loop, you correct the accumulated error using 24-bit precision (you don't need 32-bit, I think). This might introduce some off-by-one pixels, but nothing actually noticeable. Going to memory can be avoided except when you correct the precision and pull the larger u/v steps out of memory. I've made a version of the algorithm in C, and in the process accidentally made RealDOOM run 8-12 percent faster (depending on the processor). The compiler is still something of an obstacle, and this can be done better handwritten. Flat color rendering would of course be much faster, but I'd like to optimize this path as much as possible as well. Credit for the correction idea goes to Maes in this topic.

 

I don't want to dwell too much on these improvements now since I still should get DOOM 1/2 support finished before I focus on performance.


I was able to create a fully unrolled version of R_DrawSpan for potato detail in FastDoom. It's quite fast compared to the original code, but I don't know if it could be of any help, as it takes quite a chunk of memory (it's also 32-bit code). For high and low detail modes I wasn't able to optimize as much, as there aren't enough registers available.

 

As for columns, you can take a look at Heretic/Hexen fully unrolled functions, those are faster compared to vanilla Doom (I also used them in FastDoom).

12 hours ago, viti95 said:

I was able to create a fully unrolled version of R_DrawSpan for potato detail in FastDoom. It's quite fast compared to the original code, but I don't know if it could be of any help, as it takes quite a chunk of memory (it's also 32-bit code). For high and low detail modes I wasn't able to optimize as much, as there aren't enough registers available.

 

As for columns, you can take a look at Heretic/Hexen fully unrolled functions, those are faster compared to vanilla Doom (I also used them in FastDoom).

 

I haven't pushed the commit yet, but drawspan's core loop is:

   

// set full precision
xfrac.w = basex = ds_xfrac + ds_xstep * prt;
yfrac.w = basey = ds_yfrac + ds_ystep * prt;
xfrac16.hu = xfrac.wu >> 8;
yfrac16.hu = yfrac.wu >> 8;

xadder = ds_xstep >> 6; 
yadder = ds_ystep >> 6;
while (countp >= 8) {

  spot = ((yfrac16.h >> 2)&(4032)) + (xfrac16.b.bytehigh & 63);
  dest[0] = ds_colormap[ds_source[spot]];
  xfrac16.hu += xadder;
  yfrac16.hu += yadder;
  ...
  spot = ((yfrac16.h >> 2)&(4032)) + (xfrac16.b.bytehigh & 63);
  dest[7] = ds_colormap[ds_source[spot]];
  xfrac16.hu += xadder;
  yfrac16.hu += yadder;

  countp -= 8;
  dest+=8;

  // reset full precision 
  xfrac.w += x32step;
  yfrac.w += y32step;
  xfrac16.hu = xfrac.wu >> 8;
  yfrac16.hu = yfrac.wu >> 8;
}

This should be faster with 12, 16, 24, or 32 iterations per pass, but for some reason the compiler ruins the unrolled inner loop, so it only works as a single-iteration loop, and performance drops as the unroll count gets bigger. The unrolling does help, of course, but working at 16-bit precision and avoiding 32-bit operations helps more, because 16-bit x86 is so limited on register bytes.

 

EDIT: I took a look at the FastDoom unrolled loop and I think I just don't have enough register room. There are also some other small details (shifts are very slow on 16-bit).

 

Here is what I had for the drawspan loop. It mirrors the four repeated lines above.

 

; core drawspan loop, assume:
; ES = segment of dest
; DI = offset into screen/dest
; CS = code segment, contains ds_colormap
; BX = offset for the mov from the texture and the xlat for ds_colormap
; DS = segment pointing to the texture
;
; AH = holds 0x3F (63) for a faster AND
; AL = used for xlat
; CX = yfrac's MID 16 bits, pre-shifted right by two outside the loop
; DX = xfrac's LOW 16 bits
; BP = xstep's LOW 16 bits
; SI = ystep's MID 16 bits, pre-shifted right by two outside the loop

;;;;;;;;;;;;
; get 12 bit value directly into bx
0x00:  89 CB          mov   bx, cx
0x02:  81 E3 C0 0F    and   bx, 0xfc0 ; would be faster with register containing 0xfc0
0x06:  20 E6          and   dh, ah ; ah contains 0x3F (63), a little faster than using an immediate
0x08:  00 F3          add   bl, dh
; look up spot and store to screen. Spot is not 8 bit so we cant XLAT.
0x0a:  8A 07          mov   al, byte ptr [bx]
; replace FF with compile time constant for colormap's CS Offset
0x0c:  BB FF FF       mov   bx, 0xffff ; would be faster with register containing 0xffff
0x0f:  2E D7          xlatb byte ptr cs:[bx]
0x11:  AA             stosb byte ptr es:[di], al
;increment frac
0x12:  01 F1          add   cx, si
0x14:  01 EA          add   dx, bp

So you do that 8 or 16 times or whatever, then grab the 32-bit fields from memory, correct your 16-bit values, jump, and continue the loop.

 

If we did some sort of solution where we completely unrolled it 320 times and jumped to the appropriate spot in memory to run x iterations (is that what you mean by "fully unrolled"?), we would still need to correct every 32 or so iterations anyway. But it would only infrequently go to memory, when really necessary. We're pretty tapped out for register space, though.

 

 

Edited by sqpat
Fixing bug in c code and adding asm


For good measure, here is the basic idea for drawcolumn. This one I'm pretty happy with. There are some segment offset shenanigans going on, and I can't really "fake it" to test this out until the project is further along.

 

BX = offset for where it needs to be in CS for the colormap and in DS for dc_source[0].
DS is calculated pre-loop based on the known BX and CS values. Textures in the texture cache are guaranteed 256-byte aligned, so calculating the segment should not be an issue.
CS is obviously code, but will also contain the colormaps in its segment (easy - use EMS to put them in the right segment at runtime).
ES contains DEST (screen 0, 0x8000, or whatever).
SI contains 79 (for the add - add from a register is one tick faster and one byte smaller).
DI is used for the stosb screen lookup.
DX contains bits 0-15 of precision for frac.
BP contains bits 0-15 of precision for fracstep.
CH contains bits 16-23 of precision for frac.
CL contains bits 16-23 of precision for fracstep.
AH contains 127 for a faster AND.

Count and remaining count lookup probably stored in BP or something.
;;;;;

0x00:  88 E8    mov   al, ch // bits 16-23 of frac
0x02:  20 E0    and   al, ah // ah contains 127
0x04:  D7       xlatb byte ptr ds:[bx]
0x05:  2E D7    xlatb byte ptr cs:[bx]
0x07:  AA       stosb byte ptr es:[di], al
0x08:  01 F7    add   di, si ; calculate next dest, si is 79
0x0a:  01 EA    add   dx, bp
0x0c:  10 CD    adc   ch, cl ; carryover bit from 16th bit 

 

This can be repeated over and over without any precision correction, unlike drawspan. Keeping the prefetch queue filled is pretty important even on a 286, and this should never stall for prefetch even at 0 ws, because the instructions are so compact. I really like the back-to-back xlatb into the stosb: two memory reads and one memory write in 4 bytes.

Edited by sqpat
change code formatting


Goofy question, but can't you somewhat increase performance by using the Jaguar maps instead of the PC ones?

1 hour ago, fishy said:

Goofy question, but can't you somewhat increase performance by using the Jaguar maps instead of the PC ones?

 

No idea what's special about the Jaguar maps, but first, I'm not going to change the original WADs at all for this project, and second, I don't think map composition affects performance significantly anyway.

 

1 hour ago, sqpat said:

 

No idea what is special about the jaguar maps, but first I'm not going to change the original wads at all for this project, and secondly, I don't think map composition affects performance significantly anyway.

 

They have significantly simplified geometry (and some maps are removed altogether), but as you say it's not within the scope of this project.

Edited by Individualised

  • 2 weeks later...

I haven't done a whole lot of work on RealDOOM this month yet - I was working on some hardware projects. I have a 286 system I've run as fast as 36.9 MHz before (with a Peltier cooler), and I've overclocked some video cards as fast as 35 MHz on the ISA bus. I hope to run a 286 at 40 MHz with 0 wait states at some point this year, if I can get some better CPUs and memory - and hopefully RealDOOM is ready by then. Unfortunately, the board's BIOS and EMS driver are kind of busted, so I will probably have to hack the BIOS and rewrite its EMS driver to make this possible, which is not something I'm excited about. As things are now, I can't run RealDOOM on that machine even though the chipset supports EMS.

 

Anyway, I decided to skip the gccia16 release and move on. I started the first bit of work on DOOM 1/2 support by mapping out memory locations for sectors, lines, and so on, based on worst-case maximum values for those fields across DOOM 1/2 levels. I looked into TNT/Plutonia, but those levels' worst-case sizes are about twice as big, so there's no near-term plan to work on them. I had mostly planned this part ahead of time, though, and there's plenty of memory available to make it possible.

 

While I had planned for the increase in level data size, one thing I didn't expect was that the texture lump/column offset lookup tables balloon from 22k each in shareware DOOM to 80k each in DOOM2. I can't possibly keep that 160k in memory all at once. However, the lump lookup table is very, very repetitive, and if I use RLE compression it should shrink by 90% or more; maybe I can fit these in memory if the total comes to around 90k.

 

There are some lingering issues with something colliding with my own memory allocations - it's probably the standard library's fopen/fread, as the bug always triggers with fread. It's making it difficult to use anything in certain memory areas; there is probably about 100k of total memory going unused because of this. I'd really like to move the sine tables into the 0x3000 segment area to clear up more EMS memory for the aforementioned texture lookup tables. At some point I am probably going to have to write my own file handling functions so that nothing internally calls MS-DOS for malloc. That way I control all the memory allocation going on, and the OS and RealDOOM don't step on each other's toes.

 

DOOM 1/2 semi-playable... in a couple weeks, hopefully.

On 1/29/2024 at 9:56 PM, sqpat said:

25% code size increase for 10% speed increase is nice to have

 

Doom8088 got a nice size improvement by compiling with -mnewlib-nano-stdio. When optimized for size the Watcom executable is about 5 kB smaller than the GCC executable.

3 minutes ago, Frenkel said:

 

Doom8088 got a nice size improvement by compiling with -mnewlib-nano-stdio. When optimized for size the Watcom executable is about 5 kB smaller than the GCC executable.

 

Neat... I'm going to have to give that a try. I assumed I'd get faster code with some other options, but not smaller executables. As more and more critical code moves to hand-written assembly, the compiler speed differences will matter less and less, so smaller code feels like it means more.

 

Currently I'm continuing work on the RLE texture lumps... it seems to drop the 80k column lumps to 3-5k in size, but there always seems to be a bug with either normal textures or composite textures. It feels like I'm learning more about a few parts of the codebase I'd never worked with along the way, though.


Okay - the RLE implementation of column lumps is done. For shareware DOOM it cut memory usage for that data field from 21-22k to under 500 bytes. DOOM2 doesn't get far enough for me to calculate the exact savings, but it should be under 2000 bytes. It's a relatively small code change and doesn't really affect runtime speed.

 

I'm fiddling around with UMB allocations now. I think I'm going to skip dynamically allocating via DOS APIs, be a bad citizen, and just use those memory regions directly too. I will probably create one release config with the EMS page frame at D000 and another with it at E000, with the other 64k block used for UMBs. Then I can hardcode memory addresses and remove variable pointers in these ranges too. I also need some more memory beyond that 64k block; I'm currently using the C800-CFFF block, but ideally I'd like to use B000-BFFF instead (it seems to bug out if I use that during text-mode initialization). I know C800-CFFF often gets used for things like XT-IDE, which, considering the target platform, would be annoying to lose. Using all these UMBs will make machines with a 128k BIOS have trouble running this, but oh well. They can run regular DOOM, I guess...

 

I have managed to hardcode the sine/cosine/tangent tables to memory regions 0x35F0-0x3FFF or so, and nothing is crashing. I couldn't go 0x2000 lower to fit the tantoangle tables... so after DoomMain runs and before DoomLoop runs, I now basically do this:

 

extern angle_t __far* tantoangle;

// clears dead initialization code.
void Z_ClearDeadCode() {
	byte __far *startaddr =	(byte __far*)D_InitStrings;
	byte __far *endaddr =		(byte __far*)P_Init;

	//9320 bytes or so
	uint16_t size = endaddr - startaddr;
	FILE* fp;
	FAR_memset(startaddr, 0, size);
	
	tantoangle = (angle_t __far* )startaddr;
	fp = fopen("D_TANTOA.BIN", "rb");
	FAR_fread(tantoangle, 4, 2049, fp);
	fclose(fp);

}

 

lol... it works. This codebase is becoming a terrible pile of black magic. (Technically there are about another 1200 bytes free in there, but it's hard to find things that can be allocated this late.)

 



 

 

 

Getting there... a few texture bugs need to be cleaned up. I think I broke composite textures, shareware timedemo 3 is out of sync again, in DOOM2 the game eventually crashed, and commercial DOOM1 also doesn't work (something about loading the TEXTURE2 lump), but I've mostly made enough space for everything to fit in memory. I noticed some commercial content like the BFG is a little buggy too (only the actual 'bullet' does damage), and I think there will be a lot of bugs like that to track down in the commercial and DOOM2-specific content.

 

 

  • 3 weeks later...

In the week or so after the last post, I did some work on the graphics caches. Previously there was a 64k texture cache (shared by sprites and non-flat textures, including composites) and a 16k flat cache (all flats are 4k, so it fit 4 flats). There was a lot of thrashing in both, so I split the render phase into separate sub-phases with their own memory mappings. It turned out that some variables and fields are only needed during one or two of R_RenderBSPNode(), R_DrawPlanes(), or R_DrawMasked() - the three major phases of rendering. I divided these variables into EMS pages that can be paged out in chunks between phases, freeing up space for each sub-phase, and was able to create enough room for a 64k flat cache and a 64k sprite cache, independent of the original 64k texture cache - which reduced the amount of page remapping quite a bit. This also lowered worst-case conventional memory usage a little. Ultimately there was some speed gain, but it was fairly minimal (a couple percent). And there are some bugs when lots of sprites get used... I think something about sprite cache eviction is bugged, but I'm confident I'll figure that out at some point.

 

After that, I fixed the loading bugs for commercial DOOM1 and DOOM2, as well as a couple of the aforementioned bugs (BFG and timedemo 1). They're working now, but there are definitely still a variety of bugs to fix: for example, in some instances teleporters don't seem to work, one of the DOOM1 demos seems to load the wrong level, DOOM2 seems to crash on level change, and so on.

 

Anyway, I've been pretty busy the past two weeks or so, and will be pretty busy for the rest of March and unable to work on this much. April might see some good work done, and May for sure, I think. RealDOOM is probably more than 50% done at this point - getting DOOM 1/2 fully working doesn't feel too far off, and then, finally, I can have fun with optimizations and ASM hacking.


Thanks for the progress info. Amazing to see this happening bit by bit.

  • 4 weeks later...

Finally got back to the project yesterday and today. I managed to get sprite cache eviction working, and the code should be very similar for all the other texture cache eviction, so the rest will follow soon. The reason this feature is important is that the engine needs to be able to cycle graphical assets in and out of EMS correctly for these huge DOOM1/DOOM2 levels with more graphical variety. There was already a conventional memory cache with the EMS cache behind it, but evicting from EMS to be able to pull more data from the WAD was not implemented.

 

Once this is all working properly, I can reduce the amount of EMS memory dedicated to these graphics. Shareware DOOM ran fine with 2 MB of EMS (without eviction implemented), but I was already pushing past 3 MB for commercial DOOM - I was just growing the caches a lot to run levels without running out of cache space, though of course swapping too much has performance implications. I can later make build configs that reduce EMS/disk texture swapping by supporting a larger EMS cache, but it would be really cool to support a 2 MB or 3 MB build so more machines can run RealDOOM.

 

Anyway, it's a big relief to get this done and to feel confident about how well it's working, because I was really dreading this feature a couple months back and thought it would be much harder. It was one of the bigger remaining steps in the way of a DOOM 1/2 release. I'll have a lot of time in the next week and a half or so - I don't know if that's enough to finish, but I'll have a lot of time in May as well, so I think sometime in May I'll be done with commercial DOOM support and moving on to ASM improvements.


The aforementioned caches are implemented and working great. I've lowered the required EMS from 3 MB back to near 2 MB. I've also gotten rid of level preloading of graphics - honestly, it was way too aggressive in how much it tried to preallocate, and it would never fit in a small amount of EMS - and pulling that code saved 700 bytes or so, which isn't nothing.

 

Meanwhile... there's still a lot to polish before the next release, and the code is deep in the midst of development, so it can be hard from the outside to see progress or get everything running. So I took a video of some gameplay in DOOM2 MAP20, which really shows off a lot of the commercial enemies and levels.

 

 

 

 

I think my focus over the next few weeks will be fixing the timedemos in DOOM1/DOOM2 - I think only one of the six works. I'm sure fixing the timedemos will point me toward some bugs, but as you can see from the video, the game seems generally playable otherwise. There are then a couple of other things to implement - for example, a couple of DOOM2 levels have sidedef WAD fields bigger than 64k, which will need some special handling. I have to look into the commercial finale screens and some other things like that too. Once everything major is covered I may cut a release. I originally planned to include savegame support with the DOOM 1/2 support, but I think I'll do it afterwards, because I might prefer to write all that code in ASM and load it dynamically when needed, so the code doesn't take up precious conventional memory.

 

  • 2 weeks later...

Small progress update - I'm busy with other things again for a couple of weeks, but I made it most of the way to what will probably be the next release.

 

- Saved around 4-5 KB of conventional memory (the binary is under 170 KB now)

- Implemented commercial credits/finale screens (including bunny scroll)

- Fixed teleporter bug (was related to player not being the first mobj defined in the map)

- Implemented >64k sidedef loading (all levels in doom2 work now)

- Fixed some wad loading bugs (last flat in the wad was bugged)

 

I also moved some things around in memory. Once I set the DS register to a fixed value (0x3C00 or whatever) by modifying the Watcom source code, I can page different variables into 0x4000-0x4C00 via EMS to have more near variables. Thinkers live there for the physics code. During rendering, I can have something different in that segment - possibly something different for each phase of rendering: maybe nodes during BSP traversal, vissprites during sprite rendering, etc.

 

Surprisingly, even DOOM2 MAP30 seems to work fine, but boy is it slow when you look out toward the boss. It's very stuttery even with a fast K6-3. There must be some major slowdown related to the animated textures or to rendering walls that are very far away. I need to investigate whether I introduced a bottleneck there.

 

I still need to fix the DOOM 1/2 timedemo desyncs, I'd like to get EMS visplanes done, and I need to fix the readme screen. If I can figure out how to save a few more KB of space, that would be great too. I think this release will be ready in mid-May, and then I'll move on to some of the ASM tasks. I think there's an easy 30-50% speed gain there just on the render functions. Maybe a lot more.


It's really cool that it executes in 16 bits, but there were no AMD K6-IIIs back in the '80s. It seems like a lot of things happen in "parallel" each gametick; maybe you could set up some sort of queue, a staggered execution. Let's say you take 4 gameticks and perform:
 

1) BSP traversal and geometric transformations, then reuse the same results for the next 3 ticks;

2) Actor logic and behavior, but sped up 4 times so they don't fall behind, as in Rachael's slaughtermap performance booster for GZDoom;

3) Most probably, textures need to be approximated with one color so that old, slow computers render those parts more easily; floors and ceilings aren't rendered at all - instead those parts are transparent and background colors show through. Your engine would divide the screen horizontally in half, one part for floors and another for ceilings. Floors are always black and ceilings gray. The engine detects that there's slime or lava close to the player, or that the player stepped on it, and changes those "background" colors to green and red.

4) Skies are one color just like in FastDoom; however, the color changes from episode to episode, like E1 being gray, E2 pale red, E3 red, E4 orange, D2E1 brown, D2E2 orange-gray, D2E3 red, etc...

5) All monsters downscaled like in Doom8088, but 4 times, so they take even less memory.

6) Map geometry rendering slowed to half rate, so that only actors render at normal speed.

7) Transparency rendered in FastDoom's halftone fashion.

8) Not sure if a limited palette would help performance, like a 16-color mode.

3 hours ago, Darkcrafter07 said:

It's really cool that it executes in 16-bits but there were no AMD K6-III back in 80's. 

 

Doom didn't exist in the '80s (yet the 486 did), and also this project is not called Doom80's.

 

31 minutes ago, sqpat said:

 

Doom didn't exist in the 80s (yet the 486 did) and also this project is not called Doom80's.

 

Right, but the 486 came out in 1989, the last year of the '80s, and it was 32-bit just like the 386, which also came out in the second half of the decade. The 386 and 486 were the systems the original Doom was written for.

 

Is there a need then to make a 16-bit version of Doom for 32-bit systems to get them barely running?

 

As for 16-bit CPUs like the 286 and earlier: in order to get Doom running fast enough to call it a proper Doom experience, corners need to be cut, as the 286 isn't capable of delivering it in its full glory. That's why I proposed options to go with.


 

30 minutes ago, Darkcrafter07 said:

Is there a need then to make a 16-bit version of Doom for 32-bit systems to get them barely running?

 

I started this project because I wanted to, and I think it's awesome. That's really reason enough for me. 

 

The scope of the project is in the first post.

 

Quote

 

The goal for RealDOOM is really to make the port run as fast as possible as a 16-bit executable with the same level of quality, etc. as the original game. It may turn out futile to try and get this to run at smooth speeds on 16 bit processors, but I'll try to take it as far as reasonably possible. Then once that's done, it can also always be forked and modified with some tradeoffs between quality and speed. I don't want to start making those tradeoffs earlier than necessary though.

 

 

You probably haven't followed the project for the past year, and that's fine, so just to catch you up: ASM-level optimizations haven't started yet because the engine isn't far enough along. (It's not like you just reduce colors and the engine magically runs on a 16-bit processor.) Those bigger ASM optimizations should start soon, and I do think it's possible a (very, very) fast 286 will eventually be able to run the game at playable speeds without doing things like removing textures. As for right now, any Pentium, or maybe a fast 486, runs the game "okay". If the engine ever gets to be as fast as the original engine one day, I would consider that a huge success.

 

On 4/27/2024 at 5:56 PM, Darkcrafter07 said:

Right but 486 came out in 1989, the last year of 80's and it was a 32-bit one just like 386, which also came out the second half of the decade. 386 and 486 were the systems that original Doom was written for.

 

Is there a need then to make a 16-bit version of Doom for 32-bit systems to get them barely running?

 

As for 16-bit CPUs like 286 and earlier, in order to get Doom running fast enough to call it a proper Doom experience corners need to be cut as 286 isn't capable of delivering such in its full glory. That's why I proposed options to go with.

You really have a habit of making yourself look bad, no?

 

Why do projects like FastDoom exist? Or Helion? Or this?

 

Because people want to.

 

In any case, RealDoom answers another what-if: how much of Doom can we get onto a system that's far below spec?

 

It reminds me of GBADoom. But on PC. I want this to go the distance.

 

Also, if an Amiga 500 can get Dread/Grind, then the 286/386 can get Doom. Wolf3D also runs on an 8088, so why can't Doom also run on lower-spec hardware?

On 4/27/2024 at 7:44 PM, sqpat said:

 

 

I started this project because I wanted to, and I think it's awesome. That's really reason enough for me. 

 

The scope of the project is in the first post.

 

 

You probably haven't followed the project for the past year, and that's fine, so just to catch you up: ASM-level optimizations haven't started yet because the engine isn't far enough along. (It's not like you just reduce colors and the engine magically runs on a 16-bit processor.) Those bigger ASM optimizations should start soon, and I do think it's possible a (very, very) fast 286 will eventually be able to run the game at playable speeds without doing things like removing textures. As for right now, any Pentium, or maybe a fast 486, runs the game "okay". If the engine ever gets to be as fast as the original engine one day, I would consider that a huge success.

 

 

My bad, I didn't read it.

On 4/29/2024 at 8:08 AM, Redneckerz said:

Wolf3D also runs on a 8088, so why cant Doom also run on lower spec?

Minor correction here - vanilla Wolf3D requires a 286 and will not run on an 8088/V20 at all. While there has been at least one bad hack to get it running on an 8088, the only port I'm aware of that properly runs on an 8088 is WolfensteinCGA, and that throws away VGA support.


I figured out the issue with DOOM2 MAP30 - it's not the animated backgrounds, it's that huge boss demon texture on the wall! It's composed of 9 textures that are each 34 KB (so they each require three pages of EMS, since they don't fit in 32 KB). You can't even fit two of them in the 64 KB texture cache at the same time, so for every column rendered with several of these stacked, there's a lot of paging and disk access going on (they don't even all fit in the backup EMS texture cache, as that would require 450k or so dedicated to them). Usually there aren't this many different large textures present at the same time, much less on the same column. I'll think later about potential ways to improve performance here.

 

On the upside, I played it on real hardware again, albeit a 233 MHz MMX, but it was super smooth at max quality. The memory requirements are still so tight that it's hard to run on very many setups, though. EMM386 is pretty compact, but 8088/286 systems that require something like QRAM plus an EMS driver on top of that don't have enough conventional memory left to fit everything. Once some code (probably the render code) is pulled out of C, converted to ASM, and loaded in via EMS rather than always being present in the binary, I think things will be okay again.

 

I will be working on the project again next week. A "stable" alpha release of the game is fast approaching, so I can start asking the community for feedback on bugs and on configuring things for their hardware setups.

Share this post


Link to post

I finally had time to get to my Pocket386 today (a new 386SX-40 handheld laptop from the maker of the Book8088 and Hand386). RealDOOM ran on it first try!

 

IMG_8316.jpg.c529bd11bd43b7f8ae61d7c469e337bc.jpg

 

Of course, so did FastDoom and vanilla DOOM 1.9 - this is a 32-bit machine, after all.

So of course I ran some benchmarks: shareware demo3 with 9 screenblocks (default settings).

 

 

High Detail

FastDoom
2134 in 10547  (7.081 FPS)

Vanilla DOOM
2134 in 15505  (4.817 FPS)

RealDOOM
2134 in 28804  (2.593 FPS)

 

Low Detail

FastDoom
2134 in  6144 (12.156 FPS, 1.72x improvement)

Vanilla DOOM
2134 in  9388 (7.96 FPS, 1.65x improvement)

RealDOOM
2134 in 20437 (3.65 FPS, 1.40x improvement) (I think something in the RealDOOM low detail renderer is currently bugged - I'll have to revisit)

 

Potato

FastDoom

2134 in 3678 (20.307 FPS, 2.87x improvement)

 

I'm starting to believe that large-screen Potato quality on a fast 286 may actually lead to some playable framerates...

Share this post


Link to post

I'd say a very fast 286 (Harris 20/25 MHz) would require at least visplanes to be disabled to run at decent speeds. Even with that some maps like E4M2 would be very hard to play.

 

It's a shame there are no 286 boards with any kind of cache, that would help a lot. 386SX boards with cache handle Doom much better.

Share this post


Link to post
On 5/3/2024 at 1:44 AM, viti95 said:

I'd say a very fast 286 (Harris 20/25 MHz) would require at least visplanes to be disabled to run at decent speeds. Even with that some maps like E4M2 would be very hard to play.

 

It's a shame there are no 286 boards with any kind of cache, that would help a lot. 386SX boards with cache handle Doom much better.

 

By visplanes, do you mean flat rendering? They will be heavy for sure; I don't know if removing them will be 100% necessary.

 

Fast 286es can run with 0 WS, and later chipsets even support FPM - it's not as fast as cache, but it's 25% faster than 1 WS. I don't know how much faster a 386SX with cache runs compared to one at 0 WS; I assume that benchmark has been run somewhere by someone? Anyway, growing up I played Doom 2 on a 486SX-25 or -33. I didn't consider it slow or anything; it was 'normal' to me at the time - we didn't have other games to compare it to. I imagine sub-15 FPS was very common. People will have very different opinions on what a playable framerate means, I suppose, but I'd be thrilled to average 10 FPS on real hardware with a big screen size.

 

An idea I want to try out at some point, after the optimizations are implemented, is a 640K version with a minimal WAD where everything fits in memory with no paging at all. Obviously not a full Doom engine, but more of a 'tech demo' sort of thing running a small sliver of Doom content under best-case conditions.

 

Anyway, work will begin again in a day or two. Hope to have something stable people can play with soon.

Share this post


Link to post
