
Rum and Raisin Doom - Haha software render go BRRRRR! (0.3.1-pre.8 bugfix release - I'm back! FOV sliders ahoy! STILL works with KDiKDiZD!)


GooberMan


Guess who just halved the flat rendering time in single threaded mode.

 

v2OFvqG.png

 

Tell me why again no one tried transposing the render buffer before?
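For anyone unfamiliar with the idea, here's a minimal sketch of what a transposed (column-major) render buffer looks like, assuming an 8-bit paletted buffer. The `framebuffer_t` type and both functions are hypothetical illustrations, not Rum and Raisin Doom's actual code. The point is that a vertical run of pixels becomes contiguous in memory, so the column drawer writes with a stride of 1 instead of the screen width, and the buffer only has to be flipped back to row-major when presenting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit paletted frame buffer stored column-major
   ("transposed"): pixel (x, y) lives at index x * height + y. */
typedef struct {
    uint8_t *pixels;
    int width;
    int height;
} framebuffer_t;

/* Fill rows y1..y2 (inclusive) of column x with one colour. Because the
   buffer is transposed, consecutive rows are adjacent bytes, so the
   write stride is 1 instead of the screen width. */
void draw_column_transposed(framebuffer_t *fb, int x, int y1, int y2,
                            uint8_t color)
{
    uint8_t *dest = fb->pixels + (size_t)x * fb->height + y1;
    for (int y = y1; y <= y2; ++y)
        *dest++ = color;
}

/* Convert to a conventional row-major image (index y * width + x) for
   presentation. A port can hand this step to the GPU instead. */
void transpose_to_row_major(const framebuffer_t *fb, uint8_t *out)
{
    for (int x = 0; x < fb->width; ++x)
        for (int y = 0; y < fb->height; ++y)
            out[(size_t)y * fb->width + x] =
                fb->pixels[(size_t)x * fb->height + y];
}
```

The CPU transpose loop is only there to make the layout explicit; as discussed later in the thread, the final rotation can be left to the GPU.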

 

EDIT: And interesting stats too. My Linux box is showing a performance degradation, with the ARM showing no change. Hurmmmmmm. I do have an unrolled loop in there, lessee what happens after I play with everything some more...

 

EDIT2: Got some performance back on ARM by cleaning up. REALLY need to sort visplanes by distance.

 

DwVlEMH.png

 

Over to Linux x64 now then.

 

EDIT 3: Jeebus, this box ain't happy. Might be time to update the build scripts to always use Clang.

 

jt6h0Q7.png

Edited by GooberMan


Since I'm interested in the performance of the rendering engine, I learned a lot of things from your articles. Thank you very much!

 

Regarding the transposed backbuffer, my C# port also uses that technique.

 

I wanted to implement a C# renderer which is exactly the same as the original renderer, but I couldn't fully understand the visplane-based system. So I simplified my renderer by rendering everything vertically, and noticed that transposing the buffer might boost the performance.

 

Probably thanks to the optimization above, my C# port runs as fast as Crispy Doom (at least on my machine) even with the performance disadvantage of managed languages.

 

However, I know that PrBoom+ runs much faster than my implementation. I tried to find the optimization methods in PrBoom+, but the rendering code in PrBoom+ was too complex for me to understand. So I gave up on the idea of improving the performance of my port;)


Ah, finally, I figured someone had to have done it somewhere. It's way too obvious an idea for it to be nochickened into oblivion for anyone interested in software rendering.

 

I'll definitely be writing an article about visplanes and flat rendering. The above was my second attempt at making them render by column, but I also did this work in the space of a few hours (six in total according to Discord logs, including an attempt at brute-force optimising the existing code to only render visplane columns), so I'm still slotting all the pieces together in my mind from everything I've done and everything the codebase is doing. But I'll leave this little hint:

 

A visplane generates exactly raster lines for a perspective-correct texture mapper.

Edited by GooberMan


Yep, that's essentially what it's doing. The data has basically always been clipped rasterlines for a perspective-correct texture mapper, but Carmack either missed this (unlikely I guess) or did tests and realised the multiplications involved were annoyingly slow and went with the method of estimating horizontal interpolations. Either way, everyone has been rolling with it ever since.
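To make the "visplanes are raster lines" point concrete, here's a hedged floating-point sketch of the exact texture coordinate for one floor or ceiling pixel, assuming a simple pinhole projection. `view_t`, `plane_texcoord` and the pixel-centre convention are assumptions for illustration; Doom itself works in fixed point with lookup tables. Along a single screen column the depth is `planeheight * focal / |y - centery|`, so 1/depth is linear in screen y, which is exactly the property a perspective-correct texture mapper exploits. The cost is a division per pixel.

```c
/* Hypothetical view description. cos_a/sin_a are the cosine and sine of
   the view angle (Doom reads these from finecosine/finesine tables). */
typedef struct {
    double x, y;           /* viewer position in world units */
    double cos_a, sin_a;   /* view direction */
    double centerx;        /* screen-space centre, in pixels */
    double centery;
    double focal;          /* projection distance, in pixels */
} view_t;

/* Exact, perspective-correct world coordinates (u, w) of the floor or
   ceiling pixel at screen column x, row y, sampled at the pixel centre.
   planeheight is the plane's height relative to the eye. Returns 0 on
   the horizon row, where the plane has no visible pixel. */
int plane_texcoord(const view_t *v, double planeheight, int x, int y,
                   double *u, double *w)
{
    double dy = (y + 0.5) - v->centery;   /* rows away from the horizon */
    if (dy < 0.0)
        dy = -dy;
    if (planeheight < 0.0)
        planeheight = -planeheight;
    if (dy < 1e-9)
        return 0;
    /* Depth along the view direction: 1/depth is linear in screen y. */
    double depth = planeheight * v->focal / dy;
    double lateral = ((x + 0.5) - v->centerx) / v->focal;
    /* forward = (cos_a, sin_a); screen-right = (sin_a, -cos_a) */
    *u = v->x + depth * (v->cos_a + lateral * v->sin_a);
    *w = v->y + depth * (v->sin_a - lateral * v->cos_a);
    return 1;
}
```

Evaluating this for every pixel is the slow-but-exact baseline; the constants discussed in the next paragraph control how often an exact sample is taken instead.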

 

When I started writing this post, I wasn't sure this particular implementation would bring any benefit at a 320x200 output. Short story: the method used is accurate every 16 pixels in a column, from the start of the raster line to the end. You can change the PLANE_PIXELLEAP and PLANE_PIXELLEAP_LOG2 constants to 8/3, 4/2 or 2/1 (and remove the corresponding unrolled loop elements) to make it more accurate, and consequently slower.
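A rough sketch of the "accurate every N pixels" idea, assuming the exact coordinate for a row can be computed by some callback (the hypothetical `exact_coord_fn`). The constant names mirror the ones mentioned above, but this is not the actual Rum and Raisin Doom sampler: it takes an exact sample every `PLANE_PIXELLEAP` rows and linearly interpolates in 16.16 fixed point in between, with a shorter final run at the end of the column.

```c
#include <stdint.h>

#define PLANE_PIXELLEAP_LOG2 4
#define PLANE_PIXELLEAP (1 << PLANE_PIXELLEAP_LOG2)

typedef int32_t fixed_t;   /* 16.16 fixed point, as in Doom */

/* Hypothetical callback: exact (perspective-correct) texture coordinate
   for screen row y. This is where the expensive division lives. */
typedef fixed_t (*exact_coord_fn)(int y, const void *ctx);

/* Fill out[0..count-1] with coordinates for rows y1..y1+count-1. Rows at
   multiples of PLANE_PIXELLEAP from y1 get exact values; rows in between
   are linearly interpolated towards the next exact sample (which, for the
   last run, is taken at row y1 + count). */
void interpolate_column(exact_coord_fn exact, const void *ctx,
                        int y1, int count, fixed_t *out)
{
    fixed_t start = exact(y1, ctx);
    for (int i = 0; i < count; ) {
        int run = count - i < PLANE_PIXELLEAP ? count - i : PLANE_PIXELLEAP;
        fixed_t end = exact(y1 + i + run, ctx);
        fixed_t step = (end - start) / run;   /* per-row increment */
        for (int j = 0; j < run; ++j)
            out[i + j] = start + step * j;
        start = end;
        i += run;
    }
}
```

The original reportedly unrolls the inner loop and works with `PLANE_PIXELLEAP_LOG2`; dividing by the run length here keeps rounding of negative steps well defined, and compilers typically lower a power-of-two divisor to a shift-based sequence anyway.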

 

Then I realised: It's easy to change some constants myself and recompile.

 

Getting comparable visual results at that resolution works best with 4/2. But. Here are the results with 16/4.

SWInGeH.png

 

PJOgUz5.png

 

Yep. We run slower (4/2 is even slower still), and I'd say the image quality isn't anywhere near as accurate. This is, of course, the complete opposite of high res rendering where it's faster and we fix inaccuracies introduced by the horizontal line scaling of the original code.

 

This doesn't surprise me at all really. Most of the work I'm doing really only starts seeing tangible benefits at resolutions higher than the original renderer's. Which is entirely the point of this work. I mean, the scene renders in 0.26 milliseconds here on a single thread. The problem with Doom on modern systems is that people want it to look good and run well, which is pretty much the opposite philosophy to FastDoom, where you just want it to run well on contemporaneous hardware.

I am still optimising this new code though, and stripping out parts of the old code that no longer make sense. I could very well get it faster at low resolutions, but it's really not my focus. Sorry this one won't work out for you.

Edited by GooberMan


Actually, the above got me thinking. I'm testing on a 2560x1600 target buffer. What happens if I decrease the accuracy a bit more?

1igSECR.png

 

So there are very definitely clear gains from going with less accuracy on a target of that size. Which then leads to the question: how inaccurate is the end result?

Subtractive blend in GIMP tells quite a bit:

xOwPnW7.png

 

There are inaccuracies alright, off in the distance where the relative distances of each pixel change more quickly, and especially with already-noisy textures, but will you notice them? Eeeeeehhhhhhhhhhhhh, probably not.
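If you'd rather put a number on the comparison than eyeball a blend in GIMP, a small sketch like this computes the absolute per-channel difference between two RGB8 screenshots of the same frame and counts how many pixels differ. The function and struct names are hypothetical. A clamped subtraction (max(a - b, 0)) only shows differences in one direction, which is why this uses the absolute difference.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t differing;   /* pixels with any channel difference */
    int max_delta;      /* largest per-channel absolute difference */
} diff_stats_t;

/* Compare two RGB8 images with the same pixel count, writing the absolute
   per-channel difference to `out` (which may be NULL) and returning
   summary statistics. */
diff_stats_t image_difference(const uint8_t *a, const uint8_t *b,
                              size_t pixel_count, uint8_t *out)
{
    diff_stats_t s = { 0, 0 };
    for (size_t i = 0; i < pixel_count; ++i) {
        int any = 0;
        for (int c = 0; c < 3; ++c) {
            int d = (int)a[i * 3 + c] - (int)b[i * 3 + c];
            if (d < 0)
                d = -d;
            if (out)
                out[i * 3 + c] = (uint8_t)d;
            if (d > s.max_delta)
                s.max_delta = d;
            any |= d != 0;
        }
        s.differing += (size_t)any;
    }
    return s;
}
```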

 

So I'll go ahead and choose constants for the texture sampler based on view height, which means writing multiple copies of that function. Which means I _really_ would like the code to be in a language like C/C++/D where I can constexpr branch with template parameters and not have to deal with multiple copies of the same code everywhere/messy defines/etc.
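One way to get per-resolution copies in plain C, i.e. the "messy defines" route, is a macro that stamps out one sampler per leap size, plus a selector. Everything here is hypothetical: the names, the layout of the `exact` sample array, and especially the view-height thresholds, which are placeholders rather than the values actually chosen.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* C has no template parameters, so this macro stamps out one copy of the
   column sampler per leap size. Each copy fills `count` coordinates by
   linear interpolation between exact samples taken every (1 << LOG2)
   pixels; `exact` must hold ((count - 1) >> LOG2) + 2 entries. */
#define DEFINE_COLUMN_SAMPLER(LOG2)                                       \
    static void sample_column_##LOG2(const fixed_t *exact, int count,     \
                                     fixed_t *out)                         \
    {                                                                      \
        for (int seg = 0; (seg << (LOG2)) < count; ++seg) {                \
            fixed_t value = exact[seg];                                    \
            fixed_t step = (exact[seg + 1] - exact[seg]) / (1 << (LOG2));  \
            for (int j = 0; j < (1 << (LOG2)); ++j) {                      \
                int i = (seg << (LOG2)) + j;                               \
                if (i >= count)                                            \
                    return;                                                \
                out[i] = value;                                            \
                value += step;                                             \
            }                                                              \
        }                                                                  \
    }

DEFINE_COLUMN_SAMPLER(2)
DEFINE_COLUMN_SAMPLER(3)
DEFINE_COLUMN_SAMPLER(4)
DEFINE_COLUMN_SAMPLER(5)

typedef void (*column_sampler_fn)(const fixed_t *exact, int count,
                                  fixed_t *out);

/* Hypothetical thresholds: fewer exact samples (a bigger leap) as the
   view gets taller, since each pixel then covers less of the texture. */
column_sampler_fn select_column_sampler(int viewheight)
{
    if (viewheight <= 240)
        return sample_column_2;
    if (viewheight <= 800)
        return sample_column_3;
    if (viewheight <= 1200)
        return sample_column_4;
    return sample_column_5;
}
```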

EDIT: Updating the first post with this latest little thing

AP7sTxU.png

 

Those are stats compared to a straight-uprezzed Vanilla renderer. Doing miles better at the moment.

Edited by GooberMan
Added latest benchmark graph


So here's a bit of fun. You can actually very easily break my new code by using it at low resolutions, so I auto-select the right function depending on resolution. But you can just plain override it anyway for strange visuals.

 

 

Also, ditching the original function at high resolutions is just plain necessary. This version of the original function even upgrades the sampling coordinates to full 32-bit precision, and it's still plain awful. But it's unnoticeable at the original 320x200.
 

 

1 hour ago, GooberMan said:

So here's a bit of fun. You can actually very easily break my new code by using it at low resolutions, so I auto-select the right function depending on resolution. But you can just plain override it anyway for strange visuals.

Heh, pretty cool! This effect would look neat with the intestine textures, or maybe the liquid ones.

Edited by Noiser

2 hours ago, GooberMan said:

Also, ditching the original function at high resolutions is just plain necessary. This version of the original function even upgrades the sampling coordinates to full 32-bit precision, and it's still plain awful. But it's unnoticeable at the original 320x200.

Have you compared the original function with what existing ports have done to make it work?  Offhand I don't think ZDoom made any major changes to how span rendering worked, but I've only compared the high level.  Can't say I've seen any obvious precision distortions even at 5k.

 

Not that it's the point of your project, but I do find such comparisons a little weird since the community at large has known about the precision issues and fixed them a long time ago.

1 hour ago, Blzut3 said:

Have you compared the original function with what existing ports have done to make it work?  Offhand I don't think ZDoom made any major changes to how span rendering worked, but I've only compared the high level.  Can't say I've seen any obvious precision distortions even at 5k.

The span drawer in GZDoom also uses 32 bits for the sampling coordinates. I also fixed the sampling to be done at the pixel center. If I remember correctly, Eternity uses floats for the sampling coordinates.

2 hours ago, Blzut3 said:

Not that it's the point of your project, but I do find such comparisons a little weird since the community at large has known about the precision issues and fixed them a long time ago.

Can the community at large say offhand what exactly the precision issue is? Or do you really mean source port authors at large?

 

The average person around here doesn't know how or why. The Doom wiki won't tell you. Even reading the Doom black book won't tell you. A video like that is something I'll use later, when documenting everything, to illustrate the example. Note that I describe visually what transposed means in the first article I list on the GitHub wiki - something that anyone who's ever looked at how Doom texture data is stored already knows, but that most people never need to understand. But I'll give a ground-up explanation for whoever wants to know.

 

(Short story, since it's now a Thing: That original sampler recorded in that video actually isn't purely the original. I modified the span function to use the full 32-bit values to sample the texture. The issue comes from a precalculated scale value based on the centre column that is used to adjust the X and Y integration values for span rendering. These values become more and more inaccurate the further away from the centre of the screen you get. And it's entirely avoided with my function going vertically along the screen and self-correcting after N pixels depending on the backbuffer resolution.)
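One plausible way to see the drift being described, sketched under assumptions: in linuxdoom's `R_ClearPlanes`, `basexscale` is computed as `FixedDiv(finecosine[angle], centerxfrac)`, and `R_MapPlane` multiplies it by the row distance to get the per-pixel step. At high resolutions `centerxfrac` is large, so that quotient is small and truncation removes a larger fraction of it, and the resulting step error accumulates along every span. The helper below is hypothetical and uses doubles only to measure the error.

```c
#include <stdint.h>

typedef int32_t fixed_t;
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

/* Same result as Doom's FixedDiv for positive inputs, using 64-bit math. */
static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a << FRACBITS) / b);
}

/* Drift, in texels, after `span_len` pixels when the per-pixel step is
   FixedMul(distance, basexscale) with a truncated
   basexscale = FixedDiv(cosine, centerxfrac). `cosine` and `distance`
   are 16.16 values; the exact scale is computed in double for reference. */
double span_drift_texels(fixed_t cosine, int screenwidth, fixed_t distance,
                         int span_len)
{
    fixed_t centerxfrac = (screenwidth / 2) << FRACBITS;
    fixed_t basexscale = fixed_div(cosine, centerxfrac);
    double exact_scale = (double)cosine * FRACUNIT / centerxfrac;
    /* Error of one step, in 16.16 units, then accumulated over the span. */
    double step_error = (double)distance * (exact_scale - basexscale)
                        / FRACUNIT;
    return step_error * span_len / FRACUNIT;
}
```

With illustrative inputs (cosine term at exactly 1.0, a row 512 units away) the drift over a full-width span comes out at about 1.5 texels at 320 wide, where a pixel covers about 3.2 texels, so roughly half a pixel. At 2560 wide it is about 4 texels while a pixel covers only 0.4 texels, so roughly ten pixels' worth.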

 

Needless to say, this next bit should be addressed as a separate thing:

 

2 hours ago, Blzut3 said:

Have you compared the original function with what existing ports have done to make it work?

No, and I have no intention to.

 

Again, this is a way for me to relax. More pragmatically as a wider community, this is also a way for anyone that's interested to see what someone with my knowledge and experience would do when given a blank slate.

 

Using ZDoom's headers as an example, the flat plane still wants to draw by span instead of by visplane column. I would gain nothing from that, since transposing the backbuffer means that's not a good idea. So I had to look at exactly what a visplane is, going in knowing that it stores top and bottom pixel values for screen columns. Which led to the realisation that these are exactly rasterlines for a texture mapper, and that the span translation is really quite unnecessary.

 

Which leads to a further realisation: Every piece of render data in Rum and Raisin Doom is now actually a rasterline for a texture mapper. I'm seriously considering the benefit of deleting visplanes, deleting the column renderer, and generating and sorting raster fragments in a list for absolutely everything. Sprites, walls, whatever. They're all exactly rasterlines now. At which point, I'll basically have accidentally converted vanilla Doom into a full 3D renderer - and be more efficient while I'm at it. And keep slime trails too.
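A hedged sketch of what "generating and sorting raster fragments in a list for absolutely everything" might look like at the data level. `rasterline_t` and its fields are hypothetical. The sort here is plain back-to-front painter's order by depth, which is only one possible choice; front-to-back with a coverage buffer would avoid overdraw.

```c
#include <stdlib.h>

typedef enum { RASTER_WALL, RASTER_PLANE, RASTER_SPRITE } raster_kind_t;

/* Hypothetical raster fragment: one vertical run of pixels in one screen
   column, tagged with what produced it and how far away it is. */
typedef struct {
    int x;               /* screen column */
    int y1, y2;          /* inclusive pixel rows */
    float depth;         /* view-space distance used for ordering */
    raster_kind_t kind;
} rasterline_t;

/* qsort comparator: farther fragments sort first. */
static int compare_back_to_front(const void *a, const void *b)
{
    float da = ((const rasterline_t *)a)->depth;
    float db = ((const rasterline_t *)b)->depth;
    return (da < db) - (da > db);
}

/* Painter's-order sort of every fragment, regardless of its source. */
void sort_rasterlines(rasterline_t *lines, size_t count)
{
    qsort(lines, count, sizeof *lines, compare_back_to_front);
}
```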

 

Where exactly am I going to get those kinds of realisations by reading another port's code? Maybe Vavoom? Except that started life as porting the Quake renderer to Doom, and I've already separately dug into Quake's renderer.

 

With visplanes rendering faster, I'm approaching the limits of what I can do at a raw pixel level. Once SIMD is up and running properly, it's all algorithmic from here. And I'm going to get more speed wins by similarly abandoning conventional wisdom.

44 minutes ago, GooberMan said:

Can the community at large say offhand what exactly the precision issue is? Or do you really mean source port authors at large?

I mean there have been threads pointing out the issues that appear when you increase the resolution in Chocolate Doom.  Maybe not this issue in particular, but I've seen enough to come to the conclusion that one doesn't just simply increase the resolution and expect a ZDoom quality render.  While I may have exaggerated, what I was getting at is that there are many ports with high resolution renders that don't exhibit these issues.  It's cool that you want to document the issues, but I just found your comment that your new sampler is "just plain necessary" odd when clearly ports have been using a variant of the original code without issue.  (At least not at the resolutions you're using, there's going to eventually be a limit even with adjustment math.)  My suggestion to check existing ports was more of a "check that you're not making misinformation" rather than a "see if you can apply the fix."

 

I do realize that I may have misunderstood the exact context of your "just plain necessary" quote, in which case carry on.

53 minutes ago, GooberMan said:

Where exactly am I going to get those kinds of realisations by reading another port's code? Maybe Vavoom? Except that started life as porting the Quake renderer to Doom, and I've already separately dug into Quake's renderer.

GZDoom does have a second software renderer called softpoly, which is a full 3D renderer and not Quake's.  Still probably won't get much in the way of revelations from it, but just wanted to point out that technicality. ;)

 

In any case, still looking forward to reading your write ups of course.  As I said before, you're going down some paths I've wondered about in my head (and well beyond) so it's cool to see what the results turn out to be.


The only places where accuracy in flat rendering is really needed is when you have a "metatexture" made of several different flats; the simplest example of which being the icon face teleporter at the end of E1M8.

2 hours ago, Blzut3 said:

I do realize that I may have misunderstood the exact context of your "just plain necessary" quote, in which case carry on.

Yeah mate, the full context is "ditching the original function at high resolutions is just plain necessary" - which it sounds like basically every high res source port either does, or fixes the problem another way. All good.

 

5 minutes ago, Gez said:

The only places where accuracy in flat rendering is really needed is when you have a "metatexture" made of several different flats; the simplest example of which being the icon face teleporter at the end of E1M8.

This reminds me, it was glitchy with the original code and I haven't checked with the new code.

 

Chocolate Doom, so entirely untainted by my code:

4yVkG1j.png

 

Original code, 32-bit sample indexing at 2560x1600:
uKnUh92.png

 

And my new renderer:
7E7tYGw.png

 

I can spot a bad pixel, but that line down the centre of the Icon's visage is otherwise eliminated.

The pixel artefact also appears more often with lower log2 values. So clearly there's more work to be done.

5 hours ago, dpJudas said:

If I remember correctly, Eternity uses floats for the sampling coordinates.

Yuppers.

 

Anyway, off the back of the repo's wiki and talk on Discord, I managed to get Eternity's backbuffer transposed, which should open the doorway for lots of other optimisations. The increased overall complexity of EE's renderer (such as the buttload of v_block.cpp drawing functions for both scaled and unscaled buffers) has made this quite a difficult task, but I'm super appreciative of the effort put in here, as without it I doubt EE would ever have gotten this.

 

Moving forward I'll likely be referring to the code more, though how much can be directly used exactly is beyond me. Stuff like pre-calculating colormaps for textures is potentially very memory intensive in EE due to having an arbitrary number of colormaps, unless this stuff was cached appropriately. The real big thing I want is to render spans as columns, but last I tried that I didn't get too far.

9 hours ago, GooberMan said:

Can the community at large say offhand what exactly the precision issue is? Or do you really mean source port authors at large?

 

Most Doom fans in the community aren't developers, so of course they can't go into technical details. Nevertheless, several people in the community have, over time, attempted to address the same issues you are battling here. Will you do it better? Who knows, but don't assume that Chocolate Doom is the state of the art when it comes to improving Doom's performance. That's not at all the focus of that port.

 

Also please keep in mind that computers in 2020 are significantly different from what they looked like in 1993. That no source port did the transpose is mainly because using the GPU to rotate the final frame buffer image wasn't a real option in the golden age of Doom software renderer source ports. However, that still doesn't mean that they all perform as badly as your Chocolate Doom numbers indicate. Take ZDoom for example: randi spent quite some time optimizing the drawer functions there to write 4 columns at a time (DWORDs instead of bytes). That port also aligned its frame buffer in memory to Pentium-era cache lines.
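A minimal sketch of the "four columns at a time" idea in a conventional row-major 8-bit buffer; this is not ZDoom's drawer, just the memory access pattern. For each row, the pixels of four adjacent columns are gathered and written with one 32-bit store instead of four byte stores. `draw_quad_column` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write `count` rows of four adjacent columns starting at `dest`, where
   `pitch` is the row stride in bytes and src0..src3 are the already
   textured/lit pixels for each of the four columns. */
void draw_quad_column(uint8_t *dest, int pitch, int count,
                      const uint8_t *src0, const uint8_t *src1,
                      const uint8_t *src2, const uint8_t *src3)
{
    for (int i = 0; i < count; ++i) {
        const uint8_t quad[4] = { src0[i], src1[i], src2[i], src3[i] };
        uint32_t packed;
        memcpy(&packed, quad, sizeof packed);   /* gather four bytes */
        memcpy(dest, &packed, sizeof packed);   /* one 32-bit store */
        dest += pitch;
    }
}
```

Using `memcpy` keeps the store free of alignment and aliasing problems while still compiling to a single 32-bit store on common targets; the byte order of the four pixels is preserved regardless of endianness.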

 

Now don't get me wrong - I don't want to take away your thunder for being the first guy to actually implement the transpose, because that really is cool. And fixing the span drawer is an important quality fix.

 

Quote

(Short story, since it's now a Thing: That original sampler recorded in that video actually isn't purely the original. I modified the span function to use the full 32-bit values to sample the texture. The issue comes from a precalculated scale value based on the centre column that is used to adjust the X and Y integration values for span rendering. These values become more and more inaccurate the further away from the centre of the screen you get. And it's entirely avoided with my function going vertically along the screen and self-correcting after N pixels depending on the backbuffer resolution.)

 

The real issue here isn't so much the precision of the span renderer, but rather that the Doom renderer never properly implemented drawing at the pixel centers. The GZDoom drawer, for example, doesn't need to self-correct after N pixels at all. The errors that creep in aren't enough for the artifacts to show. I don't have the link, but there's an image somewhere here on Doomworld that shows dancing sprites in older versions of (G)ZDoom - all that stuff happens if pixels aren't clipped and sampled at pixel centers.
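To illustrate the pixel-centre rule, here's a small sketch assuming pixel row y covers [y, y + 1) and is sampled at y + 0.5. A run spanning screen rows [top, bottom) covers row y only if top <= y + 0.5 < bottom, and each texture coordinate is evaluated at that centre rather than at the integer row. The function names are hypothetical; this is not GZDoom's drawer.

```c
/* Smallest integer y with y + 0.5 >= top, i.e. ceil(top - 0.5),
   written without <math.h>. */
static int first_covered_row(double top)
{
    double t = top - 0.5;
    int y = (int)t;        /* truncates toward zero */
    if ((double)y < t)
        ++y;               /* round up when t had a positive fraction */
    return y;
}

/* Sample a vertical run from screen row `top` to `bottom` (exclusive)
   whose texture coordinate is v_top at `top` and grows by v_step per
   screen row. A row is drawn only if its centre lies inside the run, and
   its coordinate is evaluated at that centre. Returns the number of rows
   written to out; *first_row receives the first row drawn. */
int sample_run_at_centres(double top, double bottom, double v_top,
                          double v_step, int *first_row,
                          double *out, int max_out)
{
    int y = first_covered_row(top);
    int n = 0;
    *first_row = y;
    while (y + 0.5 < bottom && n < max_out) {
        out[n++] = v_top + ((y + 0.5) - top) * v_step;
        ++y;
    }
    return n;
}
```

For a run from 10.2 to 13.6, rows 10 to 13 are drawn; move the top to 10.6 and row 10 drops out because its centre (10.5) is no longer inside the run.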

1 hour ago, dpJudas said:

...Who knows, but don't assume that Chocolate Doom is the state of the art when it comes to improving Doom's performance. That's not at all the focus of that port...

 

...Also please keep in mind that computers in 2020 are significantly different than what they looked like in 1993. That no source port did the transpose is mainly because using the GPU to rotate the final frame buffer image wasn't a real option in the golden age of Doom software renderer source ports.

I think you'll find literally every post I've made in this thread emphasises that you did not need to say this.

20 hours ago, GooberMan said:

So here's a bit of fun. You can actually very easily break my new code by using it at low resolutions, so I auto-select the right function depending on resolution. But you can just plain override it anyway for strange visuals.

It looks like someone dented the floor and then continued flexing the floor as you moved across it.


I keep saying that I don't want this to become a real source port, that I'm happy to have this as an academic project and provide articles and explanations for what I'm doing so that anyone else interested in what I'm doing can grok it and try it out themselves.

 

But it is dangerously close to becoming a real source port after all.

 

kHzJml3.png

 

Implementing widescreen support, and I'm all "It's 2020, I expect to drag my window out to 42:9 and have it work". The reality of the Doom renderer is that the trig calculations start breaking down after a horizontal FOV of 165 degrees, so I probably can't get it to go that far without rewriting the projection functions entirely. First time I tried Doom in 21:9 though, and ooh yeah. Lean in close to my monitor (27") so that it fills my peripheral vision. It's gooooooooooood.

 

But widescreen though, there's a thing I wanted to profile here.

 

6uSRvz7.png

 

3 cores, pixel density just a tiny bit more than a proper 1080p render buffer. Three threads. Raspberry Pi 4. Clean out the spikes, and this is basically "If I had a Switch I could make Doom render at 60FPS on it" territory right here (ignore that bit about being a debug build, that's just me getting the defines wrong on not-Windows). I've shipped games on 11 platforms. 12 when Returnal releases. Worked on several other platforms. So yeah, making it work well across multiple platforms is one of my things - and being an engine programmer, gaming hardware especially interests me.

 

Maybe I'll finally do something about visplane merging when I'm done fixing all the widescreen bugs. Although "use a hash map" for what's there like every other port is probably the best I'll do, short of actually doing what I said and dealing exclusively in rasterlines instead of visplanes and walls.
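For reference, "use a hash map" for visplane lookup can be as small as this sketch. The bucket count, the hash mix and the struct layout are all assumptions, and it only covers finding a plane with matching height, flat and light level; Doom's `R_CheckPlane` still has to check whether the requested column range overlaps what the plane already holds before reusing it.

```c
#include <stddef.h>
#include <stdint.h>

#define VISPLANE_HASH_SIZE 128   /* hypothetical; must be a power of two */

/* Only the fields that decide whether two planes may be merged. */
typedef struct visplane_s {
    int32_t height;            /* plane height, 16.16 fixed point */
    int picnum;                /* flat number */
    int lightlevel;
    struct visplane_s *next;   /* next plane in the same bucket */
} visplane_t;

static visplane_t *buckets[VISPLANE_HASH_SIZE];

/* Empty every bucket; call once per frame before rendering. */
void clear_plane_hash(void)
{
    for (size_t i = 0; i < VISPLANE_HASH_SIZE; ++i)
        buckets[i] = NULL;
}

/* Hypothetical hash mixing the three merge keys. */
static unsigned visplane_hash(int32_t height, int picnum, int lightlevel)
{
    uint32_t h = (uint32_t)picnum * 3u + (uint32_t)lightlevel
               + (uint32_t)height * 7u;
    return h & (VISPLANE_HASH_SIZE - 1);
}

/* Return an existing plane with the same height/flat/light, or insert
   `fresh` at the head of its bucket and return it. */
visplane_t *find_or_add_plane(visplane_t *fresh)
{
    unsigned slot = visplane_hash(fresh->height, fresh->picnum,
                                  fresh->lightlevel);
    for (visplane_t *p = buckets[slot]; p != NULL; p = p->next)
        if (p->height == fresh->height && p->picnum == fresh->picnum
            && p->lightlevel == fresh->lightlevel)
            return p;
    fresh->next = buckets[slot];
    buckets[slot] = fresh;
    return fresh;
}
```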

Edited by GooberMan


I honestly need to tread lightly around jailbroken/modded devices. I mean, back in the day we used modded Xboxes as devkits and I did work on our PSP engine with my own hacked PSP. But it's a different industry these days. Doing it on a Raspberry Pi is the closest I'll get without professional concerns coming in.

 

Back-and-forth development on someone else's Switch with them reporting results is not something I want to do either. It's already a ballache trying to get ImGui playing nice with GL3 on Ling's mac (and I've given up for now and just #if 0 the offending code out). Consequently, I'm hunting for a cheap Mac that I will literally only be using to compile and test on, so it can be an i3 for all I care; it just needs to compile and run code.


Well, getting it to run officially is just never gonna be a thing no matter what. If there's one thing Nintendo's been consistent on, it's an anti-homebrew stance since they fear that opens the door to piracy.

 

Which kinda sucks, but well, what are you gonna do...

On 11/5/2020 at 4:57 PM, GooberMan said:

I keep saying that I don't want this to become a real source port, that I'm happy to have this as an academic project and provide articles and explanations for what I'm doing so that anyone else interested in what I'm doing can grok it and try it out themselves.

 

But it is dangerously close to becoming a real source port after all.

 

Nothing wrong with that my friend.  Obviously it's ultimately your choice but you have my support for what it is worth if you want to go that route.  I find what you are doing interesting and the fact it is built on something solid and vanilla is cool with more advanced visual features and optimizations.  I think there is a "market" for this so to speak.


Well, this is the thing. If it's a real source port, I have to provide support to users.

 

And if it's a real source port, it's harder for any other source port maintainers to see the ideas I'm employing and pull them in to their own ports.

 

This is ultimately the point of the articles I write. I want everyone to understand what I'm doing. Seriously, redefining visplanes is a big thing. This is something that hasn't changed since 1993, but this work has allowed a new understanding of what they actually are. And if I can put that in a format that anyone can understand from the ground-up and employ in whatever source port they're developing, then that's far more valuable than a reference implementation in my opinion - for example, Linux Doom is a reference implementation and yet everyone's still using spans for software rendering. And it's already giving results - just from my articles and questions on Discord, Altazimuth has implemented the backbuffer transpose in Eternity. I really want to see some results with Eviternity, get an idea of what else is deficient.

 

This is a resource for all Doomers. Being a real source port does detract from that.

Edited by GooberMan

2 minutes ago, GooberMan said:

Well, this is the thing. If it's a real source port, I have to provide support to users.

 

And if it's a real source port, it's harder for any other source port maintainers to see the ideas I'm employing and pull them in to their own ports.

 

This is ultimately the point of the articles I write. I want everyone to understand what I'm doing. Seriously, redefining visplanes is a big thing. This is something that hasn't changed since 1993, but this work has allowed a new understanding of what they actually are. And if I can put that in a format that anyone can understand from the ground-up and employ in whatever source port they're developing, then that's far more valuable than a reference implementation in my opinion. And it's already giving results - just from my articles and questions on Discord, Altazimuth has implemented the backbuffer transpose in Eternity. I really want to see some results with Eviternity, get an idea of what else is deficient.

 

This is a resource for all Doomers. Being a real source port does detract from that.

 

Fair enough.  If you just want to show off proof of concept and experimentation that's fine too and like I said ultimately it is up to you. I personally don't feel that this being a source port with bug fix support would detract from your aim, but if you feel that it would detract from what you do best, and you don't want to deal with any of the associated headaches then that is the best choice for you.  The work is still valuable and indeed somebody might take your code/concepts and work further with it down the line in a new or as you have already stated existing source port anyway which as you said is good for all Doomers.  Keep up the good work whatever it is you choose to do!  I just wanted to say if you decided to go with or try the source port direction I would definitely be a follower and I am sure others would be as well and not to feel discouraged from that path if you decide you wanted to pursue it.


If this were to become its own port, the performance improvements would most benefit mapsets that use advanced editing features, so in addition to everything GooberMan already said, he'd probably have to do the work of implementing those features as well. It makes more sense to keep this the way it is.

4 minutes ago, M_W said:

If this were to become its own port, the performance improvements would most benefit mapsets that use advanced editing features, so in addition to everything GooberMan already said, he'd probably have to do the work of implementing those features as well. It makes more sense to keep this the way it is.

 

I suppose you have a point.

Edited by Eric Claus


Being a real source port means I'd isolate the playsims into their own dynamic library. Vanilla. Boom. Even ZDoom, if you wanted, would go right into a switchable-at-runtime library. All dependent on render support of course. Whatever, playsim features can be their own thing. Even up to MBF level you don't need to do a whole lot different from a rendering level to support the advanced features. Again, I really want profiles from Eviternity, since that's a gold standard mapset and a first stop for many people new to the Doom modding community.
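A minimal sketch of a runtime-switchable playsim interface, assuming a table of function pointers per playsim. Everything here is hypothetical. In a real build each table would live in its own shared library and be looked up with `dlsym` (or `GetProcAddress` on Windows); both are compiled into one file here so the example runs on its own.

```c
#include <string.h>

/* Hypothetical interface the renderer and front end program against. */
typedef struct {
    const char *name;
    void (*init)(void);
    void (*tick)(void);
    int (*gametic)(void);
} playsim_api_t;

/* Two stand-in playsims; real ones would run the full game logic. */
static int vanilla_tics, boom_tics;
static void vanilla_init(void) { vanilla_tics = 0; }
static void vanilla_tick(void) { ++vanilla_tics; }
static int vanilla_gametic(void) { return vanilla_tics; }
static void boom_init(void) { boom_tics = 0; }
static void boom_tick(void) { ++boom_tics; }
static int boom_gametic(void) { return boom_tics; }

static const playsim_api_t playsims[] = {
    { "vanilla", vanilla_init, vanilla_tick, vanilla_gametic },
    { "boom",    boom_init,    boom_tick,    boom_gametic },
};

/* Pick a playsim by name at runtime; NULL if unknown. */
const playsim_api_t *select_playsim(const char *name)
{
    for (size_t i = 0; i < sizeof playsims / sizeof playsims[0]; ++i)
        if (strcmp(playsims[i].name, name) == 0)
            return &playsims[i];
    return NULL;
}
```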

 

It's not an unusual idea. Pretty sure Ling wants to do the same thing with his dream port.

1 minute ago, GooberMan said:

Being a real source port means I'd isolate the playsims into their own dynamic library. Vanilla. Boom. Even ZDoom, if you wanted, would go right into a switchable-at-runtime library. All dependent on render support of course. Whatever, playsim features can be their own thing. Even up to MBF level you don't need to do a whole lot different from a rendering level to support the advanced features. Again, I really want profiles from Eviternity, since that's a gold standard mapset and a first stop for many people new to the Doom modding community.

 

It's not an unusual idea. Pretty sure Ling wants to do the same thing with his dream port.

 

Hey do whatever you feel is best if it seemed like I was trying to pressure you that wasn't my intent at all.


Oh, nah, sorry mate. Context, I guess.

 

Short story is that I am bipolar, so sooner or later I'll just flat out lose interest in this. The fact that people like Altazimuth are waiting on my results and that it will subsequently benefit a large part of the community means that my focus will stay for far longer than it did for, say, BSP2DOOM. Which I couldn't get enough people proper interested in. Code still exists, theory is still about the same even after digging deep into the renderer, but no real incentive to continue the work. And also my Demon Workers Unite! mapset, which I had grand plans of being an industry-wide relaxation effort but no one (outside of Kaiser for an intro map) seemed interested in.

 

A bunch of other things, including professional concerns, would need to go a certain way before I commit to a full blown source port effort with users and all that.

 

I do know that my kind of attitude and experience would be a good thing in general for source ports. See the bottom of my first post, and the nochicken reference a few posts above? This work is also, in part, my response to that kind of insanity. I can describe theory until the cows come home, but now for the first time thanks to very favorable professional conditions I can also supply code which is a very effective STFU mechanism. But actually committing? I've got a Playstation 5 game that needs my attention, and when I get to the point where relaxing doesn't mean "thinking about code 24/7" any dedicated source port efforts I make will take the hit.

Edited by GooberMan
