
Rum and Raisin Doom - Haha software render go BRRRRR! (0.3.1-pre.8 bugfix release - I'm back! FOV sliders ahoy! STILL works with KDiKDiZD!)


GooberMan


Just did a run against a nearly stock Chocco I have here locally that adds high-res support and not much else. And wow. I've been so focused on incremental improvements that I forgot how far it's already come.

First post updated with the profile in question.


You think that's amazing? I just compared the high-res Chocco running on my i7-6700HQ to my optimisations running on the Raspberry Pi at the same resolution.

 

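[Image: profile graph comparing the high-res Chocco on an i7-6700HQ against the optimised build on a Raspberry Pi (jSfez73.png)]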

 

aaaaahahahahahahaha, an ARM with a maximum clock rate of 1.5GHz running my optimisations performs basically as well as an i7 running an uprezzed Chocco.

 

*ahem* So. Uh. That red line is gonna go further down by the time I'm done.

Edited by GooberMan


Oh man it's finally here. Time to fail to bring these improvements to EE. This work has continually astounded me and I look forward to further developments.

Edited by Altazimuth

9 minutes ago, Altazimuth said:

Oh man it's finally here. Time to fail to bring these improvements to EE. This work has continually astounded me and I look forward to further developments.

Heck yes. It'll be great if you can get these optimizations into Eternity.


At a minimum, the backbuffer transpose should be applicable to every port with a software renderer. I'm curious to see it profiled against ports that try to render multiple columns at a time, but I suspect this will perform better: I'm not branching all over the place to handle multiple columns, and writes stay within one cache line far longer than with other methods.

 

This really should have been done and made standard years ago IMO.
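A minimal C sketch of what the transpose changes, assuming an 8-bit paletted framebuffer as in vanilla. In the usual row-major buffer a vertical wall column steps `width` bytes per pixel, so nearly every write lands in a different cache line; in a transposed (column-major) buffer the same column is one contiguous run, and the whole frame is transposed back once before presenting. The function names are hypothetical; this is an illustration, not Rum and Raisin's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional row-major layout: pixel (x, y) is buf[y * width + x].
   A vertical column advances 'width' bytes per pixel. */
static void draw_column_rowmajor(uint8_t *buf, int width, int x,
                                 int y0, int y1, uint8_t color)
{
    assert(x >= 0 && x < width && y0 >= 0);
    for (int y = y0; y <= y1; ++y)
        buf[(size_t)y * width + x] = color;
}

/* Transposed (column-major) layout: pixel (x, y) is buf[x * height + y].
   The same column is now a contiguous run of bytes. */
static void draw_column_transposed(uint8_t *buf, int height, int x,
                                   int y0, int y1, uint8_t color)
{
    assert(x >= 0 && y0 >= 0 && y1 < height);
    uint8_t *dest = buf + (size_t)x * height + y0;
    for (int y = y0; y <= y1; ++y)
        *dest++ = color;
}

/* Once per frame: transpose back into the row-major layout that the
   video output expects. */
static void present_transposed(const uint8_t *src, uint8_t *dst,
                               int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[(size_t)y * width + x] = src[(size_t)x * height + y];
}
```

Drawing the same columns through both paths and presenting the transposed buffer gives byte-identical output; the cost moves from strided per-pixel writes to a single transpose pass per frame.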

Share this post


Link to post

Sorry, I am really fucking dumb here... so... it's another modern source port that, uh...

tries to be as close to vanilla as possible?

So what does it do better than other source ports like Chocolate Doom, Crispy Doom (the one I mostly use) and PrBoom?


To my knowledge, it's using Choco more as a proof that this works with the classic renderer. Many of these optimizations should be applicable to just about any traditional software renderer, which would greatly improve rendering performance in those ports.


Definitely will be following the progress on this project. I've thought about similar ideas (particularly transposing the frame buffer) myself, but usually figured these things would be a wash at best for various reasons. Prerendering the light levels for textures in order to make the code SIMD-friendly is a pretty cool idea I've never even thought about. Although I'd be surprised if you have a significant problem with vanilla-compatible data sets, I'm not sure that would be a reasonable default for, say, GZDoom's or EE's software renderer. Maybe I'm overestimating, but I suspect some of the larger mods could easily use several GB of memory with this technique; it's pretty easy to push GZDoom over a GB with 4x texture resizing. It would make a lot of sense as an opt-in feature for systems with enough RAM, since if the hardware is there, the performance benefit could be huge.
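To illustrate the prelighting idea, here is a sketch assuming vanilla's COLORMAP layout of one 256-entry table per light level. It builds one pre-lit copy of a texture per light level, so the inner loop becomes a plain copy instead of a dependent `colormap[source[...]]` lookup. The type and function names are hypothetical, and texture sampling is simplified to 1:1 rather than vanilla's fixed-point stepping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Vanilla's COLORMAP lump holds 32 light-level tables (plus the
   invulnerability and all-black maps, ignored here). */
#define NUM_LIGHT_LEVELS 32

/* Hypothetical pre-lit texture: one full copy of the texels per light level. */
typedef struct {
    int texel_count;   /* width * height of the source texture */
    uint8_t *lit;      /* NUM_LIGHT_LEVELS * texel_count bytes */
} prelit_texture_t;

/* colormaps: NUM_LIGHT_LEVELS consecutive 256-byte tables mapping a
   palette index to its lit palette index. Returns 0 on allocation failure. */
static int prelight_texture(prelit_texture_t *out, const uint8_t *texels,
                            int texel_count, const uint8_t *colormaps)
{
    out->texel_count = texel_count;
    out->lit = malloc((size_t)NUM_LIGHT_LEVELS * texel_count);
    if (out->lit == NULL)
        return 0;
    for (int level = 0; level < NUM_LIGHT_LEVELS; ++level) {
        const uint8_t *cmap = colormaps + (size_t)level * 256;
        uint8_t *dst = out->lit + (size_t)level * texel_count;
        for (int i = 0; i < texel_count; ++i)
            dst[i] = cmap[texels[i]];
    }
    return 1;
}

/* Texels already lit at 'level'. */
static const uint8_t *prelit_level(const prelit_texture_t *tex, int level)
{
    assert(level >= 0 && level < NUM_LIGHT_LEVELS);
    return tex->lit + (size_t)level * tex->texel_count;
}

/* Vanilla-style inner loop: texel load, then a dependent colormap load. */
static void draw_run_colormapped(uint8_t *dest, const uint8_t *src, int n,
                                 const uint8_t *colormap)
{
    for (int i = 0; i < n; ++i)
        dest[i] = colormap[src[i]];
}

/* Pre-lit inner loop: a straight copy, which compilers vectorise easily. */
static void draw_run_prelit(uint8_t *dest, const uint8_t *lit_src, int n)
{
    for (int i = 0; i < n; ++i)
        dest[i] = lit_src[i];
}
```

The trade-off is the one raised above: each texture costs `NUM_LIGHT_LEVELS` times its original memory (32x in this sketch).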

 


The most fascinating part of this project to me is how it explains why something like Doom can still have performance problems on modern low-end hardware that, in my uneducated brain, should be far more capable than anything from the DOS era. I never considered how many optimizations made for the limited hardware of the time would end up being a problem on modern hardware at higher resolutions. This is amazing!

7 hours ago, GooberMan said:

DISCLAIMER: Do you suffer from the following symptoms?

 

  • You think software renderers are pointless
  • You think it will have a limited audience
  • You don't think anything I'm doing is technically possible or worthwhile

 

Then by all means, direct your concerns to the correct part of the internet...

Why would you say such a thing? Surely no one here has a track record of behaving like that :^)

4 hours ago, Blzut3 said:

Prerendering the light levels for textures in order to make the code SIMD-friendly is a pretty cool idea I've never even thought about. Although I'd be surprised if you have a significant problem with vanilla-compatible data sets, I'm not sure that would be a reasonable default for, say, GZDoom's or EE's software renderer.

Which reminds me, how maintained is GZ's software renderer these days? I looked at the code the other day but not the history. Things like PNGs would definitely need special consideration to even run properly in this code path.

 

(I've also stated how I'd do a hardware renderer previously on these forums. I'll get back to that at some point, but now that I'm learning the software renderer inside out this will honestly improve the methods I was going to employ.)

 

I have had to bump the default page size to 128MiB thanks to REKKR. I'll rewrite the allocator one day to be a bit more modern, specifically grabbing new virtual pages when needed. Likely a solved problem in every other source port, but as noted above Chocco is so close to vanilla.
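A rough sketch of the "grab more memory when needed" idea, assuming a simple bump-pointer arena rather than Chocolate Doom's actual zone allocator (which also supports freeing individual blocks and purge tags, both omitted here). `malloc` stands in for requesting fresh virtual pages from the OS, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ZONE_ALIGN _Alignof(max_align_t)
#define ROUND_UP(n, a) (((n) + (a) - 1) / (a) * (a))

/* One chunk of arena memory; the usable bytes follow the header. */
typedef struct chunk_s {
    struct chunk_s *next;
    size_t size;   /* usable bytes in this chunk */
    size_t used;   /* bytes handed out so far */
} chunk_t;

#define CHUNK_HEADER ROUND_UP(sizeof(chunk_t), ZONE_ALIGN)

/* Hypothetical growable arena: starts empty and chains in a new chunk
   whenever the current one can't satisfy a request. */
typedef struct {
    chunk_t *head;
    size_t chunk_size;   /* default size of each new chunk */
    int chunk_count;
} zone_arena_t;

static void *zone_alloc(zone_arena_t *z, size_t bytes)
{
    assert(z->chunk_size > 0);
    bytes = ROUND_UP(bytes, ZONE_ALIGN);
    if (z->head == NULL || z->head->size - z->head->used < bytes) {
        /* Out of room: grab a fresh chunk (stand-in for new virtual pages). */
        size_t size = bytes > z->chunk_size ? bytes : z->chunk_size;
        chunk_t *c = malloc(CHUNK_HEADER + size);
        if (c == NULL)
            return NULL;
        c->next = z->head;
        c->size = size;
        c->used = 0;
        z->head = c;
        z->chunk_count++;
    }
    unsigned char *p = (unsigned char *)z->head + CHUNK_HEADER + z->head->used;
    z->head->used += bytes;
    return p;
}

/* Release every chunk at once (e.g. on level exit). */
static void zone_free_all(zone_arena_t *z)
{
    while (z->head != NULL) {
        chunk_t *next = z->head->next;
        free(z->head);
        z->head = next;
    }
    z->chunk_count = 0;
}
```

With this shape there's no single up-front size to bump for big WADs: the arena simply grows by another chunk when a level needs more.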

1 hour ago, GooberMan said:

Which reminds me, how maintained is GZ's software renderer these days? I looked at the code the other day but not the history. Things like PNGs would definitely need special consideration to even run properly in this code path.

I don't have enough time in the day to stay active with GZDoom, so I don't know what the official current status is. dpJudas did a lot of refactoring over the past few years, but it looks like he recently stepped away from the project. As you may know, there are now two software renderers, since the softpoly backend has a "hardware accelerated" mode which is full 3D. The classic renderer is still largely the same, just reorganized for multithreading and whatnot.

 

Just realized that my statement about pre-lighting textures not really being possible with more advanced mods is more true than I initially thought, since with colored lighting and fog there are potentially tens or hundreds more texture variations required, depending on the map. (I was previously just thinking about the sheer number of textures/sprites and higher-resolution assets in larger mods.) Of course, one could treat those as exceptions and fall back to a slow path. In any case, it's not something you'd need to worry about for this project, since the scope is limited.
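A back-of-the-envelope version of that growth, under the simplifying assumption that every texture needs one pre-lit copy per light level for every colormap a map can use. With T bytes of texel data, L light levels (32 in vanilla) and C colormaps (Boom colormaps, colored light, fog), the pre-lit memory M is below; the 8 MiB figure for T is purely illustrative.

```latex
\begin{aligned}
M &= T \times L \times C \\
T = 8\,\text{MiB},\ L = 32,\ C = 1 &\;\Rightarrow\; M = 256\,\text{MiB} \\
T = 8\,\text{MiB},\ L = 32,\ C = 10 &\;\Rightarrow\; M = 2560\,\text{MiB} = 2.5\,\text{GiB}
\end{aligned}
```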

 

Edited by Blzut3


So, the advantage of working at a company with a strong demoscene culture/history: one of the graphics guys, who programs Atari ST demos in his spare time, suggested just using a lookup table for a SIMD mask I was trying to calculate at runtime. Given that I've been trying to avoid loads, I didn't think of it. Or, as I've been putting it: "It's so obvious, it's unintuitive." The results speak for themselves.

Before:

[Image: performance capture before the change (Jdrfvof.png)]

 

And after:

[Image: performance capture after the change (3eXGhiH.png)]

 

(It looks much clearer side-by-side, open in different tabs and switch back and forth)
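The post doesn't say what the mask is used for, so this is only an illustration of the general trick, assuming the common case of masking off leftover pixels at the end of a run. It uses GCC/Clang vector extensions rather than SSE or NEON intrinsics so the same code builds on x86 and ARM, and all names are hypothetical. The lookup version replaces the per-call compare with a single load from a small sliding-window table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 lanes of 8-bit pixels via the GCC/Clang vector extension. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Sliding-window table: 16 bytes of 0xFF followed by 16 zero bytes.
   Loading 16 bytes from offset (16 - n) gives a mask whose first n
   lanes are 0xFF and whose remaining lanes are 0x00. */
static const uint8_t mask_window[32] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Lookup-table version: one unaligned load, no lane arithmetic. */
static u8x16 mask_lut(int n)
{
    assert(n >= 0 && n <= 16);
    u8x16 m;
    memcpy(&m, mask_window + 16 - n, sizeof m);
    return m;
}

/* Computed version for comparison: compare lane indices against n. */
static u8x16 mask_computed(int n)
{
    const u8x16 lane = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    u8x16 limit;
    for (int i = 0; i < 16; ++i)
        limit[i] = (uint8_t)n;
    return (u8x16)(lane < limit);   /* true lanes become all-ones */
}

/* Write the first n pixels of src over dst and keep the rest of dst,
   e.g. for the ragged end of a run that isn't a multiple of 16. */
static u8x16 blend_first_n(u8x16 dst, u8x16 src, int n)
{
    const u8x16 m = mask_lut(n);
    return (src & m) | (dst & ~m);
}
```

Both versions produce identical masks for every n from 0 to 16; the table costs 32 bytes that will almost always be sitting in L1.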


I actually knew of this before, but it wasn't exactly clear what you planned to do with it, @GooberMan.

 

Now that the Lost Soul has escaped its prison, this is fucking awesome. Turbocharged software rendering with multicore support? Where can I sign up?

 

This is as beastly as FastDoom for totally different reasons. And it's being done by someone at Housemarque with a demoscene background (which is all the more amazing), so this has serious potential. Any kind of demoscene magic/influence applied to Doom should be cherished with a shrine of dedication, for the scene is where the real coding comes to be.


Oh, to be clear, I'm Australian and haven't written a demo in my life. Working at Remedy and Housemarque though, I've been surrounded by demosceners. Getting arcane knowledge about bit twiddling is just a matter of finding the right person to ask.

14 hours ago, GooberMan said:

Fog is something I'll have to deal with when I get to making Hexen run again. Let's see what I come up with when I get to it.

Unless I'm misremembering, the fog in Hexen is just a full-level colormap swap (to the fogmap), so if nothing else you could just re-render the textures on map change. But even if you pre-rendered the fogged textures once as well, that's still only going to double what you have now, compared to Boom colormaps or ZDoom's colored lighting/fog, where the growth is basically unbounded depending on the mod (and dynamic). The more interesting question for Heretic/Hexen will be whether anything can be done about TINTTAB.
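For context on why TINTTAB is the awkward case: Heretic and Hexen draw translucency through a 64 KiB table indexed by a source and a destination palette index, so each output pixel depends on what's already in the framebuffer. A minimal sketch follows; the `(src << 8) | dest` index order is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translucent run in the style of Heretic/Hexen's TINTTAB:
   a 256x256 (64 KiB) table indexed by source and destination palette
   indices. The (src << 8) | dest ordering is an assumption. */
static void draw_translucent_run(uint8_t *dest, const uint8_t *src, int n,
                                 const uint8_t *tinttab)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        /* Read-modify-write: the index needs the pixel already on screen. */
        dest[i] = tinttab[((unsigned)src[i] << 8) | dest[i]];
}
```

Because the lookup depends on `dest`, it can't be baked into pre-lit textures, and vectorising it needs a per-lane gather from a 64 KiB table rather than a straight load.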


About the backbuffer transpose, I wonder how much that would affect the bottleneck in the GZDoom software renderer. Right now the drawers don't really seem to be the performance bottleneck in GZDoom, at least not if you increase the resolution to 4K. Even though I went from an i7 Haswell CPU (4c/8t) to a Threadripper (32c/64t), the frame time stayed virtually the same. Right now it seems that drawer setup is what slows it down more than anything. If you're lucky, you'll be less impacted by this in vanilla Doom, because there are fewer features there than ZDoom supported.

 

For very complex scenes, the BSP traversal and sprites become the main bottlenecks. You can multithread the BSP by splitting the frame buffer into multiple subsections of the scene, reducing the field of view for each thread. Sprite performance can be improved by not always calculating the top/bottom clipping lists from scratch.
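One way to express the BSP split: give each thread a vertical slice of the screen and the correspondingly narrower view cone, so its BSP traversal can reject anything outside that cone. This sketch works in view-space slopes (tangents of the view angle) rather than Doom's BAM angles, assumes a 90-degree FOV where the projection distance equals half the screen width, and uses hypothetical names; in a real port the slope test would presumably feed a bounding-box check like vanilla's `R_CheckBBox`.

```c
#include <assert.h>

/* Hypothetical per-thread view slice: screen columns [x_start, x_end)
   and the matching view-space slopes (positive slopes are to the left
   of the view direction). */
typedef struct {
    int x_start, x_end;
    double left_slope, right_slope;
} view_slice_t;

/* Split the screen into 'count' vertical slices. 'focal' is the projection
   distance in pixels (for a 90-degree FOV it equals half the width). */
static void split_view(view_slice_t *slices, int count, int width, double focal)
{
    assert(count > 0 && width >= count && focal > 0.0);
    double center = width / 2.0;
    for (int i = 0; i < count; ++i) {
        int x0 = width * i / count;
        int x1 = width * (i + 1) / count;
        slices[i].x_start = x0;
        slices[i].x_end = x1;
        slices[i].left_slope = (center - x0) / focal;
        slices[i].right_slope = (center - x1) / focal;
    }
}

/* A thread only visits geometry overlapping its slope range. A point
   'ahead' units along the view and 'side' units to the left is inside
   the slice if side/ahead lies between right_slope and left_slope. */
static int slice_sees_point(const view_slice_t *s, double ahead, double side)
{
    if (ahead <= 0.0)
        return 0;
    double slope = side / ahead;
    return slope <= s->left_slope && slope >= s->right_slope;
}
```

Neighbouring slices share their boundary slopes, so geometry exactly on a seam is visited by both threads, which is conservative but safe.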

4 hours ago, dpJudas said:

For very complex scenes, the BSP traversal and sprites become the main bottlenecks. You can multithread the BSP by splitting the frame buffer into multiple subsections of the scene, reducing the field of view for each thread.

This is my exact plan, in fact. Well. In my experience with similar splitting of buffers, you need to pay attention to cache sizes on your system or else the L3 will trip over itself trying to propagate the buffers before it needs to. So I won't use a single buffer.

 

There are extra advantages to not using a single buffer besides completely avoiding cache contention. I'll actually be doing threading next (it's time to take that break from SIMD), so I should have more information soonish on whether it works as I think it should.
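A sketch of a "not a single buffer" layout, assuming 64-byte cache lines: each thread renders its slice into its own cache-line-aligned allocation (column-major, in the spirit of a transposed backbuffer), so no two threads ever write to the same line, and the slices are stitched into the final frame after the threads finish. Names are hypothetical and the threading itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Hypothetical per-thread render target for one vertical slice of the
   screen, stored column-major. */
typedef struct {
    int x_start;       /* first screen column this slice covers */
    int width, height;
    uint8_t *pixels;   /* width * height bytes, cache-line aligned */
} slice_buffer_t;

static int slice_buffer_init(slice_buffer_t *b, int x_start, int width, int height)
{
    assert(width > 0 && height > 0);
    /* aligned_alloc wants the size to be a multiple of the alignment;
       the rounding also means the slice owns every line it touches. */
    size_t bytes = ((size_t)width * height + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    b->x_start = x_start;
    b->width = width;
    b->height = height;
    b->pixels = aligned_alloc(CACHE_LINE, bytes);
    return b->pixels != NULL;
}

/* After all threads have finished: copy one slice into the row-major frame. */
static void slice_buffer_resolve(const slice_buffer_t *b, uint8_t *frame, int frame_width)
{
    for (int x = 0; x < b->width; ++x)
        for (int y = 0; y < b->height; ++y)
            frame[(size_t)y * frame_width + b->x_start + x] =
                b->pixels[(size_t)x * b->height + y];
}
```

Each worker would only ever write through its own `pixels` pointer; the single resolve pass at the end is the only point where slices meet.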


Oh, and just to highlight that cache really is the problem on modern systems, here's performance against a non-transposed renderer at 2560x1600 on an ARM processor.

 

[Image: performance graph, transposed vs. non-transposed renderer at 2560x1600 on ARM (c61y0Yx.png)]

 

Ignore the titlepic performance; I didn't patch the scaling code across to my clean Chocco build. But that graph is essentially the same as the original i7 graph I captured at lower resolutions. Notice how outright terribly ARM's cache performs on the wall/sprite-heavy parts of the DEMO1 loop.

(The capture is 700 frames from program start, it ends around where the barrel in front of the secret wall is being shot)
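Some back-of-the-envelope arithmetic on why the non-transposed case hurts at 2560x1600, assuming 8-bit pixels and 64-byte cache lines: a full-height column in a row-major buffer has a 2560-byte stride, so every one of its 1600 pixels lands in a different line, while in a transposed buffer the same column spans only 25 lines. The hypothetical helper below just counts the distinct lines touched; it is an illustration, not a cache simulator.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct cache lines written when drawing 'pixels' 8-bit
   pixels spaced 'stride' bytes apart, starting at a line-aligned address.
   Addresses only increase, so counting line changes counts distinct lines. */
static size_t lines_touched(size_t pixels, size_t stride, size_t line_size)
{
    assert(line_size > 0);
    size_t count = 0;
    size_t prev_line = (size_t)-1;
    for (size_t i = 0; i < pixels; ++i) {
        size_t line = i * stride / line_size;
        if (line != prev_line) {
            ++count;
            prev_line = line;
        }
    }
    return count;
}
```

`lines_touched(1600, 2560, 64)` gives 1600 for a row-major column and `lines_touched(1600, 1, 64)` gives 25 for a transposed one; at 64 bytes per line that's 100 KiB versus 1600 bytes of cache lines per column, before texture reads are counted.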


So here's a sneak preview of something I'll be ready to talk about properly in a few days' time, screencapped from the Pi used in the above post.

 

[Image: sneak preview screenshot from the Raspberry Pi (ri1jIAl.png)]


And another sneak preview. Being Chocolate-based (and testing that I don't break vanilla every step of the way) means I can just go ahead and load up Plutonia 2 to get a screenshot.

 

[Image: Plutonia 2 screenshot (0uWaA4O.png)]


I should stop being a wimp and figure out how to compile this on Windows if you need testers.

 

Edit: Bah, having trouble with Windows. I'll just set up a Linux environment and mess around with it when I'm bored.

Edited by Eric Claus


The CMake files are currently out of date, so until I fix that in about 12 hours, I can't compile it for my Raspberry Pi or my Linux box, and neither can you. So you might want to hold off a little there.

Edited by GooberMan
