
Doom High Res Sprites - Kickstarter?


Doom High-res Sprites  

60 members have voted

  1. Tea Monster, the guy who makes the game-ready, next-gen Doom models, is interested in how much response there would be to a Kickstarter for creating new Doom sprites from 3D models. They would be accurate to the original sprites and of high quality. I would ask for Kickstarter backing to help with software costs, and I would be taking time away from commercial projects to make these.

    • Yes, I'd contribute to a kickstarter to make some high-quality, accurate Doom sprites
      15
    • No, Doom should be free. I'll wait for someone else to do it.
      19
    • While it sounds nice, this commercial venture would be shut down by Bethesda instantly.
      27


Recommended Posts

If we were to make 3d models of anything, first we'd have to base their animations off of smoothdoom for fluidity's sake.

Share this post


Link to post
5 hours ago, Tea Monster said:

 

QfGpPTh.jpg

 

 

The model itself looks nice, but the textures make it look like it's made out of plastic, like it's some kind of toy or action figure. This is actually the problem I have with a lot of 3D models. Good texturing is a MUST if the model is supposed to look convincing.

Share this post


Link to post
On 9/12/2017 at 4:32 AM, kb1 said:

The overdraw is exactly the same for hires models as it is for low res, if you think about it. Both sprite sources create the same-sized output. Now, I would argue with anyone claiming that overdraw was not a big deal - it is. It wreaks havoc on the cache, like nothing else. But, again, the impact is the same, regardless of source sprite size.

 

Sprites are stored in the most-desirable order, when you are forced to paint them vertically. So, reading the sprite source memory takes full advantage of pipeline burst. Of course, you hit a lot more cache lines for a hires source - it's not free. There will absolutely be a performance hit. But there's no reason to believe that it will track linearly with source sprite size factor. (4x size increase does not mean 4x hit on performance, unless the source sprite gets *really* big). The fact that the sprite is being read forward, consecutively (more or less) makes all the difference.

Agreed, the Doom software renderer itself is pretty efficient when it comes to skipping source pixels, and the output pixels are a constant number regardless of the source ones. However, now that I think about it, the performance penalty is likely gonna be even worse than O(n^2) for a linear increase n in size: with the old blocky renderer and low-res sprites, a single source pixel can be read once and then "plastered" over the screen super-efficiently by a cached instruction loop, so for e.g. 16x output pixels you may only end up "paying" for one data read (and there are further economies to be realized with optimized renderers that render more than one column at a time).

 

But with a hi-res source that cuts down on scaling, almost every pixel that ends up on the screen is gonna cost an additional data read. Of course, that is assuming that the drawing is done software-side. Now, if you do draw the same sprite or column over and over, then yeah, the CPU-side cache is gonna help somewhat, so maybe it won't be O(n^2) all the time, but not always equal or better, either.
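To make the read-amortization point concrete, here is a minimal sketch of a Doom-style paletted column inner loop with nearest sampling (the names and constants are illustrative, not any port's actual code). With a heavily magnified low-res sprite, frac >> FRACBITS only changes every few iterations, so most output pixels reuse a texel that is already hot in cache; with a hi-res source drawn near 1:1, almost every iteration fetches a new source byte.

#include <cstddef>
#include <cstdint>

constexpr int FRACBITS = 16;

// Doom-style column inner loop: one vertical strip of a sprite,
// sampled with fixed-point stepping through the source column.
void DrawColumn(uint8_t* dest, size_t destPitch, int count,
                const uint8_t* source, uint32_t frac, uint32_t fracstep,
                const uint8_t* colormap)
{
    while (count-- > 0)
    {
        uint8_t texel = source[frac >> FRACBITS];  // one source read per output pixel
        *dest = colormap[texel];                   // light shading via colormap lookup
        dest += destPitch;                         // step one screen row down
        frac += fracstep;                          // fixed-point step through the source
    }
}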

 

On a GPU, assuming that at least the sprites are stored in Video RAM and not main RAM, you are pretty much stuck with the full O(n^2) scaling penalty, but at least you keep it out of the way of the regular cache.

 

Share this post


Link to post

It's true - with larger sprites, you'll walk across a lot of cache lines. This is typically OK, as long as each new address maps to a different cache set. When addresses map to the same set, you're flushing the very same cache lines you just filled, which is what yields the worst performance. With 64-byte lines, the low 6 bits are just the offset within a line; it's the bits above them that determine which set gets used. This is why painting at 1024x768 is a lot slower than at some larger resolutions.

 

With Truecolor (3 bytes per pixel), you're practically guaranteed to avoid this in most cases. You can get about 21 pixels per 64-byte cache line, so you take advantage of the cache until the sprite becomes 2/21 size or smaller (cause it's far away). But, then you're painting a lot less pixels, so, in a way it's self-governing, somewhat. If you're *really* cool, you tell the CPU to precache, but that'd be a tricky algorithm :)

 

All that is to say that it's worth trying, with a clause in the text file that it is designed for high-end PCs. It's easy enough to test if you have one hi-res sprite: rename it so it replaces all sprites, and give it a try. That might help determine how large you could go.
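For what it's worth, a rough sketch of that "tell the CPU to precache" idea for a packed 24-bit source, using the _mm_prefetch intrinsic. The function name, the 3-byte layout and the one-line-ahead prefetch distance are assumptions that would need tuning, not a drop-in implementation.

#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>   // _mm_prefetch / _MM_HINT_T0

constexpr int FRACBITS = 16;

// Walk a packed 24-bit source column and request the next cache line
// before it is needed, so the fetch overlaps with the pixel work.
void DrawColumn24(uint32_t* dest, size_t destPitch, int count,
                  const uint8_t* source, uint32_t frac, uint32_t fracstep)
{
    while (count-- > 0)
    {
        const uint8_t* p = source + (frac >> FRACBITS) * 3;  // 3 bytes per source pixel
        _mm_prefetch(reinterpret_cast<const char*>(p) + 64, _MM_HINT_T0);
        // Repack the 24-bit source pixel as 0x00RRGGBB for a 32-bit framebuffer.
        *dest = (uint32_t(p[0]) << 16) | (uint32_t(p[1]) << 8) | uint32_t(p[2]);
        dest += destPitch;
        frac += fracstep;
    }
}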

Share this post


Link to post
11 minutes ago, kb1 said:

With Truecolor (3 bytes per pixel), you're practically guaranteed to avoid this in most cases. You can get about 21 pixels per 64-byte cache line, so you take advantage of the cache until the sprite becomes 2/21 size or smaller (cause it's far away). But, then you're painting a lot less pixels, so, in a way it's self-governing, somewhat. If you're *really* cool, you tell the CPU to precache, but that'd be a tricky algorithm :)

All truecolor renderers generally always use 4 bytes per pixel to keep things dword aligned. That gives you 16 pixels per cache line rather than 21. As for drawing things far away, mipmaps were invented to fix that cache problem. Doom doesn't support those out of the box, but they are fairly easy to add (but as you said yourself, once the sprite is far away, the speed gains get a lot lesser, especially compared to the sprite setup and sorting costs).
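As an illustration of how little code the "fairly easy to add" part can be, here is a sketch of building a mipmap chain for a 32-bit texture with a plain 2x2 box filter. It assumes power-of-two dimensions and ignores gamma, and none of the names are GZDoom's.

#include <cstdint>
#include <utility>
#include <vector>

// Each mip level averages 2x2 blocks of the level above, channel by channel.
std::vector<std::vector<uint32_t>> BuildMipmaps(
    const std::vector<uint32_t>& level0, int width, int height)
{
    std::vector<std::vector<uint32_t>> levels{ level0 };
    while (width > 1 && height > 1)
    {
        const auto& src = levels.back();
        int w = width / 2, h = height / 2;
        std::vector<uint32_t> dst(size_t(w) * h);
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                uint32_t p[4] = {
                    src[(y * 2) * width + x * 2],     src[(y * 2) * width + x * 2 + 1],
                    src[(y * 2 + 1) * width + x * 2], src[(y * 2 + 1) * width + x * 2 + 1]
                };
                uint32_t out = 0;
                for (int shift = 0; shift < 32; shift += 8)   // average each 8-bit channel
                {
                    uint32_t sum = ((p[0] >> shift) & 0xff) + ((p[1] >> shift) & 0xff)
                                 + ((p[2] >> shift) & 0xff) + ((p[3] >> shift) & 0xff);
                    out |= ((sum + 2) / 4) << shift;
                }
                dst[y * w + x] = out;
            }
        }
        levels.push_back(std::move(dst));
        width = w; height = h;
    }
    return levels;
}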

Share this post


Link to post

I would rather see an overhaul much like what Blizzard did with StarCraft Remastered. Still 2D sprites, but at a higher 4K resolution. We would have to break the color barrier, though. It may not even work on the Doom engine.

 

https://starcraft.com/en-us/

Edited by Zemini

Share this post


Link to post
1 hour ago, dpJudas said:

All truecolor renderers generally always use 4 bytes per pixel to keep things dword aligned. That gives you 16 pixels per cache line rather than 21. As for drawing things far away, mipmaps were invented to fix that cache problem. Doom doesn't support those out of the box, but they are fairly easy to add (but as you said yourself, once the sprite is far away, the speed gains get a lot lesser, especially compared to the sprite setup and sorting costs).

All? Always? Not if you want the 5.1 extra source pixels per cache line. Modern CPUs handle *most* unaligned reads just fine. Sure, there's a penalty when crossing a cache boundary, but, in this case, you'll typically take advantage of that, since you're going to use that next cache line anyway.

 

Another option is to go back to using a palette. It could be per-monster. 2K or 4K colors, with 32-bit light calcs, would ease up on cache while producing a high-fidelity image without too much compromise. It's conceptually messier than straight truecolor pixels, but with proper pipelining, it can run at comparable speeds while reducing cache pressure. Some empirical balancing is in order here.
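As a rough sketch of what that could look like (layout, sizes and names here are all assumptions, not a worked-out design): the sprite stores 16-bit indices into a per-monster truecolor palette, so the source stream stays compact while the light calculation still happens in 32-bit RGB at draw time.

#include <cstdint>

// 4096-entry per-monster palette = 16 KB, small enough to stay cache-resident.
struct MonsterPalette { uint32_t colors[4096]; };

// Assumes index < 4096; lightshade in 0..256.
inline uint32_t ShadePixel(uint16_t index, const MonsterPalette& pal,
                           uint32_t lightshade)
{
    uint32_t c = pal.colors[index];
    uint32_t r = (((c >> 16) & 0xff) * lightshade) >> 8;
    uint32_t g = (((c >> 8) & 0xff) * lightshade) >> 8;
    uint32_t b = ((c & 0xff) * lightshade) >> 8;
    return (r << 16) | (g << 8) | b;
}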

 

Yes, mipmapping can help (as mentioned a few posts back). Personally, I find that the developers get a bit conservative with their mipmap generators, deciding to build the smaller versions to scale to 1:1 right at the transition point. This causes the normal aliasing effects to change character, to the point where it becomes visually obvious when that transition occurs. Like most things, it's a memory vs. quality tradeoff. I hope to put all of these options to the test someday.

Share this post


Link to post
6 hours ago, kb1 said:

All? Always? Not if you want the 5.1 extra source pixels per cache line. Modern CPUs handle *most* unaligned reads just fine. Sure, there's a penalty when crossing a cache boundary, but, in this case, you'll typically take advantage of that, since you're going to use that next cache line anyway.

 

Another option is to go back to using a palette. It could be per-monster. 2K or 4K colors, with 32-bit light calcs, would ease up on cache while producing a high-fidelity image without too much compromise. It's conceptually messier than straight truecolor pixels, but with proper pipelining, it can run at comparable speeds while reducing cache pressure. Some empirical balancing is in order here.

 

Yes, mipmapping can help (as mentioned a few posts back). Personally, I find that the developers get a bit conservative with their mipmap generators, deciding to build the smaller versions to scale to 1:1 right at the transition point. This causes the normal aliasing effects to change character, to the point where it becomes visually obvious when that transition occurs. Like most things, it's a memory vs. quality tradeoff. I hope to put all of these options to the test someday.

As far as I know, yes all and yes always. Memory addressing gets a lot simpler when you are dword aligned as you can use shifting to calculate the address. I.e. (fracpos >> (FRACBITS - 2)) & ~3. With 3 bytes you'd have to use a multiply instruction. It also unlocks the opportunity to use vectored instructions. On today's GPUs you can't even pick a texture/framebuffer format that is only 3 bytes - your choices are always 1, 2 or 4. I'm sure some hardware engineers somewhere ran the numbers and came to the conclusion that the addressing advantages of 4 bytes were greater than the cache hit. Also must be a reason why all compilers align their reads and writes.
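For readers following along, the two addressing schemes side by side (a sketch with made-up helper names; FRACBITS as in the usual Doom fixed-point convention):

#include <cstdint>

constexpr int FRACBITS = 16;

// 4 bytes per pixel: the quoted shift-and-mask turns the fixed-point
// position straight into a byte offset (index * 4 with the fractional
// bits cleared).
inline const uint8_t* Addr32(const uint8_t* base, uint32_t fracpos)
{
    return base + ((fracpos >> (FRACBITS - 2)) & ~3u);
}

// 3 bytes per pixel: the index-to-offset step needs a multiply,
// or kb1's shift + add.
inline const uint8_t* Addr24(const uint8_t* base, uint32_t fracpos)
{
    uint32_t i = fracpos >> FRACBITS;   // pixel index
    return base + (i << 1) + i;         // i * 3 as shift + add
}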

 

The interesting thing about mipmaps is that from a theoretical point of view they are both better for the cache and provide a higher quality result (less aliasing). However, that transition jump you describe makes it look pretty bad unless it is paired with linear mipmap sampling. I'm applying a slight texture bias in the GZDoom mipmap implementation to try to counter it, but of course if you look for it you will see the jumps. Overall though, I prefer the lower aliasing it gives in a scene over not having the transition jumps.
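A minimal sketch of the bias idea, strictly as an illustration - the 0.5 value, the rounding and the names are assumptions, not GZDoom's actual mipmap code:

#include <algorithm>
#include <cmath>

// Pick a mip level from how many source texels land on one output
// pixel, then subtract a small bias so the jump to a blurrier level
// happens a bit later than the exact 1:1 point.
inline int SelectMipLevel(float texelsPerPixel, int levelCount, float bias = 0.5f)
{
    float lod = std::log2(std::max(texelsPerPixel, 1.0f)) - bias;
    int level = int(lod + 0.5f);
    return std::clamp(level, 0, levelCount - 1);
}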

Edited by dpJudas

Share this post


Link to post

Of course, all this talk about cache lines etc. presumes that hi-res sprite support would be implemented with a software renderer, which is certainly not the case today, at least not with truecolor resources. And realistically, any future ports supporting it would do the heavy lifting on the GPU, so we're really looking at GPU architectures here. There you have no caches, no NUMA, only -hopefully- very wide and independent memory channels, and any increase in the size of assets/textures will have an immediate impact on used memory bandwidth -hopefully way below what the hardware can handle.

 

Now, if you manage to pair GPUs with continuous main RAM access... then you get the worst of both worlds.

Edited by Maes

Share this post


Link to post

Just to be clear: my comments were meant for a software renderer implementation, such as the truecolor one in GZDoom. I only brought GPUs into the discussion as an illustration that 24-bit truecolor textures are always being stored as 32-bit, to the degree that modern APIs do not even offer a 24-bit backing anymore.

 

On the subject of caches and GPUs, they do have caches and they do use a NUMA architecture. This PDF has some nice images of the various caches and such.

Share this post


Link to post

Heh, so they started complicating GPUs with NUMA now? Wow.

 

I am not very familiar with GZDoom's truecolor software renderer, TBQH. Does it support truecolor resources as well, or only 8-bit ones?

Share this post


Link to post

It always uses 32-bit truecolor textures (the 8-bit ones are converted to 32-bit at load, including generating mipmaps). It can sample from them using either nearest or linear filtering.

Share this post


Link to post

What about sprites? Also, is lighting computed in real time or are there precalculated copies of the same textures at different lighting levels? Certain effects like overall brightness are not cheap to apply in RGB space, if that's what the renderer uses.

Share this post


Link to post

The sprites also use 32-bit textures. The only exception to this rule are translated sprites, where the translation texture is still 8-bit. Light shading is applied per pixel in RGB space: out.rgb = (texture.rgb * lightshade + 127) >> 8. It does this using SSE instructions where it can do 8 word multiplications (two pixels) in one instruction. Much more expensive than palette mode, but the only alternative would be using a huge lookup table. Given the size required I'm not sure it would be faster.
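For the curious, a simplified sketch of that shading step with SSE2 intrinsics - two 32-bit pixels unpacked to eight 16-bit channels, multiplied by the light factor in one instruction, rounded, shifted back down and repacked. This is an illustration of the technique described above, not GZDoom's actual source.

#include <cstdint>
#include <emmintrin.h>   // SSE2

// Shade two packed 32-bit pixels at once. (The alpha bytes get shaded
// too in this simplified version.) lightshade in 0..256.
inline void ShadeTwoPixels(uint32_t* dest, const uint32_t* texel, uint16_t lightshade)
{
    __m128i pixels = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(texel));
    __m128i words  = _mm_unpacklo_epi8(pixels, _mm_setzero_si128());   // 8 bytes -> 8 words
    words = _mm_mullo_epi16(words, _mm_set1_epi16(short(lightshade))); // 8 multiplies at once
    words = _mm_add_epi16(words, _mm_set1_epi16(127));                 // + 127 for rounding
    words = _mm_srli_epi16(words, 8);                                  // >> 8
    __m128i packed = _mm_packus_epi16(words, words);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dest), packed);
}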

 

Interesting idea doing precalculated copies of the same texture - I hadn't even considered that option. Unfortunately it would require 32 copies just to get the same number of shades as the palette renderer uses. It would probably be faster, but use a lot more memory and cause a quality loss (32 shades vs 256 now).

Share this post


Link to post

IIRC, _bruce_'s truecolor Chocolate Doom branch did all lighting and color manipulation in HSV or HSL color space - changing the light level of any pixel or texture/sprite was a breeze, as was adding colorization effects. However, other effects like transparency were much more expensive to do (no alpha in HSV space!), and everything required converting back to RGB for rendering.

 

In Mocha Doom I used a mixture of 8-bit textures and a series of truecolor palettes precalculated for each lighting level - there could be more than 32, and in theory there could even be separate sets of palettes for specific sprites or textures, somewhat breaking the limitations of strictly 8-bit resources.
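A sketch of that precalculation (written in C++ here to match the rest of the thread, although Mocha Doom itself is Java; the names and the simple linear shade ramp are assumptions): bake one truecolor table per light level up front, so the drawing loop becomes a single lookup per pixel with no per-pixel light multiply.

#include <array>
#include <cstdint>
#include <vector>

// Assumes at least two light levels; shade ramps linearly from black to full bright.
std::vector<std::array<uint32_t, 256>> BuildLightPalettes(
    const uint32_t (&basePalette)[256], int lightLevels)
{
    std::vector<std::array<uint32_t, 256>> tables(lightLevels);
    for (int l = 0; l < lightLevels; l++)
    {
        uint32_t shade = uint32_t(l) * 256 / (lightLevels - 1);  // 0 = black, 256 = full bright
        for (int i = 0; i < 256; i++)
        {
            uint32_t c = basePalette[i];
            uint32_t r = (((c >> 16) & 0xff) * shade) >> 8;
            uint32_t g = (((c >> 8) & 0xff) * shade) >> 8;
            uint32_t b = ((c & 0xff) * shade) >> 8;
            tables[l][i] = (r << 16) | (g << 8) | b;
        }
    }
    return tables;
}
// At draw time: dest[x] = tables[lightLevel][texel];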

Share this post


Link to post

So I finally found that old hires imp pack. Unexpectedly, I didn't notice any framedrops with 120 imps. I don't know how the engine would handle more monster sprites to draw. I don't know what has changed in GZDoom since I started this pack in 2014, but for some reason the latest version of GZDoom doesn't scale these sprites correctly. I don't have time to play with this, so I just used an old GZDoom version (which displays that astronomical FPS count).

 

 

 

1.jpg

2.jpg

3.jpg

 

EDIT:

Here's a zip with the screenshots (DW downscales them horribly).

 

 

imp.zip

Edited by Reinchard

Share this post


Link to post

Sorry for the double post, but I also did a quick test with the Doom 3 sprite pack + Dawn of Reality. Even with a big monster count, I don't notice any visible framedrops:

JCgUi1O.jpg

1OoZ9vq.jpg

J5nK3ps.jpg

2VXN8mg.jpg

 

 

 

 

 

Edited by Reinchard

Share this post


Link to post
13 hours ago, dpJudas said:

As far as I know, yes all and yes always. Memory addressing gets a lot simpler when you are dword aligned as you can use shifting to calculate the address. I.e. (fracpos >> (FRACBITS - 2)) & ~3. With 3 bytes you'd have to use a multiply instruction. It also unlocks the opportunity to use vectored instructions. On today's GPUs you can't even pick a texture/framebuffer format that is only 3 bytes - your choices are always 1, 2 or 4. I'm sure some hardware engineers somewhere ran the numbers and came to the conclusion that the addressing advantages of 4 bytes were greater than the cache hit. Also must be a reason why all compilers align their reads and writes.

Any of that time pales in comparison with a cache miss, which is what using 3 bytes is minimizing. By the way, * 3 = shift + add

Share this post


Link to post
1 minute ago, kb1 said:

Any of that time pales in comparison with a cache miss, which is what using 3 bytes is minimizing. By the way, * 3 = shift + add

I guess I'm not going to convince you. All I can say is that I've never seen a performance-critical codebase use 3 bytes for truecolor. Keep in mind you can't just store or load 3 bytes - you either have to do three 1-byte stores, or one short + one byte, or start shifting things around and store half-processed pixels. The cache miss is rare, while the cost you decided to pay has to be paid for every single pixel processed.

Share this post


Link to post

Why not just read as a DWord, write as a DWord, and advance by 3? The video card couldn't care less about the extra byte. AND it with 0x00FFFFFF if need be. It works. The cache miss is not only not rare, it can be almost constant, as you back away from the sprite. 3 vs. 4 is a non-issue compared to those cache misses.
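Concretely, the read-a-dword trick looks something like this (a sketch; it assumes a little-endian target and at least one byte of padding after the last pixel, since the final read overhangs by one byte):

#include <cstdint>
#include <cstring>

// Load one packed 24-bit pixel with a single 32-bit read and mask off
// the neighbouring byte. memcpy compiles down to an unaligned load.
inline uint32_t ReadPixel24(const uint8_t* p)
{
    uint32_t dword;
    std::memcpy(&dword, p, sizeof(dword));
    return dword & 0x00FFFFFF;
}

// Usage in a loop: pixel = ReadPixel24(src); src += 3;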

 

No need to convince me - I know where you're coming from, though: aligning data on the processor's native data size is a common plan, and often done by default. But the task at hand may or may not always benefit. This task, in particular, has a frequent, obvious, nasty tendency to thrash the cache. It's worth investigating alternative ways to alleviate the pressure, if the goal is to have hi-res sprites in software.

 

Put it this way: If there was a nice hi-res sprite pack ready to go, I'll try anything and everything to get it to perform well!

Share this post


Link to post
1 hour ago, kb1 said:

If there was a nice hi-res sprite pack ready to go, I'll try anything and everything to get it to perform well!

Hmm... As I said (and documented with screens) three posts above, there are no visible framedrops using hires sprite packs, even on slaughter maps, which is a little surprising to me. But OK, I will say it again - even with the Doom 3 hires sprite pack (which is a really massive pack with a lot of sprites and FX stuff) I never get less than 60 fps. You can try it - the pack is called ultimatesd3mod.

 

Edited by Reinchard

Share this post


Link to post

TBQH, the examples used (except the 120-imp one) don't really test what they are supposed to test. They look like Doom 4 or any other modern "Doom-like" FPS, at best. High monster count for a modern FPS, but low compared to what even the most tame slaughtermap can throw at you. Dawn of Reality's monster count is really low for a map of that size, and in my book it would not classify as "slaughter" in the classic sense. Try going straight to NUTS.WAD or any of the CHILLAX.WAD maps and see what happens then.

Edited by Maes

Share this post


Link to post

If the monster count from DOR doesn't cause framedrops, then the majority of Doom wads also should not cause framedrops. I know there are wads like NUTS, but let's be serious - that kind of wad would have problems in both cases: models or sprites. The most important fact is that most wads can handle it.

Share this post


Link to post

I'm starting to think the main reason there aren't many high-poly model mods is that they're being optimized for the kind of level packs with large numbers of enemies, like Slaughterfest and Chillax; maybe one should optimize them for the main game and regular level packs instead (I mean packs like BTSX and such).

 

And if it causes the FPS to drop in level packs with large numbers of enemies, just turn them off?

Edited by dmg_64

Share this post


Link to post

Exactly. It doesn't make sense. NUTS.wad causes framedrops even without any addon packs. If the packs work smoothly on most wads, should we really take into account maps with an extreme, nonsensical monster count? Even if that kind of map didn't cause framedrops, someone could always make a wad with more monsters - that point of view makes no sense.

Share this post


Link to post
1 hour ago, Reinchard said:

If the monster count from DOR doesn't cause framedrops, then the majority of Doom wads also should not cause framedrops. I know there are wads like NUTS, but let's be serious - that kind of wad would have problems in both cases: models or sprites. The most important fact is that most wads can handle it.

My point was that with DOR (which has no more than 500 monsters all in all) you never fight more than what, 50 at a time? That's nothing for Doom, even if way more than that may be active "off screen".

 

It's also true that with NUTS.WAD-like monster counts, the gameplay code can be just as "heavy" (if not more heavy) than the rendering one, but the latter one depends on many more variables: type of renderer, display hardware and driver (if it's a HW accelerated port) etc. Don't forget that, at least in the past, it wasn't unusual for prboom+ to mop the floor with glboom+ in timedemos, simply because the hardware couldn't stand being "zerg rushed" with tons of individual draw commands.

Edited by Maes

Share this post


Link to post
1 hour ago, Reinchard said:

Exactly. It doesn't make sense. NUTS.wad causes framedrops even without any addon packs. If the packs work smoothly on most wads, should we really take into account maps with an extreme, nonsensical monster count?

 

No. I don't even care for playing those kinds of maps. Most of the time maps with huge numbers of enemies just devolve into boring fights where you either lead them into a corridor where you can kill them slowly over time or just boring circle strafing fights where you circle strafe for 10 min until the horde is dead. Slaughter maps suck imo.

Share this post


Link to post
1 hour ago, Maes said:

It's also true that with NUTS.WAD-like monster counts, the gameplay code can be just as "heavy" (if not more heavy) than the rendering one, but the latter one depends on many more variables: type of renderer, display hardware and driver (if it's a HW accelerated port) etc. Don't forget that, at least in the past, it wasn't unusual for prboom+ to mop the floor with glboom+ in timedemos, simply because the hardware couldn't stand being "zerg rushed" with tons of individual draw commands.

OK, but my point is that if a wad like DOR + the hires sprite pack doesn't cause any visible slowdowns, then the majority of Doom wads will behave the same. OK, maybe for Doom 50 monsters at a time is nothing, but first, most Doom wads sit in a similar range, and second, everything points to the fact that even a larger monster count should not be a problem (the 120-imps test, for example). So aside from all that technical mumbo-jumbo, everything points to the fact that hires sprites should work. And for iwads, with a much smaller monster count, this will be nothing for today's hardware.

 

Share this post


Link to post
This topic is now closed to further replies.