
Dynamic wiggle/Tall Sector fix for fixed-point software renderer


kb1


If you don't like the idea of precalculating seg lengths, you can calculate them on the fly with a slightly modified R_PointToDist that returns the distance between two arbitrary points instead of between a point and (viewx,viewy).
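
A minimal sketch of that idea, for illustration only. It does not reuse the table-based approach of the real R_PointToDist; it swaps in a 64-bit integer square root instead. The names point_to_point_dist and isqrt64 are hypothetical, and the code assumes each coordinate difference fits in 31 bits so the sum of squares cannot overflow.

```c
#include <stdint.h>

typedef int32_t fixed_t;          /* 16.16 fixed point, as in Doom */
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

/* Integer square root of a 64-bit value (bit-by-bit method). */
static uint64_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four */

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Hypothetical helper: distance between two arbitrary points, in 16.16.
   Assumes |x2 - x1| and |y2 - y1| are each below 2^31. */
static fixed_t point_to_point_dist(fixed_t x1, fixed_t y1, fixed_t x2, fixed_t y2)
{
    int64_t dx = (int64_t)x2 - x1;
    int64_t dy = (int64_t)y2 - y1;

    /* dx*dx + dy*dy is in 32.32; its square root is back in 16.16. */
    return (fixed_t)isqrt64((uint64_t)(dx * dx + dy * dy));
}
```

Calling it with a seg's two vertices gives the seg length without storing anything per seg, at the cost of one square root per call.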


Well, I couldn't get my super-duper mind-blowing code to work. I was going to impose an opposite error upon the calculation to cancel out the wobble. I gave it a half-assed effort and didn't have luck.

So, instead I did this:

//	rw_distance = FixedMul(hyp, sineval);
	rw_distance = (int)(hyp * sin((distangle * .00000000146291808f)));
That eliminates the long wall error, and is probably pretty fast. Of course, it uses that dirty floating point :)

The point is that, it seems that the only real problem is in the finesine table. If you look at the middle entries, you'll see quite a few angles all map to the same 16-bit sine value. finesine stores 32-bit fixed sine values, but only 16 of those bits are set (the upper 16 bits are all zero). I am considering shifting the finesine values left 8 or 16 bits or so, and filling in the low half properly, to see if the table can still be used.

If I read the code correctly, hyp can be no larger than 8192 (see question below), so I could use 2 extra finesine bits right away without overflow, if they were available.

Linguica's approach is the most mathematically correct, but I think this might be faster, especially with a more precise finesine. Actually, I am going to create a new finesine table: "finersine" :), still fixed-point, but maybe shifted 8 bits left in comparison to finesine. This new table will be the same size, will be indexed the same way, and will only be used for rendering.

Can anyone verify that R_PointToDist cannot return values higher than 8192 fixed-point?


Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;
 
	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
 
	return y;
}

Linguica said:

Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:

Yeah, I've been wanting to use that formula ever since I saw it. It's one of those things that shouldn't work, but does.

I guess some profiling is in order:
. Original w/long wall error
. Linguica's fix
. Linguica's fix w/Entryway's optimizations and pre-calc inv_length
. Linguica's fix w/Entryway's optimizations and magic inv sqrt
. kb's original code + one-line floating-point sin() cheat
. kb's original code + more precise "finersine[]" table

Maybe I'll get some time to try them all, but I may need to wait for the weekend.

Quasar said:...Sticking to it religiously for Doom is an exercise in frustration beyond a certain point...

If this stuff is just an intellectual exercise, that's one thing... being religious about fixed point - then I don't think that's a good idea at all...

Linguica said:

...I'm partial to thinking about optimizations / improvements that *could* have been done in the original DOS engine in 1993. It's more interesting that way...


Of course, you both have very valid points. Here's a third viewpoint, for what it's worth (actually a bunch of small points):

. Doom can be compiled to run on a lot of hardware, with mixed support for floating-point, some very good, some very slow.
. Conversion to/from fixed/float must be done properly, so it's not slow.
. If you rewrite the whole renderer with floating-point in mind, you can create an awesome, fast renderer (like Cardboard). But it's sort of all or nothing - you don't want to be doing conversions everywhere.
. Yes, computers are very fast now, but Doom must do a lot more than it did in '93: 100x the pixels, massive limit-removal levels, 15K+monster levels, etc.

If I'm in a peer-to-peer coop game with, say, 2 others, playing a huge level, if that single float calculation is even a millisecond slower, multiplied by 500 walls, 35 fps, that just might push me past that 35 fps boundary, where I lose a frame. It matters.

In other words, fixing these old renderer bugs better not slow me down much. I have gotten used to my renderer's speed. And, yes, it's kinda neat to see what could have been done back in '93.

Having said all that, yes, a lot of these fixes are in the realm of creating what I call "poor man's floating point" with fixed-point math, and it can get quite ridiculous. For me, this is an intermediate step towards full conversion to floating-point, which I'm not ready for yet.

And, finally, these old bugs have frustrated me and others for years. I want to know why they occur, and I want to finally put them down!

By the way, I didn't get an answer, so let me please ask again:
Can anyone verify that R_PointToDist will never return a value higher than 8192? Thanks in advance.


Another issue with long walls:
https://www.youtube.com/watch?v=gO4HD_ZbYhc

Looks like precise value for rw_offset does help:

// rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
double dx = viewx - curline->v1->x;
double dy = viewy - curline->v1->y;
double hyp = sqrt(dx*dx+dy*dy);
double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
rw_offset = -sin(a)*hyp;
https://dl.dropboxusercontent.com/u/235592644/files/logwall_test.wad.zip

Linguica said:

Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:
*snip*

Curious, does Q_rsqrt still work without -fno-strict-aliasing in GCC and clang?

Quasar said:

Curious, does Q_rsqrt still work without -fno-strict-aliasing in GCC and clang?

$ gcc -O2 -Wall -o check check.c
check.c: In function ‘Q_rsqrt’:
check.c:19:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
  i  = * ( long * ) &y;                       // evil floating point bit level hacking
Of course, the Doom engine is hardly one to talk, since its entire functionality is built around type punning and void pointers...
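
For what it's worth, the usual way to avoid that warning is to copy the bits with memcpy instead of casting pointers. The sketch below is a variation on the quoted function, not the original; rsqrt_memcpy is a hypothetical name and the code assumes 32-bit IEEE 754 floats. It also uses uint32_t, since long is 64 bits wide on many 64-bit platforms.

```c
#include <stdint.h>
#include <string.h>

/* Same algorithm as Q_rsqrt, but the bit reinterpretation goes through
   memcpy instead of pointer casts. Assumes 32-bit IEEE 754 floats. */
static float rsqrt_memcpy(float number)
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* read the float's bits */
    i = 0x5f3759df - (i >> 1);           /* magic initial guess */
    memcpy(&y, &i, sizeof y);            /* write them back */
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step */

    return y;
}
```

Compilers typically turn the two memcpy calls into plain register moves, so this should not cost anything compared with the cast version.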

entryway said:

Another issue with long walls:
https://www.youtube.com/watch?v=gO4HD_ZbYhc

Looks like precise value for rw_offset does help:

// rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
double dx = viewx - curline->v1->x;
double dy = viewy - curline->v1->y;
double hyp = sqrt(dx*dx+dy*dy);
double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
rw_offset = -sin(a)*hyp;
https://dl.dropboxusercontent.com/u/235592644/files/logwall_test.wad.zip

What code were you using when you made the video? Does the code quoted above fix the video's issue, or does it cause the issue to occur?

kb1 said:

What code were you using when you made the video? Does the code quoted above fix the video's issue, or does it cause the issue to occur?

Yes, the brute-force code above fixes the issue. The original calculation of rw_offset is commented out in the first line.

entryway said:

Yes, the brute-force code above fixes the issue. The original calculation of rw_offset is commented out in the first line.

What happens if you use my first formula?

I believe that the call to R_PointToDist is ok to calculate hyp. I think the only thing that causes long wall error is bad finesine values.

So, what happens in that area in the map you posted, if you change your code to this below?

// // rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
// double dx = viewx - curline->v1->x;
// double dy = viewy - curline->v1->y;
// double hyp = sqrt(dx*dx+dy*dy);
// double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
// rw_offset = -sin(a)*hyp;

rw_distance = (int)(hyp * sin((distangle * .00000000146291808f)));

EDIT: Or, if you prefer:
#define DOOM_ANG_TO_RAD   ((2 * M_PI / 8192) / (1<<19))
...

rw_distance = (int)(hyp * sin((distangle * DOOM_ANG_TO_RAD)));
Does look nicer. If this works, I'll try to make a more precise finesine table, which, if that also works, should be quite fast.
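
A short sketch of the conversion on its own, assuming Doom's usual binary angle format where the full circle is 2^32. bam_to_rad is a hypothetical name. It works in double throughout; with the f suffix used in the first version, the multiplication may be carried out in single precision, whose 24-bit mantissa cannot hold every 32-bit angle exactly.

```c
#include <stdint.h>

typedef uint32_t angle_t;   /* binary angle: the full circle is 2^32 */

static const double PI = 3.14159265358979323846;

/* Scale factor is 2*pi / 2^32, the same value as (2*pi/8192) / (1<<19). */
static double bam_to_rad(angle_t angle)
{
    return (double)angle * (2.0 * PI / 4294967296.0);
}
```

For example, bam_to_rad(0x40000000) is a quarter turn, so it comes out as pi/2.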

kb1 said:

What happens if you use my first formula?

I don't know what distangle is in your code. Anyway, I fixed it with some nice code.

const int shift_bits = 1;

int_64_t dx = (curline->v2->x - curline->v1->x) >> shift_bits;
int_64_t dy = (curline->v2->y - curline->v1->y) >> shift_bits;
int_64_t dx1 = (viewx - curline->v1->x) >> shift_bits;
int_64_t dy1 = (viewy - curline->v1->y) >> shift_bits;
      
int_64_t distance = (dy * dx1 - dx * dy1) / (curline->length >> shift_bits);
int_64_t offset = (dx*dx1 + dy*dy1) / (curline->length >> shift_bits);

rw_distance = (fixed_t)(distance << shift_bits);
rw_offset = (fixed_t)(offset << shift_bits);
You can even remove the shift_bits stuff. It's hard to get an int64 overflow there, and the code becomes very simple.
int_64_t dx = curline->v2->x - curline->v1->x;
int_64_t dy = curline->v2->y - curline->v1->y;
int_64_t dx1 = viewx - curline->v1->x;
int_64_t dy1 = viewy - curline->v1->y;
      
rw_distance = (fixed_t)((dy * dx1 - dx * dy1) / curline->length);
rw_offset = (fixed_t)((dx*dx1 + dy*dy1) / curline->length);
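
To spell out why this works: the first expression is a cross product divided by the seg length, which gives the perpendicular distance from the view point to the wall line, and the second is a dot product divided by the seg length, which gives the position of the view point projected along the wall. Below is a self-contained sketch of the same arithmetic. seg_sketch_t and seg_distance_offset are hypothetical stand-ins, the length is assumed to be stored in 16.16, and the zero-length guard is an added assumption rather than part of the code above.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */
#define FRACBITS 16

/* Hypothetical stand-in for the seg fields used in the thread. */
typedef struct {
    fixed_t x1, y1;     /* first vertex */
    fixed_t x2, y2;     /* second vertex */
    int64_t length;     /* seg length in 16.16, assumed precalculated */
} seg_sketch_t;

/* Returns 0 and leaves the outputs untouched for a zero-length seg. */
static int seg_distance_offset(const seg_sketch_t *seg, fixed_t viewx, fixed_t viewy,
                               fixed_t *distance, fixed_t *offset)
{
    int64_t dx, dy, dx1, dy1;

    if (seg->length == 0)
        return 0;

    dx  = (int64_t)seg->x2 - seg->x1;
    dy  = (int64_t)seg->y2 - seg->y1;
    dx1 = (int64_t)viewx - seg->x1;
    dy1 = (int64_t)viewy - seg->y1;

    /* Cross product / length: perpendicular distance to the wall line. */
    *distance = (fixed_t)((dy * dx1 - dx * dy1) / seg->length);
    /* Dot product / length: projection of the view point along the wall. */
    *offset   = (fixed_t)((dx * dx1 + dy * dy1) / seg->length);
    return 1;
}
```

The products are 32.32 values held in 64 bits, and dividing by a 16.16 length brings the result back to 16.16, so no table lookup or angle is involved at all.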


I understood. You meant

rw_distance = (int)(hyp * cos((offsetangle * .00000000146291808)));
rw_offset = (int)(hyp * -sin((offsetangle * .00000000146291808)));
It does help for long walls, but not for extremely long walls (30k) in my test level above.

entryway said:

I understood. You meant

rw_distance = (int)(hyp * cos((offsetangle * .00000000146291808)));
rw_offset = (int)(hyp * -sin((offsetangle * .00000000146291808)));
It does help for long walls, but not for extremely long walls (30k) in my test level above.

Ah, ok then. Nice fix!

entryway said:

You can even remove the shift_bits stuff. It's hard to get an int64 overflow there, and the code becomes very simple.

int_64_t dx = curline->v2->x - curline->v1->x;
int_64_t dy = curline->v2->y - curline->v1->y;
int_64_t dx1 = viewx - curline->v1->x;
int_64_t dy1 = viewy - curline->v1->y;
      
rw_distance = (fixed_t)((dy * dx1 - dx * dy1) / curline->length);
rw_offset = (fixed_t)((dx*dx1 + dy*dy1) / curline->length);

This is beautiful, thank you!


What about segs with zero length? Are they possible? Can they reach R_StoreWallRange? If so, we should check before dividing, or skip them.


Segs are defined by their start and end vertices so if you have a map with 2 vertices on top of each other then yes, I guess. IIRC the game would never try to draw them anyway, since such a seg would always span 0 pixels.

Linguica said:

IIRC the game would never try to draw them anyway, since such a seg would always span 0 pixels.

Correct. Zero-length segs should already be skipped in R_AddLine:

x1 = viewangletox[angle1];
x2 = viewangletox[angle2];

// Does not cross a pixel?
if (x1 == x2)
  return;				


Alright, now that this thread has produced fixes for the wiggling lines and the long wall bug, here's the next curiosity that needs fixing. Look at the alignment of the floor flat tiles when the light flickers:


My hypothesis is that this is again caused by insufficiently precise angle calculations. When the lights are off, the whole floor has the same height, flat and lighting, so the engine treats it as one visplane. When the lights are on, the floor splits into two different visplanes, so the engine has to recalculate the floor texture offset starting from the point where the lighting changes. Since this point is a lot closer to the player than the starting point of the formerly combined visplane, the alignment shifts by a single pixel and the floor wiggles. Can someone confirm that this makes sense?


Isn't that already fixed in prboom-plus? I definitely notice that when I play normal prboom at high resolutions I can see floor textures jittering around a lot when I turn slowly. I assumed it had to do with the affine mapping the Doom engine uses and that prboom-plus does something more perspective-correct. Or am I totally wrong?


Downloaded latest build, 8 bit, 1280x960:



edit: it happens when I change rendering quality from "Quality" to "Speed". Looking at the source this changes the behavior of R_MapPlane():

  // e6y
  //
  // [RH]Instead of using the xtoviewangle array, I calculated the fractional values
  // at the middle of the screen, then used the calculated ds_xstep and ds_ystep
  // to step from those to the proper texture coordinate to start drawing at.
  // That way, the texture coordinate is always calculated by its position
  // on the screen and not by its position relative to the edge of the visplane.
  //
  // Visplanes with the same texture now match up far better than before.
  //
  // See cchest2.wad/map02/room with sector #265

Linguica said:

edit: it happens when I change rendering quality from "Quality" to "Speed". Looking at the source this changes the behavior of R_MapPlane():

Ah, I see, I had this set to "Quality". Now we just need someone to translate this routine into fixed-point arithmetic. ;)

Breezeep said:

I would love to see an update to Prboom + with this fix applied sometime.

It has been applied for ages; you just need to enable it by setting Options->General->Rendering Quality from "Speed" to "Quality".

  • 2 weeks later...

This is how it looks in fixed-point math:

distance = FixedMul(planeheight, yslope[y]);

ds_xstep = FixedMul(viewsin, planeheight) / abs(centery - y);
ds_ystep = FixedMul(viewcos, planeheight) / abs(centery - y);

ds_xfrac =  viewx + FixedMul(viewcos, distance) + (x1 - centerx) * ds_xstep;
ds_yfrac = -viewy - FixedMul(viewsin, distance) + (x1 - centerx) * ds_ystep;
Edit: Don't forget to return() early if (y == centery)!
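
Here is the same calculation wrapped in a function, as a sketch only. plane_sketch_t and map_plane_start are hypothetical names, the yslope value is passed in rather than read from the table, and FixedMul is reimplemented with a 64-bit intermediate. The early return for the horizon row is included.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */
#define FRACBITS 16

static fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FRACBITS);
}

/* Hypothetical container for the values the snippet reads and writes. */
typedef struct {
    fixed_t viewx, viewy, viewsin, viewcos;
    int centerx, centery;
    fixed_t xfrac, yfrac, xstep, ystep;   /* outputs */
} plane_sketch_t;

/* Returns 0 without touching the outputs when y is the horizon row. */
static int map_plane_start(plane_sketch_t *p, fixed_t planeheight,
                           fixed_t yslope_y, int y, int x1)
{
    fixed_t distance;
    int rows = abs(p->centery - y);

    if (rows == 0)
        return 0;   /* avoid dividing by zero at y == centery */

    distance = FixedMul(planeheight, yslope_y);
    p->xstep = FixedMul(p->viewsin, planeheight) / rows;
    p->ystep = FixedMul(p->viewcos, planeheight) / rows;

    /* Start from the value at the screen centre and step out to x1. */
    p->xfrac =  p->viewx + FixedMul(p->viewcos, distance) + (x1 - p->centerx) * p->xstep;
    p->yfrac = -p->viewy - FixedMul(p->viewsin, distance) + (x1 - p->centerx) * p->ystep;
    return 1;
}
```

Because the start coordinate is derived from centerx rather than from the edge of the visplane, a span starting at column 50 gets exactly the coordinate that a span starting at column 10 would reach after 40 steps, so adjacent visplanes with the same flat line up.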


Nice!

I feel like I am at a crossroads between fixed-point and floating-point. I wonder if it's worth it to dynamically switch functions to/from fixed/float, maybe based on map size. Or just go float and be done with it. I know that modern float is very fast, so should I abandon the older processors? I have a feeling that, on modern CPUs, float may even get parallelized, incurring very low cost, if written carefully.


I guess some profiling is in order. Has anyone done any timings in these areas?


There were/are source ports to choose from for older CPUs, so there's no need to cater to them anymore in 2015.

