Maes Posted December 24, 2010 While I was investigating the distorted visplanes problem in MochaDoom (the tiny squares look OK only at certain view angles, while they get increasingly distorted in others), believing it was related to precision crapping out due to "high" resolutions I tried to see how other source ports handled this problem...surprisingly, with the notable exception of Eternity that has done a partial transition to a floating point renderer, I found that none of the other ports did anything special to handle precision issues (nor did I recall there being any). So the problem had to be somewhere else... I remembered that I had "resurrected" an unused R_DrawSpan function found in the linuxdoom code that implemented an unrolled version of the R_DrawSpan code (well, almost, as it turned out afterwards). It did work, and was a hefty 20-25% faster than the plain drawspan function, but I got the localized distortions. I reverted to the plain DrawSpan, and the distortions went away... So I was all WTF and tried to re-implement my own version of an unrolled R_DrawSpan (based on the "normal" one this time) and lo' and behold, the distortion went away, however the unrolling speedup was not nearly as good. It was there, but it wasn't as good. So I inspected the "outcast" function closely, and discovered how it gets its speed: position = ((ds_xfrac<<10)&0xffff0000) | ((ds_yfrac>>6)&0xffff); step = ((ds_xstep<<10)&0xffff0000) | ((ds_ystep>>6)&0xffff); source = ds_source; colormap = ds_colormap; dest = ylookup[ds_y] + columnofs[ds_x1]; count = ds_x2 - ds_x1 + 1; while (count >= 4) { ytemp = position>>4; ytemp = ytemp & 4032; xtemp = position>>26; spot = xtemp | ytemp; position += step; dest[0] = colormap[source[spot]]; ytemp = position>>4; ytemp = ytemp & 4032; xtemp = position>>26; spot = xtemp | ytemp; position += step; dest[1] = colormap[source[spot]]; ... etc. In nutshell, what happens is that the x-step and y-step are combined into one "step" operation, and both the starting x,y coords and the x,y stepping are truncated to 10 bits each and forced to reside in a 32-bit array. In practice, this leaves just 6 bits to express 64 screen units, and a 10-bit fractional part (not counting overflows between the two words forced to share the same 32 bits), and when drawing long runs of same-texture spans or watching them closely upon certain angles, distortion ensues. By comparison, the "normal" R_DrawSpan uses full 32-bit precision when incrementing the indexes, while now this has been effectively reduced to a 16 bit operation. The verdict: it works, it IS way faster than the normal R_DrawSpan (it might even be OK for low resolutions) but the distortion is noticeable. It is clearly a tradeoff between speed and quality, although not as extreme as the low-detail mode. I am going to attempt a fix by "refreshing" the position with a forced full-precision operation every so many counts. Edit: Adding this check at the bottom of the loop seems to fix the problem if ((rolls++)%32==0){ position = ((((rolls*4)*ds_xstep+ds_xfrac) << 10) & 0xffff0000) | ((((rolls*4)*ds_ystep+ds_yfrac) >> 6) & 0xffff); } where "rolls" is a count of how many unrolled loops you've done (so once every 32 loops I recompute the position with maximum precision). The tradeoff is that this slows down the code somewhat, however it still beats even a fully unrolled version of R_DrawSpan squarely. 0 Quote Share this post Link to post
Quasar Posted December 24, 2010 Interesting solution. SoM found the same thing out some time ago (long before EE converted to floating point with Cardboard) and created a high precision span drawer. Though in his case, he leaves only enough bits to express the maximum texture dimension (for example, 64 bytes) and uses the rest as fractional bits - 6.26 fixed point. He also managed to get a speed improvement over id's code by eliminating some of the variables out of the calculation entirely. 0 Quote Share this post Link to post
entryway Posted December 24, 2010 http://prboom-plus.sf.net/clip/2010-12-24_9.36.10.png 0 Quote Share this post Link to post
Maes Posted December 24, 2010 In theory, this method could be applied to a full 64-bit variable (Storing two 16.16|16.16 fixed_t numbers in both position and step) while retaining full precision and coalescing two operations in one, but depending on the OS and architecture it might be slower or faster than the plain 32-bit method. As it is however it effectively reduces the initial precision to 0.10 and then as the span draws slooowly accumulates to an "amazing" 6.10 Let alone that the LSW can overflow and affect the top one.... 0 Quote Share this post Link to post
RjY Posted December 24, 2010 Maes: the lines in your original post are so long that it is hard to read. Could you perhaps split that one line in the code tags so your post doesn't break the tables so much? Thanks! 0 Quote Share this post Link to post
_bruce_ Posted December 24, 2010 Damn. Such threads are the most interesting. I often read through the source but couldn't, and still can't, make much of it. 0 Quote Share this post Link to post
Maes Posted January 8, 2011 As I reviewed revro.wad for newstuff, I noticed that in vanilla Doom the floor distortions become progressively more noticeable to the right side of the screen, which is consistent with how the roundoff error is supposed to cumulate in the unrolled drawspan. Also, I compared the same (more or less) location in vanilla and zdoom screenshots at the same resolution, and it seems to me that ZDoom's rendering is more accurate, proving that vanilla Doom actually uses the unrolled function. Vanilla: ZDoom: I don't know why it's marked as "unused" in the Linux Doom code, other that it behaves progressively worse with resolution increases. At vanilla resolution however, you really gotta try to find the differences. 0 Quote Share this post Link to post
Quasar Posted January 8, 2011 Erm. In at least SOME of the DOOM exe's I have disassembled, *none* of the C versions of R_DrawSpan are used. There is a third version, and it was implemented in x86 assembly. The LINEAR.ASM file containing this implementation can be found in the Heretic source code. I believe it also suffers the same poor accuracy. Either way, the C version that was in Eternity, which was not unrolled to my memory, was also inaccurate in the exact same way. ;==================================================== ; ; unwound horizontal texture mapping code ; ; eax lighttable ; ebx scratch register ; ecx position 6.10 bits x, 6.10 bits y ; edx step 6.10 bits x, 6.10 bits y ; esi start of block ; edi dest ; ebp fff to mask bx ; ; ebp should by preset from ebx / ecx before calling ;==================================================== OP_SHLD = 0fh .DATA MAPDEFINE MACRO number dd hmap&number ENDM ALIGN 4 mapcalls LABEL LINE = 0 REPT SCREENWIDTH+1 MAPDEFINE %LINE LINE = LINE+1 ENDM callpoint dd 0 returnpoint dd 0 .CODE ;================ ; ; R_DrawSpan ; ; Horizontal texture mapping ; ;================ PROC R_DrawSpan_ PUBLIC R_DrawSpan_ PUSHR IFE SKIPPRIMITIVES mov eax,[_ds_x1] mov ebx,[_ds_x2] mov eax,[mapcalls+eax*4] mov [callpoint],eax ; spot to jump into unwound mov eax,[mapcalls+4+ebx*4] mov [returnpoint],eax ; spot to patch a ret at mov BYTE PTR [eax], OP_RET ; ; build composite position ; mov ecx,[_ds_xfrac] shl ecx,10 and ecx,0ffff0000h mov eax,[_ds_yfrac] shr eax,6 and eax,0ffffh or ecx,eax ; ; build composite step ; mov edx,[_ds_xstep] shl edx,10 and edx,0ffff0000h mov eax,[_ds_ystep] shr eax,6 and eax,0ffffh or edx,eax mov esi,[_ds_source] mov edi,[_ds_y] mov edi,[_ylookup+edi*4] add edi,[_columnofs] mov eax,[_ds_colormap] ; ; feed the pipeline and jump in ; mov ebp,0fffh ; used to mask off slop high bits from position shld ebx,ecx,22 ; shift y units in shld ebx,ecx,6 ; shift x units in and ebx,ebp ; mask off slop bits call [callpoint] mov ebx,[returnpoint] mov BYTE PTR [ebx],OP_MOVAL ; remove the ret patched in ENDIF POPR ret ;============= HIGH DETAIL ============ .CODE MAPLABEL MACRO number hmap&number: ENDM LINE = 0 PCOL = 0 REPT SCREENWIDTH/4 PLANE = 0 REPT 4 MAPLABEL %LINE LINE = LINE + 1 mov al,[esi+ebx] ; get source pixel shld ebx,ecx,22 ; shift y units in shld ebx,ecx,6 ; shift x units in mov al,[eax] ; translate color and ebx,ebp ; mask off slop bits add ecx,edx ; position += step mov [edi+PLANE+PCOL*4],al ; write pixel PLANE = PLANE + 1 ENDM PCOL = PCOL + 1 ENDM hmap320: ret ENDP END 0 Quote Share this post Link to post
Maes Posted January 8, 2011 Could it be that they first compiled the "unrolled" C version to ASM and then tried to further optimize it by hand? If so, then ports that aim for 100% vanilla display accuracy should use the "unrolled" C version, since it's obviously closer to what the actual ASM is doing. But perhaps only the original programmers and maybe the Boom team can tell us what is actually going on here (since they had limited access to the DOS code). Either way, it works really poorly for high resolutions unless you introduce my "accuracy sync" hack every so many loops, though. Edit: Chocolate Doom is also "too accurate". Compare this screenshot from the Control Tower's window in E1M1: Vanilla: Chocolate 1.21: However Chocolate v1.50 seems to use a "correctly buggy" function: 0 Quote Share this post Link to post
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.