entryway Posted September 2, 2010 I added support for textured automap in glboom-plus and noticed that MSVC 6.0 build is very slow on textured automap. I did not believe it can be 2x slower then MSVC 2005 build. fraps + "glboom-plus dv.wad -warp 5" + "map_textured 1" in cfg + <Tab> IDDT MSVC 6.0 - 45 fps MSVC 2005/2008 - 93 fps Intel Compiler 10 - 103 fps http://prboom-plus.sf.net/vc6-vc2005-intel10.zip How such code can be 2x slower when I build it with vc6? Massive cache misses I think. 0 Quote Share this post Link to post
entryway Posted September 2, 2010 lol, I replacedglTexCoord2f(flats_vbo[vertexnum].u + floor_uoffs, flats_vbo[vertexnum].v + floor_voffs); glVertex3f(flats_vbo[vertexnum].x, flats_vbo[vertexnum].z, 0);withglVertex3f(flats_vbo[vertexnum].x, flats_vbo[vertexnum].z, 0); glTexCoord2f(flats_vbo[vertexnum].u + floor_uoffs, flats_vbo[vertexnum].v + floor_voffs);and now I have 23 fps! with MSVC 6.0 and 30 with MSVC 2005 It's strange, becausetypedef struct vbo_xyz_uv_s { float x, y, z; float u, v; } vbo_xyz_uv_t; 0 Quote Share this post Link to post
Graf Zahl Posted September 2, 2010 entryway said:lol, I replacedglTexCoord2f(flats_vbo[vertexnum].u + floor_uoffs, flats_vbo[vertexnum].v + floor_voffs); glVertex3f(flats_vbo[vertexnum].x, flats_vbo[vertexnum].z, 0);withglVertex3f(flats_vbo[vertexnum].x, flats_vbo[vertexnum].z, 0); glTexCoord2f(flats_vbo[vertexnum].u + floor_uoffs, flats_vbo[vertexnum].v + floor_voffs);and now I have 23 fps! with MSVC 6.0 and 30 with MSVC 2005 Does that even work right? If I understand OpenGL correctly the glVertex command must always be the last one to terminate one vertex's attributes. 0 Quote Share this post Link to post
entryway Posted September 2, 2010 Graf Zahl said:Does that even work right? If I understand OpenGL correctly the glVertex command must always be the last one to terminate one vertex's attributes. Ah yes, you are right. But big difference between mcvc 6 and 2005 remains for that code. 0 Quote Share this post Link to post
Quasar Posted September 2, 2010 I think it's generally agreed that Visual C++ 6.0 was terrible at generating floating point code in general. Just judging from the number of hacks in Quake 2's source code to work around them would be enough to deduce that, probably, even if I hadn't heard similar complaints all over the place :) I know EE's Cardboard runs better compiled with 2008 than it did with 6.0. Ten+ whole years of development separate the two. 0 Quote Share this post Link to post
entryway Posted September 3, 2010 lol // sort subsectors by texture //qsort(visible_subsectors, visible_subsectors_count, // sizeof(visible_subsectors[0]), dicmp_visible_subsectors_by_pic);.. and FPS doubled for MSVC 6.0 executable wtf? why msvc6' implementation of qsort is so fucking slow? 0 Quote Share this post Link to post
Spleen Posted September 3, 2010 I wonder how fast mingw-generated code runs compared to the Windows compilers. It might be nicer than MSVC 6.0 for backwards compatibility with older Windows versions. 0 Quote Share this post Link to post
entryway Posted September 3, 2010 Spleen said:I wonder how fast mingw-generated code runs compared to the Windows compilers. It might be nicer than MSVC 6.0 for backwards compatibility with older Windows versions. gcc ~= msvc 2005 msvc 2005 and intel 10 are compatible with older Windows versions 0 Quote Share this post Link to post
entryway Posted September 3, 2010 MSVC 6.0 on my work computer: standard qsort - 25 fps no sorting at all - 45 fps custom implementation of qsort - 56 fps 0 Quote Share this post Link to post
Graf Zahl Posted September 3, 2010 Did they implement a Bubblesort or what? 0 Quote Share this post Link to post
entryway Posted September 3, 2010 Graf Zahl said:Did they implement a Bubblesort or what? heh I have compared custom qsort with msvc2005 qsort and the first one is only ~1% faster. I think because 'compare' func is inlined. 0 Quote Share this post Link to post
Quasar Posted September 3, 2010 If history is any indication then the sort is probably cache-bound, which means it probably won't matter what algorithm you replace it with. 0 Quote Share this post Link to post
entryway Posted September 3, 2010 Quasar said:If history is any indication then the sort is probably cache-bound, which means it probably won't matter what algorithm you replace it with. Probably I did not understand you, but 'my' implementation of qsort is 2.5x faster than msvc6'. On Deus Vult map05 + 'IDDT' + 'textured automap' I sort 20k pointers every frame and custom implementation is much faster (25 fps -> 55 fps) #ifdef USE_CUSTOM_QSORT void qsort_subsectors_by_pic(subsector_t **arr, unsigned int n) { #define cmp_subsectors_by_pic(a, b) ((*a)->sector->floorpic > (*b)->sector->floorpic) QSORT(subsector_t*, arr, n, cmp_subsectors_by_pic); } #else static int C_DECL dicmp_visible_subsectors_by_pic(const void *a, const void *b) { return (*((const subsector_t **)b))->sector->floorpic - (*((const subsector_t **)a))->sector->floorpic; } #endif #ifdef USE_CUSTOM_QSORT qsort_subsectors_by_pic(visible_subsectors, visible_subsectors_count); #else qsort(visible_subsectors, visible_subsectors_count, sizeof(visible_subsectors[0]), dicmp_visible_subsectors_by_pic); #endif 0 Quote Share this post Link to post
entryway Posted September 3, 2010 Graf Zahl said:Did they implement a Bubblesort or what? I have tested qsort speed exclusively. Code: http://pastebin.com/WR2ztyNP Results:MSVC 6.0 qsort - 32600 ms Custom qsort - 1200 ms MSVC 2005 qsort - 1003 ms Intel 10 qsort - 980 msThus MSVC6 qsort is 32x slower than MSVC2005, heh. With my 'compare' function and my data of course. Anyway, I have changed the algo a little and now I have 100+ fps with any compiler, because I do not need to sort that array every frame. 0 Quote Share this post Link to post
Graf Zahl Posted September 3, 2010 entryway said:I have tested qsort speed exclusively. Are you sure that MSVC 6 even implements a proper qsort? The speed you get suggests that there's something terribly wrong with it. 0 Quote Share this post Link to post
entryway Posted September 3, 2010 Graf Zahl said:Are you sure that MSVC 6 even implements a proper qsort? At least it sorts the data :) I think modern compilers use some tricks (there are a few tricks which can be applied to classic qsort) and read data less randomly to minimize cache misses. 0 Quote Share this post Link to post
entryway Posted September 4, 2010 Graf Zahl said:Are you sure that MSVC 6 even implements a proper qsort? The speed you get suggests that there's something terribly wrong with it. Such trick does not help for MSVC 6.0. 21000ms against 1000ms for custom sort. typedef struct visible_subsectors_s { subsector_t *sub; int floorpic; } visible_subsectors_t; for (i = 0; i < numsubsectors; i++) { visible_subsectors[visible_subsectors_count].sub = &subsectors[i]; visible_subsectors[visible_subsectors_count].floorpic = subsectors[i].sector->floorpic; visible_subsectors_count++; } static int C_DECL dicmp_visible_subsectors_by_pic(const void *a, const void *b) { return ((const visible_subsectors_t *)b)->floorpic - ((const visible_subsectors_t *)a)->floorpic; } 0 Quote Share this post Link to post
Maes Posted September 4, 2010 It's quite possible that Microsoft's implementation degenerates to O(n^2) way too easily for that kind of data or spams a fuckton of recursive calls rather than being iterative. Not good for something you need to call in between frame renders. Of course I'd need to see the implementation to see that...it would not be the first time that a custom implementation (perhaps for a precise kind of object/fixed comparison key) is faster than a more generic one, or can be provided with extra knowledge that make it more efficient (e.g. more optimal pivoting, in your case). 0 Quote Share this post Link to post
entryway Posted September 4, 2010 MSVC 6.0: http://pastebin.com/nmZVzrPG MSVC 2005: http://pastebin.com/T3LBsR3Y Custom: http://pastebin.com/WfG9qtbj The problem exists only with MSVC 6.0 qsort, but I did not try to recompile its crt/src 0 Quote Share this post Link to post
Maes Posted September 5, 2010 Heh. The two MSVC implementations of qsort are more or less the same as far as raw code is concerned, except for this interesting algorithmic tidbit here: from MSVC 6.0 implementation, about choosing a pivot element: The efficiency of the algorithm demands that we find one that is approximately the median of the values, but also that we select one fast. Using the first one produces bad performace if the array is already sorted, so we use the middle one, which would require a very wierdly arranged array for worst case performance. Testing shows that a median-of-three algorithm does not, in general, increase performance. ...or does it? from the MSVC 2005 implementation, on the same code spot: We choose the median of the first, middle, and last elements, to avoid bad performance in the face of already sorted data, or data that is made up of multiple sorted runs appended together. Testing shows that a median-of-three algorithm provides better performance than simply picking the middle element for the latter case. So apparently they got scathed in the butt with that design decision in MSVC 6.0, as even the optimized custom implementation uses a median of three: from the custom qsort implementation: 2. Chose the pivot element using a median-of-three decision tree. This reduces the probability of selecting a bad pivot value and eliminates certain extraneous comparisons. It also has other optimizations such as using insertion sort for sufficiently small partitions, while both MS implementations only do so if the WHOLE array to sort is below this "sufficiently small" threshold, not individual partitions. But the big player seems to be the pivot choosing method. This generates interesting questions about the nature of the data to be sorted -if it can fuck up with qsort so badly, maybe another nlogn sorting algorithm should be used. Or maybe it should not be sorted at all/only sorted approximatively. 0 Quote Share this post Link to post
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.