Hi all,
during the last two days, cartrite and I have been busy testing and
benchmarking the new DXT output of my F-TexTools 2.0pre1/nmtools 2.0pre1, over at CelestialMatters.
While normalmap DXT compression is not yet CUDA supported, all the other relevant compressed DXT formats are.
++++++++++++++++++++++++++++
...and that is SPECTACULAR!
++++++++++++++++++++++++++++
Unfortunately, the FX5900Ultra card in my desktop is far too old for CUDA support, and the Quadro NMS 110 card in my laptop just misses being supported...
+++++++++++++++++++++++++++++
However, cartrite owns an 8600 GTS card, which is CUDA-enabled.
Let's see what this meant in practice!
+++++++++++++++++++++++++++++
Here is the complete list of NVIDIA cards where CUDA acceleration can be activated simply by installing an appropriate, CUDA-enabled NVIDIA driver:
http://www.nvidia.com/object/cuda_learn_products.html
Some of you might ask what CUDA is all about?
=========================================
[ Beyond my short summary below, you might want to read a bit more on the CUDA project pages:
http://www.nvidia.com/object/cuda_what_is.html
http://www.nvidia.com/object/cuda_get.html ]
It's quite an ingenious approach for tremendously accelerating calculations on your computer:
In our context, the idea of the CUDA project is to use the GPU (Graphics Processing Unit) of your graphics card rather than your normal CPU for the DXT compression! Note well, this has nothing to do with faster rendering...
CUDA includes a specialized compiler that outputs code which your GPU (!) understands from input code that is very close to C. For our purpose, all this code has been written already and is part of the NVTT tools that are implemented now in my tools via a single library.
The point is simply that the highly specialized processors in modern graphics cards are much ... much faster than normal allround CPUs. They execute these DXT compression jobs IN PARALLEL and also, the card memory is VERY fast.
So CUDA just "outsources" the job of calculating the ~3000 VTs to your graphics card. Unlike normal outsourcing, this one is entirely FREE-OF-CHARGE (if you have a reasonably modern NVIDIA card).
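To make the "IN PARALLEL" point a bit more concrete: DXT formats encode an image as independent 4 x 4 texel blocks, and that independence is what lets the work be spread over many GPU processors. The following is only a minimal Python sketch, not code from NVTT or from my tools; the function name is made up for illustration. It counts the independent blocks in one tile and the resulting DXT3 size, assuming the standard 16 bytes per DXT3 block and a single mip level.

```python
BLOCK_EDGE = 4          # DXT formats encode 4 x 4 texel blocks
DXT3_BLOCK_BYTES = 16   # 8 bytes explicit alpha + 8 bytes colour per block


def dxt3_blocks_and_bytes(width, height):
    """Return (number of 4x4 blocks, compressed bytes) for one mip level.

    Hypothetical helper for illustration only.
    """
    # Round up so that partially filled edge blocks are counted too.
    blocks_x = (width + BLOCK_EDGE - 1) // BLOCK_EDGE
    blocks_y = (height + BLOCK_EDGE - 1) // BLOCK_EDGE
    blocks = blocks_x * blocks_y
    return blocks, blocks * DXT3_BLOCK_BYTES


if __name__ == "__main__":
    blocks, size = dxt3_blocks_and_bytes(1024, 1024)
    print(f"{blocks} independent blocks, {size} bytes in DXT3")
```

Every one of those blocks can be compressed without looking at its neighbours, so a single 1024 x 1024 tile already offers tens of thousands of independent work items.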
Have a look at the standard advertisement plot from NVIDIA:
Hence, if you own a GeForce 8800GTX card, CUDA acceleration would let you convert the ~3000 VT's a factor of TEN faster than e.g. a Core 2 Duo CPU!! Not bad at all...
+++++++++++++++++++++++++++++
Now after these prerequisites, let me report how all this theory looks in practice, taking cartrite's 8600 GTS card as an example:
+++++++++++++++++++++++++++++
Firstly, with his previous NON-CUDA-enabled NVIDIA driver, he and I got very similar performance when calculating 2048 level5 VT's in high-quality DXT3 format. We used a 64k RGBA input texture (with an RGB base texture and cartrite's beautiful SWBD-based specmap as the alpha (A) channel).
Here are e.g. his logs:
cartrite wrote:
tile[ 64 VT's of 2048 -> 2.47 s]
tile[ 128 VT's of 2048 -> 15.89 s]
tile[ 192 VT's of 2048 -> 119.93 s]
tile[ 256 VT's of 2048 -> 266.46 s]
tile[ 320 VT's of 2048 -> 426.47 s]
tile[ 384 VT's of 2048 -> 708.26 s]
tile[ 448 VT's of 2048 -> 957.17 s]
tile[ 512 VT's of 2048 -> 1186.79 s]
tile[ 576 VT's of 2048 -> 1384.66 s]
...
tile[ 1792 VT's of 2048 -> 3098.95 s]
tile[ 1856 VT's of 2048 -> 3181.83 s]
tile[ 1920 VT's of 2048 -> 3323.20 s]
tile[ 1984 VT's of 2048 -> 3411.35 s]
tile[ 2048 VT's of 2048 -> 3455.60 s]
My machine did it a little faster, but the difference was not significant. You see, without CUDA acceleration, it took 3455 sec, i.e. a little less than 1 hour, for these 2048 high-quality DXT VTs.
Then cartrite did nothing but install the corresponding CUDA-enabled NVIDIA driver for his card and run the same job again with my new txtilesDXT tool. Here are his logs:
cartrite wrote:
[txtilesDXT]: Input file is a 4x8 bit RGBA texture: 65536 x 32768
Generating 2048 optimized VT tiles for level 5
in DXT3 = BC2 format, of size from 128 x 1024 to 1024 x 1024
High-quality DXT compression,
about (3 - 6)x slower than fast-mode!!
tile[ 64 VT's of 2048 -> 2.40 s]
tile[ 128 VT's of 2048 -> 7.34 s]
tile[ 192 VT's of 2048 -> 23.43 s]
tile[ 256 VT's of 2048 -> 43.53 s]
tile[ 320 VT's of 2048 -> 65.03 s]
tile[ 384 VT's of 2048 -> 104.10 s]
tile[ 448 VT's of 2048 -> 138.90 s]
tile[ 512 VT's of 2048 -> 171.68 s]
tile[ 576 VT's of 2048 -> 201.00 s]
tile[ 640 VT's of 2048 -> 228.32 s]
tile[ 704 VT's of 2048 -> 255.58 s]
tile[ 768 VT's of 2048 -> 281.16 s]
tile[ 832 VT's of 2048 -> 304.62 s]
tile[ 896 VT's of 2048 -> 324.99 s]
tile[ 960 VT's of 2048 -> 346.18 s]
tile[ 1024 VT's of 2048 -> 365.74 s]
tile[ 1088 VT's of 2048 -> 386.51 s]
tile[ 1152 VT's of 2048 -> 406.53 s]
tile[ 1216 VT's of 2048 -> 425.96 s]
tile[ 1280 VT's of 2048 -> 445.82 s]
tile[ 1344 VT's of 2048 -> 464.44 s]
tile[ 1408 VT's of 2048 -> 481.90 s]
tile[ 1472 VT's of 2048 -> 495.86 s]
tile[ 1536 VT's of 2048 -> 508.24 s]
tile[ 1600 VT's of 2048 -> 520.06 s]
tile[ 1664 VT's of 2048 -> 531.38 s]
tile[ 1728 VT's of 2048 -> 541.15 s]
tile[ 1792 VT's of 2048 -> 547.63 s]
tile[ 1856 VT's of 2048 -> 561.50 s]
tile[ 1920 VT's of 2048 -> 581.19 s]
tile[ 1984 VT's of 2048 -> 593.14 s]
tile[ 2048 VT's of 2048 -> 599.46 s]
AMAZING! You can see that this time the job was done almost a factor of SIX faster (3455.60 s versus 599.46 s), i.e. in just 10 minutes. With a more recent card, the acceleration would be a factor of TEN and higher!
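For anyone who wants to check the factor against their own runs, here is a small Python sketch that pulls the elapsed time out of the final log line of each run and divides the two. The regular expression and the function name are my own illustration, written against the log format quoted above; they are not part of txtilesDXT.

```python
import re

# Matches progress lines such as: tile[ 2048 VT's of 2048 -> 599.46 s]
LOG_LINE = re.compile(r"tile\[\s*(\d+) VT's of\s+(\d+)\s*->\s*([\d.]+) s\]")


def parse_progress(line):
    """Return (tiles_done, tiles_total, elapsed_seconds) from one log line."""
    match = LOG_LINE.search(line)
    if match is None:
        raise ValueError(f"not a progress line: {line!r}")
    done, total, seconds = match.groups()
    return int(done), int(total), float(seconds)


# Final lines of the two runs quoted in this post.
without_cuda = parse_progress("tile[ 2048 VT's of 2048 -> 3455.60 s]")
with_cuda = parse_progress("tile[ 2048 VT's of 2048 -> 599.46 s]")

speedup = without_cuda[2] / with_cuda[2]

if __name__ == "__main__":
    print(f"speed-up: {speedup:.2f}x, CUDA run: {with_cuda[2] / 60:.1f} min")
```

Feeding it the last line of your own two logs gives a directly comparable number.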
I would very much like to know who in our community would deserve being entered into "the Guinness book of world records".
Let me remind you that installing and using my new tools is VERY simple, notably for Windows, where I provided an installer, which gets you going with ONE click....
E.g. for level 5, it's the following simple console command that produces all these 2048 VT's DIRECTLY in high-quality DXT3 format:
txtilesDXT 4 65536 5 < world.200406.3x65536x32768.RGBA.bin
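As a sanity check on the numbers in that command, here is a short Python sketch of the tile arithmetic. It assumes that the three arguments mean channels, texture width and VT level (inferred from the log header above, not from the tool's documentation), that the texture is twice as wide as it is high, and that level n holds 2^(n+1) x 2^n tiles as in Celestia's virtual textures. The function name is hypothetical.

```python
def vt_layout(width, level, channels=4):
    """Tile count, full tile edge and raw input size for one VT level.

    Assumes a 2:1 texture and 2^(level+1) x 2^level tiles per level.
    """
    height = width // 2
    cols = 2 ** (level + 1)   # tiles along the width
    rows = 2 ** level         # tiles along the height
    return {
        "tiles": cols * rows,
        "tile_edge": width // cols,
        "input_bytes": width * height * channels,  # 8 bits per channel
    }


if __name__ == "__main__":
    info = vt_layout(65536, 5)
    print(info["tiles"], "tiles of up to",
          info["tile_edge"], "x", info["tile_edge"], "pixels;",
          info["input_bytes"] / 2**30, "GiB of raw input")
```

The sketch only gives the full, unreduced tile size; the "optimized" tiles in the log go down to 128 x 1024, which this arithmetic does not model.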
++++++++++++++++++++++++++++++++++++
Of course, it would be great if DW could again find a little time to port this stuff to Mac OS X. That, I am afraid, I cannot do...
++++++++++++++++++++++++++++++++++++
Fridger
PS: If you are interested in following more closely what cartrite and I have been excited about during the last couple of days, here is the link to the corresponding CM thread:
http://forum.celestialmatters.org/viewt ... sc&start=0