F-TexTools/nmtools with direct high quality DDS output!

Posted: 16.10.2008, 19:06
by t00fri
Hi all,

Ignacio Castano has just released version 2.0.4 of the (open source) NVIDIA Texture Tools 2 (NVTT). So, from time to time, when I was fed up with fiddling with my globular blending, I played with integrating his libnvtt into my F-TexTools/nmtools!

This is quite attractive and not hard.

So, besides the usual VT tiling program txtiles (which generates PNG tiles), I have now coded another one, called txtilesDXT, that DIRECTLY produces highest-quality DXT3 tiles and supports BOTH the fast and the normal compression modes.

Analogously, there will be a tiling program (nmtilesDXT) for normal maps in the dedicated high-quality DXT5nm normal map format.

This saves lots of time and "shell script" logistics...

I will soon be ready for the testing stage on the various operating systems. So please let me know if you are interested in helping with the tests. Some experience with simple source code compilation on your own operating system would be desirable, since I don't have the time to go through yet another tutorial about the same old stuff.... ;-)

Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 16.10.2008, 20:45
by cartrite
t00fri wrote:I will soon be ready for the testing stage on the various operating systems. So please let me know if you are interested in helping with the tests. Some experience with simple source code compilation on your own operating system would be desirable, since I don't have the time to go through yet another tutorial about the same old stuff.... ;-)

Fridger
I'll help. But I can't until this upcoming Monday (October 20). I'll be away from my computer this weekend.

I was also wondering if it would be possible to add a 16 bit type to some of the tools in the F-TexTools package. Currently there is no support for 16 bit, only 8, 24, or 32. I ask because I was thinking of seeing how it would work with a 16 bit grayscale + alpha cloudmap, which would be half the size of a 32 bit RGBA file. I was using virtualtex, but that still has problems with the larger files.
cartrite

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 16.10.2008, 21:34
by t00fri
cartrite wrote:
t00fri wrote:I will soon be ready for the testing stage on the various operating systems. So please let me know if you are interested in helping with the tests. Some experience with simple source code compilation on your own operating system would be desirable, since I don't have the time to go through yet another tutorial about the same old stuff.... ;-)

Fridger
I'll help. But I can't until this upcoming Monday (October 20). I'll be away from my computer this weekend.
cartrite

I was hoping you would ;-)
No hurry. I still have a deadline for my globular clusters at the end of next week at the latest.

As to 16 bit, the little/big-endian issues are the same as for the 2-byte grayscale input for normalmaps. Shouldn't be hard.
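
To illustrate the byte-order issue, here is a minimal Python sketch. It assumes headerless raw input with two bytes per grayscale sample. The function name and the sample bytes are made up for illustration and are not taken from F-TexTools/nmtools.

```python
import struct

def read_gray16(raw: bytes, big_endian: bool) -> list:
    """Decode headerless 16-bit grayscale samples in the given byte order."""
    if len(raw) % 2:
        raise ValueError("16-bit input must have an even number of bytes")
    count = len(raw) // 2
    # '>' = big-endian, '<' = little-endian, 'H' = unsigned 16-bit integer
    fmt = (">" if big_endian else "<") + f"{count}H"
    return list(struct.unpack(fmt, raw))

# The same four bytes decode to different sample values
# depending on the byte order that is assumed.
raw = bytes([0x01, 0x00, 0xFF, 0x7F])
little = read_gray16(raw, big_endian=False)
big = read_gray16(raw, big_endian=True)
```

Choosing the wrong byte order does not raise an error. It silently yields wrong sample values, which is why the order has to be known or specified for raw input.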

Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 17.10.2008, 19:56
by t00fri
For the past couple of days I have been clearing up various "oddities" in libnvtt with Ignacio Castano/NVIDIA. For people interested in following our communication, here is the link to his forum:

http://groups.google.com/group/nvidia-t ... 5b6e5182e4

Altogether, libnvtt is pretty solid (apart from a segfault in Linux that Ignacio is currently investigating with the help of my gdb backtrace).

The ApiDocs of libnvtt are here
http://code.google.com/p/nvidia-texture ... umentation

and well written...

Currently I have completed the main VT tiling application txtilesDXT that will be part of my forthcoming F-TexTools-2.0. In all cases of interest (grayscale, RGB and RGBA input), the DXT1 and DXT3 tiles tx_x_y.dds show the same compression errors relative to the corresponding PNG tiles as those obtained with the official DXT compressor application nvcompress. The NVTT tool nvimgdiff makes it possible to quantify any minor deviations in each case!! This is most helpful for testing.

Next comes the analogous code for DXT5nm normal maps, which is completely routine now...

Fridger

PS: Let me add that within my txtilesDXT code, I could easily implement the following mipmap "fanciness":

For level0 VT's, I request the full mipmap set, since that is what is needed at larger distances. In principle, higher levels do not need mipmaps, since their role is played by the lower VT levels... but only in principle. You will recall the familiar regions of bad focus in between successive levels... Depending on the downsampling filter used for the mipmaps, one mipmap per level might provide a much faster/smoother interpolation between successive levels. So far this is speculation (Chris?) and will be tested quantitatively very soon.
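
The two mipmap options can be compared with a small Python sketch. It only lists the image sizes involved and assumes each mipmap halves both dimensions (never going below 1). The function name and the 1024 x 1024 example tile are illustrative assumptions, not part of txtilesDXT.

```python
from typing import List, Optional, Tuple

def mip_chain(width: int, height: int,
              extra_levels: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return [(w, h), ...] starting with the base image.

    extra_levels=None gives the full chain down to 1 x 1;
    otherwise at most that many mipmaps follow the base image.
    """
    chain = [(width, height)]
    while (width, height) != (1, 1):
        if extra_levels is not None and len(chain) - 1 >= extra_levels:
            break
        width, height = max(1, width // 2), max(1, height // 2)
        chain.append((width, height))
    return chain

full = mip_chain(1024, 1024)         # full set, as requested for level0 tiles
single = mip_chain(1024, 1024, 1)    # base image plus one mipmap only
```

For a 1024 x 1024 tile the full chain has 11 images, while the "one mipmap per level" variant stores just the base image and its 512 x 512 reduction.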

At least, that's what my code does for now ;-)

Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 17.10.2008, 20:39
by t00fri
Chris,

is it correct that, for now, Celestia does NOT support the BC4 format?
http://msdn.microsoft.com/en-us/library/bb694531(VS.85).aspx#BC4

It is the optimal format to store one-component color data using 8 bits for each color. As a result of the increased accuracy (compared to BC1 = DXT1), BC4 is ideal for storing floating point data in the range of [0 to 1] using the DXGI_FORMAT_BC4_UNORM format and [-1 to +1] using the DXGI_FORMAT_BC4_SNORM format. Assuming a 4x4 texture using the largest data format possible, this compression technique reduces the memory required from 16 bytes (16 colors x 1 components/color x 1 byte/component) to 8 bytes.
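
The 16 -> 8 byte figure can be checked with a short Python sketch. It assumes the standard BC4 block layout of two 8-bit reference values followed by sixteen 3-bit indices. The function names are illustrative only.

```python
BLOCK_TEXELS = 4 * 4  # BC formats compress fixed 4x4 texel blocks

def raw_block_bytes(components: int = 1, bytes_per_component: int = 1) -> int:
    """Uncompressed size of one 4x4 block."""
    return BLOCK_TEXELS * components * bytes_per_component

def bc4_block_bytes() -> int:
    """Compressed size of one BC4 block (assumed layout, see lead-in)."""
    endpoint_bits = 2 * 8          # two 8-bit reference values
    index_bits = BLOCK_TEXELS * 3  # one 3-bit index per texel
    return (endpoint_bits + index_bits) // 8

ratio = raw_block_bytes() / bc4_block_bytes()  # 16 bytes / 8 bytes
```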

Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 18.10.2008, 03:47
by chris
t00fri wrote:Chris,

is it correct that, for now, Celestia does NOT support the BC4 format?
http://msdn.microsoft.com/en-us/library/bb694531(VS.85).aspx#BC4

It is the optimal format to store one-component color data using 8 bits for each color. As a result of the increased accuracy (compared to BC1 = DXT1), BC4 is ideal for storing floating point data in the range of [0 to 1] using the DXGI_FORMAT_BC4_UNORM format and [-1 to +1] using the DXGI_FORMAT_BC4_SNORM format. Assuming a 4x4 texture using the largest data format possible, this compression technique reduces the memory required from 16 bytes (16 colors x 1 components/color x 1 byte/component) to 8 bytes.

Correct--Celestia doesn't yet support this format, though it's not hard to change that. However, hardware support for this compressed format is only available on DX10-capable hardware: GeForce 8 and 9 series, and Radeon HD 2000, 3000, and 4000 series graphics cards. If the format is available, ARB_texture_compression_rgtc should appear in the list of extensions in Celestia's OpenGL Info window.

--Chris

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 18.10.2008, 09:36
by BobHegwood
Doctor Schrempp?

Can I interject another Brain-Dead question here?
I'm just curious to know why the DDS format is now important. Is it because
the quality of the DirectDraw textures can now be improved drastically over the PNG
files? Or is it simply to provide faster display of planetary features?

Sorry, but you know how I am.

Thanks, Brain-Dead

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 18.10.2008, 13:24
by t00fri
BobHegwood wrote:Doctor Schrempp?

Can I interject another Brain-Dead question here?
I'm just curious to know why the DDS format is now important. Is it because
the quality of the DirectDraw textures can now be improved drastically over the PNG
files? Or is it simply to provide faster display of planetary features?

Sorry, but you know how I am.

Thanks, Brain-Dead

Bob,

for VT's, the DXT format was always CRUCIAL!

It would be a real pity if people never compressed the PNG tiles resulting from my F-TexTools/nmtools with the help of the open-source NVIDIA Texture Tools (2), which work on ANY OS... The point is that the DXT formats are hardware-supported by your graphics card. Hence, unlike JPG or PNG tiles, they can be decompressed in NO TIME (almost ;-) ) by your GPU. Performance therefore increases amazingly once you use only good-quality DXT tiles in your "kingsize" VT sets.
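
One reason DXT tiles are easy for the GPU to handle is their fixed block size: DXT1 stores each 4x4 pixel block in 8 bytes, DXT3 and DXT5 in 16 bytes. The following Python sketch compares the resulting tile sizes with uncompressed RGBA. It ignores the DDS file header and any mipmaps, and the function names are illustrative.

```python
BYTES_PER_BLOCK = {"DXT1": 8, "DXT3": 16, "DXT5": 16}

def dxt_bytes(width: int, height: int, fmt: str) -> int:
    """Size of the compressed pixel data of one image (no header, no mipmaps)."""
    blocks_x = (width + 3) // 4   # partial blocks are padded to full 4x4 blocks
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BYTES_PER_BLOCK[fmt]

def rgba_bytes(width: int, height: int) -> int:
    """Size of the same image as uncompressed 8-bit RGBA."""
    return width * height * 4

tile_raw = rgba_bytes(1024, 1024)          # 4 MiB
tile_dxt3 = dxt_bytes(1024, 1024, "DXT3")  # 1 MiB
tile_dxt1 = dxt_bytes(1024, 1024, "DXT1")  # 512 KiB
```

So a 1024 x 1024 DXT3 tile takes a quarter of the memory of its uncompressed RGBA counterpart, and it stays compressed in video memory.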

The main concern has always been the achieved quality, since, like JPG, the DXT format is NOT lossless. In particular, the compression of the delicate normalmap VT's was unsatisfactory for a long time. Meanwhile, great progress has been made,

  • due to the quality-optimized SQUISH compressor by Simon Brown that has been implemented into the NVIDIA Texture Tools 2, as maintained and further developed by Ignacio C., and
  • the introduction and Celestia support of a high-quality dedicated normalmap DXT format, the so-called DXT5nm. Note the little nm=normalmap at the end!

There is a most useful NVTT 2 tool (nvimgdiff) that makes it possible to measure and quantitatively compare the residual errors arising from the various available compression algorithms.

Altogether, the quality aspect is not a major concern anymore. I have done a LOT of careful tests on this in the past.
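
As an illustration of what "measuring residual errors" can mean, here is a generic Python sketch of two common image-difference metrics, RMSE and PSNR. This is not nvimgdiff's actual implementation and its exact metrics are not assumed here. The sketch works on flat sequences of 8-bit channel values.

```python
import math

def rmse(reference, compressed) -> float:
    """Root-mean-square error between two equally long value sequences."""
    if len(reference) != len(compressed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - c) ** 2 for r, c in zip(reference, compressed))
    return math.sqrt(total / len(reference))

def psnr(reference, compressed, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = rmse(reference, compressed)
    return math.inf if error == 0 else 20.0 * math.log10(peak / error)

original = [10, 20, 30, 40]
decoded = [12, 18, 30, 44]      # hypothetical values after a lossy round trip
error = rmse(original, decoded)
quality_db = psnr(original, decoded)
```

A lower RMSE (or higher PSNR) against the same reference indicates the better compressor setting.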

Cheers,
Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 18.10.2008, 14:07
by BobHegwood
t00fri wrote:The point is that the DXT formats are hardware-supported by your graphics card. Hence, unlike JPG or PNG tiles, they can be decompressed in NO TIME (almost ;-) ) by your GPU. Performance therefore increases amazingly once you use only good-quality DXT tiles in your "kingsize" VT sets.

Well, as always, I appreciate the explanation here.
I might be able to contribute to your efforts, but methinks you probably need someone with
more, shall we say, experience here in order to get some benefit from the participation.
Does sound as if this would be worth pursuing, since most of the drops in FPS come from my
use of VT's.

At any rate, again, thanks for the explanation. :wink:

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 19.10.2008, 18:13
by t00fri
Meanwhile, I have finished coding the DXT interface to my tools. I am waiting for Ignacio's fix of a segfault under gcc 4.x compilation and for some instructions on how I can publish a "run-time" version of libnvtt. Then we will be ready for testing.

Both my new txtilesDXT and nmtilesDXT work VERY smoothly and fast, giving satisfactory high-quality VT's both for 64k normalmaps and for the 64k (base + spec) textures.

I can now generate a complete high-quality 64k DDS tileset from scratch in 15-20 minutes....

What is also most useful is the possibility of quantitatively checking slight quality differences with the 'nvimgdiff' tool when modifying e.g. the compressor speed and other parameter settings...


Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 20.10.2008, 18:20
by t00fri
Chris,

since my experience with shaders is (still) small, could you please confirm what kind of mapping the present code

Code:

            if (props.texUsage & ShaderProperties::CompressedNormalTexture)
            {
                source += "vec3 n;\n";
                source += "n.xy = texture2D(normTex, " + normTexCoord + ".st).ag * 2.0 - vec2(1.0, 1.0);\n";
                source += "n.z = sqrt(1.0 - n.x * n.x - n.y * n.y);\n";
            }


in shadermanager.cpp implies.

The actual (x,y) normalmap mapping of NVTT is

[R=0xFF, y->G, B=0,x->A]
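
To see how the two pieces fit together as written above, here is a small Python sketch. The encoder follows the quoted NVTT channel assignment and the decoder mirrors the quoted shader lines, where the `.ag` swizzle reads alpha as x and green as y. The function names, the 8-bit quantization and the clamp before the square root are assumptions of this sketch, not part of the shader.

```python
import math

def encode_normal(x: float, y: float) -> tuple:
    """Pack x, y in [-1, 1] into 8-bit RGBA as R=0xFF, y->G, B=0, x->A."""
    def to_byte(v: float) -> int:
        return round((v + 1.0) * 0.5 * 255)
    return (0xFF, to_byte(y), 0x00, to_byte(x))

def decode_normal(rgba: tuple) -> tuple:
    """Mirror of the shader: n.xy = texel.ag * 2 - 1, n.z = sqrt(1 - x^2 - y^2)."""
    _, g, _, a = (channel / 255.0 for channel in rgba)
    x = a * 2.0 - 1.0
    y = g * 2.0 - 1.0
    # Clamp added in this sketch only: quantization can push x^2 + y^2 above 1.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

texel = encode_normal(0.6, 0.0)
nx, ny, nz = decode_normal(texel)
```

Read this way, the shader picks up x from the alpha channel and y from the green channel, which matches the NVTT assignment. The z component is never stored. It is rebuilt from the unit-length condition.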

This assignment also conforms to Capcom's DXT trick of treating DXT1n and DXT5n under one heading (shader code).
See:
http://code.google.com/p/nvidia-texture ... ompression

Fridger

Re: F-TexTools/nmtools with direct high quality DDS output!

Posted: 25.10.2008, 17:38
by t00fri
Hi all,

during the last two days, cartrite and I have been busy testing and benchmarking the new DXT output of my F-TexTools 2.0pre1/nmtools 2.0pre1, over at CelestialMatters.


While normalmap DXT compression is not yet CUDA-supported, all the other relevant compressed DXT formats are.

++++++++++++++++++++++++++++
...and that is SPECTACULAR!
++++++++++++++++++++++++++++

Unfortunately the FX5900Ultra card of my desktop is far too old for CUDA support, and the Quadro NVS 110M card of my laptop just misses being supported...

+++++++++++++++++++++++++++++
However, cartrite owns an 8600 GTS card, which is CUDA-enabled.

Let's see what this means in practice!
+++++++++++++++++++++++++++++

Here is the complete list of NVIDIA cards where CUDA acceleration can be activated simply by means of installing an appropriate, CUDA-enabled NVIDIA driver:

http://www.nvidia.com/object/cuda_learn_products.html

Some of you might ask what CUDA is all about? ;-)
=========================================

[ Beyond my subsequent, short summary, you might want to read a bit more on the CUDA project pages:

http://www.nvidia.com/object/cuda_what_is.html
http://www.nvidia.com/object/cuda_get.html
]

It's quite an ingenious approach for tremendously accelerating calculations on your computer:

In our context, the idea of the CUDA project is to use the GPU (Graphics Processing Unit) of your graphics card rather than your normal CPU for the DXT compression! Note well, this has nothing to do with faster rendering...

CUDA includes a specialized compiler that takes input code very close to C and outputs code that your GPU (!) understands. For our purpose, all this code has been written already and is part of the NVTT tools, which are now integrated into my tools via a single library.

The point is simply that the highly specialized processors in modern graphics cards are much ... much faster at such tasks than normal all-round CPUs. They execute these DXT compression jobs IN PARALLEL, and the card memory is also VERY fast.

So CUDA just "outsources" the job of calculating the ~3000 VTs to your graphics card. Unlike normal outsourcing, this one is entirely FREE-OF-CHARGE (if you have a reasonably modern NVIDIA card) ;-)

Have a look at the standard advertisement plot from NVIDIA:
[Image]

Hence, if you own a GeForce 8800 GTX card, CUDA acceleration would let you convert the ~3000 VT's a factor of TEN faster than e.g. a Core 2 Duo CPU!! Not bad at all...

+++++++++++++++++++++++++++++
Now after these prerequisites, let me report how all this theory looks in practice, taking cartrite's 8600 GTS card as an example:
+++++++++++++++++++++++++++++

Firstly, with his previous, NON-CUDA-enabled NVIDIA driver, he and I got very similar performance when calculating 2048 level5 VT's in high-quality DXT3 format. We used a 64k RGBA input texture (with the RGB base texture and cartrite's beautiful SWBD-based specmap as the alpha (A) channel).

Here are e.g. his logs:

cartrite wrote:
tile[ 64 VT's of 2048 -> 2.47 s]
tile[ 128 VT's of 2048 -> 15.89 s]
tile[ 192 VT's of 2048 -> 119.93 s]
tile[ 256 VT's of 2048 -> 266.46 s]
tile[ 320 VT's of 2048 -> 426.47 s]
tile[ 384 VT's of 2048 -> 708.26 s]
tile[ 448 VT's of 2048 -> 957.17 s]
tile[ 512 VT's of 2048 -> 1186.79 s]
tile[ 576 VT's of 2048 -> 1384.66 s]
...
tile[ 1792 VT's of 2048 -> 3098.95 s]
tile[ 1856 VT's of 2048 -> 3181.83 s]
tile[ 1920 VT's of 2048 -> 3323.20 s]
tile[ 1984 VT's of 2048 -> 3411.35 s]
tile[ 2048 VT's of 2048 -> 3455.60 s]

My machine did it a little faster, but the difference was not significant. You see, without CUDA acceleration, it took 3455 s, i.e. a little less than 1 hour, for these 2048 high-quality DXT VTs.

Then cartrite did nothing but install the corresponding CUDA-enabled NVIDIA driver for his card and run the same job again with my new txtilesDXT tool. Here are his logs:

cartrite wrote:
[txtilesDXT]: Input file is a 4x8 bit RGBA texture: 65536 x 32768

Generating 2048 optimized VT tiles for level 5
in DXT3 = BC2 format, of size from 128 x 1024 to 1024 x 1024

High-quality DXT compression,
about (3 - 6)x slower than fast-mode!!

tile[ 64 VT's of 2048 -> 2.40 s]
tile[ 128 VT's of 2048 -> 7.34 s]
tile[ 192 VT's of 2048 -> 23.43 s]
tile[ 256 VT's of 2048 -> 43.53 s]
tile[ 320 VT's of 2048 -> 65.03 s]
tile[ 384 VT's of 2048 -> 104.10 s]
tile[ 448 VT's of 2048 -> 138.90 s]
tile[ 512 VT's of 2048 -> 171.68 s]
tile[ 576 VT's of 2048 -> 201.00 s]
tile[ 640 VT's of 2048 -> 228.32 s]
tile[ 704 VT's of 2048 -> 255.58 s]
tile[ 768 VT's of 2048 -> 281.16 s]
tile[ 832 VT's of 2048 -> 304.62 s]
tile[ 896 VT's of 2048 -> 324.99 s]
tile[ 960 VT's of 2048 -> 346.18 s]
tile[ 1024 VT's of 2048 -> 365.74 s]
tile[ 1088 VT's of 2048 -> 386.51 s]
tile[ 1152 VT's of 2048 -> 406.53 s]
tile[ 1216 VT's of 2048 -> 425.96 s]
tile[ 1280 VT's of 2048 -> 445.82 s]
tile[ 1344 VT's of 2048 -> 464.44 s]
tile[ 1408 VT's of 2048 -> 481.90 s]
tile[ 1472 VT's of 2048 -> 495.86 s]
tile[ 1536 VT's of 2048 -> 508.24 s]
tile[ 1600 VT's of 2048 -> 520.06 s]
tile[ 1664 VT's of 2048 -> 531.38 s]
tile[ 1728 VT's of 2048 -> 541.15 s]
tile[ 1792 VT's of 2048 -> 547.63 s]
tile[ 1856 VT's of 2048 -> 561.50 s]
tile[ 1920 VT's of 2048 -> 581.19 s]
tile[ 1984 VT's of 2048 -> 593.14 s]
tile[ 2048 VT's of 2048 -> 599.46 s]

AMAZING! You can see that this time the job was done almost a factor of SIX faster, i.e. in just 10 minutes. With a more recent card, the acceleration should be a factor of TEN and higher!
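
The speed-up can be worked out directly from the final lines of the two logs above. A minimal Python sketch (the variable names are illustrative):

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run finished."""
    return baseline_seconds / accelerated_seconds

TILES = 2048
cpu_seconds = 3455.60    # last log line without the CUDA-enabled driver
cuda_seconds = 599.46    # last log line with the CUDA-enabled driver

factor = speedup(cpu_seconds, cuda_seconds)
cpu_minutes = cpu_seconds / 60.0
cuda_minutes = cuda_seconds / 60.0
cuda_tiles_per_second = TILES / cuda_seconds
```

With these numbers the factor comes out between 5.7 and 5.8, i.e. roughly 58 minutes shrink to about 10 minutes, or a bit more than 3 tiles per second.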

I would very much like to know who in our community would deserve being entered into "the Guinness book of world records" ;-)

Let me remind you that installing and using my new tools is VERY simple, notably for Windows, where I provide an installer that gets you going with ONE click....

E.g. for level 5, it's the following simple console command that produces all these 2048 VT's DIRECTLY in high-quality DXT3 format:

Code:

txtilesDXT  4 65536 5 < world.200406.3x65536x32768.RGBA.bin
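
The tile count in the logs above can be reproduced with a short Python sketch. It assumes Celestia's virtual texture layout of 2^(level+1) x 2^level tiles per level and a 2:1 input texture. Reading the three command arguments as channels, width and level is an inference from the log output, not documented syntax. The function names are illustrative.

```python
def vt_layout(width: int, level: int) -> dict:
    """Nominal tile grid for one VT level of a 2:1 (width x width/2) texture."""
    height = width // 2
    columns = 2 ** (level + 1)
    rows = 2 ** level
    return {
        "tiles": columns * rows,
        "tile_width": width // columns,    # nominal size, before any
        "tile_height": height // rows,     # tool-specific optimization
    }

def raw_input_bytes(width: int, channels: int) -> int:
    """Size of the headerless 8-bit-per-channel input file."""
    return width * (width // 2) * channels

layout = vt_layout(65536, 5)
input_size = raw_input_bytes(65536, 4)
```

For a 65536-wide texture at level 5 this gives 2048 tiles of nominally 1024 x 1024 pixels, in agreement with the log. The "optimized" tiles reported by txtilesDXT can be narrower (down to 128 x 1024), which this sketch does not model. The raw RGBA input comes to 8 GiB.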


++++++++++++++++++++++++++++++++++++
Of course, it would be great if DW again found a little time to port this stuff to Mac OS X ;-) . That, I am afraid, I cannot do...
++++++++++++++++++++++++++++++++++++

Fridger

PS: If you are interested in following more closely what cartrite and I have been excited about during the last couple of days, here is the link to my CM thread on the subject:

http://forum.celestialmatters.org/viewt ... sc&start=0