GPU computing (CUDA) & new NV DXTcompressor (squish)
t00fri (Topic author)
- Developer
- Posts: 8772
- Joined: 29.03.2002
- Location: Hamburg, Germany
In the context of my ongoing 'nmtools' & the new 'txtools' project to generate highest-quality VT sets of /monster/ textures, I am of course alert to new, exciting developments in this area.
One such new "hotspot" is certainly GPU-supported texture compression, advocated a lot by NVIDIA recently! Chris also posted the relevant links to NVIDIA's developer site.
http://developer.nvidia.com/object/texture_tools.html
http://developer.nvidia.com/object/cuda.html
If I have a little more time, I'll report here on some more basic issues related to GPU computing, as well as on my detailed experiments with this exciting stuff...
In short, the idea is to exploit the GPU of your graphics card as a /highest/ speed "coprocessor", most importantly also supporting "parallelization" of tasks. The latter technique is very familiar to me: we have been doing this in lattice gauge theory for years with specially designed, horribly expensive processor arrays (that talk to each other via high-end, superfast Ethernet links).
But the fun is now that everyone with a reasonably modern graphics card can profit enormously from this additional power! The trick is NVIDIA's CUDA with its compiler driver 'nvcc' that understands the C programming language. This makes GPU computing a fairly straightforward affair, notably for people like me who know from professional experience how to parallelize some given code.
GPU tasks are typically most effective when the same type of operation has to be applied to many different blocks of data! That's why DXT compression is ideal for amazing speed increases via GPUs!
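To make the data-parallel pattern concrete, here is a minimal sketch (my own illustration in plain C++, not CUDA kernel code; all names are hypothetical) of why DXT maps so well to GPUs: the image splits into 4x4 texel blocks with no data dependencies between them, so each block can be handed to its own GPU thread.

```cpp
#include <cstdint>
#include <vector>

// DXT1/DXT5 operate on independent 4x4 texel blocks, so an image
// decomposes into (w/4)*(h/4) jobs with no data dependencies:
// exactly the pattern GPUs (and CUDA) parallelize well.
struct Block16 { uint32_t texel[16]; };  // one 4x4 block of RGBA texels

std::vector<Block16> splitIntoBlocks(const uint32_t* image, int w, int h)
{
    std::vector<Block16> blocks;
    for (int by = 0; by < h; by += 4)
        for (int bx = 0; bx < w; bx += 4) {
            Block16 b;
            for (int y = 0; y < 4; ++y)
                for (int x = 0; x < 4; ++x)
                    b.texel[y * 4 + x] = image[(by + y) * w + (bx + x)];
            blocks.push_back(b);
        }
    return blocks;  // each entry can be compressed by one GPU thread
}
```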
The next figure explains why you should spend your last penny on a GeForce 8800 GTX card.
Unfortunately, my present equipment is far from optimal for GPU computing. My old FX 5900 Ultra corresponds to the left-hand red dot (NV35 chip), while my Core2Duo notebook has a more powerful G72M chip (right-hand dot). Yet I cannot install a CUDA-supporting driver, since DELL doesn't offer one and the standard NVIDIA drivers don't work for my notebook... too bad. Also, for pre-G80 chips, the CUDA code only works in "emulation mode" so far.
Chris, with his 8800 GTX = G80 chip, is in a really good situation for GPU computing. NVIDIA claims that execution speeds are about a factor of 10 faster than a 3.0 GHz Core2Duo CPU! (cf. figure)
Nevertheless, I spent quite a bit of time last week and over the weekend experimenting with CUDA. I will tell you more when there is more time.
NOTE: it all works equally for Windows AND Linux!
But most importantly, there are now the new NVIDIA texture tools 2, which you may download and install via a simple Setup.exe. They come with full source code, and it was no problem to compile (and modify!) the code both under Windows with my VC++ .NET 2003 and under Linux with gcc! They also work without GPU acceleration, of course!
Compared to the classical NV texture tools, they now incorporate Simon Brown's squish DXT compressor library, though not the latest version (1.7 instead of 1.9). I know this compressor very well, since I am in contact with Simon and want to implement it in my forthcoming txtools anyway.
Amusingly, however, their new DXT5n format now equals Chris' original DXT5nm variant, i.e. with r<->g interchanged. It's very easy to modify.
Anyway, there is a -fast option of somewhat lower quality. It's as fast as my DevIL-based DXT5nm code and about the same quality.
But then there is the quite slow highest-quality version!
It produces even lower RMS errors than NVIDIA's best-quality compression via their standard texture tools. On my 3.2 GHz P4 with 3 GB RAM it takes about 3 seconds to DXT5nm-compress a 1k x 1k tile.
This squish-based algorithm is really VERY impressive for DXT5nm normalmap compression. If one switches on the -msse -msse2 -mmmx options, the non-CUDA-supported code also becomes quite usable. Of course, I immediately modified the code to output .dxt5nm format with proper dxt5nm file endings and applied the new compressor to my 64k monster tiles.
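For context, the RMS error quoted here is simply the root mean square of the per-channel texel differences between the original image and the decompressed result. A minimal sketch of such a metric (my own code, not NVIDIA's actual measurement tool):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Root-mean-square error over all per-channel byte values of two
// equally sized images: sqrt(mean((orig[i] - comp[i])^2)).
double rmsError(const uint8_t* orig, const uint8_t* comp, size_t numBytes)
{
    double sum = 0.0;
    for (size_t i = 0; i < numBytes; ++i) {
        double d = double(orig[i]) - double(comp[i]);
        sum += d * d;
    }
    return std::sqrt(sum / double(numBytes));
}
```

A lower RMS error over the full tile is what "higher quality" means in the comparison above.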
+++++++++++
The best part is that this stuff works equally for Windows and Linux... (and supposedly also for OSX, though I don't own one)
+++++++++++
Finally, here is a little "appetizer" from my new DXT5nm compressed tiles of my 64k texture set. The result is certainly the best of what I have seen so far!
Bye Fridger
Last edited by t00fri on 12.03.2007, 23:03, edited 2 times in total.
t00fri:
For people interested to try the new NV tools 2 out, here are the most important mods you need to apply to the code:
nvcompress modifications for Celestia:
=======================
1) Modify output dds-ending in 'compress.cpp' for -bc3n option
Code:
if (format == nvtt::Format_BC3n)
    output.append(".dxt5nm");
else
    output.append(".dds");
2) DXT5n-format exchanges channels via the method swizzleDXT5n() in ColorBlock.cpp. Flip back r<->g to Celestia format.
Code:
void ColorBlock::swizzleDXT5n()
{
    for (int i = 0; i < 16; i++)
    {
        Color32 c = m_color[i];
        // m_color[i] = Color32(0, c.r, 0, c.g);   // original RXGB order
        m_color[i] = Color32(0, c.g, 0, c.r);      // Celestia order: r<->g flipped
    }
}
3) The DXT5n = RXGB format is incompatible with Celestia, Gimp, XnView read routines due to the FOURCC header entry RXGB instead of DXT5.
cf. method 'setFOURCC()' in DirectDrawSurface.cpp.
Modify DXT5n header in dxtlib.cpp
Code:
else if (compressionOptions.format == Format_DXT5n) {
    // header.setFourCC('R', 'X', 'G', 'B');
    header.setFourCC('D', 'X', 'T', '5');
}
4) In compress.cpp, switch from
compressionOptions.setQuality(nvtt::Quality_Normal);
to
compressionOptions.setQuality(nvtt::Quality_Production, 0.5f);
Enjoy,
Fridger
chris (Site Admin)
- Posts: 4211
- Joined: 28.01.2002
- Location: Seattle, Washington, USA
t00fri wrote: 3) The DXT5n = RXGB format is incompatible with the Celestia, Gimp, and XnView read routines due to the FOURCC header entry RXGB instead of DXT5. [...]
I think that the right thing to do is to modify Celestia so that it recognizes dds files with the FourCC code RXGB as compressed normal maps. This is actually a more elegant approach than the dxt5nm extension, though that could still be allowed as an option for normal maps produced by tools other than NVIDIA's TextureTools. When Celestia sees this FourCC code, it could even switch to the different channel mapping (though I wish that the TextureTools developers had followed precedent and not reassigned the channels.)
--Chris
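Chris' suggestion can be sketched as follows (the names here are my own, not Celestia's actual loader code): DDS stores the FourCC as a little-endian 32-bit value, so a reader can dispatch on it and treat 'RXGB' as a DXT5-compressed normal map with the swapped channel mapping.

```cpp
#include <cstdint>

// Build a FourCC as a little-endian 32-bit value, the way it sits
// in a DDS header on disk.
constexpr uint32_t fourCC(char a, char b, char c, char d)
{
    return uint32_t(uint8_t(a))       | uint32_t(uint8_t(b)) << 8 |
           uint32_t(uint8_t(c)) << 16 | uint32_t(uint8_t(d)) << 24;
}

enum class DDSKind { DXT1, DXT5, DXT5nm_RXGB, Unknown };

// Dispatch on the header FourCC; 'RXGB' is recognized as a
// DXT5-compressed normal map with swapped r/g channels.
DDSKind classifyFourCC(uint32_t fcc)
{
    if (fcc == fourCC('D', 'X', 'T', '1')) return DDSKind::DXT1;
    if (fcc == fourCC('D', 'X', 'T', '5')) return DDSKind::DXT5;
    if (fcc == fourCC('R', 'X', 'G', 'B')) return DDSKind::DXT5nm_RXGB;
    return DDSKind::Unknown;
}
```

With something like this, the loader could keep accepting the .dxt5nm extension as well, and simply select the channel mapping based on which path identified the file.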
t00fri:
chris wrote: I think that the right thing to do is to modify Celestia so that it recognizes dds files with the FourCC code RXGB as compressed normal maps. [...]
Well,... yes and no. One issue is that none of the standard image manipulation programs (NVIDIA's WTV, the GIMP DXT plugin, XnView, IrfanView, ...) can presently read the RXGB format when it uses the FourCC code 'R','X','G','B' in the header! The exception is 'showgl', a DevIL-based tool that I wrote quite a while ago.
Also, the new texture tools are ALPHA code, so many things may still change... It would be bad to have to adapt Celestia each time.
Bye Fridger
chris:
t00fri wrote: Also, the new texture tools are ALPHA code, so many things may still change... It would be bad to have to adapt Celestia each time.
Good point... I do hope that the RXGB FourCC code emerges as a standard way to identify DXT5nm textures, but I'll wait before modifying Celestia to actually do anything with it.
--Chris
Re: GPU computing (CUDA) & new NV DXTcompressor (squish)
It appears that one year from now, I will have the WORST PC of the community. Everyone will have money, but I won't until at least the middle of 2008, and probably even after that. Celestia will become more and more demanding, and I can't do anything about this because I lost my scholarship. Very sad, indeed...
[unnecessary quote of previous message removed ...s.]
Spaceman Spiff
- Posts: 420
- Joined: 21.02.2002
- Location: Darmstadt, Germany
Ah Daniel,
could you please, please, please, por favor:
1) not include such massive blocks of quote (especially ones from Fridger with huge graphics inside) in your posts. That alone extended the length of the page by 60%, or several screen heights, and makes finding the following posts a hard job with so much scrolling (for example, my next post, which I believe is worthy and informative to you and the developers...). Please do edit yours to delete that quote.
[ Ah, that's been fixed now!]
2) note that if you recently purchased a new computer (I'm sure you wrote that somewhere a few months ago), then your computer cannot beat my 6-year-old computer for the title of "the WORST PC of the community", and I have no plans to upgrade mine.
Ta,
Spiff.
Last edited by Spaceman Spiff on 17.03.2007, 11:46, edited 1 time in total.
Spaceman Spiff:
Something new from Germany to keep an eye on: Rays light up life-like graphics ( http://news.bbc.co.uk/1/hi/technology/6457951.stm ).
OpenRT is the thing...
Spiff.
t00fri:
Re: GPU computing (CUDA) & new NV DXTcompressor (squish)
danielj wrote: It appears that one year from now, I will have the WORST PC of the community. Everyone will have money, but I won't until at least the middle of 2008, and probably even after that. Celestia will become more and more demanding, and I can't do anything about this because I lost my scholarship. Very sad, indeed...
Daniel,
you just don't want to understand.
Celestia has remained a one-click installable program in its default version that is meant for END-USERS, as you tend to call yourself. Your hardware is perfectly adequate for this official distribution. If YOU want to profit from more advanced features (beyond the low-quality stuff available at the Motherlode), YOU have to LEARN how to do things yourself and to contribute something to the community.
It was always like this.
Some people can learn fast, others need some more time, but EVERYONE CAN LEARN something!
In the course of the past year or so, MANY active members of this community (except yourself!) have learned to download Celestia from CVS, or my texture tools, and to build these programs (or large textures) themselves. They can now actively help the devs in debugging the CVS version... This is a VERY valuable contribution!
You still do nothing but complain.
Moreover, there is a lot of motivation among the devs to make Celestia more attractive for serious, scientific visualization tasks. That's why Chris has put in a lot of work recently to incorporate a general approach to frames of reference, to implement the professional SPICE interface, and to allow for real-time orientation changes, e.g. of spacecraft. I am proceeding to implement vastly more deep-space objects from professional catalogs, for example.
As a result of such endeavours, we can now be proud that Celestia has become a well-appreciated visualization tool for NASA and ESA space missions.
Last but not least, your graphics card is a much more advanced model than my old FX 5900. So even there you have NO point whatsoever.
Bye Fridger
Boux
- Posts: 435
- Joined: 25.08.2004
- Location: Brittany, close to the Ocean
Re: GPU computing (CUDA) & new NV DXTcompressor (squish)
t00fri wrote:
... while my Core2Duo notebook has a more poweful G72M chip (right-hand dot), yet I cannot install a CUDA supporting driver, since DELL doesn't offer one and the standard NVIDIA drivers don't work for my notebook...
t00fri, regarding your DELL laptop, you may want to have a look here:
http://www.laptopvideo2go.com
This guy is an active member of the NvNews forums and maintains modified inf setup files for mobile chips.
He has made a name for himself and earned the respect of the community in this area.
t00fri:
Re: GPU computing (CUDA) & new NV DXTcompressor (squish)
Boux wrote: t00fri, regarding your DELL laptop, you may want to have a look here:
http://www.laptopvideo2go.com
[...]
Many thanks, Boux!
I wasn't aware of this site. It all works very well. The name of my card has now changed slightly from Quadro NVS 110M to GeForce Go 7300; both use the G72M chip, though.
I have tried the two newest drivers and both work fine.
Bye Fridger
icastano:
Hey guys, if you want to make suggestions or contributions to the nvidia texture tools, you are welcome to do so at the google project website:
http://code.google.com/p/nvidia-texture-tools/
Feel free to open bug reports or requests. There's also a mailing list where you can send patches or ask questions:
http://groups.google.com/group/nvidia-texture-tools
Hope that helps!
t00fri:
icastano wrote: Hey guys, if you want to make suggestions or contributions to the nvidia texture tools, you are welcome to do so at the google project website: [...]
Great, thanks. I am following attentively what's going on at this exciting front. We really have a LOT of use for good /and/ fast DXT compression. I have been in contact with Simon Brown and noticed the new squish version...
++++++++++++++
How would you judge the quality of your fast DXT compressor compared to squish and DevIL, respectively?
++++++++++++++
Bye Fridger
Last edited by t00fri on 03.05.2007, 21:02, edited 1 time in total.
icastano:
I'm not sure about DevIL, but last time I looked I think it was a fast, low-quality compressor. That might have changed recently, though.
The latest squish release basically contains the same optimizations that we did for our texture tools. My goal is to use squish unmodified in the future; there's no point in having two open source libraries that do the same thing. However, our version still has some advantages over Simon's:
- improved grid fitting (color quantization).
- faster least squares line fitting.
- custom color weighting.
Although I hope those also get added to squish in the future too.
In the latest release, Simon has removed the scalar version of the code and I think that's unfortunate. The vectorized version is harder to understand, so I'd still like to keep the scalar code around, even if it's just for documentation purposes.
Hmm... rereading your question I noticed that you might be asking about the *fast* compression mode. Our fast compressor is based on id software's paper:
http://www.intel.com/cd/ids/developer/a ... 324337.htm
However, once the indices have been computed, I optimize the endpoints using least squares. You can find the code for that in:
http://nvidia-texture-tools.googlecode. ... essDXT.cpp
in the 'nv::optimizeEndPoints' function.
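The least-squares endpoint refinement mentioned here can be illustrated for a single scalar channel (this is my own sketch, not the actual nv::optimizeEndPoints code): once each texel's palette index, and hence its interpolation weight w in {0, 1/3, 2/3, 1}, is fixed, the endpoints (a, b) minimizing the squared error follow from a 2x2 system of normal equations.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Given per-texel values v[i] and fixed interpolation weights w[i],
// find endpoints (a, b) minimizing sum_i ((1-w[i])*a + w[i]*b - v[i])^2
// by solving the 2x2 normal equations of the least-squares fit.
struct Endpoints { double a, b; };

Endpoints optimizeEndpoints(const std::vector<double>& v,
                            const std::vector<double>& w)
{
    double s00 = 0, s01 = 0, s11 = 0, r0 = 0, r1 = 0;
    for (size_t i = 0; i < v.size(); ++i) {
        double u = 1.0 - w[i];
        s00 += u * u;     s01 += u * w[i];     s11 += w[i] * w[i];
        r0  += u * v[i];                       r1  += w[i] * v[i];
    }
    double det = s00 * s11 - s01 * s01;  // singular if all weights are equal
    return { (r0 * s11 - r1 * s01) / det,
             (r1 * s00 - r0 * s01) / det };
}
```

The real compressor does this per color channel (with appropriate weighting) after the fast index assignment, which is why it beats a purely greedy fit at little extra cost.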
The objective of this compressor is to be fast, but not real-time. For real-time compression I'd recommend to have a look at the following SDK example:
http://developer.download.nvidia.com/SD ... mpress_DXT
t00fri:
icastano wrote: Hmm... rereading your question I noticed that you might be asking about the *fast* compression mode. Our fast compressor is based on id software's paper:
http://www.intel.com/cd/ids/developer/a ... 324337.htm
Many thanks, that paper is very interesting. I think high quality DXT5 compression of color images by using the YCoCg color space would also be ideal for RGB hires tiles of monster textures in Celestia!
Chris?
What do you think? I guess, we could easily support the respective unpacking of such YCoCg-DXT5 compression, with luminance (Y) stored in the alpha channel and the chrominance (CoCg) being stored in the first two of the 5:6:5 color channels. For color images this results in a 3:1 or 4:1 compression ratio.
All that is needed for unpacking is
Code:
R = Y + Co - Cg
G = Y + Cg
B = Y - Co - Cg
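As a sanity check, the unpacking above can be written as a tiny function (my own sketch; treating Y, Co, Cg as plain integers with Co and Cg as signed offsets is an assumption about the exact representation, which in the real scheme also involves scale and bias):

```cpp
// Unpack YCoCg to RGB per the equations quoted above:
//   R = Y + Co - Cg,  G = Y + Cg,  B = Y - Co - Cg
struct RGB { int r, g, b; };

RGB ycocgToRgb(int y, int co, int cg)
{
    return { y + co - cg,
             y + cg,
             y - co - cg };
}
```

Note the inverse is equally cheap: Y = (R + 2G + B) / 4, Co = (R - B) / 2, Cg = (2G - R - B) / 4, which is why the transform costs almost nothing at pack and unpack time.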
icastano wrote: The objective of this compressor is to be fast, but not real-time.
For the extremely hi-res normalmaps we are concerned with in 64k x 32k Earth textures, smoothness and freedom from noise are paramount requirements. Noise freedom is largely achieved by doing the normalmap conversion of the published, scientific elevation (gray) maps entirely at the 16-bit integer level, with truncation to 8 bit done only in the final step.
Moreover, just the highest VT tile level for such a texture involves 2048 1k x 1k normalmap tiles! So obviously DXT compression speed is also important, and not everyone owns a G80 GPU.
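The 16-bit pipeline described above can be sketched roughly like this (my own illustration, not the actual nmtools code; the heightToNormal name, the central-difference scheme, and the scale parameter are all assumptions): the normal is derived from full-precision 16-bit elevation samples and only the final, normalized result is quantized to 8 bits.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Derive a tangent-space normal map from a 16-bit heightmap via central
// differences, keeping full precision throughout and quantizing to 8 bit
// only in the very last step.  'scale' relates height units to texel
// spacing and is a free parameter of this sketch.
void heightToNormal(const std::vector<uint16_t>& h, int w, int rows,
                    double scale, std::vector<uint8_t>& out /* RGB8 */)
{
    out.assign(size_t(w) * rows * 3, 0);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < w; ++x) {
            // central differences, clamped at the borders, in 16-bit range
            int x0 = x > 0 ? x - 1 : x,      x1 = x < w - 1 ? x + 1 : x;
            int y0 = y > 0 ? y - 1 : y,      y1 = y < rows - 1 ? y + 1 : y;
            double dx = scale * (double(h[y * w + x1]) - h[y * w + x0]);
            double dy = scale * (double(h[y1 * w + x]) - h[y0 * w + x]);
            double nx = -dx, ny = -dy, nz = 2.0;  // unnormalized normal
            double len = std::sqrt(nx * nx + ny * ny + nz * nz);
            size_t o = (size_t(y) * w + x) * 3;   // quantize only here
            out[o + 0] = uint8_t(127.5 * (nx / len + 1.0));
            out[o + 1] = uint8_t(127.5 * (ny / len + 1.0));
            out[o + 2] = uint8_t(127.5 * (nz / len + 1.0));
        }
}
```

Doing the differencing in 8 bit instead would quantize the slopes before normalization, which is exactly the staircase noise this pipeline avoids.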
Another important requirement is of course that proper texel normalization (r*r + g*g + b*b = 1) is preserved for the normalmaps throughout all operations. That's why the DXT5nm format comes in handy, where the normalization is always manifest. However, in Celestia we are still struggling with the issue of which variety of DXT5nm we should finally support (by default):
The old NVDXT tools map r -> alpha, while the new tools map g -> alpha in DXT5nm. Either is fine, but we should settle on one choice sooner or later. I know that the old nv tools support a -switchRG option... this would also be nice for nvcompress.
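Whichever channel convention wins, both DXT5nm varieties store only two of the three normal components; the third is recovered from the unit-length constraint r*r + g*g + b*b = 1 mentioned above. A small sketch of that reconstruction step (my own illustration, not Celestia's shader code):

```cpp
#include <cmath>

// DXT5nm stores two components of the unit normal (one in alpha, one in
// green; which is which differs between the old NVDXT and new nvcompress
// conventions).  The third component is reconstructed from
// x*x + y*y + z*z = 1, clamping against quantization error.
struct N2 { double x, y; };  // the two stored components, each in [-1, 1]

double reconstructZ(N2 n)
{
    double zz = 1.0 - n.x * n.x - n.y * n.y;
    return zz > 0.0 ? std::sqrt(zz) : 0.0;
}
```

This is also why a compressor that respects the normalization matters: if the two stored components drift off the unit sphere, the reconstructed z picks up the error.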
Bye Fridger
Adding support for YCoCg was on my TODO list, I already added some information about it on the wiki:
http://code.google.com/p/nvidia-texture ... sionTricks
If you point me to some of your textures, I can add them to my testsuite, so that I have them in consideration when tuning the algorithms.
Normal map compression in particular, can be optimized. Currently that code is not vectorized, and that could provide a 4x speedup.
I never realized that my normal map components were in the opposite order. I should probably swap them to match the old nvdxt.
Please, open issues at the google project for the features that you think are more important, so that I can prioritize them.
Thanks!