GPU computing (CUDA) & new NV DXT compressor (squish)

Discussion forum for Celestia developers; topics may only be started by members of the developers group, but anyone can post replies.
Topic author
t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 7 months
Location: Hamburg, Germany

Post #21 by t00fri » 04.05.2007, 19:53

icastano wrote: Adding support for YCoCg was on my TODO list; I already added some information about it on the wiki:

http://code.google.com/p/nvidia-texture ... sionTricks

Great!

icastano wrote: If you point me to some of your textures, I can add them to my test suite, so that I can take them into consideration when tuning the algorithms.

With pleasure:

Here are the links to two of the 2048 level-5 1k x 1k normalmap tiles for the 64k x 32k Earth in lossless PNG format. These are to be compressed into DXT5nm format.

http://www.celestiaproject.net/~t00fri/images/tx_18_17.png
http://www.celestiaproject.net/~t00fri/images/tx_19_18.png

Note how smooth and noise-free they are. Here is an example display that I color-inverted for a better visual 3D impression:

[image: color-inverted display of one of the normalmap tiles]

The normalmap tiles have been generated with all optimizations and at the highest quality with my 'nmtools' package, which is available from our CelestialMatters site (cross-platform, GPL, open source):

http://www.celestialmatters.org/cm/index.shtml

Here are some nice shots of the views these 'monster' normalmaps generate...

http://www.celestialmatters.org/cm/host ... s/nmtools/

There is also a detailed tutorial:

http://www.celestialmatters.org/cm/host ... al00.shtml

icastano wrote: Normal map compression in particular can be optimized. Currently that code is not vectorized, and that could provide a 4x speedup.

Yes, indeed. Simon and I were discussing this possibility. Certainly an important target to go for!
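
For readers wondering where the 4x figure comes from: SSE operates on four floats per instruction. Below is a minimal, self-contained C++ illustration, not the actual squish code; the error-sum function is made up for the example.

Code:

    #include <xmmintrin.h>  // SSE intrinsics
    #include <cstdio>

    // Illustrative only (not the actual squish code). SSE handles four
    // floats per instruction, which is where a ~4x speedup can come from.
    // Here we accumulate the squared differences between two pixel arrays,
    // the kind of error metric a DXT compressor evaluates constantly.
    float sumSquaredError(const float* a, const float* b, int n)
    {
        __m128 acc = _mm_setzero_ps();
        for (int i = 0; i < n; i += 4) {      // n assumed a multiple of 4
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            __m128 d  = _mm_sub_ps(va, vb);   // four differences at once
            acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
        }
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }

    int main()
    {
        float a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        float b[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        std::printf("%.1f\n", sumSquaredError(a, b, 8)); // prints 92.0
        return 0;
    }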


icastano wrote: I never realized that my normal map components were in the opposite order. I should probably swap them to match the old nvdxt.

Yeah...

icastano wrote: Please open issues at the Google project for the features that you think are most important, so that I can prioritize them.


Well, with ~5,000,000 Celestia downloads from SF, we feel pretty much at home here ;-), but I shall try my best.

Thanks,
Fridger


PS: Do you also have a first name?

chris
Site Admin
Posts: 4211
Joined: 28.01.2002
With us: 22 years 9 months
Location: Seattle, Washington, USA

Post #22 by chris » 04.05.2007, 21:22

t00fri wrote:
icastano wrote: Hmm... rereading your question, I noticed that you might be asking about the *fast* compression mode. Our fast compressor is based on id Software's paper:

http://www.intel.com/cd/ids/developer/a ... 324337.htm

Many thanks, that paper is very interesting. I think high-quality DXT5 compression of color images using the YCoCg color space would also be ideal for the hi-res RGB tiles of monster textures in Celestia!

Chris?

I read this too and was intrigued... It's a simple idea, and not difficult at all to implement in Celestia's shaders. I'd advocate a similar approach to the one we use for DXT5nm textures: a custom file extension to indicate that the color channels should be interpreted as YCoCg. It would take only about an hour to do it.
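
For the curious, here is a minimal, self-contained C++ sketch of the basic YCoCg -> RGB decode. The channel layout (Y in alpha, Co in red, Cg in green) is an assumption reflecting one common convention, and the per-block scale factor from the paper is omitted; in Celestia the same arithmetic would live in a fragment shader.

Code:

    #include <cstdio>

    // Minimal sketch: turning a DXT5-stored YCoCg texel back into RGB.
    // Assumed layout (one common convention): luma Y in alpha, Co in red,
    // Cg in green; Co and Cg are biased by 0.5 so they fit in [0,1].
    // Storing Y in alpha is the point of the trick: DXT5 compresses the
    // alpha block separately, with higher precision than the color block.
    struct RGB { float r, g, b; };

    RGB ycocgToRgb(float co, float cg, float y)
    {
        co -= 0.5f;            // remove the chroma bias
        cg -= 0.5f;
        RGB out;
        out.r = y + co - cg;
        out.g = y + cg;
        out.b = y - co - cg;
        return out;
    }

    int main()
    {
        // A neutral gray texel: zero chroma, mid luma.
        RGB c = ycocgToRgb(0.5f, 0.5f, 0.5f);
        std::printf("%.2f %.2f %.2f\n", c.r, c.g, c.b); // 0.50 0.50 0.50
        return 0;
    }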

t00fri wrote: The old NVDXT tools map r->alpha while the new tools map g->alpha in DXT5nm. Either is fine, but we should settle on one choice sooner or later. I know that the old NV tools support a -switchRG option... this would also be nice for nvcompress.


Yes, if anyone knows which mapping is predominant, I'd like to know. I was about to revert to Celestia's old approach...
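
To make the two mappings concrete, here is a small C++ sketch of how a DXT5nm texel would be unpacked under each convention. The function names are made up for illustration, and the z reconstruction assumes unit-length tangent-space normals.

Code:

    #include <cmath>
    #include <cstdio>

    // Sketch: rebuilding a tangent-space normal from a DXT5nm texel.
    // DXT5nm keeps one component in the (higher-precision) alpha block
    // and one in green; z is recovered from the unit-length constraint.
    struct Normal { float x, y, z; };

    static float recoverZ(float x, float y)
    {
        return std::sqrt(std::fmax(0.0f, 1.0f - x * x - y * y));
    }

    // "Old NVDXT" style mapping: r->alpha, i.e. x stored in alpha.
    Normal decodeXInAlpha(float g, float a)
    {
        Normal n;
        n.x = 2.0f * a - 1.0f;   // expand [0,1] -> [-1,1]
        n.y = 2.0f * g - 1.0f;
        n.z = recoverZ(n.x, n.y);
        return n;
    }

    // "New nvcompress" style mapping: g->alpha, i.e. y stored in alpha.
    Normal decodeYInAlpha(float g, float a)
    {
        Normal n;
        n.x = 2.0f * g - 1.0f;
        n.y = 2.0f * a - 1.0f;
        n.z = recoverZ(n.x, n.y);
        return n;
    }

    int main()
    {
        Normal n = decodeXInAlpha(0.5f, 0.5f);          // a flat "up" normal
        std::printf("%.2f %.2f %.2f\n", n.x, n.y, n.z); // 0.00 0.00 1.00
        return 0;
    }

Either mapping works equally well; what matters is that compressor and renderer agree, hence the appeal of a -switchRG-style option.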

--Chris

chris
Site Admin
Posts: 4211
Joined: 28.01.2002
With us: 22 years 9 months
Location: Seattle, Washington, USA

Post #23 by chris » 04.05.2007, 21:27

t00fri wrote:

icastano wrote: I never realized that my normal map components were in the opposite order. I should probably swap them to match the old nvdxt.

Yeah...


Aha! So it seems that Celestia's current handling of normal components in DXT5 textures is ok.

(And hello Ignacio--I'm also an NVIDIA employee, working up in the Bellevue office.)

--Chris

Topic author
t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 7 months
Location: Hamburg, Germany

Post #24 by t00fri » 04.05.2007, 21:52

chris wrote:
t00fri wrote:

icastano wrote: I never realized that my normal map components were in the opposite order. I should probably swap them to match the old nvdxt.

Yeah...

Aha! So it seems that Celestia's current handling of normal components in DXT5 textures is ok.

(And hello Ignacio--I'm also an NVIDIA employee, working up in the Bellevue office.)

--Chris

Chris,

now I am confused ;-) (or too tired...)

Here is what I wrote in my recent email to you:

4) As I also wrote in the forum, the story goes on and now the NEW (squish-based) NVIDIA DXT tools (nvcompress) use your ORIGINAL assignment g->alpha, r->g.

I think this statement is still correct: your original assignment was g->alpha, which disagreed with the old NVDXT format. Then you recently switched to r->alpha in CVS. That now agrees with the old NVDXT, but does not match nvcompress ;-) as Ignacio just implicitly confirmed.

So what do you take as a reference when writing:
Aha! So it seems that Celestia's current handling of normal components in DXT5 textures is ok.


;-)

Bye Fridger

chris
Site Admin
Posts: 4211
Joined: 28.01.2002
With us: 22 years 9 months
Location: Seattle, Washington, USA

Post #25 by chris » 07.05.2007, 21:04

t00fri wrote:
chris wrote:
t00fri wrote:

icastano wrote: I never realized that my normal map components were in the opposite order. I should probably swap them to match the old nvdxt.

Yeah...

Aha! So it seems that Celestia's current handling of normal components in DXT5 textures is ok.

Chris,

now I am confused ;-) (or too tired...)

Here is what I wrote in my email to you, recently:

4) As I also wrote in the forum, the story goes on and now the NEW (squish-based) NVIDIA DXT tools (nvcompress) use your ORIGINAL assignment g->alpha, r->g.

I think this statement is still correct: your original assignment was g->alpha, which disagreed with the old NVDXT format. Then you recently switched to r->alpha in CVS. That now agrees with the old NVDXT, but does not match nvcompress ;-) as Ignacio just implicitly confirmed.

So what do you take as a reference when writing:
Aha! So it seems that Celestia's current handling of normal components in DXT5 textures is ok.

;-)


My understanding of this increasingly confusing situation is that Celestia currently matches the normal component conventions of NVDXT, but not nvcompress. I read Ignacio's message to mean that he had intended nvcompress to match NVDXT, and that he plans on changing nvcompress so that it does. With this change to nvcompress, Celestia (as coded now) will use the same compressed normal map conventions as both NVDXT and nvcompress.

--Chris

Topic author
t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 7 months
Location: Hamburg, Germany

Post #26 by t00fri » 07.05.2007, 22:07

chris wrote:My understanding of this increasingly confusing situation is that Celestia currently matches the normal component conventions of NVDXT, but not nvcompress. I read Ignacio's message to mean that he had intended nvcompress to match NVDXT, and that he plans on changing nvcompress so that it does. With this change to nvcompress, Celestia (as coded now) will use the same compressed normal map conventions as both NVDXT and nvcompress.

--Chris


OK, then we agree completely ;-).

Bye Fridger

icastano
Posts: 11
Joined: 03.05.2007
With us: 17 years 6 months

Post #27 by icastano » 14.05.2007, 04:55

Hey guys, sorry for not following the thread closely; I thought I would receive a notification every time somebody replied, but apparently that's not the case! Anyway, I'll try to have an update this week with the normal map fixes and other improvements. BTW, thanks for the normal map textures!

MrE
Posts: 2
Joined: 30.05.2007
With us: 17 years 5 months

Post #28 by MrE » 30.05.2007, 04:33

Hey Ignacio ;)

For anyone interested, there's another related article on real-time texture streaming available on the Intel web site.

http://softwarecommunity.intel.com/arti ... g/1221.htm

MrE

icastano
Posts: 11
Joined: 03.05.2007
With us: 17 years 6 months

Post #29 by icastano » 30.05.2007, 08:05

Thanks for pointing it out. I also saw that article recently. Their compression algorithm is interesting, but in my opinion they are not doing anything really innovative; it's just a JPEG-style compressor with some tweaks for faster decompression. It is, however, a highly optimized implementation.

I'm impressed by van Waveren's work, but I personally wouldn't recommend that anybody do the same unless they are MMX/SSE assembly gurus. I think some stages of their streaming pipeline are much easier to implement (and probably more efficient too) on the GPU, especially the iDCT, the color conversions, and the DXT compression. However, SSE extensions are more widely available than the latest graphics hardware, so I guess it still makes some sense to make that effort.

MrE
Posts: 2
Joined: 30.05.2007
With us: 17 years 5 months

Post #30 by MrE » 30.05.2007, 13:00

My apologies for the confusion; I should have chosen a better nickname. I actually wrote the articles. Excellent work on the CUDA implementation and the DXT tools; it's great to see more of this out in the open. The reason I mentioned the second article is that people seem to have a hard time finding it on the Intel site.

Agreed that the article doesn't describe anything all that innovative. However, the use of the YCoCg color space allows you to go straight through to a better-quality hardware format, with the color conversion done on the GPU.

Also agreed that the MMX/SSE2 code is probably not for everyone. That, however, is all the more reason to release the full source code.

Doing compression and decompression on the GPU is definitely something that will become more common. The problem with decompression is that, except perhaps on the latest hardware, it is hard to implement an efficient entropy decoder on the GPU. Most people are also not fortunate enough to have the latest graphics hardware, and splitting a decompressor across the CPU and GPU is usually not a good idea.

The performance of the GPU implementations is by no means unimpressive. However, that performance is typically measured with all GPU resources available, i.e., with the GPU doing no other work. We have a tendency to swamp the GPU with rendering, which leaves little room to also do de-re-compression on the GPU. With all the assembler optimizations, the pipeline described in the articles consumes only a small percentage of the available CPU time (typically < 10%, even on a low-end CPU), and a relatively low-end graphics card can be used for rendering.

Furthermore, multi-core CPUs are rapidly taking over the PC gaming landscape, and compared to the latest graphics cards the individual cores are dirt cheap. With a definite trend toward a growing number of cores, we end up with all these cores lying around that can easily be used for some de-re-compression work.

However, don't get me wrong: the power of the GPU has a tendency to grow at an impressive rate. The cutoff point for doing work on the GPU vs. the CPU shifts around all the time, and any research done on fast GPU implementations is most welcome :)

icastano
Posts: 11
Joined: 03.05.2007
With us: 17 years 6 months

Post #31 by icastano » 31.05.2007, 02:21

I agree it's hard to find; I only learned about it thanks to a Google alert, and didn't find any link to it anywhere else. :)

I agree that entropy decoders are very hard to parallelize; the best you can do is use multiple streams and decode them in parallel (I think that's what JPEG 2000 does), but that model doesn't map very well to the GPU. In my opinion, new data-parallel compression algorithms need to be developed.
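
As a minimal C++ sketch of that multi-stream idea (decodeStream below is a made-up stand-in for a real entropy decoder):

Code:

    #include <thread>
    #include <vector>
    #include <cstdio>

    // Sketch: if the compressed data is split into independently
    // entropy-coded streams, each stream can be decoded on its own
    // thread (or, in principle, its own GPU work unit).
    // decodeStream is a hypothetical stand-in for a real entropy decoder.
    void decodeStream(int id, std::vector<int>* out)
    {
        out->assign(4, id);   // pretend we decoded four symbols
    }

    int main()
    {
        const int numStreams = 4;
        std::vector<std::vector<int>> results(numStreams);
        std::vector<std::thread> workers;

        // One worker per stream; no cross-stream dependencies to serialize on.
        for (int i = 0; i < numStreams; ++i)
            workers.emplace_back(decodeStream, i, &results[i]);
        for (std::thread& t : workers)
            t.join();

        std::printf("decoded %d independent streams\n", numStreams);
        return 0;
    }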

Another possibility would be to use the specialized video processors available on some chipsets, but again availability is limited, and in the end data-parallel algorithms will scale better.

Fightspit
Posts: 510
Joined: 15.05.2005
With us: 19 years 6 months

Post #32 by Fightspit » 27.06.2007, 18:54

The new version of CUDA (1.0) is available:

http://developer.nvidia.com/object/cuda.html#downloads

You can find the release notes here.
Motherboard: Intel D975XBX2
Processor: Intel Core2 E6700 @ 3Ghz
Ram: Corsair 2 x 1GB DDR2 PC6400
Video Card: Nvidia GeForce 8800 GTX 768MB GDDR3 384 bits PCI-Express 16x
HDD: Western Digital Raptor 150GB 10000 rpm
OS: Windows Vista Business 32 bits

icastano
Posts: 11
Joined: 03.05.2007
With us: 17 years 6 months

Post #33 by icastano » 27.06.2007, 21:23

Yup, the code in Subversion should work fine with the new CUDA release:

http://code.google.com/p/nvidia-texture-tools/source

The binaries at the NVIDIA website will be updated shortly.

http://developer.nvidia.com/object/texture_tools.html

