Post-1.6.0 performance

Posted: 04.09.2009, 01:01
by chris
Several people have expressed concerns about the performance of the current SVN version of Celestia. The SVN code isn't near a release state--there's lots of feature and optimization work left to be done. Don't despair if your freshly compiled SVN version of Celestia seems to be running slower than it should. There's probably a simple reason for it that will be addressed well before the release of the next major version. Here are a few cel URLs you can use to try and track down the source of any slowdown (or speedup :) ):

1. Earth view at startup, no DSOs:
cel://Follow/Sol:Earth/2009-09-04T00:37 ... rc=0&ver=3

2. Antarctica zoom:
cel://Follow/Sol:Earth/2009-09-04T00:39 ... rc=0&ver=3

3. Just stars:
cel://Follow/Sol:Earth/2009-09-04T00:41 ... rc=0&ver=3

4. Stars + DSOs:
cel://Follow/Sol:Earth/2009-09-04T00:42 ... rc=0&ver=3

5. Orbits:
cel://Follow/Sol/2009-09-04T00:44:46.59 ... rc=0&ver=3

To benchmark, do the following:
- Disable sync to vertical refresh in your graphics card control panel
- Start Celestia with just the basic package content (i.e. no add-ons)
- Turn on the frames per second display by pressing ` (backquote).
- The frame rate will fluctuate somewhat; I'm in the habit of reporting the peak frame rate
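
The reporting convention in the last step can be sketched in a few lines of Python. This is only an illustration: the function name `summarize_fps` and the sample readings are made up, and it assumes you note the on-screen numbers by hand, since nothing in this thread describes Celestia logging them for you.

```python
def summarize_fps(readings):
    """Return (peak, lowest) frame rate from a list of on-screen readings."""
    if not readings:
        raise ValueError("need at least one fps reading")
    return max(readings), min(readings)


# Hypothetical readings noted by hand for one cel URL.
readings = [412.0, 423.0, 418.5, 409.2]
peak, lowest = summarize_fps(readings)
print(f"peak {peak} fps (lowest seen: {lowest} fps)")
```

Reporting the lowest reading alongside the peak gives a rough idea of how much the frame rate was fluctuating during the run.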

For this benchmark, it's best to run without anti-aliasing to see more dramatic performance differences.

There are four changes since 1.6.0 that are most likely to affect performance: the switch to Eigen, enabling anisotropic filtering, enabling vertex buffer objects (VBOs), and the new orbit drawing code. Here are the changes most likely to affect the performance for the five cel URLs listed:

1. VBOs, Anisotropic filtering
2. Anisotropic filtering, VBOs (aniso will have more of an impact for this URL than for #1)
3. Eigen
4. Eigen
5. New orbit code, Eigen

So, if you see a slowdown when viewing URLs 1 and 2, it would suggest that either VBOs or anisotropic filtering is responsible. Etc...
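
As a rough illustration of that reasoning, the list above can be written as a lookup table and queried with the URLs that got slower. The names `LIKELY_CAUSES` and `suspects` are hypothetical, and counting how many slow tests a change could explain is just one simple way to rank the candidates, not a real diagnosis.

```python
from collections import Counter

# Test URL number -> changes most likely to affect it (from the list above).
LIKELY_CAUSES = {
    1: ["VBOs", "anisotropic filtering"],
    2: ["anisotropic filtering", "VBOs"],
    3: ["Eigen"],
    4: ["Eigen"],
    5: ["new orbit code", "Eigen"],
}


def suspects(slow_tests):
    """Rank changes by how many of the slowed-down tests they could explain."""
    counts = Counter()
    for test in slow_tests:
        counts.update(LIKELY_CAUSES[test])
    return counts.most_common()


# A slowdown in URLs 1 and 2 points at VBOs and anisotropic filtering.
print(suspects([1, 2]))
# A slowdown in URLs 4 and 5 points mainly at Eigen.
print(suspects([4, 5]))
```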

For those interested, here are some more details on each of the four performance-affecting changes:

Conversion to Eigen
In version 1.6.0, Celestia used a custom set of classes for arithmetic on vectors, matrices, and quaternions. These are replaced in the SVN version by the classes from the Eigen linear algebra library. The switch to Eigen was a very intrusive change--almost every Celestia source file had to be changed. The motivations for switching were code clarity and performance (see this thread for details: viewtopic.php?f=10&t=14025.) Eigen is optimized to use a CPU's vector instruction set--SSE2 on Intel, AltiVec on PowerPC. Some things may actually run a bit slower without SSE2/AltiVec enabled. There are a few places in the Celestia code where I've optimized for vectorization at the cost of lower performance when vectorization is off. In some cases, the compiler settings that enable vectorization haven't been turned on in the project files in SVN (yet): the Xcode project for Mac and the Qt4 project for Mac and Windows will probably--it depends on the OS version and compiler install--build Celestia without vectorization, resulting in reduced performance. We're working on it... When vectorization is enabled and all compiler flags are set properly, Celestia should run faster than before.

Anisotropic filtering
Anisotropic texture filtering was enabled by default in Celestia after 1.6.0. Turning it on greatly improves the appearance of planet textures near the poles, but it does come at some cost in performance. Exactly how much depends on the graphics card--in some cases the performance drop is hardly noticeable, while in other situations it can reduce the frame rate by 25% or more. The choice between quality and speed should be up to the user, and the plan is to provide a 'Render Quality' GUI that will give the user control over anisotropic filtering and other settings.

Vertex buffer objects
Vertex buffer objects (VBOs) are a relatively new OpenGL feature that allows 3D applications to use graphics hardware more efficiently. After 1.6.0, I turned on a switch to use VBOs for planet rendering. Several people on the developers list tried out the change, and the results ranged from a very small speed increase to over 30% gains with newer hardware. It's possible that older graphics cards and/or drivers don't support VBOs well, and might actually run slower with VBOs on. That's something to watch out for, but so far we haven't seen it happen.

New trajectory rendering
The code to draw orbit paths has been completely rewritten (and is still being updated.) The new code draws orbits much more accurately, and in a lot of cases it's also faster (around 100% faster when drawing the orbit of Cassini.) We're still looking at the performance of the new orbit code. It's possible that it will be slower for some objects and/or viewpoints. The new orbit code will only affect performance when orbit paths are visible; when they're disabled, the new orbit code isn't used at all.

--Chris

Re: Post-1.6.0 performance

Posted: 04.09.2009, 01:16
by chris
Here are the results for my system:

1.6.0 / SVN 4864
1. 263 / 423 fps
2. 110 / 298 fps
3. 430 / 428 fps
4. 267 / 251 fps
5. 356 / 365 fps

I ran with a window size of roughly 1300x900. The system hardware and software:
CPU: Intel i7 quad core 2.66GHz
GPU: GeForce 285 GTX, 1GB VRAM
12GB RAM
Windows 7, 64-bit (still a 32-bit version of Celestia though)

The massive speed increases for URLs 1 and 2 are a result of the switch to vertex buffer objects. On a high-end graphics card, the performance boost from using VBOs completely overwhelms any slowdown from enabling anisotropic filtering. When no planets are in view, performance of the SVN version is on par with 1.6.0, though there's still a slight performance hit when DSOs are enabled (still investigating this...) Finally, there's a small performance increase from the new orbit code--there's a much greater increase when visualizing complex trajectories like Cassini's.
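
To put numbers on those statements, here is a small Python sketch that turns the table above into percentage changes. The figures are copied from the table; the names `RESULTS` and `percent_change` are only for illustration.

```python
# (1.6.0 fps, SVN r4864 fps) per test URL, copied from the table above.
RESULTS = {
    1: (263, 423),
    2: (110, 298),
    3: (430, 428),
    4: (267, 251),
    5: (356, 365),
}


def percent_change(old_fps, new_fps):
    """Relative change in frame rate, in percent (positive means faster)."""
    return (new_fps - old_fps) / old_fps * 100.0


for test, (old, new) in RESULTS.items():
    print(f"URL {test}: {percent_change(old, new):+.1f}%")
```

With these figures, URL 2 comes out at roughly +171% and URL 1 at roughly +61%, while URL 4 (stars + DSOs) is about 6% slower.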

--Chris

Re: Post-1.6.0 performance

Posted: 05.09.2009, 16:27
by cartrite
Here is what I get using the KDE versions.
SVN / 1.6.0
202 / 165
75.7 / 49.5
225 / 206
116 / 113
185 / 168
9800 gt 1gb ram 2ghz amd processor 3 gb ram
openSuse 11.1 64 bit
Display = 1280x1024
cartrite

Re: Post-1.6.0 performance

Posted: 05.09.2009, 17:00
by chris
cartrite wrote:Here is what I get using the KDE versions.

Very good to see that performance has improved across the board on Linux. Enabling vertex buffer objects is making a big difference for planet views.

--Chris

Re: Post-1.6.0 performance

Posted: 05.09.2009, 17:09
by chris
A very interesting result when testing the SVN version on my MacBook Pro. For cel URL 4 (Stars+DSOs), changing the compiler version makes a big difference:

gcc 4.0: 34.1 fps
gcc 4.2: 59.4 fps

The second result should actually be higher, but I can't figure out how to disable vsync on the Mac, so it's maxing out at the refresh rate of 60Hz.

The MacBook Pro:
2.4 GHz Intel Core 2 Duo
Geforce 8600M GT, 256MB VRAM
4GB RAM
Snow Leopard (Mac OS 10.6)

Eigen's vectorization is only enabled in gcc 4.2 and higher, so that may account for most of the performance difference. gcc 4.2 is available as an option in Xcode 3.1 and higher and is the default in Xcode 3.2 (which comes with Snow Leopard.) gcc+LLVM is supposed to produce even better code, but I need to figure out how to disable vsync before it's worth benchmarking.

--Chris

Re: Post-1.6.0 performance

Posted: 05.09.2009, 17:37
by Ricardo
Mac user, same about vsync and don't really know about the compiler, anyway:

1.6 - SVN
74,9 - 75
69,7 - 73,5
75,2 - 75
74,9 - 38,2
74,2 - 48

(with Xcode 3.1.2, specs in my sign- Edit: 2x 1440x900)

Results not famous... :|

Re: Post-1.6.0 performance

Posted: 05.09.2009, 18:38
by selden
v1.6.0 / r4864:

191/202
111/157
210/204
123/116
162/165

System:
1GB 3.4GHz P4-550, WinXP Pro sp3
256MB GF 7800 GTX, ForceWare v182.50
1600x1200 display, vertical sync disabled

r4864 compiled with VS 2008 C++ SP1, anisotropic filtering and SSE2 enabled

p.s.
I'm actually disappointed that Chris is getting only 2x better performance than I am. I've had this system for almost 5 years. I had been planning to upgrade in early November, but I'm not sure an improvement of only 2x is worth the effort :(

It seems to me that multithreading would be one way to improve CPU utilization and thus performance, but I don't know how much it would improve the frame rate.

Re: Post-1.6.0 performance

Posted: 05.09.2009, 21:03
by chris
Ricardo wrote:Mac user, same about vsync and don't really know about the compiler, anyway:

1.6 - SVN
74,9 - 75
69,7 - 73,5
75,2 - 75
74,9 - 38,2
74,2 - 48

(with Xcode 3.1.2, specs in my sign- Edit: 2x 1440x900)

Results not famous... :|

Based on your results, I'd say that the slowdown is a result of using gcc 4.0 instead of gcc 4.2. The biggest slowdowns are seen in #4 and #5, when deep sky objects and orbits are enabled. Both these code paths are optimized to use vectorization at the expense of some performance when vectorization is disabled (all ultimately a result of the memory alignment requirements of the CPU vector instruction sets.) The performance improvement for URL #2 is a result of enabling vertex buffer objects.

To set the compiler version to gcc 4.2, go to Edit Project Settings in the Project menu. In the Project Info dialog that pops up, set the C/C++ Compiler Version to "GCC 4.2". It may also be necessary to change the value of Base SDK to Mac OS X 10.5.

--Chris

Re: Post-1.6.0 performance

Posted: 05.09.2009, 22:10
by Ricardo
After changes as indicated, new build, new numbers:

1.6 - SVN
74,9 - 75
69,7 - 74,2
75,2 - 75
74,9 - 75
74,2 - 74,2

Better but no major boost as in other tests above (Windows/Linux).

Re: Post-1.6.0 performance

Posted: 06.09.2009, 01:37
by chris
Ricardo wrote:After changes as indicated, new build, new numbers:

1.6 - SVN
74,9 - 75
69,7 - 74,2
75,2 - 75
74,9 - 75
74,2 - 74,2

Better but no major boost as in other tests above (Windows/Linux).

Vsync is restricting your frame rate to 75 Hz. You would see bigger numbers for both 1.6.0 and the SVN version if vsync were disabled. Unfortunately, I haven't figured out how to do this on the Mac. In general, you want vsync on except when benchmarking. Leaving it disabled results in 'tearing' whenever a screen refresh occurs in the middle of copying the rendered image to the displayed region of video memory.
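
A quick way to spot this in a set of results is to check whether every number sits at the refresh rate. The Python sketch below does that for the quoted figures; the function name `looks_vsync_capped` and the 1 Hz tolerance are assumptions chosen for illustration, not anything Celestia provides.

```python
def looks_vsync_capped(fps_values, refresh_hz, tolerance_hz=1.0):
    """True if every reading sits within tolerance of the refresh rate."""
    return bool(fps_values) and all(
        abs(fps - refresh_hz) <= tolerance_hz for fps in fps_values
    )


# Columns from the quoted results, with decimal commas written as points.
v160 = [74.9, 69.7, 75.2, 74.9, 74.2]
svn = [75.0, 74.2, 75.0, 75.0, 74.2]

print(looks_vsync_capped(svn, refresh_hz=75.0))   # every reading is at the cap
print(looks_vsync_capped(v160, refresh_hz=75.0))  # URL 2 fell below the cap
```

When a column is capped like this, the numbers say only that the build can keep up with the display, not how much faster it could go.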

--Chris

Re: Post-1.6.0 performance

Posted: 06.09.2009, 13:02
by Boux
Hello, all!

Here is what I get (Vista64 SP2 - 1920x1080 HDTV display - Q9550 3.7 Ghz)

1.6 --> SVN4864
362 --> 346
250 --> 256
434 --> 447
235 --> 230
340 --> 381

Looks like the mileage is varying.
I am running ATI cards, not Nvidia btw.
Maybe at such high fps numbers we are somehow getting CPU-limited.

Re: Post-1.6.0 performance

Posted: 07.09.2009, 14:00
by selden
Boux,

When comparing the system specs, it seems to me that you and Chris have systems which should have comparable performance, and that is borne out by the Celestia performance numbers. Would you agree?

(I personally consider differences as large as 15% to be "comparable", although others would disagree. In my own testing, I've always had a lot of trouble getting benchmark results to vary less than 10% from one run to another.)

Your CPUs are clocked faster but are the previous generation architecture (Core2 vs i7).
Some graphics benchmarks seem to indicate that your pair of AMD/ATI graphics cards are about 15% faster than the dualGPU nVidia card that Chris has.

I'm not sure how to judge the difference between AMD/ATI's Linux graphics drivers and nVidia's Windows drivers, though. In the past Linux drivers have suffered in comparison, but I have the impression that this may not be the case now.

Re: Post-1.6.0 performance

Posted: 07.09.2009, 18:04
by t00fri
chris wrote:Here are the results for my system:

1.6.0 / SVN 4864
===========
1. 263 / 423 fps
2. 110 / 298 fps
3. 430 / 428 fps
4. 267 / 251 fps
5. 356 / 365 fps

I ran with a window size of roughly 1300x900. The system hardware and software:
CPU: Intel i7 quad core 2.66GHz
GPU: GeForce 285 GTX, 1GB VRAM
12GB RAM
Windows 7, 64-bit (still a 32-bit version of Celestia though)



--Chris

Well, here are some interesting benchmarks.

As I wrote elsewhere, I recently got a new computer for my office in the laboratory,

++++++++++++++++++++
CPU=Core2Duo E8400@3.0GHz/4GB Ram, Graphics=GX 9500GT/512 MB (passively cooled!)
Clearly, noisy machines are out in researchers' offices ;-) . What is amazing is that the 9500GT just costs 50 Euro!!

On that machine I run with a window size of 1280x1024, which is close to Chris' dimensions.
++++++++++++++++++++

This new configuration sports some features that my much older machines did not have:

  • powerful GPU computing (CUDA 2.3) with latest WHQL driver 190.62 (Win XP SP3)!
    Acceleration of my F-TexTools with direct DXT output: by a factor 5 - 10! 2048 VT tiles (level5) only take ~ 12 minutes in highest quality. Stay tuned at CelestialMatters...there will soon be some interesting stuff.

  • an amazing performance in view of Chris' above "high-end" benchmarks for almost no money (compared to his i7 /12 GB + GX 285 GTX/ 1GB equipment)

Here is what I get with this machine under Linux(32bit, KDE4, gcc 4.3, OpenSuSE 11.1)

....1.6.0....|....SVN r4864....
==================

.....367.....|.....375.......
.....202.....|.....221.......
.....473.....|.....527......
.....252.....|.....273.......
.....332.....|.....309.......

=====================

Just compare these values with Chris' "high-end" performance above....

Under Windows XP/ SP3, things so far are not quite as fast, but still not bad. Let me just display the 1.6.0 reference values here. Anyway, they are quite close to Chris' results for 1.6.0!

....1.6.0...(32bit Win XP SP3)
==================

.....284.....
.....123.....
.....442.....
.....271.....
.....367.....

=====================

No AA or anisotropic filtering has been activated and the test configurations were straight from Chris' above links.

Finally, as a historic touch, here are the results from my VERY old Desktop at home for Linux, (32bit KDE 3.5.10, gcc 4.2.1, OpenSuSE 10.3)

CPU = Pentium4, 3.2 GHz/3 Gb Ram, Graphics=FX 5900 Ultra/256 MB (!!!)

Note that ~ 6 years ago, I paid for this card > 450 Euro ;-)
viewtopic.php?f=2&t=3243&p=22062&hilit=5900#p22062
Now the 9500GT just costs 50 Euro ....

On that "historical" machine I run with fullscreen, 1600x1200, which is significantly larger than what Chris used...


....1.6.0....|....SVN r4864....
==================

.....112.....|.....105.......
.....27.4....|.....27.3......
.....211.....|.... 221.......
.....114.....|.....120.......
.....132.....|.....153.......

=====================


All these benchmarks support quantitatively the various remarks I made earlier about anisotropic filtering etc.

Fridger

PS: Of course I also took plenty of benchmarks with the various celestia-qt4 configurations...

Re: Post-1.6.0 performance

Posted: 07.09.2009, 21:27
by chris
A few conclusions from the benchmarks so far...

* On a current desktop configuration, Celestia tends to be CPU limited. There's plenty of evidence for this in the numbers: extremely fast GPUs don't perform that much better than lower end configurations. But the benchmark numbers presented in this thread are in some regard artificial: they are meant to highlight any performance differences between Celestia 1.6.0 and the current SVN revision. Visually, it makes no difference if Celestia is running at 100 fps or 400 fps--both frame rates are higher than the monitor refresh rate (typically 60 Hz). Except during benchmarking, what you want is to maximize visual quality while maintaining frame rates above the refresh rate. The best and easiest way to boost quality is to enable 4x or better antialiasing by editing the AntialiasingSamples setting in celestia.cfg (the next version of Celestia will offer a GUI control for antialiasing.)

* On mid- to high-end Windows systems, there is no significant overall performance drop between Celestia 1.6.0 and the SVN revision. In most cases, Celestia SVN is faster, sometimes very significantly so.

* Linux performance seems to have improved overall with the SVN version.

* Performance on the Mac drops significantly from 1.6.0 to SVN unless the gcc 4.2 compiler is used. Performance seems on par with 1.6.0 with gcc 4.2, but we need to find out how to disable vsync for proper benchmarking.

* On mid- to high-end systems, Celestia SVN is running faster than 1.6.0 despite having anisotropic filtering enabled to reduce the 'polar pinch' effect. On low-end systems, anisotropic filtering can slow down rendering considerably. Celestia needs to offer a GUI control to adjust the level of anisotropic filtering so that people with older machines don't suffer choppy frame rates.

* More benchmarks from non-Windows and older or lower-end systems would be very useful. The situations with anisotropic filtering and gcc 4.0 vs 4.2 on the Mac demonstrate that there are still performance issues to look out for.

--Chris

Re: Post-1.6.0 performance

Posted: 07.09.2009, 21:31
by t00fri
chris wrote:A few conclusions from the benchmarks so far...


* Only one data point so far, but Linux performance seems to have improved overall with the SVN version.

--Chris

Did you overlook my post above yours??

For Linux we now have THREE data points, not ONE. Cartrite's and my two, from my new office machine and my old one.

Fridger

Re: Post-1.6.0 performance

Posted: 07.09.2009, 21:35
by chris
t00fri wrote:
chris wrote:A few conclusions from the benchmarks so far...


* Only one data point so far, but Linux performance seems to have improved overall with the SVN version.

--Chris

Did you overlook my post above yours??

For Linux we now have THREE data points, not ONE. Cartrite's and my two, the new office machine and my old one.

Oops... I saw your post but thought that those were all Windows benchmarks. I'll correct my post.

--Chris

Re: Post-1.6.0 performance

Posted: 07.09.2009, 21:58
by t00fri
chris wrote:...
On low-end systems, anisotropic filtering can slow down rendering considerably.

--Chris

Sorry, but I could not find any data (for the first two tests) that support this statement of yours. As I have been saying for a long time, anisotropic filtering barely reduces the rate (while it significantly improves the rendering). Notably this is even borne out in my old "low-end" machine (FX5900Ultra).

Typically I take from the data that anisotropic filtering (Tests 1 and 2) does not significantly lower the rate below the typical average performance from tests 1-5. Both in low-end and high-end machines. Moreover, in ALL cases, the performance of tests 1 and 2 increases from 1.6.0 towards SVN 4864. Did I overlook any data?

Fridger

Re: Post-1.6.0 performance

Posted: 07.09.2009, 22:06
by chris
t00fri wrote:
chris wrote:...
On low-end systems, anisotropic filtering can slow down rendering considerably.

--Chris

Sorry, but I could not find any data (for the first two tests) that support this statement of yours in any way. As I have been saying for a long time, anisotropic filtering barely reduces the rate (while it significantly improves the rendering). Notably this is also borne out in my old "low-end" machine (FX5900Ultra).

Typically I take from the data that anisotropic filtering (Tests 1 and 2) does not significantly lower the rate below the typical average performance from tests 1-5. Both in low-end and high-end machines. Moreover, in ALL cases, the performance of tests 1 and 2 increases from 1.6.0 towards SVN 4864. Did I overlook any data?

Selden's results from this thread:

viewtopic.php?f=10&t=14111&st=0&sk=t&sd=a

Also, DOJOMO's slowdown is likely due to anisotropic filtering, though he may not be able to verify this until an anisotropic filtering control is added to Celestia.

On higher end systems, enabling vertex buffer objects appears to more than compensate for any slowdown due to anisotropic filtering.

--Chris

Re: Post-1.6.0 performance

Posted: 07.09.2009, 22:13
by t00fri
chris wrote:
t00fri wrote:
chris wrote:...
On low-end systems, anisotropic filtering can slow down rendering considerably.

--Chris

Sorry, but I could not find any data (for the first two tests) that support this statement of yours in any way. As I have been saying for a long time, anisotropic filtering barely reduces the rate (while it significantly improves the rendering). Notably this is also borne out in my old "low-end" machine (FX5900Ultra).

Typically I take from the data that anisotropic filtering (Tests 1 and 2) does not significantly lower the rate below the typical average performance from tests 1-5. Both in low-end and high-end machines. Moreover, in ALL cases, the performance of tests 1 and 2 increases from 1.6.0 towards SVN 4864. Did I overlook any data?

Selden's results from this thread:

viewtopic.php?f=10&t=14111&st=0&sk=t&sd=a

Also, DOJOMO's slowdown is likely due to anisotropic filtering, though he may not be able to verify this until an anisotropic filtering control is added to Celestia.

On higher end systems, enabling vertex buffer objects appears to more than compensate for any slowdown due to anisotropic filtering.

--Chris


I thought we take THIS thread as a basis, since here everyone started from the same conditions. And indeed, Selden's above results are again in line with my observations.

For my taste, DOJOMO's results are too vague as to anisotropic filtering.

Fridger

Re: Post-1.6.0 performance

Posted: 07.09.2009, 22:18
by selden
Fridger,

I consider the 26% reduction in frame rate that I reported (117 fps --> 86 fps) to be a considerable slowdown, and that's with a graphics card which is much more capable than the average. Granted, that test was with a Quadro workstation card and not a GeForce gaming card. I don't know what difference that makes in relative performance of anisotropic filtering. My tests here all have been using a gaming card with anisotropic filtering enabled. I'll make another test with it disabled to see how much it improves.
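
For anyone who wants to check the arithmetic, here is a minimal Python sketch using the two figures above. The per-frame time is an extra way of looking at the same numbers, not something measured separately, and the function names are only illustrative.

```python
def percent_reduction(before_fps, after_fps):
    """Drop in frame rate as a percentage of the starting value."""
    return (before_fps - after_fps) / before_fps * 100.0


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


before, after = 117.0, 86.0
print(f"{percent_reduction(before, after):.1f}% lower frame rate")
print(f"{frame_time_ms(after) - frame_time_ms(before):.2f} ms more per frame")
```

The drop works out to a little over 26%, or roughly 3 ms of extra rendering time per frame.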

Chris,

I provided DOJOMO with a copy of Celestia compiled from the current svn code, but with both anisotropic filtering and SSE2 code disabled. He reported no improvement in performance. :(

For what it's worth, my personal interest is in whatever can be done to improve the performance when viewing many small objects simultaneously. Celestia's performance when viewing the control desk of the Hale Telescope Addon is one example. I'm currently getting only 10fps when viewing it :(