Post-1.6.0 performance

Discussion forum for Celestia developers; topics may only be started by members of the developers group, but anyone can post replies.
Reiko
Posts: 1119
Joined: 05.10.2006
Age: 41
With us: 18 years 1 month
Location: Out there...

Re: Post-1.6.0 performance

Post #41 by Reiko » 14.09.2009, 14:31

Fenerit wrote:
Reiko wrote: I notice no difference in appearance or performance when I have anisotropic filtering turned on for Celestia. It doesn't seem to work for me.

Don't you see differences like these?

On the default Mars texture:

No anisotropic filtering:
Image

Anisotropic filtering 2x (minimum):
Image

Note the reduced pinching at the pole.

Your images aren't showing up, so I can't say.
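
Why anisotropic filtering helps most near the poles can be estimated with a little geometry. This sketch assumes an equirectangular (simple cylindrical) surface texture, the projection Celestia's planet maps normally use, and a surface seen face-on; `estimatedAnisotropy` is a hypothetical helper, not Celestia code. Along a parallel, each texel covers only cos(latitude) of the ground distance it covers at the equator, so a screen pixel's footprint in texture space is stretched by roughly 1/cos(latitude).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not Celestia code. Estimates how stretched a screen
// pixel's footprint is in texture space at a given latitude, assuming an
// equirectangular texture viewed face-on (grazing views add more stretch).
float estimatedAnisotropy(float latitudeDeg, float maxAnisotropy)
{
    assert(maxAnisotropy >= 1.0f);
    const float kPi = 3.14159265f;
    // Ground distance per texel along a parallel shrinks by cos(latitude).
    const float c = std::fabs(std::cos(latitudeDeg * kPi / 180.0f));
    // Near the pole 1/c blows up, so clamp to the hardware limit.
    if (c <= 1.0f / maxAnisotropy)
        return maxAnisotropy;
    return 1.0f / c;
}
```

At 60° latitude the estimate is already about 2, and close to the pole it hits any hardware limit, which fits the pinching visible in the unfiltered image. In OpenGL, a value like this is set per texture with `glTexParameterf` and `GL_TEXTURE_MAX_ANISOTROPY_EXT` from the EXT_texture_filter_anisotropic extension, clamped to the driver's `GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT`.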

Fenerit M
Posts: 1880
Joined: 26.03.2007
Age: 17
With us: 17 years 7 months
Location: Thyrrenian sea

Re: Post-1.6.0 performance

Post #42 by Fenerit » 14.09.2009, 17:53

Ah! It's the server, then. They'll come back.
Never at rest.
Massimo

Reiko

Re: Post-1.6.0 performance

Post #43 by Reiko » 15.09.2009, 13:15

OK, they show up now. I'll try the test in Celestia and see whether I notice the difference there. :blue:

Topic author
chris
Site Admin
Posts: 4211
Joined: 28.01.2002
With us: 22 years 9 months
Location: Seattle, Washington, USA

Re: Post-1.6.0 performance

Post #45 by chris » 21.09.2009, 21:57

bjacob wrote: Hi,
I'm one of the developers of Eigen. First of all, let me say that Celestia is a beautiful project and we're very proud that you chose Eigen. I saw that you were discussing performance issues related to Eigen.

Welcome to the Celestia forums, Benoit!

Here are a few tips.

1) GCC versions: as you noted, and as we say on Eigen's website, GCC 4.2 gives far, far better performance than previous versions (in addition to allowing vectorization). Stay away from GCC 4.1 or older if you care about performance with Eigen (that said, we do support compilation all the way down to GCC 3). Also note that GCC 4.4 tends to give about 10% better performance than GCC 4.2, while GCC 4.3 is slightly slower than 4.2. So use GCC 4.4 if possible; otherwise use GCC 4.2.

I haven't tried gcc 4.4 yet. Any idea how Eigen does with gcc+LLVM on Mac OS X?
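
The GCC advice above can be turned into a build-time warning. This is a hypothetical sketch, not part of Celestia's or Eigen's build: `gccRankForEigen` ranks GCC releases exactly as the quoted post does, and it assumes (the post doesn't say) that releases after 4.4 rank with 4.4. It avoids C++11 features so that the old compilers it warns about can still parse it.

```cpp
#include <cassert>

// Hypothetical ranking of GCC releases for Eigen 2.0 performance, following
// the advice above: 3 = GCC 4.4 (best), 2 = GCC 4.2, 1 = GCC 4.3 (slightly
// slower than 4.2), 0 = GCC 4.1 or older (avoid).
// Assumption: releases newer than 4.4 are ranked with 4.4.
inline int gccRankForEigen(int major, int minor)
{
    if (major > 4 || (major == 4 && minor >= 4)) return 3;
    if (major == 4 && minor == 2) return 2;
    if (major == 4 && minor == 3) return 1;
    return 0;
}

// Build-time nudge for old GCC. Clang also defines __GNUC__, so exclude it.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 2))
#warning "GCC 4.1 or older: Eigen code will be slow; use GCC 4.4, or else 4.2"
#endif
```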

2) Vectorization. Note that because SSE and AltiVec work on aligned 128-bit packets, operations on 3D vectors and 3x3 matrices typically can't be vectorized. That could easily explain why you don't get a large boost from SSE2. As for quaternions: they are vectorized in the development branch of Eigen, but not in the 2.0 branch, so expect a performance boost the day you upgrade to the next Eigen version (not yet released, and far from it). For example, quaternion multiplication is 2x faster with SSE. The next Eigen version isn't compatible with Eigen 2.0, but if you use only tiny vectors/matrices and quaternions, it may be compatible on that subset, or need just a few #ifdefs. Still, I don't recommend upgrading yet, as the API isn't finalized.

Mostly, Celestia uses Vector3/Matrix3 and quaternions. I did notice that the development version of Eigen has a vectorized quaternion multiply; I was hoping that would end up in 2.0, but it sounds like you're going to hold off until the next version. I was thinking about integrating the code into Eigen myself (we've got a copy checked into Celestia's SVN tree), but that would make it painful to keep up with the 2.0.x updates.

There are two places where I used Vector4/Matrix4 explicitly to enable vectorization:
- The new orbit code
- The code to render galaxy sprites

I'm aware that this will cost some performance when vectorization isn't available, but I've observed a significant improvement in the orbit code when using vectorized Matrix4d versus unvectorized Matrix3d. The galaxy code uses single-precision Matrix4, so vectorization should give an even bigger improvement there. So far that hasn't proven to be the case, which makes me suspect that the bottleneck is elsewhere in the galaxy code. I need to dig into the code with a profiler to figure out what's going on.
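
A sketch of why padding to four components helps: in a 16-byte-aligned 4x4 single-precision matrix, each column fills exactly one SSE register, so a transform becomes four broadcasts, four multiplies and three adds, whereas a 3-component vector leaves one of the four lanes idle. `Mat4f`, `Vec4f` and `transform` are hypothetical stand-ins for Eigen's `Matrix4f`/`Vector4f`, written with raw SSE intrinsics so the instruction pattern is visible, not Eigen's actual code; a scalar fallback is compiled when SSE isn't available.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MAT4_SKETCH_SSE 1
#endif

// Hypothetical types: column-major 4x4 matrix, m[col * 4 + row].
// alignas(16) lets each 4-float column be loaded with one aligned SSE load.
struct alignas(16) Mat4f { float m[16]; };
struct alignas(16) Vec4f { float v[4]; };

// r = M * p, computed as p.x*col0 + p.y*col1 + p.z*col2 + p.w*col3.
inline Vec4f transform(const Mat4f& M, const Vec4f& p)
{
    Vec4f r;
#ifdef MAT4_SKETCH_SSE
    __m128 acc = _mm_mul_ps(_mm_load_ps(&M.m[0]), _mm_set1_ps(p.v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(&M.m[4]), _mm_set1_ps(p.v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(&M.m[8]), _mm_set1_ps(p.v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(&M.m[12]), _mm_set1_ps(p.v[3])));
    _mm_store_ps(r.v, acc);
#else
    // Scalar fallback: same arithmetic, one lane at a time.
    for (int row = 0; row < 4; ++row) {
        r.v[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            r.v[row] += M.m[col * 4 + row] * p.v[col];
    }
#endif
    return r;
}
```

With `double`, an SSE2 register holds only two values, so each column of a Matrix4d needs two registers; the same pattern still applies, just with twice the instructions.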

3) While enabling SSE doesn't always give a benefit, it should never cause a significant performance regression, so if you hit such a situation, please report it to us :)

Some people have reported a slight decrease in performance when SSE2 is enabled in MSVC++ 2008, but I'm not sure whether this is related to Eigen or whether it's the compiler performing ineffective autovectorization of non-Eigen code.

--Chris

Topic author
chris

Re: Post-1.6.0 performance

Post #48 by chris » 22.09.2009, 18:27

bjacob wrote:
Mostly, Celestia uses Vector3/Matrix3 and quaternions. I did notice that the development version of Eigen has a vectorized quaternion multiply; I was hoping that would end up in 2.0, but it sounds like you're going to hold off until the next version. I was thinking about integrating the code into Eigen myself (we've got a copy checked into Celestia's SVN tree), but that would make it painful to keep up with the 2.0.x updates.

OK, I have just backported the vectorized quaternion multiply. It is in the 2.0 branch, which you can get here:
http://bitbucket.org/eigen/eigen2/get/2.0.tar.gz

Please test it before I release it as 2.0.6 :)

Note that this applies only to float, not to double.

Here on my old 32-bit x86 machine, this gives only a 10% speed increase, but it is known to give a 2x increase on 64-bit Core 2 machines.

Benoit,

Thank you! I'll test this out right away. What's the reason for float only? Are there not enough SSE registers for double precision? Or is the performance benefit not great enough?

--Chris

Topic author
chris

Re: Post-1.6.0 performance

Post #50 by chris » 23.09.2009, 02:35

I replaced Celestia's version of Eigen (2.0.3, I think) with the one you linked to. Everything works fine on the one platform/compiler combination I tested: Windows 7, MSVC++ 2008, SSE2/vectorization enabled. There's a slight performance boost of about 10 frames per second in tests 3, 4, and 5; the first two tests perform as before. I wouldn't expect single-precision quaternion multiplication to consume a large amount of time in any of the tests, but there are quite a few quaternion multiplies sprinkled throughout the code.

bjacob wrote:
What's the reason for float only? Are there not enough SSE registers for double precision? Or is the performance benefit not great enough?

You'd have to ask the specialists (Rohit wrote this code and Gael made an improvement), but I guess the answer is that nobody has yet found a way to make it faster than the non-vectorized code.

Here's the code for float: as you can see, besides the additions and multiplications, there are a lot of other instructions that slow it down. Quaternion multiplication just isn't very convenient to vectorize.

Yes, I can see that. I can also see that writing a double-precision version would take someone more familiar than I am with the performance of the various SSE2 instructions.
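
The float code bjacob posted isn't preserved in this copy of the thread. As a rough illustration of why quaternion multiplication is awkward to vectorize, here is the scalar Hamilton product next to one possible SSE version. This is a hypothetical sketch, not Eigen's implementation, and it assumes the x, y, z, w storage order Eigen uses for quaternion coefficients. The SSE version needs only four multiplies, but seven shuffles and a sign-flip mask to line the operands up.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define QUAT_SKETCH_SSE 1
#endif

// Hypothetical quaternion type, stored as x, y, z, w (assumed layout).
struct alignas(16) Quatf { float c[4]; };

// Scalar Hamilton product r = a * b: 16 multiplies, 12 adds/subtracts.
inline Quatf mulScalar(const Quatf& a, const Quatf& b)
{
    const float ax = a.c[0], ay = a.c[1], az = a.c[2], aw = a.c[3];
    const float bx = b.c[0], by = b.c[1], bz = b.c[2], bw = b.c[3];
    Quatf r;
    r.c[0] = aw * bx + ax * bw + ay * bz - az * by;
    r.c[1] = aw * by + ay * bw + az * bx - ax * bz;
    r.c[2] = aw * bz + az * bw + ax * by - ay * bx;
    r.c[3] = aw * bw - ax * bx - ay * by - az * bz;
    return r;
}

#ifdef QUAT_SKETCH_SSE
// Same product with SSE: 4 multiplies, but 7 shuffles plus an XOR first.
inline Quatf mulSSE(const Quatf& a, const Quatf& b)
{
    const __m128 va = _mm_load_ps(a.c);
    const __m128 vb = _mm_load_ps(b.c);
    // _MM_SHUFFLE(d, c, b, a) picks lanes [a, b, c, d]; x=0, y=1, z=2, w=3.
    const __m128 t1 = _mm_mul_ps(_mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 3, 3, 3)), vb);
    const __m128 t2 = _mm_mul_ps(_mm_shuffle_ps(va, va, _MM_SHUFFLE(0, 2, 1, 0)),   // ax ay az ax
                                 _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(0, 3, 3, 3)));  // bw bw bw bx
    const __m128 t3 = _mm_mul_ps(_mm_shuffle_ps(va, va, _MM_SHUFFLE(1, 0, 2, 1)),   // ay az ax ay
                                 _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(1, 1, 0, 2)));  // bz bx by by
    const __m128 t4 = _mm_mul_ps(_mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 1, 0, 2)),   // az ax ay az
                                 _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 0, 2, 1)));  // by bz bx bz
    // t2 and t3 must be subtracted in the w lane: flip only that sign bit.
    const __m128 wSign = _mm_set_ps(-0.0f, 0.0f, 0.0f, 0.0f);
    __m128 r = _mm_add_ps(t1, _mm_xor_ps(_mm_add_ps(t2, t3), wSign));
    r = _mm_sub_ps(r, t4);
    Quatf out;
    _mm_store_ps(out.c, r);
    return out;
}
#endif
```

For double precision, an SSE2 register holds only two values, so each quaternion spans two registers and the lane-crossing shuffles get harder; that may be part of why the backport covers float only.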

--Chris

Fenerit M

Re: Post-1.6.0 performance

Post #52 by Fenerit » 27.09.2009, 22:48

Just my two cents.
I built the latest SVN version just yesterday, and on my system (GeForce 6100, 256 MB shared memory, latest drivers) performance has really improved. Now I can go to galaxies and clusters at 160 fps, against 121 fps with the official release. I also get 78 fps with orbits at OrbitPathSamplePoints = 256, against 58 fps with the "old" version. However, I've noticed that performance drops when OrbitPathSamplePoints has its default value of 100, in both versions. In practice Celestia is slower at 100 than at 128, and slower at 200 than at 256 (the same is true of RingSystemSections). This is probably unnoticeable with high-end cards. In short, graphics cards don't seem to handle these values well.
Never at rest.
Massimo

Topic author
chris

Re: Post-1.6.0 performance

Post #53 by chris » 07.10.2009, 11:30

The version of Eigen in the Celestia SVN repository is now upgraded to 2.0.6.

--Chris

BobHegwood
Posts: 1803
Joined: 12.10.2007
With us: 17 years 1 month

Re: Post-1.6.0 performance

Post #54 by BobHegwood » 14.10.2009, 12:59

Chris?

Just FYI here...
I just went from 1.6.0 to 1.6.0 RC3, and noticed a marked increase in performance on my Vista box.
On 1.6.0, the opening sequence and (G)oto to any planet caused the planets to pop into view after a long wait.

Now, I can actually view the planets as they are coming towards me. :wink:
Don't know the specifics of the improvements, but they are there. :)

Again, just FYI...

Thanks very much, Brain-Dead
Brain-Dead Geezer Bob is now using...
Windows Vista Home Premium, 64-bit on a
Gateway Pentium Dual-Core CPU E5200, 2.5GHz
7 GB RAM, 500 GB hard disk, Nvidia GeForce 7100
Nvidia nForce 630i, 1680x1050 screen, Latest SVN

duds26
Posts: 328
Joined: 05.02.2007
Age: 34
With us: 17 years 9 months
Location: Europe

Re: Post-1.6.0 performance

Post #55 by duds26 » 14.10.2009, 19:33

These SSE-only things are worrying, because Celestia, at its core, should be as portable as possible (ARM comes to mind).

selden
Developer
Posts: 10192
Joined: 04.09.2002
With us: 22 years 2 months
Location: NY, USA

Re: Post-1.6.0 performance

Post #56 by selden » 14.10.2009, 20:04

Celestia's source code doesn't specify SSE2. SSE2 code generation is enabled when Celestia is compiled using MS Visual C++ for Windows on x86 chips. It's just another way for the compiler to try to generate the fastest set of binary instructions for that type of CPU.

ARM chips have a very different instruction set and a different compiler would have to be used. Celestia compiled for the Intel x86 architecture can't run at all on computers which use the ARM architecture.
Selden

Topic author
chris

Re: Post-1.6.0 performance

Post #57 by chris » 16.10.2009, 09:11

Fenerit wrote: Just my two cents.
I built the latest SVN version just yesterday, and on my system (GeForce 6100, 256 MB shared memory, latest drivers) performance has really improved. Now I can go to galaxies and clusters at 160 fps, against 121 fps with the official release. I also get 78 fps with orbits at OrbitPathSamplePoints = 256, against 58 fps with the "old" version. However, I've noticed that performance drops when OrbitPathSamplePoints has its default value of 100, in both versions. In practice Celestia is slower at 100 than at 128, and slower at 200 than at 256 (the same is true of RingSystemSections). This is probably unnoticeable with high-end cards. In short, graphics cards don't seem to handle these values well.

The OrbitPathSamplePoints setting will be going away in the next version of Celestia. With the new orbit rendering code, Celestia automatically determines the number of points required to draw the orbit smoothly and accurately.
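
To illustrate the general idea of choosing the point count automatically instead of using a fixed OrbitPathSamplePoints value, here is a sketch of a different and simpler technique than the splined code discussed in this thread: recursive midpoint subdivision of a Kepler ellipse, where a segment is split whenever the true orbit point at its midpoint lies farther than a tolerance from the chord's midpoint. `sampleOrbit`, the tolerance in orbit units and the depth limit are all hypothetical; a renderer would more likely measure the error in screen pixels.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Point on a Kepler ellipse (focus at the origin) for eccentric anomaly E.
Pt ellipsePoint(double a, double e, double E)
{
    const double b = a * std::sqrt(1.0 - e * e);
    Pt p = { a * (std::cos(E) - e), b * std::sin(E) };
    return p;
}

// Split [E0, E1] while the true midpoint lies more than tol from the chord's
// midpoint (a cheap stand-in for the chord's deviation from the curve).
void subdivide(double a, double e, double E0, double E1, const Pt& p0, const Pt& p1,
               double tol, int depth, std::vector<Pt>& out)
{
    const double Em = 0.5 * (E0 + E1);
    const Pt pm = ellipsePoint(a, e, Em);
    const double dx = pm.x - 0.5 * (p0.x + p1.x);
    const double dy = pm.y - 0.5 * (p0.y + p1.y);
    if (depth > 0 && std::sqrt(dx * dx + dy * dy) > tol) {
        subdivide(a, e, E0, Em, p0, pm, tol, depth - 1, out);
        subdivide(a, e, Em, E1, pm, p1, tol, depth - 1, out);
    } else {
        out.push_back(p1);  // segment is flat enough: keep its end point
    }
}

// Hypothetical adaptive sampler: returns a closed polyline whose point count
// depends on the orbit's shape and the tolerance, not on a fixed setting.
std::vector<Pt> sampleOrbit(double a, double e, double tol)
{
    const double twoPi = 6.283185307179586;
    const int coarse = 4;  // start from quarters so the midpoint test can't be fooled
    std::vector<Pt> pts;
    Pt prev = ellipsePoint(a, e, 0.0);
    pts.push_back(prev);
    for (int i = 0; i < coarse; ++i) {
        const double E0 = twoPi * i / coarse;
        const double E1 = twoPi * (i + 1) / coarse;
        const Pt next = ellipsePoint(a, e, E1);
        subdivide(a, e, E0, E1, prev, next, tol, 20, pts);
        prev = next;
    }
    return pts;
}
```

A tighter tolerance yields more points, and points concentrate where the orbit curves most sharply, which a fixed sample count can't do.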

--Chris

t00fri
Developer
Posts: 8772
Joined: 29.03.2002
Age: 22
With us: 22 years 7 months
Location: Hamburg, Germany

Re: Post-1.6.0 performance

Post #58 by t00fri » 16.10.2009, 14:22

chris wrote:...
With the new orbit rendering code, Celestia automatically determines the number of points required to draw the orbit smoothly and accurately.

--Chris

Have all the nasty inaccuracies I spotted in your new splined orbit code now gone away? At least toward the end of our discussion, they were still there in a somewhat modified form (despite your modifications).

In my view, it was far too early to speak of controlled accuracy in any case.

Fridger

duds26

Re: Post-1.6.0 performance

Post #59 by duds26 » 16.10.2009, 19:49

selden wrote: Celestia's source code doesn't specify SSE2. SSE2 code generation is enabled when Celestia is compiled using MS Visual C++ for Windows on x86 chips. It's just another way for the compiler to try to generate the fastest set of binary instructions for that type of CPU.

ARM chips have a very different instruction set and a different compiler would have to be used. Celestia compiled for the Intel x86 architecture can't run at all on computers which use the ARM architecture.

That's my point: can it be compiled for another architecture?
So you're saying that SSE code is only generated when compiling for x86, and is completely optional?

selden

Re: Post-1.6.0 performance

Post #60 by selden » 16.10.2009, 20:00

duds26 wrote:
selden wrote: Celestia's source code doesn't specify SSE2. SSE2 code generation is enabled when Celestia is compiled using MS Visual C++ for Windows on x86 chips. It's just another way for the compiler to try to generate the fastest set of binary instructions for that type of CPU.

ARM chips have a very different instruction set and a different compiler would have to be used. Celestia compiled for the Intel x86 architecture can't run at all on computers which use the ARM architecture.

That's my point: can it be compiled for another architecture?
Yes, it can. Celestia is written in C++.
In principle, it can be compiled on any computer which has a C++ compiler.
Some of Celestia's calls to runtime library entry points (e.g. for file I/O or OpenGL) might have to be modified depending on how the operating system or graphics vendor implemented them, however.
So you're saying that SSE code is only generated when compiling for x86, and is completely optional?
That is correct. The C++ compiler decides what kind of binary instructions get generated. The GNU C++ compiler is designed to be able to emit instructions for many different hardware architectures: Intel x86, VAX, Alpha, MIPS, PowerPC and others. I don't know if it supports ARM yet.
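
To illustrate that SIMD use is decided at compile time, here is a small sketch that reports which instruction set the compiler is targeting, using only the compilers' own predefined macros. `simdTarget` is hypothetical. `EIGEN_DONT_VECTORIZE` is the macro Eigen provides for switching its explicit vectorization off; the sketch only checks whether it was defined on the command line. Per bjacob's post, Eigen 2.0 vectorizes with SSE and AltiVec; on other targets, such as ARM, the same source simply compiles to scalar code.

```cpp
#include <cassert>

// Hypothetical helper: reports the SIMD instruction set the compiler was told
// to target. Only predefined compiler macros are checked; ordinary C++ source
// does not require SSE.
const char* simdTarget()
{
#if defined(EIGEN_DONT_VECTORIZE)
    return "none (Eigen vectorization disabled on the command line)";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";     // x86 / x86-64 with SSE2 enabled (GCC/Clang or MSVC)
#elif defined(__ALTIVEC__)
    return "AltiVec";  // PowerPC
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "NEON";     // ARM; Eigen 2.0 would still use scalar code here
#else
    return "none (plain scalar code)";
#endif
}
```
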
Selden

