bjacob wrote: Hi,
I'm one of the developers of Eigen. First of all let me say that Celestia is a beautiful project and we're very proud that you chose Eigen. I saw that you were discussing performance issues related to Eigen.
Welcome to the Celestia forums, Benoit!
Here are a few tips.
1) GCC versions: as you noted, and as we say on Eigen's website, GCC 4.2 gives far, far better performance than previous versions (in addition to allowing vectorization). Stay away from GCC 4.1 or older, if you care about performance with Eigen (that said, we do support compilation all the way down to GCC 3). Also note that GCC 4.4 tends to give even 10% better performance than GCC 4.2. On the other hand, GCC 4.3 is slightly below 4.2. So use GCC 4.4 if possible, otherwise use GCC 4.2.
I haven't tried gcc 4.4 yet. Any idea how Eigen does with gcc+LLVM on Mac OS X?
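For anyone who wants a build to flag the older compilers mentioned above, here is a minimal sketch of a compile-time check. It relies only on GCC's predefined `__GNUC__` and `__GNUC_MINOR__` macros; the helper names `gccVersion` and `compilerIsOldGcc` are made up for illustration and are not part of Eigen or Celestia.

```cpp
#include <cassert>

// Hypothetical helper: pack a GCC major/minor version into one comparable integer.
constexpr int gccVersion(int major, int minor) {
    return major * 100 + minor;
}

// Clang also defines __GNUC__, so it is excluded here; 0 means "not GCC".
#if defined(__GNUC__) && !defined(__clang__)
constexpr int kCompilerVersion = gccVersion(__GNUC__, __GNUC_MINOR__);
#else
constexpr int kCompilerVersion = 0;
#endif

// True only when building with a GCC older than 4.2, the range the post
// above recommends avoiding for performance with Eigen.
constexpr bool compilerIsOldGcc() {
    return kCompilerVersion != 0 && kCompilerVersion < gccVersion(4, 2);
}
```

A build could print a warning when `compilerIsOldGcc()` returns true; whether to warn or fail outright is a project decision.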
2) Vectorization. Note that due to SSE and AltiVec working on aligned 128 bit packets, operations on 3D vectors and 3x3 matrices typically can't be vectorized. That can easily explain why you don't get a large boost from SSE2. If you use quaternions, well in the development branch of Eigen they are vectorized, but not in the 2.0 branch, so expect a performance boost the day you upgrade to the next Eigen version (not yet released, far from it). For example, quaternion multiplication is 2x faster with SSE. The next Eigen version isn't compatible with Eigen 2.0, but if you use only tiny vectors/matrices and quaternions, it might be that it is compatible on that subset, or with just a few #ifdefs. But I don't recommend upgrading yet as the API isn't finalized.
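The packet-size argument can be illustrated with a little arithmetic. The sketch below is a simplification and not Eigen's actual internal rule: it only asks whether a fixed number of scalars fills a whole number of 128-bit (16-byte) packets. The names `kPacketBytes` and `fillsWholePackets` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// SSE and AltiVec registers hold 128 bits, i.e. 16 bytes.
constexpr std::size_t kPacketBytes = 16;

// Hypothetical helper: true when `count` scalars of type Scalar occupy
// a whole number of 16-byte packets.
template <typename Scalar>
constexpr bool fillsWholePackets(std::size_t count) {
    return (count * sizeof(Scalar)) % kPacketBytes == 0;
}
```

With 4-byte floats and 8-byte doubles, a 3-vector (12 or 24 bytes) and a 3x3 matrix (36 or 72 bytes) leave a partial packet, while a 4-vector, a 4x4 matrix and a 4-component quaternion do not.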
Mostly, Celestia uses Vector3/Matrix3 and quaternions. I did notice that the development version of Eigen has a vectorized quaternion multiply--I was hoping that would end up in 2.0, but it sounds like you're going to hold off until the next version. I was thinking about integrating the code into Eigen myself (we've got a copy checked into Celestia's SVN tree), but that would make it painful to keep up with the 2.0.x updates.
There are two places where I used Vector4/Matrix4 explicitly to enable vectorization:
- The new orbit code
- The code to render galaxy sprites
I'm aware that this will cost some performance when vectorization isn't available, but I've observed a significant improvement in the orbit code when using vectorized Matrix4d versus unvectorized Matrix3d. The galaxy code uses single precision Matrix4, so there should be an even bigger performance improvement for vectorized code. So far, that hasn't proven to be the case, which makes me suspect that the bottleneck is elsewhere in the galaxy code. I need to dig into the code with a profiler to figure out what's going on.
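The idea of widening 3D data to 4D can be sketched without Eigen. This is not Celestia's code, which uses Eigen's own Vector4/Matrix4 types; `Vec4f`, `padDirection` and `add` are hypothetical names, and the sketch only shows the layout trade-off: one extra float per vector in exchange for a size and alignment that match a 16-byte packet.

```cpp
#include <cassert>

// Hypothetical 4-component vector, aligned to a 16-byte packet boundary.
struct alignas(16) Vec4f {
    float x, y, z, w;
};

// The padded type is exactly one 128-bit packet wide.
static_assert(sizeof(Vec4f) == 16, "Vec4f should be one 16-byte packet");
static_assert(alignof(Vec4f) == 16, "Vec4f should be 16-byte aligned");

// Hypothetical helper: store a 3D direction in a 4D vector with w = 0.
inline Vec4f padDirection(float x, float y, float z) {
    return Vec4f{x, y, z, 0.0f};
}

// Component-wise add. An optimizing compiler may emit a single packed
// add for this, but that is not guaranteed.
inline Vec4f add(const Vec4f& a, const Vec4f& b) {
    return Vec4f{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
```

The fourth component is wasted work when no SIMD unit is available, which is the cost mentioned above for builds without vectorization.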
3) While enabling SSE doesn't always give a benefit, it also should never give a significant performance regression, so if you hit such a situation, please report to us.
Some people have reported a slight decrease in performance when SSE2 is enabled on MSVC++ 2008. But I'm not sure if this is related to Eigen or if it's the compiler performing ineffective autovectorization of non-Eigen code.
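One way to narrow down a report like this is to time the same small workload in two builds, one with SSE2 enabled and one without. The following is only a rough sketch using the standard `std::chrono::steady_clock`; `secondsFor` and `transformSum` are hypothetical names, and the workload is a plain-loop stand-in rather than Eigen or Celestia code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run `work` `iterations` times and return elapsed seconds.
template <typename Work>
double secondsFor(Work work, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Example workload: multiply a 4x4 float matrix by a 4-vector using plain
// loops and return the sum of the resulting components.
inline float transformSum(const float (&m)[4][4], const float (&v)[4]) {
    float sum = 0.0f;
    for (int r = 0; r < 4; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < 4; ++c) {
            acc += m[r][c] * v[c];
        }
        sum += acc;
    }
    return sum;
}
```

Writing the result to a `volatile` variable inside the timed callable helps keep the optimizer from discarding the work. A timer like this only shows that two builds differ; a profiler is still needed to see where the time goes.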
--Chris