For those new to the discussion, Eigen is a C++ template library for manipulating vectors and matrices. Through version 1.6.0, Celestia was using its own routines for vector and matrix operations. Moving from these custom classes to Eigen will have benefits in code readability and performance. Here's the main page for Eigen:
http://eigen.tuxfamily.org/index.php?title=Main_Page
Programmers will be most interested in the API showcase, which has a lot of code samples demonstrating the expressiveness of the API:
http://eigen.tuxfamily.org/index.php?title=API_Showcase
Here's an example from Celestia. This is some code from staroctree.cpp in 1.6.0:
float r = scale * (abs(plane->normal.x) +
abs(plane->normal.y) +
abs(plane->normal.z));
With Eigen, this expression can be written more simply:
float r = scale * plane.normal().cwise().abs().sum();
In the above code, 'cwise' means 'component-wise', i.e. apply the next operation to each component of the vector. It's one of many features that make it possible to write more concise and readable code with Eigen. When manipulating vectors, the less one has to refer explicitly to the vector's individual components, the better.
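For anyone without Eigen installed, here's a rough sketch in plain standard C++ of what that one-liner computes. Vec3, absSum and scaledAbsSum are made-up names for illustration only, not Eigen or Celestia types. (The cwise() syntax is the Eigen 2 API; in Eigen 3 the same reduction is spelled cwiseAbs().sum().)

```cpp
#include <array>
#include <cmath>
#include <numeric>

// Hypothetical stand-in for a 3-component float vector
// (not an Eigen or Celestia type).
using Vec3 = std::array<float, 3>;

// Sum of the absolute values of the components. This is the value
// that normal.cwise().abs().sum() reduces to in the one-liner above.
inline float absSum(const Vec3& v)
{
    return std::accumulate(v.begin(), v.end(), 0.0f,
                           [](float acc, float c) { return acc + std::abs(c); });
}

// Hypothetical helper mirroring the staroctree.cpp expression:
// scale * (|x| + |y| + |z|).
inline float scaledAbsSum(float scale, const Vec3& normal)
{
    return scale * absSum(normal);
}
```

The point of the Eigen version is that none of this looping or per-component access has to be written by hand.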
The performance benefits of the switch to Eigen will be the result of three things:
1. Vectorization
2. Reduced amount of copying to store temporary results
3. Elimination of a few Celestia vector class inefficiencies (e.g. lots of copying from Point -> Vector and vice versa)
Vectorization promises the biggest benefit. All CPUs these days have extended instruction sets for performing multiple mathematical operations in parallel: SSE2 on Intel, AltiVec on PowerPC. Both AltiVec and SSE2 operate on 128-bit packets of data: either 4 single-precision floats or 2 double-precision values. Simple code like this:
Vector4f v = v1 + v2;
can be handled with a single SSE2 instruction instead of four regular CPU add instructions. Up until now, Celestia wasn't benefiting from CPU vector instructions; that will finally change after the switch to Eigen.
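To make the "one instruction instead of four" idea concrete, here's a minimal hand-written sketch using compiler intrinsics. It is not what Eigen generates internally, just an illustration of the same kind of packed add. Float4 and add4 are hypothetical names; _mm_add_ps is the packed single-precision add (strictly speaking it comes from the original SSE set, which every SSE2-capable CPU also supports). The scalar fallback is only there so the sketch still builds on non-x86 targets.

```cpp
#include <array>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical stand-in for a 4-component float vector (not Eigen's Vector4f).
using Float4 = std::array<float, 4>;

// Adds two 4-float vectors. With SSE available, the four additions
// are performed by a single packed add instruction.
inline Float4 add4(const Float4& a, const Float4& b)
{
    Float4 out;
#ifdef SKETCH_HAS_SSE
    // Unaligned loads/stores are used so the sketch makes no
    // assumption about how the arrays are aligned in memory.
    __m128 va = _mm_loadu_ps(a.data());
    __m128 vb = _mm_loadu_ps(b.data());
    _mm_storeu_ps(out.data(), _mm_add_ps(va, vb));
#else
    // Scalar fallback: four separate adds.
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
    return out;
}
```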
It will take some cleverness to fully exploit vectorization. CPUs require data for vector instructions to be aligned to 16-byte boundaries. Eigen takes care of this for structures whose sizes are multiples of 16 bytes, such as Vector4f, Vector2d, and Vector4d. But Vector3f and Vector3d are 12 and 24 bytes in size, respectively, and will not be vectorized. This is rather unfortunate, as these are the types used to represent points and normals in 3D space. In some cases, it will be worthwhile to convert a Vector3f to a Vector4f in order to take advantage of vectorization. I haven't made any optimizations like this so far. Better, I think, to wait until the conversion to Eigen is complete before tuning performance-critical code sections for vectorization.
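The size and alignment arithmetic can be checked with a small sketch in plain C++11. Packed3f, Padded4f, padTo4 and isAligned16 are hypothetical names for illustration; they are not Eigen types, and this is not how Eigen implements its own alignment handling. The idea is only to show why three floats don't fill a 128-bit packet, and what padding to four floats looks like.

```cpp
#include <cstddef>
#include <cstdint>

// Three floats: 12 bytes, so it cannot fill a 16-byte packet.
struct Packed3f { float x, y, z; };

// Four floats, explicitly aligned to a 16-byte boundary.
struct alignas(16) Padded4f { float x, y, z, w; };

static_assert(sizeof(Packed3f) == 12, "three floats occupy 12 bytes");
static_assert(sizeof(Padded4f) == 16, "four floats occupy 16 bytes");
static_assert(alignof(Padded4f) == 16, "padded type is 16-byte aligned");

// Widen a 3-vector to 4 components. The caller chooses the fourth
// component (assumption: 0 for directions, 1 for points, following
// the usual homogeneous-coordinate convention).
inline Padded4f padTo4(const Packed3f& v, float w)
{
    return Padded4f{v.x, v.y, v.z, w};
}

// True if the address lies on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The cost of this approach is one wasted float per vector, which is the trade-off to weigh before converting any Vector3f-heavy code.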
At this point, I don't expect to see much benefit from enabling SSE2 with the current SVN revision of Celestia. Most code in Celestia operates on Vector3f and Vector3d objects, which don't get vectorized. I'm optimistic that we can change some of this code to be vectorizable. I've had good results already with Vector3 to Vector4 conversion in some new code to draw orbits and trajectories.
There's already been some talk of benchmarking the current SVN version of Celestia with and without SSE2 enabled (at the end of this topic: viewtopic.php?f=4&t=13998&start=0 ). The most performance-critical sections of code converted to Eigen so far are the star and DSO octree traversal, and the code to draw stars. For the most useful benchmark, it's best to aim the viewpoint away from any planets and observe a field with just distant stars and DSOs. In my own tests, I wait for the default start script to complete and then press * to point away from the Earth:
1.6.0: 284 fps peak
SVN, SSE2 enabled: 292 fps peak
So at this point, an insignificant performance increase on my system.
Edit: the above values are with galaxies disabled. When galaxies are enabled, I see these frame rates:
1.6.0: 166 fps peak
SVN, SSE2 enabled: 160 fps peak
...a performance [i]decrease[/i]. I don't know yet whether this is a result of enabling SSE2 or whether there's an inefficiency in some of the Eigen-ized code I've checked in.
--Chris