dirkpitt wrote: For me, the Celestia Apple demo runs significantly slower than Hank's build. I chose Warp 8.
I believe there is a combination of several reasons for this:
1) I do not have dual processors, so threaded animation actually slows things down
2) I have a G4, not a G5
3) I am not running Tiger
4) The demo is in fact deliberately slowed down in code when "Time Demo" is checked
5) The Apple optimizations do not actually cause much speedup
Yeah, there's some funny stuff happening in Apple's code. They've purposely made it crunch harder, I think to make it easier and more obvious to benchmark(?). The proof is in the pudding, so to speak: I think we'll only really know once the optimizations are applied to a clean tree.
I'm starting clean with a fresh 1.3.2 code base. I'm using Apple's macosx directory, which has hooks for the frameworks and libs, because it's the only one I have available (it shouldn't matter much, I hope).
BTW, do a search for 'warp' and you can see what gets enabled for each 'warp' setting:
Code:
#define WARP1 gLimitDrawing=0; gThread=0; gApproxCos=0; gUnrollCos=0; gVForce=0; gAltivec=0; gSchedule=0; gG5=0
#define WARP2 gLimitDrawing=1; gThread=0; gApproxCos=0; gUnrollCos=0; gVForce=0; gAltivec=0; gSchedule=0; gG5=0
#define WARP3 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=0; gVForce=0; gAltivec=0; gSchedule=0; gG5=0
#define WARP4 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=0; gVForce=1; gAltivec=0; gSchedule=0; gG5=0
#define WARP5 gLimitDrawing=1; gThread=1; gApproxCos=1; gUnrollCos=0; gVForce=0; gAltivec=0; gSchedule=0; gG5=0
#define WARP6 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=1; gVForce=0; gAltivec=0; gSchedule=0; gG5=0
#define WARP7 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=1; gVForce=0; gAltivec=1; gSchedule=0; gG5=0
#define WARP8 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=1; gVForce=0; gAltivec=1; gSchedule=1; gG5=0
#define WARP9 gLimitDrawing=1; gThread=1; gApproxCos=0; gUnrollCos=1; gVForce=0; gAltivec=1; gSchedule=1; gG5=1
These variables are used to switch between different functions (Apple's vs. stock).
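As a rough illustration of that switching, here is a minimal sketch of how two of the flags could select between a stock and an alternate cosine. Only gApproxCos and gUnrollCos come from the macros above; stockCos, approxCos, warpCos and setWarp are hypothetical names, and the Taylor polynomial is a stand-in, not Apple's actual approximation.

```cpp
#include <cassert>
#include <cmath>

// Flags named after two of the globals set by the WARP macros.
static int gApproxCos = 0;
static int gUnrollCos = 0;

// Stock path: the standard library cosine.
static float stockCos(float x) { return std::cos(x); }

// Hypothetical optimized path: 4th-order Taylor polynomial,
// only reasonable for small |x|. Not Apple's approximation.
static float approxCos(float x) {
    float x2 = x * x;
    return 1.0f - x2 / 2.0f + (x2 * x2) / 24.0f;
}

// Single entry point; the flag picks the implementation at run time.
static float warpCos(float x) {
    return gApproxCos ? approxCos(x) : stockCos(x);
}

// Hypothetical function form of the WARPn macros, covering only
// the two flags used here (values follow the macro table above).
static void setWarp(int level) {
    assert(level >= 1 && level <= 9);
    gApproxCos = (level == 5);
    gUnrollCos = (level >= 6);
}
```

One thing to watch with the macros as quoted: each expands to several statements, so using one as the body of an unbraced if would only guard the first assignment. A function like setWarp avoids that.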
Options are disabled depending on the environment:
Code:
if(item==[optimizationButton itemWithTitle:@"Warp 3 - Thread"]) {
    return (HasMultipleProcessors());
}
if(item==[optimizationButton itemWithTitle:@"Warp 4 - vForce Cos"]) {
    return (IsTigerOrBetter());
}
if(item==[optimizationButton itemWithTitle:@"Warp 7 - AltiVec"]) {
    return IsAltiVecAvailable();
}
if(item==[optimizationButton itemWithTitle:@"Warp 8 - Scheduling"]) {
    return IsAltiVecAvailable();
}
if(item==[optimizationButton itemWithTitle:@"Warp 9 - G5"]) {
    return IsAltiVecAvailable() && Is64Bit();
}
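The same gating can be written as a plain function, which makes it easier to see which Warp levels depend on which capability. This is only a sketch: warpLevelAvailable and onlineProcessorCount are hypothetical names, the body of Apple's HasMultipleProcessors() isn't shown here so the sysconf query is my guess at one way to do it, and I'm assuming levels not listed in the snippet are always enabled.

```cpp
#include <cassert>
#include <unistd.h>

// Number of processors currently online, or 1 if the query fails.
// _SC_NPROCESSORS_ONLN is a common extension (Mac OS X, Linux),
// not something taken from Apple's demo code.
static long onlineProcessorCount() {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

// Hypothetical stand-in for HasMultipleProcessors().
static bool hasMultipleProcessorsSketch() {
    return onlineProcessorCount() > 1;
}

// Mirrors the menu checks quoted above. The capability flags are
// passed in so the rule can be tested without real hardware checks.
static bool warpLevelAvailable(int level, bool multiCpu, bool tiger,
                               bool altivec, bool is64Bit) {
    assert(level >= 1 && level <= 9);
    switch (level) {
        case 3:  return multiCpu;            // Warp 3 - Thread
        case 4:  return tiger;               // Warp 4 - vForce Cos
        case 7:                              // Warp 7 - AltiVec
        case 8:  return altivec;             // Warp 8 - Scheduling
        case 9:  return altivec && is64Bit;  // Warp 9 - G5
        default: return true;                // assumed: not gated
    }
}
```

On a single-CPU G4 without Tiger, for example, this rule would leave Warp 7 and 8 selectable while graying out Warp 3, 4 and 9.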
There are also general optimizations, like a structure change in star.h (I'm assuming this came from Apple; I didn't see it in 1.3.2):
Code:
private:
    // SKP - use a union to overlay a vector over
    // the scalar position and magnitude variables
    union {
        float f[4];
        vector float v;
    } posMag;
    uint32 catalogNumbers[CatalogCount];
    // Point3f position;
    // float absMag;
I've made just this change against 1.3.2, and it increased the frame rate from 29 to 32 FPS when looking at the Earth. (You also have to change every place where position and absMag are used so that they go through the union instead, of course, or else it won't compile. I copied Apple's example.) I get about 27 FPS with Hank's build, which is close to stock.
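For anyone who wants to see the shape of that change without an AltiVec compiler, here is a portable sketch. Vec4f is a hypothetical stand-in for `vector float`, StarSketch is not the real Star class, and the lane order (x, y, z in f[0..2], absolute magnitude in f[3]) is my assumption about the layout rather than something confirmed from Apple's code.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for AltiVec's `vector float`: four floats, 16-byte aligned.
struct alignas(16) Vec4f { float lane[4]; };

class StarSketch {
public:
    // Accessors replace direct use of the old position/absMag members.
    void setPosition(float x, float y, float z) {
        posMag.f[0] = x; posMag.f[1] = y; posMag.f[2] = z;
    }
    void setAbsoluteMagnitude(float m) { posMag.f[3] = m; }

    float absoluteMagnitude() const { return posMag.f[3]; }

    // Position component by index: 0 = x, 1 = y, 2 = z.
    float positionComponent(int i) const {
        assert(i >= 0 && i < 3);
        return posMag.f[static_cast<std::size_t>(i)];
    }

private:
    // Position and magnitude share one 16-byte slot, so vector code
    // can load all four values with a single aligned access.
    union {
        float f[4];
        Vec4f v;
    } posMag;
};

// The overlay only pays off if the four floats stay packed and aligned.
static_assert(sizeof(StarSketch) == 16, "expected one 16-byte slot");
static_assert(alignof(StarSketch) == 16, "expected 16-byte alignment");
```

Keeping the old members behind accessors like these means the rest of the code base only has to change in one place if the layout is reworked again.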
I'll continue to move over some of the other optimizations to see what they do.
-Phil