Hi!
The AeroQuad version of the Quake III InvSqrt that MultiWii is using (
https://code.google.com/p/multiwii/sour ... MU.cpp#153)
float InvSqrt (float x){
    union{
        int32_t i;
        float f;
    } conv;
    conv.f = x;
    conv.i = 0x5f3759df - (conv.i >> 1);
    return 0.5f * (conv.f * (3.0f - x * conv.f * conv.f));
}
can be improved in accuracy with no extra CPU cost according to this:
http://rrrola.wz.cz/inv_sqrt.html
So the proposed, more accurate function with different "magic numbers" would be:
float InvSqrt(float x){
    union
    {
        float f;
        uint32_t u;
    } y = {x};
    y.u = 0x5F1FFFF9 - (y.u >> 1);
    return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f);
}
The link states the following maximum errors:
Quake III method: 0.00175233867
Alternative method: 0.000650196699
Maybe worth checking out, since roughly 2.7 times better accuracy for free isn't bad.
Cheers
Rob
EDIT:
Did some test runs with 199 values (damn, forgot one ...); the absolute error is more than 2.6 times smaller.
Concerning speed, I found that the union-free version proposed here
http://pizer.wordpress.com/2008/10/12/f ... uare-root/ is 2.6% faster (tested on an STM CPU).
Putting both sources together, I end up with this code:
float InvSqrt(float x)
{
    uint32_t i = 0x5F1FFFF9 - (*(uint32_t*)&x >> 1);
    float tmp = *(float*)&i;
    return tmp * (1.68191409f - 0.703952253f * x * tmp * tmp);
}
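One caveat on the pointer casts: reading a float through a uint32_t pointer breaks C's strict aliasing rules, so a compiler is allowed to miscompile it at higher optimization levels. A possible variant, as a sketch only, copies the bits with memcpy instead. The name InvSqrtMemcpy is made up for this example, and this variant has not been benchmarked on the STM F3, so whether it keeps the speed advantage is an open question.

```c
#include <stdint.h>
#include <string.h>

/* Same magic numbers as the union-free listing; only the bit
   reinterpretation is done with memcpy instead of pointer casts. */
float InvSqrtMemcpy(float x)
{
    uint32_t i;
    float tmp;

    memcpy(&i, &x, sizeof i);       /* read the bit pattern of x */
    i = 0x5F1FFFF9 - (i >> 1);      /* initial guess from the magic number */
    memcpy(&tmp, &i, sizeof tmp);   /* reinterpret the bits as a float */

    /* one Newton-style step with the tuned constants */
    return tmp * (1.68191409f - 0.703952253f * x * tmp * tmp);
}
```

The constants are unchanged because 1.68191409 is just 0.703952253 * 2.38924456 from the union version, multiplied out.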
Time chart on STM F3 (test with 1,000,000 iterations; note: GCC compiler, not Keil):
No optimization = 100%
"Union" version = 54.1%
"No union" version = 51.5% (so that's a little closer to "double" speed)
Average error (rounded; absolute error over the 199 integer values 1 to 199):
Old: +- 0.00013
New: +- 0.00005
It looks like my error calculation is off by a factor of 10, but the proportion is still correct: the new magic numbers improve accuracy by a factor of 2.6.
I will not look into the "off by a factor of 10" issue because it's just test code to verify the increased accuracy, and I consider that *done* - *myth busted* - whatever.
Note: I've read that sqrtf alone takes 44us (
https://github.com/diydrones/ardupilot/ ... M.cpp#L183) on an 8-bit Arduino. What I measured on my platform with the float division 1.0f/sqrtf(x), a float addition and the "for" loop was 11.6us total (averaged over 1 million runs). So it can be assumed that sqrtf alone on the F3 takes less than 10us, so the 1/sqrtf optimization doesn't seem worthwhile there, unless you do plenty of them...
Note: I use the open-source GCC compiler.
Note: Don't feed InvSqrt with zero. The unoptimized version will of course produce a division-by-zero exception; the "hacked" version will produce a bogus value. I don't know if any of the mwii devs will read this, but insert a check in your IMU part. It sounds unlikely, but it will happen and result in a short spike of crazy attitude; it will not ground the copter, but it is not nice.
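A minimal sketch of what such a check could look like. InvSqrtChecked and INVSQRT_MIN_INPUT are hypothetical names, and the threshold value is an arbitrary assumption, not something taken from the MultiWii source. What the caller does on failure (for example, keeping the previous attitude estimate) is left open.

```c
#include <stdint.h>

/* Hypothetical threshold: inputs at or below this are rejected. */
#define INVSQRT_MIN_INPUT 1e-12f

/* Union version with the alternative magic numbers, as listed earlier. */
float InvSqrt(float x)
{
    union { float f; uint32_t u; } y = { x };
    y.u = 0x5F1FFFF9 - (y.u >> 1);
    return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f);
}

/* Returns 1 and stores the result in *out if x is usable.
   Returns 0 and leaves *out untouched otherwise. */
int InvSqrtChecked(float x, float *out)
{
    /* Written as !(x > limit) so that NaN and negative inputs
       are rejected together with zero. */
    if (!(x > INVSQRT_MIN_INPUT))
        return 0;
    *out = InvSqrt(x);
    return 1;
}
```

The caller would then skip the normalization step whenever InvSqrtChecked returns 0, instead of multiplying by a bogus value.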
So from my perspective the mwii project could have a faster and more accurate algo - just my 2cents.
Cheers Rob