Hi,
Small angle approximation does a good job at speeding up the rotation of the gravity vector by the measured angular velocity, by assuming cos(small angle) = 1 and sin(small angle) = angle. It gets rid of all the cosinus terms in the rotation matrix and also takes care of the sinus terms. The result is a matrix that isn''t a rotation anymore (not orthogonal) and the errors build up. I think this is why the code uses a higher gyro scale factor and without correction by the accelerometer this would quickly fall apart. It's ok to do that ... nothing wrong with it ... but it's not accurate
Yes, but:
1) Let's imagine you rotating with angular velocity 2000deg/sec (which is not poossible even for wartox.

)
2000deg/sec = 34.91 rad/sec
rotation in 2ms would be 0.069 rad
http://en.wikipedia.org/wiki/Small-angle_approximation relative error is less than 0.1% which is much less than gyro dynamic drift anyway.

2) Yes matrix not orthoganal and errors build up but:
a) directional errors are much smaller than magnitude error (and we do not care about magnitude at this stage)
b) they much smaller than gyro dynamic drift again
3) why the code uses a higher gyro scale factor, this is another story.

That's correct. The code doing the quaternion calculations is adapted from other flight control software that use it like this. For small angles the rotation quaternion can be simplified as [1, roll/2, pitch/2, yaw/2]. Let's call it a similar approximation.
This is just conversion from Euler to quaternion, which supposed to be:

Applying small angle approximation to it, you introducing same errors as before. Additionally this transformation is also not orthoganal and you have to normalize quaternion on each step. Which adds additional numeric error. (remember, SQRT is also approximation).
So, numerically this is less precise than vector(s) rotation.
1) Integrating Euler angles (remember: accuracy) needs cos(), sin() and matrix multiplication. Using quaternions is faster when continously integrating rotation and normalisation quaternions is faster than normalizing a 3x3 matrix. However, MultiWii doesn't integrate Euler angles. The code just rotates the previously found gravity vector and then calculates roll/pitch/yaw from that and corrects gravity with the accelerometer measurement. The quaternion code you quoted does a similar thing and estimates the the gravity vector from the roll/pitch/yaw angle (in quaternion representation), but it then corrects the gyro input with the error between this estimate and the accelerometer measurement and actually executes an integration step.
The problem is that sensors reporting vectors and Euler angles (angular velocities). Overheads of conversions is much higher than benefit. (and don't skip the step.

to properly convert from euler to quaternion you need cos(), sin() as well).
The code just rotates the previously found gravity vector and then calculates roll/pitch/yaw from that and corrects gravity with the accelerometer measurement. The quaternion code you quoted does a similar thing and estimates the the gravity vector from the roll/pitch/yaw angle (in quaternion representation), but it then corrects the gyro input with the error between this estimate and the accelerometer measurement and actually executes an integration step.
And what a difference?

Apart from the first system is unconditionally stable. The second one supposed to be stable in quite wide range, but due round-off errors, discussed previously, it is quite challenging to make it stable.
BTW: PI observer pattern can be applied regardles what rotation representation is used.
Euler angles "gimbal lock" around 90°. Since MultiWii gets these angles from a gravity vector it should be lock-free, but they still behave strange in "extreme" situations.
My point was, that this "gimbal lock" is property of the Euler angles and switching to quaternion/DCM/whatever is not going to magically solve it.

The Eulers always have singularities around 90deg. It is possible to implement workaround for it but what is the point?

Correct way is to pass vector/quaternion/DCM/etc to gui and perform proper transformation there.
Again, nothing wrong as it speeds up the code. But the math.h atan2 is hopefully more accurate.
Without definition of the "accuracy" it is not.

Ok, but I am afraid the sensors wont play along when we want measurements every µs, even if the calculations would be fast enough for that frequency
Not sure that it means.

But sensors have defined bandwidth, this usually defines minimum sampling rate (to avoid aliasing). Sampling faster than that sometimes beneficial and called "oversampling" (reduces noise and increases resolution). Some FC projects using really high oversampling factors, like x64-x128
regards,
ziss_dm