gle.ovo.

Fast Inverse Square Root

The invSqrt method is defined as the inverse of the square root of a value :

function invSqrt( v : Float ) {
    return 1.0 / Math.sqrt(v);
}

It’s used in a lot of different graphics and sound applications. For instance, if you want to calculate the unit-vector between two points, you can do the following :

function distanceVector( pa : Point, pb : Point, v : Point ) {
    var dx = pb.x - pa.x;
    var dy = pb.y - pa.y;
    var dist = Math.sqrt(dx*dx + dy*dy);
    v.x = dx / dist;
    v.y = dy / dist;
}

In that case, we are doing two float divides, which are expensive operations, so we can optimize the function by using two multiply and a single division :

function distanceVector( pa : Point, pb : Point, v : Point ) {
    var dx = pb.x - pa.x;
    var dy = pb.y - pa.y;
    var idist = invSqrt(dx*dx + dy*dy);
    v.x = dx * idist;
    v.y = dy * idist;
}

Of course, calculating the inverse square root is quite an expensive operation as well. See for instance the following benchmarks results for variable Number operations on Flash Player 10 :

+        1.3
*        1.5
/        10.9
sqrt     76.7

Quake Fast Inverse Square Root

A lot of research has been done to optimize the invSqrt calculus. One of the most famous was discovered in Quake source code and is commonly refered as “fast inverse square root”. It’s written in C language as the following function :

float InvSqrt( float x ) {
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

I’ll not enter into technical details on how and why this is working, but you will notice two operations that are converting float-to-int and int-to-float using pointer calculus. This is used to get the internal representation of the float-bits as an integer, and vice-versa.

In Flash, you can do that by using a ByteArray :

  • write a Number by using the writeFloat() method
  • reset the position to 0
  • use readInt() to read the float bits

Sadly, calling these functions is very slow, so by using a ByteArray to implement invSqrt is actually slower than using 1.0 / Math.sqrt

haXe Optimizations

When you need to do a lot of Math calculus in a loop, haXe can really be helpful since it allow you to store the Math class into local variable outside of the loop :

var m = Math;
for( p in array )
     p.invDist = 1.0 / m.sqrt(p.x*p.x + p.y*p.y);

By doing this, you are caching the Math class into a register, which increases performances by about 60% already !

But that’s not all. Thanks to the flash.VMem API and Flash Player 10 Alchemy Opcodes (see previous post about it) we can implement the Quake invSqrt :

public function prepare() {
    var b = new flash.utils.ByteArray();
    b.length = 1024;
    flash.VMem.select(b);
}

public function invSqrt( x : Float ) : Float {
    var half = 0.5 * x;
    flash.VMem.setFloat(0,x);
    var i = flash.VMem.getI32(0);
    i = 0x5f3759df - (i>>1);
    flash.VMem.setI32(0,i);
    x = flash.VMem.getFloat(0);
    x = x * (1.5 - half*x*x);
    return x;
}

Here are the benchmarks results :

  • classic invSqrt : 92.21
  • optimized invSqrt (Math in a local variable) : 56.71 (+62%)
  • VMem invSqrt (using Quake algorithm) : 11.72 (+686% , or around 8 times faster !)

You’ll notice that by using this, the inverse-square-root takes around the same time as as Float division ! Again very good news for haxe3D, physaXe and all other haXe Flash applications that will use this trick !



Comments are closed.