gle.ovo.

# Fast Inverse Square Root

The `invSqrt` method is defined as the inverse of the square root of a value :

```function invSqrt( v : Float ) {
return 1.0 / Math.sqrt(v);
}```

It’s used in a lot of different graphics and sound applications. For instance, if you want to calculate the unit-vector between two points, you can do the following :

```function distanceVector( pa : Point, pb : Point, v : Point ) {
var dx = pb.x - pa.x;
var dy = pb.y - pa.y;
var dist = Math.sqrt(dx*dx + dy*dy);
v.x = dx / dist;
v.y = dy / dist;
}```

In that case, we are doing two float divides, which are expensive operations, so we can optimize the function by using two multiply and a single division :

```function distanceVector( pa : Point, pb : Point, v : Point ) {
var dx = pb.x - pa.x;
var dy = pb.y - pa.y;
var idist = invSqrt(dx*dx + dy*dy);
v.x = dx * idist;
v.y = dy * idist;
}```

Of course, calculating the inverse square root is quite an expensive operation as well. See for instance the following benchmarks results for variable Number operations on Flash Player 10 :

```+        1.3
*        1.5
/        10.9
sqrt     76.7```

# Quake Fast Inverse Square Root

A lot of research has been done to optimize the `invSqrt` calculus. One of the most famous was discovered in Quake source code and is commonly refered as “fast inverse square root”. It’s written in C language as the following function :

```float InvSqrt( float x ) {
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}```

I’ll not enter into technical details on how and why this is working, but you will notice two operations that are converting float-to-int and int-to-float using pointer calculus. This is used to get the internal representation of the float-bits as an integer, and vice-versa.

In Flash, you can do that by using a `ByteArray` :

• write a Number by using the `writeFloat()` method
• reset the `position` to 0
• use `readInt()` to read the float bits

Sadly, calling these functions is very slow, so by using a ByteArray to implement `invSqrt` is actually slower than using `1.0 / Math.sqrt`

# haXe Optimizations

When you need to do a lot of `Math` calculus in a loop, haXe can really be helpful since it allow you to store the `Math` class into local variable outside of the loop :

```var m = Math;
for( p in array )
p.invDist = 1.0 / m.sqrt(p.x*p.x + p.y*p.y);```

By doing this, you are caching the `Math` class into a register, which increases performances by about 60% already !

But that’s not all. Thanks to the `flash.VMem` API and Flash Player 10 Alchemy Opcodes (see previous post about it) we can implement the Quake `invSqrt` :

```public function prepare() {
var b = new flash.utils.ByteArray();
b.length = 1024;
flash.VMem.select(b);
}

public function invSqrt( x : Float ) : Float {
var half = 0.5 * x;
flash.VMem.setFloat(0,x);
var i = flash.VMem.getI32(0);
i = 0x5f3759df - (i>>1);
flash.VMem.setI32(0,i);
x = flash.VMem.getFloat(0);
x = x * (1.5 - half*x*x);
return x;
}```

Here are the benchmarks results :

• classic `invSqrt` : 92.21
• optimized `invSqrt` (Math in a local variable) : 56.71 (+62%)
• VMem `invSqrt` (using Quake algorithm) : 11.72 (+686% , or around 8 times faster !)

You’ll notice that by using this, the inverse-square-root takes around the same time as as Float division ! Again very good news for haxe3D, physaXe and all other haXe Flash applications that will use this trick !

﻿