# Fast Inverse Square Root

The `invSqrt`

method is defined as the inverse of the square root of a value :

function invSqrt( v : Float ) { return 1.0 / Math.sqrt(v); }

It’s used in a lot of different graphics and sound applications. For instance, if you want to calculate the unit-vector between two points, you can do the following :

function distanceVector( pa : Point, pb : Point, v : Point ) { var dx = pb.x - pa.x; var dy = pb.y - pa.y; var dist = Math.sqrt(dx*dx + dy*dy); v.x = dx / dist; v.y = dy / dist; }

In that case, we are doing two float divides, which are expensive operations, so we can optimize the function by using two *multiply* and a single division :

function distanceVector( pa : Point, pb : Point, v : Point ) { var dx = pb.x - pa.x; var dy = pb.y - pa.y; var idist = invSqrt(dx*dx + dy*dy); v.x = dx * idist; v.y = dy * idist; }

Of course, calculating the inverse square root is quite an expensive operation as well. See for instance the following benchmarks results for variable Number operations on Flash Player 10 :

+ 1.3 * 1.5 / 10.9 sqrt 76.7

# Quake Fast Inverse Square Root

A lot of research has been done to optimize the `invSqrt`

calculus. One of the most famous was discovered in Quake source code and is commonly refered as “fast inverse square root”. It’s written in C language as the following function :

float InvSqrt( float x ) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f3759df - (i>>1); x = *(float*)&i; x = x*(1.5f - xhalf*x*x); return x; }

I’ll not enter into technical details on how and why this is working, but you will notice two operations that are converting float-to-int and int-to-float using pointer calculus. This is used to get the internal representation of the float-bits as an integer, and vice-versa.

In Flash, you can do that by using a `ByteArray`

:

- write a Number by using the
`writeFloat()`

method - reset the
`position`

to 0 - use
`readInt()`

to read the float bits

Sadly, calling these functions is very slow, so by using a ByteArray to implement `invSqrt`

is actually slower than using `1.0 / Math.sqrt`

…

# haXe Optimizations

When you need to do a lot of `Math`

calculus in a loop, haXe can really be helpful since it allow you to store the `Math`

class into local variable outside of the loop :

var m = Math; for( p in array ) p.invDist = 1.0 / m.sqrt(p.x*p.x + p.y*p.y);

By doing this, you are *caching* the `Math`

class into a register, which increases performances by about 60% already !

But that’s not all. Thanks to the `flash.VMem`

API and Flash Player 10 Alchemy Opcodes (see previous post about it) we can implement the Quake `invSqrt`

:

public function prepare() { var b = new flash.utils.ByteArray(); b.length = 1024; flash.VMem.select(b); } public function invSqrt( x : Float ) : Float { var half = 0.5 * x; flash.VMem.setFloat(0,x); var i = flash.VMem.getI32(0); i = 0x5f3759df - (i>>1); flash.VMem.setI32(0,i); x = flash.VMem.getFloat(0); x = x * (1.5 - half*x*x); return x; }

Here are the benchmarks results :

- classic
`invSqrt`

: 92.21 - optimized
`invSqrt`

(Math in a local variable) : 56.71 (+62%) - VMem
`invSqrt`

(using Quake algorithm) : 11.72 (+686% , or around 8 times faster !)

You’ll notice that by using this, the inverse-square-root takes around the same time as as Float division ! Again very good news for haxe3D, physaXe and all other haXe Flash applications that will use this trick !