Skip to content

Branchless programming

Branchless programming

Overview

branching “smaller” function:

int smaller(int a, int b)
{
  if (a < b)
    return a;
  else
    return b;
}

branchless version using arithmetic with conditional operators (actually slower, see below):

int smaller_branchless(int a, int b)
{
  return a * (a < b) + b * (b < a);
}

A lesson on premature optimization

assembly generated by the compiler for branching “smaller” function:

mov ebx,dword ptr [b]
cmp dword ptr[a],ebx
cmovl ebx,dword ptr [a]

AKA:

  1. move pointer b into register ebx
  2. compare the value in ebx with a
  3. conditionally move a into ebx if it’s less

In other words, the compiler recognized what we were trying to do and actually wrote a conditional into the assembly code.

If we were to translate our branchless example into straight assembly, it would look something like this:

mov ecx,dword ptr [a]
xor r8d,r8d
mov eas,dword ptr [b]
mov ebx,r8d
cmp eax,ecx
setle bl
imul ebx,eax
cmp ecx,eax
setl r8b
imul r8d,ecx
add ebx,r8d

An example of performant branchless code

branching toUpper implementation:

void toUpper(char* d, int count)
{
  for (int i = 0; i < count; i++)
    if (d[i] >= 'a' && d[i] <= 'z')
      d[i] -= 32;
}

branchless implementation:

void toUpper_branchless(char* d, int count)
{
  for (int i = 0; i < count; i++)
    d[i] = (d[i] * !(d[i] >= 'a' && d[i] <= 'z')) + (d[i] - 32) * (d[i] >= 'a' && d[i] <= 'z');
}

marginally cleaner branchless implementation:

void toUpper_branchless(char* d, int count)
{
  for (int i = 0; i < count; i++)
    d[i] -= 32 * (d[i] >= 'a' && d[i] <= 'z');
}

Review