Impossible Algorithm: Computing income tax in constant time

Dan
ITNEXT
Published in
11 min readMay 17, 2023

--

Machinery that categorizes huge piles of money — Bing Image Creator

I had the strangest encounter while interviewing for a senior developer role at Oracle in 2016. I was tasked with designing an algorithm to compute income taxes. After coming up with the typical answer, I was told that I achieved the optimal solution, but then I came up with an algorithm that was deemed to be mathematically impossible.

Luckily, the interviewer entertained the idea, and many surprises followed. I eventually moved on from Oracle two years later, but it turns out that this is a common interview question at many companies, and they’re all testing for the false “optimal” solution.

The solution turned into a never-ending source of amusement over the years as many developers continued to bet that a constant-time algorithm with reasonable memory consumption couldn’t exist as it implies being able to search a sorted list of ranges in constant time. Of all the developers that tried, I haven’t met one that was able to find an efficient constant-time solution, so I decided to share this with the world.

Defining the problem

Most regions have an incremental income tax system where the tax rate starts low and increases for each subsequent chunk of income.

For example, here are the effective tax rates in the state of Texas for 2022:

10% on income from $0 to $10,275
12% on income from $10,275 to $41,775
22% on income from $41,775 to $89,075
24% on income from $89,075 to $170,050
32% on income from $170,050 to $215,950
35% on income from $215,950 to $539,900
37% on income over $539,900

Given the above tax brackets, if you lived in Texas in 2022 and had a taxable income of $50,000, then your income tax would be calculated as follows:

  $1,027.50 (10%) for the first $10,275
+ $3,780.00 (12%) for the $31,500 chunk between $10,275 and $41,775
+ $1,809.50 (22%) for the last $8,225 chunk between $41,775 and $50,000
-----------
= $6,617.00

Your task is to create a function that calculates an individual’s income tax, given their taxable income.

A large number of tax brackets will be provided in the structure of your choice. The tax bracket data contains income ranges along with the tax percentage for each range, like the Texas example above.

Your solution will be used many times to compute the income tax for each person in a large population. If your function requires anything else, provide a one-time setup function to prepare your custom data structure.

Approach

An optimal solution usually relies on many insights that build on each other. This makes it very difficult to design or even understand an optimal algorithm without gaining the necessary insights along the way.

I’ll include the process and decisions iterating towards more sophisticated solutions. Understanding that process will make you a better problem-solver and unlock new ways of designing solutions. I recommend pausing before each of the following sections to try solving the rest on your own.

Modeling

Modeling is one of the main aspects that’s used to gauge the level of a candidate during interviews. A good model reduces defect rates while also making it easier to see and iterate on improvements.

We’ll use Kotlin to define zero-cost abstractions for the money and percent representations. This centralizes computations so that if the representation changes, we wouldn’t need to find and fix all usages.

Money representation:

To avoid errors arising from floating point arithmetic, we’ll store amounts as pennies in 64-bit integers:

@JvmInline
value class Money(val cents: Long) {
operator fun plus(amount: Money) = Money(cents + amount.cents)

operator fun minus(amount: Money) = Money(cents - amount.cents)

operator fun times(amount: Long) = Money(cents * amount)

operator fun div(amount: Long) = Money(cents / amount)

operator fun compareTo(other: Money) = cents.compareTo(other.cents)
}

Although the above computations appear to create new Money instances, the JvmInline annotation instructs the Kotlin compiler to eliminate the wrapper at compile time. Variables of type Money will end up storing and operating on the underlying primitive type, making it very efficient.

Although not included here, I would add checks to throw an exception when encountering overflow in the above calculations to reduce the chance of defects in production. The overflow checks would also force us to re-evaluate the representation if we ever encounter extreme hyperinflation.

Percent representation:

Percentages will also use integer types to avoid floating-point arithmetic and use basis points, so 37.25% will be stored as 3725:

@JvmInline
value class Percent(val amountBps: Long) {
operator fun times(money: Money) = money * amountBps / 10_000
}

Tax bracket representation:

data class TaxBracket(
val lowerBound: Money,
val upperBound: Money?, // null when bracket has no upper bound
val taxRate: Percent,
) {
init {
require(upperBound == null || upperBound > lowerBound)
}

/** Computes the tax for the income that falls within this bracket */
fun computeBracketTax(income: Money): Money {
require(income >= lowerBound)

val highestIncomeInBracket = when {
upperBound == null -> income
income <= upperBound -> income
else -> upperBound
}
return taxRate * (highestIncomeInBracket - lowerBound)
}
}

For this challenge, we’ll assume that the tax brackets are well formed without issues like duplicates, overlapping ranges, etc.

We’ll choose the data structure that will be provided to be a list of TaxBracket instances ordered by their lower bound in ascending order.

Linear time solution

The obvious solution would be to loop through all the relevant tax brackets, summing the tax associated with each bracket:

fun computeTax(income: Money, taxBrackets: List<TaxBracket>): Money {
var accumulatedTax = Money(cents = 0)
for (taxBracket in taxBrackets) {
if (income <= taxBracket.lowerBound) break

accumulatedTax += taxBracket.computeBracketTax(income)
}
return accumulatedTax
}

Note how clean, trivial, and obvious it is to reason about the above code due to the great modeling from the previous step.

Performance

Given
N = # of tax brackets

Space Complexity
O(1) to keep track of the accumulated tax

Time complexity of computeTax function
O(N) to iterate through all the brackets in the worst case

“Optimal” Log(N) solution

From the linear solution, note that the accumulated tax up to a certain bracket will always be identical. Instead of re-accumulating that for every tax computation, we could have each bracket store the accumulated tax from all earlier brackets. This will allow us to compute the total tax once we find the appropriate bracket:

data class AccumulatedTaxBracket(
private val regularBracket: TaxBracket,
private val accumulatedTax: Money,
) {
val lowerBound: Money
get() = regularBracket.lowerBound

val upperBound: Money?
get() = regularBracket.upperBound

fun computeTotalTax(income: Money): Money {
require(income >= lowerBound)
require(upperBound == null || income < upperBound!!)

return accumulatedTax + regularBracket.computeBracketTax(income)
}
}

The AccumulatedTaxBracket can be populated from the regular TaxBracket:

fun setup(taxBrackets: List<TaxBracket>): List<AccumulatedTaxBracket> {
var accumulatedTax = Money(cents = 0)

return taxBrackets.map { bracket ->
val accumulatedTaxUpToStartOfBracket = accumulatedTax
if (bracket.upperBound != null) {
accumulatedTax += bracket.computeBracketTax(bracket.upperBound)
}
AccumulatedTaxBracket(bracket, accumulatedTaxUpToStartOfBracket)
}
}

To compute income tax, we can use binary search to find the appropriate bracket in Log(N) time and then compute the tax in constant time:

fun computeTax(
income: Money,
taxBrackets: List<AccumulatedTaxBracket>
): Money {
val bracketIndex = taxBrackets.binarySearch { bracket ->
when {
bracket.lowerBound > income -> 1
bracket.upperBound != null && bracket.upperBound!! <= income -> -1
else -> 0
}
}
return taxBrackets[bracketIndex].computeTotalTax(income)
}

Performance

Given
N = # of tax brackets

Space Complexity
O(N) to keep track of accumulated tax amount for each bracket

Time complexity of computeTax function
O(Log(N)) to binary search the brackets and the rest is constant time

This is the widely accepted “optimal” solution, but we can do much better.

Constant-time iteration 1

We’ll start with a very inefficient idea as a starting point and iterate on it. The previous solution isn’t constant time due to the effort of finding the correct bracket. We can eliminate this effort by pre-computing the tax for every income up to the highest reported income:

fun setup(
maxIncome: Money,
brackets: List<AccumulatedTaxBracket>,
): Map<Money, Money> {
return generateSequence(Money(cents = 0)) { it + Money(cents = 1)}
.takeWhile { it <= maxIncome}
.associateWith { computeTax(it, brackets) } // use Log(N) algorithm
}

The function to compute the tax becomes trivial as it looks up the tax in a hashmap which is constant amortized time:

fun computeTax(income: Money, precomputedTax: Map<Money, Money>): Money {
return precomputedTax[income]!!
}

There are several problems with this approach:

  1. The computeTax function isn’t constant time, even though it’s just looking up the tax in the map. Hashmaps are backed by arrays, and array sizes cannot exceed about 2.1 billion in most languages. A billionaire recently reported an income of $10 billion. This would exceed the scalability of hashmaps as each bucket would end up storing about 500 tax values when accounting for every penny.
  2. Even ignoring the hashmap array limitation, calling this constant time is disingenuous. For a max income of $10 billion, we’re performing a trillion tax calculations, orders of magnitude larger than the size of the population, so we can’t amortize it away. From now on, we’ll provide the amortized time complexity per person by including the initial setup effort divided by the population size.
  3. The memory consumption of storing pre-computed taxes for all possible incomes is horrendous. As a rough estimate, a max income of $10 billion would use about 100 terabytes of RAM after accounting for the hashmap nodes, keys, values, and object overheads.

Performance

Given
N = # of tax brackets
I = Max income in pennies
P = # of individuals in the population

Space Complexity
O(I) to store the precomputed tax for every possible income

Time complexity of computeTax function
O(I / maxArraySize) as we're exceeding the scalability of hashmaps

Time complexity amortized per person
O(I*Log(N)/P) because the initial setup amortized per person dominates

Constant-time iteration 2

We can reduce the computation and memory overhead of the previous solution by several thousand times. Instead of pre-computing taxes up to the highest income, we could stop at the lower bound of the highest tax bracket and compute higher incomes on demand in constant time:

fun computeTax(
income: Money,
highestBracket: AccumulatedTaxBracket,
precomputedTax: Map<Money, Money>,
): Money {
return precomputedTax[income] ?: highestBracket.computeTotalTax(income)
}

For Texas, this avoids the hashmap array limitation. However, the overhead of the initial setup is still too large to be amortized away, so this still isn’t truly a constant amortized time solution. The roughly 6-gigabyte memory consumption of this solution is still unacceptably high.

Performance

Given
N = # of tax brackets
B = The lower bound of the highest bracket in pennies
P = # of individuals in the population

Space Complexity
O(B) to store the tax for every possible income up to the highest bracket

Time complexity of computeTax function
O(1) to lookup the value in the map

Time complexity amortized per person
O(B*Log(N)/P) Setup complexity divided by population size

Constant-time iteration 3

During the initial setup, instead of searching for the tax bracket each time using a previous algorithm, we can keep track of the appropriate bracket and conditionally advance to the next one as we increase the income. This allows us to drop the per-income effort to constant time.

Additionally, since we can compute the tax in constant time when given the right tax bracket, we can stop storing pre-computed tax values and have each income associated with the appropriate bracket. This opens the door for the next iteration while reducing memory consumption by about 40%.

fun setup(
taxBrackets: List<AccumulatedTaxBracket>,
): Map<Money, AccumulatedTaxBracket> {
var bracketIndex = 0
return generateSequence(Money(cents = 0)) { it + Money(cents = 1)}
.takeWhile { it < taxBrackets.last().lowerBound }
.associateWith { income ->
if (income >= taxBrackets[bracketIndex].upperBound!!) {
bracketIndex++
}
taxBrackets[bracketIndex]
}
}

The tax will always be computed on demand in constant amortized time:

fun computeTax(
income: Money,
highestTaxBracket: AccumulatedTaxBracket,
incomeToBracket: Map<Money, AccumulatedTaxBracket>,
): Money {
return incomeToBracket[income]?.computeTotalTax(income)
?: highestTaxBracket.computeTotalTax(income)
}

Performance

This is much better than the previous iteration as it allows us to amortize the initial setup effort for regions with a high population-to-maxBracket ratio, such as Texas, resulting in an overall constant amortized time. However, this isn’t true in general, so it’s not expected to be constant amortized time for all regions in the real world. The roughly 4-gigabyte memory consumption is also unacceptably high.

Given
B = The lower bound of the highest bracket in pennies
P = # of individuals in the population

Space Complexity
O(B) to reference a bracket for every possible income up to the max bracket

Time complexity of computeTax function
O(1) to lookup value in the map and perform another constant-time operation

Time complexity amortized per person
O(B/P) setup complexity divided by population size

Constant-time iteration 4

We can reduce both the computation and memory overhead of the previous solution by several thousand times. Storing pre-computed taxes in earlier iterations forced us to store the income associated with each tax amount. Now that we’re referencing brackets instead, we can be much more efficient about the way we think about income ranges.

When looking at the tax brackets of Texas:

10% on income from $0 to $10,275
12% on income from $10,275 to $41,775
22% on income from $41,775 to $89,075
24% on income from $89,075 to $170,050
32% on income from $170,050 to $215,950
35% on income from $215,950 to $539,900
37% on income over $539,900

We can shift from thinking about the exact lower and upper bounds of each bracket and instead think about the size of these brackets:

10% for the first $10,275 (10,275 - 0)
12% for the next $31,500 (41,775 - 10,275)
22% for the next $47,300 (89,075 - 41,775)
24% for the next $80,975 (170,050 - 89,075)
32% for the next $45,900 (215,950 - 170,050)
35% for the next $323,950 (539,900 - 215,950)
37% for anything more

We can compute the greatest common divisor (gcd) of all bracket sizes, which for the state of Texas works out to $25, which is 2500 pennies.

Wait, what if the gcd is 1? Tax brackets are defined in whole dollars in the real world so while the gcd can evaluate to $1, this is 100 pennies so it’s still at least 100 times more efficient than iteration 3 in the worst case. Nevertheless, the next iteration build on this solution to eliminate any concerns about the gcd even if we wanted to allow brackets with penny-level granularity.

Instead of having an entry in the map for every income, we can have one for every multiple of the gcd in pennies:

data class TaxData(
val gcdAmount: Long,
val highestBracket: AccumulatedTaxBracket,
val incomeToBracket: Map<Money, AccumulatedTaxBracket>,
)

fun setup(
taxBrackets: List<AccumulatedTaxBracket>,
): TaxData {
val greatestCommonDivisor = brackets.asSequence()
.filter { it.upperBound != null }
.map { it.upperBound!!.cents - it.lowerBound.cents }
.reduceOrNull { result, size -> gcd(result, size) }
?: return TaxData(1L, taxBrackets.last(), emptyMap())

val gcdAmount = Money(cents = greatestCommonDivisor)
var bracketIndex = 0
val bracketMap = generateSequence(Money(cents = 0)) { it + gcdAmount }
.takeWhile { it < taxBrackets.last().lowerBound }
.associateWith { income ->
if (income >= taxBrackets[bracketIndex].upperBound!!) {
bracketIndex++
}
taxBrackets[bracketIndex]
}

return TaxData(gcdAmount.cents, taxBrackets.last(), bracketMap)
}

where the greatest common divisor is implemented as:

fun gcd(a: Long, b: Long): Long = when {
a == 0L -> b
else -> gcd(b % a, a)
}

Given an arbitrary income, we need to compute which of the multiples of the gcd it rounds down to and then look that up in the map:

fun computeTax(
income: Money,
taxData: TaxData,
): Money {
val (gcdAmount, highestBracket, incomeToBracket) = taxData
val roundedDownIncome = income / gcdAmount * gcdAmount

return incomeToBracket[roundedDownIncome]?.computeTotalTax(income)
?: highestBracket.computeTotalTax(income)
}

Performance

For the state of Texas, this solution:

  • Adds $539,900 * 100 / 2,500 = 21,596 brackets into the map
  • Performs a single constant time computation per person. The 21,596 steps during the initial setup amortize to less than 0.001 constant-time operations per person given the population of 29 million.
  • Uses about 2 megabytes of memory versus the 4 gigabytes the previous solution required.
Given
B = The lower bound of the highest bracket in pennies
P = # of individuals in the population
gcd = greatest common divisor of bracket sizes in pennies

Space Complexity
O(B/gcd) for every multiple of the gcd up to the max bracket

Time complexity of computeTax function
O(1) to round down to the nearest multiple, lookup bracket, and compute tax

Time complexity amortized per person
O(B/gcd/P)
initial setup effort divided by the population size
~ O(1) expected in the real world

B will be significantly smaller than gcd * P in the real world, so the
amortized time per person is expected to be O(1) in practice.

This is the first iteration I would consider to have constant amortized time in the real world with reasonable memory consumption.

Constant-time iteration 5

As promised, iteration 5 eliminates the Achilles heel of a gcd of 1 penny and is over 500 times more efficient on common datasets.

See part 2, which includes iteration 5 and the mind-boggling iteration 6!

--

--