
Recursive DFS III: Hyper-Dimensional Depth

From 2.1 billion to 1 TRILLION recursive calls. We stripped away everything—node arrays, int limits, bool arrays—and hit bare metal. Welcome to Hyper-Dimensional Depth.

By Robert Long | January 19, 2026

In Part 2, we hit 2.1 billion recursive calls using Tail Call Optimization. We thought that was the limit. It wasn't. The rabbit hole goes much deeper. Welcome to Hyper-Dimensional Depth.


Where We Left Off

At the end of Part 2, we had this algorithm hitting 2.1 billion recursive calls:

// Part 2: 2.1 Billion (limited by array size)
public static void DfsRecursiveLinear(
    this NodeLinear[] nodes,
    int index,
    bool[] visited)
{
    // One unsigned compare handles both index < 0 and index >= Length
    if ((uint)index >= (uint)nodes.Length || visited[index])
        return;

    visited[index] = true;

    int next = nodes[index].NextIndex;
    if (next >= 0)
        nodes.DfsRecursiveLinear(next, visited); // tail position - TCO applies
}

We stopped at 2.1 billion because of three bottlenecks:

  • NodeLinear[] - C# arrays cap at ~2.1B elements
  • bool[] - Another 2.1B element array
  • int indices - Max value of ~2.1B

But here's the thing: TCO itself has no limit. It's just a jmp instruction. The only thing stopping us was our data structures. So we stripped them away.
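
To see what that means concretely: once the JIT turns the tail call into a jump, the Part 2 method above behaves exactly like this loop. This is a conceptual C# equivalent for illustration, not actual JIT output:

// Conceptual equivalent of the TCO'd Part 2 method: the "recursive call"
// becomes a jump back to the top, so no stack frame is ever added.
public static void DfsRecursiveLinearAsLoop(
    NodeLinear[] nodes, int index, bool[] visited)
{
    while ((uint)index < (uint)nodes.Length && !visited[index])
    {
        visited[index] = true;
        int next = nodes[index].NextIndex;
        if (next < 0)
            return;
        index = next; // this assignment is the jmp
    }
}

Same semantics, same machine-code shape: constant stack usage, no matter the depth.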


The Stripping: Piece by Piece 🔪

1. Kill the Node Array

For a linear graph, we don't need a node array at all. The next node is always index + 1. That's 8 bytes per node × 2.1 billion = ~17 GB of RAM eliminated.

// Before: nodes[index].NextIndex
// After:  index + 1
// Node array? GONE.

2. Upgrade to long Indices

int caps at 2.1 billion. long caps at 9.2 quintillion. That's 9,223,372,036,854,775,807. We're not hitting that ceiling.
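
The two ceilings, side by side:

// The old limit vs. the new one:
Console.WriteLine(int.MaxValue);  // 2147483647           (~2.1 billion)
Console.WriteLine(long.MaxValue); // 9223372036854775807  (~9.2 quintillion)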

3. ChunkedBitArray Instead of bool[]

bool[] uses 1 byte per element. That's wasteful—we only need 1 bit. But C# arrays cap at 2.1B elements. Solution: chunk it.

// ChunkedBitArray: 1 bit per node, chunked to bypass the ~2.1B array limit
public class ChunkedBitArray
{
    private const long BitsPerChunk = 1L << 30; // ~1.07 billion bits per chunk
    private readonly ulong[][] _chunks;

    public ChunkedBitArray(long bitCount)
    {
        _chunks = new ulong[(bitCount + BitsPerChunk - 1) / BitsPerChunk][];
        for (long i = 0; i < _chunks.Length; i++)
            _chunks[i] = new ulong[BitsPerChunk / 64]; // 128 MiB per chunk
    }

    // Get/Set use bit manipulation - O(1)
    public bool Get(long index) =>
        (_chunks[index / BitsPerChunk][(index % BitsPerChunk) / 64] & (1UL << (int)(index & 63))) != 0;

    public void Set(long index) =>
        _chunks[index / BitsPerChunk][(index % BitsPerChunk) / 64] |= 1UL << (int)(index & 63);
}
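
The design choice: each chunk covers 2^30 bits (a 16.8M-element ulong[], 128 MiB), so a trillion-bit array becomes ~932 modest allocations instead of one impossible one. No single array ever approaches the element cap.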

Memory savings:

  • bool[] at 1 trillion: ~1 TB, and impossible anyway (a single array caps at ~2.1B elements)
  • ChunkedBitArray at 1 trillion: ~119 GB ✅
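
The arithmetic behind those two numbers, as a quick sanity check (decimal vs. binary units plus chunk rounding account for the small gap to the measured ~119 GB):

long nodes = 1_000_000_000_000;  // 1 trillion
long boolBytes = nodes;          // bool[]: 1 byte per element
long bitBytes  = nodes / 8;      // bit array: 1 bit per element
Console.WriteLine($"bool[]: {boolBytes / 1e12:F1} TB");       // 1.0 TB
Console.WriteLine($"bits:   {bitBytes / (1L << 30):N0} GiB"); // 116 GiB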

The Final Algorithm: Bare Metal 🔩

// Hyper-Dimensional DFS: 1 TRILLION+ recursive calls
// (assumes: using System.Runtime.CompilerServices;)
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static void DfsLinearLong(
    long index,
    long maxDepth,
    ChunkedBitArray visited)
{
    if (index >= maxDepth || visited.Get(index))
        return;

    visited.Set(index);

    // Linear graph: next = index + 1
    // This is in tail position - TCO kicks in
    DfsLinearLong(index + 1, maxDepth, visited);
}

That's it. No node array. No array bounds checks. No allocations in the hot path. Just pure recursion hitting the metal.
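
For completeness, here's how you'd wire it up; a minimal driver sketch, assuming DfsLinearLong and ChunkedBitArray from above are in scope. The oversized thread stack mirrors the test environment below and is pure insurance in case the JIT declines a tail call:

using System.Threading;

const long maxDepth = 3_000_000_000;         // the 3 Billion run
var visited = new ChunkedBitArray(maxDepth); // 3e9 bits ≈ 357 MB

// Dedicated thread with an int.MaxValue stack - with TCO working,
// the stack never actually grows.
var worker = new Thread(() => DfsLinearLong(0, maxDepth, visited),
                        maxStackSize: int.MaxValue);
worker.Start();
worker.Join();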


🚀 The Results: Hyper-Dimensional Depth

Depth            Time       Ops/sec   Memory
🎯 3 Billion      2.8 s      1.06 B    357 MB
🔥 5 Billion      4.6 s      1.08 B    596 MB
🚀 10 Billion     8.7 s      1.15 B    1.2 GB
🤯 16 Billion     13.6 s     1.18 B    1.9 GB
💥 50 Billion     45.9 s     1.09 B    5.9 GB
🌟 100 Billion    89.8 s     1.11 B    11.9 GB
🌌 500 Billion    7.6 min    1.09 B    59.6 GB
🏆 1 TRILLION     15.2 min   1.10 B    119 GB

The Key Insights

1. TCO Has No Depth Limit

Tail Call Optimization transforms recursion into a jmp instruction. It doesn't accumulate stack frames. The only limits are:

  • RAM for your data structures
  • Time (patience)

2. Data Structures Are the Bottleneck

We didn't hit 2.1 billion because of recursion limits—we hit it because of int and array limits. Strip those away, and recursion goes to infinity.

3. BitArrays Are 8x More Efficient

bool[] wastes 7 bits per element. A proper bit array uses 1 bit. At scale, that's the difference between "possible" and "impossible."

4. Linear Performance

Every run maintained ~1.1 billion operations per second. The performance is linear. Double the depth, double the time. No degradation, no surprises.
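
That throughput also lets you sanity-check the table's big finish:

// At ~1.1 billion calls/sec, 1 trillion calls should take about
// 1e12 / 1.1e9 ≈ 909 s ≈ 15.2 minutes - exactly what the table shows.
long depth = 1_000_000_000_000;
double opsPerSec = 1.1e9;
Console.WriteLine(TimeSpan.FromSeconds(depth / opsPerSec)); // ~00:15:09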


But Wait—Isn't Using the Heap "Cheating"? 🤔

You might be thinking: "Hold on—you're using a ChunkedBitArray on the heap. Doesn't that violate the rules of recursion vs iteration?"

No. In fact, this exposes the biggest lie in the "Recursion vs. Iteration" debate.

The "Iterative Stack" is ALSO on the Heap!

This is the most common misconception. When people say "Use Stack<T> to avoid Stack Overflow," they are effectively saying: "Move the data from the Thread Stack (limited to ~1MB) to the Heap (limited by RAM)."

Approach              Control Flow                   State Storage
Standard Recursive    Thread Stack (crashes ~10K)    Heap (visited array)
Iterative Stack<T>    Heap (Stack<T>)                Heap (visited array)
TCO Recursive         CPU registers (jmp)            Heap (BitArray)

The comparison is fair: Heap vs. Heap.

  • The Iterative approach uses Heap memory to store the Stack<T>.
  • My approach uses Heap memory to store the BitArray.

Why TCO Still Wins

The iterative approach is double-dipping on the Heap:

  • Stack<T> for the traversal stack (pointers to nodes)
  • Plus memory for the visited tracking

The Stack<T> class internally uses an array T[]. In .NET, arrays cannot have more than ~2.1 Billion elements—even if you have 2TB of RAM. That's a software limit, not a hardware one.
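
To make the double-dipping concrete, here's a sketch of the iterative version (a hypothetical illustration; on this linear graph the stack stays tiny, but on a branching graph the Stack<T> is exactly what slams into the ~2.1B element wall):

using System.Collections.Generic;

public static void DfsIterative(long maxDepth, ChunkedBitArray visited)
{
    var stack = new Stack<long>();   // heap structure #1: backed by a T[]
    stack.Push(0);

    while (stack.Count > 0)
    {
        long index = stack.Pop();
        if (index >= maxDepth || visited.Get(index))
            continue;

        visited.Set(index);          // heap structure #2: the visited bits
        stack.Push(index + 1);       // a branching graph would push every
                                     // neighbor here - hence the 2.1B cap
    }
}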

My TCO approach:

  • Control Flow: Zero memory (just a JMP instruction)
  • State: The ChunkedBitArray (Heap)

I bypassed the T[] array limit entirely by chunking. I removed the "middle man" (the Stack object). The result? 1 Trillion+ while iterative is stuck at 2.1 Billion.


Test Environment

OS:          Windows 11
Runtime:     .NET 10 Native AOT
CPU:         AMD Ryzen 9 9950X3D 16-Core Processor
RAM:         192 GB DDR5
Stack Size:  ~2 GB (int.MaxValue thread stack allocation)

Closing Thoughts

The funny part? This result felt entirely unexpected, yet the irony is that tail call optimization has been around since the earliest days of programming languages. It is not a new idea at all.

But it's amazing how it is reshaping my mental model of what is and isn't possible, and how it debunks the myth that you should always convert recursive calls to iterative ones.

Stay tuned. :)

Keywords
DFS, depth-first search, recursion, TCO, tail call optimization, Native AOT, trillion, BitArray, ChunkedBitArray, hyper-dimensional, bare metal

Related Articles

Part 1
HPC DFS Recursive

Where it all began—30 million recursive calls using HPC techniques.

Part 2
HPC DFS Recursive II

2.1 billion recursive calls with Tail Call Optimization.