In Part 2, we hit 2.1 billion recursive calls using Tail Call Optimization. We thought that was the limit. It wasn't. The rabbit hole goes much deeper. Welcome to Hyper-Dimensional Depth.
🚀 The Results
1 TRILLION recursive calls. Not million. Not billion. Trillion.
Where We Left Off
At the end of Part 2, we had this algorithm hitting 2.1 billion recursive calls:
```csharp
// Part 2: 2.1 Billion (limited by array size)
public static void DfsRecursiveLinear(
    this NodeLinear[] nodes,
    int index,
    bool[] visited)
{
    if ((uint)index >= (uint)nodes.Length || visited[index])
        return;

    visited[index] = true;

    int next = nodes[index].NextIndex;
    if (next >= 0)
        nodes.DfsRecursiveLinear(next, visited);
}
```
We stopped at 2.1 billion because of three bottlenecks:
- `NodeLinear[]` - C# arrays cap at ~2.1B elements
- `bool[]` - another 2.1B element array
- `int` indices - max value of ~2.1B
But here's the thing: TCO itself has no limit. It's just a jmp instruction.
The only thing stopping us was our data structures. So we stripped them away.
The Stripping: Piece by Piece 🔪
1. Kill the Node Array
For a linear graph, we don't need a node array at all. The next node is always index + 1.
That's 8 bytes per node Ă— 2.1 billion = 17GB of RAM eliminated.
```csharp
// Before: nodes[index].NextIndex
// After:  index + 1
// Node array? GONE.
```
2. Upgrade to long Indices
int caps at 2.1 billion. long caps at 9.2 quintillion.
That's 9,223,372,036,854,775,807. We're not hitting that ceiling.
3. ChunkedBitArray Instead of bool[]
bool[] uses 1 byte per element. That's wasteful—we only need 1 bit.
But C# arrays cap at 2.1B elements. Solution: chunk it.
```csharp
// ChunkedBitArray: 1 bit per node, unlimited size
// (constructor and Get/Set bodies are a minimal reconstruction; the post elides them)
public class ChunkedBitArray
{
    private const int BitsPerChunk = 1 << 30; // ~1 billion bits per chunk
    private readonly ulong[][] _chunks;

    public ChunkedBitArray(long totalBits)
    {
        _chunks = new ulong[(totalBits + BitsPerChunk - 1) / BitsPerChunk][];
        for (long i = 0; i < _chunks.Length; i++)
            _chunks[i] = new ulong[BitsPerChunk / 64];
    }

    // Get/Set use bit manipulation - O(1)
    public bool Get(long index) =>
        (_chunks[index / BitsPerChunk][index % BitsPerChunk / 64] & (1UL << (int)(index & 63))) != 0;

    public void Set(long index) =>
        _chunks[index / BitsPerChunk][index % BitsPerChunk / 64] |= 1UL << (int)(index & 63);
}
```
Memory savings:
- `bool[]` at 1 trillion: ~1 TB (impossible)
- `ChunkedBitArray` at 1 trillion: ~119 GB ✅
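To make the scale concrete, here is a minimal usage sketch; the benchmark harness itself isn't shown in the post, so the setup is an assumption:

```csharp
// One bit per node; the ~119 GB figure above is the article's measured footprint at this depth
long depth = 1_000_000_000_000; // 1 trillion nodes
var visited = new ChunkedBitArray(depth);
visited.Set(999_999_999_999);
Console.WriteLine(visited.Get(999_999_999_999)); // True
```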
The Final Algorithm: Bare Metal 🔩
```csharp
// Hyper-Dimensional DFS: 1 TRILLION+ recursive calls
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static void DfsLinearLong(
    long index,
    long maxDepth,
    ChunkedBitArray visited)
{
    if (index >= maxDepth || visited.Get(index))
        return;

    visited.Set(index);

    // Linear graph: next = index + 1
    // This is in tail position - TCO kicks in
    DfsLinearLong(index + 1, maxDepth, visited);
}
```
That's it. No node array. No bounds checking. No unnecessary allocations. Just pure recursion hitting the metal.
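For context, a minimal driver sketch (the actual harness isn't shown in the post, so the setup below is assumed): the whole run is one call, and it only works because the tail call never grows the stack.

```csharp
// Hypothetical driver: one call kicks off the entire 1-trillion-node traversal.
// Assumes the tail call is actually optimized (Release / Native AOT); otherwise this overflows the stack.
long depth = 1_000_000_000_000;
var visited = new ChunkedBitArray(depth);
DfsLinearLong(0, depth, visited);
Console.WriteLine(visited.Get(depth - 1)); // True - every node was reached
```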
🚀 The Results: Hyper-Dimensional Depth
| | Depth | Time | Ops/sec | Memory |
|---|---|---|---|---|
| 🎯 | 3 Billion | 2.8s | 1.06B | 357 MB |
| 🔥 | 5 Billion | 4.6s | 1.08B | 596 MB |
| 🚀 | 10 Billion | 8.7s | 1.15B | 1.2 GB |
| 🤯 | 16 Billion | 13.6s | 1.18B | 1.9 GB |
| đź’Ą | 50 Billion | 45.9s | 1.09B | 5.9 GB |
| 🌟 | 100 Billion | 89.8s | 1.11B | 11.9 GB |
| 🌌 | 500 Billion | 7.6 min | 1.09B | 59.6 GB |
| 🏆 | 1 TRILLION | 15.2 min | 1.10B | 119 GB |
🏆 1 Trillion Recursive Calls
From 30 million in Part 1, to 2.1 billion in Part 2, to 1 TRILLION in Part 3. That's a 33,333x improvement from where we started.
The Key Insights
1. TCO Has No Depth Limit
Tail Call Optimization transforms recursion into a jmp instruction.
It doesn't accumulate stack frames. The only limits are:
- RAM for your data structures
- Time (patience)
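Conceptually, once the tail call becomes a jmp, the recursive method behaves like a plain loop. The sketch below is illustrative, not literal compiler output:

```csharp
// Conceptual equivalent of the optimized tail call (not actual JIT output)
public static void DfsLinearLongAsLoop(long index, long maxDepth, ChunkedBitArray visited)
{
    while (true)
    {
        if (index >= maxDepth || visited.Get(index))
            return;
        visited.Set(index);
        index += 1; // the "recursive call" is just a parameter update and a jump back to the top
    }
}
```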
2. Data Structures Are the Bottleneck
We didn't hit 2.1 billion because of recursion limits—we hit it because of int and array limits.
Strip those away, and recursion goes to infinity.
3. BitArrays Are 8x More Efficient
bool[] wastes 7 bits per element. A proper bit array uses 1 bit.
At scale, that's the difference between "possible" and "impossible."
4. Linear Performance
Every run maintained ~1.1 billion operations per second. The performance is linear: double the depth, double the time. No degradation, no surprises. As a sanity check, 1 trillion calls at ~1.10B ops/sec works out to roughly 909 seconds, which matches the 15.2 minutes in the table.
But Wait—Isn't Using the Heap "Cheating"? 🤔
You might be thinking: "Hold on—you're using a ChunkedBitArray on the heap.
Doesn't that violate the rules of recursion vs iteration?"
No. In fact, this exposes the biggest lie in the "Recursion vs. Iteration" debate.
The "Iterative Stack" is ALSO on the Heap!
This is the most common misconception. When people say "Use Stack<T> to avoid
Stack Overflow," they are effectively saying: "Move the data from the Thread Stack (limited to ~1MB)
to the Heap (limited by RAM)."
| Approach | Control Flow | State Storage |
|---|---|---|
| Standard Recursive | Thread Stack (crashes ~10K) | Heap (visited array) |
| Iterative Stack<T> | Heap (Stack<T>) | Heap (visited array) |
| TCO Recursive | CPU Registers (jmp) | Heap (BitArray) |
The comparison is fair: Heap vs. Heap.
- The Iterative approach uses Heap memory to store the `Stack<T>`.
- My approach uses Heap memory to store the `BitArray`.
Why TCO Still Wins
The iterative approach is double-dipping on the Heap:
- `Stack<T>` for the traversal stack (pointers to nodes)
- Plus memory for the visited tracking
The Stack<T> class internally uses an array T[]. In .NET, arrays
cannot have more than ~2.1 Billion elements—even if you have 2TB of RAM. That's a
software limit, not a hardware one.
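For contrast, this is roughly what the iterative alternative looks like. It's a sketch of the textbook pattern, not code from the post: the `Stack<long>` lives on the heap, and so do the visited bits.

```csharp
using System.Collections.Generic;

// Textbook iterative DFS: an explicit Stack<T> replaces the call stack.
// Two heap consumers: the pending-node stack and the visited bits.
// On branchier graphs the Stack<T>'s backing T[] is itself subject to the ~2.1B element cap.
public static void DfsIterative(long maxDepth, ChunkedBitArray visited)
{
    var stack = new Stack<long>();
    stack.Push(0);
    while (stack.Count > 0)
    {
        long index = stack.Pop();
        if (index >= maxDepth || visited.Get(index))
            continue;
        visited.Set(index);
        stack.Push(index + 1); // linear graph: a single successor
    }
}
```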
My TCO approach:
- Control Flow: Zero memory (just a `JMP` instruction)
- State: The `ChunkedBitArray` (Heap)
I bypassed the T[] array limit entirely by chunking. I removed the "middle man"
(the Stack object). The result? 1 Trillion+ while iterative is stuck at 2.1 Billion.
The Verdict
- Did I hit a `System.StackOverflowException`? No.
- Did I use infinite recursion? Yes.
- Did I execute 1 Trillion+ calls? Yes.
I simply proved that Hardware (TCO) > Abstraction (Stack<T>).
Test Environment
| Component | Spec |
|---|---|
| OS | Windows 11 |
| Runtime | .NET 10 Native AOT |
| CPU | AMD Ryzen 9 9950X3D 16-Core Processor |
| RAM | 192 GB DDR5 |
| Stack Size | ~2GB (int.MaxValue thread stack allocation) |
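The ~2GB stack in the table is just the size requested for the worker thread; with TCO it never actually fills up. A minimal sketch of how such a thread can be created (the post doesn't show its harness, so this is an assumption):

```csharp
using System.Threading;

// Request the largest stack .NET accepts for the worker thread (int.MaxValue bytes ≈ 2GB).
var worker = new Thread(() =>
{
    long depth = 1_000_000_000_000;
    var visited = new ChunkedBitArray(depth);
    DfsLinearLong(0, depth, visited);
}, int.MaxValue);
worker.Start();
worker.Join();
```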
Closing Thoughts
The funny part? Even though this result was entirely unexpected, tail call optimization has been around since the very beginning. It is not a new idea at all.
But it's amazing how it is now changing my mental model of what is possible and what isn't—and debunks the myth that you should always convert recursive calls to iterative ones.
Stay tuned. :)