In the previous post, we worked through the enormous power of logarithmic functions for reducing search spaces. For our Binary Tree of the Universe, it would take us only 266 steps to locate any atom in the observable universe. Tree data structures such as B-trees with wider fan-outs require even fewer steps.
I find this kind of miraculous, and it holds even at the most extreme, cosmic scope. Searching for individual atoms in the universe may be relevant for humans millennia from now, but today we grapple with more Earthly confines. Let’s bring this down to Earth (huhu):
The Earth has an estimated 10^50 atoms. To convert that to a power of 2, multiply the exponent by log₂ 10:
log₂ 10 = ln 10 / ln 2 ≈ 3.322, so 10^50 = 2^(50 × 3.322) ≈ 2^166.1
Rounding 2^166.1 to 2^166, our Binary Tree of the Universe could locate any atom on Earth in 166 steps. A mildly pleasing coincidence: the observable universe holds roughly 2^100 times as many atoms as Earth (266 − 166 = 100).
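The same conversion works for any power of ten. Here is a minimal Python sketch of it; the helper name `power_of_two_exponent` is my own invention for illustration, not something from a library:

```python
import math

def power_of_two_exponent(power_of_ten: float) -> float:
    """Return x such that 10**power_of_ten == 2**x."""
    return power_of_ten * math.log2(10)

earth = power_of_two_exponent(50)      # atoms on Earth, ~10^50
universe = power_of_two_exponent(80)   # atoms in the observable universe, ~10^80

print(f"Earth:    2^{earth:.1f}")
print(f"Universe: 2^{universe:.1f}")
print(f"Ratio:    2^{universe - earth:.1f}")
```

The ratio comes out just shy of 2^100, which is why the coincidence is only mildly pleasing once you look at the decimals.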
Perhaps every atom on Earth is still too ambitious. Let’s further ground this within the context of humanity:
- 1 million - typical token context window for LLMs today: ~2^19.93 ≈ 20 steps
- 1 billion - favorite valuation goal of startups: ~2^29.90 ≈ 30 steps
- 8 billion - all humans alive today: ~2^32.90 ≈ 33 steps
- 100 billion - all humans that have ever lived: ~2^36.54 ≈ 37 steps
37 steps from every person that has ever lived. You would need to climb only a little over a quarter of the 135 Spanish Steps to search through all of them!
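To see those step counts fall out of an actual search rather than a logarithm, here is a rough sketch that counts comparisons in a plain binary search. It is a stand-in for a balanced binary tree lookup, not the tree itself, and the function name `binary_search_steps` is hypothetical. Python's `range` acts as a sorted collection that never has to be held in memory:

```python
def binary_search_steps(n: int, target: int) -> int:
    """Count the comparisons a binary search makes to find target in range(n)."""
    items = range(n)          # a sorted "collection" that uses no real memory
    lo, hi = 0, n - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if items[mid] == target:
            return steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target not in range")

scales = {
    "1 million": 10**6,
    "1 billion": 10**9,
    "8 billion": 8 * 10**9,
    "100 billion": 10**11,
}

for label, n in scales.items():
    # The last element is a worst case for this particular loop.
    print(f"{label}: {binary_search_steps(n, n - 1)} steps")
```

Searching for the last element halves the remaining range every time, so it takes the maximum number of comparisons this loop can make. Most targets are found sooner.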
But one could say, that’s only the people! What about all the content and information they’re creating? If we indexed all the internet data ever created, currently estimated at ~100 zettabytes (10^23 bytes, or ~2^76.4), our search for an individual byte would still require only 77 steps! We can then define Earth-scale log n as 166 and the current Human-scale log n as 77. It'd be nice to have some breathing room for human data's future growth, so let's make Human-scale log n a nice round 80.
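How much breathing room does rounding up to 80 actually buy? A quick sketch, taking the ~100 zettabyte estimate at face value and assuming decimal zettabytes (1 ZB = 10^21 bytes):

```python
import math

internet_bytes = 100 * 10**21          # ~100 zettabytes; 1 ZB = 10^21 bytes
exact = math.log2(internet_bytes)      # fractional exponent, about 76.4
steps = math.ceil(exact)               # a partial step still costs a whole step

headroom = 2**80 / internet_bytes      # growth that a budget of 80 steps allows

print(f"log2 = {exact:.1f}, steps = {steps}")
print(f"80 steps covers about {headroom:.0f}x today's estimate")
```

Under those assumptions, 80 steps leaves room for human data to grow roughly twelvefold before the budget needs another bump.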
This explains part of the magic of how YouTube, Meta, TikTok, and the rest of the social media players keep access times reasonable across their massive content libraries. It’s feasible to store, index, and retrieve social media content for every human alive. After all, all human data ever created is only a little more than halfway up the Spanish Steps.
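Wider trees shrink the count further, as noted at the top of this post. The sketch below is purely illustrative: the fan-out values are ones I picked, the helper `levels_needed` is hypothetical, and I am not claiming any of these platforms index their data this way. It uses integer arithmetic so there is no floating-point rounding to worry about:

```python
def levels_needed(n: int, fan_out: int) -> int:
    """Smallest number of tree levels whose leaves can address n items."""
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= fan_out   # each extra level multiplies reach by the fan-out
        levels += 1
    return levels

human_scale = 2**80  # the rounded-up Human-scale budget

for fan_out in (2, 16, 256):
    print(f"fan-out {fan_out}: {levels_needed(human_scale, fan_out)} levels")
```

With a fan-out of 256, the 80 binary steps collapse to 10 levels, since each level does the work of 8 binary decisions.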
As a final exercise to give us a sense of what logarithmically sits between 1 and 2^266 (all the atoms in the observable universe), I'll compile it all into two handy reference tables, starting with Human-scale:
| Scale | Count | log₂(n) | Spanish Steps |
|---|---|---|---|
| LLM context window (1M tokens) | ~10^6 | 20 | 15% up the stairs |
| Unicorn valuation ($1B) | ~10^9 | 30 | 22% up the stairs |
| All humans alive today | ~8×10^9 | 33 | 24% up the stairs |
| All humans who have ever lived | ~10^11 | 37 | 27% up the stairs |
| All internet data (bytes) | ~10^23 | 77 | 57% up the stairs |
Extending beyond humanity, I'll add in cosmological structures:
| Scale | Estimated Atoms | log₂(n) | Spanish Steps |
|---|---|---|---|
| Earth | ~10^50 | 166 | Up and down 31 steps |
| Solar System | ~10^57 | 189 | Up and down 54 steps |
| Nebula (typical) | ~10^60 | 199 | Up and down 64 steps |
| Milky Way Galaxy | ~10^68 | 226 | Up and down 91 steps |
| Local Group | ~10^72 | 239 | Up and down 104 steps |
| Virgo Supercluster | ~10^75 | 249 | Up and down 114 steps |
| Observable Universe | ~10^80 | 266 | Up and down (2 sets of stairs) |
| Universe of Universes | ~10^160 | 532 | Up and down twice (4 sets of stairs) |
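For anyone who wants to place their own numbers on the staircase, here is a small sketch of the mapping used in the last column. It assumes a staircase of 135 steps, which is the figure the tables imply (20 steps is 15% of the way up, and 166 steps is one flight plus 31). The function name `staircase_position` is made up for this example:

```python
SPANISH_STEPS = 135  # assumed staircase length, inferred from the tables above

def staircase_position(steps: int) -> str:
    """Describe a step count as flights of the staircase plus a remainder."""
    flights, remainder = divmod(steps, SPANISH_STEPS)
    if flights == 0:
        return f"{steps / SPANISH_STEPS:.0%} up the stairs"
    return f"{flights} full flight(s) plus {remainder} steps"

# Step counts copied from the log₂(n) columns of the tables.
rows = {
    "All humans ever lived": 37,
    "All internet data": 77,
    "Earth": 166,
    "Observable Universe": 266,
    "Universe of Universes": 532,
}

for name, steps in rows.items():
    print(f"{name}: {staircase_position(steps)}")
```

The observable universe comes out as one full flight plus 131 steps, which the table rounds to two sets of stairs. The Universe of Universes is three flights plus 127, rounded to four.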
I find something about this comforting. The universe is so unimaginably, incomprehensibly vast, and yet there are paths for us to structure it, make sense of it, and explore its immense depth.
-----
Part 3 of this series: What the cosmos teaches us about quadratic growth and LLM context windows