The year is 1995. Millions of people roam the corridors of Doom, melting monsters with the BFG. Those in business might remember IBM acquiring Lotus, maker of Notes, one of the industry’s most venerable software platforms.
Unlike Doom, however, Lotus Notes was designed from the ground up to be a partially distributed system. A perfect fit for the internet. It was ahead of its time and pushed the limits of distributed computing. I believe programming for the browser will do the same in coming years.
In 1995, only 13% of US households had internet access; by 1996 that figure had jumped to 22%. The internet was growing - more nodes were connecting. We were creating the backbone for what would become the largest distributed system in the world.
Brendan Eich also birthed a language that year, you may have heard of it. The delivery room was Netscape Navigator. The browser was simple at that time, but its introduction to the layperson helped lay the foundation for a distributed platform.
Parallel to the adoption of Service Workers, we’ve also seen an uptick in “offline-first” databases like the combination of CouchDB and PouchDB or the proprietary Firestore database. These databases allow developers to leverage the browser’s local database, IndexedDB, to store user data for offline access. Later, when connectivity is restored, they synchronize with a centralized copy.
A lot has happened since 1995. At some point, frameworks will provide these tools out of the box. In theory, we have everything we need to build offline applications. In theory.
Our tools are not perfect. If what we need is a machete, what we have is a butter knife. All browsers will eventually support Service Workers, but this only enables more intelligent caching - a relatively shallow problem. Decentralized databases like CouchDB are available, but their ability to handle conflicting data on unstable networks is limited.
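To see why conflicting data is the hard part, consider the simplest possible resolution strategy: a last-writer-wins register. The sketch below is purely illustrative (the types and names are hypothetical, and this is not how CouchDB resolves conflicts); it shows the key property any resolution strategy needs - every replica must pick the same winner, in any merge order.

```typescript
// A minimal last-writer-wins (LWW) register. Names are illustrative.
interface LWWRegister<T> {
  value: T;
  timestamp: number; // time of the last write (logical or wall-clock)
  nodeId: string;    // tie-breaker when timestamps are equal
}

function merge<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Equal timestamps: break the tie deterministically by node id,
  // so every replica picks the same winner.
  return a.nodeId > b.nodeId ? a : b;
}

// Two replicas edit the same field while offline…
const phone = { value: "Alice", timestamp: 2, nodeId: "phone" };
const laptop = { value: "Alicia", timestamp: 3, nodeId: "laptop" };

// …and converge to the same state regardless of merge order.
console.log(merge(phone, laptop).value); // "Alicia"
console.log(merge(laptop, phone).value); // "Alicia"
```

Last-writer-wins is also the bluntest strategy - it silently discards one of the writes - which is exactly why richer approaches are needed for anything more structured than a single field.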
These are not theoretical limitations. Back in 2015 I was building affordable healthcare software for Africa, similar to HospitalRun. Due to the instability of Africa’s network and power infrastructure, offline-first was a natural choice. But it was naive to believe we could easily build HIPAA compliant, robust software that was offline-first. The shortcomings of our tools are all too real.
If we want robust offline applications, we need to think distributed. Offline is just the tip of the iceberg, though. By thinking distributed, we solve many more problems systemic to the web. Let’s take a look at a few of these problems.
Some would claim the web is broken. That 1995 wants its technology back.
HTTP is apparently broken. Man-in-the-middle attacks are an obvious risk without a secure connection, so Chrome has begun flagging plain HTTP pages as insecure. And to use SSL/TLS, we need "certificate authorities," a highly centralized institution, to validate domain ownership. This centralization is hardly necessary.
Our privacy is at risk on today’s internet. Governments, corporations, and hackers all want access to our personal information. Some may only want to mine our data, but censorship is a real threat as well.
When building a new centralized app, there are a lot of expenses. Spinning up a minimal yet production-ready environment requires deploying the application across coordinated servers, reinforcing a cluster of databases, and configuring monitoring services to ensure all is well. It's becoming prohibitively expensive to build apps this way.
When a centralized app hits scale, compliance emerges from the shadows. In some countries, user data must be stored on servers located in the user's country. What does this mean from an architectural point of view? It means the system ends up distributed anyway.
Whether or not the web is truly broken is irrelevant - there is always room for improvement. But we do need something better. Something distributed.
To visualize our migration towards a distributed web, let’s review Paul Baran’s seminal diagram on network types.
This diagram illustrates three network types: centralized, decentralized and distributed. As it stands today, we’re mostly centralized and decentralized. Some services more than others. If you think about the ubiquity of AWS, we might be more centralized than meets the eye.
To solve the problems outlined above, we need to move away from centralized and decentralized networks. “Offline-first” is a great exercise, but what about “distributed-first”? Distributed-first technology is, by definition, offline-first technology but with the added benefit of privacy, security, cost reduction, and compliance.
A distributed platform would address content by its value instead of its name, making it more cacheable and rendering certificate authorities unnecessary. Distributed users would locally federate without any governing oversight, preventing censorship. The burdens of cost and compliance are reduced as data can be stored locally on a subnet, dispersing bandwidth and storage. Going distributed has a lot of perks.
Fortunately for us, people are already working on these problems.
More advanced caching with Service Workers is a step in the right direction, but it doesn’t solve the hard challenges of data synchronization. What we need here are CRDTs: Conflict-free Replicated Data Types. CRDTs are data types with mathematically proven convergence: every replica in a distributed system is guaranteed to reach the same state. Even when a node goes offline for an indefinite amount of time, its state will still converge with the rest of the network once it reconnects. Not everything can be modeled as a CRDT, but with a little creativity, a lot can.
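To make this concrete, here is one of the simplest state-based CRDTs, a grow-only counter (G-Counter), in a minimal sketch. The merge takes the per-node maximum, which makes it commutative, associative, and idempotent - the three properties that guarantee replicas converge no matter how long a node was offline or in what order states are exchanged.

```typescript
// G-Counter: each node only increments its own slot.
type GCounter = Record<string, number>; // nodeId -> local increment count

function increment(c: GCounter, nodeId: string): GCounter {
  return { ...c, [nodeId]: (c[nodeId] ?? 0) + 1 };
}

// Merging takes the per-node max, so merges commute and can be
// repeated safely - the heart of the convergence guarantee.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] ?? 0, count);
  }
  return merged;
}

function total(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}

// Two replicas count independently while partitioned…
let a: GCounter = {};
a = increment(a, "a");
a = increment(a, "a");
let b: GCounter = {};
b = increment(b, "b");

// …then converge to the same total in either merge order.
console.log(total(mergeCounters(a, b))); // 3
console.log(total(mergeCounters(b, a))); // 3
```

Counters, sets, registers, and even sequences for collaborative text editing all have CRDT formulations along these lines; the creativity lies in mapping application state onto them.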
If we are to become more distributed, we need new protocols in order to store and index distributed content. One such protocol is the InterPlanetary File System (IPFS). IPFS is a fully distributed web protocol that is compatible with the browser you’re using right now. There are distributed databases built on IPFS, like OrbitDB, that use CRDTs to provide eventually consistent data, meaning all replicas are guaranteed to converge on the same state.
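The core idea behind addressing content by its value can be sketched in a few lines. This is only the principle, not IPFS itself - IPFS wraps hashes in multihash-encoded CIDs and spreads the store across peers - but it shows why such content is trivially cacheable and self-verifying.

```typescript
import { createHash } from "crypto";

// Content addressing in miniature: the key IS the hash of the value.
const store = new Map<string, Buffer>();

function put(content: Buffer): string {
  const address = createHash("sha256").update(content).digest("hex");
  store.set(address, content);
  return address; // anyone holding this address can verify the bytes
}

function get(address: string): Buffer | undefined {
  const content = store.get(address);
  // Self-verifying: re-hash to ensure no one tampered with the bytes.
  if (content && createHash("sha256").update(content).digest("hex") !== address) {
    throw new Error("content does not match its address");
  }
  return content;
}

const addr = put(Buffer.from("hello, distributed web"));
console.log(get(addr)?.toString()); // "hello, distributed web"
```

Because the address authenticates the content, any peer can serve it and the receiver can verify it locally - no certificate authority required for integrity.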
Latency is another challenge in distributed systems. One of the perks of centralized services is the ability to control the user experience, including the latency of the backbone network. In a fully distributed system, the backbone is the application network. Think BitTorrent: when there are no seeds to serve a file, that download crawls to a halt. In an ideal world and at scale, there are enough nodes (peers) to serve content quickly. This is why the Beaker browser has incorporated network sharing to distribute content more evenly.
What about compute-heavy distributed applications? We’ll need better protocols for sharing idle compute resources. If your CPU is 90% unused, why not give it back to the network? For more nostalgic 90’s software, think SETI@home, which uses spare CPU to find extraterrestrials. Nowadays, websites can cryptomine directly in your browser, but there are more formal distributed protocols emerging, like Golem.
These tools will help shape the future. Although interesting in their own right, the question may still remain: so what? We’re accustomed to new tools being introduced all the time. What does this mean for our development practices?
I believe most developers will come to need at least a basic understanding of distributed systems. A loose grasp of the vernacular will be the minimum requirement, and a good sense of how data is synchronized will be imperative.
This is no small ask. A developer will need to understand the principles of logical time, such as vector clocks and their lineage. An answer to a basic question about data persistence will involve an understanding of the CAP theorem and the tradeoffs inherent in distributed databases. Avoiding the headaches associated with synchronization will involve an understanding of CRDTs, their tradeoffs, and a sense of creativity around their use cases.
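To ground the vector-clock part of that vocabulary, here is a minimal sketch (names are illustrative). A vector clock maps node ids to event counts; comparing two clocks tells us whether one write causally precedes another or whether they are concurrent - the case that needs conflict resolution.

```typescript
type VectorClock = Record<string, number>;

// Record a local event on one node.
function tick(clock: VectorClock, nodeId: string): VectorClock {
  return { ...clock, [nodeId]: (clock[nodeId] ?? 0) + 1 };
}

type Order = "before" | "after" | "concurrent" | "equal";

// Element-wise comparison: a precedes b only if every entry of a is
// <= the corresponding entry of b. If each clock exceeds the other
// somewhere, the events are concurrent.
function compare(a: VectorClock, b: VectorClock): Order {
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aLess = false, bLess = false;
  for (const node of nodes) {
    const av = a[node] ?? 0, bv = b[node] ?? 0;
    if (av < bv) aLess = true;
    if (bv < av) bLess = true;
  }
  if (aLess && bLess) return "concurrent";
  if (aLess) return "before";
  if (bLess) return "after";
  return "equal";
}

// x happened before y; z happened independently on another node.
const x = tick({}, "alice"); // { alice: 1 }
const y = tick(x, "alice");  // { alice: 2 }
const z = tick({}, "bob");   // { bob: 1 }
console.log(compare(x, y));  // "before"
console.log(compare(y, z));  // "concurrent"
```

Wall-clock timestamps can't make this distinction reliably, which is why logical clocks sit underneath most synchronization schemes.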
None of this will happen overnight, or even within a few revolutions around the sun. It will take time to shift our centralized mindset towards decentralization and distribution.
I’m imagining a world where node discovery is hyper-localized. Where my cell phone can directly discover and connect to my neighbor’s cell phone. My neighbor’s cell phone may have a part of a file I'm looking to download, or they may be authorized to store an encrypted copy of data within an application I'm using.
Once our physical infrastructure enables localized connectivity, it’s game on. The browser is best poised to take advantage of it. It is the uniform application platform we’ve been striving toward for decades. The browser, and therefore the internet ecosystem, is primed for distribution.
When browser vendors adopt these technologies, we all get to reap the rewards. End-to-end encryption over a localized network will prevent unauthorized parties from sniffing our packets. Every user will own and store their own data, encrypted at rest. Developers will leverage existing distributed networks, reducing the cost of production. And our applications will be fault-tolerant, replicated, convergent, and offline-first - for free.
This is the world I want to live in. Where distributed tools are ubiquitous. Where we outfox the depraved and wealth is more uniform. Where I control my data. A world the pioneers of the internet always envisioned: accessible, secure, and distributed.