At Irori, we are always interested in exploring how new technologies can be applied to solve problems. Over the last few years, we have seen a range of new technologies emerge in the IT landscape. By spreading user data across thousands of collaborating Internet devices, these technologies challenge today’s centralized web. They provide unlimited and redundant data storage, as well as uncompromisable digital currency and digital contracts. They require no corporate cloud infrastructure, no large data centers and no server arrays. There are no admins and root accounts, only strong end-to-end cryptography. This is the distributed web, also known as Web3.
As mentioned, at Irori we’re focused on knowing all about the latest technologies and how to best apply them to deliver efficient, stable and fault-tolerant solutions to our customers. We use Kubernetes, Kafka and other key software products in the cloud to provide event-driven solutions to real customer problems. But the tech landscape is a continuously evolving realm that we need to keep track of. Every now and then there is a leap in tech, and before we know it our current way of solving problems and building solutions could be outdated.
One tech leap that is bound to dictate the future of IT is the advent of peer-to-peer networks and the representation of data as blockchains. We’ve already encountered some of them. Today there’s BitCoin, an independent and uncompromisable digital currency. There’s Ethereum, which hosts other digital currencies as well as digital contracts, equally uncompromisable and safe.
What started out with BitTorrent, the immensely successful global peer-to-peer file sharing network (which came about to circumvent copyright restrictions, in the absence of trusted media streaming services), has matured into new game-changing data storage and sharing technologies that are now making their way into the world of IT.
Let’s look at some features these technologies provide:
- Data redundancy – data is always at hand, since it is stored “everywhere”.
- Data integrity – no one other than the creator of some data can change it.
- Privacy – private/public cryptography ensures that data is only accessible to the publisher’s intended consumers.
- Resilience – strike out some nodes in the network and others will quickly take over their duties.
- Encryption, at rest and in transit – data being stored or transferred cannot be read by outsiders (except perhaps by someone with a billion-dollar quantum computer).
- No central servers and hubs – data is spread out everywhere, thus available from virtually anywhere in a snap.
- High bandwidth – the “everywhere” responds collectively to a query for data. Thousands contribute to the response of a single request.
With features like these on offer, we’d be fools not to look into how these technologies can be used to create high-quality solutions in new application areas.
Location based addressing – identifying data by its physical address
First, let’s look at today’s Internet and its most important application, the World Wide Web. Technically, the WWW is built on a combination of TCP/IP packet routing, HTTP data streams and DNS host naming.
Because of this design, the WWW suffers from one very serious flaw. The web is woven together with hyperlinks. A hyperlink is location based, meaning it points to a certain host location on the net where the content it denotes is supposed to be found.
However, a server may stop responding at its hyperlink address, for instance because it has gone offline, has been hacked or is otherwise malfunctioning, or because DNS no longer knows the proper IP address of the hostname. When that happens, the part of the web it makes up is torn away. The hostname can of course be changed to point to another IP address, maybe a backup machine, if one exists. Otherwise the link is broken and the content it represents cannot be retrieved.
When a host of a hyperlink goes offline or is suddenly unknown by DNS, the hyperlink is dead and a part of the web is torn away.
Research has found that the average lifespan of a web hyperlink is about 100 days from its inception. From a statistical point of view, we’re very lucky if it is valid for a year or more. Thus, the World Wide Web is by no means guaranteed to be the technology that hosts all the world’s knowledge. Quite the contrary, the location based web is in decay and has been so for quite some time.
From centralized to distributed data ownership
Today there’s an ongoing centralization of the Internet, where big corporations host all available data. Centralization does have some benefits, for instance making it easy to gather all the resources needed to provide complete services in one place, but it also means we involuntarily create single points of failure. As we saw from the dead hyperlink example above, should the precious central service location go offline, the service as a whole is unavailable. Centralization also raises the question of data ownership – do you as a user own the data you create, or does the hosting corporation own your data?
To increase availability and data redundancy, one can offload the need for centrally located hosts and services to more decentralized network structures. CDNs (Content Delivery Networks) are examples of such decentralization. Also regarded as decentralized solutions are the cloud providers, such as Amazon AWS, Google Cloud and Microsoft Azure, since they provide geographically separated data centers (known as regions) which cooperate to provide services deployed into them. Another category of services labeled as decentralized are the BitCoin and the Ethereum blockchain networks.
Web3 is the next generation of the web
With the advent of peer-to-peer networks we see a transition from centrally located data to a completely distributed structure, where the hosted data is redundantly stored and processed among all network participants. This structure is what is commonly referred to as the distributed web.
In the distributed web, there are no admin or root accounts that could be hacked into, there are no central servers, gateways and firewalls, nor any large and vulnerable storage arrays. Instead all data is distributed across thousands, and in the long run millions of collaborating computers and devices.
In short, the distributed web is the complete opposite of the aggregated, centralized and vulnerable web we have today. For this and other good reasons, the distributed web has boldly been termed Web3, meaning the next generation of the web.
How does Web3 work then?
The distributed Web3 is made of some rather recently emerged software initiatives, such as BitCoin, Ethereum and IPFS. They serve different purposes but they are all made possible by leveraging a number of powerful IT technologies. These are by no means new; some of them have existed for decades. But when combined and applied in new ways, they work miracles.
Peer-to-peer mesh networks
BitCoin, Ethereum and IPFS are examples of overlay networks that operate on top of the Internet. They use the Internet as a carrier of the information they host and manage.
Also, they’re all examples of peer-to-peer mesh networks. The word “mesh” tells us about the way they’re structured. A peer-to-peer mesh network is made of nodes. In this context, a node is a computing device connected to the Internet. A node device runs peer-to-peer software (like IPFS, BitCoin or Ethereum), making it part of that software’s particular peer-to-peer mesh.
Nodes only know the IP addresses of their peers, i.e. their immediate neighbor nodes on the Internet. Those peers in turn know of other, more distant neighbors, who in turn know of yet other nodes. Data in the mesh travels along paths of known neighbor relations.
Each node in a peer-to-peer network also has a unique identifier (which is not its IP address). This is used when storing and retrieving data in the mesh. We’ll come back to this later.
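The idea that data travels along paths of known neighbor relations can be sketched as a simple graph search. This is only an illustration: the node names and the `find_path` helper are made up for the example, and real mesh software does not have a global view of the network the way the `peers` dictionary does here.

```python
from collections import deque
from typing import Optional


def find_path(peers: dict, start: str, goal: str) -> Optional[list]:
    """Breadth-first search that only ever follows a node's own neighbor list."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in peers.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of neighbors connects start and goal


# Hypothetical mesh: each node lists only its immediate neighbors.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
route = find_path(peers, "A", "E")
```

Node A has never heard of node E, yet a route exists because B knows D and D knows E.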
Content based addressing – separating content from its location
In a peer-to-peer mesh there is no need for host names, host addresses and domain names that express the location of any given content.
Instead, content based addressing is used. Here, the binary content that is supposed to be stored in the mesh is used as input to a cryptographic hash algorithm, which produces a long number that, for all practical purposes, uniquely identifies the content. The identifier produced this way is used as a key to store the content in the mesh.
Not only do hashed content identifiers make sure that each piece of data is uniquely identified. They also ensure data integrity. Should the content be changed it no longer computes to the same identifier. This makes it pointless to try to secretly patch data stored in the mesh, since that would immediately be evident when checking retrieved content against its identifier.
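A minimal sketch of content based addressing, assuming SHA-256 as the hash algorithm and an in-memory dictionary standing in for the mesh. The `ContentStore` class is hypothetical and is not the API of IPFS or any other platform.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that keys every blob by the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The identifier is derived from the content itself, not from a location.
        cid = hashlib.sha256(content).hexdigest()
        self._blobs[cid] = content
        return cid

    def get(self, cid: str) -> bytes:
        content = self._blobs[cid]
        # Re-hash on retrieval: a tampered blob no longer matches its identifier.
        if hashlib.sha256(content).hexdigest() != cid:
            raise ValueError("content does not match its identifier")
        return content


store = ContentStore()
cid = store.put(b"hello distributed web")
retrieved = store.get(cid)
```

Storing the same bytes twice yields the same identifier, and any change to the stored bytes makes `get` fail instead of silently returning altered data.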
Distributed hash tables – an easy way to find information in a large collection of data
A peer-to-peer mesh of nodes makes up a big storage space which is referred to as a distributed hash table, or DHT for short. Since all nodes collaborate to store content the storage space is virtually infinitely large.
Which nodes should store a given piece of content is determined by their node identifiers. When storing content, the mesh software uses the content identifier to find nodes in the mesh whose node identifiers start with the same digits as the content identifier. Not one but several nodes can be targeted for the content this way. This is how data redundancy in a peer-to-peer mesh is accomplished.
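One common way to express "identifiers that start with the same digits" is XOR distance, as used in Kademlia-style hash tables. The sketch below assumes that metric, 256-bit identifiers and three replicas; the function names and the 50-node mesh are invented for the example, and a real DHT finds these nodes by asking peers rather than by sorting a global list.

```python
import hashlib


def make_id(data: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer identifier."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_nodes(content_id: int, node_ids: list, replicas: int = 3) -> list:
    # XOR distance is small when the leading bits of two identifiers agree,
    # so sorting by it favours nodes whose identifiers start like the content's.
    return sorted(node_ids, key=lambda node_id: node_id ^ content_id)[:replicas]


# Hypothetical mesh of 50 nodes; the names are only used to derive identifiers.
nodes = [make_id(f"node-{i}".encode()) for i in range(50)]
content_id = make_id(b"hello distributed web")
targets = closest_nodes(content_id, nodes)
```

Because every node can compute the same ordering from the content identifier alone, a reader can later work out which nodes to ask without any central index.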
Block chains and Merkle DAGs
Content based addressing is the fundamental way to ensure data integrity. But there are also some other ways to be absolutely sure content cannot be altered.
A block chain is a data structure with which one can build up arbitrarily long sequences of content. Each piece in the chain is called a block. When a block is stored in the mesh it also encodes the content identifier of its preceding block. Should anyone try to modify a block, the chain is broken, since the modified block now has a different identifier than the one recorded in the block that follows it.
Block chains are ideal for representing ledgers of any kind, for instance a digital wallet and all of its transactions. They are what make digital currencies like BitCoin and Ethereum possible (not forgetting to mention the very energy consuming mining processes used to verify the cryptocurrency blockchains).
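The chaining of blocks can be sketched in a few lines, assuming SHA-256 over a JSON encoding as the block identifier. The field names `prev` and `payload` are made up for the example, and the sketch leaves out mining and consensus entirely.

```python
import hashlib
import json


def block_id(block: dict) -> str:
    # Canonical JSON (sorted keys) so the same block always hashes the same way.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, payload: str) -> None:
    # Each new block records the identifier of the block before it.
    prev = block_id(chain[-1]) if chain else None
    chain.append({"prev": prev, "payload": payload})


def is_valid(chain: list) -> bool:
    # Every block must reference the identifier of its actual predecessor.
    return all(
        chain[i]["prev"] == block_id(chain[i - 1]) for i in range(1, len(chain))
    )


ledger = []
for entry in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(ledger, entry)
valid_before = is_valid(ledger)

ledger[1]["payload"] = "bob pays carol 200"  # tamper with a middle block
valid_after = is_valid(ledger)
```

Note that this check only catches changes to blocks that have a successor. To protect the newest block as well, its identifier has to be known from somewhere trustworthy.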
If one wants to store a lot of data below a single content identity, another technique known as Merkle Directed Acyclic Graphs, or Merkle DAGs, is used. When storing a large file into the mesh, let’s say 1GB worth of data, it is subdivided into blocks that are arranged into a tamper-evident tree structure of blocks and their identities. In the next article in this series we’ll take a look at how this is done in more detail.
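As a rough illustration of the idea, the sketch below builds a plain binary Merkle tree over fixed-size chunks and returns the root identifier. The tiny chunk size, the pairing scheme and the function names are assumptions made for the example and do not reflect how any particular platform lays out its DAG.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(data: bytes, chunk_size: int = 4) -> str:
    # Leaves: one hash per fixed-size chunk of the input.
    level = [sha(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    if not level:
        level = [sha(b"")]  # empty input still gets an identifier
    # Parents: hash of the concatenated child hashes, until one root remains.
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [sha(b"".join(pair)) for pair in pairs]
    return level[0].hex()


root = merkle_root(b"some large file contents")
changed = merkle_root(b"some large file c0ntents")
```

Changing a single byte changes the hash of its chunk, which changes every parent hash on the way up, so the root identifier vouches for the whole file.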
As powerful as they may be, all of the techniques described above wouldn’t be worth much if content was stored in the clear. In such a case it could be read and understood by unknown intruders.
To ensure privacy and also full control over who can read what, strong cryptography is used. It is based on public-key cryptography, such as RSA, where a pair of digital keys is used to encrypt and decrypt data.
Keys come in pairs: what has been locked with one key of a pair can only be unlocked with the other. To keep content private, the producer encrypts it with the intended consumer’s public key, and only the holder of the matching private key can decrypt it. In the other direction, the producer can sign content with their own private key, so that anyone holding the corresponding public key can verify who produced it. Since encryption and decryption take place at the producer and consumer ends, the content is secure both at rest and in transit.
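The way a key pair works can be illustrated with textbook RSA on tiny numbers. This is purely a teaching sketch: the primes are far too small, there is no padding, and real systems use vetted cryptographic libraries instead. The helper name `apply_key` is made up for the example.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Raise value to the given key exponent modulo n."""
    return pow(value, exponent, n)


message = 65

# Privacy: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e)
decrypted = apply_key(ciphertext, d)

# Authenticity: sign with the private key, verify with the public key.
signature = apply_key(message, d)
verified = apply_key(signature, e)
```

Both round trips return the original message, which is the sense in which the two keys of a pair undo each other.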
Distributed applications on Web3
We have mainly covered how data is stored and secured in peer-to-peer mesh networks, such as IPFS. One could easily get the impression that it’s all about providing an alternative file storage in the cloud. But that is just one of the many possibilities associated with Web3.
The ecosystem around the Web3 platforms offers APIs for bucket data storage, key-value databases, pub/sub event processing, conflict-free replicated data types (CRDTs) and much more, all global and all inherently secure, but without the cloud providers.
If we choose to make a solution based on a distributed, peer-to-peer mesh network we’d be making a distributed web application, a dApp for short.
Examples of dApps
What dApps and distributed solutions can we build with these technologies? The answer is – pretty much any application you’d implement using current cloud provider technologies and APIs.
With a bit of good work one could make a Facebook clone, looking and working like the original, but without the data centers, servers, behavior algorithms, censorship and illegitimate data sharing.
A team collaboration solution that works like SharePoint together with Teams is entirely feasible as a distributed application. Documents and images would be stored safely encrypted in the mesh, their producers and consumers well identified, away from all prying eyes, intruders and server attacks.
The technologies described above lend themselves very well to hosting a robust, tamper-resistant application for managing personal identity and storing sensitive, truly personal information. Such an app could be used as a source for delegated authorization to other systems (like “login with Facebook”).
To sum up – there is more to learn
We have seen how distributed networks work, their remarkable features and capabilities, and what they can be used for. This gives us some insight into, and inspiration for, an alternative way of providing powerful business applications.
At Irori, we constantly explore how new technologies can help us solve problems. With distributed applications we can open a whole new way of implementing systems and solutions. We will continue to follow developments and reflect on how solution design can be adapted to make use of technological evolution.
In the next article of this series we’ll take a deep dive into IPFS, the InterPlanetary File System, which is one of Web3’s most powerful platforms, to learn how it works and what can be done using it. Stay tuned.