The process of mining bitcoins is similar to a lottery. Bitcoin miners compete to produce hashes: alphanumeric strings of fixed length that are calculated from data of arbitrary length. They produce the hashes from a combination of three pieces of data: new blocks of Bitcoin transactions, the last block on the blockchain, and a random number. In this article, we will analyse hashing and the digital signature process to understand their uses beyond Bitcoin transaction verification alone.
A Small Introduction
Diving right into some technical details, made as simple as possible: the first step to signing anything digitally is to hash the contents of what is being signed. Hashing securely is not easy, and the 256-bit secure hash algorithm (SHA-256) is designed to provide two critical guarantees.
First, a brute force attack on the hashing algorithm must be as computationally expensive as going through all binary combinations of 256 bits. It is currently unrealistic to expect an attacker to complete such a search, let alone for the sake of a small Bitcoin transaction, since 2^256 combinations exist for 256 binary bits (2 multiplied by itself 256 times). That is a number with 78 decimal digits.
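To get a feel for the size of that search space, here is a rough back-of-the-envelope sketch. The guess rate is a made-up figure chosen only for illustration, not a measurement of any real attacker or network, and the helper name is hypothetical.

```python
# Rough scale of a brute-force search over every 256-bit value.
# ASSUMPTION: GUESSES_PER_SECOND is an invented figure for illustration only.
SEARCH_SPACE = 2 ** 256
GUESSES_PER_SECOND = 10 ** 18
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_to_exhaust(space: int, rate: int) -> int:
    """Whole years needed to try every value in `space` at `rate` guesses per second."""
    return space // (rate * SECONDS_PER_YEAR)


digits = len(str(SEARCH_SPACE))
years = years_to_exhaust(SEARCH_SPACE, GUESSES_PER_SECOND)
print(f"2^256 has {digits} decimal digits")
print(f"years needed at the assumed rate: a number with {len(str(years))} digits")
```

Even if the assumed rate were off by many orders of magnitude, the number of years would remain far beyond any practical horizon.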
Second, a different kind of brute force attack on the hashing algorithm, one that looks for any two inputs sharing the same hash, must also be computationally infeasible. Its expected cost follows from a provable formula (the birthday bound): roughly 2^128 attempts for a 256-bit hash, far fewer than 2^256 but still out of reach. Collisions must exist in principle, because the set of possible inputs (SHA-256 accepts messages of up to 2^64 - 1 bits) is vastly larger than the set of 256-bit outputs.
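The cost of this second kind of attack is commonly estimated with the birthday bound. The sketch below computes the exact chance that at least two of a number of uniformly random hash values coincide, for hash sizes small enough to loop over. The function name and the 16-bit example are illustrative choices, and real hash outputs are only assumed to behave like uniform random values.

```python
def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Chance that at least two of `num_inputs` uniformly random
    `hash_bits`-bit values are equal (the birthday problem)."""
    buckets = 2 ** hash_bits
    p_all_unique = 1.0
    for i in range(num_inputs):
        # The i-th value must avoid the i values already taken.
        p_all_unique *= max(0.0, 1 - i / buckets)
    return 1 - p_all_unique


# For a 16-bit hash there are 65,536 possible outputs, yet a few hundred
# inputs already make a collision likely.
for n in (100, 256, 500, 1000):
    print(n, round(collision_probability(n, 16), 3))
```

The same square-root effect is why the collision search for a 256-bit hash is estimated at around 2^128 attempts rather than 2^256.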
Collision? What is That?
Consider 9 inputs and a 3-bit hash with 8 “holes”. The example uses a 3-bit hash rather than a 256-bit one for simplicity and clarity. There are 2*2*2=8 combinations for 3 binary bits (each is either 0 or 1). If there are 9 items to hash, then at least two inputs must lead to the same 3-bit output: there are more inputs than available outputs, so some must share. Two inputs that hash to the same output are called a collision.
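The 3-bit example can be tried directly. The sketch below builds a toy 3-bit hash by keeping only the top 3 bits of a SHA-256 digest (an illustrative construction, not something used in practice) and feeds it 9 arbitrary inputs; the input names are hypothetical.

```python
import hashlib


def tiny_hash(data: bytes, bits: int = 3) -> int:
    """Toy hash: keep only the top `bits` bits (at most 8) of the SHA-256 digest."""
    first_byte = hashlib.sha256(data).digest()[0]
    return first_byte >> (8 - bits)


def find_collision(inputs, bits: int = 3):
    """Return (first_input, second_input, shared_hash) for the first collision found."""
    seen = {}
    for item in inputs:
        h = tiny_hash(item, bits)
        if h in seen:
            return seen[h], item, h
        seen[h] = item
    return None  # only possible when there are no more inputs than outputs


# 9 distinct inputs, but only 8 possible 3-bit outputs.
inputs = [f"item-{i}".encode() for i in range(9)]
first, second, shared = find_collision(inputs)
print(first, "and", second, "both hash to", format(shared, "03b"))
```

With 9 inputs and 8 possible outputs the search cannot come back empty, whichever inputs are chosen.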
With hashing better understood, back to how it’s used: signing legal contracts is one use, but miners of Bitcoin and forgers on networks such as Ethereum also rely heavily on hashing for their livelihoods. Cryptocurrency investors rely on it too, indirectly, by the transitive property (simplified: A=B, and B=C, ergo A=C): they depend on the network, and the network depends on hashing.
Tampering with a digitally signed legal contract is easy to detect, which makes it pointless to attempt. Even a single character change generates a drastically different hash with SHA-256, for example:
“bitcoin blockchain” hashes to: b43636e6232a977b6a614c93da701f938f9faa90d355a74d71aa8210474c8ebf
“Bitcoin blockchain” hashes to: 7c96cf30947914ab1d9844d93707baf2435f9d9b290c8258622ab635054c8041
Just a single character difference, the capital “B”, and the hashes are completely different sets of bits.
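This effect is easy to reproduce with Python's standard `hashlib` module. The sketch below hashes both strings and counts how many of the 256 output bits differ. The helper names are illustrative, and the printed digests depend on the exact bytes hashed (encoding, quote characters, trailing newlines), so they may or may not match the values quoted above.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 digest of the UTF-8 bytes of `text`, as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two hex-encoded digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = sha256_hex("bitcoin blockchain")
upper = sha256_hex("Bitcoin blockchain")
print(lower)
print(upper)
print(differing_bits(lower, upper), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip after any change to the input, however small.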
Proof of Work vs Proof of Stake
For the proof of work (PoW) method of distributing computational work across a blockchain network like Bitcoin, hashing is used heavily in the reverse direction: instead of computing the hash of a known input, miners search for an input whose hash has a required form. As described before, a brute force search is computationally intensive, so sub-problems of a full attack can be made measurably difficult, for example finding an input whose hash falls below a given target. By periodically adjusting the problem difficulty, Bitcoin operates on a proof of work model in which miners prove they have spent considerable resources and power on operating the network by solving these very hard problems (and, by being first, claiming a “first comer’s reward”). This model is extremely energy inefficient, however, and the energy use is a side effect rather than a goal of Bitcoin and blockchain technology.
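A toy version of this search can be sketched in a few lines. It is a simplification under stated assumptions: real Bitcoin mining applies SHA-256 twice to an 80-byte block header, whereas this sketch hashes a plain string once, and the function names, field separator, and sample transaction are all hypothetical.

```python
import hashlib


def block_hash(previous_hash: str, transactions: str, nonce: int) -> bytes:
    """Hash the three ingredients: last block, new transactions, and a number."""
    payload = f"{previous_hash}|{transactions}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).digest()


def meets_target(digest: bytes, difficulty_bits: int) -> bool:
    """True when the digest, read as a 256-bit number, has `difficulty_bits` leading zero bits."""
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(previous_hash: str, transactions: str, difficulty_bits: int,
         max_nonce: int = 2_000_000):
    """Try nonces in order until one produces a hash below the target."""
    for nonce in range(max_nonce):
        digest = block_hash(previous_hash, transactions, nonce)
        if meets_target(digest, difficulty_bits):
            return nonce, digest.hex()
    return None  # gave up within max_nonce attempts


result = mine("00" * 32, "alice pays bob 1 BTC", difficulty_bits=12)
print(result)
```

Finding a valid nonce takes many attempts on average (about 2^12 here), while checking one takes a single hash. Each extra bit of difficulty doubles the expected search effort, which is what lets a network tune how hard the problem is.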
Proof of Stake (PoS) is, however, much more energy efficient. In a proof of stake network, those providing the network capacity are referred to as forgers rather than miners. Forgers are not rewarded any first comer’s fees; instead, they are selected through a deterministic system to provide computational resources to the network and in return forge their own new coins. In an economic sense, miners are rewarded for their work, while forgers literally mint their own money.
In both cases, though, hashing underpins the process by which new currency is distributed into the network, even though only proof of work relies on computationally intensive puzzles. Remember, Bitcoin will only ever mint 21 million coins, and many of those are already in circulation, since the block reward shrinks on a fixed, exponentially decaying schedule built into its code. Ethereum, a big PoS cryptocurrency, similarly has to limit its new currency outflow as forgers join the fray, since uncontrolled issuance could lead to undesired inflation.
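The 21 million figure follows from the way Bitcoin's block reward shrinks over time. The sketch below sums the rewards under parameters that are not stated in this article: an initial reward of 50 BTC, a halving every 210,000 blocks, and amounts tracked as whole satoshis (1 BTC = 100,000,000 satoshis). The function name is illustrative.

```python
# ASSUMPTIONS (not stated in the article): Bitcoin's issuance parameters.
SATOSHIS_PER_BTC = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC  # reward per block at launch
BLOCKS_PER_HALVING = 210_000            # blocks between reward halvings


def total_supply_satoshis() -> int:
    """Sum all block rewards until integer halving reduces the reward to zero."""
    total = 0
    reward = INITIAL_REWARD
    while reward > 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2  # integer halving: fractions of a satoshi are dropped
    return total


supply = total_supply_satoshis()
print(supply / SATOSHIS_PER_BTC, "BTC in total")
```

Because each period issues half as much as the one before, the series converges just below 21 million coins, and most of the supply is issued in the first few periods.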
Conclusion
So, hashing provides a robust mechanism for fingerprinting data and detecting changes to it, used in digital signing but in other applications as well. On top of this, though it is not the reason hashing came to be, its reverse problem (finding an input that produces a given kind of hash) is now being used to democratise the use of network resources on a blockchain-based network.
Benjamin is a passionate software engineer with a strong technical background, with ambitions to deliver a delightful experience to as many users as possible. He previously interned at Google, Apple and LinkedIn. He built his first PC at 15, and has recently upgraded to iOS/crypto-currency experiments. Benjamin holds a bachelor's degree in computer science from UCLA and is completing a master’s degree in Software Engineering at Harvard University.