What is a hash function
A hash function is any function that can be used to map data of arbitrary size into data of a fixed size. The input to the hash function is of arbitrary length but the output is always of fixed length. see Image 1.
Image 1. A cryptographic hash function allows verifying whether input data map onto a hash value.
If the input data is unknown, it is deliberately difficult to reconstruct it by knowing the stored hash value since it’s considered a one-way function. This is used for assuring the integrity of transmitted data
Note: Hash functions are related but often confused with checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently.
A hash procedure must be deterministic. For a given input value it must always generate the same hash value. If a cryptographic hash function were to produce different outputs each time the same input was entered, the hash function would be random and therefore useless
This is critical because if you get different hashes every single time it will be impossible to verify that the input value is the same.
Computationally Efficient (Performance)
Generally, for any hash function h with input x, computation of h(x) is a fast operation. To be useful hash functions must be computationally efficient. Computers must be able to perform a hash function’s mathematical labor in an extremely short period of time. If an ordinary computer needed several minutes to process a cryptographic hash function and receive the output, it would not be very practical.
Note: Computationally hash functions are much faster than symmetric encryption.
Defined range (Fixed Length Output)
A hash function converts data of arbitrary length to a fixed length. It is often desirable that the output of a hash function has a fixed size. Cryptographic hash functions produce much larger hash values, in order to ensure the computational complexity of brute-force inversion. For example, SHA-512, one of the most widely used cryptographic hash functions, produces a 512-bit value.
In general, the hash is much smaller than the input data. Producing fixed-length output from variable length input can be accomplished by breaking the input data into chunks of a specific size. In cryptographic hash functions, these chunks are processed by a one-way compression function, with the last chunk being padded if necessary. The size of these chunks is called block size and may differ from the output size. SHA-512, the example from the previous paragraph, has a block size of 1024 bits.
Note: Hash function with n bit output is referred to as an n-bit hash function. Popular hash functions generate values between 160 and 512 bits.
This property means that it should be computationally hard to reverse a hash function. The output of a cryptographic hash function must not reveal any information about the input. Changing one character in a long string of text must result in a radically different digest.
If a hash function h produced a hash value z, then it should be a difficult process to find any input value x that hashes to z. This property protects against an attacker who only has a hash value and is trying to find the input.
Note: This is the most usual property developers think of when they think of a cryptographic hash function. Unlike encryption, there should be no “dehashing” function. A good preimage resistant function should be “hard” to invert.
Second Pre-Image Resistance (Collision resistance)
This property means given input and its hash, it should be hard to find a different input with the same hash. If a hash function h for an input x produces hash value h(x), then it should be difficult to find any other input value y such that h(y) = h(x).
Since a hash function is a compressing function with fixed hash length, it is impossible for a hash function not to have collisions (since the input has infinite possibilities). This property of collision-free only confirms that these collisions should be hard to find. This property makes it very difficult for an attacker to find two input 2 values with the same hash. Also, the property protects against an attacker who has an input value and its hash and wants to substitute different value as the legitimate value in place of the original input value.
Note: it’s important to reiterate that hashes are “digests”, and not “encryption” algorithms. Encryption is a two-way operation. That is, given a message m, an encryption algorithm enc and its corresponding decryption algorithm dec, dec(enc(m)) = m. Hash functions are different, there is no way to “decrypt” a hashed message, per preimage resistance. Because of this, it is common to say that hash functions are one-way operations.