A hash function in cryptography is a mathematical function that converts an input value into a compressed numerical value. The input to the hash function can be of arbitrary length, but the output is always of fixed length.
A hash function is also known as a hashing algorithm or message digest function.
To put it simply, a hash function takes a group of characters and maps it to a value of certain length.
That is to say, for any x input value, you will always get the same y output value whenever the hash function is run.
f(x) = y
This means every input has a predetermined output. The hash value represents the original string of characters but is usually smaller than the original. The value you get after processing a set of data through a hash function is called a hash value or message digest.
Note that the input can be any data, such as numbers, text, or files. The process of transforming a given set of data into a specific hash value is called hashing the data. The hash value or message digest is a fixed-length sequence of bits that is usually written as a hexadecimal number.
Since the resulting value after hashing is usually much smaller than the data that passed through the function, hash functions act like compression functions, although the original data cannot be recovered from the hash.
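To make the fixed-length and "same input, same output" ideas concrete, here is a minimal sketch using Python's standard hashlib module. SHA-256 is only an assumed example algorithm, and the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"abc"
long_input = b"abc" * 100_000  # 300,000 bytes of input

# Same input, same output: hashing is deterministic.
print(sha256_hex(short_input) == sha256_hex(short_input))  # True

# Inputs of very different sizes give digests of the same length:
# 256 bits, written as 64 hexadecimal characters.
print(len(sha256_hex(short_input)))  # 64
print(len(sha256_hex(long_input)))   # 64
```

The hexadecimal string is just a convenient way to display the digest; the underlying value is a fixed-length sequence of bytes.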
Difference between cryptographic hash functions and normal hash functions
Cryptographic hash functions are often referred to simply as “hash functions”, but that’s not strictly correct. “Hash function” is a generic term that encompasses cryptographic hash functions along with other sorts of algorithms, like cyclic redundancy checks.
What is the usage of a cryptographic hash function?
A cryptographic hash function is run on data such as an individual file or a password to produce a value called a checksum. Cryptographic hash functions are also used to verify that a piece of data has not been altered.
For example, two files can be assumed to be identical only if the checksums generated from each file, using the same cryptographic hash function, are identical.
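The file comparison described above can be sketched as follows. This is an illustration under assumptions, not a prescribed method: SHA-256 is an arbitrary choice, and file_checksum and same_content are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """Compare two files by checksum, using the same hash function for both."""
    return file_checksum(first) == file_checksum(second)


# Usage: write three small files into a temporary directory and compare them.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "original.txt")
    copy = Path(tmp, "copy.txt")
    edited = Path(tmp, "edited.txt")
    original.write_bytes(b"hello hash functions\n")
    copy.write_bytes(b"hello hash functions\n")
    edited.write_bytes(b"hello hash functions!\n")

    print(same_content(original, copy))    # True
    print(same_content(original, edited))  # False
```

Both files have to be hashed with the same function; checksums from different algorithms are not comparable.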
What is the purpose of hashing?
A hash function is used to index the original value or key, and the hash is then used each time the data associated with that value or key is to be retrieved. Hashing is a one-way operation: the original value is never reconstructed from the hash, so there is no need to reverse the hash function by analyzing the hashed values.
Hashing is also used for indexing and locating items in a database, since it is much easier to find a shorter hash value than the original string of data.
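As a rough sketch of hash-based indexing, the hypothetical TinyHashIndex class below uses a key's hash to pick one of a small number of buckets, so a lookup only has to search that bucket. SHA-256 is used here only to keep the example reproducible; real hash tables and database indexes normally rely on much faster non-cryptographic hash functions.

```python
import hashlib


class TinyHashIndex:
    """A toy key/value index that picks a bucket from the key's hash."""

    def __init__(self, bucket_count: int = 16) -> None:
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key: str) -> list:
        # Turn the first 8 bytes of the digest into a bucket position.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % len(self._buckets)
        return self._buckets[position]

    def put(self, key: str, value) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str, default=None):
        # Only the one bucket chosen by the hash is searched.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


index = TinyHashIndex()
index.put("alice@example.com", {"name": "Alice"})
print(index.get("alice@example.com"))  # {'name': 'Alice'}
print(index.get("bob@example.com"))    # None
```

Different keys can land in the same bucket, which is why each bucket stores the full key next to its value.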
Can hash functions be reversed?
Hash functions are designed around one key property: it should be computationally difficult to reverse a hash, but extremely easy to compute the hash of any data. This makes it hard to come up with an input that produces a specific hash.
Another reason why you can’t reverse a hash function is that, whenever the input is longer than the output, most of the data is lost during the hashing process.
In addition to being one-way and irreversible, hash functions should make it infeasible to find two different pieces of data with the same hash value. If two pieces of data result in the same hash value, a hash collision has occurred. Because the output has a fixed length, collisions must exist; a function is considered unfit for cryptographic use once collisions can actually be found in practice.
Features of Hash Functions
- Fixed length output (hash value). Any hash function should be able to convert data of arbitrary length to a fixed length. This process is called hashing of data. Most hash functions in common use generate values between 128 and 512 bits.
- Efficiency of operation. For any hash function, given an input x, computing the hash value should be very fast.
Properties of hash functions
- Pre-image resistance. This simply means that it should be very hard to reverse a hash function. For example, if a hash function k produces a hash value p, then it should be very difficult to find any input value y that hashes to p.
This property protects against an attacker who only has a hash value and is trying to find the input.
- Second preimage resistance. This means that, given an input and its hash, it should be very hard to find a different input with the same hash. For example, if a hash function k for an input x produces hash value k(x), then it should be difficult to find any other input value y such that k(y) = k(x).
This property protects against an attacker who has an input value and its hash and wants to substitute a different value for the original input while passing it off as legitimate.
- Collision resistance (collision-free hash function). This means it should be hard to find two different inputs of any length that result in the same hash. For example, for a hash function k, it should be difficult to find any two different inputs x and y such that k(x) = k(y).
This property makes it hard for an attacker to find two input values with the same hash.
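To see why the size of the output matters for collision resistance, here is a small experiment under stated assumptions: it deliberately weakens SHA-256 by keeping only the first 2 bytes (16 bits) of each digest. The helper names truncated_hash and find_collision are made up for this sketch. With only 65,536 possible values, a collision turns up almost immediately; with the full 256-bit output the same search is not feasible.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, size_bytes: int = 2) -> bytes:
    """A deliberately weak hash: the first size_bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:size_bytes]


def find_collision(size_bytes: int = 2) -> tuple:
    """Try distinct messages until two share the same truncated hash."""
    seen = {}
    for i in count():
        message = f"message-{i}".encode("ascii")
        tag = truncated_hash(message, size_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message


first, second = find_collision()
print(first != second)                                   # True
print(truncated_hash(first) == truncated_hash(second))   # True
```

The search is guaranteed to stop: after 65,537 distinct messages, at least two of them must share one of the 65,536 possible 16-bit values.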
Uses of cryptographic hash functions
- Hashing is used with a database to enable items to be retrieved more efficiently. It is also used for password storage, where the database keeps a hash of each password instead of the password itself.
- Hashing is also used in digital signatures: the message is hashed, and the signature is created and verified over the hash rather than over the whole message.
- A hash function is used in folding: taking an original value, dividing it into several parts, then adding the parts and using the last four digits of the sum as the hashed value or key.
- It’s also used in digit rearrangement: taking the digits in certain positions of the original value, such as the third and sixth numbers, reversing their order, and using the resulting number as the hashed value.
- Data integrity checks. Hash functions are used to generate checksums on data files, which gives users assurance about the correctness of the data.
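The folding method from the list above can be sketched like this. The function name folding_hash and the choice of 4-digit parts are assumptions for illustration. Note that folding is a simple indexing technique, not a cryptographic hash: collisions are easy to construct.

```python
def folding_hash(value: str, part_length: int = 4) -> int:
    """Fold the digits of value into a key between 0 and 9999."""
    # Keep only the ASCII digits, so separators such as dashes are ignored.
    digits = "".join(ch for ch in value if ch in "0123456789")
    # Divide the digit string into parts of part_length digits each.
    parts = [digits[i:i + part_length] for i in range(0, len(digits), part_length)]
    # Add the parts, then keep the last four digits of the sum.
    total = sum(int(part) for part in parts)
    return total % 10_000


# 1234 + 5678 + 9012 = 15924, and the last four digits are 5924.
print(folding_hash("1234-5678-9012"))  # 5924
```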
Popular hash functions
- Hashed message authentication code (HMAC). Combines authentication via a shared secret with hashing.
- Message Digest 2 (MD2). A byte-oriented function that produces a 128-bit hash value from an arbitrary-length message; designed for smart cards.
- MD4. Similar to MD2, designed specifically for fast processing in software.
- MD5. Similar to MD4 but slower because the data is manipulated more. Developed after potential weaknesses were reported in MD4.
- Secure hash algorithm (SHA). Modeled after MD4 and proposed by NIST for the Secure Hash Standard (SHS). The original algorithm (SHA-1) produces a 160-bit hash value.
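As an example of the first entry in this list, here is a minimal HMAC sketch using Python's standard hmac and hashlib modules. The secret, the message, and the helper names make_tag and verify_tag are all made up for illustration, and SHA-256 is an assumed choice of underlying hash.

```python
import hashlib
import hmac


def make_tag(secret: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for message under the shared secret."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_tag(secret: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(secret, message)
    return hmac.compare_digest(expected, tag)


secret = b"example-shared-secret"  # hypothetical key known to both parties
message = b"transfer 100 to account 42"
tag = make_tag(secret, message)

print(verify_tag(secret, message, tag))                         # True
print(verify_tag(secret, b"transfer 900 to account 42", tag))   # False
```

Unlike a plain checksum, the tag can only be produced or checked by someone who knows the shared secret, which is what adds authentication on top of hashing.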
Characteristics of a good hash function
- The hash value should be fully determined by the data being hashed.
- The hash function should use all the input data.
- The hash function should uniformly distribute the data across the entire set of possible hash values.
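One informal way to observe these characteristics is to change a single character of the input and count how many bits of the digest change. The sketch below assumes SHA-256 as the example function, and differing_bits is a hypothetical helper name.

```python
import hashlib


def differing_bits(first: bytes, second: bytes) -> int:
    """Count the bit positions where the two SHA-256 digests differ."""
    a = int.from_bytes(hashlib.sha256(first).digest(), "big")
    b = int.from_bytes(hashlib.sha256(second).digest(), "big")
    return bin(a ^ b).count("1")


# Identical data always gives an identical digest, so no bits differ.
print(differing_bits(b"hash function", b"hash function"))  # 0

# Changing only the last character changes a large share of the 256 bits.
print(differing_bits(b"hash function", b"hash functiom"))
```

For a well-designed function, the second count should usually land somewhere around half of the 256 bits, because every part of the input influences the whole output.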
Now I want to hear from you.
What do you think of hash functions?
Or maybe I missed an important use of cryptographic hash functions.
Either way, let me know by leaving a comment below.