The Essence of ACID Properties
project photo
author photo
Iftekhar AhmedNov 17, 2022

ACID are the four properties a database should obey in order to maintain its integrity. ACID transactions guarantee that a database will be in a consistent state after running a group of operations.

Before we further explore ACID properties, it is important that we know about transactions.

Transaction

A transaction really is nothing but a collection of sequel queries that are treated as one unit of work. We're treating transactions as one unit of work because we cannot do everything we want in a single query.

For example, if you want to deposit some money from one account to another account first you need to SELECT the money from the first account. After selecting the account you do an UPDATE to that account to deduct let’s say $100. Then you SELECT the second account and UPDATE the second account so you are doing (SELECT, UPDATE, SELECT, UPDATE), which is a part of one single transaction.

So a transaction consists of one or more queries, which is considered as one unit of work.

ACID Properties

ACID stands for Atomicity Consistency Isolation and Durability. Let’s explore them in detail.

Atomicity

Atomicity means “All queries in a Transaction must succeed. And if one query fails, then all prior successful queries in the transaction should roll back to the initial state”. Even if the database went down during a mid-transaction it should be able to roll back to its initial state.

Example: Imagine you have two accounts. You want to deposit 100$ from account1 ( which has 1000$ ) to account2 (which has 500$ ). So your transaction should look something like this, first, you SELECT account1 and deduct 100$ from it (UPDATE). Then SELECT account2 and UPDATE.

But what if after deducting 100$ your database crashes? Now, account1 has 900$ but account2 is still 500$ which is really bad. We don't want our database to behave this way. If our database follows Atomicity then it will roll back to the initial state where account1 = 1000$ and account2 = 500$. Which can save us from losing 100$ from thin air.

Consistency

Consistency means the data of a database should be consistent. Meaning the data we are using in some parts of our database should be consistent throughout the whole database.

Consistency can be divided into two categories. Consistency in Data and Consistency in Reads. Let’s explore both of them with some examples.

Consistency in Data

Example: Let's say we have two data tables

project photo

here we can see that Pic_id_1 has 10 likes but only A, C, D, and E like Pic_id_1. Therefore the data is not consistent because the number of likes didn’t match the total user who liked that picture.

Consistency in Reads

Let's say you have a database X and it has multiple instances. For example, my main database is located in the US and I have two more instances of that same database one in Singapore and another in Dubai. Now if I update something in my US database and at the same time, someone requests that exact data from the Singapore database what will happen?

project photo

The Singapore database will show the previous data ( Though eventually, it will sync with the US database. This is called Eventual Consistency). Which leads to inconsistency. So whenever we try to cache some data it is eventually going to have inconsistency. So what can we do? Well, it actually depends on what that data is being used for. If I wanted to show the likes of Cristiano Ronaldo’s recent Instagram photo, It doesn’t actually matter if a user sees 11334563 or 11334521. But if the data is being used for any money transaction then all the small details will matter.

Isolation

Isolation is a property that guarantees the individuality of each transaction and prevents them from being affected by other transactions. It ensures that transactions are securely and independently processed at the same time without interference. Perfect isolation ensures that all concurrent transactions will not affect each other.

Example:

project photo

Let’s say user A and user B requested a withdrawal from a Database at the same time. If both transactions happen concurrently while making a request, the database will have 1000$ in the account for both transactions so user A will SELECT 1000$ also user B will SELECT 1000$ and both will deduct 100$ from 1000$. So, after both transaction Database will have 900$ left, which is not correct. To fix this issue user B or user A needs to wait for the other users to finish his/her transaction first.

There are several ways that a transaction can be interfered by other transactions that runs simultaneously with it. This interference will cause something we call Read Phenomena.

Let’s talk about some Read Phenomena that can occur because of a low level of transaction isolation

Read Phenomena

  • Dirty Read: It happens when a transaction reads data written by another concurrent transaction that has not been committed yet. This is terribly bad because we don’t know if that other transaction will eventually be committed or rolled back. So we might end up using incorrect data in case rollback occurs.
  • Non-repeatable Read: When a transaction reads the same record twice and sees different values because the row has been modified by another transaction that was committed after the first read.
  • Phantom Read: It is a similar phenomenon but affects queries that search for multiple rows instead of one. In this case, the same query is re-executed, but a different set of rows is returned, due to some changes made by other recently-committed transactions, such as inserting new rows or deleting existing rows which happen to satisfy the search condition of the current transactions query.
  • Serialization Anomaly: It’s when the result of a group of concurrent committed transactions could not be achieved if we try to run them sequentially in any order without overlapping each other.

To fix these Read Phenomena there are 4 standard Isolation Levels.

Isolation Levels

  • Read Uncommitted: Transactions in this level can see data written by other uncommitted transactions, any changes from the outside are visible to the transaction whether committed or not.
  • Read Committed: where transactions can only see data that has been committed by other transactions.
  • Repeatable Read: The transaction will make sure that when a query reads a row, that row will remain unchanged while it is running.
  • Serializable: Transactions are run as if they are serialized one after the other.

Here is a table from Wikipedia that shows what may or may not occur in an Isolation level

project photo

Durability

Durability is a property that enforces completed transactions, guaranteeing that once each one of them has been committed, it will remain in the system even in case of subsequent failures.

If a transaction is successful, all changes generated by it are stored permanently.

But if we save a snapshot of the database every time we WRITE, the WRITE operation will become slow. So there is always a tradeoff.