Wednesday, 5 October 2016


CAP and ACID are two distinct terms that come up often in the database world. ACID describes the properties of database transactions, while CAP describes the nature of a distributed system.
ACID stands for
  • Atomicity
  • Consistency
  • Isolation
  • Durability
Before going into the meaning of these terms, one should be familiar with transactions. A transaction is a unit of work that may consist of multiple write operations or just a single operation. Treating a set of operations as one unit is what enables a database to maintain the ACID properties.
Atomicity – Either all statements/queries inside a transaction take effect or none of them do. In layman's terms, suppose you start a transaction containing 5 queries. The database makes sure that either all 5 queries succeed or none of them does. So if the 5th query fails, the effects of the previous 4 queries are rolled back automatically.
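The five-query scenario above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `items` table and the `run_atomically` helper are hypothetical names, and the fifth statement is made to fail by reusing a primary key.

```python
import sqlite3

def run_atomically(conn, statements):
    """Run every statement in one transaction; undo all of them if any fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

insert = "INSERT INTO items (id, name) VALUES (?, ?)"
statements = [(insert, (i, f"item{i}")) for i in range(1, 5)]  # 4 valid inserts
statements.append((insert, (1, "duplicate")))  # 5th fails: id 1 already used

ok = run_atomically(conn, statements)
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(ok, row_count)
```

Because the fifth insert violates the primary key, the four earlier inserts are rolled back and the table stays empty.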
Consistency – This ensures that whatever data you put into your database does not violate the rules you have defined for it, such as constraints on a column or table. This also includes rules that span related data, for example references between tables.
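As a small sketch of such a rule, the example below declares a CHECK constraint with Python's sqlite3 module. The `accounts` table, the "balance must not go negative" rule, and the `withdraw` helper are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the rule to enforce
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Return True if the withdrawal was applied, False if it broke a rule."""
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = withdraw(conn, 1, 30)    # leaves 70, allowed
rejected = withdraw(conn, 1, 500)   # would leave -430, refused by the database
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(accepted, rejected, balance)
```

The second withdrawal is refused by the database itself, so the stored balance never enters a state that violates the rule.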
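Isolation – Every transaction runs independently of other transactions. No transaction can see data that another transaction is modifying until that transaction finishes. In practice, databases offer configurable isolation levels, and the weaker levels relax this guarantee.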
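A rough way to observe this is to open two connections to the same SQLite file and check what each one sees while a write is still uncommitted. The sketch assumes SQLite's default journal mode; the `orders` table and the `count_orders` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "isolation.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
writer.commit()

reader = sqlite3.connect(path)  # a second, independent connection

def count_orders(conn):
    # fetchall() consumes the whole result so the read lock is released promptly
    return conn.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]

# The INSERT implicitly opens a transaction on the writer; it is not committed yet.
writer.execute("INSERT INTO orders (item) VALUES ('book')")
seen_by_writer = count_orders(writer)      # the writer sees its own pending row
seen_before_commit = count_orders(reader)  # the reader does not

writer.commit()
seen_after_commit = count_orders(reader)   # visible once the transaction finishes

writer.close()
reader.close()
print(seen_by_writer, seen_before_commit, seen_after_commit)
```

The reader only observes the new row after the writer's transaction has committed.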
Durability – Once a transaction has been committed, its changes are stored on a durable medium such as a hard disk. This ensures that the database preserves its state even after crashes or restarts.
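The sketch below approximates a restart by closing a file-backed SQLite database and opening it again. It does not simulate a real crash or power failure, and the `log` table is a made-up example.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE log (msg TEXT)")
with conn:  # committed when the block exits
    conn.execute("INSERT INTO log VALUES ('committed')")

conn.execute("INSERT INTO log VALUES ('never committed')")
conn.close()  # closing without commit discards the pending insert

# Reopening the file stands in for the database coming back after a restart.
reopened = sqlite3.connect(path)
messages = [row[0] for row in reopened.execute("SELECT msg FROM log")]
reopened.close()
print(messages)
```

Only the committed row survives the reopen; the uncommitted one was never made durable.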
CAP stands for
  • Consistency
  • Availability
  • Partition Tolerance
CAP applies to a distributed system architecture, where the database is not a single machine but a set of nodes on a network, say 5 different machines running the same database and connected through the network.
Consistency – Although some confuse CAP's consistency with ACID's, it has an entirely different meaning in a distributed system. It means that all nodes always have the latest data, so it doesn't matter which node you read from; you will always get the latest copy.
Availability – This means that the system is always available, i.e. it responds to requests even if a node goes down. However, this does not guarantee that the remaining nodes are in sync with the unavailable node.
Partition Tolerance – This means the system should be able to tolerate a failure in its internal network, i.e. in the communication between nodes. During a partition, the nodes can individually serve clients but are unable to communicate with each other.
The CAP theorem is used to describe a system as either CP or AP. Since network partitions cannot be ruled out in practice, there cannot be a practical CA distributed system. In a cluster, the nodes transmit data to each other to stay consistent. The question is when this happens, and what the system does when it cannot.
In an AP system, the system keeps functioning even if a node is down. This does not ensure consistency, since the failed node might not have updated the other nodes with its operations. In this case the running nodes will serve outdated data to clients.
In a CP system, consistency takes priority. If a node goes down, the system could keep running, but it chooses to stop responding to requests so that the other nodes do not get ahead of the failed node in terms of data. Once the failed node is restored, the system comes back up.
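The difference between the two choices can be sketched with a toy in-memory model. Everything here is hypothetical (`ToyCluster`, `Unavailable`, `demo`), and the model is deliberately simplified: it follows the description above, where a CP system refuses every request while a node is cut off, rather than how any particular database implements it.

```python
class Unavailable(Exception):
    """Raised when the toy CP cluster refuses a request during a partition."""

class ToyCluster:
    def __init__(self, node_names, mode):
        self.mode = mode                              # "CP" or "AP"
        self.data = {name: None for name in node_names}
        self.isolated = set()                         # nodes cut off from peers

    def _check(self):
        # A CP cluster gives up availability as soon as a node is unreachable.
        if self.mode == "CP" and self.isolated:
            raise Unavailable("partition detected; refusing request")

    def write(self, node, value):
        self._check()
        if node in self.isolated:
            self.data[node] = value                   # cannot replicate anywhere
        else:
            for peer in self.data:                    # replicate to reachable peers
                if peer not in self.isolated:
                    self.data[peer] = value

    def read(self, node):
        self._check()
        return self.data[node]

def demo(mode):
    cluster = ToyCluster(["n1", "n2", "n3"], mode)
    cluster.write("n1", "v1")                         # reaches all three nodes
    cluster.isolated.add("n3")                        # n3 is now partitioned off
    try:
        cluster.write("n1", "v2")
        return cluster.read("n1"), cluster.read("n3")
    except Unavailable:
        return "unavailable"

ap_result = demo("AP")
cp_result = demo("CP")
print(ap_result, cp_result)
```

In AP mode the cluster keeps answering, but the isolated node still returns the old value `v1` while the others return `v2`. In CP mode the same write is refused, so no node can serve data that differs from another's.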