
In this post, I want to talk about MongoDB data durability options across MongoDB versions.
I consider a write durable if, once confirmed by the server, it becomes permanent at the node or cluster level (ignoring catastrophic failures like all nodes on a cluster failing at the same time).
MongoDB lets you choose between different levels of data durability using Write Concern. Unlike server-side configured durability (as you get with InnoDB using innodb_flush_log_at_trx_commit), the client specifies the Write Concern on each write operation.
As indicated in the linked manual page, the Write Concern specification can include a w and a j field (among other things).
The w field determines the number of nodes that must confirm a write before it is acknowledged to the client, with the following possible values:
- 1: meaning the primary,
- “majority”: meaning a majority of the nodes,
- Any other integer value, meaning that many nodes.
The j field requests that every node counted by the w value confirm the write only after it has reached the on-disk journal. Otherwise, the write is confirmed as soon as it is in memory.
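To make the "majority" value concrete, here is a minimal sketch of the arithmetic, assuming the usual definition of a majority as more than half of the members and assuming every member of the set votes. The name majorityOf is hypothetical and only for illustration; it is not part of the mongo shell or any driver.

```javascript
// Hypothetical helper (not a MongoDB API): how many nodes must confirm a
// write for w:"majority", assuming every member of the set is a voting member.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

console.log(majorityOf(1)); // 1
console.log(majorityOf(3)); // 2
console.log(majorityOf(4)); // 3
```

With four nodes, a majority write has to reach three of them, so it survives the loss of any single node.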
How the client specifies Write Concern depends on the programming language and driver used. Here is how to do it in JavaScript, using the mongo command line client:
db.test.insert({_id: 1}, {writeConcern: {w:1, j:1}})
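To use the same write concern in C, with the mongo-c-driver, you must do this before the corresponding write operation:
mongoc_write_concern_t *wc = mongoc_write_concern_new();
mongoc_write_concern_set_w(wc, 1);           /* w:1, confirmed by the primary */
mongoc_write_concern_set_journal(wc, true);  /* j:1, confirmed to the journal */
/* pass wc to the write call, then release it with mongoc_write_concern_destroy(wc) */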
To get a better understanding of what this means from a durability perspective I ran a few tests using the following environment:
- A single client, using the mongo command line client, inserting an auto-incrementing integer as the single field (_id) of a collection.
- Standalone mongod, and a replica set of 4 mongod instances, all on the same machine. You can repeat the tests using this script as a guide (the only requirement is that mongod and mongo are on the shell’s path).
- SIGKILL sent to the Primary node while the writes are happening.
- Comparing the last value for _id reported by the client, with the maximum value available in the collection, on the new Primary node after the replica set reconfigures (or on the standalone mongod, after I manually restarted it).
- MongoDB 3.0.4 and 3.2.7, using WiredTiger as the storage engine.
(I’ll discuss performance perspectives in a future post.)
In all cases, I indicate “missing docs” if the value reported by the client is higher than the value reported by
db.collection.find().sort({_id:-1}).limit(1)
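The comparison itself can be sketched as a small function. This is only an illustration, assuming _id starts at 1 and grows by one per insert; missingDocs and its parameter names are hypothetical.

```javascript
// Hypothetical helper: lastAcked is the last _id the client saw confirmed,
// maxStored is the _id returned by the query above (null if the collection is empty).
function missingDocs(lastAcked, maxStored) {
  const stored = maxStored === null ? 0 : maxStored;
  // The server may hold a write the client never saw confirmed,
  // so a negative difference counts as "nothing missing".
  return Math.max(0, lastAcked - stored);
}

console.log(missingDocs(100, 100)); // 0
console.log(missingDocs(100, 97));  // 3
```

Any result above zero corresponds to a "Yes" in the "Missing docs" column.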
Here are the results for a standalone mongod:
Standalone

w | j | Missing docs |
---|---|---|
1 | 1 | No |
1 | 0 | Yes |
0 | 0 | Yes |
0 | 1 | No |
The first three don’t hold surprises, but the last one does. The mongo-c-driver does not let you specify a write concern of {w:0, j:1}, and a cursory inspection of the MongoDB code makes me believe that “w:0” is interpreted as “w:1”. This would explain the result.
Here are the results for a four node replica set:
Replica Set

w | j | Missing docs |
---|---|---|
“majority” | 1 | No |
“majority” | 0 | No |
0 | 1 | Yes |
Again, w:0, j:1 is transformed into w:1, j:1. How can no data get lost in a standalone mongod, but can get lost in a replica set? The answer is that in the standalone case, after SIGKILL I restarted the same instance. In that case, WiredTiger performs crash recovery. Since we request acknowledgement for write confirmation to the on-disk journal, the last _id is recovered (if needed), and no docs go missing.
However, in my replica set tests, I did not restart the SIGKILLED instance. Instead, I let mongod do its thing and automatically reconfigure the set, promoting one of the Secondaries to be the new Primary. In this context, having a write concern that only requests acknowledgement of writes on the Primary is a liability, and leads to lost data: a write can be confirmed by the old Primary without having reached the node that takes over.
When specifying w:"majority", it is important to note that the value j:0 has been replaced with j:1 since version 3.2. That explains the lack of lost documents. I also tested 3.0 and, in that case, docs went missing when using w:"majority", j:0. This probably explains why the behavior changed in 3.2 and, depending on your use case, might justify an upgrade if you’re on an older version.
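The two substitutions described so far (w:0, j:1 behaving like w:1, j:1, and j:0 becoming j:1 with w:"majority" since 3.2) can be summarized in a toy model. This is only a sketch of the behavior observed in these tests, not MongoDB's actual code; effectiveWriteConcern and its parameters are hypothetical names.

```javascript
// Toy model of the behavior observed in these tests. NOT MongoDB's implementation.
// wc is a write concern such as {w: 0, j: 1}; version is a string such as "3.2.7".
function effectiveWriteConcern(wc, version) {
  let w = wc.w;
  let j = Boolean(wc.j);

  // Observed: requesting journaling with w:0 behaves like w:1, j:1.
  if (w === 0 && j) {
    w = 1;
  }

  // Observed: since 3.2, w:"majority" behaves as if j:1 had been requested.
  const [major, minor] = version.split(".").map(Number);
  const atLeast32 = major > 3 || (major === 3 && minor >= 2);
  if (w === "majority" && atLeast32) {
    j = true;
  }

  return { w: w, j: j };
}

// The case that lost documents on 3.0.4 but not on 3.2.7:
const on30 = effectiveWriteConcern({ w: "majority", j: 0 }, "3.0.4");
const on32 = effectiveWriteConcern({ w: "majority", j: 0 }, "3.2.7");
console.log(on30.j, on32.j); // false true
```

Under this model, the only combinations that lost documents in the replica set tests are those whose effective w is below a majority or whose effective j is false.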
In conclusion, MongoDB data durability options let you satisfy different requirements on a per-operation basis, with the client being responsible for using the desired setting. When using a Write Concern that does not guarantee full durability, a mongod crash is enough to cause the loss of documents that were acknowledged but not yet made durable. In this sense, the Write Concern values that include j:0 are analogous to running InnoDB with innodb_flush_log_at_trx_commit set to 0.
The “majority” value for the w component is valid even in the standalone case (where it is treated as “1”), so I think {w:"majority", j:1} is a good value to use in the general case to guarantee data durability.