This section describes how a VerneMQ cluster deals with network partitions (also known as a netsplit or split-brain situation). A netsplit is usually the result of a failure of one or more network devices, leaving a cluster in which nodes can no longer reach each other.
VerneMQ is able to detect a network partition, and by default it will stop serving CONNECT, PUBLISH, SUBSCRIBE, and UNSUBSCRIBE requests. A properly implemented client will always resend unacked commands, so messages are not lost (QoS 0 publishes will be lost, however). Much can happen, though, in the time window between the network partition occurring and VerneMQ detecting it. Moreover, this time frame will be different on every participating cluster node. In this guide we refer to this time frame as the Window of Uncertainty.
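The resend behaviour expected of a client can be illustrated with a small simulation. This is only a sketch: `FakeBroker` and `Client` are hypothetical stand-ins written for this example, not part of VerneMQ or of any MQTT library, and the acknowledgement is modelled as a plain boolean return value.

```python
from collections import OrderedDict


class FakeBroker:
    """Hypothetical stand-in for a broker node (not VerneMQ's API)."""

    def __init__(self):
        self.partitioned = False
        self.delivered = []

    def publish(self, payload):
        # While a netsplit is detected the node stops serving requests,
        # so the client never receives an acknowledgement.
        if self.partitioned:
            return False
        self.delivered.append(payload)
        return True


class Client:
    """Hypothetical client that keeps QoS 1 publishes until they are acked."""

    def __init__(self, broker):
        self.broker = broker
        self.unacked = OrderedDict()  # packet id -> payload, in send order
        self.next_id = 1

    def publish_qos0(self, payload):
        # Fire and forget: nothing is stored, so nothing can be resent.
        self.broker.publish(payload)

    def publish_qos1(self, payload):
        packet_id = self.next_id
        self.next_id += 1
        self.unacked[packet_id] = payload
        self._send(packet_id)

    def _send(self, packet_id):
        if self.broker.publish(self.unacked[packet_id]):
            del self.unacked[packet_id]

    def resend_unacked(self):
        # Copy the keys because _send removes acknowledged entries.
        for packet_id in list(self.unacked):
            self._send(packet_id)


broker = FakeBroker()
client = Client(broker)

broker.partitioned = True
client.publish_qos1("temperature=21")  # kept by the client, not yet acked
client.publish_qos0("heartbeat")       # lost

broker.partitioned = False
client.resend_unacked()
print(broker.delivered)  # ['temperature=21']
```

The QoS 1 publish survives the partition because the client still holds it, while the QoS 0 publish is gone.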
Note: As of VerneMQ 0.15.1 the behaviour during a netsplit is completely configurable via allow_register_during_netsplit, allow_publish_during_netsplit, allow_subscribe_during_netsplit, and allow_unsubscribe_during_netsplit. These options supersede the trade_consistency option. To get the same behaviour as trade_consistency = on, all of the mentioned netsplit options have to be set to on. VerneMQ versions prior to 0.15.1 required setting allow_multiple_sessions = on to allow new client connections during a netsplit. This was a workaround that is no longer required. It is still possible to use allow_multiple_sessions; however, it no longer has any impact on serving new clients during a netsplit.
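As a sketch, a vernemq.conf fragment that reproduces the old trade_consistency = on behaviour might look like the following. It assumes the four netsplit options are named for the register, publish, subscribe, and unsubscribe operations; check the option names against the configuration reference for your VerneMQ version.

```
# Keep serving clients while a netsplit is detected
# (trades consistency for availability).
allow_register_during_netsplit = on
allow_publish_during_netsplit = on
allow_subscribe_during_netsplit = on
allow_unsubscribe_during_netsplit = on
```

Leaving an option at off keeps the default behaviour of refusing that kind of request during a netsplit.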
VerneMQ follows an eventually consistent model for storing and replicating the subscription data. This also includes retained messages.
Due to the eventually consistent data model it is possible that during the Window of Uncertainty a publish won't take into account a subscription made on a remote node (in another partition). Obviously, VerneMQ can't deliver the message in this case. The same holds for delivering retained messages to remote subscribers.
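The missed delivery described above can be reproduced with a toy model of two nodes that each route from a local replica of the subscription data. The `Node` class and its `link_up` flag are hypothetical and written for this illustration only; they do not reflect VerneMQ's internal data structures.

```python
class Node:
    """Hypothetical cluster node holding a local replica of subscriptions."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # topic -> set of client ids
        self.peers = []
        self.link_up = True      # False while this node is cut off

    def subscribe(self, client_id, topic):
        self.subscriptions.setdefault(topic, set()).add(client_id)
        # Replication only reaches peers while the network link is up.
        if self.link_up:
            for peer in self.peers:
                peer.subscriptions.setdefault(topic, set()).add(client_id)

    def publish(self, topic):
        # Routing consults the local replica only.
        return sorted(self.subscriptions.get(topic, set()))


a, b = Node("a"), Node("b")
a.peers, b.peers = [b], [a]

b.subscribe("client-1", "sensors/temp")
before = a.publish("sensors/temp")   # replicated, so node a knows client-1

a.link_up = b.link_up = False        # the partition starts
b.subscribe("client-2", "sensors/temp")
during = a.publish("sensors/temp")   # node a has not heard of client-2

print(before, during)
```

Node a routes the second publish without client-2 because that subscription exists only in the other partition.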
Last will messages that are triggered during the Window of Uncertainty will be delivered to the reachable subscribers. Currently, last will messages triggered during a netsplit but after the Window of Uncertainty will be lost.
Normally, client registration is synchronized using an elected leader node for the given client id. Such synchronization removes the race condition between multiple clients trying to connect with the same client id on different nodes. However, during the Window of Uncertainty it is currently possible that VerneMQ fails to disconnect a client connected to a different node. Although this scenario sounds artificially crafted, it is possible to end up with duplicate clients connected to the cluster.
As soon as the partition is healed and connectivity is reestablished, the VerneMQ nodes replicate the latest changes made to the subscription data. This includes all the changes 'accidentally' made during the Window of Uncertainty. Using Dotted Version Vectors, VerneMQ ensures that subscription data and retained messages eventually converge.
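To give a feel for how replicas can converge after a partition heals, here is a minimal sketch that uses plain version vectors, a simpler relative of Dotted Version Vectors. It is not VerneMQ's implementation: the `descends` and `merge` helpers and the `(clock, subscribers)` layout are assumptions made for this example.

```python
def descends(a, b):
    """Return True if version vector a has seen every update recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(local, remote):
    """Merge two (clock, subscribers) replicas of one topic entry."""
    local_clock, local_subs = local
    remote_clock, remote_subs = remote
    if descends(local_clock, remote_clock):
        return local   # local already includes the remote changes
    if descends(remote_clock, local_clock):
        return remote  # remote is strictly newer
    # Concurrent updates made on both sides of the partition:
    # keep both sets of subscribers and take the pointwise maximum clock.
    nodes = set(local_clock) | set(remote_clock)
    clock = {n: max(local_clock.get(n, 0), remote_clock.get(n, 0)) for n in nodes}
    return clock, local_subs | remote_subs


# Each side accepted one subscription the other side never saw.
on_a = ({"a": 2, "b": 1}, {"client-1", "client-3"})
on_b = ({"a": 1, "b": 2}, {"client-1", "client-2"})

merged_ab = merge(on_a, on_b)
merged_ba = merge(on_b, on_a)
print(merged_ab == merged_ba)  # True
```

Both nodes arrive at the same result regardless of merge order, which is the property that makes convergence possible. A plain union cannot represent a concurrent unsubscribe, which is one reason a richer scheme is needed in practice.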
Starting with VerneMQ 0.14.2, duplicate clients are automatically resolved.