At Loadsmart, many of our services are organized within an event-driven architecture and face some usual challenges that come with such an architecture. One of them is dual writes, which happens when an application has to write data in two different places. A common example of dual writes is an application that needs to store data on a database and notify that change by emitting an event that may be consumed by different services. The inherent issue with this scenario is that if one of these operations fails, the producer and consumer applications will have data inconsistencies. If we choose to write the change to the database first and then notify the event, in the case of a notification failure, the database would have the updated data and the consumer wouldn’t. If we reverse it (i.e., notifying first and then storing), a failure when committing the transaction in the database will cause the consumer to receive data that isn’t on the application database.
There are well-known methods to deal with such issues, and we chose to apply the transactional outbox pattern. Since we couldn’t find any related solution in the Django community, we decided to implement an open-source library called Jaiminho(available through Pypi), a broker-agnostic implementation of the outbox pattern.
For simplicity purposes, let’s assume the common case of an application that has to save data to a database and notify it to an event stream. The idea behind the transactional outbox pattern is combining both writes into a single atomic transaction, which guarantees that if one of them fails, the entire transaction fails, avoiding inconsistent scenarios; in this case, it means that if a record is persisted into the database, it is guaranteed to be notified to the stream. The pattern achieves this goal by introducing two concepts:
When storing a record in the database, the corresponding event should be persisted into an outbox table, and both operations should be in the same transaction. With this strategy, we can leverage the atomicity property of relational database transactions to ensure either both or no inserts will be successful.
The outbox table is guaranteed to have all events that should be notified, so we just have to ensure that they are really sent. This responsibility goes to the message relayer, which will publish all events in the table to the stream and mark them as sent. Note: Since the message relay has to send an event and then mark it as sent, that characterizes another dual write, because if the relayer fails to update the status after sending the event, it will be sent again on the next execution. So, to ensure no duplicated events are consumed, the message consumers must implement idempotence on their side.
With the default configuration, when you add the @save_to_outbox
decorator to a method, whenever it is called and raises an exception, Jaiminho will create a database entry storing:
This data will allow for the message to be sent again using the original method, message and kwargs. Also, at the end of the decorated method a Django signal will be sent to indicate the event's success or failure.
Since the message and kwargs are serialized into bytes, any type of message is supported. For example, in Loadsmart we work with JSON and Protobuf messages, and both are using Jaiminho with no need for special configurations.
With Jaiminho, you can choose between the following two publish strategies, using the PUBLISH_STRATEGY
configuration:
This strategy is similar to the transactional outbox described by Chris Richardson. The decorated function intercepts the function call and saves it on the local DB to be executed later. A separate event relayer will keep polling local DB and executing those functions in the same order it was stored. Be careful with this approach, if any execution fails, the relayer will get stuck. Otherwise, it would not possible to guarantee delivery order.
This strategy will always execute the decorated function after the current transaction commit. With this approach, we don’t depend on an event relayer to execute the decorated function and deliver the message. Failed items will only be executed through the relayer. Although we can decrease the delay to execute the decorated function with this approach, we cannot guarantee delivery order.
Besides the publish strategies, there are two other options that can be configured:
While the default behavior of Jaiminho makes losing an event less likely, it still has a point of failure: if the notifying method fails and Jaiminho receives an error trying to store the database entry, the event will be lost. In order to avoid this possibility, the PERSIST_ALL_EVENTS
configuration can be set to True, which will save the database entry before trying to notify the event. In this case, the notifying method will be called when the database transaction is committed, which will ensure that the event is persisted. This is not applicable when the publishing strategy is Keep order, because all the messages must be stored to achieve this strategy.
The PERSIST_ALL_EVENTS
configuration can be combined with the DELETE_AFTER_SEND
one. If DELETE_AFTER_SEND
is activated, after a successful notifying of the event, the persisted database entry will be deleted, in order to make sure Jaiminho will not store successfully sent events.
You might be wondering: why isn’t the PERSIST_ALL_EVENTS
configuration the default behavior of Jaiminho? The reason behind it is performance. For each event sent, the activation of this setting will cause two extra database operations (one insertion and one deletion), which can be a lot for use cases where there is a large number of events being sent. So, it’s up to your use case and you to decide whether you prefer the performance of just going to the database if something fails or the consistency guarantee of not losing any events.
Jaiminho comes with two Django Commands that can be useful:
The Event Relaying command will query the database for all events that were not sent in a given timebox and will try to resend them using the original method, message and kwargs. Note that if the event relay runs after a deployment that changed or removed the original method, it will not be able to resend the failed events and they will have to be dealt with manually. Also, the DELETE_AFTER_SEND
configuration also applies to this command.
To ensure all failed events are re-sent by Jaiminho, you have two options with the Event Relaying command:
run_in_loop
and loop_interval
flagsThe Event Cleaning Command will query the database for all events that were successfully re-sent in a given timebox and will delete them from the database in order to limit storage using for old events.
Jaiminho emits two Django signals, which can be used, for example, for collecting metrics:
event_publish
: emitted when an event is successfully sentevent_failed_to_published
: emitted when an event failed to be sentTo add Jaiminho to your project, you have to:
python -m pip install django-jaiminho
INSTALLED_APPS
:INSTALLED_APPS = [
"django.contrib.auth",
"django.contrib.contenttypes",
...
...
...
"jaiminho"
]
settings.py
:JAIMINHO_CONFIG = {
"PERSIST_ALL_EVENTS": False,
"DELETE_AFTER_SEND": True,
"PUBLISH_STRATEGY: "keep-order"
}
python manage.py migrate
@save_to_outbox
decorator to any method responsible for communicating with external systems (brokers, external APIs, etc):from jaiminho.send import save_to_outbox
@save_to_outbox
def any_external_call(**kwargs):
# do something
return
python manage.py events_relay loop_interval 15
If you are interested in learning more, there is a detailed documentation on the GitHub repository. Also, contributions are more than welcome! Feel free to contribute if you find any bugs or have any suggestions.