TransWikia.com

CQRS for avoiding heavy joins

Software Engineering Asked by Atef on October 29, 2021

I’ve been wondering about what is a perfect use case for CQRS where the benefits overcome the complexity and the cost with come with the package. So for me to better understand it, I want to share with you a theoretical use case and we can see together whether it’s a good CQRS fit or not, and if not what alternatives would be good in this case.

Let’s say we have the following very simple model:
Simple model
As
we can see, we have departments which have multiple students where they can pass exams on subjects which are classified into categories. Very simple.

Let’s also say our solution have a screen where we want to display all these information. Here’s our ugly UI for this page:

Our ugly UI

As you can see the information shown in the table must be extracted from all the entities shown above. So we have a lot of joins.

Supposing that our client have w very successful online school with millions students. And they do not tolerate performance issues, so the solution must be as fast as possible.

It’s clear that there are multiple joins when displaying this page, so to improve the overall readability we went ahead and started experimenting on CQRS.
We need to add a new database for read operations which stores all the above information in a single row, so the read model for our view is as follows:

enter image description here

This will definitively make reading faster but it will also make thing more complicated. For example, we need an event source to publish event that sync the read database with the write database, etc…

So my question is, in this particular use case, is CQRS beneficial ?

One Answer

CQRS is about two types of interactions, one that mutates state (writes) and one that does not mutate state (reads). It says you shouldn't have everything into one place, since you usually treat writes and reads differently. Writes for example need more constraints when it comes to transactions, which you are forced to have in place also for reads because you operate on the same set of data that others might update. If you read from a different place you can, at the extreme, read with no need for transactions whatsoever because all you do is read and nobody does changes to the data.

For the use case you are describing, the CQRS pattern can work nicely. In this case, having separate tables (student, department, subject, etc) is good for writes (high normalization database) but it's bad for reads because you need lots of joins (you need data denormalized for reads to gain higher performance). So splitting the two, makes sense.

The challenge will be how to get you data from the "write database" into the "read database" and keep them in sync. You can do that with ETL in batches at night, in real time if the ETL tools support it, you can have some job run at night when nobody uses your application and do the joins and write them in a separate denormalized table overwriting everything, you can have timestamps on your exam records and a job at night that processes only what's new from the last time you did the joins instead of overwriting everything, etc. How you do this depends if you want to be delays between when the data is available in the write database and when it can be retrieved from the read database.

Before jumping to CQRS though, you might want to analyze your use cases to see exactly what data you extract from the database and display in the UI. If it's always something for just one user or one exam or whatever, maybe more indexes on the write database might be sufficient. That way you avoid having to add another database and complicate your design.

Answered by Bogdan on October 29, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP