Getting started with MongoDB is not too hard, but as you start building apps with it, you will see that there are a few complex issues that need to be addressed. Some deal with normalized and denormalized data, some with replica set failures, and so on.
However, there’s no reason to worry about a thing. The community around MongoDB is there to share various tips and tricks that should help you resolve any issue that may arise, from app design and implementation to data safety and monitoring.
Without further ado, take a look below and check out six proven tips for MongoDB developers.
MongoDB is not a data warehouse
A common misconception is that MongoDB is a data warehouse.
It is a NoSQL document database, which means that it stores data as JSON-like documents (nested objects of keys and values) rather than in tabular format. To access the data, applications built on backend platforms such as Python or Node.js usually query it through a driver and transform the results.
MongoDB is popular partly because web developers already handle much of their data as JSON, a format derived from JavaScript’s object literal syntax.
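To make the contrast with a tabular layout concrete, here is a minimal sketch in Python. The field names and values are made up for illustration, and a plain dict stands in for a MongoDB document; no database is involved.

```python
import json

# A single document keeps nested and repeated data together.
# (Hypothetical fields, chosen only for this example.)
user_document = {
    "_id": 1,
    "username": "ana",
    "address": {"city": "Lisbon", "country": "PT"},
    "tags": ["admin", "beta"],
}

# The same information in a tabular layout needs three tables
# linked by the user id.
user_rows = [(1, "ana")]
address_rows = [(1, "Lisbon", "PT")]
tag_rows = [(1, "admin"), (1, "beta")]

# The document maps directly to JSON and back without any joins.
as_json = json.dumps(user_document)
round_tripped = json.loads(as_json)
```

The document form mirrors the objects a web application already works with, which is the convenience described above.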
Keep in mind that you can build a data warehouse for MongoDB using various tools, but MongoDB is not a data warehouse on its own. It was created as an operational database, not to support serious analysis.
Duplicate data
Data that is used by more than one document can be:
- Embedded (denormalized)
- Referenced (normalized)
Neither denormalization nor normalization is inherently the better choice. Each has its own trade-offs, and you should choose whichever works best for your application.
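As a rough illustration of the two options, the sketch below models both shapes with plain Python dicts. The collections, field names, and helper functions are hypothetical and only stand in for real MongoDB documents and queries.

```python
# Embedded (denormalized): the author's name is copied into every post.
posts_embedded = [
    {"_id": 1, "title": "First post", "author": {"user_id": 10, "username": "ana"}},
    {"_id": 2, "title": "Second post", "author": {"user_id": 10, "username": "ana"}},
]

# Referenced (normalized): posts store only the author's id.
users = {10: {"_id": 10, "username": "ana"}}
posts_referenced = [
    {"_id": 1, "title": "First post", "author_id": 10},
    {"_id": 2, "title": "Second post", "author_id": 10},
]


def author_name_embedded(post):
    # One read: the name travels with the post.
    return post["author"]["username"]


def author_name_referenced(post, users_by_id):
    # Two reads: the post, then the user it points to.
    return users_by_id[post["author_id"]]["username"]
```

The embedded shape answers the question with a single lookup but stores the username twice; the referenced shape stores it once but needs a second lookup.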
Denormalization can lead to inconsistent data. Inconsistency is never desirable, but how much of it you can tolerate depends on what you are storing. For many apps, short periods of inconsistency are fine.
If someone changes their username, it’s not alarming if old posts show the old username for a few hours. On the other hand, if you cannot tolerate inconsistent values even for a short period of time, you should normalize.
If you decide to normalize, the app has to run an additional query every time it needs the referenced data. If the app can’t afford that performance hit, you can denormalize instead and reconcile inconsistencies later.
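The username scenario can be sketched as follows. This is an in-memory illustration under assumed field names (`author_username`, `author_id`); `reconcile` is a hypothetical stand-in for whatever background job an application would run against the real database.

```python
users = {10: {"_id": 10, "username": "ana"}}
# Denormalized posts carry a copy of the username next to the reference.
posts = [
    {"_id": 1, "author_id": 10, "author_username": "ana"},
    {"_id": 2, "author_id": 10, "author_username": "ana"},
]


def rename_user(users_by_id, user_id, new_name):
    # Only the user document changes; the copies in posts go stale.
    users_by_id[user_id]["username"] = new_name


def stale_posts(all_posts, users_by_id):
    # Ids of posts whose copied username no longer matches the user.
    return [
        p["_id"]
        for p in all_posts
        if p["author_username"] != users_by_id[p["author_id"]]["username"]
    ]


def reconcile(all_posts, users_by_id):
    # Background fix-up: refresh every stale copy, return how many changed.
    fixed = 0
    for p in all_posts:
        current = users_by_id[p["author_id"]]["username"]
        if p["author_username"] != current:
            p["author_username"] = current
            fixed += 1
    return fixed


rename_user(users, 10, "ana_b")
stale_before = stale_posts(posts, users)
fixed_count = reconcile(posts, users)
stale_after = stale_posts(posts, users)
```

Between the rename and the reconcile run, readers see the old name on old posts, which is the short window of inconsistency the text describes.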
Normalization
Normalize only if you need to future-proof your data. The goal is to keep the data usable by different applications that will query it in different ways in the future.
However, this assumes that you have a data set that several applications will use for years and years.
Data sets like this exist, but most users’ data is constantly evolving, and old data is either updated or dropped. The vast majority of users want their database working as fast as possible on the queries they’re doing now.
If those queries change in the future, they can optimize the database for the new queries then.
Embedding dependent fields
When deciding whether to embed or reference a document, ask yourself whether you will query for the information in that field by itself, or only in the context of the larger document.
For instance, a user might wish to query on a tag, but only to link back to the posts with that tag, not for the tag on its own. Similarly, with comments, a user might have a list of recent comments, but people are interested in seeing the post that inspired the comment.
So, if you have been using a relational database and are migrating an existing schema to MongoDB, join tables are excellent candidates for embedding.
Tables that are essentially a key and a value (such as permissions, tags, or addresses) almost always work better embedded in MongoDB. If only one document cares about certain information, embed the information in that document.
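A migration of this kind might look like the sketch below, which folds rows from a hypothetical `post_tags` join table into a `tags` array on each post. The table and field names are assumptions made for the example, and lists of dicts stand in for both the relational rows and the resulting documents.

```python
from collections import defaultdict

# Rows as they might come out of a relational schema (hypothetical).
post_rows = [{"id": 1, "title": "Intro"}, {"id": 2, "title": "Indexes"}]
post_tag_rows = [
    {"post_id": 1, "tag": "mongodb"},
    {"post_id": 2, "tag": "mongodb"},
    {"post_id": 2, "tag": "performance"},
]


def embed_tags(posts, post_tags):
    # Group the join-table rows by post, then attach them as an array.
    tags_by_post = defaultdict(list)
    for row in post_tags:
        tags_by_post[row["post_id"]].append(row["tag"])
    return [
        {"_id": p["id"], "title": p["title"], "tags": tags_by_post.get(p["id"], [])}
        for p in posts
    ]


def posts_with_tag(documents, tag):
    # The tag is only used to link back to the posts that carry it.
    return [d["title"] for d in documents if tag in d["tags"]]


documents = embed_tags(post_rows, post_tag_rows)
```

After the migration, the tag lookup still leads back to posts, but no separate table or join is needed.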
Design for self-sufficiency
MongoDB was designed to be a huge, dumb datastore. What this means is that it does almost no processing.
Instead, it just stores and retrieves data. As a user, you should have this goal in mind and do your best to avoid forcing MongoDB to do any computation that could be done on the client.
Even the simpler tasks, such as finding averages or summing fields, should usually be pushed to the client.
If a user wants to query for information that has to be computed and is not explicitly present in the document, they have two choices:
- Incurring a serious performance penalty (making MongoDB do the computation using JavaScript)
- Making the information explicit in the document
In general, users just make the information explicit in their documents.
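One way to make computed information explicit is to calculate it in the application before the document is written. The sketch below assumes a hypothetical order document with prices in whole cents; `with_total` and `orders_over` are illustrative helpers, not part of any MongoDB API.

```python
def with_total(order):
    # Compute the total on the client and store it as an ordinary field.
    doc = dict(order)
    doc["total"] = sum(item["price"] * item["qty"] for item in order["items"])
    return doc


def orders_over(documents, minimum):
    # With the total stored explicitly, the filter is a plain comparison.
    return [d["_id"] for d in documents if d["total"] >= minimum]


order = {
    "_id": 1,
    "items": [
        {"sku": "a", "price": 500, "qty": 2},
        {"sku": "b", "price": 300, "qty": 1},
    ],
}
stored = with_total(order)
```

The cost is that the application must keep `total` up to date whenever `items` changes; in exchange, the database only has to compare a stored value.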
Computing aggregations
Whenever possible, compute aggregations incrementally over time with the $inc update operator.
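The idea behind incremental aggregation can be sketched without a server. Below, `apply_inc` is a hypothetical in-memory stand-in that mimics what `$inc` does to top-level fields (a missing field starts from the increment); with a real driver, the same update document would be passed to an update call such as PyMongo’s `update_one`. The `views` and `total_load_ms` fields are assumptions for the example.

```python
def apply_inc(doc, update):
    # Mimic $inc for top-level fields: add the amount, creating the
    # field if it does not exist yet.
    for field, amount in update["$inc"].items():
        doc[field] = doc.get(field, 0) + amount
    return doc


page_stats = {"_id": "/home"}

# Each page view bumps a counter and a running sum in the same update.
for load_ms in (120, 80, 100):
    apply_inc(page_stats, {"$inc": {"views": 1, "total_load_ms": load_ms}})

# The average is a cheap division on the client at read time.
average_load_ms = page_stats["total_load_ms"] / page_stats["views"]
```

Because the counter and the sum are maintained on every write, no query ever has to scan the raw events to produce the average.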
If your aggregations need more munging, it is best to store the raw per-minute data in a minutes field and have an ongoing batch process compute averages from the latest minutes.
Since all of the information necessary to compute the aggregation is stored in one document, this processing could even be passed off to the client for newer (unaggregated) documents.
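The text does not spell out the layout of the minutes field, so the sketch below assumes one: a subdocument that maps a minute offset (as a string key, larger meaning later) to the count recorded in that minute. `average_of_latest` is a hypothetical helper that either a batch process or the client could run.

```python
def average_of_latest(doc, latest_n):
    # Average the counts of the most recent `latest_n` minutes.
    if latest_n <= 0:
        raise ValueError("latest_n must be positive")
    minutes = doc["minutes"]
    # Keys are strings, so sort them numerically before slicing.
    recent_keys = sorted(minutes, key=int)[-latest_n:]
    if not recent_keys:
        return None
    return sum(minutes[k] for k in recent_keys) / len(recent_keys)


# Assumed layout: minute offset -> count for that minute.
doc = {"_id": "/home", "minutes": {"0": 4, "1": 6, "2": 8, "3": 10}}

latest_two = average_of_latest(doc, 2)
all_minutes = average_of_latest(doc, 10)
```

Since everything needed is in one document, a single read is enough to compute the figure wherever it is convenient.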
Final thoughts
A good understanding of MongoDB, along with a clear view of what you want to do with the database, is the formula for great database design. Keep that in mind and go through the article again to make sure you get each tip right.