This post summarises my research and analysis on the performance gains that projection makes possible in MongoDB. By the end of this tutorial, we will be able to determine whether using projection improves MongoDB query performance.

Let's get started without further ado.

What is MongoDB Projection?

With a MongoDB projection, we can specify which fields a query should return. We do this by listing field names in the projection document with a value of 1 or 0: a field set to 1 is included in the results, and a field set to 0 is excluded.
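The inclusion and exclusion rules can be illustrated with a small plain-JavaScript model. This is only a sketch of the top-level semantics, not MongoDB's implementation: applyProjection is a hypothetical helper, and it ignores nested fields, arrays, and projection operators.

```javascript
// Simplified model of top-level projection semantics.
// applyProjection is a hypothetical helper, not part of MongoDB.
function applyProjection(doc, projection) {
  const others = Object.entries(projection).filter(([field]) => field !== "_id");
  // _id is returned unless it is explicitly excluded with _id: 0.
  const includeId = projection._id !== 0;
  // Inclusion mode: at least one field is 1 (or the projection is just {_id: 1}).
  const inclusion =
    others.some(([, flag]) => flag === 1) ||
    (others.length === 0 && projection._id === 1);
  const exclusion = others.some(([, flag]) => flag === 0);
  if (inclusion && exclusion) {
    // Only _id may be excluded inside an inclusion projection.
    throw new Error("Cannot mix inclusion and exclusion (except for _id)");
  }

  const result = {};
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") {
      if (includeId) result._id = value;
    } else if (inclusion) {
      if (projection[field] === 1) result[field] = value;
    } else if (projection[field] !== 0) {
      result[field] = value;
    }
  }
  return result;
}

const sample = { _id: 1, a: "abc", b: "zzz" };
console.log(applyProjection(sample, { a: 1 }));         // { _id: 1, a: 'abc' }
console.log(applyProjection(sample, { a: 1, _id: 0 })); // { a: 'abc' }
console.log(applyProjection(sample, { b: 0 }));         // { _id: 1, a: 'abc' }
```

Note that _id behaves differently from every other field: it stays in the result until it is explicitly set to 0.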

By default, queries return all fields from matching documents. If you need all the fields, returning entire documents is more efficient than having the server manipulate the result set with projection criteria.

However, efficiency can be enhanced by utilising projection to restrict the fields that query results return by:

  • eliminating unnecessary fields from search results (saving on network bandwidth)
  • reducing the number of response fields to satisfy a covered query (returning indexed query results without fetching full documents)

When projection is used only to remove unneeded fields, the MongoDB server still has to fetch each whole document into memory (assuming it isn't already there) and then filter the fields to return. Depending on your data model and the projected fields, this use of projection can significantly reduce network traffic for query results, but it does not reduce memory use or the working set on the MongoDB server.

An exception to this rule is a covered query, in which every field requested in the query result is contained in the index that was used, so the server does not have to retrieve the entire document. Covered queries can decrease memory usage and enhance performance, provided other queries don't need to fetch the same documents anyway.
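As a rough sketch, the basic condition for a covered query can be written as a check in plain JavaScript. canBeCovered is a hypothetical helper that assumes simple top-level fields and equality-style filters; it deliberately ignores other restrictions that apply to real covered queries (for example, array fields).

```javascript
// Hypothetical helper: checks the basic covered-query condition only.
// Assumes top-level field names and a filter without operators such as $or.
function canBeCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterFields = Object.keys(filter);

  // Only an inclusion projection limits the result to known fields.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  if (returned.length === 0) return false; // whole documents would be returned

  // _id comes back by default unless it is explicitly excluded.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");

  // Every filtered and returned field must be present in the index.
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

console.log(canBeCovered({ a: 1 }, { a: "abc" }, { a: 1 }));         // false (_id is not in the index)
console.log(canBeCovered({ a: 1 }, { a: "abc" }, { a: 1, _id: 0 })); // true
```

The first call returns false only because of the implicit _id field, which is why excluding _id matters when aiming for a covered query.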

Examples

Imagine you have the following document to use as an example with the mongo shell:

db.data.insert({
    a: 'abc',
    b: new Array(10*1024*1024).join('z')
})

The field b could represent a range of values (or, in this case, a very long string).

Next, create an index on a:1, a field that your use case queries frequently:

db.data.createIndex({a:1})

A simple findOne() with no projection criteria returns a query result that is around 10MB in size:

> bsonsize(db.data.findOne({}))
10485805

If you add the projection a:1, the result will only include the field a and the document _id (which is included by default). The query result is now only 33 bytes, but the MongoDB server is still manipulating a 10MB document to select two fields:

> bsonsize(db.data.findOne({}, {a:1}))
33

This query is not covered because the entire document has to be fetched in order to determine the _id value. As a document's unique identifier, the _id field is included by default in query results; however, _id is not included in a secondary index unless you explicitly add it.

The results from explain() show the number of documents and index keys examined in the totalDocsExamined and totalKeysExamined metrics:

 > db.data.find(
     {a:'abc'}, 
     {a:1}
 ).explain('executionStats').executionStats.totalDocsExamined
 1
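The two metrics are easiest to interpret together. The sketch below classifies an executionStats object in plain JavaScript; summarizeExecutionStats is a hypothetical helper, and the sample objects are illustrative values rather than captured shell output.

```javascript
// Hypothetical helper: interprets the two explain() counters together.
function summarizeExecutionStats(stats) {
  const { totalKeysExamined, totalDocsExamined } = stats;
  if (totalDocsExamined === 0 && totalKeysExamined > 0) {
    return "index only (consistent with a covered query)";
  }
  if (totalKeysExamined > 0) {
    return "index used, documents fetched";
  }
  return "no index keys examined (for example, a collection scan)";
}

// Illustrative values only, not real explain() output.
console.log(summarizeExecutionStats({ totalKeysExamined: 1, totalDocsExamined: 1 }));
console.log(summarizeExecutionStats({ totalKeysExamined: 1, totalDocsExamined: 0 }));
```

A result of totalDocsExamined: 1, as in the query above, means the index located the document but the server still had to load it.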

Using projection to exclude the _id field from this query creates a covered query that uses only the a:1 index. The covered query is efficient in terms of both network and memory usage because it doesn't need to fetch a roughly 10MB document into memory:

 > db.data.find(
     {a:'abc'},
     {a:1, _id:0}
 ).explain('executionStats').executionStats.totalDocsExamined
 0

 > bsonsize(db.data.findOne( {a:'abc'},{a:1, _id:0}))
 21

My MongoDB queries are slow. Is my slow query, which uses a compound index on the field, affected by returning a subset of fields?

Without the context of a particular query, an example document, and the full explain output, this cannot be answered. To compare the same query with and without projection, you could run some benchmarks in your own environment. If projection significantly increases the overall time it takes to execute a query (including processing and transferring results), that may be a strong sign that your data model needs to be improved.

If it's unclear why a query is slow, it may be better to post a new question with specific details to investigate.
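A minimal timing harness for such a comparison might look like the sketch below. medianMillis and compareQueries are hypothetical helpers, and the two query functions are placeholders for whatever calls you make in your own environment; the placeholders here do no database work, so real measurements require replacing them with actual queries.

```javascript
// Hypothetical helper: runs an async function several times and
// returns the median wall-clock duration in milliseconds.
async function medianMillis(runQuery, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await runQuery();
    samples.push(performance.now() - start);
  }
  samples.sort((x, y) => x - y);
  return samples[Math.floor(samples.length / 2)];
}

// Hypothetical helper: times the same query with and without projection.
async function compareQueries(withoutProjection, withProjection, runs = 5) {
  return {
    withoutProjectionMs: await medianMillis(withoutProjection, runs),
    withProjectionMs: await medianMillis(withProjection, runs),
  };
}

// Placeholders: replace these with functions that run your real query
// and return a promise that resolves once all results have been received.
const fullDocuments = async () => {};
const projectedFields = async () => {};

compareQueries(fullDocuments, projectedFields).then((timings) => {
  console.log(timings);
});
```

Using the median rather than a single run reduces the effect of one-off delays such as a cold cache on the first execution.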

