For one of our customers we have a property search API backed by a MongoDB database using the original storage engine (MMAPv1) with almost 400k documents. As the document count rose over time I kept vertically scaling the Linux VM, to the point where it was getting a little expensive to host (4GB RAM and 160GB disk). I did this scaling because general-purpose searching is difficult to keep fast without indexing every combination of fields that might be searched against, so I was constantly adding indexes to keep performance reasonable.
Under load the mongod process would consistently have 28.9GB of virtual memory allocated and 940MB resident, while using about 20GB on disk.
Without the 4GB of RAM, search performance would really suffer the first time a new (non-unique) query was executed. The second and subsequent runs of the same query were usually performant, I think because the relevant data was by then in the working set. Being RAM constrained kept all of the popular query combinations from being available in the working set at the same time.
Besides just being slow, many times the query would exceed the timeout and simply fail. I think we can all agree that an API that fails the first time you call it but works on subsequent calls is not ideal.
What to do? I could just move the slider in the hosting control panel one more notch to the 8GB VM, but at that point the cost is about comparable to some of the MongoDB-as-a-Service offerings and I’m wondering whether I should have just used a SQL db…
Before we move on to the obvious solution mentioned in the title, I should mention another constraint of a general-purpose search that makes the Mongo queries difficult to keep fast: paging. We have to support paging and provide the count of matching documents, and the predicate can be almost any combination of fields in the documents. The only implementation I could come up with was to use limit and skip and do it in two steps:
- $cursor = $collection->find($query)->limit($pageSize)->skip($skip);
- $count = $collection->count($query, $max);
Note that tuning batchSize also helped, but we were at the edge of acceptable performance and something needed to be done.
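The two-step approach above mostly comes down to simple offset arithmetic feeding the two driver calls. Here is a minimal sketch; the `pagingParams` helper name and the example numbers are mine, not from the actual API code:

```php
<?php
// Compute the skip offset and total page count for limit/skip paging.
// $page is 1-based; $totalCount comes from the separate count() query.
function pagingParams(int $page, int $pageSize, int $totalCount): array
{
    $skip = ($page - 1) * $pageSize;
    $totalPages = (int) ceil($totalCount / $pageSize);
    return ['skip' => $skip, 'limit' => $pageSize, 'totalPages' => $totalPages];
}

// These values would then drive the two queries, e.g.:
//   $cursor = $collection->find($query)->limit($p['limit'])->skip($p['skip']);
//   $count  = $collection->count($query);
$p = pagingParams(3, 25, 387000);
// page 3 of 25 results: skip the first 50 matches
```

The catch, and part of why this was slow, is that skip still forces the server to walk past all of the skipped documents for deep pages.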
Along comes the Mongo press release for the optional new WiredTiger storage engine. What’s this? It supports compression; that’s what we need!
Please understand this is not a direct apples-to-apples comparison because the Linux distro and kernel are not exactly the same, but after provisioning a new VM and restoring from a mongodump the numbers are staggering. The on-disk size (/var/lib/mongo) is now down to 2.1GB! Under load the mongod process is using around 2GB virtual and 1GB resident.
Win, win. Performance is comparable and it costs less to host, with the only pain being converting the config file to the new format. Nice job, WiredTiger developers.
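For anyone facing the same conversion: the new format is YAML. This is a minimal sketch of the relevant section of a mongod.conf, assuming the default snappy collection compression; the dbPath and compressor here are illustrative, not copied from our actual config:

```yaml
# MongoDB 3.0+ YAML-format config: select the WiredTiger storage
# engine, which block-compresses collection data on disk.
storage:
  dbPath: /var/lib/mongo
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy
```

Note that an MMAPv1 data directory cannot simply be reopened under WiredTiger; as above, the data has to be dumped and restored into a fresh instance running the new engine.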