MongoDB vs. SQL Server 2008 Performance

This article is a follow-up to one I wrote last week entitled “The NoSQL Movement, LINQ, and MongoDB – Oh My!”. In that article I introduced the NoSQL movement and MongoDB, and showed you how to program against it in .NET using LINQ and NoRM.

I highlighted two cornerstone reasons why you might ditch your SQL Server for the NoSQL world of MongoDB. Those were:

1. Ease-of-use and deployment
2. Performance

For ease-of-use, you’ll want to read the original article.

This article is about the performance argument for MongoDB over SQL Server (or MySql or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.

“A potentially controversial graph showing MongoDB performing 100 times better than SQL Server”

We’ll see source code and downloadable, executable examples, and you can verify all of this for yourself. But first, here’s a new twist on an old proverb:

“Data is money”

If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.

Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to plan out your operating costs per user to help guide the pricing of your product, the cost of storing, querying, and managing your data would likely be a significant part of that calculation.

If there is a database that is 100 times faster than SQL Server, free, easy to administer, and programmable with LINQ just as SQL Server is, then that is a very compelling choice.

When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.

“It means you can charge a lot less per user of your application and get the same revenue. Think about it.”

One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:

http://www.ustream.tv/recorded/6146875

[The audio and video aren’t too hot, but the content is. Skip the first minute, which has no audio.]

During this talk, Kristina describes SourceForge’s experience moving from MySql to MongoDB. On MySql, SourceForge was reaching its performance limits at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySql and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.

The scenario:

Model a data-intensive web application aiming to support as many concurrent users as possible. Users will come from the web application itself, but also from an API and external applications. They will interact with the data with nearly as many inserts as queries. The inserts are all small pieces of data and are all independent of each other.

Let me just get this out of the way, and I mean the following in the nicest of ways: I don’t care about your scenario or use case. The scenario above is what I’m trying to model. I’m not trying to do bulk inserts or load large files into databases or anything like that. MongoDB may be great for those. SQL Server may have specialized features for your use case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.

Insert Speed Comparison

It’s the inserts where the differences are most obvious between MongoDB and SQL Server.

These inserts were performed by inserting 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:


[Image: MongoDB basic class]

[Image: SQL Server basic class]
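
For reference, a “basic” class on each side looks roughly like this. This is only a sketch: the property names and connection details are illustrative rather than the exact model from the screenshots, and it assumes NoRM’s ObjectId type and LINQ to SQL’s mapping attributes.

using System;
using Norm;                         // NoRM driver for MongoDB (namespace assumed)
using System.Data.Linq.Mapping;     // LINQ to SQL mapping attributes

// MongoDB / NoRM: a plain class; NoRM maps Id to MongoDB's _id field.
public class PersonMongo
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public DateTime CreatedAt { get; set; }
}

// SQL Server / LINQ to SQL: the same shape, mapped to a table.
[Table(Name = "People")]
public class PersonSql
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    [Column] public string Name { get; set; }
    [Column] public int Age { get; set; }
    [Column] public DateTime CreatedAt { get; set; }
}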

I ran five concurrent clients hammering the databases with inserts. Here are the screenshots for running against MongoDB and against SQL Server. Let’s zoom into the most important result with the output from one of five concurrent clients:

[Screenshot: MongoDB client output]

[Screenshot: SQL Server client output]

That’s right. It’s 2 seconds versus 3 1/2 minutes!
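
Each client’s insert loop was along these lines. This is a minimal sketch assuming the classes above, local default servers, and NoRM’s Mongo.Create / GetCollection API; the connection strings are placeholders and the exact calls live in the sample download linked below.

using System;
using System.Data.Linq;
using Norm;

public static class InsertClient
{
    const int InsertCount = 50000;

    public static void RunMongo()
    {
        // NoRM: open a connection and insert one small, independent object at a time.
        using (var mongo = Mongo.Create("mongodb://localhost/PerfTest"))
        {
            var people = mongo.GetCollection<PersonMongo>();
            for (int i = 0; i < InsertCount; i++)
            {
                people.Insert(new PersonMongo { Name = "User " + i, Age = 30, CreatedAt = DateTime.Now });
            }
        }
    }

    public static void RunSql()
    {
        // LINQ to SQL: same shape of work against SQL Server.
        using (var db = new DataContext(@"server=.\SQLEXPRESS;database=PerfTest;Integrated Security=true"))
        {
            var people = db.GetTable<PersonSql>();
            for (int i = 0; i < InsertCount; i++)
            {
                people.InsertOnSubmit(new PersonSql { Name = "User " + i, Age = 30, CreatedAt = DateTime.Now });
                db.SubmitChanges(); // each object saved independently, per the scenario; the original sample may batch differently
            }
        }
    }
}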

Now, to be fair, this was using LINQ to SQL on the SQL side, which is slow on inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET-style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.
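
The ADO.NET re-run looked roughly like this, a sketch assuming the same People table as above: reusing one parameterized command skips LINQ to SQL’s per-object change tracking, which accounts for most of that improvement.

using System;
using System.Data;
using System.Data.SqlClient;

public static class AdoInsertClient
{
    public static void Run(int insertCount)
    {
        using (var conn = new SqlConnection(@"server=.\SQLEXPRESS;database=PerfTest;Integrated Security=true"))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // One parameterized command, reused for every row.
                cmd.CommandText =
                    "INSERT INTO People (Name, Age, CreatedAt) VALUES (@name, @age, @createdAt)";
                cmd.Parameters.Add("@name", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@age", SqlDbType.Int);
                cmd.Parameters.Add("@createdAt", SqlDbType.DateTime);

                for (int i = 0; i < insertCount; i++)
                {
                    cmd.Parameters["@name"].Value = "User " + i;
                    cmd.Parameters["@age"].Value = 30;
                    cmd.Parameters["@createdAt"].Value = DateTime.Now;
                    cmd.ExecuteNonQuery();   // one round trip per row, matching the independent-insert scenario
                }
            }
        }
    }
}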

Query Speed Comparison

Now let’s see about getting the data out using the same objects above on the indexed Id field for each database.
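
The lookups were along these lines. This is a sketch assuming the classes above and that NoRM’s LINQ provider is exposed through AsQueryable(); the Norm.Linq namespace and the ObjectId equality comparison are assumptions about the NoRM build used.

using System.Data.Linq;
using System.Linq;
using Norm;
using Norm.Linq;   // NoRM LINQ provider (namespace assumed)

public static class QueryClient
{
    public static PersonMongo FindMongo(ObjectId id)
    {
        using (var mongo = Mongo.Create("mongodb://localhost/PerfTest"))
        {
            // Query by the indexed _id field through NoRM's LINQ support.
            return mongo.GetCollection<PersonMongo>()
                        .AsQueryable()
                        .FirstOrDefault(p => p.Id == id);
        }
    }

    public static PersonSql FindSql(int id)
    {
        using (var db = new DataContext(@"server=.\SQLEXPRESS;database=PerfTest;Integrated Security=true"))
        {
            // Query by the primary key (clustered index) with LINQ to SQL.
            return db.GetTable<PersonSql>().FirstOrDefault(p => p.Id == id);
        }
    }
}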

Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.

“If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.”

Complex Data and the Real World

Feel like that was an overly simplified example? Here’s some real-world data with foreign keys and joins. Below is the complex data model.

[Screenshot: MongoDB results for the complex model]

[Screenshot: SQL Server results for the complex model]

It shouldn’t surprise you that MongoDB does even better here without its joins.
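
To see why, compare how the two stores would model the same data. The names below are illustrative, not the exact complex model from the tests: MongoDB embeds the child records inside the parent document and reads it back in one shot, while SQL Server splits them across foreign-keyed tables that must be joined at query time.

using System;
using System.Collections.Generic;
using Norm;
using System.Data.Linq.Mapping;

// MongoDB: one document with embedded children, fetched in a single read.
public class OrderDocument
{
    public ObjectId Id { get; set; }
    public string CustomerName { get; set; }
    public DateTime PlacedAt { get; set; }
    public List<OrderLine> Lines { get; set; }   // embedded, no join needed
}

public class OrderLine
{
    public string Product { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}

// SQL Server: two tables joined on a foreign key at query time.
[Table(Name = "Orders")]
public class OrderRow
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)] public int Id { get; set; }
    [Column] public string CustomerName { get; set; }
    [Column] public DateTime PlacedAt { get; set; }
}

[Table(Name = "OrderLines")]
public class OrderLineRow
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)] public int Id { get; set; }
    [Column] public int OrderId { get; set; }    // foreign key back to Orders
    [Column] public string Product { get; set; }
    [Column] public int Quantity { get; set; }
    [Column] public decimal Price { get; set; }
}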

The Hardware

All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8

Your Turn

If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:

http://www.michaelckennedy.com/Downloads/sql-vs-mongo.xlsx

You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.
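
As a rough idea of the knobs (the actual class in the download may use different names and defaults):

// Hypothetical sketch of the PerfConstants "knobs" described above.
public static class PerfConstants
{
    public const int InsertCount = 50000;   // objects inserted per client
    public const int QueryCount  = 50000;   // id lookups per client
    public const int ClientCount = 5;       // concurrent client processes
}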

The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.
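
Conceptually, the launcher just starts the client processes together, something like the hypothetical sketch below; the real launcher in the sample may work differently, and the client executable name here is made up.

using System.Diagnostics;

public static class Launcher
{
    public static void Main()
    {
        const int clientCount = 5;                 // matches the five concurrent clients above
        var clients = new Process[clientCount];

        for (int i = 0; i < clientCount; i++)
        {
            // Assumes the client executable sits next to the launcher (hypothetical name).
            clients[i] = Process.Start("PerfClient.exe");
        }

        foreach (var client in clients)
        {
            client.WaitForExit();                  // wait for every client run to finish
        }
    }
}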

Download Sample:

http://www.michaelckennedy.com/Samples/SpeedOfSqlVsMongoDBAnddotNetSample.zip

Got feedback? Write a comment or contact me on Twitter: @mkennedy.

(via Michael C. Kennedy’s Blog)


3 thoughts on “MongoDB vs. SQL Server 2008 Performance”

  1. MongoDB achieves high write performance by writing to memory and eventually writing to disk at some unspecified point in the future. RDBMSes write to a write-ahead log before they commit data to disk.

    If you pull the power cord out of an RDBMS and then start the server back up, it will enter the recovery process. During recovery, the RDBMS reads from the log, rolls back any transactions that need to be rolled back, and commits any transactions that need to be committed.

    MongoDB does have a log, but it’s just another MongoDB collection. It syncs every 100ms, but you can configure that.

    If you rip the power cord out of a MongoDB server, what happens? If a sync is in process, you may have a corrupt data file. Why? Because MongoDB uses memory mapped files.

    The other side of this is that you could lose up to 1/10th of a second of data because of the sync delay between the journal and disk. If you’re using MongoDB because it’s faster, you could lose a considerable amount of data.

    That’s not to say MongoDB isn’t a good choice. It could be. It is, after all, very fast in a limited number of use cases. But I don’t know that I’d call it a durable or safe place to store data.

    • About data loss: you can run MongoDB with “--journal” to get a write-ahead log (durability and repair). MongoDB lacks transactions and stays away from schemas because of the performance cost (but a developer can add those on top if needed). A pulled power cord or a failed disk is a hardware problem that you can easily mitigate with a UPS or RAID.

      • It is smart to mitigate hardware failures with hardware solutions (as you say, UPS and RAID). But it is not always that simple. It’s not uncommon that OSes have software bugs, or that software within the OS (namely drivers) has bugs, and these can make the system fail as if the power was cut. The system just breaks and restarts. What I’m trying to say is that certain features, although seemingly aimed at hardware failures, actually mitigate software ones too.
