Pinterest Architecture Update – 18 Million Visitors, 10x Growth, 12 Employees, 410 TB Of Data

There is new information on Pinterest since our last post, A Short on the Pinterest Stack for Handling 3+ Million Users: see Pinterest growth driven by Amazon cloud scalability.

With Pinterest we see a story very similar to that of Instagram. Huge growth, lots of users, lots of data, with remarkably few employees, all on the cloud.

While it’s true that neither Pinterest nor Instagram is making great advances in science and technology, that is more an indicator of the easy power of today’s commodity environments than a sign of Silicon Valley’s lack of innovation. The numbers are so huge and the valuations are so high we naturally want some sort of fundamental technological revolution to underlie their growth. The revolution is more subtle. It really is just that easy to attain such growth these days, if you can execute on the right idea. Get used to it.

Here’s what Pinterest looks like today:

  • 80 million objects stored in S3 with 410 terabytes of user data, 10x what they had in August. EC2 instances have grown by 3x. Around $39K for S3 and $30K for EC2.
  • 12 employees as of last December. Using the cloud, a site can grow dramatically while maintaining a very small team. Looks like 31 employees as of now.
  • Paying only for what you use saves money. Most traffic happens in the afternoons and evenings, so they reduce the number of instances at night by 40%. At peak traffic $52 an hour is spent on EC2, and at night, during off-peak hours, the spend is as little as $15 an hour.
  • 150 EC2 instances in the web tier
  • 90 instances for in-memory caching, which removes database load
  • 35 instances used for internal purposes
  • 70 master databases with a parallel set of backup databases in different regions around the world for redundancy
  • Written in Python and Django
  • Sharding is used. A database is split when it reaches 50% of capacity, which allows easy growth and gives sufficient IO capacity.
  • ELB is used to load balance across instances. The ELB API makes it easy to move instances in and out of production.
  • One of the fastest growing sites in history. Pinterest credits AWS with making it possible to handle 18 million visitors in March, a 50% increase from the previous month, with very little IT infrastructure.
  • The cloud supports easy and low-cost experimentation. New services can be tested without buying new servers, so there are no big up-front costs.
  • Hadoop-based Elastic Map Reduce is used for data analysis and costs only a few hundred dollars a month.
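
Two of the figures above lend themselves to a quick back-of-the-envelope sketch: the peak versus off-peak EC2 spend, and the rule of splitting a database at 50% of capacity. The Python below is only an illustration, not Pinterest's code. The hourly rates and the 50% threshold come from the list; the function names and the number of peak hours per day are assumptions made up for the example.

```python
# Figures quoted in the list above.
PEAK_RATE = 52.0        # dollars per hour spent on EC2 at peak traffic
OFF_PEAK_RATE = 15.0    # dollars per hour spent on EC2 at night
SPLIT_THRESHOLD = 0.5   # a database is split at 50% of capacity


def daily_ec2_cost(peak_hours, peak_rate=PEAK_RATE, off_peak_rate=OFF_PEAK_RATE):
    """Estimate one day's EC2 spend for a given number of peak hours.

    peak_hours is an assumption: the article does not say how many hours
    of the day run at the peak rate.
    """
    if not 0 <= peak_hours <= 24:
        raise ValueError("peak_hours must be between 0 and 24")
    return peak_hours * peak_rate + (24 - peak_hours) * off_peak_rate


def needs_split(used, capacity, threshold=SPLIT_THRESHOLD):
    """Return True once a shard has reached the split threshold."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity >= threshold


if __name__ == "__main__":
    scaled = daily_ec2_cost(peak_hours=12)    # hypothetical 12 peak hours
    always_on = daily_ec2_cost(peak_hours=24)
    print(f"scaled down at night: ${scaled:.0f}/day, always at peak: ${always_on:.0f}/day")
    print("split a shard holding 260 of 500 units?", needs_split(260, 500))
```

Under the assumed 12 peak hours, the day costs $804 instead of the $1,248 it would cost to run at the peak rate around the clock.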

(via HighScalability.com)

Hacker commandeers GitHub to prove Rails vulnerability

A Russian hacker dramatically demonstrated one of the most common security weaknesses in the Ruby on Rails web application framework. By doing so, he took full control of the databases GitHub uses to distribute Linux and thousands of other open-source software packages.

Egor Homakov exploited what’s known as a mass assignment vulnerability in GitHub to gain administrator access to the Ruby on Rails repository hosted on the popular website. The weekend hack allowed him to post an entry in the framework’s bug tracker dated 1,001 years into the future. It also allowed him to gain write privileges to the code repository. He carried out the attack by replacing a cryptographic key of a known developer with one he created. While the hack was innocuous, it sparked alarm among open-source advocates because it could have been used to plant malicious code in repositories millions of people use to download trusted software.

Homakov launched the attack two days after he posted a vulnerability report to the Rails bug list warning that mass assignment in Rails made websites relying on the framework susceptible to compromise. A variety of developers replied with posts saying the vulnerability is already well known and responsibility for preventing exploits rests with those who use the framework. Homakov responded by saying even developers for large sites such as GitHub, Poster, Speakerdeck, and Scribd were failing to adequately protect against the vulnerability.

In the following hours, participants in the online discussion continued to debate the issue. The mass assignment vulnerability is to Rails what SQL injection weaknesses are to other web applications. It’s a bug that’s so common many users have grown impatient with warnings about it. Maintainers of Rails have largely argued individual developers should single out and “blacklist” attributes that are too security-sensitive to be externally modified. Others such as Homakov have said Rails maintainers should turn on whitelist protection by default. Currently, applications must explicitly enable such protections.
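
The difference between the two approaches can be sketched in a few lines. The following is a Python analogy, not Rails code and not GitHub's actual model: the PublicKey record, its fields, and the three helper functions are hypothetical names invented for illustration. It shows how copying every submitted parameter onto a record lets a request change a field such as the key's owner, why a blacklist only protects the fields someone remembered to list, and why a whitelist fails safe.

```python
from dataclasses import dataclass


@dataclass
class PublicKey:
    """Hypothetical record: a key that belongs to the user in user_id."""
    user_id: int
    title: str
    key: str


def assign_all(record, params):
    """Mass assignment with no protection: every submitted field is copied."""
    for name, value in params.items():
        setattr(record, name, value)
    return record


def assign_except(record, params, protected):
    """Blacklist: copy every field except those explicitly protected."""
    for name, value in params.items():
        if name not in protected:
            setattr(record, name, value)
    return record


def assign_only(record, params, permitted):
    """Whitelist: copy only the fields explicitly permitted."""
    for name, value in params.items():
        if name in permitted:
            setattr(record, name, value)
    return record


if __name__ == "__main__":
    # A form that should only rename the key, with an extra field smuggled in.
    params = {"title": "laptop", "user_id": 42}

    unprotected = assign_all(PublicKey(1, "old", "ssh-rsa AAAA"), params)
    print("no protection:", unprotected.user_id)   # owner was changed

    # The blacklist forgot to list user_id, so the owner still changes.
    forgotten = assign_except(PublicKey(1, "old", "ssh-rsa AAAA"), params, {"key"})
    print("incomplete blacklist:", forgotten.user_id)

    # The whitelist ignores anything it was not told to accept.
    safe = assign_only(PublicKey(1, "old", "ssh-rsa AAAA"), params, {"title"})
    print("whitelist:", safe.user_id, safe.title)
```

With a whitelist, a newly added sensitive field is protected until someone opts it in; with a blacklist, it is exposed until someone remembers to opt it out.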

A couple days into the debate, Homakov responded by exploiting mass assignment bugs in GitHub to take control of the site. Less than an hour after discovering the attack, GitHub administrators deployed a fix for the underlying vulnerability and initiated an investigation to see if other parts of the site suffered from similar weaknesses. The site also temporarily suspended Homakov, later reinstating him.

“Now that we’ve had a chance to review his activity, and have determined that no malicious intent was present, @homakov’s account has been reinstated,” a blog post published on Monday said. It went on to encourage developers to practice “responsible disclosure.”

Updated to differentiate between Ruby and Rails.

(via arstechnica.com)