Amazon S3 Outage: How we minimized the impact

March 6, 2017

A few days ago, Amazon S3 (part of Amazon Web Services) suffered a 4-hour outage in the N. Virginia (us-east-1) region. N. Virginia is one of the largest AWS regions, and many companies, both big and small, rely on it to run their IT infrastructure.

Outages in cloud-based services are impossible to avoid entirely. However, with careful planning, you can significantly reduce their impact on your applications.

#1: Redundancy

So, why don’t we just replicate the entire bucket to another region? Amazon S3 does support cross-region replication natively.

It is a valid idea, but not every company can afford to double its storage costs (not to mention the cross-region data transfer charges).
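To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes the monthly cost of a replica is its storage plus the transfer charge for data written each month (each new object crosses regions once), and it ignores request charges. The function name, data volumes, and the $0.02/GB prices are hypothetical placeholders, not actual AWS pricing; look up current S3 pricing for your regions.

```python
def monthly_replica_cost(stored_gb: float, written_gb_per_month: float,
                         storage_price_per_gb: float,
                         transfer_price_per_gb: float) -> float:
    """Approximate monthly cost of keeping a cross-region replica.

    stored_gb: total data held in the replica bucket.
    written_gb_per_month: new data written to the source each month; each
        object is copied across regions once, so this drives transfer cost.
    Both prices are placeholders supplied by the caller. Request (PUT)
    charges are deliberately ignored in this rough estimate.
    """
    storage = stored_gb * storage_price_per_gb
    transfer = written_gb_per_month * transfer_price_per_gb
    return storage + transfer


# Hypothetical workload: a 10 TB bucket receiving 500 GB of new data per
# month, with placeholder prices of $0.02/GB for storage and for transfer.
full = monthly_replica_cost(10_000, 500, 0.02, 0.02)
# Replicating only a high-priority 10% of the data and of the writes:
partial = monthly_replica_cost(1_000, 50, 0.02, 0.02)
print(f"full: ${full:.2f}/month, partial: ${partial:.2f}/month")
```

Both terms scale linearly with the amount of data replicated, so replicating a tenth of the data costs roughly a tenth as much, which is the motivation for the partial approach discussed next.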

If you have the budget for it and your use case calls for full redundancy, then go ahead.
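A replica only reduces the impact of an outage if the application can actually read from it. The sketch below shows one simple client-side approach, assuming each region is wrapped in a "reader" function that fetches an object by key and raises on failure. The names `read_with_fallback` and `Reader` are hypothetical, and treating every exception as "region unavailable" is a simplification.

```python
from typing import Callable, Optional, Sequence

# A "reader" fetches one object by key from one region and raises on failure.
Reader = Callable[[str], bytes]


def read_with_fallback(key: str, readers: Sequence[Reader]) -> bytes:
    """Return the object from the first region that answers.

    Readers are tried in order (primary first, then replicas). Any exception
    is treated as "this region is unavailable"; real code would narrow this
    to the network and service errors its S3 client actually raises.
    """
    if not readers:
        raise ValueError("at least one reader is required")
    last_error: Optional[Exception] = None
    for read in readers:
        try:
            return read(key)
        except Exception as exc:
            last_error = exc  # remember the failure and try the next region
    raise RuntimeError(
        f"all {len(readers)} regions failed for {key!r}") from last_error
```

With boto3, a reader could be something like `lambda k: east.get_object(Bucket="my-primary-bucket", Key=k)["Body"].read()`, where `east = boto3.client("s3", region_name="us-east-1")`; the bucket names here are placeholders.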

Otherwise, let’s talk about partial replication. What should be replicated? Here are some examples.

High-priority things to replicate:

  • Revenue-impacting objects, for example advertising assets.
  • Static website files.
  • The most recent database backup.
  • Security-related log files and audit trails.

Lower-priority things to replicate:

  • User-generated content, especially if you are providing a free service.
  • Older database backups.
  • Cold storage and archives.
  • General log files.
  • Application-generated objects: non-revenue-impacting files that your application can regenerate from existing data.
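Partial replication can be expressed in S3 as one replication rule per key prefix. The sketch below builds such a configuration as a plain dictionary, assuming the high-priority items above are stored under dedicated prefixes (the prefixes, role ARN, and bucket name are hypothetical placeholders). Each rule gets a unique `Priority`, and `DeleteMarkerReplication` is included, because S3 expects both when a rule uses a `Filter`.

```python
from typing import Any, Dict, List


def build_replication_config(role_arn: str, dest_bucket: str,
                             prefixes: List[str]) -> Dict[str, Any]:
    """Build an S3 replication configuration covering only the given prefixes.

    One rule per prefix; each rule gets a unique priority.
    """
    rules = []
    for priority, prefix in enumerate(prefixes, start=1):
        rules.append({
            "ID": "replicate-" + (prefix.strip("/").replace("/", "-") or "all"),
            "Priority": priority,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            # Expected alongside Filter; deletions in the source are not mirrored.
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        })
    return {"Role": role_arn, "Rules": rules}


# Hypothetical prefixes for the high-priority items listed above.
config = build_replication_config(
    role_arn="arn:aws:iam::123456789012:role/s3-replication",  # placeholder
    dest_bucket="example-assets-us-west-2",                    # placeholder
    prefixes=["ads/", "www/", "db-backups/latest/", "audit-logs/"],
)
```

To apply it, the dictionary can be passed to boto3's `put_bucket_replication(Bucket=..., ReplicationConfiguration=config)`; both the source and destination buckets must have versioning enabled. A rule's `Destination` may also set a `StorageClass` such as `STANDARD_IA` if a cheaper storage class is acceptable for the replica.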

#2: Use a Newer AWS Data Center/Region

Remember the earlier outages in the N. Virginia region?

My team has always worked closely with an AWS Technical Account Manager (TAM) or an AWS Solutions Architect. Since 2013, our AWS TAMs have been telling us that, unless we have an absolute need to be in the N. Virginia region, customers are generally advised to stay away from it. They have been recommending US-West-2 (Oregon) as the primary region for new development.

#3: Cache Items at the CDN Level

Many modern Content Distribution Network (CDN) services can keep serving cached content when your origin fails. If your website or server (the origin) becomes unavailable, the CDN serves cached (“stale”) content until the origin is available again.

Your CDN may not mirror the entire contents of your S3 bucket, but it will likely hold many of the recently accessed objects.

Cloudflare (including its free plan) and Fastly are two of the many CDNs that offer this feature.
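The behavior described above corresponds to the `stale-if-error` Cache-Control extension defined in RFC 5861. The toy cache below is a minimal sketch of those semantics, not any CDN's actual implementation: a copy is fresh for `max_age` seconds, and once stale it may still be served for up to `stale_if_error` more seconds if the origin fails. The class and parameter names are hypothetical; the injectable clock only exists to make the logic easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    body: bytes
    stored_at: float  # clock time when the copy was fetched from the origin


class StaleIfErrorCache:
    """Toy edge cache with max-age / stale-if-error semantics (RFC 5861).

    A copy is fresh for max_age seconds. After that the cache asks the
    origin again; if the origin fails, the stale copy is still served for
    up to stale_if_error more seconds.
    """

    def __init__(self, fetch_origin: Callable[[str], bytes], max_age: float,
                 stale_if_error: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_origin = fetch_origin
        self.max_age = max_age
        self.stale_if_error = stale_if_error
        self.clock = clock
        self.store: Dict[str, CachedObject] = {}

    def get(self, path: str) -> bytes:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and now - cached.stored_at < self.max_age:
            return cached.body  # fresh hit: the origin is not contacted
        try:
            body = self.fetch_origin(path)
        except Exception:
            # Origin unavailable: serve the stale copy if still inside the window.
            age = None if cached is None else now - cached.stored_at
            if age is not None and age < self.max_age + self.stale_if_error:
                return cached.body
            raise  # nothing usable in the cache, so the error reaches the client
        self.store[path] = CachedObject(body, now)
        return body
```

On S3, the directive can be attached per object through the `Cache-Control` metadata, for example with boto3's `put_object(..., CacheControl="max-age=300, stale-if-error=86400")`; the values here are illustrative. Whether and how a given CDN honors `stale-if-error`, or uses its own setting instead, varies, so check your CDN's documentation.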

Author
Ryan Harijanto

Head of Engineering. Former Sr. Engineer @Netflix, @HotelTonight, @Shutterstock. Previously a Senior Systems Engineer at Netflix; currently a technology advisor and board member for emerging companies. Diverse technological knowledge and understanding of various industries.
