CloudStacking.com

S3 URL Expiration

S3 URL sharing: simply available

S3 is a web-based file share rather than a locally attached block device such as a SCSI disk or a thumb drive. Because it is accessible (only) via HTTP, we can choose to point web clients directly at it instead of serving its files from our web server, thereby taking that load off the web servers and enjoying S3’s built-in redundancy.

The beauty of it is that it requires absolutely no changes to either the web server or the client browser. Just be sure to generate our HTML with absolute URLs pointing to the relevant files in S3, and we are good to go:

<img src="http://s3.amazonaws.com/MyBucket/MyPicture.jpeg"/>
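Generating those absolute URLs is a small templating job. A minimal sketch, assuming the path-style endpoint used in the example above; `s3_url` and `img_tag` are hypothetical helper names, not part of any AWS library:

```python
from html import escape
from urllib.parse import quote

# Path-style endpoint as in the example above (an assumption; buckets are
# also commonly addressed as http://<bucket>.s3.amazonaws.com/<key>).
S3_ENDPOINT = "http://s3.amazonaws.com"


def s3_url(bucket: str, key: str) -> str:
    """Build an absolute, path-style URL for a public S3 object (hypothetical helper)."""
    # Percent-encode the key but keep "/" so "folder/file.jpg" stays a path.
    return f"{S3_ENDPOINT}/{bucket}/{quote(key, safe='/')}"


def img_tag(bucket: str, key: str) -> str:
    """Render an <img> tag that sends the browser straight to S3 (hypothetical helper)."""
    # Escape the URL before placing it inside an HTML attribute.
    return f'<img src="{escape(s3_url(bucket, key))}"/>'


if __name__ == "__main__":
    print(img_tag("MyBucket", "MyPicture.jpeg"))
```

The web server only emits the tag; the browser then fetches the image bytes from S3 itself.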

Simple has its own limitations

The classic use-case for this feature is where we have a public website serving equally public multimedia content (such as pictures) for anonymous internet clients.

But what happens when we want to implement access-control and authenticate users in our application before we allow them direct access to the content stored on S3?

The bad news is that although S3 supports per-file permission ACLs, they only work with Amazon user accounts (the same ones used for AWS and the Amazon bookstore), which aren’t really practical to control from inside our application and don’t integrate with any existing user database.

The solution is to use an S3 feature called URL Expiration.
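The general idea is that our application authenticates the user first, then hands out a URL signed with our AWS secret key and stamped with an expiry time; S3 verifies the signature and refuses the request once that time has passed. The sketch below assumes the legacy Signature Version 2 query-string format (`AWSAccessKeyId`, `Expires` and `Signature` parameters) that S3 used at the time; `expiring_s3_url` and its parameters are hypothetical names. Newer AWS regions accept only Signature Version 4, so in new code an SDK helper such as boto3’s `generate_presigned_url` is the more practical route.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, urlencode


def expiring_s3_url(access_key: str, secret_key: str, bucket: str, key: str,
                    expires_in: int = 300, now: Optional[float] = None) -> str:
    """Return a GET URL for a private S3 object that stops working after expires_in seconds.

    Sketch of the legacy Signature Version 2 query-string scheme (an assumption;
    see the lead-in). Call it only after the user has been authenticated.
    """
    now = time.time() if now is None else now
    expires = int(now) + expires_in  # absolute Unix timestamp, not a duration
    resource = f"/{bucket}/{quote(key, safe='/')}"
    # StringToSign: verb, Content-MD5 (empty), Content-Type (empty), Expires, resource
    string_to_sign = f"GET\n\n\n{expires}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    # urlencode percent-encodes "+", "/" and "=" in the base64 signature
    query = urlencode({"AWSAccessKeyId": access_key,
                       "Expires": expires,
                       "Signature": signature})
    return f"https://s3.amazonaws.com{resource}?{query}"


if __name__ == "__main__":
    # Placeholder credentials; never ship a real secret key to the browser.
    print(expiring_s3_url("AKIDEXAMPLE", "not-a-real-secret",
                          "MyBucket", "private/report.pdf", expires_in=600))
```

Only the signature leaves the server, never the secret key. The trade-off is that anyone holding the URL can use it until it expires, so short expiry windows work best.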

Amazon SQS Gotchas

There’s nothing wrong with SQS, but nobody’s perfect either.

In a previous post I covered the basics of Message Queuing and Amazon’s implementation of it, SQS (Simple Queue Service).

Amazon built SQS with three guiding principles in mind: Simplicity, Scalability and Redundancy.

To achieve exactly that (and achieve it they have), Amazon made some concessions and unorthodox design decisions, which create a few gotchas that we need to keep in mind when working with SQS.

Some technical background before we dive in

To provide the so-called unlimited scalability for a given Queue (meaning an unlimited number of messages can be placed in it), the operation of the Queue service is divided among a number of servers.

This is done by grouping the Queue’s messages into Blocks of a certain number of messages each, which are dispersed across those servers, with each Block handled by a separate server.

Note that to also achieve Redundancy, each Block is actually duplicated on multiple servers in multiple AWS Availability Zones, so that no single server failure results in a loss of messages.
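To make the Block layout concrete, here is a toy model of splitting a Queue into Blocks and copying each Block to servers in different zones. It is purely illustrative: SQS’s real partitioning is internal to AWS, and the block size, zone names, server counts and replica count below are all made-up assumptions.

```python
from dataclasses import dataclass, field
from itertools import cycle

# Toy parameters (assumptions, not SQS internals).
BLOCK_SIZE = 4                          # messages per Block
ZONES = ["zone-a", "zone-b", "zone-c"]  # stand-ins for Availability Zones
SERVERS_PER_ZONE = 3
REPLICAS = 2                            # copies of each Block; must be <= len(ZONES)


@dataclass
class Server:
    name: str
    zone: str
    blocks: list = field(default_factory=list)


def build_cluster():
    return [Server(f"{z}-srv{i}", z) for z in ZONES for i in range(SERVERS_PER_ZONE)]


def place_blocks(messages, servers):
    """Split messages into Blocks and copy each Block to REPLICAS servers in distinct zones."""
    blocks = [messages[i:i + BLOCK_SIZE] for i in range(0, len(messages), BLOCK_SIZE)]
    # Round-robin over the servers of each zone.
    by_zone = {z: cycle([s for s in servers if s.zone == z]) for z in ZONES}
    for n, block in enumerate(blocks):
        for r in range(REPLICAS):
            # Consecutive zones for each replica, rotated per Block to spread load.
            zone = ZONES[(n + r) % len(ZONES)]
            next(by_zone[zone]).blocks.append(block)
    return blocks


if __name__ == "__main__":
    servers = build_cluster()
    place_blocks([f"msg-{i}" for i in range(10)], servers)
    for s in servers:
        print(s.name, s.blocks)
```

Because every Block lives in at least two zones, losing any single server leaves another copy of each Block available, which is the Redundancy property described above.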

Anyone who has ever tried scaling software on a massive scale (and massive really is the word that comes to mind when thinking of AWS) knows that the #1 challenge, and as such the bane of scalability, is synchronization.

Having multiple servers, services, applications or other pieces coordinate and synchronize their actions by definition means one component spends part of its time waiting for another (the more components, the greater the wait-to-work ratio), and down goes the utilization.
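The wait-to-work effect can be seen in a toy simulation. It assumes the simplest form of synchronization, a barrier at the end of every round, and task times drawn uniformly from [0.5, 1.5); `barrier_utilization` is a hypothetical function, and nothing here describes how AWS actually coordinates its servers.

```python
import random


def barrier_utilization(workers: int, rounds: int = 1000, seed: int = 0) -> float:
    """Fraction of time spent working (vs. waiting) when every round ends at a barrier.

    Toy model: each worker's task time is uniform on [0.5, 1.5); a round lasts as
    long as its slowest worker, and everyone else idles until then.
    """
    rng = random.Random(seed)
    busy = 0.0     # total time actually spent on tasks
    elapsed = 0.0  # total time workers are occupied, including waiting
    for _ in range(rounds):
        times = [rng.uniform(0.5, 1.5) for _ in range(workers)]
        busy += sum(times)
        elapsed += max(times) * workers  # all workers wait for the slowest one
    return busy / elapsed


if __name__ == "__main__":
    for n in (1, 2, 8, 64):
        print(n, round(barrier_utilization(n), 3))
```

In this model utilization tends to 1/E[slowest task time], which is about 0.86 for 2 workers and about 0.67 for 64: the more components wait on each other, the more of their time goes to waiting.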

That’s why AWS has bent some rules and made concessions on synchronization to enable the massive scalability of its offerings, SQS included (for example, a single SQS Queue can hold an unlimited number of messages).

But these amazing qualities come at a price - and below I will describe what exactly that price may be and how we can live with it.
