Cheap and cheerful config storage using DynamoDB
Recently we implemented a utility that needed to store a bit of state between runs. This script could run on a number of machines (well, VMs), so relying on the local filesystem was not a good idea.
Luckily, AWS provides a service called DynamoDB, which is a NoSQL database in the cloud. Unlike most AWS resources, you don’t create instances of Dynamo; you just create tables in it and use them as needed.
DynamoDB: cheap, scalable, zero operational overhead
Using Dynamo as a config store costs close to zero because Dynamo is very cheap when your reads and writes are infrequent. It’s also very simple from an ops standpoint: you don’t worry about it going down, or even about scaling, as you would when running a Redis instance, even via AWS ElastiCache.
Below I’ll attempt to demystify Dynamo a bit while giving a handful of simple bash commands that allow you to use Dynamo as a simple key-value store.
In our example, we’re storing some key-value pairs in a Dynamo table. This table will act as a history of all the config that we’ll put in there, but we’ll also provide a way to fetch the latest row so that we can just read the current state of the config.
Hash Keys and Range Keys
Dynamo is an “infinitely scalable” distributed database, so your logical table is actually many physical tables. For this to work, Dynamo uses the relatively simple concept of sharding: it looks at your data and figures out which shard each item should go on, based on what is called the Hash Key. When we query the data, we need to provide a value for the hash key so that Dynamo knows which shard to look at.
Then there’s the question of how the table should be sorted (and queried). For this, we tell Dynamo which column to use as the Range Key. Unlike a traditional SQL database, you cannot change the order arbitrarily at query time, but you can read the pre-sorted table forwards or backwards (query results are sorted ascending by default).
First, decide how we will partition our table (hash key)
Since we’re using our table for app config, and we need some key to partition it by, we can use the app name as the hash key. If you only ever use this for one app, this still makes sense because Dynamo requires you to have some kind of hash key to query the table. So our field will be called app_name.
Now decide how to sort it (range key)
Since we won’t have control of sorting at query time, we will have to bake the idea of timestamps into our table. That means we’ll create a field called updated_at and store an ISO8601 timestamp in it. This field will be our range key, telling Dynamo to use it for sorting.
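Here is a minimal sketch of why ISO8601 works well as a range key. The helper name iso_timestamp and the two sample dates are made up for illustration; the point is that fixed-width UTC timestamps sort chronologically when compared as plain strings.

```shell
# Print the current time as a UTC ISO 8601 timestamp, e.g. 2015-06-01T12:34:56Z.
# This format string works with both GNU and BSD date.
# (iso_timestamp is a hypothetical helper name, not part of any tool.)
iso_timestamp() {
  date -u +"%Y-%m-%dT%H:%M:%SZ"
}

# Two sample timestamps (made up). Sorting them as plain strings puts the
# chronologically older one first, which is what we want from a range key.
earlier="2015-01-02T00:00:00Z"
later="2015-01-10T00:00:00Z"
oldest_first=$(printf '%s\n' "$later" "$earlier" | LC_ALL=C sort | head -n 1)
echo "$oldest_first"   # prints 2015-01-02T00:00:00Z
```

This only holds if every timestamp uses the same format and timezone, which is why the sketch always writes UTC with a trailing Z.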
Create the table
Note that at the end we also had to specify how many reads/writes per second we want. With provisioned throughput, Dynamo charges us based on that number rather than what we actually use. Adjust it as necessary for your use case. Now we wait 5–10 seconds for the provisioning to complete.
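As a rough sketch, creating a table with the two keys described above could look like the following. The function names are hypothetical wrappers, and the throughput of 1 read and 1 write per second is an assumed starting point rather than a recommendation.

```shell
# Create the config table: app_name is the hash key, updated_at the range key.
# Both keys are declared as strings (S). The throughput numbers are assumed
# values for a rarely-touched config table; adjust them for your own usage.
create_config_table() {
  table_name="$1"
  aws dynamodb create-table \
    --table-name "$table_name" \
    --attribute-definitions \
        AttributeName=app_name,AttributeType=S \
        AttributeName=updated_at,AttributeType=S \
    --key-schema \
        AttributeName=app_name,KeyType=HASH \
        AttributeName=updated_at,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1
}

# Block until the table is ready, instead of sleeping for a fixed time.
wait_for_config_table() {
  aws dynamodb wait table-exists --table-name "$1"
}
```

With these defined, something like `create_config_table app-config && wait_for_config_table app-config` would create the table used in the rest of this post and return once it can accept writes.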
Add records
Dynamo can be used in a fully schemaless way; besides the two keys we identified above, we can shove any data we want in there. Here’s an example of inserting a key-value pair along with a timestamp (remember, the timestamp is needed to query for the latest record later). The slightly strange syntax for inserting items requires that each value be tagged with its type. In this case we are using “S” to insert a string. You can learn more about types in the DynamoDB documentation on data types.
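A minimal sketch of such an insert, wrapped in a hypothetical put_config function, might look like this. The argument order and the my-app / my_key / my_value names in the usage note are assumptions for illustration.

```shell
# Write one config snapshot. app_name and updated_at are the two keys;
# the third attribute is free-form. Every value is tagged with its type ("S").
# Values are spliced into the JSON as-is, so this sketch assumes they contain
# no double quotes or backslashes.
put_config() {
  table_name="$1"; app_name="$2"; key="$3"; value="$4"
  updated_at=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  aws dynamodb put-item \
    --table-name "$table_name" \
    --item "{
      \"app_name\": {\"S\": \"$app_name\"},
      \"updated_at\": {\"S\": \"$updated_at\"},
      \"$key\": {\"S\": \"$value\"}
    }"
}
```

Calling `put_config app-config my-app my_key my_value` would add a new row rather than overwrite the previous one, because each call gets a fresh updated_at value. That is what gives us the history.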
Query for the latest config for our app
To construct the query, we are going to query on the hash key (our app name), and because the table is already sorted by timestamp, we just need to read it in reverse, which means using the --no-scan-index-forward flag. The final bit at the end uses the jq utility to parse the value we’re interested in out of the JSON response.
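One way to write that query is sketched below. The function name is hypothetical, and this version uses a key condition expression with --max-items 1 to keep only the newest row; treat it as one possible shape, not the only way to do it.

```shell
# Fetch the newest value stored under a given attribute for one app.
# Assumes the attribute name is a plain identifier (letters, digits,
# underscores) so it can be dropped straight into the jq path.
latest_config_value() {
  table_name="$1"; app_name="$2"; key="$3"
  aws dynamodb query \
    --table-name "$table_name" \
    --key-condition-expression "app_name = :app" \
    --expression-attribute-values "{\":app\": {\"S\": \"$app_name\"}}" \
    --no-scan-index-forward \
    --max-items 1 \
    | jq -r ".Items[0].$key.S"
}
```

With the table and item from earlier, `latest_config_value app-config my-app my_key` would print the most recently written value for my_key.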
The result, prior to going into jq, looks something like this:
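To make the shape concrete without touching AWS, the sketch below stores a made-up response in a shell variable. The attribute values are placeholders, and extract_my_key is a hypothetical helper; only the overall layout (an Items array of type-tagged attributes, plus counts) reflects what a query returns.

```shell
# A made-up response with the general shape that a query returns.
# The attribute values are placeholders, not output from a real table.
sample_response='{
  "Items": [
    {
      "app_name": {"S": "my-app"},
      "updated_at": {"S": "2015-06-01T12:34:56Z"},
      "my_key": {"S": "my_value"}
    }
  ],
  "Count": 1,
  "ScannedCount": 1
}'

# Read a response on stdin and print the string stored under my_key
# in the first (newest) item.
extract_my_key() {
  jq -r '.Items[0].my_key.S'
}
```

With jq installed, `printf '%s\n' "$sample_response" | extract_my_key` prints my_value.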
So the jq filter “.Items[0].my_key.S” retrieves the value of the string under “my_key”. You can scale this out to read as many key/value pairs as you want.
Scanning the whole table to read history
There are lots more things you can do here, including querying on the range key in order to get config from a certain historical period, or you can simply view what is stored in the entire table, using a scan:
aws dynamodb scan --table-name app-config
Keep in mind this is an expensive operation that can burn the read capacity on your table, so don’t do it unless you have to. Instead, prefer the query operation.
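For the historical-period case mentioned above, a query that also restricts the range key might look like the sketch below. The function name and the date bounds in the usage note are assumptions; the BETWEEN comparison works here because the timestamps are ISO8601 strings in a single format.

```shell
# List every config row for one app whose updated_at falls in [from, to].
# from and to are ISO 8601 timestamps in the same format used when writing.
config_history_between() {
  table_name="$1"; app_name="$2"; from="$3"; to="$4"
  aws dynamodb query \
    --table-name "$table_name" \
    --key-condition-expression "app_name = :app AND updated_at BETWEEN :from AND :to" \
    --expression-attribute-values "{
      \":app\": {\"S\": \"$app_name\"},
      \":from\": {\"S\": \"$from\"},
      \":to\": {\"S\": \"$to\"}
    }"
}
```

For example, `config_history_between app-config my-app 2015-01-01T00:00:00Z 2015-01-31T23:59:59Z` would return that month's rows, oldest first, while reading only the matching part of the table instead of scanning all of it.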
If you think this is cool, you might like to work at Reverb.com
Till next time — Yan Pritzker