• srdan Apr 3, 2012

    Storing Stuff Not Worth Storing (Redis) by srdan

    Once you pass about 1 000 requests per second you start realizing that storing all that data is gonna be a problem. At 5 000 requests per second you’ve pretty much struck off all solutions having the letters SQL on their front page. At more than 10 000, your only real option is Redis.

    In-memory distributed caches are a great way of throwing away your data. In a good way. No, really. Look at Redis for example.

    You see, we have this one use-case where we need to assemble a bunch of fragments into a complete … thing. All the fragments that belong together share a common key, but we cannot tell beforehand how many fragments there are going to be or in what order they will arrive. We can however assume that if we haven’t received a new fragment in a certain period of time for a certain key, the fragments sharing that key are ready to be assembled.
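
    A minimal sketch of that rule in plain Ruby, with no Redis involved yet. The class name `QuietPeriodTracker`, the injected clock and whatever quiet period you pass in are all made up for illustration; nothing here says how long we actually wait.

```ruby
# Tracks when each key last received a fragment and reports the keys that
# have been quiet long enough to be assembled. All names are hypothetical.
class QuietPeriodTracker
  def initialize(quiet_seconds, clock: -> { Time.now.to_f })
    @quiet_seconds = quiet_seconds
    @clock = clock
    @fragments = Hash.new { |hash, key| hash[key] = [] }
    @last_seen = {}
  end

  # Store a fragment under its key and restart that key's quiet period.
  def add(key, fragment)
    @fragments[key] << fragment
    @last_seen[key] = @clock.call
  end

  # Remove and return { key => fragments } for every key whose last
  # fragment arrived at least quiet_seconds ago.
  def take_ready
    now = @clock.call
    ready_keys = @last_seen.select { |_, seen| now - seen >= @quiet_seconds }.keys
    ready_keys.each_with_object({}) do |key, ready|
      ready[key] = @fragments.delete(key)
      @last_seen.delete(key)
    end
  end
end
```

    Call `add` for every incoming fragment and poll `take_ready` every now and then; whatever comes back is ready to be assembled.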

    The fragments are arriving at 5-10k/s, and once they’ve been assembled they get to stick around for a couple of days - just in case we decide we need to reassemble some of them. Storing these in a regular database and removing them randomly caused no end of trouble with fragmented databases, lousy scaling and itchy trousers[1].

    So we started using Redis. It’s a lightweight in-memory database-like key-value store. It supports five general data types: strings, lists, hashes, sets and sorted sets. What we would do is we’d create lists acting as buckets for fragments, setting an expiry on each bucket. That way we’d collect what we want and the database would purge overdue buckets after their individual TTL was reached. Beautiful. And since it was all kept in memory at all times, there was no fragmentation, no slowdowns, no unwanted side effects.

    Check it out! This pushes a BSON-packed fragment onto the head of the list (Redis auto-creates the list if there isn’t one) and sets the expiration to 2 days.

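    require 'redis'
    require 'bson' # BSON.serialize is the bson 1.x gem API

    def parse(fragment)
      @redis ||= Redis.new # one lazily created connection, reused across calls
      # LPUSH creates the list on first use and puts the fragment at its head.
      @redis.lpush(fragment.key, BSON.serialize(fragment.to_h))
      # Restart the bucket's TTL (in seconds) on every new fragment.
      @redis.expire(fragment.key, 2 * 24 * 60 * 60)
    end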
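
    Getting the fragments back out is just as short. A rough sketch, with a few liberties: JSON from the standard library stands in for BSON so the snippet has no gem dependencies; `FRAGMENT_TTL`, `store_fragment` and `fetch_fragments` are names made up here; and `FakeRedis` is a toy in-memory stand-in that mimics only `lpush`, `expire` and `lrange` so the example runs without a server. Against a real server you’d pass in `Redis.new` instead.

```ruby
require 'json'

FRAGMENT_TTL = 2 * 24 * 60 * 60 # two days, in seconds

# Push one fragment onto its bucket and restart the bucket's TTL.
def store_fragment(redis, key, fragment)
  redis.lpush(key, JSON.generate(fragment))
  redis.expire(key, FRAGMENT_TTL)
end

# Return every fragment in the bucket, oldest first. LPUSH puts the newest
# element at the head of the list, hence the reverse.
def fetch_fragments(redis, key)
  redis.lrange(key, 0, -1).reverse.map { |raw| JSON.parse(raw) }
end

# Toy stand-in for a Redis client (an assumption for this sketch); it
# records TTLs but never actually expires anything.
class FakeRedis
  attr_reader :ttls

  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
    @ttls = {}
  end

  def lpush(key, value)
    @lists[key].unshift(value).size
  end

  def expire(key, seconds)
    @ttls[key] = seconds
    true
  end

  def lrange(key, start, stop)
    @lists.key?(key) ? (@lists[key][start..stop] || []) : []
  end
end

redis = FakeRedis.new
store_fragment(redis, 'order-42', { 'part' => 1 })
store_fragment(redis, 'order-42', { 'part' => 2 })
p fetch_fragments(redis, 'order-42')
```

    Since every `expire` call resets the timer, a bucket lives for two days after its last fragment, not after its first.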

    While this is all very sleek and clever, one is left to ponder the question of ‘why not just do this from within your program?’ It’s got lists, it’s got memory. You’re not a half-bad developer either, you could probably pull it off without third-party infrastructure, right?

    Well.. You could. I guess. But that means all your data is kept within the program instance itself. If it crashes or gets too slow you may be in serious trouble. With Redis you can attach any number of processes and have them work on a shared data set.

    But Redis is weirder than that.

    Redis is single-threaded by design. This means that one and only one thread may manipulate the data set at any given time. While that sounds like a massive drawback, it’s actually quite clever: every command runs to completion before the next one starts, so each one is atomic without any locking. That matters because Redis itself can perform some basic operations. I’ve already mentioned adding data to sets and lists, but Redis can also compare bits, increment numbers, return and manipulate substrings and so on. Blazingly fast. These abilities have fostered a number of packages that allow you to use Redis as a backend for your ordinary variables. One of them is Redis-roc.

    Store = ROC::Store::RedisStore.new(Redis.new)
    stuff = Store.init_hash('stuff')
    puts "before: #{stuff.to_h}"
    stuff[rand(100).to_s] = 'something'
    puts "after: #{stuff.to_h}"

    Running this snippet multiple times or on multiple machines will keep setting ‘something’ for random keys in the hash, making the hash grow. You can treat the hash like an ordinary Ruby hash while keeping everything stored on a remote Redis server where it’s easily distributable and safe from your program crashing. That’s neat. And useful. And a bit scary. You can do all the ordinary replication stuff you’ve grown accustomed to, with a master for writes and replicas for reads.
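
    Back to the single-threaded design for a moment, since it’s what makes sharing data between processes safe. Here’s a toy model in plain Ruby; it is not how Redis is actually implemented, and `TinyStore` and its methods are invented for this sketch. Clients on any number of threads queue up commands, and a single worker thread applies them one at a time, so one increment can never interleave with another and no locks are needed around the data.

```ruby
# Toy single-writer store: many threads submit commands, one thread runs them.
class TinyStore
  def initialize
    @data = Hash.new(0)
    @commands = Queue.new
    @worker = Thread.new do
      # Only this thread ever touches @data.
      while (command = @commands.pop) != :stop
        block, reply = command
        reply << block.call(@data)
      end
    end
  end

  # Add 1 to the counter stored at key and return the new value.
  def incr(key)
    submit { |data| data[key] += 1 }
  end

  # Return the current value stored at key (0 if it was never set).
  def get(key)
    submit { |data| data[key] }
  end

  # Stop the worker thread once all queued commands have run.
  def shutdown
    @commands << :stop
    @worker.join
  end

  private

  # Queue a command and block until the worker has produced its result.
  def submit(&block)
    reply = Queue.new
    @commands << [block, reply]
    reply.pop
  end
end

store = TinyStore.new
threads = 8.times.map { Thread.new { 1000.times { store.incr('hits') } } }
threads.each(&:join)
puts store.get('hits') # prints 8000
store.shutdown
```

    No update gets lost, no matter how the eight client threads are scheduled, because the read-add-write of each increment happens entirely on the worker.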

    Of course.. You can’t really keep very large amounts of data in Redis. You’re limited to the amount of memory you have in your Redis machine. That’s potentially a deal-breaker, but for keeping temporary items, partial computation results and stuff you need access to quickly, Redis is a godsend.

    [1] See the mongo posts on this blog. I don’t think they mention the itchy trousers specifically, but you can sort of infer that..
