-
There’s some stuff you store that you can’t afford to get wrong. Your users’ passwords, emails, credit card accounts, favorites, etc.: all that stuff’s vital. If you get it wrong, your users are gonna know, and they’ll complain or leave you. So you make sure you build a system that’ll handle it. It’s easy. It is. Really. You just sit your ass down and solve it. It’s important, goddamnit! And people know that.
I’m not gonna talk about that. If you’re having performance or stability issues storing important stuff, that probably means you haven’t drawn the line between what’s important and what isn’t.
The unimportant stuff is stuff like logs, partial computation results, anything temporary, anything that you’re gonna need once (maybe), statistical data records and so on. Basically anything where you could stand to lose a chunk of data and it wouldn’t be a big deal. I’m not necessarily talking about throwaway data, although that certainly is a part of it. I’m talking about stuff where each individual data entry is more or less worthless but a total lack of any would be catastrophic… or at least very bad. For instance, receipt emails you’ve sent to users. You probably want to keep them for years and years, for support and legal purposes, but each entry is not that important in itself; it’ll most likely never be used.
If you’re not generating humongous amounts of such data you may as well stuff it in your high-reliability data store. Odds are you’re already doing this anyway. And then you run into problems. Your important data becomes inaccessible, or slow, because of a bunch of data that doesn’t need to be there.
sigh
You’ll need to separate your important data from your voluminous data. And like Cargill, I’ve narrowed your choices down to 5 unthinkable options. We’ll examine each in more detail.
-
Storing in files. It’s fast, it’s decentralized, and files are easy to move, delete, compress, back up, everything. And it’s durable. But you can’t edit the data very easily.
-
Storing in a SQL database. It’s blazing fast, you can do advanced queries, there are tons of tools, everyone is doing it, but there will come a time when you store too much stuff and everything will start crawling until it grinds to a halt [1]. And it can be hard to change the schema.
-
Storing in a NoSQL database. It’s also blazing fast, you can do queries, add columns, shard your data and feel smug. The software can be a bit buggy though, as it’s not as omnipresent in the monolithic corporations that made SQL databases the tanks that they are.
-
Storing in-memory. Shove your stowaway data into in-memory data stores. Quick, easy, reliable, and you can do it on the fly, perform calculations and whatnot. You can’t store too much data though.
-
Storing in a key-value store on the interwebs. Endless storage, practically endlessly scalable, but it can be a bit slow at times.
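To make the files option from the list above a bit more concrete, here’s a rough sketch of what it could look like. None of this comes from a real system: the function names, the one-file-per-day layout and the JSON-lines format are all assumptions I made up for illustration. The point is only that each day’s data ends up in its own small file that you can compress, move or delete without touching anything else.

```python
import gzip
import json
import os
import shutil
from datetime import date


def append_record(base_dir, record, day=None):
    """Append one record as a JSON line to that day's file; return the path.

    The per-day file naming is an assumption made for this sketch.
    """
    day = day or date.today()
    path = os.path.join(base_dir, f"{day.isoformat()}.jsonl")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path


def compress_day(path):
    """Gzip a finished day's file and remove the uncompressed original."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream it, don't load it all in memory
    os.remove(path)
    return gz_path


def read_day(gz_path):
    """Read every record back from a compressed day file."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Appending is cheap and old days shrink down nicely, but notice there is no `update_record` here. That’s the “can’t edit the data very easily” part: changing one entry means rewriting the whole file it lives in.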
[1] And then you’ll start looking at that 500GB database file and realize that you can’t do ANYTHING with it. You can’t export 500GB, you can’t move it, it’s just a goddamn clog. And the data just keeps on coming in. Gaah!
-