Programming and Technology
RSS icon Home icon
  • mySQL FIFO and Thoughts on Data Posterity

    Posted on July 13th, 2011 brandon 1 comment



    I am working on a project that will track the actions of users and store them in a database in order to build an activity stream not dissimilar to Facebook’s news feed. Right away I can see that I am very quickly going to accumulate more data than i can possibly manage using conventional methods so I am left with a few obstacles to overcome if I am to implement something useful while not sacrificing performance or posterity. The idea is a user wants to see their own activity stream or possibly another user’s stream. The first decision that must be made is how far should the user be able to go back to see activities in the past? Should they be capped at the last 25 activities, or should there be pagination available to see older activities? For posterity my instinct is to never delete any activities ever however anyone can see how this could become unmanageable especially if I were to track activities as mundane as page views. Additionally, this would be the only way to generate statistical reports for individual users such as how many total page views they have, or what content categories do they frequent the most.

    Another thought that occured to me was I could investigate the possibility of using one of the many touted ‘noSQL’ solutions such as mongo and the ilk. From what i’ve read they would be good solutions for storing large amounts of data that do not need to be fetched very often. I can see activities that fall off the end of a FIFO queue going into one of these archival data stores and only the latest activities remaining in the mysql database. If i were to go this route, the follow up concern would be performance of maintaining this FIFO queue in a mysql table when recording activities. I came across a great article on implementing a FIFO queue in mysql which i’ve considered experimenting with. The situation I’d like to avoid is having to do multiple sql queries for every activity insertion. This article seems to have a solution for doing it all in one shot which sounds promising however I would need to modify it slightly to handle a modulus that is associated to a two column index since my table would track multiple datasets not a single one (multiple users).

    In exploring solutions I would love to hear some suggestions on optimal ways of handling these problems. What’s best practice for managing user activity history using either standalone mySQL or mySQL + an additional data store?

    Share
     

    One response to “mySQL FIFO and Thoughts on Data Posterity”

    1. Using both mySQL and noSQL sounds like a nightmare.

      If you already know mySQL and have this working, then you can push it pretty far with memcached. And, there’s the Bret Taylor method that kept FriendFeed running a massive feed system in mySQL well past when Facebook acquired them, http://bret.appspot.com/entry/how-friendfeed-uses-mysql. Plus, SSDs are cheaper now, so memcached + RAID 10 SSDs will solve a lot of problems before you get to Bret Taylor engineering-levels.

      If you’re really optimistic about your site, or short on hardware + have time on your hands to learn new stuff, then noSQL is an option. For your feed-type lookups the key-value stores (like MongoDB) should work far better than the document DBs (CouchDB).

      Since you don’t know a noSQL system yet, you’re close to qualifying as premature optimization like Hacker News defines it lately, http://news.ycombinator.com/item?id=2796098. I don’t 100% agree with that linked article, but you definitely do want to scale before you get into the hole Facebook dug itself. Here’s a good summary of that & better analysis of mySQL vs. noSQL for scaling, http://highscalability.com/blog/2011/7/25/is-nosql-a-premature-optimization-thats-worse-than-death-or.html

    Leave a reply

Get Adobe Flash playerPlugin by wpburn.com wordpress themes