April 18th, 2008

The Information Firehose, Lifestreams, and the Curse of Granularity

by Joshua Porter  |   10 Comments

I remember when I first started using feed readers. I was excited. So excited, in fact, that I wrote an enthusiastic post about it. Here was a tool that would allow me to know whether sites had been updated without having to visit each one in turn. I wouldn’t have to suffer through manually checking web sites to see if something was new. From the single interface of my feed reader, I could keep track of all the domains that I was interested in, without missing anything. In theory, I would save a tremendous amount of time and receive much more signal and a lot less noise.

The reality, however, was quite the opposite. Once I got a taste for feeds, I started subscribing to more and more of them, eventually hundreds: news, design topics, friends, bloggers whose writing I enjoy. Before long I couldn’t keep up; I would fire up my feed reader and find literally thousands of new posts to look through. I was suffering again.

Instead of solving my information problems like I had imagined, feeds had simply substituted one problem for another. Whereas before feed readers I was having a hard time finding all the newly updated content, after feed readers I was having a hard time reading it all.

Lifestreams

We see the same thing happening again with a new type of interface element called lifestreams. Lifestreams (also called activity streams) display the aggregated feeds of individuals in reverse chronological order. The Facebook news feed is probably the best-known example of a lifestream. It aggregates all the activity of your friends on the service and displays it in a never-ending stream of content. Ridiculed at first as a privacy concern, the news feed is now a primary driver of activity on the site. People I’ve talked to seem very polarized about the feature. Some love it and use it constantly. Others ignore it, especially since the introduction of applications. And just recently, Facebook added the ability to import activity from other services such as Flickr, Digg, and Del.icio.us, adding even more content to the flow.
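To make the mechanics concrete, here is a minimal Python sketch of what a lifestream does at its core: interleave several people's per-service feeds into one newest-first stream. It assumes each input feed already arrives sorted newest-first; the `Activity` record and the sample feeds are hypothetical, not Facebook's actual implementation.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    # Hypothetical record for one item in someone's feed.
    person: str
    service: str
    title: str
    timestamp: datetime


def lifestream(*feeds: list[Activity]) -> list[Activity]:
    """Merge several feeds into one reverse-chronological stream.

    Assumes each input feed is already sorted newest-first.
    """
    return list(heapq.merge(*feeds, key=lambda a: a.timestamp, reverse=True))


flickr = [
    Activity("Josh", "Flickr", "macro shots of flowers", datetime(2008, 4, 17, 9, 0)),
    Activity("Josh", "Flickr", "an older photo", datetime(2008, 4, 15, 9, 0)),
]
digg = [Activity("Ann", "Digg", "dugg a story", datetime(2008, 4, 16, 9, 0))]

for item in lifestream(flickr, digg):
    print(item.timestamp.date(), item.person, item.service, item.title)
```

Note that the merge only orders by time; nothing in it judges relevance, so every imported service just makes the stream longer.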

So, again, we have the firehose problem. While at first it seemed like a great idea to be able to follow our friends on all the services they participate in, the reality is that seeing it all in one place is overwhelming. Information flows in at such a high rate that you can’t come close to seeing it all, let alone making sense of it. If someone asks whether you saw their most recent macro shots of flowers on Flickr, for example, you have to think back through the hundreds of Flickr pictures you’ve seen float through your stream recently and try to remember which ones were theirs. This might be manageable if you only follow a couple of people, but it quickly scales out of control.

Instead of solving my information problems like I had imagined, lifestreams have yet again substituted one problem for another. Whereas before lifestreams I was having a hard time following my friends, after lifestreams I find out that most of what they do isn’t that interesting after all.

The Curse of Granularity

One way to combat this problem would be to create tools that let people define, at a granular level, what content they want to see. For example, you might be able to say “I want to see Josh’s pictures on Flickr but not his activity on Digg” or “Show me only Josh’s Del.icio.us feed”. This sounds great, but it comes at a big expense: the time and effort of managing all of those decisions. Do you really want to manage settings for every person you follow? I’ve started doing this on the services that allow it, but one thought keeps niggling at me: what if I’m turning off good content? What if, for example, I miss a great photo on Flickr because I’ve shut that person’s photos out of my lifestream?
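Here is a minimal Python sketch of the kind of per-person, per-service switch described above. The `Activity` record, the rule table, and the show-by-default fallback are all hypothetical choices for illustration; the point is that every entry in the table is one more decision someone has to make and maintain, and anything switched off simply never appears.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    # Hypothetical record for one item in a lifestream.
    person: str
    service: str
    title: str


# Hypothetical rule table: (person, service) -> show it?  Unlisted pairs are shown.
Rules = dict[tuple[str, str], bool]


def visible(item: Activity, rules: Rules) -> bool:
    return rules.get((item.person, item.service), True)


def filter_stream(stream: list[Activity], rules: Rules) -> list[Activity]:
    # Hidden items are dropped silently: a great photo behind an "off" switch is never seen.
    return [item for item in stream if visible(item, rules)]


def decisions_to_manage(people: list[str], services: list[str]) -> int:
    # Full granular control means one switch per person per service.
    return len(people) * len(services)


stream = [
    Activity("Josh", "Flickr", "macro shots of flowers"),
    Activity("Josh", "Digg", "dugg a story"),
    Activity("Ann", "Digg", "dugg another story"),
]
# "I want to see Josh's pictures on Flickr but not his activity on Digg"
rules: Rules = {("Josh", "Digg"): False}
print([a.title for a in filter_stream(stream, rules)])
# ['macro shots of flowers', 'dugg another story']
```

With, say, 150 friends and 5 services (made-up numbers), `decisions_to_manage` comes to 750 switches, which is exactly the management overhead in question.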

The larger problem is that we don’t know what content is valuable before we see it. We might expect most of our friends’ content to be worth seeing, but it definitely isn’t. Once we are able to track the activities of the people we know and love, we can’t help but conclude that they’re as mundane as we are. Perhaps this is another example of the 80/20 rule: 80% of the valuable information we receive from our friends comes from 20% of their activity.

What’s Next?

So how can we best manage the information firehose? We could hope for more granular controls, which have the overhead of managing them. Or, perhaps we need to periodically declare feed bankruptcy, where we simply turn it all off for a while to regain our sanity? Perhaps I’m not being optimistic enough: is there a way to help solve this problem with software that we just haven’t seen yet? Or, perhaps it exists and just isn’t evenly distributed?

Comments (10 Responses so far)

  1. I think one interesting way to handle the information overload is using a tool to determine what has a high signal-to-noise ratio. You aren’t ever going to find every good story or post or email. What you can do is figure out what is wasting your time and what you get value from.

    When I switched over to Google Reader I started using the List View, where you see all of the titles but have to click to actually read them. Just a few weeks ago I started playing with the Trends tool in Google Reader. With it you can see which feeds matter most to you, in terms of the percentage of their articles you actually read.

    I think a technique like this combined with an RSS feed of lifestreams can be a tool in combating information overload. I will admit it would be nice to unsubscribe from whole services in a tool like FriendFeed. I’m more interested in seeing all blog posts since they require more effort to write, thus implying more useful information, whereas I’m not concerned about missing a few tweets. Thus I would like to be able to turn off Twitter for all the people I am subscribed to in FriendFeed, leaving stuff like bookmarks, photos and videos turned on.

    At the same time, this all ties into managing what sort of information you push out into your lifestreams. If I find something that I believe others would be interested in, I’ll share it in a way that others can find it. If it is really only applicable to myself, I’ll keep it as a private item.

    Overall I find this overload to be part of a general trend across information tools. Every few months I go through and unsubscribe from all the email newsletters I end up on from purchasing products and signing up for sites, which improves the signal-to-noise ratio of my email.

    I’m sure this has gotten quite disconnected by now, but thanks for the post!

  2. A possible solution is APML:

    http://www.apml.org/

    But there’s not much in the way of actual implementation yet. Several mechanisms need to be in place. First, web utilities (feedreaders, e-commerce sites, social networks) need to be able to generate APML files for their users. Then there needs to be a mechanism to combine APML files into one, and have that be portable throughout different services.

    Actually, strike all that. Dealing with brittle files is far too complicated. Attention profiling should be a web service that’s mostly invisible to the end-user. It’ll require a universal identity solution like OpenID, but after it’s turned on, it should Just Work. Web utilities should report my attention profile data to a utility living at my OpenID endpoint, and the OpenID endpoint in turn reports my aggregated attention profile whenever I log into an applicable service.

  3. When I discovered Google Reader I was also very excited; at the time I was actively reading a couple dozen websites and it made life a lot easier. With the reader came the addiction, and now I read a couple hundred sites. One thing I noticed is that a lot of sites within a similar niche share much of the same content, interlinking it with “John Doe just wrote a great…”

    Now I tend to subscribe quickly, give the site a probationary period to see if they’re actually posting or just echoing the blogosphere, and unsubscribe just as quickly if they’re not improving my life. Granted, this does not cover the lifestreams since they’re typically friends and not just arbitrary websites in a niche, but the more sites a lifestream is aggregating the more noise in the pipe. Maybe it would be a better bet to just grab the individual feeds, and make your own lifestream-per-friend with only what you’re interested in, and hope that anything really valuable shows up in more than one of their feeds.

  4. Wow. I didn’t think I was the only one with this problem, but I never realized how bleak it really seems. I can usually keep up with my 94 Google Reader subscriptions, 5 mailing lists and aggregated FriendFeed data, but only if I keep on top of it.

    I recently spent two weeks in Chicago, and after just a day, I got miserably behind on it all, and it finally dawned on me just how much work it was to manage all that content. After a long day of museum-hopping, I’d sit in the hotel and relax. And then I’d spend the rest of the night sifting through the day’s data so it didn’t back up even further.

    This was my vacation, and I felt obligated to sort through piles of meaningless drivel, star the few things that looked interesting (I certainly didn’t have time to actually *read* any of it) and mark the rest as read, so it didn’t show up the next day. I realize I didn’t *have to*, and that doing so was my choice, but there has to be a better way.

    I’m actually really interested now to read up on APML, after Luigi’s comment. I hadn’t heard of it before, and I really wonder if it’d be of any use. I like Luigi’s idea of how it could/should be implemented as well: dump all the content at one central location, which can be managed via tags or something, rather than individual granularity, then just read the output. Sounds like a job for chi.mp.

    Ideally, I’d like to see this content move away from time-based sorting to priority-based sorting. I think that’s the biggest issue, for me at least. Whether it’s APML that does it or not, I just need a way to get the important stuff at the top, rather than just the most recent stuff.

    In fact, I generally don’t care at all *when* something took place. I know it’s new since the last time I checked my aggregator, and that’s all I really need to know. If I care about specifics, I’ll look at the details of a particular item; I don’t need items sorted by some arbitrary property that has absolutely no bearing on their content or my interests.

    Personally, I’m a fan of behavioral learning systems, though I don’t know if something like that could be applied here. Basically, if you star something to read later, or share it with others, or spend enough time looking at it that it can be considered to actually be *read*, the item’s source and all the tags associated with it go up in value. If you delete it, ignore it, skip over it really quickly, or mark it as spam, the item’s source and all the tags associated with it go down in value. Of course, it’d be even better if you could specify certain tags that you know you’ll always be interested in, so that a few bad articles on the subject don’t adversely affect that tag’s impact.

    This would then allow the system to sort content by the combined value of the tags associated with them. Perhaps users could even create separate piles of content, each with a value range. Then, you could subscribe to a feed of every item in your top range, maybe a daily digest of content in the midrange, and the stuff on the low end would only be accessible if you expressly look through it.

    Given that not everything uses tags, and not everything pushes tags out through an API, this idea is probably of limited usefulness. And even if it were used on those services that could support it, there’s no telling whether it would actually suffice to sort new content into meaningful piles. I think it’d certainly be an interesting experiment, though. Maybe I’ll look into the content I’m subscribed to and see if I can make heads or tails of it.

    If nothing else, these kinds of discussions are exactly what the Web needs right now. Keep it up, guys!

  5. During a really busy month last summer I didn’t look at my feed reader once. After things slowed down I had so many unread messages that I simply marked all as read and started from scratch. I missed a ton of useful stuff, but it had to be done or I never would have caught up! Some sort of rating system would have definitely been useful to see the stuff I really wanted and just ignore the echo.

  6. My solution with regards to lifestreams is to only look at recent content. If it’s fallen off the bottom of my aggregator before I check, I don’t hunt for it. Most lifestream data is only valuable closer to the time of publishing anyway.

  7. @Will, Eric, and @Marty, what I find fascinating is the tension between our overriding behavior (unsubscribe and start from scratch) vs. a technical solution (something that gauges relevance). Could we safely assume that neither one of those strategies will work outright? In other words, we need a powerful algorithm but also the ability to override it when necessary.

    I’m reminded of “reading kicks” and other trends I get myself on. Sometimes, I just get into reading kicks and read a few books, and other times I don’t read at all. I wonder if this is our mind’s way of regulating our information intake, even if we aren’t aware of it.

  8. [...] The Information Firehose, Lifestreams, and the Curse of Granularity (tags: lifestream weblog) [...]

  9. I like Marty’s idea of a kind of stock price for information sources that adapts to our behaviour, but I wonder whether any technical solution that tries to forecast what we want to read will put us into a holding pattern: because a source has a higher stock price we see more of it, so we read more of it, so its stock price rises, leaving new and interesting sources by the wayside.

  10. Tony:

    Yes, that’s definitely a concern, and always is any time you start looking at weighted systems. Even if unrated sources and tags are considered neither positive nor negative, the distribution of a given user’s content might well knock that new content out of the first tier by default. One simplistic approach might be to just give any new item (source or tag) automatic top-tier billing, until the user has had a chance to give it a few ratings.

    Beyond the unrated content, though, remember that this isn’t the typical “rich get richer” scenario: my preferences aren’t affecting anyone else. The typical problem is that wealth is some constant value, and the more of it that goes to a select few, the less there is to go around. With information that can be duplicated and segmented, that doesn’t adversely affect any other users of any service. And if I rank a source and/or tag high enough that it dominates my content feed, that’s clearly the exact behavior I want.

    The opposing argument, of course, is that our attention is still a limited resource, and the more of it that gets taken by some sources, the less there is to go around. But that’s where it’s useful to manually vote down some sources, to explicitly identify those that deserve less of our attention. That just leaves unrated content, which is still something to tackle, but I think recognizing the limits of the problem helps when looking for solutions.

    More to the point though, the lower-tier data always needs to be easily accessible, and the second-tier content should always make its way to the user in at least a limited form. I think a single daily digest would be a great way to accomplish this. It’s only one extra item per day, with just titles that can be easily scanned, allowing the titles to jump out as applicable. That way, the tiers are used simply as ways to get our attention, rather than ways to hide content.

    You’re right, I’m sure there’s no perfect solution, but I think it would be better than the problem we currently have with this ever-increasing flood of information. I’m very interested to see what kind of solutions you guys have.
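Marty’s behavioral-learning idea from the comments is concrete enough to sketch. Below is a minimal Python version under several assumptions: the action weights, the tier thresholds, and `MIN_RATINGS` are made-up numbers, each item is assumed to carry a source and a set of tags, and the three tiers follow the comments (show right away, a daily digest, and a low tier you only see on request). Pinned tags never lose value, and any source or tag with fewer than `MIN_RATINGS` ratings gets automatic top-tier billing, which is Marty’s answer to Tony’s feedback-loop worry. All names here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative weights: positive actions raise a source's/tag's value, negative ones lower it.
ACTION_WEIGHTS = {"star": 2.0, "share": 2.0, "read": 1.0,
                  "skip": -0.5, "delete": -1.0, "spam": -2.0}
MIN_RATINGS = 3  # assumed: ratings needed before a source/tag loses its "new" boost

Key = tuple[str, str]  # ("source", name) or ("tag", name)


@dataclass(frozen=True)
class Item:
    source: str
    tags: frozenset[str]
    title: str


@dataclass
class AttentionModel:
    pinned_tags: set[str] = field(default_factory=set)
    value: defaultdict = field(default_factory=lambda: defaultdict(float))
    ratings: defaultdict = field(default_factory=lambda: defaultdict(int))

    def keys(self, item: Item) -> list[Key]:
        return [("source", item.source)] + [("tag", t) for t in sorted(item.tags)]

    def record(self, item: Item, action: str) -> None:
        delta = ACTION_WEIGHTS[action]
        for kind, name in self.keys(item):
            self.ratings[(kind, name)] += 1
            # Pinned tags never lose value, so a few bad articles can't bury a topic.
            if delta < 0 and kind == "tag" and name in self.pinned_tags:
                continue
            self.value[(kind, name)] += delta

    def score(self, item: Item) -> float:
        # Combined value of the item's source and all of its tags.
        return sum(self.value[k] for k in self.keys(item))

    def is_new(self, item: Item) -> bool:
        return any(self.ratings[k] < MIN_RATINGS for k in self.keys(item))


def tiers(model: AttentionModel, items: list[Item],
          top: float = 4.0, mid: float = 0.0) -> dict[str, list[Item]]:
    """Sort by score (highest first), then split into top tier, daily digest, and low tier."""
    result: dict[str, list[Item]] = {"top": [], "digest": [], "low": []}
    for item in sorted(items, key=model.score, reverse=True):
        s = model.score(item)
        if model.is_new(item) or s >= top:
            result["top"].append(item)     # new or highly valued: show right away
        elif s >= mid:
            result["digest"].append(item)  # middle range: a title in the daily digest
        else:
            result["low"].append(item)     # only seen if you go looking
    return result


model = AttentionModel(pinned_tags={"design"})
a = Item("blogA", frozenset({"design"}), "design essay")
b = Item("blogB", frozenset({"gossip"}), "celebrity gossip")
for _ in range(3):
    model.record(a, "star")
    model.record(b, "delete")
model.record(a, "delete")  # one bad design post: blogA drops, the pinned "design" tag doesn't
c = Item("blogC", frozenset({"gossip"}), "new blog, gossip")  # unrated source gets the boost
d = Item("blogA", frozenset({"gossip"}), "blogA on gossip")
print({tier: [i.title for i in items] for tier, items in tiers(model, [b, c, d, a]).items()})
# {'top': ['design essay', 'new blog, gossip'], 'digest': ['blogA on gossip'], 'low': ['celebrity gossip']}
```

The new-item boost only protects sources and tags that haven’t been rated yet; once a source has ratings, the self-reinforcing loop Tony describes can still kick in, which is why keeping the digest and low tier easy to browse matters.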
