focused data-mining
Mar. 2nd, 2006 08:55 pm
There is a service that I wish someone -- Amazon is an obvious candidate -- would provide to me, and I wonder how realistic it is. (It seems like it should be easy given the data, and they've surely got the data.)
Here's an example: I've bought three CDs from a particular artist. I really like two of them, and I don't care for the third. She has more CDs. Which (if any) am I likely to like? If I could find ratings for those CDs from people who liked the two I liked and didn't like the one I didn't like, that would be informative. Overall ratings of those products are irrelevant, because I don't know anything about the raters. Reviews can be more informative but, again, I don't know how the reviewers' tastes match my own, and most reviews are not well-written. User-submitted "if you like this buy that" links can be helpful but are under-used; they're also not annotated. Knowing that people also bought them isn't helpful; I bought the one I didn't like, too. I want to tap into the actual ratings (not purchase history) from a subset of raters -- the ones who match my own profile. Ratings are more reliable than the other options available because Amazon uses them to push things at you, so you're motivated to get them right.
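The query described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data layout (a dictionary of per-user star ratings), the function name `matched_ratings`, the item names, and the 4-star "liked" cutoff are all hypothetical, not anything Amazon is known to expose.

```python
from collections import defaultdict


def matched_ratings(ratings, my_profile, like_threshold=4):
    """Summarize ratings from users whose likes/dislikes match my_profile.

    ratings:    {user: {item: stars}}   (hypothetical data layout)
    my_profile: {item: True if I liked it, False if I did not}
    Returns {item: (average stars, number of matching raters)} for
    items that are not already in my_profile.
    """
    def liked(stars):
        # Assumption: 4 or more stars counts as "liked".
        return stars >= like_threshold

    # Keep only raters who rated every item in my profile and agree
    # with me on each one (liked the two I liked, disliked the third).
    matches = [
        user for user, rated in ratings.items()
        if all(item in rated and liked(rated[item]) == want
               for item, want in my_profile.items())
    ]

    # Collect those raters' scores for the items I have not judged yet.
    scores = defaultdict(list)
    for user in matches:
        for item, stars in ratings[user].items():
            if item not in my_profile:
                scores[item].append(stars)

    return {item: (sum(s) / len(s), len(s)) for item, s in scores.items()}


# Made-up sample data: three CDs I own plus two I am curious about.
ratings = {
    "ann": {"cd1": 5, "cd2": 4, "cd3": 2, "cd4": 5, "cd5": 1},
    "bob": {"cd1": 5, "cd2": 5, "cd3": 1, "cd4": 4},
    "cat": {"cd1": 5, "cd2": 5, "cd3": 5, "cd4": 1},
    "dan": {"cd1": 2, "cd4": 3},
}
my_profile = {"cd1": True, "cd2": True, "cd3": False}
result = matched_ratings(ratings, my_profile)
print(result)
```

Only "ann" and "bob" share the profile here, so only their opinions of the remaining CDs count. Returning the number of matching raters alongside the average makes it visible when a score rests on just one or two people.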
I'm casting this in terms of Amazon because they're doing part of the problem already, but in theory this could be done by anyone who can mine the data.
Is anyone providing a service like this (with a large-enough user base to be relevant)? Does any Amazonian reading this want to run with it? :-)
(no subject)
Date: 2006-03-03 02:14 am (UTC)
(no subject)
Date: 2006-03-03 02:50 am (UTC)
(no subject)
Date: 2006-03-03 04:03 am (UTC)
(In an ideal universe I'd also be able to perform the same kind of analysis with books, or movies, or anything that can garner sufficient ratings.)
(no subject)
Date: 2006-03-03 04:07 am (UTC)
(no subject)
Date: 2006-03-03 05:33 am (UTC)
Sadly, I don't work at Amazon anymore.
(no subject)
Date: 2006-03-03 05:43 am (UTC)
(no subject)
Date: 2006-03-03 08:08 pm (UTC)
I can't find it
Date: 2006-03-04 12:39 am (UTC)
As I remember, though, it wasn't ready for prime time. I also remember something about it using neural nets to do the analysis. That's actually problematic. Neural nets can find connections between things that humans don't care about. For example, a data set comparing images with and without tanks worked fine in training, but presented with new images not in either data set, the system couldn't determine if there was a tank in the picture. It seems that all the non-tank images were taken on sunny days and all the tank images on cloudy days. The computer completely missed the tanks and paid attention to sunny vs. non-sunny. Oops.
Rob of UnSpace (http://www.unspace.net/)
PS: Jupiter has a new red spot! (http://www.scienceblog.com/cms/jupiters_got_a_new_red_spot_10142.html) Wheee! Ok, so I get excited over really stupid stuff....
(no subject)
Date: 2006-03-04 11:52 pm (UTC)
(no subject)
Date: 2006-03-04 11:55 pm (UTC)
In this case, oddly, the earliest and latest albums I own are strong, and one in the middle is weak. So I now look with some suspicion on the other "middle" album that's out there.
(no subject)
Date: 2006-03-04 11:59 pm (UTC)
I'm hoping to find a fairly narrow profile. Amazon will already make recommendations based on my entire rating/buying history, but I don't think there's necessarily any correlation between SF books, DVD sets of certain TV shows, and Jewish music, for instance. But, near as I can tell, with Amazon it's all or nothing; I can look at the recommendations based on the hundreds of varied things I've rated, or not, but I can't limit them to particular domains. Launchcast, being specifically about music, at least helps narrow the scope somewhat; I can narrow it more by being careful about my profile (using multiple profiles if I want to collect what I think are unrelated recommendations).
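To illustrate why limiting a profile to one domain can matter, here is a small sketch that compares two raters across everything they have rated and then across music only. The `catalog` mapping, the helper names `restrict` and `agreement`, the sample ratings, and the 4-star "liked" cutoff are all assumptions made up for this example.

```python
def restrict(profile, catalog, domain):
    """Keep only the ratings for items that belong to the chosen domain."""
    return {item: stars for item, stars in profile.items()
            if catalog.get(item) == domain}


def agreement(mine, theirs, like_threshold=4):
    """Fraction of shared items on which both raters agree (liked or not).

    Returns None when the two raters have no items in common.
    """
    shared = mine.keys() & theirs.keys()
    if not shared:
        return None
    same = sum((mine[i] >= like_threshold) == (theirs[i] >= like_threshold)
               for i in shared)
    return same / len(shared)


# Hypothetical catalog: which domain each item belongs to.
catalog = {
    "sf_novel": "books",
    "tv_box_set": "dvd",
    "album_a": "music",
    "album_b": "music",
}
mine = {"sf_novel": 5, "tv_box_set": 4, "album_a": 5, "album_b": 2}
other = {"sf_novel": 1, "tv_box_set": 1, "album_a": 5, "album_b": 1}

overall = agreement(mine, other)
music_only = agreement(restrict(mine, catalog, "music"),
                       restrict(other, catalog, "music"))
print(overall, music_only)
```

Across everything, this rater agrees with me on only half the items; restricted to music, we agree completely. A whole-history comparison would discount exactly the person whose music ratings are most useful to me.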
Re: I can't find it
Date: 2006-03-05 12:00 am (UTC)
Re: I can't find it
Date: 2006-03-06 03:08 am (UTC)
For my full, in-depth, gushing review, you can check out this post (http://gclectic.typepad.com/gclectic/2005/08/my_favorite_sof.html) on my blog.
Re: I can't find it
Date: 2006-03-06 03:27 am (UTC)
(no subject)
Date: 2006-03-06 03:33 am (UTC)
What I find most interesting about your suggestion is the desire to take a broad user profile and emphasize a selected small portion of it in order to produce a query. This is, indeed, an idea that seems incredibly obvious and useful -- and is entirely new to me. Maybe someone will pick up on it and run with it.