March 10, 2010

Electronic Versions of BWRS On Sale Now

Please forgive us if the Reputation Wednesday posts for a few weeks are focused on book-release related information. We're first time authors and find every one of these personal firsts terribly exciting!
The electronic versions of Building Web Reputation Systems are now for sale at O'Reilly.com!

You can read it on Safari and/or you can buy a downloadable version as a color PDF, Mobi, or ePub file. These versions are great on your computer and can be read on most mobile devices. All of the electronic versions have both the internal references and the web URLs hot-linked, which is great for a book like ours.

There's also a bundle combining the print and ebook editions at a significant savings. Search the digital version to find the correct section in the physical book. Save $20.00!


[Photo: the ePub edition in Stanza on Randy's iPhone]

March 02, 2010

Coming to SxSW: Production Copies of Building Web Reputation Systems!

Bryce and I are happy to announce that Building Web Reputation Systems has gone to the printers! We're absolutely excited to share this news with you all today. It's hard to believe it's been more than a year since we started. Thank you so much to all who've been reading our work as we developed it and providing such helpful feedback—it wouldn't be half as good as it is without you!

The book will hit retail shelves on 4/1, but if you can't wait that long you have two options: (1) early copies will be available at the O'Reilly booth at SxSW; and (2) some eBook codes will be made available to those willing to review the book and post the review online. See the booth or contact us via email (our address is over there → in the sidebar). I guess it will depend on your blogging karma score. :-)

If Amazon sales rank is any indicator, sales are already picking up, so it seems that, after mobbing the SxSW booth, the fastest way to get a paper copy is to preorder at O'Reilly, Amazon, Borders, or your favorite book retailer.

For those of you who don't already know what this book is about, here's the back cover copy:

What do Amazon's product reviews, eBay's feedback score system, Slashdot's Karma System, and Xbox Live's Achievements have in common? They're all examples of successful reputation systems that enable consumer websites to manage and present user contributions most effectively. This book shows you how to design and develop reputation systems for your own sites or web applications, written by experts who have designed web communities for Yahoo! and other prominent sites.

Building Web Reputation Systems helps you ask the hard questions about these underlying mechanisms, and why they're critical for any organization that draws from or depends on user-generated content. It's a must-have for system architects, product managers, community support staff, and UI designers.

  • Scale your reputation system to handle an overwhelming inflow of user contributions
  • Determine the quality of contributions, and learn why some are more useful than others
  • Become familiar with different models that encourage first-class contributions
  • Discover tricks of moderation and how to stamp out the worst contributions quickly and efficiently
  • Engage contributors and reward them in a way that gets them to return
  • Examine a case study based on actual reputation deployments at industry-leading social sites, including Yahoo!, Flickr, and eBay

February 16, 2010

On Karma: Top-line Lessons on User Reputation Design

In Building Web Reputation Systems, we appropriate the term karma to mean a user reputation in an online service. As you might expect, karma is discussed heavily throughout the more than 300 pages. During the final editing process, it became clear that a simple summary of the main points would be helpful to those looking for guidance. It seemed that our first post in over a month (congratulations on the new delivery, Bryce!) should be something big and useful...

This post covers the following top-line points about designing karma systems, drawn from our book and other blog posts:

  • Karma is user reputation within a context
  • Karma is useful for building trust between users, and between a user and the site
  • Karma can be an incentive for participation and contributions
  • Karma is contextual and has limited utility globally. [A chess master is not necessarily a good eBay seller]
  • Karma comes in several flavors - Participation, Quality and Robust (combined)
  • Karma should be complex and the result of indirect evaluations, and the formulation is often opaque
  • Personal karma is displayed only to the owner, and is good for measuring progress
  • Corporate karma is used by the site operator to find the very best and very worst users
  • Public karma is displayed to other users, which is what makes it the hardest to get right
  • Public karma should be used sparingly - it is hard to understand, isn't expected, and is easily confused with content ratings
  • Negative public karma should be avoided altogether. In karma math, -1 is not the same magnitude as +1, and the information loss is too expensive.
  • Public karma often encourages competitive behavior in users, which may not be compatible with their motivations. This is most easily seen with leaderboards, but can happen any time karma scores are prominently displayed. [e.g., Twitter follower count]

Why bother with karma? [Preface]

Karma is a reputation score for a user in a community. It may be composed of many components, such as:

  • How long has this person been a member of the community?
  • What types of activities has she engaged in?
  • How well has she performed at them?
  • What do other people think about this person?

Having access to a person's reputation might help you make better-informed judgments, such as:

  • Can I trust this person?
  • Should I transact with this person?
  • Is it worth my time to listen to this person?

Besides providing a means for trust between users, karma is often used as an incentive to encourage contributions to a service, or to identify specific users for special action, either recognition or corrective action. The tricky part is balancing the producer incentives against the potential for abuse and the consumers' need for good filters over the content.

Karma is contextual (local) and has limited scope [Chapter 1]

Karma is built from the actions of a user within a context, such as a website or even a sub-community of a site. Those contributions are often limited to a very narrow range of actions, so care must be taken not to over-generalize the value of a karma score. For example, an eBay seller's feedback karma reflects only the feelings of buyers about the exact transactions completed. One known scamming pattern is for a scammer to build strong positive karma by selling a large number of small items, then switch to simultaneously listing a large number of high-ticket items for auction at low prices, collect the funds, and cancel the account. This is an evil form of reputation bankruptcy (see below).

There is a common misconception about karma: that it can be used across contexts, just as the FICO credit score is broadly used in the United States to determine suitability for issuing credit cards, purchasing a home, or even being hired for a job. Chapter 1 talks about this idea of a "Web FICO":

Several startup companies have attempted to codify a global user reputation for use across web sites, and some try to leverage a user's preexisting eBay seller's Feedback score as a primary value in their rating. They are trying to create some sort of “real person” or “good citizen” reputation system for use across all contexts. As with the FICO score, it is a bad idea to co-opt a reputation system for another purpose, and it dilutes the actual meaning of the score in its original context. The eBay Feedback score reflects only the transaction worthiness of a specific account, and it does so only for particular products bought or sold on eBay. The user behind that identity may in fact steal candy from babies, cheat at online poker, and fail to pay his credit card bills. Even eBay displays multiple types of reputation ratings within its singular limited context. There is no web FICO because there is no kind of reputation statement that can be legitimately applied to all contexts.

Participation vs. quality, and robust karma [Chapter 4]

There are two primitive forms of karma models: models that measure the amount of user participation and models that measure the quality of contributions. When these types of karma models are combined, we refer to the combined model as robust. Including both types of measures in the model gives the highest scores to the users who are both active and produce the best content.

Participation karma

Participation karma: As a user engages in various activities, they are recorded, weighted, and tallied.

Counting socially and/or commercially significant events by content creators is probably the most common type of participation karma model. This model is often implemented as a point system (Chap_4-Points), in which each action is worth a fixed number of points and the points accumulate. A participation karma model looks exactly like the figure above, where the input event represents the number of points for the action and the source of the activity becomes the target of the karma.

There is also a negative participation karma model, which counts how many bad things a user does. Some people call this model strikes, after the three-strikes rule of American baseball. Again, the model is the same, except that the application interprets a high score inversely.
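A point system like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the action names, point values, and three-strike limit are hypothetical. Note that the strikes counter uses exactly the same accumulate-and-tally model; only the interpretation of a high score flips.

```python
from collections import defaultdict

# Hypothetical point values; the book doesn't prescribe specific numbers.
POINTS = {
    "post_review": 10,
    "post_comment": 2,
    "upload_photo": 5,
}

# Hypothetical "bad" events that count as strikes.
STRIKE_EVENTS = {"post_removed_by_moderator", "abuse_report_upheld"}


class ParticipationKarma:
    def __init__(self) -> None:
        self.points = defaultdict(int)   # user_id -> accumulated points
        self.strikes = defaultdict(int)  # user_id -> number of bad events

    def record(self, user_id: str, action: str) -> None:
        """The user who performed the action is the target of the karma."""
        if action in POINTS:
            self.points[user_id] += POINTS[action]
        elif action in STRIKE_EVENTS:
            # Same tally model; the application just reads a high count as bad.
            self.strikes[user_id] += 1
        # Any other action is ignored.

    def struck_out(self, user_id: str, limit: int = 3) -> bool:
        return self.strikes.get(user_id, 0) >= limit


karma = ParticipationKarma()
karma.record("alice", "post_review")
karma.record("alice", "post_comment")
print(karma.points["alice"])  # 12
```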

Quality karma

A quality-karma model, such as eBay's seller feedback (Chap_4-eBay_Merchant_Feedback_Karma) model, deals solely with the quality of contributions by users. In a quality-karma model, the number of contributions is meaningless unless it is accompanied by an indication of whether each contribution is good or bad for business. The best quality-karma scores are always calculated as a side effect of other users evaluating the contributions of the target.

In the eBay example, a successful auction bid is the subject of the evaluation, and the results roll up to the seller: if there is no transaction, there should be no evaluation.
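To make the "no transaction, no evaluation" rule concrete, here is a minimal sketch of a quality-karma ledger loosely modeled on the seller-feedback pattern. The class names, the +1/0/-1 rating scale, and the percent-positive formula (neutral ratings excluded) are assumptions for illustration, not eBay's actual rules.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class QualityKarma:
    positive: int = 0
    neutral: int = 0
    negative: int = 0

    def percent_positive(self) -> Optional[float]:
        rated = self.positive + self.negative
        return None if rated == 0 else 100.0 * self.positive / rated


class FeedbackLedger:
    def __init__(self) -> None:
        self.completed: Dict[str, str] = {}  # transaction_id -> seller_id
        self.already_rated: Set[str] = set()
        self.karma: Dict[str, QualityKarma] = {}

    def complete_sale(self, txn_id: str, seller_id: str) -> None:
        self.completed[txn_id] = seller_id

    def leave_feedback(self, txn_id: str, rating: int) -> bool:
        """rating is +1, 0, or -1. Returns False if the feedback is rejected."""
        seller = self.completed.get(txn_id)
        if seller is None or txn_id in self.already_rated:
            return False  # no transaction (or already rated): no evaluation
        self.already_rated.add(txn_id)
        # The evaluation is of the transaction; the result rolls up to the seller.
        k = self.karma.setdefault(seller, QualityKarma())
        if rating > 0:
            k.positive += 1
        elif rating < 0:
            k.negative += 1
        else:
            k.neutral += 1
        return True
```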

Robust karma

By itself, a participation-based karma score is inadequate to describe the value of a user's contributions to the community: we will caution time and again throughout the book that rewarding simple activity is an impoverished way to think about user karma. However, you probably don't want a karma score based solely on quality of contributions either. In that case, you may find your system rewarding cautious contributors: ones who, out of a desire to keep their quality ratings high, only contribute to “safe” topics, or who, once having attained a certain quality ranking, decide to stop contributing to protect that ranking.

What you really want to do is to combine quality-karma and participation-karma scores into one score; call it robust karma. The robust-karma score represents the overall value of a user's contributions: the quality component ensures some thought and care in the preparation of contributions, and the participation side ensures that the contributor is very active, that she's contributed recently, and (probably) that she's surpassed some minimal thresholds for user participation, enough that you can reasonably separate the passionate, dedicated contributors from the fly-by post-then-flee crowd.

The weight you'll give to each component depends on the application. Robust-karma scores often are not displayed to users, but may be used instead for internal ranking or flagging, or as factors influencing search ranking (see Chap_4-Keep_Your_Barn_Door_Closed for common reasons for this secrecy). But even when karma scores are displayed, a robust-karma model has the advantage of encouraging users both to contribute the best stuff (as evaluated by their peers) and to do it often.
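One way to combine the two components is a weighted sum, gated by a minimum-participation threshold and discounted for inactivity. The sketch below is only that: the weights, the 10-contribution threshold, the saturation constant, and the 30-day half-life are hypothetical knobs, and the right weighting depends on the application.

```python
import math
from typing import Optional

# Hypothetical tuning knobs; the right values depend on the application.
QUALITY_WEIGHT = 0.7
PARTICIPATION_WEIGHT = 0.3
MIN_CONTRIBUTIONS = 10     # minimal participation before any score is given
VOLUME_SCALE = 50.0        # how quickly the participation credit saturates
HALF_LIFE_DAYS = 30.0      # participation credit halves every 30 idle days


def robust_karma(quality: float, contributions: int,
                 days_since_last: float) -> Optional[float]:
    """Combine quality and participation into one score in [0, 1].

    quality: share of peer evaluations that were positive, in [0, 1].
    """
    if contributions < MIN_CONTRIBUTIONS:
        return None  # not enough activity to tell regulars from fly-by posters
    volume = 1.0 - math.exp(-contributions / VOLUME_SCALE)  # diminishing returns
    recency = 0.5 ** (days_since_last / HALF_LIFE_DAYS)     # decays when idle
    participation = volume * recency
    return QUALITY_WEIGHT * quality + PARTICIPATION_WEIGHT * participation
```

With these numbers, a cautious user with perfect quality but only ten contributions scores lower than a very active user with slightly lower quality, which is the behavior a robust model is meant to encourage.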

When negative factors are included in calculating robust-karma scores, the model is particularly useful for customer care staff, both to highlight users who have become abusive or whose contributions decrease the overall value of content on the site, and potentially to provide an increased level of service to proven-excellent users who become involved in a customer service procedure. A robust-karma model helps find the best of the best and the worst of the worst.

Robust karma: A robust-karma model might combine multiple other karma scores, measuring perhaps not just a user's output (Participation) but their effectiveness (or Quality) as well.

Unlike most content reputation, karma is implicit, opaque, and complex [Chapter 7]

A reputable entity is potentially any entry in a database, including users and content items, with one or more reputations attached to it. All kinds of reputation score types and all kinds of display and use patterns might seem equally valid for content reputation and karma, but usually they're not. To highlight the differences between content reputation and karma, we've categorized them by the ways in which they're typically calculated: simple and complex reputation.

Simple Reputation
Simple reputation is any reputation score that is generated directly by user evaluation of a reputable entity and that is subject to an elementary aggregation calculation, such as a simple average. For example, simple reputation is used on most ratings-and-reviews sites. Simple reputation is direct and easy to understand.
Complex Reputation
Complex reputation is a score aggregated from multiple evaluations, including evaluations of different but related targets, calculated with an opaque method. Email IP spammer, Google PageRank, and eBay feedback reputations are examples of complex reputation. It's an indirect evaluation, and users may not understand how it was calculated even if the score is displayed.
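The difference between the two categories is easier to see side by side. In this sketch (the function names and the default weight of 0.1 for unknown raters are hypothetical), the simple score is a plain average of direct ratings, while the complex score rates a user indirectly, from evaluations of the items she created, weighted by how much the site trusts each rater.

```python
from statistics import mean
from typing import Dict, List, Tuple


def simple_reputation(stars: List[int]) -> float:
    """Simple: direct evaluations of one entity, aggregated by a plain average."""
    return mean(stars)


def complex_reputation(item_ratings: Dict[str, List[Tuple[str, float]]],
                       rater_trust: Dict[str, float]) -> float:
    """Complex: score a *user* indirectly, from ratings of the items she created.

    item_ratings maps each of the user's items to (rater_id, score) pairs, with
    scores in [0, 1]. rater_trust is a hypothetical per-rater weight; raters the
    site knows nothing about count for very little.
    """
    total = 0.0
    weight_sum = 0.0
    for ratings in item_ratings.values():
        for rater, score in ratings:
            weight = rater_trust.get(rater, 0.1)
            total += weight * score
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```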

Content reputation is about things: typically inanimate objects without emotions or the ability to respond directly, in any way, to their reputations.

But karma represents the reputation of users, and users are people: they are alive, they have feelings, and they are the engine that powers your site. Karma is significantly more personal and therefore sensitive and meaningful. If a manufacturer gets a single bad product review on a web site, it probably won't even notice. But if a user gets a bad rating from a friend, or feels slighted or alienated by the way your karma system works, she might abandon an identity that has become valuable to your business. Worse yet, she might abandon your site altogether and take her content with her. (Worst of all, she might take others with her.)

Take extreme care in creating a karma system. User reputation on the web has undergone many experiments, and the primary lesson from that research is that karma should be a complex reputation and it should be displayed rarely.

Karma is complex, built of indirect inputs

Be careful with karma. Sometimes making things as simple and explicit as possible is the wrong choice for reputation:

  • Rating a user directly should be avoided. Typical implementations only require a user to click once to rate another user and are therefore prone to abuse. When direct evaluation karma models are combined with the common practice of streamlining user registration processes (on many sites opening a new account is an easier operation than changing the password on an existing account), they get out of hand quickly. See the example of Orkut in Chap_7-Display_Numbered_Levels.
  • Asking people to evaluate others directly is socially awkward. Don't put users in the position of lying about their friends.
  • Using multiple inputs presents a broader picture of the target user's value.
  • Economics research into “revealed preference,” or what people actually do, as opposed to what they say, indicates that actions provide a more accurate picture of value than elicited ratings.
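A minimal sketch of karma built only from indirect inputs, assuming a hypothetical set of signals and weights: the user's score comes from what other people actually do with her contributions (favoriting, sharing, replying, flagging), and there is no direct "rate this user" input at all.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical signals generated by what *other* users do with someone's content.
SIGNAL_WEIGHTS = {
    "favorited": 3.0,
    "shared": 2.0,
    "replied_to": 1.0,
    "flagged_as_abuse": -5.0,
}


def indirect_karma(events: Iterable[Tuple[str, str, str]]) -> Counter:
    """events are (actor_id, signal, content_author_id) tuples.

    There is deliberately no 'rate this user' signal, and actions on your own
    content are ignored, so karma can only be earned through others' behavior.
    """
    karma: Counter = Counter()
    for actor, signal, author in events:
        if actor == author or signal not in SIGNAL_WEIGHTS:
            continue
        karma[author] += SIGNAL_WEIGHTS[signal]
    return karma
```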

Karma calculations are often opaque

Karma calculations may be opaque because the score is valuable as status, has revenue potential, and/or unlocks privileged application features.

Display karma sparingly

In Building Web Reputation Systems we separate reputation display into three categories: public (shown to other users), personal (shown only to the owner), and corporate (for internal company use). Corporate karma is normally used to identify the very best and the very worst users for special actions, such as PR contact or account termination. Personal karma is typically used for reflecting progress against some goal, as a dieter tracks their body weight over time. Where karma display becomes challenging is when it is public.

There are several important things to consider when displaying karma to the public:

  • Publicly displayed karma should be rare because, as with content reputation, users are easily confused by the display of many reputations on the same page or within the same context.
  • Publicly displayed karma should be rare because it can create the wrong incentives for your community. Avoid sorting users by karma. See Chap_7-Leaderboards_Considered_Harmful.
  • If you do display it publicly, make karma visually distinct from any nearby content reputation. Yahoo!'s EU message board displays the karma of a post's author as a colored medallion, with the message rated with stars. But consider this: Slashdot's message board doesn't display the karma of post authors to anyone. Even the display of a user's own karma is vague: “positive,” “good,” or “excellent.” After originally displaying karma publicly as a number, over time Slashdot has shifted to an increasingly opaque display of karma.
  • Publicly displayed karma should be rare because it isn't expected. When Yahoo! Shopping added Top Reviewer karma to encourage review creation, they displayed a Top Reviewer badge with each review and rushed it out for the Christmas 2006 season. After the New Year had passed, user testing revealed that most users didn't even notice the badges. When they did notice them, many thought they meant either that the item was top rated or that the user was a paid shill for the product manufacturer or Yahoo!.
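Slashdot-style vague display can be approximated by mapping the internal number onto a few coarse words and showing them only to the karma's owner. The thresholds below are hypothetical; only the label words come from the Slashdot example above.

```python
from typing import Optional

# Hypothetical thresholds; only the label words come from the Slashdot example.
LABELS = [(100, "excellent"), (25, "good"), (1, "positive")]


def karma_label(score: int, viewer_is_owner: bool) -> Optional[str]:
    """Return a vague label instead of the raw number.

    Other users see nothing at all; only the owner sees the coarse label.
    """
    if not viewer_is_owner:
        return None
    for threshold, label in LABELS:
        if score >= threshold:
            return label
    return None  # zero or negative karma: show nothing rather than a bad label
```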

Though karma should be complex, it should still be limited to as narrow a context as possible. Don't mix shopping review karma with chess rank. It may sound silly now, but you'd be surprised how many people think they can make a business out of creating an Internet-wide trustworthiness karma.

Yahoo! holds karma scores to a higher standard than content reputation. Be very careful in applying terminology and labels to people, for several reasons:

  • Avoid labels that might appear as attacks. They set a hostile tone that will be amplified in users' responses. This caution applies both to overly positive labels (such as “hotshot” or “top” designations) and to negative ones (such as “newbie” or “rookie”).
  • Avoid labels that introduce legal risks. What if a site labeled members of a health forum “experts,” and these “experts” then gave out bad advice?

These are rules of thumb that may not necessarily apply to a given context. In role-playing games, for example, publicly shared simple karma is displayed in terms of experience levels, which are inherently competitive.

Avoid negative public karma [Chapter 6]

This point is covered in detail in an earlier post The Dollhouse Mafia, or "Don't Display Negative Karma" - which anyone considering having negative karma effects in public reputation should read carefully. We'll only excerpt a small portion here:

This thinking—though seemingly intuitive—is impoverished, and is wrong in at least two important ways.

  • There can be no negative public karma, at least for establishing the trustworthiness of active users. A bad enough public score will simply lead to that user's abandoning the account and starting a new one, a process we call karma bankruptcy. This setup defeats the primary goal of karma: to publicly identify bad actors. Assuming that karma starts at zero for a brand-new user that an application has no information about, it can never go below zero, since karma bankruptcy resets it. Just look at the record of eBay sellers with more than three red stars: you'll see that most haven't sold anything in months or years, either because the sellers quit or because they're now doing business under different account names.
  • It's not a good idea to combine positive and negative inputs in a single public karma score. Say you encounter a user with 75 karma points and another with 69 karma points. Who is more trustworthy? You can't tell: maybe the first user used to have hundreds of good points but recently accumulated a lot of negative ones, while the second user has never received a negative point at all. If you must have public negative reputation, handle it as a separate score (as in the eBay seller feedback pattern).
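The 75-versus-69 example can be reproduced directly. This sketch (class, function, and variable names are hypothetical) keeps positive and negative counts separate, shows how a single net score hides the difference between two very different histories, and shows how karma bankruptcy wipes out the negatives.

```python
from dataclasses import dataclass


@dataclass
class Karma:
    positive: int = 0
    negative: int = 0  # kept as its own score, never folded into a public total

    @property
    def net(self) -> int:
        """The single combined number the text warns against displaying."""
        return self.positive - self.negative


def declare_bankruptcy() -> Karma:
    """Abandon the account and register a new one: every negative disappears."""
    return Karma()


veteran_gone_bad = Karma(positive=300, negative=225)  # hypothetical history
never_complained_about = Karma(positive=69, negative=0)

print(veteran_gone_bad.net, never_complained_about.net)  # 75 69
```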

Even eBay, with the most well-known example of public negative karma, doesn't represent how untrustworthy an actual seller might be; it only gives buyers reasons to take specific actions to protect themselves. In general, avoid negative public karma. If you really want to know who the bad guys are, keep the score separate and restrict it to internal use by moderation staff.

If you're still considering negative reputation, please [re]read the story of the Dollhouse Mafia and imagine your enemies attacking your system.

Public karma can discourage some contributors

Putting user reputations in a public ranked list creates a competitive environment, and some users' motivations are not at all compatible with being publicly recognized. Still others will see high karma as the goal of the activity instead of a benefit, and start optimizing their behavior around their karma instead of using the site as intended.

In Leaderboards Considered Harmful, we pointed out:

[...]ranking the members of your community—and pitting them one-against-the-other in a competitive fashion—is typically a bad idea. Like the fabled djinni of yore, leaderboards on your site promise riches (comparisons! incentives! user engagement!!) but often lead to undesired consequences.

[...]

This may be the most insidious artifact of a leaderboard community: the very presence of a leaderboard changes the community dynamic and calls into question the motivations of everyone for any action they might take.

January 05, 2010

Waking a Sleeping Chowhound: Another Star-Ratings Misstep?

Adding new social media features to established communities is always disruptive and not always a good idea.

In Chowhound Comes of Age (For Better or Worse), Luke Tsai writes about how and why the addition of "industry-standard 5-star ratings" to restaurants on Chowhound.com has quaked the community there.

...Log on to the Chowhound message board for the San Francisco Bay Area and you'll find lengthy threads about where to find, say, the most decadent slice of chocolate cake or the best pajeon (Korean seafood pancakes) in the East Bay. You'll find highly technical analyses of the roasting and brewing methodology of local coffee purveyors.
...Up until fairly recently, one thing you wouldn't find on Chowhound was the kind of star ratings system favored by almost every other restaurant guide, whether in print or on the web — from Frommer's to Zagat to Yelp. On Chowhound, you couldn't give a restaurant any kind of quantitative rating.

In short, it was message boards about food, for and by chowhounds: self-selected folks who liked to go off the beaten track to find something interesting to eat. Specifically, they liked to go to the places that were unrated, or rated poorly on other sites, just to find any diamonds in the rough, especially unusual items.

It had no ads, no ratings, no shills (because of strong moderation), and no membership fees. It bootstrapped as a contribution-financed community site. Eventually it was sold to CNET, which was sold to CBS, which has added ads and ratings in an attempt to capture revenue.

...Jacquilynne Schlesier, the site's community manager, has been helping to moderate Chowhound since the pre-CNET all-volunteer days. "Our users are incredibly passionate and incredibly knowledgeable," Schlesier says. "But it can be a little daunting if you're someone who's not a long-term chowhound." To help make the process less intimidating, they've revamped the site's restaurant listings — individual pages that have all the basic information about a particular restaurant along with links to relevant discussions on the message boards. It's on these pages that the star-rating feature appears.

Generating revenue is a good goal. Most food sites that make money have ratings. Your typical product manager would get this far in their reasoning and implement an industry-standard 5-star rating system. This is what CBS/Chowhound apparently did.

But according to many of the site's devotees, the latest set of changes is particularly "unchowish," in large part because of the star-rating feature. ... Among other criticisms, [the founder of Chowhound] questions how it's possible to "rate a bakery that is horrendous except for one item so great it's worth a 100-mile trip along the same rating scale as a pretty-good diner, an inconsistent high-end sushi place, and an exemplary Italian-ice cart."

This is an excellent point. There is a context mismatch between the discussions (interesting food items) and the rating for a restaurant overall.

Why bother asking Chowhound users for a star rating? It's not like they were clamoring for this feature. This looks like Yelp envy to me. I saw similar lazy product design while at Yahoo! around the time Digg originally exploded in growth - property after property wanted to add "Thumbs up" buttons to everything from the weather to search results. [This was a bad design choice for almost all of them - fortunately, during this me-too frenzy, the legal mess from the posting of the DVD crack key helped most Yahoo! product managers figure out that the Yahoo! audience and Digg's were almost mutually exclusive.]

After spending some time on Chowhound, I've noticed that those participating in the discussions aren't rating much. I couldn't find a restaurant in my area with more than 5 reviews, and five is probably the absolute minimum number of ratings that should be required for the average rating to mean anything. And even then, the average overall is going to be 4.5 stars: familiar to outside users, sure, but in the end pretty useless as a gauge of quality. And, unless CBS is going to buy ratings from someone else, they will never have enough to be useful in a regional search. Bootstrapping 5-star ratings from scratch is a big mistake.
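The minimum-count rule is simple to enforce at display time. A sketch, assuming ratings arrive as a list of integer stars; the threshold of five is the post's own suggested floor, not a statistical guarantee, and the function name is hypothetical.

```python
from typing import List, Optional

MIN_RATINGS = 5  # the post's suggested absolute minimum


def displayable_average(stars: List[int]) -> Optional[float]:
    """Return the average to one decimal place, or None if it isn't meaningful yet."""
    if len(stars) < MIN_RATINGS:
        return None  # too few ratings for the average to mean anything
    return round(sum(stars) / len(stars), 1)
```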

If not 5-star overall ratings, what else?

Clearly the staff needs to find revenue, and advertising is what they've bet the farm on, so increasing the number of users and user engagement is required. They had to do something.

But, given just the things discussed in this post, there are several other reputation-based things they could try instead...

1) Let the active board posters determine the context! If it's Best Pastrami Sandwich or Most Exotic Menu - let them give the awards to the restaurant. The simplest implementation of this is tagging, but allowing users to create award categories makes search-ranking easier.

2) Allow discussions/posts to be tagged as well - both with the name of the places that are discussed as well as the same user-generated topics...

3) Allow users to mark a place as a "favorite," which both increases the popularity of the place and puts that place on their profile. Combined with tagging, this is an advertiser's dream!

4) Implement a karma system for contributors to discussions, increasing the search-rank value of the businesses they discuss, tag, favorite, etc.
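As an illustration of suggestions 1 and 4 together, here is a hypothetical ranking for a user-created award category, where each distinct contributor's tag counts for more when that contributor has higher karma. The signal format and the "1 + karma" weighting are assumptions; favorites could be fed in the same way as another kind of signal.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_places(signals: Iterable[Tuple[str, str, str]],
                contributor_karma: Dict[str, float],
                tag: str) -> List[Tuple[str, float]]:
    """Rank places for one user-created award, e.g. 'best pastrami sandwich'.

    signals are (user_id, place, tag) tuples. Each distinct user counts once per
    place, weighted by 1 + that user's karma (a hypothetical scheme), so active,
    respected board posters move the ranking most.
    """
    scores: Dict[str, float] = defaultdict(float)
    seen = set()
    for user, place, t in signals:
        if t != tag or (user, place) in seen:
            continue
        seen.add((user, place))
        scores[place] += 1.0 + contributor_karma.get(user, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```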

All of these techniques are discussed in detail in our upcoming O'Reilly/Yahoo! Press book: Building Web Reputation Systems, also available in searchable draft form on our wiki.

The Chowhounds have valuable expertise to share; they deserve better tools than a poor copy of every other restaurant site!

December 16, 2009

The Sensical Moment: Asking for User Opinion When the Time is Right

If you're asking for explicit user opinions in your reputation system (ratings, reviews or even just a simple “Like”), pay special attention to exactly when you are asking for them. You'll get better data if you try to gather opinions when it makes most sense to do so: try to find the sensical moments to solicit user input.

Ideally, you'll catch reviewers in moments where they're…

Sufficiently Invested

Can you make it too easy for users to give reviews? You may not think so—if you're in the early stages of deploying your reputation system (or building your site), then you're probably more worried about getting people to use the system at all. And putting obstacles in front of potential reviewers certainly doesn't sound like a good way to alleviate those fears. But, long-term, the success of your reputation system will depend on quality, honest and unbiased opinions.

It may well be in your best interest to limit those who can, and cannot, give ratings. Require that users register, at least. Plain and simple. It should be the bare minimum level of investment that a user should make to voice an opinion on your site.

You may want to go even further. Yahoo! Answers, for instance, limits certain functions (rating questions & answers) to only those users who've achieved a certain status (Level 2) on the site.

Recommendation: Make it easy, but not too easy, for users to give an opinion. Bake in some degree of accountability and ownership for publicly stated opinions.
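The two levels of investment described above, registration as the floor and a status level on top, can be expressed as a small permission check. The User fields and the require_level switch are hypothetical; the level-2 threshold mirrors the Yahoo! Answers example.

```python
from dataclasses import dataclass


@dataclass
class User:
    registered: bool
    level: int = 1


MIN_LEVEL_TO_RATE = 2  # mirrors the Yahoo! Answers "Level 2" example


def can_rate(user: User, require_level: bool = False) -> bool:
    if not user.registered:
        return False  # registration is the bare-minimum investment
    if require_level:
        return user.level >= MIN_LEVEL_TO_RATE  # earned status on top of that
    return True
```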

Appropriately Informed

Don't ask your users to provide opinions on things they haven't experienced. This may be tricky, because the temptation will be strong to make rating objects as easy and low-friction as possible, which typically means putting rating controls in an easy-to-find location and keeping them there consistently. But consider the reputation value of 5-star ratings on YouTube (which we covered here only recently): do you suppose those generally-lackluster ratings distributions would improve if YouTube only allowed users to rate a video after first watching it? (To completion?)

This shortcoming is not limited to YouTube: years ago, Saleem Khan noted a trend on Digg where people were Digging up submissions with no way to have actually read the associated articles. (They couldn't have read them—the articles in question had gone offline before the favorable reviews continued to pour in.)

And even Apple has fallen victim to this oversight. Early iterations of the App Store rating system allowed for anyone to rate an iPhone app—whether they'd ever actually installed the app or not! This violates the "sufficient investment" principle, above, but it also seriously calls into question those reviewers' qualification to review. There's simply no way those ratings could have carried any real value—the reviewers weren't making informed decisions.

Apple eventually fixed this oversight. Now, you're given the opportunity to rate any app from the App Store interface, but when you try to do so for an app you've never tried?

[Screenshot (MustOwn.png): the App Store declines the rating because you don't own the app]


Recommendation: Place ratings inputs either spatially or temporally downstream of the act of consumption.
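Placing the rating control downstream of consumption boils down to a check like the one below. It is a sketch with hypothetical names; what counts as consumption (an install, a completed video, an opened article) is up to your application.

```python
from typing import Dict, Set


class ConsumptionGate:
    def __init__(self) -> None:
        self.consumed: Dict[str, Set[str]] = {}  # user_id -> item_ids consumed

    def mark_consumed(self, user_id: str, item_id: str) -> None:
        # Call this from whatever event proves the user experienced the item.
        self.consumed.setdefault(user_id, set()).add(item_id)

    def may_rate(self, user_id: str, item_id: str) -> bool:
        return item_id in self.consumed.get(user_id, set())
```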

But Not Overly Biased

Although Apple addressed that problem, they also introduced a new one. Now, when iPhone users attempt to delete an app from their device, they are asked to first rate the app.

[Screenshot (iphone-rate.jpg): the iPhone's rating prompt, shown while deleting an app]

This is, of course, a horrible time to ask a user to rate an application: after they've decided that they no longer need the app, and just as they're in the process of deleting it. Even an app that a user loved may fare poorly under these circumstances.

Perhaps it's truly a horrible app, in which case a bad rating would be justified, or perhaps the user just no longer has any use for it. (Maybe it's a game that he or she has already beaten, or a Twitter client made superfluous by a newer, sexier alternative.) By the time a user is uninstalling an iPhone app, the love affair with that app (if there ever was one) is unmistakably on the wane, and the average ratings likely reflect that fact.

Recommendation: Don't ask for ratings at the low point of a user's relationship to the rated object.

And Not Too Distracted

Another major sin of the App Store's "parting shot" rating request is that it makes the act of rating into a roadblock. In this excellent comment, PJ Cabrera makes the point:

Who knows how many users are just inputting anything just to move on, without paying attention to what they're doing[?]
True, there is a "No Thanks" button, but its meaning is ambiguous, and some reviewers may mistake its intent (perhaps reading it as a "Cancel this deletion" action instead). It is hard for users to give honest and considered opinions when they are still caught up in the experience that you're asking them to evaluate.

It's common practice, when buying a new car, to receive a customer satisfaction survey from the manufacturer. (This survey is used as an input into the car-selling reputation of the dealership you bought from.) Why do you suppose the manufacturers will typically wait a week or more before sending you the survey? It's because they know that, with a little time and distance from the (often stressful) day of the transaction, you're more likely to give a measured, thoughtful, and accurate assessment of it. (You're probably also more inclined to give a positive review, but that's a discussion for another post.)

Recommendation: Respect the primary tasks that a user may be engaged in on your site. Don't interrupt them unnecessarily in order to solicit ratings.
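Taken together, the recommendations in this post amount to a timing policy for rating requests. Here is a minimal sketch of one, assuming a hypothetical `should_prompt_for_rating` hook that the client calls with the user's current context. The context names and the seven-day cooling-off period (borrowed loosely from the car-survey example) are assumptions, not any platform's real settings.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed cooling-off period, modeled on the "wait a week" car-survey practice.
COOLING_OFF = timedelta(days=7)

# Hypothetical contexts in which a rating request would be a roadblock or a "parting shot".
BLOCKED_CONTEXTS = {"uninstalling", "checkout", "composing_message"}


def should_prompt_for_rating(
    context: str,
    first_used_at: datetime,
    now: datetime,
    already_rated: bool,
    last_prompted_at: Optional[datetime] = None,
) -> bool:
    """Decide whether this is an appropriate moment to show a rating request."""
    if already_rated:
        return False
    # Never ask at the low point of the relationship, or in the middle of a primary task.
    if context in BLOCKED_CONTEXTS:
        return False
    # Give the user some time and distance from the initial experience.
    if now - first_used_at < COOLING_OFF:
        return False
    # Don't nag: at most one request per cooling-off period.
    if last_prompted_at is not None and now - last_prompted_at < COOLING_OFF:
        return False
    return True
```

The function only decides *when* to ask; the request itself should still be easy to dismiss, with a button whose meaning can't be confused with cancelling whatever the user was doing.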

Special thanks to Laurent Stanevich for providing the iPhone app rating screenshot.

December 09, 2009

A Sneak-Peek at Reputation Concepts

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, Bryce shares a simple work-in-progress and solicits your input to make it better.

Once upon a time, in (what feels like) a previous life, I illustrated some moderately well-received concept maps: diagrams intended to communicate some simple concepts about software systems and show the interrelationships between their moving parts.

Throughout work on Building Web Reputation Systems, it has always been my intent to attempt a compelling, engaging and fun-to-read concept map. Something to demonstrate the concepts that we've drawn on throughout the book. That was my intent anyway—it just never occurred to me how much work writing a book was going to be. So it hasn't been until fairly recently (like… um, tonight, actually) that I've been able to start pulling something together.

In keeping with our open policy, here, then, is that very first rough-and-ugly (and incomplete!) sketch. (Click it for the full version on Flickr.)

[Figure: first rough sketch of the Reputation Systems Concept Map]

I usually don't use OmniGraffle in the design of these concept maps, but its looseness and speed of idea capture just felt right for this one, so I'll probably let the general shape of the map simmer in it for a while before moving it over to Illustrator for some fun touches and polish.

This sketch is, admittedly, incomplete. I have a paper version, drafted beforehand, that's easily 150% this size (in terms of the number of concepts and linkages). Please feel free to comment here or over on Flickr. Hopefully you've enjoyed this brief, light interlude; I'll share more about the progress on the Reputation Systems Concept Map as it evolves.

December 02, 2009

The Cake is a Lie: Reputation, Facebook Apps, and "Consent" User Interfaces

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, Randy comes back from the IIW with a simple idea for improving application permissioning.

In early November, I attended the 9th meeting of the Internet Identity Workshop. One of the working sessions I attended was on Social Consent user interface design. After the session, I had an insight that reputation might play a pivotal role in solving one of the key challenges presented. I shared my detailed, yet simple, idea with Kevin Marks and he encouraged me to share my thoughts through a blog post—so here goes…

The Problem: Consent Dialogs

The technical requirements for the dialog are pretty simple: applications have to ask users for permission to access their sensitive personal data in order to produce the desired output—whether that's to create an invitation list, or to draw a pretty graph, or to create a personalized high-score table including your friends, or to simply sign and attach an optional profile photo to a blog comment.

The problem, however, is this: users often don't understand what they are being asked to provide, or the risks posed by granting access. It's not uncommon for a trivial quiz application to request access to virtually the same amount of data as much more "heavyweight" applications (like, say, an app to migrate your data between social networks). Explaining this to users in any reasonable level of detail just before running the application causes them to (perhaps rightfully) get spooked and abandon the permission grant.

Conflicting Interests

The platform providers want to make sure that their users are making as informed a decision as possible, and that unscrupulous applications don't take advantage of their users.

The application developers want to keep the barriers to entry as low as possible. This creates a lot of pressure to (over)simplify the consent flow. One designer quipped that it reduces the user decision to a dialog with only two buttons: "Go" and "Go Away" (and no other text).

The working group made no real progress. Kevin proposed creating categories, but that didn't get anywhere because it just moved the problem onto user education—"What permissions does QuizApp grant again?"

Reputation to the Rescue?

All consent dialogs of this stripe suffer from the same problem: Users are asked to make a trust decision about an application that, by definition, they know nothing about!

This is where identity meets trust, and that's the kind of problem that reputation is perfect for. Applications should have reputations in the platform's database. That reputation can be displayed as part of the information provided when granting consent.

Here's one proposed model (others are possible; this one is offered as an exemplar).

The Cake is a Lie: Your Friends as Canaries in the Coal Mine of New Apps

First, a formalism: when an application wants to access a user's private information (I), it has a set of intended purposes (P) for that information. Therefore, the consent could be phrased thusly:

"If you let me have your (I), I will give you (P). [Grant] [Deny]"

Example: "If you give me access to your friends list, I will give you cake."

In this system, I propose that the applications be compelled to declare this formulation as part of the consent API call. (P) would be stored along with the app's record in the platform database. So far, this is only slightly different from what we have now, and of course, the application could omit or distort the request.
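To make the declaration half of the proposal concrete, here is a minimal sketch, assuming the consent API call carries the (I) and (P) pair and the platform stores it with the app's record. The `ConsentDeclaration` type, the in-memory `APP_DECLARATIONS` table, and both functions are hypothetical names for illustration, not any platform's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentDeclaration:
    """The "If you let me have your (I), I will give you (P)" pair an app must declare."""
    app_id: str
    information: str  # (I): the private data requested, e.g. "friends list"
    purpose: str      # (P): what the user gets in return, e.g. "cake"


# Stand-in for the app records in the platform database (hypothetical).
APP_DECLARATIONS: dict[str, ConsentDeclaration] = {}


def register_declaration(decl: ConsentDeclaration) -> None:
    """Store (I) and (P) with the app's record when the consent API is called."""
    APP_DECLARATIONS[decl.app_id] = decl


def consent_prompt(app_id: str) -> str:
    """Render the consent dialog text from the stored declaration."""
    decl = APP_DECLARATIONS[app_id]
    return (f"If you let me have your {decl.information}, "
            f"I will give you {decl.purpose}. [Grant] [Deny]")
```

For example, registering `ConsentDeclaration("quizapp", "friends list", "cake")` makes `consent_prompt("quizapp")` produce the cake example above. Storing (P) is what makes the promise checkable later.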

This is where the reputation comes in. Whenever a user uninstalls an application, the platform asks for a reason (with options including abusive use of data) and specifically asks whether the promise of (P) was kept.

"Did this application give you the [cake] it promised?"

All negative feedback is kept, to be reused later when new users install the app and encounter the consent dialog. If they have friends who have already uninstalled this application, complaining that the "If (I) then (P)" promise was false, then the moral equivalent of this would appear scrawled in the consent box:


"Randy says the [cake] was unsatisfactory.
Bryce says the [cake] was unsatisfactory.
Pamela says the application spammed her friends list."
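The reputation half can be sketched the same way. Assuming the platform records each uninstall reason (including a yes/no answer to "Did this application give you the [cake] it promised?") and knows each user's friends, the consent dialog only needs to look up negative feedback from the new user's friends. The `UninstallFeedback` schema and `friend_warnings` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UninstallFeedback:
    """One user's answers to the platform's uninstall questions (hypothetical schema)."""
    user: str
    app_id: str
    promise_kept: bool                  # "Did this application give you the [cake] it promised?"
    abuse_report: Optional[str] = None  # e.g. "spammed her friends list"


def friend_warnings(app_id: str, purpose: str, friends: set[str],
                    feedback: list[UninstallFeedback]) -> list[str]:
    """Collect negative uninstall feedback about one app from the viewer's friends."""
    warnings = []
    for fb in feedback:
        if fb.app_id != app_id or fb.user not in friends:
            continue
        if fb.abuse_report:
            # An explicit abuse report takes precedence over the promise question.
            warnings.append(f"{fb.user} says the application {fb.abuse_report}.")
        elif not fb.promise_kept:
            warnings.append(f"{fb.user} says the [{purpose}] was unsatisfactory.")
    return warnings
```

Restricting the lookup to friends is just the variant described here; the Afterthoughts below note that it need not be limited that way.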

Afterthoughts

Lots of improvements are possible (not limiting it to friends, and letting early adopters know that they are canaries in the coal mine). These are left for future discussion.

Sure, this doesn't help early adopters.

But application reputation quickly shuts down apps that do obviously evil stuff.

Most importantly, it provides some insight to users by which they can make more informed consent decisions.

(And if you don't get the cake reference, you obviously haven't been playing Portal.)

December 01, 2009

Pardon our dust...

The book is coming into its next phase, and we're cleaning up all the messy bits before we hand it off to O'Reilly. This entails renaming every image file, along with other grubby chores that are bound to break things for a day or two.

Please bear with us while we get things back in order on the blog and wiki.

Bryce and Randy

November 18, 2009

Reputation is Identity

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry discusses the ways that reputation can make for richer user identities on your site. It is lightly adapted from our draft of Chapter 8.

Imagine you're at a party, and your friend Ted wants you to meet his friend, Mary. He might very well say something like… "I want you to meet my friend, Mary. She's the brunette over by the buffet line." A fine beginning, to be sure. It helps to know who you're dealing with. But now imagine that Ted ended there as well. He doesn't take you by the arm, walk you over to Mary, and introduce you face to face. Maybe he walks off to get another drink. Um… this does not bode well for your new friendship with Mary.

Sadly, until fairly recently, this has been the state of identity on much of the Web. When people were represented at all, they were often nothing more than a meager collection of sparse data elements: a username; maybe an avatar; just enough identifying characteristics that you might recognize them again later, but not much else.

With the advent of the social Web, things have improved. Perhaps the biggest improvement has been that people's relationships now form a sizable component of their identity and presence on most sites. Mutual friends or acquaintances can act as a natural entree to forming new relationships. So at least Ted will now go that extra step and walk you over to the buffet table for a proper introduction.

But, you still won't know much about Mary, will you? Once introductions are out of the way, what will you possibly have to talk about? The addition of reputation to your site will provide that much-needed final dimension to your users' identities: depth. Wouldn't it be nice to review a truly rich and deep view of Mary's identity on your site before deciding what you and she will or won't have in common?

Here are but a few reasons why user identities on your site will be stronger with reputation than they would be without.

  • Reputation is based on history, and the simple act of recording those histories (a user's past actions, or voting history, or the history of their relationship to the site) provides you with a lot of content (and context) that you can present to other users. This is a much richer model of identity than just a display name and an avatar.
  • Visible histories reveal shared affinities and allow users with common interests to find each other. If you are a Top Contributor in the Board Games section of a site, then like-minded folks can find you, follow you, or invite you to participate in their activities.

    You will, however, find contexts where this is not desirable. On a question-and-answer site like Yahoo! Answers, for instance, don't be surprised to find out that many users won't want their questions about gonorrhea or chlamydia to appear as part of their historical record. Err on the side of giving your users control over what appears, or give them the ability to hide their participation history altogether.

  • A past is hard to fake. Most site identities are cheap. In and of themselves, they just don't mean much. With a couple of quick form fields and a 'Submit' button, practically anyone (or no one: bots welcome!) can become a full-fledged member of most sites. It is much harder, however, to fake a history of interaction with a site over any significant duration of time.

    We don't mean to imply that it can't be done; harvesting 'deep' identities is practically an offshoot industry of the MMORPG world (see the figure above). But it does provide a fairly high participatory hurdle to jump. When done properly, user karma can assure some level of commitment and engagement from your users. (Or at least allow you to ascertain those levels quickly.)

  • Reputation disambiguates identity conflicts. Hopefully, you've moved away from publicly identifying users on your site by their unique identifier. (You have read the Tripartite Identity Pattern, right?) But this introduces a whole new headache: identity spoofing. If your public namespace doesn't guarantee uniqueness (or even if it does, since it'll be hard to guard against similar-appearing or l33t-speak equivalents and the like), then you'll have this problem.

    Once your community is at scale, trolls will take great delight in appropriating others' identities (assuming the same display name, uploading the same avatar) purely in an effort to disrupt conversations. It's not a perfect defense, but always associate a contributor's identity with his or her participation history or reputation to help mitigate these occurrences. You will, at least, have armed the community with the information they need to decide who's legit and who's an interloper.
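The last two points suggest a simple rendering rule: never show a display name on its own. Here is a minimal sketch, assuming a hypothetical `Member` record with a private unique ID, a non-unique public display name, a join date, a karma score, and a contribution count; the byline format is an illustrative assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Member:
    account_id: str     # unique and private; deliberately never rendered below
    display_name: str   # public, NOT guaranteed to be unique
    joined: date
    karma: int          # e.g. a participation-based reputation score
    contributions: int


def public_byline(member: Member, today: date) -> str:
    """Render a display name together with history that is hard to fake."""
    years = (today - member.joined).days // 365
    tenure = f"{years}-year member" if years >= 1 else "new member"
    return (f"{member.display_name} ({tenure}, {member.contributions} posts, "
            f"karma {member.karma})")
```

Two accounts can share the name "Mary", but a long-standing contributor and an impostor created yesterday will render very differently, which is the information the community needs to tell them apart.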

These are some of the reasons that extending user identities with reputation is useful. Chapter 8 of Building Web Reputation Systems offers a series of considerations for how to do so most effectively.

November 11, 2009

5-Star Failure?

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry confirms that poorly chosen reputation inputs will indeed yield poor results.

Pity the poor, beleaguered 5-Star rating. Not so very long ago, it was the belle of the online ratings ball: its widespread adoption by high-profile sites like Amazon, Yahoo!, and Netflix influenced a host of imitators, and, at one point, star ratings were practically an a priori choice for site designers when considering how best to capture their users' opinions. Their no-brainer inclusion had almost reached cargo cult design status.

This has subsided in recent years, as stars have received stiff competition from hot, upstart mechanisms like "Digg-style" voting (what we, when contributing to the Yahoo! Pattern Library, rechristened as Vote to Promote) and Facebook's "Like" action (which was, ahem, "inspired by" FriendFeed, though let us not forget that, for a time, it also flirted with Thumbs Up & Down rating of feed items). Definitely, within the past 2 or 3 years, stars' 'obvious' appeal as the ratings mechanism of choice is no longer so obvious.

Even more recently, the 5-star rating's fall from grace has become almost complete. YouTube fired the first volley, declaring that, by and large, people on YouTube overwhelmingly give 5 stars to videos on that site. (Regular readers of this site may recall that we blogged about similar J-curve distributions that are prevalent on Yahoo! as well.)

And then the venerable Wall Street Journal declared that On the Internet, Everyone's a Critic But They're Not Very Critical:

One of the Web's little secrets is that when consumers write online reviews, they tend to leave positive ratings: The average grade for things online is about 4.3 stars out of five.
And, just like that, as quickly as 'stars are it' rose to prominence, 'stars are dead' is rapidly becoming the accepted wisdom. (Don't believe me? Read the comments when TechCrunch covered the YouTube discovery, and you'll see folks all but rushing to prop up a variety of 'preferred rating mechanisms' in stars' place.)

Are stars dead?

This is, of course, the wrong way to frame the question. Stars, thumbs, favorites, or sliders: any of these ratings input mechanisms are dead on arrival if they're not carefully considered within the context of use. 5-Star ratings require a little more cognitive investment than a simple 'I Like This' statement, so, before designing 5-star ratings into your system, consider the following.

Will it be clear to users what you're asking them to assess? It's not entirely surprising that YouTube's ratings overwhelmingly tend toward the positive. That's a long-observed and well-understood phenomenon in the social sciences called Acquiescence Bias. It is "the tendency of a respondent to agree with a statement when in doubt." And 5-star ratings, in the case of YouTube, are nothing but doubt. What, exactly, is a fair and accurate quantitative assessment for a video on YouTube? The input mechanism does provide some clues, in the form of text hints for the various ratings levels (ranging from 'Poor' to 'Awesome!'), but these are highly subjective and themselves way too open to interpretation.

Is a scale necessary? If the primary decision you're asking users to make is 'good vs. bad' or 'I liked it' or 'I didn't', then are multiple steps of decisioning really adding anything to their evaluation?

Are comparisons being made? Should I, as a user, rate videos in comparison to other similar videos on YouTube? What, exactly, distinguishes a 5-star football-to-the-groin video from a 2-star one? Am I rating against like videos? Or all videos on YouTube? (Or every video I've ever seen!?)

Have they watched the video? One way to encourage more-thoughtful ratings is to place the input mechanism at the proper juncture: make some attempt, at least, to ensure that the user is rating the thing only after having experienced it. YouTube's 5-star mechanism is fixed and always-present, encouraging drive-by ratings, premature ratings or just general sloppiness of assessment.
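One rough way to probe the "Is a scale necessary?" question against your own data is to summarize the ratings distribution and compare the mean with a simple like/dislike split. The counts below are a hypothetical J-shaped distribution invented for illustration (not real YouTube or Yahoo! data), and treating 4 to 5 stars as "liked" and 1 to 2 as "disliked" is an assumption.

```python
from collections import Counter


def summarize(ratings: Counter) -> dict:
    """Summarize a 5-star histogram (stars -> count) alongside a binary view of it."""
    total = sum(ratings.values())
    mean = sum(stars * n for stars, n in ratings.items()) / total
    liked = sum(n for stars, n in ratings.items() if stars >= 4)     # assumed "thumbs up"
    disliked = sum(n for stars, n in ratings.items() if stars <= 2)  # assumed "thumbs down"
    return {
        "mean": round(mean, 2),
        "share_5_star": ratings[5] / total,
        "liked": liked / total,
        "disliked": disliked / total,
    }


# Hypothetical J-shaped distribution of 100 ratings: a big spike at 5, a small bump at 1.
j_curve = Counter({1: 10, 2: 3, 3: 5, 4: 12, 5: 70})
print(summarize(j_curve))
```

With a distribution shaped like this, a high mean mostly restates that most raters liked the item; if the middle steps of the scale are nearly empty on your site, that is a hint they aren't adding much to users' evaluations.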

So, are stars inappropriate for YouTube, at least in the way that they've designed them? Probably, yes.

To wrap up, some quick links. Check out this elegant and innovative design that the folks at Steepster recently rolled out, and think about the ways it cleverly addresses all four of the concerns listed above.

And to see a really in-depth study of 5-star ratings used effectively, check out Using 5-Star Ratings from Christopher Allen & Shannon Appelcline's excellent series on Systems for Collective Choice.