February 16, 2010

On Karma: Top-line Lessons on User Reputation Design

In Building Web Reputation Systems, we appropriate the term karma to mean a user reputation in an online service. As you might expect, karma is discussed heavily throughout the more than 300 pages. During the final editing process, it became clear that a simple summary of the main points would be helpful to those looking for guidance. It seemed that our first post in over a month (congratulations on the new delivery, Bryce!) should be something big and useful...

This post covers the following top-line points about designing karma systems, drawn from our book and other blog posts:

  • Karma is user reputation within a context
  • Karma is useful for building trust between users, and between a user and the site
  • Karma can be an incentive for participation and contributions
  • Karma is contextual and has limited utility globally. [A chess master is not necessarily a good eBay seller]
  • Karma comes in several flavors - Participation, Quality and Robust (combined)
  • Karma should be complex and the result of indirect evaluations, and the formulation is often opaque
  • Personal karma is displayed only to the owner, and is good for measuring progress
  • Corporate karma is used by the site operator to find the very best and very worst users
  • Public karma is displayed to other users, which is what makes it the hardest to get right
  • Public karma should be used sparingly - it is hard to understand, isn't expected, and is easily confused with content ratings
  • Negative public karma should be avoided altogether. In karma-math, -1 is not the same magnitude as +1, and information loss is too expensive.
  • Public karma often encourages competitive behavior in users, which may not be compatible with their motivations. This is most easily seen with leaderboards, but can happen any time karma scores are prominently displayed. [e.g., Twitter follower count]

Why bother with karma? [Preface]

Karma is a reputation score for a user in a community. It may comprise many components, such as:

  • How long has this person been a member of the community?
  • What types of activities has she engaged in?
  • How well has she performed at them?
  • What do other people think about this person?

Having access to a person's reputation might help you make better-informed judgments. Judgments like…

  • Can I trust this person?
  • Should I transact with this person?
  • Is it worth my time to listen to this person?

Besides providing a means for trust between users, karma is often used as an incentive to encourage contributions to a service, or to identify specific users for special action - either recognition or corrective action. The tricky part is balancing the producer incentives against the potential for abuse and the consumers' need for good filters over the content.

Karma is contextual (local) and has limited scope [Chapter 1]

Karma is built from the actions of a user within a context, such as a web site, or even as a member of a sub-community of a site. Those contributions are often limited to a very narrow range of actions, so care must be taken not to over-generalize the value of a karma score. For example, an eBay seller's feedback karma only reflects the feelings of the buyers for the exact transactions completed. One known scamming pattern is for a scammer to develop strong positive karma by selling a large number of smaller items, then switch to simultaneously listing a large number of high-ticket items for auction at low prices, collecting the funds, and then canceling the account. This is an evil form of reputation bankruptcy (see below).

There is a common misconception about karma - that it can be used across contexts, just as the FICO credit score is broadly used in the United States to determine suitability for issuing credit cards, purchasing a home, or even being hired for a job. Chapter 1 talks about this idea of a "Web FICO":

Several startup companies have attempted to codify a global user reputation for use across web sites, and some try to leverage a user's preexisting eBay seller's Feedback score as a primary value in their rating. They are trying to create some sort of “real person” or “good citizen” reputation system for use across all contexts. As with the FICO score, it is a bad idea to co-opt a reputation system for another purpose, and it dilutes the actual meaning of the score in its original context. The eBay Feedback score reflects only the transaction worthiness of a specific account, and it does so only for particular products bought or sold on eBay. The user behind that identity may in fact steal candy from babies, cheat at online poker, and fail to pay his credit card bills. Even eBay displays multiple types of reputation ratings within its singular limited context. There is no web FICO because there is no kind of reputation statement that can be legitimately applied to all contexts.

Participation vs. quality, and robust karma [Chapter 4]

There are two primitive forms of karma models: models that measure the amount of user participation and models that measure the quality of contributions. When these types of karma models are combined, we refer to the combined model as robust. Including both types of measures in the model gives the highest scores to the users who are both active and produce the best content.

Participation karma

Figure: Participation karma. As a user engages in various activities, those activities are recorded, weighted, and tallied.

Counting socially and/or commercially significant events by content creators is probably the most common type of participation karma model. This model is often implemented as a point system (Chap_4-Points), in which each action is worth a fixed number of points and the points accumulate. A participation karma model looks exactly like the figure above, where the input event represents the number of points for the action and the source of the activity becomes the target of the karma.

There is also a negative participation karma model, which counts how many bad things a user does. Some people call this model strikes, after the three-strikes rule of American baseball. Again, the model is the same, except that the application interprets a high score inversely.
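The point system and its strikes variant can be sketched in a few lines of Python. This is only an illustration: the action names and point values below are made up, and a real application would choose its own.

```python
from collections import defaultdict

# Hypothetical point values per action; not taken from the book.
ACTION_POINTS = {"post_review": 5, "upload_photo": 3, "answer_question": 2}

def participation_karma(events, points=ACTION_POINTS):
    """Tally fixed points per action; the user who acted is the karma target."""
    totals = defaultdict(int)
    for user, action in events:
        totals[user] += points.get(action, 0)  # unknown actions earn nothing
    return dict(totals)

def strikes(events, bad_actions=frozenset({"spam_flag_upheld"})):
    """Negative participation: the same tally, but a high count is read as bad."""
    counts = defaultdict(int)
    for user, action in events:
        if action in bad_actions:
            counts[user] += 1
    return dict(counts)

events = [
    ("alice", "post_review"),
    ("alice", "upload_photo"),
    ("bob", "answer_question"),
    ("bob", "spam_flag_upheld"),
]
karma = participation_karma(events)
strike_counts = strikes(events)
```

Both functions use the same accumulate-per-user shape; only the interpretation of a high number differs, which is the point the text makes about strikes.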

Quality karma

A quality-karma model, such as eBay's seller feedback (Chap_4-eBay_Merchant_Feedback_Karma) model, deals solely with the quality of contributions by users. In a quality-karma model, the number of contributions is meaningless unless it is accompanied by an indication of whether each contribution is good or bad for business. The best quality-karma scores are always calculated as a side effect of other users evaluating the contributions of the target.

In the eBay example, a successful auction bid is the subject of the evaluation, and the results roll up to the seller: if there is no transaction, there should be no evaluation.
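As a rough sketch of a quality-karma roll-up, the Python below derives a seller's score from other users' evaluations of completed transactions. The share-of-positive formula and the data layout are assumptions for illustration, not eBay's actual calculation.

```python
from collections import defaultdict

def quality_karma(evaluations):
    """Share of positive evaluations per seller, derived from rated transactions.

    Each evaluation is (seller, transaction_id, is_positive). A seller with no
    evaluated transactions gets no score at all, rather than a zero.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    seen = set()
    for seller, transaction_id, is_positive in evaluations:
        if transaction_id in seen:  # one evaluation per completed transaction
            continue
        seen.add(transaction_id)
        total[seller] += 1
        positive[seller] += 1 if is_positive else 0
    return {seller: positive[seller] / total[seller] for seller in total}

evaluations = [
    ("carol", "t1", True),
    ("carol", "t2", True),
    ("carol", "t3", False),
    ("carol", "t3", True),   # second rating of t3 is ignored
    ("dave", "t4", True),
]
scores = quality_karma(evaluations)
```

Note that nobody rates carol or dave directly: the score is a side effect of rating transactions, and a user with no transactions simply has no entry.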

Robust karma

By itself, a participation-based karma score is inadequate to describe the value of a user's contributions to the community: we will caution time and again throughout the book that rewarding simple activity is an impoverished way to think about user karma. However, you probably don't want a karma score based solely on quality of contributions either. Under this circumstance, you may find your system rewarding cautious contributors: ones who, out of a desire to keep their quality ratings high, only contribute to “safe” topics, or who, once having attained a certain quality ranking, decide to stop contributing to protect that ranking.

What you really want to do is to combine quality-karma and participation-karma scores into one score - call it robust karma. The robust-karma score represents the overall value of a user's contributions: the quality component ensures some thought and care in the preparation of contributions, and the participation side ensures that the contributor is very active, that she's contributed recently, and (probably) that she's surpassed some minimal thresholds for user participation - enough that you can reasonably separate the passionate, dedicated contributors from the fly-by post-then-flee crowd.

The weight you'll give to each component depends on the application. Robust-karma scores often are not displayed to users, but may be used instead for internal ranking or flagging, or as factors influencing search ranking; see Chap_4-Keep_Your_Barn_Door_Closed for common reasons for this secrecy. But even when karma scores are displayed, a robust-karma model has the advantage of encouraging users both to contribute the best stuff (as evaluated by their peers) and to do it often.

When negative factors are included in robust-karma scores, the scores are particularly useful for customer care staff, both to highlight users who have become abusive or whose contributions decrease the overall value of content on the site, and potentially to provide an increased level of service to proven-excellent users who become involved in a customer service procedure. A robust-karma model helps find the best of the best and the worst of the worst.

Figure: Robust karma. A robust-karma model might combine multiple other karma scores, measuring, perhaps, not just a user's output (Participation) but their effectiveness (or Quality) as well.
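One way to picture the combination is a weighted blend of a quality score and a participation score, with a minimum-activity threshold and a recency decay. The Python below is a sketch under those assumptions; the weights, the threshold, the saturation constant, and the half-life are all hypothetical tuning knobs, not values from the book.

```python
import math

def robust_karma(quality, contributions, days_since_last,
                 quality_weight=0.7, min_contributions=5, half_life_days=30.0):
    """Blend a quality score in [0, 1] with a participation score in [0, 1]."""
    if contributions < min_contributions:
        return 0.0  # below the participation threshold: not enough evidence yet
    # Diminishing returns on volume: approaches 1 as contributions grow.
    volume = 1.0 - math.exp(-contributions / 50.0)
    # Recency halves for every `half_life_days` of inactivity.
    recency = 0.5 ** (days_since_last / half_life_days)
    participation = volume * recency
    return quality_weight * quality + (1.0 - quality_weight) * participation

active_expert = robust_karma(quality=0.90, contributions=200, days_since_last=1)
cautious = robust_karma(quality=0.95, contributions=6, days_since_last=1)
drive_by = robust_karma(quality=1.00, contributions=2, days_since_last=0)
```

With these made-up settings, the active contributor with slightly lower quality outranks the cautious one who posts rarely, and the drive-by poster scores nothing until the threshold is crossed.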

Unlike most content reputation, karma is implicit, opaque, and complex [Chapter 7]

A reputable entity is potentially any entry in a database, including users and content items, with one or more reputations attached to it. All kinds of reputation score types and all kinds of display and use patterns might seem equally valid for content reputation and karma, but usually they're not. To highlight the differences between content reputation and karma, we've categorized them by the ways in which they're typically calculated: simple and complex reputation.

Simple Reputation
Simple reputation is any reputation score that is generated directly by user evaluation of a reputable entity and that is subject to an elementary aggregation calculation, such as simple average. For example, simple reputation is used on most ratings-and-reviews sites. Simple reputation is direct and easy to understand.
Complex Reputation
Complex reputation is a score aggregated from multiple evaluations, including evaluations of different but related targets, calculated with an opaque method. Email IP spammer, Google PageRank, and eBay feedback reputations are examples of complex reputation. It's an indirect evaluation, and users may not understand how it was calculated even if the score is displayed.

Content reputation is about things - typically inanimate objects without emotions or the ability to directly respond in any way to their reputation.

But karma represents the reputation of users, and users are people - they are alive, they have feelings, and they are the engine that powers your site. Karma is significantly more personal and therefore sensitive and meaningful. If a manufacturer gets a single bad product review on a web site, it probably won't even notice. But if a user gets a bad rating from a friend - or feels slighted or alienated by the way your karma system works - she might abandon an identity that has become valuable to your business. Worse yet, she might abandon your site altogether and take her content with her. (Worst of all, she might take others with her.)

Take extreme care in creating a karma system. User reputation on the web has undergone many experiments, and the primary lesson from that research is that karma should be a complex reputation and it should be displayed rarely.

Karma is complex, built of indirect inputs

Be careful with karma - sometimes making things as simple and explicit as possible is the wrong choice for reputation:

  • Rating a user directly should be avoided. Typical implementations only require a user to click once to rate another user and are therefore prone to abuse. When direct evaluation karma models are combined with the common practice of streamlining user registration processes (on many sites opening a new account is an easier operation than changing the password on an existing account), they get out of hand quickly. See the example of Orkut in Chap_7-Display_Numbered_Levels.
  • Asking people to evaluate others directly is socially awkward. Don't put users in the position of lying about their friends.
  • Using multiple inputs presents a broader picture of the target user's value.
  • Economics research into “revealed preference,” or what people actually do, as opposed to what they say, indicates that actions provide a more accurate picture of value than elicited ratings.

Karma calculations are often opaque

Karma calculations may be opaque because the score is valuable as status, has revenue potential, and/or unlocks privileged application features.

Display karma sparingly

In Building Web Reputation Systems we separate reputation display into three categories: public (shown to other users), personal (shown only to the owner), and corporate (for company-internal use). Corporate karma is normally used to identify the very best and the very worst users for special actions, such as PR contact or account termination. Personal karma is typically used for reflecting progress against some goal - as a dieter tracks their body weight over time. Karma display becomes challenging when it is public.

There are several important things to consider when displaying karma to the public:

  • Publicly displayed karma should be rare because, as with content reputation, users are easily confused by the display of many reputations on the same page or within the same context.
  • Publicly displayed karma should be rare because it can create the wrong incentives for your community. Avoid sorting users by karma. See Chap_7-Leaderboards_Considered_Harmful.
  • If you do display it publicly, make karma visually distinct from any nearby content reputation. Yahoo!'s EU message board displays the karma of a post's author as a colored medallion, with the message rated with stars. But consider this: Slashdot's message board doesn't display the karma of post authors to anyone. Even the display of a user's own karma is vague: “positive,” “good,” or “excellent.” After originally displaying karma publicly as a number, over time Slashdot has shifted to an increasingly opaque display of karma.
  • Publicly displayed karma should be rare because it isn't expected. When Yahoo! Shopping added Top Reviewer karma to encourage review creation, they displayed a Top Reviewer badge with each review and rushed it out for the Christmas 2006 season. After the New Year had passed, user testing revealed that most users didn't even notice the badges. When they did notice them, many thought they meant either that the item was top rated or that the user was a paid shill for the product manufacturer or Yahoo!.
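The three display categories, together with a vague word shown only to the owner, could be sketched like this in Python. The cutoffs are invented for illustration (Slashdot's real thresholds are not given here), and the sketch assumes non-negative scores.

```python
import bisect

# Hypothetical cutoffs; only the three labels come from the text above.
THRESHOLDS = [10, 50]           # score boundaries between labels
LABELS = ["positive", "good", "excellent"]

def karma_view(score, audience):
    """Return what a given audience sees for a non-negative karma score."""
    if audience == "corporate":
        return score                      # staff see the raw number
    if audience == "personal":
        # The owner sees only a coarse label, never the number itself.
        return LABELS[bisect.bisect_right(THRESHOLDS, score)]
    return None                           # public: nothing is displayed
```

For example, `karma_view(75, "personal")` yields a label while `karma_view(75, "public")` yields nothing, which keeps the exact formula and score opaque to everyone but site staff.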

Though karma should be complex, it should still be limited to as narrow a context as possible. Don't mix shopping review karma with chess rank. It may sound silly now, but you'd be surprised how many people think they can make a business out of creating an Internet-wide trustworthiness karma.

Yahoo! holds reputation for karma scores to a higher standard than reputation for content. Be very careful in applying terminology and labels to people, for several reasons:

  • Avoid labels that might appear as attacks. They set a hostile tone that will be amplified in users' responses. This caution applies both to overly positive labels (such as “hotshot” or “top” designations) and to negative ones (such as “newbie” or “rookie”).
  • Avoid labels that introduce legal risks. What if a site labeled members of a health forum “experts,” and these “experts” then gave out bad advice?

These are rules of thumb that may not necessarily apply to a given context. In role-playing games, for example, publicly shared simple karma is displayed in terms of experience levels, which are inherently competitive.

Avoid negative public karma [Chapter 6]

This point is covered in detail in an earlier post, The Dollhouse Mafia, or "Don't Display Negative Karma", which anyone considering negative karma effects in public reputation should read carefully. We'll only excerpt a small portion here:

This thinking—though seemingly intuitive—is impoverished, and is wrong in at least two important ways.

  • There can be no negative public karma - at least for establishing the trustworthiness of active users. A bad enough public score will simply lead to that user's abandoning the account and starting a new one, a process we call karma bankruptcy. This setup defeats the primary goal of karma - to publicly identify bad actors. Assuming that karma starts at zero for a brand-new user that an application has no information about, it can never go below zero, since karma bankruptcy resets it. Just look at the record of eBay sellers with more than three red stars - you'll see that most haven't sold anything in months or years, either because the sellers quit or they're now doing business under different account names.
  • It's not a good idea to combine positive and negative inputs in a single public karma score. Say you encounter a user with 75 karma points and another with 69 karma points. Who is more trustworthy? You can't tell: maybe the first user used to have hundreds of good points but recently accumulated a lot of negative ones, while the second user has never received a negative point at all. If you must have public negative reputation, handle it as a separate score (as in the eBay seller feedback pattern).

Even eBay, with the most well-known example of public negative karma, doesn't represent how untrustworthy an actual seller might be - it only gives buyers reasons to take specific actions to protect themselves. In general, avoid negative public karma. If you really want to know who the bad guys are, keep the score separate and restrict it to internal use by moderation staff.
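The 75-versus-69 example above can be made concrete with a small Python sketch. The class and field names are hypothetical, and the specific tallies are chosen only so that the net scores match the example.

```python
from dataclasses import dataclass

@dataclass
class KarmaRecord:
    positive: int = 0   # candidate for public display
    negative: int = 0   # kept for internal moderation use only

    def combined(self):
        """The single net score the text warns against displaying."""
        return self.positive - self.negative

    def public_score(self):
        """Public karma counts only positive inputs, so it never drops below zero."""
        return self.positive

# Hypothetical tallies that net out to the 75 and 69 from the example.
veteran_gone_bad = KarmaRecord(positive=300, negative=225)
clean_newcomer = KarmaRecord(positive=69, negative=0)
```

Here `veteran_gone_bad.combined()` is 75 and `clean_newcomer.combined()` is 69, so the net number ranks the account with 225 negative points higher. Keeping the two tallies separate preserves the information that the single score throws away, and lets moderation staff see the negative side without publishing it.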

If you're still considering negative reputation, please [re]read the story of the Dollhouse Mafia and imagine your enemies attacking your system.

Public karma can discourage some contributors

Putting user reputations in a public ranked list creates a competitive environment, and some users' motivations are not at all compatible with being publicly recognized. Still others will see high karma as the goal of the activity instead of the benefit, and will change their behavior to optimize for karma instead of using the site as intended.

In Leaderboards Considered Harmful, we pointed out:

[...]ranking the members of your community—and pitting them one-against-the-other in a competitive fashion—is typically a bad idea. Like the fabled djinni of yore, leaderboards on your site promise riches (comparisons! incentives! user engagement!!) but often lead to undesired consequences.

[...]

This may be the most insidious artifact of a leaderboard community: the very presence of a leaderboard changes the community dynamic and calls into question the motivations of everyone for any action they might take.