Having answered the three key questions posed in Chapter_5 , you should have a pretty good idea of what you want to accomplish with your system. In this chapter, we'll start showing you how to accomplish those goals. We'll start identifying the components of your reputation system, and we'll systematically determine these details:
In addition, we'll share a number of practitioner's tips, as always. This time around, we'll consider the effects of exclusivity on your reputation system (how stingy, or how generous, should you be when doling out reputations?). We'll also provide some guidance for determining the appropriate scope of your reputations in relation to context.
To accomplish your reputation-related goals, you'll have two main weapons-the objects that your software understands and the software tools, or mechanisms, that you provide to users and embed in processes. To put it simply, if you want to build birdhouses, you need wood and nails (the objects) along with saws and hammers (the tools)-and someone to do the actual construction (the users).
Where will the relevant objects in your reputation system come from? Why, they're present in your larger application, of course. (Not all objects need be contained within the application-see Chap_3-External_Objects -but this will be the majority-case.) Let's start by thinking clearly about the architecture and makeup of your application.
You would consider the same questions and dependencies regardless of whether your host application was already live or was still in the planning stages. It's just much easier for us to talk about reputation as if there were preexisting objects and models to graft it onto. (Planning your application architecture is a whole other book altogether.)
You know your application model, right? You can probably list the five most important objects represented in your system without even breaking a sweat. In fact, a good place to start is by composing an “elevator pitch” for your application: describe, as succinctly as you can, what your application will do. Here are some examples.

- A social site that lets you share recipes and keep and print shopping lists of ingredients
These are short, sweet, and somewhat vague descriptions of three very different applications, but each one still tells us much of what we need to know to plan our reputation needs. The recipe sharing site likely will benefit from some form of reputation for recipes and will require some way for users to rate them. The shopping lists? Not so much-those are more like utilities for individual users to manage the application data.
In the intranet application for paralegals, the briefs are the primary atomic unit of interest. Who saved what briefs, how are users adding metadata, and how many people attach themselves to a document? These are all useful bits of information that will help filter and rank briefs to present back to other users.
So the first step toward defining the relevant objects in your reputation system is to start with what's important in the application model, then think forward a little to what types of problems reputation will help your users solve. Then you can start to catalog the elements of your application in a more formal fashion.
Although you've thought at a high level about the primary objects in your application model, you've probably overlooked some smaller-order, secondary objects and concepts. These primary and secondary objects relate to one another in interesting ways that we can make use of in our reputation system. An application audit can help you to fully understand the entities and relationships in your application.
Make a complete inventory of every kind of object in your application model that may have anything to do with accomplishing your goals. Some obvious items are user profile records and data objects that are special to your application model: movies, transactions, landmarks, CDs, cameras, or whatever. All of these are clear candidates for tracking as reputable entities.
It is very important to know what objects will be the targets of reputation statements and any new objects you'll need to create for that purpose. Make sure you understand the metadata surrounding each object and how your application will access it. How are the objects organized? Are they searchable by attributes? Which attributes? How are the different objects related to one another?
Some objects in your application model will be visually represented in the interface, so one way to start an audit is with a simple survey of screen designs, at whatever fidelity is available. For in-progress projects, early-stage wireframes are fine-if your application is in production, take some screenshots and print them. Figure_6-1 shows an audit-screengrab for an already-in-production Yahoo! Message Boards.
Also be sure to list items whose creation, editing, or content can be used as input into your reputation model. Some common types of such items are:

* categorization systems, like folders, albums, or collections
Spend some time considering more nuanced sources of information, such as historical activity data, external reputation information, and special processes that provide application-relevant insight.
As an example, consider a profanity filter applied to user-supplied text messages, replacing positive matches with asterisks (***); the real-time results of applying the filter might provide a way to measure the effects of interface changes on this particular user behavior (the use of profanity).
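Such a filter can double as an input source with very little extra machinery. The sketch below is one minimal way to do it, assuming a hypothetical in-memory word list and metrics dictionary; a real deployment would use a maintained lexicon and a proper metrics pipeline.

```python
import re

# Hypothetical word list -- a real system would use a maintained lexicon.
PROFANITY = {"darn", "heck"}

def filter_profanity(text, metrics):
    """Replace profane words with asterisks, and count each match as an
    implicit measurement of this user behavior."""
    def censor(match):
        metrics["profanity_hits"] = metrics.get("profanity_hits", 0) + 1
        return "*" * len(match.group(0))

    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in PROFANITY) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(censor, text)

metrics = {}
clean = filter_profanity("Well, darn it all to heck.", metrics)
# clean == "Well, **** it all to ****."; metrics["profanity_hits"] == 2
```

The per-user hit count, trended over time, is exactly the kind of "special process" signal described above: nobody rated anything, yet the system learned something about the actor.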
An object that makes a good candidate for tracking as a reputable entity in your system probably has one or more of the following characteristics.
I guess this should go without saying, but-what the heck-let's say it anyway. If your entire application offering is built around a specific type of object, social or otherwise, the object is probably a good candidate for tracking as a reputable entity.
And remember, nothing interests people more than… other people. That phenomenon alone is an argument for at least considering using karma (people reputation) in any application that you might build. Users will always want every possible edge in understanding other actors in a community. What motivates them? What actions have they performed in the past? How might they behave in the future?
When you're considering what objects will be of the most interest to your users, don't overlook other, related objects that also may benefit users. For example, on a photo sharing site, it seems natural that you'd track a photo's reputation. But what about photo albums? They may not be the very first application object that you think of, but in some situations it's likely you'll want to direct users' attention to high-quality groupings or collections of photos.
The solution is to track both objects. Each may affect the reputation of the other to some degree, but the inputs and weightings you'll use to generate reputation for each will necessarily differ.
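One way the two reputations might feed each other is sketched below. The 0.7/0.3 blend, the favorite cap, and the idea that albums mix photo quality with their own favoriting signal are all illustrative assumptions, not a recommendation.

```python
# Sketch: an album's reputation blends the average quality of its photos
# with a signal of the album's own (how often it is favorited).
# The 0.7 / 0.3 weights and the cap of 50 favorites are assumptions.

def photo_score(ratings):
    """Average of 1-5 star ratings, normalized to the range 0.0-1.0."""
    if not ratings:
        return 0.0
    return (sum(ratings) / len(ratings) - 1) / 4

def album_score(photo_scores, album_favorites, favorite_cap=50):
    photo_part = sum(photo_scores) / len(photo_scores) if photo_scores else 0.0
    favorite_part = min(album_favorites, favorite_cap) / favorite_cap
    return 0.7 * photo_part + 0.3 * favorite_part

photos = [photo_score([5, 4, 5]), photo_score([3, 3])]
print(round(album_score(photos, album_favorites=10), 3))  # 0.556
```

Notice that the inputs differ by entity: photos earn reputation from star ratings, while albums earn it from both their contents and a distinct favoriting gesture.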
Some decisions are easy to make, requiring little investment of time or effort. Likewise, the cost of recovering from such decisions is negligible. For example, suppose I ask myself, “Should I read this blog entry?”
If it's a short entry, and I'm already engaged in the act of reading blogs, and no other distractions are calling me away from that activity, then, yeah, I'll probably go ahead and give it a read. (Of course, the content of the entry itself is a big factor. If neither the entry's title nor a quick skim of the body interests me, I'll likely pass it up.) And-once I've read it-if it turns out not to have been a very good choice? Well, no harm done. I can recover gracefully, move on to the next entry in my feed reader, and proceed on my way.
The level of decision investment is important because it affects the likelihood that a user will make use of available reputation information for an item. In general, the greater the investment in a decision, the more a user will expect (and make use of) robust supporting data. (See Figure_6-2 .)
So for high-investment decisions (such as purchasing or any other decision that is not easily rescinded), offer more-robust mechanisms, such as reputation information, for use in decision making.
You can do a lot with design to “ease” the presence of reputation inputs in your application interface. Sites continue to get more sophisticated about incorporating explicit and implicit controls for stating an entity's value. Furthermore, the web-using public is becoming more accepting of, and familiar with, popular voting and favoriting input mechanisms.
Neither of these facts, however, obviates this requirement: reputable entities themselves must have some intrinsic value apart from the reputation system. Only ask your users to participate in ways that are appropriate in relation to that object's intrinsic value. Don't ask users for contributions (such as reviews or other metadata) that add value to an object whose apparent intrinsic value is low.
It might be OK, for example, to ask someone to give a thumbs-up rating to someone else's blog comment (because the cost to the user providing the rating is low-basically, a click). But it would be inappropriate to ask for a full-blown review of the comment. Writing the review would require more effort and thought than went into the initial comment.
Reputable entities must remain in the community pool long enough for all members of the community to cast a vote. (Figure_6-3 ) There's little use in asking users for metadata for an item if other users cannot come along afterward and enjoy the benefit of that metadata.
Highly ephemeral items, such as news articles that disappear after 48 or 72 hours, probably aren't good candidates for certain types of reputation inputs. For example, you wouldn't ask users to author a multi-part review of a news story destined to vanish in a day or two, but you might ask them to click a “Digg this” or “Buzz this” button.
Items with a great deal of persistence (such as real-world establishments like restaurants or businesses) make excellent candidates for reputation. Furthermore, it can be appropriate to ask users for more involved types of inputs for persistent items, because it's likely that other users will have a chance to benefit from the work that the community puts into contributing content.
Now that you've got a firm grasp of the objects in your system and you've elected a handful as reputable entities, the next step is to decide what's good and what's bad. How will you decide? What inputs will you feed into the system, to be tabulated and rolled up to establish relative reputations among like objects?
Now, instead of merely listing the objects that a user might interact with in your application, we'll enumerate all the actions that a user might take in relation to those objects. Again, many actions will be obvious and visible right there in your application interface, Figure_6-4 , so let's build on the audit that you performed earlier for objects.
Explicit claims represent your community's voice and opinion. They operate through interface elements you provide that solicit users' opinions about an entity, good or bad. A fundamental difference exists between explicit claims and implicit ones (discussed below), which boils down to user intent and comprehension.
With explicit claims, users should be fully aware that the action they're performing is intended as an expression of an opinion. That intent differs greatly from the ones for implicit claims, in which users mostly just go about their business, generating valuable reputation information as a side effect.
If you present explicit inputs to your users as only that-a mechanism for generating reputation information to feed the system and make your site smarter-you may be inhibiting the community from providing inputs. You are likely to see more input surrendered if the contributors get some primary value from their contributions.
The primary value can be big or small, but it probably will have some of the following characteristics:
Likewise, the 5-star rating system in iTunes is surpassingly useful not because of any secondary or tertiary reputation benefits it may yield, but primarily because it offers a well-articulated and extremely flexible way to manage data. iTunes users can take advantage of stars to sort track listings, build smart playlists, and get recommendations from the iTunes Store. Star rating widgets in iTunes are full of primary value.
Any time a user takes some action in relation to a reputation entity, it is very likely that you can derive valuable reputation information from that action. Recall the discussion of implicit and explicit reputation claims in Chap_1-The_Reputation_Statement . With implicit reputation claims, we watch not what the user says about the quality of an entity but how they interact with that object. For example, assume that a reputable entity in your system is a text article. You'll find valuable reputation information in the answers to the following questions.

* Does the user read the article? To completion?
You can construct any number of smart, nuanced, and effective reputation models from relevant related action-type claims. Your only real limitation is the level and refinement of instrumentation in your application: are you prepared to capture relevant actions at the right junctures and generate reputation events to share with the system?
A reputation system doesn't exist in a vacuum-it's part of a bigger application, which itself is part of a bigger ecosystem of applications. Relevant inputs come from many sources other than those generated directly by user actions. Be sure to build, buy, or otherwise include them in your model where needed. Here are some common examples:
Timers. Usually based on a scheduling mechanism (such as the Unix cron utility) that executes them, these often are used to start a reputation process as part of periodic maintenance; for example, a timer can trigger reputation scores to decay or expire. (For one benefit to calculating reputation on a time delay, see Chap_4-Decay_and_Delay .) Timers may be periodic or scheduled ad hoc by an application.

Of course, the messages that each reputation process outputs also are potential inputs into other processes. Well-designed karma models usually don't take direct inputs at all-those processes take place downstream from other processes that encode the understanding of the relationship between the objects and transform it into a normalized score.
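A timer-triggered decay pass can be as simple as the sketch below: each maintenance run multiplies every score by a decay factor, so entities that receive no fresh inputs drift toward zero. The daily decay rate here is an assumption chosen for illustration.

```python
# Sketch of a cron-style decay process. The decay rate is an assumption;
# 0.99 per day gives scores a half-life of roughly 69 days.
DECAY_PER_DAY = 0.99

def decay_scores(scores, days_elapsed):
    """Return a copy of the score table with time decay applied.
    Scores that receive no new inputs drift toward zero."""
    factor = DECAY_PER_DAY ** days_elapsed
    return {entity: score * factor for entity, score in scores.items()}

scores = {"article:42": 0.80, "article:99": 0.10}
scores = decay_scores(scores, days_elapsed=30)
# A scheduled job would call decay_scores once per maintenance window.
```

Running the decay in a scheduled batch, rather than on every read, keeps the reputation store simple at the cost of slightly stale scores between runs.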
Whether your system features explicit reputation claims, implicit ones, or a skillful combination of both, to maintain the quality of the inputs to the system, strive to follow these practices:
Don't continue to reward people or objects for performing the same action over and over-rather, try to single out events which indicate that the target of the claim is worth paying attention to. For instance, the act of bookmarking an article is probably a more significant event than a number of page views. Why? Because bookmarking something is a deliberate act-the user has assessed the object in question and decided that it's worth further action.
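One way to encode that preference is to weight deliberate acts far more heavily than passive ones, and to damp raw counts of repeatable events so sheer volume can't swamp the deliberate signals. The weights and the log damping below are illustrative assumptions.

```python
import math

# Sketch: a bookmark (a deliberate act) counts for much more than a page
# view (a passive, endlessly repeatable act). The weight of 10 per
# bookmark and the log damping of views are assumptions for illustration.

def interest_score(page_views, bookmarks):
    view_part = math.log1p(page_views)   # damped: 500 views -> ~6.2
    bookmark_part = 10 * bookmarks       # each deliberate act counts a lot
    return view_part + bookmark_part

print(interest_score(page_views=500, bookmarks=0))   # ~6.2
print(interest_score(page_views=40, bookmarks=12))   # ~123.7
```

Note the outcome: an article with a dozen bookmarks handily outscores one with ten times the raw traffic, which is exactly the ordering the practice above calls for.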
The process of creating karma is subtly more complex and socially delicate than creating reputation for things. For a deeper explanation, see Karma Is Complex, Chap_7-Displaying_Karma . Yahoo! has a general community policy of soliciting explicit ratings input only for user-created content-never having users directly rate other users. This is a good practice for a number of reasons:
It's often worthwhile to reward firsts for users in your system: perhaps rewarding the first user to “touch” an object (leave a review, comment on a story, or post a link.) Or, conversely, you might reward a whole host of firsts for a single user (to encourage them to interact with a wide range of features, for instance.)
But once a first has been acknowledged, don't continue to reward users for more of the same.
Pick events that are hard for users to replicate; this combats gaming of the system. But anticipate these patterns of behavior anyway, and build a way to deal with offenders into your system.
In Chap_3-Ratings_Bias_Effects , we discussed ratings distributions and why it's important to pay attention to them. If you're seeing data with poorly actionable distributions (basically, data that doesn't tell you much), it's likely that you're asking for the wrong inputs.
Pay attention to the context in which you're asking. For example: if interest in the object being rated is relatively low (perhaps it's official feed content from a staid, corporate source), 5-star ratings are probably overkill. Your users won't have such a wide range of opinions about the content that they'll need five stars to judge it.
Ask for information in a way that's consistent and appropriate with how you're going to use it. For example, if your intent is to display the community average rating for a movie on a scale of 1 to 5 stars, it makes the most sense to ask users to enter movie ratings on a scale of 1 to 5 stars. Of course, you can transform reputation scores and present them back to the community in different ways (see Chap_6-What_Comes_In ), but strive to do that only when it makes sense, and in a way that doesn't confuse your users.
In Chapter_7 , we'll focus on displaying aggregated reputation, which is partly constructed with the inputs we discuss here, and which has several output formats identical to the inputs (for example, 5 stars in, 5 stars out). But that symmetry exists only for a subset of inputs and an even smaller subset of the aggregated outputs. For example, an individual vote may be a yes or a no, but the result is a percentage of the total votes for each.
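That yes/no-in, percentage-out asymmetry is easy to see in a small sketch (the helper name and display string are illustrative assumptions):

```python
# Sketch: individual claims are booleans (True = yes); the aggregated
# output is a percentage -- a different shape than any single input.

def tally(votes):
    """Return the percentage of yes votes, or None if there are no votes."""
    if not votes:
        return None
    yes = sum(1 for v in votes if v)
    return round(100 * yes / len(votes))

votes = [True, True, False, True]
print(f"{tally(votes)}% found this helpful")  # "75% found this helpful"
```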
In this chapter, we're discussing only claims on the input side-what does a user see when she takes an action that is sent to the reputation model and transformed into a claim?
Below are best practices for the use and deployment of the common explicit input types. The user experience implications of these patterns are also covered in more depth in Designing Social Interfaces (O'Reilly).
Before we dive into all the ways in which users might provide explicit feedback about objects in a system, think about the context in which users provide feedback. Remember, this book is organized around the reputation information that users generate-usable metadata about objects and about other people. But users themselves have a different perspective, and their focus is often on other matters altogether. (See Figure_6-5 ). Feeding your reputation system is likely the last thing on their mind.
Given the other priorities, goals, and actions that your users might possibly be focused on at any given point during their interaction with your application, here are some good general guidelines for gathering explicit reputation inputs effectively.
The Interface Design of Reputation Inputs
Your application's interface design can reinforce the presence of input mechanisms in several ways:

* Place input mechanisms in comfortable and clear proximity to the target object that they modify. Don't expect users to find a ratings widget for that television episode when the widget is buried at the bottom of the page. (Or at least don't expect them to remember which episode it applies to… or why they should click it… or…)
For example, in a shopping context, it's probably appropriate to make “Add to Cart” the predominant call to action and keep the “Rate This” button less noticeable-even much less noticeable. (Would you rather have a rating, or a sale?)
In a word: don't. Your users will have enough work in finding, learning, and coming to appreciate the benefits of interacting with your reputation inputs. Don't throw unnecessary variations in front of them.
A number of different input mechanisms let users express an opinion about an object across a range of values. A very typical such mechanism is star ratings. Yahoo! Local (See Figure_6-6 ) allows users to rate business establishments on a scale of 1 to 5 stars.
Stars seem like a pretty straightforward mechanism, both for your users to consume (5-star rating systems seem to be everywhere, so users aren't unfamiliar with them) and for you, the system designer, to plan. Tread carefully, though. Here are some small behavioral and interaction “gotchas” to think about early, during the design phase.
The Schizophrenic Nature of Stars
Star ratings often are displayed back to users in a format very similar to the one in which they're gathered from users. That arrangement need not be the case-scores generated by stars can be transformed into any number of output formats for display-but as we noted above (see Chap_6-Match_User_Expectations ), it's usually what is clearest to users.
As the application designer, you should be wary, however, of making the input-form for stars match too closely their final display presentation. The temptation is strong to design one comprehensive widget that accomplishes both: displaying the current community average rating for an object and accepting user input to cast their own vote.
Slick mouseover effects or toggle switches that change the state of the widget are some attempts that we've seen, but this trick is hard to pull off and almost never done well-you'll end up either with a widget that does a poor job of displaying the community average, or with one that doesn't present a very strong call to action.
The solution that's most typically used at Yahoo! is to separate these two functions into two entirely different widgets and present them side by side on the page. The widgets are even color-coded to keep their intended uses straight. On Yahoo!, red stars are typically read-only (you can't interact with them) and always reflect the community average rating for an entity, while yellow stars reflect the rating that you as a user entered (or, alternately, empty yellow stars wait eagerly to record your rating).
From a design standpoint, the distinction does introduce additional interactive and visual complexity to any component that displays ratings, but the increase in clarity more than compensates for any additional clutter.
Do I Like You, or Do I “Like” Like You?
Though it's a fairly trivial task to determine numerical values for selections along a 5-point scale, there's no widespread agreement among users on exactly what star ratings represent. Each user applies a subtly different interpretation (complete with biases) to star ratings. Ask yourself the following questions-they're the questions that users have to ask each time they come across a ratings system:

* What does “one star” mean on this scale? Should it express strong dislike? Apathy? Mild “like”? Many star-ratings widgets provide suggested interpretations at each point along the spectrum, such as “Dislike it,” “Like it,” and “Love it.” The drawback to that approach is that it constrains the uses that individual users might find for the system. The advantage is that it brings the community interpretation of the scale into greater agreement.
“Thumb” voting (thumb up or thumb down) lets a user quickly rate content in a fun, engaging way. (See Figure_6-7 .) The benefit to the user for voting is primarily related to self-expression (“I love this!” or “I hate this!”). The ratings don't need to be presented visually as thumbs (in fact, sometimes they shouldn't), but in this book we'll use “thumb” as shorthand for a two-state voting mechanism.
Thumb voting allows users to express strongly polarized opinions about assets. For example, if you can state your question as simply as “Did you like this or not?,” thumb voting may be appropriate. If it seems more natural to state your question as “How much did you like this?,” then star ratings seem more appropriate.
A popular and effective use of two-state voting is as a meta-moderation device for user-submitted opinions, comments, and reviews, Figure_6-8 . Wherever you solicit user opinions about an object, also consider letting the community voice opinions about that opinion, by providing an easy control such as “Was this helpful?” or “Do you agree?”
Avoid thumb voting for multiple facets of an entity. For example, don't provide multiple thumb widgets for a product review intended to record users' satisfaction with the product's price, quality, design, and features. Generally, a thumb vote should be associated with an object in a one-to-one relationship: one entity gets one thumb up or down. Think of the metaphor, after all: Emperor Nero never would have let a gladiator's arm survive while putting his leg to death. Think of thumb voting as an all-or-nothing rating.
Consider thumb voting when you want a fun, lightweight rating mechanism. The context for thumb voting should be appropriately fun and lighthearted, too; don't use thumb voting in contexts where it will appear insensitive or inappropriate.
Vote-to-promote fulfills a very specific niche in the world of online opinion-gathering. As users browse a collection or pool of media objects, they mark items as worthwhile using a control consisting of a simple gesture. This pattern has been popularized by social news sites such as Digg, Yahoo! Buzz, and Newsvine.
Typically, these votes accumulate and are used to change the rank of items in the community pool and present winners with more prominence or a higher status, but that's not an absolute necessity. Facebook offers an I-like-this link (Figure_6-9 ) that simply communicates a user's good will toward an item.
The names of users who “like” an item are displayed to other users who encounter the item (until the list of vote casters becomes cumbersome, when the display switches to a sort of summary score), but highly liked items aren't promoted above other items in the news feed in any obvious or overt way.
Writing a user review demands a lot of effort from users. It is among the most involved explicit input mechanisms that you can present, usually consisting of detailed, multi-part data entry (see Figure_6-10 ).
To get good, usable comparison data from user reviews, try to ensure that the objects you offer up for review meet all the criteria that we listed above for good reputable entities (see Chap_6-Good_Reputable_Entities ). Ask users to write reviews only for objects that are valuable and long-lived and that carry a high decision investment.
Reviews typically are compound reputation claims, with each review made up of a number of smaller inputs bundled together. You might consider any combination of the following for your user-generated reviews:

* You can include a freeform comment field (see Chap_3-Text_Comments ) for users to provide their impressions of the rated object, good or bad. You may impose some standards on this field (for example, checking for profanity or requiring length in a certain character range), but generally this field is provided for users to fill in as they please.
For example, even if you're asking users to rate multiple facets of, say, a movie (direction, acting, plot, and effects), you can provide one prominent rating input for the users' overall opinion. You could just derive the average from a combination of the facet reviews, but that wouldn't be as viscerally satisfying for opinionated reviewers.
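A simple way to structure such a review is to store the explicit overall rating alongside the facet ratings, keeping the derived average available but never substituting it for the reviewer's own verdict. The facet names and data shape below are illustrative assumptions.

```python
# Sketch: keep the reviewer's explicit overall rating separate from the
# facet ratings rather than deriving it. Facet names are illustrative.

def facet_average(facets):
    """Mean of the facet ratings -- a fallback, not a substitute for the
    reviewer's own overall opinion."""
    return sum(facets.values()) / len(facets)

review = {
    "overall": 2,   # the reviewer's explicit, visceral verdict
    "facets": {"direction": 4, "acting": 3, "plot": 1, "effects": 4},
}

derived = facet_average(review["facets"])  # 3.0 -- a different story
```

Here the derived average (3.0) and the reviewer's stated overall opinion (2) disagree, which is precisely why the explicit input is worth collecting: a terrible plot can sink a movie for a viewer even when its other facets score well.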
Remember-with implicit reputation inputs, we're paying attention to some subtle and non-obvious indicators in our interface: actions that, when users take them, may indicate a higher level of interest in (and, by extension, the higher quality of) the targeted object.
It is difficult to generalize about implicit inputs because they are highly contextual. They depend on the particulars of your application: its object model, interaction styles, and screen designs. The following, however, represent the types of inputs you can track.
Any time a user saves an object or marks it for future consideration-either for himself, or to pass along to someone else-that could be a reputable event.
This type of input has a lot in common with the explicit vote-to-promote input (see Chap_6-Vote_To_Promote ), but the difference lies in user intent and perception: in a vote-to-promote system, the primary motivator for marking something as worthwhile is to publicly state a preference, and the expectation is that information will be shared with the community. It is a more extrinsic motivation.
By contrast, favorites, forwarding, and adding to a collection are more intrinsically motivated reputation inputs-actions that users take for largely private purposes, including the following.
Favorites
To mark an item as a favorite, a user activates some simple control (usually clicking an icon or a link), which adds the item to a list that she can return to later for browsing, reading, or viewing.
Some conceptual overlap exists between the favorites pattern, which can be semipublic-favorites often are displayed to the community as part of a user's profile-and the liking pattern. Both favorites and liking can be tallied and fed in almost identical ways into an object's quality reputation.
Forwarding
You might know this input pattern as “send to a friend.” This pattern facilitates a largely private communication between two friends, in which one passes a reputable entity on to another for review. Yahoo! News has long promoted most-emailed articles as a type of currency-of-interest proxy reputation. (See Figure_6-11 .)
Adding to a Collection
Many applications provide ordering mechanisms for users as conveniences: a way to group like items, save them for later, edit them en masse. Depending on the context of your application, and the culture of use that emerges out of how people interact with collections on your site, you may want to consider each “add to collection” action to be an implicit reputation statement, akin to favoriting or sending to a friend.
Greater disclosure is a highly variable input: there's a wide range of ways to present it in an interface (and weight it in reputation processes). But if users request “more information” about an object, you might consider those requests as a measure of the interest in that object-especially if users are making the requests after having already evaluated some small component of the object, such as an excerpt, a thumbnail, or a teaser.
A common format for blogs, for instance, is to present a menu of blog entries, with excerpts for each, and “read more” links. Clicks on a link to a post may be a reasonably accurate indicator of interest in the destination article.
But beware-the limitations of a user interface can render such data misleading. In the example in Figure_6-12 , the interface may not reveal enough content to allow you to infer the real level of interest from the number of clicks. (Weight such input accordingly.)
One of the very best indicators of interest in an entity is the amount of conversation, rebuttal, and response that it generates. While we've cautioned against using activity alone as a reputation input (to the detriment of good quality indicators), we certainly don't want to imply that conversational activity has no place in your system. Far from it. If an item is a topic of conversation, the item should benefit from that interest.
The operators of some popular web sites realize the value of rebuttal mechanisms and have formalized the ability to attach a response to a reputable entity. YouTube's video responses feature (see Figure_6-13 ) is an example. As with any implicit input, however, be careful-the more your site's design shows how your system uses those associations, the more tempting it will be for members of your community to misuse them.
When you're considering all the objects that your system will interact with, and all the interactions between those objects and your users, it's critical to take into account an idea that we have been reinforcing throughout this book: all reputation exists within a limited context, which is always specific to your audience and application. Try to determine the correct scope, or restrictive context, for the reputations in your system. Resist the temptation to lump all reputation-generating interactions into one score-the score will be diluted to the point of meaninglessness. The following example from Yahoo! makes our point perfectly.
This story tells how Yahoo! Sports unsuccessfully tried to integrate social media into its top-tier web site. Even seasoned product managers and designers can fall into the trap of making the scope of an application's objects and interactions much broader than it should be.
Yahoo!'s sports product managers believed that they should integrate user-generated content quickly across their entire site. They audited their offering and started to identify candidate objects, reputable entities, and potential inputs.
The site had sports news articles, and the product team knew that they could tell a lot about what was in each article: the recognized team names, sport names, player names, cities, countries, and other important game-specific terms-the objects. They knew that users liked to respond to the articles by leaving text comments-the inputs.
They proposed an obvious intersection of the objects and the inputs: every comment on a news article would be a blog post, tagged with the keywords from the article, and optionally by user-generated tags too. Whenever a tag appeared on another page, such as a different article mentioning the same city, the user's comment on the original article could be displayed.
At the same time, those comments would be displayed on the team and player-detail pages for each tag attached to the comment. The product managers even had aspirations to surface comments on the sports portal, not just for the specific sport, but for all sports.
Seems very social, clever, and efficient, right?
No. It's a horrible design mistake. Consider this detailed example from British football:
An article reports that a prominent player, Mike Brolly, who plays for the Chelsea team, has been injured and may not be able to play in an upcoming championship football match with Manchester United. Users comment on the article, and their comments are tagged with Manchester United, Chelsea, and Brolly.
Those comments would be surfaced-news feed-style-on the article page itself, the sports home page, the football home page, the team pages, and the player page. One post-six destination pages, each with a different context of use, different social norms, and different communities that they've attracted.
Nearly all these contexts are wrong, and the correct contexts aren't even considered:
Online, the cross-posting of the comments on the team pages encouraged conflict between fans of the opposing teams. Fans of opposing teams have completely opposite reactions to the injury of a star player, and intermixing those conversations would yield anti-social (if sometimes hilarious) results.
Comments, like reputation statements, are created in a context. In the case of comments, the context is a specific target audience for the message. Here are some possible correct contexts for cross-posting comments:
In this context-where the performance and day-to-day circumstances of real-life players affect the outcome of users' virtual teams-it might be very useful information to have cross-posted right into a league's page.
The terms of service for Fantasy Football are much more lax than the terms of service for public-facing posts. These players swear and taunt and harass each other. A post such as "Ha, Chris-you and the Bay City Bombers are gonna suck my team's dust tomorrow while Brolly is home sobbing to his mommy!" clearly could not be automatically cross-posted to the main portal page.
When thinking about your objects and user-generated inputs and how to combine them, remember the rule of email:
You need a “subject” line and a “to” line (an addressee, or a small number of addressees).
Tags for user-generated content act as subject identifiers, but not as addressees. Making your addressees as explicit as possible will encourage people to participate in many different ways.
Sharing content too widely discourages contributions and dilutes content quality and value.
When Yahoo! EuroSport, based in the UK, wanted to revise its message board system to provide feedback on which discussions were the highest quality and to provide incentives for users to contribute better content, it turned to reputation systems for help.
It seemed clear that the scope of reputation was different for each post and for all the posts in a thread, and, as the American Yahoo! Sports team had initially assumed, that each user should have a single posting karma: other users would flag the quality of a post, and that evaluation would roll up into the poster's all-sports-message-boards user reputation.
It did not take long for the product team to realize, however, that having Chelsea fans rate the posts of Manchester fans was folly: users would use ratings to disagree with any comment by a fan of another team, not to honestly evaluate the quality of the posting.
The right answer, in this case, ended up being a tighter definition of scope for the context: rather than rewarding “all message boards” participation, or “everything within a particular sport” , instead, an effort was made to identify the most-granular, cohesive units of community possible on the boards, and only reward participation within those narrow scopes.
Yahoo! EuroSport implemented a system of karma medallions (bronze, silver, and gold) rewarding both the quantity and quality of a user's participation on a per-board basis. This carried different repercussions for different sports on the boards.
Each UK football team has its own dedicated message board, so theoretically an active contributor could earn medallions in any number of football contexts: a gold for participating on the Chelsea boards, say, and a bronze for Manchester United.
Many users have only a single medallion, participating mostly on a single board, but some are disciplined and friendly enough to have bronze badges or better in each of multiple boards, and each badge is displayed in a little trophy case when you mouse over the user's avatar or examine the user's profile. See Figure_6-14 .
Now you've established your goals, listed your objects, categorized your inputs, and taken care to group the objects and inputs in appropriate contexts with appropriate scope. You're ready to create the reputation mechanisms that will help you reach your goals for the system.
Though it might be tempting to jump straight to designing the display of reputation to your users, we're going to delay that portion of the discussion until Chapter_7 , where we'll dig into the reasons not to explicitly display some of your most valuable reputation information. Instead of focusing on presentation first, we're going to take a goal-centered approach.
Probably the most important thing to remember when you're thinking about how to generate reputations is the context in which they will be used: your application. You might track bad-user behavior to save money in your customer care flow by prioritizing the worst cases of apparent abuse for quick review. You might also deemphasize cases involving users who are otherwise strong contributors to your bottom line. Likewise, if users evaluate your products and services with ratings and reviews, you will build significant machinery to gather users' claims and transform your application's output on the basis of their aggregated opinions.
For every reputation score you generate and display or use, expect at least 10 times as much development effort to adapt your product to accommodate it-including the user interface and coding to gather the events and transform them into reputation inputs, and all the locations that will be influenced by the aggregated results.
Though all reputation is generated from custom-built models, we've identified certain common patterns in the course of designing reputation systems and observing systems that others have created. These few patterns are not at all comprehensive, and never could be. We provide them as a starting point for anyone whose application is similar to well-established patterns. We'll expand on each reputation generation pattern in the rest of this chapter.
Don't confuse the input types with the reputation generation patterns-what comes in is not always what goes out. In our example in Chap_4-User_Reviews_with_Karma , the inputs were reviews and helpful votes, but one of the generated reputation outputs was a user quality karma score-which had no display symmetry with the inputs, since no user was asked to evaluate another user directly.
Roll-ups often have a completely different claim type than their component parts, and sometimes, as with karma calculations, the target object of the reputation changes drastically from the evaluator's original target: for example, the author (a user object) of a movie review gets some reputation from a helpful score given to the review that the author wrote about the movie object.
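This kind of indirect roll-up can be sketched in a few lines. The store, field names, and 0.1 karma share below are all illustrative assumptions, not the formulation of any real system: a helpful vote targets a review, but a fraction of each positive evaluation accrues to the review's author as karma.

```python
# Hypothetical sketch: a helpful vote on a review also feeds the
# review author's quality karma. Weights and names are illustrative.

class ReputationStore:
    def __init__(self):
        self.helpful = {}  # review_id -> (helpful_votes, total_votes)
        self.karma = {}    # author_id -> accumulated karma score

    def helpful_vote(self, review_id, author_id, was_helpful,
                     karma_share=0.1):
        up, total = self.helpful.get(review_id, (0, 0))
        self.helpful[review_id] = (up + (1 if was_helpful else 0), total + 1)
        # The target of the roll-up changes: the vote evaluated the
        # review object, but a share lands on the author (a user object).
        if was_helpful:
            self.karma[author_id] = self.karma.get(author_id, 0.0) + karma_share

    def helpful_ratio(self, review_id):
        up, total = self.helpful.get(review_id, (0, 0))
        return up / total if total else None

store = ReputationStore()
store.helpful_vote("rev1", "alice", True)
store.helpful_vote("rev1", "alice", False)
print(store.helpful_ratio("rev1"))  # 0.5
print(store.karma["alice"])         # 0.1
```

Note the asymmetry the text describes: no voter ever evaluated "alice" directly, yet she accumulates karma.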
This section focuses on calculating reputation, so the patterns don't describe the methods used to display any user's inputs back to the user. Typically, the decision to store users' actions and display them is a function of the application design-for example, users don't usually get access to a log of all of their clicks through a site, even if some of them are used in a reputation system. On the other hand, heavyweight operations, such as user-created reviews with multiple ratings and text fields, are normally at least readable by the creator, and often editable and/or deletable.
The desire to personalize their own experience (see Chap_5-Fulfillment ) is often what initially drives users to go through the effort required to provide input to a reputation system. For example, if you tell an application what your favorite music is, it can customize your Internet radio station, making it worth the effort to teach the application your preferences. The effort required to do this also provides a wonderful side effect: it generates voluminous and accurate input into aggregated community ratings.
Personalization roll-ups are stored on a per-user basis and generally consist of preference information that is not shared publicly. Often these reputations are attached to very fine-grained contexts derived from metadata attached to the input targets and therefore can be surfaced, in aggregate, to the public. (See Figure_6-15 .) For example, a song by the Foo Fighters may be listed as being in the “alternative” and “rock” music categories.
When a user marks the song as a favorite, the system would increase the personalization reputation for this user for three entities: “Foo Fighters,” “alternative” , and “rock” . Personalization reputation can require a lot of storage-so plan accordingly-but the benefits to the user experience, and your product offering, may make it well worth the investment.
Reputation models: Vote-to-promote, favorites, flagging, simple ratings, and so on
Inputs: Scalar
Processes: Counters, accumulators
Common uses:
- Site personalization and display
- Input to predictive modeling
- Personalized search ranking component
Pros: A single click is as low-effort as user-generated content gets. Computation is trivial and speedy. Intended for personalization, these inputs can also be used to generate aggregated community ratings to facilitate nonpersonalized discovery of content.
Cons: It takes quite a few user inputs before personalization starts working properly, and until then the user experience can be unsatisfactory. (One method of bootstrapping is to create templates of typical user profiles and ask the user to select one to autopopulate a short list of targeted popular objects to rate quickly.) Data storage can be problematic: potentially keeping a score for every target and category per user is very powerful but also very data intensive.
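The Foo Fighters example can be sketched as a set of per-user counters keyed by the metadata attached to each target. The song-to-category mapping and entity names here are illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical personalization roll-up: marking a song as a favorite
# increments a per-user preference counter for the artist and each
# category attached to the song's metadata.

song_metadata = {
    "Everlong": ["Foo Fighters", "alternative", "rock"],
}

# user_id -> entity -> preference count
preferences = defaultdict(lambda: defaultdict(int))

def mark_favorite(user_id, song):
    for entity in song_metadata[song]:
        preferences[user_id][entity] += 1

mark_favorite("u1", "Everlong")
print(dict(preferences["u1"]))
# {'Foo Fighters': 1, 'alternative': 1, 'rock': 1}
```

The storage cost the table warns about is visible here: every favorite can touch several counters, one per attached entity, for every user.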
Generating aggregated community ratings is the process of collecting normalized numerical ratings from multiple sources and merging them into a single score, often an average or a percentage of the total, as in Figure_6-16
Reputation models: Vote-to-promote, favorites, flagging, simple ratings, and so on
Inputs: Quantitative-normalized, scalar
Processes: Counters, averages, and ratios
Common uses:
- Aggregated rating display
- Search ranking component
- Quality ranking for moderation
Pros: A single click is as low-effort as user-generated content gets. Computation is trivial and speedy.
Cons: Too many targets can cause low liquidity, which limits the accuracy and value of the aggregate score. See Chap_3-Low_Liquidity_Effects . There is also a danger of using the wrong scalar model. See Chap_3-Bias_Freshness_and_Decay .
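The core computation is a simple mean; a minimal sketch follows, with a guard for the low-liquidity problem noted above. The threshold of 5 is an arbitrary illustration, not a recommended value:

```python
def community_average(ratings, min_ratings=5):
    """Return the mean of normalized ratings, or None while there are
    too few ratings to display a meaningful aggregate."""
    if len(ratings) < min_ratings:
        return None  # low liquidity: suppress the score rather than mislead
    return sum(ratings) / len(ratings)

print(community_average([4, 5]))           # None -- too few ratings to show
print(community_average([4, 5, 3, 5, 4]))  # 4.2
```

Suppressing (or visually de-emphasizing) the aggregate until liquidity is reached is one common way to avoid presenting a two-vote "average" as if it were as trustworthy as a two-thousand-vote one.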
One specific form of aggregate community ratings requires special mechanisms to get useful results: when an application needs to rank a large dataset of objects completely and only a small number of evaluations can be expected from users. For example, a special mechanism would be required to rank the current year's players in each sports league for an annual fantasy sports draft. Hundreds of players would be involved, and there would be no reasonable way that each individual user could evaluate each pair against the others-even rating one pair per second would take many times longer than the available time before the draft. The same is true for community-judged contests in which thousands of users submit content. Letting users rate randomly selected objects on a percentage or star scale doesn't help at all. (See Chap_3-Bias_Freshness_and_Decay .)
This kind of ranking is called preference ordering. When this kind of ranking takes place online, users evaluate successively generated pairs of objects and choose the most appropriate one in each pair. Each participant goes through the process a small number of times, typically less than 10.
The secret sauce is in selecting the pairings. At first, the ranking engine looks for pairs that it knows nothing about, but over time it begins to select pairings that help users sort similarly ranked objects. It also generates pairs to determine whether the user's evaluations are consistent. Consistency is good for the system, because it indicates reliability-if a user's evaluations fluctuate wildly or lack a consistent pattern, that may indicate abuse or manipulation of the ranking.
The algorithms for this approach are beyond the scope of this book, but interested readers can find out more in the references section. This mechanism is complex and requires expertise in statistics to build, so if a reputation model requires this functionality, we recommend using an existing platform as a model.
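While the adaptive pair-selection algorithms are out of scope, the underlying bookkeeping can be sketched with naive pairwise win counts. This is deliberately simplified: a real system would select pairs adaptively and probe evaluator consistency, as described above, and the player names are made up:

```python
from collections import defaultdict

# Naive preference-ordering sketch: users pick a winner from each
# presented pair; objects are then ranked by win ratio.

wins = defaultdict(int)
comparisons = defaultdict(int)

def record_choice(winner, loser):
    wins[winner] += 1
    comparisons[winner] += 1
    comparisons[loser] += 1

def ranking(objects):
    # Rank by fraction of pairwise comparisons won (0 if never compared).
    return sorted(
        objects,
        key=lambda o: wins[o] / comparisons[o] if comparisons[o] else 0,
        reverse=True,
    )

record_choice("player_a", "player_b")
record_choice("player_a", "player_c")
record_choice("player_c", "player_b")
print(ranking(["player_a", "player_b", "player_c"]))
# ['player_a', 'player_c', 'player_b']
```

Even this toy version shows why pairwise choice scales where star ratings fail: each evaluation carries unambiguous ordering information, and a complete ranking emerges from many users each contributing only a handful of comparisons.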
Participation points are typically a kind of karma in which users accumulate varying amounts of publicly displayable points for taking various actions in an application. Many people see these points as a strong incentive to drive participation and the creation of content. But remember (see Chap_3-First_Mover_Effects ), using points as the only motivation for user actions can push out desirable contributions in favor of lower-quality content that users can submit quickly and easily. Also see Chap_7-Leaderboards_Considered_Harmful for a discussion of the challenges associated with competitive displays of participation points.
Participation points karma is a good example of a pattern in which the inputs (various, often trivial, user actions) don't match the process of reputation generation (accumulating weighted point values) or the output (named levels or raw score).
Activity | Point Award | Maximum/Time
First participation | +10 | +10
Log in | +1 | +1 per day
Rate show | +1 | +15 per day
Create avatar | +5 | +5
Add show or character to profile | +1 | +25
Add friend | +1 | +20
Be friended | +1 | +50
Give best answer | +3 | +3 per question
Have a review voted helpful | +1 | +5 per review
Upload a character image | +3 | +5 per show
Upload a show image | +5 | +5 per show
Add show description | +3 | +3 per show
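A points table like this one maps naturally onto a weighted accumulator with per-action caps. The sketch below assumes the safer action-type-lookup input style and leaves per-period cap resets (daily, per question, per show) to the host application; action names and values echo the table but are otherwise illustrative:

```python
# Hypothetical participation-points accumulator. The points table
# stays with the model, so out-of-range raw values can never arrive
# from calling applications.

POINT_TABLE = {
    "log_in":      {"points": 1, "cap": 1},   # +1, max 1 per day
    "rate_show":   {"points": 1, "cap": 15},  # +1, max 15 per day
    "best_answer": {"points": 3, "cap": 3},   # +3, max 3 per question
}

def award(totals, awarded_in_period, user_id, action):
    """Grant points for an action, honoring the per-period cap.
    awarded_in_period is assumed to be cleared by an external reset."""
    rule = POINT_TABLE[action]
    already = awarded_in_period.get((user_id, action), 0)
    grant = min(rule["points"], rule["cap"] - already)
    if grant > 0:
        awarded_in_period[(user_id, action)] = already + grant
        totals[user_id] = totals.get(user_id, 0) + grant
    return totals.get(user_id, 0)

totals, period = {}, {}
award(totals, period, "u1", "log_in")
print(award(totals, period, "u1", "log_in"))  # 1 -- second login that day adds nothing
```

Keeping the table in one place also makes the inevitable rebalancing (discussed in the cons below) a data change rather than a code change.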
Reputation models: Points
Inputs:
- Raw point value (risky if disparate applications provide the input; out-of-range values can do significant social damage to your community)
- An action-type index value for a table lookup of points (safer; the points table stays with the model, where it is easier to limit damage and track data trends)
Processes: (Weighted) accumulator
Common uses:
- Motivation for users to create content
- Ranking in leaderboards to engage the most active users
- Rewards for specific desirable actions
- Corporate use: identification of influencers or abusers for extended support or moderation
- In combination with quality karma in creating robust karma (see Chap_4-Robust_Karma )
Pros: Setup is easy. The incentive is easy for users to understand. Computation is trivial and speedy. Certain classes of users respond positively and voraciously to this type of incentive. See Chap_5-Egocentric_Incentives .
Cons: Getting the points-per-action formulation right is an ongoing process, as users continually look for the sweet spot of minimum effort for maximum point gain; the correct formulation takes into account the effort required as well as the value of the behavior (see Chap_5-Egocentric_Incentives ). Points discourage many users with altruistic motivations. See Chap_5-Altruistic_Incentives and Chap_7-Leaderboards_Considered_Harmful .
Point systems are increasingly being used as game currencies. Social games offered by developers such as Zynga generate participation points that users can spend on special benefits in the game, such as unique items or power-ups that improve the experience of the game. (See Figure_6-17 ) Such systems have exploded with the introduction of the ability to purchase the points for real money.
If you consider any points-as-currency scheme, keep in mind that because the points reflect (and may even be exchangeable for) real money, such schemes place the motivations for using your application further from altruism and more in the range of a commercial driver.
Even if you don't officially offer the points for sale and your application allows users to spend them only on virtual items in the game, a commercial market may still arise for them. A good historical example of this kind of aftermarket is the sale of game characters for popular online multiplayer games, such as World of Warcraft. Character levels in a game represent participation or experience points, which in turn represent real investments of time and/or money. For more than a decade, people have been power-leveling game characters and selling them on eBay for amounts in the thousands of dollars.
We recommend against turning reputation points into a currency of any kind unless your application is a game and it is central to your business goals. More discussion of online economies and how they interact with reputation systems is beyond the scope of this book, but an ever-increasing amount of literature on the topic of real-money trading (RMT) is readily available on the Internet.
Compound community claims reflect multiple separate, but related, aggregated claims about a single target; they include patterns such as reviews and rated message board posts. The power of attaching compound inputs of different types from multiple sources is that it lets users understand multiple facets of an object's reputation.
For example, ConsumerReports.org generates two sets of reputation for objects: the scores generated as a result of the tests and criteria set forth in the labs, and the average user ratings and comments provided by customers on the web site. (See Figure_6-18 .) These scores can be displayed side by side to allow the site's users to evaluate a product both on numerous standard measures and on untested and unmeasured criteria. For example, user comments on front-loading clothes washers often mention odors, because former users of top-loading washers don't necessarily know that a front-loading machine needs to be hand-dried after every load. This kind of subtle feedback can't be captured in strictly quantitative measures.
Though compound community claims can be built from diverse inputs from multiple sources, the ratings-and-reviews pattern is well established and deserves special comment here. Asking a user to create a multipart review is a very heavyweight activity-it takes time to compose a thoughtful contribution. Users' time is scarce, and research at Yahoo! and elsewhere has shown that users often abandon the process if extra steps are required, such as logging in, registration for new users, or multiple screens of input.
Even if it's necessary for business reasons, these barriers to entry will significantly increase the abandon rate for your review creation process. People need a good reason to take time out of their day to create a complex review. Be sure to understand your model (see Chap_5-Incentives ) and the effects it may have on the tone and quality of your content. For an example of the effects of incentive on compound community claims, see Chap_5-Friendship_Incentive .
Reputation models: Ratings-and-reviews, eBay merchant feedback, and so on
Inputs: All types from multiple sources and source types, as long as they all have the same target
Processes: All appropriate process types apply; every compound community claim is custom built
Common uses:
- User-created object reviews
- Editor-based roll-ups, such as movie reviews by media critics
- Side-by-side combinations of user, process, and editorial claims
Pros: This type of input is flexible; any number of claims can be kept together. It provides easy global access: because all the claims have the same target, if you know the target ID, you can get all reputations with a single call. Some standard formats for this type of input-for example, the ratings-and-reviews format-are well understood by users.
Cons: If a user is explicitly asked to create too many inputs, incentive can become a serious impediment to getting a critical mass of contributions on the site. Straying too far from familiar formatting, either for input or output, can create confusion and user fatigue. There is some tension between format familiarity and choosing the correct input scale. See Chap_6-Good_Inputs .
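The "same target, single call" property can be sketched as a store keyed by target ID, in the spirit of the ConsumerReports.org example. The claim names and values are hypothetical placeholders:

```python
# Hypothetical compound-claim store: all claims about one target live
# together, so a single lookup by target ID returns every facet.

claims = {}  # target_id -> {claim_name: claim_value}

def set_claim(target_id, name, value):
    claims.setdefault(target_id, {})[name] = value

# Editorial lab score and aggregated community claims share one target.
set_claim("washer_42", "lab_score", 78)
set_claim("washer_42", "avg_user_rating", 4.1)
set_claim("washer_42", "review_count", 112)
print(claims["washer_42"])
# {'lab_score': 78, 'avg_user_rating': 4.1, 'review_count': 112}
```

Side-by-side display then becomes trivial: one fetch yields both the standardized measures and the community's unmeasured, qualitative signal.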
What happens when you want to make a value judgment about a user who's new to your application? Is there an alternative to the general axiom that “no participation equals no trust” ? In many scenarios, you need an inferred reputation score-a lower-confidence number that can be used to help make low-risk decisions about a user's trustworthiness until the user can establish an application-specific karma score. (See Figure_6-19 .)
In a web application, proxy reputations may be available even for users who have never created an object, posted a comment, or clicked a single thumb-up. The user's browser possesses session cookies that can hold simple activity counters even for logged-out users; the user is connected through an IP address that can have a reputation of its own (if it was recently or repeatedly used by a known abuser); and finally the user may have an active history with a related product that could be considered in a proxy reputation.
Remembering that the best karma is positive karma (see Chap_6-Negative_Public_Karma ), when an otherwise unknown user evaluates an object in your system and you want to weight the user's input, you can use the inferences from weak reputations to boost the user's reputation from 0 to a reasonable fraction (for example, up to 25%) of the maximum value.
A weak karma score should be used only temporarily while a user is establishing robust karma, and, because it is a weak indicator, it should provide a diminishing share of the eventual, final score. (The share of karma provided by inferred karma should diminish as more trustworthy inputs become available to replace it.) One weighting method is to make the inferred share a bonus on top of the total score (the total can exceed 100%) and then clamp the value to 100% at the end.
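The bonus-then-clamp weighting described above might look like the following. The 25% cap and the linear fade over 20 trustworthy inputs are illustrative assumptions, not prescribed values:

```python
def effective_karma(robust_karma, inferred_karma, robust_inputs,
                    inferred_cap=0.25, fade_after=20):
    """Blend inferred karma into a robust score as a clamped bonus.

    robust_karma, inferred_karma: scores normalized to 0.0-1.0.
    robust_inputs: count of trustworthy inputs received so far; the
    inferred share fades linearly to zero as this approaches fade_after.
    """
    weight = max(0.0, 1.0 - robust_inputs / fade_after)
    bonus = min(inferred_karma, inferred_cap) * weight
    return min(1.0, robust_karma + bonus)  # clamp the total to 100%

print(effective_karma(0.0, 0.8, robust_inputs=0))   # 0.25 -- capped bonus for a newcomer
print(effective_karma(0.9, 0.8, robust_inputs=20))  # 0.9 -- inferred share fully faded
```

The clamp matters: without it, a strong contributor with high inferred karma could briefly exceed the maximum score, which would distort any display or decision built on the 0-100% range.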
Reputation models: Models are always custom; inferred karma is known to be part of the models in the following applications:
Inputs: Application-external values; examples include the following:
- User account longevity
- IP address abuse score
- Browser cookie activity counter or help-disabled flag
- External trusted karma score
Processes: Custom mixer
Common uses:
- Partial karma substitute: separating the partially known from the complete strangers
- Help system display: giving unknown users extra navigation help
- Lockout of potentially abused features, such as content editing, until the user has demonstrated familiarity with the application and lack of hostility to it
- Deciding when to route new contributions to customer care for moderation
Pros: Allows a significantly lower barrier for some user contributions than otherwise possible, for example, not requiring registration or login. Provides for corporate (internal-use) karma: no user knows this score, and the site operator can freely change the application's calculation method as the situation evolves and new proxy reputations become available. Helps render your application impervious to accidental damage caused by drive-by users.
Cons:
- Inferred karma is, by construction, unreliable. For example, since people can share an IP address over time without knowing it or each other, including it in a reputation can accidentally undervalue an otherwise excellent user. Though it might be tempting for that reason to remove IP reputation from the model, IP address is the strongest indicator of bad users; such users don't usually go to the trouble of getting a new IP address whenever they want to attack your site.
- Inferred karma can be expensive to generate. How often do you want to update the supporting reputations, such as IP or cookie reputation? Updating them on every single HTTP round trip would be too expensive, so smart design is required.
- Inferred karma is weak. Don't trust it alone for any legally or socially significant actions.
Because an underlying karma score is a number, product managers often misunderstand the interaction between numerical values and online identity. The thinking goes something like this:
This thinking-though seemingly intuitive-is impoverished, and is wrong in at least two important ways:
Even eBay, with the most well-known example of public negative karma, doesn't represent how untrustworthy an actual seller might be-it only gives buyers reasons to take specific actions to protect themselves. In general, avoid negative public karma. If you really want to know who the bad guys are, keep the score separate and restrict it to internal use by moderation staff.
The Sims Online was a multiplayer version of the popular Sims games by Electronic Arts and Maxis in which the user controlled an animated character in a virtual world with houses, furniture, games, virtual currency (called Simoleans), rental property, and social activities. You could call it playing dollhouse online.
One of the features that supported user socialization in the game was the ability to declare that another user was a trusted friend. The feature involved a graphical display that showed the faces of users who had declared you trustworthy outlined in green, attached in a hub-and-spoke pattern to your face in the center.
People checked each other's hubs for help in deciding whether to take certain in-game actions, such as becoming roommates in a house. Decisions like these are costly for a new user-the ramifications of the decision stick with a newbie for a long time, and “backing out” of a bad decision is not an easy thing to do. The hub was a useful decision-making device for these purposes.
That feature was fine as far as it went, but unlike other social networks, The Sims Online allowed users to declare other users un trustworthy too. The face of an untrustworthy user appeared circled in bright red among all the trustworthy faces in a user's hub.
It didn't take long for a group calling itself the Sims Mafia to figure out how to use this mechanic to shake down new users when they arrived in the game. The dialog would go something like this:
“Hi! I see from your hub that you're new to the area. Give me all your Simoleans or my friends and I will make it impossible to rent a house.”
“What are you talking about?”
“I'm a member of the Sims Mafia, and we will all mark you as untrustworthy, turning your hub solid red (with no more room for green), and no one will play with you. You have five minutes to comply. If you think I'm kidding, look at your hub-three of us have already marked you red. Don't worry, we'll turn it green when you pay…”
If you think this is a fun game, think again-a typical response to this shakedown was for the user to decide that the game wasn't worth $10 a month. Playing dollhouse doesn't usually involve gangsters. It's hard to estimate the final cost to EA & Maxis for such a simple design decision, in terms of lost users, abandoned accounts and cancelled subscriptions.
In your own community and application design, think twice about overtly displaying negative reputation, or putting such direct means to affect others' reputations in the hands of the community. You risk enabling your own mafias to flourish.
With your goals, objects, inputs, and reputation patterns in hand, you can draw a draft reputation model diagram and sketch out the flows in enough detail to generate the following questions: What data will I need to formulate these reputation scores correctly? How will I collect the claims and transform them into inputs? Which of those inputs will need to be reversible, and which will be disposable?
If you're using this book as a guide, try sketching out a model now, before you consider creating screen mockups. One approach we've often found helpful is to start on the right side of the diagram-with the reputations you want to generate-and work your way back to the inputs. Don't worry about the calculations at first; just draw a process box with the name of the reputation inside and a short note on the general nature of the formulation, such as aggregated acting average or community player rank.
Once you've drawn the boxes, connect them with arrows where appropriate. Then consider what inputs go into which boxes; don't forget that the arrows can split and merge as needed.
Then, after you have a good rough diagram, start to dive into the details with your development team. Many mathematical and performance-related details will affect your reputation model design. We've found that reputation systems diagrams make excellent requirements documentation and make it easier to generate the technical specification, while also making the overall design accessible to non-engineers.
Of course, your application will consist of displaying or using the reputations you've diagrammed. Those are the topics of Chapter_7 :“Displaying Reputation” and Chapter_8 :“Using Reputation, The Good, the Bad, and the Ugly” . Project engineers, architects, and operational team members may want to review Chapter_9 :“Application Integration, Testing & Tuning” first, as it completes the schedule-focused, development-cycle view of any reputation project.