
October 01, 2013

Social Networks, Identity, Pseudonyms, & Influence Podcast Episodes

Here are the first 4 episodes of The Social Media Clarity Podcast:

  1. Social Network: What is it, and where do I get one? (mp3) 26 Aug 2013
  2. HuffPo, Identity, and Abuse (mp3) 5 Sep 2013
  3. Save our Pseudonyms! (Guest: Dr. Bernie Hogan) (mp3) 16 Sep 2013
  4. Influence is a Graph (mp3) 30 Sep 2013
Subscribe via iTunes

Subscribe via RSS

Listen on Stitcher

Like us on Facebook

October 28, 2009

Ebay's Merchant Feedback System

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, we explore, in some depth, one of the Web's longest-running and highest-profile reputation systems. (We also test-drive our new Google Maps-powered zoomable diagrams. Wheee!)

EBay hosts the Internet's best-known and most-studied user reputation (or karma) system: seller feedback. Its reputation model, like most others that are several years old, is complex and continuously adapting to new business goals, changing regulations, improved understanding of customer needs, and the never-ending need to combat reputation manipulation through abuse.

Rather than detail the entire feedback karma model here, we'll focus on claims that are from the buyer and about the seller. An important note about eBay feedback is that buyer claims exist in a specific context: a market transaction, that is, a successful bid at auction for an item listed by a seller. This specificity leads to generally higher-quality karma scores for sellers than they would get if anyone could just walk up and rate a seller without even demonstrating that they'd ever done business with them; see Chapter 1, Implicit Reputation.

The scrolling/zooming diagram below shows how buyers influence a seller's karma scores on eBay. Though the specifics are unique to eBay, the pattern is common to many karma systems. For an explanation of the graphical conventions used, see Chapter 2.

The reputation model in this figure was derived from the following eBay pages: http://pages.ebay.com/help/feedback/scores-reputation.html and http://pages.ebay.com/services/buyandsell/welcome.html, both current as of July 2009.

We have simplified the model for illustration, specifically by omitting the processing for the requirement that only buyer feedback and Detailed Seller Ratings (DSR) provided over the previous 12 months are considered when calculating the positive feedback ratio, DSR community averages, and, by extension, power seller status. Also, eBay reports user feedback counters for the last month and quarter, which we omit here for the sake of clarity. Abuse mitigation features, which are not publicly available, are also excluded.

This diagram illustrates the seller feedback karma reputation model, which is built from typical model components: two compound buyer input claims (seller feedback and detailed seller ratings) and several roll-ups of the seller's karma: community feedback ratings (a counter), feedback level (a named level), positive feedback percentage (a ratio), and the power seller rating (a label).

The context for the buyer's claims is a transaction identifier: the buyer may not leave any feedback before successfully placing a winning bid on an item listed by the seller in the auction market. Presumably, the feedback primarily describes the quality and delivery of the goods purchased. A buyer may provide two different sets of complex claims, and the limits on each vary:

  • 1. Typically, when a buyer wins an auction, the delivery phase of the transaction starts and the seller is motivated to deliver the goods of the quality advertised in a timely manner. After either a timer expires or the goods have been delivered, the buyer is encouraged to leave feedback on the seller, a compound claim in the form of a three-level rating (positive, neutral, or negative) and a short text-only comment about the seller and/or transaction. The ratings make up the main component of seller feedback karma.
  • 2. Once each week in which a buyer completes a transaction with a seller, the buyer may leave detailed seller ratings, a compound claim of four separate 5-star ratings in these categories: item as described, communications, shipping time, and shipping and handling charges. The only use of these ratings, other than aggregation for community averages, is to qualify the seller as a power seller.
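The two input claims described above map naturally onto two record types. The following is a minimal Python sketch, not eBay's implementation: it assumes that "once each week" means at most one set of DSRs per buyer and seller per ISO calendar week, and every name in it (`Rating`, `SellerFeedback`, `DetailedSellerRatings`, `week_key`, `can_leave_dsr`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class Rating(Enum):
    """Three-level seller feedback rating."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass
class SellerFeedback:
    """Compound claim 1: a rating plus a short text comment for one transaction."""
    transaction_id: str
    buyer: str
    seller: str
    rating: Rating
    comment: str


@dataclass
class DetailedSellerRatings:
    """Compound claim 2: four separate 1-to-5-star ratings."""
    buyer: str
    seller: str
    day: date
    item_as_described: int
    communications: int
    shipping_time: int
    shipping_and_handling_charges: int

    def __post_init__(self) -> None:
        # Reject values outside the 5-star scale.
        for stars in (self.item_as_described, self.communications,
                      self.shipping_time, self.shipping_and_handling_charges):
            if not 1 <= stars <= 5:
                raise ValueError("each DSR must be between 1 and 5 stars")


def week_key(day: date) -> tuple[int, int]:
    """ISO (year, week number) pair; an assumed reading of the weekly limit."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def can_leave_dsr(existing: list[DetailedSellerRatings],
                  buyer: str, seller: str, day: date) -> bool:
    """True if this buyer has not already left DSRs for this seller in the same ISO week."""
    return not any(
        d.buyer == buyer and d.seller == seller and week_key(d.day) == week_key(day)
        for d in existing
    )
```

Under this reading, a buyer who left DSRs on Monday, 5 October 2009, could not leave another set for the same seller that Wednesday, but could the following Monday.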

EBay displays an extensive set of karma scores for sellers: the amount of time the seller has been a member of eBay; color-coded stars; percentages that indicate positive feedback; more than a dozen statistics tracking past transactions; and lists of testimonial comments from past buyers or sellers. Even this is only a partial list of the seller reputations that eBay puts on display.

The full list of displayed reputations almost serves as a menu of reputation types present in the model. Every process box represents a claim displayed as a public reputation to everyone, so to provide a complete picture of eBay seller reputation, we'll simply detail each output claim separately:

  • 3. The feedback score counts every positive rating given by a buyer as part of seller feedback, a compound claim associated with a single transaction. This number is cumulative for the lifetime of the account, and it generally loses its value over time: buyers tend to notice it only if it has a low value.

It is fairly common for a buyer to change this score, within some time limitations, so this effect must be reversible. Sellers spend a lot of time and effort working to change negative and neutral ratings to positive ratings to gain or to avoid losing a power seller rating. When this score changes, it is then used to calculate the feedback level.

  • 4. The feedback level claim is a graphical representation (in colored stars) of the feedback score. This process is usually a simple data transformation and normalization process; here we've represented it as a mapping table, illustrating only a small subset of the mappings. This visual system of stars on eBay relies, in part, on the assumption that users will know that a red shooting star is a better rating than a purple star. But we have our doubts about the utility of this representation for buyers. Iconic scores such as these often mean more to their owners, and they might represent only a slight incentive for increasing activity in an environment in which each successful interaction equals cash in your pocket.
  • 5. The community feedback rating is a compound claim containing the historical counts for each of the three possible seller feedback ratings (positive, neutral, and negative) over the last 12 months, so that the totals can be presented in a table showing the results for the last month, 6 months, and year. Older ratings are decayed continuously, though eBay does not disclose how often this data is updated if new ratings don't arrive. One possibility would be to update the data whenever the seller posts a new item for sale.

The positive and negative ratings are used to calculate the positive feedback percentage.

  • 6. The positive feedback percentage claim is calculated by dividing the positive feedback ratings by the sum of the positive and negative feedback ratings over the last 12 months. Note that the neutral ratings are not included in the calculation. This is a recent change reflecting eBay's confidence in the success of updates deployed in the summer of 2008 to prevent bad sellers from using retaliatory ratings against buyers who are unhappy with a transaction (known as tit-for-tat negatives). Initially this calculation included neutral ratings because eBay feared that negative feedback would be transformed into neutral ratings. It was not.

This score is an input into the power seller rating, which is highly coveted. This means that each and every individual positive and negative rating given on eBay is a critical one: it can mean the difference for a seller between acquiring the coveted power seller status, or not.

  • 7. The Detailed Seller Ratings community averages are simple reversible averages for each of the four ratings categories: item as described, communications, shipping time, and shipping and handling charges. There is a limit on how often a buyer may contribute DSRs.

EBay only recently added these categories as a new reputation model because including them as factors in the overall seller feedback ratings diluted the overall quality of seller and buyer feedback. Sellers could end up in disproportionate trouble just because of a bad shipping company or a delivery that took a long time to reach a remote location. Likewise, buyers were bidding low prices only to end up feeling gouged by shipping and handling charges. Fine-grained feedback allows one-off small problems to be averaged out across the DSR community averages instead of being translated into red-star negative scores that poison trust overall. Fine-grained feedback for sellers is also actionable by them and motivates them to improve, since these DSR scores make up half of the power seller rating.

  • 8. The power seller rating, appearing next to the seller's ID, is a prestigious label that signals the highest level of trust. It includes several factors external to this model, but two critical components are the positive feedback percentage, which must be at least 98%, and the DSR community averages, which each must be at least 4.5 stars (around 90% positive). Interestingly, the DSR scores are more flexible than the feedback average, which tilts the rating toward overall evaluation of the transaction rather than the related details.
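The roll-ups in items 3 through 8 can be sketched as a few reversible counters and derived values. This is a minimal Python illustration, not eBay's implementation: it assumes the caller has already restricted the claims to the previous 12 months where the text requires it, it checks only the two power seller thresholds named above (at least 98% positive, every DSR average at least 4.5) and ignores the external factors, and all names (`SellerKarma`, `set_feedback`, `add_dsr`, and so on) are hypothetical.

```python
from __future__ import annotations

from collections import Counter

POSITIVE, NEUTRAL, NEGATIVE = "positive", "neutral", "negative"
DSR_CATEGORIES = ("item_as_described", "communications",
                  "shipping_time", "shipping_and_handling_charges")


class SellerKarma:
    """Reversible roll-ups of one seller's buyer claims (simplified sketch)."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()      # community feedback rating counts
        self.ratings_by_txn: dict[str, str] = {}   # current rating per transaction
        self.dsr_totals = {c: 0 for c in DSR_CATEGORIES}
        self.dsr_count = 0

    def set_feedback(self, transaction_id: str, rating: str) -> None:
        """Record feedback; if the buyer changes it, back out the old rating first."""
        old = self.ratings_by_txn.get(transaction_id)
        if old is not None:
            self.counts[old] -= 1
        self.ratings_by_txn[transaction_id] = rating
        self.counts[rating] += 1

    def add_dsr(self, stars: dict[str, int]) -> None:
        for c in DSR_CATEGORIES:
            self.dsr_totals[c] += stars[c]
        self.dsr_count += 1

    def remove_dsr(self, stars: dict[str, int]) -> None:
        """Reverse a previously added DSR set."""
        for c in DSR_CATEGORIES:
            self.dsr_totals[c] -= stars[c]
        self.dsr_count -= 1

    def feedback_score(self) -> int:
        """Item 3: the count of positive ratings."""
        return self.counts[POSITIVE]

    def positive_percentage(self) -> float | None:
        """Item 6: positives / (positives + negatives); neutrals are excluded."""
        pos, neg = self.counts[POSITIVE], self.counts[NEGATIVE]
        if pos + neg == 0:
            return None
        return 100.0 * pos / (pos + neg)

    def dsr_averages(self) -> dict[str, float]:
        """Item 7: a simple average per DSR category."""
        if self.dsr_count == 0:
            return {}
        return {c: self.dsr_totals[c] / self.dsr_count for c in DSR_CATEGORIES}

    def meets_power_seller_thresholds(self) -> bool:
        """Item 8, partially: only the two thresholds named in the text."""
        pct = self.positive_percentage()
        averages = self.dsr_averages()
        return (pct is not None and pct >= 98.0
                and len(averages) == len(DSR_CATEGORIES)
                and all(avg >= 4.5 for avg in averages.values()))
```

Because `set_feedback` backs out a transaction's old rating before applying the new one, a buyer who flips a red star to green changes the feedback score and the positive feedback percentage immediately, which is the reversibility item 3 calls for.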

Though the context for the buyer's claims is a single transaction or history of transactions, the context for the aggregate reputations that are generated is trust in the eBay marketplace itself. If the buyers can't trust the sellers to deliver against their promises, eBay cannot do business. When considering the roll-ups, we transform the single-transaction claims into trust in the seller, and, by extension, that same trust rolls up into eBay. This chain of trust is so integral and critical to eBay's continued success that eBay must continuously update the marketplace's interface and reputation systems.

October 21, 2009

User Motivations & System Incentives

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry summarizes our model for describing user motivations and incentives for participation in reputation systems.

This is a short summary of a large section of Chapter 6 of our book, Building Web Reputation Systems, entitled Incentives for User Participation, Quality, and Moderation. For this blog post, the content has been shuffled a bit. First we name the motivations and related incentive models, then we describe how reputation systems interact with each motivational category. For a more detailed discussion of the incentive subcategories, read Chapter 6.

Motivations and Incentives for social media participation:

  • Altruistic motivation: for the good of others
    • Tit-for-Tat or Pay-it-Forward incentives: "I do it because someone else did it for me first"
    • Friendship incentives: "I do it because I care about others who will consume this"
    • Know-it-All or Crusader or Opinionated incentives: "I do it because I know something everyone else needs to know"
  • Commercial motivation: to generate revenue
    • Direct revenue incentives: Extracting commercial value (better yet, cash) directly from the user as soon as possible
    • Branding incentives: Creating indirect value by promotion - revenue will follow later
  • Egocentric motivation: for self-gratification
    • Fulfillment incentives: The desire to complete a task, assigned by oneself, a friend, or the application
    • Recognition incentives: The desire for the praise of others
    • The Quest for Mastery: Personal and private motivation to improve oneself

Altruistic or Sharing Incentives

Altruistic, or sharing, incentives reflect the giving nature of users who have something to share (a story, a comment, a photo, an evaluation) and who feel compelled to share it on your site. Their incentives are internal: they may feel an obligation to another user or to a friend, or they may feel loyal to (or despise) your brand.

When you're considering reputation models that offer altruistic incentives, remember that these incentives exist in the realm of social norms: they're all about sharing, not accumulating commercial value or karma points. Avoid aggrandizing users driven by altruistic incentives; they don't want their contributions to be counted, recognized, ranked, evaluated, compensated, or rewarded in any significant way. Comparing their work to anyone else's will actually discourage them from participating.

(See more on Tit-for-Tat, Friend, and Know-it-All altruistic incentives.)

Commercial Incentives

Commercial incentives reflect people's motivation to do something for money, though the money may not come in the form of direct payment from the user to the content creator. Advertisers have a nearly scientific understanding of the significant commercial value of something they call branding. Likewise, influential bloggers know that their posts build their brand, which often involves the perception of them as subject matter experts. The standing that they establish may lead to opportunities such as speaking engagements, consulting contracts, improved permanent positions at universities or prominent corporations, or even a book deal. A few bloggers may actually receive payment for their online content, but more are capturing commercial value indirectly.

Reputation models that exhibit content control patterns based on commercial incentives must communicate a much stronger user identity. They need strong and distinctive user profiles with links to each user's valuable contributions and content. For example, as part of reinforcing her personal brand, an expert in textile design would want to share links to content that she thinks her fans will find noteworthy.

But don't confuse the need to support strong profiles for contributors with the need for a strong or prominent karma system. When a new brand is being introduced to a market, whether it's a new kind of dish soap or a new blogger on a topic, a karma system that favors established participants can be a disincentive to contribute content. A community decides how to treat newcomers: with open arms or with suspicion. An example of the latter is eBay, where all new sellers must "pay their dues" and bend over backward to get a dozen or so positive evaluations before the market at large will embrace them as trustworthy vendors. Whether you need karma in your commercial incentive model depends on the goals you set for your application. One possible rule of thumb: If users are going to pass money directly to other people they don't know, consider adding karma to help establish trust.

(See more on Direct revenue and Branding commercial incentives.)

Egocentric Incentives

Egocentric incentives are often exploited in the design of online computer games and many reputation-based web sites. The simple desire to accomplish a task taps into deeply hard-wired motivations described in behavioral psychology as classical and operant conditioning (which involves training subjects to respond to food-related stimuli) and schedules of reinforcement. This research indicates that people can be influenced to repeat simple tasks by providing periodic rewards, even a reward as simple as a pleasing sound.

But an individual animal's behavior in the social vacuum of a research lab is not the same as the ways in which we very social humans reflect our egocentric behaviors to one another. Humans make teams and compete in tournaments. We follow leaderboards comparing ourselves to others and comparing groups that we associate ourselves with. Even if our accomplishments don't help another soul or generate any revenue for us personally, we often want to feel recognized for them. Even if we don't seek accolades from our peers, we want to be able to demonstrate mastery of something, to hear the message "You did it! Good job!"

Therefore, in a reputation system based on egocentric incentives, user profiles are a key requirement. In this kind of system, users need someplace to show off their accomplishments, even if only to themselves. Almost by definition, egocentric incentives involve one or more forms of karma. Even with only a simple system of granting trophies for achievements, users will compare their collections to one another. New norms will appear that look more like market norms than social norms: people will trade favors to advance their karma, people will attempt to cheat to get an advantage, and those who feel they can't compete will opt out altogether.

Egocentric incentives and karma do provide very powerful motivations, but they are almost antithetical to altruistic ones. The egocentric incentives of many systems have been over-designed, leading to communities consisting almost exclusively of experts. Consider just about any online role-playing game that has survived more than three years. For example, to retain its highest-level users and the revenue stream they produce, World of Warcraft must continually produce new content targeted at those users. If they stop producing new content for their most dedicated users, their business will collapse. This elder-game focus stunts WoW's growth: parent company Blizzard has all but abandoned improvements aimed at acquiring new users. When new users do arrive (usually in the wake of a marketing promotion), they end up playing alone because the veteran players are only interested in the new content and don't want to bother going through the long slog of playing through the lowest levels of the game yet again.

(See more on Fulfillment, Recognition, and Quest-for-Mastery egocentric incentives.)

September 30, 2009

First Mover Effects

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's essay is concerned with important downstream effects that can arise from the first tentative days & weeks of a community's formation. It is excerpted from Chapter 4: Building Blocks and Reputation Tips.

When an application handles quantitative measures based on user input, whether it's ratings or measuring participation by counting the number of contributions to a site, several issues arise that we group together under the term first-mover effects. All of them result from the bootstrapping of communities.

Early Behavior Modeling and Early-Ratings Bias

The first people to contribute to a site have a disproportionate effect on the character and future contributions of others. After all, this is social media, and people usually try to fit into any new environment. For example, if the tone of comments is negative, new contributors will also tend to be negative, which will also lead to bias in any user-generated ratings. See Ratings Bias Effects.

When an operator introduces user-generated content and associated reputation systems, it is important to take explicit steps to model behavior for the earliest users in order to set the pattern for those who follow.

Discouraging New Contributors

Take special care with systems that contain leaderboards when they're used either for content or for users. Items displayed on leaderboards tend to stay on the leaderboards, because the more people who see those items and click, rate, and comment on them, the more who will follow suit, creating a self-sustaining feedback loop.

This loop not only keeps newer items and users from breaking into the leaderboards, it discourages new users from even making the effort to participate by giving the impression that they are too late to influence the result in any significant way. Though this phenomenon applies to all reputation scores, even for digital cameras, it's particularly acute in the case of simple point-based karma systems, which give active users ever more points for activity so that leaders, over years of feverish activity, amass millions of points, making it mathematically impossible for new users to ever catch up.
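The arithmetic behind "mathematically impossible" is easy to check. Under the simplifying assumption that a leader and a newcomer each earn points at a constant yearly rate, the newcomer closes the gap only if she is strictly more active than the leader, and even then the time needed grows with the leader's head start. The function name and the figures below are hypothetical.

```python
import math


def years_to_catch_up(leader_points: float, leader_rate: float,
                      newcomer_rate: float) -> float:
    """Years until a newcomer starting from zero matches a leader's point total.

    Assumes both users earn points at constant yearly rates (a simplification).
    Returns math.inf when the gap never closes.
    """
    if newcomer_rate <= leader_rate:
        return math.inf  # the gap stays the same or widens forever
    return leader_points / (newcomer_rate - leader_rate)
```

With a leader holding 1,000,000 points and earning 200,000 a year, an equally active newcomer never catches up, and even a newcomer who is twice as active needs five years.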

September 23, 2009

Party Crashers (or 'Who invited these clowns?')

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, we look at some of the possible effects when unanticipated guests enter into your carefully-planned and modeled system. This essay is excerpted from Chapter 5.

Reputation can be a successful motivation for users to contribute large volumes of content and/or high-quality content to your application. At the very least, reputation can provide critical money-saving value to your customer care department by allowing users to prioritize the bad content for attention and likewise flag power users and content to be featured.

But mechanical reputation systems, of necessity, are always subject to unwanted or unanticipated manipulation: they are only algorithms, after all. They cannot account for the many, sometimes conflicting, motivations for users' behavior on a site. One of the strongest motivations of users who invade reputation systems is commercial. Spam invaded email. Marketing firms invade movie review and social media sites. And drop-shippers are omnipresent on eBay.

EBay drop-shippers put the middleman back into the online market: they are people who resell items that they don't even own. It works roughly like this:

  1. A seller develops a good reputation, gaining a seller feedback karma of at least 25 for selling items that she personally owns.
  2. The seller buys some drop-shipping software, which helps locate items for sale on eBay and elsewhere cheaply, or joins an online drop-shipping service that has the software and presents the items in a web interface.
  3. The seller finds cheap items to sell and lists them on eBay for a higher price than they're available for in stores but lower than other eBay sellers are selling them for. The seller includes an average or above-average shipping and handling charge.
  4. The seller sells an item to a buyer, receives payment, and sends an order for the item, along with a drop-shipping payment, to the drop-shipper (D), who then delivers the item to the buyer.

This model of doing business was not anticipated by the eBay seller feedback karma model, which only includes buyers and sellers as reputation entities. Drop-shippers are a third party in what was assumed to be a two-party transaction, and they cause the reputation model to break in various ways:

  • The original shippers sometimes fail to deliver the goods as promised to the buyer. The buyer then gets mad and leaves negative feedback: the dreaded red star. That would be fine, but it is the seller, who never saw or handled the goods, who receives the mark of shame, not the actual shipping party.
  • This arrangement is a big problem for the seller, who cannot afford the negative feedback if she plans to continue selling on eBay.
  • The typical options for rectifying a bungled transaction won't work in a drop-shipper transaction: it is useless for the buyer to return the defective goods to the seller. (They never originated from the seller anyway.) Trying to unwind the shipment (the buyer returns the item to the seller; the seller returns it to the drop-shipper, if that is even possible; the drop-shipper buys or waits for a replacement item and finally ships it) would take too long for the buyer, who expects immediate recompense.

In effect, the seller can't make the order right with the customer without refunding the purchase price in a timely manner. This puts them out-of-pocket for the price of the goods along with the hassle of trying to recover the money from the drop-shipper.

But a simple refund alone sometimes isn't enough for the buyer! No, depending on the amount of perceived hassle and effort this transaction has cost them, they are still likely to rate the transaction negatively overall. (And rightfully so: once it's become evident that a seller is working through a drop-shipper, many of their excuses and delays start to ring very hollow.) So a seller may, at this point, have laid out a lot of their own time and money to rectify a bad transaction only to still suffer the penalties of a red star.

What option does the seller have left to maintain their positive reputation? You guessed it: a payoff. Not only will a concerned seller eat the price of the goods (and any shipping involved), but they will also pay an additional cash bounty (typically up to $20.00) to get buyers to flip a red star to green.

What is the cost of clearing negative feedback on drop-shipped goods? The cost of the item + $20.00 + lost time in negotiating with the buyer. That's the cost that reputation imposes on drop-shipping on eBay.

The lesson here is that a reputation model will be reinterpreted by users as they find new ways to use your site. Site operators need to keep a wary eye on the specific behavior patterns they see emerging and adapt accordingly. Chapter 10 provides more detail and specific recommendations for prospective reputation modelers.

September 09, 2009

Time Decay in Reputation Systems

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's essay is excerpted from Chapter 4: Building Blocks and Reputation Tips.

Time leeches value from reputation. The section called “First Mover Effects” discussed how, in simple reputation systems, early contributions are disproportionately valued over time, but there's also the simple problem that ratings become stale as the reputable entities they target change or become unfashionable: businesses change ownership, technology becomes obsolete, cultural mores shift.

The key insight to dealing with this problem is to remember the expression “What did you do for me this week?” When you're considering how your reputation system will display reputation and use it indirectly to modify the experience of users, remember to account for time value. A common method for compensating for time in reputation values is to apply a decay function: subtract value from the older reputations as time goes on, at a rate that is appropriate to the context. For example, digital camera ratings for resolution should probably lose half their weight every year, whereas restaurant reviews should only lose 10% of their value in the same interval.
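One way to express "lose half their weight every year" versus "lose 10% in the same interval" is an exponential decay weight applied to each rating according to its age. This is a minimal sketch assuming continuous exponential decay; the function names and the `annual_retention` parameter are illustrative, not taken from the book.

```python
from __future__ import annotations


def decay_weight(age_years: float, annual_retention: float) -> float:
    """Weight of a rating that is age_years old, keeping annual_retention of its weight per year."""
    return annual_retention ** age_years


def decayed_mean(ratings: list[tuple[float, float]], annual_retention: float) -> float:
    """Weighted mean of (score, age_years) pairs, weighting each by decay_weight."""
    weights = [decay_weight(age, annual_retention) for _, age in ratings]
    return sum(score * w for (score, _), w in zip(ratings, weights)) / sum(weights)
```

A camera rating would use `annual_retention=0.5` and a restaurant review `0.9`. For a fresh 5-star rating and a year-old 1-star rating, the camera's decayed mean is about 3.67 while the restaurant's is about 3.11, because the old review still carries 90% of its weight.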

Here are some specific algorithms for decaying a reputation score over time:

  • Linear Aggregate Decay
    • Every score in the corpus is decreased by a fixed percentage per unit of time elapsed, whenever it is recalculated. This is high-performance, but infrequently updated reputations will have disproportionately high values. To compensate, a timer input can perform the decay process at regular intervals.
  • Dynamic Decay Recalculation
    • Every time a score is added to the aggregate, recalculate the value of every contributing score. This method provides a smoother curve, but it tends to become computationally expensive, O(n²), over time.
  • Window-based Decay Recalculation
    • The Yahoo! Spammer IP reputation system has used a time-window-based decay calculation: a fixed-time or fixed-size window of previous contributing claim values is kept with the reputation for dynamic recalculation when needed. New values push old values out of the window, and the aggregate reputation is recalculated from those that remain. This method produces a score with the most recent information available, but the information for low-liquidity aggregates may still be old.
  • Time-limited Recalculation
    • This is the de facto method that most engineers use to present any information in an application: use all of the ratings in a time range from the database and compute the score just in time. This is the most costly method, because it involves always hitting the database to consider an aggregate reputation (say, for a ranked list of hotels), when 99% of the time the value is exactly the same as it was the last time it was calculated. This method also may throw away still contextually valid reputation. We recommend trying some of the higher-performance suggestions above.
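Three of the four strategies above can be sketched in a few lines each. These are minimal Python illustrations under stated assumptions, not the Yahoo! implementation: time is an abstract integer counter, the aggregate-decay score is a decayed running sum, the window holds a fixed count of values rather than a fixed time span, and Dynamic Decay Recalculation is omitted. The class and function names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from statistics import mean


class AggregateDecay:
    """Linear Aggregate Decay: shrink the stored score by a fixed fraction per
    elapsed time unit whenever it is recalculated (a decayed running sum)."""

    def __init__(self, decay_per_unit: float) -> None:
        self.decay_per_unit = decay_per_unit
        self.score = 0.0
        self.last_time = 0

    def _decay_to(self, now: int) -> None:
        elapsed = now - self.last_time
        self.score *= (1.0 - self.decay_per_unit) ** elapsed
        self.last_time = now

    def add(self, value: float, now: int) -> float:
        """Decay the old score, then add the new contribution."""
        self._decay_to(now)
        self.score += value
        return self.score

    def tick(self, now: int) -> float:
        """Timer input: decay even when no new values arrive."""
        self._decay_to(now)
        return self.score


class WindowedMean:
    """Window-based Decay Recalculation over a fixed-size window of recent values."""

    def __init__(self, size: int) -> None:
        self.window: deque[float] = deque(maxlen=size)

    def add(self, value: float) -> float:
        self.window.append(value)  # the oldest value falls out automatically
        return mean(self.window)


def time_limited_mean(ratings: list[tuple[float, int]],
                      start: int, end: int) -> float | None:
    """Time-limited Recalculation: recompute from every (score, time) in [start, end]."""
    in_range = [score for score, t in ratings if start <= t <= end]
    return mean(in_range) if in_range else None
```

Note the trade-off the text describes: `AggregateDecay` and `WindowedMean` keep only small state per reputation, while `time_limited_mean` rescans every stored rating on each call.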

August 19, 2009

Low Liquidity Compensation for Reputation Systems

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's essay is excerpted from Chapter 4: Building Blocks and Reputation Tips. This tip provides a solution to an age old problem with ratings.
 

A question of liquidity -

When is 4.0 > 5.0? When enough people say it is!

 
  --2007, F. Randall Farmer, Yahoo! Community Analyst

Consider the following problem with simple averages: it is mathematically unreasonable to compare two similar targets with averages made from significantly different numbers of inputs. For the first target, suppose that there are only three ratings averaging 4.667 stars, which after rounding displays as five stars, and you compare that average score to a target with a much greater number of inputs, say 500, averaging 4.4523 stars, which after rounding displays as only four stars. The second target, the one with the lower average, better reflects the true consensus of the inputs, since there just isn't enough information on the first target to be sure of anything. Most simple-average displays with too few inputs shift the burden of evaluating the reputation to users by displaying the number of inputs alongside the simple average, usually in parentheses, like this: (142).

But pawning off the interpretation of averages on users doesn't help when you're ranking targets on the basis of averages-a lone rating on a brand-new item will put the item at the top of any ranked results it appears in. This effect is inappropriate and should be compensated for.
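A few lines are enough to reproduce the ranking problem. In this hypothetical catalog (the item names and ratings are invented for illustration), sorting by simple mean puts an item with a single 5-star rating ahead of an item with 500 ratings averaging 4.5 stars.

```python
from statistics import mean

# Hypothetical catalog: item name -> list of 1-to-5-star ratings.
catalog = {
    "established_item": [5] * 300 + [4] * 150 + [3] * 50,  # 500 ratings, mean 4.5
    "brand_new_item": [5],                                   # one lone rating
}

# Ranking purely by simple mean ignores how many ratings back each average.
ranked = sorted(catalog, key=lambda name: mean(catalog[name]), reverse=True)
```

Here `ranked` starts with `"brand_new_item"`, exactly the lone-rating effect described above.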

We need a way to adjust the ranking of an entity based on the quantity of ratings. Ideally, an application performs these calculations on the fly so that no additional storage is required.

We provide the following solution: a high-performance liquidity compensation algorithm to offset variability in very small sample sizes. It's used on Yahoo! sites to which many new targets are added daily, with the result that, often, very few ratings are applied to each one.

  • RankMean
    • r = SimpleMean m - AdjustmentFactor a + LiquidityWeight l * AdjustmentFactor a
  • LiquidityWeight
    • l = min(max((NumRatings n - LiquidityFloor f) / LiquidityCeiling c, 0), 1) * 2
  • Or
    • r = m - a + min(max((n - f) / c, 0.00), 1.00) * 2.00 * a

This formula produces a curve seen in the figure below. Though a more mathematically continuous curve might seem appropriate, this linear approximation can be done with simple nonrecursive calculations and requires no knowledge of previous individual inputs.

Figure: The effects of the liquidity compensation algorithm

Suggested initial values for a, c, and f (assuming normalized inputs):

  • AdjustmentFactor
    • a = 0.10

This constant is the fractional amount to remove from the score before adding back in effects based on input volume. For many applications, such as 5-star ratings, it should be within the range of integer rounding error; in this example, if the AdjustmentFactor is set much higher than 10%, a lot of 4-star entities will be ranked before 5-star ones. If it's set much lower, it may not have the desired effect.

  • LiquidityFloor
    • f = 10

This constant is the minimum number of inputs required before input volume has a positive effect on the rank. In an ideal environment, this number is between 5 and 10, and our experience with large systems indicates that it should never be set lower than 3. Higher numbers help mitigate abuse and better represent the consensus of opinion.

  • LiquidityCeiling
    • c = 60

This constant is the threshold beyond which additional inputs will not get a weighting bonus. In short, we trust the average to be representative of the optimum score. This number must not be lower than 30, which in statistics is the minimum required for a t-score. Note that the t-score cutoff is 30 for data that is assumed to be unmanipulated (read: random). We encourage you to consider other values for a, c, and f, especially if you have any data on the characteristics of your sources and their inputs.
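The RankMean formula and the suggested constants translate directly into code. This Python sketch assumes star averages are normalized to the range 0 to 1 by dividing by 5, and the function names are hypothetical; it reuses the two targets from the example earlier in this essay.

```python
def liquidity_weight(n: int, floor: float = 10, ceiling: float = 60) -> float:
    """l = min(max((n - f) / c, 0), 1) * 2, which ranges from 0 to 2.

    With this formula the weight reaches its maximum once n - f >= c,
    i.e. at n = f + c inputs (70 with the suggested values).
    """
    return min(max((n - floor) / ceiling, 0.0), 1.0) * 2.0


def rank_mean(simple_mean: float, n: int, adjustment: float = 0.10,
              floor: float = 10, ceiling: float = 60) -> float:
    """r = m - a + l * a for a simple mean m normalized to [0, 1].

    Results span m - a (too few inputs) to m + a (enough inputs).
    """
    return simple_mean - adjustment + liquidity_weight(n, floor, ceiling) * adjustment


few = rank_mean(4.667 / 5, 3)       # three ratings: no liquidity bonus
many = rank_mean(4.4523 / 5, 500)   # 500 ratings: full liquidity bonus
```

The three-rating target drops to about 0.833 while the 500-rating target rises to about 0.990, so the better-supported 4.45-star average ranks first: the "When is 4.0 > 5.0?" effect.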