June 16, 2014

Web Reputation Systems and the Real World

By Randy Farmer - This essay originally appeared as a chapter in The Reputation Society: How Online Opinions Are Reshaping the Offline World from MIT Press

When anticipating the future of society and technology, it is advisable to look to the past for relevant successes and failures. Randy Farmer shares some of the signposts, warnings, and potential solutions he has encountered over two decades of building reputation systems.

Digital reputation is rapidly expanding in real-world influence. Bryce Glass and I have studied hundreds of online reputation systems, including dozens that we helped develop. As we noted in the preface to Farmer and Glass (2010a, ix), “Today’s Web is the product of over a billion hands and minds. Around the clock and around the globe, a world full of people are pumping out contributions small and large: full-length features on Vimeo; video shorts on YouTube; comments on Blogger; discussions on Yahoo! Groups; and tagged-and-titled Del.icio.us bookmarks.”
Given the myriad contexts in which reputation is being used, it may be surprising to the reader that there is not yet any consensus on this field’s fundamental terminology. We therefore define reputation as “information used to make a value judgment about an object or a person.” But we acknowledge the special role of the reputation(s) of a person and therefore suggest a special term for this: karma. A good example of karma from real life is your credit score.

Reputation System Evolution and Challenges

Preweb Reputation Systems

Modern digital and web reputations are deeply rooted in pre-Internet social systems. In this essay and others in this volume, examples are drawn from real-world models like the credit scoring industry’s measures of creditworthiness and its flagship reputation score, FICO, which was created by Fair Isaac Corporation. There are many industries that rate their members and suppliers. Cities get ratings for their bonds. Products have received the Underwriters Laboratories (UL) or the Good Housekeeping seals of approval. Consumers Digest published five-star ratings long before anyone had access to computers in the home.

But not all real-world reputation systems have been successful or continued in the same form over time. Looking closely at these systems and how they have evolved may save the developers of digital reputation systems millions of dollars by identifying possible best practices and helping to avoid dead ends.

Web Reputation Systems

As recently as fifteen years ago, it was unlikely that you could gather enough information for a consensus about the quality of some obscure item from your coworkers, friends, and family. Your ability to benefit from reputation was limited by the scope of your social circle.
Now, thousands of strangers are publishing their opinions, which are aggregated into digital reputation. Want to buy an antique chess set? Look for a good seller feedback score on eBay. Figure out what movie to rent from Netflix? Five-star ratings and reviews will point you in the right direction. Want to find out what charities your friends are into? Look at their “likes” on Facebook.
On the surface, using digital reputation systems seems better and easier than tapping your social circle. However, there are some interesting (and even troublesome) side effects of reducing reputation to digital scores created from the input of strangers:
• Digital reputation scores are limited in the subtlety of their inputs. Only numbers go in, and those numbers usually come from simple user actions, such as “Marked an Item as Favorite.” But these actions don’t always accurately represent the sentiment that was intended, such as when marking something as a favorite is used for convenience to tag objectionable or novel items.
• Aggregated numerical scores capture only one dimension of a user’s opinions. For example, eBay’s seller feedback score was eventually enhanced with detailed seller ratings that reflect features of a transaction with scores called “Item as described,” “Communications,” “Shipping time,” and “Shipping and handling charges.”
• Another class of weaknesses in digital reputation is manipulation by stakeholders. A cadre of fraudulent New York electronics vendors reportedly used to invade Yahoo! Shopping every Christmas, building up a false good reputation for sales by cross-rating each other with five stars (Hawk 2005).
In order for digital reputation to come out of its early experimental period, we (that is, the community of web reputation designers and developers) need to acknowledge and address these technical, social, and potentially political issues. We need to work together to mitigate foreseeable risks based on historical lessons and to avoid a high-visibility economic or political event that will thrust these problems into the public domain, with people demanding immediate and perhaps ill-advised action.

Digital Reputation Is Not Classical Reputation

Social networks borrow and redefine the metaphor of connecting to a friend to create an edge in a digital graph of inter-person relationships. Unfortunately, the term “friend” online has lost its real-life subtlety: if you are active on Facebook, you will likely have many “friends” that you are connected to but that you wouldn’t think of as close friends in real life. The same kind of oversimplification occurs when we attempt to quantify real-world socially defined reputation.
In designing reputation systems, we generally use numerical scores (ratings) together with small snippets of text (evaluations). But compared to real-world reputation, these computational models are almost too simple to merit even using the same word. Real-life reputation is subtle and dynamic, with personal and cultural elements. Most of the time, it is inherently nonnumerical. On the other hand, a digital reputation’s representation of value is typically numerical, globally defined for all, simple to calculate, and often publicly displayed—clumsy, in comparison.
It is the utility provided by digital reputations that makes them so worthwhile. Software uses digital reputations to sort and filter the unmanageably huge morass of content on the web so that a human can make the final selection: where to eat, what to watch, from whom to buy. It is that last interaction—with a choice made by a thinking human being—that imbues the reputation score with its real power.
As reputation systems increase in real-world influence, the importance of who keeps, calculates, and displays the scores will become a greater societal discussion. Online reputation systems have been deployed for more than a decade, and many of the challenges that reputation systems will face are already known. By articulating these weaknesses, systems designers and researchers can address them head on, or at least enter the fray with their eyes wide open.

Karma Is Hard

Though digital reputations for objects are often a poor imitation of the social reputations they attempt to mimic, karma is even more challenging. We have previously detailed a dozen ways in which karma differs from object reputation (Farmer and Glass 2010b).
Incentives are one example. Because karma is reputation for a user, increasing one’s karma score is often considered a strong incentive for participation and contributions on a site. This incentive can create trade-offs between the goals of the site and the desire for the rewards granted with high karma. We can see evidence of this desire, for instance, when collusion or fake user accounts are used by individuals to increase their karma. A related design challenge is finding a trade-off between rewards for a high quantity of participation instances versus rewards for providing high quality of contributions.

As a harbinger of possible future karma dilemmas, consider the legal and business debates that have evolved around the use of credit scores, such as FICO, in the hiring process (as discussed later in this chapter and in chapter 6 of this volume). This practice has emerged because the classical job-reference reputation system has dried up as the result of litigation. When real-life reputation systems falter, it seems likely that more and more uses will be found for digital karma.

If Reputation Has Real-World Value, Then Methods Matter

The power of reputation systems to provide real, actionable information, crowdsourced from large numbers of people and clever algorithms, will only increase over time. Every top website is using reputation to improve its products and services, even if only internally to mitigate abuse. In short, reputation systems create real-world value. Value often translates to wealth, and where there is wealth, the methods for creating that wealth become optimized—often to the point of exploitation. Ever since people have trusted the written opinions of others, shady contractors have been generating false letters of recommendation to exploit the unwary.
The new wealth of digital reputation is created in large part by software. This software is usually written under extreme time pressure and without much design discipline, which leads to two challenges. First, reputation system bugs become corporate liabilities. Bad reputation model design or even simple coding errors can lead to inaccurate results that may significantly damage the value of an entity or business.
Second, reputation system abuse becomes lucrative. As a score increases in influence, the cost of manipulating that score can fall below the value the manipulation creates. Spammers and hackers learned this more than fifteen years ago, and current reputation systems are already under constant attack. SEO (search engine optimization) is already a billion-dollar business created solely to influence search ranking reputation. Likewise, public relations and marketing agencies may create buzz and social networking attention by creating façades of seemingly unbiased individuals.

Potential Solutions

Given the difficulties surrounding digital reputation and the problems of abuse and unreliable code, one wonders who would be brave enough to propose and build a next-generation reputation system. There are lessons from previous systems that can help eliminate, or at least mitigate, the risks.

Limit Reputation by Context (Especially Karma)

The FICO credit score is most appropriately used to represent the context named “creditworthiness” and is built exclusively out of inputs that are directly related to the borrowing and repayment of money. The nature of the input matches the nature of the output. We typically run into trouble when the score is used for noncreditworthiness tests, such as preemployment screening. In fact, according to the National Consumer Law Center, credit scores are an unreliable indicator of productivity when used for preemployment screening (H.R. 3149). The practice is also discriminatory, as certain minority groups statistically have lower FICO scores without necessarily being less productive. Likewise, a corporate executive’s excellent eBay buyer feedback has nothing to do with his or her credit score or with the number of experience points for his or her World of Warcraft character. Keep the contexts apart.

Tip: Don’t cross the streams. Good digital reputations should always be context-limited: the nature of the inputs should constrain the use of the reputation scores that are output.
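
One way to picture context-limited reputation is a store that keys every input by its context and refuses to answer for a context it has no inputs for. The sketch below is illustrative only: the class name, the context labels, and the choice of a simple average are assumptions made for this example, not a description of how FICO, eBay, or any other system computes its scores.

```python
from collections import defaultdict


class ContextualReputation:
    """Hypothetical store that keeps each score inside its own context."""

    def __init__(self):
        # (context, subject) -> list of numeric inputs; contexts never mix.
        self._inputs = defaultdict(list)

    def record(self, context, subject, value):
        """Add one input to the named context for this subject."""
        self._inputs[(context, subject)].append(value)

    def score(self, context, subject):
        """Average the inputs recorded for this subject in this context.

        Returns None when the context has no inputs, rather than
        borrowing a score from some other context.
        """
        values = self._inputs.get((context, subject))
        if not values:
            return None
        return sum(values) / len(values)


rep = ContextualReputation()
rep.record("seller_feedback", "alice", 1.0)
rep.record("seller_feedback", "alice", 0.5)
print(rep.score("seller_feedback", "alice"))   # 0.75
print(rep.score("creditworthiness", "alice"))  # None
```

Returning None for an unknown context, instead of falling back to some global number, is the design choice that keeps the streams from crossing.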

Focus on Positive Karma

Numerically scoring a person via karma can have a strong personal and emotional effect (see chapter 1 in this volume). Unlike object reputation, with karma it is a best practice to avoid publicly displayed negative scoring, direct evaluation of the person, and comparison of karma scores on leaderboards. Instead, build karma out of quality-related indirect inputs, such as scores for the person’s helpfulness or ratings of objects created by the person. Avoid direct rankings and comparisons, except for competitive contexts. Karma leaderboards can demoralize regular participants and discourage participation by new community members.
Though most object reputation is intended to be displayed publicly, some of the most useful karma scores are either corporate (used internally by the site operator to determine the best and worst users in a context) or private (displayed only to the user). Private karma can still be used as a very strong incentive: just ask any blogger how obsessively they check their blog’s visitor analytics.

Tip: Focus on multifaceted positive karma. Avoid direct rankings and comparison of people, except where competition is desired. Reserve negative karma for egregious cases of misconduct; even then, look for alternatives.

Focus on Quality over Quantity

When designing reputation systems for applications and sites, an easy method is often desired to motivate users to take some action. Usually that action is users creating content. The quickest model to build is in broad use across the web today: participation gives users points for taking actions—usually more points for actions that are more desirable. Sometimes the points are redeemable (a virtual currency); other times, they simply accumulate indefinitely. These points are often displayed as a karma score and are sometimes used in competitive forms such as leaderboards.

If these points are a currency or lead to special recognition such as privileged feature access, they often either (1) never catch on because of contextual irrelevance or (2) lead to abuse, as described earlier. Purely participation-based systems often end up in one of these two states, unless they continually evolve to become more relevant and mitigate abuse. A good example of this is when Digg abandoned its leaderboards because all of the top contributors were using bots and friends to manipulate their rank.


When digital karma increases real-world influence or monetary value, abuse will only accelerate. The reputation scores that have the most lasting and trusted real-world value are those based primarily on quality evaluations by other users.

Tip: The best web content karma scores are generated not from participation-based points but from quality evaluations of one’s contributions written by other users.

Quality evaluations provide increased protection against simple forms of reputation abuse, as it is more difficult to generate large volumes of fraudulent indirect feedback. eBay protects its seller feedback scores by limiting the scope of user ratings to a single sales transaction, not an overall evaluation of an ongoing history of business with a seller. Yelp is almost the opposite: the business may be evaluated historically, which makes it easier to manipulate. FICO’s creditworthiness score is also indirect karma: creditors report only specific transaction facts, not subjective and nonstandardized opinions of the consumer’s relationship overall. Tying karma to the quality evaluations of a person’s actions, instead of rating the person directly, makes the score more reliable and easier to interpret.
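
The difference between participation points and quality-based karma can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the 0.0 to 1.0 rating scale, and the minimum of three ratings before a contribution counts are not taken from any particular site.

```python
def participation_points(contributions):
    """Participation-based score: one point per contribution, quality ignored."""
    return len(contributions)


def quality_karma(contributions, min_ratings=3):
    """Indirect karma: mean rating over contributions with enough ratings.

    `contributions` is a list of lists; each inner list holds the ratings
    (0.0 to 1.0) that other users gave to one contribution. Contributions
    with fewer than `min_ratings` ratings are ignored.
    """
    rated = [r for r in contributions if len(r) >= min_ratings]
    if not rated:
        return None
    per_item = [sum(r) / len(r) for r in rated]
    return sum(per_item) / len(per_item)


# A prolific user with many poorly rated or unrated posts...
prolific = [[0.25, 0.25, 0.25]] * 10 + [[]] * 40
# ...and a careful user with two well-rated posts.
careful = [[1.0, 0.75, 1.0, 0.75], [0.75, 0.75, 0.75]]

print(participation_points(prolific), participation_points(careful))  # 50 2
print(quality_karma(prolific), quality_karma(careful))                # 0.25 0.8125
```

Under the points model the prolific user wins easily; under the quality model the careful user does, and posting more unrated items does nothing to change that.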

Mitigate Abuse through Metamoderation

As any reputation score increases in influence (especially if that influence has real-world side effects), the incentives to abuse the reputation model can grow to exceed the costs of manipulating the score, which leads to increased abuse and decreased utility value of the reputation score (and of the corresponding site or sites as a whole). The reputation score can eventually become untrustworthy.
If the community generating the content is sufficiently large—perhaps when moderation costs exceed a certain percentage of the operating budget, such as 5 percent—metamoderation reputation systems have been shown to be effective tools in cleaning up the worst content. The technical news site Slashdot uses such a metamoderation scheme to combat indirect input abuse by randomly selecting rated items for a second level of cross-check—in effect, “rating the raters” (as discussed in chapter 7).
In Farmer and Glass (2010a), we detail a Yahoo! Answers case study in which an internal reputation system allowed automatically determined trustworthy users to instantly remove content from the site. This approach effectively and completely shut down the worst trolls. Response time to remove content that violated terms of service (TOS) fell from eighteen hours (using paid moderation) to thirty seconds (with reputation-based metamoderation) and saved $1 million per year.

Tip: If users are creating and evaluating your mission-critical content, consider using reputation-based moderation and metamoderation techniques to enable your community to cross-check your content, identify the best content, and deal with abuse.
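
A minimal sketch of reputation-based moderation with a metamoderation cross-check follows. It is not the Yahoo! Answers or Slashdot implementation, neither of which is specified in this chapter; the class, the 0.9 trust threshold, the five-report minimum, and the sampling function are all assumptions chosen to show the general shape of the idea.

```python
import random


class ReporterKarma:
    """Hypothetical private karma: how reliable are a user's abuse reports?"""

    def __init__(self, trust_threshold=0.9, min_reports=5):
        self.trust_threshold = trust_threshold  # assumed cutoff
        self.min_reports = min_reports          # assumed minimum history
        self._confirmed = {}  # user -> reports upheld on review
        self._total = {}      # user -> reports reviewed

    def record_review(self, user, upheld):
        """Record the outcome of a second-level review of one report."""
        self._total[user] = self._total.get(user, 0) + 1
        if upheld:
            self._confirmed[user] = self._confirmed.get(user, 0) + 1

    def is_trusted(self, user):
        """Trusted means enough reviewed reports and a high upheld ratio."""
        total = self._total.get(user, 0)
        if total < self.min_reports:
            return False
        return self._confirmed.get(user, 0) / total >= self.trust_threshold


def handle_report(karma, user, item, hidden, review_queue):
    """Hide at once when the reporter is trusted; otherwise queue for review."""
    if karma.is_trusted(user):
        hidden.add(item)
    else:
        review_queue.append((user, item))


def sample_for_metamoderation(decisions, rate, rng):
    """Pick a random subset of past decisions for a cross-check ("rating the raters")."""
    return [d for d in decisions if rng.random() < rate]


karma = ReporterKarma()
for _ in range(5):
    karma.record_review("veteran", upheld=True)

hidden, queue = set(), []
handle_report(karma, "veteran", "post-1", hidden, queue)
handle_report(karma, "newcomer", "post-2", hidden, queue)
print(hidden)  # {'post-1'}
print(queue)   # [('newcomer', 'post-2')]
```

The outcomes of the sampled cross-checks would feed back into record_review, so a trusted reporter who starts making bad calls loses the instant-removal privilege.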

New Solutions, New Problems

Future reputation system designers will hopefully apply narrow context to their data, ensure that publicly displayed karma is generated based on quality of actions and contributions, and mitigate abuse through reputation mechanisms such as metamoderation. But whenever a new technology comes into prominence, techno-optimists emerge who see it as a possible solution to contemporary social ills. So it should come as little surprise that many people have high hopes for the socially transformative power of digital reputation systems.
Though exciting possibilities, these trends require critical evaluation through the filter of experience. Important new problems to address are also likely. Lessons taken from the social evolution of other real-life institutions may apply to digital reputation systems as they increase in real-world influence.

The Naming of Names

This chapter’s numerous admonitions to limit reputation to appropriate contexts raise the question: what are these contexts? Certainly reputation contexts within a single website can be narrowly and 100 percent locally defined. eBay seller feedback is a good example: its name describes its inputs and purpose adequately. But outside the web, the most influential reputations aren’t built solely out of single-supplier input.
What about cross-site, cross-company, and other globally shared contexts such as creditworthiness? How will data be shared across various boundaries? (See chapters 16 and 17 in this volume.) Who, if anyone, will manage the taxonomy of reputation contexts?
Looking at the credit score industry, we can see a proven model: the reputation system operator (such as Fair Isaac Corporation, which created the FICO scores) creates the context, defines the model, and specifies the data format and API for inputs in exchange for sharing the reputation results. The creditors supply credit history information for their customers in exchange for the right to access the creditworthiness reputation scores of all others, which they use to optimize their business practices.
Is this ad hoc method of identifying reputation contexts sufficient? What are the possible problems? If cross-domain reputation contexts aren’t standardized in some fashion, it seems likely that consumer confusion will result. We see some early evidence of this when comparing the five-star ratings of various product and service sites.

For example, why are the reviews on Netflix so different from Yahoo! Movies? It seems likely that the answer is that the context of those who wish to decide what movie to see at the theater for $20–$50 is significantly different from the context of selecting a DVD, delivered in a few days as part of a $9.99 all-you-can-eat watch-at-home subscription. Other factors may include selection bias and known patterns of abuse on movie sites (in which some ad agencies post fake positive reviews of first-run films). Even though the formats of the reviews are virtually identical, merging these two databases while ignoring their contexts would produce confusing results.
Nonetheless, on the week it was released, Facebook’s global “Like” system became the most visible cross-site reputation system on the web. We can expect more companies to follow their lead in producing interfaces for integration of reputation systems into applications of all kinds, from websites to social games and mobile phones.
The questions remain: should there be an effort to shape or identify the taxonomy of shared reputation contexts? Should there be a set of suggested practices or even requirements for calculations associated with the important real-world impact of reputation scores—e.g., a set of branded guidelines that build consumer trust in these models? In short, do we need a “Good Housekeeping Seal of Approval” for reputation systems?

Privacy and Regulation

As online reputation’s influence in the real world increases, we will face problems of privacy and regulation. These issues have to date generally been deferred with web reputation systems—perhaps because the inputs were single-sourced and under the umbrella legal shield of the site’s TOS. Generally, allowing users to completely opt out of a service by deleting their account is seen as sufficient privacy protection, and thus far has limited the potential real-world legal exposure of the host site. In short, the TOS defines the context for inputs and reputation, to which the user agrees, and walking away from the data is the escape clause.
But we are already seeing this perceived corporate protection eroding on sites such as eBay, where some sellers’ primary income is threatened by the actions of the hosting company on behalf of certain buyers (Cardingham 2008). Even if the company thinks it is protected by its TOS and “safe-harbor” provisions, most reputation systems depend on the actions of other users to create their scores. These three-party transactions are complicated and confusing, and get worse when users share data.
Once inputs and scores cross domains, information privacy crops up. It’s not enough to opt out of a data source; users need to get their data corrected at (or entirely removed from) the data aggregators. Think about a major dispute with a specific credit card company; perhaps a card was stolen and used to rack up a large sum. After the creditors’ laborious flagging process, aggregators must be notified of disputed items and must temporarily remove them from creditworthiness scores, but not all aggregators respond at the same rate or have the same policies. On the Internet, reputation spreads rapidly, and traditional simple binary notions of private versus public information can fail us. Existing legal codes do not grant the right to control the dissemination of this increasingly critical data; they need to be updated (Solove 2007).


One way to facilitate thoughtful government regulation of reputation systems is to establish an industry group to define best practices and conventions for managing the privacy and control of data shared in reputation systems. Industry self-policing may be in the best interests of all involved.

The Rise of Online Karma as Offline Power

As more reputation moves online, more karma systems will follow, providing users with incentives to make high-quality contributions to web content, to identify users that don’t comply with the rules, and to enhance contributors’ personal brand. Craig Newmark, founder of Craigslist.org, in the foreword to this book reflects the thinking of many reputation system optimists: the idea that public karma for users may partially displace other forms of political and economic influence.
This idea—that digital reputation, specifically karma, will increase to enable the intelligentsia to rise from the ashes and take their rightful place among the powerful—is appealing to technical people. The Platonic ideal of the Philosopher King has been with us for more than two millennia. Has its time finally come? Will reputation systems enable us to truly identify the people, products, and ideas that will best solve our problems, or even allow us to govern wisely? Science fiction authors have suggested this for many years (e.g., Card 1985; Stiegler 1999). How far can online reputation take us?
Conversations with proponents of this position suggest that “. . . we then just combine all the relevant reputations into an overall GoodCitizen or SmartPerson karma.” As inputs, they then suggest combining factors like credit score, large readership for social media messages and/or blog posts, or strong endorsements on LinkedIn.com.
The largest initial challenge of this model is that good reputation has limited context. Naïvely combining scores from diverse contexts makes the calculation about no context at all. Next, combining scores from multiple sources has the “weakest-link” problem: the security weaknesses or abuses of the least-safe contributor site damage the integrity of the karma as a whole.
Even if one could solve context and weakest-link problems, the basic problem remains that any global SmartPerson karma represents too simple a metric to be used to evaluate a person for a complex role. Such karma may represent traits like popularity or industriousness, but be insufficient to represent one’s capacity to lead a large group of fellow citizens. Modern democratic elections could be considered to be reputation systems with a binary result, isElected. This binary result is not very fine-grained karma and is highly correlated with the amount of money spent on a campaign.

As noted earlier, paying for higher digital reputation already happens with movie and business reviews. Likewise, individuals, businesses, and political parties try to purchase influence via SEO, advertising, and other methods.
If money can buy digital karma, the idea of karma displacing money as influence in politics is not realistic. What remains are different questions: can online karma (and object reputation) be a productive political force in real life? Can it improve the information we use to select our leaders, and bring more justice to our laws?
When reputation scores are limited to appropriately narrow contexts, they can serve an increasing role within those contexts. Being a creator or critical analyst of the web already plays an increasing role in world politics. Recognized web influencers (bloggers, CEOs, etc.) regularly appear before Congress and parliaments worldwide. California’s 2010 elections featured two high-tech CEOs nominated for senator and governor, and in 2009 Sweden’s Pirate Party won a seat in the European Parliament.
In the near term, especially given its contextual nature, it seems likely that digital karmas will have a political influence similar to that of traditional endorsements by interest groups such as trade associations and charitable foundations. Online, distributed versions of these organizations are already forming. For example, sellers on eBay have formed the Internet Merchants Association, and combined the leverage of their high seller feedback ratings with their aggregated funding to influence the company and related regulatory bodies.

Karma as Currency

What about the idea of converting karma into something you can spend, like money? A romantic notion: reputation given for good acts gets transformed into currency that you can later pay to others to reward their good acts.
Cory Doctorow, in his science fiction novella Down and Out in the Magic Kingdom (see chapter 18 in this volume), coined the term “Whuffie” to represent karma as a transferable currency: “Whuffie recaptured the true essence of money: in the old days, if you were broke but respected, you wouldn’t starve; contrariwise, if you were rich and hated, no sum could buy you security and peace” (Doctorow 2003, 10). A derivative of this idea has been implemented by the Whuffie Bank, which uses traffic at social media sites such as Twitter to model influence and create a score. Basically, if you get talked about, you gain Whuffie, and you transfer Whuffie when you talk about others. There is a separate mechanism to explicitly grant Whuffie to another user, presumably as a gift or a payment for some product or service rendered.
There are several problems with this model. First, it suffers from the same universal context problem previously defined, only worse: there is no clear way to reliably set an exchangeable numerical value based on user actions across contexts. If one decides to create a different currency for each context (say a science Whuffie, a sports Whuffie, and a political Whuffie), then an accounting and exchange nightmare is created. How much science Whuffie trades for a given amount of political Whuffie? Does the market float? It quickly becomes an untenable mess.


Second, this example turns popularity (something that is already reinforced on social networks, leaderboards, and search rankings) into a currency. By extension, the Whuffie Bank could make the pop star of the month the Philosopher King of the Internet. This problem is systemic to any use of karma as currency. It isn’t likely to export to real life because popularity and attention often aren’t accurate measures of intelligence, trustworthiness, political savvy, technical training, or anything else that might be useful.
Third, it is unclear whether and how reputation can fulfill the traditional currency function of being a “store of value.” As described by Doctorow, any loss of respect would cause a speedy corresponding loss of Whuffie, suggesting a continuous per-person currency revaluation. At Internet speeds, a single scandal could wipe you out in a matter of hours. Karma seems far too fragile to become a significant currency.
In short, Whuffie—global reputation as currency—crashes on the rocks of complexity. The universal context problem suggests that there can be only a few truly global currencies. However, there may be scope for experimenting with reputation as local currency.

Overcoming Challenges Together

Though many reputation contexts will be limited to a single vendor or site, some providers will want to combine scores across all available sources, such as IP address blacklists for email. (Chapter 17 refers to these contexts as constrained reputations and universal reputations, respectively.)
The taxonomic, privacy, and regulatory challenges facing future digital reputation systems have already been articulated in this volume. How do we minimize the duplicated effort in technology development, policy and taxonomy design, industry standards, reputation modeling, and user interface best practices?
Either we continue reputation system development ad hoc, letting large corporations establish single-vendor-centric de facto standards and practices for cross-context (“universal”) reputation, or we use another approach: open standards and open software.
Probably the greatest contributions to the adoption of HTTP and HTML as the backbone data standards for the web were two open source projects: the Mozilla web browser and the Apache web server. These applications provided the stable frameworks required for others to build a new class of Internet software. Though these applications weren’t bug-free, they were the focal point of the effort to produce a reliable system.
In an attempt to provide an open, stable, and common infrastructure for reputation systems development, I have started, with the help of many people, the Open Reputation Framework project. Our hope is to make the Open Reputation Framework a home for reputation platform implementations, freely released intellectual property, and resources for modeling reputation systems, including a toolkit based on the reputation grammar described in Farmer and Glass (2010a). Online forums at this site (or others like it), as well as broader societal discussions offline, are needed to foster the evolution of best practices for privacy, regulation, reputation taxonomy, and related issues. Open and well-lit places for discussion may be a prerequisite to guiding the development of reputation systems in a positive direction.

References

Card, O. S. 1985. Ender’s game. New York: Tor Books.

Cardingham, C. 2008, October 24. Man sued for leaving negative feedback on eBay. Retrieved from: <http://www.money.co.uk/article/1001771-man-sued-for-leaving-negative-feedback-on-ebay.htm>.

Doctorow, C. 2003. Down and out in the Magic Kingdom. New York: Tor Books.

Farmer, R., and B. Glass. 2010a. Building web reputation systems. Sebastopol, CA: O’Reilly.

Farmer, R., and B. Glass. 2010b. On karma: Top-line lessons on user reputation design [web log post]. Building reputation systems: The blog. Retrieved from: <http://buildingreputation.com/writings/2010/02/on_karma.html>.

Hawk, T. 2005, November 29. PriceRitePhoto: Abusive bait and switch camera store [web log post]. Thomas Hawk’s digital connection. Retrieved from: <http://thomashawk.com/2005/11/priceritephoto-abusive-bait-and-switch-camera-store.html>.

H.R. 3149. 2010. Equal Employment for All Act. 111th Cong. Retrieved from: <http://www.opencongress.org/bill/111-h3149/show>.

Solove, D. J. 2007. The future of reputation: Gossip, rumor, and privacy on the Internet. New Haven: Yale University Press.

Stiegler, M. 1999. Earthweb. Riverdale, NY: Baen.

March 19, 2014

LinkedIn's Scarlet Letter - Social Media Clarity Podcast

LinkedIn's Scarlet Letter - Episode 14

Marc, Scott, and Randy discuss LinkedIn's so-called SWAM (Site Wide Automatic Moderation) policy and Scott provides some tips on moderation system design...

[There is no news this week in order to dig a little deeper into the nature of moderating the commons (aka groups).]

Transcript

John Mark Troyer: Hi, this is John Mark Troyer from VMware, and I'm listening to the Social Media Clarity podcast.

Randy: Welcome to episode 14 of the Social Media Clarity podcast. I'm Randy Farmer.

Scott: I'm Scott Moore.

Marc: I'm Marc Smith.

Marc: Increasingly, we're living our lives on social-media platforms in the cloud, and in order to protect themselves, these services are deploying moderation systems, regulations, tools to control spammers and abusive language. These tools are important, but sometimes the design of these tools has unintended consequences. We're going to explore today some choices made by the people at LinkedIn in their Site Wide Automatic Moderation system known as SWAM. The details of this service are interesting, and they have remarkable consequences, so we're going to dig into it as an example of the kinds of choices and services that are already cropping up on all sorts of sites, but this one's particularly interesting because the consequence of losing access to LinkedIn could be quite serious. It's a very professional site.


Scott: SWAM is the unofficial acronym for Site Wide Automated Moderation, and it's been active on LinkedIn for about a year now. Its intent is to reduce spam and other kinds of harassment in LinkedIn groups. It's triggered by a group owner or a group moderator removing the member or blocking the member from the group. The impact that it has is that it becomes site wide. If somebody is blocked in one group, then they are put into what's called moderation in all groups. That means that your posts do not automatically show up when you post, but they go into a moderation queue and have to be approved before the rest of the group can see them.

Randy: Just so I'm clear, being flagged in one group means that none of your posts will appear in any other LinkedIn group without explicit approval from the moderator. Is that correct?

Scott: That's true. Without the explicit approval of the group that you're posting to, your posts will not be seen.
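The rule Scott describes can be made concrete with a small sketch. This is not LinkedIn's code; the `Member` class and both function names are hypothetical, and the sketch only models the behavior described in this conversation: a block in one group either puts the member into moderation everywhere, or stays local to the group that imposed it.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    groups: set[str]
    # Groups in which this member's posts must wait for moderator approval.
    moderated_in: set[str] = field(default_factory=set)


def block_site_wide(member: Member, group: str) -> None:
    """SWAM-style rule as described above: one block moderates the member in every group."""
    member.groups.discard(group)
    member.moderated_in = set(member.groups)


def block_locally(member: Member, group: str) -> None:
    """Alternative rule: the consequence stays inside the group that imposed it."""
    member.groups.discard(group)


def post_is_visible(member: Member, group: str) -> bool:
    """A post appears immediately only if the member belongs to the group and is not moderated there."""
    return group in member.groups and group not in member.moderated_in


alice = Member("alice", {"python", "ux", "hiring"})
block_site_wide(alice, "hiring")

bob = Member("bob", {"python", "ux", "hiring"})
block_locally(bob, "hiring")

print(post_is_visible(alice, "python"))  # False: moderated everywhere
print(post_is_visible(bob, "python"))    # True: only the hiring group is affected
```

The two functions differ by one line, which is the point: the site-wide effect is a design choice, not something the blocking mechanism requires.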

Randy: That's interesting. This reminds me of the Scarlet Letter from American Puritan history. When someone was accused of a crime, specifically adultery, they would be branded so that everyone could tell. Regardless of whether they were a party to the adultery or a victim, they were cast out. This creates the same kind of cast-out mechanism, but unlike then, when the branding was an explicit action that the whole community knew about, a moderator on a LinkedIn group could do this by accident.

Scott: From a Forbes article in February, someone related the story that they had joined a LinkedIn group that was for women, despite it having a male group owner and not explicitly stating that the group was for women only. The practice was that if men joined the group and posted, the owner would simply flag the post as a way of keeping it a women-only group. Because the rules were not clear and the expected behavior was not explicit, this person was put into moderation for making what was pretty much an honest mistake.

Randy: And this person was a member of multiple groups and now their posts would no longer automatically appear. In fact, there's no way to globally turn this off, to undo the damage that was done, so now we have a Scarlet Letter and a non-existent appeals process, and this is all presumably to prevent spam.

Scott: Yeah, supposedly.

Randy: So it has been a year. Has there been any response to the outcry? Have there been any changes?

Scott: Yes. It seems that LinkedIn is taking a review. They've made a few minor changes. The first notable one is that moderation is temporary, so it can last an undetermined amount of time up to a few weeks. The second one is that it seems that they've actually expanded how you can get flagged to include any post, contribution, or comment that is marked as spam or flagged as not being relevant to the group.

Randy: That's pretty amazing. First of all, shortening the time frame doesn't really do anything. You're still stuck with a Scarlet Letter, only it fades over months.

Marc: So there's a tension here. System administrators want to create code that essentially is a form of law. They want to legislate a certain kind of behavior, and they want to reduce the cost of people who violate that behavior, and that seems sensible. I think what we're exploring here is unintended consequences and the fact that the design of these systems seems to lack some of the features that previous physical world or legal relationships have had: that you get to know something about your accuser. You get to see some of the evidence against you. You get to appeal. All of these are expensive, and I note that LinkedIn will not tell you who or which group caused you to fall into the moderation status. They feel that there are privacy considerations there. It is a very different legal regime, and it's being imposed in code.

Randy: Yes. What's really a shame is they are trying to innovate here, where in fact there are best practices that avoid these problems. The first order of best practice is to evaluate content, not users. What they should be focusing on is spam detection and behavior modification. Banning or placing into moderation, what they're doing, does neither. It certainly catches a certain class of spammer, but, in fact, the spam itself gets caught by the reporting. Suspending someone automatically from the group they're in or putting them into auto-moderation for that group if they're a spammer should work fine.

Also, doing traffic analysis on this happening in multiple groups in a short period of time is a great way to identify a spammer and to deal with them, but what you don't need to do is involve volunteer moderators in cleaning up the exceptions. They can still get rid of the spammers without involving moderators handling the appeals because, in effect, there is an appeals process. You appeal to every single other group you're in, which is really absurd because you've not done anything wrong there - you may be a heavy contributor there. We've done this numerous places: I've mentioned before on the podcast my book Building Web Reputation Systems. Chapter 10 describes how we eliminated spam from Yahoo Groups without banning anyone.
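The traffic analysis Randy mentions could look something like the following sketch. It is an illustration only, not the Yahoo Groups implementation from the book: the class name, the one-hour window, and the three-group threshold are all assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # assumed look-back window
MIN_GROUPS = 3         # assumed number of distinct groups that signals a pattern


class SpamFlagTracker:
    """Tracks spam flags per user and reports flags from many groups in a short time."""

    def __init__(self, window=WINDOW_SECONDS, min_groups=MIN_GROUPS):
        self.window = window
        self.min_groups = min_groups
        self._flags = defaultdict(deque)  # user -> deque of (timestamp, group)

    def record_flag(self, user, group, timestamp):
        """Record a spam flag; return True if the user now matches the cross-group pattern."""
        flags = self._flags[user]
        flags.append((timestamp, group))
        # Drop flags that have aged out of the window.
        while flags and timestamp - flags[0][0] > self.window:
            flags.popleft()
        distinct_groups = {g for _, g in flags}
        return len(distinct_groups) >= self.min_groups


tracker = SpamFlagTracker()
burst = [tracker.record_flag("spammer", g, t) for g, t in [("a", 0), ("b", 600), ("c", 1200)]]
slow = [tracker.record_flag("regular", g, t) for g, t in [("a", 0), ("b", 10000)]]
print(burst)  # [False, False, True]
print(slow)   # [False, False]
```

A single flag in a single group never trips the detector, so one moderator's mistake cannot follow a member around the site. Only the service sees the aggregate signal.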

Marc: I would point us to the work of Elinor Ostrom, an economist and social theorist, who explored the ways that groups of people can manage each other's behavior without necessarily imposing draconian rules. Interestingly, she came up with eight basic rules for managing the commons, which I think is a good metaphor for what these LinkedIn discussion groups are.

  1. One is that there is a need to "Define clear group boundaries." You have to know who's in the group and who's not in the group. In this regard, services like LinkedIn work very well. It's very clear that you are either a member or not a member.
  2. Rule number two, "Match rules governing use of common goods to local needs and conditions." Well, we've just violated that one. What didn't get customized to each group is how they use the ban hammer. What I think is happening that comes up in the stories where you realize somebody has been caught in the gears of this mechanism is that people have different understandings of the meaning of the ban hammer. Some of them are just trying to sweep out what they think of as just old content, and what they've just done is smeared a dozen people with a tar that will follow them around LinkedIn.
  3. Three is that people should "Ensure that those affected by the rules can participate in modifying the rules." I agree that people have a culture in these groups, and they can modify the rules of that culture, but they aren't being given the options to tune how the mechanisms are applied and what the consequences of those mechanisms are. What if I want to apply the ban hammer and not have it ripple out to all the other groups you're a member of?

    Randy: Well, and that's section four.

  4. Marc: Indeed, which reads, "Make sure the rule-making rights of community members are respected by outside authorities." There should be a kind of federal system in which group managers and group members choose which set of rules they want to live under, but interestingly,
  5. number five really speaks to the issue at hand. "Develop a system carried out by community members for monitoring members' behavior."

    Randy: I would even refine that a little bit online, which is to not only monitor, but to help shape members' behavior so that people are helping people conform to their community.

  6. Marc: Indeed, because this really ties into the next one, which may be the real problem here at the core. "Use graduated sanctions for rule violators." That seems not to be in effect here with the LinkedIn system. You can make a small mistake in one place and essentially have the maximal penalty applied to you. I'm going to suggest that number seven also underscores your larger theme, which is about shaping behavior rather than canceling out behavior.
  7. Number seven is, "Provide accessible low-cost means for dispute resolution", which is to say bring the violators back into the fold. Don't just lock them up and shun them.

    Randy: Specifically on dispute resolution, which includes an appeals process, for Yahoo Answers, we implemented one which was almost 100% reliable in discovering who a spammer was. If someone had a post hidden, an email would be sent to the registered email address saying, "Your post has been hidden," and takes you through the process for potentially appealing. Now, what was interesting is if the email arrived at a real human being, it was an opportunity to help them improve their behavior. If they could edit, they could repost.

    For example, this is what we do at Discourse.org if you get one of these warnings. You are actually allowed to edit the offensive post and repost it with no penalties. The idea is to improve the quality of the interaction. It turns out that all spammers, to a first approximation on Yahoo Answers, had bogus email addresses, so the appeal would never be processed and the object would stay hidden.

  8. Well, I'm going to do number eight, and eight says, "Build responsibility for governing the common resource in nested tiers from the lowest level up to the entire interconnected system." It doesn't say let the entire interconnected system have one rule that binds them all.

    Randy: And it also says from the bottom up. I actually approve of users marking postings as spam and having that content hidden and moving some reputation around. Where we run into trouble is when that signal is amplified by moving it up the interconnected system and then re-propagated across the system. The only people who have to know whether or not someone's a spammer is the company LinkedIn. No other moderator needs to know. Either the content is good or it's not good.
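The appeal flow Randy describes under rule seven (hide the post, email the author, let a real person edit and repost) can be sketched as a two-state model. Everything named here is hypothetical: `Post`, `hide_and_notify`, and `appeal_by_editing` are illustrative, and the `outbox` list stands in for a real mail system.

```python
from dataclasses import dataclass
from enum import Enum


class PostState(Enum):
    VISIBLE = "visible"
    HIDDEN = "hidden"


@dataclass
class Post:
    author_email: str
    body: str
    state: PostState = PostState.VISIBLE


def hide_and_notify(post, outbox):
    """Hide a reported post and queue a notice to the author's registered address."""
    post.state = PostState.HIDDEN
    outbox.append((post.author_email, "Your post has been hidden. You may edit it and repost."))


def appeal_by_editing(post, new_body):
    """The author answers the notice by editing; the post returns with no penalty."""
    if post.state is PostState.HIDDEN:
        post.body = new_body
        post.state = PostState.VISIBLE


outbox = []
honest = Post("member@example.com", "off-topic rant")
spam = Post("bogus@invalid.example", "buy cheap watches")

hide_and_notify(honest, outbox)
hide_and_notify(spam, outbox)

# Only the author with a working mailbox ever sees the notice and responds.
appeal_by_editing(honest, "a question that fits the group")

print(honest.state.value)  # visible
print(spam.state.value)    # hidden
```

No step here decides who is a spammer. The post with no reachable author simply stays hidden, which matches the outcome described for Yahoo Answers.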

Marc: Elinor Ostrom's work is really exciting, and she certainly deserved the Nobel Prize for it because she really is the empirical answer to that belief that anything that is owned by all is valued by none. That's a phrase that leads people to dismiss the idea of a commons, to believe that it's not possible to ethically and efficiently steward something that's actually open, public, a common resource, and of course, the internet is filled with these common resources. Wikipedia is a common resource. A message board is a common resource.

Like the commonses that Ostrom studied, a lot of them are subject to abuse, but what Ostrom found was that there were institutions that made certain kinds of commons relationships more resilient in the face of abuse, and she enumerated eight of them. I think the real message is that, given an opportunity, people can collectively manage valuable resources and give themselves better resources as a result by effectively managing the inevitable deviance, the marginal cases where people are trying to make trouble, but most people are good.
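Of the eight rules, number six (graduated sanctions) translates most directly into code. The sketch below is one possible reading, not something taken from Ostrom or from any platform: the four-step ladder and the `GroupSanctions` class are assumptions, and violations are counted per group so that a mistake in one group never escalates a penalty in another.

```python
from collections import Counter

# Hypothetical ladder, mildest first; each step applies only within one group.
SANCTIONS = ["warn", "hide_post", "moderate_in_group", "remove_from_group"]


class GroupSanctions:
    def __init__(self):
        self._violations = Counter()  # (group, member) -> violation count

    def record_violation(self, group, member):
        """Return the sanction for this violation; repeat offenses in the same group escalate."""
        self._violations[(group, member)] += 1
        step = min(self._violations[(group, member)], len(SANCTIONS)) - 1
        return SANCTIONS[step]


sanctions = GroupSanctions()
first = sanctions.record_violation("hiring", "alice")
second = sanctions.record_violation("hiring", "alice")
elsewhere = sanctions.record_violation("python", "alice")
print(first, second, elsewhere)  # warn hide_post warn
```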


Scott: Your tips for this episode are aimed at community designers and developers who are building platforms that allow users to form their own groups.

  1. First, push the power down - empower local control and keep the consequences local.
  2. Give group owners the freedom to establish and enforce their own rules for civil discourse.
  3. You will still be able to keep content and behavior within your service's overall terms of use and allow a diversity of culture within different groups.
  4. If, as a service, you detect broader patterns of (content or user) behavior, you can take additional action. But respect that different groups may prefer different behaviors, so be careful to not allow one or even a small set of groups dictate consequences that impact all other groups.
  5. Now that we are giving local control, be sure to allow groups to moderate content separately from moderating members.
  6. Good members sometimes misstep and make bad posts, especially if they are new to a group.
  7. Punishing someone outright can cost communities future valuable members.
  8. By separating content from members, the offending content can be dealt with and the member helped to fit the local norms.
  9. Ask community managers and you will hear stories of a member who started off on the wrong foot and eventually became a valued member of their community. This is common. Help group moderators avoid punishing people who make honest mistakes.
  10. When it comes to dispute resolution between members and group moderators, one way to make it easy is to mitigate the potential dispute in the first place.
  11. Make it easy for moderators to set behavior expectations by posting their local rules and guidelines and build in space in your design where local rules can be easily accessed by members.
  12. Also give group owners the option of requiring an agreement to the local rules before a member is allowed to join the group.
  13. Make it easy to contact moderators before a member posts, and encourage members to ask about posts before even posting.
  14. If the group platform offers a moderation queue, give clear notifications to moderators about pending posts so reviewing the queue is easier to include in their workflow. Moderating communities does have a workflow.
  15. And finally, build a community of group owners and moderators -- and LISTEN to them as they make recommendations and request tools that help them foster their own local communities. The more you help them build successful communities, the more successful your service or platform will be.
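Several of these tips describe a data model: local rules, optional agreement before joining, content moderated separately from members, and a visible moderation queue. The sketch below shows one way that model might look. The `Group` class and all of its fields and methods are hypothetical names chosen for illustration, not any platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    rules: str = ""                  # local rules, shown to members
    require_agreement: bool = False  # owner's option: agree to the rules before joining
    members: set[str] = field(default_factory=set)
    hidden_posts: set[int] = field(default_factory=set)
    pending: list[int] = field(default_factory=list)  # moderation queue of post ids

    def join(self, user: str, agreed_to_rules: bool = False) -> bool:
        """Admit the user unless this group requires agreement and none was given."""
        if self.require_agreement and not agreed_to_rules:
            return False
        self.members.add(user)
        return True

    def hide_post(self, post_id: int) -> None:
        """Moderate content only; the author's membership is untouched."""
        self.hidden_posts.add(post_id)

    def remove_member(self, user: str) -> None:
        """Moderate the member, in this group only."""
        self.members.discard(user)

    def queue_notice(self) -> str:
        """Summary a moderator would see for posts awaiting review."""
        return f"{len(self.pending)} post(s) awaiting review in {self.name}"


women_in_tech = Group("Women in Tech", rules="Members are women in tech.", require_agreement=True)
print(women_in_tech.join("dave"))                       # False: has not agreed to the rules
print(women_in_tech.join("erin", agreed_to_rules=True)) # True

women_in_tech.hide_post(42)
print("erin" in women_in_tech.members)                  # True: hiding a post is not a ban
```

Because every action is a method on one `Group`, nothing a moderator does here can reach into another group. Site-wide action stays with the service.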

Randy: That was a great discussion. We'd like the people at LinkedIn to know that we're all available as consultants if you need help with any of these problems.

Marc: Yeah, we'll fix that for you.

Randy: We'll sign off for now. Catch you guys later. Bye.

Scott: Good-bye.

Marc: Bye-bye.

[Please make comments over on the podcast's episode page.]

February 21, 2014

Five Questions for Selecting an Online Community Platform

Today, we're proud to announce a project that's been in the works for a while: A collaboration with Community Pioneer F. Randall Farmer to produce this exclusive white paper - "Five Questions for Selecting an Online Community Platform." 
Randy is co-host of the Social Media Clarity podcast, a prolific social media innovator, and literally co-wrote the book on Building Web Reputation Systems. We were very excited to bring him on board for this much needed project. While there are numerous books, blogs, and white papers out there to help Community Managers grow and manage their communities, there's no true guide to how to pick the right kind of platform for your community. In this white paper, Randy has developed five key questions that can help determine what platform suits your community best. This platform agnostic guide covers top level content permissions, contributor identity, community size, costs, and infrastructure. It truly is the first guide of its kind and we're delighted to share it with you.
Go to the Cultivating Community post to get the paper.

October 01, 2013

Social Networks, Identity, Pseudonyms, & Influence Podcast Episodes

Here are the first 4 episodes of The Social Media Clarity Podcast:

  1. Social Network: What is it, and where do I get one? (mp3) 26 Aug 2013
  2. HuffPo, Identity, and Abuse (mp3) 5 Sep 2013
  3. Save our Pseudonyms! (Guest: Dr. Bernie Hogan) (mp3) 16 Sep 2013
  4. Influence is a Graph (mp3) 30 Sep 2013
Subscribe via iTunes

Subscribe via RSS

Listen on Stitcher

Like us on Facebook

August 26, 2013

Follow Us Over to the Social Media Clarity Podcast

We're gettin' the band back together! Your friendly BWRS authors are reunited on a brand new podcast, aimed at designers, product managers and producers of social platforms and products.

Social Media Clarity will be a regular podcast: "15 minutes of concentrated analysis and advice about social media in platform and product design." Joining us is Marc Smith.

We're all really pleased with how the first episode has turned out. We discuss:

  • Rumors that FB will soon start throttling OpenGraph and API usage for third parties
  • A round-table discussion: does my product need its own social networking capabilities?
  • A practical tip at the end, an intro to NodeXL

Check it out, won't you? You can subscribe to the series (soon) through iTunes, or now at socialmediaclarity.net.

January 24, 2011

A Review for programmers

A review aimed at engineers just went up over at i-programmer.info

Building Web Reputation Systems
Author: Randy Farmer and Bryce Glass
Publisher: O'Reilly, 2010
Pages: 336
ISBN: 978-0596159795
Aimed at: Web designers and developers who want to incorporate feedback
Rating: 4
Pros: Valuable advice based on real experience
Cons: Could be improved by a different order of chapters
Reviewed by: Lucy Black

...The book concludes with a real-life case study based on Yahoo! Answers Community Content Moderation. This makes interesting reading and gives a context for what has gone before. It left me wondering whether I might have got more from the rest of the book had I read it first - but of course with this type of book you won't just read once and set aside. You'll refer to it for help as the need arises - and there is an index that will help you locate specific information.

At the end of the day I realised I'd gleaned a lot of useful and practical advice but it would have been an easier experience with just a little reorganisation of the material.

January 13, 2011

New Book Review of Building Web Reputation Systems

Architecture, SOA, BPM, EAI, Cloud has a review of Building Web Reputation Systems...

"...Book is light read but certainly deserve an attentive read and particularly from product designers and who ever involved in product conceptualization..."

It also contains a great set of book related links...

November 16, 2010

Quora: What lessons of Social Web do you wish had been better integrated into Yahoo?

On Quora, an anonymous user asked me the following question:

In hindsight, what lessons have you learned from the Social Web that you wish you had been more successful at integrating into Yahoo before you were let go?

I considered this question at length when composing this reply - this is probably the most thought-provoking question I've been asked to publicly address in months.

If you read any of my blog posts (or my recent book), you already know that I've got a lot of opinions about how the Social Web works: I rant often about identity, reputation, karma, community management, social application design, and business models.

I did these same things during my time for and at Yahoo!

We invented/improved user-status sharing (what later became known as Facebook Newsfeeds) when we created Yahoo! 360° [Despite Facebook's recently granted patent, we have prior art in the form of an earlier patent application and the evidence of an earlier public implementation.]

But 360 was prematurely abandoned in favor of a doomed-from-the-start experiment called Yahoo!Mash. It failed out of the gate because the idea was driven not by research, but personality. But we had hope in the form of the Yahoo! Open Strategy, which promised a new profile full of social media features, deeply integrated with other social sites from the very beginning. After a year of development - Surprise! - Yahoo! flubbed that implementation as well. In four attempts (Profiles, 360, Mash, YOS) they'd only had one marginal success (360), which they sabotaged several times by telling users over and over that the service was being shut down and replaced with inferior functionality. Game over for profiles.

We created a reputation platform and deployed successful reputation models in various places on Yahoo! to decrease operational costs and to identify the best content for search results and to be featured on property home pages [See: The Building Web Reputation Systems Wiki and search for Yahoo to read more.]

The process of integrating with the reputation platform required product management support, but almost immediately after my departure the platform was shipped off to Bangalore to be sunsetted. Ironically, since then the folks at Yahoo! are thinking about building a new reputation platform - since reputation is obviously important, and everyone from the original team has either left, been laid off, or moved on to other teams. Again, this will be the fourth implementation of a reputation platform...

Are you sensing a pattern yet?

Then there's identity. The tripartite identity model I've blogged about was developed while at Yahoo! in an attempt to explain why it is brain-dead to ask users to reveal their IM name, their email address, and half their login credentials to spammers in order to leave a review of a hotel.

Again we built a massively scalable identity service platform to allow users to be seen as their nickname, age, and location instead of their YID. And again, Yahoo! failed to deploy properly. Despite a cross-company VP-level mandate, each individual business unit silo dragged their heels in doing the (non-trivial, but important and relatively easy) work of integrating the platform. Those BUs knew the truth of Yahoo! - if you delay long enough, any platform change will lose its support when the driving folks leave or are reassigned. So - most properties on Yahoo! are still displaying YIDs and getting up to 90% fewer user contributions as a result.
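The core idea, showing a chosen public identity instead of the login ID, fits in a few lines. This is only an illustration of that separation, not the Yahoo! identity platform; the `Account` fields and `public_byline` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    login_id: str  # private: used to sign in, and often doubles as email and IM handle
    nickname: str  # public fields the user chose for display
    age: int
    location: str


def public_byline(account: Account) -> str:
    """What other users see next to a review; the login ID never appears."""
    return f"{account.nickname}, {account.age}, {account.location}"


reviewer = Account(login_id="jdoe1970", nickname="RoadWarrior", age=40, location="Palo Alto, CA")
print(public_byline(reviewer))  # RoadWarrior, 40, Palo Alto, CA
```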

That's what I learned: Yahoo! can't innovate in Social Media. It has a long history in this, from Yahoo! Groups, which during my tenure had three separate web 2.0 re-designs, with each tossed on the floor in favor of cheap and easy (and useless) integrations (like with Yahoo! Answers) to Flickr, Upcoming, and Delicious. I'm sad to say, Yahoo! seems incapable of reprogramming its DNA, despite regular infusions of new blood. Each attempt ends in either an immune-response (Flickr has its own offices, and a fairly well known disdain for Sunnyvale) or assimilation and decreasing relevance (HotJobs, Personals, Groups, etc.).

So, in the end, I find I can't answer the question. I was one of many people who tried to drive home the lessons of the social web for the entire time I was there. YOS (which I helped spec in fall 2007) was the last attempt to reshape the company to be social through and through. But, it was a lost cause - the very structure of the environment is personality driven. When those personalities leave, their projects immediately get transferred to Bangalore for end-of-life support, just as much of YOS has been...

I don't know what Yahoo! is anymore, but I know it isn't inventing the future of social anything.

[As I sat through this year's F8 developers conference and listened to Mark Z describe 95% of the YOS design, almost 3 years later, I knew I'd have to write this missive one day. So thanks for the prodding, Anonymous @ Quora]

Randy Farmer
Social Media Consultant, MSB Associates
Former Community Strategy Analyst for Yahoo!

[Please direct comments to Quora]

October 12, 2010

First! Randy to be the kickoff guest for new Community Chat podcast series.

Bill Johnston and Thomas Knolls are launching a new live podcast series: Community Chat on talkshoe.

I am so honored to be the lead-off guest on their inaugural episode (Wednesday 10-13-10):



The kickoff episode of Community Chat! [We] will be discussing the premise of the Community Chat podcast with special guest Randy Farmer. Will also be getting a preview of Blog World Expo from Chuck Hemann.

I'll be talking with them about online community issues developers and operators all share in common - well, as much as I can in 10 minutes. :-) Click on the widget above to go there - it will be recorded for those who missed it live...

UPDATE: The widget now has an option to play back the session. Just choose "Kickoff" and press play. :-)

September 29, 2010

BWRS on Kindle Web - Try before you buy!

You can now read the Kindle edition of Building Web Reputation Systems on the web (search, print, etc.) and it is much cheaper than the paper version. Here's the free sample: