chapter3

Execution environments for reputation (SUMMARY)

Static vs. Dynamic reputation models

There are significant trade-offs between performance and accuracy when deciding how to record, calculate, store, and retrieve reputation events and scores. Some static models have scores that need to be continuous, that is, as current as possible (such as spammer reputation for industrial-scale email providers). Others can be calculated in batch mode, because a large amount of data is consulted for each score calculation.
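To make the continuous versus batch distinction concrete, here is a minimal sketch using a simple average rating. The names `RunningAverage` and `batch_average` are hypothetical and are not taken from any particular platform; a real model would usually be more involved than an average.

```python
from dataclasses import dataclass


@dataclass
class RunningAverage:
    """Continuously maintained score: each event updates it in constant time."""
    total: float = 0.0
    count: int = 0

    def record(self, rating: float) -> None:
        # Fold the new event into the stored aggregate right away.
        self.total += rating
        self.count += 1

    @property
    def score(self) -> float:
        return self.total / self.count if self.count else 0.0


def batch_average(ratings: list[float]) -> float:
    """Batch-mode score: re-reads the whole event history on every run."""
    return sum(ratings) / len(ratings) if ratings else 0.0


# Hypothetical rating events for a single target.
events = [4.0, 5.0, 3.0]

live = RunningAverage()
for rating in events:
    live.record(rating)
```

Both approaches arrive at the same score for the same events. The difference is when the work happens: the running version pays a small cost on every event and is always current, while the batch version pays a larger cost per run and is only as fresh as its last run.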

Dynamic models come in two forms. In some, the context is variable: the data considered for each calculation is constrained differently on every iteration (as seen in various Facebook apps, such as Zynga's popular Texas HoldEm Poker, which have a friends-only leaderboard). In others, the calculations affect each other in a non-linear way, as in search relevance calculations like Google's PageRank. Recommender systems, mentioned in the last chapter, are also dynamic models: typically a large portion of the data set is considered in order to place every element in a multi-dimensional space for nearest-neighbor determination (“People like you also bought…”).
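The variable-context case can be sketched with a friends-only leaderboard. This is an illustration only: the player names, scores, and the functions `leaderboard` and `friends_leaderboard` are assumptions made for the example and do not describe how any real game implements its leaderboard.

```python
def leaderboard(scores, limit=10):
    """Global rollup: one ranking shared by every viewer."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def friends_leaderboard(scores, friends, viewer, limit=10):
    """Dynamic rollup: the input set is constrained per viewer."""
    # The viewer sees themselves plus whoever is in their friend set.
    visible = friends.get(viewer, set()) | {viewer}
    constrained = {player: s for player, s in scores.items() if player in visible}
    return leaderboard(constrained, limit)


# Hypothetical data: chip counts and one-directional friend sets.
scores = {"ann": 900, "bob": 1500, "cat": 1200, "dan": 300}
friends = {"ann": {"cat", "dan"}, "bob": {"cat"}}

global_view = leaderboard(scores)
ann_view = friends_leaderboard(scores, friends, "ann")
bob_view = friends_leaderboard(scores, friends, "bob")
```

The global ranking can be computed once and served to everyone. The friends-only ranking has to be filtered and sorted separately for each viewer, which is where the extra compute and data cost of dynamic models comes from.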

Static: The Yahoo! Reputation Platform

Yahoo! built a unified Reputation Platform for executing all of its new static reputation models. It was highly scalable, asynchronous, and “optimistic” (it never waited for any computation to complete or for any long database read). The data was normalized, so it could be shared between different Yahoo! properties (such as Travel and Local, which shared hotel and restaurant reviews).
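One way to picture the asynchronous, “optimistic” style is a store where writers queue events and return immediately, and readers get whatever score has already been computed. The sketch below is an assumption-laden illustration, not the Yahoo! platform's actual design: the class `OptimisticScoreStore`, its methods, and the in-memory queue are all hypothetical.

```python
import queue
import threading


class OptimisticScoreStore:
    """Hypothetical store: writes are queued, reads return the latest known score."""

    def __init__(self):
        self._events = queue.Queue()
        self._scores = {}
        self._lock = threading.Lock()
        # A background worker applies queued events to the scores.
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, target, delta):
        """Queue an event and return at once, without waiting for the rollup."""
        self._events.put((target, delta))

    def read(self, target):
        """Return the score computed so far; pending events are not waited for."""
        with self._lock:
            return self._scores.get(target, 0)

    def _drain(self):
        while True:
            target, delta = self._events.get()
            with self._lock:
                self._scores[target] = self._scores.get(target, 0) + delta
            self._events.task_done()

    def flush(self):
        """Block until all queued events are applied (for demos and tests only)."""
        self._events.join()


store = OptimisticScoreStore()
store.submit("203.0.113.7", 1)  # e.g. one spam report against an example IP address
store.submit("203.0.113.7", 1)
store.flush()
```

In this sketch a reader may briefly see a score that lags the most recent events. That is the trade the optimistic approach makes: callers are never blocked on computation, at the cost of reads that are only eventually up to date.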

This design meant that accurate, up-to-the-second data was available to any reading application at very high rates: high enough to check, in real time, the spammer reputation of the IP address for every email message entering yahoo.com.

This chapter will describe the operating parameters of such an environment, for those who will need to build their own. Many of the models presented here were actually developed for and proven on this platform. Several are currently operational, and the lessons learned are covered in some detail.

Dynamic: Reputation Within Social Networks

The downside of the Yahoo! Reputation Platform's static design is that it didn't inherently support reputation limited to a social network, such as “most popular amongst your friends.” All rollups were global across a context, because dynamic systems are generally much more compute- and data-intensive, which makes them slow.

A concrete example of this class of scaling problem is the social media site Twitter.com, a highly popular but unreliable messaging service that failed often in its first year because of how difficult it is to scale a database to support a custom view of an entire event corpus for every user.

When possible, find ways to simplify your model, either by adding more specific contexts or by reducing dimensions through some clever math. This book won't cover the many forms of dynamic systems in any depth, as many of the algorithms are well covered in academic literature. See the back matter for pointers to various document archives.
