Brian Fishman and Lucia Stacey Harris

Measuring Trust and Safety

Measuring the success (and limitations) of a Trust & Safety program is a complex process, particularly because many Trust & Safety departments are considered cost centers. As a result, these teams are ripe for organizational scrutiny. 

In addition to organizational scrutiny, larger platforms must also contend with the court of public opinion. Increased corporate investment in Trust & Safety is regularly catalyzed not simply by internal champions, but by members of the general public, activist groups, politicians, and journalists who anecdotally identify and publicize salacious content, creating brand risk. Such pressure is a double-edged sword for Trust & Safety leaders.

On the one hand, any factor that drives resources may be valuable. On the other, anecdote-driven pressure often leads to inefficient and ineffective investments. It also makes measuring departmental “success” extremely difficult, because Trust & Safety is inevitably imperfect and social listening metrics are often influenced by variables outside of a Trust & Safety team’s control.

Regulators are trying to fill the gaps created by anecdotal assessment of platform Trust & Safety, but many details remain ill-defined, even for complex regimes like the Digital Services Act (DSA). At Cinder, we fundamentally think that regulatory compliance is necessary for companies, but it’s insufficient for building excellent Trust & Safety efforts. We invert the thinking: if you build a first-class Trust & Safety infrastructure, compliance falls out of it relatively easily. 

So, what and how should companies measure to evaluate Trust & Safety programs? And, if they can’t measure everything (which they often cannot), where should they start? The answer to that question likely varies by company. For our purposes here we’re going to focus on the value good measurement offers a Trust & Safety leader pushing for increased resources. We break down Trust & Safety metrics into four categories: Risk, Outputs, Efficiency, and Business Outcomes (ROEBO). Leaders at different levels of a company’s Trust & Safety infrastructure are likely to care about these measures for varying reasons and with different levels of intensity. But all are important for positioning a Trust & Safety team to protect or increase the resources required to keep users safe. 

Measuring Risk

All platforms are subject to abuse; in this context, measuring risk is an assessment of how much that harm is manifesting. 

Trust & Safety risk manifests in various ways. For the sake of simplicity, let's call those categories ‘general,’ ‘acute,’ and ‘business.’ 

  • General risk refers to the overall level of violative material (content that breaks a platform’s rules) on your platform
  • Acute risk refers to the danger of truly terrible real-world harm being facilitated on your platform
  • Business risk refers to compliance failures, bad PR, and advertiser churn

These risks are not the same. 

In Trust & Safety, the gold standard for measuring general risk is ‘prevalence.’ Prevalence is an estimate, built by sampling material on the platform, of the amount and pervasiveness of violative material. Sometimes platforms estimate prevalence as a share of the content itself and in other cases as a share of how often users actually view that material. The latter is trickier but better accounts for efforts to downrank low-quality content or material deemed likely to violate. There are innumerable nuances in measuring prevalence, but the most basic takeaway is that it requires labeling a random sample of platform content that is large enough to support a statistically meaningful estimate, rather than relying on the count of enforcement decisions themselves.
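As a rough illustration of the sampling idea, the Python sketch below draws a sample, takes human labels for it, and reports an estimate with an uncertainty range. Everything in it is an assumption made for illustration: the function names, the item fields `views` and `violating`, and the choice of a Wilson score interval (one common way to put a range around a proportion, not a method the approach above prescribes).

```python
import math
import random


def sample_by_content(items, k, rng):
    """Uniform sample without replacement: estimates the share of content that violates."""
    return rng.sample(items, k)


def sample_by_views(items, k, rng):
    """View-weighted sample with replacement: estimates the share of views
    that land on violating content."""
    weights = [item["views"] for item in items]
    return rng.choices(items, weights=weights, k=k)


def estimate_prevalence(labels, z=1.96):
    """Return (estimate, low, high) for the violating share of a labeled sample.

    labels: booleans, True where a reviewer labeled the sampled item as violating.
    z: normal quantile; 1.96 corresponds to roughly 95% confidence.
    The range is a Wilson score interval.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labeled item")
    p = sum(labels) / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic platform content; in practice the "violating" label comes from
    # human review of the sample, not from a stored field.
    items = [
        {"views": rng.randint(1, 1000), "violating": rng.random() < 0.02}
        for _ in range(50_000)
    ]
    for name, sampler in [("content", sample_by_content), ("views", sample_by_views)]:
        sample = sampler(items, 2_000, rng)
        p, low, high = estimate_prevalence([item["violating"] for item in sample])
        print(f"{name}-based prevalence: {p:.3%} (range {low:.3%} to {high:.3%})")
```

The two samplers answer different questions, which is why the content-based and view-based figures can diverge when downranking pushes violating material out of sight.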

Measuring acute risk is extremely difficult because such harms are usually very rare. In fact, they are so rare, as a percentage of all content on a platform, that they may not show up at all in random samples of digital content, even very large ones. Additionally, acute risk often manifests in private spaces, so external arbiters such as activists and the media are generally a poor proxy for surfacing this kind of risk. When they do identify such risks, those instances are inherently anecdotal. The bottom line is that there are no strong leading measures for assessing acute risk. Sometimes you can look at references to your platform elsewhere online to estimate acute risk, and you can look to past incidents as a proxy for future challenges. But general risk does not always correlate with acute risk, so part of what Trust & Safety leaders must do is prepare to respond quickly to mitigate the severity of an acute incident that does occur.
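A back-of-the-envelope calculation shows why random sampling struggles here. The sketch below assumes each sampled item independently has the same small chance of being an acute-harm item; the rate of one in a million is purely hypothetical, as are the function names.

```python
import math


def prob_sample_misses(rate, sample_size):
    """Probability that a random sample contains zero items of a rare type,
    assuming each item is independently of that type with probability `rate`."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be between 0 and 1")
    return math.exp(sample_size * math.log1p(-rate))


def sample_size_to_see_one(rate, confidence=0.95):
    """Smallest sample size that contains at least one such item
    with the given probability, under the same independence assumption."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-rate))


if __name__ == "__main__":
    rate = 1e-6  # hypothetical: one acute-harm item per million pieces of content
    print(f"Chance a 100,000-item sample sees nothing: {prob_sample_misses(rate, 100_000):.1%}")
    print(f"Items to label for a 95% chance of seeing one: {sample_size_to_see_one(rate):,}")
```

At that hypothetical rate, a sample of one hundred thousand labeled items most likely contains no acute-harm content at all, and roughly three million items would need labeling for a 95% chance of seeing even one.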

Business risk refers to problems that negatively impact the overall business. This includes a wide range of dangers: chargebacks that reduce revenue, regulatory enforcement fines, decreased retention, and reputational challenges that damage the brand. Almost by definition, business risk impacts the bottom line of a company. But business risk does not necessarily correlate with general risk or, in particular, acute risk. A platform cluttered with spam might drive away well-intentioned users (and the advertisers that come with them) but pose relatively little risk of real-world harm. Likewise, media attention may focus on anecdotal or embarrassing missteps, not the issues most impactful to users or those most likely to result in real-world harm. Depending on the form of the Trust & Safety-linked business risk, it can be measured in lost dollars, user surveys, and red-team-like assessments that replicate the kind of investigation conducted by media or activists.

Measuring Outputs

We conceptualize Trust & Safety “outputs” as the actions that result from your Trust & Safety efforts. This means, for example, the number of accounts or pieces of content removed or downranked in various categories, or the number of law enforcement requests handled. Whatever the specific outcome, these sorts of outputs are measured based on the decisions that you make as a Trust & Safety enterprise.

The challenge to measuring outputs is ensuring that every Trust & Safety decision is made within a coherent process and data structure, logged appropriately, and updated correctly in the face of user appeals, automation, automation missteps and the like. 
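One way to picture that requirement is a single decision record that every tool writes to and that appeals update in place. The sketch below is only an illustration: the `Decision` fields, the action names, and the rule that overturned decisions drop out of the counts are all assumptions, not a description of any particular platform's schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Decision:
    decision_id: str
    policy: str               # e.g. "spam", "harassment" (hypothetical policy names)
    action: str               # e.g. "remove", "downrank", "disable_account"
    source: str               # "human" or "automation"
    overturned: bool = False  # set to True when an appeal reverses the decision


def output_counts(decisions):
    """Count standing decisions per (policy, action), excluding overturned ones."""
    return Counter((d.policy, d.action) for d in decisions if not d.overturned)


def share_by_source(decisions):
    """Share of standing decisions that came from each source."""
    standing = [d for d in decisions if not d.overturned]
    counts = Counter(d.source for d in standing)
    return {source: n / len(standing) for source, n in counts.items()} if standing else {}


if __name__ == "__main__":
    log = [
        Decision("d1", "spam", "remove", "automation"),
        Decision("d2", "spam", "remove", "automation", overturned=True),
        Decision("d3", "harassment", "disable_account", "human"),
    ]
    print(output_counts(log))
    print(share_by_source(log))
```

Keeping the appeal outcome on the same record as the original decision is the design choice that matters here: output counts then correct themselves when a decision is reversed, instead of drifting from a separate appeals system.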

Any discussion of such outputs will raise questions about how they are produced. Were they a function of manual human review or automation? Platforms often want to explain how they surfaced the violating material that they eventually took action on, as this can be used to refute received wisdom that they do not proactively attempt to identify digital harms. Such measures admittedly straddle the line between the output category and the ‘efficiency’ category described below. They undoubtedly speak to the internal functioning of a Trust & Safety system, but they are often designed to be disclosed alongside output metrics and are rarely granular enough to inform internal process changes.

Output metrics are often the central features of platform transparency reports, and they are core to some regulatory disclosure regimes, such as the Digital Services Act’s ‘Statement of Reasons’ database.

Measuring Efficiency

We think about efficiency metrics as internal indicators of the health of a Trust & Safety enterprise. There are too many to list here, but they include measurements of reviewer performance and health: average handle time, time to resolution, cost per review, throughput, etc. They also include things like classifier performance, including both precision (the share of flagged items that truly violate, which falls as false positives rise) and recall (the share of violating items that are caught, which falls as false negatives rise). If the amount of content removed for various policy violations is an output metric, the rate of false positives per policy is an efficiency metric.
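For classifier performance, the arithmetic is simple once decisions have been audited against ground-truth labels. The sketch below uses the standard definitions: precision is the share of flagged items that truly violate, and recall is the share of truly violating items that were flagged. The per-policy counts are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall); a value is None when it is undefined."""
    flagged = true_positives + false_positives    # everything the classifier flagged
    violating = true_positives + false_negatives  # everything that truly violates
    precision = true_positives / flagged if flagged else None
    recall = true_positives / violating if violating else None
    return precision, recall


if __name__ == "__main__":
    # Hypothetical audit counts per policy: (true positives, false positives, false negatives)
    audit = {"spam": (80, 20, 40), "harassment": (30, 10, 90)}
    for policy, counts in audit.items():
        precision, recall = precision_recall(*counts)
        print(f"{policy}: precision={precision:.2f} recall={recall:.2f}")
```

Note that recall depends on knowing what was missed, which usually means labeling a sample of content the classifier did not flag.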

Efficiency metrics are sometimes used to drive accountability for employees or contractors, but they can also be used to identify resource gaps. If a team of contract reviewers in Morocco operates significantly more slowly or less accurately than a team of contractors reviewing content in India, there may be structural issues to better understand. Likewise, efficiency metrics reveal limitations of the entire Trust & Safety structure. If reviews in Polish take significantly longer than those in Spanish, it may indicate poorly calibrated automation, staffing challenges, or training failures.

At Cinder, we have encountered an interesting challenge in illustrating the impact of a modern Trust & Safety tooling stack: many platforms simply are not positioned now to understand the efficiency or effectiveness of their Trust & Safety practices. There is no baseline from which to demonstrate improvement. Trust & Safety professionals generally understand the value of efficiency metrics, but they often struggle to get the resources necessary to measure them and face structural challenges measuring such numbers. Federated Trust & Safety infrastructure that, for example, requires reviewers to make decisions on one platform, manage user communication in another, and investigate the alleged violation in a third is very common and essentially makes measuring critical KPIs like average handle time impossible at any scale. 
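To make the measurement problem concrete, the sketch below computes average handle time when work on one case is spread across several tools. It assumes something many federated setups lack: a shared `case_id` in every tool's logs, plus reliable start and end timestamps. The record fields and tool names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_handle_time(sessions, group_key="language"):
    """Average seconds of handling per case, grouped by `group_key`.

    Each session is one stretch of work on a case in one tool. Time is summed
    across tools per case, so overlapping sessions would be double counted.
    """
    seconds_per_case = defaultdict(float)
    group_of_case = {}
    for session in sessions:
        start = datetime.fromisoformat(session["start"])
        end = datetime.fromisoformat(session["end"])
        seconds_per_case[session["case_id"]] += (end - start).total_seconds()
        group_of_case[session["case_id"]] = session[group_key]

    seconds_by_group = defaultdict(list)
    for case_id, seconds in seconds_per_case.items():
        seconds_by_group[group_of_case[case_id]].append(seconds)
    return {group: sum(values) / len(values) for group, values in seconds_by_group.items()}


# Hypothetical logs exported from three separate tools.
SESSIONS = [
    {"case_id": "c1", "tool": "review", "language": "pl",
     "start": "2024-01-01T10:00:00", "end": "2024-01-01T10:03:00"},
    {"case_id": "c1", "tool": "messaging", "language": "pl",
     "start": "2024-01-01T10:05:00", "end": "2024-01-01T10:06:00"},
    {"case_id": "c1", "tool": "investigation", "language": "pl",
     "start": "2024-01-01T10:10:00", "end": "2024-01-01T10:12:00"},
    {"case_id": "c2", "tool": "review", "language": "es",
     "start": "2024-01-01T11:00:00", "end": "2024-01-01T11:01:30"},
    {"case_id": "c2", "tool": "messaging", "language": "es",
     "start": "2024-01-01T11:02:00", "end": "2024-01-01T11:02:30"},
    {"case_id": "c3", "tool": "review", "language": "pl",
     "start": "2024-01-01T12:00:00", "end": "2024-01-01T12:04:00"},
]

if __name__ == "__main__":
    print(average_handle_time(SESSIONS))
```

Without the shared identifier, the time spent in the messaging and investigation tools cannot be attributed to a case, and the review tool alone would understate handle time.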

Measuring Business Outcomes

Business outcomes reflect the impact of Trust & Safety risks, outputs, and efficiency on a company’s bottom line metrics. As with some kinds of risk, measuring outcomes can be extremely difficult. This is particularly true when Trust & Safety leaders aim to identify business cases for their work. Leaders that are used to thinking about risk in terms of customer dangers or real-world harm are often asked to quantify outcomes in terms of dollars saved or produced. 

Translating the visceral human risks that Trust & Safety teams address into quantifiable metrics may be frustrating, counterintuitive, or even unseemly to some Trust & Safety professionals. Nonetheless, this is a critical mode of thinking for senior leaders - and the good news is that they may not need to build out those calculations on their own. One common partner is a business insights team, which can help you understand the business impact of general risk and acute risk using business metrics Trust & Safety leaders may not have. Work with those teams to understand how investments in tooling infrastructure that save dollars on BPO spend impact the business as a whole. Further, engage your marketing, customer experience, or business development teams to understand perception of your platform and advertisers’ or users’ willingness to spend if the platform reflects a brand risk.

Your internal partners are likely trying to quantify the effect of other user experiences on retention, spending, and similar outcomes. They may not understand that exposure to nasty material can affect those same outcomes, or that responsive content moderation may be a key business proposition. Engage these people, but come in curious; they may not understand your world and you probably do not fully understand theirs. In our experience, there is probably a win-win for your collaboration, but it may take a few conversations specific to your circumstance to find it.


Measuring Trust & Safety ROEBO is not a nice-to-have. Regulation increasingly requires companies to invest in measuring outputs and risk. But many companies have an opportunity to increase efficiency and accuracy, reduce contractor costs, and better understand market dynamics by investing in process metrics. Here are some places to start:

Decide What to Measure and Why

The most important factor in determining what to measure is the goal or business outcome you want to inform. Want to improve reviewer efficiency? Better measure Average Handle Time (AHT) and throughput. Want to assess user exposure to Russian propaganda regarding Ukraine to inform discussion with European regulators? Better measure the prevalence of such propaganda and the impact of your efforts to address it. The first step to establishing an effective Trust & Safety metrics infrastructure is both the simplest and the most difficult: deciding what you hope to achieve by doing this in the first place.

Determine Your Units of Measurement

Settle on the right units of measurement. This will be harder than it sounds. Think hard about the surfaces of your product that are most important - whether you want to assess accounts disabled, user bios obscured, pieces of content removed, or harmful views restricted (among other options). Each unit has trade-offs. Content removals are easily understood but might not capture enforcement decisions that neutralize harmful content without removing it or disabling an account; obscuring a user bio raises difficult questions about how to account for entities that include multiple pieces of media and text that might be addressed independently; view-based measures are harder for users to understand and require more assumptions that can be challenged and rejected.

Determine Granularity

Think hard about granularity and the pipelines of information required to produce it. There are obvious conceptual benefits to cutting your data by geography, language, and internal team but make sure that you trust the lineage and internal tracking of all such data. Can you determine the location of your users with precision? Do you trust your automated language detection tools? Are you managing reviewer teams precisely? If your teams use multiple tools to make a single Trust & Safety decision, can you accurately measure Average Handle Time? The old adage still holds: garbage in, garbage out. And that reality tends to be more acute the more granularly you cut your data. 

Plan for the Long-Term

It’s really important to establish a set of metrics built around reliable methodology, and then stick with it. The reason is obvious: the time series really matters. That’s where you identify the trends; that’s how you see where generally correlated metrics diverge. At the end of the day, that’s how you will learn things that matter.

Understand What Matters to Your Leadership

Trust & Safety is intrinsically important, but Trust & Safety professionals must understand the business case for their work - and use ROEBO metrics to explain it. It’s not just a matter of doing the right thing because it’s right; it’s a matter of efficient and effective business practice. Regulation that requires companies to measure outputs can force companies to care about those end-states, but companies need to recognize for themselves the opportunity to run more effective and cost-efficient Trust & Safety enterprises.