What is orchestration and how does it matter to Trust & Safety?

Orchestration is shorthand for managing and coordinating responses to platform abuse while updating detection and defenses. The term draws from cybersecurity, which has a mature practice of combining human judgment and machine speed to identify risks and respond fast.

Imagine a conductor standing before a symphony. Their role is to draw together all the instruments and orchestrate something greater than each musician could achieve individually.

In cybersecurity, a security orchestration, automation, and response (SOAR) product centralizes coordination, execution, and automation of tasks across teams and tools within a main platform of record. In theory, SOAR lets defenders respond to attacks fast, then learn from incidents and prevent future ones, improving their posture each time.

Some of the SOAR model fits Trust & Safety. And some of it doesn’t. The idea of orchestration underpins several central concepts at Cinder: a central platform of record for all the actions, policies, decisions, and options needed for a team to respond.

Challenges in Trust & Safety

What Trust & Safety teams face in 2024 needs little explanation: generative AI increasing the speed and scale of harmful content, elections-related abuse, and a wave of new regulations.

Compounding these pressures are macroeconomic realities. Gone are the days of hiring 100 or 1,000 more outsourced reviewers, or 20 FTEs, to brute-force through issues manually.

In January, several deepfake crises showed that, wherever you sit in the content creation and distribution system, the online landscape is adversarial: as soon as you perceive a new threat, you need to make swift changes to address it.

Orchestration Yields a Concerted Effort

So how can orchestration help? The term suggests concerted, connected effort. Within Trust & Safety, that connection means human-machine effectiveness. The speed of abusive actors, behavior, and content accelerates with generative AI; the human-scaled responses of the past are unrealistic today.

Let’s take a look at the core Trust & Safety functions that could be pulled together to orchestrate responses. They include:

  1. Policy development 
  2. Fraud detection or KYC/AML scores 
  3. Moderation and data labeling systems (for user reports or classifiers)
  4. Custom detection rules, internally trained LLMs, and off-the-shelf classifiers (e.g., OpenAI’s content moderation API, CV models, kNN)
  5. User communications 
  6. Escalations, investigations, and case management 
  7. Third-party intelligence, law enforcement requests, or press inquiries
  8. Prompt text or uploads, in the case of AI platforms 

For many companies, each of these functions lives in a different tool or UI, and the data to act on or investigate may sit in separate databases. With so many databases involved, seeing the whole “system” in one place is hard to imagine, and it’s harder still to respond fast across so many surfaces.
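
To make that concrete, here is a minimal sketch of what normalizing those scattered signals into one schema could look like. Everything here, the `SafetySignal` record, its fields, and the `normalize_report` helper, is hypothetical, shown only to illustrate the idea of a single platform of record:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class SafetySignal:
    source: str          # e.g. "user_report", "fraud_score", "classifier"
    subject_id: str      # user, account, or content UUID
    signal_type: str     # e.g. "harassment", "kyc_risk", "hash_match"
    score: float | None  # model confidence, if the source produces one
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    raw: dict[str, Any] = field(default_factory=dict)  # original payload, preserved

def normalize_report(report: dict) -> SafetySignal:
    """Map a user report (one of many source formats) onto the shared schema."""
    return SafetySignal(
        source="user_report",
        subject_id=report["reported_content_id"],
        signal_type=report["reason"],
        score=None,  # human reports carry no model confidence
        raw=report,
    )

# With one schema, "seeing the whole system" becomes a query, not a data hunt.
signals = [normalize_report({"reported_content_id": "c-123", "reason": "harassment"})]
by_subject: dict[str, list[SafetySignal]] = {}
for s in signals:
    by_subject.setdefault(s.subject_id, []).append(s)
```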

On to orchestration.

Coordinate Human and Machine Efforts

Effective orchestration means turning human judgment into machine efficiency, fast. If your company is a genAI platform, that might mean new guardrails or few-shot learning. If you’re a marketplace or chat app, maybe it’s detection of account-level or textual abuse that adapts to the most recent account-takeover attack.

A few common trends to coordinate across human teams and machines:

  1. The performance of your models hinges on high-quality data. Too many false positives, and technology is hamstringing your response, not accelerating it.
  2. You need tools to respond quickly in a crisis, and to investigate what actually happened on your platform, all at the speed of the news cycle.
  3. You need content review because, as smart as you are, undeterred bad actors will keep experimenting with ways to exploit platform weaknesses, and failing even once or twice can be devastating.
  4. Label data and moderate at the same time. Human moderation can yield the highest-quality training data, tuned to the unique threats and harms on your platform. Here, your Trust & Safety moderation activity is essentially RLHF.
  5. You need to capture every human decision (escalations, incident response, and user comms) and roll it back into your models; see the sketch after this list. Maintain a clear-eyed understanding that it’s an adversarial landscape and you need to be adapting and learning continuously.
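
As a rough illustration of points 4 and 5, here is what capturing each moderation decision as a reusable training label might look like. The `ModerationDecision` record and its fields are hypothetical, not any particular product’s data model:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ModerationDecision:
    content_id: str
    content_text: str     # or a pointer/hash for media
    policy_label: str     # e.g. "harassment", "spam", "benign"
    action: str           # e.g. "remove", "warn", "no_action"
    reviewer_id: str
    decided_at: str

def record_decision(content_id: str, text: str, label: str, action: str,
                    reviewer: str, log_path: str = "decisions.jsonl") -> ModerationDecision:
    """Append one human decision to an append-only log.

    Every routine moderation call doubles as a labeled training example.
    """
    decision = ModerationDecision(
        content_id=content_id,
        content_text=text,
        policy_label=label,
        action=action,
        reviewer_id=reviewer,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    with open(log_path, "a") as f:
        f.write(json.dumps(asdict(decision)) + "\n")
    return decision
```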

All these challenges are interconnected. And they all rely on centralizing diverse data to understand and prioritize responses to today’s abuse. The promise of orchestration is brightest when it sits atop clean, high-quality, trustworthy data.

Incident Response, Playbooks, and Reporting

Responding to incidents and escalations is a frequent Trust & Safety headache. 

Response is often a rush of shared docs and sheets, frantic querying and data exploration, and a hasty final output for executives. The goal: understand the issue and respond appropriately and quickly.

Once incidents happen frequently enough, it’s time for playbooks so everyone on the team knows what to do, how to do it, and what to expect. 

In the cybersecurity SOAR context, “playbooks” or runbooks are scripts triggered by specific alerts or incidents. Most are programmatic. The SOAR product identifies specific events or threats and executes automated responses based on predefined playbooks. A SOAR can help create new automated playbooks after learning from past incidents. 
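
A stripped-down version of that pattern might look like the following. The alert type, step names, and registry are all invented for illustration; real SOAR playbooks are far richer:

```python
from typing import Callable

# An alert type maps to an ordered list of response steps.
Playbook = list[Callable[[dict], None]]
PLAYBOOKS: dict[str, Playbook] = {}

def playbook(alert_type: str):
    """Register a function as the next step of the playbook for an alert type."""
    def register(step: Callable[[dict], None]) -> Callable[[dict], None]:
        PLAYBOOKS.setdefault(alert_type, []).append(step)
        return step
    return register

@playbook("credential_stuffing")
def lock_affected_accounts(alert: dict) -> None:
    print(f"locking accounts: {alert['account_ids']}")

@playbook("credential_stuffing")
def notify_oncall(alert: dict) -> None:
    print(f"paging on-call about {alert['alert_id']}")

def handle_alert(alert: dict) -> None:
    """Execute every registered step for the alert's type, in order."""
    for step in PLAYBOOKS.get(alert["type"], []):
        step(alert)

handle_alert({"type": "credential_stuffing", "alert_id": "a-42",
              "account_ids": ["u-1", "u-9"]})
```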

Today, in most Trust & Safety teams, incident response remediations can be tough to capture programmatically. Many teams manually make suggestions, submit tickets to write or delete code in a detection engine, or check a classifier’s training data for accuracy.

Orchestration means these incident response findings can be added quickly with no-code or low-code systems. And they’re accessible from the operational UI, not buried in another tool. Once you’ve got playbooks in place, it’s then a question of which steps need to be automated.

With data centralization, orchestration makes available metrics like mean time to detect and mean time to respond. These metrics help teams measure, and then tighten, their responses. With a deeper understanding of abuse on the platform, a clearer picture emerges of how many people and resources are required to respond.
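
Once incident timestamps live in one place, those metrics are a few lines of arithmetic. This toy calculation assumes each incident row carries started/detected/resolved timestamps:

```python
from datetime import datetime

# Toy incident log: (started, detected, resolved) timestamps per incident.
# In practice these would come from the centralized case data.
incidents = [
    (datetime(2024, 3, 1, 9, 0),  datetime(2024, 3, 1, 9, 40),  datetime(2024, 3, 1, 12, 0)),
    (datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 14, 10), datetime(2024, 3, 5, 15, 30)),
]

def mean_minutes(deltas) -> float:
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60

# Mean time to detect: abuse starting -> the team noticing it.
mttd = mean_minutes([detected - started for started, detected, _ in incidents])
# Mean time to respond: detection -> resolution.
mttr = mean_minutes([resolved - detected for _, detected, resolved in incidents])

print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")  # MTTD: 25 min, MTTR: 110 min
```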

Orchestration Helps Trust & Safety Teams Improve AI Faster

Trust & Safety teams have worked with ML for years. LLMs for moderation and detection, from LlamaGuard to OpenAI’s content moderation API, bring that work into the AI age. Few-shot learning means the volume of data needed for new models continues to fall.
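
Here is a minimal sketch of few-shot moderation in practice: a handful of labeled examples in the prompt, no training run required. The prompt wording and the `call_llm` function are placeholders for whichever model endpoint you use:

```python
FEW_SHOT_PROMPT = """\
You label content for a marketplace. Answer with one word: VIOLATING or OK.

Content: "Wire me $500 and I'll ship the phone off-platform."
Label: VIOLATING

Content: "Is this jacket still available in medium?"
Label: OK

Content: "{content}"
Label:"""

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your provider's completion call."""
    raise NotImplementedError

def classify(content: str) -> str:
    return call_llm(FEW_SHOT_PROMPT.format(content=content)).strip()
```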

But no model today is going to detect the latest novel (and awful) abuse because it’s too adversarial, too new, too rare.

In an orchestration model, operational responses to new abuse are captured in an incident response or case management system. Suddenly, data collected in the investigation becomes labeled data available for training and tuning. It’s RLHF built into your Trust & Safety system.
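
A sketch of that handoff: exporting investigator labels from a case into fine-tuning rows. The `messages` JSONL shape below follows a common chat fine-tuning format, but check your model provider’s docs for the exact schema it expects:

```python
import json

def case_to_training_rows(case: dict) -> list[dict]:
    """Turn each investigator-labeled item in a case into a fine-tuning row."""
    return [
        {
            "messages": [
                {"role": "system", "content": "Classify content under our abuse policy."},
                {"role": "user", "content": item["text"]},
                {"role": "assistant", "content": item["label"]},
            ]
        }
        for item in case["items"]
    ]

# Hypothetical case record exported from case management.
case = {"items": [{"text": "example scam message", "label": "fraud"}]}
with open("tuning_data.jsonl", "w") as f:
    for row in case_to_training_rows(case):
        f.write(json.dumps(row) + "\n")
```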

While data labeling is often treated as low-skill, high-speed work, in a crisis it’s crucial, and it’s often highly trained experts doing it for you. Thinking in terms of orchestration, the “labels” investigators “apply” suddenly become annotated data.

Automation and Continuous Improvement

Turn that labeled data, those playbooks, and those root-cause findings into automation. Then QA the automation’s outputs to ensure quality and accuracy.

Envision a moment where, for example, hundreds of pieces of content are annotated (by a moderator or a high-confidence automated action) with similar high-severity labels within a short period, and from the same geography. You’ve seen this before. And thanks to automation, a playbook is triggered: a new case is created and the on-call is notified. Automated rules populate the case with all the relevant UUIDs, the geography, and the time period. An LLM summarizes the episode at the top of the case.
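
A toy version of that trigger, with the thresholds, field names, and case shape all illustrative:

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)   # illustrative threshold values
THRESHOLD = 100

def geos_bursting(annotations: list[dict], now: datetime) -> list[str]:
    """Return geographies whose recent high-severity label volume crosses the bar."""
    recent = [a for a in annotations
              if a["severity"] == "high" and now - a["ts"] <= WINDOW]
    counts = Counter(a["geo"] for a in recent)
    return [geo for geo, n in counts.items() if n >= THRESHOLD]

def open_case(geo: str, annotations: list[dict]) -> dict:
    """Stub: create a case pre-populated with the relevant content UUIDs."""
    uuids = [a["content_id"] for a in annotations if a["geo"] == geo]
    # A real playbook would also page the on-call and ask an LLM to summarize.
    return {"geo": geo, "content_ids": uuids, "status": "open"}
```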

The next set of playbook automations sends alerts to policy teams. Within minutes, there’s a chance to look at the content and begin assessing what’s going on. Need to call regulators? Anticipate press queries? Alert the C-suite? To reduce virality, these automated workflows suggest or initiate brakes on recommendation algorithms.

No automation is perfect, and all of it needs QA to ensure accuracy. Are the ops teams in the trenches making accurate decisions on top of the automation? Building QA into the platform is key to confidence in automated workflows.
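
One simple way to build that in: sample a fixed share of automated actions for human review and track how often the humans uphold them. The sampling rate and record shape here are illustrative:

```python
import random

QA_SAMPLE_RATE = 0.05  # review 5% of automated actions; tune per risk tolerance

def sample_for_qa(automated_actions: list[dict]) -> list[dict]:
    """Pull a random slice of automated decisions into the human QA queue."""
    k = max(1, int(len(automated_actions) * QA_SAMPLE_RATE))
    return random.sample(automated_actions, k)

def agreement_rate(reviewed: list[dict]) -> float:
    """Share of sampled actions where the human reviewer upheld the automation."""
    upheld = sum(1 for r in reviewed if r["human_label"] == r["automated_label"])
    return upheld / len(reviewed)
```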

Conclusion

Thinking in terms of orchestration keeps Trust & Safety executives prepared and aware. And it helps teams move faster, focus on what matters, and maximize human and machine advantages.  

  1. Decide better and faster. Orchestration gathers data, makes it easier to analyze incidents and abuse trends, and helps teams take corrective action or suggest remediations.
  2. Improve communication. Share context with centralized data and logging. Case management means stakeholders have one place to look, the same as the responding investigators. Prevent time-consuming redundancies.
  3. Automate and reinforce. Orchestration allows multiple tools to respond to incidents as a group, even when the data is spread out. Automated rules save time and create efficiencies. Simultaneously recording all the human judgments creates high-quality labeled data for model training and fine-tuning.
  4. Access context, history, and intelligence. Advanced orchestration enables third-party intelligence or other off-platform signals to easily join, for example, case management or investigations.
  5. See it in one place. Teams gain a single console that provides all the information they need to investigate and remediate incidents.
