Glen Wise and Brian Fishman

Cinder AI Blog Series Part One: AI Principles

Deploying AI responsibly is always difficult; doing so for Trust & Safety is particularly challenging. First, Trust & Safety is fundamentally adversarial: the question is not just whether a machine learning model will complete tasks effectively, but whether malicious actors will intentionally circumvent it. Second, Trust & Safety operations take direct punitive actions that significantly impact people’s lives and basic speech rights, and therefore demand increased scrutiny and explicability. Third, existing regulatory demands around Trust & Safety shape the application and limits of AI in this space. Given these challenges, our intent here is to explain our principles for integrating AI into Cinder.

Debates around AI often conflate different types of risk.

One set of risks emerges from the abuse of AI to intentionally facilitate harm. A second focuses on the biases and harms unintentionally introduced via AI. A third examines the risk that AI will eventually circumvent human control and threaten humanity. The fourth focuses on the opportunities and responsibilities associated with deploying AI to resolve existing business and social challenges, whether or not they were produced by AI. We have views on the first three issues, but our focus here is on the fourth.

Cinder was built for an era of AI, but one of our most important strategic decisions in founding Cinder was NOT to build our own AI models.

We made that choice for three reasons. First, we believed we could have more impact building a comprehensive, integrated system-of-record, given the dearth of such tools. Second, we expected that extraordinarily well-resourced models like those built by OpenAI, Google, Meta, and others would emerge as best-in-class. Finally, we understood that a state-of-the-art decision platform would allow us to deploy state-of-the-art foundational models when they matured.

Now, they have.

Ensuring AI Matches our Mission

At Cinder, we believe AI used for Trust & Safety operations should be dutiful, advancing the mission of the human teams deploying it.

This is different from saying that it should simply improve existing processes. Conceptualizing AI’s impact is complex because it promises to profoundly upend many existing operating models, but deploying AI is not an end in itself and is unlikely to improve outcomes in all cases. It is a means to serve existing mission sets - and for Trust & Safety operations that means good outcomes are important and the process must be sound.

As we integrate AI into Cinder, we intend to do so with the following principles in mind: 

Operational Observability. AI-empowered systems should be observed constantly and verified continually, and AI decisions must be transparent. We can ensure our partners can easily understand the data and decisions they feed their models, and can monitor and index determinations made by AI. Trust & Safety decisions have huge impacts on people, so they should be auditable and explicable - whether made by human beings or automated systems.

Human Command. Human control over AI is critical, which means empowering partners to test AI effectiveness prior to each new use case and constantly benchmarking results against the most authoritative human decisions.
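
To make the idea of benchmarking against authoritative human decisions concrete, here is a minimal sketch in Python. It is an illustration only, not Cinder's implementation: the `benchmark` function, the `BenchmarkResult` type, and the "remove"/"keep" labels are all hypothetical names chosen for the example. It compares a model's decisions with human decisions on the same items and reports overall agreement, plus precision and recall for the punitive label.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BenchmarkResult:
    agreement: float  # share of items where model and human decisions match
    precision: float  # of the model's punitive decisions, share humans agreed with
    recall: float     # of the humans' punitive decisions, share the model matched


def benchmark(
    model_decisions: Sequence[str],
    human_decisions: Sequence[str],
    positive: str = "remove",  # hypothetical label for the punitive action
) -> BenchmarkResult:
    """Compare model decisions against authoritative human decisions."""
    if len(model_decisions) != len(human_decisions):
        raise ValueError("decision lists must be the same length")
    if not human_decisions:
        raise ValueError("at least one decision is required")

    pairs = list(zip(model_decisions, human_decisions))
    agree = sum(m == h for m, h in pairs)
    tp = sum(m == positive and h == positive for m, h in pairs)
    fp = sum(m == positive and h != positive for m, h in pairs)
    fn = sum(m != positive and h == positive for m, h in pairs)

    # Report 0.0 rather than dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return BenchmarkResult(agree / len(pairs), precision, recall)


if __name__ == "__main__":
    model = ["remove", "keep", "remove", "keep"]
    human = ["remove", "keep", "keep", "remove"]
    print(benchmark(model, human))
```

In a setup like this, a team might require that agreement and precision clear a bar it sets itself before enabling a model for a new use case, and rerun the comparison as fresh human decisions arrive.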

Flexibility. Trust & Safety is fundamentally adversarial. Cinder was designed to enable defenders to adapt as quickly as attackers - and AI must keep pace. This means defensive systems must update constantly as a function of operations and account for environmental and adversarial shifts.

Platforms define their policies differently based on risk tolerance, functionality and user base, and applicable politics and law. Trust & Safety decisions, whether made by human beings or AI, should be responsive to platform policies, else they are effectively arbitrary or bound by an opaque third-party rule set.

The level of decision-making autonomy of AI systems should be proportionate to the expected impact of those actions and the confidence of the machine learning model. For example, it may be appropriate to allow a model to prioritize jobs in a review queue but not to actually remove the content being reviewed. Cinder enables such adjustments dynamically and transparently. Given that some Trust & Safety outcomes are punitive and highly consequential, AI for Trust & Safety must often ensure human beings have final decision-making authority.
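
One way to picture proportionate autonomy is as a routing rule that weighs the impact of an action against the model's confidence. The sketch below is a simplified illustration under our own assumptions, not a description of how Cinder is built: the `Impact` and `Route` enums, the `route_action` function, and the 0.70 threshold are hypothetical. Low-impact actions such as queue prioritization may be automated above a confidence threshold, while high-impact actions such as content removal always go to a human.

```python
from enum import Enum


class Impact(Enum):
    LOW = "low"    # e.g. prioritizing jobs in a review queue
    HIGH = "high"  # e.g. removing the content being reviewed


class Route(Enum):
    AUTOMATE = "automate"
    HUMAN_REVIEW = "human_review"


# Hypothetical thresholds. An impact level with no entry is never automated,
# so HIGH-impact actions always reach a human reviewer.
MIN_CONFIDENCE = {Impact.LOW: 0.70}


def route_action(impact: Impact, confidence: float) -> Route:
    """Decide whether a model may act alone or must defer to a human."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    threshold = MIN_CONFIDENCE.get(impact)
    if threshold is not None and confidence >= threshold:
        return Route.AUTOMATE
    return Route.HUMAN_REVIEW


if __name__ == "__main__":
    print(route_action(Impact.LOW, 0.90))   # confident, low impact
    print(route_action(Impact.LOW, 0.50))   # not confident enough
    print(route_action(Impact.HIGH, 0.99))  # high impact, human decides
```

Keeping the thresholds in a single table means the level of autonomy can be adjusted, and audited, without touching the routing logic.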

Privacy. AI tools should be able to utilize enterprise data appropriately, but be subject to data access restrictions just as any human employee would be.

We envision AI tools both empowering existing Trust & Safety operating models and disrupting them entirely.

Trust & Safety teams have long deployed AI to improve efficiency, address scale, and, in some cases, improve on human accuracy. Modern Large Language Models (LLMs) offer a flexibility and ease of deployment that suggest even more dramatic shifts are possible.

Review teams will double as benchmarking and labeling teams; companies will need more highly-trained investigators as adversaries shift attack vectors; and policy leaders will increasingly balance prompt engineering with building policy that is explicable to consumers, regulators, and machines.

At Cinder, we are integrating features to facilitate these shifts - and we will be sharing more about those details in this blog series.

We built Cinder from the ground up for this moment, which is why it allows partners to easily integrate their data, benchmark review decisions against authoritative sets, adjust workflows and enforcement actions in minutes, deploy decision data for multiple purposes, process millions of daily decisions, and audit every decision and action made by a human or automated system. We see those fundamental elements as critical for modern, integrated Trust & Safety, and they are foundational for deploying AI responsibly in a high-stakes, adversarial context.
