Research Question

What behavioral indicators reliably signal attempts to game deliberative processes?

Related Goals

Manipulation attempts that can be reliably detected and prevented across different stages of the assembly process.

Related Capabilities

Resist manipulation

Urgent Robustness

Ability to resist manipulation that would decrease trustworthiness, legitimacy or unfairly influence the outcome.

Related Existing Resources

Adversarial testing for Generative AI

Google’s guide defining adversarial testing as systematically evaluating ML models against malicious or inadvertently harmful input, covering explicit queries (containing policy-violating language) and implicit queries (seeming harmless but involving sensitive topics). The four-stage workflow inv...

Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions

Satterthwaite’s landmark 1975 work on strategy-proofness and Arrow’s conditions, investigating the relationship between preventing strategic manipulation in voting procedures and satisfying Arrow’s impossibility conditions. This foundational work in mechanism design theory demonstrates existence ...

Strategic Classification

Hardt et al. (2015) address classifier manipulation by strategic actors, modeling the problem as a sequential game between classifier designers and individuals seeking favorable classification who may alter attributes to game the system. For natural cost function classes, they developed computati...
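
The sequential game described in this summary can be illustrated with a minimal sketch. It assumes a fixed linear classifier and a Euclidean movement cost, both chosen here for illustration rather than taken from the paper; the names `best_response`, `cost_per_unit`, and `gain` are hypothetical. An individual games the classifier only when the cheapest move to the decision boundary costs less than the benefit of a favorable classification.

```python
import math


def classify(x, w, b):
    """Return True when the linear score w.x reaches the threshold b."""
    return sum(wi * xi for wi, xi in zip(w, x)) >= b


def best_response(x, w, b, cost_per_unit, gain=1.0):
    """Features an individual reports after seeing the classifier (w, b).

    Assumption: changing features costs cost_per_unit times the Euclidean
    distance moved, and a favorable classification is worth `gain`.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    if score >= b:
        return list(x)  # already classified favorably, no reason to move
    norm_sq = sum(wi * wi for wi in w)
    distance = (b - score) / math.sqrt(norm_sq)  # shortest path to the boundary
    if cost_per_unit * distance >= gain:
        return list(x)  # gaming costs at least as much as it pays
    step = (b - score) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]  # move just onto the boundary


w, b = [1.0, 0.0], 1.0
near = best_response([0.5, 0.0], w, b, cost_per_unit=1.0)
far = best_response([-1.0, 0.0], w, b, cost_per_unit=1.0)
print(near, classify(near, w, b))
print(far, classify(far, w, b))
```

Under these assumptions, individuals close to the boundary shift their features just far enough to pass, while those far from it stay put. A behavioral signature of this kind (a cluster of cases sitting exactly at the threshold) is one candidate indicator of gaming.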

Strategic Classification is Causal Modeling in Disguise

Miller, Milli, and Hardt (2020) reveal a fundamental connection between strategic classification and causal inference, distinguishing between gaming (circumventing the system) and genuine improvement. Their central argument is that designing classifiers that incentivize improvement must inevitabl...