Research directions for democratic capabilities
There is an ocean of research required to help democracy keep pace with AI.
The gap map contains over 200 research questions, each linked directly to the capabilities it would improve and the goals it would help meet. Spot a gap in the gap map? Each page has a contribution button for making suggestions, and the menu bar links to a general contributions form with more details.
This page covers what to prioritize, opportunities for outsized impact, and AI & ML research questions as an example of exploring by discipline.
The most important work to do
Clearing a few key bottlenecks has the potential to unlock significant uplift through flywheels and spillover improvements in many capabilities.
How can we effectively account for uncertainty in scenario consequences?
How can we design adaptive learning systems that provide personalized learning programs?
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What kinds of systems are appropriate for simulation?
How can facilitators maintain neutrality while also ensuring productive deliberation?
How much authentic human value is lost at each level of AI involvement (AI note-taker vs. AI facilitator vs. AI co-deliberator) and where is the steepest drop-off in the value-cost curve?
If 'doing the work' of synthesizing and clustering is more valuable than having an AI do it, do participants benefit equally from 'doing this work' or does it privilege those with more skills and stamina?
Can AI systems identify their own biases and reasoning errors more reliably than individual humans can identify their own cognitive biases when making sense of inputs?
How can we develop standardized neutrality assessment tools that can be applied across different cultural contexts?
How do we define and measure neutrality when legitimate value disagreements exist about what constitutes "neutral"?
How to develop an AI facilitator that is attentive to power imbalances, adaptive to group dynamics and effective in guiding groups towards successful outcomes?
How can we develop participants' reasoning and critical thinking skills within a process?
How can digital tools assist human facilitators to more effectively facilitate deliberations?
How can we distinguish between legitimate persuasion and manipulative influence in deliberative settings?
What are the effects of AI facilitation on public perceptions, group dynamics and deliberative quality?
What are the best methods for efficiently educating people?
What are the most efficient ways of recruiting participants?
How to help participants recognize and constructively engage with emotional dimensions, especially on highly sensitive topics?
How can we enumerate a comprehensive set of scenarios or cases that a policy needs to address?
How can delibtech tools expand the space of policy scenarios and considerations in a transparent and fair way?
What behavioral indicators reliably signal attempts to game deliberative processes?
How best to implement global sortition given limited resources or access to population data?
How can individual learning be mediated through group learning to lift all boats?
How can we handle the real-world failure modes of recruitment?
What hybrid approaches can combine fast simulation with selective human input to optimize both speed and accuracy for urgent decisions?
How can we identify the likelihood that key scenarios are missing?
How can we create standardized integrity assessment frameworks for evaluating completed assemblies?
How can we represent scenarios in an interactive and educational process (not predictive modelling)?
How can individual learning agents identify and pair learning partners for defined objectives (idea crosspollination, depolarization, information gaps)?
How can we develop manipulation impact metrics that distinguish between minor and outcome-altering influences?
What are the best methods to measure the faithfulness of simulations?
What are the best methods to measure the accuracy of simulations?
How can we design information presentation formats that minimize susceptibility to framing effects?
What are the best approaches to recruiting a participant pool that captures the complexity and intersections of society while minimising self-selection biases?
How can we track and mitigate biases within scenario mapping?
What strategies can be used to motivate participation in less-democratic contexts?
How can we solve the technical blockers to effective and trustworthy multi-agent simulation and modelling?
How can AI systems translate, generate and integrate learning materials into diverse formats (text, audio, visual, etc)?
What are the appropriate metrics for measuring neutrality in information presentation, question framing, and synthesis?
What are acceptable thresholds for intervention when neutrality violations are detected?
What are the tradeoffs between openness/transparency and manipulation resistance?
How to rotate groups/route comments to provide an optimal exposure and testing of different reasons?
For a given budget, location, panel size, and unique quotas, how can we design a recruitment plan that will maximize response rates and the representativeness of the sample?
How to manage recruitment in geographies with extremely poor access and limited digital and physical infrastructure?
How can we develop criteria and methods for prioritizing scenarios based on likelihood, impact, and relevance to deliberative decisions?
How can we quantify the fairness of different approaches to sampling the population?
How can we develop real-time detection systems for coordinated manipulation attempts during participant recruitment and selection?
How can we develop realistic simulation environments that accurately predict how different deliberative formats will perform according to different design choices?
How can we help participants reason about long-term consequences and intergenerational impacts that are difficult to visualize or experience directly?
What kinds of recruitment methods reach which kinds of people?
How can the impacts of interventions on complex systems be simulated quickly and accurately?
For what uses, in what contexts, and with what level of faithfulness is it helpful or appropriate to use simulations? What are the philosophical, moral, political, and other implications?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
What is the Pareto frontier of speed, accuracy, and usable interactivity?
How can we support people to critically self-reflect on their preferences?
What techniques can help citizens effectively surface, reflect on, and convey their perspective to others?
How can we quantify and test the manipulation resistance of different assembly design choices?
How can we use deliberation itself to drive third-party verification of the AI systems used in deliberation?
How can we track the perspectives offered and ensure that they all receive appropriate engagement?
How can we translate mathematical bias guarantees from algorithmic settings to real-world human facilitation?
How should we best treat low probability but high impact edge cases?
What methods can support participants to understand better the perspectives of others (e.g., automated language simplification, visual summary)?
How to unobtrusively measure individual and group understanding?
How can we design unobtrusive monitoring systems that don't themselves bias the deliberative process?
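Several of the recruitment questions above ask how to quantify the fairness or representativeness of a sampled panel. One minimal starting point is a distance metric between the panel's demographic distribution and known population marginals. The sketch below is illustrative only: the function name, the attribute categories, and the example marginals are all hypothetical, and real quota work would need to handle intersections of attributes, not single marginals.

```python
from collections import Counter

def representation_gap(sample, population_marginals):
    """Total variation distance between a sample's attribute
    distribution and population marginals: 0.0 is a perfect
    match, 1.0 is complete divergence."""
    counts = Counter(sample)
    n = len(sample)
    categories = set(counts) | set(population_marginals)
    return 0.5 * sum(
        abs(counts[c] / n - population_marginals.get(c, 0.0))
        for c in categories
    )

# Hypothetical age marginals from a census, and a 10-person panel.
population = {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}
panel = ["18-34"] * 4 + ["35-64"] * 5 + ["65+"] * 1
print(round(representation_gap(panel, population), 2))  # → 0.1
```

A metric like this only scores one attribute at a time; comparing sampling approaches fairly (as the questions above ask) would also require weighting attributes and measuring intersectional coverage.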
Opportunities for rapid impact
Help improve the coverage of this map by contributing new research in areas that are currently underserved.
How can we ensure simulated participants accurately represent missing demographics?
What conditions make adversarial auditing or independent review an accountability mechanism rather than a legitimizing exercise?
What design variables in deliberative formats can AI systems reliably identify as leverage points for optimization through automated multi-agent simulation?
What are the most common barriers that prevent AI labs from binding to deliberative outcomes? Which barriers are structural versus contingent on political will?
Under what conditions can AI-simulated participants maintain democratic legitimacy?
Can AI generate its own suggested changes and test them to search the latent space for optimal solutions?
What alternative mechanisms most effectively replicate the functional properties of a legal bind?
What are the best ways of anticipating key objections core power holders may raise against recommendations?
What collaboration tooling and facilitation workflows preserve deliberative quality and equal influence when participation is partly asynchronous, especially for drafting, consensus testing, and iteration cycles?
How can automatic logging of key events improve access for verifiers?
How do we balance efficiency with resilience in resource-constrained environments?
How much authentic human value is lost at each level of AI involvement (AI note-taker vs. AI facilitator vs. AI co-deliberator) and where is the steepest drop-off in the value-cost curve?
If 'doing the work' of synthesizing and clustering is more valuable than having an AI do it, do participants benefit equally from 'doing this work' or does it privilege those with more skills and stamina?
What conditions allow commitments to remain binding when the regulatory or political environment shifts significantly after the commitment was made?
Regarding timelines, when does the obligation need to begin? How long a delay, after a decision has been made, is acceptable for a bind to be considered respected? What prevents indefinite deferral?
Is there a demonstrable trade-off between the degree of legal bindingness imposed on AI labs and their capacity for rapid AI innovation? If so, under what governance designs is that trade-off minimized?
How should the degree of bindingness be calibrated to the characteristics of the decision at stake?
Can AI systems identify their own biases and reasoning errors more reliably than individual humans can identify their own cognitive biases when making sense of inputs?
What channels exist to formally challenge accountability failures? How accessible and impactful are they in practice?
What checks and balances are needed, when making fully-binding decisions?
To what extent can we clearly communicate the inner workings of AI-augmented deliberative tools?
How do we communicate changes to stakeholders without undermining confidence in outcomes?
Under what conditions is it reasonable to not stick with commitments? (e.g. does the reversal of a commitment require an explicit mandate, either through an election or a subsequent deliberative process?)
What are the best approaches to managing confidentiality and sensitive information with oversight?
How do standardized consent frameworks that specify permitted research uses affect both researcher access rates and participant willingness to contribute data, and what consent design achieves the best balance?
How can we quickly provide information on the consequences of different types and ranges of intersectional constraints on participant selection?
What are the best methods for constructing attrition-robust panels?
Under what conditions should a binding deliberative outcome be legally contestable or reversible?
How can we convey similarities and differences between the outputs? (Including satellite deliberations - across cultures and languages - with the goal of finding common ground?)
How can deliberative processes operating at different governance layers be coordinated such that they inform rather than contradict each other, especially when underlying values or priorities differ across regions?
What redundancies and buffers are most cost-effective for different types of disruptions?
How should the learning phase of a transnational assembly be designed so that participants from different epistemic cultures assess the evidence base as credible and non-ideological?
How can we develop standardized neutrality assessment tools that can be applied across different cultural contexts?
What onboarding and norm-setting protocols successfully establish shared procedural expectations (around disagreement, turn-taking, inclusion of dissent, and decision closure) among participants drawn from cultures with fundamentally different conventions for public reasoning and conflict?
How do accountability mechanisms hold when AI organizations operate globally but commitments are made in specific jurisdictions?
What is the minimum viable legal architecture in terms of entity jurisdiction, data routing, and consent frameworks that allows a single deliberative process to lawfully handle participant data across the major regulatory regimes (GDPR, PIPL, LGPD, etc.) without fragmenting the process into siloed national tracks?
How can cryptographic mechanisms create locking mechanisms and binding incentive structures?
What consent, anonymization, and data governance protocols (comparing opt-in vs. opt-out, persistent vs. temporary storage, restricted vs. open licensing) enable practitioners to balance participant privacy and autonomy against the research value of maintaining rich deliberative records?
What data triage and routing processes (structured as decision trees vs. algorithmic vs. moderator-driven) enable process organizers to respond to emerging issues during deliberations, measured by time-to-action and intervention appropriateness?
What makes a community 'affected'?
What constellation of outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) must any democratic process achieve to be considered successful, and how do these vary with process purpose?
What are some inputs used in technical alignment approaches that could be produced or enhanced with deliberative processes?
What are the tipping points where adaptation compromises core democratic values?
How do different process design choices impact different desired outcomes?
What happens when affected communities and convening authorities disagree on whether a commitment has been made? What weight is given to the arguments of the affected communities?
How do downstream effects from participation systematically vary across different deliberative process formats (comparing citizens' assemblies, deliberative polls, mini-publics, and online forums), and what process features predict effect heterogeneity?
What particular knock-on effects from participation (spanning civic engagement, political efficacy, discussion spillover, network influence, or policy awareness) are most important to measure, and what longitudinal methods best capture them without excessive participant burden?
Why do some accountability campaigns successfully change authority behavior while others are ignored?
What consequences do AI organizations or governments treat as real deterrents today? When does reputational cost stop mattering?
Which pressure tactics (public shaming, litigation, coalition-building) are most effective in different political contexts?
What are the most efficient ways of recruiting participants?
How to help participants recognize and constructively engage with emotional dimensions, especially on highly sensitive topics?
How can we enumerate a comprehensive set of scenarios or cases that a policy needs to address?
How can delibtech tools expand the space of policy scenarios and considerations in a transparent and fair way?
Which visualization and dashboard designs (comparing temporal vs. spatial vs. network-based layouts) best support real-time information use by facilitators under time pressure, and when do practitioners choose to ignore dashboard signals?
How can deliberative outputs be formatted as functions such that they can automatically adapt?
What governance configurations (oversight composition, funding diversification thresholds, operational hosting rotation, and transparency mechanisms) are necessary and sufficient for a global assembly to be perceived as credibly neutral by participants and publics across geopolitical blocs?
How best to implement global sortition given limited resources or access to population data?
What are the most common barriers that prevent governments from binding to deliberative outcomes? Which barriers are structural versus contingent on political will?
How can technically binding decisions integrate with AI alignment in gradual ways?
How can we foster the group building and trust that is necessary for high-quality deliberation as group sizes increase?
What transparency and consent mechanisms are required for hybrid assemblies?
What hybrid approaches can combine fast simulation with selective human input to optimize both speed and accuracy for urgent decisions?
How can we identify the likelihood that key scenarios are missing?
What balance of carrots vs. sticks is necessary to protect commitments internally?
How can different translation techniques be combined and integrated most seamlessly?
How can we create standardized integrity assessment frameworks for evaluating completed assemblies?
What feedback and transmission protocols ensure that decisions or directions from central processes reach subsidiary processes in time to shape their work, and how do asynchronous vs. synchronous sequencing affect deliberative quality at each level?
How can we represent scenarios in an interactive and educational process (not predictive modelling)?
What are the internal barriers that prevent commitment from happening? (e.g. employee pressure, incentive systems, decision-making culture, organizational structure?)
To what extent can a structured repository of interpretive precedents — built from annotated implementation decisions linked back to the deliberative rationale that grounds them — function as a reliable 'case law' for navigating ambiguity in process outputs?
How do iterative constraint updates (possibly presented as embedded technical briefs, live feasibility checks, or expert panels) during deliberation affect the quality and implementability of recommendations, compared to constraint-free processes?
How can representation be managed across iterations of panels or many panels in parallel?
How do we evaluate any moral philosophical or efficacy-based justifications for trying to include the voices of non-humans/future generations in deliberations?
How reliably can language models trained on deliberative transcripts, stated rationales, and value-elicitation outputs distinguish between implementation decisions that are consistent with versus divergent from the normative commitments embedded in process outputs?
How can individual learning agents identify and pair learning partners for defined objectives (idea crosspollination, depolarization, information gaps)?
What role can legal or compliance infrastructure play in embedding deliberative commitments into operations? Under what conditions can it be counter-productive?
What existing analogues (e.g. binding arbitration) provide legal precedents, and what do they fail to address for AI governance contexts?
What triggering thresholds and conditions can be specified in law or regulation such that they reliably activate appropriate deliberative processes?
What machine translation and annotation approaches (comparing human-in-the-loop vs. automated vs. hybrid) maintain semantic accuracy for multilingual data in international or diverse assemblies, particularly for idioms and context-dependent meaning?
What are the best methods for managing aggregated inputs when content production and integration occur over time?
What are the options for managing the hierarchical relationships between processes, and their pros and cons?
What institutional design features of a mandating body (composition, decision rules, relationship to existing international organizations) are necessary for participants and external audiences to perceive a global assembly as legitimately authorized rather than self-appointed?
How do we measure commitment drift, i.e. commitments that have not stuck over time?
How can we measure the concreteness of statements and recommendations?
What observable deliberative quality dimensions (such as turn-taking equity, argument depth, perspective inclusion, or respectfulness) can be reliably measured through automated content analysis or human observation in real time, and what does measurement reveal about facilitator behavior changes?
What measurement approaches (comparing explicit belief statements, semantic mapping, implicit preference tasks, or network analysis of argument adoption) best capture individual and group learning and preference shifts while remaining feasible to administer at deliberation intervals?
How do different methods for measuring preference transformation (pre/post surveys, in-process journaling, exit interviews, or network tracking) correlate with one another and with long-term behavioral change, under different deliberative process formats?
What are the best methods to measure the accuracy of simulations?
How can deliberative processes produce outputs that meet legal, technical, or administrative requirements without compromising participant ownership?
What are the different methods for representing non-humans/future generations and how do these methods compare?
How can we define and measure "minimum viable" conditions for different assembly objectives?
How can we solve the technical blockers to effective and trustworthy multi-agent simulation and modelling?
Which existing or novel configurations of multi-stakeholder commitment (such as pre-negotiated adoption pledges from national governments, treaty body referral mechanisms, or voluntary corporate compliance frameworks) may produce the highest rates of recommendation uptake from transnational deliberative processes, and under what conditions?
Does the integration of deliberative technologies raise fundamentally new transparency challenges to processes and if so, what are they?
Which open standards and API specifications (building on ActivityPub, NDJSON, or deliberation-specific formats) best enable interoperability between different tools while operating within organizations' existing tech stacks and governance constraints?
How can process outcomes (spanning legitimacy, recommendation quality, participant satisfaction, opinion change, and downstream policy impact) be operationalized as measurable indicators practitioners can feasibly collect?
How to rotate groups/route comments to provide an optimal exposure and testing of different reasons?
How does the optimal session length and pacing vary with task type (information processing vs. value deliberation vs. decision-making), topic complexity (technical vs. normative vs. hybrid), and participant characteristics (expertise, prior knowledge, cognitive diversity)?
What is the Pareto frontier for different deliberative outcomes (learning, viewpoint formation, common ground, decision quality, participant satisfaction) as a function of session time, and are there absolute minimum thresholds below which outcomes collapse for given topic types?
How can deliberative process outputs be incorporated into shared system evaluations or benchmarks?
How does the degree of isolation of a citizen participation office affect its resilience to political interference? What level of integration vs. independence optimizes legitimacy?
How to manage recruitment in geographies with extremely poor access and limited digital and physical infrastructure?
What pre-convening technical mapping approach (such as decision-system audits, output format workshops, or constraint inventories) enables authorities to communicate decision-integration constraints to process designers before convening?
Do some process modifications require extensive piloting, while others can be deployed immediately? What characteristics of a novel application predict adaptation difficulty?
What exercise, issue and context characteristics most strongly predict the time required to complete a given task or process and can these be codified into a prediction model usable by practitioners at design time?
What are the most effective methods and formats for presenting process outputs to decision makers, and what tools can support this process?
What pre-commitments and transparency measures best preserve legitimacy during adaptations?
What agenda-setting and issue-framing procedures demonstrably prevent the priorities of technology-producing nations or major funders from dominating the scope and terms of deliberation in transnational processes?
What legal mechanisms can a private company set up to make deliberative outcomes enforceable?
How to develop real-time dashboards that track process health across multiple dimensions?
How can we develop process pattern languages that can be combined with open or customised facilitation resources?
How can deliberative processes be used to produce formal and verifiable specifications (unit tests, integration tests) for technical systems?
What properties should commitments have to make them truly adaptable? (e.g. specificity vs. breadth, time boundedness, rules for how commitments evolve over time)
What practices protect commitments from reversal when leadership or staff changes in an organization or government?
How can we assist or automate the aggregation of deliberative input from diverse participants in real time whilst maintaining nuance around minority perspectives?
What real-time facilitation interventions (such as structured summarization, breakpoint decisions, or adaptive session extension) enable organizers to detect when deliberative quality is degrading due to time pressure and respond within process constraints?
What support tools could help practitioners get high-level feedback in real-time, such that they can adapt and improve the process as it is underway?
What combination of technical infrastructure, governance structures, and consent protocols enables researchers to access deliberative process data at scale while preserving participant privacy and data integrity?
What restorative practices are most effective in deliberative settings?
How does scaling out affect who participates in the deliberative process?
What scheduling and modality architectures (rotating synchronous windows, follow-the-sun relays, asynchronous deliberation with structured synthesis) minimize systematic regional disadvantage, and how should fairness be quantified?
What is the relationship between different selection variables and public trust, deliberative quality, epistemic quality, and output quality?
When is selective transparency legitimate? What should always be public vs what legitimately needs confidentiality?
For what uses, in what contexts, and with what level of faithfulness is it helpful or appropriate to use simulations? What are the philosophical, moral, political, and other implications?
What simulation fidelity level (agent realism, dialogue authenticity, decision distributions) accurately predicts outcomes for specific deliberative formats under real-world constraints, and where does increased fidelity stop improving predictive value?
How can lessons from speculative execution and speculative decoding help increase the availability of deliberative processes through reduced costs?
What is the Pareto frontier of speed, accuracy, and usable interactivity?
Under what internal communication and training approaches do staff across departments internalize deliberative norms such that they proactively design processes or flag integration opportunities?
What aggregation approaches are best suited to different stages of a process?
How can we systematically stress-test assembly designs before implementation?
Under what conditions is a subsidiary/decentralized approach necessary or superior to a single scaled-out process, and how do organizers diagnose whether fragmentation serves deliberative quality or merely distributes work?
What techniques can help citizens effectively surface, reflect on, and convey their perspective to others?
What are the key technical blockers (agent behavior calibration, emergent group dynamics modeling, preference faithfulness) to effective and trustworthy multi-agent simulation, and which are tractable with current methods?
What design choices help promote the transparency of deliberative technologies, and what tradeoffs does this raise?
Could there be templated approaches to socialising and developing internal commitments?
What are the most effective methods of testing the compatibility of outputs with legal/constitutional/jurisdictional or other fundamental constraints on recommendation uptake?
How can we use deliberation itself to drive third-party verification of the AI systems used in deliberation?
What feedback mechanisms/traceability measures can help participants understand how their contributions influenced outcomes in the process?
How do we best track and identify important voices that are currently missing?
How can we track the perspectives offered and ensure that they all receive appropriate engagement?
Which transcription and annotation approaches (comparing human verbatim, human semantic, hybrid human-AI, or AI-only) best handle cross-talk, non-verbal communication, and emotional valence while maintaining accuracy standards?
How to translate existing social choice research into practical methodologies, with decision aids for matching process to context, such as identifying trade-offs between theoretical guarantees, speed, explainability, and legitimacy in the eyes of participants, the public, and stakeholders?
How can translation best be provided for those in very remote and hard-to-access geographies?
How should we best treat low probability but high impact edge cases?
What are the most compelling features of processes for building trust?
What group-building and trust-formation mechanisms sustain high-quality deliberation as synchronous group size increases beyond typical face-to-face thresholds?
How can practitioners balance (through adaptive protocols or meta-evaluation frameworks) universal standards for cross-context learning against context-specific adaptations required by local stakeholder concerns and governance structures?
How to unobtrusively measure individual and group understanding?
What normative frameworks and transparent decision rules enable organizers to justify the weight given to different subsidiary processes in shaping central outcomes, and how can these be communicated to participants to maintain perceived fairness?
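Among the questions above on binding commitments, one asks how cryptographic mechanisms could create locking mechanisms. A standard primitive here is a hash-based commitment scheme: an authority publishes a digest of its commitment now and reveals the plaintext plus a random nonce later, making the commitment binding (it cannot be swapped for a weaker one) while keeping it hidden until reveal. The sketch below is a minimal illustration, not a production design; the example commitment text is invented, and a real deployment would need timestamping and an agreed reveal protocol.

```python
import hashlib
import secrets

def commit(message: str) -> tuple[str, str]:
    """Return (commitment, nonce). Publish the commitment now;
    reveal the message and nonce later for verification."""
    nonce = secrets.token_hex(16)  # random salt keeps the message hidden
    digest = hashlib.sha256((nonce + message).encode()).hexdigest()
    return digest, nonce

def verify(commitment: str, message: str, nonce: str) -> bool:
    """Check that a revealed (message, nonce) pair matches the
    previously published commitment."""
    return hashlib.sha256((nonce + message).encode()).hexdigest() == commitment

# Hypothetical usage: an AI lab commits to a deliberative outcome.
c, n = commit("Adopt recommendation #7 within 90 days")
print(verify(c, "Adopt recommendation #7 within 90 days", n))  # → True
print(verify(c, "Adopt a weakened recommendation", n))         # → False
```

On its own this only proves what was committed to, not that it was honored; pairing such commitments with the accountability and verification questions above is where the open research lies.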
Explore research questions by discipline
We’ve tagged research questions with some basic categories to help you find the work most relevant to you. Below is an example for AI and ML.