Resource

Introducing Democratic Fine-Tuning

Joe Edelman’s Democratic Fine-Tuning (DFT) process aligns LLMs with human values through collective deliberation. It relies on two tools: Values Cards, on which participants articulate underlying values such as “protecting my community” rather than divisive language, and Moral Graphs, collaborative data structures that map relationships between values to create a “wisdom gradient”. Participants work through three stages: articulating considerations, selecting the wisest values, and identifying hierarchical relationships between values. The result is training data for reward models that fine-tune LLMs toward wise rather than merely obedient behavior.
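As a rough illustration of the Moral Graph idea described above, the sketch below stores "this value is wiser than that one" votes as weighted directed edges and ranks values by net votes. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names, fields, scoring rule, and the second example value are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ValuesCard:
    """A value articulated by a participant (hypothetical schema)."""
    title: str
    description: str


@dataclass
class MoralGraph:
    """Directed graph where an edge (a, b) counts votes that b is wiser than a."""
    edges: dict = field(default_factory=lambda: defaultdict(int))

    def add_vote(self, less_wise: ValuesCard, wiser: ValuesCard) -> None:
        # One participant judged `wiser` to be the wiser of the two values.
        self.edges[(less_wise.title, wiser.title)] += 1

    def wisdom_scores(self) -> dict:
        # Assumed scoring rule: votes received as "wiser" minus votes lost.
        scores = defaultdict(int)
        for (less_wise, wiser), votes in self.edges.items():
            scores[wiser] += votes
            scores[less_wise] -= votes
        return dict(scores)

    def wisest(self) -> str:
        # Title of the value with the highest net score.
        scores = self.wisdom_scores()
        return max(scores, key=scores.get)


# Example values: the first title comes from the text, the second is invented.
protect = ValuesCard("Protecting my community", "Looking out for those close to me")
inquiry = ValuesCard("Shared inquiry", "Exploring a hard question together")

graph = MoralGraph()
graph.add_vote(protect, inquiry)  # two participants see "Shared inquiry" as wiser
graph.add_vote(protect, inquiry)
graph.add_vote(inquiry, protect)  # one participant sees it the other way

scores = graph.wisdom_scores()
top_value = graph.wisest()
```

Net-vote scoring is only one simple way to read a "wisdom gradient" off such a graph. A real pipeline would likely condition edges on context and turn the rankings into preference data for a reward model.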

Experimental Practice

Link https://meaningalignment.substack.com/p/introducing-democratic-fine-tuning

Creators Joe Edelman and Oliver Klingefjord

Year 2023

Related Capabilities

Represent complexity

Substantiveness

Ability for final outputs to be nuanced, concrete, decisive, and comprehensive.

Related Research Questions

How can a process find common ground within a limited time while sacrificing as little depth in the final outputs as possible?


What are the best methods for providing impartial robustness checking and critical friend support for output refinement?


How can we ensure that outputs go beyond abstract, high-level principles to specific, actionable proposals?
