Large language model (LLM) alignment aims to ensure that LLM behavior matches human preferences. While collecting data for multiple fine-grained, aspect-specific preferences is becoming increasingly feasible, existing alignment methods typically target a single preference and thus struggle with the conflicts inherent in such aggregated datasets. As an early attempt in this direction, we propose a data-centric approach that aligns LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as direct fine-grained preference optimization and introduce preference divergence (PD), which quantifies inter-aspect preference conflicts. Instead of directly tackling the resulting complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy that identifies, for efficient training, the subset of samples with the most negative PD values. We theoretically analyze the loss-bound optimality of our selection strategy and conduct extensive empirical studies across varied settings and datasets, demonstrating that our practical selection method achieves consistent improvements over standard full-data alignment while using as little as 30% of the data.
Figure 1: The overall workflow of the proposed PD selection method.
As illustrated in Figure 1, our method first collects an aggregated dataset from multiple sub-datasets of different fine-grained preferences. For each sub-preference, we train a de-biased reward model using a smaller proxy model and leverage these models to estimate the PD term for each sample across the entire dataset. The final curated subset is obtained by retaining samples with the most negative PD values within the selection budget, and this subset is then used to align the LLM via standard DPO. To make the PD estimation reliable, we further mitigate length bias through length-balanced sampling and a length reward penalty.
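The length reward penalty mentioned above can be illustrated with a minimal sketch. The exact penalty form is not specified here, so the linear-in-length correction and the `penalty` coefficient below are assumptions for illustration only, not the paper's actual formulation:

```python
def length_debiased_reward(reward: float, response_len: int,
                           mean_len: float, penalty: float = 0.01) -> float:
    """Hypothetical length penalty: subtract a term proportional to how far
    the response length deviates from the dataset mean, so that longer
    responses are not preferred merely for being longer.

    `penalty` is an assumed hyperparameter, not a value from the paper.
    """
    return reward - penalty * (response_len - mean_len)
```

Under this sketch, a response 20 tokens longer than average has its reward reduced by `20 * penalty`, which counteracts the tendency of reward models to score verbose answers higher.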
Figure 2: Conflicts between fine-grained and overall preferences commonly occur, and only a part of the samples show complete consistency across all fine-grained aspects.
Collecting fine-grained preferences is more feasible as the underlying criteria are simpler, but aggregated datasets can contain redundancy, noise, and especially preference conflicts. The proposed preference divergence (PD) measures whether one sub-preference conflicts with or agrees with the consensus of other aspects, turning this challenge into a principled data selection criterion.
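The selection criterion can be sketched in code. The PD formula below is a hypothetical leave-one-out construction (each aspect's preference margin compared against the mean margin of the other aspects), chosen only to illustrate the idea of measuring conflict against a consensus; the paper's actual PD definition and de-biased reward estimation may differ:

```python
import numpy as np

def preference_divergence(margins: np.ndarray) -> np.ndarray:
    """Hypothetical PD: `margins[i, j]` is the reward margin (chosen minus
    rejected) of sample i under the aspect-j reward model. Each aspect's
    margin is compared against the leave-one-out mean of the other aspects
    (the "consensus"); a negative divergence signals a conflict.

    margins: shape (n_samples, n_aspects)
    returns: shape (n_samples,), the most negative divergence per sample.
    """
    _, k = margins.shape
    total = margins.sum(axis=1, keepdims=True)
    consensus = (total - margins) / (k - 1)   # leave-one-out mean per aspect
    divergence = margins - consensus          # negative => aspect disagrees
    return divergence.min(axis=1)             # worst conflict per sample

def select_by_pd(margins: np.ndarray, budget: float) -> np.ndarray:
    """Retain the samples with the most negative PD values within a budget
    (fraction of the dataset), as in the paper's selection strategy."""
    pd = preference_divergence(margins)
    n_keep = int(len(pd) * budget)
    return np.argsort(pd)[:n_keep]            # ascending: most negative first
```

For example, a sample on which all aspects agree gets divergence near zero, while a sample where one aspect strongly opposes the others receives a large negative PD and is prioritized for selection.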
Table 1 and Figure 3: Performance comparison of different methods, including win rate, length-controlled win rate, average win score, GPU hours, and performance variation under different selection budgets.
Experiments on UltraFeedback and HelpSteer show that PD selection consistently improves alignment performance while reducing training cost. With only a subset of the data, PD selection can outperform full-data alignment, validating that filtering out conflicting and low-value samples is helpful for robust and efficient LLM alignment.
@inproceedings{zhang2026data,
title={Data Selection for LLM Alignment Using Fine-Grained Preferences},
author={Jia Zhang and Yao Liu and Chen-Xi Zhang and Yi Liu and Yi-Xuan Jin and Lan-Zhe Guo and Yu-Feng Li},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=nRS87hbAqU}
}