Metaethical Foundations of Artificial Intelligence Alignment

Methodological Approaches and Their Limitations

  • Ivan Snetkov, Lecturer at HSE University (Moscow, Russia)
Keywords: Artificial Intelligence, Existential Risks, Alignment Problem, Metaethics, Moral Non-naturalism, Moral Naturalism

Abstract

The article investigates the alignment problem, which concerns the integration of moral values into the architecture of artificial intelligence (AI) systems to mitigate existential risks. It examines conceptual approaches to addressing the alignment problem, including the utilitarian principles proposed by S. Russell and E. Yudkowsky's concept of “coherent extrapolated volition”. The study introduces the notion of a “meta-alignment problem”. Through an analysis of the conceptual distinction between “strong” and “weak” AI, the author concludes that these categories necessitate distinct approaches to resolving the alignment problem. The article evaluates existing methodological approaches to tackling this issue, including the “principles-to-practice” approach and the “practice-oriented” approach, highlighting their limitations, such as difficulties in operationalizing moral principles and accommodating individual moral preferences. It also explores the potential of “hybrid” approaches. The consideration of metaethical foundations is proposed as a means of addressing a key challenge facing hybrid approaches, namely the ambiguity surrounding the criteria for data “quality”. The study advocates the use of conceptual models of morality developed within metaethics — specifically non-naturalism (intuitionism) and moral naturalism — as a foundation for devising new hybrid alignment strategies. The non-naturalist approach relies on moral intuitions explored through experimental philosophy, enabling the reconciliation of individual and collective moral intuitions by bridging value gaps between humans and AI. In contrast, the naturalist approach draws on neurobiological data to identify moral “facts”, rendering AI systems more transparent and predictable. Metaethical foundations significantly influence AI design, and their explicit consideration not only facilitates the development of effective alignment methodologies but also allows for empirical evaluation of the viability of metaethical approaches to the alignment problem. The article contributes to the discourse on the metaethical foundations of AI alignment, proposes directions for future research, and outlines potential pathways for aligning AI systems with moral values.

Published
2025-09-28
How to Cite
Snetkov, I. (2025). Metaethical Foundations of Artificial Intelligence Alignment. Philosophy Journal of the Higher School of Economics, 9(3), 277-302. https://doi.org/10.17323/2587-8719-2025-3-277-302