In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones. It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behaviors. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. However, this approach can create loopholes, overlook necessary constraints, or reward the AI system for just appearing aligned. Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). AI systems may also develop unwanted instrumental strategies such as seeking power or survival because such strategies help them achieve their given goals. Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is in deployment, where it faces new situations and data distributions. Today, these problems affect existing commercial systems such as language models, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected since these problems partially result from the systems being highly capable. Many leading AI scientists such as Geoffrey Hinton and Stuart Russell argue that AI is approaching superhuman capabilities and could endanger human civilization if misaligned. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (32)
MATH-101(en): Analysis I (English)
We study the fundamental concepts of analysis, calculus and the integral of real-valued functions of a real variable.
CS-552: Modern natural language processing
Natural language processing is ubiquitous in modern intelligent technologies, serving as a foundation for language translators, virtual assistants, search engines, and many more. In this course, stude
EE-556: Mathematics of data: from theory to computation
This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees
Show more
Related publications (55)

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.