Summary
In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones. It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behaviors. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. However, this approach can create loopholes, overlook necessary constraints, or reward the AI system for just appearing aligned. Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). AI systems may also develop unwanted instrumental strategies such as seeking power or survival because such strategies help them achieve their given goals. Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is in deployment, where it faces new situations and data distributions. Today, these problems affect existing commercial systems such as language models, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected since these problems partially result from the systems being highly capable. Many leading AI scientists such as Geoffrey Hinton and Stuart Russell argue that AI is approaching superhuman capabilities and could endanger human civilization if misaligned. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.