In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones. It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behaviors. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. However, this approach can create loopholes, overlook necessary constraints, or reward the AI system for just appearing aligned. Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). AI systems may also develop unwanted instrumental strategies such as seeking power or survival because such strategies help them achieve their given goals. Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is in deployment, where it faces new situations and data distributions. Today, these problems affect existing commercial systems such as language models, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected since these problems partially result from the systems being highly capable. Many leading AI scientists such as Geoffrey Hinton and Stuart Russell argue that AI is approaching superhuman capabilities and could endanger human civilization if misaligned. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (32)
MATH-101(en): Analysis I (English)
We study the fundamental concepts of analysis, calculus and the integral of real-valued functions of a real variable.
CS-552: Modern natural language processing
Natural language processing is ubiquitous in modern intelligent technologies, serving as a foundation for language translators, virtual assistants, search engines, and many more. In this course, stude
EE-556: Mathematics of data: from theory to computation
This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees
Show more
Related lectures (171)
Limits and Continuity: Analysis 1
Explores limits, continuity, and uniform continuity in functions, including properties at specific points and closed intervals.
Intermediate Value Theorem
Covers the Intermediate Value Theorem, uniform continuity, Lipschitz functions, and the properties of continuous functions.
Limits of Sequences
Covers the concept of limits of sequences, including definitions, examples, boundedness, and more.
Show more
Related publications (55)
Related concepts (9)
Machine ethics
Machine ethics (or machine morality, computational morality, or computational ethics) is a part of the ethics of artificial intelligence concerned with adding or ensuring moral behaviors of man-made machines that use artificial intelligence, otherwise known as artificial intelligent agents. Machine ethics differs from other ethical fields related to engineering and technology. Machine ethics should not be confused with computer ethics, which focuses on human use of computers.
Existential risk from artificial general intelligence
Existential risk from artificial general intelligence is the hypothesis that substantial progress in artificial general intelligence (AGI) could result in human extinction or another irreversible global catastrophe. One argument goes as follows: The human species currently dominates other species because the human brain possesses distinctive capabilities other animals lack. If AI were to surpass humanity in general intelligence and become superintelligent, then it could become difficult or impossible to control.
ChatGPT
ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, notable for enabling users to refine and steer a conversation towards a desired length, format, style, level of detail, and language used. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context. ChatGPT is built upon GPT-3.
Show more
Related MOOCs (9)
Analyse I
Le contenu de ce cours correspond à celui du cours d'Analyse I, comme il est enseigné pour les étudiantes et les étudiants de l'EPFL pendant leur premier semestre. Chaque chapitre du cours correspond
Analyse I (partie 1) : Prélude, notions de base, les nombres réels
Concepts de base de l'analyse réelle et introduction aux nombres réels.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.