In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones.
It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behaviors. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. However, this approach can create loopholes, overlook necessary constraints, or reward the AI system for just appearing aligned.
Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). AI systems may also develop unwanted instrumental strategies such as seeking power or survival because such strategies help them achieve their given goals. Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is in deployment, where it faces new situations and data distributions.
Today, these problems affect existing commercial systems such as language models, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected since these problems partially result from the systems being highly capable.
Many leading AI scientists such as Geoffrey Hinton and Stuart Russell argue that AI is approaching superhuman capabilities and could endanger human civilization if misaligned.
AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Natural language processing is ubiquitous in modern intelligent technologies, serving as a foundation for language translators, virtual assistants, search engines, and many more. In this course, stude
This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees
Machine ethics (or machine morality, computational morality, or computational ethics) is a part of the ethics of artificial intelligence concerned with adding or ensuring moral behaviors of man-made machines that use artificial intelligence, otherwise known as artificial intelligent agents. Machine ethics differs from other ethical fields related to engineering and technology. Machine ethics should not be confused with computer ethics, which focuses on human use of computers.
Existential risk from artificial general intelligence is the hypothesis that substantial progress in artificial general intelligence (AGI) could result in human extinction or another irreversible global catastrophe. One argument goes as follows: The human species currently dominates other species because the human brain possesses distinctive capabilities other animals lack. If AI were to surpass humanity in general intelligence and become superintelligent, then it could become difficult or impossible to control.
ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, notable for enabling users to refine and steer a conversation towards a desired length, format, style, level of detail, and language used. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context. ChatGPT is built upon GPT-3.
Le contenu de ce cours correspond à celui du cours d'Analyse I, comme il est enseigné pour les étudiantes et les étudiants de l'EPFL pendant leur premier semestre. Chaque chapitre du cours correspond
In contrast to vast academic efforts to study AI security, few real-world reports of AI security incidents exist. Released incidents prevent a thorough investigation of the attackers' motives, as crucial information about the company and AI application is ...
AAAI Press2024
Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has experienced remarkable progress. Decision-making AI applications stand out for their heavy reliance on data-centric operations, predominantl ...
Artificial intelligence (AI) is increasingly used in an ever larger number of industries. Alongside this development, however, abundant works argue that AI-driven systems are lacking in terms of safety, ethics and transparency. As a direct consequence, the ...