There has been growing interest in developing simulated learners to enhance learning and teaching experiences in educational environments. However, existing work has primarily focused on structured environments that rely on meticulously crafted task representations, limiting learners' ability to generalize skills across tasks. In this paper, we aim to enhance simulated learners' generalization capabilities in less-structured, text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that use natural language for state and action representations, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid RL-LLM agents that combine these two strategies to improve performance and generalizability. To support the development of these agents, we introduce PharmaSimText, a novel benchmark built from expert-evaluated GPT-4 generations derived from a virtual pharmacy environment designed for practicing diagnostic conversations. Experimenting with RL-based and LLM-based agents using GPT-4 and open-source LLMs, along with a wide range of strategies for combining them, we find that RL-based agents are good at completing tasks but poor at asking quality diagnostic questions, whereas LLM-based agents are better at asking diagnostic questions but worse at completing tasks. Finally, specific variants of hybrid RL-LLM agents overcome both limitations. Our findings highlight the potential of combining RL- and LLM-based methods to create generalizable agents whose solutions are close to human ones, thanks to the LLM component, while remaining faithful to the controlled environment, thanks to the RL component. The source code and benchmark are available on GitHub.