March 1, 2019

Interestingness Elements for Explainable Reinforcement Learning through Introspection

Pedro Sequeira, Eric Yeh, Melinda Gervasio

Citation

Sequeira, P., Yeh, E., and Gervasio, M. (2019). Interestingness elements for explainable reinforcement learning through introspection. Joint Proceedings of the ACM IUI 2019 Workshops, Vol. 2327.

Abstract

We propose a framework toward more explainable reinforcement learning (RL) agents. The framework uses introspective analysis of an agent’s history of interaction with its environment to extract several interestingness elements regarding its behavior. Introspection operates at three distinct levels: it first analyzes characteristics of the task that the agent has to solve, then the behavior of the agent while interacting with the environment, and finally performs a meta-analysis combining information gathered at the lower levels. The analyses rely on data that is already collected by standard RL algorithms. We also propose that an RL agent can easily collect additional statistical data while learning, which helps extract more meaningful aspects of its behavior. We provide insights on how an explanation framework can leverage the elements generated through introspection. Namely, these elements can help convey learned strategies to a human user, justify the agent’s decisions in relevant situations, denote its learned preferences and goals, and identify circumstances in which advice from the user might be needed.
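As a rough illustration of the kind of introspection described above, the following Python sketch derives two simple statistics from an agent's interaction history. It is only a sketch under stated assumptions, not the framework's actual method: the function names (`state_frequencies`, `action_evenness`, `uncertain_states`), the history format (a list of `(state, action)` pairs), and the `threshold` value are all hypothetical, and normalized entropy is used here merely as one plausible way to flag situations where the agent's choices are inconsistent.

```python
import math
from collections import Counter


def state_frequencies(history):
    """Relative visit frequency of each state in a list of (state, action) pairs."""
    counts = Counter(state for state, _ in history)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


def action_evenness(history, state):
    """Normalized entropy of the actions chosen in `state`.

    Returns 0.0 when a single action (or none) was observed and 1.0 when
    all observed actions were chosen equally often. Assumption: entropy is
    normalized by the number of *observed* actions, not the full action set.
    """
    counts = Counter(action for s, action in history if s == state)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def uncertain_states(history, threshold=0.9):
    """States whose action evenness meets a (hypothetical) threshold."""
    states = {state for state, _ in history}
    return sorted(s for s in states if action_evenness(history, s) >= threshold)


if __name__ == "__main__":
    # Toy history: the agent always goes left in s0 but is split in s1.
    history = [("s0", "left")] * 3 + [("s1", "left"), ("s1", "right")]
    print(state_frequencies(history))
    print(uncertain_states(history))
```

In this toy history, a state such as `s1`, where the agent picks each action equally often, would be a candidate for the "circumstances in which advice from the user might be needed" mentioned in the abstract, while frequently visited states with a consistent action hint at a learned strategy.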

