
Maze Games

Maze games, also known as grid world games, are the most popular test environments for quantum reinforcement learning (QRL) algorithms. However, researchers in the QRL literature have used different maze games, which makes their results difficult to compare. This tutorial shows how to implement a custom maze game environment that can be used with CleanQRL. The implementation exposes parameters such as the rewards and the maze size, so it can be adapted easily. The mazes used in the literature differ slightly from one another:

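| Researcher/Paper | Maze size | Reward structure (reward, penalty, neutral) | Maze name |
|------------------|-----------|---------------------------------------------|-----------|
| Crawford         | nx5       | 200, 0, 100                                 | crawford  |
| Müller           | 3x3       | 220, -220, -10                              | mueller   |
| Neumann          | 3x4       | 200, -200, -10                              | neumann_a |
| Neumann          | 3x5       | 200, -200, -10                              | neumann_b |
| Neumann          | 4x5       | 200, -200, -10                              | neumann_c |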
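
The rows listed above could be collected in a small lookup structure so that a maze is selected by name. The following is only a minimal sketch: `MazePreset`, `MAZE_PRESETS` and `maze_shape` are hypothetical names introduced for illustration and are not taken from CleanQRL. The numbers are copied from the listing above, and the free row count of the crawford maze (the "n" in nx5) is assumed to be passed in separately.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MazePreset:
    """Hypothetical container for one row of the comparison above."""
    rows: Optional[int]  # None means the row count is a free parameter
    cols: int
    reward: float
    penalty: float
    neutral: float


# Values copied from the listing above; the dictionary itself is an assumption.
MAZE_PRESETS = {
    "crawford": MazePreset(None, 5, 200, 0, 100),
    "mueller": MazePreset(3, 3, 220, -220, -10),
    "neumann_a": MazePreset(3, 4, 200, -200, -10),
    "neumann_b": MazePreset(3, 5, 200, -200, -10),
    "neumann_c": MazePreset(4, 5, 200, -200, -10),
}


def maze_shape(maze_name: str, n: int = 3) -> Tuple[int, int]:
    """Return (rows, cols); n is only used when the preset leaves rows open."""
    preset = MAZE_PRESETS[maze_name]
    rows = n if preset.rows is None else preset.rows
    return rows, preset.cols
```

For example, `maze_shape("crawford", n=4)` gives a 4x5 grid, while `maze_shape("mueller")` ignores `n` and gives the fixed 3x3 grid.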

Depending on the QRL algorithm, the state is either binary or one-hot encoded. In addition, some works restrict the agent's actions while others do not. These differences make meaningful comparisons difficult. To cover these variants, the implementation below lets one customize the following parameters:

custom_maze.py
        # Environment parameters
        env_id: str = "CustomMazeEnv"  # Environment ID
        maze_name: str = "crawford"  # Name of the maze
        state_encoding: str = "binary"  # State encoding: binary, onehot, integer
        n: int = 3  # Number of rows in the maze if the maze name is crawford
        P: float = -100  # Value of penalty
        R: float = 100  # Value of reward
        N: float = 0  # Default / neutral reward for all other states
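
To illustrate what the three `state_encoding` options might produce, here is a minimal sketch. It assumes that each maze cell is identified by a flat integer index between 0 and `n_states - 1`; the function `encode_state` and its arguments are hypothetical and do not reproduce the CleanQRL implementation.

```python
from typing import List, Union


def encode_state(index: int, n_states: int,
                 state_encoding: str = "binary") -> Union[int, List[int]]:
    """Encode a flat cell index (assumed representation) in one of three ways."""
    if not 0 <= index < n_states:
        raise ValueError(f"index {index} is outside 0..{n_states - 1}")

    if state_encoding == "integer":
        # The index itself is the observation.
        return index

    if state_encoding == "onehot":
        # One entry per cell, with a single 1 at the agent's position.
        return [1 if i == index else 0 for i in range(n_states)]

    if state_encoding == "binary":
        # Smallest number of bits that can represent every cell index.
        n_bits = max(1, (n_states - 1).bit_length())
        # Most significant bit first.
        return [(index >> shift) & 1 for shift in reversed(range(n_bits))]

    raise ValueError(f"unknown state_encoding: {state_encoding!r}")
```

For a 3x3 maze with 9 cells, the binary encoding needs 4 entries per state while the one-hot encoding needs 9, which is one reason results obtained with different encodings are hard to compare directly.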