Rough Clustering-Based Monte Carlo Reinforcement Learning with Options
for Minimizing the Length of Trajectories

In Reinforcement Learning (RL), Monte Carlo methods teach agents how to
make decisions based on the principle of averaging sample returns. Monte
Carlo Reinforcement Learning (MCRL) methods interact with an environment
and estimate the value functions of states from the experience gained
through those interactions. They need to collect many full trajectories
(sequences of states, actions, and rewards) to estimate value functions
accurately and to achieve convergence. Even when the state space of the
RL problem is only moderately large, the number of trajectories required
for accurate value estimation can grow substantially. Since the length
of each trajectory also grows with the size of the state space,
collecting and processing each trajectory becomes computationally
expensive and time-consuming.
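As a concrete illustration of this averaging principle, the sketch below
implements first-visit Monte Carlo estimation of the state values V(s);
the env object with reset()/step() methods and the policy callable are
assumed interfaces for illustration, not specifics of this paper.

    from collections import defaultdict

    def first_visit_mc(env, policy, n_episodes, gamma=0.9):
        returns = defaultdict(list)
        for _ in range(n_episodes):
            # Roll out one full trajectory under the given policy.
            states, rewards = [], []
            state, done = env.reset(), False
            while not done:
                next_state, reward, done = env.step(policy(state))
                states.append(state)
                rewards.append(reward)
                state = next_state
            # Discounted return G_t at every step, computed backwards.
            g, gains = 0.0, [0.0] * len(rewards)
            for t in range(len(rewards) - 1, -1, -1):
                g = gamma * g + rewards[t]
                gains[t] = g
            # Record the return only at the first visit to each state.
            seen = set()
            for t, s in enumerate(states):
                if s not in seen:
                    seen.add(s)
                    returns[s].append(gains[t])
        # Value estimate = average of the sampled returns.
        return {s: sum(gs) / len(gs) for s, gs in returns.items()}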
To address this problem, in this work we propose a new methodology that
makes Monte Carlo RL methods more efficient and converge faster by
accelerating the learning process. The proposed methodology abstracts
the state space by clustering it, which significantly reduces the
lengths of trajectories. A rough clustering-based approach is adopted
for this abstraction. First, an agent is allowed to explore the
environment and collect trajectories for a number of episodes. From the
collected trajectories, a preliminary topological structure of the
environment is obtained and initial estimates of the state value
functions are calculated.
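The representation of this topological structure is not fixed here; one
minimal reading, sketched below, is an undirected state-adjacency graph
built from consecutively observed states (build_topology and the
trajectory format are illustrative assumptions).

    from collections import defaultdict

    def build_topology(trajectories):
        # Undirected adjacency: states observed consecutively in some
        # episode are treated as neighbours in the environment.
        adjacency = defaultdict(set)
        for traj in trajectories:  # traj: sequence of states of one episode
            for s, s_next in zip(traj, traj[1:]):
                adjacency[s].add(s_next)
                adjacency[s_next].add(s)
        return dict(adjacency)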
Using this preliminary topological structure and the available estimates
of the state value functions, group-average hierarchical agglomerative
clustering is performed, and the state space is divided into clusters of
states.
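A sketch of this step using SciPy's group-average (UPGMA) linkage
follows; representing each state solely by its value estimate is a
simplifying assumption, since the precise distance combining topology
and values is not specified here.

    from collections import defaultdict
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def cluster_states(state_values, n_clusters):
        states = sorted(state_values)
        # Each state is represented by its current value estimate.
        features = np.array([[state_values[s]] for s in states], dtype=float)
        # method="average" is group-average (UPGMA) linkage.
        tree = linkage(features, method="average")
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
        clusters = defaultdict(set)
        for s, lab in zip(states, labels):
            clusters[lab].add(s)
        return dict(clusters)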
The initial estimates of the state value functions and the preliminary
topological structure are then used again to reshuffle states among the
clusters via rough clustering. Several iterations of rough clustering
yield better-quality clusters of states.
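The following sketch shows one possible rough-clustering pass in the
spirit of this step: states unambiguously close to a single cluster
centre form its lower approximation, while ambiguous states fall into a
boundary region before being resolved. The threshold eps, the iteration
count, and the 1-D value feature are illustrative assumptions, not the
paper's exact construction.

    def rough_reshuffle(clusters, state_values, n_iters=5, eps=0.1):
        for _ in range(n_iters):
            centres = {c: sum(state_values[s] for s in m) / len(m)
                       for c, m in clusters.items() if m}
            lower = {c: set() for c in centres}
            boundary = set()
            for s in state_values:
                dists = {c: abs(state_values[s] - centres[c]) for c in centres}
                nearest = min(dists, key=dists.get)
                rivals = [c for c in dists
                          if c != nearest and dists[c] - dists[nearest] <= eps]
                if rivals:
                    boundary.add(s)        # ambiguous: boundary region
                else:
                    lower[nearest].add(s)  # unambiguous: lower approximation
            # Resolve boundary states to the nearest centre for the next pass.
            for s in boundary:
                nearest = min(centres,
                              key=lambda c: abs(state_values[s] - centres[c]))
                lower[nearest].add(s)
            clusters = lower
        return clusters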
Options (macro-actions) for traversing between the different clusters of
the state space are then learnt using dynamic programming techniques.
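A minimal sketch of learning one such option, assuming a deterministic
transition model recovered from the learnt topology, is value iteration
with a pseudo-reward of 1 for stepping into the target cluster; the
function names and reward shaping below are illustrative assumptions.

    def learn_option(source, target, actions, transition,
                     gamma=0.95, theta=1e-6):
        # Pseudo-reward: 1 for entering the target cluster, else 0.
        def backup(s, a):
            s2 = transition(s, a)
            r = 1.0 if s2 in target else 0.0
            return r + gamma * v.get(s2, 0.0)

        v = {s: 0.0 for s in source}
        while True:
            delta = 0.0
            for s in source:
                best = max(backup(s, a) for a in actions)
                delta = max(delta, abs(best - v[s]))
                v[s] = best
            if delta < theta:
                break
        # The option's internal policy: act greedily w.r.t. v.
        return {s: max(actions, key=lambda a: backup(s, a)) for s in source}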
Once the options are learnt, at each time step of Monte Carlo learning
the agent can choose either a primitive action or, if available, an
option.
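Executing an option then looks roughly as follows: the agent follows the
option's internal policy until it leaves the states the option is
defined on, and the whole excursion enters the Monte Carlo trajectory as
a single macro-step. The env interface matches the earlier sketch and is
an assumption.

    def run_option(env, state, option_policy):
        # Follow the option's internal policy until the agent leaves the
        # set of states the option is defined on (or the episode ends).
        total_reward, done = 0.0, False
        while not done and state in option_policy:
            state, reward, done = env.step(option_policy[state])
            total_reward += reward
        return state, total_reward, done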
These options significantly accelerate the learning process: they reduce
the lengths of trajectories, and this reduction results in faster
convergence of Monte Carlo methods, as demonstrated by the experimental
results in this paper.