Agents

Agents are the entities that carry out the learning process in order to accomplish a task.

Reinforcement learning

Agents based on reinforcement learning implement Q-learning, a value-based algorithm. More precisely, the agent implemented in this framework uses deep double Q-learning.
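As a rough illustration of the double Q-learning update, the sketch below computes the bootstrapped targets for a batch of transitions. It assumes NumPy arrays of Q-values already produced by an online network and a target network; the helper name `double_q_targets` and the array layout are hypothetical, not part of this framework's API. The online network selects the greedy next action and the target network evaluates it, which is what distinguishes double Q-learning from plain Q-learning.

```python
import numpy as np

def double_q_targets(rewards, q_next_online, q_next_target, dones, gamma=0.85):
    """Double Q-learning targets for a batch (hypothetical helper).

    rewards:       shape (B,), immediate rewards
    q_next_online: shape (B, A), online-network Q-values for the next states
    q_next_target: shape (B, A), target-network Q-values for the next states
    dones:         shape (B,), 1.0 where the episode ended, else 0.0
    gamma:         future reward discount factor
    """
    # The online network chooses the greedy next action...
    best_actions = np.argmax(q_next_online, axis=1)
    # ...and the target network evaluates that action.
    next_values = q_next_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values

rewards = np.array([1.0, 0.0])
q_online = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target = np.array([[0.3, 0.4], [2.0, 1.0]])
dones = np.array([0.0, 1.0])
print(double_q_targets(rewards, q_online, q_target, dones, gamma=0.5))  # [1.2 0. ]
```

These targets would then be compared with the online network's Q-values for the actions actually taken, using a loss criterion such as torch.nn.SmoothL1Loss. The target network is only synchronized with the online network periodically, which keeps the targets stable between updates.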



DQNAgent

 DQNAgent (model, learning_rate=0.001, criterion=None, optimizer=None,
           batch_size=128, target_update=5, gamma=0.85, eps_0=1,
           eps_decay=0.999, eps_min=0.1)

Agent based on a deep Q-Network (DQN).

Input:
- model: torch.nn.Module with the DQN model; its dimensions must be consistent with the state and action spaces
- criterion: loss criterion (e.g., torch.nn.SmoothL1Loss)
- optimizer: optimization algorithm (e.g., torch.optim.Adam)
- eps_0: initial epsilon value for an epsilon-greedy policy
- eps_decay: exponential decay factor for epsilon in the epsilon-greedy policy
- eps_min: minimum saturation value for epsilon
- gamma: future reward discount factor for Q-value estimation
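The epsilon-greedy parameters can be read as the schedule below. This is a minimal sketch assuming epsilon is multiplied by eps_decay once per decay step (whether DQNAgent decays per step or per episode is not stated here) and clipped from below at eps_min; the helpers `epsilon_at` and `epsilon_greedy` are hypothetical.

```python
import random

def epsilon_at(step, eps_0=1.0, eps_decay=0.999, eps_min=0.1):
    """Exponentially decayed epsilon, saturating at eps_min (hypothetical helper)."""
    return max(eps_min, eps_0 * eps_decay ** step)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With the default values, epsilon reaches eps_min after roughly 2,300 decay steps, after which the agent keeps exploring 10% of the time.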

We provide a default architecture for the neural network that encodes the Q-values, usually referred to as a deep Q-Network (DQN).



DQN

 DQN (state_size, action_size)

Default deep Q-Network, implemented as a torch.nn.Module subclass. It takes states of dimension state_size and outputs one Q-value estimate per action, i.e., action_size values.


Monte-Carlo

Agents based on Monte-Carlo sampling use the Metropolis-Hastings algorithm to move between states: a random action (leading to a new state) is proposed, and the move is accepted or rejected according to the Metropolis-Hastings acceptance probability.
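A minimal sketch of the acceptance step, assuming a symmetric proposal, that each state is scored by an energy (cost) to be minimized, and that beta acts as an inverse temperature. The helper name `metropolis_accept` and this sign convention are assumptions, not the MCAgent API. A move that does not raise the energy is always accepted; otherwise it is accepted with probability exp(-beta * ΔE).

```python
import math
import random

def metropolis_accept(energy_old, energy_new, beta=0.1, rng=random):
    """Metropolis acceptance test for a proposed move (hypothetical helper).

    Assumes a symmetric proposal, so the Hastings correction cancels and
    the acceptance probability is min(1, exp(-beta * (E_new - E_old))).
    """
    delta = energy_new - energy_old
    if delta <= 0:
        return True  # downhill (or equal-energy) moves are always accepted
    # Uphill moves are accepted with probability exp(-beta * delta).
    return rng.random() < math.exp(-beta * delta)
```

Under these assumptions, a smaller beta (higher temperature) makes uphill moves more likely to be accepted, which favors exploration over exploitation.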



MCAgent

 MCAgent (beta=0.1)

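Agent based on Monte-Carlo sampling that moves between states with the Metropolis-Hastings rule. Input: - beta: parameter controlling the acceptance probability of proposed moves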