agent = BrFSAgent(np.array([1, 1, 1, 0, 0, 0]))
Agents
Reinforcement learning
The agents based on reinforcement learning implement a value-based algorithm called Q-learning. More precisely, the agent implemented in this framework is based on deep double Q-learning.
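As a rough illustration of what "double" Q-learning changes, the sketch below computes the learning target for a single transition. It is not taken from the framework: the function name `double_q_target` and its arguments are hypothetical, and plain Python lists stand in for the network outputs. The only idea it shows is that the online network chooses the next action while the target network evaluates it.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.85, done=False):
    """Return the double Q-learning target for one transition.

    next_q_online / next_q_target hold the Q-values of the next state, one per
    action, as estimated by the online and the target network respectively.
    (Hypothetical helper for illustration, not part of the framework.)
    """
    if done:
        # A terminal state has no future reward to bootstrap from.
        return reward
    # The online network selects the greedy action ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... and the target network supplies the value of that action.
    return reward + gamma * next_q_target[best_action]


target = double_q_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.1],   # argmax is action 1
    next_q_target=[0.5, 0.4, 0.8],   # so the value used is 0.4, not the maximum 0.8
)
print(target)
```

Decoupling action selection from action evaluation in this way is what reduces the overestimation bias of standard Q-learning, which would have used the maximum of the target network's values instead.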
DQNAgent
DQNAgent (model, learning_rate=0.001, criterion=None, optimizer=None, batch_size=128, target_update=5, gamma=0.85, eps_0=1, eps_decay=0.999, eps_min=0.1)
Agent based on a deep Q-Network (DQN).
Input:
- model: torch.nn.Module with the DQN model. Its dimensions must be consistent with the state and action sizes.
- criterion: loss criterion (e.g., torch.nn.SmoothL1Loss)
- optimizer: optimization algorithm (e.g., torch.optim.Adam)
- eps_0: initial epsilon value for an epsilon-greedy policy
- eps_decay: exponential decay factor for epsilon in the epsilon-greedy policy
- eps_min: minimum saturation value for epsilon
- gamma: future reward discount factor for Q-value estimation
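The three epsilon parameters can be read as an exponentially decaying exploration rate that saturates at a floor. The sketch below is one plausible interpretation, not the framework's code: `epsilon_at` and `epsilon_greedy` are hypothetical helpers, and it assumes the decay factor is applied once per update step.

```python
import random


def epsilon_at(step, eps_0=1.0, eps_decay=0.999, eps_min=0.1):
    """Exploration rate after `step` decay steps (assumed schedule)."""
    # Multiply by eps_decay once per step, but never drop below eps_min.
    return max(eps_min, eps_0 * eps_decay ** step)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


for step in (0, 1000, 5000):
    print(step, epsilon_at(step))

action = epsilon_greedy([0.1, 0.7, 0.2], epsilon_at(5000))
```

Under this assumption and with the default values, epsilon falls from 1 to the floor of 0.1 after roughly 2,300 decay steps, after which the agent still explores 10% of the time.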
We provide a default architecture for the neural network that encodes the Q-values, usually referred to as deep Q-Network (DQN).
DQN
DQN (state_size, action_size)
DQN subclasses torch.nn.Module; the description below is inherited from that base class.
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.
Note: As per the example above, an __init__() call to the parent class must be made before assignment on the child.
training (bool): whether this module is in training or evaluation mode.
Blind-search
The agents based on tree search currently implement only blind-search techniques, such as breadth-first search.
BrFSAgent
BrFSAgent (initial_state)
Agent based on Breadth First Search (BrFS).
agent.expand()
[array([0, 1, 1, 0, 0, 0]),
array([1, 0, 1, 0, 0, 0]),
array([1, 1, 0, 0, 0, 0]),
array([1, 1, 1, 1, 0, 0]),
array([1, 1, 1, 0, 1, 0]),
array([1, 1, 1, 0, 0, 1])]
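The output above is consistent with an expansion rule that flips one entry of the binary state at a time. The sketch below reproduces that rule and wraps it in a plain breadth-first search. It is an illustration only: `expand`, `breadth_first_search` and the goal predicate are hypothetical, tuples replace NumPy arrays so that states can be stored in a dictionary, and the actual BrFSAgent may be organized differently.

```python
from collections import deque


def expand(state):
    """Return every state reachable by flipping exactly one bit of `state`."""
    children = []
    for i in range(len(state)):
        child = list(state)
        child[i] = 1 - child[i]
        children.append(tuple(child))
    return children


def breadth_first_search(initial_state, is_goal):
    """Return a shortest list of states from initial_state to a goal, or None."""
    start = tuple(initial_state)
    parents = {start: None}      # doubles as the visited set
    frontier = deque([start])    # FIFO queue gives breadth-first order
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for child in expand(state):
            if child not in parents:
                parents[child] = state
                frontier.append(child)
    return None


# Hypothetical goal: reach the all-zeros state.
path = breadth_first_search([1, 1, 1, 0, 0, 0], lambda s: sum(s) == 0)
for state in path:
    print(state)
```

Because the queue is first-in, first-out, states are visited in order of their distance from the initial state, so the first goal found is reached with the fewest moves.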
Monte-Carlo
The agents based on Monte-Carlo sampling follow the Metropolis-Hastings algorithm to move between states. A random action (new state) is proposed and the move is accepted or rejected with a certain probability.
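A minimal sketch of one such accept/reject step follows. Several things here are assumptions rather than facts about MCAgent: that `beta` plays the role of an inverse temperature, that moves are scored by a cost to be minimized, and that proposals are symmetric (which reduces Metropolis-Hastings to the plain Metropolis rule). The names `acceptance_probability`, `metropolis_step`, `cost` and `propose` are hypothetical.

```python
import math
import random


def acceptance_probability(current_cost, proposed_cost, beta=0.1):
    """Metropolis acceptance probability for a symmetric proposal."""
    delta = proposed_cost - current_cost
    if delta <= 0:
        # Moves that do not increase the cost are always accepted.
        return 1.0
    # Worse moves are accepted with exponentially decreasing probability.
    return math.exp(-beta * delta)


def metropolis_step(state, cost, propose, beta=0.1, rng=random):
    """Propose a new state and return it if accepted, else the old state."""
    candidate = propose(state)
    p = acceptance_probability(cost(state), cost(candidate), beta)
    return candidate if rng.random() < p else state


# Toy usage: a random walk on the integers that prefers values near zero.
state = 10
for _ in range(1000):
    state = metropolis_step(
        state,
        cost=abs,
        propose=lambda s: s + random.choice((-1, 1)),
        beta=0.1,
    )
print(state)
```

Under these assumptions, a small `beta` accepts most proposals and behaves almost like a random walk, whereas a large `beta` rarely accepts moves that increase the cost.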
MCAgent
MCAgent (beta=0.1)