DAG implementation

DAG encoding

class Brick(nn.Module)[source]

Bases: Module

The meta class Brick serves as a base for incorporating PyTorch nn.Module layers into DRAGON. In addition to the __init__ and forward methods, subclasses should implement a method that modifies the layer given a new input shape. The **args correspond to the layer hyperparameters.

Parameters:

input_shape (tuple) – Shape of the input tensor.

forward(X)[source]

Forward pass.

Parameters:

X (torch.Tensor) – Input tensor.

modify_operation(input_shape)[source]

Modify the operation so it can take a tensor of shape input_shape as input.

Parameters:

input_shape (tuple) – Shape of the input tensor.
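
As a minimal sketch of how a custom layer can be wrapped, the hypothetical MyLinear below subclasses Brick (assumed importable from dragon.search_space.dag_variables, like the classes below) around nn.Linear; the class name and its internals are illustrative assumptions, not part of DRAGON:

>>> import torch.nn as nn
>>> from dragon.search_space.dag_variables import Brick
>>> class MyLinear(Brick):
...     def __init__(self, input_shape, out_channels):
...         super().__init__(input_shape)
...         self.linear = nn.Linear(input_shape[-1], out_channels)
...     def forward(self, X):
...         return self.linear(X)
...     def modify_operation(self, input_shape):
...         # Rebuild the layer so it accepts tensors of the new shape,
...         # keeping the number of output features unchanged.
...         self.linear = nn.Linear(input_shape[-1], self.linear.out_features)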

class Node(nn.Module)[source]

Bases: Module

The class Node is the implementation of a DAG node. Each node is made of a combiner, an operation and an activation function. The operation is parametrized by a set of hyperparameters.

Parameters:
  • combiner (str) – Name of the combiner. The only combiners implemented within DRAGON for now are ‘add’, ‘concat’ and ‘mul’.

  • operation (Brick) – Operation that will be performed within the node.

  • hp (dict) – Dictionary containing the hyperparameters. The key names should match the arguments of the operation class __init__ method.

  • activation (nn.Module, default=nn.Identity()) – Activation function.

  • input_comp (['Pad', 'Crop'], default='Pad') – Defines how the combiner computes the global input shape. When set to ‘Pad’, the maximum of the incoming shapes is taken; when set to ‘Crop’, their mean is taken. For example, with incoming shapes (5, 8) and (5, 4), ‘Pad’ yields (5, 8) while ‘Crop’ yields (5, 6).

Examples

>>> import torch.nn as nn
>>> from dragon.search_space.bricks import MLP
>>> from dragon.search_space.dag_variables import Node
>>> print(Node(combiner="add", operation=MLP, hp={"out_channels": 10}, activation=nn.ReLU()))
(combiner) add -- (name) <class 'dragon.search_space.bricks.basics.MLP'> -- (hp) {'out_channels': 10} -- (activation) ReLU() --
copy()[source]

Creates a new Node which is a copy of this one.

Returns:

new_node – Copy of the current Node.

Return type:

Node

set_operation(input_shapes, device=None)[source]

Initialize the operation using the new input shapes. First, the global input shape of the operation is computed, using the combiner type and the input_comp attribute. Then, the operation is initialized with the global input shape and the hyperparameters. The operation parameters are initialized with xavier_uniform. Finally, the node output shape is computed.

Parameters:
  • input_shapes (list, tuple or int) – Input shapes of the multiple (or single) input vectors of the node.

  • device (str, default=None) – Device on which the node operation should be computed.

compute_input_shape(input_shapes)[source]

Compute the global input shape for the operation, given the (possibly multiple) input shapes. The global shape depends on the combiner type and the value of self.input_comp.

Parameters:

input_shapes (list) – List containing the input shapes of the different input tensors.

Return type:

tuple

combine(X)[source]

Use the combiner to combine the input vectors. First, the vectors are modified to match the global input shape using the self.padding function. Then, they are combined by addition, multiplication or concatenation.

Parameters:

X (list) – List containing the input tensors.

Return type:

torch.Tensor
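
As a conceptual illustration of the combiners (plain torch calls, not DRAGON's internal implementation): ‘add’ and ‘mul’ pad every tensor to the global input shape before combining, while ‘concat’ stacks the tensors along the last dimension:

>>> import torch
>>> import torch.nn.functional as F
>>> a, b = torch.ones(5, 8), torch.ones(5, 4)
>>> (a + F.pad(b, (0, 4))).shape      # 'add': pad b from (5, 4) to (5, 8), then sum
torch.Size([5, 8])
>>> torch.cat([a, b], dim=-1).shape   # 'concat': no padding over the last dimension
torch.Size([5, 12])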

padding(X, start=-1, pad_start=())[source]

Modify the input tensors gathered in X so they all have the global input shape. The padding is performed over all dimensions for the ‘add’ and ‘mul’ combiners, but not on the last one for the ‘concat’ combiner.

Parameters:
  • X (list or torch.Tensor) – List containing the input tensors.

  • start (int, default=-1) – Dimension where to start the padding. It depends on the combiner.

  • pad_start (tuple, default=()) – Default padding over the last dimension. It depends on the combiner.

Returns:

pad_X – List containing the tensors with the right shape.

Return type:

list

compute_output_shape()[source]

Compute the output shape of the node. A fake vector is created with a shape equal to the global input shape. This fake vector is processed by the operation. The shape of the resulting vector gives the node output shape.

Returns:

shape – The node output shape.

Return type:

tuple

modification(combiner=None, name=None, hp=None, input_shapes=None, device=None)[source]

Modify the node. The modifications can be applied to the combiner, the operation, the operation’s hyperparameters or the input shapes. Values set to None are left unchanged. If neither the operation nor its hyperparameters change, the existing operation is modified in place; otherwise, a new one is created.

Parameters:
  • combiner (str, default=None) – Name of the new combiner.

  • name (Brick, default=None) – New operation that will be performed within the node.

  • hp (dict, default=None) – Dictionary containing the new hyperparameters. The key names should match the arguments of the operation class __init__ method.

  • input_shapes (list, tuple or int, default=None) – List of the new input shapes.

  • device (str, default=None) – Name of the device where the node is computed.
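
A hedged usage sketch: the node from the example above is first initialized through set, then its hyperparameters are changed, which (per the rule above) creates a new operation. The shapes here are illustrative assumptions:

>>> import torch.nn as nn
>>> from dragon.search_space.bricks import MLP
>>> from dragon.search_space.dag_variables import Node
>>> node = Node(combiner="add", operation=MLP, hp={"out_channels": 10}, activation=nn.ReLU())
>>> node.set([(5,)])                           # initialize the operation first
>>> node.modification(hp={"out_channels": 3})  # hp changed: a new MLP is created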

modify_operation(input_shape)[source]

Modify the operation so it can take as input a tensor of shape input_shape.

Parameters:

input_shape (tuple) – New input shape.

set(input_shapes)[source]

Initialize or modify the node with the incoming shapes input_shapes.

Parameters:

input_shapes (list, tuple or int) – Input shapes of the multiple (or single) input vectors of the node.

forward(X, h=None)[source]

Forward pass of the layer. The inputs are first combined by the combiner, then processed by the operation and the activation function.

Parameters:
  • X (torch.Tensor or list) – Input tensor or list of input tensors.

  • h (torch.Tensor, default=None) – Hidden state, used in the case of a recurrent layer.

Returns:

X or (X,h) – Processed tensor(s).

Return type:

torch.Tensor
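
Putting it together, a minimal sketch of a node combining two incoming tensors (the flat 1d shapes are an assumption here; the exact shape handling depends on the bricks used):

>>> import torch
>>> import torch.nn as nn
>>> from dragon.search_space.bricks import MLP
>>> from dragon.search_space.dag_variables import Node
>>> node = Node(combiner="concat", operation=MLP, hp={"out_channels": 4}, activation=nn.ReLU())
>>> node.set([(3,), (5,)])                        # two incoming tensors of sizes 3 and 5
>>> X = [torch.randn(16, 3), torch.randn(16, 5)]  # batch of 16
>>> out = node(X)                                 # concat -> (16, 8), then MLP + ReLU -> (16, 4)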

load_state_dict(state_dict, **kwargs)[source]

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True the optimizer must be created after the call to load_state_dict unless get_swap_module_params_on_conversion() is True.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

  • assign (bool, optional) – When False, the properties of the tensors in the current module are preserved, whereas when True, the properties of the Tensors in the state dict are preserved. The only exception is the requires_grad field, for which the value from the module is preserved. Default: False

Returns:

  • missing_keys – a list of str containing any keys that are expected by this module but missing from the provided state_dict.

  • unexpected_keys – a list of str containing the keys that are not expected by this module but present in the provided state_dict.

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

class AdjMatrix(nn.Module)[source]

Bases: Module

The class AdjMatrix is the implementation of a Directed Acyclic Graph (DAG) using its adjacency matrix combined with the list of nodes.

Parameters:
  • operations (list) – List of nodes, i.e. the operations that will be performed within the graph.

  • matrix (np.array) – Adjacency matrix. The order of the operations and adjacency matrix’s entries should be the same.

Examples

>>> import numpy as np
>>> from dragon.search_space.dag_variables import AdjMatrix
>>> import torch.nn as nn
>>> from dragon.search_space.bricks import MLP, Identity
>>> from dragon.search_space.dag_variables import Node
>>> node_1 = Node(combiner="add", operation=MLP, hp={"out_channels": 10}, activation=nn.ReLU())
>>> node_2 = Node(combiner="add", operation=MLP, hp={"out_channels": 5}, activation=nn.ReLU())
>>> node_3 = Node(combiner="concat", operation=Identity, hp={}, activation=nn.Softmax())
>>> operations = [node_1, node_2, node_3]
>>> matrix = np.array([[0, 1, 1],
...                    [0, 0, 1],
...                    [0, 0, 0]])
>>> print(AdjMatrix(operations, matrix))
NODES: [
(combiner) add -- (name) <class 'dragon.search_space.bricks.basics.MLP'> -- (hp) {'out_channels': 10} -- (activation) ReLU() -- ,
(combiner) add -- (name) <class 'dragon.search_space.bricks.basics.MLP'> -- (hp) {'out_channels': 5} -- (activation) ReLU() -- ,
(combiner) concat -- (name) <class 'dragon.search_space.bricks.basics.Identity'> -- (hp) {} -- (activation) Softmax(dim=None) -- ] | MATRIX:[[0, 1, 1], [0, 0, 1], [0, 0, 0]]
assert_adj_matrix()[source]

The operations and matrix variables should verify the following properties:

  • The operations variable should be a list.

  • The matrix variable should be a square, upper-triangular numpy array with 0s on the diagonal.

  • The matrix variable should not contain empty rows besides the last one, nor empty columns besides the first one; these would imply nodes without outgoing or incoming connections.

  • The matrix and operations variables should have the same dimension.

copy()[source]

Creates a new AdjMatrix which is a copy of this one.

Returns:

adj_matrix – Copy of the current variable.

Return type:

AdjMatrix

set(input_shape)[source]

Initialize the nn.Module layers within the operations list with the new input shape. If the layers have already been initialized, they may be modified if input_shape has changed since their initialization. The layers are initialized or modified one after the other, following the operations list order.

Parameters:

input_shape (int or tuple) – Shape of the DAG’s input tensor.

forward(X)[source]

Forward pass through the DAG. The latent vectors are processed layer by layer, following the operations list order.

Parameters:

X (torch.Tensor) – Input tensor.

Returns:

output – Network output tensor.

Return type:

torch.Tensor
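
Continuing the AdjMatrix example above, a minimal usage sketch (the flat input shape (5,) is an assumption; the output shape depends on the last node):

>>> import torch
>>> dag = AdjMatrix(operations, matrix)  # operations and matrix from the example above
>>> dag.set(input_shape=(5,))            # initialize every node, in list order
>>> X = torch.randn(16, 5)               # batch of 16 input vectors
>>> out = dag(X)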

fill_adj_matrix(matrix)[source]

Add random edges to an adjacency matrix when it contains orphan nodes (no incoming connection) or nodes having no outgoing connection. Except for the first node, all nodes should have at least one incoming connection, meaning the corresponding column should not sum to zero. Except for the last node, all nodes should have at least one outgoing connection, meaning the corresponding row should not sum to zero.

Parameters:

matrix (np.array) – Adjacency matrix from a DAG that may contain orphan nodes.

Returns:

matrix – Adjacency matrix from a DAG that does not contain orphan nodes.

Return type:

np.array
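
For instance, in the matrix below the second node has no outgoing connection (its row sums to zero) and the third node has no incoming one (its column sums to zero). A sketch, assuming fill_adj_matrix is importable from dragon.search_space.dag_variables; the edges actually added are random:

>>> import numpy as np
>>> from dragon.search_space.dag_variables import fill_adj_matrix
>>> matrix = np.array([[0, 1, 0],
...                    [0, 0, 0],
...                    [0, 0, 0]])
>>> fixed = fill_adj_matrix(matrix)  # e.g. adds the edge 2 -> 3, fixing both constraints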

DAG variables

class HpVar(Variable)[source]

Bases: Variable

The class HpVar defines Variables which represent a node operation. The operation can be a Constant or a CatVar, whose values inherit from the Brick class. If the operation is represented by a CatVar, the multiple candidate operations should share the same hyperparameters.

Parameters:
  • label (str) – Name of the variable.

  • operation (Constant or CatVar) – One or several candidate operations encoded as Brick classes. If operation is a CatVar, the multiple operations should share the same hyperparameters.

  • hyperparameters (dict) – Dictionary of hyperparameters which inherit from Variables (for example IntVar for a number of channels or FloatVar for a dropout rate).

Examples

>>> from dragon.search_space.bricks import MLP
>>> from dragon.search_space.base_variables import Constant, IntVar
>>> from dragon.search_space.dag_variables import HpVar
>>> mlp = Constant("Mlp operation", MLP)
>>> hp = {"out_channels": IntVar("out_channels", 1, 10)}
>>> mlp_var = HpVar("MLP var", mlp, hyperparameters=hp)
>>> mlp_var.random()
[<class 'dragon.search_space.bricks.basics.MLP'>, {'out_channels': 9}]
>>> from dragon.search_space.bricks import LayerNorm1d, BatchNorm1d
>>> from dragon.search_space.base_variables import CatVar
>>> norm = CatVar("1d norm layers", features=[LayerNorm1d, BatchNorm1d])
>>> norm_var = HpVar("Norm var", norm, hyperparameters={})
>>> norm_var.random()
[<class 'dragon.search_space.bricks.normalization.BatchNorm1d'>, {}]
random(size=1)[source]

Create random operations. First, if the operation is a CatVar, an operation is randomly selected among the different possibilities. Then, one random value per hyperparameter is drawn.

Parameters:

size (int, default=1) – Number of draws.

Returns:

operations – List containing the randomly created operations, or a single operation if size=1.

Return type:

list

isconstant()[source]
Returns:

out – False, an HpVar is never constant.

Return type:

bool

class NodeVariable(Variable)[source]

Bases: Variable

The class NodeVariable defines Variables which represent DAG nodes by creating objects from the Node class.

Parameters:
  • label (str) – Name of the variable.

  • combiner (Variable) – Variable used to draw the node combiner (e.g., a CatVar over ‘add’, ‘concat’ and ‘mul’).

  • operation (HpVar or CatVar) – Variable containing the candidate operations.

  • activation_function (Variable) – Variable used to draw the node activation function.

Examples

>>> from dragon.search_space.dag_variables import NodeVariable, HpVar
>>> from dragon.search_space.bricks import MLP
>>> from dragon.search_space.base_variables import Constant, IntVar, CatVar
>>> from dragon.search_space.bricks_variables import activation_var
>>> combiner = CatVar("Combiner", features = ['add', 'mul'])
>>> operation = HpVar("Operation", Constant("Mlp operation", MLP), hyperparameters={"out_channels": IntVar("out_channels", 1, 10)})
>>> node = NodeVariable(label="Node variable",
...                 combiner=combiner,
...                 operation=operation,
...                 activation_function=activation_var("Activation"))
>>> node.random()
(combiner) mul -- (name) <class 'dragon.search_space.bricks.basics.MLP'> -- (hp) {'out_channels': 2} -- (activation) SiLU() --
random(size=1)[source]

Create random nodes. The combiner, the operation and the activation function are sequentially drawn at random.

Parameters:

size (int, default=1) – Number of draws.

Returns:

nodes – List containing the randomly created nodes, or a single node if size=1.

Return type:

list or Node

isconstant()[source]
Returns:

out – False, a NodeVariable is never constant.

Return type:

bool

class EvoDagVariable(Variable)[source]

Bases: Variable

The class EvoDagVariable defines Variables which represent Directed Acyclic Graphs (DAGs) by creating objects from the AdjMatrix class. The candidate operations should be gathered within a DynamicBlock. The maximum size of this DynamicBlock sets the maximum number of nodes in the graph.

Parameters:
  • label (str) – Name of the variable.

  • operations (DynamicBlock) – DynamicBlock containing Variables corresponding to the candidate operations.

  • init_complexity (int) – Maximum number of nodes that the randomly created DAGs should have.

Examples

>>> from dragon.search_space.dag_variables import HpVar, NodeVariable, EvoDagVariable
>>> from dragon.search_space.bricks import MLP, MaxPooling1D, AVGPooling1D
>>> from dragon.search_space.base_variables import Constant, IntVar, CatVar, DynamicBlock
>>> from dragon.search_space.bricks_variables import activation_var
>>> mlp = HpVar("Operation", Constant("MLP operation", MLP), hyperparameters={"out_channels": IntVar("out_channels", 1, 10)})
>>> pooling = HpVar("Operation", CatVar("Pooling operation", [MaxPooling1D, AVGPooling1D]), hyperparameters={"pool_size": IntVar("pool_size", 1, 5)})
>>> candidates = NodeVariable(label = "Candidates",
...                         combiner=CatVar("Combiner", features=['add', 'concat']),
...                         operation=CatVar("Candidates", [mlp, pooling]),
...                         activation_function=activation_var("Activation"))
>>> operations = DynamicBlock("Operations", candidates, repeat=5)
>>> dag = EvoDagVariable(label="DAG", operations=operations)
>>> dag.random()
NODES: [
(combiner) add -- (name) <class 'dragon.search_space.bricks.basics.Identity'> -- (hp) {} -- (activation) Identity() -- ,
(combiner) add -- (name) <class 'dragon.search_space.bricks.pooling.MaxPooling1D'> -- (hp) {'pool_size': 2} -- (activation) Sigmoid() -- ,
(combiner) concat -- (name) <class 'dragon.search_space.bricks.pooling.MaxPooling1D'> -- (hp) {'pool_size': 3} -- (activation) ELU(alpha=1.0) -- ,
(combiner) add -- (name) <class 'dragon.search_space.bricks.pooling.AVGPooling1D'> -- (hp) {'pool_size': 4} -- (activation) ReLU() -- ] | MATRIX:[[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
random(size=1)[source]

Create random DAGs. First, a list of random nodes is created, with a size lower than the complexity attribute. The first element of this list will always be an Identity layer. Then, the adjacency matrix is created as an upper-triangular matrix with the same size as the list. This adjacency matrix is corrected using the fill_adj_matrix function to prevent nodes from having no incoming or outgoing connections.

Parameters:

size (int, default=1) – Number of draws.

Returns:

matrices – List containing the randomly created DAGs, or a single DAG if size=1.

Return type:

list or AdjMatrix

isconstant()[source]
Returns:

out – False, an EvoDagVariable is never constant.

Return type:

bool