pyanomaly.networks.meta.pcn_parts package¶
Submodules¶
pyanomaly.networks.meta.pcn_parts.convolution_lstm module¶
class pyanomaly.networks.meta.pcn_parts.convolution_lstm.ConvLSTM(input_channels, hidden_channels, kernel_size, step=1, effective_step=[1])¶
Bases: torch.nn.modules.module.Module

forward(input)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
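The semantics of step and effective_step are not documented here. The signature suggests the cell is unrolled step times and only the outputs whose step index appears in effective_step are collected; the sketch below is a hypothetical illustration of that scheme with a stand-in cell, not the package's implementation:

```python
# Hypothetical sketch of the unrolling suggested by the ConvLSTM signature:
# apply `cell` (a stand-in) `step` times and keep only the outputs whose
# step index appears in `effective_step`.
def unroll(cell, x, state, step=1, effective_step=(1,)):
    outputs = []
    for t in range(step):
        out, state = cell(x, state)
        if t in effective_step:
            outputs.append(out)
    return outputs, state
```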
class pyanomaly.networks.meta.pcn_parts.convolution_lstm.ConvLSTMCell(input_channels, hidden_channels, kernel_size)¶
Bases: torch.nn.modules.module.Module

forward(x, h, c)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
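The forward(x, h, c) signature matches the usual convolutional LSTM cell, which replaces the matrix multiplications of a standard LSTMCell with convolutions. Since the gate equations are not documented here, the following is a minimal self-contained sketch of a typical ConvLSTM cell (one common layout computes all four gates with a single convolution); the package's actual implementation may differ:

```python
import torch
import torch.nn as nn

class MiniConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell sketch (not the package's class)."""
    def __init__(self, input_channels, hidden_channels, kernel_size):
        super().__init__()
        pad = kernel_size // 2  # "same" padding for odd kernel sizes
        # One convolution produces all four gates at once.
        self.conv = nn.Conv2d(input_channels + hidden_channels,
                              4 * hidden_channels, kernel_size, padding=pad)

    def forward(self, x, h, c):
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_next = f * c + i * g           # cell state update
        h_next = o * torch.tanh(c_next)  # hidden state update
        return h_next, c_next
```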
pyanomaly.networks.meta.pcn_parts.erm module¶
pyanomaly.networks.meta.pcn_parts.pcm module¶
pyanomaly.networks.meta.pcn_parts.prednet module¶
PredNet in PyTorch.
class pyanomaly.networks.meta.pcn_parts.prednet.PredNet(stack_sizes, R_stack_sizes, A_filter_sizes, Ahat_filter_sizes, R_filter_sizes, pixel_max=1.0, error_activation='relu', A_activation='relu', LSTM_activation='tanh', LSTM_inner_activation='hard_sigmoid', output_mode='error', extrap_start_time=None, data_format='channels_last', return_sequences=False)¶
Bases: torch.nn.modules.module.Module

PredNet realized by zcr.
- Args:
- stack_sizes:
Number of channels in targets (A) and predictions (Ahat) in each layer of the architecture.
The length of stack_sizes (i.e. len(stack_sizes); we use num_layers to denote it) is the number of layers in the architecture.
First element is the number of channels in the input.
e.g., (3, 16, 32) would correspond to a 3 layer architecture that takes in RGB images and has 16 and 32 channels in the second and third layers, respectively.
The value at index (lay + 1) is the out_channels parameter of the lay-th PyTorch convolution layer. E.g., the 16 above means that the out_channels of A and Ahat at layer 0 (the input layer) is 16.
- R_stack_sizes:
Number of channels in the representation (R) modules.
Length must equal length of stack_sizes, but the number of channels per layer can be different.
That is, the out_channels parameter of the PyTorch convolution layers.
- A_filter_sizes:
Filter sizes for the target (A) modules, excluding the target (A) in the lowest layer (i.e., the input image).
Has length of len(stack_sizes) - 1.
e.g., (3, 3) would mean that targets for layers 2 and 3 are computed by a 3x3 convolution of the errors (E) from the layer below (followed by max-pooling).
That is, the kernel_size of the PyTorch convolution layers.
- Ahat_filter_sizes:
Filter sizes for the prediction (Ahat) modules.
Has length equal to length of stack_sizes.
e.g., (3, 3, 3) would mean that the predictions for each layer are computed by a 3x3 convolution of the representation (R) modules at each layer.
That is, the kernel_size of the PyTorch convolution layers.
- R_filter_sizes:
Filter sizes for the representation (R) modules.
Has length equal to length of stack_sizes.
Corresponds to the filter sizes for all convolutions in the LSTM.
That is, the kernel_size of the PyTorch convolution layers.
- pixel_max:
The maximum pixel value.
Used to clip the pixel-layer prediction.
- error_activation:
Activation function for the error (E) units.
- A_activation:
Activation function for the target (A) and prediction (A_hat) units.
- LSTM_activation:
Activation function for the cell and hidden states of the LSTM.
- LSTM_inner_activation:
Activation function for the gates in the LSTM.
- output_mode:
Either 'error', 'prediction', 'all' or layer specification (e.g., R2, see below).
- Controls what is output by the PredNet.
- if 'error':
The mean response of the error (E) units of each layer will be output. That is, the output shape will be (batch_size, num_layers).
- if 'prediction':
The frame prediction will be output.
- if 'all':
The output will be the frame prediction concatenated with the mean layer errors. The frame prediction is flattened before concatenation. Note that 'all' means all TYPES of output (i.e., error and prediction); it should not be confused with returning all of the layers of the model.
- For returning the features of a particular layer, output_mode should be of the form unit_type + layer_number.
e.g., to return the features of the LSTM "representational" units in the lowest layer, output_mode should be specified as 'R0'. The possible unit types are 'R', 'Ahat', 'A', and 'E', corresponding to the 'representation', 'prediction', 'target', and 'error' units respectively.
- extrap_start_time:
Time step for which model will start extrapolating.
Starting at this time step, the prediction from the previous time step will be treated as the "actual" input.
- data_format:
'channels_first': (channel, Height, Width)
'channels_last' : (Height, Width, channel)
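To make the shape constraints above concrete, here is a hypothetical 3-layer configuration for RGB input; the values are illustrative, and only the length relations stated in the argument descriptions are checked:

```python
# Hypothetical 3-layer PredNet configuration for RGB frames.
stack_sizes       = (3, 16, 32)   # stack_sizes[0] = number of input channels
R_stack_sizes     = (16, 32, 64)  # same length; per-layer channels may differ
A_filter_sizes    = (3, 3)        # one entry per layer above the input
Ahat_filter_sizes = (3, 3, 3)     # one entry per layer
R_filter_sizes    = (3, 3, 3)     # one entry per layer (LSTM convolutions)

num_layers = len(stack_sizes)
assert len(R_stack_sizes) == num_layers
assert len(A_filter_sizes) == num_layers - 1
assert len(Ahat_filter_sizes) == num_layers
assert len(R_filter_sizes) == num_layers
```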
forward(A0_withTimeStep, initial_states)¶
A0_withTimeStep is the input from the dataloader. Its shape is (batch_size, timesteps, 3, Height, Width).
In plain terms, A0_withTimeStep is simply the raw images loaded by the dataloader, i.e. the A of the lowest layer (layer 0), expanded along the batch_size and timestep dimensions.
initial_states is a list of PyTorch tensors. This states argument is the initial state, because the forward function itself is not executed in a loop.
NOTE: This forward function is meant to implement the step function of the original Keras version, but it differs from the latter. Because the original PredNet class inherits from Keras's `Recurrent` class, that parent class apparently takes care of splitting the data loaded by the dataloader (SequenceGenerator in the original code), of shape (batch_size, timesteps, 3, H, W), into (batch_size, 3, H, W) slices and looping over the timesteps. Here, forward must implement the loop over the timesteps itself. The A here is the 5D tensor (batch_size, timesteps, 3, Height, Width) coming from the dataloader, whereas the input `x` of the step function in the original code is a 4D tensor (batch_size, 3, Height, Width).
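The per-timestep loop described above can be sketched in plain Python; step_fn stands in for PredNet.step, and the tensor shapes are only indicative:

```python
# Sketch of the timestep loop that forward must implement itself
# (Keras's Recurrent base class would otherwise do this splitting).
# `step_fn` is a stand-in for PredNet.step.
def unroll_over_time(step_fn, A0_withTimeStep, initial_states):
    states = initial_states
    outputs = []
    for A in A0_withTimeStep:  # one (batch_size, 3, H, W) slice per timestep
        output, states = step_fn(A, states)
        outputs.append(output)
    return outputs, states
```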
get_initial_states(input_shape)¶
input_shape is like (batch_size, timeSteps, Height, Width, 3)
or (batch_size, timeSteps, 3, Height, Width).
isNotTopestLayer(layerIndex)¶
Judge whether layerIndex is not the topmost layer.
make_layers()¶
Equivalent to the build method in the original version.
step(A, states)¶
This step function is equivalent to the `step` function in the original code and contains the core logic of PredNet. By analogy with a standard LSTM implementation, step plays the role of LSTMCell, while the forward function plays the role of the LSTM class.
Args:
A: 4D tensor with shape (batch_size, 3, Height, Width); the data extracted from A0_withTimeStep at a single time step.
states: exactly the same form as the `initial_states` of the `forward` function, except that the latter is the initialized PredNet state, while the states here is the PredNet state carried through the timesteps.
pyanomaly.networks.meta.pcn_parts.prednet.batch_flatten(x)¶
Equivalent to batch_flatten in Keras; x is a PyTorch Variable.
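Keras's batch_flatten keeps the batch dimension and flattens everything else. A minimal PyTorch equivalent (an assumption about what this helper does, based on its Keras counterpart) is:

```python
import torch

def batch_flatten(x):
    # Keep dim 0 (batch); flatten all remaining dimensions into one.
    return x.reshape(x.size(0), -1)

# e.g. a (2, 3, 4, 5) tensor becomes (2, 60)
```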
pyanomaly.networks.meta.pcn_parts.prednet.get_activationFunc(act_str)¶
pyanomaly.networks.meta.pcn_parts.prednet.hard_sigmoid(x)¶
Hard sigmoid function by zcr. Computes the element-wise hard sigmoid of x.
What is hard sigmoid? A segment-wise linear approximation of the sigmoid that is faster to compute. Returns 0 if x < -2.5, 1 if x > 2.5, and 0.2 * x + 0.5 for -2.5 <= x <= 2.5.
See e.g. https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/sigm.py#L279
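As a scalar illustration of the piecewise definition above (the library version operates element-wise on tensors):

```python
def hard_sigmoid(x):
    # Segment-wise linear approximation of the sigmoid.
    if x < -2.5:
        return 0.0
    if x > 2.5:
        return 1.0
    return 0.2 * x + 0.5
```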