Gradient Metric Collector
- class gradient_metrics.GradientMetricCollector(metrics: Union[List[gradient_metrics.metrics.GradientMetric], gradient_metrics.metrics.GradientMetric])
Helper class for computing per-sample gradient metrics.
- Parameters
metrics (sequence of GradientMetric or GradientMetric) – A single gradient metric or a sequence of gradient metrics.
- Raises
ValueError – If the list of metrics is empty.
- __call__(loss: torch.Tensor, retain_graph: bool = False) torch.Tensor
Computes gradient metrics per sample.
- Parameters
loss (torch.Tensor) – A loss tensor to compute the gradients on. This should have a shape of (N,), with N being the number of samples.
retain_graph (bool) – If True, retains the graph of the supplied loss. Default False.
- Raises
ValueError – If the loss does not require a gradient.
ValueError – If the loss does not have a shape of (N,).
- Returns
Gradient metrics per sample with a shape of (N, dim).
- Return type
torch.Tensor
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric values. All metrics are read out of the GradientMetric instances and concatenated. The output shape is (dim,).
- Return type
torch.Tensor
- property dim: int
Number of gradient metrics per sample.
This is useful if you want to build a meta model on top of the retrieved gradient metrics and need to know the input shape per sample.
- Returns
The number of gradient metrics per sample.
- Return type
int
- reset() None
Resets all gradient metric instances to their default values.
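The collector's per-sample loop can be sketched as follows. This is a minimal pure-Python analogue (no torch; all names here are hypothetical, not part of the package): for each of the N per-sample losses a gradient is obtained, every metric is evaluated on it, and the per-sample rows are stacked into an (N, dim) result.

```python
# Pure-Python sketch of the collector's per-sample loop (hypothetical names):
# each "gradient" vector yields one row of metric values, and the rows are
# stacked into an N x dim matrix, mirroring the documented output shape.

def collect_per_sample(per_sample_grads, metrics):
    """per_sample_grads: list of N gradient vectors (lists of floats).
    metrics: list of callables mapping a gradient vector to a float."""
    if not metrics:
        # Mirrors the documented ValueError for an empty metric list.
        raise ValueError("The list of metrics must not be empty.")
    rows = []
    for grad in per_sample_grads:
        # One row of gradient metrics per sample.
        rows.append([metric(grad) for metric in metrics])
    return rows

grads = [[0.1, -0.4, 0.2], [0.3, 0.0, -0.1]]
metrics = [max, min, lambda g: sum(abs(x) for x in g)]  # max, min, 1-norm
result = collect_per_sample(grads, metrics)  # shape (N, dim) = (2, 3)
```

Here dim is three because three metrics are registered, matching the dim property described above.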
Gradient Metrics
GradientMetric
- class gradient_metrics.metrics.GradientMetric(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
This is the base class for all gradient metrics.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- __call__(grad: Union[Sequence[torch.Tensor], torch.Tensor]) None
A gradient metric instance is registered as a backward hook on parameters. It is called whenever an associated parameter takes part in a backward pass.
- Parameters
grad (torch.Tensor) – The gradient of the associated parameter, on which the metric is computed.
- _collect(grad: torch.Tensor) None
This method has to be implemented by every GradientMetric subclass. It will be called on the gradient supplied to the instance via the backward hook.
- Parameters
grad (torch.Tensor) – The gradient of the associated parameter.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
- _get_metric() torch.Tensor
This method should return the metric values stored in the buffer. It is used by the data property and has to be implemented by all subclasses.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- _register_parameters() None
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Resets the metric values to a default value. Has to be implemented by all subclasses.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
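The subclass contract above (_collect, _get_metric, reset) can be illustrated with a pure-Python analogue. This sketch does not use torch and the class name and buffer attribute are hypothetical; it only mirrors how a subclass such as Max would fill in the three required methods.

```python
# Pure-Python sketch of the GradientMetric subclass contract (hypothetical
# names, no torch): a subclass supplies _collect, _get_metric and reset;
# the base class wires _collect to the backward hook and exposes the buffer
# through the data property.

class RunningMaxSketch:
    def __init__(self):
        self.reset()

    def _collect(self, grad):
        # Called once per backward pass with the (transformed) gradient.
        self._buffer = max(self._buffer, max(grad))

    def _get_metric(self):
        # Read out by the data property.
        return self._buffer

    def reset(self):
        # Mirrors Max.reset(): the buffer starts at negative infinity.
        self._buffer = float("-inf")

    @property
    def data(self):
        return self._get_metric()

m = RunningMaxSketch()
m._collect([0.1, -0.2])
m._collect([0.05, 0.4])
```

After the two calls, m.data holds the running maximum; reset() restores the default buffer, as the reset() contract above requires.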
Max
- class gradient_metrics.metrics.Max(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the maximum over the gradients.
The maximum between the currently saved buffer and the supplied gradients is computed on each call, saving the result in the buffer.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to \(-\infty\)
Mean
- class gradient_metrics.metrics.Mean(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the mean of the supplied gradients.
The buffer always holds the mean of all previously supplied gradients. This exists alongside MeanStd to reduce computation cost if you do not want to compute the standard deviation.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer and counter to 0
MeanStd
- class gradient_metrics.metrics.MeanStd(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, return_mean: bool = True, eps: float = 1e-16)
Bases: gradient_metrics.metrics.GradientMetric
Computes Mean and Standard Deviation.
This uses Welford’s online algorithm for mean and variance computation to reduce memory usage.
If there was only a single gradient entry, the returned standard deviation is equal to eps. This is also the lower bound for the standard deviation.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
return_mean (bool, optional) – Whether to return the mean or not. Defaults to True.
eps (float, optional) – Small epsilon for gradients with very small standard deviation, which would otherwise lead to a possible division by zero in second-order derivatives. Defaults to 1e-16.
- Raises
ValueError – If eps is less than or equal to zero.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffers.
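Welford's online algorithm, which MeanStd is documented to use, maintains a running count, mean, and sum of squared deviations (commonly called M2) so that the full gradient history never has to be stored. A minimal pure-Python sketch (buffer names are hypothetical; whether the library uses the population or sample variance is not stated here, so the population variance M2 / n is shown):

```python
import math

# Sketch of a single Welford update step: one pass over the data, constant
# memory. The eps floor mirrors the documented lower bound on the std.

def welford_update(count, mean, m2, new_value):
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)  # uses the *updated* mean
    return count, mean, m2

count, mean, m2 = 0, 0.0, 0.0
for g in [0.1, -0.4, 0.2, 0.3]:
    count, mean, m2 = welford_update(count, mean, m2, g)

eps = 1e-16
std = max(math.sqrt(m2 / count), eps)  # eps is the documented lower bound
```

The numerical advantage over the naive E[x^2] - E[x]^2 formula is that the two large, nearly equal terms are never subtracted, which avoids catastrophic cancellation for gradients with small variance.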
Min
- class gradient_metrics.metrics.Min(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the minimum over the gradients.
The minimum between the currently saved buffer and the supplied gradients is computed on each call, overwriting the buffer with the result.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to \(\infty\)
PNorm
- class gradient_metrics.metrics.PNorm(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, p: float = 1.0)
Bases: gradient_metrics.metrics.GradientMetric
Computes the p-norm over the flattened gradients.
\[(\sum_{i=1}^n |x_i|^p)^{\frac{1}{p}}\]
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
p (float, optional) – Power of the norm. Defaults to 1 (absolute-value norm).
- Raises
ValueError – If p is not in the interval (0,inf].
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to 0
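The p-norm formula above can be checked numerically. A pure-Python illustration (the function name is hypothetical, not part of the package): for p = 1 the norm reduces to the sum of absolute values of the flattened gradient.

```python
# Numeric illustration of the p-norm formula (sum_i |x_i|^p)^(1/p).

def p_norm(values, p=1.0):
    if not p > 0.0:
        # Mirrors the documented requirement that p lies in (0, inf].
        raise ValueError("p must be in (0, inf].")
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

grad = [0.1, -0.4, 0.2]
one_norm = p_norm(grad, p=1.0)  # 0.1 + 0.4 + 0.2 = 0.7 (absolute-value norm)
two_norm = p_norm(grad, p=2.0)  # Euclidean norm, sqrt(0.01 + 0.16 + 0.04)
```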