Documentation

Gradient Metric Collector

class gradient_metrics.GradientMetricCollector(metrics: Union[List[gradient_metrics.metrics.GradientMetric], gradient_metrics.metrics.GradientMetric])

Helper class for computing per-sample gradient metrics.

Parameters

metrics (sequence of GradientMetric or GradientMetric) – A single GradientMetric instance or a sequence of them.

Raises

ValueError – If the list of metrics is empty.

__call__(loss: torch.Tensor, retain_graph: bool = False) torch.Tensor

Computes gradient metrics per sample.

Parameters
  • loss (torch.Tensor) – A loss tensor to compute the gradients on. This should have a shape of (N,) with N being the number of samples.

  • retain_graph (bool) – If True, retains the graph of the supplied loss. Default False.

Raises
  • ValueError – If the loss does not require a gradient.

  • ValueError – If the loss does not have a shape of (N,).

Returns

Gradient metrics per sample with a shape of (N, dim).

Return type

torch.Tensor

property data: torch.Tensor

Holds the metric data.

Returns

The metric values. All metrics are read out of the GradientMetric instances and concatenated. The output shape is (dim,).

Return type

torch.Tensor

property dim: int

Number of gradient metrics per sample.

This is useful if you want to build a meta model based on the retrieved gradient metrics and need to know the input shape per sample.

Returns

The number of gradient metrics per sample.

Return type

int

reset() None

Resets all gradient metric instances to their default values.

Gradient Metrics

GradientMetric

class gradient_metrics.metrics.GradientMetric(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)

This is the base class for all gradient metrics.

Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

__call__(grad: Union[Sequence[torch.Tensor], torch.Tensor]) None

A gradient metric instance is registered as a backward hook on parameters. This hook is called whenever the associated parameter takes part in a backward pass.

Parameters

grad (torch.Tensor) – The gradient of the associated parameter. On this the metric is going to be computed.

_collect(grad: torch.Tensor) None

This method has to be implemented by every GradientMetric subclass. It will be called on the gradient supplied to the instance via the backward hook.

Parameters

grad (torch.Tensor) – The gradient of the associated parameter.

Raises

NotImplementedError – Raises an error if not implemented by sub classes.

_get_metric() torch.Tensor

This method should return the metric values stored in the buffer. It is used by the data property and has to be implemented by all sub classes.

Raises

NotImplementedError – Raises an error if not implemented by sub classes.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

_register_parameters() None

Registers this metric instance as a backward hook on all target layers and tensors.

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Resets the metric values to a default value. Has to be implemented by all sub classes.

Raises

NotImplementedError – Raises an error if not implemented by sub classes.

Max

class gradient_metrics.metrics.Max(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)

Bases: gradient_metrics.metrics.GradientMetric

Computes the maximum over the gradients.

The maximum between the currently saved buffer and the supplied gradients is computed on each call, saving the result in the buffer.

Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Initializes/resets the buffer to \(-\infty\).

Mean

class gradient_metrics.metrics.Mean(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)

Bases: gradient_metrics.metrics.GradientMetric

Computes the mean of the supplied gradients.

The buffer always holds the mean of all previously supplied gradients. This metric exists alongside MeanStd to reduce computation cost when you do not need the standard deviation.

Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Initializes/resets the buffer and counter to 0.

MeanStd

class gradient_metrics.metrics.MeanStd(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, return_mean: bool = True, eps: float = 1e-16)

Bases: gradient_metrics.metrics.GradientMetric

Computes Mean and Standard Deviation.

This uses Welford’s online algorithm for mean and variance computation to reduce memory usage.

If only a single gradient entry was collected, the returned standard deviation equals eps, which is also the lower bound of the standard deviation.

Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

  • return_mean (bool, optional) – Whether to return the mean or not. Defaults to True.

  • eps (float, optional) – Small constant used as a lower bound for the standard deviation; without it, gradients with near-zero standard deviation could cause a division by zero in second-order derivatives. Defaults to 1e-16.

Raises

ValueError – If eps is smaller or equal to zero.

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Initializes/resets the buffers.

Min

class gradient_metrics.metrics.Min(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)

Bases: gradient_metrics.metrics.GradientMetric

Computes the minimum over the gradients.

The minimum between the currently saved buffer and the supplied gradients is computed on each call, overwriting the buffer with the result.

Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Initializes/resets the buffer to \(\infty\).

PNorm

class gradient_metrics.metrics.PNorm(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, p: float = 1.0)

Bases: gradient_metrics.metrics.GradientMetric

Computes the p-norm over the flattened gradients.

\[(\sum_{i=1}^n |x_i|^p)^{\frac{1}{p}}\]
Parameters
  • target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metrics will be registered as backward hooks. For torch.nn.Module instances a single metric instance will be registered to all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.

  • grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. The callable is applied to the gradient before it is handed over to the _collect method of a GradientMetric instance.

  • p (float, optional) – Power of the norm. Defaults to 1 (absolute-value norm).

Raises

ValueError – If p is not in the interval (0, inf].

property data: torch.Tensor

Holds the metric data.

Returns

The metric value stored in the buffer.

Return type

torch.Tensor

reset() None

Initializes/resets the buffer to 0.