Gradient Metric Collector
- class gradient_metrics.GradientMetricCollector(metrics: Union[List[gradient_metrics.metrics.GradientMetric], gradient_metrics.metrics.GradientMetric])
Helper class for computing per-sample gradient metrics.
- Parameters
metrics (sequence of GradientMetric or GradientMetric) – A single gradient metric or a sequence of gradient metrics.
- Raises
ValueError – If the list of metrics is empty.
- __call__(loss: torch.Tensor, retain_graph: bool = False) torch.Tensor
Computes gradient metrics per sample.
- Parameters
loss (torch.Tensor) – A loss tensor to compute the gradients on. This should have a shape of (N,), with N being the number of samples.
retain_graph (bool) – If True, retains the graph of the supplied loss. Default False.
- Raises
ValueError – If the loss does not require a gradient.
ValueError – If the loss does not have a shape of (N,).
- Returns
Gradient metrics per sample with a shape of (N, dim).
- Return type
torch.Tensor
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric values. All metrics are read out of the GradientMetric instances and concatenated. The output shape is (dim,).
- Return type
torch.Tensor
- property dim: int
Number of gradient metrics per sample.
This is useful if you want to build a meta model on top of the retrieved gradient metrics and need to know the input shape per sample.
- Returns
The number of gradient metrics per sample.
- Return type
int
- reset() None
Resets all gradient metric instances to their default values.
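The collector's per-sample loop can be sketched as follows. This is a minimal pure-Python analogue (no torch; all names here are hypothetical, not part of the package): for each of the N per-sample losses a gradient is obtained, every metric is evaluated on it, and the per-sample rows are stacked into an (N, dim) result.

```python
# Pure-Python sketch of the collector's per-sample loop (hypothetical names):
# each "gradient" vector yields one row of metric values, and the rows are
# stacked into an N x dim matrix, mirroring the documented output shape.

def collect_per_sample(per_sample_grads, metrics):
    """per_sample_grads: list of N gradient vectors (lists of floats).
    metrics: list of callables mapping a gradient vector to a float."""
    if not metrics:
        # Mirrors the documented ValueError for an empty metric list.
        raise ValueError("The list of metrics must not be empty.")
    rows = []
    for grad in per_sample_grads:
        # One row of gradient metrics per sample.
        rows.append([metric(grad) for metric in metrics])
    return rows

grads = [[0.1, -0.4, 0.2], [0.3, 0.0, -0.1]]
metrics = [max, min, lambda g: sum(abs(x) for x in g)]  # max, min, 1-norm
result = collect_per_sample(grads, metrics)  # shape (N, dim) = (2, 3)
```

Here dim is three because three metrics are registered, matching the dim property described above.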
Gradient Metrics
GradientMetric
- class gradient_metrics.metrics.GradientMetric(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
This is the base class for all gradient metrics.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- __call__(grad: Union[Sequence[torch.Tensor], torch.Tensor]) None
A gradient metric instance is registered as a backward hook on parameters. It is called whenever an associated parameter takes part in a backward pass.
- Parameters
grad (torch.Tensor) – The gradient of the associated parameter, on which the metric is computed.
- _collect(grad: torch.Tensor) None
This method has to be implemented by every GradientMetric subclass. It will be called on the gradient supplied to the instance via the backward hook.
- Parameters
grad (torch.Tensor) – The gradient of the associated parameter.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
- _get_metric() torch.Tensor
This method should return the metric values stored in the buffer. It is used by the data property and has to be implemented by all subclasses.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- _register_parameters() None
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Resets the metric values to a default value. Has to be implemented by all subclasses.
- Raises
NotImplementedError – Raised if not implemented by subclasses.
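The subclass contract above (_collect, _get_metric, reset) can be illustrated with a pure-Python analogue. This sketch does not use torch and the class name and buffer attribute are hypothetical; it only mirrors how a subclass such as Max would fill in the three required methods.

```python
# Pure-Python sketch of the GradientMetric subclass contract (hypothetical
# names, no torch): a subclass supplies _collect, _get_metric and reset;
# the base class wires _collect to the backward hook and exposes the buffer
# through the data property.

class RunningMaxSketch:
    def __init__(self):
        self.reset()

    def _collect(self, grad):
        # Called once per backward pass with the (transformed) gradient.
        self._buffer = max(self._buffer, max(grad))

    def _get_metric(self):
        # Read out by the data property.
        return self._buffer

    def reset(self):
        # Mirrors Max.reset(): the buffer starts at negative infinity.
        self._buffer = float("-inf")

    @property
    def data(self):
        return self._get_metric()

m = RunningMaxSketch()
m._collect([0.1, -0.2])
m._collect([0.05, 0.4])
```

After the two calls, m.data holds the running maximum; reset() restores the default buffer, as the reset() contract above requires.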
Max
- class gradient_metrics.metrics.Max(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the maximum over the gradients.
The maximum between the currently saved buffer and the supplied gradients is computed on each call, saving the result in the buffer.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to \(-\infty\)
Mean
- class gradient_metrics.metrics.Mean(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the mean of the supplied gradients.
The buffer always holds the mean of all previously supplied gradients. This exists alongside MeanStd to reduce computation cost if you do not want to compute the standard deviation.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer and counter to 0
MeanStd
- class gradient_metrics.metrics.MeanStd(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, return_mean: bool = True, eps: float = 1e-16)
Bases: gradient_metrics.metrics.GradientMetric
Computes Mean and Standard Deviation.
This uses Welford’s online algorithm for mean and variance computation to reduce memory usage.
If there was only a single gradient entry, the returned standard deviation is equal to eps. This is also the lower bound for the standard deviation.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
return_mean (bool, optional) – Whether to return the mean or not. Defaults to True.
eps (float, optional) – Small epsilon for gradients with very small standard deviation, which would otherwise lead to a possible division by zero in second-order derivatives. Defaults to 1e-16.
- Raises
ValueError – If eps is less than or equal to zero.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffers.
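Welford's online algorithm, which MeanStd is documented to use, maintains a running count, mean, and sum of squared deviations (commonly called M2) so that the full gradient history never has to be stored. A minimal pure-Python sketch (buffer names are hypothetical; whether the library uses the population or sample variance is not stated here, so the population variance M2 / n is shown):

```python
import math

# Sketch of a single Welford update step: one pass over the data, constant
# memory. The eps floor mirrors the documented lower bound on the std.

def welford_update(count, mean, m2, new_value):
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)  # uses the *updated* mean
    return count, mean, m2

count, mean, m2 = 0, 0.0, 0.0
for g in [0.1, -0.4, 0.2, 0.3]:
    count, mean, m2 = welford_update(count, mean, m2, g)

eps = 1e-16
std = max(math.sqrt(m2 / count), eps)  # eps is the documented lower bound
```

The numerical advantage over the naive E[x^2] - E[x]^2 formula is that the two large, nearly equal terms are never subtracted, which avoids catastrophic cancellation for gradients with small variance.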
Min
- class gradient_metrics.metrics.Min(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None)
Bases: gradient_metrics.metrics.GradientMetric
Computes the minimum over the gradients.
The minimum between the currently saved buffer and the supplied gradients is computed on each call, overwriting the buffer with the result.
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to \(\infty\)
PNorm
- class gradient_metrics.metrics.PNorm(target_layers: Union[Sequence[Union[torch.nn.modules.module.Module, torch.Tensor]], torch.nn.modules.module.Module, torch.Tensor], grad_transform: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, p: float = 1.0)
Bases: gradient_metrics.metrics.GradientMetric
Computes the p-norm over the flattened gradients.
\[(\sum_{i=1}^n |x_i|^p)^{\frac{1}{p}}\]
- Parameters
target_layers (torch.nn.Module, torch.Tensor or sequence of them) – Layers or tensors on which the metric will be registered as a backward hook. For torch.nn.Module instances, a single metric instance is registered on all parameters returned by torch.nn.Module.parameters(), thus computing the metric over all parameters of the Module.
grad_transform (Optional[Callable[[torch.Tensor], torch.Tensor]], optional) – A callable which accepts a torch.Tensor as input and returns a torch.Tensor. It is applied to the gradient before the gradient is handed over to the _collect method of a GradientMetric instance.
p (float, optional) – Power of the norm. Defaults to 1 (absolute-value norm).
- Raises
ValueError – If p is not in the interval (0,inf].
- property data: torch.Tensor
Holds the metric data.
- Returns
The metric value stored in the buffer.
- Return type
torch.Tensor
- reset() None
Initializes/resets the buffer to 0
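The p-norm formula above can be checked numerically. A pure-Python illustration (the function name is hypothetical, not part of the package): for p = 1 the norm reduces to the sum of absolute values of the flattened gradient.

```python
# Numeric illustration of the p-norm formula (sum_i |x_i|^p)^(1/p).

def p_norm(values, p=1.0):
    if not p > 0.0:
        # Mirrors the documented requirement that p lies in (0, inf].
        raise ValueError("p must be in (0, inf].")
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

grad = [0.1, -0.4, 0.2]
one_norm = p_norm(grad, p=1.0)  # 0.1 + 0.4 + 0.2 = 0.7 (absolute-value norm)
two_norm = p_norm(grad, p=2.0)  # Euclidean norm, sqrt(0.01 + 0.16 + 0.04)
```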