ADAM

class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)

Bases: Optimizer

Adam and AMSGRAD optimizers.

Adam [1] is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it can cope with non-stationary objective functions and with noisy and/or sparse gradients.
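The update behind these adaptive moment estimates can be sketched in plain Python. This is a minimal sketch of one Adam step, not the class's own code. It assumes that beta_1, beta_2, lr and noise_factor play the roles their names suggest (noise_factor is taken to be the small constant added to the denominator), and it folds the bias correction into an effective learning rate, following the compact form of the update given by Kingma and Ba. The helper name adam_step is hypothetical.

```python
import math


def adam_step(x, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-8):
    """One Adam update on plain Python lists; returns (x_new, m_new, v_new)."""
    # Exponential moving averages of the gradient (first moment)
    # and of the squared gradient (second moment).
    m = [beta_1 * mi + (1 - beta_1) * gi for mi, gi in zip(m, grad)]
    v = [beta_2 * vi + (1 - beta_2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias correction folded into an effective learning rate (t starts at 1).
    lr_eff = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    # Per-coordinate step; the noise factor keeps the denominator away from zero.
    x = [xi - lr_eff * mi / (math.sqrt(vi) + noise_factor) for xi, mi, vi in zip(x, m, v)]
    return x, m, v
```

At t = 1 each coordinate moves by roughly lr against the sign of its gradient, regardless of the gradient's magnitude, which illustrates the invariance to diagonal rescaling mentioned above.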

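AMSGRAD [2], a variant of Adam, uses a ‘long-term memory’ of past gradients: it normalizes each step by the maximum of all past second-moment estimates instead of by their exponential moving average, which improves its convergence properties.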
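This long-term memory amounts to one extra line on top of an Adam step: the denominator uses the running maximum of the second-moment estimates, so it can never shrink. The sketch below is illustrative only. The name amsgrad_step is hypothetical, and it keeps a bias-corrected step size, whereas the algorithm as stated in [2] has no bias correction.

```python
import math


def amsgrad_step(x, grad, m, v, v_max, t, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-8):
    """One AMSGrad-style update; returns (x_new, m_new, v_new, v_max_new)."""
    m = [beta_1 * mi + (1 - beta_1) * gi for mi, gi in zip(m, grad)]
    v = [beta_2 * vi + (1 - beta_2) * gi * gi for vi, gi in zip(v, grad)]
    # Long-term memory: keep the largest second-moment estimate seen so far.
    v_max = [max(a, b) for a, b in zip(v_max, v)]
    lr_eff = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    # The denominator uses v_max instead of v, so it never decreases.
    x = [xi - lr_eff * mi / (math.sqrt(vm) + noise_factor) for xi, mi, vm in zip(x, m, v_max)]
    return x, m, v, v_max
```

After one large gradient followed by a zero gradient, v decays but v_max keeps the earlier peak.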

References

[1]: Kingma, D. P. & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980
[2]: Reddi, S. J., Kale, S. & Kumar, S. (2018). On the Convergence of Adam and Beyond. arXiv:1904.09237

Parameters:

maxiter (int) – Maximum number of iterations.
tol (float) – Tolerance for termination.
lr (float) – Learning rate; value >= 0.
beta_1 (float) – Exponential decay rate for the first-moment estimates; value in the range 0 to 1, generally close to 1.
beta_2 (float) – Exponential decay rate for the second-moment estimates; value in the range 0 to 1, generally close to 1.
noise_factor (float) – Noise factor; value >= 0.
eps (float) – Epsilon used for finite differences if no analytic gradient is given; value >= 0.
amsgrad (bool) – True to use AMSGRAD, False to use plain Adam.
snapshot_dir (str | None) – If not None, the optimizer’s parameters are saved to the given directory after every step.
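The ranges stated for these parameters can be checked up front. The following is a hypothetical settings container (AdamSettings is not part of the documented API) whose defaults mirror the constructor signature; whether ADAM itself validates its arguments is not stated here. One deliberate choice: beta_1 and beta_2 must be strictly below 1, because with a bias-corrected step beta_1 = 1 leads to a division by zero and beta_2 = 1 makes the effective learning rate zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdamSettings:
    """Hypothetical container mirroring the ADAM constructor arguments."""

    maxiter: int = 10000
    tol: float = 1e-6
    lr: float = 0.001
    beta_1: float = 0.9
    beta_2: float = 0.99
    noise_factor: float = 1e-8
    eps: float = 1e-10
    amsgrad: bool = False
    snapshot_dir: Optional[str] = None

    def __post_init__(self):
        # Non-negativity constraints from the parameter descriptions.
        if self.lr < 0 or self.noise_factor < 0 or self.eps < 0:
            raise ValueError("lr, noise_factor and eps must be >= 0")
        for name in ("beta_1", "beta_2"):
            value = getattr(self, name)
            # Strictly below 1 (see the lead-in for why 1 itself is excluded).
            if not 0.0 <= value < 1.0:
                raise ValueError(f"{name} must satisfy 0 <= {name} < 1")
```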

Attributes

bounds_support_level: Returns the bounds support level.

gradient_support_level: Returns the gradient support level.

initial_point_support_level: Returns the initial point support level.

is_bounds_ignored: Returns whether bounds are ignored.

is_bounds_required: Returns whether bounds are required.

is_bounds_supported: Returns whether bounds are supported.

is_gradient_ignored: Returns whether the gradient is ignored.

is_gradient_required: Returns whether the gradient is required.

is_gradient_supported: Returns whether the gradient is supported.

is_initial_point_ignored: Returns whether the initial point is ignored.

is_initial_point_required: Returns whether the initial point is required.

is_initial_point_supported: Returns whether the initial point is supported.

setting: Returns the setting.

settings

Methods

get_support_level(): Return the support level dictionary.

static gradient_num_diff(x_center, f, epsilon, max_evals_grouped=None)

Compute the gradient of f around the point x_center by numerical differentiation, evaluating the perturbed points in parallel (batched).

Parameters:

x_center (ndarray) – Point around which the gradient is computed.
f (func) – The function whose gradient is to be computed.
epsilon (float) – The epsilon used in the numerical differentiation.
max_evals_grouped (int) – Maximum number of evaluations grouped into one batch; defaults to 1 (i.e., no batching).

Returns:

The computed gradient.

Return type:

grad
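A forward-difference version of this batched numerical gradient can be sketched as follows. It reuses the documented name but is a standalone illustration under stated assumptions, not the library's implementation: it uses forward differences (f(x + epsilon * e_i) - f(x)) / epsilon, and it assumes that f accepts a list of points and returns a list of values, so that at most max_evals_grouped points are evaluated per call. The real calling convention for batched evaluation may differ.

```python
def gradient_num_diff(x_center, f, epsilon, max_evals_grouped=None):
    """Forward-difference gradient; points are passed to f in batches."""
    if max_evals_grouped is None:
        max_evals_grouped = 1  # documented default: no batching
    n = len(x_center)
    # The centre point followed by one forward-shifted copy per coordinate.
    points = [list(x_center)]
    for i in range(n):
        shifted = list(x_center)
        shifted[i] += epsilon
        points.append(shifted)
    # Evaluate in chunks of at most max_evals_grouped points per call.
    values = []
    for start in range(0, len(points), max_evals_grouped):
        chunk = points[start:start + max_evals_grouped]
        values.extend(f(chunk))  # assumption: f maps a list of points to a list of values
    f0 = values[0]
    return [(values[i + 1] - f0) / epsilon for i in range(n)]
```

With two coordinates and max_evals_grouped=2, the three points (centre plus two shifts) are evaluated in two calls of sizes 2 and 1.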

load_params(load_dir)

Load iteration parameters from a file called adam_params.csv.

Parameters: load_dir (str) – The directory containing adam_params.csv.
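Loading a saved state could look like the sketch below. The CSV layout is an assumption made purely for illustration (a header t,m,v, with the moment vectors stored as space-separated floats); this section does not specify the columns the library actually writes. Because save_params appends rather than overwrites, the sketch treats the last row as the most recent state. The standalone function shares the documented method's name but is hypothetical.

```python
import csv
import os


def load_params(load_dir):
    """Return (t, m, v) from the last row of load_dir/adam_params.csv.

    Hypothetical layout: header "t,m,v", with m and v as space-separated floats.
    """
    path = os.path.join(load_dir, "adam_params.csv")
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"{path} contains no saved iterations")
    last = rows[-1]  # rows are appended, so the newest state comes last
    t = int(last["t"])
    m = [float(s) for s in last["m"].split()]
    v = [float(s) for s in last["v"].split()]
    return t, m, v
```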

minimize(fun, x0, jac=None, bounds=None)

Minimize the scalar function.

Parameters:

fun (Callable[[POINT], float]) – The scalar function to minimize.
x0 (POINT) – The initial point for the minimization.
jac (Callable[[POINT], POINT] | None) – The gradient of the scalar function fun.
bounds (list[tuple[float, float]] | None) – Bounds for the variables of fun. This argument might be ignored if the optimizer does not support bounds.

Returns:

The result of the optimization, containing, for example, the point found as attribute x.

Return type:

OptimizerResult
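To show how maxiter, tol, lr, eps and jac interact in a minimization, here is a self-contained plain-Python loop. It is a sketch, not the class's implementation: minimize_adam is a hypothetical name, it returns a plain tuple instead of an OptimizerResult, it falls back to forward differences with step eps when jac is None, and its termination rule (stop once a step moves x by less than tol in Euclidean norm) is an assumption. Bounds are not handled.

```python
import math


def minimize_adam(fun, x0, jac=None, maxiter=10000, tol=1e-6, lr=0.001,
                  beta_1=0.9, beta_2=0.99, noise_factor=1e-8, eps=1e-10):
    """Hypothetical Adam loop; returns (x, fun(x), iterations) as a tuple."""
    x = [float(xi) for xi in x0]
    if jac is None:
        # No analytic gradient: fall back to forward differences with step eps.
        def jac(p):
            f0 = fun(p)
            grad = []
            for i in range(len(p)):
                q = list(p)
                q[i] += eps
                grad.append((fun(q) - f0) / eps)
            return grad
    m = [0.0] * len(x)  # first-moment estimate
    v = [0.0] * len(x)  # second-moment estimate
    t = 0
    for t in range(1, maxiter + 1):
        g = jac(x)
        m = [beta_1 * mi + (1 - beta_1) * gi for mi, gi in zip(m, g)]
        v = [beta_2 * vi + (1 - beta_2) * gi * gi for vi, gi in zip(v, g)]
        lr_eff = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
        x_new = [xi - lr_eff * mi / (math.sqrt(vi) + noise_factor)
                 for xi, mi, vi in zip(x, m, v)]
        # Assumed termination rule: stop once a step moves x by less than tol.
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if step < tol:
            break
    return x, fun(x), t
```

Passing tol=0.0 disables early stopping, so the loop always runs maxiter iterations. For the finite-difference fallback, forward differences in double precision are usually most accurate with a step near the square root of machine epsilon (about 1e-8).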

print_options(): Print algorithm-specific options.

save_params(snapshot_dir)

Save the current iteration parameters to a file called adam_params.csv.

Note

If the file already exists, the current parameters are appended to it; the file is not overwritten.

Parameters: snapshot_dir (str) – The directory to store the file in.
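The append-only behavior described in the note can be sketched with the csv module. The file layout here (header t,m,v, with the moment vectors as space-separated floats) is an assumption for illustration and is not taken from the library. The extra arguments t, m and v are also hypothetical: the documented method takes only snapshot_dir and reads the state from the optimizer itself.

```python
import csv
import os


def save_params(snapshot_dir, t, m, v):
    """Append one row (t, m, v) to snapshot_dir/adam_params.csv.

    Hypothetical layout: header "t,m,v", with m and v as space-separated floats.
    """
    path = os.path.join(snapshot_dir, "adam_params.csv")
    write_header = not os.path.exists(path)
    # Mode "a" appends to an existing file instead of overwriting it.
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["t", "m", "v"])
        if write_header:
            writer.writeheader()
        # repr() gives float strings that parse back to the same values.
        writer.writerow({"t": t, "m": " ".join(map(repr, m)), "v": " ".join(map(repr, v))})
```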

set_max_evals_grouped(limit): Set the maximum number of function evaluations grouped into one batch.

set_options(**kwargs)

Sets or updates values in the options dictionary.

The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.

Parameters: kwargs (dict) – Options, given as name=value.
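Assuming set_options behaves like a dictionary update, new names are added and existing ones are overwritten. OptionsHolder below is a hypothetical stand-in used only to illustrate those semantics; it is not the optimizer class.

```python
class OptionsHolder:
    """Hypothetical stand-in showing set_options semantics on a plain dict."""

    def __init__(self, **initial):
        # The dictionary may be pre-populated at construction time.
        self._options = dict(initial)

    def set_options(self, **kwargs):
        # New keys are added; existing keys are overwritten.
        self._options.update(kwargs)

    @property
    def options(self):
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._options)
```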

static wrap_function(function, args)

Wrap the function to implicitly inject the args at the call of the function.

Parameters:

function (func) – The target function.
args (tuple) – The arguments to be injected.

Returns:

wrapper

Return type:

function_wrapper
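Injecting extra arguments in this way is a small closure. The sketch assumes that the injected args are appended after the arguments supplied at call time; treat that ordering as an assumption rather than a documented guarantee.

```python
def wrap_function(function, args):
    """Return a callable that appends the stored args to every call."""

    def function_wrapper(*wrapper_args):
        # Assumed order: the caller's positional arguments first, then the injected ones.
        return function(*(wrapper_args + args))

    return function_wrapper
```

For example, wrapping f(x, a, b) with args=(2, 3) yields a one-argument callable, so an optimizer can evaluate it as g(x).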