ADAM

class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None, callback=None)[source]

Bases: Optimizer

Adam and AMSGRAD optimizers.

Adam [1] is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it can cope with non-stationary objective functions and with noisy and/or sparse gradients.

AMSGRAD [2] (a variant of Adam) uses a ‘long-term memory’ of past gradients and, thereby, improves convergence properties.
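To make the moment estimates and the 'long-term memory' concrete, here is a minimal NumPy sketch of a single update step, following the textbook formulation in [1] and [2]. It is an illustration under stated assumptions, not this class's implementation: the function name adam_step and the parameter delta (the small constant in the denominator) are hypothetical, and whether delta corresponds to noise_factor is an assumption.

```python
import numpy as np

def adam_step(x, g, m, v, v_max, t, lr=0.001, beta_1=0.9, beta_2=0.99,
              delta=1e-8, amsgrad=False):
    """One textbook Adam / AMSGRAD update (illustrative sketch only).

    x: parameters, g: gradient at x, m and v: running first and second
    moment estimates, v_max: running maximum of v, t: step counter from 1.
    """
    m = beta_1 * m + (1 - beta_1) * g        # exponential average of gradients
    v = beta_2 * v + (1 - beta_2) * g * g    # exponential average of squared gradients
    if amsgrad:
        # AMSGRAD keeps the largest second-moment estimate seen so far,
        # so the per-coordinate step size never grows again.
        v_max = np.maximum(v_max, v)
        x_new = x - lr * m / (np.sqrt(v_max) + delta)
    else:
        # Adam rescales the learning rate to correct the bias that comes
        # from initializing m and v at zero.
        lr_t = lr * np.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
        x_new = x - lr_t * m / (np.sqrt(v) + delta)
    return x_new, m, v, v_max

# Usage: a few steps on f(x) = sum(x**2), whose gradient is 2 * x.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
v_max = np.zeros_like(x)
for t in range(1, 4):
    x, m, v, v_max = adam_step(x, 2 * x, m, v, v_max, t)
```

On the very first Adam step the bias correction cancels the moment scaling, so each coordinate moves by roughly lr against the sign of its gradient, regardless of the gradient's magnitude.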

References

[1]: Kingma, Diederik & Ba, Jimmy (2014), Adam: A Method for Stochastic Optimization. arXiv:1412.6980

[2]: Reddi, Sashank J. & Kale, Satyen & Kumar, Sanjiv (2018), On the Convergence of Adam and Beyond. arXiv:1904.09237

Parameters:
  • maxiter (int) – Maximum number of iterations

  • tol (float) – Tolerance for termination

  • lr (float) – Value >= 0, Learning rate.

  • beta_1 (float) – Value in range 0 to 1, Generally close to 1.

  • beta_2 (float) – Value in range 0 to 1, Generally close to 1.

  • noise_factor (float) – Value >= 0, Noise factor

  • eps (float) – Value >= 0, Epsilon to be used for finite differences if no analytic gradient method is given.

  • amsgrad (bool) – True to use AMSGRAD, False if not

  • snapshot_dir (str | None) – If not None, save the optimizer’s parameters after every step to the given directory

  • callback (CALLBACK | None) – A callback function passed information in each iteration step. The information is, in this order: current time step, the parameters, the function value.
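As an illustration of the callback argument order described above (time step, parameters, function value), the following sketch records each iteration in a list. The name store_step and the stand-in calls are hypothetical; in real use the optimizer, not the user, would invoke the function.

```python
history = []

def store_step(step, params, value):
    """Record one iteration: time step, parameters, function value."""
    history.append((step, list(params), value))

# Stand-in calls that mimic what an optimizer would do once per iteration.
store_step(1, [0.5, -0.5], 0.5)
store_step(2, [0.4, -0.4], 0.32)

# Pick the recorded iteration with the lowest function value.
best = min(history, key=lambda record: record[2])
```

Such a function would be passed as callback=store_step when constructing the optimizer.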

Attributes

bounds_support_level

Returns bounds support level

gradient_support_level

Returns gradient support level

initial_point_support_level

Returns initial point support level

is_bounds_ignored

Returns whether bounds are ignored

is_bounds_required

Returns whether bounds are required

is_bounds_supported

Returns whether bounds are supported

is_gradient_ignored

Returns whether the gradient is ignored

is_gradient_required

Returns whether the gradient is required

is_gradient_supported

Returns whether the gradient is supported

is_initial_point_ignored

Returns whether the initial point is ignored

is_initial_point_required

Returns whether the initial point is required

is_initial_point_supported

Returns whether the initial point is supported

setting

Return setting

settings

Methods

get_support_level()[source]

Return support level dictionary

static gradient_num_diff(x_center, f, epsilon, max_evals_grouped=None)

Compute the gradient around the point x_center by numerical differentiation, grouping the function evaluations so that they can run in parallel.

Parameters:
  • x_center (ndarray) – point around which we compute the gradient

  • f (func) – the function of which the gradient is to be computed.

  • epsilon (float) – the epsilon used in the numeric differentiation.

  • max_evals_grouped (int) – max evals grouped, defaults to 1 (i.e. no batching).

Returns:

the gradient computed

Return type:

grad
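A minimal sketch of numerical differentiation around x_center, assuming a forward-difference scheme (the source does not state which scheme is used). The function name forward_difference_gradient is hypothetical, and the batching controlled by max_evals_grouped is left out for brevity.

```python
import numpy as np

def forward_difference_gradient(x_center, f, epsilon):
    """Approximate the gradient of f at x_center by forward differences.

    Assumes x_center is one-dimensional and f returns a scalar.
    """
    x_center = np.asarray(x_center, dtype=float)
    f_center = f(x_center)
    grad = np.zeros_like(x_center)
    for k in range(x_center.size):
        shifted = x_center.copy()
        shifted[k] += epsilon              # perturb one coordinate at a time
        grad[k] = (f(shifted) - f_center) / epsilon
    return grad

# Usage: the exact gradient of sum(x**2) at (1, -2) is (2, -4).
approx = forward_difference_gradient([1.0, -2.0], lambda x: float(np.sum(x ** 2)), 1e-6)
```

Each shifted evaluation is independent of the others, which is what makes it possible to group them into batches.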

load_params(load_dir)[source]

Load iteration parameters from a file called adam_params.csv.

Parameters:

load_dir (str) – The directory containing adam_params.csv.

minimize(fun, x0, jac=None, bounds=None)[source]

Minimize the scalar function.

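Parameters:
  • fun (func) – The scalar function to minimize.

  • x0 (ndarray) – The initial point for the minimization.

  • jac (func | None) – The gradient of the scalar function fun. If None, the gradient is approximated by finite differences using eps.

  • bounds – Bounds for the variables of fun. This argument may be ignored if the optimizer does not support bounds.

Returns: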

The result of the optimization, containing e.g. the result as attribute x.

Return type:

OptimizerResult

print_options()

Print algorithm-specific options.

save_params(snapshot_dir)[source]

Save the current iteration parameters to a file called adam_params.csv.

Note

The current parameters are appended to the file, if it exists already. The file is not overwritten.

Parameters:

snapshot_dir (str) – The directory to store the file in.
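The append behaviour described in the note can be sketched with the standard csv module. This is an illustration only: the helper append_snapshot and the column names t and value are hypothetical and do not reflect the file layout actually written by save_params.

```python
import csv
import os
import tempfile

def append_snapshot(snapshot_dir, row, filename="adam_params.csv"):
    """Append one row to the CSV file, writing a header only for a new file."""
    path = os.path.join(snapshot_dir, filename)
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as handle:   # "a" appends, never truncates
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return path

with tempfile.TemporaryDirectory() as tmp:
    append_snapshot(tmp, {"t": 1, "value": 0.5})
    path = append_snapshot(tmp, {"t": 2, "value": 0.32})
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))       # both rows are still present
```

One practical consequence of appending is that reusing the same directory for several runs leaves rows from earlier runs in the file.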

set_max_evals_grouped(limit)

Set max evals grouped

set_options(**kwargs)

Sets or updates values in the options dictionary.

The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.

Parameters:

kwargs (dict) – options, given as name=value.

static wrap_function(function, args)

Wrap the function to implicitly inject the args at the call of the function.

Parameters:
  • function (func) – the target function

  • args (tuple) – the args to be injected

Returns:

wrapper

Return type:

function_wrapper
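A minimal sketch of this kind of wrapping, assuming the injected args are appended after the arguments supplied by the caller. The names wrap_with_args and shifted_square are hypothetical.

```python
def wrap_with_args(function, args):
    """Return a wrapper that calls function with args appended implicitly."""
    def function_wrapper(*wrapper_args):
        return function(*(wrapper_args + args))
    return function_wrapper

def shifted_square(x, shift, scale):
    return scale * (x - shift) ** 2

# The optimizer only needs to pass x; shift and scale are injected.
objective = wrap_with_args(shifted_square, (1.0, 2.0))
result = objective(3.0)
```

This lets an objective that takes extra fixed arguments be used where a function of the parameters alone is expected.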