ADAM#
- class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)[source]#
Bases: Optimizer
Adam and AMSGRAD optimizers.
Adam [1] is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it is able to cope with non-stationary objective functions and noisy and/or sparse gradients.
AMSGRAD [2] (a variant of Adam) uses a ‘long-term memory’ of past gradients and, thereby, improves convergence properties.
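The difference between the two variants can be illustrated with a single update step. The following is a minimal sketch based on the update rules in [1] and [2], written in plain Python; the function name `adam_step` and the state arguments `m`, `v`, `v_max` and `t` are hypothetical, and this is not necessarily how the class implements the step internally.

```python
import math

def adam_step(x, grad, m, v, v_max, t, lr=0.001, beta_1=0.9, beta_2=0.99,
              noise_factor=1e-8, amsgrad=False):
    """One illustrative Adam/AMSGRAD update; t is the 1-based step count."""
    # Exponential moving averages of the gradient and of its square.
    m = [beta_1 * mi + (1 - beta_1) * g for mi, g in zip(m, grad)]
    v = [beta_2 * vi + (1 - beta_2) * g * g for vi, g in zip(v, grad)]
    # Step size with the bias correction folded in (Kingma & Ba).
    lr_t = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    if amsgrad:
        # AMSGRAD: keep the element-wise maximum of all second-moment
        # estimates seen so far (the "long-term memory").
        v_max = [max(a, b) for a, b in zip(v_max, v)]
        denom = v_max
    else:
        denom = v
    # noise_factor keeps the division well defined when the estimate is ~0.
    x = [xi - lr_t * mi / (math.sqrt(di) + noise_factor)
         for xi, mi, di in zip(x, m, denom)]
    return x, m, v, v_max
```

With `amsgrad=False` the `v_max` state is passed through unchanged, so the same function can serve both variants.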
References
- [1]: Kingma, Diederik & Ba, Jimmy (2014), Adam: A Method for Stochastic Optimization.
- [2]: Sashank J. Reddi, Satyen Kale and Sanjiv Kumar (2018), On the Convergence of Adam and Beyond. arXiv:1904.09237
- Parameters:
maxiter (int) – Maximum number of iterations
tol (float) – Tolerance for termination
lr (float) – Value >= 0, Learning rate.
beta_1 (float) – Value in range 0 to 1, Generally close to 1.
beta_2 (float) – Value in range 0 to 1, Generally close to 1.
noise_factor (float) – Value >= 0, Noise factor
eps (float) – Value >= 0, Epsilon to be used for finite differences if no analytic gradient method is given.
amsgrad (bool) – True to use AMSGRAD, False if not
snapshot_dir (str | None) – If not None, save the optimizer’s parameters after every step to the given directory
Attributes
- bounds_support_level#
Returns bounds support level
- gradient_support_level#
Returns gradient support level
- initial_point_support_level#
Returns initial point support level
- is_bounds_ignored#
Returns whether bounds are ignored
- is_bounds_required#
Returns whether bounds are required
- is_bounds_supported#
Returns whether bounds are supported
- is_gradient_ignored#
Returns whether the gradient is ignored
- is_gradient_required#
Returns whether the gradient is required
- is_gradient_supported#
Returns whether the gradient is supported
- is_initial_point_ignored#
Returns whether the initial point is ignored
- is_initial_point_required#
Returns whether the initial point is required
- is_initial_point_supported#
Returns whether the initial point is supported
- setting#
Return setting
- settings#
Methods
- static gradient_num_diff(x_center, f, epsilon, max_evals_grouped=None)#
Compute the gradient by numerical differentiation around the point x_center, with the function evaluations done in parallel.
- Parameters:
- Returns:
the gradient computed
- Return type:
grad
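As an illustration of numerical differentiation around a point, the sketch below uses forward differences in plain Python. It is an assumption that forward differences are the scheme in question; the function name `forward_difference_gradient` is hypothetical, and the sketch evaluates `f` sequentially rather than in grouped or parallel batches.

```python
def forward_difference_gradient(x_center, f, epsilon):
    """Approximate the gradient of f at x_center with forward differences."""
    f_center = f(x_center)
    grad = []
    for i in range(len(x_center)):
        shifted = list(x_center)
        shifted[i] += epsilon  # perturb one coordinate at a time
        grad.append((f(shifted) - f_center) / epsilon)
    return grad
```

A smaller `epsilon` reduces the truncation error but amplifies floating-point round-off, so very small values can give noisy gradients.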
- load_params(load_dir)[source]#
Load iteration parameters from a file called adam_params.csv.
- Parameters:
load_dir (str) – The directory containing adam_params.csv.
- minimize(fun, x0, jac=None, bounds=None)[source]#
Minimize the scalar function.
- Parameters:
fun (Callable[[POINT], float]) – The scalar function to minimize.
x0 (POINT) – The initial point for the minimization.
jac (Callable[[POINT], POINT] | None) – The gradient of the scalar function fun.
bounds (list[tuple[float, float]] | None) – Bounds for the variables of fun. This argument might be ignored if the optimizer does not support bounds.
- Returns:
The result of the optimization, containing e.g. the result as attribute x.
- Return type:
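To show how the `fun`, `x0` and `jac` arguments interact with `maxiter`, `tol` and `eps`, here is a hedged stand-in written in plain Python. The function `minimize_adam_sketch` is hypothetical and is not the library's code. Two assumptions in particular: it stops once the parameters move by less than `tol` between iterations, and it falls back to forward finite differences with step `eps` when `jac` is None. It also ignores `bounds` and returns a plain tuple rather than a result object.

```python
import math

def minimize_adam_sketch(fun, x0, jac=None, maxiter=10000, tol=1e-6, lr=0.001,
                         beta_1=0.9, beta_2=0.99, noise_factor=1e-8, eps=1e-10):
    """Illustrative Adam loop; returns (x, fun(x), iterations used)."""
    def finite_difference(p):
        # Fallback gradient when no analytic jac is supplied (assumed scheme).
        base = fun(p)
        return [(fun(p[:i] + [p[i] + eps] + p[i + 1:]) - base) / eps
                for i in range(len(p))]

    grad_fn = jac if jac is not None else finite_difference
    x = [float(xi) for xi in x0]
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    n_iter = 0
    for t in range(1, maxiter + 1):
        n_iter = t
        g = grad_fn(x)
        m = [beta_1 * mi + (1 - beta_1) * gi for mi, gi in zip(m, g)]
        v = [beta_2 * vi + (1 - beta_2) * gi * gi for vi, gi in zip(v, g)]
        lr_t = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
        x_new = [xi - lr_t * mi / (math.sqrt(vi) + noise_factor)
                 for xi, mi, vi in zip(x, m, v)]
        # Assumed stopping rule: Euclidean distance moved in this step.
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if moved < tol:
            break
    return x, fun(x), n_iter
```

For example, `minimize_adam_sketch(lambda p: (p[0] - 1.0) ** 2, [0.0], lr=0.01, maxiter=30)` moves the point from 0 toward the minimizer at 1 using finite-difference gradients only.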
- print_options()#
Print algorithm-specific options.
- save_params(snapshot_dir)[source]#
Save the current iteration parameters to a file called adam_params.csv.
Note
The current parameters are appended to the file, if it exists already. The file is not overwritten.
- Parameters:
snapshot_dir (str) – The directory to store the file in.
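The append-only behaviour described in the note can be sketched with the standard `csv` module. This is an illustration only: the helper names `append_snapshot` and `load_last_snapshot` are hypothetical, and the column layout of the real adam_params.csv is not specified here, so the row keys are left to the caller.

```python
import csv
import os

def append_snapshot(snapshot_dir, row, filename="adam_params.csv"):
    """Append one row of iteration state; write a header only for a new file."""
    path = os.path.join(snapshot_dir, filename)
    is_new = not os.path.exists(path)
    # Mode "a" appends, so earlier snapshots are never overwritten.
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)

def load_last_snapshot(load_dir, filename="adam_params.csv"):
    """Return the most recently appended row as a dict of strings."""
    with open(os.path.join(load_dir, filename), newline="") as handle:
        rows = list(csv.DictReader(handle))
    return rows[-1]
```

Because rows accumulate across runs, a loader along these lines would typically resume from the last row rather than the first.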
- set_max_evals_grouped(limit)#
Set max evals grouped
- set_options(**kwargs)#
Sets or updates values in the options dictionary.
The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.
- Parameters:
kwargs (dict) – options, given as name=value.