mrpro.algorithms.optimizers.lbfgs
- mrpro.algorithms.optimizers.lbfgs(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]] | Callable[[Unpack[tuple[Tensor, ...]]], tuple[Tensor]], initial_parameters: Sequence[Tensor], learning_rate: float = 1.0, max_iterations: int = 100, max_evaluations: int | None = 100, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 10, line_search_fn: None | Literal['strong_wolfe'] = 'strong_wolfe', callback: Callable[[OptimizerStatus], bool | None] | None = None) → tuple[Tensor, ...]
LBFGS for (non-linear) minimization problems.
The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a quasi-Newton optimization method that approximates the inverse Hessian matrix using a limited memory of past gradients and updates. It is well-suited for high-dimensional problems and leverages curvature information for faster convergence compared to first-order methods such as mrpro.algorithms.optimizers.adam.

The parameter update rule is:

\[\theta_{k+1} = \theta_k - \alpha_k H_k \nabla f(\theta_k),\]

where \(H_k\) is a limited-memory approximation of the inverse Hessian and \(\alpha_k\) is the step size determined via line search (e.g., satisfying the strong Wolfe conditions).
The algorithm performs the following steps:

1. Compute the gradient of the objective function.
2. Approximate the inverse Hessian matrix \(H_k\) using stored gradients and updates (see the sketch after this list).
3. Perform a line search to compute the step size \(\alpha_k\).
4. Update the parameters.
5. Store the latest gradient and update information.
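For intuition, the following is a schematic, self-contained sketch of the classic two-loop recursion behind step 2. It is not mrpro's implementation (which delegates to torch.optim.LBFGS below); the helper name lbfgs_direction and the flattened 1-D parameter layout are chosen here purely for illustration.

```python
import torch

def lbfgs_direction(
    grad: torch.Tensor,            # current gradient, flattened to 1-D
    s_history: list[torch.Tensor],  # past parameter updates s_i = θ_{i+1} - θ_i
    y_history: list[torch.Tensor],  # past gradient differences y_i = ∇f(θ_{i+1}) - ∇f(θ_i)
) -> torch.Tensor:
    """Two-loop recursion: compute H_k ∇f(θ_k) without forming H_k explicitly.

    Assumes a non-empty history of curvature pairs (s_i, y_i).
    """
    q = grad.clone()
    rho = [1.0 / torch.dot(y, s) for s, y in zip(s_history, y_history)]
    alpha = [torch.zeros(()) for _ in s_history]
    # First loop: newest to oldest.
    for i in reversed(range(len(s_history))):
        alpha[i] = rho[i] * torch.dot(s_history[i], q)
        q = q - alpha[i] * y_history[i]
    # Initial Hessian guess γI with γ = sᵀy / yᵀy of the newest pair.
    gamma = torch.dot(s_history[-1], y_history[-1]) / torch.dot(y_history[-1], y_history[-1])
    r = gamma * q
    # Second loop: oldest to newest.
    for i in range(len(s_history)):
        beta = rho[i] * torch.dot(y_history[i], r)
        r = r + s_history[i] * (alpha[i] - beta)
    return r  # ≈ H_k ∇f(θ_k); the update is then θ_{k+1} = θ_k - α_k r
```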
This implementation wraps PyTorch’s torch.optim.LBFGS class. For more information, see [WIKI], [NOC1980], and [LIU1989].

References
[NOC1980] Nocedal, J. (1980). “Updating quasi-Newton matrices with limited storage.” Mathematics of Computation, 35(151), 773-782. https://doi.org/10.1090/S0025-5718-1980-0572855-7
[LIU1989] Liu, D. C., & Nocedal, J. (1989). “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, 45(1-3), 503-528. https://doi.org/10.1007/BF01589116
[WIKI] Wikipedia: Limited-memory BFGS. https://en.wikipedia.org/wiki/Limited-memory_BFGS
- Parameters:
  f (Union[Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]], Callable[[Unpack[tuple[Tensor, ...]]], tuple[Tensor]]]) – scalar function to be minimized
  initial_parameters (Sequence[Tensor]) – Sequence of parameters to be optimized. Note that these parameters will not be changed. Instead, we create a copy and leave the initial values untouched.
  learning_rate (float, default: 1.0) – learning rate. This should usually be left as 1.0 if a line search is used.
  max_iterations (int, default: 100) – maximal number of iterations
  max_evaluations (int | None, default: 100) – maximal number of evaluations of f per optimization step
  tolerance_grad (float, default: 1e-07) – termination tolerance on first-order optimality
  tolerance_change (float, default: 1e-09) – termination tolerance on function value/parameter changes
  history_size (int, default: 10) – update history size
  line_search_fn (Optional[Literal['strong_wolfe']], default: 'strong_wolfe') – line search algorithm, either 'strong_wolfe' or None (meaning constant step size)
  callback (Callable[[OptimizerStatus], bool | None] | None, default: None) – function to be called after each iteration. This can be used to monitor the progress of the algorithm. If it returns False, the algorithm stops at that iteration. N.B. the callback is not called within the line search of LBFGS. You can use the information from the OptimizerStatus to display a progress bar.
- Returns:
Tuple of optimized parameters.
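A minimal usage sketch based on the signature above; the quadratic objective, tensor values, and the progress callback are made up for illustration, and the OptimizerStatus key name used in the callback is an assumption (check your mrpro version for the exact fields).

```python
import torch
from mrpro.algorithms.optimizers import lbfgs

# Toy least-squares objective: the callable must return a tuple whose first
# element is the scalar value to minimize.
target = torch.tensor([1.0, 2.0, 3.0])

def f(x: torch.Tensor) -> tuple[torch.Tensor]:
    return ((x - target).square().sum(),)

def report(status) -> None:
    # Print progress after each iteration; the key name 'iteration_number'
    # is an assumption for illustration.
    print('iteration', status.get('iteration_number'))

x0 = torch.zeros(3)  # initial guess; lbfgs leaves it untouched and optimizes a copy
(x_opt,) = lbfgs(f, initial_parameters=[x0], max_iterations=50, callback=report)
```

Because a strong Wolfe line search is used by default, the learning_rate can usually stay at 1.0, as noted in the parameter description.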