It's the method used by the Minimizer to approximate the inverse of the Hessian matrix when doing gradient-descent minimization. When you're minimizing a function, you start at some point in the function's domain, compute the gradient (a vector pointing in the uphill direction), and march in the opposite (downhill) direction, watching the function decrease at each step, until it starts to increase again; then you recompute the gradient and repeat in the new downhill direction. Using only first-derivative information like this (which is what the "linmin_iterated" minimization type does) is inefficient and converges slowly, though.

It's more efficient to add some second-derivative information. To do this exactly, you'd need to compute the matrix of second partial derivatives, called the Hessian matrix, but that scales as O(N^2) in memory and computation time, where N is the number of degrees of freedom of the system you're minimizing. On top of that, you'd have to invert this large matrix, which is also costly. Since that's very expensive, we use approximations instead: we estimate the effect of the second derivatives (the inverse Hessian) from the gradients of past iterations. There are a number of ways of doing this (the Davidon–Fletcher–Powell (DFP) formula and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula being two common ones). The "l" stands for the "limited-memory" version of the BFGS formula, which stores only the last few steps and gradient changes rather than the full matrix; it tends to be the default version that we use nowadays. (The other bits -- "nonmonotone" and "armijo" -- refer to details of how the line search is carried out: how candidate steps are accepted and how step sizes are chosen.)
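As a rough illustration of the limited-memory idea (not the Minimizer's actual implementation), here's a minimal Python/NumPy sketch of the standard L-BFGS "two-loop recursion," driven by a simple backtracking line search with the Armijo sufficient-decrease test. The function names and the toy quadratic problem are made up for illustration:

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Approximate -(inverse Hessian) @ g via the two-loop recursion,
    using only the last m pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k.
    Cost is O(m*N) in time and memory instead of O(N^2)."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):  # newest first
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    if s_list:  # scale by a cheap initial-Hessian guess (gamma * identity)
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):  # oldest first
        b = rho * np.dot(y, q)
        q += (a - b) * s
    return -q  # a descent direction

def minimize_lbfgs(f, grad, x0, m=5, tol=1e-8, max_iter=200):
    """Minimize f using L-BFGS directions and a backtracking Armijo search."""
    x, g = x0.astype(float), grad(x0)
    s_list, y_list = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = lbfgs_direction(g, s_list, y_list)
        # Backtracking line search: shrink the step until the Armijo
        # sufficient-decrease condition holds.
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * np.dot(g, d):
            t *= 0.5
        x_new = x + t * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if np.dot(s, y) > 1e-12:   # keep the pair only if curvature is positive
            s_list.append(s); y_list.append(y)
            if len(s_list) > m:    # "limited memory": drop the oldest pair
                s_list.pop(0); y_list.pop(0)
        x, g = x_new, g_new
    return x

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose minimum is at A^{-1} b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_min = minimize_lbfgs(lambda x: 0.5 * x @ A @ x - b @ x,
                       lambda x: A @ x - b,
                       np.zeros(2))
```

The key point is that the two loops only ever touch the stored (s, y) vector pairs, so the inverse Hessian is never formed or inverted explicitly.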

Perfect.

Thanks for your detailed explanation.