The Newton method is a so-called locally convergent method. Convergence of the sequence generated in the Newton iteration to a zero is therefore only guaranteed if the starting value, i.e. the 0-th member of the sequence, is already "sufficiently close" to the zero. If the starting value is too far away, the convergence behavior is not fixed, i.e., both a divergence of the sequence and an oscillation (in which finitely many function values alternate) or a convergence to another zero of the function under consideration are possible.
If the starting value
is chosen in such a way that the Newton method converges, then the convergence is quadratic, however, i.e. with the order of convergence 2 (if the derivative does not vanish at the zero point). The set of starting points for which the Newton method converges to a certain zero forms the catchment area of this zero. If you color the catchment areas of different zeros in the complex plane differently for a polynomial function with real or complex coefficients, you get a Newton fractal. In this fractal it can be seen that the catchment areas contain basins, i.e. circular disks around the zeros, from which the Newton iteration converges stably towards the zero in the centre. But it can also be seen that the edges of the catchment areas are "frayed", they even have a fractal structure. So small deviations in the starting point can lead to different zeros.
However, if there is
exactly one zero in the interval , in
consistently
as well as
holds and the initial value
chosen to
the left of the zero ξ, then the sequence always converges in Newton's method, strictly monotonically increasing (see the figure below resp. the table above from
).
Examples of non-convergence
- Oscillatory behavior results, among others, for the polynomial
with
. The point
with
and f^{\prime
is
mapped by the Newton operator to the point the point
in turn, with
and
, is
mapped to , so Newton iteration with one of these points as the initial value yields a periodic sequence, these two points alternate cyclically. This cycle is stable, it forms an attractor of the Newton iteration. This means that there are environments around both points, so that starting points from these environments converge against the cycle and thus have one each of the points 0 and 1 as the limit of the subsequence with even index and that with odd index. - Divergence or arbitrary distance from the starting point results for
with
and
. There exists a place
with
i.e.
One convinces oneself that then
holds. This behavior is not stable, because when the initial value is slightly varied, such as by numerical computation, the Newton iteration moves further and further away from the ideal divergent sequence. Even with eventual convergence, the zero found will be very far from the initial value.
Local quadratic convergence
Let be
a twice continuously differentiable real function and
a simple zero of
, in which the derivative thus has no zero. This means that the graph of the function is transversal, i.e. non-touching, intersecting the
axis. Let be
a point close to
. Then the second degree Taylor formula (with Lagrangian remainder) can be
lies between
and
,
can be rearranged according to the difference
.
It is now rearranged so that the Newton operator appears on the right-hand side,
.
Let
be an interval around
without zero of the derivative
and
and
bounds on the derivatives of
. Then for all
the estimation follows
.
Let
denote the constant factor. At each step
the Newton iteration, the quantity
be smaller than the square of the same magnitude in the previous step,
. After complete induction we get
.
Thus, if the starting point of the iteration can be guaranteed to be the estimate
, e.g. by the interval length of
being less than
, then the sequence
of Newton iteration converges to the zero
, because the sequence
and hence
is a zero sequence according to the given estimation. The shortening of the interval can be achieved by some iterations of a slower procedure for zero constraint, e.g., the bisection procedure or the regula falsi.
The convergence rate following from these estimates is called quadratic, the (logarithmic) precision or number of valid digits doubles in each step. The estimation of the distance
to the zero is often given linearly in
, e.g. the following holds
, if the length of the interval
is less than
This gives an estimate of the valid digits in the binary system.
, if the length of the interval
is less than
, i.e., close enough to zero results in a doubling of the valid decimal places in each step.
Local quadratic convergence with multiple zeros by modification
In the case that
has a multiple zero of finite degree at
, the rate of convergence can also be estimated and quadratic convergence can be enforced again by a slight modification.
If
has
a
-fold zero, be
written as
with
.
Then by the product rule
and hence the expression
.
If we now insert this into the iteration, we get

and from it, after bilateral subtraction of ,
the iteration error
. 
Now when the expression
become very small, the summand
in the denominator becomes much smaller than
, so that the back bracket in approaches
more and more the value
. For the simple zero with
, one has a small value
, which becomes almost 0 so that
. For
, becomes
about 0.5, so the distance to zero only about halves from step to step, and after about 10 steps one has increased the precision only in another three decimal places. For
,
becomes about 0.67, so that only after about 16 steps the accuracy increases by another three decimal places, and so on.
Therefore one can estimate the multiplicity of the zero by the convergence behaviour, if one does not know it for other reasons, and - as now still described - optimize the procedure.
For a
-fold zero one modifies the Newtonian approximation method with a factor
:
(Newton method with
-fold zero).
Thus
then becomes.
![{\displaystyle {\begin{aligned}(x_{\text{neu}}-a)&=(x-a)\left[1-{\frac {k\cdot g(x)}{k\cdot g(x)+(x-a)\cdot g'(x)}}\right]\\&=(x-a)\left[{\frac {k\cdot g(x)+(x-a)\cdot g'(x)}{k\cdot g(x)+(x-a)\cdot g'(x)}}-{\frac {k\cdot g(x)}{k\cdot g(x)+(x-a)\cdot g'(x)}}\right]\\&=(x-a){\frac {(x-a)\cdot g'(x)}{k\cdot g(x)+(x-a)\cdot g'(x)}}\\&=(x-a)^{2}{\frac {g'(x)}{k\cdot g(x)+(x-a)\cdot g'(x)}}\end{aligned}}}](https://www.alegsaonline.com/image/5f1c38c84e7d192079e376bbf7e046389222342a.svg)
Now if
again very small, then in the denominator the summand becomes
much smaller than
, and one obtains
,
where the right factor converges to a fixed value because of
. As can be seen, quadratic convergence is now also present here.
An example shows the convergence behavior very nicely. The function
has the simple zero
. The left column of the table shows the rapid convergence for the starting value 1, after 4 steps the precision cannot be increased any more, at the error the number of zeros behind the decimal point doubles (at least). If we now square the function (middle column), the zero becomes a double one, and now we see the behavior explained above, that without modification the error is only about halved in each step. If we then modify this case with the factor
the same behavior as for the simple zero occurs (right column).
| n | for  | Error | for without factor  | Error | for with factor  | Error |
| 0 |  |  |  |  |  |  |
| 1 |  |  |  |  |  |  |
| 2 |  |  |  |  |  |  |
| 3 |  |  |  |  |  |  |
| 4 |  |  |  |  |  |  |
| 5 |  |  |  |  |  |  |
| 6 |  |  |  |  |  |  |
| 7 |  |  |  |  |  |  |
| 8 |  |  |  |  |  |  |
| 9 |  |  |  |  |  |  |
| 10 |  |  |  |  |  |  |
The Newton method for complex numbers
For complex numbers
complex one writes the formula accordingly:

with the holomorphic function
. The decomposition into real and imaginary part gives

The complex derivative is independent of the direction of the derivative at the point z i.e. it is valid

Therefore the Cauchy-Riemann differential equations are valid
and 
The complex equation (1) can be decomposed into real and imaginary parts:
.
With the help of (2) it follows

The geometric meaning of this equation is seen as follows. One determines the minimum of the magnitude
. The minimum is assumed to be .
This can be determined using the gradient method, i.e., the steepest descent method.
One introduces the term
Using the vector formula for this method is
:

This is identical to formula (1).
Comments
- The local convergence proof can also be done in the same way in the multidimensional case, but it is then technically a bit more difficult since two- and three-level tensors are used for the first and second derivatives, respectively. Essentially, the constant K is given by
be replaced with appropriate induced operator norms. - The local convergence proof assumes that an interval containing a zero is known. From his proof, however, there is no way to test this quickly. A proof of convergence which also provides a criterion for this was first given by Leonid Kantorovich and is known as Kantorovich's theorem.
- To find a suitable starting point, one sometimes uses other ("coarser") methods. For example, one can determine an approximate solution with the gradient method and then refine it with the Newton method.
- If the starting point is unknown, one can use a homotopy to deform the function
of which one is looking for a zero, to a simpler function
of which (at least) one zero is known. One then runs through the deformation backwards in the form of a finite sequence of only "slightly" differing functions. From the first function
one knows one zero. As starting value of the Newton iteration to the just current function of the sequence one uses the approximation of a zero of the function preceding in the sequence. For the exact procedure see homotopy method.
As an example may serve the "flooding homotopy": with an arbitrary
we form the initial function
with known zero
. We have
flooded the "water level" from the "zero level" to the level Now we gradually lower the water level,
,
,
. At each step, an approximation ξ
of a zero is determined, where
set. It is
and hence ξ
one of the approximate solutions we are looking for.