Introduction by means of an example
When a car drives along a road, its motion can be recorded in a table that lists, for each point in time, the distance covered since the start of the recording. In practice it is useful not to make such a table too fine-meshed: for example, making a new entry only every 3 seconds over a period of 1 minute requires just 20 measurements. In theory, however, the table can be made arbitrarily fine-meshed if every point in time is to be taken into account. The previously discrete data, i.e. data separated by gaps, then merge into a continuum. The present is interpreted as a point in time, i.e. as an infinitely short period of time. Nevertheless, the car has covered a theoretically exactly measurable distance at every point in time, and as long as it neither slows to a standstill nor reverses, the distance increases steadily, i.e. it is never the same at two different points in time.
Figure: Example of a table in which a new measurement is entered every 3 seconds. Under these conditions, only average velocities over the periods 0 to 3 seconds, 3 to 6 seconds, etc. can be calculated. Since the distance covered always increases, the car appears to move only forward.
Figure: Possible transition to an arbitrarily fine-meshed table, which takes the form of a curve once all points have been entered. A distance $s(t)$ is now assigned to every time $t$ between 0 and 60 seconds. Regions in which the curve rises more steeply correspond to periods in which more meters are covered per unit of time. In regions where the number of meters is almost constant, for example between 15 and 20 seconds, the car drives slowly and the curve is flat.
The motivation behind the notion of the derivative of a time-distance table or function is to be able to state how fast the car is moving at a given present moment. In other words, the matching time-speed table is to be derived from a time-distance table. The background is that speed is a measure of how much the distance traveled changes over time: at high speed the distance increases strongly, while at low speed it changes little. Since a distance has been assigned to each point in time, such an analysis should be possible in principle, because knowing the distance $\Delta s$ traveled within a time period $\Delta t$, the velocity is
$$v = \frac{\Delta s}{\Delta t}.$$
Thus, if $t_0$ and $t_1$ are two different times, "the speed" of the car in the period between them is
$$v = \frac{s(t_1) - s(t_0)}{t_1 - t_0}.$$
The differences in numerator and denominator have to be formed because one is interested only in the distance $s(t_1) - s(t_0)$ traveled within a certain time period $t_1 - t_0$. Nevertheless, this approach does not give a complete picture, since so far only velocities over "real time periods" have been measured. A present velocity, comparable to a speed camera photo, would instead refer to an infinitely short time interval. Moreover, the car may well change its speed even within very short intervals, for example during emergency braking. Accordingly, the term "speed" used above is not accurate and must be replaced by "average speed". If real time intervals, i.e. discrete data, are used, the model is thus simplified by assuming that the car's speed is constant within each interval considered.
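The computation of average speeds from a discrete table can be sketched as follows. The recorded values are made up for illustration, and the function name `average_velocities` is a hypothetical choice for this sketch.

```python
def average_velocities(times, distances):
    """Return the average velocity on each interval between consecutive samples."""
    pairs = list(zip(times, distances))
    return [(s1 - s0) / (t1 - t0) for (t0, s0), (t1, s1) in zip(pairs, pairs[1:])]

# Hypothetical recording: one entry every 3 seconds (values are made up).
times = [0, 3, 6, 9, 12]          # seconds
distances = [0, 12, 30, 33, 60]   # meters

# One average velocity (meters per second) per 3-second interval.
print(average_velocities(times, distances))
```

Each entry describes a whole interval; nothing in the table says how the speed varied inside an interval.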
If, on the other hand, a "perfectly fitting" time-velocity table is wanted, the term "average velocity in a time interval" must be replaced by "velocity at a point in time". To do this, a time $t_0$ must first be chosen. The idea is to let "real time intervals" shrink toward an infinitely short time interval in a limit process and to study what happens to the corresponding average velocities. Although the denominator $t_1 - t_0$ tends to 0, this is not a problem intuitively: with a continuous course, i.e. without teleportation, the car can move less and less far in ever shorter time intervals, so numerator and denominator shrink simultaneously and the limit process produces an indeterminate term "$\tfrac{0}{0}$". Under certain circumstances this can make sense as a limit value; for example,
$$\frac{5\ \text{m}}{1\ \text{s}} = \frac{0.5\ \text{m}}{0.1\ \text{s}} = \frac{0.005\ \text{m}}{0.001\ \text{s}}$$
express exactly the same velocity. When studying the average velocities, there are two possibilities. Either they show no tendency to approach a particular finite value in the limit process considered. In this case no velocity valid at time $t_0$ can be assigned to the motion of the car, i.e. the term "$\tfrac{0}{0}$" has no definite meaning here. Or they increasingly stabilize toward a fixed value; then the limit
$$\lim_{t_1 \to t_0} \frac{s(t_1) - s(t_0)}{t_1 - t_0}$$
exists and expresses exactly the speed of the car at time $t_0$. The indeterminate term "$\tfrac{0}{0}$" takes on a unique value in this case. The resulting number is also called the derivative of $s$ at the point $t_0$, and the symbol $s'(t_0)$ is often used for it.
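The stabilization of the average velocities can be observed numerically. The following is a minimal sketch that assumes a made-up time-distance function (half the square of the time) and the time 4 seconds; neither is taken from the text.

```python
def s(t):
    """Hypothetical time-distance function (meters after t seconds)."""
    return 0.5 * t ** 2

def average_velocity(s, t0, t1):
    """Average velocity between the two different times t0 and t1."""
    return (s(t1) - s(t0)) / (t1 - t0)

t0 = 4.0
for dt in (1.0, 0.1, 0.01, 0.001):
    # The shorter the interval, the closer the value gets to 4.
    print(dt, average_velocity(s, t0, t0 + dt))
```

For this assumed function the average velocities settle toward 4 meters per second, which is then the derivative at the chosen time.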
The principle of differential calculus
The example of the last section is particularly simple if the distance covered by the car increases uniformly, i.e. linearly, with time. In this case one also speaks of a proportionality between time and distance, provided that at the beginning of the recording ($t = 0$) no distance has been covered yet ($s(0) = 0$). The change in distance over a time interval of given length is then always the same, no matter when the measurement starts. For example, the car covers the same distance between 0 and 1 seconds as between 9 and 10 seconds. If the car moves 2 meters further for every second that elapses, proportionality means that it moves only 1 meter for every half second, and so on. In general, then, $s(t) = 2t$,
i.e., for each additional unit of time, two additional units of distance are added, so that the rate of change at each point is 2 "meters per (added) second".
In the more general case, replacing 2 by any number $m$, i.e. $s(t) = mt$, another $m$ distance units are added for each elapsed time unit. This is quickly seen, because the distance difference satisfies
$$s(t+1) - s(t) = m(t+1) - mt = mt + m - mt = m.$$
In general, in $t_1 - t_0$ time units the car moves forward by a total of $m(t_1 - t_0)$ distance units; with the chosen units of meters and seconds, its speed is therefore a constant "$m$ meters per second". If the starting value is not $s(0) = 0$ but $s(0) = c$, nothing changes, since the constant $c$ always cancels in the difference above. This is also plausible intuitively: the starting position of the car should be irrelevant to its speed if the motion is uniform.
It can therefore be stated:
- Linear functions. For linear functions (the graph need not be a line through the origin), the derivative is explained as follows. If the function under consideration has the form $f(x) = mx + c$, then the instantaneous rate of change at every point has the value $m$, so the corresponding derivative function is $f'(x) = m$. The derivative can thus be read off directly from the expression $mx + c$. In particular, every constant function $f(x) = c$ has the derivative $f'(x) = 0$, since changing the input value does not change the output value. The measure of change is therefore 0 everywhere.
Matters can be much more difficult if a movement is not uniform. The graph of the time-distance function may then look completely different from a straight line. The shape of the time-distance function shows that the car's motion is highly varied, which may be due to traffic lights, curves, traffic jams and other road users, for example. Since such progressions are particularly common in practice, it is useful to extend the notion of derivative to non-linear functions as well. Here, however, one quickly runs into the problem that, at first glance, there is no clear proportionality factor that precisely expresses the local rate of change. The only possible strategy is therefore to linearize the non-linear function, reducing the problem to the simple case of a linear function. This technique of linearization is the actual calculus of differential calculus and is of great importance in analysis, since it helps to reduce complicated processes locally to very easily understood ones, namely linear processes.
The strategy can be illustrated with the non-linear function $f(x) = x^2$. The following table shows the linearization of the quadratic function $x^2$ at the point 1.
| $x$ | 0.5 | 0.75 | 0.99 | 0.999 | 1 | 1.001 | 1.01 | 1.1 | 2 | 3 | 4 | 100 |
| $x^2$ | 0.25 | 0.5625 | 0.9801 | 0.998001 | 1 | 1.002001 | 1.0201 | 1.21 | 4 | 9 | 16 | 10000 |
| $2x - 1$ | 0 | 0.5 | 0.98 | 0.998 | 1 | 1.002 | 1.02 | 1.2 | 3 | 5 | 7 | 199 |
That the linearization is only a local phenomenon is shown by the increasing deviation of the function values at more distant inputs. The linear function $x \mapsto 2x - 1$ mimics the behavior of $x \mapsto x^2$ near the input 1 very well (better than any other linear function). Unlike for $x^2$, however, for $2x - 1$ the rate of change at the point 1 is easy to interpret: it is (as everywhere) exactly 2. Thus $f'(1) = 2$.
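The local quality of such a linearization can be checked with a few lines of code. This is a sketch for the square function and its tangent line at the point 1; the helper name `tangent_at_1` is a hypothetical choice for this illustration.

```python
def f(x):
    return x ** 2

def tangent_at_1(x):
    """Linearization of f at 1: f(1) + f'(1) * (x - 1) = 2x - 1."""
    return 2 * x - 1

for x in (0.5, 0.99, 1.0, 1.01, 2.0, 100.0):
    # The deviation is tiny near 1 and grows quickly farther away.
    error = abs(f(x) - tangent_at_1(x))
    print(f"x={x:<7} x^2={f(x):<14.6f} 2x-1={tangent_at_1(x):<12.6f} error={error:.6f}")
```

The deviation equals the square of the distance from 1, which is why the approximation is excellent close to 1 and useless at 100.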
It can therefore be stated:
- Non-linear functions. If the instantaneous rate of change of a non-linear function is to be determined at a certain point, the function must be linearized there (if possible). The slope of the approximating linear function is then the local rate of change of the non-linear function under consideration, and the same interpretation applies as for derivatives of linear functions. In particular, the rates of change of a non-linear function are not constant but vary from point to point.
The exact determination of the correct linearization of a non-linear function at a given point is the central task of the calculus of differential calculus. The question is whether, for a curve such as $f(x) = x^2$, one can calculate which linear function best approximates it at a given point. Ideally, this calculation is so general that it can be applied to all points of the domain. In the case of $f(x) = x^2$ it can be shown that at the point $x_0$ the best linear approximation must have slope $2x_0$. With the additional information that the linear function must intersect the curve at the point $(x_0, f(x_0))$, the full equation of the approximating linear function can then be obtained. In many cases, however, specifying the slope, i.e. the derivative, is sufficient.
The starting point is the explicit determination of the limit of the difference quotient, the differential quotient
$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h},$$
from which, for very small $h$, a simple transformation yields the expression
$$f(x_0 + h) \approx f(x_0) + f'(x_0)\,h.$$
The right-hand side is a function linear in $h$ with slope $f'(x_0)$ and mimics $f$ very well near $x_0$. For some elementary functions, such as polynomial, trigonometric, exponential or logarithmic functions, a derivative function can be determined by this limit process. With the help of so-called derivation rules, this process can then be extended to many other functions, such as sums, products or compositions of elementary functions like those mentioned above.
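As an example: if $f(x_0 + h) \approx f(x_0) + f'(x_0)h$ and $g(x_0 + h) \approx g(x_0) + g'(x_0)h$, then the product $f(x_0 + h)\,g(x_0 + h)$ is approximated by the product of the linear functions, $(f(x_0) + f'(x_0)h)(g(x_0) + g'(x_0)h)$, and by multiplying out:
$$f(x_0 + h)\,g(x_0 + h) \approx f(x_0)g(x_0) + \bigl(f'(x_0)g(x_0) + f(x_0)g'(x_0)\bigr)h + f'(x_0)g'(x_0)h^2,$$
where the $h^2$ term is negligible for very small $h$, so that the slope of $fg$ at $x_0$ corresponds exactly to $f'(x_0)g(x_0) + f(x_0)g'(x_0)$. Furthermore, the derivation rules help to replace the sometimes laborious limit determinations by a "direct calculus" and thus simplify the process of differentiation. For this reason, differential quotients are studied in teaching for fundamental understanding and are used to prove the derivation rules, but are not applied in computational practice.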
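The slope of a product obtained from the two linearizations can be compared with a directly measured secant slope. The sketch below assumes two simple polynomial factors chosen only for illustration; all function names are hypothetical.

```python
def f(x): return x ** 2          # assumed first factor
def df(x): return 2 * x          # its derivative
def g(x): return x ** 3 + 1      # assumed second factor
def dg(x): return 3 * x ** 2     # its derivative

def product_rule_slope(x0):
    """Slope of f*g at x0 as read off from the product of the linearizations."""
    return df(x0) * g(x0) + f(x0) * dg(x0)

def secant_slope_of_product(x0, h):
    """Difference quotient of the product f*g with step h."""
    return (f(x0 + h) * g(x0 + h) - f(x0) * g(x0)) / h

print(product_rule_slope(2.0))
print(secant_slope_of_product(2.0, 1e-6))
```

For small steps the secant slope approaches the value delivered by the rule, without any limit having to be evaluated by hand.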
Exemplary calculation of the derivative
The starting point for calculating a derivative is the difference quotient. This can be demonstrated with the functions $f(x) = x^2$ and $g(x) = 10^x$. In the case of $f$, the binomial formula $(x + h)^2 = x^2 + 2xh + h^2$ helps. This gives
$$\frac{f(x+h) - f(x)}{h} = \frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h.$$
In the last step the term $x^2$ cancelled in the difference and a factor $h$ was cancelled from the fraction. If $h$ now tends to 0, only $2x$ remains in the limit of the "secant slope" $2x + h$; this is the exact tangent slope sought, $f'(x) = 2x$. In general, for polynomial functions, differentiation lowers the degree by one.
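That the secant slope of the square function is exactly twice the input plus the step can be confirmed with exact rational arithmetic. The following sketch uses Python's standard `fractions` module; the input 3 and the step sizes are arbitrary choices.

```python
from fractions import Fraction

def secant_slope(x, h):
    """Difference quotient of the square function with step h."""
    return ((x + h) ** 2 - x ** 2) / h

x = Fraction(3)
for h in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    # Exact arithmetic: the slope always equals 2x + h, with no rounding involved.
    print(h, secant_slope(x, h), secant_slope(x, h) == 2 * x + h)
```

As the step shrinks, only the part that does not depend on the step survives, here the value 6.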
Another important type of function is the exponential functions, such as $g(x) = 10^x$. Here, for each input $x$, exactly $x$ factors of 10 are multiplied together, for example $10^2 = 100$, $10^3 = 1000$ or $10^6 = 1000000$. This can also be generalized to non-integer inputs by "splitting" factors into roots (e.g. $10^{1/2} = \sqrt{10}$). Exponential functions share the characteristic equation
$$10^{x+y} = 10^x \cdot 10^y,$$
which rests on the principle that the product of $x$ factors of 10 and $y$ factors of 10 consists of $x + y$ factors of 10. In particular, there is a direct connection between any differences $g(x+h) - g(x)$ and $g(h) - g(0)$ through
$$g(x+h) - g(x) = 10^{x+h} - 10^x = 10^x\,(10^h - 1) = g(x)\,(g(h) - g(0)).$$
This triggers the important effect (peculiar to exponential functions) that the derivative function must agree with the original function up to a factor:
$$g'(x) = \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} = g(x) \lim_{h \to 0} \frac{g(h) - g(0)}{h} = g'(0)\,g(x).$$
The factor up to which function and derivative are equal is the derivative at the point 0. Strictly speaking, it must be verified that this derivative exists at all. If it does, $g$ is differentiable everywhere.
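The proportionality between an exponential function and its derivative can be observed numerically. The sketch below assumes base 10 and a fixed small step for the difference quotient; it also uses the known fact that the slope at 0 equals the natural logarithm of 10.

```python
import math

def g(x):
    return 10.0 ** x

def difference_quotient(func, x, h=1e-6):
    """Numerical stand-in for the derivative, using a small fixed step h."""
    return (func(x + h) - func(x)) / h

slope_at_zero = difference_quotient(g, 0.0)   # numerical estimate of g'(0)
for x in (0.0, 1.0, 2.5):
    # The ratio "slope / function value" is the same at every input.
    print(x, difference_quotient(g, x) / g(x), slope_at_zero)
print(math.log(10))                           # reference value for g'(0)
```

The ratio of slope to function value does not depend on the input, which is the numerical counterpart of the factor described above.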
The calculation rules are described in detail in the section Derivation Calculation.
Classification of the application possibilities
Extreme value problems
An important application of differential calculus is that the derivative can be used to determine local extreme values of a curve. Instead of searching mechanically for high or low points using a table of values, the calculus provides a direct answer in some cases. At a high or low point the curve has no "real" slope, which is why the optimal linearization there has slope 0. For the exact classification of an extreme value, however, further local data of the curve are necessary, because a slope of 0 is not sufficient for the existence of an extreme value (let alone to decide between a high and a low point).
In practice, extreme value problems typically occur when processes, for example in economics, are to be optimized. Often the results at the boundary values are unfavorable, while toward the "middle" there is a steady improvement, which must reach a maximum somewhere. One example is the optimal choice of a sales price: if the price is too low, demand for the product is very high, but production cannot be financed. If it is too high, in extreme cases the product will not be bought at all. An optimum therefore lies somewhere "in the middle". The prerequisite is that the relationship can be represented by a (continuously) differentiable function.
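How a price optimum can be located through the slope may be sketched with a toy model. The demand curve, the unit cost and all function names below are assumptions made for illustration only, and bisection on the slope is one possible way to find where it vanishes.

```python
def profit(price):
    """Hypothetical profit: (price - unit cost 10) * demand (100 - 2 * price)."""
    return (price - 10) * (100 - 2 * price)

def profit_slope(price):
    """Derivative of profit(price) = -2 price^2 + 120 price - 1000."""
    return -4 * price + 120

def find_zero_of_slope(lo, hi, tol=1e-9):
    """Bisection on the slope, assuming it changes sign from + to - on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if profit_slope(mid) > 0:
            lo = mid      # profit still rising: optimum lies to the right
        else:
            hi = mid      # profit flat or falling: optimum lies to the left
    return (lo + hi) / 2

best_price = find_zero_of_slope(10, 50)
print(best_price, profit(30))
```

In this toy model the slope vanishes at a price of 30, and the neighboring prices 29 and 31 both yield a smaller profit, so the point is indeed a maximum.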
The examination of a function for extreme points is part of a curve discussion. The mathematical background is provided in the section Application of higher derivatives.
Mathematical modeling
In mathematical modeling, complex problems are captured and analyzed in mathematical language. Depending on the problem, the aim within such a model may be to investigate correlations or causal relationships, or to make forecasts.
Especially in the environment of so-called differential equations, the differential calculus is a central tool for modeling. These equations occur, for example, when there is a causal relationship between the stock of a quantity and its change over time. An everyday example could be:
The more inhabitants a city has, the more people want to move there.
More concretely, this could mean, for example, that with 1,000,000 current residents, an average of 1,000,000 people will move in over the next 10 years, with 1,000,001 residents an average of 1,000,001 people over the next 10 years, and so on. To avoid listing all the numbers individually: if $n$ people live in the city, so many people want to move in that another $n$ would be added after 10 years. Given such a causal relationship between stock and change over time, one can ask whether a forecast for the number of inhabitants after 10 years can be derived from these data if, for example, the city had 1,000,000 inhabitants in 2020. It would be wrong to believe that the answer is 2,000,000, since as the population grows, the number of people wanting to move in grows with it. The crux of understanding the relationship is thus once again its locality: if the city has 1,000,000 inhabitants, then at this moment 1,000,000 people want to move in per 10 years. But a short moment later, when more people have moved in, the situation is already different. If this phenomenon is thought of as arbitrarily fine-meshed in time, a "differential" relationship results. In many cases, however, the continuous approach is also suitable for discrete problems.
With the help of differential calculus, such a causal relationship between stock and change can in many cases be turned into a model that resolves the complex relationship in the sense that a stock function can be stated explicitly at the end. If, for example, the value 10 years is inserted into this function, the result is a forecast for the number of city residents in 2030. In the case of the model above, a stock function $B$ is sought with $B'(t) = B(t)$, $t$ measured in units of 10 years, and $B(0) = 1{,}000{,}000$. The solution is then
$$B(t) = 1{,}000{,}000 \cdot e^{t}$$
with the natural exponential function (natural means that the proportionality factor between stock and change is simply equal to 1), and for 2030 the estimated forecast is $B(1) \approx 2.718$ million inhabitants. Thus, the proportionality between population and rate of change leads to exponential growth and is a classic example of a self-reinforcing effect. Analogous models work for population growth (the more individuals, the more births) or for the spread of a contagious disease (the more infected, the more infections). In many cases, however, these models reach their limits when natural constraints (such as an upper bound on the total population) prevent the process from continuing indefinitely. In these cases, related models, such as logistic growth, are more appropriate.
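The effect of making the bookkeeping "arbitrarily fine-meshed in time" can be simulated. The sketch below assumes a starting stock of 1,000,000 inhabitants (an illustrative value) and uses a simple stepwise update, often called the Euler scheme, as a stand-in for the continuous model; the function name `forecast` is hypothetical.

```python
import math

def forecast(initial, steps):
    """Stepwise update for 'change = stock' over one time unit (10 years)."""
    stock = initial
    dt = 1.0 / steps
    for _ in range(steps):
        stock += stock * dt      # newcomers during dt are proportional to the stock
    return stock

for steps in (1, 10, 1000, 100000):
    # More steps mean the influx is re-evaluated more often within the 10 years.
    print(steps, round(forecast(1_000_000, steps)))
print("exact:", round(1_000_000 * math.e))
```

With a single step the naive doubling results; the finer the subdivision, the closer the forecast comes to the initial stock multiplied by Euler's number.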
Numerical methods
The property of a function being differentiable is advantageous in many applications, since it gives the function more structure. One example is solving equations. In some mathematical applications it is necessary to find the value of one (or more) unknowns $x$ that is a zero of a function $f$, so that $f(x) = 0$. Depending on the nature of $f$, strategies can be developed to determine a zero at least approximately, which is usually quite sufficient in practice. If $f$ is differentiable at every point with derivative $f'$, then Newton's method can help in many cases. In this method, differential calculus plays a direct role insofar as a derivative must be calculated explicitly in each step of the procedure.
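A minimal sketch of Newton's method is given below. The cubic equation, the starting value and the function name `newton` are assumptions chosen for illustration; the sketch has no safeguards for the case that the derivative vanishes or the iteration fails to converge.

```python
def newton(f, df, x, steps=20, tol=1e-12):
    """Newton iteration x -> x - f(x)/f'(x); no safeguards against df(x) == 0."""
    for _ in range(steps):
        step = f(x) / df(x)      # the derivative is evaluated in every step
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical equation x^3 - 2x - 5 = 0, started at x = 2.
root = newton(lambda x: x**3 - 2*x - 5, lambda x: 3*x**2 - 2, x=2.0)
print(root)
```

Each step replaces the function by its linearization at the current guess and takes the zero of that linear function as the next guess.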
Another advantage of differential calculus is that in many cases complicated functions, such as roots or even sine and cosine, can be approximated well using simple arithmetic operations such as addition and multiplication. This is of great benefit if the function is easy to evaluate at a neighboring value. For example, if an approximation for the number $\sqrt{26}$ is sought, differential calculus provides for $f(x) = \sqrt{x}$ the linearization
$$\sqrt{25 + h} \approx \sqrt{25} + f'(25)\,h = 5 + \frac{h}{10},$$
because it can be proved that $f'(x) = \frac{1}{2\sqrt{x}}$. Both the function and its first derivative could be calculated easily at the point 25 because it is a square number. Inserting $h = 1$ gives $\sqrt{26} \approx 5.1$, which agrees with the exact result $\sqrt{26} = 5.0990195\ldots$ within an error of less than $\tfrac{1}{1000}$. By including higher derivatives, the accuracy of such approximations can be increased further, since the approximation is then not only linear but quadratic, cubic, etc.; see also Taylor series.
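The tangent-line estimate of a square root can be reproduced in a few lines. The numbers 26 and 25 and the function name `sqrt_linearized` are assumptions of this sketch; it relies only on the known derivative of the square root, one over twice the root.

```python
import math

def sqrt_linearized(x, base=25.0):
    """Tangent-line estimate of sqrt(x) around a base point with a known root."""
    root = math.sqrt(base)                 # 5 for base 25
    return root + (x - base) / (2 * root)  # slope 1 / (2 * sqrt(base))

estimate = sqrt_linearized(26.0)
error = abs(estimate - math.sqrt(26.0))
print(estimate, error)
```

Only one division and one addition are needed once the root at the base point is known; the error grows as the input moves away from the base point.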
Pure mathematics
Differential calculus also plays an important role in pure mathematics as a core of calculus. An example is differential geometry, which deals with figures that have a differentiable surface (without kinks, etc.). For example, a plane can be placed tangentially on a spherical surface at any point. Illustratively, if you stand at a point on the earth, you will have the feeling that the earth is flat if you let your gaze wander in the tangential plane. In reality, however, the earth is only locally flat: The applied plane serves the simplified representation (by linearization) of the more complicated curvature. Globally it has a completely different shape as a spherical surface.
The methods of differential geometry are extremely important for theoretical physics. Phenomena such as curvature or spacetime can be described by methods of differential calculus. The question of the shortest distance between two points on a curved surface (for example the earth's surface) can also be formulated, and often answered, with these techniques.
Differential calculus has also proved its worth in the study of numbers as such, i.e. in number theory, within analytic number theory. The basic idea of analytic number theory is to convert certain numbers about which one wants to learn something into functions. If these functions have "good properties" such as differentiability, one hopes to draw conclusions about the original numbers from the accompanying structures. It has often proved useful to pass from the real to the complex numbers in order to refine the analysis (see also complex analysis), i.e. to study the functions over a larger range of numbers. An example is the analysis of the Fibonacci numbers $0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots$, whose law of formation dictates that each new number is the sum of the two preceding ones. The approach of analytic number theory is to form the generating function
$$F(x) = 0 + 1x + 1x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + \cdots,$$
i.e. an "infinitely long" polynomial (a so-called power series) whose coefficients are exactly the Fibonacci numbers. For sufficiently small numbers $x$ this expression makes sense, because the powers $x^n$ then tend to 0 much faster than the Fibonacci numbers tend to infinity, so that in the long run everything settles at a finite value. For these values the function $F(x)$ can be calculated explicitly as
$$F(x) = \frac{x}{1 - x - x^2}.$$
The denominator polynomial $1 - x - x^2$ "mirrors" exactly the behavior $f_n - f_{n-1} - f_{n-2} = 0$ of the Fibonacci numbers $f_n$; in fact, $(1 - x - x^2)F(x) = x$ follows by termwise arithmetic. On the other hand, differential calculus can be used to show that the function $F$ suffices to characterize the Fibonacci numbers (its coefficients) uniquely. Since it is a simple rational function, this makes it possible to derive the exact formula valid for every Fibonacci number $f_n$,
$$f_n = \frac{\Phi^n - \Psi^n}{\sqrt{5}},$$
with the golden ratio $\Phi$, where $\Phi = \tfrac{1 + \sqrt{5}}{2}$ and $\Psi = \tfrac{1 - \sqrt{5}}{2}$. The exact formula can compute a Fibonacci number without knowing the preceding ones. The conclusion is drawn by a so-called comparison of coefficients and uses the fact that the polynomial $1 - x - x^2$ has $-\Phi$ and $-\Psi$ as zeros.
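Both the closed formula and the rational form of the generating function can be checked numerically. The sketch below uses floating-point arithmetic, so the closed formula is rounded and is only reliable for moderately small indices; the function names and the number of summed terms are assumptions for this illustration.

```python
import math

def fib_by_rule(n):
    """n-th Fibonacci number via 'sum of the two preceding ones' (f_0 = 0, f_1 = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    """Closed formula with the golden ratio, rounded to absorb float error."""
    phi = (1 + math.sqrt(5)) / 2
    psi = (1 - math.sqrt(5)) / 2
    return round((phi ** n - psi ** n) / math.sqrt(5))

def generating_function_partial_sum(x, terms=200):
    """Partial sum of the power series whose coefficients are Fibonacci numbers."""
    return sum(fib_by_rule(n) * x ** n for n in range(terms))

print([fib_closed_form(n) for n in range(10)])
print(generating_function_partial_sum(0.1), 0.1 / (1 - 0.1 - 0.1 ** 2))
```

The closed formula needs no earlier Fibonacci numbers, and for a small input the partial sums of the series agree with the rational expression.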
The higher dimensional case
Differential calculus can be generalized to the case of "higher-dimensional functions". This means that both the input and the output values of the function are not merely part of the one-dimensional real number line, but may also be points of a higher-dimensional space. An example is the rule
$$(x, y) \mapsto (f_1(x, y), f_2(x, y))$$
between two two-dimensional spaces. The understanding of a function as a table remains the same here, except that the table, now with "four columns" $(x, y, f_1(x, y), f_2(x, y))$, has "considerably more" entries.
Multidimensional mappings can also be linearized at a point in some cases. However, it must now be taken into account that there can be several input dimensions as well as several output dimensions: the correct generalization is that the linearization, in each component of the output, accounts for each variable in a linear fashion. For the example function above this leads to an approximation of the form
$$f(x_0 + h_1, y_0 + h_2) \approx (m_{11}h_1 + m_{12}h_2 + c_1,\; m_{21}h_1 + m_{22}h_2 + c_2).$$
This then mimics the entire function $f$ very well near the input $(x_0, y_0)$. Accordingly, in each component a "slope" is given for each variable; it measures the local behavior of the component function under a small change in that variable. This slope is also called the partial derivative. The correct constant terms $c_1, c_2$ are given by $c_1 = f_1(x_0, y_0)$ and $c_2 = f_2(x_0, y_0)$, respectively. As in the one-dimensional case, the slopes (here $m_{11}, m_{12}, m_{21}, m_{22}$) depend strongly on the choice of the point (here $(x_0, y_0)$) at which the derivative is taken. The derivative is therefore no longer a single number but a collection of several numbers (four in this example), and these numbers usually differ from input to input. For the derivative one generally also writes
$$f'(x_0, y_0) = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix},$$
which gathers all the "slopes" in a so-called matrix. This term is also called the Jacobian matrix or functional matrix.
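To make the four slopes tangible, the following sketch linearizes an assumed map from the plane to the plane. The concrete function, the point and the step sizes are chosen here purely for illustration and are not taken from the text; the partial derivatives were worked out by hand for this particular function.

```python
def f(x, y):
    """Hypothetical example map from the plane to the plane."""
    return (x ** 2 + y, x * y)

def jacobian(x0, y0):
    """Exact partial derivatives of the assumed f, collected row by row."""
    return [[2 * x0, 1.0],
            [y0, x0]]

def linearized(x0, y0, h1, h2):
    """Linear approximation of f near (x0, y0) for small changes h1, h2."""
    (m11, m12), (m21, m22) = jacobian(x0, y0)
    c1, c2 = f(x0, y0)               # constant terms: the value at the point
    return (m11 * h1 + m12 * h2 + c1, m21 * h1 + m22 * h2 + c2)

exact = f(1.01, 2.02)
approx = linearized(1.0, 2.0, 0.01, 0.02)
print(exact, approx)
```

For small changes the two results differ only in the fourth decimal place, while the four entries of the matrix would change if another point were chosen.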
Example: If above
set, it can be shown that the following linear approximation is
very good for very small changes in
and

For example

and

In the most general case, with $n$ variables and $m$ output components, there are combinatorially a total of $n \cdot m$ "slopes", i.e. partial derivatives. In the classical case $f\colon \mathbb{R} \to \mathbb{R}$ there is one slope because $1 \cdot 1 = 1$, and in the example above, $f\colon \mathbb{R}^2 \to \mathbb{R}^2$, there are $2 \cdot 2 = 4$ "slopes".