EXERCISES

3.1 Let A = {1, 2, 3, 4, 5}. Consider the relations
R = {(1, 2), (1, 4), (3, 1), (3, 4), (3, 5), (5, 1), (5, 4)} ⊆ A × A
and
S = {(a1, a2) ∈ A × A | a1 + a2 = 6}.
(a) Illustrate R and S by graphs and as points in a rectangular coordinate system.
(b) Which of the following propositions are true of relation R: 1R2, 2R1, 3R1, {2, 3, 4, 5} = {a ∈ A | 1Ra}?
(c) Find the inverse relations for R and S.
(d) Check whether the relations R and S are mappings.

3.2 Consider the following relations, given in the original by four arrow diagrams (a) to (d) (not reproduced here), and find out whether they are mappings. Which of the mappings are surjective, injective, bijective?

3.3 Let A = B = {1, 2, 3} and C = {2, 3}. Consider the following mappings f : A → B and g : C → A with f(1) = 3, f(2) = 2, f(3) = 1 and g(2) = 1, g(3) = 2.
(a) Illustrate f, g, f ◦ g and g ◦ f by graphs if possible.
(b) Find the domains and the ranges of the given mappings and of the composite mappings.
(c) What can you say about the properties of these mappings?

3.4 Given is the relation F = {(x1, x2) ∈ R^2 | |x2| = x1 + 2}. Check whether F or F⁻¹ is a mapping. In the case when there is a mapping, find the domain and the range. Graph F and F⁻¹.

3.5 Given are the relations F = {(x1, x2) | x2 = x1^3} with x1 ∈ {−3, −2, −1, 0, 1, 2, 3} and G = {(x, y) ∈ R^2 | 9x^2 + 2y^2 = 18}. Are these relations functions? If so, does the inverse function exist?

3.6 Given are the functions f : Df → R and g : Dg → R with f(x) = 2x + 1 and g(x) = x^2 − 2. Find and graph the composite functions g ◦ f and f ◦ g.

3.7 Given are the functions f : R → R+ and g : R → R with f(x) = e^x and g(x) = −x.
(a) Check whether the functions f and g are surjective, injective or bijective. Graph these functions.
(b) Find f⁻¹ and g⁻¹ and graph them.
(c) Find f ◦ g and g ◦ f and graph them.

3.8 Find a ∈ R such that f : Df = [a, ∞) → R with y = f(x) = x^2 + 2x − 3 is a bijective function. Find and graph the function f⁻¹.

3.9 Find the domain, the range and the inverse function for the function f : Df → R with y = f(x):
(a) y = √(x − 4)/√(x + 4); (b) y = (x − 2)^3.

3.10 Given are the polynomials P5 : R → R and P2 : R → R with
P5(x) = 2x^5 − 6x^4 − 6x^3 + 22x^2 − 12x and P2(x) = (x − 1)^2.
(a) Calculate the quotient P5/P2 by polynomial division.
(b) Find all the zeroes of polynomial P5 and factorize P5.
(c) Verify Vieta's formulae given in Theorem 3.6.
(d) Draw the graph of the function P5.

3.11 Check by means of Horner's scheme whether x1 = 1, x2 = −1, x3 = 2, x4 = −2 are zeroes of the polynomial P6 : R → R with
P6(x) = x^6 + 2x^5 − x^4 − x^3 + 2x^2 − x − 2.
Factorize polynomial P6.

3.12 Find the domain, the range and the inverse function for each of the following functions fi : Dfi → R with yi = fi(x):
(a) y1 = sin x, y2 = 2 sin x, y3 = sin 2x, y4 = sin x + 2 and y5 = sin(x + 2);
(b) y1 = e^x, y2 = 2e^x, y3 = e^(2x), y4 = e^x + 2 and y5 = e^(x+2).
Graph the functions given in (a) and (b) and check whether they are odd or even or whether they have none of these properties.

3.13 Given are the following functions f : Df → R with y = f(x):
(a) y = ln ⁴√x; (b) y = ln x^3; (c) y = 3x^2 + 5; (d) y = √(4 − x^2); (e) y = 1 + e^(−x); (f) y = |x| − x.
Find the domain and range for each of the above functions and graph these functions. Check where the functions are increasing and whether they are bounded. Which of the functions are odd or even?

4 Differentiation

In economics, there are many problems which require us to take into account how a function value changes with respect to small changes of the independent variable (e.g. input, time, etc.). For example, assume that the price of some product changes slightly. The question is: how does this affect the amount of the product that customers will buy?
A useful tool for such investigations is differential calculus, which we treat in this chapter. It is an important field of mathematics with many applications, e.g. graphing functions and determining extreme points of functions with or without additional constraints. Differential calculus allows us to investigate specific properties of functions such as monotonicity or convexity. For instance in economics, cost, revenue, profit, demand, production or utility functions have to be investigated with respect to their properties. In this chapter, we consider functions f : Df → R depending on one real variable, i.e. Df ⊆ R.

4.1 LIMIT AND CONTINUITY

4.1.1 Limit of a function

One of the basic concepts in mathematics is that of a limit (see Definition 2.6 for the limit of a sequence). In this section, the limit of a function is introduced. This notion deals with the following question: which value does the dependent variable y of a function f with y = f(x) approach as the independent variable x approaches some specific value x0?

Definition 4.1 The real number L is called the limit of function f : Df → R as x tends to x0 if, for any sequence {xn} with xn ≠ x0, xn ∈ Df, n = 1, 2, ..., which converges to x0, the sequence of the function values {f(xn)} converges to L.

Thus, we say that function f tends to the number L as x tends to (but is not equal to) x0. As an abbreviation we write

lim_{x→x0} f(x) = L.

Note that the limit L must be a (finite) number; otherwise we say that the limit of function f as x tends to x0 does not exist. If this limit does not exist, we distinguish two cases. If the function values tend to ∞ or −∞, we say that function f is definitely divergent as x tends to x0; otherwise function f is said to be indefinitely divergent as x tends to x0.

In the above definition, the elements of sequence {xn} can be both greater and smaller than x0. In certain situations, only the limit from one side has to be considered.
In the following, we consider such one-sided approaches, where the terms of the sequences are either all greater or all smaller than x0.

Definition 4.2 The real number Lr (real number Ll) is called the right-side (left-side) limit of function f : Df → R as x tends to x0 from the right side (left side) if, for any sequence {xn} with xn > x0 (xn < x0), xn ∈ Df, n = 1, 2, ..., which converges to x0, the sequence of the function values {f(xn)} converges to Lr (converges to Ll).

We also write

lim_{x→x0+0} f(x) = Lr and lim_{x→x0−0} f(x) = Ll

for the right-side and left-side limits, respectively. A relationship between one-sided limits and the limit as introduced in Definition 4.1 is given by the following theorem.

THEOREM 4.1 The limit of a function f : Df → R as x tends to x0 exists if and only if both the right-side and the left-side limits exist and coincide, i.e.

lim_{x→x0+0} f(x) = lim_{x→x0−0} f(x) = lim_{x→x0} f(x).

We note that it is not necessary for the existence of the limit of function f as x tends to x0 that the function value f(x0) at point x0 be defined.

Example 4.1 Let function f : R \ {0} → R with

f(x) = |x|/x

be given. We want to compute

L = lim_{x→0} f(x).

However, since

lim_{x→0+0} f(x) = lim_{x→0+0} x/x = 1 and lim_{x→0−0} f(x) = lim_{x→0−0} (−x)/x = −1,

the limit of function f as x tends to zero does not exist.

Example 4.2 Let function f : R \ {0} → R with

f(x) = sin(1/x)

be given. If we want to compute

L = lim_{x→0} f(x),

we find that even both one-sided limits

Lr = lim_{x→0+0} f(x) and Ll = lim_{x→0−0} f(x)

do not exist since, as x tends to zero either from the left or from the right, the function values of the sine function oscillate in the interval [−1, 1], i.e. the function values change very quickly between the numbers −1 and 1 (with increasing frequency as x tends to zero). Thus, the limit of function f as x tends to zero does not exist, and function f is indefinitely divergent as x tends to zero.
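The behaviour in Example 4.1 is easy to check numerically. The following sketch (my own illustration, not from the book) evaluates f(x) = |x|/x along sequences approaching zero from each side:

```python
# Numerical illustration (own sketch, not from the book): the one-sided
# limits of f(x) = |x|/x as x tends to 0, as discussed in Example 4.1.

def f(x):
    return abs(x) / x

# Approach 0 through positive and through negative sequences x_n -> 0.
right = [f(10 ** -n) for n in range(1, 6)]   # x_n > 0
left = [f(-(10 ** -n)) for n in range(1, 6)]  # x_n < 0

print(right)  # every value is 1.0  -> right-side limit L_r = 1
print(left)   # every value is -1.0 -> left-side limit L_l = -1
# Since L_r differs from L_l, the (two-sided) limit does not exist.
```

Since the two one-sided values disagree, Theorem 4.1 confirms that the two-sided limit cannot exist.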
Next, we give some useful properties of limits.

THEOREM 4.2 Assume that the limits

lim_{x→x0} f1(x) = L1 and lim_{x→x0} f2(x) = L2

exist. Then the following limits exist, and we obtain:

(1) lim_{x→x0} [f1(x) + f2(x)] = lim_{x→x0} f1(x) + lim_{x→x0} f2(x) = L1 + L2;
(2) lim_{x→x0} [f1(x) − f2(x)] = lim_{x→x0} f1(x) − lim_{x→x0} f2(x) = L1 − L2;
(3) lim_{x→x0} [f1(x) · f2(x)] = lim_{x→x0} f1(x) · lim_{x→x0} f2(x) = L1 · L2;
(4) lim_{x→x0} f1(x)/f2(x) = lim_{x→x0} f1(x) / lim_{x→x0} f2(x) = L1/L2, provided that L2 ≠ 0;
(5) lim_{x→x0} √(f1(x)) = √(lim_{x→x0} f1(x)) = √L1, provided that L1 ≥ 0;
(6) lim_{x→x0} [f1(x)]^n = [lim_{x→x0} f1(x)]^n = L1^n;
(7) lim_{x→x0} a^(f1(x)) = a^(lim_{x→x0} f1(x)) = a^L1.

Example 4.3 Given the function f : Df → R with

f(x) = √((x^2 + 3x − 1)/x),

we compute the limit

L = lim_{x→1} f(x).

Applying Theorem 4.2, we obtain

L = lim_{x→1} √((x^2 + 3x − 1)/x) = √((lim_{x→1} x^2 + 3 lim_{x→1} x − 1)/lim_{x→1} x) = √((1 + 3 − 1)/1) = √3.

Example 4.4 Given the function f : Df → R with

f(x) = (√x − 2)/(x − 4),

we compute the limit

L = lim_{x→4} f(x).

If we apply Theorem 4.2, part (4), and separately determine the limits of the numerator and the denominator, we find that both terms tend to zero, and we cannot find the limit in this way. Therefore, we rationalize the numerator by multiplying the numerator and the denominator by √x + 2 and obtain:

L = lim_{x→4} (√x − 2)(√x + 2)/((x − 4)(√x + 2)) = lim_{x→4} (x − 4)/((x − 4)(√x + 2)) = lim_{x→4} 1/(√x + 2) = 1/(√4 + 2) = 1/4.

4.1.2 Continuity of a function

Definition 4.3 A function f : Df → R is said to be continuous at x0 ∈ Df if the limit of function f as x tends to x0 exists and if this limit coincides with the function value f(x0), i.e.

lim_{x→x0} f(x) = f(x0).
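Definition 4.3 can be tried out numerically. As my own sketch (not an example from the book): extend the function of Example 4.4 at x = 4 by its limit 1/4; the extended function is then continuous there, since sampled values near 4 approach the function value.

```python
import math

# Own illustration (not from the book): Definition 4.3 in action. The function
# from Example 4.4, extended at x = 4 by its limit 1/4, is continuous there:
# the values f(x) for x near 4 approach the function value f(4).

def f(x):
    if x == 4:
        return 0.25          # the limit found in Example 4.4
    return (math.sqrt(x) - 2) / (x - 4)

samples = [f(4 + h) for h in (0.5, 0.1, 0.01, -0.01, 1e-6, -1e-6)]
print(samples)  # all values close to f(4) = 0.25
```

Without the extra value at x = 4 the function has a gap there, which is the removable-discontinuity situation classified below.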
Definition 4.3* A function f : Df → R is said to be continuous at x0 ∈ Df if, for any real number ε > 0, there exists a real number δ(ε) > 0 such that inequality |x − x0| < δ(ε) implies inequality |f(x) − f(x0)| < ε.

We illustrate the latter definition in Figure 4.1. Part (a) represents a function continuous at x0. This means that for any ε > 0 (in particular, for any arbitrarily small positive ε), we can state a number δ depending on ε such that for all x from the open interval (x0 − δ, x0 + δ), the function values f(x) are within the open interval (f(x0) − ε, f(x0) + ε) (i.e. in the dashed area). In other words, continuity of a function at some point x0 ∈ Df means that small changes in the independent variable x lead to small changes in the dependent variable y.

Figure 4.1 Continuous and discontinuous functions.

Figure 4.1(b) represents a function which is not continuous at x0. For some (small) ε > 0, we cannot give a value δ such that for all x ∈ (x0 − δ, x0 + δ) the function values are in the dashed area.

If the one-sided limits of a function f as x tends to x0 are different (or one or both limits do not exist), or if they are identical but the function value f(x0) is not defined, or value f(x0) is defined but not equal to both one-sided limits, then function f is discontinuous at x0. Next, we classify some types of discontinuities.

If the limit of function f as x tends to x0 exists but the function value f(x0) is different, or function f is not even defined at point x0, we have a removable discontinuity. In the case when function f is not defined at point x0, we also say that function f has a gap at x0. A function f has a finite jump at x0 if both one-sided limits of function f as x tends to x0 − 0 and as x tends to x0 + 0 exist and they are different. The following discontinuities characterize situations when at least one of the one-sided limits of function f as x tends to x0 ± 0 does not exist.
If one of the one-sided limits of function f as x tends to x0 ± 0 exists, but from the other side function f tends to ∞ or −∞, then we say that function f has an infinite jump at point x0. A rational function f = P/Q has a pole at point x0 if Q(x0) = 0 but P(x0) ≠ 0. (As a consequence, the function values of f as x tends to x0 − 0 or to x0 + 0 tend either to ∞ or to −∞, i.e. function f is definitely divergent as x tends to x0.) The multiplicity of the zero x0 of polynomial Q defines the order of the pole. In the case of a pole of even order, the sign of function f does not change 'at' point x0, while in the case of a pole of odd order the sign of function f changes 'at' point x0. Finally, function f has an oscillation point 'at' x0 if function f is indefinitely divergent as x tends to x0 (i.e. neither does the limit of function f as x tends to x0 exist, nor does function f tend to ±∞ as x tends to x0). In the above cases of a (finite or infinite) jump, a pole and an oscillation point, we also say that function f has an irremovable discontinuity at point x0. For illustration, we consider the following examples.

Example 4.5 Let function f : Df → R with

f(x) = (x^2 + x − 2)/(x − 1)

be given. In this case, we have x0 = 1 ∉ Df, but

lim_{x→x0} f(x) = 3

(see Figure 4.2). Thus, function f has a gap at point x0 = 1. This is a removable discontinuity, since we can define a function f* : R → R with

f*(x) = f(x) for x ≠ 1 and f*(x) = 3 for x = 1,

which is continuous at point x0 = 1.

Figure 4.2 Function f with f(x) = (x^2 + x − 2)/(x − 1).

Example 4.6 We consider function f : Df → R with

f(x) = sin(1/x).

For x0 = 0, we get x0 ∉ Df, and we have already shown in Example 4.2 that function f is indefinitely divergent as x tends to zero. Therefore, point x0 = 0 is an oscillation point.
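The removable discontinuity of Example 4.5 can also be checked in a few lines; the sketch below is my own illustration, not part of the book:

```python
# Own numeric sketch of Example 4.5: f(x) = (x^2 + x - 2)/(x - 1) is
# undefined at x = 1, but since x^2 + x - 2 = (x - 1)(x + 2), we have
# f(x) = x + 2 for x != 1, so the limit at x = 1 is 3.

def f(x):
    return (x**2 + x - 2) / (x - 1)

near_one = [f(1 + h) for h in (0.1, 0.01, -0.01, 1e-6, -1e-6)]
print(near_one)  # all values close to 3

def f_star(x):
    # the continuous extension f* from Example 4.5
    return f(x) if x != 1 else 3.0

print(f_star(1))  # 3.0, which removes the discontinuity
```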
If the limit of function f as x tends to x0 + 0 exists and is equal to f(x0), then function f is called right-continuous at point x0. In the same way, we can define the left-continuity of a function at a point x0. Function f is called continuous on the open interval (a, b) ⊆ Df if it is continuous at all points of (a, b). Analogously, function f is called continuous on the closed interval [a, b] ⊆ Df if it is continuous on the open interval (a, b), right-continuous at point a and left-continuous at point b. If function f is continuous at all points x of the domain Df, then we also say that f is continuous.

Properties of continuous functions

First, we give some properties of functions that are continuous at particular points.

THEOREM 4.3 Let functions f : Df → R and g : Dg → R be continuous at point x0 ∈ Df ∩ Dg. Then the functions f + g, f − g, f · g and, for g(x0) ≠ 0, also function f/g are continuous at x0.

THEOREM 4.4 Let function f : Df → R be continuous at x0 ∈ Df and function g : Dg → R be continuous at point x1 = f(x0) ∈ Dg. Then the composite function g ◦ f is continuous at point x0, and we get

lim_{x→x0} g(f(x)) = g(lim_{x→x0} f(x)) = g(f(x0)).

The latter theorem implies that we can 'interchange' the determination of the limit of function f as x tends to x0 and the calculation of the value of function g. We continue with three properties of functions that are continuous on the closed interval [a, b].

THEOREM 4.5 Let function f : Df → R be continuous on the closed interval [a, b] ⊆ Df. Then function f is bounded on [a, b] and takes its minimal value fmin and its maximal value fmax at points xmin and xmax, respectively, belonging to the interval [a, b], i.e.

fmin = f(xmin) ≤ f(x) ≤ f(xmax) = fmax for all x ∈ [a, b].

Theorem 4.5 does not necessarily hold for an open or half-open interval; e.g. function f with f(x) = 1/x is not bounded on the left-open interval (0, 1].
THEOREM 4.6 (Bolzano's theorem) Let function f : Df → R be continuous on the closed interval [a, b] ⊆ Df with f(a) · f(b) < 0. Then there exists an x* ∈ (a, b) such that f(x*) = 0.

Theorem 4.6 has some importance for finding zeroes of a function numerically (see Section 4.8). In order to apply a numerical procedure, one needs an interval [a, b] (preferably as small as possible) such that a zero of the function is certainly contained in this interval.

THEOREM 4.7 (intermediate-value theorem) Let function f : Df → R be continuous on the closed interval [a, b] ⊆ Df. Moreover, let fmin be the smallest and fmax the largest value of function f for x ∈ [a, b]. Then for each y* ∈ [fmin, fmax], there exists an x* ∈ [a, b] such that f(x*) = y*.

The geometrical meaning of Theorem 4.7 is illustrated in Figure 4.3. The graph of any line y = y* with y* ∈ [fmin, fmax] intersects the graph of function f at least once. For the particular value y* chosen in Figure 4.3, there are two such values x1* and x2* with f(x1*) = f(x2*) = y*.

Figure 4.3 Illustration of Theorem 4.7.

4.2 DIFFERENCE QUOTIENT AND THE DERIVATIVE

We now consider changes in the function value y = f(x) in relation to changes in the independent variable x. If we change the value x of the independent variable by some value Δx, the function value may also change by some difference Δy, i.e. we have

Δy = f(x + Δx) − f(x).

We now consider the ratio of the changes Δy and Δx and give the following definition.

Definition 4.4 Let f : Df → R and x0, x0 + Δx ∈ (a, b) ⊆ Df. The ratio

Δy/Δx = (f(x0 + Δx) − f(x0))/Δx

is called the difference quotient of function f with respect to the points x0 + Δx and x0.

The quotient Δy/Δx depends on the difference Δx and describes the average change of the value of function f between the points x0 and x0 + Δx. Let us now consider what happens when Δx → 0, i.e. when the difference in the x values of the two points x0 + Δx and x0 becomes arbitrarily small.
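To see this concretely, the following sketch (my own illustration with f(x) = x^2 and x0 = 3, not an example from the book) tabulates the difference quotient for a sequence of shrinking differences Δx:

```python
# Own numerical sketch (not from the book): the difference quotient of
# f(x) = x^2 at x0 = 3 for shrinking differences dx. The average change
# (f(x0 + dx) - f(x0))/dx settles towards a fixed value, namely 6.

def difference_quotient(f, x0, dx):
    return (f(x0 + dx) - f(x0)) / dx

f = lambda x: x * x
quotients = [difference_quotient(f, 3.0, 10 ** -n) for n in range(1, 7)]
print(quotients)  # 6.1, 6.01, 6.001, ... -> the quotients approach 6
```

This limiting value of the difference quotient is exactly the notion defined next.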
Definition 4.5 Let f : Df → R. Then function f with y = f(x) is said to be differentiable at point x0 ∈ (a, b) ⊆ Df if the limit

lim_{Δx→0} (f(x0 + Δx) − f(x0))/Δx

exists. The above limit is denoted as

df(x0)/dx = f'(x0)

and is called the differential quotient or derivative of function f at point x0.

We only mention that one can define one-sided derivatives in an analogous manner; e.g. the left derivative of function f at some point x0 ∈ Df can be defined by considering only the left-side limit in Definition 4.5.

If function f is differentiable at each point x of the open interval (a, b) ⊆ Df, function f is said to be differentiable on the interval (a, b). Analogously, if function f is differentiable at each point x ∈ Df, function f is said to be differentiable. If for any x ∈ Df the derivative f'(x) exists, we obtain a function f' with y' = f'(x) by assigning to each x ∈ Df the value f'(x). We also say that function f' is the first derivative of function f. If function f' is continuous, we say that the original function f is continuously differentiable.

In economics, function f' is also referred to as the marginal of f or the marginal function. This reflects the fact that the derivative characterizes the change in the function value provided that the change in the variable x is sufficiently small, i.e. it can be considered as 'marginal'. This means that the marginal function can be interpreted as the approximate change in function f when variable x increases by one unit from x0 to x0 + 1, i.e.

f'(x0) ≈ f(x0 + 1) − f(x0).

A geometric interpretation of the first derivative is as follows. The value f'(x0) is the slope of the tangent to the curve y = f(x) at the point (x0, f(x0)) (see Figure 4.4). Consider the line through the points (x0, f(x0)) and (x0 + Δx, f(x0 + Δx)). For the slope of this line we have

Δy/Δx = tan β.
If Δx becomes smaller and tends to zero, the angle β between the corresponding line and the x axis tends to α, and we finally obtain

f'(x0) = lim_{Δx→0} Δy/Δx = tan α.

Figure 4.4 Geometrical interpretation of the derivative f'(x0).

Example 4.7 Let function f : R → R with

y = f(x) = 2x for x < 1 and y = f(x) = 3 − x for 1 ≤ x ≤ 2

be given, and let x0 = 1. We obtain

lim_{Δx→0−0} (f(x0 + Δx) − f(x0))/Δx = lim_{Δx→0−0} (2(x0 + Δx) − 2x0)/Δx = lim_{Δx→0−0} 2Δx/Δx = 2

and

lim_{Δx→0+0} (f(x0 + Δx) − f(x0))/Δx = lim_{Δx→0+0} (3 − (x0 + Δx) − (3 − x0))/Δx = lim_{Δx→0+0} (−Δx)/Δx = −1.

Consequently, the differential quotient of function f at point x0 = 1 does not exist, and function f is not differentiable at point x0 = 1. However, since both one-sided limits of function f as x tends to 1 ± 0 exist and are equal to f(1) = 2, function f is continuous at point x0 = 1.

The latter example shows that a function continuous at point x0 is not necessarily differentiable at this point. However, the converse is true.

THEOREM 4.8 Let function f : Df → R be differentiable at point x0 ∈ Df. Then function f is continuous at x0.

4.3 DERIVATIVES OF ELEMENTARY FUNCTIONS; DIFFERENTIATION RULES

Before giving derivatives of elementary functions, we consider an example of applying Definition 4.5.

Example 4.8 We consider function f : R → R with f(x) = x^3 and determine the derivative f' at point x:

f'(x) = lim_{Δx→0} (f(x + Δx) − f(x))/Δx = lim_{Δx→0} ((x + Δx)^3 − x^3)/Δx
      = lim_{Δx→0} (x^3 + 3x^2 Δx + 3x(Δx)^2 + (Δx)^3 − x^3)/Δx
      = lim_{Δx→0} Δx · [3x^2 + 3xΔx + (Δx)^2]/Δx = 3x^2.

The above formula for the derivative of function f with f(x) = x^3 can be generalized to the case of a power function f with f(x) = x^n, for which we obtain f'(x) = n x^(n−1). The determination of the derivative according to Definition 4.5 is impractical for frequent use in the case of more complicated functions.
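The definition-based computation of Example 4.8 can be cross-checked numerically; the sketch below (my own illustration, not from the book) compares the difference quotient of f(x) = x^3 with the result f'(x) = 3x^2:

```python
# Quick numeric cross-check (own sketch) of Example 4.8: for f(x) = x^3
# the difference quotient approximates f'(x) = 3x^2 once dx is small.

def derivative_approx(f, x, dx=1e-7):
    return (f(x + dx) - f(x)) / dx

for x in (0.5, 1.0, 2.0):
    approx = derivative_approx(lambda t: t ** 3, x)
    exact = 3 * x ** 2
    print(x, approx, exact)  # the two values agree to several decimals
```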
Therefore, we are interested in an overview of the derivatives of elementary functions which we can use when investigating more complicated functions later. Table 4.1 contains the derivatives of some elementary functions of one variable.

Table 4.1 Derivatives of elementary functions

y = f(x)     y' = f'(x)       Df
C            0                −∞ < x < ∞, C is constant
x^n          n x^(n−1)        −∞ < x < ∞, n ∈ N
x^α          α x^(α−1)        0 < x < ∞, α ∈ R
e^x          e^x              −∞ < x < ∞
a^x          a^x ln a         −∞ < x < ∞, a > 0
ln x         1/x              0 < x < ∞
log_a x      1/(x ln a)       0 < x < ∞, a > 0
sin x        cos x            −∞ < x < ∞
cos x        −sin x           −∞ < x < ∞
tan x        1/cos^2 x        x ≠ π/2 + kπ, k ∈ Z
cot x        −1/sin^2 x       x ≠ kπ, k ∈ Z

Sum, product and quotient rules

Next, we present formulas for the derivative of the sum, difference, product and quotient of two functions.

THEOREM 4.9 Let functions f : Df → R and g : Dg → R be differentiable at x ∈ Df ∩ Dg. Then the functions f + g, f − g, f · g and, for g(x) ≠ 0, also function f/g are differentiable at point x, and we have:

(1) (f + g)'(x) = f'(x) + g'(x);
(2) (f − g)'(x) = f'(x) − g'(x);
(3) (f · g)'(x) = f'(x) · g(x) + f(x) · g'(x);
(4) (f/g)'(x) = (f'(x) · g(x) − f(x) · g'(x)) / [g(x)]^2.

As a special case of part (3) we obtain: if f(x) = C, then (C · g)'(x) = C · g'(x), where C is a constant.

Example 4.9 Let function f : Df → R with

f(x) = 4x^4 − 2x + ln x + √x

be given. Using √x = x^(1/2), we obtain

f'(x) = 16x^3 − 2 + 1/x + 1/(2√x).

Example 4.10 In macroeconomics, it is assumed that for a closed economy

Y = C + I,

where Y is the national income, C is the consumption and I is the investment. Assume that the consumption depends linearly on the national income, i.e. the equation C = a + bY holds, where a and b are parameters. Here C'(Y) = b is called the marginal propensity to consume, a parameter which is typically assumed to be between zero and one.
If we want to give the national income as a function depending on the investment, we obtain from Y = a + bY + I the function Y as

Y = Y(I) = (a + I)/(1 − b).

For the derivative Y'(I) we obtain

Y'(I) = dY/dI = 1/(1 − b).

The latter result can be interpreted as follows: an increase in I by one unit leads to an increase of Y by 1/(1 − b) > 0 units.

Example 4.11 Let functions f : R → R and g : R → R with f(x) = x^2 and g(x) = e^x be given. Then

(f · g)'(x) = 2xe^x + x^2 e^x = xe^x (x + 2);
(f/g)'(x) = (2xe^x − x^2 e^x)/(e^x)^2 = xe^x (2 − x)/(e^x)^2 = x(2 − x)/e^x.

Derivative of composite and inverse functions

Next, we consider composite and inverse functions and give a rule to determine their derivatives.

THEOREM 4.10 Let functions f : Df → R and g : Dg → R be continuous. Then:

(1) If function f is differentiable at point x ∈ Df and function g is differentiable at point y = f(x) ∈ Dg, then function g ◦ f is also differentiable at point x and

(g ◦ f)'(x) = (g(f))'(x) = g'(f(x)) · f'(x)   (chain rule).

(2) Let function f be strictly monotone on Df and differentiable at point x ∈ Df with f'(x) ≠ 0. Then the inverse function f⁻¹ with x = f⁻¹(y) is differentiable at point y = f(x) ∈ Df⁻¹ and

(f⁻¹)'(y) = 1/f'(x) = 1/f'(f⁻¹(y)).

An alternative formulation of the chain rule with h = g(y) and y = f(x) is given by

dh/dx = (dh/dy) · (dy/dx).

The rule given in part (2) of Theorem 4.10 for the derivative of the inverse function f⁻¹ can also be written as

(f⁻¹)' = 1/(f' ◦ f⁻¹).

Example 4.12 Suppose that a firm produces only one product. The production cost, denoted by C, depends on the quantity x ≥ 0 of this product. Let

C = f(x) = 4 + ln(x + 1) + √(3x + 1).

Then we obtain

C' = f'(x) = df(x)/dx = 1/(x + 1) + 3/(2√(3x + 1)).

Assume that the firm produces x0 = 133 units of this product. We get

C'(133) = 1/134 + 3/(2√400) = 1/134 + 3/40 ≈ 0.08246.
So, if the current production is increased by one unit, the production cost increases by approximately 0.08246 units.

Example 4.13 Let function g : Dg → R with

g(x) = cos([ln(3x + 1)]^2)

be given, which is defined for x > −1/3, i.e. Dg = (−1/3, ∞). Setting g = cos v, v = h^2, h = ln y and y = 3x + 1, application of the chain rule yields

g'(x) = (dg/dv) · (dv/dh) · (dh/dy) · (dy/dx),

and thus we get

g'(x) = −sin([ln(3x + 1)]^2) · 2 ln(3x + 1) · (1/(3x + 1)) · 3
      = −(6/(3x + 1)) · sin([ln(3x + 1)]^2) · ln(3x + 1).

By means of Theorem 4.10, we can also determine the derivatives of the inverse functions of the trigonometric functions. Consider the function f with

y = f(x) = sin x, x ∈ [−π/2, π/2],

and the inverse function x = f⁻¹(y) = arcsin y. Since f'(x) = cos x, we obtain from Theorem 4.10, part (2):

(f⁻¹)'(y) = 1/f'(x) = 1/cos x.

Because cos x = +√(cos^2 x) = √(1 − sin^2 x) = √(1 − y^2) for x ∈ (−π/2, π/2), we get

(f⁻¹)'(y) = 1/√(1 − y^2),

and after interchanging variables x and y,

(f⁻¹)'(x) = 1/√(1 − x^2).

Similarly, we can determine the derivatives of the inverse functions of the other trigonometric functions. We summarize the derivatives of the inverse functions of the trigonometric functions in the following overview:

f(x) = arcsin x,  f'(x) = 1/√(1 − x^2),   Df = {x ∈ R | −1 < x < 1};
f(x) = arccos x,  f'(x) = −1/√(1 − x^2),  Df = {x ∈ R | −1 < x < 1};
f(x) = arctan x,  f'(x) = 1/(1 + x^2),    Df = {x ∈ R | −∞ < x < ∞};
f(x) = arccot x,  f'(x) = −1/(1 + x^2),   Df = {x ∈ R | −∞ < x < ∞}.

Logarithmic differentiation

As an application of the chain rule, we consider so-called logarithmic differentiation. Let g(x) = ln f(x) with f(x) > 0. We obtain

g'(x) = [ln f(x)]' = f'(x)/f(x)

and thus

f'(x) = f(x) · g'(x) = f(x) · [ln f(x)]'.

Logarithmic differentiation is particularly useful when considering functions f of the type f(x) = u(x)^(v(x)), i.e.
both the base and the exponent are functions depending on the variable x.

Example 4.14 Let function f : Df → R with

f(x) = x^(sin x)

be given. We set

g(x) = ln f(x) = ln(x^(sin x)) = sin x · ln x.

Applying the formula of logarithmic differentiation, we get

g'(x) = cos x · ln x + sin x · (1/x)

and consequently

f'(x) = f(x) · g'(x) = x^(sin x) · (cos x · ln x + sin x · (1/x)).

Higher-order derivatives

In the following, we deal with higher-order derivatives. If function f' with y' = f'(x) is again differentiable (see Definition 4.5), the function

y'' = f''(x) = df'(x)/dx = d^2 f(x)/(dx)^2

is called the second derivative of function f at point x. We can continue with this procedure and obtain in general:

y^(n) = f^(n)(x) = df^(n−1)(x)/dx = d^n f(x)/(dx)^n,  n ≥ 2,

which denotes the nth derivative of function f at point x ∈ Df. Notice that we use the notation f'(x), f''(x), f'''(x), and for n ≥ 4 we use f^(n)(x). Higher-order derivatives are used, for instance, in the next section when investigating specific properties of functions.

Example 4.15 Let function f : Df → R with

f(x) = 3x^2 + 1/x + e^(2x)

be given. We determine all derivatives up to the fourth order and obtain

f'(x) = 6x − 1/x^2 + 2e^(2x)
f''(x) = 6 + 2/x^3 + 4e^(2x)
f'''(x) = −6/x^4 + 8e^(2x)
f^(4)(x) = 24/x^5 + 16e^(2x).

If the derivatives f'(x0), f''(x0), ..., f^(n)(x0) all exist, we say that function f is n times differentiable at point x0 ∈ Df. If f^(n) is continuous at point x0, then function f is said to be n times continuously differentiable at x0. Similarly, if function f^(n) is continuous, then function f is said to be n times continuously differentiable.
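The pattern of Example 4.14 is easy to validate numerically. The sketch below (my own illustration, not from the book; the evaluation point x0 = 2 is an arbitrary choice) compares the logarithmic-differentiation formula for f(x) = x^(sin x) with a difference quotient:

```python
import math

# Own cross-check of Example 4.14: the derivative of f(x) = x^sin(x)
# obtained by logarithmic differentiation, compared with a difference
# quotient at the (arbitrarily chosen) point x0 = 2.

def f(x):
    return x ** math.sin(x)

def f_prime(x):
    # f'(x) = x^sin(x) * (cos(x)*ln(x) + sin(x)/x), from f' = f * (ln f)'
    return f(x) * (math.cos(x) * math.log(x) + math.sin(x) / x)

x0 = 2.0
dx = 1e-7
numeric = (f(x0 + dx) - f(x0)) / dx
print(f_prime(x0), numeric)  # the two values agree closely
```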
4.4 DIFFERENTIAL; RATE OF CHANGE AND ELASTICITY

In this section, we discuss several possibilities for characterizing the resulting change in the dependent variable y of a function when considering small changes in the independent variable x.

Definition 4.6 Let function f : Df → R be differentiable at point x0 ∈ (a, b) ⊆ Df. The differential of function f at point x0 is defined as

dy = f'(x0) · dx.

The differential is also denoted as df.

Note that dy (or df) is proportional to dx, with f'(x0) as the factor of proportionality. The differential gives the approximate change in the function value at x0 when changing the argument by a (small) value dx (i.e. from x0 to x0 + dx), i.e.

Δy ≈ dy = f'(x0) · dx.

The differential is illustrated in Figure 4.5.

Figure 4.5 The differential dy.

The differential can be used for estimating the maximal absolute error in the function value when the independent variable is known only up to some error. Let |Δx| = |dx| ≤ ε; then

|Δy| ≈ |dy| = |f'(x0)| · |dx| ≤ |f'(x0)| · ε.

Example 4.16 Let the length of an edge of a cube be determined as x = 7.5 ± 0.01 cm, i.e. the value x0 = 7.5 cm has been measured, and the absolute error is no greater than 0.01, i.e. |dx| = |Δx| ≤ ε = 0.01. For the surface of the cube we obtain

S0 = S(x0) = 6x0^2 = 337.5 cm^2.

We use the differential to estimate the maximal absolute error of the surface and obtain

|ΔS| ≈ |dS| ≤ |S'(x0)| · ε = |12x0| · 0.01 = 0.9 cm^2,

i.e. the maximal absolute error in the surface is estimated as 0.9 cm^2, and we get S0 ≈ 337.5 ± 0.9 cm^2. Moreover, for the estimation of the maximal relative error we obtain

|ΔS/S0| ≈ |dS/S0| ≤ 0.9/337.5 ≈ 0.00267,

i.e. the maximal relative error is estimated as approximately 0.267 per cent.
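The error estimate of Example 4.16 can be reproduced in a few lines; this is my own sketch, with S, x0 and the tolerance taken from the example:

```python
# Recomputing Example 4.16 (own sketch): using the differential of
# S(x) = 6x^2 to bound the error in the cube's surface area.

def S(x):
    return 6 * x ** 2

def dS(x0, dx):
    # differential dS = S'(x0) * dx with S'(x) = 12x
    return 12 * x0 * dx

x0, eps = 7.5, 0.01
S0 = S(x0)
max_abs_error = abs(dS(x0, eps))
print(S0)                  # 337.5
print(max_abs_error)       # 0.9
print(max_abs_error / S0)  # ~0.00267, i.e. about 0.267 per cent
```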
We have already discussed that the derivative of a function characterizes the change in the function value for a 'very small' change in the variable x. In economics, however, a modified measure is often used to describe the change of the value of a function f. The reason is that a change in the price of, say, bread by 1 EUR would be very big in comparison with the current price, whereas a change in the price of a car by 1 EUR would be very small. Therefore, economists prefer measures that characterize the change of a function value in relation to this value itself. This leads to the proportional rate of change of a function, given in the following definition.

Definition 4.7 Let function $f : D_f \to \mathbb{R}$ be differentiable at the point $x_0 \in (a, b) \subseteq D_f$ with $f(x_0) \ne 0$. The term
$$\rho_f(x_0) = \frac{f'(x_0)}{f(x_0)}$$
is called the proportional rate of change of function $f : D_f \to \mathbb{R}$ at the point $x_0$.

Proportional rates of change are often quoted in percentages or, when time is the independent variable, as percentages per year. This percentage rate of change is obtained by multiplying the value $\rho_f(x_0)$ by 100 per cent.

Example 4.17 Let the function $f : D_f \to \mathbb{R}$ with
$$f(x) = x^2 e^{0.1x}$$
be given. The first derivative of function f is
$$f'(x) = 2x e^{0.1x} + 0.1 x^2 e^{0.1x} = x e^{0.1x} (2 + 0.1x),$$
and thus the proportional rate of change at a point $x \in D_f$ is calculated as follows:
$$\rho_f(x) = \frac{f'(x)}{f(x)} = \frac{x e^{0.1x} (2 + 0.1x)}{x^2 e^{0.1x}} = \frac{2}{x} + 0.1.$$
We compare the proportional rates of change at the points $x_0 = 20$ and $x_1 = 2{,}000$. For $x_0 = 20$, we get
$$\rho_f(20) = \frac{2}{20} + 0.1 = 0.2,$$
which means that the percentage rate of change is 20 per cent. For $x_1 = 2{,}000$, we get
$$\rho_f(2{,}000) = \frac{2}{2{,}000} + 0.1 = 0.101,$$
i.e. the percentage rate of change is 10.1 per cent. Thus, the second percentage rate of change is much smaller than the first one.

Definition 4.8 Let function $f : D_f \to \mathbb{R}$ be differentiable at the point $x_0 \in (a, b) \subseteq D_f$ with $f(x_0) \ne 0$.
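The simplification $\rho_f(x) = 2/x + 0.1$ in Example 4.17 can be verified numerically against the defining quotient $f'(x)/f(x)$; the following sketch (sample point and step size are my own choices) does both:

```python
import math

# Example 4.17: proportional rate of change of f(x) = x^2 * e^(0.1x)
# simplifies to rho_f(x) = 2/x + 0.1.
def rho(x):
    return 2 / x + 0.1

f = lambda x: x ** 2 * math.exp(0.1 * x)

# numerical cross-check via a central-difference derivative at x = 20
h = 1e-6
numeric = (f(20 + h) - f(20 - h)) / (2 * h) / f(20)
print(round(rho(20), 3), round(rho(2000), 3))  # 0.2 0.101
```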
The term
$$\varepsilon_f(x_0) = x_0 \cdot \frac{f'(x_0)}{f(x_0)} = x_0 \cdot \rho_f(x_0)$$
is called the (point) elasticity of function $f : D_f \to \mathbb{R}$ at $x_0$.

In many economic applications, function f represents a demand function and variable x represents price or income. The (point) elasticity of function f corresponds to the ratio of the relative (i.e. percentage) changes of the function value of f and of the variable x:
$$\varepsilon_f(x_0) = \lim_{\Delta x \to 0} \left( \frac{\Delta f(x)}{f(x_0)} : \frac{\Delta x}{x_0} \right) = \lim_{\Delta x \to 0} \left( \frac{x_0}{\Delta x} \cdot \frac{\Delta f(x)}{f(x_0)} \right).$$
An economic function f is called elastic at the point $x_0 \in D_f$ if $|\varepsilon_f(x_0)| > 1$. On the other hand, it is called inelastic at the point $x_0$ if $|\varepsilon_f(x_0)| < 1$. In the case $|\varepsilon_f(x_0)| = 1$, function f is of unit elasticity at the point $x_0$.

In economics, revenue R is considered as a function of the selling price p via $R(p) = p \cdot D(p)$, where D is the demand function, which is usually decreasing, i.e. if the price p rises, the quantity D sold falls. However, the revenue (as the product of price p and quantity D) may rise or fall. For the marginal revenue function, we obtain
$$R'(p) = p \cdot D'(p) + D(p).$$
Assume now that $R'(p) > 0$. Then from $p \cdot D'(p) + D(p) > 0$ we get, for $D(p) > 0$,
$$\varepsilon_D(p) = \frac{p \cdot D'(p)}{D(p)} > -1,$$
where, due to $D'(p) \le 0$, we have $|\varepsilon_D(p)| < 1$. Thus, if revenue increases when the price increases, this must happen on an inelastic interval of the demand function $D = D(p)$, and in the inelastic case, a small increase in the price will always lead to an increase in revenue. Accordingly, in the elastic case, i.e. if $|\varepsilon_D(p)| > 1$, a small increase in price leads to a decrease in revenue. So an elastic demand function is one for which the quantity demanded is very responsive to price, which can be interpreted as follows: if the price increases by one per cent, the quantity demanded decreases by more than one per cent.

Example 4.18 We consider the function $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = x^{x+1}$ and determine the elasticity of function f at a point $x \in D_f$.
First, we calculate the first derivative $f'$ by applying logarithmic differentiation. We set
$$g(x) = \ln f(x) = (x + 1) \ln x.$$
Therefore,
$$g'(x) = [\ln f(x)]' = \frac{f'(x)}{f(x)} = \ln x + \frac{x + 1}{x}.$$
Thus, we obtain for the first derivative
$$f'(x) = f(x) \cdot g'(x) = x^{x+1} \cdot \left( \ln x + \frac{x + 1}{x} \right).$$
For the elasticity at a point $x \in D_f$, we obtain
$$\varepsilon_f(x) = \frac{x \cdot f'(x)}{f(x)} = \frac{x \cdot x^{x+1} \left( \ln x + \frac{x + 1}{x} \right)}{x^{x+1}} = x \ln x + x + 1.$$

4.5 GRAPHING FUNCTIONS

To get an overview of a function $f : D_f \to \mathbb{R}$ and of its graph, we determine and investigate:

(1) the domain $D_f$ (if not given);
(2) zeroes and discontinuities;
(3) monotonicity of the function;
(4) extreme points and values;
(5) convexity and concavity of the function;
(6) inflection points;
(7) limits, i.e. how the function behaves as x tends to $\pm\infty$.

Having the detailed information listed above, we can draw the graph of function f. In the following, we discuss the above subproblems in detail. In connection with functions of one variable, we have already discussed how to determine the domain $D_f$, and we have classified the different types of discontinuities. As far as the determination of zeroes is concerned, we have already considered special cases such as zeroes of a quadratic function. For more complicated functions, where finding the zeroes is difficult or analytically impossible, we give numerical procedures for the approximate determination of zeroes later in this chapter. We start with investigating the monotonicity of a function.

4.5.1 Monotonicity

By means of the first derivative $f'$, we can determine intervals on which a function f is (strictly) increasing or decreasing. In particular, the following theorem can be formulated.

THEOREM 4.11 Let function $f : D_f \to \mathbb{R}$ be differentiable on the open interval (a, b) and let $I = [a, b] \subseteq D_f$. Then:

(1) Function f is increasing on I if and only if $f'(x) \ge 0$ for all $x \in (a, b)$.
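The closed-form elasticity $\varepsilon_f(x) = x \ln x + x + 1$ derived above can be cross-checked against the definition $\varepsilon_f(x) = x f'(x)/f(x)$ with a numerical derivative; the sample point x = 2 below is an arbitrary choice of mine:

```python
import math

# Example 4.18 check: elasticity of f(x) = x^(x+1) is x*ln(x) + x + 1.
def elasticity(x):
    return x * math.log(x) + x + 1

f = lambda x: x ** (x + 1)

x, h = 2.0, 1e-7
numeric = x * (f(x + h) - f(x - h)) / (2 * h) / f(x)
print(abs(numeric - elasticity(x)) < 1e-5)  # True
```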
(2) Function f is decreasing on I if and only if $f'(x) \le 0$ for all $x \in (a, b)$.
(3) Function f is constant on I if and only if $f'(x) = 0$ for all $x \in (a, b)$.
(4) If $f'(x) > 0$ for all $x \in (a, b)$, then function f is strictly increasing on I.
(5) If $f'(x) < 0$ for all $x \in (a, b)$, then function f is strictly decreasing on I.

We recall from Chapter 3 that, if a function is (strictly) increasing or (strictly) decreasing on an interval $I \subseteq D_f$, we say that function f is (strictly) monotone on the interval I. Checking a function f for monotonicity requires us to determine the intervals on which function f is monotone and strictly monotone, respectively.

Example 4.19 We investigate the function $f : \mathbb{R} \to \mathbb{R}$ with
$$f(x) = e^x (x^2 + 2x + 1)$$
for monotonicity. Differentiating function f, we obtain
$$f'(x) = e^x (x^2 + 2x + 1) + e^x (2x + 2) = e^x (x^2 + 4x + 3).$$
To find the zeroes of function $f'$, we have to solve the quadratic equation $x^2 + 4x + 3 = 0$ (notice that $e^x$ is positive for all $x \in \mathbb{R}$) and obtain
$$x_1 = -2 + \sqrt{4 - 3} = -1 \qquad \text{and} \qquad x_2 = -2 - \sqrt{4 - 3} = -3.$$
Since we have distinct zeroes (i.e. the sign of the first derivative changes 'at' each zero) and e.g. $f'(0) = 3 > 0$, we get that $f'(x) > 0$ for $x \in (-\infty, -3) \cup (-1, \infty)$ and $f'(x) < 0$ for $x \in (-3, -1)$. By Theorem 4.11 we get that function f is strictly increasing on the intervals $(-\infty, -3]$ and $[-1, \infty)$, while function f is strictly decreasing on the interval $[-3, -1]$.

4.5.2 Extreme points

First we give the definition of a local and of a global extreme point, which can be either a minimum or a maximum.

Definition 4.9 A function $f : D_f \to \mathbb{R}$ has a local maximum (minimum) at the point $x_0 \in D_f$ if there is an interval $(a, b) \subseteq D_f$ containing $x_0$ such that
$$f(x) \le f(x_0) \qquad (f(x) \ge f(x_0), \text{ respectively}) \tag{4.1}$$
for all points $x \in (a, b)$. The point $x_0$ is called a local maximum (minimum) point.
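The sign pattern of $f'$ in Example 4.19 can be confirmed by sampling one point in each of the three intervals separated by the zeroes $-3$ and $-1$ (the sample points are my own choices):

```python
import math

# Example 4.19: f'(x) = e^x (x^2 + 4x + 3) with zeroes at -3 and -1.
f_prime = lambda x: math.exp(x) * (x ** 2 + 4 * x + 3)

print(f_prime(-4) > 0)  # True: f strictly increasing on (-inf, -3]
print(f_prime(-2) < 0)  # True: f strictly decreasing on [-3, -1]
print(f_prime(0) > 0)   # True: f strictly increasing on [-1, inf)
```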
If inequality (4.1) holds for all points $x \in D_f$, function f has a global maximum (minimum) at the point $x_0$, and $x_0$ is called a global maximum (minimum) point.

These notions are illustrated in Figure 4.6. Let $D_f = [a, b]$. In the domain $D_f$, there are two local minimum points $x_2$ and $x_4$ as well as two local maximum points $x_1$ and $x_3$. The global maximum point is $x_3$, and the global minimum point is the left boundary point a. We now look for necessary and sufficient conditions for the existence of a local extreme point in the case of differentiable functions.

THEOREM 4.12 (necessary condition for local optimality) Let function $f : D_f \to \mathbb{R}$ be differentiable on the open interval $(a, b) \subseteq D_f$. If function f has a local maximum or local minimum at the point $x_0 \in (a, b)$, then $f'(x_0) = 0$.

A point $x_0 \in (a, b)$ with $f'(x_0) = 0$ is called a stationary point (or critical point). In searching for global maximum and minimum points of a function f on a closed interval $I = [a, b] \subseteq D_f$, we have to search among the following types of points:

(1) points in the open interval (a, b) where $f'(x) = 0$ (stationary points);
(2) the end points a and b of I;
(3) points in (a, b) where $f'(x)$ does not exist.

Points according to (1) can be found by means of differential calculus. Points according to (2) and (3) have to be checked separately. Returning to Figure 4.6, there are three stationary points $x_2$, $x_3$ and $x_4$. The local maximum point $x_1$ cannot be found by differential calculus since the function drawn in Figure 4.6 is not differentiable at the point $x_1$.

Figure 4.6 Local and global optima of function f on $D_f = [a, b]$.

The following two theorems present sufficient conditions for so-called isolated local extreme points, for which the strict inequality holds in inequality (4.1) of Definition 4.9 for all $x \in (a, b)$ different from $x_0$.
First, we give a criterion for deciding whether a stationary point of a differentiable function is a local extreme point which uses only the first derivative of function f.

THEOREM 4.13 (first-derivative test for local extrema) Let function $f : D_f \to \mathbb{R}$ be differentiable on the open interval $(a, b) \subseteq D_f$ and let $x_0 \in (a, b)$ be a stationary point of function f. Then:

(1) If $f'(x) > 0$ for all $x \in (a^*, x_0) \subseteq (a, b)$ and $f'(x) < 0$ for all $x \in (x_0, b^*) \subseteq (a, b)$, then $x_0$ is a local maximum point of function f.
(2) If $f'(x) < 0$ for all $x \in (a^*, x_0) \subseteq (a, b)$ and $f'(x) > 0$ for all $x \in (x_0, b^*) \subseteq (a, b)$, then $x_0$ is a local minimum point of function f.
(3) If $f'(x) > 0$ both for all $x \in (a^*, x_0) \subseteq (a, b)$ and for all $x \in (x_0, b^*) \subseteq (a, b)$, then $x_0$ is not a local extreme point of function f. The same conclusion holds if $f'(x) < 0$ on both sides of $x_0$.

For instance, part (1) of Theorem 4.13 means that, if there exists an interval $(a^*, b^*)$ around $x_0$ such that function f is strictly increasing to the left of $x_0$ and strictly decreasing to the right of $x_0$ in this interval, then $x_0$ is a local maximum point of function f. The above criterion requires the investigation of monotonicity properties of the marginal function of f. The following theorem presents an alternative which uses higher-order derivatives to decide whether a stationary point is a local extreme point or not, provided that the required higher-order derivatives exist.

THEOREM 4.14 (higher-order derivative test for local extrema) Let $f : D_f \to \mathbb{R}$ be n times continuously differentiable on the open interval $(a, b) \subseteq D_f$ and let $x_0 \in (a, b)$ be a stationary point. If
$$f'(x_0) = f''(x_0) = f'''(x_0) = \cdots = f^{(n-1)}(x_0) = 0 \qquad \text{and} \qquad f^{(n)}(x_0) \ne 0,$$
where the number n is even, then the point $x_0$ is a local extreme point of function f. In particular:

(1) If $f^{(n)}(x_0) < 0$, then function f has a local maximum at the point $x_0$.
(2) If $f^{(n)}(x_0) > 0$, then function f has a local minimum at the point $x_0$.

Example 4.20 We determine all local extreme points of the function $f : (0, \infty) \to \mathbb{R}$ with
$$f(x) = \frac{1}{x} \cdot \ln^2 3x.$$
To check the necessary condition of Theorem 4.12, we determine
$$f'(x) = -\frac{1}{x^2} \cdot \ln^2 3x + \frac{1}{x} \cdot 2 \ln 3x \cdot \frac{1}{3x} \cdot 3 = \frac{1}{x^2} \cdot \ln 3x \, (2 - \ln 3x).$$
From the equality $f'(x) = 0$, we obtain the following two cases which we need to consider.

Case a: $\ln 3x = 0$. Then we obtain $3x = e^0 = 1$, which yields $x_1 = 1/3$.

Case b: $2 - \ln 3x = 0$. Then we obtain $\ln 3x = 2$, which yields $x_2 = e^2/3$.

To check the sufficient condition of Theorem 4.14 for $x_1$ and $x_2$ to be local extreme points, we determine $f''$. Writing $f'(x) = (2 \ln 3x - \ln^2 3x) \cdot x^{-2}$ and differentiating, we obtain
$$f''(x) = \frac{2}{x^3} (1 - \ln 3x) - \frac{2}{x^3} \left( 2 \ln 3x - \ln^2 3x \right) = \frac{2}{x^3} \cdot \left( 1 - 3 \ln 3x + \ln^2 3x \right).$$
In particular, we obtain
$$f''\!\left( \frac{1}{3} \right) = 54 > 0 \qquad \text{and} \qquad f''\!\left( \frac{e^2}{3} \right) = \frac{54}{e^6} \cdot \left( 1 - 3 \cdot 2 + 2^2 \right) = -\frac{54}{e^6} < 0.$$
Hence, $x_1 = 1/3$ is a local minimum point with $f(x_1) = 0$, and $x_2 = e^2/3$ is a local maximum point with $f(x_2) = 12/e^2$.

Example 4.21 A monopolist (i.e. an industry with a single firm) producing a certain product has the demand–price function (also denoted as the inverse demand function)
$$p(x) = -0.04x + 200,$$
which describes the relationship between the (produced and sold) quantity x of the product and the price p at which the product sells. This strictly decreasing function is defined for $0 \le x \le 5{,}000$ and can be interpreted as follows: if the price tends to 200 units, the number of customers willing to buy the product tends to zero, but if the price tends to zero, the sold quantity of the product tends to 5,000 units. The revenue R of the firm as a function of the output x is given by
$$R(x) = p(x) \cdot x = (-0.04x + 200) \cdot x = -0.04x^2 + 200x.$$
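The classification in Example 4.20 can be checked numerically with difference quotients: the slope should vanish at both stationary points, while the second-order difference should be positive at $x_1 = 1/3$ (minimum) and negative at $x_2 = e^2/3$ (maximum). The step size and tolerances below are my own choices:

```python
import math

# Example 4.20: f(x) = (1/x) * ln^2(3x) on (0, infinity).
f = lambda x: math.log(3 * x) ** 2 / x
x1, x2 = 1 / 3, math.e ** 2 / 3
h = 1e-5

slope1 = (f(x1 + h) - f(x1 - h)) / (2 * h)              # ~ 0
curv1 = (f(x1 + h) - 2 * f(x1) + f(x1 - h)) / h ** 2    # ~ 54 > 0: minimum
slope2 = (f(x2 + h) - f(x2 - h)) / (2 * h)              # ~ 0
curv2 = (f(x2 + h) - 2 * f(x2) + f(x2 - h)) / h ** 2    # ~ -54/e^6 < 0: maximum

print(abs(f(x2) - 12 / math.e ** 2) < 1e-9)  # True: maximum value is 12/e^2
```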
Moreover, let the cost function C, describing the cost of the firm as a function of the produced quantity x, be given by
$$C(x) = 80x + 22{,}400.$$
This yields the profit function P with
$$P(x) = R(x) - C(x) = -0.04x^2 + 200x - (80x + 22{,}400) = -0.04x^2 + 120x - 22{,}400.$$
We determine the production output that maximizes the profit. Looking for stationary points, we obtain
$$P'(x) = -0.08x + 120 = 0,$$
which yields the point $x_P = 1{,}500$. Due to $P''(x_P) = -0.08 < 0$, the output $x_P = 1{,}500$ maximizes the profit, with $P(1{,}500) = 67{,}600$ units. Points with $P(x) = 0$ (i.e. revenue R(x) is equal to cost C(x)) are called break-even points. In the example, we obtain from $P(x) = 0$ the equation
$$x^2 - 3{,}000x + 560{,}000 = 0,$$
which yields the roots (break-even points)
$$x_1 = 1{,}500 - \sqrt{1{,}690{,}000} = 200 \qquad \text{and} \qquad x_2 = 1{,}500 + \sqrt{1{,}690{,}000} = 2{,}800,$$
i.e. an output $x \in (200, 2{,}800)$ leads to a profit for the firm. Finally, we mention that when maximizing revenue, one gets from
$$R'(x) = -0.08x + 200 = 0$$
the stationary point $x_R = 2{,}500$, which is, because $R''(2{,}500) = -0.08 < 0$, indeed the output that maximizes revenue. However, in the latter case the profit is only $P(2{,}500) = 27{,}600$ units.

Example 4.22 The cost function $C : \mathbb{R}_+ \to \mathbb{R}$ of an enterprise producing a quantity x of a product is given by
$$C(x) = 0.1x^2 - 30x + 25{,}000.$$
Then the average cost function $C_a$, measuring the cost per unit produced, is given by
$$C_a(x) = \frac{C(x)}{x} = 0.1x - 30 + \frac{25{,}000}{x}.$$
We determine the local minimum points of the average cost function $C_a$. We obtain
$$C_a'(x) = 0.1 - \frac{25{,}000}{x^2}.$$
The necessary condition for a local extreme point is $C_a'(x) = 0$. This corresponds to $0.1x^2 = 25{,}000$, which has the two solutions $x_1 = 500$ and $x_2 = -500$. Since $x_2 \notin D_{C_a}$, the only stationary point is $x_1 = 500$. Checking the sufficient condition, we obtain
$$C_a''(x) = \frac{2 \cdot 25{,}000}{x^3} \qquad \text{and} \qquad C_a''(500) = \frac{50{,}000}{125{,}000{,}000} > 0,$$
i.e.
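The profit maximization of Example 4.21 reduces to the vertex and roots of a quadratic, which takes only a few lines to reproduce (the variable names are my own):

```python
import math

# Example 4.21: profit P(x) = -0.04x^2 + 120x - 22400.
a, b, c = -0.04, 120.0, -22_400.0
P = lambda x: a * x ** 2 + b * x + c

x_p = -b / (2 * a)                    # stationary point, ~ 1500
disc = math.sqrt(b ** 2 - 4 * a * c)  # for the break-even points P(x) = 0
x1 = (-b + disc) / (2 * a)            # ~ 200
x2 = (-b - disc) / (2 * a)            # ~ 2800
print(x_p, P(x_p), x1, x2)            # maximal profit P(x_p) ~ 67600
```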
the produced quantity $x_1 = 500$ minimizes the average cost, with $C_a(500) = 70$.

Example 4.23 A firm wins an order to design and produce a cylindrical container for transporting a liquid commodity. This cylindrical container should have a given volume $V_0$. The cost of producing such a cylinder is proportional to its surface. Let R denote the radius and H the height of a cylinder. Among all cylinders with given volume $V_0$, we want to determine the one with the smallest surface S (i.e. the one with the lowest cost). We know that the volume is obtained as
$$V_0 = V(R, H) = \pi R^2 H, \tag{4.2}$$
and the surface can be determined by the formula
$$S = S(R, H) = 2\pi R^2 + 2\pi R H. \tag{4.3}$$
From formula (4.2), we can eliminate the variable H, which yields
$$H = \frac{V_0}{\pi R^2}. \tag{4.4}$$
Substituting the latter term into formula (4.3), we obtain the surface as a function of one variable, namely of the radius R, since $V_0$ is a constant:
$$S = S(R) = 2\pi R^2 + 2\pi R \cdot \frac{V_0}{\pi R^2} = 2 \left( \pi R^2 + \frac{V_0}{R} \right).$$
Now we look for the minimal value of S(R) for $0 < R < \infty$. In order to apply Theorems 4.12 and 4.14, we determine the first and second derivatives of the function $S = S(R)$:
$$S'(R) = 2 \left( 2\pi R - \frac{V_0}{R^2} \right) \qquad \text{and} \qquad S''(R) = 2 \left( 2\pi + \frac{2V_0}{R^3} \right).$$
Setting $S'(R) = 0$ yields
$$2\pi R^3 - V_0 = 0, \tag{4.5}$$
and thus the stationary point is obtained as
$$R_1 = \sqrt[3]{\frac{V_0}{2\pi}}.$$
Notice that the other two roots of equation (4.5) are complex and therefore not candidate points for a minimal surface of the cylinder. Moreover, we obtain
$$S''(R_1) = 2(2\pi + 4\pi) = 12\pi > 0,$$
i.e. the surface becomes minimal for the radius $R_1$. One can also argue without checking the sufficient condition that the only stationary point with a positive radius must be a minimum point since, both as $R \to 0$ and as $R \to \infty$, the surface of the cylinder tends to infinity, so that for $R_1$ the surface must attain its minimal value.
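The cylinder result can be checked numerically for a concrete volume; $V_0 = 1000$ below is an arbitrary test value of mine, not from the text:

```python
import math

# Example 4.23: S(R) = 2*pi*R^2 + 2*V0/R is minimal at R1 = (V0/(2*pi))^(1/3),
# and the optimal height then equals the diameter, H = 2R.
V0 = 1000.0
R1 = (V0 / (2 * math.pi)) ** (1 / 3)
H1 = V0 / (math.pi * R1 ** 2)          # height from the volume constraint (4.4)

S_prime = lambda R: 2 * (2 * math.pi * R - V0 / R ** 2)
print(abs(S_prime(R1)) < 1e-9)  # True: R1 is a stationary point
print(abs(H1 - 2 * R1) < 1e-9)  # True: height equals diameter
```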
Determining the corresponding value of the height, we obtain from equality (4.4)
$$H_1 = \frac{V_0}{\pi R_1^2} = \frac{V_0}{\pi \left( \sqrt[3]{\frac{V_0}{2\pi}} \right)^2} = \frac{V_0^{1/3} \cdot (2\pi)^{2/3}}{\pi} = 2 \sqrt[3]{\frac{V_0}{2\pi}} = 2R_1.$$
Thus, the surface of a cylinder with given volume is minimal if the height of the cylinder is equal to its diameter, i.e. H = 2R.

4.5.3 Convexity and concavity

The definition of a convex and a concave function has already been given in Chapter 3. Now, by means of differential calculus, we can give a criterion to check whether a function is (strictly) convex or concave on a certain interval I.

THEOREM 4.15 Let function $f : D_f \to \mathbb{R}$ be twice differentiable on the open interval $(a, b) \subseteq D_f$ and let $I = [a, b]$. Then:

(1) Function f is convex on I if and only if $f''(x) \ge 0$ for all $x \in (a, b)$.
(2) Function f is concave on I if and only if $f''(x) \le 0$ for all $x \in (a, b)$.
(3) If $f''(x) > 0$ for all $x \in (a, b)$, then function f is strictly convex on I.
(4) If $f''(x) < 0$ for all $x \in (a, b)$, then function f is strictly concave on I.

Example 4.24 Let the function $f : \mathbb{R} \to \mathbb{R}$ with $f(x) = a e^{bx}$, $a, b \in \mathbb{R}_+ \setminus \{0\}$, be given. We obtain
$$f'(x) = ab e^{bx} \qquad \text{and} \qquad f''(x) = ab^2 e^{bx}.$$
For $a, b > 0$, we get $f'(x) > 0$ and $f''(x) > 0$ for all $x \in D_f$, i.e. function f is strictly increasing and strictly convex. In this case, we say that function f has progressive growth (since it grows faster than a linear function, which has proportionate growth). Consider now the function $g : \mathbb{R}_+ \to \mathbb{R}$ with $g(x) = a \ln(1 + bx)$, $a, b \in \mathbb{R}_+ \setminus \{0\}$. We obtain
$$g'(x) = \frac{ab}{1 + bx} \qquad \text{and} \qquad g''(x) = -\frac{ab^2}{(1 + bx)^2}.$$
For $a, b > 0$, we get $g'(x) > 0$ and $g''(x) < 0$ for all $x \in D_g$, i.e. function g is strictly increasing and strictly concave. In this case, we say that function g has degressive growth.

Example 4.25 We investigate the function $f : \mathbb{R} \to \mathbb{R}$ with
$$f(x) = \frac{2x}{x^2 + 1}$$
for convexity and concavity.
We obtain
$$f'(x) = \frac{2(x^2 + 1) - 2x \cdot 2x}{(x^2 + 1)^2} = 2 \cdot \frac{1 - x^2}{(x^2 + 1)^2}$$
and
$$f''(x) = 2 \cdot \frac{-2x(x^2 + 1)^2 - (1 - x^2) \cdot 2(x^2 + 1) \cdot 2x}{(x^2 + 1)^4} = 2 \cdot \frac{(x^2 + 1) \left[ -2x(x^2 + 1) - 4x(1 - x^2) \right]}{(x^2 + 1)^4} = 4 \cdot \frac{x^3 - 3x}{(x^2 + 1)^3}.$$
To find the zeroes of the second derivative $f''$, we have to solve the cubic equation
$$x^3 - 3x = x \cdot (x^2 - 3) = 0,$$
which yields the solutions $x_1 = 0$, $x_2 = \sqrt{3}$ and $x_3 = -\sqrt{3}$. Since e.g. $f''(1) = 4 \cdot (1 - 3)/2^3 = -1 < 0$, we have $f''(x) < 0$ for $x \in (x_1, x_2) = (0, \sqrt{3})$, i.e. by Theorem 4.15, function f is strictly concave on the interval $[0, \sqrt{3}]$. Moreover, since we have distinct zeroes, the sign of the second derivative changes 'at' each zero, and by Theorem 4.15 we obtain: function f is strictly convex on $[-\sqrt{3}, 0]$ and on $[\sqrt{3}, \infty)$, and strictly concave on $(-\infty, -\sqrt{3}]$ and on $[0, \sqrt{3}]$.

To decide whether a function is convex or concave, the following notion of an inflection point can be helpful.

Definition 4.10 Let function $f : D_f \to \mathbb{R}$ be twice differentiable on the open interval $(a, b) \subseteq D_f$. The point $x_0 \in (a, b)$ is called an inflection point of function f if f changes at $x_0$ from being convex to being concave or vice versa, i.e. if there is an interval $(a^*, b^*) \subseteq (a, b)$ containing $x_0$ such that one of the following two conditions holds:

(1) $f''(x) \ge 0$ if $a^* < x < x_0$ and $f''(x) \le 0$ if $x_0 < x < b^*$, or
(2) $f''(x) \le 0$ if $a^* < x < x_0$ and $f''(x) \ge 0$ if $x_0 < x < b^*$.

Consequently, if the second derivative changes its sign 'at' the point $x_0$, then $x_0$ is an inflection point of function f. The notion of an inflection point is illustrated in Figure 4.7. Next, we give a criterion for an inflection point of function f.

Figure 4.7 Inflection point $x_0$ of function f.

THEOREM 4.16 Let function $f : D_f \to \mathbb{R}$ be n times continuously differentiable on the open interval $(a, b) \subseteq D_f$.
The point $x_0 \in (a, b)$ is an inflection point of function f if and only if
$$f''(x_0) = f'''(x_0) = \cdots = f^{(n-1)}(x_0) = 0 \qquad \text{and} \qquad f^{(n)}(x_0) \ne 0,$$
where n is odd.

Example 4.26 We consider the function $f : \mathbb{R} \to \mathbb{R}$ with
$$f(x) = x^4 + 2x^3 - 12x^2 + 4$$
and determine its inflection points. We obtain
$$f'(x) = 4x^3 + 6x^2 - 24x \qquad \text{and} \qquad f''(x) = 12x^2 + 12x - 24.$$
In order to solve $f''(x) = 0$, we have to find the roots of the quadratic equation $x^2 + x - 2 = 0$, which gives $x_1 = 1$ and $x_2 = -2$ as candidates for an inflection point. Using $f'''(x) = 24x + 12$, we obtain
$$f'''(1) = 36 \ne 0 \qquad \text{and} \qquad f'''(-2) = -36 \ne 0,$$
i.e. both points $x_1 = 1$ and $x_2 = -2$ are inflection points of function f.

4.5.4 Limits

We have already discussed some rules for computing limits of sums, differences, products or quotients of functions. However, it was necessary that each of the limits exists, i.e. each of the limits was a finite number. The question we consider now is what happens when we want to determine the limit of a quotient of two functions, and the functions in the numerator and in the denominator both tend to $\infty$ as x tends to a specific value $x_0$. The same question arises when the functions in the numerator and in the denominator both tend to zero as x approaches some value $x_0$.

Definition 4.11 If in a quotient both the numerator and the denominator tend to zero as x tends to $x_0$, we call such a limit an indeterminate form of type "0/0", and we write
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \text{``}0/0\text{''}.$$

The notion 'indeterminate form' indicates that the limit cannot be found without further examination. There exist six further indeterminate forms:
$$\text{``}\infty/\infty\text{''}, \quad \text{``}0 \cdot \infty\text{''}, \quad \text{``}\infty - \infty\text{''}, \quad \text{``}0^0\text{''}, \quad \text{``}\infty^0\text{''}, \quad \text{``}1^\infty\text{''}.$$
For the two indeterminate forms "0/0" and "$\infty/\infty$", the limit can be found by means of the following rule.
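The inflection-point test of Example 4.26 involves only integer arithmetic and is easy to verify directly (the helper names are my own):

```python
# Example 4.26: f(x) = x^4 + 2x^3 - 12x^2 + 4 with
# f''(x) = 12x^2 + 12x - 24 and f'''(x) = 24x + 12.
f2 = lambda x: 12 * x ** 2 + 12 * x - 24
f3 = lambda x: 24 * x + 12

for x0 in (1, -2):
    # f'' vanishes and f''' does not: x0 is an inflection point
    print(f2(x0) == 0 and f3(x0) != 0)  # True for both candidates
```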
THEOREM 4.17 (Bernoulli–l'Hospital's rule) Let the functions $f : D_f \to \mathbb{R}$ and $g : D_g \to \mathbb{R}$ both tend to zero as x tends to $x_0$, or let f and g both tend to $\infty$ as x tends to $x_0$. Moreover, let f and g be continuously differentiable on an open interval $(a, b) \subseteq D_f \cap D_g$ containing $x_0$, with $g'(x) \ne 0$ for $x \in (a, b)$. Then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)} = L.$$
Here either the limit exists (i.e. the value L is finite) or the limit does not exist.

It may happen that after an application of Theorem 4.17 we still have an indeterminate form of type "0/0" or "$\infty/\infty$". In that case, we apply Theorem 4.17 repeatedly until we have found L. Moreover, we mention that Theorem 4.17 can also be applied, under appropriate assumptions, to one-sided limits (i.e. $x \to x_0 + 0$ or $x \to x_0 - 0$) as well as to the cases $x \to -\infty$ and $x \to \infty$. To illustrate Theorem 4.17, consider the following example.

Example 4.27 We determine the limit
$$L = \lim_{x \to 0} \frac{e^{\sin 2x} - e^{2x}}{\sin 2x - 2x},$$
which is an indeterminate form of type "0/0". Applying Theorem 4.17 three times in succession, we obtain
$$L = \lim_{x \to 0} \frac{2 e^{\sin 2x} \cos 2x - 2 e^{2x}}{2 \cos 2x - 2} = \lim_{x \to 0} \frac{e^{\sin 2x} \cos 2x - e^{2x}}{\cos 2x - 1} \qquad \text{``}0/0\text{''}$$
$$= \lim_{x \to 0} \frac{e^{\sin 2x} \left( \cos^2 2x - \sin 2x \right) - e^{2x}}{-\sin 2x} \qquad \text{``}0/0\text{''}$$
$$= \lim_{x \to 0} \frac{2 e^{\sin 2x} \cos 2x \left( \cos^2 2x - \sin 2x \right) + e^{\sin 2x} \left( -4 \cos 2x \sin 2x - 2 \cos 2x \right) - 2 e^{2x}}{-2 \cos 2x} = 1.$$
In the above computations, we have to apply Theorem 4.17 three times since after the first and second applications we still have an indeterminate form of type "0/0".

We note that all other indeterminate forms can be transformed into one of the forms "0/0" or "$\infty/\infty$", and they can then be treated by Theorem 4.17 too. We discuss these transformations in some more detail. Let
$$\lim_{x \to x_0} f(x) = \infty \qquad \text{and} \qquad \lim_{x \to x_0} g(x) = 0,$$
which corresponds to an indeterminate form of type "$0 \cdot \infty$". Then we can reduce the latter case to one discussed above by considering the reciprocal expression of one function, i.e.
we use either
$$\lim_{x \to x_0} \left[ f(x) \cdot g(x) \right] = \lim_{x \to x_0} \frac{g(x)}{\dfrac{1}{f(x)}} \qquad \text{or} \qquad \lim_{x \to x_0} \left[ f(x) \cdot g(x) \right] = \lim_{x \to x_0} \frac{f(x)}{\dfrac{1}{g(x)}}.$$
The first limit on the right-hand side is an indeterminate form of type "0/0", and the second one is an indeterminate form of type "$\infty/\infty$". In the case of an indeterminate form of type "$\infty - \infty$", i.e.
$$\lim_{x \to x_0} \left[ g(x) - f(x) \right] \qquad \text{with} \qquad \lim_{x \to x_0} g(x) = \infty \quad \text{and} \quad \lim_{x \to x_0} f(x) = \infty,$$
we can apply the following general transformation:
$$g(x) - f(x) = \frac{1}{\dfrac{1}{g(x)}} - \frac{1}{\dfrac{1}{f(x)}} = \frac{\dfrac{1}{f(x)} - \dfrac{1}{g(x)}}{\dfrac{1}{g(x) f(x)}},$$
where the right-hand side is an indeterminate form of type "0/0". However, we can often apply an easier reduction to obtain an indeterminate form that we have already discussed.

Example 4.28 We determine
$$L = \lim_{x \to \infty} \left( \sqrt[3]{x^3 - 2x^2} - x \right).$$
Obviously, this is an indeterminate form of type "$\infty - \infty$". We reduce this form to the type "0/0" without the above general transformation. Instead, we perform for $x \ne 0$ the following algebraic manipulation:
$$\sqrt[3]{x^3 - 2x^2} - x = \sqrt[3]{x^3 \left( 1 - \frac{2}{x} \right)} - x = x \left( 1 - \frac{2}{x} \right)^{1/3} - x = x \left[ \left( 1 - \frac{2}{x} \right)^{1/3} - 1 \right].$$
Thus we obtain
$$L = \lim_{x \to \infty} \left( \sqrt[3]{x^3 - 2x^2} - x \right) = \lim_{x \to \infty} \frac{\left( 1 - \frac{2}{x} \right)^{1/3} - 1}{\frac{1}{x}},$$
which is an indeterminate form of type "0/0". By applying Theorem 4.17, we obtain
$$L = \lim_{x \to \infty} \frac{\frac{1}{3} \left( 1 - \frac{2}{x} \right)^{-2/3} \cdot \frac{2}{x^2}}{-\frac{1}{x^2}} = \lim_{x \to \infty} \left[ -\frac{2}{3} \left( 1 - \frac{2}{x} \right)^{-2/3} \right] = -\frac{2}{3}.$$
If we have an indeterminate form of type "$0^0$", "$\infty^0$" or "$1^\infty$", i.e. the limit to be determined has the structure
$$L = \lim_{x \to x_0} g(x)^{f(x)}$$
with $g(x) > 0$, we take the natural logarithm on both sides of the equation $y(x) = g(x)^{f(x)}$ and obtain
$$\ln y(x) = \ln \left[ g(x)^{f(x)} \right] = f(x) \cdot \ln g(x).$$
If now x tends to $x_0$, we obtain an indeterminate form of type "$0 \cdot \infty$", which we can already treat. Suppose that this limit exists and let
$$L_1 = \lim_{x \to x_0} \left[ f(x) \cdot \ln g(x) \right];$$
then we finally get $L = e^{L_1}$.
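The value $-2/3$ found in Example 4.28 can be confirmed numerically by evaluating the expression at a large argument; the rewritten form below avoids the cancellation between two nearly equal large numbers (the test value $x = 10^6$ is my own choice):

```python
# Example 4.28: (x^3 - 2x^2)^(1/3) - x  ->  -2/3  as x -> infinity.
# The equivalent form x * ((1 - 2/x)^(1/3) - 1) is numerically stable.
def g(x):
    return x * ((1 - 2 / x) ** (1 / 3) - 1)

print(abs(g(1e6) + 2 / 3) < 1e-4)  # True: close to the limit -2/3
```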
Notice that the latter follows from Theorem 4.4, since the exponential function is continuous.

Example 4.29 We determine the limit
$$L = \lim_{x \to 0} (1 + x)^{1/x},$$
which is an indeterminate form of type "$1^\infty$". Setting $y(x) = (1 + x)^{1/x}$ and taking the natural logarithm on both sides, we obtain
$$\ln y(x) = \frac{1}{x} \cdot \ln(1 + x).$$
Applying Theorem 4.17, we get by Bernoulli–l'Hospital's rule
$$\lim_{x \to 0} \frac{\ln(1 + x)}{x} = \lim_{x \to 0} \frac{\frac{1}{1 + x}}{1} = \lim_{x \to 0} \frac{1}{1 + x} = 1.$$
Thus, we obtain $\lim_{x \to 0} \ln y(x) = 1$, and therefore
$$L = \lim_{x \to 0} y(x) = e^1 = e.$$

4.5.5 Further examples

In the following example, we investigate all the properties listed above when graphing functions.

Example 4.30 Let us discuss in detail the function $f : D_f \to \mathbb{R}$ with
$$f(x) = \frac{x^2 + 5x + 22}{x - 2}.$$
Function f is defined for all $x \in \mathbb{R}$ with $x \ne 2$, i.e. $D_f = \mathbb{R} \setminus \{2\}$. At the point $x_0 = 2$, there is a pole of first order with
$$\lim_{x \to 2-0} f(x) = -\infty \qquad \text{and} \qquad \lim_{x \to 2+0} f(x) = \infty.$$

Figure 4.8 Graph of function f with $f(x) = (x^2 + 5x + 22)/(x - 2)$.

From $f(x) = 0$, i.e. $x^2 + 5x + 22 = 0$, we would obtain the roots
$$x_1 = -\frac{5}{2} + \sqrt{\frac{25}{4} - 22} \qquad \text{and} \qquad x_2 = -\frac{5}{2} - \sqrt{\frac{25}{4} - 22}.$$
Since
$$\frac{25}{4} - 22 = -\frac{63}{4} < 0,$$
function f has no real zeroes. We obtain for the first and second derivatives
$$f'(x) = \frac{(2x + 5)(x - 2) - (x^2 + 5x + 22)}{(x - 2)^2} = \frac{x^2 - 4x - 32}{(x - 2)^2}$$
and
$$f''(x) = \frac{(2x - 4)(x - 2)^2 - 2(x - 2)(x^2 - 4x - 32)}{(x - 2)^4} = \frac{72}{(x - 2)^3}.$$
Setting $f'(x) = 0$, we obtain $x^2 - 4x - 32 = 0$, which yields the stationary points
$$x_3 = 2 + \sqrt{4 + 32} = 8 \qquad \text{and} \qquad x_4 = 2 - \sqrt{4 + 32} = -4.$$
From $f''(x_3) > 0$, we get that $x_3 = 8$ is a local minimum point with $f(x_3) = 21$, and from $f''(x_4) < 0$, we get that $x_4 = -4$ is a local maximum point with $f(x_4) = -3$. Since $f'(x) > 0$ for all $x \in (-\infty, -4) \cup (8, \infty)$, function f is strictly increasing on $(-\infty, -4]$ and on $[8, \infty)$. For $x \in (-4, 2) \cup (2, 8)$, we have $f'(x) < 0$, and thus function f is strictly decreasing on $[-4, 2)$ and on $(2, 8]$.
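The convergence of $(1 + x)^{1/x}$ towards e in Example 4.29 can be observed numerically by letting x shrink (the sample values are my own choices):

```python
import math

# Example 4.29: (1 + x)^(1/x) -> e as x -> 0; the gap shrinks roughly like e*x/2.
gaps = [abs((1 + x) ** (1 / x) - math.e) for x in (1e-2, 1e-4, 1e-6)]
print(gaps[0] > gaps[1] > gaps[2])  # True: monotone approach to e
```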
To determine candidates for an inflection point, we set $f''(x) = 0$. The latter equation has no solution, and thus function f does not have an inflection point. Because $f''(x) > 0$ for $x \in (2, \infty)$, function f is strictly convex on the interval $(2, \infty)$. Since $f''(x) < 0$ for $x \in (-\infty, 2)$, function f is strictly concave on the interval $(-\infty, 2)$. Moreover,
$$\lim_{x \to \infty} f(x) = \infty \qquad \text{and} \qquad \lim_{x \to -\infty} f(x) = -\infty.$$
The graph of function f is given in Figure 4.8.

Example 4.31 Let the relationship between the price p of a good and the resulting demand D for this good be given by the demand function $D = D(p)$, where
$$D = \frac{400 - 10p}{p + 5}, \qquad 0 \le p \le 40. \tag{4.6}$$
First, we determine the extreme points of the function $D = D(p)$ and check the function for monotonicity as well as concavity or convexity. We obtain
$$D'(p) = \frac{-10(p + 5) - (400 - 10p) \cdot 1}{(p + 5)^2} = -\frac{450}{(p + 5)^2}.$$
Thus $D'(p) \ne 0$ for all $p \in [0, 40]$, and therefore the function $D = D(p)$ cannot have an extreme point since the necessary condition for its existence is not satisfied. Moreover, since $D'(p) < 0$ for all $p \in (0, 40)$, the function $D = D(p)$ is strictly decreasing on the closed interval [0, 40]. Checking the function $D = D(p)$ for convexity and concavity, respectively, we obtain
$$D''(p) = -450(p + 5)^{-3} \cdot (-2) = \frac{900}{(p + 5)^3} > 0$$
for all $p \in (0, 40)$. Therefore, the function $D = D(p)$ is strictly convex on the closed interval [0, 40]. Next, we check the function $D = D(p)$ for elasticity. We obtain
$$\varepsilon_D(p) = \frac{p \cdot D'(p)}{D(p)} = -\frac{450 p}{(p + 5)^2} \cdot \frac{p + 5}{400 - 10p} = -\frac{45p}{(p + 5)(40 - p)}.$$
We can determine the points where the function $D = D(p)$ changes between being elastic and inelastic, i.e. we can check where the equality $|\varepsilon_D(p)| = 1$ holds, and obtain
$$\frac{45p}{(p + 5)(40 - p)} = 1.$$
Thus, $(p + 5)(40 - p) = 45p$, which yields the quadratic equation
$$p^2 + 10p - 200 = 0$$
with the two solutions
$$p_1 = -5 + \sqrt{25 + 200} = -5 + 15 = 10 \qquad \text{and} \qquad p_2 = -5 - 15 = -20 < 0.$$
The solution $p_2$ does not belong to the domain $D_D$ of the function $D = D(p)$.
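The elasticity formula just derived for Example 4.31 can be evaluated directly at a few prices (the sample prices 5 and 20 are my own choices):

```python
# Example 4.31: eps_D(p) = -45p / ((p + 5)(40 - p)).
eps = lambda p: -45 * p / ((p + 5) * (40 - p))

print(abs(eps(10)))       # 1.0: unit elasticity at p = 10
print(abs(eps(5)) < 1)    # True: inelastic for prices below 10
print(abs(eps(20)) > 1)   # True: elastic for prices above 10
```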
Hence, for p = 10, function D is of unit elasticity. Since $|\varepsilon_D(p)| < 1$ for $p \in [0, 10)$ and $|\varepsilon_D(p)| > 1$ for $p \in (10, 40]$, we get that the function $D = D(p)$ is inelastic for $p \in [0, 10)$ and elastic for $p \in (10, 40]$. Finally, we look for the function $p = p(D)$ giving the price in dependence on the demand D. Since the function D(p) is strictly decreasing on the closed interval [0, 40], the inverse demand function $p = p(D)$ exists. Solving equation (4.6) for p, we obtain
$$D(p + 5) = 400 - 10p, \qquad p(D + 10) = 400 - 5D, \qquad p = \frac{400 - 5D}{D + 10}.$$
Since $D = D(p)$ is strictly monotone, the inverse function $p = p(D)$ is strictly monotone as well. Because $D(0) = 80$ and $D(40) = 0$ (see also the inequality in (4.6)), the function $p = p(D)$ is defined for $D \in [0, 80]$.

4.6 MEAN-VALUE THEOREM

Before presenting the mean-value theorem of differential calculus, we start with a special case.

THEOREM 4.18 (Rolle's theorem) Let function $f : D_f \to \mathbb{R}$ be continuous on the closed interval $[a, b] \subseteq D_f$ with $f(a) = f(b) = 0$ and differentiable on the open interval (a, b). Then there exists a point $x^* \in (a, b)$ with $f'(x^*) = 0$.

Geometrically, Rolle's theorem says that in the case of $f(a) = f(b)$ there exists an interior point $x^*$ of the interval (a, b) where the tangent line is parallel to the x axis. In other words, there is at least one stationary point in the open interval (a, b). The mean-value theorem of differential calculus generalizes Rolle's theorem to the case $f(a) \ne f(b)$.

THEOREM 4.19 (mean-value theorem of differential calculus) Let function $f : D_f \to \mathbb{R}$ be continuous on the closed interval $[a, b] \subseteq D_f$ and differentiable on the open interval (a, b). Then there exists at least one point $x^* \in (a, b)$ such that
$$f'(x^*) = \frac{f(b) - f(a)}{b - a}.$$
The mean-value theorem of differential calculus is illustrated in Figure 4.9.
In this case, we have two points $x_1^* \in (a, b)$ and $x_2^* \in (a, b)$ with the property that the derivative of function $f$ at each of these points is equal to the quotient of $f(b) - f(a)$ and $b - a$. The mean-value theorem of differential calculus can be used e.g. for the approximate calculation of function values.

Figure 4.9 Mean-value theorem of differential calculus.

Example 4.32 Using the mean-value theorem, we compute $\sqrt[3]{29}$. Let us consider function $f : \mathbb{R}_+ \to \mathbb{R}$ with $f(x) = \sqrt[3]{x}$. We get
$$f'(x) = \frac{1}{3} \cdot x^{-2/3}.$$
Applying the mean-value theorem with $a = 27$ and $b = 29$, we get
$$\frac{f(b) - f(a)}{b - a} = \frac{\sqrt[3]{29} - \sqrt[3]{27}}{29 - 27} = \frac{\sqrt[3]{29} - 3}{2} = f'(x^*) = \frac{1}{3} \cdot (x^*)^{-2/3},$$
which can be rewritten as
$$\sqrt[3]{29} = 3 + \frac{1}{3} \cdot (x^*)^{-2/3} \cdot 2.$$
Using $x^* = 27$ (notice that $x^*$ is not from the open interval $(27, 29)$, but it can be chosen since function $f$ is differentiable at $x^* = a = 27$ and, moreover, the value of function $f'$ is easily computable at this point), we get
$$\sqrt[3]{29} \approx 3 + \frac{1}{3} \cdot \frac{1}{27^{2/3}} \cdot 2 = 3 + \frac{2}{3} \cdot \frac{1}{(3^3)^{2/3}} = 3 + \frac{2}{3} \cdot \frac{1}{9} \approx 3.074.$$

Example 4.33 A vehicle starts at time zero and $s(t)$ gives the distance from the starting point (in metres) in dependence on time $t$ (in seconds). It is known that $s(t) = t^2/2$; let $a = 0$ and $b = 20$. Then the mean-value theorem reads as
$$\frac{s(20) - s(0)}{20 - 0} = \frac{\frac{1}{2} \cdot 20^2 - 0}{20} = \frac{200}{20} = 10 = s'(t^*) = t^*.$$
The left-hand side fraction gives the average velocity of the vehicle in the interval $[0, 20]$, which is equal to 10 m per second. The mean-value theorem implies that there is at least one value $t^* \in (0, 20)$ where the current velocity of the vehicle is equal to 10 m/s. In the above example, it happens at time $t^* = 10$.

4.7 TAYLOR POLYNOMIALS

Often it is necessary to approximate as closely as possible a 'complicated' function $f$ by a polynomial $P_n$ of some degree $n$.
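Both worked examples above can be verified with a few lines of arithmetic (a plain-Python sketch; the tolerances are our own choices):

```python
# Check of Examples 4.32 and 4.33 (numerical sketch, not from the text).
# Mean-value estimate of 29**(1/3) with x* = 27: 3 + (1/3) * 27**(-2/3) * 2.
approx = 3 + (1 / 3) * 27 ** (-2 / 3) * 2
assert abs(approx - 3.074) < 5e-4          # matches the rounded value 3.074 in the text
assert abs(approx - 29 ** (1 / 3)) < 2e-3  # close to the true cube root

# Example 4.33: average velocity on [0, 20] for s(t) = t^2 / 2.
s = lambda t: t * t / 2
assert (s(20) - s(0)) / 20 == 10.0         # equals s'(t*) = t* at t* = 10
```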
One way to do this is to require that the first $n$ derivatives of function $f$ and of the polynomial $P_n$, as well as the function values of $f$ and $P_n$, coincide at some point $x = x_0$, provided that function $f$ is sufficiently often differentiable. Before considering the general case, we discuss this question for some small values of $n$.

First consider $n = 1$. In this case, we approximate function $f$ around $x_0$ by a linear function (straight line)
$$P_1(x) = f(x_0) + f'(x_0) \cdot (x - x_0).$$
Obviously, we have $P_1(x_0) = f(x_0)$ and $P_1'(x_0) = f'(x_0)$. For $n = 2$, we approximate function $f$ by a quadratic function (parabola) $P_2$ with
$$P_2(x) = f(x_0) + f'(x_0) \cdot (x - x_0) + \frac{f''(x_0)}{2!} \cdot (x - x_0)^2.$$
In this case, we also have $P_2(x_0) = f(x_0)$, $P_2'(x_0) = f'(x_0)$ and additionally $P_2''(x_0) = f''(x_0)$. Suppose now that function $f$ is sufficiently often differentiable, and we wish to approximate function $f$ by a polynomial $P_n$ of degree $n$ with the requirements listed above.

THEOREM 4.20 Let function $f : D_f \to \mathbb{R}$ be $n + 1$ times differentiable on the open interval $(a, b) \subseteq D_f$ containing points $x_0$ and $x$. Then function $f$ can be written as
$$f(x) = f(x_0) + \frac{f'(x_0)}{1!} \cdot (x - x_0) + \frac{f''(x_0)}{2!} \cdot (x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!} \cdot (x - x_0)^n + R_n(x),$$
where
$$R_n(x) = \frac{f^{(n+1)}(x_0 + \lambda(x - x_0))}{(n+1)!} \cdot (x - x_0)^{n+1}, \qquad 0 < \lambda < 1,$$
is Lagrange's form of the remainder.

The above form of representing function $f$ by some polynomial together with some remainder is known as Taylor's formula. We still have to explain the meaning of the remainder $R_n(x)$. From equation $R_n(x) = f(x) - P_n(x)$, the remainder gives the difference between the original function $f$ and the approximating polynomial $P_n$. We are looking for an upper bound for the error when approximating function $f$ by polynomial $P_n$ in some interval $I = (x_0, x)$. To this end, we have to estimate the maximal possible value of the remainder $R_n(x^*)$ for $x^* \in (x_0, x)$.
(Note that $x_0 + \lambda(x - x_0)$ with $0 < \lambda < 1$ in $R_n(x)$ above corresponds to some point $x^* \in (x_0, x)$.) If we know that the $(n+1)$th derivative of function $f$ in the interval $(x_0, x)$ is bounded by some constant $M$, i.e.
$$|f^{(n+1)}(x^*)| \le M \qquad \text{for } x^* \in (x_0, x),$$
then we obtain from Theorem 4.20:
$$|R_n(x)| \le \frac{M}{(n+1)!} \cdot (x - x_0)^{n+1}. \tag{4.7}$$
Inequality (4.7) gives the maximal error when replacing function $f$ with its $n$th Taylor polynomial
$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} \cdot (x - x_0)^k$$
in the open interval $(x_0, x)$. The above considerations can also be extended to an interval $(x, x_0)$ to the left of point $x_0$. (Note that we have to replace $(x - x_0)$ in formula (4.7) by $|x - x_0|$.)

Example 4.34 Let function $f : \mathbb{R} \to \mathbb{R}$ with $f(x) = e^x$ be given. We wish to approximate this function by a polynomial of smallest degree such that for $|x| < 0.5$ the maximal error does not exceed $10^{-4}$. We choose $x_0 = 0$ and obtain
$$f'(x) = f''(x) = \cdots = f^{(n)}(x) = e^x.$$
Thus, we have $f(0) = f'(0) = \cdots = f^{(n)}(0) = 1$. Consequently,
$$f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \cdots + \frac{x^n}{n!} + R_n(x)$$
with
$$R_n(x) = \frac{f^{(n+1)}(\lambda x)}{(n+1)!} \cdot x^{n+1} = \frac{e^{\lambda x}}{(n+1)!} \cdot x^{n+1},$$
where $0 < \lambda < 1$. For $|x| < 0.5$, we have $e^{\lambda x} < e^{1/2} \approx 1.6487 < 2$. Thus, we determine $n$ such that
$$|R_n(x)| < \frac{2 \cdot |x|^{n+1}}{(n+1)!} < \frac{2}{(n+1)! \cdot 2^{n+1}} \le 10^{-4}.$$
From the latter estimation it follows that we look for the smallest value of $n$ such that
$$(n+1)! \cdot 2^{n+1} \ge 2 \cdot 10^4, \qquad \text{or equivalently,} \qquad (n+1)! \cdot 2^n \ge 10^4.$$
For $n \in \{0, 1, 2, 3, 4\}$, the above inequality is not satisfied. However, for $n = 5$, we get
$$(n+1)! \cdot 2^n = 720 \cdot 32 = 23{,}040 = 2.304 \cdot 10^4 \ge 10^4.$$
We have obtained:
$$|R_5(x)| = \left| e^x - \sum_{k=0}^{5} \frac{x^k}{k!} \right| \le 10^{-4} \qquad \text{for } |x| < 0.5,$$
and thus we can approximate function $f$ with $f(x) = e^x$ for $|x| < 0.5$ by the polynomial $P_5$ with
$$P_5(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120},$$
and the error of this approximation does not exceed the value $10^{-4}$.
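The error bound of Example 4.34 can be confirmed on a grid of sample points (a sketch; the grid resolution is our own choice):

```python
import math

# Check of Example 4.34: the Taylor polynomial P5 of e^x around x0 = 0
# should approximate e^x on |x| < 0.5 with an error of at most 1e-4.
def p5(x):
    return 1 + x + x**2 / 2 + x**3 / 6 + x**4 / 24 + x**5 / 120

# (n + 1)! * 2^n >= 10^4 first holds for n = 5: 720 * 32 = 23,040.
assert math.factorial(6) * 2**5 == 23040

grid = [i / 1000 - 0.4995 for i in range(1000)]   # sample points in (-0.5, 0.5)
worst = max(abs(math.exp(x) - p5(x)) for x in grid)
assert worst <= 1e-4
```

The observed worst-case error is in fact well below the theoretical bound, since the Lagrange remainder estimate is deliberately generous.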
Example 4.35 Let us approximate function $f : [-4, \infty) \to \mathbb{R}$ with
$$f(x) = \sqrt{x+4} = (x+4)^{0.5}$$
by a linear function $P_1(x)$ around $x_0 = 0$. This approximation should be used to give an estimate of $\sqrt{4.02}$. We get
$$f'(x) = \frac{1}{2} \cdot (x+4)^{-1/2} \qquad \text{and} \qquad f''(x) = \frac{1}{2} \cdot \left( -\frac{1}{2} \right) \cdot (x+4)^{-3/2}.$$
Consequently, for $x_0 = 0$ we get $f(0) = 2$ and
$$f'(0) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.$$
We obtain
$$f(x) = 2 + \frac{1}{4}\,x - \frac{1}{8}\,(\lambda x + 4)^{-3/2}\,x^2, \qquad 0 < \lambda < 1. \tag{4.8}$$
In order to estimate $\sqrt{4.02}$, we write $4.02 = 0.02 + 4$ and use representation (4.8). Estimating the maximum error, we get for $x = 0.02$ the inequalities $0 < \lambda x < 0.02$ for $0 < \lambda < 1$, and consequently the inequality $\lambda x + 4 > 4$. Therefore, we have
$$(\lambda x + 4)^{-3/2} < 4^{-3/2} = \frac{1}{8},$$
and we get the following estimate:
$$|R_1(0.02)| = \left| \frac{1}{8}\,(\lambda \cdot 0.02 + 4)^{-3/2} \cdot \left( \frac{2}{100} \right)^2 \right| < \frac{1}{8} \cdot \frac{1}{8} \cdot \frac{4}{10^4} = \frac{1}{160{,}000} < 10^{-5}.$$
We conclude that
$$\sqrt{4.02} \approx 2 + \frac{1}{4} \cdot 0.02 = 2.005$$
with an error less than $10^{-5}$.

4.8 APPROXIMATE DETERMINATION OF ZEROES

In this section, we discuss several algorithms for finding zeroes of a function approximately. As an application of Taylor polynomials we first consider the determination of zeroes by Newton's method. We discuss two variants of this method. The first possibility is to approximate function $f$ about $x_0$ by its tangent at $x_0$:
$$f(x) \approx P_1(x) = f(x_0) + f'(x_0) \cdot (x - x_0).$$
Let $x_0$ be an initial (approximate) value. Then we have $f(x) \approx P_1(x) = 0$, from which we obtain
$$f(x_0) + f'(x_0) \cdot (x - x_0) = 0.$$
Solving this equation for $x$ and identifying the solution with $x_1$ yields:
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
Now we replace value $x_0$ on the right-hand side by the new approximate value $x_1$, and we determine another approximate value $x_2$. The procedure can be stopped if two successive approximate values $x_n$ and $x_{n+1}$ are sufficiently close to each other.
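This tangent-line iteration can be sketched in a few lines (plain Python; the helper name, the tolerance-based stopping rule and the iteration cap are our own choices, not from the text):

```python
# Newton's method sketch: iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until two
# successive values are sufficiently close.
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:    # x_n and x_{n+1} sufficiently close: stop
            return x_new
        x = x_new
    return x

# Example: the zero of f(x) = x^2 - 2 starting from x0 = 2 is sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0)
assert abs(root - 2 ** 0.5) < 1e-9
```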
The question that we still have to discuss is whether specific assumptions have to be made on the initial value $x_0$ in order to ensure that sequence $\{x_n\}$ converges to a zero of function $f$. The first assumption that we have to make is $x_0 \in [a, b]$ with $f(a) \cdot f(b) < 0$. In the case of a continuous function, this guarantees that the closed interval $[a, b]$ contains at least one zero $x$ of function $f$ (see Theorem 4.6). However, the latter condition is still not sufficient. It can be proved that, if function $f$ is differentiable on the open interval $(a^*, b^*) \subseteq D_f$ containing $[a, b]$, if additionally $f'(x) \ne 0$ and $f''(x) \ne 0$ for $x \in [a, b]$, and if $x_0 \in \{a, b\}$ is chosen such that $f(x_0) \cdot f''(x_0) > 0$, then the above procedure converges to the (unique) zero $x \in [a, b]$. The procedure is illustrated in Figure 4.10.

Figure 4.10 Illustration of Newton's method.

A second possibility is that we approximate function $f$ about $x_0$ by a parabola. This procedure is denoted as Newton's method of second order. We get:
$$f(x) \approx P_2(x) = f(x_0) + f'(x_0) \cdot (x - x_0) + \frac{f''(x_0)}{2!} \cdot (x - x_0)^2.$$
Let $x_0$ be an initial (approximate) value. Then we obtain from $f(x) \approx P_2(x) = 0$
$$f(x_0) + f'(x_0) \cdot (x - x_0) + \frac{f''(x_0)}{2!} \cdot (x - x_0)^2 = 0.$$
Solving this quadratic equation for $x$ and identifying the solutions with $x_{1A}$ and $x_{1B}$, we get:
$$x_{1A,1B} = x_0 - \frac{f'(x_0) \pm \sqrt{[f'(x_0)]^2 - 2 f(x_0) f''(x_0)}}{f''(x_0)}.$$
Using now $x_1 = x_{1A}$ or $x_1 = x_{1B}$ (depending on which value is the better approximation of the zero) instead of $x_0$ on the right-hand side of the above equation, we determine a new approximate value $x_2$ and so on. Again some assumptions have to be made to ensure convergence of sequence $\{x_n\}$ to a zero.
It can be proved that, if function $f$ is differentiable on the open interval $(a^*, b^*) \subseteq D_f$ containing $[a, b]$ and $x_0 \in [a, b]$ with $f(a) \cdot f(b) < 0$ and $f''(x) \ne 0$ for $x \in [a, b]$, then the sequence $\{x_n\}$ determined by Newton's method of second order converges to a zero $x \in [a, b]$.

The above two variants of Newton's method use derivatives. Finally, we briefly discuss two derivative-free methods for determining zeroes numerically. The first procedure presents a general iterative algorithm, also known as a fixed-point procedure, which transforms the equation $f(x) = 0$ into the form $x = \varphi(x)$. A solution $x$ with $x = \varphi(x)$ is denoted as a fixed point. Starting with some point $x_0 \in [a, b]$, we compute iteratively the values
$$x_{n+1} = \varphi(x_n), \qquad n = 0, 1, \ldots.$$
The procedure stops when two successive values $x_k$ and $x_{k+1}$ are 'sufficiently close' to each other. A sufficient condition to ensure convergence of this approach to a fixed point is that there exists a constant $L \in (0, 1)$ with $|\varphi'(x)| \le L$ for all $x \in [a, b]$.

The second derivative-free method is known as regula falsi. In this case, we approximate function $f$ between $x_0 = a$ and $x_1 = b$ with $f(a) \cdot f(b) < 0$ by a straight line through the points $(a, f(a))$ and $(b, f(b))$. This yields:
$$y - f(a) = \frac{f(b) - f(a)}{b - a} \cdot (x - a).$$
Since we look for a zero $x$ of function $f$, we set $y = 0$ and obtain the approximate value
$$x_2 = a - f(a) \cdot \frac{b - a}{f(b) - f(a)}$$
for the zero. In general, we have $f(x_2) \ne 0$. (In the other case, we have found the exact value of the zero.) Now we check which of the two closed intervals $[a, x_2]$ and $[x_2, b]$ contains the zero. If $f(a) \cdot f(x_2) < 0$, then there exists a zero in the interval $[a, x_2]$. We replace $b$ by $x_2$ and determine a new approximate value $x_3$. Otherwise, i.e. if $f(x_2) \cdot f(b) < 0$, then interval $[x_2, b]$ contains a zero. In that case, we replace $a$ by $x_2$ and then determine an approximate value $x_3$, too. Continuing in this way, we get a sequence $\{x_n\}$ converging to a zero $x \in [a, b]$.
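The interval-updating scheme just described can be sketched directly (our own naming; the stopping test on $|f(x_2)|$ and the iteration cap are illustrative choices):

```python
# Regula falsi sketch: keep a bracketing interval [a, b] with f(a) * f(b) < 0
# and replace the endpoint lying on the same side as the secant zero x2.
def regula_falsi(f, a, b, tol=1e-8, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0                        # the interval must bracket a zero
    x2 = a
    for _ in range(max_iter):
        x2 = a - fa * (b - a) / (fb - fa)     # zero of the secant line
        fx2 = f(x2)
        if abs(fx2) < tol:
            break
        if fa * fx2 < 0:                      # zero lies in [a, x2]: replace b
            b, fb = x2, fx2
        else:                                 # zero lies in [x2, b]: replace a
            a, fa = x2, fx2
    return x2

root = regula_falsi(lambda x: x * x - 2, 1.0, 2.0)
assert abs(root - 2 ** 0.5) < 1e-6
```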
This procedure is illustrated in Figure 4.11.

Figure 4.11 Illustration of regula falsi.

Example 4.36 Let $f : (0, \infty) \to \mathbb{R}$ with $f(x) = x - \lg x - 3$ be given. We determine the zero contained in the closed interval $[3, 4]$ exactly to three decimal places. First we apply Newton's method. We obtain
$$f'(x) = 1 - \frac{1}{x \ln 10} \qquad \text{and} \qquad f''(x) = \frac{1}{x^2 \ln 10}.$$
Obviously we have $f(3) < 0$, $f(4) > 0$, $f'(x) > 0$ for $x \in [3, 4]$ and $f''(x) > 0$ for $x \in [3, 4]$. Using
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n - \lg x_n - 3}{1 - \frac{1}{x_n \ln 10}}, \qquad n = 0, 1, \ldots,$$
we get the results presented in Table 4.2 when starting with $x_0 = 4$. We obtain $f(x_2) < 10^{-5}$, and the value given in the last row in bold face corresponds to the zero rounded to three decimal places.

Table 4.2 Application of Newton's method

    n    x_n        f(x_n)     f'(x_n)
    0    4          0.39794    0.89143
    1    3.55359    0.00292    0.87779
    2    3.55026    0.00000
    3    3.550

Next, we apply the fixed-point procedure. We can rewrite the given function as $x = \lg x + 3$. Starting with $x_0 = 3$ and using the fixed-point equation
$$x_{n+1} = \varphi(x_n) = \lg x_n + 3, \qquad n = 0, 1, \ldots,$$
we obtain the results given in the second column of Table 4.3. We note that the sufficient convergence condition is satisfied in this case. From
$$\varphi'(x) = \frac{1}{x \ln 10}$$
and $\ln 10 > 2$, we obtain e.g.
$$|\varphi'(x)| \le \frac{1}{6} < 1 \qquad \text{for } x \in [3, 4].$$
We note that, if one rewrites the given function as $x = 10^{x-3}$, the fixed-point method does not converge to the zero $x$. (The reader can easily verify that the sufficient convergence condition is not satisfied in this case.)

Finally, we apply regula falsi. Letting $x_0 = a = 3$ and $x_1 = b = 4$, we get the results presented in the third and fourth columns of Table 4.3.
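The fixed-point column of Table 4.3 can be reproduced with a short loop (a sketch; variable names and tolerances are our own):

```python
import math

# Fixed-point sketch for Example 4.36: iterate x_{n+1} = lg(x_n) + 3 from x0 = 3.
def phi(x):
    return math.log10(x) + 3

x = 3.0
seq = [x]
for _ in range(5):
    x = phi(x)
    seq.append(x)

assert abs(seq[1] - 3.47712) < 1e-5   # values as in the fixed-point column of Table 4.3
assert abs(seq[2] - 3.54122) < 1e-5
assert abs(x - 3.550) < 1e-3          # approaches the zero 3.550...
```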
Table 4.3 Application of derivative-free methods

         Fixed-point method    Regula falsi
    n    x_n                   x_n        f(x_n)
    0    3                     3          -0.47712
    1    3.47712               4           0.39794
    2    3.54122               3.5452     -0.0044
    3    3.54915               3.5502     -0.00005
    4    3.55012               3.550
    5    3.550

EXERCISES

4.1 Find the left-side limit and the right-side limit of function $f : D_f \to \mathbb{R}$ as $x$ approaches $x_0$. Can we conclude from these answers that function $f$ has a limit as $x$ approaches $x_0$?
(a) $f(x) = \begin{cases} a & \text{for } x \ne x_0 \\ a + 1 & \text{for } x = x_0 \end{cases}$;
(b) $f(x) = \begin{cases} \sqrt{x} & \text{for } x \le 1 \\ x^2 & \text{for } x > 1 \end{cases}$, $x_0 = 1$;
(c) $f(x) = |x|$, $x_0 = 0$;
(d) $f(x) = \begin{cases} x & \text{for } x < 1 \\ x + 1 & \text{for } x \ge 1 \end{cases}$, $x_0 = 1$.

4.2 Find the following limits if they exist:
(a) $\lim\limits_{x \to 2} \dfrac{x^3 - 3x^2 + 2x}{x - 2}$; (b) $\lim\limits_{x \to 2} \dfrac{x^3 - 3x^2}{x - 2}$; (c) $\lim\limits_{x \to 2} \dfrac{x^3 - 3x^2}{(x - 2)^2}$.
What type of discontinuity is at point $x_0 = 2$?

4.3 Check the continuity of function $f : D_f \to \mathbb{R}$ at point $x = x_0$ and define the type of discontinuity:
(a) $f(x) = \dfrac{\sqrt{x + 7} - 3}{x - 2}$, $x_0 = 2$;
(b) $f(x) = |x - 1|$, $x_0 = 1$;
(c) $f(x) = \begin{cases} e^{x-1} & \text{for } x < 1 \\ 2x & \text{for } x \ge 1 \end{cases}$, $x_0 = 1$;
(d) $f(x) = e^{1/(x-1)}$, $x_0 = 1$.

4.4 Are the following functions $f : D_f \to \mathbb{R}$ differentiable at points $x = x_0$ and $x = x_1$, respectively?
(a) $f(x) = |x - 5| + 6x$, $x_0 = 5$, $x_1 = 0$;
(b) $f(x) = \begin{cases} \cos x & \text{for } x < 0 \\ 1 + x^2 & \text{for } 0 \le x \le 2 \\ 2x + 1 & \text{for } x > 2 \end{cases}$, $x_0 = 0$, $x_1 = 2$.

4.5 Find the derivative of each of the following functions $f : D_f \to \mathbb{R}$ with:
(a) $y = 2x^3 - 5x - 3 \sin x + \sin(\pi/8)$;
(b) $y = (x^4 + 4x) \sin x$;
(c) $y = \dfrac{x^2 - \cos x}{2 + \sin x}$;
(d) $y = (2x^3 - 3x + \ln x)^4$;
(e) $y = \cos\left( (x^3 + 3x^2 - 8)^4 \right)$;
(f) $y = \cos^4 (x^3 + 3x^2 - 8)$;
(g) $y = \sin(e^x)$;
(h) $y = \ln \sqrt{x^2 + 1}$.

4.6 Find and simplify the derivative of each of the following functions $f : D_f \to \mathbb{R}$ with:
(a) $f(x) = (\tan x - 1) \cos x$;
(b) $f(x) = \ln \sqrt{\dfrac{1 + \sin x}{1 - \sin x}}$;
(c) $f(x) = (1 + \sqrt[3]{x})^2$.

4.7 Find the derivatives of the following functions $f : D_f \to \mathbb{R}$ using logarithmic differentiation:
(a) $f(x) = (\tan x)^x$, $D_f = (0, \pi/2)$;
(b) $f(x) = \sin\left( x^{x-1} \right)$, $D_f = (1, \infty)$;
(c) $f(x) = \dfrac{(x + 2)\sqrt{x - 1}}{x^3 (x - 2)^2}$.

4.8 Find the third derivatives of the following functions $f : D_f \to \mathbb{R}$ with:
(a) $f(x) = x^2 \sin x$;
(b) $f(x) = \ln(x^2)$;
(c) $f(x) = \dfrac{2x^2}{(x - 2)^3}$;
(d) $f(x) = (x + 1)e^x$.

4.9 Given the total cost function $C : \mathbb{R}_+ \to \mathbb{R}$ with $C(x) = 4x^3 - 2x^2 + 4x + 100$, where $C(x)$ denotes the total cost in dependence on the output $x$. Find the marginal cost at $x_0 = 2$. Compute the change in total cost resulting from an increase in the output from 2 to 2.5 units. Give an approximation of the exact value by using the differential.

4.10 Let $D : D_D \to \mathbb{R}_+$ be a demand function, where $D(p)$ is the quantity demanded in dependence on the price $p$. Given $D(p) = 320/p$, calculate approximately the change in quantity demanded if $p$ changes from 8 to 10 EUR.

4.11 Given the functions $f : D_f \to \mathbb{R}_+$ with $f(x) = 2e^{x/2}$ and $g : D_g \to \mathbb{R}_+$ with $g(x) = \sqrt[3]{x}$, find the proportional rates of change $\rho_f(x)$, $\rho_g(x)$ and the elasticities $\varepsilon_f(x)$ and $\varepsilon_g(x)$ and specify them for $x_0 = 1$ and $x_1 = 100$. For which values $x \in D_f$ are the functions $f$ and $g$ elastic? Give the percentage rate of change of the function value when value $x$ increases by one per cent.

4.12 Given the price–demand function
$$D = D(p) = 1{,}000 \, e^{-2(p-1)^2}$$
with demand $D > 0$ and price $p > 0$, find the (point) elasticity of demand $\varepsilon_D(p)$. Check at which prices the demand is elastic.

4.13 Find all local extrema, the global maximum and the global minimum of the following functions $f : D_f \to \mathbb{R}$ with $D_f \subseteq [-5, 5]$, where:
(a) $f(x) = x^4 - 3x^3 + x^2 - 5$;
(b) $f(x) = 4 - |x - 3|$;
(c) $f(x) = e^{-x^2/2}$;
(d) $f(x) = \dfrac{x^2}{x - 2}$;
(e) $f(x) = \dfrac{\sqrt{x}}{1 + \sqrt{x}}$.

4.14 Assume that function $f : D_f \to \mathbb{R}$ with $f(x) = a \ln x + bx^2 + x$ has local extrema at point $x_1 = 1$ and at point $x_2 = 2$. What can you conclude about the values of $a$ and $b$? Check whether they are relative maxima or minima.
4.15 Determine the following limits by Bernoulli–l'Hospital's rule:
(a) $\lim\limits_{x \to 0} \dfrac{\sin x}{x}$;
(b) $\lim\limits_{x \to 1} \dfrac{e^{2(x-1)} - x^2}{(x - 1)^2}$;
(c) $\lim\limits_{x \to \infty} (x^2)^{1/x}$;
(d) $\lim\limits_{x \to 0+0} x^x$;
(e) $\lim\limits_{x \to 0} \left( \dfrac{1}{x \sin x} - \dfrac{1}{x^2} \right)$;
(f) $\lim\limits_{x \to 0+0} x \cdot \ln \sin x$.

4.16 For the following functions $f : D_f \to \mathbb{R}$, determine and investigate domains, zeroes, discontinuities, monotonicity, extreme points and extreme values, convexity and concavity, inflection points and limits as $x$ tends to $\pm\infty$. Graph the functions $f$ with $f(x)$ given as follows:
(a) $f(x) = \dfrac{x^2 + 1}{(x - 2)^2}$;
(b) $f(x) = \dfrac{3x^2 - 4x}{-2x + x^2}$;
(c) $f(x) = \dfrac{x^4 + x^3}{x^3 - 2x^2 + x}$;
(d) $f(x) = e^{(x-1)^2/2}$;
(e) $f(x) = \ln \dfrac{x - 2}{x^2}$;
(f) $f(x) = \sqrt[3]{2x^2 - x^3}$.

4.17 Expand the following functions $f : D_f \to \mathbb{R}$ into Taylor polynomials with corresponding remainder:
(a) $f(x) = \sin \dfrac{\pi x}{4}$, $x_0 = 2$, $n = 5$;
(b) $f(x) = \ln(x + 1)$, $x_0 = 0$, $n \in \mathbb{N}$;
(c) $f(x) = e^{-x} \sin 2x$, $x_0 = 0$, $n = 4$.

4.18 Calculate $1/\sqrt[5]{e}$ by using Taylor polynomials for function $f : D_f \to \mathbb{R}$ with $f(x) = e^x$. The error should be less than $10^{-6}$.

4.19 Determine the zero $x$ of function $f : D_f \to \mathbb{R}$ with $f(x) = x^3 - 6x + 2$ and $0 \le x \le 1$ exactly to four decimal places. Use (a) Newton's method and (b) regula falsi.

4.20 Find the zero $x$ of function $f : D_f \to \mathbb{R}$ with $f(x) = x - \ln x - 3$ and $x > 1$. Determine the value with an error less than $10^{-5}$ and use Newton's method.

5 Integration

In differential calculus we have determined the derivative $f'$ of a function $f$. In many applications, a function $f$ is given and we are looking for a function $F$ whose derivative corresponds to function $f$. For instance, assume that a marginal cost function $C'$ is given, i.e. it is known how cost changes according to the produced quantity $x$, and we are looking for the corresponding cost function $C$. Such a function $C$ can be found by integration, which is the reverse process of differentiation.
Another application of integration might be to determine the area enclosed by the graphs of specific functions. In this chapter, we discuss basic integration methods in detail.

5.1 INDEFINITE INTEGRALS

We start with the definition of an antiderivative of a function $f$.

Definition 5.1 A function $F : D_F \to \mathbb{R}$ differentiable on an interval $I \subseteq D_F$ is called an antiderivative of the function $f : D_f = D_F \to \mathbb{R}$ if
$$F'(x) = f(x) \qquad \text{for all } x \in D_F.$$

Obviously, the antiderivative is not uniquely determined, since we can add any constant, the derivative of which is always equal to zero. In particular, we get the following theorem.

THEOREM 5.1 If function $F : D_F \to \mathbb{R}$ is any antiderivative of function $f : D_f \to \mathbb{R}$, then all the antiderivatives $F^*$ of function $f$ are of the form
$$F^*(x) = F(x) + C,$$
where $C \in \mathbb{R}$ is any constant.

By means of the antiderivative, we can now introduce the notion of the indefinite integral, which is as follows.

Definition 5.2 Let function $F : D_F \to \mathbb{R}$ be an antiderivative of function $f$. The indefinite integral of function $f$, denoted by $\int f(x)\,dx$, is defined as
$$\int f(x)\,dx = F(x) + C,$$
where $C \in \mathbb{R}$ is any constant.

Function $f$ is also called the integrand, and as we see from the above definition, the indefinite integral of function $f$ gives the infinitely many antiderivatives of the integrand $f$. The notation $dx$ in the integral indicates that $x$ is the variable of integration, and $C$ denotes the integration constant. The relationship between differentiation and integration can be seen from the following two formulas. We have
$$\frac{d}{dx} \int f(x)\,dx = \frac{d}{dx} \left[ F(x) + C \right] = f(x)$$
and
$$\int F'(x)\,dx = \int f(x)\,dx = F(x) + C.$$
The first formula says that, if we differentiate the obtained antiderivative, we again obtain the integrand of the indefinite integral. In this way, one can easily check whether the indefinite integral has been found correctly.
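This differentiate-to-check idea can also be applied numerically when no symbolic derivative is at hand (a sketch with our own helper name; the step size and tolerance are illustrative). A central difference of a candidate antiderivative should reproduce the integrand, regardless of the integration constant:

```python
import math

# Check an antiderivative by differentiating it numerically: F(x) = -cos(x) + C
# should reproduce f(x) = sin(x) for any constant C.
def num_deriv(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

F = lambda x: -math.cos(x) + 7.0   # antiderivative with integration constant C = 7
f = math.sin

for x in (0.3, 1.0, 2.5):
    assert abs(num_deriv(F, x) - f(x)) < 1e-7
```

The constant $C = 7$ drops out of the difference quotient, illustrating Theorem 5.1.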
Conversely, if we differentiate function $F$ and then find the corresponding indefinite integral with the integrand $F'$, the result differs from function $F$ only by some constant $C$.

5.2 INTEGRATION FORMULAS AND METHODS

5.2.1 Basic indefinite integrals and rules

From the considerations about differential calculus in Chapter 4, we are already able to present some antiderivatives. Their validity can be easily checked by differentiating the right-hand side, where we must obtain the integrand of the corresponding left-hand side integral. Unfortunately, it is not possible to find analytically an antiderivative for any given function. Determining an antiderivative for a given function is often much harder than differentiating the function.

Some indefinite integrals:
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \qquad (n \in \mathbb{Z},\ n \ne -1)$$
$$\int x^r\,dx = \frac{x^{r+1}}{r+1} + C \qquad (r \in \mathbb{R},\ r \ne -1,\ x > 0)$$
$$\int \frac{1}{x}\,dx = \ln|x| + C \qquad (x \ne 0)$$
$$\int e^x\,dx = e^x + C$$
$$\int a^x\,dx = \frac{a^x}{\ln a} + C \qquad (a > 0,\ a \ne 1)$$
$$\int \sin x\,dx = -\cos x + C$$
$$\int \cos x\,dx = \sin x + C$$
$$\int \frac{dx}{\cos^2 x} = \tan x + C \qquad \left( x \ne \frac{\pi}{2} + k\pi,\ k \in \mathbb{Z} \right)$$
$$\int \frac{dx}{\sin^2 x} = -\cot x + C \qquad (x \ne k\pi,\ k \in \mathbb{Z})$$
$$\int \frac{dx}{\sqrt{1 - x^2}} = \arcsin x + C$$
$$\int \frac{dx}{1 + x^2} = \arctan x + C$$

Next, we give two basic rules for indefinite integrals:
(1) $\int k \cdot f(x)\,dx = k \cdot \int f(x)\,dx$ (constant-factor rule);
(2) $\int [f(x) \pm g(x)]\,dx = \int f(x)\,dx \pm \int g(x)\,dx$ (sum–difference rule).

Rule (1) says that we can write a constant factor in front of the integral, and rule (2) says that, if the integrand is the sum (or difference) of two functions, we can determine the indefinite integral as the sum (or difference) of the corresponding two integrals. Using the given list of indefinite integrals and the two rules above, we are now able to find indefinite integrals for some simple functions.

Example 5.1 Given is the integral
$$I = \int \left( 2x^3 + 3^x - 2 \sin x \right) dx.$$
Applying the rules for indefinite integrals, we can split the integral into several integrals and solve each of them by using the list of indefinite integrals given above.
We obtain:
$$I = 2 \int x^3\,dx + \int 3^x\,dx - 2 \int \sin x\,dx = 2 \cdot \frac{x^4}{4} + \frac{3^x}{\ln 3} - 2(-\cos x) + C = \frac{x^4}{2} + \frac{3^x}{\ln 3} + 2\cos x + C.$$

Example 5.2 We wish to find
$$I = \int \frac{\sqrt{x} - 2x^2}{\sqrt[3]{x}}\,dx.$$
Using power rules and the indefinite integral for a power function as integrand, we can transform the given integral as follows:
$$I = \int \left( \frac{x^{1/2}}{x^{1/3}} - 2 \cdot \frac{x^2}{x^{1/3}} \right) dx = \int \left( x^{1/6} - 2x^{5/3} \right) dx = \frac{x^{7/6}}{\frac{7}{6}} - 2 \cdot \frac{x^{8/3}}{\frac{8}{3}} + C = \frac{6}{7} \cdot \sqrt[6]{x^7} - \frac{3}{4} \cdot \sqrt[3]{x^8} + C.$$

The antiderivatives and rules presented so far are sufficient only for a few specific indefinite integrals. Hence, it is necessary to consider some more general integration methods which allow us to find broader classes of indefinite integrals. One of these methods is integration by substitution.

5.2.2 Integration by substitution

The aim of this method is to transform a given integral in such a way that the resulting integral can be easily found. This is done by introducing a new variable $t$ by means of an appropriate substitution $t = g(x)$ or $x = g^{-1}(t)$. The integration method by substitution results from the chain rule of differential calculus.

THEOREM 5.2 Suppose that function $f : D_f \to \mathbb{R}$ has an antiderivative $F$ and function $g : D_g \to \mathbb{R}$ with $R_g \subseteq D_f$ is continuously differentiable on an open interval $(a, b) \subseteq D_g$. Then function $z = f \circ g$ exists with $z = (f \circ g)(x) = f(g(x))$, and setting $t = g(x)$, we obtain
$$\int f(g(x)) \cdot g'(x)\,dx = \int f(t)\,dt = F(t) + C = F(g(x)) + C.$$

The symbol $\circ$ stands for the composition of functions as introduced in Chapter 3. Theorem 5.2 states that, if the integrand is the product of a composite function $f \circ g$ and the derivative $g'$ of the inside function, then the antiderivative is given by the composite function $F \circ g$, where function $F$ is an antiderivative of function $f$. The validity of Theorem 5.2 can be easily proved by differentiating the composite function $F \circ g$ (using the chain rule).
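The identity of Theorem 5.2 can be spot-checked numerically for a concrete pair of functions (a sketch with our own example choice, not one from the text): with $f(t) = \cos t$, $F(t) = \sin t$ and $g(x) = x^2$, the derivative of $F(g(x)) = \sin(x^2)$ should equal $f(g(x)) \cdot g'(x) = \cos(x^2) \cdot 2x$.

```python
import math

# Spot check of Theorem 5.2 with f = cos, F = sin, g(x) = x^2.
def integrand(x):
    return math.cos(x * x) * 2 * x   # f(g(x)) * g'(x)

def antiderivative(x):
    return math.sin(x * x)           # F(g(x))

h = 1e-6
for x in (0.2, 0.7, 1.3):
    central = (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)
    assert abs(central - integrand(x)) < 1e-6
```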
Before considering several examples, we give some special cases of Theorem 5.2 with specific inside functions $g$:

(1) Let $g(x) = ax + b$. In this case, we get $g'(x) = a$, and Theorem 5.2 reads as
$$\int f(ax + b)\,dx = \frac{1}{a} \int f(ax + b) \cdot a\,dx = \frac{1}{a} \int f(t)\,dt = \frac{1}{a} \cdot F(t) + C = \frac{1}{a} \cdot F(ax + b) + C.$$
Function $g$ describes a linear substitution.

(2) Let $f(g(x)) = [g(x)]^n$. Then Theorem 5.2 turns into
$$\int [g(x)]^n \cdot g'(x)\,dx = \frac{1}{n+1} \cdot [g(x)]^{n+1} + C.$$

(3) Let $f(g(x)) = 1/[g(x)]$. Then Theorem 5.2 reads as
$$\int \frac{g'(x)}{g(x)}\,dx = \ln|g(x)| + C.$$

(4) Let $f(g(x)) = e^{g(x)}$. Then Theorem 5.2 corresponds to the equality
$$\int e^{g(x)} \cdot g'(x)\,dx = e^{g(x)} + C.$$

We illustrate the above method by the following examples.

Example 5.3 Let us find
$$\int (3x + 2)^4\,dx.$$
This integral is of type (1) above, i.e. we have $g(x) = 3x + 2$ and $f(t) = t^4$. Using $g'(x) = 3$, we obtain
$$\int (3x + 2)^4\,dx = \frac{1}{3} \int (3x + 2)^4 \cdot 3\,dx = \frac{1}{3} \int t^4\,dt = \frac{1}{3} \cdot \frac{t^5}{5} + C = \frac{1}{15} \cdot (3x + 2)^5 + C.$$

Example 5.4 Consider the integral
$$\int 5x^2 e^{x^3}\,dx.$$
Setting $t = g(x) = x^3$, we obtain
$$\frac{dg}{dx} = g'(x) = 3x^2.$$
The application of Theorem 5.2 yields (see also special case (4) above with $g(x) = x^3$)
$$\int 5x^2 e^{x^3}\,dx = \frac{5}{3} \int 3x^2 e^{x^3}\,dx = \frac{5}{3} \int e^t\,dt = \frac{5}{3}\,e^t + C = \frac{5}{3}\,e^{x^3} + C.$$

Example 5.5 We want to find the integral
$$\int \frac{5x + 8}{3x^2 + 1}\,dx.$$
The first step is to split the above integral into two integrals:
$$\int \frac{5x + 8}{3x^2 + 1}\,dx = \int \frac{5x}{3x^2 + 1}\,dx + \int \frac{8}{3x^2 + 1}\,dx.$$
We next discuss how we can find both integrals by substitution. The first integral is very similar to case (3) of Theorem 5.2. If we had $6x$ in the numerator, we could immediately apply this case. So, we rewrite the numerator as $5x = \frac{5}{6} \cdot 6x$ and obtain
$$\int \frac{5x}{3x^2 + 1}\,dx = \frac{5}{6} \int \frac{6x}{3x^2 + 1}\,dx = \frac{5}{6} \ln(3x^2 + 1) + C_1.$$
The second integral is similar to one of the basic integrals given in Chapter 5.2.1. If the integrand were $1/(1 + x^2)$, then we would obtain from this list of indefinite integrals the antiderivative $\arctan x$.
Thus we transform the second integral as follows:
$$\int \frac{8}{3x^2 + 1}\,dx = \int \frac{8}{(\sqrt{3} \cdot x)^2 + 1}\,dx = \frac{8}{\sqrt{3}} \int \frac{1}{t^2 + 1}\,dt.$$
In the latter integral, we have applied the linear substitution $t = \sqrt{3} \cdot x$ (or equivalently, we apply immediately the formula given for integrals of type (1) after Theorem 5.2). Now the indefinite integral can be found immediately from the list of given integrals:
$$\frac{8}{\sqrt{3}} \int \frac{dt}{1 + t^2} = \frac{8}{\sqrt{3}} \arctan t + C_2 = \frac{8}{\sqrt{3}} \arctan\left( \sqrt{3} \cdot x \right) + C_2.$$
Combining both results and rationalizing all denominators, we obtain
$$\int \frac{5x + 8}{3x^2 + 1}\,dx = \frac{5}{6} \ln(3x^2 + 1) + \frac{8\sqrt{3}}{3} \arctan\left( \sqrt{3} \cdot x \right) + C.$$

Sometimes we do not see immediately that the integrand is of the type $f(g(x)) \cdot g'(x)$ (or can be easily transformed into this form). If we try to apply some substitution $t = g(x)$ and if, by using the differential $dt = g'(x)\,dx$, it is possible to replace all terms in the original integrand by some terms depending on the new variable $t$, then we can successfully apply integration by substitution. We illustrate this by the following examples.

Example 5.6 We determine
$$\int \sqrt{\frac{e^{2x}}{1 - e^{2x}}}\,dx$$
and apply the substitution $t = e^x$. By differentiation, we get
$$\frac{dt}{dx} = e^x,$$
which can be rewritten as
$$dx = \frac{dt}{e^x} = \frac{dt}{t}.$$
Using the substitution and the latter equality, we can replace all terms depending on $x$ in the original integral by some term depending on $t$, and we obtain
$$\int \sqrt{\frac{e^{2x}}{1 - e^{2x}}}\,dx = \int \sqrt{\frac{t^2}{1 - t^2}} \cdot \frac{dt}{t} = \int \frac{t}{\sqrt{1 - t^2}} \cdot \frac{dt}{t} = \int \frac{dt}{\sqrt{1 - t^2}} = \arcsin t + C.$$
The latter indefinite integral has been taken from the list in Chapter 5.2.1. Substituting back again, we get
$$\int \sqrt{\frac{e^{2x}}{1 - e^{2x}}}\,dx = \arcsin e^x + C.$$

Example 5.7 Consider the integral
$$\int \frac{dx}{x \sqrt{x^2 - 9}}.$$
We apply the substitution $t = \sqrt{x^2 - 9}$ and obtain by differentiation
$$\frac{dt}{dx} = \frac{x}{\sqrt{x^2 - 9}},$$
which yields
$$\frac{dt}{x} = \frac{dx}{\sqrt{x^2 - 9}}.$$
Replacing $dx/\sqrt{x^2 - 9}$ in the integral, we would still have $x^2$ in the denominator.
In order to apply integration by substitution, we must be able also to replace this term by some term in the variable $t$. Using the above substitution again, we can replace $x^2$ in the denominator by solving equation $t^2 = x^2 - 9$ for $x^2$, which yields $x^2 = t^2 + 9$. Hence we obtain
$$\int \frac{dx}{x \sqrt{x^2 - 9}} = \int \frac{dt}{x^2} = \int \frac{dt}{t^2 + 9} = \frac{1}{9} \int \frac{dt}{\left( \frac{t}{3} \right)^2 + 1} = \frac{1}{3} \arctan \frac{t}{3} + C = \frac{1}{3} \arctan \frac{\sqrt{x^2 - 9}}{3} + C.$$
Notice that in the last step, when determining the indefinite integral, we have again applied a substitution, namely $z = t/3$ (or equivalently, type (1) of the integrals after Theorem 5.2 has been used).

Example 5.8 Let us consider the integral
$$\int \frac{dx}{\sin x}.$$
In this case, we can apply the substitution
$$\tan \frac{x}{2} = t,$$
which can always be applied when the integrand is a rational function of the trigonometric functions $\sin x$ and $\cos x$. Solving the above substitution for $x$ yields
$$x = 2 \arctan t.$$
By differentiation, we obtain
$$\frac{dx}{dt} = \frac{2}{1 + t^2}.$$
Now, we still have to replace $\sin x$ by some function depending only on variable $t$. This can be done by using the addition theorems for the sine function (see property (1) of trigonometric functions in Chapter 3.3.3 and its special form given by formulas (3.8)). We have
$$\sin x = 2 \cdot \sin \frac{x}{2} \cdot \cos \frac{x}{2} = \frac{2 \sin \frac{x}{2} \cdot \cos \frac{x}{2}}{\sin^2 \frac{x}{2} + \cos^2 \frac{x}{2}} = \frac{2 \tan \frac{x}{2}}{1 + \tan^2 \frac{x}{2}} = \frac{2t}{1 + t^2}.$$
To get the second to last fraction above, we have divided both the numerator and the denominator by $\cos^2(x/2)$. Then, we get for the given integral
$$\int \frac{dx}{\sin x} = \int \frac{1 + t^2}{2t} \cdot \frac{2\,dt}{1 + t^2} = \int \frac{dt}{t} = \ln|t| + C = \ln \left| \tan \frac{x}{2} \right| + C.$$

5.2.3 Integration by parts

Another general integration method is integration by parts. The formula for this method is obtained from the formula for the differentiation of a product of two functions $u$ and $v$:
$$[u(x) \cdot v(x)]' = u'(x) \cdot v(x) + u(x) \cdot v'(x).$$
Integrating now both sides of the above equation leads to the following theorem, which gives us the formula for integration by parts.
THEOREM 5.3 Let $u : D_u \to \mathbb{R}$ and $v : D_v \to \mathbb{R}$ be two functions differentiable on some open interval $I = (a, b) \subseteq D_u \cap D_v$. Then:
$$\int u(x) \cdot v'(x)\,dx = u(x) \cdot v(x) - \int u'(x) \cdot v(x)\,dx.$$

The application of integration by parts requires that we can find an antiderivative of function $v'$ and an antiderivative of function $u' \cdot v$. If we are looking for an antiderivative of a product of two functions, the successful use of integration by parts depends on an appropriate choice of functions $u$ and $v'$. Integration by parts can, for instance, be applied to the following types of integrals:
(1) $\int P_n(x) \cdot \ln x\,dx$,
(2) $\int P_n(x) \cdot \sin ax\,dx$,
(3) $\int P_n(x) \cdot \cos ax\,dx$,
(4) $\int P_n(x) \cdot e^{ax}\,dx$,
where $P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$ is a polynomial of degree $n$. In most cases above, polynomial $P_n$ is taken as function $u$, which has to be differentiated within the application of Theorem 5.3. (As a consequence, the derivative $u'$ is a polynomial of smaller degree.) However, in case (1) it is usually preferable to take $P_n$ as function $v'$, which has to be integrated within the application of integration by parts (so that the logarithmic function is differentiated). We illustrate integration by parts by some examples.

Example 5.9 Let us find
$$\int (x^2 + 2) \sin x\,dx.$$
This is an integral of type (2) above, and we set
$$u(x) = x^2 + 2 \qquad \text{and} \qquad v'(x) = \sin x.$$
Now we obtain
$$u'(x) = 2x \qquad \text{and} \qquad v(x) = -\cos x.$$
Hence,
$$\int (x^2 + 2) \sin x\,dx = -(x^2 + 2) \cos x - \int (-2x \cos x)\,dx = -(x^2 + 2) \cos x + 2 \int x \cos x\,dx.$$
If we now apply integration by parts again to the latter integral with
$$u(x) = x \qquad \text{and} \qquad v'(x) = \cos x,$$
we get
$$u'(x) = 1 \qquad \text{and} \qquad v(x) = \sin x.$$
This yields
$$\int (x^2 + 2) \sin x\,dx = -(x^2 + 2) \cos x + 2 \left( x \sin x - \int \sin x\,dx \right) = -(x^2 + 2) \cos x + 2x \sin x + 2 \cos x + C = 2x \sin x - x^2 \cos x + C.$$
Notice that the integration constant $C$ has to be written as soon as no further integral appears on the right-hand side.
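The result of Example 5.9 can be verified by the differentiation check described in Section 5.1 (a numerical sketch; step size and tolerance are our own choices):

```python
import math

# Check of Example 5.9: d/dx [2x sin x - x^2 cos x] should equal (x^2 + 2) sin x.
def F(x):
    return 2 * x * math.sin(x) - x * x * math.cos(x)

def f(x):
    return (x * x + 2) * math.sin(x)

h = 1e-6
for x in (0.5, 1.2, 2.0):
    central = (F(x + h) - F(x - h)) / (2 * h)   # numerical derivative of F
    assert abs(central - f(x)) < 1e-6
```

Symbolically, $\frac{d}{dx}\left[2x \sin x - x^2 \cos x\right] = 2\sin x + 2x\cos x - 2x\cos x + x^2 \sin x = (x^2 + 2)\sin x$, confirming the antiderivative.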
Example 5.10 Let us determine

$$\int \ln x\,dx.$$

Although the integrand $f(x) = \ln x$ is here not written as a product of two functions, we can nevertheless apply integration by parts by introducing a factor one, i.e. we set

$$u(x) = \ln x \quad\text{and}\quad v'(x) = 1.$$

Then we obtain

$$u'(x) = \frac{1}{x} \quad\text{and}\quad v(x) = x,$$

which leads to

$$\int \ln x\,dx = x\ln x - \int dx = x(\ln x - 1) + C.$$

Example 5.11 We determine

$$\int \sin^2 x\,dx.$$

In this case, we set $u(x) = \sin x$ and $v'(x) = \sin x$, and with $u'(x) = \cos x$ and $v(x) = -\cos x$ we obtain by applying integration by parts

$$\int \sin^2 x\,dx = -\sin x\cos x + \int \cos^2 x\,dx.$$

Now one could again apply integration by parts to the integral on the right-hand side. However, doing this we only obtain the identity $\int \sin^2 x\,dx = \int \sin^2 x\,dx$, which does not yield a solution to the problem. Instead we use the equality $\cos^2 x = 1 - \sin^2 x$ (see property (8) of trigonometric functions in Chapter 3.3.3), and the above integral can be rewritten as follows:

$$\int \sin^2 x\,dx = -\sin x\cos x + \int (1-\sin^2 x)\,dx = -\sin x\cos x + x - \int \sin^2 x\,dx.$$

Now we can add $\int \sin^2 x\,dx$ to both sides, divide the resulting equation by two, introduce the integration constant $C$ and obtain

$$\int \sin^2 x\,dx = \frac{1}{2}(x - \sin x\cos x) + C.$$

Often one has to combine both discussed integration methods. Consider the following two examples.

Example 5.12 We determine

$$\int \frac{x}{\sin^2 x}\,dx.$$

Although the integrand does not have one of the special forms (1) to (4) given after Theorem 5.3, the application of integration by parts is worthwhile. Setting

$$u(x) = x \quad\text{and}\quad v'(x) = \frac{1}{\sin^2 x},$$

we get

$$u'(x) = 1 \quad\text{and}\quad v(x) = -\cot x.$$

Function $v$ is obtained from the list of indefinite integrals given in Chapter 5.2.1. This leads to

$$\int \frac{x}{\sin^2 x}\,dx = -x\cot x + \int \cot x\,dx.$$

It remains to find the integral on the right-hand side. Using

$$\cot x = \frac{\cos x}{\sin x},$$

we can apply integration by substitution.
Setting $g(x) = \sin x$, the integrand takes the form $g'(x)/g(x)$ (see case (3) after Theorem 5.2), and so we obtain

$$\int \frac{x}{\sin^2 x}\,dx = -x\cot x + \ln|\sin x| + C.$$

Example 5.13 We find

$$\int \sin\sqrt{x}\,dx.$$

First, we apply integration by substitution and set $t = \sqrt{x}$. This gives

$$\frac{dt}{dx} = \frac{1}{2\sqrt{x}} = \frac{1}{2t},$$

which can be rewritten as $2t\,dt = dx$. Replacing $\sqrt{x}$ and $dx$, we get

$$\int \sin\sqrt{x}\,dx = 2\int t\sin t\,dt.$$

The substitution has been successfully applied, since in the right-hand side integral only terms depending on the variable $t$ and $dt$ occur. This is an integral of type (2) given after Theorem 5.3, and we apply integration by parts. This yields

$$u(t) = t \quad\text{and}\quad v'(t) = \sin t$$

and

$$u'(t) = 1 \quad\text{and}\quad v(t) = -\cos t.$$

Thus,

$$2\int t\sin t\,dt = 2\left(-t\cos t + \int \cos t\,dt\right) = 2(-t\cos t + \sin t) + C.$$

After substituting back, we have the final result

$$\int \sin\sqrt{x}\,dx = 2\left(-\sqrt{x}\cos\sqrt{x} + \sin\sqrt{x}\right) + C.$$

5.3 THE DEFINITE INTEGRAL

In this section, we start with consideration of the following problem. Given is a function $f$ with $y = f(x) \ge 0$ for all $x \in [a,b] \subseteq D_f$. How can we compute the area $A$ under the graph of function $f$ from $a$ to $b$, assuming that function $f$ is continuous on the closed interval $[a,b]$? To derive a formula for answering this question, we subdivide $[a,b]$ into $n$ subintervals of equal length by choosing points

$$a = x_0 < x_1 < x_2 < \ldots < x_{n-1} < x_n = b.$$

Let $l_i$ (resp. $u_i$) be the point of the closed interval $[x_{i-1}, x_i]$ where function $f$ takes its minimum (maximum) value, i.e.

$$f(l_i) = \min\{f(x) \mid x \in [x_{i-1}, x_i]\}, \qquad f(u_i) = \max\{f(x) \mid x \in [x_{i-1}, x_i]\},$$

and let $\Delta x_i = x_i - x_{i-1}$. Note that for a continuous function, the existence of the function values $f(l_i)$ and $f(u_i)$ in the interval $[x_{i-1}, x_i]$ follows from Theorem 4.5. Then we can give a lower bound $A^n_{\min}$ and an upper bound $A^n_{\max}$, in dependence on the number $n$, for the area $A$ as follows (see Figure 5.1):

$$A^n_{\min} = \sum_{i=1}^n f(l_i)\cdot \Delta x_i, \qquad A^n_{\max} = \sum_{i=1}^n f(u_i)\cdot \Delta x_i.$$
Figure 5.1 The definition of the definite integral.

Since for each $x \in [x_{i-1}, x_i]$ we have $f(l_i) \le f(x) \le f(u_i)$, $i \in \{1, 2, \ldots, n\}$, the inequalities

$$A^n_{\min} \le A \le A^n_{\max}$$

hold. We observe that, if $n$ increases (i.e. the lengths of the intervals become smaller), the lower and upper bounds for the area improve. Therefore, we consider the limits of both bounds as the number of intervals tends to $\infty$ (or equivalently, the lengths of the intervals tend to zero). If both limits of the sequences $\{A^n_{\min}\}$ and $\{A^n_{\max}\}$ as $n$ tends to $\infty$ exist and are equal, we say that the definite integral of function $f$ over the interval $[a,b]$ exists. Formally we can summarize this in the following definition.

Definition 5.3 Let function $f: D_f \to \mathbb{R}$ be continuous on the closed interval $[a,b] \subseteq D_f$. If the limits of the sequences $\{A^n_{\min}\}$ and $\{A^n_{\max}\}$ as $n$ tends to $\infty$ exist and coincide, i.e.

$$\lim_{n\to\infty} A^n_{\min} = \lim_{n\to\infty} A^n_{\max} = I,$$

then $I$ is called the definite (Riemann) integral of function $f$ over the closed interval $[a,b] \subseteq D_f$. We write

$$I = \int_a^b f(x)\,dx$$

for the definite integral of function $f$ over the interval $[a,b]$. The numbers $a$ and $b$, respectively, are denoted as the lower and upper limits of integration.

The Riemann integral is named after the German mathematician Riemann. We can state the following property.

THEOREM 5.4 Let function $f: D_f \to \mathbb{R}$ be continuous on the closed interval $[a,b] \subseteq D_f$. Then the definite integral of function $f$ over the interval $[a,b]$ exists.

We only mention that there exist further classes of functions for which the definite integral can be appropriately defined, e.g. functions that are bounded on the closed interval $[a,b]$ and have at most a finite number of discontinuities in $[a,b]$. Note that the evaluation of a definite integral according to Definition 5.3 may be rather complicated or even impossible. There is a much easier way by means of an antiderivative of function $f$, which is presented in the following theorem.
THEOREM 5.5 (Newton–Leibniz's formula) Let function $f: D_f \to \mathbb{R}$ be continuous on the closed interval $[a,b] \subseteq D_f$ and let function $F$ be an antiderivative of $f$. Then the definite integral of function $f$ over $[a,b]$ is given by the change in the antiderivative between $x = a$ and $x = b$:

$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a).$$

From Theorem 5.5 we see that also for a definite integral, the main difficulty is to find an antiderivative of the integrand $f$. Therefore, we again have to apply one of the methods presented in Chapter 5.2 for finding an antiderivative. In the following, we give some properties of the definite integral which can immediately be derived from the definition of the definite integral.

Properties of the definite integral

(1) $\int_a^a f(x)\,dx = 0$;
(2) $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$;
(3) $\int_a^b k\cdot f(x)\,dx = k\cdot \int_a^b f(x)\,dx \quad (k \in \mathbb{R})$;
(4) $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \quad (a \le c \le b)$;
(5) $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.

Moreover, the following property holds:

$$\int_a^t f(x)\,dx = F(x)\Big|_a^t = F(t) - F(a) = G(t).$$

Using the latter formulation, we obtain

$$\frac{d}{dt}\int_a^t f(x)\,dx = G'(t) = \frac{d}{dt}\,[F(t) + C] = f(t).$$

The first property expresses that the definite integral with a variable upper limit can be considered as a function $G$ depending on this limit. The latter property states that the derivative of the definite integral with respect to the upper limit of integration is equal to the integrand as a function evaluated at that limit.

THEOREM 5.6 Let functions $f: D_f \to \mathbb{R}$ and $g: D_g \to \mathbb{R}$ be continuous on the closed interval $[a,b] \subseteq D_f \cap D_g$ with $f(x) \le g(x)$ for $x \in [a,b]$. Then

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

Theorem 5.6 is also known as the monotonicity property of the definite integral. In the derivation of the definite integral we have considered the case when function $f$ is non-negative in the closed interval $[a,b] \subseteq D_f$.
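Definition 5.3 and Theorem 5.5 can be compared numerically. The following Python sketch (mine, not from the text) computes lower and upper sums for $f(x) = x^2$ on $[0,1]$, where $F(x) = x^3/3$ is an antiderivative, so the exact value is $F(1) - F(0) = 1/3$; since $f$ is increasing there, the minimum on each subinterval is at the left endpoint and the maximum at the right endpoint:

```python
def bounds(f, a, b, n):
    h = (b - a) / n
    # f increasing on [a, b]: minimum f(l_i) at left, maximum f(u_i) at right endpoint
    lower = h * sum(f(a + i * h) for i in range(n))
    upper = h * sum(f(a + (i + 1) * h) for i in range(n))
    return lower, upper

exact = 1 ** 3 / 3 - 0 ** 3 / 3  # F(1) - F(0) = 1/3 by Newton-Leibniz
for n in (10, 1000):
    lo, up = bounds(lambda x: x * x, 0.0, 1.0, n)
    print(lo <= exact <= up, round(up - lo, 6))
```

As $n$ grows, both sums close in on $1/3$, as the definition demands.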
We have seen that the definite integral gives the area enclosed by the function $f$ and the $x$ axis between $x = a$ and $x = b$. In the case when function $f$ is negative in the closed interval $[a,b]$, it follows from Theorem 5.6 that the definite integral has a negative value (using a function $g$ identically equal to zero, the result follows from the monotonicity property). In this case, the definite integral gives the area enclosed by the function $f$ and the $x$ axis between $x = a$ and $x = b$ with a negative sign. As a consequence, if function $f$ has zeroes in the interval $[a,b]$, one has to split this interval into subintervals in order to get the area enclosed by the function and the $x$ axis between $x = a$ and $x = b$. We continue with some examples for evaluating definite integrals. The first example illustrates the above comment.

Example 5.14 We wish to determine the area $A$ enclosed by the function $f$ with $f(x) = \cos x$ and the $x$ axis between $a = 0$ and $b = \pi$. Function $f$ is non-negative in the closed interval $[0, \pi/2]$ and non-positive in the closed interval $[\pi/2, \pi]$. Therefore, by using property (4) and the above comment, the area $A$ is obtained as follows:

$$A = \int_0^{\pi/2} \cos x\,dx - \int_{\pi/2}^{\pi} \cos x\,dx = \sin x\Big|_0^{\pi/2} - \sin x\Big|_{\pi/2}^{\pi} = (1-0) - (0-1) = 2.$$

It is worth noting that we would get the value zero when evaluating the definite integral of the cosine function between $a = 0$ and $b = \pi$ (since the area between the cosine function and the $x$ axis in the first subinterval is equal to the area between the $x$ axis and the cosine function in the second subinterval).

Example 5.15 We evaluate the definite integral

$$\int_1^2 \frac{dx}{x(1+\ln x)}.$$

Applying integration by substitution, we set $t = 1 + \ln x$ and obtain by differentiation

$$\frac{dt}{dx} = \frac{1}{x}.$$

Inserting both terms into the integral, we get

$$\int_1^2 \frac{dx}{x(1+\ln x)} = \int_{t(1)}^{t(2)} \frac{dt}{t}.$$

Thus we obtain

$$\int_{t(1)}^{t(2)} \frac{dt}{t} = \ln|t|\Big|_{t(1)}^{t(2)} = \ln|1+\ln x|\Big|_1^2 = \ln(1+\ln 2) - \ln(1+\ln 1) = \ln(1+\ln 2).$$
In the above computations, we did not transform the limits of the definite integral into the corresponding $t$ values, but we have used the above substitution again after having found an antiderivative. Of course, we can also transform the limits of integration (in this case we get $t(1) = 1 + \ln 1 = 1$ and $t(2) = 1 + \ln 2$) and insert the obtained values directly into the obtained antiderivative $\ln|t|$.

Example 5.16 The marginal cost of a firm manufacturing a single product is given by

$$C'(x) = 6 - \frac{60}{x+1}, \qquad 0 \le x \le 1000,$$

where $x$ is the quantity produced, and the marginal cost is given in EUR. If the quantity produced changes from 300 to 400, the change in cost is obtained by Newton–Leibniz's formula as follows:

$$C(400) - C(300) = \int_{300}^{400} C'(x)\,dx = \int_{300}^{400} \left(6 - \frac{60}{x+1}\right) dx = (6x - 60\ln|x+1|)\Big|_{300}^{400} \approx (2{,}400 - 359.64) - (1{,}800 - 342.43) = 582.79 \text{ EUR}.$$

Thus, the cost increases by 582.79 EUR when production is increased from 300 units to 400 units.

Example 5.17 We want to compute the area enclosed by the graphs of the two functions $f_1: \mathbb{R} \to \mathbb{R}$ and $f_2: \mathbb{R} \to \mathbb{R}$ given by

$$f_1(x) = x^2 - 4 \quad\text{and}\quad f_2(x) = 2x - x^2.$$

We first determine the points of intersection of both functions and obtain from $x^2 - 4 = 2x - x^2$ the quadratic equation $x^2 - x - 2 = 0$, which has the two real solutions $x_1 = -1$ and $x_2 = 2$. The graphs of both functions are parabolas which intersect only in the two points $(-1, -3)$ and $(2, 0)$. To compute the enclosed area $A$, we therefore have to evaluate the definite integral

$$A = \int_{-1}^{2} [f_2(x) - f_1(x)]\,dx,$$

which yields

$$A = \int_{-1}^{2} \left[(2x - x^2) - (x^2 - 4)\right] dx = \int_{-1}^{2} (-2x^2 + 2x + 4)\,dx = 2\int_{-1}^{2} (-x^2 + x + 2)\,dx = 2\left(-\frac{x^3}{3} + \frac{x^2}{2} + 2x\right)\Big|_{-1}^{2} = 2\left[\left(-\frac{8}{3} + 2 + 4\right) - \left(\frac{1}{3} + \frac{1}{2} - 2\right)\right] = 9.$$

Thus, the area enclosed by the graphs of the given functions is equal to nine square units.
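Both results can be checked in a few lines of Python (a sketch of mine, not part of the text), evaluating the antiderivatives found in Examples 5.16 and 5.17 at the limits of integration:

```python
import math

# Example 5.16: cost change = (6x - 60 ln|x+1|) evaluated from 300 to 400
C = lambda x: 6 * x - 60 * math.log(x + 1)
delta_cost = C(400) - C(300)

# Example 5.17: area between f2 and f1 on [-1, 2] with the
# antiderivative 2 * (-x^3/3 + x^2/2 + 2x) of f2(x) - f1(x)
A = lambda x: 2 * (-x**3 / 3 + x**2 / 2 + 2 * x)
area = A(2) - A(-1)

print(round(delta_cost, 2), round(area, 6))  # 582.79 9.0
```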
THEOREM 5.7 (mean-value theorem for integrals) Let function $f: D_f \to \mathbb{R}$ be continuous on the closed interval $[a,b] \subseteq D_f$. Then there exists a real number $x^* \in [a,b]$ such that

$$M = f(x^*) = \frac{1}{b-a}\int_a^b f(x)\,dx.$$

This theorem is graphically illustrated for the case $f(x) \ge 0$ in Figure 5.2. That is, there is at least one value $x^* \in [a,b]$ such that the dashed area

$$I = \int_a^b f(x)\,dx$$

is equal to the area of the rectangle with side lengths $b - a$ and function value $f(x^*)$ (where the value $x^*$ must be suitably determined).

Figure 5.2 The mean-value theorem for integrals.

5.4 APPROXIMATION OF DEFINITE INTEGRALS

There are several reasons why it may not be possible to evaluate a definite integral. For some functions, there does not exist an antiderivative that can be determined analytically. As an example, we can mention here the function $f$ with $f(x) = e^{-x^2}$, which is often applied in probability theory and statistics, or the function $g$ with $g(x) = (\sin x)/x$. Sometimes it may be too time-consuming to determine an antiderivative, or function $f$ may be given only as a set of experimentally determined points $(x, y)$. In such cases, we want to determine the definite integral approximately by applying numerical methods. In the following, we present a few methods and give some comments on the precision of the approximate value obtained. The approximate methods divide the closed interval $[a,b]$ into $n$ subintervals of equal length $h = (b-a)/n$, and so we get the subintervals

$$[x_0, x_1],\ [x_1, x_2],\ \ldots,\ [x_{n-1}, x_n],$$

where $a = x_0$ and $b = x_n$. Within each interval, we now replace function $f$ by some other function which is 'close to the original one' and for which the integration can easily be performed. In all the methods below, we replace function $f$ by a polynomial of small degree.

Approximation by rectangles

In this case, we approximate function $f$ by a step function (in each interval a constant function is used), i.e.
we approximate the definite integral by the sum of the areas of rectangles. We obtain

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n}\cdot [f(x_0) + f(x_1) + \cdots + f(x_{n-1})] = I_R.$$

The above approximation is illustrated in Figure 5.3. This formula gives a value between the lower and the upper Riemann sum for a specific value of $n$. When applying approximation by rectangles, the error $\Delta I_R$ of the approximate value of the definite integral can be estimated as follows:

$$\Delta I_R = \left|\int_a^b f(x)\,dx - I_R\right| \le c\cdot \frac{(b-a)^2}{n}\cdot \max_{a\le x\le b} |f'(x)|,$$

where $c \in [0, 1)$.

Approximation by trapeziums

An alternative approximation is obtained when in each closed interval $[x_{i-1}, x_i]$ function $f$ is replaced by a line segment (i.e. a linear function) through the points $(x_{i-1}, f(x_{i-1}))$ and $(x_i, f(x_i))$. This situation is illustrated in Figure 5.4. In this way, we get the following approximation formula for the definite integral:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n}\cdot \left[\frac{f(a)+f(b)}{2} + f(x_1) + f(x_2) + \cdots + f(x_{n-1})\right] = I_{TR}.$$

By the above formula, we approximate the definite integral by the sum $I_{TR}$ of the areas of $n$ trapeziums. Assuming an equal number $n$ of subintervals, the approximation by trapeziums often gives better approximate values than the approximation by rectangles. This is particularly true when the absolute value of the first derivative in the closed interval $[a,b]$ can become large and thus we may have big differences between the function values in some of the subintervals.

Figure 5.3 Approximation by rectangles.
Figure 5.4 Approximation by trapeziums.

When determining the definite integral by the above approximation by trapeziums, we get the following estimate for the maximum error $\Delta I_{TR}$ of the approximate value $I_{TR}$:

$$\Delta I_{TR} = \left|\int_a^b f(x)\,dx - I_{TR}\right| \le \frac{(b-a)^3}{12n^2}\cdot \max_{a\le x\le b} |f''(x)|.$$
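The two rules can be written out directly from the formulas above; the following Python sketch (mine, not from the text) applies both to $f(x) = x^2$ on $[0,3]$, whose exact integral is 9:

```python
def rectangles(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))          # left endpoints

def trapeziums(f, a, b, n):
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * ((f(a) + f(b)) / 2 + inner)

f = lambda x: x * x  # exact integral over [0, 3] is 9
print(rectangles(f, 0.0, 3.0, 300), trapeziums(f, 0.0, 3.0, 300))
```

With the same $n$, the trapezium value is markedly closer to 9 than the rectangle value, in line with the error estimates.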
From the above formula we see that the maximum error can become large when the absolute value of the second derivative of function $f$ is large for some value in the closed interval $[a,b]$. Moreover, the smaller the interval length $h$ is, the better the approximation.

Kepler's formula

Here we consider the special case when the closed interval $[a,b]$ is divided into only two subintervals $[a, (a+b)/2]$ and $[(a+b)/2, b]$ of equal length. But now we approximate function $f$ by a quadratic function (parabola), which is uniquely defined by three points, and here we use the points $(a, f(a))$, $((a+b)/2, f((a+b)/2))$ and $(b, f(b))$ (see Figure 5.5).

Figure 5.5 Kepler's formula.

This leads to Kepler's formula:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\cdot \left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right] = I_K.$$

The error $\Delta I_K$ of the approximate value $I_K$ of the definite integral can be estimated as follows:

$$\Delta I_K = \left|\int_a^b f(x)\,dx - I_K\right| \le \frac{(b-a)^5}{2{,}880}\cdot \max_{a\le x\le b} |f^{(4)}(x)|.$$

The formula shows that a large absolute value of the fourth derivative for some value $x \in [a,b]$ can lead to a large error of the approximate value. If function $f$ is a polynomial of degree no greater than three, then the value $I_K$ is even the exact value of the definite integral, since for each value $x \in [a,b]$ the fourth derivative is equal to zero in this case.

Simpson's formula

This formula is a generalization of Kepler's formula. The only difference is that we do not divide the closed interval $[a,b]$ into only two subintervals but into a larger even number $n = 2m$ of subintervals. Now we apply Kepler's formula to any two successive subintervals. This leads to Simpson's formula:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6m}\cdot \left[f(a) + 4\left(f(x_1) + f(x_3) + \cdots + f(x_{2m-1})\right) + 2\left(f(x_2) + f(x_4) + \cdots + f(x_{2m-2})\right) + f(b)\right] = I_S.$$
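Both formulas are straightforward to implement; the Python sketch below (mine, not from the text) builds Simpson's rule exactly as described, by applying Kepler's formula to each pair of adjacent subintervals, and applies it to the integral of $e^x/x$ over $[2,6]$ that is treated in Example 5.18 below:

```python
import math

def kepler(f, a, b):
    # (b - a)/6 * [f(a) + 4 f((a+b)/2) + f(b)]
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def simpson(f, a, b, m):
    # apply Kepler's formula to m pairs of adjacent subintervals (n = 2m)
    h = (b - a) / (2 * m)
    return sum(kepler(f, a + 2 * k * h, a + 2 * (k + 1) * h) for k in range(m))

# Kepler's formula is exact for polynomials of degree at most three:
cubic = kepler(lambda x: x**3, 0.0, 2.0)  # exact value of int_0^2 x^3 dx is 4

f = lambda x: math.exp(x) / x
approx = simpson(f, 2.0, 6.0, 4)  # n = 8 subintervals, as in Example 5.18
print(round(approx, 4))  # close to 81.0497 up to rounding of tabulated values
```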
We can give the following estimate for the maximum error $\Delta I_S$ of the approximate value $I_S$:

$$\Delta I_S = \left|\int_a^b f(x)\,dx - I_S\right| \le \frac{(b-a)^5}{180\,n^4}\cdot \max_{a\le x\le b} |f^{(4)}(x)|.$$

Example 5.18 We illustrate approximation by rectangles, by trapeziums and by Simpson's formula by looking at the example

$$\int_2^6 f(x)\,dx = \int_2^6 \frac{e^x}{x}\,dx,$$

where we use $n = 8$ subintervals for each method. We get the function values given in Table 5.1.

Table 5.1 Function values for Example 5.18

  i    x_i    f(x_i)
  0    2       3.6945
  1    2.5     4.8730
  2    3       6.6952
  3    3.5     9.4616
  4    4      13.6495
  5    4.5    20.0038
  6    5      29.6826
  7    5.5    44.4894
  8    6      67.2381

If we use approximation by rectangles, we get

$$\int_2^6 f(x)\,dx \approx \frac{b-a}{n}\cdot [f(x_0) + f(x_1) + \cdots + f(x_7)] = \frac{4}{8}\cdot 132.5497 = 66.2749.$$

Applying approximation by trapeziums, we get

$$\int_2^6 f(x)\,dx \approx \frac{b-a}{n}\cdot \left[\frac{f(x_0)+f(x_8)}{2} + f(x_1) + f(x_2) + \cdots + f(x_7)\right] = \frac{4}{8}\cdot 164.3215 = 82.1608.$$

If we apply Simpson's formula, we get

$$\int_2^6 \frac{e^x}{x}\,dx \approx \frac{b-a}{6m}\cdot [f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + 4f(x_5) + 2f(x_6) + 4f(x_7) + f(x_8)] \approx \frac{4}{24}\cdot 486.2984 \approx 81.0497.$$

5.5 IMPROPER INTEGRALS

So far, we have made the following two basic assumptions in defining the definite integral of a function $f$ over the interval $[a,b]$:

(1) the limits of integration $a$ and $b$ are both finite;
(2) the function is continuous and therefore bounded on the closed interval $[a,b]$.

If either of conditions (1) and (2) is not satisfied, the definite integral is called an improper integral. We now consider both cases separately.

5.5.1 Infinite limits of integration

In this case, we define the improper integral appropriately as the limit of certain definite integrals with finite limits of integration.

Definition 5.4 Let one or both limits of integration be infinite.
Then we define the improper integrals as follows:

$$\int_a^{\infty} f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx,$$
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a\to-\infty}\int_a^b f(x)\,dx,$$
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\substack{b\to\infty\\ a\to-\infty}}\int_a^b f(x)\,dx,$$

provided that these limits exist. If a limit does not exist, the corresponding improper integral has no value and is said to be divergent. By using Newton–Leibniz's formula for definite integrals, we can evaluate the improper integrals given above as follows.

Evaluation of improper integrals:

$$\int_a^{\infty} f(x)\,dx = \lim_{b\to\infty} F(x)\Big|_a^b = \lim_{b\to\infty} F(b) - F(a),$$
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a\to-\infty} F(x)\Big|_a^b = F(b) - \lim_{a\to-\infty} F(a),$$
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\substack{b\to\infty\\ a\to-\infty}} F(x)\Big|_a^b = \lim_{b\to\infty} F(b) - \lim_{a\to-\infty} F(a).$$

Example 5.19 We evaluate the improper integral

$$\int_0^{\infty} \frac{1}{3}\,e^{-2x}\,dx.$$

We obtain

$$\int_0^{\infty} \frac{1}{3}\,e^{-2x}\,dx = \frac{1}{3}\cdot \lim_{b\to\infty}\int_0^b e^{-2x}\,dx = \frac{1}{3}\cdot\frac{1}{2}\cdot \lim_{b\to\infty}\left(-e^{-2x}\right)\Big|_0^b = \frac{1}{6}\cdot \lim_{b\to\infty}\left[(-e^{-2b}) - (-e^0)\right] = \frac{1}{6}\cdot (0+1) = \frac{1}{6}.$$

5.5.2 Unbounded integrands

Similarly to case (1), we define the improper integral again as a limit of certain definite integrals.

Definition 5.5 Let function $f: D_f \to \mathbb{R}$ be continuous on the right-open interval $[a, b) \subseteq D_f$ but definitely divergent as $x$ tends to $b - 0$. Then we define the improper integral $\int_a^b f(x)\,dx$ as

$$\int_a^b f(x)\,dx = \lim_{t\to b-0}\int_a^t f(x)\,dx,$$

provided that this limit exists. Otherwise the improper integral has no value and is said to be divergent. Let function $f$ be continuous on the left-open interval $(a, b] \subseteq D_f$ but definitely divergent as $x$ tends to $a + 0$. Then we define the improper integral $\int_a^b f(x)\,dx$ as

$$\int_a^b f(x)\,dx = \lim_{t\to a+0}\int_t^b f(x)\,dx,$$

provided that this limit exists. Otherwise the improper integral has no value and is said to be divergent.

Example 5.20 We evaluate the improper integral

$$\int_0^1 \frac{dx}{\sqrt[3]{x}}.$$
Since the function $1/\sqrt[3]{x}$ is definitely divergent as $x$ tends to $0 + 0$, we apply Definition 5.5 and obtain

$$\int_0^1 \frac{dx}{\sqrt[3]{x}} = \lim_{t\to 0+0}\int_t^1 x^{-1/3}\,dx = \lim_{t\to 0+0}\left(\frac{3}{2}\,x^{2/3}\right)\Big|_t^1 = \frac{3}{2} - \frac{3}{2}\cdot\lim_{t\to 0+0}\sqrt[3]{t^2} = \frac{3}{2}.$$

Thus the above improper integral has the value $3/2$.

Finally, we consider an example where a point $x_0$ exists in the interior of the closed interval $[a,b]$ where function $f$ is definitely divergent as $x$ tends to $x_0$. This case can be reduced to the consideration of two integrals according to Definition 5.5.

Example 5.21 Consider the improper integral

$$\int_{-1}^{1} \frac{dx}{x^2}.$$

This integrand has a discontinuity (pole of second order) at $x = 0$. Therefore we must partition the integral into two integrals, each of them with an unbounded integrand according to one of the cases presented in Definition 5.5, and we obtain

$$\int_{-1}^{1} \frac{dx}{x^2} = \lim_{t_1\to 0-0}\int_{-1}^{t_1} \frac{dx}{x^2} + \lim_{t_2\to 0+0}\int_{t_2}^{1} \frac{dx}{x^2} = \lim_{t_1\to 0-0}\left(-\frac{1}{x}\right)\Big|_{-1}^{t_1} + \lim_{t_2\to 0+0}\left(-\frac{1}{x}\right)\Big|_{t_2}^{1} = \lim_{t_1\to 0-0}\left(-\frac{1}{t_1} - 1\right) + \lim_{t_2\to 0+0}\left(-1 + \frac{1}{t_2}\right) = \infty.$$

Thus, this improper integral has no finite value. Notice that, if we did not take into consideration that the integrand is definitely divergent as $x$ tends to zero and applied Newton–Leibniz's formula for evaluating a definite integral, we would obtain the value $-2$ for the above integral (which is obviously false, since the integrand is a non-negative function in the interval, so that the area cannot be negative).

5.6 SOME APPLICATIONS OF INTEGRATION

In this section, we discuss some economic applications of integration.

5.6.1 Present value of a continuous future income flow

Assume that in some time interval $[0, T]$ an income is received continuously at a rate of $f(t)$ EUR per year. Contrary to the problems considered in Chapter 2, we now assume that interest is compounded continuously at a rate of interest $i$.
Moreover, we denote by $P(t)$ the present value of all payments made over the time interval $[0, t]$. In other words, the value $P(t)$ gives the amount of money one would have to deposit at time zero in order to have at time $T$ the amount which would result from depositing continuously the income flow $f(t)$ over the time interval $[0, T]$. In Chapter 2, we discussed the case of a discrete compounding of some amount. Let $A$ be the amount due for payment after $t$ years with a rate of interest $i$ per year. Then the present value of this amount $A$ is equal to

$$P = A\cdot (1+i)^{-t}$$

(see Chapter 2). In the case of $m$ payment periods per year, the present value would be

$$P = A\cdot \left(1 + \frac{i}{m}\right)^{-mt}.$$

If we assume continuous compounding, we have to consider the question of what happens if $m \to \infty$. We set $n = m/i$. Notice that from $m \to \infty$ it follows that $n \to \infty$ as well. Now the present value $P$ of the amount $A$ is given by

$$P = A\left[\left(1 + \frac{1}{n}\right)^n\right]^{-it}.$$

Using the limit

$$\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n = e,$$

we find that, in the case of continuous compounding, the present value $P$ of amount $A$ is given by

$$P = A\cdot e^{-it}.$$

Now returning to our original problem, we can say that $f(t)\cdot \Delta t$ is approximately equal to the income received in the closed interval $[t, t+\Delta t]$. The present value of this amount at time zero is therefore approximately equal to $f(t)\cdot \Delta t\cdot e^{-it}$. Taking into account that this present value is equal to the difference $P(t+\Delta t) - P(t)$, we obtain

$$\frac{P(t+\Delta t) - P(t)}{\Delta t} \approx f(t)\cdot e^{-it}.$$

The left-hand side is the difference quotient of function $P$ (considering the points $t$ and $t+\Delta t$). Taking the limit of this difference quotient as $\Delta t$ tends to zero, we obtain

$$P'(t) = f(t)\cdot e^{-it}.$$

Evaluating now the definite integral of this integrand from 0 to $T$, we get

$$P(T) - P(0) = \int_0^T P'(t)\,dt = \int_0^T f(t)\cdot e^{-it}\,dt.$$

Since $P(0) = 0$ by definition, we finally obtain

$$P(T) = \int_0^T f(t)\cdot e^{-it}\,dt.$$
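The formula just derived can be tried out on a simple case. The Python sketch below uses a hypothetical constant income flow $f(t) = 1000$ EUR per year with $i = 0.05$ and $T = 10$ (my own illustrative numbers, not from the text), where the closed form is $P(T) = \frac{1000}{i}(1 - e^{-iT})$, and cross-checks it with a midpoint Riemann sum of the integrand $f(t)e^{-it}$:

```python
import math

# Hypothetical flow: f(t) = 1000 EUR/year, i = 0.05, T = 10 (assumed values)
i, T, rate = 0.05, 10.0, 1000.0
closed = rate / i * (1 - math.exp(-i * T))  # closed form of int_0^T 1000 e^{-it} dt

# numerical cross-check with a midpoint Riemann sum
n = 100_000
h = T / n
numeric = sum(rate * math.exp(-i * (k + 0.5) * h) for k in range(n)) * h
print(round(closed, 2), round(numeric, 2))
```

Both approaches give the same present value (about 7,869 EUR here), which is well below the undiscounted total of 10,000 EUR, as discounting requires.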
Example 5.22 Let the function $f: D_f \to \mathbb{R}$ with $f(t) = 30t + 100$ (in EUR) describe the annual rate of an income flow at time $t$, continuously received over the years from time $t = 0$ to time $t = 5$. Moreover, assume that the rate of interest is $i = 0.06$ (or, which is the same, $p = i\cdot 100$ per cent $= 6$ per cent), compounded continuously. Applying the formula derived above, the present value at time zero is obtained by

$$P(5) = \int_0^5 (30t + 100)\cdot e^{-0.06t}\,dt.$$

Applying integration by parts with

$$u(t) = 30t + 100 \quad\text{and}\quad v'(t) = e^{-0.06t},$$

we obtain

$$u'(t) = 30 \quad\text{and}\quad v(t) = -\frac{1}{0.06}\cdot e^{-0.06t},$$

and thus

$$P(5) = -\frac{1}{0.06}\cdot (30t+100)\cdot e^{-0.06t}\Big|_0^5 + \frac{30}{0.06}\int_0^5 e^{-0.06t}\,dt = -\frac{1}{0.06}\cdot (30t+100)\cdot e^{-0.06t}\Big|_0^5 - \frac{30}{(0.06)^2}\cdot e^{-0.06t}\Big|_0^5$$
$$= -\frac{1}{0.06}\cdot e^{-0.06t}\cdot\left(30t + 100 + \frac{30}{0.06}\right)\Big|_0^5 = -\frac{1}{0.06}\cdot e^{-0.06t}\cdot (30t + 600)\Big|_0^5$$
$$= -\frac{1}{0.06}\cdot e^{-0.3}\cdot (150 + 600) + \frac{1}{0.06}\cdot e^{0}\cdot 600 = -\frac{750}{0.06}\cdot e^{-0.3} + \frac{600}{0.06} = 739.77 \text{ EUR},$$

i.e. the present value at time zero is equal to 739.77 EUR.

5.6.2 Lorenz curves

These are curves that characterize the income distribution among the population. The Lorenz curve $L$ is defined for $0 \le x \le 1$, where $x$ represents the percentage of the population (as a decimal), and $L(x)$ gives the income share of the $100x$ per cent poorest part of the population. Therefore, for each Lorenz curve we have

$$L(0) = 0, \qquad L(1) = 1, \qquad 0 \le L(x) \le x,$$

and $L$ is an increasing and convex function on the closed interval $[0,1]$. Then the Gini coefficient $G$ is defined as follows:

$$G = 2\int_0^1 [x - L(x)]\,dx.$$

The factor two is used as a scaling factor to guarantee that the Gini coefficient is always between zero and one. The smaller the Gini coefficient is, the more fairly the income is distributed among the population. The Lorenz curve and the Gini coefficient are illustrated in Figure 5.6. They can also be used, e.g., for measuring the concentration in certain industrial sectors.
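Two extreme cases make the definition of $G$ concrete. The Python sketch below (my own illustration, not from the text) computes $G$ by a midpoint Riemann sum for the hypothetical curves $L(x) = x$ (perfect equality, so $G = 0$) and $L(x) = x^2$ (where $G = 2(1/2 - 1/3) = 1/3$):

```python
# Gini coefficient G = 2 * int_0^1 (x - L(x)) dx via a midpoint Riemann sum,
# for two hypothetical Lorenz curves (assumed for illustration only)
def gini(L, n=100_000):
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    return 2 * sum(x - L(x) for x in mids) * h

print(round(gini(lambda x: x), 4))      # L(x) = x: perfect equality, G = 0
print(round(gini(lambda x: x * x), 4))  # L(x) = x^2: G = 1/3
```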
Example 5.23 Assume that for Nowhereland the Lorenz curve describing the income distribution among the population is given by

$$L(x) = \frac{3}{5}x^2 + \frac{2}{5}x, \qquad D_L = [0,1].$$

First we note that this is a Lorenz curve, since $L(0) = 0$, $L(1) = 1$ and the difference

$$x - L(x) = x - \left(\frac{3}{5}x^2 + \frac{2}{5}x\right) = \frac{3}{5}x - \frac{3}{5}x^2 = \frac{3}{5}x(1-x)$$

is non-negative for $0 \le x \le 1$, since all factors in the latter product representation are non-negative. (Notice also that function $L$ is strictly increasing and strictly convex on the closed interval $[0,1]$.) We obtain

$$G = 2\int_0^1 [x - L(x)]\,dx = 2\int_0^1 \left(\frac{3}{5}x - \frac{3}{5}x^2\right)dx = 2\left(\frac{3}{10}x^2 - \frac{3}{15}x^3\right)\Big|_0^1 = \frac{1}{5}.$$

Since the Gini coefficient is rather small, the income is 'rather equally' distributed among the population (e.g. $L(0.25) = 0.1375$ means that the 25 per cent poorest part of the population still has an income share of 13.75 per cent, and $L(0.5) = 0.35$ means that the 50 per cent poorest part of the population still has an income share of 35 per cent).

Figure 5.6 Lorenz curve and Gini coefficient.

5.6.3 Consumer and producer surplus

Assume that for a certain product there is a demand function $D$ and a supply function $S$, both depending on the price $p$ of the product. The demand function is decreasing while the supply function is increasing. For some price $p^*$, there is an equilibrium called the market price, i.e. demand is equal to supply:

$$D(p^*) = S(p^*).$$

Some consumers would be willing to pay a higher price than $p^*$, until a certain maximum price $p_{\max}$ is reached. On the other hand, some producers would be willing to sell the product at a lower price than $p^*$, which means that the supply $S(p)$ increases from the minimum price $p_{\min}$ on. Assuming that the price $p$ can be considered as a continuous variable, the consumer surplus $CS$ is obtained as

$$CS = \int_{p^*}^{p_{\max}} D(p)\,dp,$$

while the producer surplus $PS$ is obtained as

$$PS = \int_{p_{\min}}^{p^*} S(p)\,dp.$$
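The two surplus integrals can be illustrated with hypothetical linear curves (my own numbers, not the text's): take $D(p) = 10 - p$ and $S(p) = p - 2$, so that $p_{\max} = 10$, $p_{\min} = 2$, and the equilibrium $10 - p = p - 2$ gives $p^* = 6$. A short Python sketch evaluates both surpluses via antiderivatives:

```python
# Hypothetical linear demand D(p) = 10 - p and supply S(p) = p - 2 (assumed)
D_anti = lambda p: 10 * p - p * p / 2   # antiderivative of D
S_anti = lambda p: p * p / 2 - 2 * p    # antiderivative of S

p_star, p_min, p_max = 6.0, 2.0, 10.0
cs = D_anti(p_max) - D_anti(p_star)   # int_{p*}^{p_max} D(p) dp
ps = S_anti(p_star) - S_anti(p_min)   # int_{p_min}^{p*} S(p) dp
print(cs, ps)  # 8.0 8.0
```

Here both surpluses happen to equal 8, which matches the geometric picture of two congruent triangles above and below the equilibrium price.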
The consumer surplus $CS$ can be interpreted as the total sum that consumers save when buying the product at the market price instead of the price they would be willing to pay. Analogously, the producer surplus is the total sum that producers earn when all products are sold at the market price instead of the price they would be willing to accept. The producer surplus $PS$ and the consumer surplus $CS$ are illustrated in Figure 5.7.

Figure 5.7 Producer and consumer surplus.

Example 5.24 Assume that the demand function $D$ is given by

$$D(p) = 5p^2 - 190p + 1{,}805$$

and the supply function $S$ is given by

$$S(p) = 20p^2 - 160p + 320.$$

First, we mention that we get the minimum price $p_{\min} = 4$ from $S(p) = 0$ and the maximum price $p_{\max} = 19$ from $D(p) = 0$. Moreover, we note that the demand function is in fact strictly decreasing due to $D'(p) = 10p - 190 < 0$ for $p < 19$, and the supply function is strictly increasing since $S'(p) = 40p - 160 > 0$ for $p > 4$. The market price $p^*$ is obtained from $D(p^*) = S(p^*)$:

$$5(p^*)^2 - 190p^* + 1{,}805 = 20(p^*)^2 - 160p^* + 320,$$

which yields

$$15(p^*)^2 + 30p^* - 1{,}485 = 0,$$

or, after dividing by 15, equivalently,

$$(p^*)^2 + 2p^* - 99 = 0.$$

From the latter quadratic equation, we get the zeroes

$$p_1^* = -1 + \sqrt{1+99} = 9 \quad\text{and}\quad p_2^* = -1 - \sqrt{1+99} = -11.$$

Since the second root is negative, the market price is $p^* = p_1^* = 9$. Hence, the consumer surplus is obtained as

$$CS = \int_9^{19} D(p)\,dp = \int_9^{19} \left(5p^2 - 190p + 1{,}805\right)dp = \left(\frac{5}{3}p^3 - 95p^2 + 1{,}805p\right)\Big|_9^{19} = \left(\frac{34{,}295}{3} - 34{,}295 + 34{,}295\right) - (1{,}215 - 7{,}695 + 16{,}245) = 1{,}666.67.$$

The producer surplus is obtained as

$$PS = \int_4^9 S(p)\,dp = \int_4^9 (20p^2 - 160p + 320)\,dp = \left(\frac{20}{3}p^3 - 80p^2 + 320p\right)\Big|_4^9 = (4{,}860 - 6{,}480 + 2{,}880) - \left(\frac{1{,}280}{3} - 1{,}280 + 1{,}280\right) = 833.33.$$
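The values of Example 5.24 can be checked in a few lines of Python (my sketch, not part of the text), evaluating the antiderivatives at the limits of integration:

```python
# Consumer and producer surplus from Example 5.24 via the antiderivatives
D_anti = lambda p: 5 / 3 * p**3 - 95 * p**2 + 1805 * p   # antiderivative of D
S_anti = lambda p: 20 / 3 * p**3 - 80 * p**2 + 320 * p   # antiderivative of S

p_star, p_min, p_max = 9, 4, 19
cs = D_anti(p_max) - D_anti(p_star)   # consumer surplus
ps = S_anti(p_star) - S_anti(p_min)   # producer surplus
print(round(cs, 2), round(ps, 2))  # 1666.67 833.33
```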
EXERCISES

5.1 Use the substitution rule to find the following indefinite integrals:
(a) $\int e^{\sin x}\cos x\,dx$; (b) $\int \frac{\ln x}{x}\,dx$; (c) $\int \frac{5}{1-4x}\,dx$;
(d) $\int \frac{dx}{e^{3-2x}}$; (e) $\int \frac{x\,dx}{\sqrt{x^2+1}}$; (f) $\int \frac{x^3\,dx}{\sqrt{1+x^2}}$;
(g) $\int \frac{e^{2x}-2e^x}{e^{2x}+1}\,dx$; (h) $\int \frac{\cos^3 x}{\sin^2 x}\,dx$; (i) $\int \frac{dx}{\sqrt{2-9x^2}}$;
(j) $\int \frac{dx}{1-\cos x}$; (k) $\int \frac{dx}{2\sin x + \sin 2x}$.

5.2 Use integration by parts to find the following indefinite integrals:
(a) $\int x^2 e^x\,dx$; (b) $\int e^x\cos x\,dx$; (c) $\int \frac{x}{\cos^2 x}\,dx$;
(d) $\int \cos^2 x\,dx$; (e) $\int x^2\ln x\,dx$; (f) $\int \ln(x^2+1)\,dx$.

5.3 Evaluate the following definite integrals:
(a) $\int_{-1}^{2} x^2\,dx$; (b) $\int_0^{\pi/2} \sin^3 x\,dx$; (c) $\int_0^4 \frac{dx}{1+\sqrt{x}}$;
(d) $\int_0^t \frac{dx}{2x-1}$, $t < \frac{1}{2}$; (e) $\int_0^{\pi/2} \sin x\cos^2 x\,dx$; (f) $\int_0^4 \frac{x\,dx}{\sqrt{1+2x}}$;
(g) $\int_{-1}^{0} \frac{dx}{x^2+2x+2}$.

5.4 A firm intends to pre-calculate the development of cost, sales and profit of a new product for the first five years after launching it. The calculations are based on the following assumptions:

- $t$ denotes the time (in years) from the introduction of the product, beginning with $t = 0$;
- $C(t) = 1{,}000\cdot\left[4 - (2e^t)/(e^t+1)\right]$ is the cost as a function of $t \in \mathbb{R}$, $t \ge 0$;
- $S(t) = 10{,}000\cdot t^2\cdot e^{-t}$ are the sales as a function of $t \in \mathbb{R}$, $t \ge 0$.

(a) Calculate total cost, total sales and total profit for a period of four years.
(b) Find average sales per year and average cost per year for this period.
(c) Find the total profit as a function of the time $t$.

5.5 (a) Find

$$\int_0^{2\pi} \sin x\,dx$$

and compute the area enclosed by the function $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = \sin x$ and the $x$ axis.
(b) Compute the area enclosed by the two functions $f_1: \mathbb{R} \to \mathbb{R}$ and $f_2: \mathbb{R} \to \mathbb{R}$ given by

$$f_1(x) = x^3 - 4x \quad\text{and}\quad f_2(x) = 3x + 6.$$

5.6 The throughput $q = q(t)$ (output per time unit) of a continuously working production plant is given by a function depending on the time $t$:

$$q(t) = q_0\cdot\left[1 - \left(\frac{t}{10}\right)^2\right].$$

The throughput decreases from $t = 0$ up to $t = 10$ from $q_0$ to 0.
One overhaul during the time interval [0, T] with T < 10 means that the throughput goes up to q0 again; after the overhaul, it decreases as before.
    (a) Graph the function q with regard to the overhaul.
    (b) Let t0 = 4 be the time of the overhaul. Find the total output for the time interval [0, T] with T > 4.
    (c) Determine the time t0 of the overhaul which maximizes the total output in the interval [0, T].

5.7 Determine the following definite integral numerically:

    ∫ from 0 to 1 of dx/(1 + x^2).

    (a) Use approximation by trapeziums with n = 10.
    (b) Use Kepler's formula.
    (c) Use Simpson's formula with n = 10.
    Compare the results of (a), (b) and (c) with the exact value.

5.8 Evaluate the following improper integrals:
    (a) ∫ from −∞ to 0 of e^x dx;   (b) ∫ from 1 to ∞ of dx/(x^2 + 2x + 1);   (c) ∫ from 0 to ∞ of λe^(−λx) dx;
    (d) ∫ from 0 to ∞ of λx^2 e^(−λx) dx;   (e) ∫ from 0 to 4 of dx/√x;   (f) ∫ from 0 to 6 of (2x − 1)/((x + 1)(x − 2)) dx.

5.9 Let the function f with f(t) = 20t + 200 (in EUR) describe the annual rate of an income flow at time t, received continuously over the years from time t = 0 to time t = 6. Interest is compounded continuously at a rate of 4 per cent p.a. Evaluate the present value at time zero.

5.10 Given are a demand function D and a supply function S depending on the price p as follows:

    D(p) = 12 − 2p    and    S(p) = (8/7)p − 4/7.

Find the equilibrium price p* and evaluate the consumer surplus CS and the producer surplus PS. Illustrate the result graphically.

6 Vectors

In economics, an ordered n-tuple often describes a bundle of commodities such that the ith value represents the quantity of the ith commodity. This leads to the concept of a vector, which we introduce next.

6.1 PRELIMINARIES

Definition 6.1 A vector a is an ordered n-tuple of real numbers a1, a2, …, an. The numbers a1, a2, …, an are called the components (or coordinates) of the vector a. We write

    a = (a1, a2, …, an)T

for a column vector.
For the so-called transposed vector aT, which is the row vector obtained by writing the column as a row, we write

    aT = (a1, a2, …, an).

We use letters in bold face to denote vectors. We have defined a vector a above always as a column vector, and if we write this vector as a row vector, this is indicated by a superscript T, which stands for 'transpose'. The convenience of using the superscript T will become clear when matrices are discussed in the next chapter. If vector a has n components, we say that vector a has dimension n or that a is an n-dimensional vector or n-vector. If not noted differently, we index the components of a vector by subscripts, while different vectors are indexed by superscripts.

Definition 6.2 The n-dimensional (Euclidean) space R^n is defined as the set of all real n-tuples:

    R^n = { (a1, a2, …, an)T | ai ∈ R, i = 1, 2, …, n }.

Similarly, R^n_+ stands for the set of all non-negative real n-tuples.

We can graph a vector as an arrow in the n-dimensional space, which can be interpreted as a displacement of a starting point P resulting in a terminal point Q. For instance, if point P has the coordinates (p1, p2, …, pn) and point Q has the coordinates (q1, q2, …, qn), then

    a = PQ = (q1 − p1, q2 − p2, …, qn − pn)T.

It is often assumed that the starting point P is the origin of the coordinate system. In this case, the components of vector a are simply the coordinates of point Q and, therefore, a row vector aT can be interpreted as a point (i.e. a location) in the n-dimensional Euclidean space. In the case of n = 2, we can illustrate vectors in the plane; e.g. the vectors

    a = (4, 3)T    and    b = (−1, 2)T

are illustrated in Figure 6.1. Finding the terminal point of vector a means that we go four units to the right and three units up from the origin.
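The displacement interpretation can be sketched in a couple of lines of Python (the helper name is ours, not from the text):

```python
def displacement(P, Q):
    """Vector a = PQ: componentwise difference between terminal point Q and start P."""
    return [q - p for p, q in zip(P, Q)]

# With P at the origin, the components of a are just the coordinates of Q:
print(displacement((0, 0), (4, 3)))  # [4, 3]
# The same displacement starting from P = (1, 1) ends at Q = (5, 4):
print(displacement((1, 1), (5, 4)))  # [4, 3]
```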
Similarly, to find the terminal point of b, we go one unit to the left and two units up from the origin.

Figure 6.1 Representation of the two-dimensional vectors a and b.

Next, we introduce some relations on vectors of the same dimension.

Definition 6.3 Let a, b ∈ R^n with a = (a1, a2, …, an)T and b = (b1, b2, …, bn)T. The vectors a and b are said to be equal if all their corresponding components are equal, i.e. ai = bi for all i = 1, 2, …, n. We write a ≤ b if ai ≤ bi and a ≥ b if ai ≥ bi for all i = 1, 2, …, n. Analogously, we write a < b if ai < bi and a > b if ai > bi for all i = 1, 2, …, n.

Remark Note that not every pair of n-dimensional vectors may be compared by the relations ≤ and ≥, respectively. For instance, for the vectors

    a = (3, −1, 2)T    and    b = (2, 0, 2)T,

we have neither a ≤ b nor a ≥ b.

Example 6.1 Consider the vectors

    a = (2, 4, 3)T,    b = (1, −2, 3)T    and    c = (0, 3, 2)T.

Then we have a ≥ b and a > c, but vectors b and c cannot be compared, i.e. we have neither b ≤ c nor c ≤ b.

Special vectors

Finally, we introduce some special vectors. By e^i we denote the ith unit vector, whose ith component is equal to one while all other components are equal to zero:

    e^i = (0, …, 0, 1, 0, …, 0)T    ← 1 in the ith component.

The n-dimensional zero vector is the vector containing only zeroes as its n components:

    0 = (0, 0, …, 0)T.

6.2 OPERATIONS ON VECTORS

We start with the operations of adding two vectors and multiplying a vector by a real number (scalar).

Definition 6.4 Let a, b ∈ R^n with a = (a1, a2, …, an)T and b = (b1, b2, …, bn)T. The sum of the two vectors a and b is the n-dimensional vector a + b obtained by adding each component of a to the corresponding component of b:

    a + b = (a1 + b1, a2 + b2, …, an + bn)T.
Definition 6.5 Let a ∈ R^n with a = (a1, a2, …, an)T and λ ∈ R. The product of the number (or scalar) λ and the vector a is the n-dimensional vector λa whose components are λ times the corresponding components of a:

    λa = (λa1, λa2, …, λan)T.

The operation of multiplying a vector by a scalar is known as scalar multiplication. Using Definitions 6.4 and 6.5, we can now define the difference of two vectors.

Definition 6.6 Let a, b ∈ R^n be n-dimensional vectors. Then the difference between the vectors a and b is defined by

    a − b = a + (−1)b.

According to Definition 6.6, the difference vector is obtained by subtracting the components of vector b from the corresponding components of vector a. Notice that the sum and the difference of two vectors are only defined when both vectors a and b have the same dimension.

Example 6.2 Let a = (4, 1)T and b = (1, 3)T. Then we obtain

    a + b = (4 + 1, 1 + 3)T = (5, 4)T    and    a − b = (4 − 1, 1 − 3)T = (3, −2)T.

Applying Definition 6.5, we obtain

    3a = (12, 3)T    and    (−2)b = (−2, −6)T.

The sum and difference of the two vectors, as well as the scalar multiplication, are geometrically illustrated in Figure 6.2. The sum of the two vectors a and b is obtained by attaching vector b to the terminal point of vector a; the vector from the origin to the terminal point of the attached vector b gives the sum a + b.

Figure 6.2 Vector operations.

We see that multiplication of a vector by a positive scalar λ does not change the orientation of the vector, while multiplication by a negative scalar reverses the orientation of the vector. The difference a − b of two vectors means that we attach vector b with opposite orientation (i.e. the vector −b) to the terminal point of vector a. Next, we summarize some rules for the vector operations introduced above.
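The componentwise operations of Definitions 6.4 to 6.6 translate directly into code; a minimal Python sketch reproducing Example 6.2 (function names are ours):

```python
def add(a, b):
    """Sum of two vectors of the same dimension (cf. Definition 6.4)."""
    assert len(a) == len(b), "sum is defined only for equal dimensions"
    return [x + y for x, y in zip(a, b)]

def scale(lam, a):
    """Scalar multiplication lam * a (cf. Definition 6.5)."""
    return [lam * x for x in a]

def sub(a, b):
    """Difference a - b = a + (-1) * b (cf. Definition 6.6)."""
    return add(a, scale(-1, b))

a, b = [4, 1], [1, 3]
print(add(a, b), sub(a, b), scale(3, a), scale(-2, b))
# [5, 4] [3, -2] [12, 3] [-2, -6]
```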
Let a, b, c ∈ R^n and λ, µ ∈ R.

Rules for vector addition and scalar multiplication
(1) a + b = b + a;  λa = aλ  (commutative laws)
(2) (a + b) + c = a + (b + c);  (λµ)a = λ(µa)  (associative laws)
(3) λ(a + b) = λa + λb;  (λ + µ)a = λa + µa  (distributive laws)
(4) a + 0 = a;  a + (−a) = 0  (0 ∈ R^n)
(5) 1a = a.

The validity of the above rules follows immediately from Definitions 6.4 and 6.5 and the validity of the commutative, associative and distributive laws for the set of real numbers.

Definition 6.7 The scalar product of two n-dimensional vectors a = (a1, a2, …, an)T and b = (b1, b2, …, bn)T is defined as follows:

    aT · b = a1·b1 + a2·b2 + … + an·bn = Σ (i = 1 to n) ai·bi.

The scalar product is also known as the inner product. Note also that the scalar product of two vectors is not a vector but a number (i.e. a scalar), and that aT · b is defined only if a and b are both of the same dimension. In order to guarantee consistency with operations in later chapters, we define the scalar product in such a way that the first vector is written as a row vector aT and the second vector is written as a column vector b. The commutative and distributive laws are valid for the scalar product, i.e.

    aT · b = bT · a    and    aT · (b + c) = aT · b + aT · c    for a, b, c ∈ R^n.

It is worth noting that the associative law does not necessarily hold for the scalar product, i.e. in general we have

    a · (bT · c) ≠ (aT · b) · c.

Example 6.3 Assume that a firm produces three products with the quantities x1 = 30, x2 = 40 and x3 = 10, where xi denotes the quantity of product i. Moreover, the cost of production is 20 EUR per unit of product 1, 15 EUR per unit of product 2 and 40 EUR per unit of product 3. Let c = (c1, c2, c3)T be the cost vector, where the ith component describes the cost per unit of product i, and let x = (x1, x2, x3)T be the vector of quantities.
The total cost of production is obtained as the scalar product of the vectors c and x, i.e. with

    c = (20, 15, 40)T    and    x = (30, 40, 10)T

we obtain

    cT · x = 20 · 30 + 15 · 40 + 40 · 10 = 600 + 600 + 400 = 1,600.

We have found that the total cost of production for the three products is 1,600 EUR.

Definition 6.8 Let a ∈ R^n with a = (a1, a2, …, an)T. The (Euclidean) length (or norm) of vector a, denoted by |a|, is defined as

    |a| = √(a1^2 + a2^2 + … + an^2).

A vector with length one is called a unit vector (remember that we have already introduced the specific unit vectors e^1, e^2, …, e^n, which obviously have length one). Each non-zero n-dimensional vector a can be written as the product of its length |a| and an n-dimensional unit vector e^(a) pointing in the same direction as the vector a itself, i.e.

    a = |a| · e^(a).

Example 6.4 Let the vector a = (−2, 3, 6)T be given. We are looking for a unit vector pointing in the same direction as vector a. Using

    |a| = √((−2)^2 + 3^2 + 6^2) = √49 = 7,

we find the corresponding unit vector

    e^(a) = (1/|a|) · a = (1/7) · (−2, 3, 6)T = (−2/7, 3/7, 6/7)T.

Using Definition 6.8, we can define the (Euclidean) distance between the n-vectors a = (a1, a2, …, an)T and b = (b1, b2, …, bn)T as follows:

    |a − b| = √((a1 − b1)^2 + (a2 − b2)^2 + … + (an − bn)^2).

The distance between the two-dimensional vectors a and b is illustrated in Figure 6.3. It corresponds to the length of the vector connecting the terminal points of vectors a and b.

Figure 6.3 Distance between the vectors a and b.

Example 6.5 Let the vectors a = (3, 2, −3)T and b = (−1, 1, 5)T be given. The distance between both vectors is given by

    |a − b| = √([3 − (−1)]^2 + (2 − 1)^2 + (−3 − 5)^2) = √81 = 9.

Next, we present some further rules for the scalar product of two vectors and the length of a vector. Let a, b ∈ R^n and λ ∈ R.
Further rules for the scalar product and the length
(1) |a| = √(aT · a) ≥ 0;
(2) |a| = 0 ⟺ a = 0;
(3) |λa| = |λ| · |a|;
(4) |a + b| ≤ |a| + |b|;
(5) aT · b = |a| · |b| · cos(a, b);
(6) |aT · b| ≤ |a| · |b|  (Cauchy–Schwarz inequality).

In rule (5), cos(a, b) denotes the cosine of the angle between the vectors a and b. We illustrate the Cauchy–Schwarz inequality by the following example.

Example 6.6 Let a = (2, −1, 3)T and b = (5, −4, −1)T. For the lengths of the vectors a and b, we obtain

    |a| = √(2^2 + (−1)^2 + 3^2) = √14    and    |b| = √(5^2 + (−4)^2 + (−1)^2) = √42.

The scalar product of the vectors a and b is obtained as

    aT · b = 2 · 5 + (−1) · (−4) + 3 · (−1) = 10 + 4 − 3 = 11.

The Cauchy–Schwarz inequality says that the absolute value of the scalar product of two vectors a and b is never greater than the product of the lengths of both vectors. For this example, the inequality turns into

    |aT · b| = 11 ≤ |a| · |b| = √14 · √42 ≈ 24.2487.

Example 6.7 Using rule (5) above, which can also be considered as an alternative equivalent definition of the scalar product, together with Definition 6.7, one can easily determine the angle between two vectors a and b of the same dimension. Let a = (3, −1, 2)T and b = (2, 1, 2)T. Then we obtain

    cos(a, b) = (aT · b)/(|a| · |b|)
              = (3 · 2 + (−1) · 1 + 2 · 2)/(√(3^2 + (−1)^2 + 2^2) · √(2^2 + 1^2 + 2^2))
              = 9/(√14 · √9) = 3/√14 ≈ 0.80178.

We have to find the smallest positive argument of the cosine function which gives the value 0.80178. Therefore, the angle between the vectors a and b is approximately equal to 36.7°.

Next, we consider orthogonal vectors. Consider the triangle given in Figure 6.4, formed by the three two-dimensional vectors a, b and a − b. Denote the angle between the vectors a and b by γ. From the Pythagorean theorem we know that the angle γ is equal to 90° if and only if the sum of the squared lengths of the vectors a and b is equal to the squared length of the vector a − b.
Thus, we have:

    γ = 90° ⟺ |a|^2 + |b|^2 = |a − b|^2
           ⟺ aT · a + bT · b = (a − b)T · (a − b)
           ⟺ aT · a + bT · b = aT · a − aT · b − bT · a + bT · b
           ⟺ aT · b = 0.

The last equivalence holds since aT · b = bT · a. For two-dimensional vectors we have thus seen that the angle between them is equal to 90° if and only if their scalar product is equal to zero. We say in this case that the vectors a and b are orthogonal (or perpendicular) and write a ⊥ b. The above considerations can be generalized to the n-dimensional case, and we define orthogonality accordingly:

    a ⊥ b ⟺ aT · b = 0,    where a, b ∈ R^n.

Figure 6.4 Triangle formed by the vectors a, b and a − b.

Example 6.8 The three-dimensional vectors a = (3, −1, 2)T and b = (4, 6, −3)T are orthogonal since

    aT · b = 3 · 4 + (−1) · 6 + 2 · (−3) = 0.

6.3 LINEAR DEPENDENCE AND INDEPENDENCE

In this section, we discuss one of the most important concepts in linear algebra. Before introducing linearly dependent and linearly independent vectors, we give the following definition.

Definition 6.9 Let a^i, i = 1, 2, …, m, be n-dimensional vectors and λi, i = 1, 2, …, m, be real numbers. Then the n-vector a given by

    a = λ1·a^1 + λ2·a^2 + … + λm·a^m = Σ (i = 1 to m) λi·a^i    (6.1)

is called a linear combination of the vectors a^1, a^2, …, a^m. If λi ≥ 0, i = 1, 2, …, m, and

    Σ (i = 1 to m) λi = 1

in representation (6.1), then a is called a convex combination of the vectors a^1, a^2, …, a^m.

The set of all convex combinations of the two vectors a^1 and a^2 is illustrated in Figure 6.5. It is the set of all vectors whose terminal points are on the line segment connecting the terminal points of the vectors a^1 and a^2. Therefore, both vectors c^1 and c^2 in the figure can be written as convex combinations of the vectors a^1 and a^2. Notice that for λ1 = 1 and λ2 = 1 − λ1 = 0 we obtain vector a^1, whereas for λ1 = 0 and λ2 = 1 − λ1 = 1 we obtain vector a^2.
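The set of convex combinations can be sampled numerically; a short Python sketch (the concrete vectors a1 and a2 below are an arbitrary illustrative choice, not taken from the text):

```python
def convex_combination(lams, vectors):
    """Linear combination with non-negative scalars summing to one (cf. Definition 6.9)."""
    assert all(l >= 0 for l in lams)
    assert abs(sum(lams) - 1.0) < 1e-12
    dim = len(vectors[0])
    return [sum(l * v[i] for l, v in zip(lams, vectors)) for i in range(dim)]

a1, a2 = [2.0, 0.0], [0.0, 2.0]  # hypothetical example vectors
# lam1 = 1 returns a1 itself, lam1 = 0 returns a2; weights in between
# give points on the segment connecting the two terminal points:
print(convex_combination([1.0, 0.0], [a1, a2]))  # [2.0, 0.0]
print(convex_combination([0.5, 0.5], [a1, a2]))  # [1.0, 1.0]
```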
Note that a convex combination of some vectors is also a special linear combination of these vectors.

Figure 6.5 Set of convex combinations of the vectors a^1 and a^2.

Definition 6.10 The m n-dimensional vectors a^1, a^2, …, a^m ∈ R^n are linearly dependent if there exist numbers λi, i = 1, 2, …, m, not all equal to zero, such that

    Σ (i = 1 to m) λi·a^i = λ1·a^1 + λ2·a^2 + … + λm·a^m = 0.    (6.2)

If equation (6.2) only holds when λ1 = λ2 = … = λm = 0, then the vectors a^1, a^2, …, a^m are said to be linearly independent.

Since two vectors are equal if they coincide in all components, the above equation (6.2) represents n linear equations in the variables λ1, λ2, …, λm. In Chapter 8, we deal with the solution of such systems in detail.

Remark From Definition 6.10 we obtain the following equivalent characterization of linearly dependent and independent vectors.
(1) A set of m vectors a^1, a^2, …, a^m ∈ R^n is linearly dependent if and only if at least one of the vectors can be written as a linear combination of the others.
(2) A set of m vectors a^1, a^2, …, a^m ∈ R^n is linearly independent if and only if none of the vectors can be written as a linear combination of the others.

Example 6.9 Let a^1 = (3, 1)T and a^2 = (−9, −3)T. In this case, we have a^2 = −3a^1, which can be written as 3a^1 + 1a^2 = 0. We can conclude that equation (6.2) holds with λ1 = 3 and λ2 = 1, and thus the vectors a^1 and a^2 are linearly dependent (see Figure 6.6).

Example 6.10 Let a^1 = (3, 1)T and b^2 = (−1, 2)T. In this case, the equation λ1·a^1 + λ2·b^2 = 0 reduces to

    3λ1 − λ2 = 0
    λ1 + 2λ2 = 0.

This is a system of two linear equations in the two variables λ1 and λ2 which can easily be solved. Multiplying the first equation by two and adding it to the second equation, we obtain λ1 = 0 and then λ2 = 0 as the only solution of this system. Therefore, the vectors a^1 and b^2 are linearly independent (see Figure 6.6).
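For two vectors in R^2, the test of Examples 6.9 and 6.10 reduces to a single number: the vectors are linearly independent exactly when a1·b2 − a2·b1 ≠ 0 (this quantity is the 2×2 determinant, treated formally in Chapter 7). A small sketch:

```python
def independent_2d(a, b):
    """True iff the 2-vectors a and b are linearly independent
    (non-vanishing 2x2 determinant)."""
    return a[0] * b[1] - a[1] * b[0] != 0

print(independent_2d([3, 1], [-9, -3]))  # False: a^2 = -3 a^1 (Example 6.9)
print(independent_2d([3, 1], [-1, 2]))   # True  (Example 6.10)
```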
Figure 6.6 Linearly dependent and independent vectors.

The above examples illustrate that in the case of two vectors of the two-dimensional Euclidean space R^2, we can easily decide whether they are linearly dependent or independent. Two two-dimensional vectors are linearly dependent if and only if one vector can be written as a multiple of the other vector, i.e.

    a^2 = −(λ1/λ2) · a^1,    λ2 ≠ 0

(see Figure 6.6). On the other hand, in the two-dimensional space any three vectors are linearly dependent. This is also illustrated in Figure 6.6. Vector c can be written as a linear combination of the linearly independent vectors a^1 and b^2, i.e. c = λ1·a^1 + λ2·b^2, from which we obtain

    1c − λ1·a^1 − λ2·b^2 = 0.    (6.3)

By Definition 6.10, these vectors are linearly dependent, since e.g. the scalar of vector c in representation (6.3) is different from zero.

Considering 3-vectors, three vectors are linearly dependent if one of them can be written as a linear combination of the other two, which means that the third vector belongs to the plane spanned by the other two vectors. If the three vectors do not belong to the same plane, these vectors are linearly independent. Four vectors in the three-dimensional Euclidean space are always linearly dependent. In general, we can say that in the n-dimensional Euclidean space R^n, there are no more than n linearly independent vectors.

Example 6.11 Let us consider the three vectors

    a^1 = (2, 0, 0)T,    a^2 = (1, 1, 0)T    and    a^3 = (3, 2, 1)T,

and investigate whether they are linearly dependent or independent. Using Definition 6.10, we obtain

    λ1·a^1 + λ2·a^2 + λ3·a^3 = (2λ1 + λ2 + 3λ3, λ2 + 2λ3, λ3)T = (0, 0, 0)T.

Considering the third component of the above vectors, we obtain λ3 = 0. Substituting λ3 = 0 into the second component, we get from λ2 + 2λ3 = 0 the only solution λ2 = 0, and considering finally the first component, we obtain from 2λ1 + λ2 + 3λ3 = 0 the only solution λ1 = 0.
Since the vector λT = (λ1, λ2, λ3) = (0, 0, 0) is the only solution, the above three vectors are linearly independent.

Example 6.12 The set {e^1, e^2, …, e^n} of n-dimensional unit vectors in the space R^n obviously constitutes a set of linearly independent vectors, and any n-dimensional vector aT = (a1, a2, …, an) can immediately be written as a linear combination of these unit vectors:

    a = a1·e^1 + a2·e^2 + … + an·e^n = Σ (i = 1 to n) ai·e^i.

In this case, the scalars of the linear combination of the unit vectors are simply the components of vector a.

6.4 VECTOR SPACES

We have discussed several properties of vector operations so far. In this section, we introduce the notion of a vector space. This is a set of elements (not necessarily only vectors of real numbers) which satisfy certain rules listed in the following definition.

Definition 6.11 Given a set V = {a, b, c, …} of vectors (or other mathematical objects) for which an addition and a scalar multiplication are defined, suppose that the following properties hold (λ, µ ∈ R):
(1) a + b = b + a;
(2) (a + b) + c = a + (b + c);
(3) there exists a vector 0 ∈ V such that for all a ∈ V the equation a + 0 = a holds (0 is the zero or neutral element with respect to addition);
(4) for each a ∈ V there exists a uniquely determined element x ∈ V such that a + x = 0 (x = −a is the inverse element of a with respect to addition);
(5) (λµ)a = λ(µa);
(6) 1 · a = a;
(7) λ(a + b) = λa + λb;
(8) (λ + µ)a = λa + µa.
If, moreover, for any a, b ∈ V the inclusion a + b ∈ V holds and for any λ ∈ R the inclusion λa ∈ V holds, then V is called a linear space or vector space.

As mentioned before, the elements of a vector space do not necessarily need to be vectors, since other 'mathematical objects' may also obey the above rules (1) to (8).
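The rules can at least be spot-checked mechanically for a candidate set; a small Python sketch testing a few of the axioms for componentwise operations on random integer vectors in R^3 (a numerical spot-check under our own setup, of course not a proof):

```python
import random

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def scale(lam, a):
    return [lam * x for x in a]

random.seed(0)
for _ in range(100):
    a = [random.randint(-9, 9) for _ in range(3)]
    b = [random.randint(-9, 9) for _ in range(3)]
    lam, mu = random.randint(-9, 9), random.randint(-9, 9)
    assert add(a, b) == add(b, a)                                       # rule (1)
    assert scale(lam, add(a, b)) == add(scale(lam, a), scale(lam, b))   # rule (7)
    assert scale(lam + mu, a) == add(scale(lam, a), scale(mu, a))       # rule (8)
print("all spot-checks passed")
```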
Next, we give some examples of vector spaces satisfying the rules listed in Definition 6.11, where in each case an addition and a scalar multiplication are defined in the usual way.

Examples of vector spaces
(1) the n-dimensional space R^n;
(2) the set of all n-vectors a ∈ R^n that are orthogonal to some fixed n-vector b ∈ R^n;
(3) the set of all sequences {an};
(4) the set C[a, b] of all continuous functions on the closed interval [a, b];
(5) the set of all polynomials Pn(x) = an·x^n + a(n−1)·x^(n−1) + … + a1·x + a0 of a degree of at most n.

To prove this, one has to verify the validity of the rules (1) to (8) given in Definition 6.11 and to show that the sum of any two elements, as well as the multiplication by a scalar, again gives an element of this space. For instance, consider the set of all polynomials of degree at most n. The sum of two such polynomials

    Pn1(x) = an·x^n + a(n−1)·x^(n−1) + … + a1·x + a0
    Pn2(x) = bn·x^n + b(n−1)·x^(n−1) + … + b1·x + b0

gives

    Pn1(x) + Pn2(x) = (an + bn)·x^n + (a(n−1) + b(n−1))·x^(n−1) + … + (a1 + b1)·x + (a0 + b0),

i.e. the sum of these polynomials is again a polynomial of degree at most n. By multiplying a polynomial Pn1 of degree at most n by a real number λ, we again obtain a polynomial of degree at most n:

    λPn1(x) = (λan)·x^n + (λa(n−1))·x^(n−1) + … + (λa1)·x + λa0.

Basis of a vector space; change of basis

Next, we introduce the notion of a basis of a vector space.

Definition 6.12 A set B = {b^1, b^2, …, b^n} of linearly independent vectors of a vector space V is called a basis of V if any vector a ∈ V can be written as a linear combination

    a = λ1·b^1 + λ2·b^2 + … + λn·b^n

of the basis vectors b^1, b^2, …, b^n. The number n = |B| of vectors contained in the basis gives the dimension of the vector space V. For an n-dimensional vector space V, we also write dim V = n.

Obviously, the set Bc = {e^1, e^2, …, e^n} of unit vectors constitutes a basis of the n-dimensional Euclidean space R^n (see also Example 6.12).
This basis Bc is also called the canonical basis. The notion of a basis of a vector space is a fundamental concept in linear algebra, which we will need again later when discussing various algorithms.

Remark
(1) An equivalent definition of a basis is that a maximal set of linearly independent vectors of a vector space V constitutes a basis, and therefore the dimension of a vector space is given by the maximal number of linearly independent vectors.
(2) We say that V is spanned by the vectors b^1, b^2, …, b^n of B, since any vector of the vector space V can be 'generated' by means of the basis vectors.
(3) The dimension of a vector space is not necessarily equal to the number of components of its vectors. If a set of n-dimensional vectors is given, it can contain at most n linearly independent vectors. However, any n linearly independent vectors of an n-dimensional vector space constitute a basis.

Next, we establish whether an arbitrary vector of a vector space can be written as a linear combination of the basis vectors in a unique way.

THEOREM 6.1 Let B = {b^1, b^2, …, b^n} be a basis of an n-dimensional vector space V. Then any vector c ∈ V can be uniquely written as a linear combination of the basis vectors from the set B.

PROOF We prove the theorem indirectly. Assume there exist two different linear combinations of the given basis vectors from B which are equal to vector c:

    c = λ1·b^1 + λ2·b^2 + … + λn·b^n    (6.4)

and

    c = µ1·b^1 + µ2·b^2 + … + µn·b^n,    (6.5)

i.e. there exists an index i with 1 ≤ i ≤ n such that λi ≠ µi. Subtracting equation (6.5) from equation (6.4), we get

    0 = (λ1 − µ1)·b^1 + (λ2 − µ2)·b^2 + … + (λn − µn)·b^n.

Since the basis vectors b^1, b^2, …, b^n are linearly independent by Definition 6.12, we must have

    λ1 − µ1 = 0,  λ2 − µ2 = 0,  …,  λn − µn = 0,

which is equivalent to λ1 = µ1, λ2 = µ2, …, λn = µn, i.e. we have obtained a contradiction.
Thus, any vector of V can be uniquely written as a linear combination of the given basis vectors. □

While the dimension of a vector space is uniquely determined, the basis is not uniquely determined. This leads to the question of whether we can replace a particular vector in the basis by some other vector not contained in the basis such that again a basis is obtained. The following theorem shows that there is an easy way to answer this question, and from its proof we derive an algorithm for the replacement of a vector in the basis by some other vector (provided this is possible). The resulting procedure is a basic part of some algorithms for the solution of systems of linear equations or linear inequalities, which we discuss in Chapter 8.

THEOREM 6.2 (Steinitz's theorem) Let the set B = {b^1, b^2, …, b^n} be a basis of an n-dimensional vector space V and let the vector a^k be given by

    a^k = λ1·b^1 + λ2·b^2 + … + λk·b^k + … + λn·b^n    with λk ≠ 0.

Then the set B* = {b^1, b^2, …, b^(k−1), a^k, b^(k+1), …, b^n} is also a basis of V, i.e. the vector b^k contained in the basis B can be replaced by the vector a^k to obtain another basis.

PROOF Let us consider the following linear combination of the zero vector:

    µ1·b^1 + µ2·b^2 + … + µ(k−1)·b^(k−1) + µk·a^k + µ(k+1)·b^(k+1) + … + µn·b^n = 0.    (6.6)

By substituting the linear combination a^k = λ1·b^1 + λ2·b^2 + … + λn·b^n into equation (6.6), we obtain:

    (µ1 + µk·λ1)·b^1 + (µ2 + µk·λ2)·b^2 + … + (µ(k−1) + µk·λ(k−1))·b^(k−1) + (µk·λk)·b^k + (µ(k+1) + µk·λ(k+1))·b^(k+1) + … + (µn + µk·λn)·b^n = 0.

Since the vectors of the set B = {b^1, b^2, …, b^n} constitute a basis, they are linearly independent, and all the scalars in the above linear combination must be equal to zero, i.e. we get

    µi + µk·λi = 0 for i = 1, 2, …, n, i ≠ k,    and    µk·λk = 0.

Since by assumption λk ≠ 0, we first get µk = 0 and then, using the latter result, µi = 0 for all i with 1 ≤ i ≤ n, i ≠ k, i.e.
all the scalars in the linear combination (6.6) must be equal to zero. Hence, the vectors of the set B* = {b^1, b^2, …, b^(k−1), a^k, b^(k+1), …, b^n} are linearly independent by Definition 6.10, and they constitute a basis. □

We know that any basis of a certain vector space consists of the same number of vectors. So, if we want to remove one vector from the current basis, we have to add exactly one other vector to the remaining ones such that the resulting set of vectors is again linearly independent. We now look for a procedure for performing the interchange of two vectors described in the proof of Theorem 6.2. To this end, assume that B = {b^1, …, b^(k−1), b^k, b^(k+1), …, b^n} is a basis and B* = {b^1, …, b^(k−1), a^k, b^(k+1), …, b^n} is another basis, where the vector b^k has been replaced by the vector a^k. According to Theorem 6.2, we must have λk ≠ 0 in the linear combination

    a^k = λ1·b^1 + … + λk·b^k + … + λn·b^n    (6.7)

of the vectors of basis B, since otherwise a replacement of vector b^k by vector a^k is not possible. Let us consider an arbitrary vector c and its linear combinations of the basis vectors of the bases B and B*, respectively:

    c = α1·b^1 + … + α(k−1)·b^(k−1) + αk·b^k + α(k+1)·b^(k+1) + … + αn·b^n    (6.8)

and

    c = β1·b^1 + … + β(k−1)·b^(k−1) + βk·a^k + β(k+1)·b^(k+1) + … + βn·b^n.    (6.9)

By substituting representation (6.7) of vector a^k into representation (6.9), we obtain

    c = β1·b^1 + … + β(k−1)·b^(k−1) + βk·(λ1·b^1 + … + λk·b^k + … + λn·b^n) + β(k+1)·b^(k+1) + … + βn·b^n
      = (β1 + βk·λ1)·b^1 + … + (β(k−1) + βk·λ(k−1))·b^(k−1) + βk·λk·b^k + (β(k+1) + βk·λ(k+1))·b^(k+1) + … + (βn + βk·λn)·b^n.

Comparing now the scalars of the vectors from the basis B = {b^1, b^2, …, b^n} in both representations of vector c (i.e. in representation (6.8) and the last equality above), first for k and then for all remaining i ≠ k, we first obtain

    βk = αk/λk    (6.10)

and then from

    αi = βi + βk·λi,    i = 1, 2, …, n, i ≠ k,
it follows by means of equality (6.10) that

    βi = αi − (λi/λk)·αk,    i = 1, 2, …, n, i ≠ k.

In order to transform the linear combination of vector c in terms of the basis vectors of B into the linear combination in terms of the basis vectors of B*, we can therefore use the scheme given in Table 6.1. The last column describes the operation that has to be performed in order to get the elements of the current row: e.g. in row n + 2, the notation 'row 2 − (λ2/λk) row k' means that we have to take the corresponding element of row 2 (i.e. α2 in the c column) and subtract λ2/λk times the corresponding element of row k (i.e. αk in the c column), which gives the new element β2 in row n + 2 of the c column. The transformation formula for all rows different from k is also called the rectangle formula, since exactly four elements forming a rectangle are required to determine the corresponding new element; for the determination of element β2, these are the four elements in row 2 and row k of the a^k and c columns.

Table 6.1 Tableau for Steinitz's procedure

    Row    Basis vectors    a^k    c                        Operation
    1      b^1              λ1     α1
    2      b^2              λ2     α2
    …      …                …      …
    k      b^k              λk     αk
    …      …                …      …
    n      b^n              λn     αn
    n+1    b^1              0      β1 = α1 − (λ1/λk)·αk     row 1 − (λ1/λk) row k
    n+2    b^2              0      β2 = α2 − (λ2/λk)·αk     row 2 − (λ2/λk) row k
    …      …                …      …                        …
    n+k    a^k              1      βk = αk/λk               (1/λk) row k
    …      …                …      …                        …
    2n     b^n              0      βn = αn − (λn/λk)·αk     row n − (λn/λk) row k

In columns 3 and 4, we have the corresponding scalars of the vectors a^k and c in the linear combinations of the basis vectors of B (row 1 up to row n) and of the basis vectors of B* (row n + 1 up to row 2n). In particular, from the first n rows we get

    a^k = λ1·b^1 + … + λk·b^k + … + λn·b^n    and    c = α1·b^1 + … + αk·b^k + … + αn·b^n.

From the last n rows, we get the linear combinations

    a^k = 0·b^1 + … + 0·b^(k−1) + 1·a^k + 0·b^(k+1) + … + 0·b^n

and

    c = (α1 − (λ1/λk)·αk)·b^1 + … + (αk/λk)·a^k + … + (αn − (λn/λk)·αk)·b^n.
If we have representations of several vectors with respect to the basis B and we look for their corresponding representations in the new basis B*, we simply add some more columns to the above scheme (one column for each vector), and by performing the same operations as indicated in the last column, we get all the representations with respect to the new basis B*. The above procedure of replacing a vector in the current basis is illustrated by the following example.

Example 6.13 Let a basis B = {b^1, b^2, b^3, b^4} of the four-dimensional space R^4 with

    b^1 = (1, −1, 1, −2)T,  b^2 = (3, −5, 2, −1)T,  b^3 = (−2, 1, −2, 2)T  and  b^4 = (−1, 0, −1, 1)T

be given. We do not prove here that these four vectors indeed constitute a basis in R^4; according to Definition 6.10, we would have to solve a system of four linear equations in four variables, which we treat later in Chapter 8 in detail. Moreover, let

    a^3 = 4b^1 − 2b^2 + 2b^3 − 6b^4

be the vector which enters the basis instead of vector b^3. In addition, let vector c be given as

    c = b^1 + 2b^2 + 4b^3 + 3b^4.

Notice that

    c = (−4, −7, −6, 7)T,    (6.11)

i.e. the representation of this vector by means of the unit vectors e^1, e^2, e^3, e^4 as basis vectors is c = −4e^1 − 7e^2 − 6e^3 + 7e^4. Applying the tableau given in Table 6.1 to find the linear combination of vector c in terms of the new basis vectors, we obtain the results given in Table 6.2. From row 5 to row 8 of the c column, we get the representation of vector c by means of the basis B* = {b^1, b^2, a^3, b^4}:

    c = −7b^1 + 6b^2 + 2a^3 + 15b^4.
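The tableau rows n + 1 to 2n implement exactly βk = αk/λk and βi = αi − (λi/λk)·αk; a compact Python sketch of this basis-change step, checked against the coefficients of Example 6.13 (the function name is ours):

```python
def steinitz_step(lam, alpha, k):
    """New coordinates of c after basis vector k is replaced by a^k.

    lam   -- coordinates of a^k in the old basis (lam[k] must be non-zero)
    alpha -- coordinates of c in the old basis
    k     -- 0-based index of the vector leaving the basis
    """
    assert lam[k] != 0, "Theorem 6.2 requires the pivot lam[k] to be non-zero"
    beta = [a - lam[i] / lam[k] * alpha[k] for i, a in enumerate(alpha)]
    beta[k] = alpha[k] / lam[k]  # the (1/lam_k) row-k rule
    return beta

# Example 6.13: a^3 = 4b^1 - 2b^2 + 2b^3 - 6b^4 and c = b^1 + 2b^2 + 4b^3 + 3b^4,
# with b^3 (index 2) leaving the basis:
print(steinitz_step([4, -2, 2, -6], [1, 2, 4, 3], 2))  # [-7.0, 6.0, 2.0, 15.0]
```

The output reproduces the c column of rows 5 to 8 of the tableau.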
Table 6.2 The change of the basis in Example 6.13 Row Basis vectors b3 1 2 3 4 b1 b2 b3 b4 4 −2 2 −6 1 2 4 3 5 6 7 8 b1 b2 a3 b4 0 0 1 0 −7 6 2 15 Operation c row 1 − 2 row 3 row 2 + row 3 1 row 3 2 row 4 + 3 row 3 We can easily check that our computations are correct: ⎛ ⎞ ⎛ 1 3 ⎜ −1 ⎟ ⎜ −5 ⎟ ⎜ c = −7 ⎜ ⎝ 1 ⎠ + 6⎝ 2 −2 −1 ⎞ ⎛ ⎞ ⎛ 0 ⎟ ⎜ ⎟ ⎜ ⎟ + 2 ⎜ 8 ⎟ + 15 ⎜ ⎠ ⎝ 2 ⎠ ⎝ −8 ⎞ ⎛ −1 −4 ⎜ −7 0 ⎟ ⎟=⎜ −1 ⎠ ⎝ −6 1 7 ⎞ ⎟ ⎟, ⎠ i.e. for vector c we get the same representation with respect to the basis vectors e1 , e2 , e3 , e4 as before (see equality (6.11)). If a basis B = {b1 , b2 , . . . , bn } should be replaced by a basis B∗ = {a1 , a2 , . . . , an }, then we can apply consecutively the above procedure by replacing in each step a vector bi , 1 ≤ i ≤ n, by a vector a j , 1 ≤ j ≤ n, provided that the assumption of Theorem 6.2 is satisfied. EXERCISES 6.1 Given are the vectors ⎛ ⎞ 2 a = ⎝ 1 ⎠, −1 ⎛ ⎞ 1 b = ⎝ −4 ⎠ −2 ⎛ and ⎞ 2 c = ⎝ 2 ⎠. 6 Vectors 251 (a) Find vectors a + b − c, a + 3b, b − 4a + 2c, a + 3(b − 2c). (b) For which of the vectors a, b and c do the relations > or ≥ hold? (c) Find the scalar products aT · b, aT · c, bT · c. Which of the vectors a, b and c are orthogonal? What is the angle between the vectors b and c? (d) Compute vectors (aT · b) · c and a · (bT · c). (e) Compare number |b + c| with number |b| + |c| and number |bT · c| with number |b| · |c|. 6.2 6.3 6.4 Find α and β so that vectors ⎛ ⎞ 2 ⎝ −1 ⎠ a= and α ⎛ ⎞ β 4 ⎠ −2 b=⎝ are orthogonal. (a) What is the distance between the following points: (1, 2, 3) and (4, −1, 2) in the three-dimensional Euclidean space R3 ? (b) Illustrate the following sets of points in R2 : a ≥ b and |a| ≥ |b|. Given are the vectors 0003 0004 1 a1 = and 0 0003 a2 = Find out which of the vectors 0003 0004 0003 0004 −2 2 , and 1 3 −1 1 0003 0004 0 0.5 . 0004 are linear combinations of a1 and a2 . Is one of the above vectors a convex combination of vectors a1 and a2 ? Graph all these vectors. 
6.5 Given are the vectors

a1 = (4, 2)T,  a2 = (1, 4)T,  a3 = (3, 0)T  and  a4 = (3, 2)T.

Show that vector a4 can be expressed as a convex linear combination of vectors a1, a2 and a3. Find the convex combinations of vectors a1, a2 and a3 graphically.

6.6 Are the vectors

a1 = (1, 0, 0)T,  a2 = (1, −2, 1)T  and  a3 = (5, 4, −2)T

linearly independent?

6.7 Do the two vectors

a1 = (2, −1)T  and  a2 = (−4, 2)T

span the two-dimensional space? Do they constitute a basis? Graph the vectors and illustrate their linear combinations.

6.8 Do the vectors

(1, 0, 0, 1)T,  (0, 0, 1, 0)T,  (0, 1, 0, 1)T  and  (1, 0, 1, 0)T

constitute a basis in R4?

6.9 Let vectors

a1 = (1, 0, 3)T,  a2 = (0, 1, 0)T  and  a3 = (1, 0, −1)T

constitute a basis in R3.
(a) Express vector a = (3, 3, −3)T as a linear combination of the three vectors a1, a2 and a3 above.
(b) Find all other bases for the three-dimensional space which include vector a and vectors from the set {a1, a2, a3}.
(c) Express vector b = 2a1 + 2a2 + 3a3 = (5, 2, 3)T by the basis vectors a1, a2 and a.

7 Matrices and determinants

7.1 MATRICES

We start with an introductory example.

Example 7.1 Assume that a firm uses three raw materials denoted by R1, R2 and R3 to produce four intermediate products S1, S2, S3 and S4. These intermediate products are partially also used to produce two final products F1 and F2. The numbers of required units of the intermediate products are independent of the use of intermediate products as input for the two final products. Table 7.1 gives the number of units of each raw material which are required for the production of one unit of each of the intermediate products. Table 7.2 gives the number of units of each intermediate product necessary to produce one unit of each of the final products.
The firm intends to produce 80 units of S1, 60 units of S2, 100 units of S3 and 50 units of S4 as well as 70 units of F1 and 120 units of F2. The question is: how many units of the raw materials are necessary to produce the required numbers of the intermediate and final products?

Table 7.1 Raw material requirements for the intermediate products

Raw material | S1 | S2 | S3 | S4
R1           |  2 |  1 |  4 |  0
R2           |  3 |  2 |  1 |  3
R3           |  1 |  4 |  0 |  5

Table 7.2 Intermediate product requirements for the final products

Intermediate product | F1 | F2
S1                   |  2 |  3
S2                   |  4 |  0
S3                   |  1 |  4
S4                   |  3 |  1

To produce the required units of the intermediate products, we need

2 · 80 + 1 · 60 + 4 · 100 + 0 · 50 = 620

units of raw material R1. Similarly, we need

3 · 80 + 2 · 60 + 1 · 100 + 3 · 50 = 610

units of raw material R2 and

1 · 80 + 4 · 60 + 0 · 100 + 5 · 50 = 570

units of raw material R3. Summarizing the above considerations, the vector yS of the required units of raw materials for the production of the intermediate products is given by

yS = (620, 610, 570)T,

where the kth component gives the number of units of Rk required for the production of the intermediate products.

Next, we calculate how many units of each raw material are required for the production of the final products. Since the intermediate products are used for the production of the final products (see Table 7.2), we find that for the production of one unit of final product F1 the required amount of raw material R1 is

2 · 2 + 1 · 4 + 4 · 1 + 0 · 3 = 12.

Similarly, to produce one unit of final product F2 requires

2 · 3 + 1 · 0 + 4 · 4 + 0 · 1 = 22

units of R1. To produce one unit of final product F1 requires

3 · 2 + 2 · 4 + 1 · 1 + 3 · 3 = 24

units of R2. Continuing in this way, we get Table 7.3, describing how many units of each raw material are required for the production of each of the final products.

Table 7.3 Raw material requirements for the final products

Raw material | F1 | F2
R1           | 12 | 22
R2           | 24 | 16
R3           | 33 |  8

Therefore, for the production of the final products, there are

12 · 70 + 22 · 120 = 3,480

units of raw material R1,

24 · 70 + 16 · 120 = 3,600

units of raw material R2 and finally

33 · 70 + 8 · 120 = 3,270

units of raw material R3 required. The vector yF containing as components the units of each raw material required for the production of the final products is then given by

yF = (3,480, 3,600, 3,270)T.

So the amount of the individual raw materials required for the total production of the intermediate and final products is obtained as the sum of the vectors yS and yF. Denoting this sum vector by y, we obtain

y = (620, 610, 570)T + (3,480, 3,600, 3,270)T = (4,100, 4,210, 3,840)T.

The question is whether we can simplify the above computations by introducing some formal apparatus. In the following, we use matrices and define operations such as addition or multiplication in an appropriate way.

Definition 7.1 A matrix A is a rectangular array of elements (numbers or other mathematical objects, e.g. functions) aij of the form

A = (aij) = ( a11 a12 · · · a1n )
            ( a21 a22 · · · a2n )
            (  .   .         .  )
            ( am1 am2 · · · amn ).

Any element (or entry) aij has two indices, a row index i and a column index j. The matrix A is said to have the order or dimension m × n (read: m by n). If m = n, matrix A is called a square matrix. For a matrix A of order m × n, we also write A = A(m,n), or A = (aij)(m,n), or simply A = (aij).

Definition 7.2 Let a matrix A of order m × n be given. The transpose AT of matrix A is obtained by interchanging the rows and columns of A, i.e. the first column becomes the first row, the first row becomes the first column and so on. Thus:

A = (aij)  ⇒  AT = (a∗ij)  with  a∗ji = aij  for 1 ≤ j ≤ n and 1 ≤ i ≤ m.

Obviously, matrix AT in Definition 7.2 is of order n × m.
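Definition 7.2 translates directly into code. A minimal sketch, storing a matrix as a list of rows (the function name transpose is ours, and the sample matrix is the 2 × 4 matrix of Example 7.2 below):

```python
def transpose(a):
    """Return A^T for a matrix stored as a list of rows (Definition 7.2):
    entry (j, i) of the result is entry (i, j) of a."""
    m, n = len(a), len(a[0])
    return [[a[i][j] for i in range(m)] for j in range(n)]

A = [[2, 3, 4, 1],
     [7, -1, 0, 4]]
print(transpose(A))  # → [[2, 7], [3, -1], [4, 0], [1, 4]]
```

Transposing twice returns the original matrix, in line with rule (1) for transposes, (AT)T = A.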
Example 7.2 Let

A = ( 2  3 4 1 )
    ( 7 −1 0 4 ).

Since matrix A is of order 2 × 4, matrix AT is of order 4 × 2, and we get

AT = ( 2  7 )
     ( 3 −1 )
     ( 4  0 )
     ( 1  4 ).

Remark A vector

a = (a1, a2, . . . , am)T

is a special matrix with one column (i.e. a matrix of order m × 1). Analogously, a transposed vector aT = (a1, a2, . . . , am) is a special matrix consisting only of one row.

Definition 7.3 Two matrices A and B of the same order m × n are equal if corresponding elements are equal, i.e. aij = bij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

So only for matrices of the same order m × n can we decide whether both matrices are equal.

Definition 7.4 A matrix A of order n × n is called symmetric if A = AT, i.e. equality aij = aji holds for 1 ≤ i, j ≤ n. Matrix A is called antisymmetric if A = −AT, i.e. aij = −aji for 1 ≤ i, j ≤ n.

As a consequence of Definition 7.4, we obtain: if A is antisymmetric, then we must have aii = 0 for i = 1, 2, . . . , n.

Special matrices

We finish this section with some matrices of special structure.

Definition 7.5 A matrix D = (dij) of order n × n with

dij = di  for i = j   and   dij = 0  for i ≠ j   (1 ≤ i, j ≤ n),

i.e. with all off-diagonal elements equal to zero, is called a diagonal matrix. A diagonal matrix I = (iij) of order n × n with

iij = 1  for i = j   and   iij = 0  for i ≠ j   (1 ≤ i, j ≤ n)

is called an identity matrix.

Definition 7.6 A matrix U = (uij) of order n × n with

uij = 0  for i > j   (1 ≤ i, j ≤ n),

i.e. with all elements below the diagonal equal to zero, is called an upper triangular matrix. A matrix L = (lij) of order n × n with

lij = 0  for i < j   (1 ≤ i, j ≤ n)

is called a lower triangular matrix.
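The defining conditions of Definitions 7.4–7.6 are simple elementwise checks; a sketch (all function names are ours):

```python
def is_symmetric(a):
    """A = A^T, i.e. a_ij = a_ji (Definition 7.4)."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))

def is_antisymmetric(a):
    """A = -A^T, i.e. a_ij = -a_ji (Definition 7.4); forces a_ii = 0."""
    n = len(a)
    return all(a[i][j] == -a[j][i] for i in range(n) for j in range(n))

def is_upper_triangular(a):
    """u_ij = 0 for i > j (Definition 7.6)."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i > j)

print(is_symmetric([[1, 2], [2, 3]]))         # → True
print(is_antisymmetric([[0, 2], [-2, 0]]))    # → True
print(is_upper_triangular([[1, 2], [0, 3]]))  # → True
```

Note that the antisymmetry check automatically enforces the zero diagonal mentioned after Definition 7.4, since a_ii = −a_ii implies a_ii = 0.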
Notice that the matrices given in Definitions 7.5 and 7.6 are defined only in the case of a square matrix.

Definition 7.7 A matrix O = (oij) of order m × n with oij = 0 for 1 ≤ i ≤ m and 1 ≤ j ≤ n is called a zero matrix.

7.2 MATRIX OPERATIONS

In the following, we discuss matrix operations such as addition and multiplication and their properties.

Definition 7.8 Let A = (aij) and B = (bij) be two matrices of order m × n. The sum A + B is defined as the m × n matrix (aij + bij), i.e.

A + B = (aij)(m,n) + (bij)(m,n) = (aij + bij)(m,n).

Thus, the sum of two matrices of the same order is obtained when corresponding elements at the same position in both matrices are added. The zero matrix O is the neutral element with respect to matrix addition, i.e. we have A + O = O + A = A, where matrix O has the same order as matrix A.

Definition 7.9 Let A = (aij) be an m × n matrix and λ ∈ R. The product of the scalar λ and the matrix A is the m × n matrix λA = (λaij), i.e. any element of matrix A is multiplied by the scalar λ. The operation of multiplying a matrix by a scalar is called scalar multiplication.

Using Definitions 7.8 and 7.9, we can define the difference of two matrices as follows.

Definition 7.10 Let A = (aij) and B = (bij) be matrices of order m × n. Then the difference of matrices A and B is defined as A − B = A + (−1)B.

Consequently, matrix A − B is given by the m × n matrix (aij − bij), i.e.

A − B = (aij)(m,n) − (bij)(m,n) = (aij − bij)(m,n).

Example 7.3 Let

A = ( 3  1 2 )      B = ( 1 2  0 )      C = ( 1  2 5 )
    ( 0 −2 3 ),         ( 3 1 −1 )  and     ( 2  0 3 )
    ( 1  4 5 )          ( 4 2 −2 )          ( 2 −3 1 ).

We compute 2A + 3B − C and obtain

2A + 3B − C = ( 2·3  2·1    2·2 )   ( 3·1  3·2  3·0    )   ( 1  2 5 )
              ( 2·0  2·(−2) 2·3 ) + ( 3·3  3·1  3·(−1) ) − ( 2  0 3 )
              ( 2·1  2·4    2·5 )   ( 3·4  3·2  3·(−2) )   ( 2 −3 1 )

            = ( 6  2  4 )   (  3 6  0 )   ( 1  2 5 )   (  8  6 −1 )
              ( 0 −4  6 ) + (  9 3 −3 ) − ( 2  0 3 ) = (  7 −1  0 ).
              ( 2  8 10 )   ( 12 6 −6 )   ( 2 −3 1 )   ( 12 17  3 )

Next, we give some rules for adding two matrices A and B of the same order and for multiplying a matrix by some real number (scalar multiplication). Let A, B, C be matrices of order m × n and λ, µ ∈ R.

Rules for matrix addition and scalar multiplication
(1) A + B = B + A (commutative law);
(2) (A + B) + C = A + (B + C) (associative law);
(3) λ(A + B) = λA + λB and (λ + µ)A = λA + µA (distributive laws).

We have already introduced the notion of a vector space in Chapter 6. Using matrix addition and scalar multiplication as introduced in Definitions 7.8 and 7.9, we can extend the rules presented above and get the following result.

THEOREM 7.1 The set of all matrices of order m × n constitutes a vector space.

Next, we introduce the multiplication of two matrices of specific orders.

Definition 7.11 Let A = (aij) be a matrix of order m × p and B = (bij) be a matrix of order p × n. The product AB is a matrix of order m × n which is defined by

AB = ( a11 b11 + · · · + a1p bp1    a11 b12 + · · · + a1p bp2    · · ·    a11 b1n + · · · + a1p bpn )
     ( a21 b11 + · · · + a2p bp1    a21 b12 + · · · + a2p bp2    · · ·    a21 b1n + · · · + a2p bpn )
     (  .                            .                                     .                        )
     ( am1 b11 + · · · + amp bp1    am1 b12 + · · · + amp bp2    · · ·    am1 b1n + · · · + amp bpn ).

Notice that the product AB is defined only when the number of columns of matrix A is equal to the number of rows of matrix B. For calculating the product A(m,p) B(p,n), we can use Falk's scheme, which is as follows:

                      b11  b12  · · ·  b1n
                      b21  b22  · · ·  b2n
                       .    .           .
                      bp1  bp2  · · ·  bpn
a11  a12  · · ·  a1p  c11  c12  · · ·  c1n
a21  a22  · · ·  a2p  c21  c22  · · ·  c2n
 .    .           .    .    .           .
am1  am2  · · ·  amp  cm1  cm2  · · ·  cmn

with

cij = Σ_{k=1}^{p} aik bkj    for i = 1, 2, . . . , m and j = 1, 2, . . . , n.

From the above scheme we again see that element cij is obtained as the scalar product of the ith row vector of matrix A and the jth column vector of matrix B. If we have to perform more than one matrix multiplication, we can apply Falk's scheme successively. Assuming that the corresponding products of n matrices are defined, we can (due to the validity of the associative law) perform the multiplications either starting from the left or from the right. In the former case, we obtain C = A1 A2 A3 · · · An according to C = [(A1 A2) A3] · · · An, i.e. by using Falk's scheme repeatedly we first compute A1 A2, then (A1 A2) A3, and so on, until we finally obtain C. In the latter case, C = A1 [A2 · · · (An−1 An)], i.e. by using Falk's scheme repeatedly we first compute An−1 An, then An−2 (An−1 An), and so on, until we finally obtain C.

Next, we discuss some properties of matrix multiplication.

(1) Matrix multiplication is not commutative, i.e. in general we have AB ≠ BA. It may even happen that only one of the possible products of two matrices is defined but not the other. For instance, let A be a matrix of order 2 × 4 and B be a matrix of order 4 × 3. Then the product AB is defined and gives a product matrix of order 2 × 3. However, the product BA is not defined since matrix B has three columns but matrix A has only two rows.
(2) For matrices A(m,p), B(p,r) and C(r,n), we have A(BC) = (AB)C, i.e. matrix multiplication is associative provided that the corresponding products are defined.
(3) For matrices A(m,p), B(p,n) and C(p,n), we have A(B + C) = AB + AC, i.e. the distributive law holds provided that B and C have the same order and the product of matrices A and B + C is defined.
(4) The identity matrix I of order n × n is the neutral element of matrix multiplication of square matrices of order n × n, i.e. AI = IA = A.

Let A be a square matrix. Then we write AA = A2, and in general An = AA · · · A, where factor A occurs n times, is known as the nth power of matrix A.

Example 7.4 Let matrices

A = ( 3 −2 6 )          B = ( 2 3 )
    ( 4  1 3 )   and        ( 4 1 )
    ( 5  4 0 )              ( 1 5 )

be given. The product BA is not defined since matrix B has two columns but matrix A has three rows. The product AB is defined according to Definition 7.11, and the resulting product matrix C = AB is of order 3 × 2. Applying Falk's scheme, we obtain

            2  3
            4  1
            1  5
3 −2 6 |  4 37
4  1 3 | 15 28
5  4 0 | 26 19

i.e. we have obtained

C = AB = (  4 37 )
         ( 15 28 )
         ( 26 19 ).

Example 7.5 Three firms 1, 2 and 3 share a market for a certain product. Currently, firm 1 has 25 per cent of the market, firm 2 has 55 per cent and firm 3 has 20 per cent of the market. We can summarize this in a so-called market share vector s, where component si is a real number between zero and one giving the current percentage of firm i as a decimal so that the sum of all components is equal to one. In this example, the corresponding market share vector s = (s1, s2, s3)T is given by

s = (0.25, 0.55, 0.20)T.

In the course of one year, the following changes occur.

(1) Firm 1 keeps 80 per cent of its customers, while losing 5 per cent to firm 2 and 15 per cent to firm 3.
(2) Firm 2 keeps 65 per cent of its customers, while losing 15 per cent to firm 1 and 20 per cent to firm 3.
(3) Firm 3 keeps 75 per cent of its customers, while losing 15 per cent to firm 1 and 10 per cent to firm 2.

We compute the market share vector s∗ after the above changes. To do this, we introduce a matrix T = (tij), where tij is the percentage (as a decimal) of customers of firm j who become a customer of firm i within the next year. Matrix T is called a transition matrix. In this example, matrix T is as follows:

T = ( 0.80 0.15 0.15 )
    ( 0.05 0.65 0.10 )
    ( 0.15 0.20 0.75 ).

To get the percentage of customers of firm 1 after the course of the year, we have to compute

s1∗ = 0.80 s1 + 0.15 s2 + 0.15 s3.
Similarly, we can compute the values s2∗ and s3∗, and we find that vector s∗ is obtained as the product of matrix T and vector s:

s∗ = T s = ( 0.80 0.15 0.15 ) ( 0.25 )   ( 0.3125 )
           ( 0.05 0.65 0.10 ) ( 0.55 ) = ( 0.3900 ).
           ( 0.15 0.20 0.75 ) ( 0.20 )   ( 0.2975 )

Hence, after one year, firm 1 has 31.25 per cent of the customers, firm 2 has 39 per cent and firm 3 has 29.75 per cent.

Example 7.6 Consider again the data given in Example 7.1. Introducing matrix RS(3,4) as the matrix giving the raw material requirements for the intermediate products as in Table 7.1 and matrix SF(4,2) as the matrix of the intermediate product requirements for the final products as in Table 7.2, we get the raw material requirements for the final products described by matrix RF(3,2) by matrix multiplication:

RF(3,2) = RS(3,4) · SF(4,2).

Let vectors xS(4,1) and xF(2,1) give the number of units of each of the intermediate and final products, respectively, where the ith component refers to the ith product. Then we obtain the vector y of the total raw material requirements as follows:

y(3,1) = yS(3,1) + yF(3,1) = RS(3,4) · xS(4,1) + RF(3,2) · xF(2,1) = RS(3,4) · xS(4,1) + RS(3,4) · SF(4,2) · xF(2,1).

The indicated orders of the matrices confirm that all the products and sums are defined.

We now return to transposes of matrices and summarize the following rules, where A and B are m × n matrices, C is an n × p matrix and λ ∈ R.

Rules for transposes of matrices
(1) (AT)T = A;
(2) (A + B)T = AT + BT and (A − B)T = AT − BT;
(3) (λA)T = λAT;
(4) (AC)T = CT AT.

Definition 7.12 A matrix A of order n × n is said to be orthogonal if AT A = I.

As a consequence of Definition 7.12, we find that in an orthogonal matrix A, the scalar product of the ith row vector and the jth column vector with i ≠ j is equal to zero, i.e. these vectors are orthogonal (cf. Chapter 6.2).
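The matrix product of Definition 7.11 and the market-share update of Example 7.5 can be checked with a short script (a sketch; matmul is our own helper name):

```python
def matmul(a, b):
    """C = AB with c_ij = sum_k a_ik * b_kj (Definition 7.11).
    Requires: number of columns of a == number of rows of b."""
    p, n = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(n)]
            for i in range(len(a))]

# Example 7.4: AB is defined, BA is not.
A = [[3, -2, 6], [4, 1, 3], [5, 4, 0]]
B = [[2, 3], [4, 1], [1, 5]]
print(matmul(A, B))  # → [[4, 37], [15, 28], [26, 19]]

# Example 7.5: s* = T s for the transition matrix T.
T = [[0.80, 0.15, 0.15], [0.05, 0.65, 0.10], [0.15, 0.20, 0.75]]
s = [[0.25], [0.55], [0.20]]
s_new = matmul(T, s)  # ≈ (0.3125, 0.39, 0.2975)^T, up to rounding
```

Each entry of the result is the scalar product of a row of the first factor with a column of the second, exactly as in Falk's scheme.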
Example 7.7 Matrix

A = (  1/2        −(1/2)√3 )
    ( (1/2)√3      1/2     )

is orthogonal since

AT A = (  1/2       (1/2)√3 ) (  1/2       −(1/2)√3 )   ( 1 0 )
       ( −(1/2)√3    1/2    ) ( (1/2)√3     1/2     ) = ( 0 1 ) = I.

7.3 DETERMINANTS

Determinants can be used to answer the question of whether the inverse of a matrix exists and to find such an inverse matrix. They can be used e.g. as a tool for solving systems of linear equations (this topic is discussed in detail in Chapter 8) or for finding eigenvalues (see Chapter 10). Let

A = ( a11 a12 . . . a1n )
    ( a21 a22 . . . a2n )
    (  .   .         .  )
    ( an1 an2 . . . ann )

be a square matrix and let Aij denote the submatrix obtained from A by deleting the ith row and the jth column. It is clear that Aij is a square matrix of order (n − 1) × (n − 1).

Definition 7.13 The determinant of a matrix A of order n × n with numbers as elements is a number, assigned to matrix A by the following rule:

det A = |A| = Σ_{j=1}^{n} (−1)^{j+1} · a1j · |A1j|.

For n = 1, we define |A| = a11.

Whereas a matrix of order m × n is a rectangular array of m · n elements, determinants are defined only for square matrices, and in contrast to matrices, a determinant is a number provided that the elements of the matrix are numbers as well. According to Definition 7.13, a determinant of a matrix of order n × n can be found by means of n determinants of matrices of order (n − 1) × (n − 1). The rule given in Definition 7.13 can also be applied when the elements of the matrices are e.g. functions or mathematical terms. For n = 2 and matrix

A = ( a11 a12 )
    ( a21 a22 ),

we get

|A| = a11 a22 − a12 a21.
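For n = 2 the formula can be written down directly (det2 is our own name for this minimal sketch):

```python
def det2(a):
    """|A| = a11*a22 - a12*a21 for a 2x2 matrix (Definition 7.13, n = 2)."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

print(det2([[3, 1], [1, -1]]))  # → -4
```

This two-term formula is the base case used by the cofactor expansions that follow.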
For n = 3 and matrix

A = ( a11 a12 a13 )
    ( a21 a22 a23 )
    ( a31 a32 a33 ),

we get

|A| = a11 · |A11| − a12 · |A12| + a13 · |A13|

    = a11 · | a22 a23 |  −  a12 · | a21 a23 |  +  a13 · | a21 a22 |
            | a32 a33 |          | a31 a33 |            | a31 a32 |

    = a11 (a22 a33 − a32 a23) − a12 (a21 a33 − a31 a23) + a13 (a21 a32 − a31 a22)

    = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31.

The latter computation for n = 3 can also be done as follows. We add the first two columns at the end as fourth and fifth columns. Then we compute the products of the three diagonals from the left top to the right bottom and add them, and from this value we subtract the sum of the products of the three diagonals from the left bottom to the right top. This procedure is known as Sarrus's rule (see Figure 7.1) and works only for the case n = 3.

Determinants of square submatrices are called minors. The order of a minor is determined by its number of rows (or columns). A minor |Aij| multiplied by (−1)^(i+j) is called a cofactor. The following theorem gives, in addition to Definition 7.13, an alternative way of finding the determinant of a matrix of order n × n.

Figure 7.1 Sarrus's rule.

THEOREM 7.2 (Laplace's theorem, cofactor expansion of a determinant) Let A be a matrix of order n × n. Then the determinant of matrix A is equal to the sum of the products of the elements of one row or column with the corresponding cofactors, i.e.

|A| = Σ_{j=1}^{n} (−1)^(i+j) · aij · |Aij|    (expansion of a determinant by row i)

and

|A| = Σ_{i=1}^{n} (−1)^(i+j) · aij · |Aij|    (expansion of a determinant by column j).

Theorem 7.2 contains Definition 7.13 as a special case. While Definition 7.13 requires cofactor expansion by the first row to evaluate the determinant of matrix A, Theorem 7.2 indicates that we can choose one arbitrary row or column of matrix A to which we apply cofactor expansion.
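Laplace expansion along the first row (Definition 7.13, Theorem 7.2) gives a simple, if very inefficient, recursive determinant routine (a sketch; det is our own name):

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # minor A_1j: delete row 1 and column j
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

print(det([[2, 3, 5], [1, 0, 2], [-1, -4, 2]]))  # → -16 (cf. Example 7.8 below)
```

The recursion evaluates n! products, so it is only practical for small n; elimination methods such as the one in Example 7.9 are far cheaper.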
Therefore, computations are simplified if we choose one row or column with many zeroes in matrix A.

Example 7.8 We evaluate the determinant of matrix

A = (  2  3 5 )
    (  1  0 2 )
    ( −1 −4 2 )

by applying Theorem 7.2 and performing cofactor expansion by the second column. We get

|A| = (−1)^3 · 3 · |  1 2 |  +  (−1)^4 · 0 · |  2 5 |  +  (−1)^5 · (−4) · | 2 5 |
                   | −1 2 |                  | −1 2 |                     | 1 2 |

    = (−3) · [2 − (−2)] + 0 + 4 · (4 − 5) = −12 + 0 − 4 = −16.

According to Theorem 7.2, the first determinant of order 2 × 2 on the right-hand side above is the minor |A12| obtained by crossing out in matrix A the first row and the second column, i.e.

|A12| = | a21 a23 |  =  |  1 2 |
        | a31 a33 |     | −1 2 |.

Accordingly, the other minors |A22| and |A32| are obtained by crossing out the second column as well as the second and third rows, respectively.

We now give some properties of determinants.

THEOREM 7.3 Let A be an n × n matrix. Then |A| = |AT|.

This is a consequence of Theorem 7.2 since we can apply cofactor expansion by the elements of either one row or one column. Therefore, the determinant of matrix A and the determinant of the transpose AT are always equal. In the case of a triangular matrix, we can easily evaluate the determinant, as the following theorem shows.

THEOREM 7.4 Let A be an n × n (lower or upper) triangular matrix. Then

|A| = a11 · a22 · . . . · ann = Π_{i=1}^{n} aii.

As a corollary of Theorem 7.4, we find that the determinant of an identity matrix I is equal to one, i.e. |I| = 1. If we evaluate a determinant using Theorem 7.2, it is desirable that the determinant has an appropriate structure; e.g. computations are simplified if many elements of one row or of one column are equal to zero. For this reason, we are looking for some rules that allow us to evaluate a determinant in an easier form.
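Sarrus's rule and Theorem 7.4 can be cross-checked on a small triangular matrix (det3 is our own helper; the sample matrix U is an arbitrary illustration, not from the text):

```python
def det3(a):
    """Sarrus's rule for a 3x3 determinant (works only for n = 3):
    sum of the three 'down' diagonals minus the three 'up' diagonals."""
    return (a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1] - a[0][2] * a[1][1] * a[2][0]
            - a[0][0] * a[1][2] * a[2][1] - a[0][1] * a[1][0] * a[2][2])

# Theorem 7.4: for a triangular matrix the determinant is the
# product of the diagonal elements.
U = [[2, 5, -1],
     [0, 3, 4],
     [0, 0, -2]]
print(det3(U), 2 * 3 * (-2))  # → -12 -12
```

Both evaluations agree, as Theorem 7.4 predicts, since every diagonal product of Sarrus's rule other than a11 a22 a33 contains a zero factor.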
THEOREM 7.5 Let A be an n × n matrix. Then:
(1) If we interchange in A two rows (or two columns), then we get for the resulting matrix A∗: |A∗| = −|A|.
(2) If we multiply all elements of a row (or all elements of a column) by λ ∈ R, then we get for the resulting matrix A∗: |A∗| = λ · |A|.
(3) If we add to all elements of a row (or to all elements of a column) λ times the corresponding elements of another row (column), then we get for the resulting matrix A∗: |A∗| = |A|.

COROLLARY 7.1 For the n × n matrix B = λA, we obtain |B| = |λA| = λ^n · |A|.

The latter corollary is obtained by a repeated application of part (2) of Theorem 7.5.

THEOREM 7.6 Let A and B be matrices of order n × n. Then |AB| = |A| · |B|.

It is worth noting that in general |A + B| ≠ |A| + |B|. Next, we consider two examples of evaluating determinants.

Example 7.9 We evaluate the determinant of matrix

A = (  1  −2  4  3 )
    ( −2   5 −6  4 )
    (  3  −4 16  5 )
    (  4 −11 20 10 ).

We apply Theorem 7.5 to generate a determinant having the same value, in which all elements below the diagonal are equal to zero:

|A| = |  1  −2  4  3 |   | 1 −2  4  3 |   | 1 −2  4   3 |     | 1 −2  4   3 |
      | −2   5 −6  4 |   | 0  1  2 10 |   | 0  1  2  10 |     | 0  1  2  10 |
      |  3  −4 16  5 | = | 0  2  4 −4 | = | 0  0  0 −24 | = − | 0  0 10  28 | = −(1 · 1 · 10 · (−24)) = 240.
      |  4 −11 20 10 |   | 0 −3  4 −2 |   | 0  0 10  28 |     | 0  0  0 −24 |

In the first transformation step, we have generated zeroes in rows 2 to 4 of the first column. To this end, we have multiplied the first row by 2 and added it to the second row, yielding the new second row. Analogously, we have multiplied the first row by −3 and added it to the third row, and we have multiplied the first row by −4 and added it to the fourth row.
In the next transformation step, we have generated zeroes in rows 3 and 4 of the second column. This means we have multiplied the second row (of the second determinant) by −2 and added it to the third row, and we have multiplied the second row by 3 and added it to the fourth row (application of part (3) of Theorem 7.5). Additionally, we have interchanged rows 3 and 4, which changes the sign of the determinant. Finally, we applied Theorem 7.4.

Example 7.10 We want to determine for which values of t the determinant

|A| = |    3      1    2 |
      | 2 + 2t    0    4 |
      |    1    2 − t  0 |

is equal to zero. We first apply expansion by column 3 according to Theorem 7.2 and obtain

|A| = 2 · | 2 + 2t    0   |  −  4 · | 3    1   |  +  0 · |    3    1 |
          |    1    2 − t |        | 1  2 − t |         | 2 + 2t  0 |

    = 2 · [(2 + 2t) · (2 − t)] − 4 · (6 − 3t − 1) = −4t² + 16t − 12.

From |A| = 0, we obtain −4t² + 16t − 12 = 0, which corresponds to t² − 4t + 3 = 0. This quadratic equation has the two real roots t1 = 1 and t2 = 3. Thus, for t1 = 1 and t2 = 3, we get |A| = 0.

To find the value of |A|, we did not apply Theorem 7.5. Using Theorem 7.5, we can transform the determinant such that we have many zeroes in one row or column (which simplifies our remaining computations when applying Theorem 7.2). Multiplying each element of row 1 in the initial determinant by −2 and adding each element to the corresponding element of row 2, we obtain

|A| = |    3       1    2 |
      | −4 + 2t   −2    0 |
      |    1     2 − t  0 |.

In this case, when expanding by column 3, we have to determine the value of only one subdeterminant. We get

|A| = 2 · | −4 + 2t   −2   |,
          |    1     2 − t |

which is equal to the already obtained value −4t² + 16t − 12.

The following theorem presents some cases when the determinant of a matrix A is equal to zero.

THEOREM 7.7 Let A be a matrix of order n × n.
Assume that one of the following propositions holds:
(1) Two rows (columns) are equal.
(2) All elements of a row (column) of A are equal to zero.
(3) A row (column) is the sum of multiples of other rows (columns).
Then |A| = 0.

We next introduce the notions of a singular and a regular matrix.

Definition 7.14 A square matrix A is said to be singular if |A| = 0 and regular (or non-singular) if |A| ≠ 0.

We now consider a first possibility to solve special systems of linear equations. This approach is named after the Swiss mathematician Cramer and uses determinants to find the values of the variables.

Cramer's rule

Let A(n,n) · x(n,1) = b(n,1) be a system of linear equations, i.e.

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
  .         .                      .
an1 x1 + an2 x2 + · · · + ann xn = bn,

and we assume that A is regular (i.e. |A| ≠ 0). Moreover, let Aj(b) denote the matrix which is obtained if the jth column of A is replaced by vector b, i.e.

|Aj(b)| = | a11 a12 . . . a1,j−1  b1  a1,j+1 . . . a1n |
          | a21 a22 . . . a2,j−1  b2  a2,j+1 . . . a2n |
          |  .                     .                .  |
          | an1 an2 . . . an,j−1  bn  an,j+1 . . . ann |.

Then

xj = |Aj(b)| / |A|    for j = 1, 2, . . . , n

is the unique solution of the system Ax = b of linear equations.

Cramer's rule makes it possible to solve special systems of linear equations. However, this rule is appropriate only when the determinant of matrix A is different from zero (and thus a unique solution of the system of linear equations exists). It is also a disadvantage of this method that, if we obtain |A| = 0, we must stop our computations and have to apply some more general method for solving systems of linear equations, as we will discuss in Chapter 8. Moreover, from a practical point of view, Cramer's rule is applicable only in the case of a rather small number of variables n.
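Cramer's rule for n = 3 can be sketched as follows (det3 and cramer3 are our own names; the sample system is the one solved by hand in Example 7.11 below):

```python
def det3(a):
    """Sarrus's rule for a 3x3 determinant."""
    return (a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1] - a[0][2] * a[1][1] * a[2][0]
            - a[0][0] * a[1][2] * a[2][1] - a[0][1] * a[1][0] * a[2][2])

def cramer3(a, b):
    """Solve A x = b for a regular 3x3 matrix A via x_j = |A_j(b)| / |A|."""
    d = det3(a)
    if d == 0:
        # A is singular: Cramer's rule does not apply (Definition 7.14)
        raise ValueError("matrix is singular")
    xs = []
    for j in range(3):
        aj = [row[:] for row in a]   # copy A ...
        for i in range(3):
            aj[i][j] = b[i]          # ... and replace column j by b
        xs.append(det3(aj) / d)
    return xs

A = [[3, 4, 2], [1, -1, -3], [2, 0, 1]]
b = [1, 7, 4]
x = cramer3(A, b)   # ≈ [23/9, -10/9, -10/9]
```

Substituting the result back into the three equations reproduces the right-hand side (1, 7, 4), confirming the hand computation.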
Example 7.11 Consider the system of linear equations

3x1 + 4x2 + 2x3 = 1
 x1 −  x2 − 3x3 = 7
2x1       +  x3 = 4,

which we solve by applying Cramer's rule. We first evaluate the determinant of matrix

A = ( 3  4  2 )
    ( 1 −1 −3 )
    ( 2  0  1 ).

Using expansion by row 3 according to Theorem 7.2, we obtain

|A| = 2 · |  4  2 |  −  0 · | 3  2 |  +  1 · | 3  4 |
          | −1 −3 |         | 1 −3 |         | 1 −1 |

    = 2 · (−12 + 2) − 0 + 1 · (−3 − 4) = −27.

In the above computations, we have decided to expand by row 3 since there is already one zero contained in this row, and therefore we have to evaluate only two minors of order two. For this reason, one could also choose expansion by column 2. Since |A| ≠ 0, we know now that the given system has a unique solution which can be found by Cramer's rule. Continuing, we get

|A1(b)| = | 1  4  2 |
          | 7 −1 −3 |  =  4 · |  4  2 |  +  1 · | 1  4 |  =  4 · (−12 + 2) + 1 · (−1 − 28) = −69;
          | 4  0  1 |         | −1 −3 |         | 7 −1 |

|A2(b)| = | 3 1  2 |    | 0 −20 11 |
          | 1 7 −3 |  = | 1   7 −3 |  =  −1 · | −20 11 |  =  −1 · (−140 + 110) = 30
          | 2 4  1 |    | 0 −10  7 |          | −10  7 |

and

|A3(b)| = | 3  4 1 |
          | 1 −1 7 |  =  −4 · | 1 7 |  +  (−1) · | 3 1 |  =  −4 · (4 − 14) − 1 · (12 − 2) = 30.
          | 2  0 4 |          | 2 4 |            | 2 4 |

For finding |A1(b)| and |A3(b)|, we have used Theorem 7.2. In the former case, we have again applied expansion by row 3, and in the latter case, we have applied expansion by column 2. For finding |A2(b)|, we have first used Theorem 7.5, part (3).
Since there are no zero elements, we first transformed the determinant such that in one column or row (in our case column 1) all but one element equal zero, so that the application of Theorem 7.2 reduces to finding the value of one minor of order two. By Cramer's rule, we get

\[
x_1 = \frac{|A_1(b)|}{|A|} = \frac{-69}{-27} = \frac{23}{9};\qquad
x_2 = \frac{|A_2(b)|}{|A|} = \frac{30}{-27} = -\frac{10}{9};\qquad
x_3 = \frac{|A_3(b)|}{|A|} = \frac{30}{-27} = -\frac{10}{9}.
\]

7.4 LINEAR MAPPINGS

Definition 7.15 A mapping A : Rn → Rm is called linear if

A(x1 + x2) = A(x1) + A(x2) for all x1, x2 ∈ Rn

and

A(λx) = λA(x) for all λ ∈ R and x ∈ Rn.

A linear mapping is therefore defined in such a way that the image of the sum of two vectors is equal to the (vector) sum of the two images, and the image of a multiple of a vector is equal to the same multiple of the image of the vector. A linear mapping A : Rn → Rm can be described by means of a matrix A = (aij) of order m × n such that

\[
x = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \in \mathbb{R}^n
\;\longmapsto\;
y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_m \end{pmatrix}
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \in \mathbb{R}^m.
\]

Definition 7.16 The set of all n-dimensional vectors x which are mapped by A : Rn → Rm into the m-dimensional zero vector 0 is called the kernel of the mapping, abbreviated ker A, i.e.

ker A = {x(n,1) ∈ Rn | A(m,n) · x(n,1) = 0(m,1)}.

The kernel of a linear mapping is also called the null space. Determining the kernel of a linear mapping requires the solution of a system of linear equations with the components of vector x as the unknowns, which we will treat in detail in Chapter 8. The following theorem shows how a composition of two linear mappings can be described by a matrix.

THEOREM 7.8 Let B : Rn → Rs and A : Rs → Rm be linear mappings. Then the composite mapping A ◦ B : Rn → Rm is a linear mapping described by the matrix C(m,n) = A(m,s) · B(s,n).
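Theorem 7.8 can be checked numerically: applying B and then A to a vector gives the same result as applying the single matrix C = A · B. The sketch below is our own illustration with made-up matrices; `mat_vec` and `mat_mul` are invented helper names.

```python
def mat_vec(M, v):
    """Apply the linear mapping described by matrix M to vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(A, B):
    """Matrix product C = A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Made-up example: B maps R^2 -> R^3 and A maps R^3 -> R^2,
# so the composite A o B maps R^2 -> R^2.
B = [[1, 0],
     [2, 1],
     [0, 3]]
A = [[1, 1, 0],
     [0, 1, 2]]

C = mat_mul(A, B)   # matrix of the composite mapping A o B
x = [4, -1]
assert mat_vec(A, mat_vec(B, x)) == mat_vec(C, x)
```

Note the order: the matrix of A ◦ B is A · B, i.e. the mapping applied first stands on the right of the product.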
Example 7.12 Assume that a firm produces, by means of q raw materials R1, R2, ..., Rq, the m intermediate products S1, S2, ..., Sm, and with these intermediate products and with the q raw materials the n final products F1, F2, ..., Fn. Denote by

$r^S_{ij}$ — the number of units of raw material Ri which are necessary for the production of one unit of intermediate product Sj;
$s^F_{jk}$ — the number of units of intermediate product Sj which are necessary for the production of one unit of final product Fk;
$r^F_{ik}$ — the number of units of raw material Ri which are additionally necessary for the production of one unit of final product Fk.

We introduce the matrices $R^S = (r^S_{ij})$ of order q × m, $S^F = (s^F_{jk})$ of order m × n and $R^F = (r^F_{ik})$ of order q × n, and denote by $x^F = (x^F_1, x^F_2, \ldots, x^F_n)^T$ the production vector of the final products and by $x^S = (x^S_1, x^S_2, \ldots, x^S_m)^T$ the production vector of the intermediate products. We want to determine the required vector y of raw materials. First, raw materials according to the matrix equation

\[
y^1_{(q,1)} = R^F_{(q,n)} \cdot x^F_{(n,1)}
\]

are required for the final products. Moreover, we get for vector $x^S$ the following matrix equation:

\[
x^S_{(m,1)} = S^F_{(m,n)} \cdot x^F_{(n,1)},
\]

and for the production of the intermediate products given by vector $x^S$, the required vector $y^2$ of raw materials is given by

\[
y^2_{(q,1)} = R^S_{(q,m)} \cdot x^S_{(m,1)} = R^S_{(q,m)} \cdot S^F_{(m,n)} \cdot x^F_{(n,1)}.
\]

Thus, we get the following relationship between the q-vector y of required raw materials and the n-dimensional vector $x^F$:

\[
y_{(q,1)} = y^1_{(q,1)} + y^2_{(q,1)} = \left(R^F + R^S \cdot S^F\right)_{(q,n)} \cdot x^F_{(n,1)},
\]

i.e. $R^F + R^S \cdot S^F$ represents a linear mapping from the n-space $\mathbb{R}^n_+$ into the q-space $\mathbb{R}^q_+$. This linear mapping can be described in the following way:

\[
x^F \in \mathbb{R}^n_+ \;\longmapsto\; \left(R^F + R^S \cdot S^F\right)(x^F) = R^F \cdot x^F + R^S \cdot S^F \cdot x^F = y \in \mathbb{R}^q_+,
\]

i.e. by this linear mapping a feasible n-dimensional production vector of the final products is mapped into a q-dimensional vector of required raw materials.
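A small numerical sketch of Example 7.12 follows. The data are entirely invented for illustration (q = m = n = 2), and the helper names are our own; only the formula y = (R^F + R^S · S^F) · x^F comes from the text.

```python
# Hypothetical data: 2 raw materials, 2 intermediate products, 2 final products.
RS = [[1, 2],    # r^S_ij: raw material i per unit of intermediate product j
      [0, 1]]
SF = [[2, 0],    # s^F_jk: intermediate product j per unit of final product k
      [1, 3]]
RF = [[1, 1],    # r^F_ik: raw material i used directly per unit of final product k
      [2, 0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

xF = [10, 5]   # production vector of the final products

# Total raw-material demand: y = (R^F + R^S * S^F) * x^F
y = mat_vec(mat_add(RF, mat_mul(RS, SF)), xF)
```

With these numbers the combined matrix R^F + R^S · S^F is [[5, 7], [3, 3]], so producing 10 units of F1 and 5 units of F2 requires y = [85, 45] units of the two raw materials.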
Next, we introduce the inverse mapping of a linear mapping.

THEOREM 7.9 Let A : Rn → Rn be a linear mapping described by a matrix A of order n × n, i.e.

\[
x = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \in \mathbb{R}^n
\;\longmapsto\; Ax = y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} \in \mathbb{R}^n,
\]

and let matrix A be regular. Then there exists a unique inverse mapping A⁻¹ such that

\[
y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} \in \mathbb{R}^n
\;\longmapsto\; A^{-1}y = x = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \in \mathbb{R}^n.
\]

Obviously, the composite mapping A ◦ A⁻¹ = A⁻¹ ◦ A is the identity mapping I.

7.5 THE INVERSE MATRIX

Definition 7.17 Given is a square matrix A. If there exists a matrix A⁻¹ such that

AA⁻¹ = A⁻¹A = I,

then we say that A⁻¹ is an inverse or inverse matrix of A.

We note that the inverse A⁻¹ characterizes the inverse mapping of the linear mapping described by matrix A. The following theorem answers the question: under which condition does the inverse of a matrix A exist?

THEOREM 7.10 Let A be a matrix of order n × n. Then:

(1) If matrix A is regular, then there exists a unique inverse matrix A⁻¹.
(2) If matrix A is singular, then A does not have an inverse.

If the inverse A⁻¹ of matrix A exists, we also say that matrix A is invertible. According to Theorem 7.10, a square matrix A is invertible if and only if |A| ≠ 0.

Solving equations by matrix inversion

Consider the matrix equations AX = B and YA = C, where matrix B of order n × m and matrix C of order m × n are given. The matrices X and Y are assumed to be unknown. From the above equations, it follows that matrix X is of order n × m and matrix Y is of order m × n. For |A| ≠ 0, the inverse of matrix A exists and we get

AX = B ⟺ X = A⁻¹B;    YA = C ⟺ Y = CA⁻¹.

The equations on the right-hand side are obtained by multiplying equation AX = B from the left by A⁻¹ in the former case, and equation YA = C from the right by A⁻¹ in the latter case. Remember that matrix multiplication is not commutative.
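The two matrix equations can be solved numerically as follows. This is our own sketch, assuming NumPy is available; the data matrices are invented. Note that `numpy.linalg.solve` computes A⁻¹B without forming the inverse explicitly, which is the numerically preferred route.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])   # |A| = 1, so A is regular
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# AX = B  =>  X = A^{-1} B
X = np.linalg.solve(A, B)

# YA = B  <=>  A^T Y^T = B^T, so Y = B A^{-1} via a transposed solve
Y = np.linalg.solve(A.T, B.T).T

assert np.allclose(A @ X, B)
assert np.allclose(Y @ A, B)
assert not np.allclose(X, Y)   # multiplying from the left and from the right differ
```

The last assertion illustrates the closing remark: since matrix multiplication is not commutative, A⁻¹B and BA⁻¹ are in general different matrices.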
Example 7.13 Let the matrix equation

4X = X(2B − A) + 3(A + X)

be given, where A, B and X are n × n matrices. We solve the above equation for matrix X and obtain:

\[
\begin{aligned}
4X &= 2XB - XA + 3A + 3X\\
X - 2XB + XA &= 3A\\
X(I - 2B + A) &= 3A\\
X &= 3A(I - 2B + A)^{-1}.
\end{aligned}
\]

In the second to last step, we have factored out matrix X to the left, using X = XI. Thus, if the inverse of matrix I − 2B + A exists, matrix X is uniquely determined.

The following theorem presents a first possibility of computing the inverse of a matrix.

THEOREM 7.11 Let A be a regular matrix of order n × n. Then the inverse matrix A⁻¹ is given by

\[
A^{-1} = \frac{1}{|A|}\left((-1)^{i+j}\cdot|A_{ij}|\right)^T
= \frac{1}{|A|}
\begin{pmatrix}
+|A_{11}| & -|A_{21}| & +\cdots & \pm|A_{n1}|\\
-|A_{12}| & +|A_{22}| & -\cdots & \mp|A_{n2}|\\
\vdots & \vdots & & \vdots\\
\pm|A_{1n}| & \mp|A_{2n}| & \pm\cdots & +|A_{nn}|
\end{pmatrix}.
\]

The matrix $\left((-1)^{i+j}\cdot|A_{ij}|\right)^T$ is the transpose of the matrix of the cofactors; it is called the adjoint of matrix A and denoted by adj(A). To determine the inverse of a matrix A of order n × n, the evaluation of one determinant of order n × n and of n² minors of order n − 1 is required. Thus, with increasing n, the application of Theorem 7.11 becomes rather time-consuming.

Matrices and determinants 275

Example 7.14 We consider the matrix

\[
A = \begin{pmatrix} 1 & 2 & -1\\ 2 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix},
\]

and we want to determine the inverse A⁻¹ of matrix A. In order to apply Theorem 7.11, we first evaluate the determinant of A:

\[
|A| = \begin{vmatrix} 1 & 2 & -1\\ 2 & 1 & 0\\ -1 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & -1\\ 2 & 1 & 0\\ 0 & 2 & 0 \end{vmatrix}
= -1\cdot\begin{vmatrix} 2 & 1\\ 0 & 2 \end{vmatrix} = -4.
\]

In the above computations, we have first added row 1 to row 3 and then applied cofactor expansion along column 3.
Calculating the minors, we obtain

\[
\begin{aligned}
|A_{11}| &= \begin{vmatrix} 1 & 0\\ 0 & 1 \end{vmatrix} = 1; &
|A_{12}| &= \begin{vmatrix} 2 & 0\\ -1 & 1 \end{vmatrix} = 2; &
|A_{13}| &= \begin{vmatrix} 2 & 1\\ -1 & 0 \end{vmatrix} = 1;\\
|A_{21}| &= \begin{vmatrix} 2 & -1\\ 0 & 1 \end{vmatrix} = 2; &
|A_{22}| &= \begin{vmatrix} 1 & -1\\ -1 & 1 \end{vmatrix} = 0; &
|A_{23}| &= \begin{vmatrix} 1 & 2\\ -1 & 0 \end{vmatrix} = 2;\\
|A_{31}| &= \begin{vmatrix} 2 & -1\\ 1 & 0 \end{vmatrix} = 1; &
|A_{32}| &= \begin{vmatrix} 1 & -1\\ 2 & 0 \end{vmatrix} = 2; &
|A_{33}| &= \begin{vmatrix} 1 & 2\\ 2 & 1 \end{vmatrix} = -3.
\end{aligned}
\]

With the above computations, we get the inverse matrix

\[
A^{-1} = \frac{1}{|A|}
\begin{pmatrix}
|A_{11}| & -|A_{21}| & |A_{31}|\\
-|A_{12}| & |A_{22}| & -|A_{32}|\\
|A_{13}| & -|A_{23}| & |A_{33}|
\end{pmatrix}
= -\frac{1}{4}
\begin{pmatrix}
1 & -2 & 1\\
-2 & 0 & -2\\
1 & -2 & -3
\end{pmatrix}
= \begin{pmatrix}
-\frac{1}{4} & \frac{1}{2} & -\frac{1}{4}\\[4pt]
\frac{1}{2} & 0 & \frac{1}{2}\\[4pt]
-\frac{1}{4} & \frac{1}{2} & \frac{3}{4}
\end{pmatrix}.
\]

Example 7.15 Consider the matrix equation AX = X − B with

\[
A = \begin{pmatrix} 3 & a\\ 10 & 6 \end{pmatrix}
\qquad\text{and}\qquad
B = \begin{pmatrix} 1 & 0\\ -2 & 1 \end{pmatrix},
\]

where a ∈ R. We want to determine X provided that this matrix is uniquely determined. Replacing X by IX, where I is the identity matrix, and solving the above matrix equation for X, we obtain first

(I − A)X = B

and then

X = (I − A)⁻¹B

with

\[
I - A = \begin{pmatrix} -2 & -a\\ -10 & -5 \end{pmatrix}.
\]

To check whether the inverse of matrix I − A exists, we determine

\[
|I - A| = \begin{vmatrix} -2 & -a\\ -10 & -5 \end{vmatrix} = 10 - 10a = 10(1-a).
\]

For a ≠ 1, we have |I − A| ≠ 0, and thus the inverse of I − A exists and matrix X is uniquely determined. On the contrary, for a = 1, the inverse of I − A does not exist, and thus matrix X is not uniquely determined. We continue and obtain for a ≠ 1

\[
(I - A)^{-1} = \frac{1}{10(1-a)} \begin{pmatrix} -5 & a\\ 10 & -2 \end{pmatrix}.
\]

By multiplying matrices (I − A)⁻¹ and B, we finally obtain
\[
X = \frac{1}{10(1-a)} \begin{pmatrix} -5-2a & a\\ 14 & -2 \end{pmatrix}
= \begin{pmatrix}
-\dfrac{5+2a}{10(1-a)} & \dfrac{a}{10(1-a)}\\[8pt]
\dfrac{7}{5(1-a)} & -\dfrac{1}{5(1-a)}
\end{pmatrix}.
\]

We now summarize some rules for operating with the inverses of matrices, assuming that the matrices A and B are of order n × n and that the inverses A⁻¹ and B⁻¹ exist.

Rules for calculations with inverses

(1) (A⁻¹)⁻¹ = A;
(2) (Aᵀ)⁻¹ = (A⁻¹)ᵀ;
(3) (AB)⁻¹ = B⁻¹A⁻¹;
(4) (λA)⁻¹ = (1/λ) · A⁻¹  (λ ∈ R \ {0});
(5) |A⁻¹| = 1/|A|.

We prove the validity of rule (5). Using |I| = 1 and Theorem 7.6, we obtain

1 = |I| = |AA⁻¹| = |A| · |A⁻¹|,

from which rule (5) follows. As a generalization of rule (3), we obtain

(Aⁿ)⁻¹ = (A⁻¹)ⁿ for all n ∈ N.

We have seen that, if the order of the matrix is large, the determination of the inverse of the matrix can be rather time-consuming. In some cases, it is possible to apply an easier approach. The following theorem treats such a case, which occurs (as we discuss later) in several economic applications.

THEOREM 7.12 Let C be a triangular matrix of order n × n with c_ii = 0 for i = 1, 2, ..., n, and let A = I − C, where I is the identity matrix of order n × n. Then the inverse A⁻¹ is given by

A⁻¹ = (I − C)⁻¹ = I + C + C² + · · · + Cⁿ⁻¹.

The advantage of the formula presented in Theorem 7.12 over the formula given in Theorem 7.11 is that the inverse is determined without using determinants. The latter formula uses only matrix multiplication and addition, and the matrices to be considered contain many zero elements.

7.6 AN ECONOMIC APPLICATION: INPUT–OUTPUT MODEL

We finish this chapter with an important application of matrices in economics. Assume that we have a set of n firms, each of them producing one good only. Production of each good j requires an input of aij units of good i per unit of good j produced. (The coefficients aij are also known as input–output coefficients.) Production takes place with fixed techniques (i.e. the values aij do not change). Let x = (x1, x2, ..., xn)ᵀ be the vector giving the total amounts of the goods produced, let matrix A = (aij) of order n × n be the so-called technology or input–output matrix, and let y = (y1, y2, ..., yn)ᵀ be the demand vector for the use of the n goods.
Considering the ith good, aij · xj units are required as input for the production of the xj units of good j, and yi units are required as final customer demand. Therefore, the amount xi of good i has to satisfy the equation

\[
x_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n + y_i, \qquad i = 1, 2, \ldots, n.
\]

Expressing the latter n equations in matrix notation, we get the equation

x = Ax + y,

which can be rewritten as

Ix − Ax = (I − A)x = y,

where I is the identity matrix of order n × n. The above model expresses that vector x, giving the total output of goods produced, is equal to the sum of vector Ax, describing the internal consumption of the goods, and vector y, representing the customer demand. The model is referred to as an input–output or Leontief model. The customer demand vector y is in general different from the zero vector, and in this case we have an open Leontief model.

The equation (I − A)x = y represents a linear mapping Rn → Rn described by matrix I − A. If the total possible output x is known, we are interested in the possible amount y of goods left for the customers. Conversely, a customer demand vector y can be given, and we ask for the total output vector x required to satisfy the customer demand. In the latter case, we have to consider the inverse mapping

x = (I − A)⁻¹y,

so we have to determine the inverse of matrix I − A to answer this question.

Example 7.16 Consider a numerical example for an open input–output model. Let matrix A and vector y be given as follows:

\[
A = \begin{pmatrix}
\frac{1}{5} & \frac{1}{5} & 0\\[4pt]
\frac{2}{5} & \frac{3}{5} & \frac{1}{5}\\[4pt]
\frac{3}{5} & \frac{2}{5} & \frac{1}{5}
\end{pmatrix}
\qquad\text{and}\qquad
y = \begin{pmatrix} 2\\ 0\\ 1 \end{pmatrix}.
\]

Then we get

\[
I - A = \begin{pmatrix}
\frac{4}{5} & -\frac{1}{5} & 0\\[4pt]
-\frac{2}{5} & \frac{2}{5} & -\frac{1}{5}\\[4pt]
-\frac{3}{5} & -\frac{2}{5} & \frac{4}{5}
\end{pmatrix}
= \frac{1}{5}\begin{pmatrix} 4 & -1 & 0\\ -2 & 2 & -1\\ -3 & -2 & 4 \end{pmatrix}.
\]
Setting

\[
B = \begin{pmatrix} 4 & -1 & 0\\ -2 & 2 & -1\\ -3 & -2 & 4 \end{pmatrix},
\]

we have I − A = B/5. Instead of inverting matrix I − A, we invert matrix B, which has only integer entries, and finally take into account that the matrix equation (I − A)⁻¹ = 5 · B⁻¹ holds. First, we obtain |B| = 13, and thus the inverse of matrix B exists. Applying Theorem 7.11, we get

\[
B^{-1} = \frac{1}{13}\begin{pmatrix} 6 & 4 & 1\\ 11 & 16 & 4\\ 10 & 11 & 6 \end{pmatrix}.
\]

Then we get

\[
(I - A)^{-1} = 5B^{-1} = \frac{5}{13}\begin{pmatrix} 6 & 4 & 1\\ 11 & 16 & 4\\ 10 & 11 & 6 \end{pmatrix}.
\]

Finally, we obtain vector x as follows:

\[
x = \frac{5}{13}\begin{pmatrix} 6 & 4 & 1\\ 11 & 16 & 4\\ 10 & 11 & 6 \end{pmatrix}
\begin{pmatrix} 2\\ 0\\ 1 \end{pmatrix}
= \frac{5}{13}\begin{pmatrix} 13\\ 26\\ 26 \end{pmatrix}
= \begin{pmatrix} 5\\ 10\\ 10 \end{pmatrix}.
\]

Figure 7.2 Relationships between raw materials and products in Example 7.17.

Next, we consider a second example of this type of problem.

Example 7.17 A firm produces by means of four raw materials R1, R2, R3 and R4 five products P1, P2, P3, P4 and P5, where some of these products are also used as intermediate products. The relationships are given in the graph presented in Figure 7.2. The numbers beside the arrows describe how many units of raw material Rk and product Pi, respectively, are necessary for one unit of Pj, j = 1, 2, ..., 5. Vector x = (x1, x2, x3, x4, x5)ᵀ describes the produced units (total output) of the products Pi, and y = (y1, y2, y3, y4, y5)ᵀ denotes the final demand (export) for the output of the products Pi.

(1) We first determine a relationship between vectors x and y. Let the technology matrix

\[
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 2\\
0 & 0 & 0 & 2 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

be given. Then x = Ax + y or, correspondingly, (I − A)x = y. In detail, we have

\[
\begin{pmatrix}
1 & -1 & 0 & 0 & -1\\
0 & 1 & 0 & -1 & -2\\
0 & 0 & 1 & -2 & 0\\
0 & 0 & 0 & 1 & -1\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix}
= \begin{pmatrix} y_1\\ y_2\\ y_3\\ y_4\\ y_5 \end{pmatrix}.
\]

(2) We now calculate the final demand when the total output is given by x = (220, 110, 120, 40, 20)ᵀ.
We obtain

\[
y = (I - A)x =
\begin{pmatrix}
1 & -1 & 0 & 0 & -1\\
0 & 1 & 0 & -1 & -2\\
0 & 0 & 1 & -2 & 0\\
0 & 0 & 0 & 1 & -1\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 220\\ 110\\ 120\\ 40\\ 20 \end{pmatrix}
= \begin{pmatrix} 90\\ 30\\ 40\\ 20\\ 20 \end{pmatrix}.
\]

(3) Let y = (60, 30, 40, 10, 20)ᵀ be the given final demand vector. We determine the production vector x and the required units of raw materials for this case. To this end, we need the inverse of matrix I − A. Since A is an upper triangular matrix with all diagonal elements equal to zero, we can apply Theorem 7.12 for determining (I − A)⁻¹, and we obtain

(I − A)⁻¹ = I + A + A² + A³ + A⁴.

Using

\[
A^2 = \begin{pmatrix}
0 & 0 & 0 & 1 & 2\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 2\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix},
\qquad
A^3 = \begin{pmatrix}
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix},
\]

we get that A⁴ is the zero matrix and therefore

\[
(I - A)^{-1} = \begin{pmatrix}
1 & 1 & 0 & 1 & 4\\
0 & 1 & 0 & 1 & 3\\
0 & 0 & 1 & 2 & 2\\
0 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]

Then we get

\[
x = (I - A)^{-1}y = \begin{pmatrix}
1 & 1 & 0 & 1 & 4\\
0 & 1 & 0 & 1 & 3\\
0 & 0 & 1 & 2 & 2\\
0 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 60\\ 30\\ 40\\ 10\\ 20 \end{pmatrix}
= \begin{pmatrix} 180\\ 100\\ 100\\ 30\\ 20 \end{pmatrix}.
\]

For the required vector of raw materials r = (r1, r2, r3, r4)ᵀ, where ri denotes the quantity of required units of raw material Ri, we obtain from Figure 7.2:

\[
\begin{aligned}
r_1 &= 4x_1\\
r_2 &= 2x_1 + 3x_5\\
r_3 &= 2x_2 + x_3 + 3x_4\\
r_4 &= 2x_3,
\end{aligned}
\qquad\text{i.e.}\qquad
r = \begin{pmatrix} r_1\\ r_2\\ r_3\\ r_4 \end{pmatrix}
= \begin{pmatrix}
4 & 0 & 0 & 0 & 0\\
2 & 0 & 0 & 0 & 3\\
0 & 2 & 1 & 3 & 0\\
0 & 0 & 2 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix}.
\]

For the determined vector x = (180, 100, 100, 30, 20)ᵀ, we get the following vector r:

\[
r = \begin{pmatrix} r_1\\ r_2\\ r_3\\ r_4 \end{pmatrix} = \begin{pmatrix} 720\\ 420\\ 390\\ 200 \end{pmatrix}.
\]

EXERCISES

7.1 Given are the matrices

\[
A = \begin{pmatrix} 3 & 4 & 1\\ 1 & 2 & -2 \end{pmatrix};\quad
B = \begin{pmatrix} 2 & 1\\ 1 & 2\\ -1 & 0 \end{pmatrix};\quad
C = \begin{pmatrix} 1 & 0 & -1\\ -1 & 1 & 1 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} 3 & 1 & 0\\ 4 & 2 & 2 \end{pmatrix}.
\]

(a) Find the transposes. Check whether some of the matrices are equal.
(b) Calculate A + D, A − D, Aᵀ − B and C − D.
(c) Find A + 3(Bᵀ − 2D).

7.2 Find a symmetric and an antisymmetric matrix so that their sum is equal to

\[
A = \begin{pmatrix} 2 & -1 & 0\\ 1 & 1 & -1\\ 3 & 0 & -2 \end{pmatrix}.
\]
7.3 Calculate all defined products of the matrices A and B:

(a) A = \begin{pmatrix} 4 & 2\\ 3 & 2 \end{pmatrix};  B = \begin{pmatrix} 3 & 5 & -1\\ 7 & 6 & 2 \end{pmatrix};

(b) A = \begin{pmatrix} 4 & 3 & 5 & 3\\ 2 & 5 & 0 & 1 \end{pmatrix};  B = \begin{pmatrix} 1 & 2 & 6\\ 4 & 2 & 3\\ 4 & 5 & 2\\ 3 & 4 & 5 \end{pmatrix};

(c) A = \begin{pmatrix} 2 & 3 & 4 & 5 \end{pmatrix};  B = \begin{pmatrix} -3\\ 6\\ -3\\ 2 \end{pmatrix};

(d) A = \begin{pmatrix} 2 & -1 & 3\\ -4 & 2 & -6 \end{pmatrix};  B = \begin{pmatrix} 2 & 3\\ 1 & 0\\ -1 & -2 \end{pmatrix};

(e) A = \begin{pmatrix} 2 & 3 & 1\\ 1 & 5 & 2 \end{pmatrix};  B = \begin{pmatrix} x\\ y\\ z \end{pmatrix}.

7.4 Use the matrices given in Exercise 7.3 (d) and verify the equalities

AᵀBᵀ = (BA)ᵀ and BᵀAᵀ = (AB)ᵀ.

7.5 Given are the matrices

\[
A = \begin{pmatrix} 1 & 0\\ 7 & -4\\ 5 & 3 \end{pmatrix};\quad
B = \begin{pmatrix} -1\\ 2\\ 0\\ 7 \end{pmatrix}
\quad\text{and}\quad
C = \begin{pmatrix} -2 & 0 & 1 & 0\\ 0 & 3 & 0 & 6 \end{pmatrix}.
\]

(a) Find the dimension of a product of all three matrices if possible.
(b) Test the associative law of multiplication with the given matrices.

7.6 Calculate all powers of the following matrices:

(a) A = \begin{pmatrix} 0 & 2 & 8 & 1\\ 0 & 0 & 7 & 3\\ 0 & 0 & 0 & 5\\ 0 & 0 & 0 & 0 \end{pmatrix};  (b) B = \begin{pmatrix} \cos\alpha & \sin\alpha\\ \sin\alpha & -\cos\alpha \end{pmatrix}.

7.7 A firm produces by means of two raw materials R1 and R2 three intermediate products S1, S2 and S3, and with these intermediate products two final products F1 and F2. The numbers of units of R1 and R2 necessary for the production of 1 unit of Sj, j ∈ {1, 2, 3}, and the numbers of units of S1, S2 and S3 necessary to produce 1 unit of F1 and F2, are given in the following tables:

        S1   S2   S3                F1   F2
  R1     2    3    5          S1     6    0
  R2     5    4    1          S2     1    4
                              S3     3    2

Solve the following problems by means of matrix operations.

(a) How many units of the raw materials are required when 1,000 units of F1 and 2,000 units of F2 have to be produced?
(b) The costs of one unit of raw material are 3 EUR for R1 and 5 EUR for R2. Calculate the costs of the intermediate and final products.

7.8 Given is the matrix

\[
A = \begin{pmatrix} 2 & 3 & 0\\ -1 & 2 & 4\\ 0 & 5 & 1 \end{pmatrix}.
\]

(a) Find the submatrices A12 and A22.
(b) Calculate the minors |A12| and |A22|.
(c) Calculate the cofactors of the elements a11, a21 and a31 of matrix A.
(d) Evaluate the determinant of matrix A.
7.9 Evaluate the following determinants:

(a) \begin{vmatrix} 2 & 1 & 6\\ -1 & 0 & 3\\ 3 & 2 & 9 \end{vmatrix};
(b) \begin{vmatrix} 1 & 2 & 0 & 0\\ 2 & 1 & 2 & 0\\ 4 & 3 & 2 & 1\\ 0 & 2 & 1 & 2 \end{vmatrix};
(c) \begin{vmatrix} 1 & 0 & 1 & 2\\ 1 & 2 & 3 & 4\\ 2 & 1 & 4 & 0\\ 0 & 0 & 2 & 1 \end{vmatrix};
(d) \begin{vmatrix} 2 & 7 & 4 & 1\\ 2 & -4 & -8 & -6\\ 7 & 1 & 5 & 0\\ 2 & 0 & 0 & 0 \end{vmatrix};
(e) \begin{vmatrix} -1 & 2 & 4 & 3\\ 3 & 1 & 4 & 0\\ 5 & 1 & 0 & 0\\ 1 & 5 & 0 & 1 \end{vmatrix};
(f) \begin{vmatrix} 3 & 3 & 3 & \cdots & 3 & 3\\ 3 & 0 & 3 & \cdots & 3 & 3\\ 3 & 3 & 0 & \cdots & 3 & 3\\ \vdots & & & \ddots & & \vdots\\ 3 & 3 & 3 & \cdots & 3 & 0 \end{vmatrix}_{(n,n)}.

7.10 Find the solutions x of the following equations:

(a) \begin{vmatrix} x & 1 & 2\\ -1 & x & 2\\ 2 & -1 & x \end{vmatrix} = 27;  (b) \begin{vmatrix} 3 & x & -1\\ 4 & x & -2\\ 2 & -1 & 2 \end{vmatrix} = 2.

7.11 Find the solution of the following system of equations by Cramer's rule:

\[
\begin{aligned}
2x_1 + 4x_2 + 3x_3 &= 1\\
3x_1 - 6x_2 - 2x_3 &= -2\\
-5x_1 + 8x_2 + 2x_3 &= 4.
\end{aligned}
\]

7.12 Let A : R3 → R3 be a linear mapping described by the matrix

\[
A = \begin{pmatrix} 3 & 1 & 0\\ -1 & 2 & 4\\ 4 & 1 & 5 \end{pmatrix}.
\]

Find the kernel of this mapping.

7.13 Given are three linear mappings described by the following systems of equations:

u1 = v1 − v2 + v3        v1 = −w1 + w3           w1 = x1 − x2 − x3
u2 = 2v1 − v2 − v3       v2 = w1 + 2w2 − w3      w2 = −x1 − 2x2 + 3x3
u3 = −v1 + v2 + 2v3      v3 = w2 − 2w3           w3 = 2x1 + x3

Find the composite mapping x ∈ R3 ↦ u ∈ R3.

7.14 Find the inverse of each of the following matrices:

(a) A = \begin{pmatrix} 1 & 0 & 3\\ 4 & 1 & 2\\ 0 & 1 & 1 \end{pmatrix};
(b) B = \begin{pmatrix} 2 & -3 & 1\\ 3 & 4 & -2\\ 5 & 1 & -1 \end{pmatrix};
(c) C = \begin{pmatrix} 1 & 3 & -2\\ 0 & 2 & 4\\ 0 & 0 & -1 \end{pmatrix};
(d) D = \begin{pmatrix} 1 & 0 & -1 & 2\\ 2 & -1 & -2 & 3\\ -1 & 2 & 2 & -4\\ 0 & 1 & 2 & -5 \end{pmatrix}.

7.15 Let

\[
A = \begin{pmatrix}
1 & 2 & -1 & -3 & 4\\
0 & 1 & 0 & 2 & 0\\
0 & 0 & 1 & 3 & -1\\
0 & 0 & 0 & 1 & 2\\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]

Find the inverse matrix by means of the equality A = I − C.

7.16 Given are the matrices

\[
A = \begin{pmatrix} -2 & 5\\ 1 & -3 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 & 4\\ -2 & 9 \end{pmatrix}.
\]

Find (AB)⁻¹ and B⁻¹A⁻¹.
7.17 Given are the following matrix equations:

(a) (XA)ᵀ = B;  (b) XA = B − 2X;  (c) AXB = C;
(d) A(XB)⁻¹ = C;  (e) CᵀXA + (XᵀC)ᵀ = I − 3CᵀX.

Find matrix X.

7.18 Given is an open input–output model (Leontief model) with

\[
A = \begin{pmatrix}
0 & 0.2 & 0.1 & 0.3\\
0 & 0 & 0.2 & 0.5\\
0 & 0 & 0 & 0\\
0 & 0.4 & 0 & 0
\end{pmatrix}.
\]

Let x be the total vector of goods produced and y the final demand vector.

(a) Explain the economic meaning of the elements of A.
(b) Find the linear mapping which maps all the vectors x into the set of all final demand vectors y.
(c) Is a vector

\[
x = \begin{pmatrix} 100\\ 200\\ 200\\ 400 \end{pmatrix}
\]

of goods produced possible for some final demand vector y?
(d) Find the inverse mapping of that obtained in (b) and interpret it economically.

7.19 A firm produces by means of three factors R1, R2 and R3 five products P1, P2, ..., P5, where some of these products are also used as intermediate products. The relationships are given in the graph presented in Figure 7.3. The numbers beside the arrows describe how many units of Ri and Pi, respectively, are necessary for one unit of Pj. Let pi denote the produced units (output) of Pi and qi denote the final demand for the output of Pi, i ∈ {1, 2, ..., 5}.

Figure 7.3 Relationships between raw materials and products in Exercise 7.19.

(a) Find a linear mapping p ∈ R5₊ ↦ q ∈ R5₊.
(b) Find the inverse mapping.
(c) Let rᵀ = (r1, r2, r3) be the vector which contains the required units of the factors R1, R2, R3. Find a linear mapping q ↦ r. Calculate r when q = (50, 40, 30, 20, 10)ᵀ.

8 Linear equations and inequalities

Many problems in economics can be modelled as a system of linear equations or a system of linear inequalities. In this chapter, we consider some basic properties of such systems and discuss general solution procedures.

8.1 SYSTEMS OF LINEAR EQUATIONS

8.1.1 Preliminaries

At several points in previous chapters we have been confronted with systems of linear equations.
For instance, deciding whether a set of given vectors is linearly dependent or linearly independent can be answered via the solution of such a system. The following example of determining feasible production programmes also leads to a system of linear equations.

Example 8.1 Assume that a firm uses three raw materials R1, R2 and R3 for the production of four goods G1, G2, G3 and G4. The following amounts of raw materials are available: 120 units of R1, 150 units of R2 and 180 units of R3. Table 8.1 gives the raw material requirements per unit of each good. Denote by xi the quantity of good Gi, i ∈ {1, 2, 3, 4}. We are interested in all possible production programmes which fully use the available amounts of the raw materials.

Table 8.1 Raw material requirements for the goods Gi, i ∈ {1, 2, 3, 4}

  Raw material    G1   G2   G3   G4
  R1               1    2    1    3
  R2               2    0    3    1
  R3               1    4    2    4

Considering raw material R1, we get the following equation:

1x1 + 2x2 + 1x3 + 3x4 = 120.

Here 1x1 is the amount of raw material R1 necessary for the production of good G1, 2x2 the amount of R1 for good G2, 1x3 the amount of R1 for good G3, and 3x4 the amount of R1 required for the production of good G4. Since all 120 units of raw material R1 should be used, we have an equation. Similarly, we obtain an equation for the consumption of each of the other two raw materials. Thus, we get the following system of three linear equations with four variables:

\[
\begin{aligned}
x_1 + 2x_2 + x_3 + 3x_4 &= 120\\
2x_1 \qquad\;\; {}+ 3x_3 + x_4 &= 150\\
x_1 + 4x_2 + 2x_3 + 4x_4 &= 180.
\end{aligned}
\]

Moreover, we are interested only in solutions for which all values xi, i ∈ {1, 2, 3, 4}, are non-negative. Considering e.g. the production programme x1 = 40, x2 = 15, x3 = 20 and x4 = 10, we can easily check that all equations are satisfied, i.e. this production programme is feasible, and there exists at least one solution of this system of linear equations.
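The feasibility check at the end of Example 8.1 can be replayed directly. The snippet below is our own illustration; `is_feasible` is an invented helper name.

```python
# Coefficient matrix and right-hand side of Example 8.1
A = [[1, 2, 1, 3],
     [2, 0, 3, 1],
     [1, 4, 2, 4]]
b = [120, 150, 180]

def is_feasible(x):
    """A production programme must satisfy Ax = b and be componentwise non-negative."""
    consumed = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
    return consumed == b and all(xj >= 0 for xj in x)

assert is_feasible([40, 15, 20, 10])      # the programme given in the text
assert not is_feasible([120, 0, 0, 0])    # uses up R1 exactly but violates R2 and R3
```

Since the system has four variables and only three equations, many other feasible programmes exist; describing all of them is exactly the task addressed by the solution theory that follows.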
In order to describe all feasible production programmes, we have to find all solutions satisfying the above equations such that all values xi, i ∈ {1, 2, 3, 4}, are non-negative.

Often it is not desired that all raw materials necessarily be fully used. In the latter case it is required only that for each raw material Ri the available amount is not exceeded. Then all equality signs in the above equations have to be replaced by an inequality sign of the form ≤, and we obtain a system of linear inequalities, which is discussed in Chapter 8.2.

In the following, we discuss general methods for solving systems of linear equations. We answer the questions of whether a system has a solution, whether an existing solution is uniquely determined, and how the set of all solutions can be determined in the general case.

Definition 8.1 The system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{8.1}
\]

is called a system of linear equations, where x1, x2, ..., xn are the unknowns or variables, a11, a12, ..., amn are the coefficients and b1, b2, ..., bm are called the right-hand sides.

As an abbreviation, system (8.1) can also be written in matrix representation:

Ax = b,

where the left-hand side corresponds to the multiplication of a matrix of order m × n by a vector (a matrix of order n × 1), resulting in the m-vector b, i.e.

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
= \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.
\]

Analogously, we can write a system of linear equations in vector representation as follows:

\[
\sum_{j=1}^{n} x_j a_j = b,
\]

where

\[
a_j = \begin{pmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{pmatrix},\qquad j = 1, 2, \ldots, n,
\]

are the column vectors of matrix A. The above left-hand side represents a linear combination of the column vectors of matrix A with the values of the variables as scalars, i.e.

\[
x_1\begin{pmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{pmatrix}
+ x_2\begin{pmatrix} a_{12}\\ a_{22}\\ \vdots\\ a_{m2} \end{pmatrix}
+ \cdots
+ x_n\begin{pmatrix} a_{1n}\\ a_{2n}\\ \vdots\\ a_{mn} \end{pmatrix}
= \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.
\]
Next, we introduce some basic notions.

Definition 8.2 If bi = 0 for i = 1, 2, ..., m in system (8.1), then this system is called homogeneous. If bk ≠ 0 for at least one k ∈ {1, 2, ..., m}, then system (8.1) is called non-homogeneous.

Definition 8.3 A vector x = (x1, x2, ..., xn)ᵀ which satisfies Ax = b is called a solution of system (8.1). The set S = {x ∈ Rn | Ax = b} is called the set of solutions or the general solution of system (8.1).

8.1.2 Existence and uniqueness of a solution

Next, we investigate in which cases system (8.1) has a solution. To this end, we introduce the notion of the rank of a matrix. For this purpose, the following property is useful.

THEOREM 8.1 Let A be a matrix of order m × n. Then the maximum number of linearly independent column vectors of A coincides with the maximum number of linearly independent row vectors of A.

Definition 8.4 Let A be a matrix of order m × n. The rank of matrix A, written r(A), is the maximum number p of linearly independent column (or, according to Theorem 8.1, equivalently row) vectors in A. If A is a zero matrix, we set r(A) = 0.

As an obvious consequence of Definition 8.4, we obtain r(A) = p ≤ min{m, n}. The following theorem gives a first criterion for determining the rank of a matrix A.

THEOREM 8.2 The rank r(A) of matrix A is equal to the order of the largest minor of A that is different from zero.

We recall that a minor of a matrix A was defined as the determinant of a square submatrix of A. The above criterion can be applied after transforming the determinant of matrix A in such a way that the order of the largest minor can be easily obtained. Otherwise, it can be rather time-consuming to determine the rank of a matrix by applying Theorem 8.2. Consider the following two examples.
Example 8.2 Let

\[
A = \begin{pmatrix} 1 & 2 & 0\\ 4 & 6 & 2\\ 3 & 2 & 4 \end{pmatrix}.
\]

We obtain |A| = 0, which means that matrix A cannot have rank three. However, for the minor obtained from matrix A by deleting the last row and column, we get

\[
|A_{33}| = \begin{vmatrix} 1 & 2\\ 4 & 6 \end{vmatrix} = -2,
\]

i.e. there is a minor of order two which is different from zero, and thus matrix A has rank two.

Example 8.3 Let us consider the matrix

\[
A = \begin{pmatrix} 1-x & -1 & 1\\ 1 & 1-x & 3\\ 1 & 0 & 1 \end{pmatrix}
\]

with x ∈ R. We determine the rank of matrix A in dependence on the value of x ∈ R. Expanding |A| along the third row, we get

\[
|A| = 1\cdot\begin{vmatrix} -1 & 1\\ 1-x & 3 \end{vmatrix}
+ 1\cdot\begin{vmatrix} 1-x & -1\\ 1 & 1-x \end{vmatrix}
= \bigl(-3-(1-x)\bigr) + \bigl((1-x)(1-x)+1\bigr)
= (-4+x) + (x^2-2x+2) = x^2 - x - 2.
\]

We determine the roots of the equation x² − x − 2 = 0 and obtain x1 = −1 and x2 = 2. Thus, we get |A| ≠ 0 for x ≠ −1 and x ≠ 2, i.e. due to Theorem 8.2, we have r(A) = 3 for x ∈ R \ {−1, 2}. For x ∈ {−1, 2}, we obtain for the minor formed by rows 1 and 3 as well as columns 2 and 3

\[
|A_{21}| = \begin{vmatrix} -1 & 1\\ 0 & 1 \end{vmatrix} = -1 \neq 0.
\]

Thus, we obtain r(A) = 2 for the case x = −1 and for the case x = 2, because the order of the largest minor with a value different from zero is two.

Definition 8.5 We define the m × (n + 1) augmented matrix

\[
A_b = (A \mid b) = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix}
\]

as the coefficient matrix A expanded by an additional column containing vector b of the right-hand sides.

Obviously, we have r(A) ≤ r(Ab), since matrix Ab contains an additional column vector in comparison with matrix A. Moreover, since the augmented matrix differs from the coefficient matrix A by exactly one column, only two cases are possible: either r(Ab) = r(A) or r(Ab) = r(A) + 1.

Definition 8.6 If system (8.1) has at least one solution, it is said to be consistent.
If this system has no solution, it is said to be inconsistent.

Next, we present a necessary and sufficient condition for a system of linear equations to have at least one solution.

THEOREM 8.3 System (8.1) is consistent if and only if the rank of the coefficient matrix A is equal to the rank of the augmented matrix Ab = (A | b), i.e.

system Ax = b is consistent ⟺ r(A) = r(Ab).

Since for a homogeneous system of equations the augmented matrix contains matrix A plus an additional zero (column) vector, the number of linearly independent (column or row) vectors of the augmented matrix is always equal to the number of linearly independent vectors of matrix A. This leads to the following corollary.

COROLLARY 8.1 A homogeneous system Ax = 0 is always consistent.

Indeed, for the system Ax = 0 we have r(A) = r(Ab). We note that a homogeneous system always has at least the so-called trivial solution xᵀ = (x1, x2, ..., xn) = (0, 0, ..., 0).

Next, we deal with the following question: if system (8.1) is consistent, when is the solution uniquely determined? An answer is given by the following theorem.

THEOREM 8.4 Consider the system Ax = b of linear equations, where A is a matrix of order m × n, and let this system be consistent. Then:

(1) If r(A) = r(Ab) = n, then the solution x = (x1, x2, ..., xn)ᵀ is uniquely determined.
(2) If r(A) = r(Ab) = p < n, then there exist infinitely many solutions. In this case, the set of solutions forms an (n − p)-dimensional vector space.

In case (2) of Theorem 8.4, we say that the set of solutions has dimension n − p. Let us consider part (2) of Theorem 8.4 in a bit more detail. In this case, we can select n − p variables that can be chosen freely. Once their values are fixed, the remaining variables are uniquely determined. We denote the n − p arbitrarily chosen variables as free variables, and we say that the system of linear equations has n − p degrees of freedom.
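The rank criterion of Theorem 8.3 lends itself to a direct numerical check. The sketch below is our own illustration and assumes NumPy is available; it compares r(A) with r(A | b) for the matrix of Example 8.2 under two different right-hand sides.

```python
import numpy as np

def ranks(A, b):
    """Return (r(A), r(A|b)) for the system Ax = b (cf. Theorem 8.3)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return (np.linalg.matrix_rank(A),
            np.linalg.matrix_rank(np.hstack([A, b])))

A = [[1, 2, 0],
     [4, 6, 2],
     [3, 2, 4]]   # the matrix of Example 8.2, r(A) = 2

rA, rAb = ranks(A, [0, 0, 0])   # homogeneous system: always consistent
assert (rA, rAb) == (2, 2)      # r(A) = r(A|b) = 2 < n = 3: infinitely many solutions

rA, rAb = ranks(A, [0, 0, 1])   # this b lies outside the column space of A
assert (rA, rAb) == (2, 3)      # r(A) < r(A|b): the system is inconsistent
```

In the consistent case the comparison with n = 3 also yields the degrees of freedom from Theorem 8.4, here n − p = 1.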
8.1.3 Elementary transformations; solution procedures

Solution procedures for systems of linear equations transform the given system into a 'system with easier structure'. The following theorem characterizes some transformations of a given system of linear equations under which the set of solutions does not change.

THEOREM 8.5 The set of solutions of system (8.1) does not change if one of the following transformations is applied:

(1) An equation is multiplied by a number λ ≠ 0 or divided by a number λ ≠ 0.
(2) Two equations are interchanged.
(3) A multiple of one equation is added to another equation.

Operations (1) to (3) are called elementary or equivalent transformations. Elementary transformations do not change the rank r(A) of a matrix A either (see also the rules for evaluating determinants given in Chapter 7.3).

Finally, we introduce a special form of a system of linear equations and a solution as follows.

Definition 8.7 A system Ax = b of p = r(A) linear equations, where each equation contains a variable that occurs only in this equation and has the coefficient +1, is called a system of linear equations in canonical form. These eliminated variables are called basic variables (bv), while the remaining variables are called non-basic variables (nbv).

Hence the number of basic variables of a system of linear equations in canonical form is equal to the rank of matrix A. As a consequence of Definition 8.7, if a system of linear equations Ax = b is given in canonical form, the coefficient matrix A always contains an identity submatrix. If r(A) = p = n, the identity matrix I is of order n × n, i.e. the system has the form

    I xB = b,

where xB is the vector of the basic variables. (Note that columns might have been interchanged in matrix A to get the identity matrix, which means that the order of the variables in vector xB differs from that in vector x.) If r(A) = p < n, the order of the identity submatrix is p × p.
In the latter case, the system can be written as

    I xB + AN xN = b,

where xB is the p-vector of the basic variables, xN is the (n − p)-vector of the non-basic variables and AN is the submatrix of A formed by the column vectors belonging to the non-basic variables. (Again, column interchanges in matrix A might have been applied.) This canonical form, from which the general solution can easily be derived, is used in one of the solution procedures described in this subsection.

Definition 8.8 A solution x of a system of equations Ax = b in canonical form, in which each non-basic variable has the value zero, is called a basic solution.

Thus, if matrix A is of order p × n with r(A) = p < n, then at least n − p variables are equal to zero in a basic solution of the system Ax = b. The number of possible basic solutions of a given system of linear equations is determined by the number of different possibilities of choosing p basic variables. That is, one has to find among the column vectors of matrix A all possibilities of p linearly independent vectors belonging to the p basic variables. There exist at most

    C(n, p) = n! / [p! (n − p)!]

basic solutions (see Chapter 1.3 on Combinatorics).

One method of solving systems of linear equations has already been discussed in Chapter 7.3, but remember that Cramer's rule is applicable only in special cases. The usual methods of solving systems of linear equations apply the elementary transformations mentioned in Theorem 8.5 to transform the given system into a form from which the solution can easily be obtained. The methods typically used transform the original system either into

(1) a canonical form according to Definition 8.7 (pivoting procedure or Gauss–Jordan elimination) or into
(2) a 'triangular' or echelon form (Gaussian elimination).

It is worth noting that the terminology for both procedures is not used in a standard way in the literature; in particular, Gaussian elimination is often also denoted as pivoting.
The reason is that both procedures are variants of the same strategy: simplify the given system of linear equations in such a way that the solution can be easily obtained from the final form of the system. We now discuss both methods in detail.

Pivoting

First, we discuss the pivoting procedure. The transformation of the original system into a canonical form (possibly including fewer than m equations) is based on the following theorem and the remark given below.

THEOREM 8.6 Let Ax = b be a given system of m linear equations with n variables and r(A) = p < min{m, n}. Then the augmented matrix Ab = (A | b) can be transformed by applying Theorem 8.5 and column interchanges into the form

                 ( 1  0  ...  0   a*1,p+1  ...  a*1n | b*1   )
                 ( 0  1  ...  0   a*2,p+1  ...  a*2n | b*2   )
                 ( .  .       .      .            .  |  .    )
    (A* | b*) =  ( 0  0  ...  1   a*p,p+1  ...  a*pn | b*p   )      (8.2)
                 ( 0  0  ...  0      0     ...   0   | b*p+1 )
                 ( .  .       .      .            .  |  .    )
                 ( 0  0  ...  0      0     ...   0   | 0     )

with b*p+1 = 0 or b*p+1 ≠ 0.

It is easy to see that the matrix A* given in (8.2) (and therefore also the original coefficient matrix A) has rank p. In terms of Theorem 8.2, this means that matrix A* has a minor of order p whose value is different from zero. This can easily be seen by taking the identity submatrix in the upper left part, whose determinant is equal to one. However, there is no minor of a larger order than p whose value is different from zero. (If we add one row and one column, the value of the determinant is equal to zero, since there is one row containing only zero entries.) Notice also that the first p rows in representation (8.2) describe a system of linear equations in canonical form.

Remark In the case when r(A) = p is not smaller than min{m, n}, we can transform matrix Ab into one of the three following forms (A* | b*):

(1) If m < n, then

                 ( 1  0  ...  0   a*1,m+1  ...  a*1n | b*1 )
    (A* | b*) =  ( 0  1  ...  0   a*2,m+1  ...  a*2n | b*2 )      (8.3)
                 ( .  .       .      .            .  |  .  )
                 ( 0  0  ...  1   a*m,m+1  ...  a*mn | b*m )

(2) If m > n, then

                 ( 1  0  ...  0 | b*1   )
                 ( 0  1  ...  0 | b*2   )
                 ( .  .       . |  .    )
    (A* | b*) =  ( 0  0  ...  1 | b*n   )      (8.4)
                 ( 0  0  ...  0 | b*n+1 )
                 ( .  .       . |  .    )
                 ( 0  0  ...  0 | 0     )

with b*n+1 = 0 or b*n+1 ≠ 0.

(3) If m = n, then

                 ( 1  0  ...  0 | b*1 )
    (A* | b*) =  ( 0  1  ...  0 | b*2 )      (8.5)
                 ( .  .       . |  .  )
                 ( 0  0  ...  1 | b*n )

Each of the matrices in (8.2) to (8.5) contains an identity submatrix. The order of this identity matrix is the rank of the originally given coefficient matrix A. In the transformed matrix A*, each column corresponds to some variable. Since column interchanges were allowed, we simply denote the variable belonging to the first column as the first basic variable xB1, the variable belonging to the second column as the second basic variable xB2, and so on. (If no column interchanges were applied, we have the natural numbering x1, x2, ..., xp.) Accordingly, we denote the variables belonging to the columns which do not form the identity submatrix as the non-basic variables xN1, xN2, ..., xN,n−p.

We now discuss how the solutions can be found from the system in canonical form given by the first rows of matrices (8.2) to (8.5) (including the identity matrix). In the cases defined by matrices (8.3) and (8.5), there always exists a solution (r(A) = r(Ab) = m). In particular, in the case of matrix (8.5) we have the unique solution xB1 = b*1, xB2 = b*2, ..., xBn = b*n. (Note that variables might have been interchanged.) In the case of matrix (8.3), we have n − m degrees of freedom, i.e. the n − m variables belonging to columns m + 1, m + 2, ..., n can be chosen arbitrarily. In the case of matrix (8.4), there exists a unique solution xB1 = b*1, xB2 = b*2, ..., xBn = b*n, provided that b*n+1 = 0; otherwise the system has no solution.
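The notion of a basic solution (Definition 8.8) and the bound C(n, p) can be made concrete for a small system by enumerating every choice of p columns and keeping those that are linearly independent. A sketch with NumPy; the system below is hypothetical illustrative data, not one of the examples of this chapter:

```python
import itertools
import numpy as np

# Illustrative system with m = 2 equations, n = 4 variables, r(A) = 2.
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])
m, n = A.shape
p = np.linalg.matrix_rank(A)

basic_solutions = []
for cols in itertools.combinations(range(n), p):
    B = A[:, cols]                      # candidate basis matrix
    if np.linalg.matrix_rank(B) == p:   # columns linearly independent?
        x = np.zeros(n)
        x[list(cols)] = np.linalg.solve(B, b)  # basic variables; non-basics stay 0
        basic_solutions.append(x)

print(len(basic_solutions))
```

For this matrix every pair of columns is linearly independent, so all C(4, 2) = 6 basic solutions exist; in general some choices are skipped.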
In the case of p < min{m, n} considered in Theorem 8.6, the system is consistent if and only if b*p+1 = 0, and the system of the first p equations obtained by the transformation of matrix (A | b) into (A* | b*) is a system in canonical form which can be written as

    xB1 + a*1,p+1 xN1 + ... + a*1n xN,n−p = b*1
    xB2 + a*2,p+1 xN1 + ... + a*2n xN,n−p = b*2
      .         .                 .           .
    xBp + a*p,p+1 xN1 + ... + a*pn xN,n−p = b*p.

Since r(A) = r(A*) = p, there are n − p degrees of freedom, i.e. n − p variables can be chosen arbitrarily. We can rewrite the system in canonical form in terms of the basic variables as follows:

    xB1 = b*1 − a*1,p+1 xN1 − ... − a*1n xN,n−p
    xB2 = b*2 − a*2,p+1 xN1 − ... − a*2n xN,n−p
      .     .         .                 .
    xBp = b*p − a*p,p+1 xN1 − ... − a*pn xN,n−p,

    xNj arbitrary for j = 1, 2, ..., n − p.

We emphasize that, if xB1, xB2, ..., xBp are the basic variables, the column vectors of matrix A belonging to these variables are linearly independent and form a basis of the vector space spanned by the corresponding p column vectors of matrix A. If we choose xNj = 0, j = 1, 2, ..., n − p, we obtain the basic solution

    xB1 = b*1,  xB2 = b*2,  ...,  xBp = b*p.

Next, we discuss how we can transform the augmented matrix Ab = (A | b) into matrix (A* | b*) by elementary transformations in a systematic way from which the canonical form can be established. The corresponding formulas are those which have been presented in Chapter 6.4 when considering the replacement of one vector in a basis of a vector space by another vector. Assume that variable xl should become the basic variable in the kth equation of the system. This can be done only if akl ≠ 0. The element akl is called the pivot or pivot element. Accordingly, row k is called the pivot row, and column l is called the pivot column. The transformation formulas distinguish between the pivot row and all remaining rows. They are as follows.
Transformation formulas

(1) Pivot row k:

    a'kj = akj / akl,    b'k = bk / akl    (j = 1, 2, ..., n).

(2) Remaining rows i = 1, 2, ..., m, i ≠ k:

    a'ij = aij − (ail / akl) · akj = aij − a'kj · ail,
    b'i  = bi − (ail / akl) · bk  = bi − b'k · ail    (j = 1, 2, ..., n).

These transformations can be performed by using the following tableaus, where rows 1 to m in the first tableau give the initial system, and rows m + 1 up to 2m in the second tableau give the system of equations after the transformation. The column bv indicates the basic variables in the corresponding rows (i.e. after the pivoting transformation, xl is the basic variable in the kth equation of the system, while in general we do not yet have basic variables in the other equations). The last column describes the operation which has to be done in order to obtain the corresponding row.

    Row    bv   x1     x2     ...  xl    ...  xn     b     Operation
    1      −    a11    a12    ...  a1l   ...  a1n    b1
    2      −    a21    a22    ...  a2l   ...  a2n    b2
    .      .     .      .           .          .     .
    k      −    ak1    ak2    ...  akl   ...  akn    bk
    .      .     .      .           .          .     .
    m      −    am1    am2    ...  aml   ...  amn    bm

    m+1    −    a'11   a'12   ...  0     ...  a'1n   b'1   row 1 − (a1l/akl) row k
    m+2    −    a'21   a'22   ...  0     ...  a'2n   b'2   row 2 − (a2l/akl) row k
    .      .     .      .           .          .     .
    m+k    xl   a'k1   a'k2   ...  1     ...  a'kn   b'k   (1/akl) row k
    .      .     .      .           .          .     .
    2m     −    a'm1   a'm2   ...  0     ...  a'mn   b'm   row m − (aml/akl) row k

Here all elements in the pivot column (except the pivot element) are equal to zero in the new tableau. After the transformation, the basic variable xl in row m + k occurs with coefficient +1 only in equation k of the system, and the transformed equation can now be found in row m + k of the second tableau. To illustrate the computations in all rows except the pivot row, consider the determination of element a'22. We have to consider four elements: the number a22 in the original position, the element a2l of the second row standing in the pivot column, the element ak2 of the pivot row standing in the second column, and the pivot element akl.
These four elements form a rectangle, and therefore the rule for computing the values in all rows except the pivot row in the new scheme is also known as the rectangle rule.

Example 8.4 Let us consider the following system of linear equations:

     x1        + 3x3 − 2x4 = 5
    3x1 +  x2 + 4x3 +  x4 = 5
    4x1 +  x2 + 7x3 +  x4 = 10
    2x1 +  x2 +  x3 + 3x4 = 0.

Applying the pivoting procedure, we get the following sequence of tableaus. (Hereafter, the pivot element is enclosed in square brackets, and the last column describes the operation that yields the corresponding row.)

    Row   bv   x1    x2    x3    x4    b     Operation
    1     −    [1]   0     3     −2    5
    2     −    3     1     4     1     5
    3     −    4     1     7     1     10
    4     −    2     1     1     3     0

    5     x1   1     0     3     −2    5     row 1
    6     −    0     [1]   −5    7     −10   row 2 − 3 row 1
    7     −    0     1     −5    9     −10   row 3 − 4 row 1
    8     −    0     1     −5    7     −10   row 4 − 2 row 1

    9     x1   1     0     3     −2    5     row 5
    10    x2   0     1     −5    7     −10   row 6
    11    −    0     0     0     [2]   0     row 7 − row 6
    12    −    0     0     0     0     0     row 8 − row 6

    13    x1   1     0     3     0     5     row 9 + row 11
    14    x2   0     1     −5    0     −10   row 10 − (7/2) row 11
    15    x4   0     0     0     1     0     (1/2) row 11
    16    −    0     0     0     0     0     row 12

We avoid the interchange of the column of x3 with the column of x4, which would formally produce the structure given in Theorem 8.6. In the first two pivoting steps, we have always chosen an element one as the pivot, which ensures that all the above tableaus contain only integers. In the second tableau (i.e. rows 5 up to 8), we could already delete one row, since rows 6 and 8 are identical (therefore, these row vectors are linearly dependent). From the last tableau we also see that the rank of the coefficient matrix is equal to three (since we have a 3 × 3 identity submatrix, which means that the largest minor with a value different from zero is of order three). Since x1, x2 and x4 are basic variables, we have found that the column vectors belonging to these variables (i.e. the first, second and fourth columns of A) are linearly independent and constitute a basis of the space generated by the column vectors of matrix A.

From rows 13 to 16, we can rewrite the system in terms of the basic variables:

    x1 = 5 − 3x3
    x2 = −10 + 5x3
    x4 = 0       (x3 arbitrary).

Setting now x3 = t with t ∈ R, we get the following set of solutions of the considered system:

    x1 = 5 − 3t;  x2 = −10 + 5t;  x3 = t;  x4 = 0;  t ∈ R.

Since we know that one variable can be chosen arbitrarily, we have selected the non-basic variable x3 as the free variable. From the last tableau we see that we could not choose x4 as the free variable, since it must have the value zero in any solution of the given system of linear equations. In general, we can easily see from the transformed system of linear equations whether one or several variables are uniquely determined, so that they cannot be taken as free variables.

In the latter example, there was (due to n − p = 1) one variable that can be chosen arbitrarily. In this case, we also say that we have a one-parametric set of solutions.

Gaussian elimination

Next, we discuss Gaussian elimination. This procedure is based on the following theorem.

THEOREM 8.7 Let Ax = b be a given system of m linear equations with n variables and r(A) = p < min{m, n}. Then the augmented matrix Ab = (A | b) can be transformed by applying Theorem 8.5 and column interchanges into the form

                 ( a*11  a*12  ...  a*1p   a*1,p+1  ...  a*1n | b*1   )
                 ( 0     a*22  ...  a*2p   a*2,p+1  ...  a*2n | b*2   )
                 ( .      .          .        .           .   |  .    )
    (A* | b*) =  ( 0     0     ...  a*pp   a*p,p+1  ...  a*pn | b*p   )      (8.6)
                 ( 0     0     ...  0      0        ...  0    | b*p+1 )
                 ( .      .          .        .           .   |  .    )
                 ( 0     0     ...  0      0        ...  0    | 0     )

with a*11 · a*22 · ... · a*pp ≠ 0 and b*p+1 = 0 or b*p+1 ≠ 0.

In terms of Theorem 8.2, the transformed matrix A* in Theorem 8.7 (and the original matrix A too) possesses a minor of order p whose value is different from zero.
This is the minor formed by the first p rows and columns in representation (8.6). Since all diagonal elements are different from zero, the value of this determinant is equal to the product of the diagonal elements (see Theorem 7.4). However, matrix A* (and matrix A too) does not have a minor of order p + 1 whose value is different from zero (in each such minor of matrix A*, there would be one row containing only zero entries, and by Theorem 7.7 in Chapter 7.3 this determinant is equal to zero). If p is not smaller than the minimum of m and n, we can transform the augmented matrix Ab similarly to Theorem 8.7, as described in the following remark.

Remark In the case when rank r(A) = p is not smaller than min{m, n}, we can transform the augmented matrix Ab = (A | b) by elementary transformations and column interchanges to get one of the following special cases:

(1) If m < n, then

                 ( a*11  a*12  ...  a*1m   a*1,m+1  ...  a*1n | b*1 )
    (A* | b*) =  ( 0     a*22  ...  a*2m   a*2,m+1  ...  a*2n | b*2 )      (8.7)
                 ( .      .          .        .           .   |  .  )
                 ( 0     0     ...  a*mm   a*m,m+1  ...  a*mn | b*m )

with a*11 · a*22 · ... · a*mm ≠ 0.

(2) If m > n, then

                 ( a*11  a*12  ...  a*1n | b*1   )
                 ( 0     a*22  ...  a*2n | b*2   )
                 ( .      .          .   |  .    )
    (A* | b*) =  ( 0     0     ...  a*nn | b*n   )      (8.8)
                 ( 0     0     ...  0    | b*n+1 )
                 ( .      .          .   |  .    )
                 ( 0     0     ...  0    | 0     )

with a*11 · a*22 · ... · a*nn ≠ 0 and b*n+1 = 0 or b*n+1 ≠ 0.

(3) If m = n, then

                 ( a*11  a*12  ...  a*1n | b*1 )
    (A* | b*) =  ( 0     a*22  ...  a*2n | b*2 )      (8.9)
                 ( .      .          .   |  .  )
                 ( 0     0     ...  a*nn | b*n )

with a*11 · a*22 · ... · a*nn ≠ 0.

Considering the largest order of a minor different from zero, we conclude that in case (1) above matrix A has rank m, in case (2) matrix A has rank n, and in case (3) matrix A has rank m = n. The discussion of the consistency of the system of linear equations and of the selection of the free variables is the same as for the pivoting procedure.
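Looking back at the pivoting procedure, its transformation formulas can be sketched in code. The following minimal NumPy implementation (row and column indices are 0-based) reproduces the pivoting steps of Example 8.4:

```python
import numpy as np

def pivot_step(T, k, l):
    """One pivoting step on the tableau T = (A | b) with pivot element T[k, l].

    The pivot row is divided by the pivot; every other row i is transformed
    by the rectangle rule: T[i, j] -= T[i, l] * T[k, j] / T[k, l].
    """
    T = T.astype(float).copy()
    assert T[k, l] != 0, "pivot element must be non-zero"
    T[k, :] = T[k, :] / T[k, l]       # pivot row
    for i in range(T.shape[0]):       # remaining rows
        if i != k:
            T[i, :] = T[i, :] - T[i, l] * T[k, :]
    return T

# Tableau of Example 8.4: columns x1, x2, x3, x4 | b
T = np.array([[1.0, 0.0, 3.0, -2.0,  5.0],
              [3.0, 1.0, 4.0,  1.0,  5.0],
              [4.0, 1.0, 7.0,  1.0, 10.0],
              [2.0, 1.0, 1.0,  3.0,  0.0]])
T = pivot_step(T, 0, 0)   # x1 becomes basic in equation 1
T = pivot_step(T, 1, 1)   # x2 becomes basic in equation 2
T = pivot_step(T, 2, 3)   # x4 becomes basic in equation 3
print(T)
```

The final array coincides with rows 13 to 16 of the example: x1 = 5 − 3x3, x2 = −10 + 5x3, x4 = 0.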
In all of the matrices (8.6) to (8.9), an upper triangular submatrix with non-zero entries on the diagonal is contained. We next describe the systematic generation of the triangular form given in Theorem 8.7 and the above remark. Assume that a11 ≠ 0 (otherwise we interchange two rows or columns). Then we transform all equations except the first one, where the new kth equation, k ∈ {2, 3, ..., m}, is obtained by multiplying the first equation by −ak1/a11 and adding the resulting equation to the original kth equation. Element a11 is also denoted as the pivot or pivot element. This leads to the following system:

    a11 x1 + a12 x2 + ... + a1n xn = b1
             a'22 x2 + ... + a'2n xn = b'2
               .               .       .
             a'm2 x2 + ... + a'mn xn = b'm,

where

    a'kj = akj − (ak1 / a11) · a1j,   k = 2, 3, ..., m,  j = 2, 3, ..., n,

and the right-hand sides are transformed accordingly (b'k = bk − (ak1 / a11) · b1). By the above transformation, we have obtained a system where variable x1 occurs only in the first equation, i.e. all elements below the pivot element are now equal to zero. Now we apply this procedure to equations 2, 3, ..., m, provided that a'22 ≠ 0 (otherwise we interchange two rows or columns), and so on, until a triangular form according to (8.6) to (8.9) has been obtained.

From the triangular system, we determine the values of the variables by 'back substitution' (i.e. we determine the value of one variable from the last equation, then the value of a second variable from the second to last equation, and so on). It is worth noting that only the first equations belonging to the triangular submatrix are necessary to find the general solution, while the remaining equations are superfluous and can be skipped. We illustrate Gaussian elimination by the following examples.

Example 8.5 Consider the following system of linear equations:

     x1 +  x2 +  x3 = 3
     x1 −  x2 + 2x3 = 2
    4x1 + 6x2 −  x3 = 9.

Applying Gaussian elimination, we obtain the following tableaus.
    Row   x1    x2    x3    b     Operation
    1     1     1     1     3
    2     1     −1    2     2
    3     4     6     −1    9

    4     1     1     1     3     row 1
    5     0     −2    1     −1    row 2 − row 1
    6     0     2     −5    −3    row 3 − 4 row 1

    7     1     1     1     3     row 4
    8     0     −2    1     −1    row 5
    9     0     0     −4    −4    row 6 + row 5

From rows 7 to 9, we see that both the coefficient matrix A and the augmented matrix Ab have rank three. Therefore, the given system is consistent and has a unique solution. Moreover, we get the following triangular system:

    x1 +  x2 +  x3 = 3
        −2x2 +  x3 = −1
              −4x3 = −4.

Applying back substitution, we get from the last equation x3 = 1. Then we obtain from the second equation −2x2 = −1 − x3 = −2, which yields x2 = 1, and finally from the first equation x1 = 3 − x2 − x3 = 3 − 1 − 1, which gives x1 = 1.

Example 8.6 We solve the following system of linear equations:

     x1 +  x2 +  x3 = 2
    3x1 + 2x2 +  x3 = 2
    2x1 + 3x2 + 4x3 = 1.

Applying Gaussian elimination, we obtain the following tableaus.

    Row   x1    x2    x3    b     Operation
    1     1     1     1     2
    2     3     2     1     2
    3     2     3     4     1

    4     1     1     1     2     row 1
    5     0     −1    −2    −4    row 2 − 3 row 1
    6     0     1     2     −3    row 3 − 2 row 1

    7     1     1     1     2     row 4
    8     0     −1    −2    −4    row 5
    9     0     0     0     −7    row 6 + row 5

From row 9 we see that the considered system has no solution, since the equation 0x1 + 0x2 + 0x3 = −7 leads to a contradiction. The rank of the coefficient matrix A is equal to two, but the rank of the augmented matrix Ab is equal to three.

8.1.4 General solution

We now investigate how the set of solutions of a system of linear equations can be written in terms of vectors. The following theorem describes the general solution of a homogeneous system of linear equations.

THEOREM 8.8 Let Ax = 0 be a homogeneous system of m linear equations with n variables and r(A) = p < n. Then there exist, besides the zero vector, n − p further linearly independent solutions x1, x2, ..., xn−p ∈ R^n of the system Ax = 0, and the set of solutions SH can be written as

    SH = { xH ∈ R^n | xH = λ1 x1 + λ2 x2 + ... + λn−p xn−p;  λ1, λ2, ..., λn−p ∈ R }.
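Numerically, a set of n − p linearly independent solutions of Ax = 0 can also be obtained from the singular value decomposition; this is not the hand method used in this chapter, just a machine check of Theorem 8.8. A sketch with NumPy, using the matrix of the next example (Example 8.7):

```python
import numpy as np

# Coefficient matrix of the homogeneous system Ax = 0 in Example 8.7
A = np.array([[ 1.0,  2.0, 0.0,  1.0, 1.0],
              [-1.0, -3.0, 1.0, -2.0, 1.0],
              [ 0.0, -1.0, 1.0, -1.0, 2.0],
              [ 1.0,  1.0, 1.0,  0.0, 3.0]])

p = np.linalg.matrix_rank(A)   # here p = 2, so n - p = 3
_, _, Vt = np.linalg.svd(A)
kernel = Vt[p:].T              # columns: n - p basis vectors of S_H
print(np.allclose(A @ kernel, 0))
```

Any linear combination of the columns of `kernel` solves Ax = 0; the hand-computed vectors of Example 8.7 span the same three-dimensional space.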
According to Definition 7.15, the set SH corresponds to the kernel of the linear mapping described by matrix A. In order to present the general solution, we need n − p linearly independent solution vectors. To illustrate how these vectors can be found, we consider the following example of a homogeneous system of linear equations and determine its general solution.

Example 8.7 Let Ax = 0 with

        (  1   2   0   1   1 )
    A = ( −1  −3   1  −2   1 )
        (  0  −1   1  −1   2 )
        (  1   1   1   0   3 )

Applying Gaussian elimination, we obtain the following tableaus:

    Row   x1    x2    x3    x4    x5    b    Operation
    1     1     2     0     1     1     0
    2     −1    −3    1     −2    1     0
    3     0     −1    1     −1    2     0
    4     1     1     1     0     3     0

    5     1     2     0     1     1     0    row 1
    6     0     −1    1     −1    2     0    row 1 + row 2
    7     0     −1    1     −1    2     0    row 3
    8     0     −1    1     −1    2     0    −row 1 + row 4

We can stop our computations here, because rows 7 and 8 are identical to row 6, and thus we can drop rows 7 and 8. (If we continued applying Gaussian elimination, we would obtain two rows containing only zeroes.) Hence we use a triangular system with the following two equations obtained from rows 5 and 6:

    x1 + 2x2      + x4 +  x5 = 0
        − x2 + x3 − x4 + 2x5 = 0.      (8.10)

Thus, we have r(A) = 2. Because n = 5, there are three linearly independent solutions x1, x2 and x3. They can be obtained, for instance, by setting exactly one of the three variables that can be chosen arbitrarily equal to one and the other two equal to zero, i.e.

    x1:  x31 = 1,  x41 = 0,  x51 = 0;
    x2:  x32 = 0,  x42 = 1,  x52 = 0;
    x3:  x33 = 0,  x43 = 0,  x53 = 1.

Now the remaining components of each of the three vectors are uniquely determined. The resulting vectors are clearly linearly independent, since the matrix formed by the chosen components of the three vectors x1, x2 and x3 is a 3 × 3 identity matrix. Determining the remaining components from system (8.10), we obtain

         ( −2 )        (  1 )            ( −5 )
         (  1 )        ( −1 )            (  2 )
    x1 = (  1 ),  x2 = (  0 )  and  x3 = (  0 ).
         (  0 )        (  1 )            (  0 )
         (  0 )        (  0 )            (  1 )

Thus, according to Theorem 8.8, we get the following set SH of solutions:

    SH = { xH ∈ R^5 | xH = λ1 x1 + λ2 x2 + λ3 x3;  λ1, λ2, λ3 ∈ R }

with the vectors x1, x2 and x3 given above.

The following theorem characterizes the general solution of a non-homogeneous system of linear equations.

THEOREM 8.9 Let Ax = b be a system of m linear equations with n variables and r(A) = p < n. Moreover, let

    SH = { xH ∈ R^n | xH = λ1 x1 + λ2 x2 + ... + λn−p xn−p;  λ1, λ2, ..., λn−p ∈ R }

denote the general solution of the homogeneous system Ax = 0, and let xN be a solution of the non-homogeneous system Ax = b. Then the set of solutions S of system Ax = b can be written as

    S = { x ∈ R^n | x = xH + xN,  xH ∈ SH }
      = { x ∈ R^n | x = λ1 x1 + λ2 x2 + ... + λn−p xn−p + xN;  λ1, λ2, ..., λn−p ∈ R }.

Theorem 8.9 says that the general solution of a non-homogeneous system of linear equations is obtained as the sum of the general solution of the corresponding homogeneous system (with vector b replaced by the zero vector 0) and a particular solution of the non-homogeneous system. As we see in this and the next chapters, the pivoting procedure is also the basis for solving systems of linear inequalities and linear programming problems. The Gaussian elimination procedure is sometimes advantageous when the system contains parameters. Finally, we give a more complicated example of a system of linear equations containing a parameter a, and we discuss the solution in dependence on the value of this parameter.

Example 8.8 We determine all solutions of the following system of linear equations:

    ax1 +  x2 +  x3 = 1
     x1 + ax2 +  x3 = a
     x1 +  x2 + ax3 = a²,

where a ∈ R. We first reorder the equations, moving the first equation (with leading coefficient a) to the end.
Applying now Gaussian elimination, this gives the first pivot 1 (if we kept the original first equation on top, we would have pivot a, and this means that we would have to exclude the case a = 0 in order to guarantee that the pivot is different from zero). We obtain the following tableaus.

    Row   x1    x2       x3      b       Operation
    1     [1]   a        1       a
    2     1     1        a       a²
    3     a     1        1       1

    4     1     a        1       a       row 1
    5     0     1−a      [a−1]   a²−a    row 2 − row 1
    6     0     −a²+1    −a+1    −a²+1   row 3 − a row 1

    7     1     a        1       a       row 4
    8     0     1−a      a−1     a²−a    row 5
    9     0     −a²−a+2  0       1−a     row 6 + row 5

In row 5, we have taken a − 1 in the x3 column as the pivot, and this generates a zero below this element. (Notice that this implies a − 1 ≠ 0; the case a = 1 is considered separately below.) In the third tableau (rows 7 to 9), we could still interchange the columns belonging to the variables x2 and x3 (to generate formally a triangular matrix).

We first consider the case when the coefficient of variable x2 in row 9 of the above scheme is equal to zero, i.e. −a² − a + 2 = 0. This quadratic equation has the two real solutions a1 = 1 and a2 = −2.

If a = a1 = 1, then the right-hand side 1 − a is also equal to zero. Moreover, in this case all elements of row 8 are equal to zero as well, but row 7 contains non-zero elements. Hence both the coefficient matrix and the augmented coefficient matrix have rank one: r(A) = r(A | b) = 1, and there exist infinitely many solutions (two variables can be chosen arbitrarily). Choosing the variables x2 = s ∈ R and x3 = t ∈ R arbitrarily, we get x1 = 1 − s − t. This solution can alternatively be written using Theorem 8.9. To find xH, we determine x11 using x21 = s = 1, x31 = t = 0, which yields x11 = −1 (notice that x1 = −s − t in the homogeneous system), and x12 using x22 = s = 0, x32 = t = 1, which yields x12 = −1. To get a particular solution xN of the non-homogeneous system, we set x2N = x3N = 0, which yields x1N = 1.
Therefore, we get the general solution S as follows:

    S = { x ∈ R³ | x = xH + xN }
      = { x ∈ R³ | x = λ1 (−1, 1, 0)^T + λ2 (−1, 0, 1)^T + (1, 0, 0)^T;  λ1, λ2 ∈ R }.

If a = a2 = −2, then the right-hand side 1 − a = 3 is different from zero. In this case, we have r(A) = 2 but r(Ab) = 3. Therefore, the system of linear equations is inconsistent, and there is no solution of the system.

Consider now the remaining cases with a ≠ 1 and a ≠ −2. In all these cases we have r(A) = r(Ab) = 3, and thus there exists a uniquely determined solution. Using rows 7 to 9 in the above tableau, we get

    x2 = (1 − a) / (−a² − a + 2) = −(1 − a) / ((a − 1)(a + 2)) = 1 / (a + 2),

    x3 = (a(a − 1) − (1 − a) x2) / (a − 1) = a + x2 = a + 1/(a + 2)
       = (a² + 2a + 1) / (a + 2) = (a + 1)² / (a + 2),

    x1 = a − x3 − a x2 = a − (a + 1)²/(a + 2) − a/(a + 2)
       = (a² + 2a − a² − 2a − 1 − a) / (a + 2) = −(a + 1) / (a + 2).

We have already discussed a method for determining the inverse of a regular matrix A (see Chapter 7.5). The method presented there requires us to evaluate many determinants, so it is efficient only for matrices of order n × n with a small value of n. Alternatively, we can determine matrix A^(−1) via the solution of systems of linear equations with different right-hand-side vectors, as in the following section.

8.1.5 Matrix inversion

The inverse X = A^(−1) of a matrix A of order n × n satisfies the matrix equation

    AX = I,

assuming that |A| ≠ 0 (notice that this corresponds to the condition r(A) = n). Let the column vectors of matrix X be

    x1 = (x11, x21, ..., xn1)^T,  x2 = (x12, x22, ..., xn2)^T,  ...,  xn = (x1n, x2n, ..., xnn)^T.

Then the matrix equation AX = I can be written as

    A (x1 x2 ... xn) = I = (e1 e2 ... en),

which is equivalent to the following n systems of linear equations:

    Ax1 = e1,  Ax2 = e2,  ...,  Axn = en.
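This scheme, solving Axi = ei for each unit vector ei, can be sketched directly with a numerical solver (NumPy assumed), using the matrix of the following Example 8.9:

```python
import numpy as np

# Matrix of Example 8.9
A = np.array([[1.0,  3.0, 1.0],
              [2.0, -1.0, 1.0],
              [1.0,  1.0, 1.0]])
n = A.shape[0]
I = np.eye(n)

# Solve A x_i = e_i for i = 1, ..., n; the solutions form the columns of A^(-1)
X = np.column_stack([np.linalg.solve(A, I[:, i]) for i in range(n)])
print(X)
```

The resulting X agrees with the inverse computed by pivoting below, and A X reproduces the identity matrix.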
These n systems of n linear equations differ only in the right-hand-side vector, and all transformations of the coefficient matrix A within the application of a solution method are the same. Thus, applying for instance the pivoting procedure, we can use the following scheme for solving these n systems of linear equations simultaneously, i.e. if r(A) = n, we obtain after n pivoting steps the inverse A^(−1) = X = (xij). In the following tableaus, we have assumed that no rows and columns have been interchanged (so the variables occur according to their natural numbering), i.e. the elements on the diagonal have successively been chosen as pivots. Of course, we can analogously apply the Gaussian elimination procedure to solve the above systems with different right-hand-side vectors.

    Row        bv                 A                                  I
    1          −    a11   a12   ...  a1n        1    0    ...  0
    2          −    a21   a22   ...  a2n        0    1    ...  0
    .          .     .     .          .         .    .         .
    n          −    an1   an2   ...  ann        0    0    ...  1

                                  I                                A^(−1)
    n(n−1)+1   x1   1     0     ...  0          x11  x12  ...  x1n
    n(n−1)+2   x2   0     1     ...  0          x21  x22  ...  x2n
    .          .    .     .          .           .    .         .
    n²         xn   0     0     ...  1          xn1  xn2  ...  xnn

Example 8.9 We consider the matrix

        ( 1   3  1 )
    A = ( 2  −1  1 )
        ( 1   1  1 )

and determine the inverse A^(−1) by means of the pivoting procedure. The computations are shown in the following scheme.

    Row   bv        A                     I                 Operation
    1     −    [1]   3     1        1     0     0
    2     −    2     −1    1        0     1     0
    3     −    1     1     1        0     0     1

    4     x1   1     3     1        1     0     0          row 1
    5     −    0     [−7]  −1       −2    1     0          row 2 − 2 row 1
    6     −    0     −2    0        −1    0     1          row 3 − row 1

    7     x1   1     0     4/7      1/7   3/7   0          row 4 + (3/7) row 5
    8     x2   0     1     1/7      2/7   −1/7  0          −(1/7) row 5
    9     −    0     0     [2/7]    −3/7  −2/7  1          row 6 − (2/7) row 5

    10    x1   1     0     0        1     1     −2         row 7 − 2 row 9
    11    x2   0     1     0        1/2   0     −1/2       row 8 − (1/2) row 9
    12    x3   0     0     1        −3/2  −1    7/2        (7/2) row 9

Since it is possible to generate an identity matrix on the left-hand side (see rows 10 to 12), the rank of matrix A is equal to three, and the inverse A^(−1) exists. From rows 10 to 12, we obtain

             (  1     1    −2   )
    A^(−1) = (  1/2   0    −1/2 ).
             ( −3/2  −1     7/2 )

Computing the product A^(−1) A gives the identity matrix and confirms the correctness of our computations.

8.2 SYSTEMS OF LINEAR INEQUALITIES

8.2.1 Preliminaries

We start this section with the following introductory example.

Example 8.10 A drink consisting of orange juice and champagne is to be mixed for a party. The ratio of orange juice to champagne has to be at least 1 : 2. The total quantity (volume) of the drink must not be more than 30 l, and at most 4 l more orange juice than champagne is to be used. We denote by x1 the quantity of orange juice in litres and by x2 the quantity of champagne in litres. Then we get the following constraints:

    x1 : x2 ≥ 1 : 2
    x1 − x2 ≤ 4
    x1 + x2 ≤ 30
    x1, x2 ≥ 0.

Hereafter, the notation x1, x2 ≥ 0 means that both variables are non-negative: x1 ≥ 0, x2 ≥ 0. The first inequality expresses the requirement that the ratio of orange juice to champagne (i.e. x1 : x2) should be at least 1 : 2. The second constraint takes into account that at most 4 l more orange juice than champagne is to be used (i.e. x1 ≤ x2 + 4), and the third constraint ensures that the quantity of the drink is no more than 30 l. Of course, both quantities of orange juice and champagne have to be non-negative. The first inequality can be rewritten by multiplying both sides by −2x2 (note that the inequality sign changes) and putting both variables on the left-hand side, so that we obtain the following system:

    −2x1 +  x2 ≤ 0
      x1 −  x2 ≤ 4
      x1 +  x2 ≤ 30
      x1, x2 ≥ 0.

In this section, we deal with the solution of such systems of linear inequalities with non-negative variables. In general, we define a system of linear inequalities as follows.

Definition 8.9 The system

    a11 x1 + a12 x2 + ... + a1n xn  R1  b1
    a21 x1 + a22 x2 + ... + a2n xn  R2  b2
      .         .             .      .   .        (8.11)
    am1 x1 + am2 x2 + ... + amn xn  Rm  bm

is called a system of linear inequalities with the coefficients aij, the right-hand sides bi and the variables xi.
Here Ri ∈ {≤, =, ≥}, i = 1, 2, . . . , m, means that in each constraint one of the three relations ≤, = or ≥ holds, and we assume that at least one inequality occurs in the given system. The inequalities

xj ≥ 0,   j ∈ J ⊆ {1, 2, . . . , n},        (8.12)

are called non-negativity constraints. The constraints (8.11) and (8.12) together are called a system of linear inequalities with |J| non-negativity constraints. In the following, we consider a system of linear inequalities with n non-negativity constraints (i.e. J = {1, 2, . . . , n}), which can be formulated in matrix form as follows:

Ax R b,   x ≥ 0,        (8.13)

where R = (R1, R2, . . . , Rm)ᵀ denotes the vector of the relation symbols with Ri ∈ {≤, =, ≥}, i = 1, 2, . . . , m.

Definition 8.10 A vector x = (x1, x2, . . . , xn)ᵀ ∈ Rⁿ which satisfies the system Ax R b is called a solution. If a solution x also satisfies the non-negativity constraints x ≥ 0, it is called a feasible solution. The set

M = { x ∈ Rⁿ | Ax R b, x ≥ 0 }

is called the set of feasible solutions, or the feasible region, of system (8.13).

8.2.2 Properties of feasible solutions

First, we introduce the notion of a convex set.

Definition 8.11 A set M is called convex if, for any two vectors x1, x2 ∈ M, every convex combination λx1 + (1 − λ)x2 with 0 ≤ λ ≤ 1 also belongs to set M.

The definition of a convex set is illustrated in Figure 8.1. In Figure 8.1(a), set M is convex, since every point of the connecting straight line between the terminal points of vectors x1 and x2 belongs to set M (for arbitrary vectors x1 and x2 ending in M). However, set M∗ in Figure 8.1(b) is not convex, since for the chosen vectors x1 and x2 not every point on the connecting straight line between the terminal points of both vectors belongs to set M∗.

Figure 8.1 A convex set and a non-convex set.
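Definition 8.11 can be illustrated numerically with the system derived in Example 8.10: if two points satisfy all constraints, so does every point of the connecting segment λx1 + (1 − λ)x2. A small sketch (the helper name `feasible` is ours):

```python
def feasible(x1, x2):
    # Feasibility test for the rewritten drink example (Example 8.10).
    return (-2 * x1 + x2 <= 0 and x1 - x2 <= 4
            and x1 + x2 <= 30 and x1 >= 0 and x2 >= 0)

p = (10.0, 15.0)          # a feasible mixture
q = (4.0, 0.0)            # another feasible mixture
# Sample the segment between p and q at 21 equally spaced parameter values.
segment = [((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])
           for t in (i / 20 for i in range(21))]
on_segment_feasible = all(feasible(x1, x2) for x1, x2 in segment)
```

Every sampled point of the segment stays inside the region, as convexity demands; a point such as (16, 15) violates the third constraint and is rejected.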
Definition 8.12 A vector (point) x ∈ M is called an extreme point (or corner point or vertex) of the convex set M if x cannot be written as a proper convex combination of two other vectors of M, i.e. x cannot be written as x = λx1 + (1 − λ)x2 with x1, x2 ∈ M and 0 < λ < 1.

Returning to Figure 8.1, set M in part (a) has six extreme points x(1), x(2), . . . , x(6), or equivalently the terminal points P1, P2, . . . , P6 of the corresponding vectors (here and in the following chapter, we always give the corresponding points Pi in the figures).

In the case of two variables, we can give the following geometric interpretation of a system of linear inequalities. Assume that the constraints are given as inequalities. The constraints

ai1 x1 + ai2 x2 Ri bi,   Ri ∈ {≤, ≥},   i = 1, 2, . . . , m,

describe half-planes which are bounded by the lines ai1 x1 + ai2 x2 = bi. The ith constraint can also be written in the form

x1/si1 + x2/si2 = 1,

where si1 = bi/ai1 and si2 = bi/ai2 are the intercepts of the corresponding line with the x1 axis and the x2 axis, respectively (see Figure 8.2). The non-negativity constraints x1 ≥ 0 and x2 ≥ 0 represent the non-negative quadrant of the two-dimensional space. Thus, when considering a system of m inequalities with two non-negative variables, the feasible region is described by the intersection of m half-planes with the non-negative quadrant. This is illustrated for the case m = 3 in Figure 8.3, where the feasible region M is dashed. In Figure 8.3 and the figures which follow, we use arrows to indicate which of the two half-planes belonging to each constraint satisfies the corresponding inequality. The arrows at the coordinate axes indicate that both variables have to be non-negative.

Figure 8.2 Representation of a line by x1 and x2 intercepts.
Figure 8.3 The set of solutions as the intersection of half-planes.

In general, we can formulate the following property.
THEOREM 8.10 The feasible region M of system (8.13) is either empty or a convex set with at most a finite number of extreme points.

PROOF We only prove that, if M ≠ ∅, it is a convex set. Let x1, x2 ∈ M, i.e. we have

Ax1 R b, x1 ≥ 0   and   Ax2 R b, x2 ≥ 0,

and we prove that λx1 + (1 − λ)x2 ∈ M for 0 ≤ λ ≤ 1. Indeed,

A(λx1 + (1 − λ)x2) = λAx1 + (1 − λ)Ax2 R λb + (1 − λ)b = b        (8.14)

and, due to λx1 ≥ 0 and (1 − λ)x2 ≥ 0 for 0 ≤ λ ≤ 1, we get

λx1 + (1 − λ)x2 ≥ 0.        (8.15)

From (8.14) and (8.15) it follows that M is convex. □

A convex set with a finite number of extreme points is also called a convex polyhedron. One of the possible cases of Theorem 8.10 is illustrated in Figure 8.4. Here the values of both variables can be arbitrarily large provided that the given two inequalities are satisfied, i.e. the feasible region is unbounded. In this case, the feasible region has three extreme points P1, P2 and P3.

Figure 8.4 An unbounded set of solutions M.

The following theorem characterizes the feasible region of a system of linear inequalities provided that this set is bounded.

THEOREM 8.11 Let the feasible region M of system (8.13) be bounded. Then it can be written as the set of all convex combinations of the extreme points x1, x2, . . . , xs of set M, i.e.

M = { x ∈ Rⁿ | x = λ1 x1 + λ2 x2 + ⋯ + λs xs;  0 ≤ λi ≤ 1, i = 1, 2, . . . , s,  λ1 + λ2 + ⋯ + λs = 1 }.

In the case of only two variables, we can solve the problem graphically. To illustrate the determination of set M, consider the following example.

Example 8.11 Two goods G1 and G2 are produced by means of two raw materials R1 and R2 with capacities of 50 and 80 units, respectively. To produce 1 unit of G1, 1 unit of R1 and 1 unit of R2 are required. To produce 1 unit of G2, 1 unit of R1 and 2 units of R2 are required.
The price of G1 is 3 EUR per unit, the price of G2 is 2 EUR per unit, and at least 60 EUR worth of goods need to be sold. Let xi be the number of produced units of Gi, i ∈ {1, 2}. A feasible production programme has to satisfy the following constraints:

 x1 +  x2 ≤ 50     (I)     (constraint for R1)
 x1 + 2x2 ≤ 80     (II)    (constraint for R2)
3x1 + 2x2 ≥ 60     (III)   (selling constraint)
x1, x2 ≥ 0                 (non-negativity constraints)

This is a system of linear inequalities with only two variables, which can easily be solved graphically. The feasible region is given in Figure 8.5. The convex set of feasible solutions has five extreme points, described by the vectors xi (or points Pi), i = 1, 2, . . . , 5:

x1 = (20, 0)ᵀ,  x2 = (50, 0)ᵀ,  x3 = (20, 30)ᵀ,  x4 = (0, 40)ᵀ  and  x5 = (0, 30)ᵀ.

Therefore, the feasible region M is the set of all convex combinations of the above five extreme points:

M = { x ∈ R²₊ | x = λ1 (20, 0)ᵀ + λ2 (50, 0)ᵀ + λ3 (20, 30)ᵀ + λ4 (0, 40)ᵀ + λ5 (0, 30)ᵀ;
      λ1, λ2, . . . , λ5 ≥ 0,  λ1 + λ2 + ⋯ + λ5 = 1 }.

Figure 8.5 Feasible region for Example 8.11.

If the feasible region M is unbounded, there exist (unbounded) one-dimensional rays emanating from some extreme point of M on which points are feasible. Assume that there are u such rays, and let r1, r2, . . . , ru denote the vectors pointing from a corresponding extreme point in the direction of such an unbounded one-dimensional ray. Then the feasible region M can be described as follows.

THEOREM 8.12 Let the feasible region M of system (8.13) be unbounded. Then it can be written as

M = { x ∈ Rⁿ | x = λ1 x1 + ⋯ + λs xs + µ1 r1 + ⋯ + µu ru;
      0 ≤ λi ≤ 1, i = 1, 2, . . . , s,  λ1 + ⋯ + λs = 1;  µ1, µ2, . . . , µu ≥ 0 },

where x1, x2, . . . , xs are the extreme points of set M and r1, r2, . . . , ru are the vectors of the unbounded one-dimensional rays of set M.
According to Theorem 8.12, any feasible solution of an unbounded feasible region M can be written as the sum of a convex combination of the extreme points and a linear combination of the vectors of the unbounded rays with non-negative scalars µj, j = 1, 2, . . . , u. Considering the example given in Figure 8.4, there are two unbounded one-dimensional rays, emanating from points P2 and P3.

A relationship between extreme points and basic feasible solutions is given in the following theorem.

THEOREM 8.13 Any extreme point of the feasible region M of system (8.13) corresponds to at least one basic feasible solution and, conversely, any basic feasible solution corresponds to exactly one extreme point.

The latter theorem needs to be discussed in a bit more detail. We know from our previous considerations that in a basic solution all non-basic variables are equal to zero. Thus, if the coefficient matrix A of the system of linear equations has rank m, at most m variables have positive values. We can distinguish the following two cases.

Definition 8.13 Let M be the feasible region of system (8.13) and let r(A) = m. If a basic feasible solution x ∈ M has m positive components, it is called non-degenerate. If it has fewer than m positive components, it is called degenerate.

As we discuss later in connection with linear programming problems in Chapter 9, degeneracy of solutions may cause computational problems. In the case where all basic feasible solutions are non-degenerate, Theorem 8.13 can be strengthened.

THEOREM 8.14 Let all basic feasible solutions of system (8.13) be non-degenerate. Then there is a one-to-one correspondence between the basic feasible solutions and the extreme points of the set M of feasible solutions of system (8.13).
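Before turning to a systematic procedure, Theorem 8.11 can be checked numerically for Example 8.11: every random convex combination of the five extreme points satisfies all three constraints. A sketch (helper names are ours):

```python
import random

# Extreme points of Example 8.11.
VERTICES = [(20, 0), (50, 0), (20, 30), (0, 40), (0, 30)]

def feasible(x1, x2, tol=1e-9):
    # Constraints (I)-(III) of Example 8.11 plus non-negativity.
    return (x1 + x2 <= 50 + tol and x1 + 2 * x2 <= 80 + tol
            and 3 * x1 + 2 * x2 >= 60 - tol and x1 >= -tol and x2 >= -tol)

random.seed(0)
ok = True
for _ in range(100):
    w = [random.random() for _ in VERTICES]
    s = sum(w)
    lam = [wi / s for wi in w]                     # lambda_i >= 0, sum = 1
    x1 = sum(l * v[0] for l, v in zip(lam, VERTICES))
    x2 = sum(l * v[1] for l, v in zip(lam, VERTICES))
    ok = ok and feasible(x1, x2)
```

Sampling of course does not prove the theorem; it merely illustrates that convex combinations of the extreme points never leave M, while a point such as the origin (which violates the selling constraint) lies outside.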
8.2.3 A solution procedure

According to Theorem 8.11, we have to generate all extreme points in order to describe the feasible region of a system of linear inequalities. By Theorems 8.13 and 8.14, this can be done by generating all basic feasible solutions of the given system. In this section, we restrict ourselves to the case where system (8.13) is given in the special form

Ax ≤ b,   x ≥ 0,   with b ≥ 0,        (8.16)

where A is an m × n matrix and we assume that r(A) = m. (The case of arbitrary constraints is discussed in detail in the next chapter, when dealing with linear programming problems.) This particular situation often occurs in economic applications. For instance, it arises when determining feasible production programmes of n goods by means of m raw materials, where the coefficient matrix A describes the use of the particular raw materials per unit of each good and the non-negative vector b describes the capacity constraints on the use of the raw materials.

For a system (8.16), the inequalities are transformed into equations by introducing a slack variable in each constraint, i.e. for the ith constraint

ai1 x1 + ai2 x2 + ⋯ + ain xn ≤ bi

we write

ai1 x1 + ai2 x2 + ⋯ + ain xn + ui = bi,

where ui ≥ 0 is a so-called slack variable (i ∈ {1, 2, . . . , m}). In the following, we use the m-dimensional vector u = (u1, u2, . . . , um)ᵀ for the slack variables introduced in the given m inequalities. Letting

A∗ = (A, I)   and   x∗ = ⎛x⎞
                         ⎝u⎠ ,

the system turns into

A∗ x∗ = b   with   x∗ ≥ 0.        (8.17)

In this case, we can choose

x∗ = ⎛x⎞ = ⎛0⎞
     ⎝u⎠   ⎝b⎠

as an initial basic feasible solution, i.e. the variables x1, x2, . . . , xn are the non-basic variables and the slack variables u1, u2, . . . , um are the basic variables. Moreover, we assume that x∗ is a non-degenerate basic feasible solution (which means that b > 0).
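The augmentation step (8.16) → (8.17) is purely mechanical. A minimal sketch (the helper name `augment` is ours), using the inequality system of the drink example from Example 8.10:

```python
def augment(A, b):
    # Append slack variables: Ax <= b, x >= 0 becomes (A, I) x* = b, x* >= 0.
    # Returns A* = (A, I) and the initial basic feasible solution (0, b).
    m = len(A)
    A_star = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(A)]
    x_star = [0] * len(A[0]) + list(b)     # non-basic x = 0, basic u = b
    return A_star, x_star

# The system of Example 8.10 in the form (8.16).
A = [[-2, 1], [1, -1], [1, 1]]
b = [0, 4, 30]
A_star, x_star = augment(A, b)
```

The initial solution (0, b) satisfies A∗x∗ = b by construction, which the assertions below confirm.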
Starting with this basic feasible solution, we systematically have to generate all other basic feasible solutions. To this end, we have to determine the pivot element in each step in such a way that the new basic solution is again feasible. (Such pivoting steps are repeated until all basic feasible solutions have been visited.) For an m × n matrix A with r(A) = m, there are at most (n + m choose m) different basic feasible solutions of system (8.17), and therefore at most (n + m choose m) extreme points of the set M = { x ∈ Rⁿ₊ | Ax ≤ b }. Each pivoting step corresponds to a move from an extreme point (represented by some basic feasible solution) to another basic feasible solution (usually corresponding to another extreme point).

We can use the tableau in Table 8.2 to perform a pivoting step. Rows 1 to m represent the initial basic feasible solution, and rows m + 1 to 2m represent the basic solution obtained after a pivoting step with pivot element akl, by which the basic variable uk is replaced by the non-basic variable xl. (As before, we could also add a column describing the operations that have to be performed in order to get rows m + 1 to 2m.)

Table 8.2 Tableau for a pivoting step

Row   bv | x1  ⋯  xl  ⋯  xn   | u1 ⋯ uk ⋯ um | b
1     u1 | a11 ⋯ a1l ⋯ a1n    | 1  ⋯  0 ⋯  0 | b1
⋮      ⋮ |         ⋮          |       ⋮      | ⋮
k     uk | ak1 ⋯ akl ⋯ akn    | 0  ⋯  1 ⋯  0 | bk
⋮      ⋮ |         ⋮          |       ⋮      | ⋮
m     um | am1 ⋯ aml ⋯ amn    | 0  ⋯  0 ⋯  1 | bm

After the pivoting step, row m + k (basic variable xl) contains the entries akj/akl (j = 1, 2, . . . , n), the entry 1/akl in the uk column and the right-hand side bk/akl, while every other row m + i, i ≠ k (basic variable ui), contains the entries aij − (ail/akl) akj, the entry −ail/akl in the uk column and the right-hand side bi − (ail/akl) bk; in particular, the xl column turns into a unit vector with the 1 in row m + k.

Assume that we choose akl ≠ 0 as pivot element, i.e. we replace the basic variable uk by the original non-basic variable xl. Then we obtain the new basic solution

x∗ = ⎛x⎞
     ⎝u⎠

with

xᵀ = (x1, . . . , xl, . . . , xn) = ( 0, . . . , bk/akl, . . . , 0 )

and

uᵀ = (u1, . . . , uk, . . . , um) = ( b1 − (a1l/akl) bk, . . . , 0, . . . , bm − (aml/akl) bk ).

The new basic solution x∗ is feasible if all its components are non-negative, i.e.

bk/akl ≥ 0        (8.18)

and

bi − (ail/akl) bk ≥ 0,   i = 1, 2, . . . , m,  i ≠ k.        (8.19)

Next, we derive a condition under which both (8.18) and (8.19) are satisfied. First, since b > 0 (remember that we are considering a non-degenerate solution) and akl ≠ 0 by assumption, we have the equivalence

bk/akl ≥ 0   ⟺   akl > 0.

This means that in the chosen pivot column only positive elements are candidates for the pivot element. However, we also have to ensure that (8.19) is satisfied. We get the equivalence

bi − (ail/akl) bk ≥ 0   ⟺   bi/ail ≥ bk/akl ≥ 0   for all i with ail > 0.

This means that we have to take as pivot row a row which yields the smallest quotient of the current right-hand-side component and the corresponding element of the chosen pivot column, among all rows with a positive element in the pivot column. Summarizing, we can replace the basic variable uk by the non-basic variable xl if

akl > 0   and   qk = bk/akl = min { qi = bi/ail | ail > 0, i ∈ {1, 2, . . . , m} }.

If one of these conditions is violated, i.e. if akl < 0 or the quotient qk = bk/akl is not minimal among the quotients of all rows with a positive element in the pivot column, we do not get a basic feasible solution: at least one component of the new right-hand-side vector would be negative. Therefore, we add a column Q to the tableau of Table 8.2 in which the corresponding quotients qi are calculated. Notice that in the new tableau there is again an identity submatrix (with possibly interchanged columns) contained in the transformed matrix and belonging to the basic variables. In the case of a degenerate basic feasible solution, all of the above formulas for a pivoting step remain valid.
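The quotient rule just derived fits in a few lines. The sketch below (the helper name `pivot_row` is ours) returns the index of the pivot row, or `None` when the chosen column contains no positive element; ties are broken by the smaller row index. Applied to the pivot column of x1 for the slack form of the system of Example 8.10, with column entries (−2, 1, 1) and b = (0, 4, 30), it selects the second row:

```python
def pivot_row(column, rhs):
    # Minimum-ratio (quotient) rule: among rows with a positive entry in the
    # pivot column, pick the row with the smallest b_i / a_il.
    candidates = [(bi / a, i) for i, (a, bi) in enumerate(zip(column, rhs)) if a > 0]
    return min(candidates)[1] if candidates else None

# Pivot column of x1 for Example 8.10 in slack form; quotients are 4 and 30.
k = pivot_row([-2, 1, 1], [0, 4, 30])
```

Here `k == 1` (0-based), i.e. the second slack variable leaves the basis, in accordance with the rule.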
In such a case, we have a smallest quotient qk = 0 (which, by the way, means that in this case even the choice of a negative pivot element would result in a basic feasible solution). We discuss the difficulties that may arise in the case of degeneracy in a bit more detail in Chapter 9.

The process of enumerating all basic feasible solutions of system (8.13) can be done by hand (without using a computer) only for very small problems. Consider the following example.

Example 8.12 Consider the system of linear inequalities presented in Example 8.10. Introducing the slack variables u1, u2 and u3, we obtain the initial tableau in rows 1 to 3 below. Now the goal is to generate all basic feasible solutions of the given system of linear equations. Since each basic feasible solution has three basic variables, there are at most (5 choose 3) = 10 basic feasible solutions. We successively perform pivoting steps and might obtain, for instance, the following sequence of basic feasible solutions, where the pivot elements are marked by square brackets:

Row  bv |  x1    x2   |  u1    u2    u3   |  b   | Q
1    u1 | −2     1    |  1     0     0    |  0   | −
2    u2 | [1]   −1    |  0     1     0    |  4   | 4
3    u3 |  1     1    |  0     0     1    | 30   | 30
4    u1 |  0    −1    |  1     2     0    |  8   | −
5    x1 |  1    −1    |  0     1     0    |  4   | −
6    u3 |  0    [2]   |  0    −1     1    | 26   | 13
7    u1 |  0     0    |  1   [3/2]  1/2   | 21   | 14
8    x1 |  1     0    |  0    1/2   1/2   | 17   | 34
9    x2 |  0     1    |  0   −1/2   1/2   | 13   | −
10   u2 |  0     0    |  2/3   1    1/3   | 14   | 42
11   x1 |  1     0    | −1/3   0    1/3   | 10   | 30
12   x2 |  0     1    |  1/3   0    2/3   | 20   | 30

In the first tableau (rows 1 to 3), we can choose either the column belonging to x1 or that belonging to x2 as pivot column (since there is at least one positive element in each of these columns). Selecting the column belonging to x1, the minimum quotient is uniquely determined, and we have to replace the basic variable u2 by the non-basic variable x1. In the second pivoting step, we can choose only the column belonging to x2 as pivot column (since, when choosing the column belonging to u2, we come back to the basic feasible solution represented by rows 1 to 3).
From the last basic feasible solution, we cannot generate another extreme point. However, we can perform another pivoting step so that u3 becomes a basic variable. Determining the minimum quotient in the Q column, we find that it is not uniquely determined: we can choose x1 or x2 as the variable that becomes non-basic (since in both cases we have the smallest quotient 30). If we choose x1 as the variable that becomes non-basic, we get the tableau:

Row  bv |  x1    x2   |  u1    u2    u3   |  b
13   u2 | −1     0    |  1     1     0    |  4
14   u3 |  3     0    | −1     0     1    | 30
15   x2 | −2     1    |  1     0     0    |  0

If we choose in the fourth tableau x2 as the variable that becomes non-basic, we get the following tableau:

Row  bv |  x1    x2    |  u1    u2    u3   |  b
16   u2 |  0   −1/2    |  1/2   1     0    |  4
17   x1 |  1   −1/2    | −1/2   0     0    |  0
18   u3 |  0    3/2    |  1/2   0     1    | 30

We still have to show that we have now indeed generated all basic feasible solutions. For the second, third and fourth tableaus we have discussed all the possibilities. We still have to check the remaining possibilities for the first, fifth and sixth tableaus. If in the first tableau we choose the x2 column instead of the x1 column as pivot column, we get the fifth tableau (rows 13 to 15). Checking all possibilities for generating a new basic feasible solution in the fifth and sixth tableaus, we see that we can generate only basic feasible solutions that we have already found. This means that the remaining possible combinations for selecting the basic variables, namely x2, u1, u3; x2, u1, u2; x1, u1, u2 and x1, x2, u3, do not lead to a basic feasible solution. (One can check this by trying to find all basic solutions.) Therefore, in the above example, there are six basic feasible solutions and four basic infeasible solutions. From rows 1 to 18, we get the following basic feasible solutions.
(1) x1 = 0, x2 = 0, u1 = 0, u2 = 4, u3 = 30 (basic variables u1, u2, u3);
(2) x1 = 4, x2 = 0, u1 = 8, u2 = 0, u3 = 26 (basic variables x1, u1, u3);
(3) x1 = 17, x2 = 13, u1 = 21, u2 = 0, u3 = 0 (basic variables x1, x2, u1);
(4) x1 = 10, x2 = 20, u1 = 0, u2 = 14, u3 = 0 (basic variables x1, x2, u2);
(5) x1 = 0, x2 = 0, u1 = 0, u2 = 4, u3 = 30 (basic variables x2, u2, u3);
(6) x1 = 0, x2 = 0, u1 = 0, u2 = 4, u3 = 30 (basic variables x1, u2, u3).

Deleting now the introduced slack variables u1, u2, u3, we get the corresponding extreme points P1 with the coordinates (0, 0), P2 with the coordinates (4, 0), P3 with the coordinates (17, 13) and P4 with the coordinates (10, 20). The fifth and sixth basic feasible solutions correspond to extreme point P1 again. (In each of the first, fifth and sixth basic feasible solutions, exactly one basic variable has value zero.) Therefore, the first, fifth and sixth basic feasible solutions are degenerate solutions corresponding to the same extreme point.

Figure 8.6 Feasible region for Example 8.12.

The feasible region together with the extreme points is given in Figure 8.6. It can be seen that our computations have started from point P1; then we have moved to the adjacent extreme point P2, then to the adjacent extreme point P3 and finally to P4. Then all extreme points have been visited and the procedure stops. According to Theorem 8.11, the feasible region is given by the set of all convex combinations of the four extreme points:

M = { (x1, x2)ᵀ ∈ R² | (x1, x2)ᵀ = λ1 (0, 0)ᵀ + λ2 (4, 0)ᵀ + λ3 (17, 13)ᵀ + λ4 (10, 20)ᵀ;
      λi ≥ 0, i ∈ {1, 2, 3, 4},  λ1 + λ2 + λ3 + λ4 = 1 }.

Since the columns belonging to the basic variables are always unit vectors, they can be omitted. All the information required for performing a pivoting step is contained in the columns of the non-basic variables. Therefore, we can use a short form of the tableau, in which the columns are assigned to the non-basic variables and the rows are assigned to the basic variables. Assume that altogether n variables denoted by x1, x2, . . .
, xn occur in the system of linear inequalities; among them there are m basic variables, denoted by xB1, xB2, . . . , xBm, and n′ = n − m non-basic variables, denoted by xN1, xN2, . . . , xNn′. In this case, the short form of the tableau is as in Table 8.3, where the entries of the current tableau are written as a∗ij and b∗i.

Table 8.3 Short form of the tableau for a pivoting step

Row   bv  | xN1   ⋯  xNl   ⋯  xNn′   | b
1     xB1 | a∗11  ⋯  a∗1l  ⋯  a∗1n′  | b∗1
⋮      ⋮  |            ⋮             | ⋮
k     xBk | a∗k1  ⋯  a∗kl  ⋯  a∗kn′  | b∗k
⋮      ⋮  |            ⋮             | ⋮
m     xBm | a∗m1  ⋯  a∗ml  ⋯  a∗mn′  | b∗m

After a pivoting step with pivot a∗kl, row m + k (basic variable xNl) contains the entries a∗kj/a∗kl for j ≠ l, the entry 1/a∗kl in the pivot column (now headed by xBk) and the right-hand side b∗k/a∗kl, while every other row m + i, i ≠ k, contains the entries a∗ij − (a∗il/a∗kl) a∗kj for j ≠ l, the entry −a∗il/a∗kl in the pivot column and the right-hand side b∗i − (a∗il/a∗kl) b∗k.

In the column bv, the basic variables xB1, xB2, . . . , xBm are given, and the next n′ columns represent the non-basic variables xN1, xN2, . . . , xNn′. After the pivoting step, the kth basic variable is the former lth non-basic variable, xBk = xNl; correspondingly, the lth non-basic variable in the new solution is the former kth basic variable, xNl = xBk. Notice that in each transformation step we have to write down the sequence of the non-basic variables, since the variables in the columns do not appear according to their natural numbering.

It is worth emphasizing that, in contrast to the full form of the tableau, the new elements of the pivot column are obtained as follows:

ākl = 1/a∗kl,   āil = −a∗il/a∗kl,   i = 1, 2, . . . , m, i ≠ k,

i.e. the pivot element is replaced by its reciprocal value, and the remaining elements of the pivot column are divided by the negative pivot element.
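This rule for the new pivot column can be scripted directly. The sketch below (the helper name is ours, with exact `Fraction` arithmetic) applies it to the first pivot of Example 8.12, where the pivot column of x1 is (−2, 1, 1)ᵀ and the pivot row is the second; the result (2, 1, −1)ᵀ is exactly the u2 column of rows 4 to 6 in the full tableau:

```python
from fractions import Fraction

def short_form_pivot_column(column, k):
    # Short-form rule: the pivot a_kl is replaced by its reciprocal,
    # every other entry a_il by -a_il / a_kl.
    a_kl = Fraction(column[k])
    return [1 / a_kl if i == k else -Fraction(a) / a_kl for i, a in enumerate(column)]

# First pivot of Example 8.12: pivot column (-2, 1, 1), pivot row index 1, pivot 1.
new_col = short_form_pivot_column([-2, 1, 1], 1)
```

With a pivot different from 1 the fractions become visible, e.g. a column (1, 2, 4) pivoted in its last entry turns into (−1/4, −1/2, 1/4).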
Notice that this column also occurs in the full form of the tableau, though not in the pivot column (where after the pivoting step a unit vector occurs), but in the column of the new non-basic variable xNl of the full tableau after the pivoting step. The following example uses the short form of the tableau.

Example 8.13 We consider the following system of linear inequalities:

        x2 ≤ 25
 2x1 +  x2 ≤ 30
 −x1 +  x2 ≥ −6
 x1, x2 ≥ 0.

We multiply the third constraint by −1 and introduce in each inequality a slack variable, denoted by u1, u2 and u3, respectively. This yields a system with three equations and five variables:

        x2 + u1           = 25
 2x1 +  x2      + u2      = 30
  x1 −  x2           + u3 =  6
 x1, x2, u1, u2, u3 ≥ 0.

There are at most (5 choose 3) = 10 basic feasible solutions, each of them including exactly three basic variables. In order to describe the feasible region, we have to generate systematically all possible basic feasible solutions. Starting from the first basic solution with x1 = 0, x2 = 0, u1 = 25, u2 = 30, u3 = 6 and using the short form of the tableau, we obtain e.g. the following sequence of basic feasible solutions by subsequent pivoting steps. In the first step we have chosen the column belonging to x1 as pivot column. (This is possible since at least one element in this column is positive; analogously, one could also start with the column belonging to x2 as the first pivot column.) Using the quotients in the Q column, we have found that row 3 must be the pivot row and, therefore, we have the pivot element 1. From the tableau given by rows 4 to 6, we have to select the column belonging to variable x2 as pivot column (otherwise we come back to the basic feasible solution described by rows 1 to 3). The quotient rule determines row 5 as the pivot row. Continuing in this way (note that the pivot column is now always uniquely determined, since in the other case we would always return to the previous basic feasible solution), we get the results in the following tableaus.
Row  bv |  x1     x2    |  b    | Q
1    u1 |  0      1     | 25    | −
2    u2 |  2      1     | 30    | 15
3    u3 | [1]    −1     |  6    | 6

Row  bv |  x2     u3    |  b    | Q
4    u1 |  1      0     | 25    | 25
5    u2 | [3]    −2     | 18    | 6
6    x1 | −1      1     |  6    | −

Row  bv |  u2     u3    |  b    | Q
7    u1 | −1/3  [2/3]   | 19    | 57/2
8    x2 |  1/3   −2/3   |  6    | −
9    x1 |  1/3    1/3   | 12    | 36

Row  bv |  u2     u1    |  b    | Q
10   u3 | −1/2    3/2   | 57/2  | −
11   x2 |  0      1     | 25    | −
12   x1 | [1/2]  −1/2   |  5/2  | 5

Row  bv |  x1     u1    |  b
13   u3 |  1      1     | 31
14   x2 |  0      1     | 25
15   u2 |  2     −1     |  5

Having generated the above five basic feasible solutions, we can stop our computations for the following reasons. There remain five other selections of three variables out of x1, x2, u1, u2, u3 which, however, do not yield basic feasible solutions. The variables x2, u1, u3 as basic variables lead to a basic infeasible solution: selecting x2 as the variable that becomes basic and row 2 as the pivot row in the first tableau would violate the quotient rule, since q2 = 30 > q1 = 25. Similarly, the choice of x1, u1, u3 as basic variables does not lead to a basic feasible solution, since we would violate the quotient rule when choosing the x1 column as pivot column and row 2 as pivot row. The choice of x1, u2, u3 as basic variables is not possible, because choosing the x1 column as pivot column and row 1 as pivot row in the first tableau would produce the pivot zero, which is not allowed (i.e. the corresponding column vectors of the matrix A∗ belonging to these variables are linearly dependent). The choice of x1, x2, u2 as basic variables leads to a basic infeasible solution, because choosing the u2 column as pivot column and row 7 as pivot row in the third tableau would produce a negative pivot element. For the same reason, the choice of x2, u1, u2 as basic variables does not lead to a basic feasible solution (selecting the x2 column as pivot column and row 3 as pivot row in the first tableau yields the negative pivot −1). Therefore, in addition to the five basic feasible solutions determined in rows 1 to 15, there are four further basic infeasible solutions (the selection x1, u2, u3 does not correspond to a basic solution at all).
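The bookkeeping of Example 8.13 can be cross-checked by brute force: enumerate all (5 choose 3) = 10 selections of basic variables, solve the corresponding 3×3 systems, and keep the non-negative solutions. The sketch below (helper names are ours) confirms the five basic feasible solutions found in the tableaus above:

```python
from fractions import Fraction
from itertools import combinations

def solve3(M, rhs):
    # Cramer's rule for a 3x3 system; None if the coefficient matrix is singular.
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(M)
    if d == 0:
        return None
    replaced = [[[rhs[r] if c == k else M[r][c] for c in range(3)] for r in range(3)]
                for k in range(3)]
    return [Fraction(det(Mk), d) for Mk in replaced]

# A* = (A, I) for Example 8.13; variable order: x1, x2, u1, u2, u3.
A_star = [[0, 1, 1, 0, 0],
          [2, 1, 0, 1, 0],
          [1, -1, 0, 0, 1]]
b = [25, 30, 6]

extreme_points, n_feasible = set(), 0
for basis in combinations(range(5), 3):
    M = [[row[j] for j in basis] for row in A_star]
    sol = solve3(M, b)
    if sol is not None and all(v >= 0 for v in sol):
        n_feasible += 1
        x = dict(zip(basis, sol))
        extreme_points.add((x.get(0, Fraction(0)), x.get(1, Fraction(0))))
```

The enumeration finds exactly five basic feasible solutions (one of the ten selections has a singular coefficient matrix and four give a negative component), and projecting away the slack variables yields five distinct extreme points.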
From the above five basic feasible solutions, we obtain the resulting five extreme points of the feasible region M (by dropping the slack variables u1, u2 and u3). Therefore, set M is given by

M = { (x1, x2)ᵀ ∈ R²₊ | (x1, x2)ᵀ = λ1 (0, 0)ᵀ + λ2 (6, 0)ᵀ + λ3 (12, 6)ᵀ + λ4 (0, 25)ᵀ + λ5 (5/2, 25)ᵀ;
      λi ≥ 0, i = 1, 2, . . . , 5,  λ1 + λ2 + ⋯ + λ5 = 1 }.

The graphical solution of the system of linear inequalities is given in Figure 8.7. The ith basic feasible solution corresponds to extreme point Pi. In this example, there is exactly one basic feasible solution corresponding to each extreme point, since each basic feasible solution is non-degenerate.

Figure 8.7 Feasible region for Example 8.13.

EXERCISES

8.1 Decide whether the following systems of linear equations are consistent and find the solutions. Apply both the method of Gaussian elimination and the pivoting procedure (use the rank criterion).

(a)  x1 + 2x2 + 3x3 = 5
    2x1 + 3x2 +  x3 = 8
    3x1 +  x2 + 2x3 = 5

(b) 3x1 +  x2 +  x3 = 3
    4x1 −  x2 + 2x3 = 4
     x1 −  x2 +  x3 = 1
    4x1 −  x2 + 2x3 = 5

(c) 3x1 + 4x2 +  x3 + 6x4 = 8
    3x1 + 8x2 + 6x3 + 5x4 = 7
    8x1 + 5x2 + 6x3 + 7x4 = 6
    6x1 + 2x2 + 5x3 + 3x4 = 5

(d)  x1 + 2x2 + 3x3 = 16
    8x1 + 7x2 + 6x3 = 74
    4x1 + 5x2 + 9x3 = 49
    5x1 + 4x2 + 2x3 = 43
     x1 + 4x2 +  x3 = 22

8.2 (a) Solve the following homogeneous systems of linear equations Ax = 0 with

(i)  A = ⎛1 1 1 1⎞        (ii)  A = ⎛1 1 1 1⎞
         ⎜0 1 3 0⎟              ⎜2 3 6 3⎟
         ⎜1 1 2 4⎟              ⎜1 1 2 4⎟
         ⎝2 3 5 0⎠              ⎝2 3 5 0⎠

(b) Find the solutions of the non-homogeneous system Ax = b with matrix A from (i) resp. (ii) and b = (0, 0, −2, 2)ᵀ.
8.3 Find the general solutions of the following systems and specify two different basic solutions for each system:

(a) 4x1 −  x2 − 3x3 + 5x4 = −2
    −x1 +  x2 +  x3 −  x4 = 4
    2x1 −  x2 −  x3 − 2x4 = 1

(b)  x1 + 2x2 −  x3        + 4x5 = 2
     x1 + 4x2 − 5x3 +  x4 + 3x5 = 1
    2x1 − 2x2 + 10x3 + x4 −  x5 = 11
    3x1 + 2x2 + 5x3 + 2x4 + 2x5 = 12

8.4 What restriction on the parameter a ensures the consistency of the following system? Find the solution depending on a.

3x − 4y + 4z = 2
3x + 2y + 3z = 3
4x + 5y + az = 4

8.5 Check the consistency of the following system as a function of the parameter λ:

3x + 2y +  z = 0
6x + 4y + λz = 0

Do the following cases exist? (a) There is no solution. (b) There is a unique solution. Find the solution if possible.

8.6 Given is the system

⎛1  0   0  −1⎞ ⎛x1⎞   ⎛2⎞
⎜0  1  −1   1⎟ ⎜x2⎟ = ⎜3⎟
⎜0  0   a   0⎟ ⎜x3⎟   ⎜1⎟
⎝0  1   0   b⎠ ⎝x4⎠   ⎝0⎠

(a) Decide with respect to the parameters a and b in which cases the system is consistent or inconsistent. When does a unique solution exist?
(b) Find the general solution if possible.
(c) Calculate the solution for a = 1 and b = 0.

8.7 Decide whether the following system of linear equations is consistent and find the solution in dependence on the parameters a and b:

ax + (a + b)y + bz = 3a + 5b
bx +     ab y + az = a(2b + 3) + b
ax +       by + bz = a + 5b

8.8 Find the kernel of the linear mapping described by the matrix

A = ⎛2 −2 2⎞
    ⎜5 −1 7⎟
    ⎝3 −1 4⎠

8.9 Given are the two systems of linear equations

 x1 + 2x2 + 3x3 = 5            x1 + 2x2 + 3x3 = 4
3x1 +  x2 + 2x3 = 5    and    2x1 + 3x2 +  x3 = 2
2x1 + 3x2 +  x3 = 8           3x1 +  x2 + 2x3 = 8

Solve them in only one tableau.

8.10 Find the inverse matrices by pivoting:

(a) A = ⎛ 1   3   2⎞      (b) B = ⎛1  ·  6⎞      (c) C = ⎛ 1   0  −1   2⎞
        ⎜ 2   5   3⎟              ⎜3  ·  1⎟              ⎜ 2  −1  −2   3⎟
        ⎝−3  −8  −4⎠              ⎝·  7  8⎠              ⎜−1   2   2  −4⎟
                                                         ⎝ 0   1   2  −5⎠

8.11 Find the solutions of the following matrix equations:

(a) XA = B with

A = ⎛−1   3   2⎞    and    B = ⎛1  3  −4⎞
    ⎜ 2   5   3⎟               ⎝3  2  −2⎠
    ⎝−3  −8  −4⎠
(b) AXB = C with

A = ⎛ 2  1⎞ ,   B = ⎛3  4⎞    and    C = ⎛0  1⎞
    ⎝−1  2⎠         ⎝2  3⎠               ⎝1  0⎠

(c) XA = −2(X + B) with

A = ⎛−3   1   1⎞    and    B = ⎛2  1  −4⎞
    ⎜−1   0   1⎟               ⎝0  3  −2⎠
    ⎝ 2   1  −2⎠

8.12 Assume we have a four-industry economy. Each industry is to produce an output just sufficient to meet its own input requirements and those of the other three industries as well as the final demand of the open sector. That is, the output level xi must satisfy the equation

xi = xi1 + xi2 + xi3 + xi4 + yi,    i ∈ {1, 2, 3, 4},

where yi is the final demand for industry i and xij is the amount of xi needed as input for industry j ∈ {1, 2, 3, 4}. Let

X = (xij) = ⎛ 5  15  15  10⎞               ⎛ 5⎞
            ⎜25  25  10  30⎟    and    y = ⎜10⎟
            ⎜10  20  20  20⎟               ⎜30⎟
            ⎝10  30  15  25⎠               ⎝20⎠

(a) Find the input-coefficient matrix A which satisfies the equation x = Ax + y with x = (x1, x2, x3, x4)ᵀ and xij = aij xj.
(b) How does the matrix X change if the final demand changes to y = (10, 20, 20, 15)ᵀ and the input coefficients are constant?

8.13 Given is the bounded convex set with the extreme points

x1 = ⎛2⎞ ,   x2 = ⎛1⎞ ,   x3 = ⎛2⎞    and    x4 = ⎛−1⎞
     ⎜1⎟          ⎜1⎟          ⎜0⎟               ⎜ 3⎟
     ⎝0⎠          ⎝1⎠          ⎝1⎠               ⎝ 1⎠

Do the points

a = (1/2) ⎛2⎞         b = ⎛1⎞
          ⎜3⎟    and      ⎜0⎟
          ⎝1⎠             ⎝2⎠

belong to the convex set given above?

8.14 In addition to its main production, a firm produces two products A and B with two machines I and II. The products are manufactured on machine I and packaged on machine II. Machine I has a free capacity of 40 hours, and machine II can be used for 20 hours. To produce 1 tonne of A takes 4 min on machine I and 6 min on machine II. Producing 1 tonne of B takes 10 min on machine I and 3 min on machine II. What output combinations are possible?

(a) Model the problem by means of a system of linear inequalities.
(b) Solve the problem graphically.
(c) Find the general solution by calculation.
8.15 Solve the following problem graphically:

         −x1 + x2 ≤ 4
        −2x1 + x2 ≤ 3
          x1 − 2x2 ≤ 1
          x1, x2 ≥ 0

8.16 Find the general solutions of the following systems by calculation:

    (a)  2x1 + 5x2 + 2x3 ≤ 5
         4x1 + x2 + x3 ≤ 3
         x1, x2, x3 ≥ 0

    (b)  x1 + x2 − x3 ≤ 3
         x1 + 5x3 ≤ 3
         x1, x2, x3 ≥ 0

9 Linear programming

In Chapter 8, we considered systems of linear inequalities, and we discussed how to find the feasible region of such a system. In this chapter, any feasible solution of such a system is evaluated by the value of a linear objective function, and we look for a 'best' solution among the feasible ones. After introducing some basic notions, we discuss the simplex algorithm as a general method of solving such problems. Then we introduce a so-called dual problem, which is closely related to the problem originally given, and we present a modification of the simplex method based on the solution of this dual problem.

9.1 PRELIMINARIES

We start this section with an introductory example.

Example 9.1 A company produces a mixture consisting of three raw materials denoted as R1, R2 and R3. Raw materials R1 and R2 must be contained in the mixture with a given minimum percentage, and raw material R3 must not exceed a certain given maximum percentage. Moreover, the price of each raw material per kilogram is known. The data are summarized in Table 9.1. We wish to determine a feasible mixture with the lowest cost.

Let xi, i ∈ {1, 2, 3}, be the percentage of raw material Ri. Then we get the following constraints. First,

    x1 + x2 + x3 = 100.    (9.1)

Equation (9.1) states that the sum of the percentages of all raw materials equals 100 per cent. Since the percentage of raw material R3 should not exceed 30 per cent, we obtain the constraint

    x3 ≤ 30.
(9.2)

Table 9.1 Data for Example 9.1

    Raw material    Required (%)    Price in EUR per kilogram
    R1              at least 10     25
    R2              at least 50     17
    R3              at most 30      12

Linear programming 329

The percentage of raw material R2 must be at least 50 per cent, or to put it another way, the sum of the percentages of R1 and R3 must be no more than 50 per cent:

    x1 + x3 ≤ 50.    (9.3)

Moreover, the percentage of R1 must be at least 10 per cent, or equivalently, the sum of the percentages of R2 and R3 must not exceed 90 per cent, i.e.

    x2 + x3 ≤ 90.    (9.4)

Moreover, all variables should be non-negative:

    x1 ≥ 0,    x2 ≥ 0,    x3 ≥ 0.    (9.5)

The cost of producing the resulting mixture should be minimized, i.e. the objective function is as follows:

    z = 25x1 + 17x2 + 12x3 → min!    (9.6)

The notation z → min! indicates that the value of the function z should become minimal for the desired solution. So we have formulated a problem consisting of an objective function (9.6), four constraints (the three inequalities (9.2), (9.3) and (9.4) and the equation (9.1)) and the non-negativity constraints (9.5) for all three variables.

In general, a linear programming problem (abbreviated LPP) consists of constraints (a system of linear equations or linear inequalities), non-negativity constraints and a linear objective function. The general form of such an LPP can be given as follows.

General form of an LPP

    z = c1x1 + c2x2 + … + cnxn → max! (min!)

subject to (s.t.)

    a11x1 + a12x2 + … + a1nxn   R1   b1
    a21x1 + a22x2 + … + a2nxn   R2   b2
    ⋮
    am1x1 + am2x2 + … + amnxn   Rm   bm
    xj ≥ 0,  j ∈ J ⊆ {1, 2, …, n},

where Ri ∈ {≤, =, ≥}, 1 ≤ i ≤ m. An LPP considers either the maximization or the minimization of a linear function z = cT x. In each constraint, we have exactly one of the signs ≤, = or ≥, i.e. we may have both equations and inequalities as constraints, where we assume that at least one inequality occurs.
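The mixture problem (9.1)-(9.6) can also be handed to an off-the-shelf LP solver. A minimal sketch, assuming SciPy is available (the data are from Example 9.1; the solver call is standard SciPy usage, not part of the book's method):

```python
from scipy.optimize import linprog

# Example 9.1: minimize z = 25*x1 + 17*x2 + 12*x3.
c = [25, 17, 12]
A_eq = [[1, 1, 1]]; b_eq = [100]        # (9.1)  x1 + x2 + x3 = 100
A_ub = [[0, 0, 1],                       # (9.2)  x3 <= 30
        [1, 0, 1],                       # (9.3)  x1 + x3 <= 50
        [0, 1, 1]]                       # (9.4)  x2 + x3 <= 90
b_ub = [30, 50, 90]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3,    # (9.5)  non-negativity
              method="highs")
print(res.x, res.fun)   # optimal mixture (10, 60, 30) at cost 1630
```

The optimal mixture uses the minimum admissible amount of the expensive material R1 and the maximum admissible amount of the cheap material R3.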
Alternatively, we can give the following matrix representation of an LPP:

    z = cT x → max! (min!)
    s.t.  Ax R b    (9.7)
          x ≥ 0,

where R = (R1, R2, …, Rm)T, Ri ∈ {≤, =, ≥}, i = 1, 2, …, m. Here matrix A is of order m × n. The vector c = (c1, c2, …, cn)T is known as the vector of the coefficients in the objective function, and the vector b = (b1, b2, …, bm)T is the right-hand-side vector. The feasibility of a solution of an LPP is defined in the same way as for a system of linear inequalities (see Definition 8.10).

Definition 9.1 A feasible solution x = (x1, x2, …, xn)T for which the objective function has an optimum (i.e. maximum or minimum) value is called an optimal solution, and z0 = cT x is known as the optimal objective function value.

9.2 GRAPHICAL SOLUTION

Next, we give a geometric interpretation of an LPP with only two variables x1 and x2, which allows a graphical solution of the problem. For a fixed value z and c2 ≠ 0, the objective function z = c1x1 + c2x2 is a straight line of the form

    x2 = −(c1/c2)·x1 + z/c2,

i.e. for different values of z we get parallel lines, all with slope −c1/c2. The vector

    c = ⎛c1⎞
        ⎝c2⎠

points in the direction in which the objective function increases most. Thus, when maximizing the linear objective function z, we have to shift the line

    x2 = −(c1/c2)·x1 + z/c2

in the direction given by vector c, while when minimizing z, we have to shift this line in the opposite direction, given by vector −c. This is illustrated in Figure 9.1.

Based on the above considerations, an LPP of the form (9.7) with two variables can be graphically solved as follows:

(1) Determine the feasible region as the intersection of all feasible half-planes with the first quadrant. (This has already been discussed when describing the feasible region of a system of linear inequalities in Chapter 8.2.2.)
(2) Draw the objective function z = Z, where Z is constant, and shift it either in the direction given by vector c (in the case z → max!) or in the direction given by vector −c (in the case z → min!). Apply this procedure as long as the line z = constant has common points with the feasible region.

Figure 9.1 The cases z → max! and z → min!

Example 9.2 A firm can manufacture two goods G1 and G2 in addition to its current production programme, where Table 9.2 gives the available machine capacities, the processing times for one unit of good Gi and the profit per unit of each good Gi, i ∈ {1, 2}.

Table 9.2 Data for Example 9.2

    Process                   Processing times per unit    Free machine capacity
                              G1        G2                 in minutes
    Turning                    0         8                 640
    Milling                    6         6                 720
    Planing                    6         3                 600
    Profit in EUR per unit     1         2                 --

(1) First, we formulate the corresponding LPP. Let xi denote the number of produced units of good Gi, i ∈ {1, 2}. Then we obtain the following problem:

    z = x1 + 2x2 → max!
    s.t.        8x2 ≤ 640
         6x1 + 6x2 ≤ 720
         6x1 + 3x2 ≤ 600
            x1, x2 ≥ 0.

(2) We solve the above problem graphically (see Figure 9.2). We graph each of the three constraints as an equation and mark the corresponding half-planes satisfying the given inequality constraint. The arrows on the constraints indicate the feasible half-plane. The feasible region M is obtained as the intersection of these three feasible half-planes with the first quadrant. (We get the dashed area in Figure 9.2.) We graph the objective function z = Z, where Z is constant. (In Figure 9.2 the parallel lines z = 0 and z = 200 are given.) The optimal extreme point is P4, corresponding to x1* = 40 and x2* = 80. The optimal objective function value is

    z0max = x1* + 2x2* = 40 + 2 · 80 = 200.

Notice that for z > 200, the resulting straight line would no longer have common points with the feasible region M.

Figure 9.2 Graphical solution of Example 9.2.
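The graphical result of Example 9.2 can be cross-checked by enumerating candidate corner points of the feasible region. A minimal sketch in Python (the enumeration routine is our own, not part of the book's method):

```python
from itertools import combinations

# Constraints of Example 9.2 written as a*x1 + b*x2 <= r; the last two rows
# encode the non-negativity constraints -x1 <= 0 and -x2 <= 0.
rows = [(0, 8, 640), (6, 6, 720), (6, 3, 600),   # turning, milling, planing
        (-1, 0, 0), (0, -1, 0)]

def corners(rows, eps=1e-9):
    """Intersect every pair of boundary lines and keep the feasible points."""
    pts = []
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue                               # parallel boundary lines
        x1 = (r1 * b2 - r2 * b1) / det             # Cramer's rule
        x2 = (a1 * r2 - a2 * r1) / det
        if all(a * x1 + b * x2 <= r + eps for a, b, r in rows):
            pts.append((x1, x2))
    return pts

best = max(corners(rows), key=lambda p: p[0] + 2 * p[1])   # z = x1 + 2*x2
print(best, best[0] + 2 * best[1])   # → (40.0, 80.0) 200.0
```

Evaluating z at the feasible intersection points reproduces the extreme points P1, ..., P5 of Figure 9.2 and confirms the optimum z = 200 at (40, 80).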
Next, we determine the basic feasible solutions corresponding to the extreme points P1, P2, …, P5. To this end, we introduce a slack variable in each constraint, which yields the following system of constraints:

                8x2 + x3             = 640
         6x1 + 6x2       + x4        = 720
         6x1 + 3x2             + x5  = 600
         x1, x2, x3, x4, x5 ≥ 0.

Considering point P1 with x1 = x2 = 0, the basic variables are x3, x4 and x5 (and consequently the matrix of the basis vectors is formed by the corresponding column vectors, which is an identity matrix). The values of the basic variables are therefore x3 = 640, x4 = 720 and x5 = 600, and the objective function value is z1 = 0.

Considering point P2, we insert x1 = 100, x2 = 0 into the system of constraints, which yields the basic feasible solution x1 = 100, x2 = 0, x3 = 640, x4 = 120, x5 = 0 with the objective function value z2 = 100.

Considering point P3 with x1 = 80, x2 = 40, we obtain from the system of constraints the basic feasible solution x1 = 80, x2 = 40, x3 = 320, x4 = 0, x5 = 0 with the objective function value z3 = 160.

Considering point P4 with x1 = 40, x2 = 80, we get from the system of constraints the basic feasible solution x1 = 40, x2 = 80, x3 = 0, x4 = 0, x5 = 120 with the objective function value z4 = 200.

Finally, considering point P5 with x1 = 0, x2 = 80, we get the corresponding basic feasible solution x1 = 0, x2 = 80, x3 = 0, x4 = 240, x5 = 360 with the objective function value z5 = 160.

The above considerations have confirmed our graphical solution: P4 is the extreme point with the maximal objective function value z0max = z4 = 200.

Example 9.3 We solve graphically the LPP

    z = x1 + x2 → max!
    s.t.    x1 + x2 ≥ 6
          −ax1 + x2 ≥ 4
            x1, x2 ≥ 0,

where a denotes some real parameter, i.e. a ∈ R. We first draw the constraints as equations and check which of the resulting half-planes satisfy the given inequality constraints. The arrows on the constraints again indicate the feasible half-planes.
In this way, we get the feasible region M as the intersection of the half-planes (see Figure 9.3). Since the second constraint contains the parameter a, we graph in Figure 9.3 the line resulting from the second constraint as an equation for some values of a, namely a = −2, a = 0 and a = 2. (M is dashed for a = 2, while for a = −2 and a = 0 only the resulting constraint including the parameter a is dashed.)

Figure 9.3 Feasible region M for Example 9.3.

It can be seen that all the resulting lines for different values of the parameter a go through the point (0, 4), since the boundary of the second constraint is x2 = ax1 + 4, with slope a. To find the optimal solution, we now graph the objective function z = Z, where Z is constant. (In Figure 9.3 the lines z = 1 and z = 8 are shown.) The arrow on the lines z = constant indicates the direction in which the objective function value increases. (Remember that the dashed area gives the feasible region M of the problem for a = 2.) From Figure 9.3, we see that the objective function value can become arbitrarily large (independently of the value of the parameter a), i.e. an optimal solution of the maximization problem does not exist.

We continue with some properties of an LPP.

9.3 PROPERTIES OF A LINEAR PROGRAMMING PROBLEM; STANDARD FORM

Let M be the feasible region and consider the maximization of the objective function z = cT x. We already know from Chapter 8.2.2 that the feasible region M of a system of linear inequalities is either empty or a convex polyhedron (see Theorem 8.10). Since the feasibility of a solution of an LPP is independent of the objective function, the latter property also holds for an LPP. We can refine this observation: exactly one of the following three cases occurs for an LPP.

(1) The feasible region is empty: M = ∅. In this case the constraints are inconsistent, i.e. there is no feasible solution of the LPP.
(2) M is a non-empty bounded subset of the n-space Rn.
(3) M is an unbounded subset of the n-space Rn, i.e.
at least one variable may become arbitrarily large or, if some variables are not necessarily non-negative, at least one of them may become arbitrarily small.

In case (2), the feasible region M is also called a convex polytope, and there always exists a solution of the maximization problem. In case (3), there are again two possibilities:

(3a) The objective function z is bounded from above on M. Then an optimal solution of the maximization problem under consideration exists.
(3b) The objective function z is not bounded from above on M. Then there does not exist an optimal solution for the maximization problem under consideration, i.e. there does not exist a finite optimal objective function value.

Cases (3a) and (3b) are illustrated in Figure 9.4. In case (3a), there are three extreme points, and P is an optimal extreme point. In case (3b), there exist four extreme points; however, the values of the objective function can become arbitrarily large.

Figure 9.4 The cases (3a) and (3b).

THEOREM 9.1 If an LPP has an optimal solution, then there exists at least one extreme point where the objective function has an optimum value.

According to Theorem 9.1, one can restrict the search for an optimal solution to the consideration of extreme points (represented by basic feasible solutions). The following theorem characterizes the set of optimal solutions.

THEOREM 9.2 Let P1, P2, …, Pr, described by the vectors x1, x2, …, xr, be optimal extreme points. Then any convex combination

    x0 = λ1 x1 + λ2 x2 + … + λr xr,    λi ≥ 0, i = 1, 2, …, r,    λ1 + λ2 + … + λr = 1    (9.8)

is also an optimal solution.

PROOF Let x1, x2, …, xr be optimal extreme points with cT x1 = cT x2 = … = cT xr = z0max, and let x0 be defined as in (9.8). Then x0 is feasible, since the feasible region M is convex. Moreover,

    cT x0 = cT (λ1 x1 + λ2 x2 + … + λr xr)
          = λ1 (cT x1) + λ2 (cT x2) + … + λr (cT xr)
          = λ1 z0max + λ2 z0max + … + λr z0max
          = (λ1 + λ2 + …
+ λr) z0max = z0max,

i.e. the point x0 is optimal. □

In Figure 9.5, Theorem 9.2 is illustrated for a problem with two variables having two optimal extreme points P1 and P2. Any point on the line segment connecting P1 and P2 is optimal.

Figure 9.5 The case of several optimal solutions.

Standard form

In the next definition, we introduce a special form of an LPP.

Definition 9.2 An LPP of the form

    z = cT x → max!
    s.t.  Ax = b,  x ≥ 0,

where A = (AN, I) and b ≥ 0, is called the standard form of an LPP.

According to Definition 9.2, matrix A can be partitioned into some matrix AN and an identity submatrix I. Thus, the standard form of an LPP is characterized by the following properties:

(1) the LPP is a maximization problem;
(2) the constraints are given as a system of linear equations in canonical form with non-negative right-hand sides; and
(3) all variables have to be non-negative.

It is worth noting that the standard form always yields a basic feasible solution of the corresponding system of linear inequalities (when the objective function is skipped). If no artificial variables are necessary when generating the standard form, or when all artificial variables have value zero (this means that the right-hand sides of all constraints that contain an artificial variable are equal to zero), this solution is also feasible for the original problem, and it corresponds to an extreme point of the feasible region M. However, it is an infeasible solution for the original problem if at least one artificial variable has a value greater than zero. In this case, a constraint of the original problem is violated, and the basic solution does not correspond to an extreme point of the set M.

The generation of the standard form of an LPP plays an important role in finding a starting solution in the procedure that we present later for solving an LPP. Any LPP can formally be transformed into the standard form by the following rules.
We consider the possible violations of the standard form according to Definition 9.2.

(1) Some variable xj is not necessarily non-negative, i.e. xj may take arbitrary values. Then variable xj is replaced by the difference of two non-negative variables, i.e. we set

    xj = xj* − xj**    with xj* ≥ 0 and xj** ≥ 0.

Then we get:

    xj* > xj** ⟺ xj > 0
    xj* = xj** ⟺ xj = 0
    xj* < xj** ⟺ xj < 0.

(2) The given objective function has to be minimized:

    z = c1x1 + c2x2 + … + cnxn → min!

The determination of a minimum of the function z is equivalent to the determination of a maximum of the function z̄ = −z:

    z = c1x1 + c2x2 + … + cnxn → min!  ⟺  z̄ = −z = −c1x1 − c2x2 − … − cnxn → max!

(3) For some right-hand sides, we have bi < 0:

    ai1x1 + ai2x2 + … + ainxn = bi < 0.

In this case, we multiply the above constraint by −1 and obtain

    −ai1x1 − ai2x2 − … − ainxn = −bi > 0.

(4) Some constraints are inequalities:

    ai1x1 + ai2x2 + … + ainxn ≤ bi    or    ak1x1 + ak2x2 + … + aknxn ≥ bk.

Then, by introducing a slack variable ui and a surplus variable uk, respectively, we obtain an equation:

    ai1x1 + ai2x2 + … + ainxn + ui = bi    with ui ≥ 0

or

    ak1x1 + ak2x2 + … + aknxn − uk = bk    with uk ≥ 0.

(5) The given system of linear equations is not in canonical form, i.e. the constraints are given e.g. as follows:

    a11x1 + a12x2 + … + a1nxn = b1
    a21x1 + a22x2 + … + a2nxn = b2
    ⋮
    am1x1 + am2x2 + … + amnxn = bm

with bi ≥ 0, i = 1, 2, …, m; xj ≥ 0, j = 1, 2, …, n. In this situation, there is no constraint that contains an eliminated variable with coefficient +1 (provided that all column vectors of matrix A belonging to the variables x1, x2, …, xn are different from unit vectors). Then we introduce in each equation an artificial variable xAi as a basic variable and obtain:

    a11x1 + a12x2 + … + a1nxn + xA1 = b1
    a21x1 + a22x2 + … + a2nxn
    + xA2 = b2
    ⋮
    am1x1 + am2x2 + … + amnxn + xAm = bm

with bi ≥ 0, i = 1, 2, …, m; xj ≥ 0, j = 1, 2, …, n; and xAi ≥ 0, i = 1, 2, …, m. At the end, we may renumber the variables so that they are numbered successively (for the above representation by x1, x2, …, xn+m; in the following, we always assume that the problem includes n variables x1, x2, …, xn after renumbering).

In the way described above, we can formally transform any given LPP into the standard form. (It is worth noting again that a solution is only feasible for the original problem if all artificial variables have value zero.) To illustrate the above transformation of an LPP into the standard form, we consider the following example.

Example 9.4 Given is the following LPP:

    z = −x1 + 3x2 + x4 → min!
    s.t.  x1 − x2 + 3x3 − x4 ≥ 8
               x2 − 5x3 + 2x4 ≤ −4
                    x3 + x4 ≤ 3
               x2, x3, x4 ≥ 0.

First, we substitute for the variable x1 the difference of two non-negative variables x1* and x1**, i.e. x1 = x1* − x1** with x1* ≥ 0 and x1** ≥ 0. Further, we multiply the objective function z by −1 and obtain:

    z̄ = −z = x1* − x1** − 3x2 − x4 → max!
    s.t.  x1* − x1** − x2 + 3x3 − x4 ≥ 8
                    x2 − 5x3 + 2x4 ≤ −4
                         x3 + x4 ≤ 3
          x1*, x1**, x2, x3, x4 ≥ 0.

Multiplying the second constraint by −1 and introducing the slack variable x7 in the third constraint as well as the surplus variables x5 and x6 in the first and second constraints, we obtain all constraints as equations with non-negative right-hand sides:

    z̄ = −z = x1* − x1** − 3x2 − x4 → max!
    s.t.  x1* − x1** − x2 + 3x3 − x4 − x5 = 8
                    − x2 + 5x3 − 2x4 − x6 = 4
                              x3 + x4 + x7 = 3
          x1*, x1**, x2, x3, x4, x5, x6, x7 ≥ 0.

Now we can choose the variable x1* as eliminated variable in the first constraint and the variable x7 as eliminated variable in the third constraint, but there is no variable that occurs only in the second constraint with coefficient +1.
Therefore, we introduce the artificial variable xA1 in the second constraint and obtain:

    z̄ = −z = x1* − x1** − 3x2 − x4 → max!
    s.t.  − x1** − x2 + 3x3 − x4 − x5 + x1*           = 8
              − x2 + 5x3 − 2x4 − x6         + xA1     = 4
                       x3 + x4                   + x7 = 3
          x1*, x1**, x2, x3, x4, x5, x6, x7, xA1 ≥ 0.

Notice that we have written the variables in such a way that the identity submatrix (the column vectors of the variables x1*, xA1, x7) occurs at the end. So in the standard form, the problem now has n = 9 variables. A vector satisfying all constraints is a feasible solution for the original problem only if the artificial variable xA1 has value zero (otherwise the original second constraint would be violated).

9.4 SIMPLEX ALGORITHM

In this section, we always assume that the basic feasible solution resulting from the standard form of an LPP is feasible for the original problem. (In particular, we assume that no artificial variables are necessary to transform the given LPP into the standard form.) We now discuss a general method for solving linear programming problems, namely the simplex method. The basic idea of this approach is as follows. Starting with some initial extreme point (represented by a basic feasible solution resulting from the standard form of an LPP), we compute the value of the objective function and check whether it can be improved upon by moving to an adjacent extreme point (by applying the pivoting procedure). If so, we perform this move to the next extreme point and then check whether further improvement is possible by a subsequent move. When finally an extreme point is attained that does not admit any further improvement, it constitutes an optimal solution. Thus, the simplex method is an iterative procedure which ends after a finite number of pivoting steps in an optimal extreme point (provided it is possible to move in each step to an adjacent extreme point). This idea is illustrated in Figure 9.6 for the case z → max!
Starting from the extreme point P1, one can go via the points P2 and P3 to the optimal extreme point P4, or via P̄2, P̄3, P̄4 to P4. In both cases, the objective function value increases from extreme point to extreme point.

In order to apply such an approach, a criterion is required to decide whether a move to an adjacent extreme point improves the objective function value; we derive it in the following. We have already introduced the canonical form of a system of equations in Chapter 8.1.3 (see Definition 8.7). In the following, we assume that the rank of matrix A is equal to m, i.e. in the canonical form there are m basic variables among the n variables, and the number of non-basic variables is equal to n′ = n − m. Consider a feasible canonical form with the basic variables xBi and the non-basic variables xNj:

    xBi = bi* − Σ_{j=1}^{n′} aij* xNj,    i = 1, 2, …, m    (n′ = n − m).    (9.9)

Partitioning the set of variables into basic and non-basic variables, the objective function z can be written as follows:

    z = c1x1 + c2x2 + … + cnxn
      = cB1xB1 + cB2xB2 + … + cBmxBm + cN1xN1 + cN2xN2 + … + cNn′xNn′
      = Σ_{i=1}^{m} cBi xBi + Σ_{j=1}^{n′} cNj xNj.

Using equations (9.9), we can replace the basic variables and write the objective function in dependence on the non-basic variables only. We obtain

    z = Σ_{i=1}^{m} cBi ( bi* − Σ_{j=1}^{n′} aij* xNj ) + Σ_{j=1}^{n′} cNj xNj
      = Σ_{i=1}^{m} cBi bi* − Σ_{j=1}^{n′} ( Σ_{i=1}^{m} cBi aij* − cNj ) xNj.

We refer to the latter row, where the objective function is written in terms of the current non-basic variables, as the objective row. Moreover, we define the following values:

    gj = Σ_{i=1}^{m} cBi aij* − cNj    (coefficient of the variable xNj in the objective row);    (9.10)

    z0 = Σ_{i=1}^{m} cBi bi*    (value of the objective function of the basic solution).    (9.11)

Figure 9.6 Illustration of the simplex method.
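Formulas (9.10) and (9.11) amount to scalar products. A minimal sketch in Python (the function name is ours; the data are the initial tableau of Example 9.2, whose basic variables are the slacks with cB = 0):

```python
# Objective row of a basic feasible solution, following (9.10) and (9.11):
#   g_j = sum_i cB_i * a*_ij - cN_j     and     z0 = sum_i cB_i * b*_i.

def objective_row(cB, cN, A_star, b_star):
    g = [sum(cB[i] * A_star[i][j] for i in range(len(cB))) - cN[j]
         for j in range(len(cN))]
    z0 = sum(cB[i] * b_star[i] for i in range(len(cB)))
    return g, z0

# Initial tableau of Example 9.2: basic variables x3, x4, x5 (slacks, cB = 0),
# non-basic variables x1, x2 with objective coefficients cN = (1, 2).
cB, cN = [0, 0, 0], [1, 2]
A_star = [[0, 8], [6, 6], [6, 3]]
b_star = [640, 720, 600]
print(objective_row(cB, cN, A_star, b_star))   # → ([-1, -2], 0)
```

With all cB equal to zero, the objective row is simply the negated vector of the cN coefficients, and z0 = 0, matching the starting point P1 of Example 9.2.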
Table 9.3 Simplex tableau of the basic feasible solution (9.9)

    No.   nbv        xN1     xN2    ···    xNl     ···    xNn′
          bv   −1    cN1     cN2    ···    cNl     ···    cNn′      0      Q
    xB1        cB1   a*11    a*12   ···    a*1l    ···    a*1n′     b*1
    xB2        cB2   a*21    a*22   ···    a*2l    ···    a*2n′     b*2
    ⋮          ⋮      ⋮       ⋮             ⋮              ⋮         ⋮
    xBk        cBk   a*k1    a*k2   ···    a*kl    ···    a*kn′     b*k
    ⋮          ⋮      ⋮       ⋮             ⋮              ⋮         ⋮
    xBm        cBm   a*m1    a*m2   ···    a*ml    ···    a*mn′     b*m
    z                g1      g2     ···    gl      ···    gn′       z0

Concerning the calculation of the value z0 according to formula (9.11), we remember that in a basic solution all non-basic variables are equal to zero. Then we get the following representation of the objective function in dependence on the non-basic variables xNj:

    z = z0 − g1xN1 − g2xN2 − … − gn′xNn′.

Here each coefficient gj gives the change in the objective function value if the non-basic variable xNj is included in the set of basic variables (replacing some other basic variable) and its value increases by one unit. By means of the coefficients in the objective row, we can give the following optimality criterion.

THEOREM 9.3 (optimality criterion) If the inequalities gj ≥ 0, j = 1, 2, …, n′, hold for the coefficients of the non-basic variables in the objective row, the corresponding basic feasible solution is optimal.

From Theorem 9.3 we get the following obvious corollary.

COROLLARY 9.1 If there exists a column l with gl < 0 in a basic feasible solution, the value of the objective function can be increased by including the column vector belonging to the non-basic variable xNl in the set of basis vectors, i.e. the variable xNl becomes a basic variable in the subsequent basic feasible solution.

Assume that we have some current basic feasible solution (9.9). The corresponding simplex tableau is given in Table 9.3. It corresponds to the short form of the tableau of the pivoting procedure as introduced in Chapter 8.2.3 when solving systems of linear inequalities.
An additional row contains the coefficients gj together with the objective function value z0 (i.e. the objective row), calculated as given above. In the second row and the second column of the simplex tableau in Table 9.3, we write the coefficients of the corresponding variables in the objective function. Notice that the values of the objective row in the above tableau (i.e. the coefficients gj and the objective function value z0) are obtained as scalar products of the vector (−1, cB1, cB2, …, cBm)T of the second column and the vector of the corresponding column (see formulas (9.10) and (9.11)). Therefore, the numbers −1 and 0 in the second row are fixed in each tableau. In the upper left box, we write the number of the tableau.

Note that the basic feasible solution (9.9) represented by Table 9.3 is not necessarily the initial basic feasible solution resulting from the standard form (in the latter case, we would simply have aij* = aij and bi* = bi for i = 1, 2, …, m and j = 1, 2, …, n′, provided that A = (aij) is the matrix formed by the column vectors belonging to the initial non-basic variables).

If in the simplex tableau given in Table 9.3 at least one of the coefficients gj is negative, we have to perform a further pivoting step, i.e. we interchange one basic variable with a non-basic variable. This may be done as follows: first the pivot column is determined, then the pivot row, and in this way the pivot element is obtained.

Determination of the pivot column l

Choose some column l, 1 ≤ l ≤ n′, such that gl < 0. Often, a column l is used with

    gl = min{ gj | gj < 0, j = 1, 2, …, n′ }.

It is worth noting that the selection of the smallest negative coefficient gl does not guarantee that the algorithm terminates after the smallest possible number of iterations. It guarantees only the biggest increase in the objective function value when going towards the resulting subsequent extreme point.
Determination of the pivot row k

We recall that after the pivoting step, the feasibility of the basic solution must be maintained. Therefore, we choose row k with 1 ≤ k ≤ m such that

    b*k / a*kl = min{ b*i / a*il | a*il > 0, i = 1, 2, …, m }.

This is the same selection rule which we have already used for the solution of systems of linear inequalities (see Chapter 8.2.3). To determine the above quotients, we have added the last column Q in the tableau given in Table 9.3, where we enter the quotient in each row in which the corresponding element of the chosen pivot column is positive.

If column l is chosen as the pivot column, the corresponding variable xNl becomes a basic variable in the next step. We also say that xNl is the entering variable, and the column of the initial matrix A belonging to the variable xNl is entering the basis. Using row k as the pivot row, the corresponding variable xBk becomes a non-basic variable in the next step. In this case, we say that xBk is the leaving variable, and the column vector of matrix A belonging to the variable xBk is leaving the basis. The element a*kl is known as the pivot or pivot element. It has been printed in bold face in the tableau together with the leaving and the entering variables.

The following two theorems characterize situations in which either an optimal solution does not exist or an existing optimal solution is not uniquely determined.

THEOREM 9.4 If the inequality gl < 0 holds for a coefficient of a non-basic variable in the objective row and the inequalities a*il ≤ 0, i = 1, 2, …, m, hold for the coefficients in column l of the current tableau, then the LPP does not have an optimal solution.

In the latter case, the objective function value is unbounded from above, and we can stop our computations.
Although there is a negative coefficient gl, we cannot move to an adjacent extreme point with a better objective function value (there is no leaving variable which can be interchanged with the non-basic variable xNl).

THEOREM 9.5 If there exists a coefficient gl = 0 in the objective row of the tableau of an optimal basic feasible solution and the inequality a*il > 0 holds for at least one coefficient in column l, then there exists another optimal basic feasible solution in which xNl is a basic variable.

If the assumptions of Theorem 9.5 are satisfied, we can perform a further pivoting step with xNl as the entering variable, and there is at least one basic variable which can be chosen as the leaving variable. However, due to gl = 0, the objective function value does not change.

Based on the results above, we can summarize the simplex algorithm as follows.

Simplex algorithm

(1) Transform the LPP into the standard form, where the constraints are given in canonical form as follows (remember that it is assumed that no artificial variables are necessary to transform the given problem into the standard form):

    AN xN + I xB = b,    xN ≥ 0,  xB ≥ 0,  b ≥ 0,

where AN = (aij) is of order m × n′ and b = (b1, b2, …, bm)T. The initial basic feasible solution is

    x = ⎛xN⎞ = ⎛0⎞
        ⎝xB⎠   ⎝b⎠

with the objective function value z0 = cT x. Establish the corresponding initial tableau.

(2) Consider the coefficients gj, j = 1, 2, …, n′, of the non-basic variables xNj in the objective row. If gj ≥ 0 for j = 1, 2, …, n′, then the current basic feasible solution is optimal; stop. Otherwise, there is a coefficient gj < 0 in the objective row.

(3) Determine the pivot column l with gl = min{ gj | gj < 0, j = 1, 2, …, n′ }.

(4) If ail ≤ 0 for i = 1, 2, …, m, then stop. (In this case, there does not exist an optimal solution of the problem.) Otherwise, there is at least one element ail > 0.
(5) Determine the pivot row k such that

    bk / akl = min{ bi / ail | ail > 0, i = 1, 2, …, m }.

(6) Interchange the basic variable xBk of row k with the non-basic variable xNl of column l and calculate the following values of the new tableau:

    a*kl = 1 / akl;
    a*kj = akj / akl,    b*k = bk / akl,    j = 1, 2, …, n′, j ≠ l;
    a*il = −ail / akl,    i = 1, 2, …, m, i ≠ k;
    a*ij = aij − (ail / akl) · akj,    b*i = bi − (ail / akl) · bk,    i = 1, 2, …, m, i ≠ k;  j = 1, 2, …, n′, j ≠ l.

Moreover, calculate the values of the objective row in the new tableau:

    g*l = −gl / akl;
    g*j = gj − (gl / akl) · akj,    j = 1, 2, …, n′, j ≠ l;
    z*0 = z0 − (gl / akl) · bk.

Consider the tableau obtained as a new starting solution and go to step (2).

It is worth noting that the coefficients g*j and the objective function value z*0 in the objective row of the new tableau can also be obtained by means of formulas (9.10) and (9.11), respectively, using the new values a*ij and b*i.

If in each pivoting step the objective function value improves, the simplex method certainly terminates after a finite number of pivoting steps. However, it is possible that the objective function value does not change after a pivoting step. Assume that, when determining the pivot row, the minimal quotient is equal to zero. This means that some component of the current right-hand-side vector is equal to zero. (This always happens when in the previous tableau the minimal quotient was not uniquely determined.) In this case, the basic variable xBk has the value zero, and in the next pivoting step, one non-basic variable becomes a basic variable again with value zero.
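The update formulas of step (6) can be traced in code. A minimal sketch with exact fractions (the data layout and function name are ours; the data are the initial tableau of Example 9.2, with columns x1, x2 and rows x3, x4, x5):

```python
from fractions import Fraction as F

def pivot_step(A, b, g, z0, k, l):
    """One pivoting step: interchange basic row k and non-basic column l
    using the update formulas of step (6)."""
    m, n = len(A), len(A[0])
    p = A[k][l]                                   # pivot element a_kl
    Anew = [[None] * n for _ in range(m)]
    bnew = [None] * m
    for i in range(m):
        for j in range(n):
            if i == k and j == l:   Anew[i][j] = 1 / p
            elif i == k:            Anew[i][j] = A[k][j] / p
            elif j == l:            Anew[i][j] = -A[i][l] / p
            else:                   Anew[i][j] = A[i][j] - A[i][l] * A[k][j] / p
        bnew[i] = b[k] / p if i == k else b[i] - A[i][l] * b[k] / p
    gnew = [-g[l] / p if j == l else g[j] - g[l] * A[k][j] / p for j in range(n)]
    return Anew, bnew, gnew, z0 - g[l] * b[k] / p

A = [[F(0), F(8)], [F(6), F(6)], [F(6), F(3)]]
b = [F(640), F(720), F(600)]
g = [F(-1), F(-2)]
A2, b2, g2, z2 = pivot_step(A, b, g, F(0), k=0, l=1)   # pivot element 8
print([float(v) for v in b2], [float(v) for v in g2], float(z2))
# → [80.0, 240.0, 360.0] [-1.0, 0.25] 160.0
```

One step reproduces the right-hand sides (80, 240, 360), the objective row (−1, 1/4) and the value z0 = 160 of the second tableau of Example 9.5 below; a second pivot on the x4 row and x1 column reaches the optimal value 200.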
Geometrically, this means that we do not move to an adjacent extreme point in this pivoting step: only one non-basic variable is interchanged with some basic variable having value zero, and so it may happen that, after a finite number of such pivoting steps with unchanged objective function value, we come again to a basic feasible solution that has already been visited. In this case a cycle occurs, and the procedure would not stop after a finite number of steps. We only mention that there exist several rules for selecting the pivot row and column which prevent such cycling. One of these rules can be given as follows.

Smallest subscript rule. If there are several candidates for the entering and/or leaving variables, always choose the corresponding variable having the smallest subscript.

The above rule, which is due to Bland, means that among the non-basic variables with negative coefficient g_j in the objective row, the variable with the smallest subscript is taken as the entering variable, and if the same smallest quotient is then obtained for several rows, among the corresponding basic variables again the variable with the smallest subscript is chosen. However, since cycling occurs rather seldom in practice, the remote possibility of cycling is disregarded in most computer implementations of the simplex algorithm.

Example 9.5 Let us consider again the data given in Example 9.2. We can immediately give the tableau for the initial basic feasible solution:

    Tableau 1
    bv | x1   x2 |   b |   Q
    x3 |  0  [8] | 640 |  80
    x4 |  6    6 | 720 | 120
    x5 |  6    3 | 600 | 200
     g | -1   -2 |   0 |

Choosing x2 as entering variable, we get the quotients given in the last column of the above tableau, and thus we choose x3 as leaving variable. (Hereafter, the pivot element, standing in the column of the entering variable and the row of the leaving variable, is marked by square brackets.)
The number 8 becomes the pivot element, and we get the following new tableau:

    Tableau 2
    bv |  x1    x3 |   b |  Q
    x2 |   0   1/8 |  80 | --
    x4 | [6]  -3/4 | 240 | 40
    x5 |   6  -3/8 | 360 | 60
     g |  -1   1/4 | 160 |

Now the entering variable x1 is uniquely determined, and from the quotient column we find x4 as the leaving variable. Then we obtain the following new tableau:

    Tableau 3
    bv |  x4    x3 |   b
    x2 |   0   1/8 |  80
    x1 | 1/6  -1/8 |  40
    x5 |  -1   3/8 | 120
     g | 1/6   1/8 | 200

From the last tableau, we get the optimal solution which has already been found geometrically in Example 9.2: the basic variables x1, x2 and x5 are equal to the corresponding values of the right-hand side, i.e. x1 = 40, x2 = 80, x5 = 120, and the non-basic variables x3 and x4 are equal to zero. We also see that the simplex method moves in each step from an extreme point to an adjacent extreme point. Referring to Figure 9.2, the algorithm starts at point P1, moves to point P5 and finally to the optimal extreme point P4. If in the first tableau variable x1 were chosen as the entering variable (which would also be allowed since the corresponding coefficient in the objective row is negative), the resulting path would be P1, P2, P3, P4, and in this case three pivoting steps would be necessary.

Example 9.6 A firm intends to manufacture three types of products P1, P2 and P3 so that the total production cost does not exceed 32,000 EUR. There are 420 working hours available, and 30 units of raw material may be used. Additionally, the data presented in Table 9.4 are given.

    Table 9.4 Data for Example 9.6
                                          P1     P2     P3
    Selling price (EUR/piece)          1,600  3,000  5,200
    Production cost (EUR/piece)        1,000  2,000  4,000
    Required raw material (per piece)      3      2      2
    Working time (hours per piece)        20     10     20

The objective is to determine the quantities of each product so that the profit is maximized. Let x_i be the number of produced pieces of P_i, i ∈ {1, 2, 3}.
We can formulate the above problem as an LPP as follows:

    z = 6x1 + 10x2 + 12x3 → max!
    s.t.   x1 + 2x2 + 4x3 ≤ 32
          3x1 + 2x2 + 2x3 ≤ 30
          2x1 +  x2 + 2x3 ≤ 42
           x1, x2, x3 ≥ 0.

The objective function has been obtained by subtracting the production cost from the selling price and dividing the resulting profit by 100 for each product. Moreover, the constraint on the production cost has been divided by 1,000 and the constraint on the working time by 10. Introducing now in the ith constraint the slack variable x_{3+i} ≥ 0, we obtain the standard form together with the following initial tableau:

    Tableau 1
    bv | x1   x2   x3 |  b |  Q
    x4 |  1    2  [4] | 32 |  8
    x5 |  3    2    2 | 30 | 15
    x6 |  2    1    2 | 42 | 21
     g | -6  -10  -12 |  0 |

Choosing x3 as the entering variable (since it has the smallest negative coefficient in the objective row), variable x4 becomes the leaving variable due to the quotient rule. We obtain:

    Tableau 2
    bv |  x1    x2    x4 |  b |  Q
    x3 | 1/4   1/2   1/4 |  8 | 16
    x5 | 5/2   [1]  -1/2 | 14 | 14
    x6 | 3/2     0  -1/2 | 26 | --
     g |  -3    -4     3 | 96 |

Choosing now x2 as entering variable, x5 becomes the leaving variable. We obtain the tableau:

    Tableau 3
    bv |  x1    x5    x4 |   b
    x3 |  -1  -1/2   1/2 |   1
    x2 | 5/2     1  -1/2 |  14
    x6 | 3/2     0  -1/2 |  26
     g |   7     4     1 | 152

Since now all coefficients g_j are positive, we get the following optimal solution from the last tableau: x1 = 0, x2 = 14, x3 = 1, x4 = 0, x5 = 0, x6 = 26. This means that the optimal solution is to produce no piece of product P1, 14 pieces of product P2 and one piece of product P3. Taking into account that the coefficients of the objective function were divided by 100, we get a total profit of 15,200 EUR.

Example 9.7 We consider the following LPP:

    z = −2x1 − 2x2 → min!
    s.t.   x1 −  x2 ≥ −1
          −x1 + 2x2 ≤ 4
           x1, x2 ≥ 0.

First, we transform the given problem into standard form, i.e. we multiply the objective function and the first constraint by −1 and introduce the slack variables x3 and x4. We obtain:

    z̄ = 2x1 + 2x2 → max!
    s.t.  −x1 +  x2 + x3      = 1
          −x1 + 2x2      + x4 = 4
           x1, x2, x3, x4 ≥ 0.

Now we can establish the first tableau:

    Tableau 1
    bv | x1   x2 | b | Q
    x3 | -1  [1] | 1 | 1
    x4 | -1    2 | 4 | 2
     g | -2   -2 | 0 |

Since there are only negative elements in the column of variable x1, only variable x2 can be the entering variable. In this case, we get the quotients given in the last column of the above tableau, and therefore variable x3 is the leaving variable. We obtain the following tableau:

    Tableau 2
    bv |  x1   x3 | b | Q
    x2 |  -1    1 | 1 | --
    x4 | [1]   -2 | 2 |  2
     g |  -4    2 | 2 |

In this tableau, there is only one negative coefficient of a non-basic variable in the objective row, therefore variable x1 becomes the entering variable. Since there is only one positive element in the column belonging to x1, variable x4 becomes the leaving variable. We obtain the following tableau:

    Tableau 3
    bv | x4   x3 |  b
    x2 |  1   -1 |  3
    x1 |  1   -2 |  2
     g |  4   -6 | 10

Since there is only one negative coefficient of a non-basic variable in the objective row, variable x3 should be chosen as entering variable. However, there are only negative elements in the column belonging to x3. This means that we cannot perform a further pivoting step, and so there does not exist an optimal solution of the maximization problem considered (i.e. the objective function value can become arbitrarily large; see Theorem 9.4).

Example 9.8 Given is the following LPP:

    z = x1 + x2 + x3 + x4 + x5 + x6 → min!
    s.t.  2x1 + x2 + x3               ≥ 4,000
                x2      + 2x4 + x5    ≥ 5,000
                     x3      + 2x5 + 3x6 ≥ 3,000
          x1, x2, x3, x4, x5, x6 ≥ 0.

To get the standard form, we notice that in each constraint there is one variable that occurs only in this constraint. (Variable x1 occurs only in the first constraint, variable x4 only in the second and variable x6 only in the third.) Therefore, we divide the first constraint by the coefficient 2 of variable x1, the second constraint by 2 and the third constraint by 3.
Then, we introduce a surplus variable in each of the constraints, multiply the objective function by −1 and obtain the standard form. (Again the variables are written in such a way that the identity submatrix of the coefficient matrix occurs at the end.)

    z̄ = −z = −x1 − x2 − x3 − x4 − x5 − x6 → max!
    s.t.  x1 + (1/2)x2 + (1/2)x3                  − x7           = 2,000
               (1/2)x2            + x4 + (1/2)x5       − x8      = 2,500
                         (1/3)x3       + (2/3)x5 + x6       − x9 = 1,000
          x1, x2, x3, x4, x5, x6, x7, x8, x9 ≥ 0.

This yields the following initial tableau:

    Tableau 1
    bv |  x2    x3     x5 | x7   x8   x9 |      b |     Q
    x1 | 1/2   1/2      0 | -1    0    0 |  2,000 |    --
    x4 | 1/2     0    1/2 |  0   -1    0 |  2,500 | 5,000
    x6 |   0   1/3  [2/3] |  0    0   -1 |  1,000 | 1,500
     g |   0   1/6   -1/6 |  1    1    1 | -5,500 |

Choosing x5 as entering variable, we obtain the quotients given in the last column of the above tableau, and therefore x6 is chosen as leaving variable. We obtain the following tableau:

    Tableau 2
    bv |  x2    x3    x6 | x7   x8    x9 |      b
    x1 | 1/2   1/2     0 | -1    0     0 |  2,000
    x4 | 1/2  -1/4  -3/4 |  0   -1   3/4 |  1,750
    x5 |   0   1/2   3/2 |  0    0  -3/2 |  1,500
     g |   0   1/4   1/4 |  1    1   3/4 | -5,250

Now all coefficients of the non-basic variables in the objective row are non-negative, and from the latter tableau we obtain the following optimal solution: x1 = 2,000, x2 = x3 = 0, x4 = 1,750, x5 = 1,500, x6 = 0 with the optimal objective function value z̄_0^max = −5,250, which corresponds to z_0^min = 5,250 for the original minimization problem. Notice that the optimal solution is not uniquely determined: in the last tableau, there is one coefficient in the objective row equal to zero. Taking x2 as the entering variable, the quotient rule determines x4 as the leaving variable, and the following basic feasible solution with the same objective function value is obtained: x1 = 250, x2 = 3,500, x3 = x4 = 0, x5 = 1,500, x6 = 0.

9.5 TWO-PHASE SIMPLEX ALGORITHM

In this section, we discuss the case when artificial variables are necessary to transform a given problem into standard form.
In such a case, we first have to determine a basic solution feasible for the original problem, which can also be done by applying the simplex algorithm. This procedure is called phase I of the simplex algorithm. It either constructs an initial basic feasible solution or recognizes that the given LPP does not have a feasible solution at all. If a feasible starting solution has been found, phase II of the simplex algorithm starts, which corresponds to the simplex algorithm described in Chapter 9.4. The introduction of artificial variables is necessary when at least one constraint is an equation containing no eliminated variable with coefficient +1, e.g. the constraints may have the following form:

    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2
      ...
    am1 x1 + am2 x2 + ... + amn xn = bm
    xj ≥ 0, j = 1, 2, ..., n;   bi ≥ 0, i = 1, 2, ..., m.

As discussed in step (5) of generating the standard form, we introduce an artificial variable x_Ai in each equation. Additionally, we replace the original objective function z by an objective function z_I minimizing the sum of all artificial variables (or equivalently, maximizing the negative sum of all artificial variables), since it is our goal that all artificial variables get the value zero to ensure feasibility for the original problem. This gives the following linear programming problem to be considered in phase I:

    z_I = −x_A1 − x_A2 − ... − x_Am → max!
    s.t.  a11 x1 + a12 x2 + ... + a1n xn + x_A1               = b1
          a21 x1 + a22 x2 + ... + a2n xn        + x_A2        = b2
            ...                                                       (9.12)
          am1 x1 + am2 x2 + ... + amn xn               + x_Am = bm
          xj ≥ 0, j = 1, 2, ..., n,  and  x_Ai ≥ 0, i = 1, 2, ..., m.

The above problem is the standard form of an LPP with function z replaced by function z_I. The objective function z_I used in the first phase is also called the auxiliary objective function. Since the sum of the artificial variables is minimized, the smallest possible value of this sum is zero, i.e. the largest possible value of z_I is equal to zero.
It is attained when all artificial variables have the value zero, i.e. the artificial variables have become non-basic variables, or possibly some artificial variable is still a basic variable but has the value zero. When an artificial variable becomes the leaving variable (and therefore has value zero in the next tableau), it will never be the entering variable again, and therefore this variable together with the corresponding column can be dropped in the new tableau.

Assume that we have determined an optimal solution of the auxiliary problem (9.12) by the simplex method, i.e. the procedure stops with g_j ≥ 0 for all coefficients of the non-basic variables in the objective row (for the auxiliary objective function z_I). Then, at the end of phase I, the following cases are possible:

(1) We have z_I^max < 0. Then the initial problem does not have a feasible solution, i.e. M = ∅.
(2) We have z_I^max = 0. Then one of the following cases occurs:
    (a) All artificial variables are non-basic variables. Then the basic solution obtained represents a feasible canonical form for the initial problem, and we can start with phase II of the simplex algorithm described in Chapter 9.4.
    (b) Among the basic variables, there is still an artificial variable in row k: x_Bk = x_Al = 0 (degenerate case). Then we have one of the following two possibilities:
        (i) In the row belonging to the basic variable x_Al = 0, all coefficients are also equal to zero. In this case, the corresponding equation is superfluous and can be omitted.
        (ii) In the row belonging to the basic variable x_Al = 0, we have a*_kj ≠ 0 in the tableau according to Table 9.3 (with function z_I) for at least one coefficient. Then we can choose a*_kj as pivot element and replace the artificial variable x_Al by the non-basic variable x_Nj.

We illustrate the two-phase simplex algorithm by the following three examples.

Example 9.9 Given is the following LPP:

    z = x1 − 2x2 → max!
    s.t.   x1 + x2 ≤ 4
          2x1 − x2 ≥ 1
           x1, x2 ≥ 0.
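In general, the objective row for the auxiliary problem (9.12) can be written down directly: with all artificial variables basic (at cost coefficient −1 in the maximization form) and all original variables non-basic (at cost 0), the tableau entries are g_j = −Σ_i a_ij and z_0 = −Σ_i b_i. A small sketch (the function name and data layout are our own; it assumes, as (9.12) does, that every equation receives an artificial variable):

```python
def phase_one_objective(A, b):
    """Objective row for the phase-I auxiliary problem (9.12).

    With the artificial variables as starting basis, the coefficient of
    the non-basic variable x_j is g_j = -sum_i a_ij and the starting
    objective function value is z0 = -sum_i b_i."""
    m, n = len(A), len(A[0])
    g = [-sum(A[i][j] for i in range(m)) for j in range(n)]
    z0 = -sum(b)
    return g, z0
```

For instance, for the (illustrative) equality system x1 + x2 = 4, 2x1 − x2 = 1 with artificial variables in both rows, this gives g = (−3, 0) and z_0 = −5.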
We transform the given problem into standard form by introducing a slack variable (x4) in the first constraint as well as a surplus variable (x3) and an artificial variable (xA1) in the second constraint. Now we replace the objective function z by the auxiliary function z_I. Thus, in phase I of the simplex method, we consider the following LPP:

    z_I = −xA1 → max!
    s.t.   x1 + x2      + x4       = 4
          2x1 − x2 − x3      + xA1 = 1
           x1, x2, x3, x4, xA1 ≥ 0.

We start with the following tableau:

    Tableau 1
    bv  |  x1   x2   x3 |  b |   Q
    x4  |   1    1    0 |  4 |   4
    xA1 | [2]   -1   -1 |  1 | 1/2
      g |  -2    1    1 | -1 |

Choosing x1 as entering variable gives the quotients presented in the last column of the above tableau, and variable xA1 becomes the leaving variable. This leads to the following tableau:

    Tableau 2
    bv | xA1    x2    x3 |   b
    x4 | -1/2   3/2   1/2 | 7/2
    x1 |  1/2  -1/2  -1/2 | 1/2
     g |    1     0     0 |   0

Now phase I is finished; we drop variable xA1 and the corresponding column, use the original objective function and determine the coefficients g_j of the objective row. This yields the following tableau:

    Tableau 2*
    bv |   x2     x3 |   b |  Q
    x4 |  3/2  [1/2] | 7/2 |  7
    x1 | -1/2   -1/2 | 1/2 | --
     g |  3/2   -1/2 | 1/2 |

Due to the negative coefficient in the objective row, we choose x3 as the entering variable in the next step, and variable x4 becomes the leaving variable. Then we obtain the following tableau:

    Tableau 3
    bv | x2   x4 | b
    x3 |  3    2 | 7
    x1 |  1    1 | 4
     g |  3    1 | 4

Since all coefficients g_j are non-negative, the solution obtained is optimal: x1 = 4, x2 = 0. The introduced surplus variable x3 is equal to seven, while the introduced slack variable x4 is equal to zero. The optimal objective function value is z_0^max = 4. The graphical solution of this problem is illustrated in Figure 9.7. We see from Figure 9.7 that the origin of the coordinate system with variables x1 and x2 is not feasible since the second constraint is violated. This was the reason for introducing the artificial variable xA1, which initially has the value one.
After the first pivoting step, we get a feasible solution for the original problem, which corresponds to extreme point P1 in Figure 9.7. Now phase II of the simplex algorithm starts, and after the next pivoting step we reach the adjacent extreme point P2, which corresponds to an optimal solution of the considered LPP.

Let us consider another LPP, assuming that the objective function changes to

    z̃ = −x1 + 3x2 → min!

Can we easily decide whether the optimal solution for the former objective function is also optimal for the new one? We replace only the coefficients c1 and c2 of the objective function in the last tableau (again for the maximization version of the problem), recompute the coefficients g_j of the objective row and obtain the following tableau:

    Tableau 3
    bv | x2   x4 | b
    x3 |  3    2 | 7
    x1 |  1    1 | 4
     g |  4    1 | 4

Since in this case all coefficients g_j in the objective row are also non-negative, the solution x1 = 4, x2 = 0 is optimal for z̃ = −x1 + 3x2 → min as well, with the function value z̃_0^min = −4. This can also be confirmed by drawing the objective function z̃ in Figure 9.7.

Figure 9.7 Graphical solution of Example 9.9.

Example 9.10 We consider the data given in Example 9.1 and apply the two-phase simplex method. Transforming the given problem into standard form, we obtain:

    z = −25x1 − 17x2 − 12x3 → max!
    s.t.  x1 + x2 + x3 + xA1 = 100
                    x3 + x4  =  30
          x1      + x3 + x5  =  50
               x2 + x3 + x6  =  90
          x1, x2, x3, x4, x5, x6, xA1 ≥ 0.

Starting with phase I of the simplex method, we replace function z by the auxiliary objective function z_I = −xA1 → max! We obtain the following initial tableau:

    Tableau 1
    bv  |  x1   x2   x3 |    b |   Q
    xA1 |   1    1    1 |  100 | 100
    x4  |   0    0    1 |   30 |  --
    x5  |  [1]   0    1 |   50 |  50
    x6  |   0    1    1 |   90 |  --
      g |  -1   -1   -1 | -100 |

Choosing x1 as the entering variable, we get the quotients given above and select x5 as the leaving variable.
This leads to the following tableau:

    Tableau 2
    bv  | x5   x2   x3 |   b |  Q
    xA1 | -1  [1]    0 |  50 | 50
    x4  |  0    0    1 |  30 | --
    x1  |  1    0    1 |  50 | --
    x6  |  0    1    1 |  90 | 90
      g |  1   -1    0 | -50 |

Now x2 becomes the entering variable and the artificial variable xA1 is the leaving variable. We get the following tableau, where the superfluous column belonging to xA1 is dropped:

    Tableau 3
    bv | x5   x3 |  b
    x2 | -1    0 | 50
    x4 |  0    1 | 30
    x1 |  1    1 | 50
    x6 |  1    1 | 40
     g |  0    0 |  0

Now phase I is finished, and we can consider the objective function z = −25x1 − 17x2 − 12x3 → max! We recompute the coefficients in the objective row and obtain the following tableau:

    Tableau 3*
    bv | x5    x3 |      b |  Q
    x2 | -1     0 |     50 | --
    x4 |  0   [1] |     30 | 30
    x1 |  1     1 |     50 | 50
    x6 |  1     1 |     40 | 40
     g | -8   -13 | -2,100 |

We choose x3 as entering variable, and based on the quotients given in the last column, x4 is the leaving variable. After this pivoting step, we get the following tableau:

    Tableau 4
    bv |  x5   x4 |      b |  Q
    x2 |  -1    0 |     50 | --
    x3 |   0    1 |     30 | --
    x1 |   1   -1 |     20 | 20
    x6 |  [1]  -1 |     10 | 10
     g |  -8   13 | -1,710 |

We choose x5 as entering variable and x6 as leaving variable, which gives the following tableau:

    Tableau 5
    bv | x6   x4 |      b
    x2 |  1   -1 |     60
    x3 |  0    1 |     30
    x1 | -1    0 |     10
    x5 |  1   -1 |     10
     g |  8    5 | -1,630

This last tableau gives the following optimal solution: x1 = 10, x2 = 60, x3 = 30, x4 = 0, x5 = 10, x6 = 0 with the objective function value z_0^min = 1,630 for the minimization problem.

Example 9.11 Consider the following LPP:

    z = x1 + 2x2 → max!
    s.t.   x1 −  x2 ≥ 1
          5x1 − 2x2 ≤ 3
           x1, x2 ≥ 0.

Transforming the above problem into standard form, we obtain:

    z = x1 + 2x2 → max!
    s.t.   x1 −  x2 − x3 + xA1      = 1
          5x1 − 2x2           + x4 = 3
           x1, x2, x3, x4, xA1 ≥ 0.

This leads to the following starting tableau for phase I with the auxiliary objective function z_I = −xA1 → max!

    Tableau 1
    bv  |  x1   x2   x3 |  b |   Q
    xA1 |   1   -1   -1 |  1 |   1
    x4  |  [5]  -2    0 |  3 | 3/5
      g |  -1    1    1 | -1 |

We now choose variable x1 as entering variable, which gives the leaving variable x4.
This yields the following tableau:

    Tableau 2
    bv  |   x4    x2   x3 |    b
    xA1 | -1/5  -3/5   -1 |  2/5
    x1  |  1/5  -2/5    0 |  3/5
      g |  1/5   3/5    1 | -2/5

So we finish with case (1) described earlier, i.e. z_I^max < 0. Consequently, the above LPP does not have a feasible solution. In fact, in the final tableau, variable xA1 is still positive (so the original constraint x1 − x2 − x3 = 1 is violated). For the considered problem, the empty feasible region is shown in Figure 9.8. (It can be seen that there are no feasible solutions for this problem, which confirms our computations.)

Figure 9.8 Empty feasible region M for Example 9.11.

Remark For many problems, it is possible to reduce the number of artificial variables to be considered in phase I. This is important since in any pivoting step, only one artificial variable is removed from the set of basic variables. Thus, in the case of introducing m artificial variables, at least m pivoting steps are required in phase I. Let us consider the following example.

Example 9.12 Consider the following constraints of an LPP:

          2x2 −  x3 − x4 + x5 ≥ 1
    −2x1      + 2x3 − x4 + x5 ≥ 2
      x1 − 2x2      − x4 + x5 ≥ 3
      x1 +  x2 +  x3          ≥ 5
      x1, x2, x3, x4, x5 ≥ 0.

First, we introduce in each constraint a surplus variable and obtain

          2x2 −  x3 − x4 + x5 − x6 = 1
    −2x1      + 2x3 − x4 + x5 − x7 = 2
      x1 − 2x2      − x4 + x5 − x8 = 3
      x1 +  x2 +  x3           − x9 = 5
      x1, x2, x3, x4, x5, x6, x7, x8, x9 ≥ 0.

Now we still need an eliminated variable with coefficient +1 in each equality constraint. Instead of introducing an artificial variable in each of the constraints, we subtract all other constraints from the constraint with the largest right-hand side. In this example, we get the first three equivalent constraints by subtracting the corresponding constraint from the fourth constraint:

      x1 −  x2 + 2x3 + x4 − x5 − x9 + x6 = 4
     3x1 +  x2 −  x3 + x4 − x5 − x9 + x7 = 3
           3x2 +  x3 + x4 − x5 − x9 + x8 = 2
      x1 +  x2 +  x3                − x9 = 5
      x1, x2, x3, x4, x5, x6, x7, x8, x9 ≥ 0.
Now we have to introduce only one artificial variable xA1 in the last constraint in order to start with phase I of the simplex algorithm.

9.6 DUALITY; COMPLEMENTARY SLACKNESS

We consider the following LPP, denoted now as the primal problem (P):

    z = c^T x → max!
    s.t.  Ax ≤ b,  x ≥ 0,                                  (P)

where x = (x1, x2, ..., xn)^T ∈ R^n. By means of matrix A and vectors c and b, we can define a dual problem (D) as follows:

    w = b^T u → min!
    s.t.  A^T u ≥ c,  u ≥ 0,                               (D)

where u = (u_{n+1}, u_{n+2}, ..., u_{n+m})^T ∈ R^m. Thus, the dual problem (D) is obtained by the following rules:

(1) The coefficient matrix A of problem (P) is transposed.
(2) The variables of the dual problem are denoted as u_{n+1}, u_{n+2}, ..., u_{n+m}, and they have to be non-negative.
(3) The vector b of the right-hand side of problem (P) is the vector of the objective function of problem (D).
(4) The vector c of the objective function of the primal problem (P) is the vector of the right-hand side of the dual problem (D).
(5) In all constraints of the dual problem (D), we have inequalities with the relation ≥.
(6) The objective function of the dual problem (D) has to be minimized.

By 'dualizing' problem (P), there is an assignment between the constraints of the primal problem (P) and the variables of the dual problem (D), and conversely between the variables of problem (P) and the constraints of problem (D). Both problems can be described by the scheme given in Table 9.5, which has to be read row-wise for problem (P) and column-wise for problem (D).

    Table 9.5 Relationships between problems (P) and (D)
              |  x1    x2   ...   xn  |
    u_{n+1}   | a11   a12   ...  a1n  | ≤ b1
    u_{n+2}   | a21   a22   ...  a2n  | ≤ b2
      ...     | ...               ... |  ...
    u_{n+m}   | am1   am2   ...  amn  | ≤ bm
              |  ≥     ≥           ≥  |
              |  c1    c2   ...   cn  | (P): max!  (D): min!

For instance, the first constraint of problem (P) reads as

    a11 x1 + a12 x2 + ... + a1n xn ≤ b1,

while e.g. the second constraint of problem (D) reads as

    a12 u_{n+1} + a22 u_{n+2} + ... + am2 u_{n+m} ≥ c2.

Notice that the variables are numbered successively, i.e. the n variables of the primal problem (P) are indexed by 1, 2, ..., n, while the m variables of the dual problem (D) are indexed by n + 1, n + 2, ..., n + m. We get the following relationships between problems (P) and (D).

THEOREM 9.6 Let problem (D) be the dual problem of problem (P). Then the dual problem of problem (D) is problem (P).

THEOREM 9.7 Let x = (x1, x2, ..., xn)^T be an arbitrary feasible solution of problem (P) and u = (u_{n+1}, u_{n+2}, ..., u_{n+m})^T be an arbitrary feasible solution of problem (D). Then

    z0 = c^T x ≤ b^T u = w0.

From the latter theorem, it follows that the objective function value of a feasible solution of the (primal) maximization problem is always smaller than or equal to the objective function value of a feasible solution of the dual (minimization) problem.

THEOREM 9.8 If one of the problems (P) or (D) has an optimal solution, then the other one also has an optimal solution, and both optimal solutions x* and u* have the same objective function value, i.e.

    z0^max = c^T x* = b^T u* = w0^min.

From Theorem 9.8 it follows that, if we know a feasible solution for the primal problem and a feasible solution for the dual problem, and both solutions have the same objective function value, then they must be optimal for the corresponding problems. The following theorem treats the case when one of the problems (P) and (D) has feasible solutions but the other one has not.

THEOREM 9.9 If one of the problems (P) or (D) has a feasible solution but no optimal solution, then the other one does not have a feasible solution at all.

Next, we consider the following question. Can we find the optimal solution for the dual problem immediately from the final tableau for the primal problem, provided that both problems (P) and (D) have feasible solutions?
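Independently of this tableau connection, rules (1)-(6) can be carried out mechanically: the dual data are just the transposed matrix together with b and c swapped. A small sketch (the function name and data layout are our own):

```python
def dualize(A, b, c):
    """Dual (D) of the primal (P): max c^T x s.t. Ax <= b, x >= 0.

    Returns the data (A_T, rhs, obj) of (D): min obj^T u
    s.t. A_T u >= rhs, u >= 0."""
    m, n = len(A), len(A[0])
    A_T = [[A[i][j] for i in range(m)] for j in range(n)]  # transpose A
    return A_T, c, b  # right-hand side of (D) is c, objective of (D) is b
```

For the data of Example 9.2 (constraint rows (0, 8), (6, 6), (6, 3) with b = (640, 720, 600)^T and c = (1, 2)^T), this yields exactly the dual problem written out in Example 9.13: minimize 640u3 + 720u4 + 600u5 subject to 6u4 + 6u5 ≥ 1 and 8u3 + 6u4 + 3u5 ≥ 2.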
We introduce slack variables into the constraints of problem (P) and surplus variables into the constraints of problem (D), and we obtain for the constraints:

    a11 x1 + a12 x2 + ... + a1n xn + x_{n+1}                 = b1
    a21 x1 + a22 x2 + ... + a2n xn          + x_{n+2}        = b2
      ...                                                            (P*)
    am1 x1 + am2 x2 + ... + amn xn                 + x_{n+m} = bm

    −u1           + a11 u_{n+1} + a21 u_{n+2} + ... + am1 u_{n+m} = c1
         −u2      + a12 u_{n+1} + a22 u_{n+2} + ... + am2 u_{n+m} = c2
      ...                                                            (D*)
              −un + a1n u_{n+1} + a2n u_{n+2} + ... + amn u_{n+m} = cn.

Then we obtain the following relationship between the primal variables xj, j = 1, 2, ..., n + m, and the dual variables uj, j = 1, 2, ..., n + m, where the variables are numbered according to (P*) and (D*).

THEOREM 9.10 The coefficients of the non-basic variables xj in the objective row of the optimal tableau of problem (P) are equal to the optimal values of the corresponding variables uj of problem (D), and conversely: the optimal values of the basic variables of problem (P) are equal to the coefficients of the corresponding dual variables in the objective row of the optimal tableau of problem (P).

Thus, from the optimal tableau of one of the problems (P) or (D), we can determine the optimal solution of the other one.

Example 9.13 Consider the data given in Example 9.2. The optimal tableau for this problem has been determined in Example 9.5. Writing the dual problem with the constraints as in (D*), we obtain:

    w = 640u3 + 720u4 + 600u5 → min!
    s.t.  −u1       + 6u4 + 6u5 = 1
               −u2 + 8u3 + 6u4 + 3u5 = 2
          u1, u2, u3, u4, u5 ≥ 0.

Applying Theorem 9.10, we get the following. Since in the optimal solution given in Example 9.5 the variables x4 and x3 are non-basic, the corresponding variables u4 and u3 are basic variables in the optimal solution of the dual problem, and their values are equal to the coefficients of x4 and x3 in the objective row of the optimal tableau of the primal problem, i.e. u4 = 1/6 and u3 = 1/8.
Since x2, x1 and x5 are basic variables in the optimal tableau of the primal problem, the variables u2, u1 and u5 are non-basic variables in the optimal solution of the dual problem, i.e. their values are equal to zero. Accordingly, the values of the right-hand side in the optimal primal tableau correspond to the coefficients of the dual non-basic variables in the objective row, i.e. they are equal to 80, 40 and 120, respectively.

Finally, we briefly deal with a 'mixed' LPP (i.e. equations may occur as constraints, we may have inequalities both with ≤ and ≥ sign, and the variables are not necessarily non-negative). In this case we can establish a dual problem using the rules given in Table 9.6. To illustrate, let us consider the following example.

Example 9.14 Consider the LPP

    z = 2x1 + 3x2 + 4x3 → max!
    s.t.   x1 −  x2 +  x3 ≤ 20
         −3x1 +  x2 +  x3 ≥ 3
          4x1 −  x2 + 2x3 = 10
                       x3 ≤ 5
          x1 ≥ 0  (x2, x3 arbitrary).

Applying the rules given in Table 9.6, we obtain the following dual problem:

    w = 20u4 + 3u5 + 10u6 + 5u7 → min!
    s.t.   u4 − 3u5 + 4u6      ≥ 2
          −u4 +  u5 −  u6      = 3
           u4 +  u5 + 2u6 + u7 = 4
          u4 ≥ 0, u5 ≤ 0, u7 ≥ 0  (u6 arbitrary).

    Table 9.6 Rules for dualizing a problem
    Primal problem (P): maximization problem       Dual problem (D): minimization problem
    ith constraint: inequality with sign ≤    =>   variable u_{n+i} ≥ 0
    ith constraint: inequality with sign ≥    =>   variable u_{n+i} ≤ 0
    ith constraint: equation                  =>   variable u_{n+i} arbitrary
    variable xj ≥ 0                           =>   jth constraint: inequality with sign ≥
    variable xj ≤ 0                           =>   jth constraint: inequality with sign ≤
    variable xj arbitrary                     =>   jth constraint: equation

We finish this section with an economic interpretation of duality. Consider problem (P) as the problem of determining the optimal production quantities xj* of product Pj, j = 1, 2, ..., n, subject to given budget constraints, i.e. for the production there are m raw materials required, and the right-hand side value bi gives the available quantity of raw material Ri, i ∈ {1, 2, ..., m}.
The objective is to maximize the profit described by the linear function c^T x. What is the economic meaning of the corresponding dual variables of problem (D) in this case? Assume that we increase the available amount of raw material Ri by one unit (i.e. we use the right-hand side component b'_i = b_i + 1) and all other values remain constant. Moreover, assume that there is an optimal non-degenerate basic solution (i.e. all basic variables have a value greater than zero) and that the same basic solution is optimal for the modified problem (P') with b_i replaced by b'_i (i.e. no further pivoting step is required). In this case, the optimal objective function value z0^max of problem (P') is given by

    z0^max = b1 u*_{n+1} + ... + (b_i + 1) u*_{n+i} + ... + b_m u*_{n+m} = w0^min + u*_{n+i},

where u* = (u*_{n+1}, u*_{n+2}, ..., u*_{n+m})^T denotes the optimal solution of the dual problem (D). The latter equality holds due to Theorem 9.8 (i.e. z0^max = w0^min). This means that, after 'buying' an additional unit of resource Ri, the optimal objective function value increases by the value u*_{n+i}. Hence, the optimal values u*_{n+i} of the variables of the dual problem can be interpreted as shadow prices: u*_{n+i} characterizes the price that a company is willing to pay for an additional unit of raw material Ri, i = 1, 2, ..., m.

Next, we consider the primal problem (P) and the dual problem (D). We can formulate the following relationship between the optimal solutions of both problems.

THEOREM 9.11 Let x* = (x1*, x2*, ..., xn*)^T be a feasible solution of problem (P) and u* = (u*_{n+1}, u*_{n+2}, ..., u*_{n+m})^T be a feasible solution of problem (D). Then both solutions x* and u* are simultaneously optimal if and only if the following two conditions hold:

    (1) Σ_{i=1}^{m} a_ij u*_{n+i} = c_j  or  x_j* = 0,   for j = 1, 2, ..., n;
    (2) Σ_{j=1}^{n} a_ij x_j* = b_i  or  u*_{n+i} = 0,   for i = 1, 2, ..., m.

Theorem 9.11 means that the vectors x* and u* are optimal for problems (P) and (D) if and only if

(1) either the jth constraint in problem (D) is satisfied with equality or the jth variable of problem (P) is equal to zero (or both), and
(2) either the ith constraint in problem (P) is satisfied with equality or the ith variable of problem (D) is equal to zero (i.e. u*_{n+i} = 0) (or both).

Theorem 9.11 can be rewritten in the following form.

THEOREM 9.12 A feasible solution x* = (x1*, x2*, ..., xn*)^T of problem (P) is optimal if and only if there exists a vector u* = (u*_{n+1}, u*_{n+2}, ..., u*_{n+m})^T such that the following two properties are satisfied:

    (1) if x_j* > 0, then Σ_{i=1}^{m} a_ij u*_{n+i} = c_j;
                                                                       (9.13)
    (2) if Σ_{j=1}^{n} a_ij x_j* < b_i, then u*_{n+i} = 0,

provided that vector u* is feasible for problem (D), i.e.

    Σ_{i=1}^{m} a_ij u*_{n+i} ≥ c_j   for j = 1, 2, ..., n;
                                                                       (9.14)
    u*_{n+i} ≥ 0                      for i = 1, 2, ..., m.

Remark Theorem 9.12 is often useful in checking the optimality of allegedly optimal solutions when no certificate of optimality is provided. Confronted with an allegedly optimal solution x* of problem (P), we first set up the system of linear equations (9.13) and solve it for u* = (u*_{n+1}, u*_{n+2}, ..., u*_{n+m})^T. If the solution u* is uniquely determined, then vector x* is optimal if and only if vector u* is feasible, i.e. conditions (9.14) hold.

Example 9.15 Consider the LPP in Example 9.2 and the corresponding dual problem established in Example 9.13. Assume that we do not yet know an optimal solution, but we want to use complementary slackness to verify that x1* = 40, x2* = 80 is indeed an optimal solution for problem (P).
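This verification can also be sketched in code. The following script applies Theorem 9.12 to the data of Example 9.2; since the resulting system (9.13) happens to be triangular here, it is solved by simple substitution (the variable names, the hard-coded substitution steps and the data layout are ours, specific to this instance):

```python
from fractions import Fraction as F

# Data of Example 9.2: max z = x1 + 2x2 subject to
#   8x2 <= 640,  6x1 + 6x2 <= 720,  6x1 + 3x2 <= 600.
A = [[0, 8], [6, 6], [6, 3]]
b = [640, 720, 600]
c = [1, 2]
x = [40, 80]  # allegedly optimal solution to be verified

m, n = len(A), len(A[0])
slack = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
# condition (2) of (9.13): a strictly positive slack forces the dual
# variable of that constraint to zero (here: third constraint, u5 = 0)
u = [None if s == 0 else F(0) for s in slack]   # u[0..2] stands for u3..u5

# condition (1) of (9.13): since x1 > 0 and x2 > 0, both dual constraints
# must be tight; the system 6*u4 = 1, 8*u3 + 6*u4 = 2 is triangular
u[1] = F(1, 6)              # from 6*u4 = 1
u[0] = (2 - 6 * u[1]) / 8   # from 8*u3 + 6*u4 = 2, hence u3 = 1/8

# feasibility (9.14) plus equal objective values certify optimality
assert all(ui >= 0 for ui in u)
assert all(sum(A[i][j] * u[i] for i in range(m)) >= c[j] for j in range(n))
assert sum(c[j] * x[j] for j in range(n)) == sum(b[i] * u[i] for i in range(m))
# u3 = 1/8, u4 = 1/6, u5 = 0 -- the values found in Example 9.15
```

The final assertion uses Theorem 9.8: equal primal and dual objective function values (here both are 200) certify that both solutions are optimal.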
It follows from the second condition of (9.13) that $u_5^* = 0$ since, for the third constraint of the primal problem, we have

$$6x_1^* + 3x_2^* = 6 \cdot 40 + 3 \cdot 80 = 480 < 600,$$

while the other two constraints of the primal problem are satisfied with equality, i.e. $8x_2^* = 8 \cdot 80 = 640$ and $6x_1^* + 6x_2^* = 6 \cdot 40 + 6 \cdot 80 = 720$. Establishing the system of linear equations resulting from the first part of (9.13), we get

$$6u_4^* = 1, \qquad 8u_3^* + 6u_4^* = 2.$$

(Notice that $x_1^* = 40 > 0$ and $x_2^* = 80 > 0$.) The latter system has the unique solution

$$u_3^* = \frac{1}{8} \qquad \text{and} \qquad u_4^* = \frac{1}{6},$$

which satisfies conditions (9.14). Therefore, without applying the simplex algorithm, we have confirmed that $x_1^* = 40$, $x_2^* = 80$ as well as $u_3^* = 1/8$, $u_4^* = 1/6$, $u_5^* = 0$ constitute optimal solutions for the primal and dual problems, respectively.

9.7 DUAL SIMPLEX ALGORITHM

Based on the relationships between problems (P) and (D), one can give an alternative variant of the simplex algorithm known as the dual simplex algorithm. While the simplex algorithm presented in Chapter 9.4 always works with (primal) feasible solutions and stops when the current feasible solution is optimal (i.e. all coefficients $g_j$ in the objective row are greater than or equal to zero), the dual algorithm operates with infeasible solutions (i.e. there are right-hand side components smaller than zero), but the coefficients $g_j$ in the objective row are always all greater than or equal to zero (this means that the current solution satisfies the optimality criterion but it is not feasible). In the latter variant, all resulting basic solutions are feasible for the dual problem, but infeasible for the primal problem. In the dual simplex method, first the pivot row and then the pivot column are determined. Using the notation introduced in Chapter 9.4 (see the description of the simplex algorithm), the pivot row and column are determined as follows, provided that the simplex tableau of the current basic solution is as given in Table 9.3.
Determination of the pivot row k
Choose row $k$, $1 \le k \le m$, such that $b_k^* < 0$. Often, a row $k$ is used with

$$b_k^* = \min\{\, b_i^* \mid b_i^* < 0,\ i = 1, 2, \ldots, m \,\}.$$

Determination of the pivot column l
Choose column $l$, $1 \le l \le n'$, such that

$$\frac{g_l}{|a_{kl}^*|} = \min\left\{ \frac{g_j}{|a_{kj}^*|} \;\middle|\; a_{kj}^* < 0,\ j = 1, 2, \ldots, n' \right\}.$$

Notice that, contrary to the (primal) simplex algorithm given in Chapter 9.4, the pivot element in the dual simplex algorithm is always negative. The determination of the pivot column as given above guarantees that after each pivoting step, all coefficients $g_j$ in the objective row remain greater than or equal to zero. In order to find the smallest quotient that defines the pivot column, we can add a Q row under the objective row (instead of the Q column in the primal simplex algorithm). The remaining transformation formulas are the same as described in Chapter 9.4, see step (6) of the simplex algorithm. The procedure stops when all right-hand sides are non-negative. The application of the dual simplex algorithm is particularly favourable when the given problem has the form

z = cᵀx → min!
s.t. Ax ≥ b
     x ≥ 0.

We illustrate the dual simplex algorithm by the following example.

Example 9.16 Let the following LPP be given:

z = 2x1 + 3x2 + 3x3 → min!
s.t.  x1 +  x2 − 3x3 ≥ 6
     −x1 + 2x2 +  x3 ≥ 2
     4x1 + 3x2 − 2x3 ≥ 3
      x1, x2, x3 ≥ 0.

We rewrite the problem as a maximization problem, multiply all constraints by −1 and introduce a slack variable in each constraint. This gives the following LPP:

z̄ = −z = −2x1 − 3x2 − 3x3 → max!
s.t. −x1 −  x2 + 3x3 + x4 = −6
      x1 − 2x2 −  x3 + x5 = −2
     −4x1 − 3x2 + 2x3 + x6 = −3
      x1, x2, x3, x4, x5, x6 ≥ 0.
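The two pivot-selection rules above can be sketched in code. The following is a simplified illustration (not the book's full algorithm), assuming the tableau body `a`, right-hand sides `b` and objective-row coefficients `g` are stored as lists; the data are the coefficient rows of the rewritten system above:

```python
from fractions import Fraction as F

def dual_simplex_pivot(a, b, g):
    """Pivot choice of the dual simplex method: the row with the most
    negative right-hand side, then the column minimizing g_j / |a_kj|
    over the negative entries a_kj of that row (pivot element < 0)."""
    k = min((i for i in range(len(b)) if b[i] < 0), key=lambda i: b[i])
    candidates = [j for j in range(len(g)) if a[k][j] < 0]
    if not candidates:
        raise ValueError("no negative entry in pivot row: primal infeasible")
    l = min(candidates, key=lambda j: g[j] / abs(a[k][j]))
    return k, l

# Coefficient rows of the system above (rows x4, x5, x6; columns x1, x2, x3):
a = [[F(-1), F(-1), F(3)], [F(1), F(-2), F(-1)], [F(-4), F(-3), F(2)]]
b = [F(-6), F(-2), F(-3)]
g = [F(2), F(3), F(3)]
print(dual_simplex_pivot(a, b, g))  # (0, 0): x4 leaves, x1 enters, pivot -1
```

The returned pair reproduces the first pivoting step carried out in the tableaus that follow.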
Now we can start with the dual simplex algorithm (notice that the application of the dual simplex algorithm does not now require the introduction of artificial variables, as in the case of applying the primal simplex algorithm), and the initial tableau is as follows:

  bv |  x1   x2   x3 |   b
  ---+---------------+-----
  x4 |  -1   -1    3 |  -6
  x5 |   1   -2   -1 |  -2
  x6 |  -4   -3    2 |  -3
  ---+---------------+-----
   g |   2    3    3 |   0
   Q |   2    3    -  |

In the above tableau, all coefficients $g_j$ in the objective row are greater than or equal to zero (i.e. the optimality criterion is satisfied), but the current basic solution is infeasible (each basic variable is negative), which can be interpreted as saying that the current objective function value is 'better' than the optimal one. Applying the dual simplex algorithm, we first determine the pivot row. Choosing the smallest right-hand side component, x4 becomes the leaving variable, and based on the values in the quotient row, x1 becomes the entering variable, which gives the pivot element −1. This leads to the following tableau:

  bv |  x4   x2   x3 |   b
  ---+---------------+-----
  x1 |  -1    1   -3 |   6
  x5 |   1   -3    2 |  -8
  x6 |  -4    1  -10 |  21
  ---+---------------+-----
   g |   2    1    9 | -12
   Q |   -   1/3   -  |

Choosing now x5 as the leaving variable, the only possible entering variable is x2, which gives the pivot element −3. We obtain the following tableau:

  bv |   x4     x5     x3  |    b
  ---+---------------------+------
  x1 |  -2/3    1/3   -7/3 |  10/3
  x2 |  -1/3   -1/3   -2/3 |   8/3
  x6 | -11/3    1/3  -28/3 |  55/3
  ---+---------------------+------
   g |   7/3    1/3   29/3 | -44/3

Now all right-hand sides are non-negative, and so the procedure stops. The optimal solution is as follows:

$$x_1 = \frac{10}{3}, \qquad x_2 = \frac{8}{3}, \qquad x_3 = x_4 = x_5 = 0, \qquad x_6 = \frac{55}{3},$$

and the optimal function value is $\bar z_0^{\max} = -z_0^{\min} = -\frac{44}{3}$, i.e. $z_0^{\min} = \frac{44}{3}$.

EXERCISES

9.1 Solve the following problems graphically:

(a) z = x1 + x2 → min!
    s.t. x1 + x2 ≥ 2
         x1 − x2 ≥ 3
         x1 + 2x2 ≤ 6
         x1 + 4x2 ≥ 0
         x1, x2 ≥ 0

(b) z = −x1 + 4x2 → min!
    s.t. x1 − 2x2 ≤ 4
         −x1 + 2x2 ≤ 4
         x1 + 2x2 ≤ 8
         x1, x2 ≥ 0

(c) z = −x1 − 2x2 → min!
    s.t. −x1 + x2 ≤ 4
         x1 + 2x2 ≤ 11
         2x1 + x2 ≤ 10
         x1 ≤ 4
         x1, x2 ≥ 0

(d) z = x1 + x2 → max!
    s.t.
         x1 − x2 ≥ 0
         −x1 − 2x2 ≤ 4
         x1, x2 ≥ 0

9.2 A craftsman has a free capacity of 200 working hours which he wants to use for making two products A and B. Production of one piece uses up 1 hour for A and 4 hours for B. The number of pieces of product A is at most 100. The number of pieces of product B must be at least 30, but more than three times the amount of A is impossible. Finally, products A and B can be sold at prices of 20 EUR and 27 EUR, but the variable costs incurred amount to 10 EUR and 21 EUR per piece of products A and B, respectively. What output combination should the craftsman choose in order to maximize the total profit? Find the solution of the problem graphically.

9.3 Find the standard forms of the following linear programming problems:

(a) z = x1 − 2x2 + x3 → min!
    s.t. x1 + x2 + x3 ≤ 7
         3x1 − x2 + x3 ≥ −4
         x1, x2, x3 ≥ 0

(b) z = −x1 + 2x2 − 3x3 + x4 → min!
    s.t. 2x1 + 2x2 − x3 + 3x4 = 8
         x1 + 2x3 + x4 ≤ 10
         −2x1 + 2x3 − 3x4 ≥ 0
         x1 ≤ 0, x2, x3 ≥ 0, x4 ∈ R

9.4 (a) Find the optimal solution of the following problem graphically and by the simplex method:

        z = x1 + x2 → max!
        s.t. 3x1 + 2x2 ≤ 6
             x1 + 4x2 ≤ 4
             x1, x2 ≥ 0

    (b) Solve problem 9.1 (c) by the simplex method.

9.5 Solve the following problems by the simplex method:

(a) z = 7x1 + 4x2 + 5x3 + 6x4 → max!
    s.t. 20x1 + 10x2 + 12x3 + 16x4 ≤ 400
         x3 ≤ 5
         x1 + x2 + x3 + x4 ≤ 30
         x1, x2, x3, x4 ≥ 0

(b) z = 2x1 − 6x2 → min!
    s.t. 2x1 − x2 ≤ 10
         x1 − 3x2 + x3 ≤ 15
         3x1 = 12
         x1, x2, x3 ≥ 0

9.6 Solve problem 9.2 by the two-phase simplex algorithm.

9.7 Solve the following linear programming problems by the simplex method:

(a) z = x1 + x2 − x3 → min!
    s.t. 3x1 − x2 − 4x3 ≤ 0
         x1 + 2x2 ≥ 10
         x2 + 3x3 ≥ 4
         x1, x2, x3 ≥ 0

(b) z = x1 + 2x2 + x3 → max!
    s.t. x1 + x2 ≤ 10
         x2 + x3 ≤ 14
         x1 + x3 ≥ 15
         x1, x2, x3 ≥ 0

(c) z = 2x1 − x2 + x3 → max!
    s.t.
         x1 + x2 − x3 − x4 = 4
         2x1 + 3x2 − x3 − 2x4 = 9
         x2 + 2x3 ≥ 3
         x1, x2, x3, x4 ≥ 0

9.8 Formulate the dual problems of the three primal problems given in Exercises 9.5 (a), 9.7 (a) and 9.7 (c).

9.9 Find the dual problems and both the primal and dual optimal solutions:

(a) z = 2x1 − 2x2 − x3 − x4 → min!
    s.t. −x1 + 2x2 + x3 − x4 ≤ 1
         x1 − x2 + x4 ≤ 4
         x1 + 4x2 − 2x4 ≤ 1
         x1, x2, x3, x4 ≥ 0

(b) z = 3x1 − x2 + x3 + x4 → max!
    s.t. 2x1 + x2 − 3x3 − x4 ≤ 4
         −x1 + x2 + 3x3 − 2x4 ≤ 4
         −x1 + 2x3 + x4 ≤ 4
         x1, x2, x3, x4 ≥ 0

9.10 A firm produces, by means of three raw materials R1, R2 and R3, two different products P1 and P2. Profits, raw material requirements and capacities are given in the following table:

              R1   R2   R3   profit per unit Pi
    P1         2    2    4    2
    P2         4    1    0    3
    capacity  16   10   20

Use the method of complementary slackness to decide whether a production of four units of P1 and two units of P2 is optimal.

9.11 A publicity agency needs at least 100 paper strips that are 1 m long and 5 cm wide, 200 strips that are 1 m long and 3 cm wide and 400 strips that are 1 m long and 2 cm wide. The necessary strips can be cut from 1 m long and 10 cm wide strips.
(a) Find all cutting variants without waste strips.
(b) Formulate a linear programming problem where the number of 10 cm wide strips needed is minimized. Denote by xi the number of 10 cm wide strips that are cut by variant i.
(c) Solve the problem by the dual simplex algorithm.

10 Eigenvalue problems and quadratic forms

In this chapter, we deal with an application of homogeneous systems of linear equations, namely eigenvalue problems. Moreover, we introduce so-called quadratic forms and investigate their sign. Quadratic forms play an important role when determining extreme points and values of functions of several variables, which are discussed in Chapter 11.

10.1 EIGENVALUES AND EIGENVECTORS

An eigenvalue problem can be defined as follows.

Definition 10.1 Let A be an n × n matrix.
Then the scalar λ is called an eigenvalue of matrix A if there exists a non-trivial solution x ∈ Rⁿ, x ≠ 0, of the matrix equation

Ax = λx.    (10.1)

The solution xᵀ = (x1, x2, ..., xn) ≠ (0, 0, ..., 0) is called an eigenvector of A (associated with scalar λ).

Equation (10.1) is equivalent to a homogeneous linear system of equations. From the matrix equation Ax = λx = λIx we obtain

(A − λI)x = 0,    (10.2)

where I is the identity matrix of order n × n. Hence, system (10.2) includes n linear equations with n variables x1, x2, ..., xn. According to Definition 10.1, eigenvalue problems are defined only for square matrices A. In an eigenvalue problem, we look for all (real or complex) values λ such that the image of a non-zero vector x under the linear mapping described by matrix A is a multiple λx of this vector x. It is worth noting that the value zero is possible as an eigenvalue, while the zero vector is not possible as an eigenvector. Although eigenvalue problems arise mainly in the engineering sciences, they also have some importance in economics. For example, as we show in the following two chapters, they are useful for deciding whether a function has an extreme point or for solving certain types of differential and difference equations. The following theorem gives a necessary and sufficient condition for the existence of non-trivial (i.e. different from the zero vector) solutions of problem (10.1).

THEOREM 10.1 Problem (10.1) has a non-trivial solution x ≠ 0 if and only if the determinant of matrix A − λI is equal to zero, i.e.

|A − λI| = 0.

The validity of the above theorem can easily be seen by taking into account that a homogeneous system (10.2) of linear equations has non-trivial solutions if and only if the rank of the coefficient matrix of the system (i.e. the rank of matrix A − λI) is less than the number of variables.
The latter condition is equivalent to the condition that the determinant of the coefficient matrix is equal to zero, which means that matrix A − λI has no inverse matrix. We rewrite the determinant of matrix A − λI as a function of the variable λ. Letting P(λ) = |A − λI|, where A = (aij) is a matrix of order n × n, we get the following equation in λ:

$$P(\lambda) = \begin{vmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-\lambda & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{vmatrix} = 0,$$

which is known as the characteristic equation (or eigenvalue equation) of matrix A. From the definition of a determinant, it follows that P(λ) is a polynomial in λ which has degree n for a matrix A of order n × n. The zeroes of this characteristic polynomial

$$P(\lambda) = (-1)^n \lambda^n + b_{n-1}\lambda^{n-1} + \cdots + b_1\lambda + b_0$$

of degree n are the eigenvalues of matrix A. Thus, in order to find all eigenvalues of an n × n matrix A, we have to determine all zeroes of the polynomial P(λ) of degree n (i.e. all roots of the characteristic equation P(λ) = 0). Here we often have to apply numerical methods (described in Chapter 4) to find them (approximately). In general, the eigenvalues of a real matrix A can be complex numbers (and the eigenvectors may also contain complex components). The following theorem describes a case when all eigenvalues of matrix A are real.

THEOREM 10.2 If matrix A of order n × n is a symmetric matrix (i.e. A = Aᵀ), then all eigenvalues of A are real numbers.

For each eigenvalue λi, i = 1, 2, ..., n, we have to find the general solution of the homogeneous system of linear equations

(A − λi I)x = 0    (10.3)

in order to get the corresponding eigenvectors. Since the rank of matrix A − λi I is smaller than n, the solution of the corresponding system of equations is not uniquely determined, and for each eigenvalue λi, i = 1, 2, ..., n, of system (10.3) there is indeed a solution where not all variables are equal to zero.
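Both facts are easy to test numerically. The sketch below (plain Python, our illustration rather than the book's method) checks the eigenpair condition Ax = λx of Definition 10.1 and, for the 2 × 2 case where P(λ) reduces to λ² − (a₁₁ + a₂₂)λ + |A|, finds the eigenvalues in closed form:

```python
import math

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def is_eigenpair(a, lam, x):
    # Definition 10.1: x != 0 and A x = lam * x.
    return any(xi != 0 for xi in x) and matvec(a, x) == [lam * xi for xi in x]

def eigenvalues_2x2(a):
    # For n = 2, P(lam) = lam^2 - (a11 + a22)*lam + |A|; solve directly.
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues")
    r = math.sqrt(disc)
    return (tr + r) / 2, (tr - r) / 2

A = [[1, 2], [2, 1]]                  # the matrix of Example 10.1 below
lams = eigenvalues_2x2(A)
print(lams)                           # (3.0, -1.0)
print(is_eigenpair(A, 3, [1, 1]))     # True
print(is_eigenpair(A, -1, [-1, 1]))   # True
print(is_eigenpair(A, 3, [0, 0]))     # False: the zero vector never counts
```

For n > 2 one would fall back on the numerical root-finding methods mentioned above.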
We continue with some properties of the set of eigenvectors belonging to the same eigenvalue.

THEOREM 10.3 Let λ be an eigenvalue of multiplicity k (i.e. λ is k times a root of the characteristic equation P(λ) = 0) of a matrix A of order n × n. Then:
(1) The number of linearly independent eigenvectors associated with eigenvalue λ is at least one and at most k.
(2) If A is a symmetric matrix, then there exist k linearly independent eigenvectors associated with λ.

The set of all eigenvectors associated with eigenvalue λ, together with the zero vector, forms a vector space. As a consequence of Theorem 10.3, we mention that, if x¹ and x² are eigenvectors associated with eigenvalue λ, then vector sx¹ + tx² with s ∈ R and t ∈ R (provided it is different from the zero vector) is also an eigenvector associated with λ. It also follows from Theorem 10.3 that for an arbitrary square matrix A there always exists exactly one linearly independent eigenvector associated with an eigenvalue of multiplicity one.

THEOREM 10.4 Let A be a matrix of order n × n. Then:
(1) Eigenvectors associated with different eigenvalues of matrix A are linearly independent.
(2) If matrix A is symmetric, then eigenvectors associated with different eigenvalues are orthogonal.

Let us consider the following three examples to determine all eigenvalues and eigenvectors of a given matrix.

Example 10.1 We determine the eigenvalues and eigenvectors of matrix

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$

The characteristic equation is given by

$$P(\lambda) = |A - \lambda I| = \begin{vmatrix} 1-\lambda & 2 \\ 2 & 1-\lambda \end{vmatrix} = (1-\lambda)(1-\lambda) - 4 = \lambda^2 - 2\lambda - 3 = 0$$

with the solutions

$$\lambda_1 = 1 + \sqrt{1+3} = 3 \qquad \text{and} \qquad \lambda_2 = 1 - \sqrt{1+3} = -1.$$

To determine the corresponding eigenvectors, we have to solve the matrix equation (A − λI)x = 0 for λ = λ1 and λ = λ2. Thus, we get the following system for λ = λ1 = 3:

−2x1 + 2x2 = 0
 2x1 − 2x2 = 0.

The second equation may be obtained from the first equation by multiplying by −1 (i.e.
both row vectors of the left-hand side are linearly dependent) and can therefore be dropped. The coefficient matrix of the above system has rank one, and so we can choose one variable arbitrarily, say x2 = t, t ∈ R. This yields x1 = t, and each eigenvector associated with the eigenvalue λ1 = 3 can be described in the form

$$x^1 = t \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad t \in R.$$

Analogously, for λ = λ2 = −1, we get the system

2x1 + 2x2 = 0
2x1 + 2x2 = 0.

Again, we can drop one of the two identical equations and choose one variable arbitrarily, say x2 = s, s ∈ R. Then we get x1 = −s. Thus, all eigenvectors associated with λ2 = −1 can be represented as

$$x^2 = s \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad s \in R.$$

This example illustrates Theorem 10.4. For an arbitrary choice of s, t ∈ R, the eigenvectors x¹ and x² are linearly independent and orthogonal (i.e. the scalar product of vectors x¹ and x² is equal to zero).

Example 10.2 We determine all eigenvalues and eigenvectors of matrix

$$A = \begin{pmatrix} 0 & -1 & 1 \\ -7 & 0 & 5 \\ -5 & -2 & 5 \end{pmatrix}.$$

To find the eigenvalues, we consider the characteristic equation P(λ) = 0:

$$P(\lambda) = |A - \lambda I| = \begin{vmatrix} -\lambda & -1 & 1 \\ -7 & -\lambda & 5 \\ -5 & -2 & 5-\lambda \end{vmatrix} = \begin{vmatrix} -\lambda & -1 & 0 \\ -7 & -\lambda & 5-\lambda \\ -5 & -2 & 3-\lambda \end{vmatrix} = 0.$$

The above transformation is obtained by adding column 2 to column 3 in order to get the third element in row 1 equal to zero. Expanding the latter determinant by row 1, we obtain

$$P(\lambda) = -\lambda \cdot \begin{vmatrix} -\lambda & 5-\lambda \\ -2 & 3-\lambda \end{vmatrix} + 1 \cdot \begin{vmatrix} -7 & 5-\lambda \\ -5 & 3-\lambda \end{vmatrix}$$
$$= \lambda^2(3-\lambda) - 2\lambda(5-\lambda) + \left[(-7)(3-\lambda) + 5(5-\lambda)\right]$$
$$= (3\lambda^2 - \lambda^3 - 10\lambda + 2\lambda^2) + (-21 + 7\lambda + 25 - 5\lambda) = -\lambda^3 + 5\lambda^2 - 8\lambda + 4.$$

Considering now the characteristic equation P(λ) = 0, we try to find a first root and use Horner's scheme (see Chapter 3.3.3) for the computation of the function value. Checking λ1 = 1, we get

             -1    5   -8    4
  λ1 = 1          -1    4   -4
             -1    4   -4    0

i.e.
λ1 = 1 is a root of the characteristic equation P(λ) = 0. From Horner's scheme (see the last row), we obtain that dividing P(λ) by the linear factor λ − λ1 = λ − 1 gives the polynomial P2(λ) of degree two:

P2(λ) = −λ² + 4λ − 4.

Setting P2(λ) = 0, we obtain

$$\lambda_2 = 2 + \sqrt{4-4} = 2 \qquad \text{and} \qquad \lambda_3 = 2 - \sqrt{4-4} = 2,$$

i.e. λ2 = λ3 = 2 is an eigenvalue of multiplicity two. In order to determine the eigenvectors associated with λ1 = 1, we get the homogeneous system of linear equations

 −x1 −  x2 +  x3 = 0
−7x1 −  x2 + 5x3 = 0
−5x1 − 2x2 + 4x3 = 0.

Applying Gaussian elimination, we get the following tableaus:

  Row   x1   x2   x3   b   Operation
  1     -1   -1    1   0
  2     -7   -1    5   0
  3     -5   -2    4   0
  4     -1   -1    1   0   row 1
  5      0    6   -2   0   row 2 - 7 row 1
  6      0    3   -1   0   row 3 - 5 row 1
  7     -1   -1    1   0   row 4
  8      0    6   -2   0   row 5
  9      0    0    0   0   row 6 - 1/2 row 5

Since the rank of the coefficient matrix is equal to two, we can choose one variable arbitrarily. Setting x3 = 3, we get x2 = 1 and x1 = 2 (here we set x3 = 3 in order to get integer solutions for the other two variables), i.e. each eigenvector associated with λ1 = 1 has the form

$$x^1 = s \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}, \qquad s \in R.$$

Considering the eigenvalue λ2 = λ3 = 2, we get the following system of linear equations:

−2x1 −  x2 +  x3 = 0
−7x1 − 2x2 + 5x3 = 0
−5x1 − 2x2 + 3x3 = 0.

After applying Gaussian elimination or pivoting, we find that the coefficient matrix of this system of linear equations has rank two. Hence we can choose one variable arbitrarily. If we choose x3 = 1, we get x2 = −1 and x1 = 1. Therefore, each eigenvector associated with eigenvalue λ2 = λ3 = 2 has the form

$$x^2 = t \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \qquad t \in R.$$

In particular, for the latter eigenvalue of multiplicity two, there exists only one linearly independent eigenvector.

Example 10.3 We determine the eigenvalues and eigenvectors of matrix

$$A = \begin{pmatrix} -4 & -3 & 3 \\ 2 & 3 & -6 \\ -1 & -3 & 0 \end{pmatrix}.$$
The characteristic equation is given by

$$P(\lambda) = |A - \lambda I| = \begin{vmatrix} -4-\lambda & -3 & 3 \\ 2 & 3-\lambda & -6 \\ -1 & -3 & -\lambda \end{vmatrix} = \begin{vmatrix} -4-\lambda & 0 & 3 \\ 2 & -3-\lambda & -6 \\ -1 & -3-\lambda & -\lambda \end{vmatrix}$$
$$= \begin{vmatrix} -4-\lambda & 0 & 3 \\ 3 & 0 & -6+\lambda \\ -1 & -3-\lambda & -\lambda \end{vmatrix} = (-3-\lambda) \cdot \begin{vmatrix} -4-\lambda & 0 & 3 \\ 3 & 0 & -6+\lambda \\ -1 & 1 & -\lambda \end{vmatrix}$$
$$= -(-3-\lambda) \cdot \begin{vmatrix} -4-\lambda & 3 \\ 3 & -6+\lambda \end{vmatrix} = -(-3-\lambda) \cdot \left[(-4-\lambda)(-6+\lambda) - 9\right] = (3+\lambda)(-\lambda^2 + 2\lambda + 15) = 0.$$

In the transformations above, we have first added column 3 to column 2, and then we have added row 3 multiplied by −1 to row 2. In the next step, the term (−3 − λ) has been factored out from the second column, and finally the resulting determinant has been expanded by column 2. From the equation 3 + λ = 0, we obtain the first eigenvalue λ1 = −3, and from the equation −λ² + 2λ + 15 = 0, we obtain the two eigenvalues

λ2 = −3 and λ3 = 5.

Next, we determine a maximum number of linearly independent eigenvectors for each eigenvalue. We first consider the eigenvalue λ1 = λ2 = −3 of multiplicity two and obtain the following system of linear equations:

 −x1 − 3x2 + 3x3 = 0
 2x1 + 6x2 − 6x3 = 0
 −x1 − 3x2 + 3x3 = 0.

Since equations one and three coincide, and since equation two corresponds to equation one multiplied by −2, the rank of the coefficient matrix of the above system is equal to one, and therefore we can choose two variables arbitrarily. Consequently, there exist two linearly independent eigenvectors associated with this eigenvalue. Using our knowledge about the general solution of homogeneous systems of linear equations, we get linearly independent solutions by choosing for the first vector x¹: x2¹ = 1, x3¹ = 0, and for the second vector x²: x2² = 0, x3² = 1 (i.e. we have taken x2 and x3 as the variables that can be chosen arbitrarily).
Then the remaining variables are uniquely determined, and we obtain, e.g. from the first equation of the above system, x1¹ = −3 and x1² = 3. Therefore, the set of all eigenvectors associated with λ1 = λ2 = −3 is given by

$$\left\{ x \in R^3 \;\middle|\; x = s \begin{pmatrix} -3 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix},\ s \in R,\ t \in R \right\}.$$

While in Example 10.2 only one linearly independent eigenvector was associated with the eigenvalue of multiplicity two, in this example there are two linearly independent eigenvectors associated with an eigenvalue of multiplicity two. This is the maximal possible number, since we know from Theorem 10.3 that at most k linearly independent eigenvectors are associated with an eigenvalue of multiplicity k. To finish this example, we still have to find an eigenvector associated with λ3 = 5 from the solution of the following system of linear equations:

−9x1 − 3x2 + 3x3 = 0
 2x1 − 2x2 − 6x3 = 0
 −x1 − 3x2 − 5x3 = 0.

By applying Gaussian elimination or pivoting, we find that the coefficient matrix has rank two, and therefore one variable can be chosen arbitrarily. Choosing x3 = 1, we finally get x2 = −2x3 = −2 and x1 = −3x2 − 5x3 = 1. Therefore, an eigenvector associated with λ3 = 5 can be written in the form

$$x^3 = u \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \qquad u \in R.$$

In the next chapters, we show how eigenvalues can be used for solving certain optimization problems as well as differential and difference equations. The problem of determining eigenvalues and the corresponding eigenvectors often arises in economic problems dealing with processes of proportionate growth or decline. We demonstrate this by the following example.

Example 10.4 Let $x_t^M$ be the number of men and $x_t^W$ the number of women in some population at time t. The relationship between the populations at successive times t and t + 1 has been found to be as follows:

$$x_{t+1}^M = 0.8\,x_t^M + 0.4\,x_t^W$$
$$x_{t+1}^W = 0.3\,x_t^M + 0.9\,x_t^W.$$
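Simply iterating this recurrence already hints at the eigenvalue structure analysed next: from an arbitrary start, the per-period growth factor of the total population and the men/women ratio both settle to constants. A small numerical sketch (plain Python; the starting population is an arbitrary hypothetical choice):

```python
# Iterate x_{t+1} = A x_t for the population model above and watch the
# per-period growth factor and the men/women ratio stabilize.
A = [[0.8, 0.4], [0.3, 0.9]]
x = [1000.0, 500.0]                      # hypothetical starting population
factor = None
for _ in range(60):
    total = x[0] + x[1]
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    factor = (x[0] + x[1]) / total       # growth factor of this period
print(round(factor, 4), round(x[0] / x[1], 4))   # approx. 1.2 and 1.0
```

The limiting values 1.2 and 1 are exactly the dominant eigenvalue and the component ratio of its eigenvector derived in the example.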
Letting $x_t = (x_t^M, x_t^W)^T$, we obtain the following relationship between the populations $x_{t+1}$ and $x_t$ at successive times:

$$\begin{pmatrix} x_{t+1}^M \\ x_{t+1}^W \end{pmatrix} = \begin{pmatrix} 0.8 & 0.4 \\ 0.3 & 0.9 \end{pmatrix} \begin{pmatrix} x_t^M \\ x_t^W \end{pmatrix}.$$

Moreover, we assume that the ratio of men and women is constant over time, i.e.

$$\begin{pmatrix} x_{t+1}^M \\ x_{t+1}^W \end{pmatrix} = \lambda \begin{pmatrix} x_t^M \\ x_t^W \end{pmatrix}.$$

Now the question is: do there exist such values λ ∈ R₊ and vectors $x_t$ satisfying the above equations, i.e. can we find numbers λ and vectors $x_t$ such that $Ax_t = \lambda x_t$? To answer this question, we have to find the eigenvalues of matrix

$$A = \begin{pmatrix} 0.8 & 0.4 \\ 0.3 & 0.9 \end{pmatrix}$$

and then, for an appropriate eigenvalue, the corresponding eigenvector. We obtain the characteristic equation

$$P(\lambda) = |A - \lambda I| = \begin{vmatrix} 0.8-\lambda & 0.4 \\ 0.3 & 0.9-\lambda \end{vmatrix} = (0.8-\lambda)(0.9-\lambda) - 0.3 \cdot 0.4 = 0.$$

This yields

P(λ) = λ² − 1.7λ + 0.6 = 0

and the eigenvalues

$$\lambda_1 = 0.85 + \sqrt{0.7225 - 0.6} = 0.85 + 0.35 = 1.2 \qquad \text{and} \qquad \lambda_2 = 0.85 - 0.35 = 0.5.$$

Since we are looking for proportionate growth in the population, only the eigenvalue greater than one is of interest, i.e. we have to consider λ1 = 1.2 and determine the corresponding eigenvector from the system

$$-0.4\,x_t^M + 0.4\,x_t^W = 0$$
$$0.3\,x_t^M - 0.3\,x_t^W = 0.$$

The coefficient matrix has rank one, we can choose one variable arbitrarily, and we get the eigenvector

$$x_t = u \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad u \in R.$$

In order to have proportionate growth in the population, the initial population must consist of the same number of men and women, and the population then grows by 20 per cent from each time to the next. This means that if initially, at time t = 1, the population is given by the vector x₁ = (1,000; 1,000)ᵀ, then at time t = 2 the population is given by x₂ = (1,200; 1,200)ᵀ, at time t = 3 by x₃ = (1,440; 1,440)ᵀ, and so on.

10.2 QUADRATIC FORMS AND THEIR SIGN

We start with the following definition.

Definition 10.2 If A = (aij) is a matrix of order n × n and xᵀ = (x1, x2, . .
. , xn), then the term

Q(x) = xᵀAx    (10.4)

is called a quadratic form.

Writing equation (10.4) explicitly, we have:

$$Q(x) = Q(x_1, x_2, \ldots, x_n) = (x_1, x_2, \ldots, x_n) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
$$= a_{11}x_1x_1 + a_{12}x_1x_2 + \cdots + a_{1n}x_1x_n + a_{21}x_2x_1 + a_{22}x_2x_2 + \cdots + a_{2n}x_2x_n + \cdots + a_{n1}x_nx_1 + a_{n2}x_nx_2 + \cdots + a_{nn}x_nx_n = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\,x_i x_j.$$

THEOREM 10.5 Let A be a matrix of order n × n. Then the quadratic form xᵀAx can be written as a quadratic form xᵀA*x of a symmetric matrix A* of order n × n, i.e. we have xᵀAx = xᵀA*x, where

$$A^* = \frac{1}{2}\left(A + A^T\right).$$

As a result of Theorem 10.5, we can restrict ourselves to the consideration of quadratic forms of symmetric matrices, where all eigenvalues are real numbers (see Theorem 10.2). In the following definition, the sign of a quadratic form xᵀAx is considered.

Definition 10.3 A square matrix A of order n × n and its associated quadratic form Q(x) are said to be
(1) positive definite if Q(x) = xᵀAx > 0 for all xᵀ = (x1, x2, ..., xn) ≠ (0, 0, ..., 0);
(2) positive semi-definite if Q(x) = xᵀAx ≥ 0 for all x ∈ Rⁿ;
(3) negative definite if Q(x) = xᵀAx < 0 for all xᵀ = (x1, x2, ..., xn) ≠ (0, 0, ..., 0);
(4) negative semi-definite if Q(x) = xᵀAx ≤ 0 for all x ∈ Rⁿ;
(5) indefinite if they are neither positive semi-definite nor negative semi-definite.

The following example illustrates Definition 10.3.

Example 10.5 Let

$$A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$

We determine the sign of the quadratic form Q(x) = xᵀAx by applying Definition 10.3. Then

$$Q(x) = x^T A x = (x_1, x_2) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1(x_1 - x_2) + x_2(-x_1 + x_2) = x_1(x_1 - x_2) - x_2(x_1 - x_2) = (x_1 - x_2)^2 \ge 0.$$

Therefore, matrix A is positive semi-definite.
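Definition 10.2 and Theorem 10.5 are straightforward to verify numerically: for any square matrix, the quadratic form of the matrix and of its symmetric part coincide. A sketch in plain Python (the matrix `M` is our own non-symmetric example, not from the text):

```python
from fractions import Fraction as F

def quad_form(a, x):
    # Q(x) = x^T A x = sum over i, j of a_ij * x_i * x_j, equation (10.4).
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetric_part(a):
    # A* = (A + A^T) / 2 from Theorem 10.5.
    n = len(a)
    return [[(a[i][j] + a[j][i]) / F(2) for j in range(n)] for i in range(n)]

M = [[F(1), F(4)], [F(-2), F(3)]]        # hypothetical non-symmetric matrix
M_star = symmetric_part(M)               # [[1, 1], [1, 3]]
x = [F(3), F(-5)]
print(quad_form(M, x), quad_form(M_star, x))  # 54 54: the forms agree
```

Exact rationals make the equality check literal rather than approximate; the same comparison with floats would need a tolerance.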
However, matrix A is not positive definite, since there exist vectors xᵀ = (x1, x2) ≠ (0, 0) such that Q(x) = xᵀAx = 0, namely whenever x1 = x2 ≠ 0. The following theorem shows how we can decide by means of the eigenvalues of a symmetric matrix whether the matrix is positive or negative (semi-)definite.

THEOREM 10.6 Let A be a symmetric matrix of order n × n with the eigenvalues λ1, λ2, ..., λn ∈ R. Then:
(1) A is positive definite if and only if all eigenvalues of A are positive (i.e. λi > 0 for i = 1, 2, ..., n).
(2) A is positive semi-definite if and only if all eigenvalues of A are non-negative (i.e. λi ≥ 0 for i = 1, 2, ..., n).
(3) A is negative definite if and only if all eigenvalues of A are negative (i.e. λi < 0 for i = 1, 2, ..., n).
(4) A is negative semi-definite if and only if all eigenvalues of A are non-positive (i.e. λi ≤ 0 for i = 1, 2, ..., n).
(5) A is indefinite if and only if A has at least two eigenvalues with opposite signs.

Example 10.6 Let us consider matrix A with

$$A = \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix}.$$

We determine the eigenvalues of A and obtain the characteristic equation

$$P(\lambda) = |A - \lambda I| = \begin{vmatrix} -\lambda & 2 \\ 1 & -1-\lambda \end{vmatrix} = 0.$$

This yields −λ(−1 − λ) − 2 = λ² + λ − 2 = 0. The above quadratic equation has the solutions

$$\lambda_1 = -\frac{1}{2} + \sqrt{\frac{1}{4} + 2} = -\frac{1}{2} + \frac{3}{2} = 1 > 0 \qquad \text{and} \qquad \lambda_2 = -\frac{1}{2} - \sqrt{\frac{1}{4} + 2} = -\frac{1}{2} - \frac{3}{2} = -2 < 0.$$

Since both eigenvalues have opposite signs, matrix A is indefinite according to part (5) of Theorem 10.6.

Next, we present another criterion to decide whether a given matrix A is positive or negative definite. To apply this criterion, we have to investigate the sign of certain minors introduced in the following definition.

Definition 10.4 The leading principal minors of matrix A = (aij) of order n × n are the determinants

$$D_k = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}
, \qquad k = 1, 2, \ldots, n,$$

i.e. Dk is obtained from |A| by crossing out the last n − k columns and rows. By means of the leading principal minors, we can give a criterion to decide whether a matrix A is positive or negative definite.

THEOREM 10.7 Let matrix A be a symmetric matrix of order n × n with the leading principal minors Dk, k = 1, 2, ..., n. Then:
(1) A is positive definite if and only if Dk > 0 for k = 1, 2, ..., n.
(2) A is negative definite if and only if (−1)ᵏ · Dk > 0 for k = 1, 2, ..., n.

For a symmetric matrix A of order n × n, we have to check the sign of n determinants to find out whether A is positive (negative) definite. If all leading principal minors are greater than zero, matrix A is positive definite according to part (1) of Theorem 10.7. If the signs of the n leading principal minors alternate, where the first minor is negative (i.e. element a11 is smaller than zero), then matrix A is negative definite according to part (2) of Theorem 10.7. The following two examples illustrate the use of Theorem 10.7.

Example 10.7 Let

$$A = \begin{pmatrix} -3 & 2 & 0 \\ 2 & -3 & 0 \\ 0 & 0 & -5 \end{pmatrix}.$$

For matrix A, we get the leading principal minors

$$D_1 = a_{11} = -3 < 0,$$
$$D_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} -3 & 2 \\ 2 & -3 \end{vmatrix} = 9 - 4 = 5 > 0,$$
$$D_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} -3 & 2 & 0 \\ 2 & -3 & 0 \\ 0 & 0 & -5 \end{vmatrix} = -5 D_2 = -25 < 0.$$

Since the leading principal minors Dk, k ∈ {1, 2, 3}, alternate in sign, starting with a negative sign of D1, matrix A is negative definite according to part (2) of Theorem 10.7.

Example 10.8 We check for which values a ∈ R the matrix

$$A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & a & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$

is positive definite. We apply Theorem 10.7 and investigate for which values of a all leading principal minors are greater than zero.
We obtain

$$D_1 = 2 > 0, \qquad D_2 = \begin{vmatrix} 2 & 1 \\ 1 & a \end{vmatrix} = 2a - 1 > 0.$$

The latter inequality holds for a > 1/2. Next, we obtain

$$D_3 = \begin{vmatrix} 2 & 1 & 0 \\ 1 & a & 1 \\ 0 & 1 & 3 \end{vmatrix} = 6a - 2 - 3 > 0,$$

which holds for a > 5/6. For calculating D4 = |A|, we can expand |A| by row 4:

$$D_4 = |A| = -\begin{vmatrix} 2 & 1 & 0 \\ 1 & a & 0 \\ 0 & 1 & 1 \end{vmatrix} + 2 D_3 = -(2a - 1) + 12a - 10 = 10a - 9 > 0,$$

which holds for a > 9/10. Due to 1/2 < 5/6 < 9/10, and since all leading principal minors must be positive, matrix A is positive definite for a > 9/10.

THEOREM 10.8 Let A be a symmetric matrix of order n × n. Then:
(1) If matrix A is positive semi-definite, then each leading principal minor Dk, k = 1, 2, ..., n, is non-negative.
(2) If matrix A is negative semi-definite, then each leading principal minor Dk is either zero or has the same sign as (−1)ᵏ, k = 1, 2, ..., n.

It is worth noting that Theorem 10.8 gives only a necessary condition for a positive (negative) semi-definite matrix A. If all leading principal minors of a matrix A are non-negative, we cannot conclude that this matrix must be positive semi-definite. We only note that, in order to get a sufficient condition for a positive (negative) semi-definite matrix, we have to check all minors of matrix A, and they have to satisfy the conditions of Theorem 10.8 concerning their signs.

EXERCISES

10.1 Given are the matrices

$$A = \begin{pmatrix} 2 & 1 \\ 4 & -1 \end{pmatrix}; \qquad B = \begin{pmatrix} 2 & 1 \\ -2 & 4 \end{pmatrix};$$
$$C = \begin{pmatrix} -3 & -2 & 4 \\ -3 & 2 & 3 \\ -2 & -2 & 3 \end{pmatrix} \qquad \text{and} \qquad D = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & -2 \\ -1 & 0 & 2 \end{pmatrix}.$$

Find the eigenvalues and the eigenvectors of each of these matrices.

10.2 Let xt be the consumption value of a national economy in period t and yt the capital investment of the economy in this period.
For the following period t + 1, we have xt+1 = 0.7xt + 0.6yt, which describes the change in consumption from one period to the subsequent one depending on consumption and capital investment in the current period. Consumption increases by 70 per cent of consumption and by 60 per cent of the capital investment. The capital investment follows the same type of strategy: yt+1 = 0.6xt + 0.2yt. Thus, we have the system ut+1 = Aut with ut = (xt, yt)T, t = 1, 2, . . . .

(a) Find the greatest eigenvalue λ of the matrix A and the eigenvectors associated with this value.
(b) Interpret the result above with λ as a factor of proportionate growth.
(c) Let 10,000 units be the sum of consumption value and capital investment in the first period. How does it have to be split for proportionate growth? Assume you have the same growth rate λ for the following two periods; what are the values of consumption and capital investment?

10.3 Given are the matrices

$$A = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 3 & 2 & 0 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 3 & 1 \end{pmatrix}.$$

(a) Find the eigenvalues and the eigenvectors of each of these matrices.
(b) Verify that the eigenvectors of matrix A form a basis of the space R4 and that the eigenvectors of matrix B are linearly independent and orthogonal.

10.4 Verify that the quadratic form xT Bx with matrix B from Exercise 10.1 is positive definite.

10.5 Given are the matrices:

$$A = \begin{pmatrix} 3 & -1 \\ -1 & 1 \end{pmatrix}; \qquad B = \begin{pmatrix} 2 & 2 \\ 2 & 1 \end{pmatrix};$$
$$C = \frac{1}{2}\begin{pmatrix} 5 & 1 & 0 \\ 1 & 5 & 0 \\ 0 & 0 & -8 \end{pmatrix} \qquad \text{and} \qquad D = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 5 \end{pmatrix}.$$

(a) Find the eigenvalues of each of these matrices.
(b) Determine by the given criterion (see Theorem 10.7) which of the matrices A, B, C, D (and their associated quadratic forms, respectively) are positive definite and which are negative definite.
(c) Compare the results of (b) with the results of (a).

10.6 Let x = (1, 1, 0)T be an eigenvector associated with the eigenvalue λ1 = 3 of the matrix

$$A = \begin{pmatrix} a_1 & 0 & 1 \\ 2 & a_2 & 0 \\ 1 & -1 & a_3 \end{pmatrix}.$$

(a) What can you conclude about the values of a1, a2 and a3?
(b) Find another eigenvector associated with λ1.
(c) Is it possible, in addition to the answers concerning part (a), to find further conditions for a1, a2 and a3 when A is positive definite?
(d) If your answer is affirmative for part (c), do you see a way to find a1, a2 and a3 exactly when λ2 = −3 is also an eigenvalue of the matrix A?

11 Functions of several variables

In Chapter 4, we considered such situations as when a firm produces some output y = f(x) by means of a certain input x. However, in many economic applications we have to deal with situations where several variables have to be included in the mathematical model, e.g. usually the output depends on a set of different input factors. Therefore, we deal in this chapter with functions depending on more than one independent variable.

11.1 PRELIMINARIES

If we have a pair (x, y) ∈ R2 of two input factors, the output may be measured by a function value f(x, y) ∈ R depending on the two independent variables x and y. The notion of independent variables means that each variable can vary by itself without affecting the others. In a more general case, the input may be described by an n-tuple (x1, x2, . . . , xn) ∈ Rn of numbers, and the output is described by a value f(x1, x2, . . . , xn) ∈ R. In the former case, f : R2 → R is a function of two variables mapping the pair (x, y) into a real number f(x, y). In the latter case, f : Rn → R is a function of n variables, or we also say that it is a function of an n-dimensional vector x. Instead of f(x1, x2, . . . , xn), we also use the notation f(x) for the function value, where x denotes an n-vector. Similarly to the case of a function of one variable, we denote the set of points for which the function value is defined as the domain Df of function f. Formally we can summarize: a function f : Df → R, Df ⊆ Rn, depending on n variables x1, x2, . . .
, xn assigns a specified real number to each n-vector (or point) (x1, x2, . . . , xn) ∈ Df. (Since vector x can also be considered as a point, we always skip the transpose sign 'T' of a row vector in this chapter.)

Example 11.1 (Cobb–Douglas production function) A production function assigns to each n-vector x with non-negative components, where the ith component represents the amount of input i (i = 1, 2, . . . , n; such a vector is also known as an input bundle), the maximal output z = f(x1, x2, . . . , xn). Assume that an agricultural production function Q, where Q is the number of units produced, depends on three input variables K, L and R, where K stands for the capital invested, L denotes the labour input and R denotes the area of land that is used for the agricultural production. If the relationship between the independent variables K, L and R and the dependent variable (output) Q is described by the equation

$$Q(K, L, R) = A \cdot K^{a_1} \cdot L^{a_2} \cdot R^{a_3},$$

where A, a1, a2, a3 are given parameters, we say that Q = Q(K, L, R) is a Cobb–Douglas function. In this case, the output depends on the three variables K, L and R, but in general, it may depend on n variables, say K1, K2, . . . , Kn.

The graph of a function f with z = f(x, y), called a surface, consists of all points (x, y, z) for which z = f(x, y) and (x, y) belongs to the domain Df. The surface of a function f with Df ⊆ R2 is illustrated in Figure 11.1. To get an overview of a specific function, we might be interested in knowing all points (x, y) of the domain Df which have the same function value. A level curve for a function f depending on two variables is a curve in the xy plane given by z = f(x, y) = C, where C is a constant. This curve is also called an isoquant (indicating 'equal quantity'). To illustrate, consider the following example.

Figure 11.1 The surface z = f(x, y).
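As a quick numerical illustration of Example 11.1, a Cobb–Douglas function is straightforward to evaluate; the following sketch uses illustrative parameter values for A, a1, a2 and a3 (assumptions, not taken from the text):

```python
# Illustrative (assumed) Cobb-Douglas parameters; the text leaves A, a1, a2, a3 general
A, a1, a2, a3 = 2.0, 0.3, 0.5, 0.2

def Q(K, L, R):
    """Cobb-Douglas output Q(K, L, R) = A * K**a1 * L**a2 * R**a3."""
    return A * K**a1 * L**a2 * R**a3

# Output for one input bundle (capital, labour, land)
print(Q(16.0, 25.0, 9.0))
```

Since the exponents a1, a2, a3 are the parameters, scaling all inputs by a common factor t scales the output by t raised to the power a1 + a2 + a3; this homogeneity property is taken up again in Section 11.5.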
Example 11.2 Let function f : Df → R with f(x, y) = 4x² − y², Df = {(x, y) ∈ R2 | −1 ≤ x ≤ 1, −2 ≤ y ≤ 2}, be given. We determine the isoquants for this function. Putting 4x² − y² = C, where C denotes a constant, and eliminating variable y (and x, respectively), we get

$$y = \pm\sqrt{4x^2 - C} \qquad \text{or, correspondingly,} \qquad x = \pm\frac{1}{2}\sqrt{y^2 + C}.$$

For C = 0, we get the two lines y = 2x and y = −2x intersecting at point (0, 0). For C ≠ 0, each of the equations above gives two functions (namely one with sign '+' and one with sign '−') provided that C is chosen such that the term under the square root is non-negative. The level curves for some values of C are given in Figure 11.2.

Figure 11.2 Level curves for function f with f(x, y) = 4x² − y².

Often, one wishes to determine to which value a function tends provided that the independent variables tend to a certain point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$. In the following, we introduce the limit of a function of n independent variables as a generalization of the limit of a function of one variable introduced in Chapter 4.1.1.

Definition 11.1 The real number L is called the limit of the function f : Df → R, Df ⊆ Rn, as x = (x1, x2, . . . , xn) tends to point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ if for any real number ε > 0 there exists a real number δ = δ(ε) > 0 such that

$$|f(x) - L| < \varepsilon, \qquad \text{provided that} \quad |x_1 - x_1^0| < \delta, \; |x_2 - x_2^0| < \delta, \; \ldots, \; |x_n - x_n^0| < \delta.$$

We can also write

$$\lim_{x \to x^0} f(x) = L.$$

The limit exists only if L is a finite number. The left-hand side in the above notation means that the variables xi tend simultaneously and independently to xi0. In the case of a function of one variable, there were only two possibilities in which a variable x might tend to a value x0, namely either from the right-hand side or from the left-hand side. Definition 11.1 requires that the limit has to exist for any possible path on which x approaches x0.
A specific path in the case of a function f depending on two variables is e.g. considered if we first determine the limit as x tends to x0, and then the limit as y tends to y0, i.e. we determine

$$L_1 = \lim_{y \to y_0} \lim_{x \to x_0} f(x, y).$$

Accordingly, we can interchange the order in which both limits above are determined. These specific limits are called iterated limits. However, from the existence and equality of both iterated limits one cannot conclude that the limit of function f exists, as the following example shows.

Example 11.3 Let function f : Df → R with

$$f(x, y) = \left( \frac{2x - 2y}{x + y} \right)^2, \qquad D_f = \{(x, y) \in \mathbb{R}^2 \mid y \neq -x\},$$

and (x0, y0) = (0, 0) be given. For the iterated limits we get

$$L_1 = \lim_{y \to 0} \lim_{x \to 0} \left( \frac{2x - 2y}{x + y} \right)^2 = \lim_{y \to 0} (-2)^2 = \lim_{y \to 0} 4 = 4;$$
$$L_2 = \lim_{x \to 0} \lim_{y \to 0} \left( \frac{2x - 2y}{x + y} \right)^2 = \lim_{x \to 0} 2^2 = \lim_{x \to 0} 4 = 4.$$

Although the iterated limits exist and are equal, the limit of function f as (x, y) tends to (0, 0) does not exist since, e.g. along the path y = x, we get

$$f(x, y) = f(x, x) = \left( \frac{0}{2x} \right)^2 = 0 \qquad \text{for } x \neq 0$$

and consequently

$$L_3 = \lim_{x \to 0} f(x, x) = 0,$$

which is different from L1 and L2.

Next, we deal with the generalization of the concept of a continuous function to the case of functions of more than one variable. Roughly speaking, a function f of n variables is continuous if small changes in the independent variables produce small changes in the function value. Formally we can introduce the following definition.

Definition 11.2 A function f : Df → R, Df ⊆ Rn, is said to be continuous at point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ ∈ Df if the limit of function f as x tends to x0 exists and if this limit coincides with the function value f(x0), i.e.

$$\lim_{x \to x^0} f(x_1, x_2, \ldots, x_n) = f(x_1^0, x_2^0, \ldots, x_n^0).$$

If function f is continuous at all points x0 ∈ Df, then we also say that f is continuous.
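The failure of the limit to exist in Example 11.3, despite equal iterated limits, can also be observed numerically; a minimal sketch (the sample points are arbitrary choices):

```python
def f(x, y):
    # f(x, y) = ((2x - 2y) / (x + y))**2, defined for y != -x (Example 11.3)
    return ((2*x - 2*y) / (x + y))**2

# Approach (0, 0) with x essentially 0 first, then y -> 0: values tend to 4
along_axis = [f(1e-8, y) for y in (0.1, 0.01, 0.001)]

# Approach (0, 0) along the path y = x: values are identically 0
along_diagonal = [f(t, t) for t in (0.1, 0.01, 0.001)]

print(along_axis, along_diagonal)
```

The two sequences approach different values (4 versus 0), so no single limit at (0, 0) can exist, exactly as argued in the example.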
Analogously to functions of one variable (see Theorems 4.3 and 4.4 in Chapter 4.1.2), we get the following statement for functions of more than one variable: Any function of n > 1 independent variables that can be constructed from continuous functions by combining the operations of addition, subtraction, multiplication, division and functional composition is continuous wherever it is defined.

11.2 PARTIAL DERIVATIVES; GRADIENT

In Chapter 4, we considered the first-order and higher-order derivatives of a function of one independent variable. In the following, we discuss how to find derivatives of a function depending on more than one variable. First, we assume that function f with z = f(x, y) depends on two independent variables x and y, and we introduce the notion of a partial derivative with respect to one of the two variables.

Before proceeding, we introduce the notion of an ε-neighbourhood Uε(x0) of a point x0 ∈ Df ⊆ Rn. We denote by Uε(x0) with ε > 0 the set of all points x ∈ Rn such that the Euclidean distance between the vectors x and x0 is smaller than ε, i.e.

$$U_\varepsilon(x^0) = \left\{ x \in \mathbb{R}^n \;\middle|\; \sqrt{\sum_{i=1}^{n} (x_i - x_i^0)^2} < \varepsilon \right\}.$$

In the case of n = 2 variables, the above definition of an ε-neighbourhood of point (x0, y0) includes all points that are within a circle (but not on the boundary of the circle) with the centre at point (x0, y0) and the radius ε.

Definition 11.3 Let function f : Df → R, Df ⊆ R2, be defined in some neighbourhood Uε(x0, y0) of the point (x0, y0) ∈ Df. Then function f is said to be partially differentiable with respect to x at point (x0, y0) if the limit

$$\lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x}$$

exists, and f is said to be partially differentiable with respect to y at the point (x0, y0) if the limit

$$\lim_{\Delta y \to 0} \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0)}{\Delta y}$$

exists. The above limits are denoted as fx(x0, y0) resp.
fy(x0, y0) and are called partial derivatives (of the first order) of function f with respect to x and with respect to y at point (x0, y0). We also write

$$\frac{\partial f}{\partial x}(x_0, y_0) \qquad \text{and} \qquad \frac{\partial f}{\partial y}(x_0, y_0)$$

for the partial derivatives with respect to x and y at point (x0, y0). If for any point (x, y) ∈ Df the partial derivatives fx(x, y) and fy(x, y) exist, we get functions fx and fy by assigning to each point (x, y) the values fx(x, y) and fy(x, y), respectively.

Finding the partial derivatives of function f with z = f(x, y) means:

(1) to find fx(x, y), one has to differentiate function f with respect to variable x while treating variable y as a constant, and
(2) to find fy(x, y), one has to differentiate function f with respect to variable y while treating variable x as a constant.

The partial derivatives at some point (x0, y0) correspond again to slopes of certain straight lines. We can give the following interpretations.

(1) Geometric interpretation of fx(x0, y0). The slope of the tangent line fT(x) to the curve of intersection of the surface z = f(x, y) and the plane y = y0 at the point (x0, y0, z0) on the surface is equal to fx(x0, y0).
(2) Geometric interpretation of fy(x0, y0). The slope of the tangent line fT(y) to the curve of intersection of the surface z = f(x, y) and the plane x = x0 at the point (x0, y0, z0) on the surface is equal to fy(x0, y0).

The partial derivatives are illustrated in Figure 11.3. The slopes of the lines fT(x) and fT(y) are given by the partial derivatives of function f with z = f(x, y) with respect to the variables x and y, respectively, at point (x0, y0). Both tangent lines span a plane fT(x, y), which is called the tangent plane to function z = f(x, y) at point (x0, y0), and the equation of this tangent plane is given by

$$f_T(x, y) = f(x_0, y_0) + f_x(x_0, y_0) \cdot (x - x_0) + f_y(x_0, y_0) \cdot (y - y_0).$$
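The tangent plane formula can be checked numerically; a small sketch for an illustrative function f(x, y) = x² + 3xy (an assumption chosen for simplicity, not from the text):

```python
def f(x, y):
    # Illustrative function (an assumption, not an example from the text)
    return x**2 + 3*x*y

def fx(x, y):  # partial derivative with respect to x (y treated as a constant)
    return 2*x + 3*y

def fy(x, y):  # partial derivative with respect to y (x treated as a constant)
    return 3*x

x0, y0 = 1.0, 2.0

def f_T(x, y):
    # Tangent plane: fT(x, y) = f(x0, y0) + fx(x0, y0)*(x - x0) + fy(x0, y0)*(y - y0)
    return f(x0, y0) + fx(x0, y0)*(x - x0) + fy(x0, y0)*(y - y0)

# Near (x0, y0) the tangent plane approximates f closely
print(f(1.01, 2.02), f_T(1.01, 2.02))
```

At (x0, y0) itself the plane reproduces the function value exactly; a short step away the discrepancy is of second order in the step size.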
Figure 11.3 Partial derivatives of function f with z = f(x, y) and the tangent plane fT(x, y).

Example 11.4 Assume that the total cost C of a firm manufacturing two products A and B depends on the produced quantities. Let x be the number of produced units of product A, y be the number of produced units of product B, and let the total cost C(x, y) be given as

$$C(x, y) = 200 + 22x + 16y^{3/2}, \qquad D_C = \mathbb{R}^2_+.$$

The first term of function C represents the fixed cost and the other two terms represent the variable cost in dependence on the values x and y. We determine the partial derivatives with respect to x and y and obtain

$$C_x(x, y) = 22 \qquad \text{and} \qquad C_y(x, y) = \frac{3}{2} \cdot 16 \cdot \sqrt{y} = 24\sqrt{y}.$$

Cx(x, y) is known as the marginal cost of product A, and Cy(x, y) as the marginal cost of product B. The partial derivative Cx(x, y) = 22 can be interpreted such that, when the quantity of product B does not change, an increase in the production of product A by one unit leads to an approximate increase of 22 units in total cost (independent of the concrete value of variable x). When keeping the production of A constant, the marginal cost of product B depends on the particular value of variable y. For instance, if y = 16, then an increase in the production of B by one unit causes approximately an increase of Cy(x, 16) = 24 · √16 = 96 units in total cost, whereas for y = 100 the increase in total cost is approximately equal to Cy(x, 100) = 24 · √100 = 240 units.

The previous considerations can be generalized to functions of more than two variables. In the following, we consider a function f with z = f(x1, x2, . . . , xn) depending on the n independent variables x1, x2, . . . , xn.

Definition 11.4 Let function f : Df → R, Df ⊆ Rn, be defined in some neighbourhood Uε(x0) of point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ ∈ Df.
Then function f is said to be partially differentiable with respect to xi at point x0 if the limit

$$\lim_{\Delta x_i \to 0} \frac{f(x_1^0, \ldots, x_{i-1}^0, x_i^0 + \Delta x_i, x_{i+1}^0, \ldots, x_n^0) - f(x_1^0, x_2^0, \ldots, x_n^0)}{\Delta x_i}$$

exists. The above limit is denoted by fxi(x0) and is called the partial derivative (of the first order) of function f with respect to xi at point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$. We also write

$$\frac{\partial f}{\partial x_i}(x_1^0, x_2^0, \ldots, x_n^0)$$

for the partial derivative of function f with respect to xi at point x0.

The partial derivative fxi(x0) gives approximately the change in the function value that results from an increase of variable xi by one unit (i.e. from xi0 to xi0 + 1) while holding all other variables constant. That is, partial differentiation of a function of n variables means that we must keep n − 1 independent variables constant while allowing only one variable to change. Since we already know from Chapter 4 how to handle constants in the case of a function of one independent variable, it is not a problem to perform the partial differentiation.

If function f is partially differentiable with respect to xi at any point x ∈ Df, then function f is said to be partially differentiable with respect to xi. If function f is partially differentiable with respect to all variables, f is said to be partially differentiable. If all partial derivatives are continuous, we say that function f is continuously partially differentiable.

Example 11.5 We consider the cost function C : R3+ → R of a firm producing three products given by

$$C(x_1, x_2, x_3) = 300 + 2x_1 + 0.5 x_2^2 + \sqrt{x_1 x_2 + x_3},$$

where xi denotes the produced quantity of product i ∈ {1, 2, 3}. For the partial derivatives, we obtain

$$C_{x_1}(x_1, x_2, x_3) = 2 + \frac{x_2}{2\sqrt{x_1 x_2 + x_3}}; \qquad C_{x_2}(x_1, x_2, x_3) = x_2 + \frac{x_1}{2\sqrt{x_1 x_2 + x_3}}; \qquad C_{x_3}(x_1, x_2, x_3) = \frac{1}{2\sqrt{x_1 x_2 + x_3}}.$$

Next, we introduce higher-order partial derivatives.
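Before doing so, the first-order partial derivatives of Example 11.5 can be cross-checked with central finite differences; a minimal sketch (the evaluation point is an arbitrary choice):

```python
import math

def C(x1, x2, x3):
    # Cost function of Example 11.5
    return 300 + 2*x1 + 0.5*x2**2 + math.sqrt(x1*x2 + x3)

def C_x1(x1, x2, x3):
    # Analytical partial derivative with respect to x1 from the example
    return 2 + x2 / (2*math.sqrt(x1*x2 + x3))

def num_dx1(x1, x2, x3, h=1e-6):
    # Central finite difference: only x1 is varied, x2 and x3 are held constant
    return (C(x1 + h, x2, x3) - C(x1 - h, x2, x3)) / (2*h)

x = (4.0, 3.0, 2.0)  # arbitrary point in the domain
print(C_x1(*x), num_dx1(*x))
```

The two values agree to many decimal places, which is exactly what partial differentiation (vary one variable, freeze the rest) predicts.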
Note that the first-order partial derivatives of a function f : Df → R, Df ⊆ Rn, are in general again functions of the n variables x1, x2, . . . , xn. Thus, we can again determine the partial derivatives of these functions describing the first-order partial derivatives and so on. To illustrate this, consider a function f with z = f(x, y) depending on n = 2 variables. We obtain:

$$f_x(x, y) = \frac{\partial f(x, y)}{\partial x} = g(x, y); \qquad f_y(x, y) = \frac{\partial f(x, y)}{\partial y} = h(x, y);$$
$$f_{xx}(x, y) = \frac{\partial^2 f(x, y)}{\partial x^2} = \frac{\partial g(x, y)}{\partial x} = g_x(x, y); \qquad f_{yy}(x, y) = \frac{\partial^2 f(x, y)}{\partial y^2} = \frac{\partial h(x, y)}{\partial y} = h_y(x, y);$$
$$f_{xy}(x, y) = \frac{\partial^2 f(x, y)}{\partial y \partial x} = \frac{\partial g(x, y)}{\partial y} = g_y(x, y); \qquad f_{yx}(x, y) = \frac{\partial^2 f(x, y)}{\partial x \partial y} = \frac{\partial h(x, y)}{\partial x} = h_x(x, y).$$

Here the notation fxy(x, y) means that we first differentiate function f with respect to variable x and then with respect to variable y, i.e.

$$f_{xy}(x, y) = \frac{\partial}{\partial y} \left( \frac{\partial f(x, y)}{\partial x} \right) = \frac{\partial^2 f(x, y)}{\partial y \partial x}.$$

Analogously, we can consider mth-order partial derivatives that include m successive partial differentiations with respect to certain variables. If all partial derivatives of mth order are continuous, we say that function f is m times continuously partially differentiable.

Example 11.6 Let function f : R2 → [−1, 1] with z = f(x, y) = sin(3x − y) be a function of two independent variables. We determine all second-order partial derivatives of function f and obtain:

$$f_x(x, y) = 3\cos(3x - y); \qquad f_{xx}(x, y) = -9\sin(3x - y); \qquad f_{xy}(x, y) = 3\sin(3x - y);$$
$$f_y(x, y) = -\cos(3x - y); \qquad f_{yx}(x, y) = 3\sin(3x - y); \qquad f_{yy}(x, y) = -\sin(3x - y).$$

We observe that in Example 11.6 the mixed second-order partial derivatives fxy(x, y) and fyx(x, y) are identical. We now ask for a condition under which the latter statement is necessarily true. This question is answered by the following theorem.
THEOREM 11.1 (Young's theorem) Let two mth-order partial derivatives of function f : Df → R, Df ⊆ Rn, involve the same number of differentiations with respect to each of the variables and suppose that they are both continuous. Then the two partial derivatives are necessarily equal at all points x ∈ Df.

If in Theorem 11.1 only a particular point x0 ∈ Df is considered, it suffices to assume that both mth-order partial derivatives are continuous only in some neighbourhood Uε(x0) with ε > 0 to ensure equality of both mth-order partial derivatives at point x0. For the sake of simplicity of presentation, we suppose in what follows that the corresponding assumptions on partial differentiability or continuous partial differentiability of a function f hold for all points of the domain Df.

A special case of Theorem 11.1 is obtained for n = m = 2. Then Young's theorem may be interpreted as follows: if the functions fxy and fyx are continuous, then fxy(x, y) = fyx(x, y) holds at all points (x, y) ∈ Df.

Definition 11.5 Let function f : Df → R, Df ⊆ Rn, be partially differentiable at point x0 ∈ Df. Then the vector

$$\operatorname{grad} f(x^0) = \begin{pmatrix} f_{x_1}(x^0) \\ f_{x_2}(x^0) \\ \vdots \\ f_{x_n}(x^0) \end{pmatrix}$$

is called the gradient of function f at point x0 ∈ Df.

The vector grad f(x0) has an important geometric interpretation. It gives the direction in which function f at point x0 ∈ Df increases most.

Example 11.7 We consider function f : R2 → R with

$$f(x_1, x_2) = 2x_1^2 + \frac{1}{2} x_2^2$$

and determine the gradient at point $x^0 = (x_1^0, x_2^0) = (1, 2)$. First, we find the partial derivatives

$$f_{x_1}(x_1, x_2) = 4x_1 \qquad \text{and} \qquad f_{x_2}(x_1, x_2) = x_2,$$

which yields

$$\operatorname{grad} f(x_1, x_2) = \begin{pmatrix} 4x_1 \\ x_2 \end{pmatrix}$$

and thus

$$\operatorname{grad} f(1, 2) = \begin{pmatrix} 4 \\ 2 \end{pmatrix}.$$

An illustration is given in Figure 11.4. Consider the level curve

$$f(x_1, x_2) = 2x_1^2 + \frac{1}{2} x_2^2 = c$$

with c = f(x1, x2) = 4, which is an ellipse

$$\frac{x_1^2}{2} + \frac{x_2^2}{8} = 1$$

with the half axes a = √2 and b = √8.
Consider the tangent line to this level curve at point (1, 2). From Figure 11.4 we see that the gradient at this point, i.e. vector (4, 2)T, is orthogonal to the tangent line to the level curve at point $(x_1^0, x_2^0) = (1, 2)$. It can be shown that this property always holds.

Figure 11.4 The gradient of function f with f(x1, x2) = 2x1² + x2²/2 at point $(x_1^0, x_2^0) = (1, 2)$.

Example 11.8 Suppose that the production function f : R2+ → R+ of a firm is given by

$$f(x, y) = 3x^2 y + 0.5x e^y,$$

where x stands for the amount of labour used and y stands for the amount of capital used. We obtain the gradient

$$\operatorname{grad} f(x, y) = \begin{pmatrix} 6xy + 0.5e^y \\ 3x^2 + 0.5x e^y \end{pmatrix}.$$

If the current inputs are x0 = 10 and y0 = ln 12, then, in order to get the biggest increase in output, we have to determine the gradient at point (x0, y0), and we obtain

$$\operatorname{grad} f(x_0, y_0) = \begin{pmatrix} 60 \ln 12 + 6 \\ 300 + 60 \end{pmatrix} \approx \begin{pmatrix} 155.09 \\ 360 \end{pmatrix}.$$

So, to increase the output as much as possible, the firm should increase labour and capital approximately in the ratio 155.09 : 360, i.e. in the ratio 1 : 2.32.

11.3 TOTAL DIFFERENTIAL

Partial derivatives consider only changes in one variable, while the remaining ones are not changed. Now we investigate how a function value changes in the case of simultaneous changes of the independent variables. This leads to the concept of the total differential introduced in the following definition.

Definition 11.6 Let function f : Df → R, Df ⊆ Rn, be continuously partially differentiable. Then the total differential of function f at point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ is defined as

$$dz = f_{x_1}(x^0)\, dx_1 + f_{x_2}(x^0)\, dx_2 + \cdots + f_{x_n}(x^0)\, dx_n.$$

The terms fxi(x0) dxi, i = 1, 2, . . . , n, are called partial differentials. The partial differential fxi(x0) dxi gives approximately the change in the function value when changing variable xi at point x0 by Δxi = dxi, i ∈ {1, 2, . . . , n}.
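Continuing Example 11.7, the gradient and the total differential of Definition 11.6 can be illustrated numerically; a small sketch (the displacement dx is an arbitrary choice):

```python
def f(x1, x2):
    # Example 11.7: f(x1, x2) = 2*x1**2 + x2**2 / 2
    return 2*x1**2 + 0.5*x2**2

def grad_f(x1, x2):
    # Gradient (4*x1, x2) as computed in Example 11.7
    return (4*x1, x2)

x0 = (1.0, 2.0)
g = grad_f(*x0)

# Total differential dz = fx1(x0)*dx1 + fx2(x0)*dx2 for a small displacement
dx = (0.01, -0.02)
dz = g[0]*dx[0] + g[1]*dx[1]

# Compare with the true change of the function value
true_change = f(x0[0] + dx[0], x0[1] + dx[1]) - f(*x0)
print(g, dz, true_change)
```

For this displacement the total differential vanishes, i.e. the step is almost tangent to the level curve through (1, 2); the true change differs from dz only by a second-order term.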
The total differential is illustrated in Figure 11.5 for the case n = 2. We consider the points x0 and x0 + Δx, where Δx = (Δx1, Δx2). Then the total differential gives the difference between the value of the tangent plane at point x0 + Δx and the value of function f at point x0. If the changes Δx1 and Δx2 are small, then the total differential dz is approximately equal to Δz, which gives the difference in the values of function f at the points x0 + Δx and x0. In other words, in some neighbourhood of point x0, we can approximate function f by the tangent plane fT(x1, x2).

Figure 11.5 The total differential of function f with z = f(x, y).

Example 11.9 We determine the total differential of function f : Df → R with

$$z = f(x, y) = \ln \tan \frac{x}{y}.$$

For the first-order partial derivatives we get

$$f_x(x, y) = \frac{1}{\tan\frac{x}{y}} \cdot \frac{1}{\cos^2\frac{x}{y}} \cdot \frac{1}{y} = \frac{1}{y \sin\frac{x}{y}\cos\frac{x}{y}} = \frac{2}{y \sin\frac{2x}{y}};$$

$$f_y(x, y) = \frac{1}{\tan\frac{x}{y}} \cdot \frac{1}{\cos^2\frac{x}{y}} \cdot \left(-\frac{x}{y^2}\right) = \frac{-x}{y^2 \sin\frac{x}{y}\cos\frac{x}{y}} = \frac{-2x}{y^2 \sin\frac{2x}{y}}.$$

In the above transformation, we have applied the addition theorem for the sine function in the form sin(2x/y) = 2 sin(x/y) cos(x/y) (see property (1) of the trigonometric functions in Chapter 3.3.3). Then the total differential is given by

$$dz = f_x(x, y)\, dx + f_y(x, y)\, dy = \frac{2\, dx}{y \sin\frac{2x}{y}} - \frac{2x\, dy}{y^2 \sin\frac{2x}{y}} = \frac{2}{y^2 \sin\frac{2x}{y}} \cdot (y\, dx - x\, dy).$$

Next, we consider an application of the total differential, namely the estimation of the maximal error in the value of a function of n independent variables. Assume that we have to determine the function value z = f(x1, x2, . . . , xn) at point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$, where some errors occur in the independent variables. The errors in these variables usually lead to an error Δz in the function value z. The aim is to estimate the error Δz provided that the errors Δx1, Δx2, . . . , Δxn in the variables x1, x2, . . . , xn are bounded by constants Δi, i.e. we have |Δxi| ≤ Δi. Since all errors Δxi, i = 1, 2, . . .
, n, are assumed to be sufficiently small, we can apply Definition 11.6 and, because Δz ≈ dz, we obtain

$$|\Delta z| \approx |dz| = \left| \sum_{i=1}^{n} f_{x_i}(x^0) \cdot \Delta x_i \right| \le \sum_{i=1}^{n} |f_{x_i}(x^0)| \cdot |\Delta x_i| \le \sum_{i=1}^{n} |f_{x_i}(x^0)| \cdot \Delta_i.$$

By means of the last formula, we have an estimation of the maximal absolute error Δz in the function value at point x0, provided that the absolute error in the variable xi at point xi0 is bounded by a sufficiently small constant Δi, i ∈ {1, 2, . . . , n}.

Example 11.10 Assume that the radius R and the height H of a cylinder are measured with R = 5.05 ± 0.01 cm and H = 8.20 ± 0.005 cm, i.e. R0 = 5.05 cm, H0 = 8.20 cm, |dR| ≤ 0.01 cm and |dH| ≤ 0.005 cm. We wish to estimate the absolute and relative errors of the volume V of the cylinder. We have

$$V_0 = V(R_0, H_0) = \pi R_0^2 H_0 = 656.97 \text{ cm}^3.$$

For the total differential we get

$$dV = \frac{\partial V}{\partial R}\, dR + \frac{\partial V}{\partial H}\, dH = 2\pi R_0 H_0\, dR + \pi R_0^2\, dH.$$

Inserting R0 = 5.05, H0 = 8.20, |dR| ≤ 0.01 and |dH| ≤ 0.005, we obtain the estimate

$$|\Delta V| \approx |dV| \le |2\pi R_0 H_0| \cdot |dR| + |\pi R_0^2| \cdot |dH| \le 82.82\pi \cdot 0.01 + 25.5025\pi \cdot 0.005 = 0.9557125\pi < 3.01 \text{ cm}^3$$

(note that we have used the triangle inequality to get the estimate given above), i.e. V0 ≈ 656.97 ± 3.01 cm³. For the relative error we get

$$\left| \frac{\Delta V}{V} \right| \approx \left| \frac{dV}{V} \right| = \left| \frac{2\pi R_0 H_0\, dR + \pi R_0^2\, dH}{\pi R_0^2 H_0} \right| \le 2 \left| \frac{dR}{R_0} \right| + \left| \frac{dH}{H_0} \right| < 0.005.$$

Thus, for the given maximal errors in the variables, we get the estimate that the relative error of the volume is less than 0.5 per cent.

11.4 GENERALIZED CHAIN RULE; DIRECTIONAL DERIVATIVES

In Chapter 11.2, we investigated how a function value changes if one moves along one of the axes. The answer was obtained by means of the partial derivatives with respect to one of the variables.
Now we investigate a more general question, namely how a function value changes if we move along an arbitrary curve C. First, we assume that this curve C is given in a parametric form x = x(t) and y = y(t). As an example, we mention the parametric representation of a function f with z(t) = f (x(t), y(t)), where the domain is a circle given by all points (x, y) ∈ R2 with x = r sin t, y = r cos t; t ∈ [0, 2π], where r denotes the (constant) radius and t is the (variable) angle measured in radians. We only note that parameter t often represents the time in mathematical models of economical problems, i.e. we describe the variables x and y in representation f (x, y) as functions of time t. A point (x0 , y0 ) corresponds to some value t 0 in the above parametric representation, i.e. x0 = x(t 0 ) y0 = y(t 0 ). and Accordingly, for some point on the curve C with parameter value t = t 0 + t, we get the representation x = x(t 0 + t) and y = y(t 0 + t). Now we can conclude that, if t → 0, then also x → 0 and y → 0. Therefore, the derivative df (x, y) f (x(t 0 + t), y(t 0 + t)) − f (x(t 0 ), y(t 0 )) = lim t→0 dt t f (x0 + x, y0 + y) − f (x0 , y0 ) t→0 t = lim characterizes the approximate change of function f if parameter t increases from t0 by one unit, i.e. one moves along curve C from point (x(t0 ), y(t0 )) to point (x(t0 + 1), y(t0 + 1)) provided that the above limit exists. To obtain a formula for the derivative of such a function with variables depending on a parameter t, we consider the general case of n independent variables and assume that all variables xi , i = 1, 2, . . . , n, are also functions of a variable t, i.e. x1 = x1 (t), x2 = x2 (t), .., xn = xn (t). Thus, we have: z = f (x1 , x2 , . . . , xn ) = f (x1 (t), x2 (t), . . . , xn (t)) = z(t). 398 Functions of several variables To determine the derivative dz/dt, we have to compute dz(x1 , x2 , . . . , xn ) f (x1 + x1 , x2 + x2 , . . . , xn + xn ) − f (x1 , x2 , . . . 
, xn ) = lim , t→0 dt t provided that this limit exists, where xi = xi (t + t) − xi (t). This can be done by applying the following theorem. THEOREM 11.2 (chain rule) differentiable, where Let function f : Df → R, Df ⊆ Rn , be continuously partially x1 = x1 (t), x2 = x2 (t), . . . , xn = xn (t), and functions xi (t) : Dxi → R, i = 1, 2, . . . , n, are differentiable with respect to t. Then function z(t) = f (x(t)) = f (x1 (t), x2 (t), . . . , xn (t)) is also differentiable with respect to t, and we get dz dx1 dx2 dxn = fx1 (x) + fx2 (x) + . . . + fxn (x) . dt dt dt dt The above result can be generalized to the case when the functions xi depend on two variables u and v. We get the following theorem. THEOREM 11.3 (generalized chain rule) Let function f : Df → R, Df ⊆ Rn , be continuously partially differentiable, where x1 = x1 (u, v), x2 = x2 (u, v), .., xn = xn (u, v), and functions xi (u, v) : Dxi → R, Dxi ⊆ R2 , i = 1, 2, . . . , n, are continuously partially differentiable with respect to both variables u and v. Then function z with z(u, v) = f (x1 (u, v), x2 (u, v), . . . , xn (u, v)) is also partially differentiable with respect to both variables u and v, and ∂z = fx1 (x) ∂u ∂z = fx1 (x) ∂v ∂x1 + fx2 (x) ∂u ∂x1 + fx2 (x) ∂v ∂x2 ∂xn + . . . + fxn (x) , ∂u ∂u ∂x2 ∂xn + . . . + fxn (x) . ∂v ∂v Example 11.11 Consider function f : R3 → R+ with z = f (x1 , x2 , x3 ) = ex1 +2x2 −x3 , 2 Functions of several variables 399 where x1 = uv, x2 = + and x3 = u + 2v. We determine the partial derivatives of function f with respect to variables u and v. We obtain u2 fx1 = 2x1 ex1 +2x2 −x3 , 2 ∂x1 = v, ∂u v2 fx2 = 2ex1 +2x2 −x3 , ∂x1 = u, ∂v 2 ∂x2 = 2u, ∂u fx3 = −ex1 +2x2 −x3 , 2 ∂x2 = 2v, ∂v ∂x3 = 1, ∂u ∂x3 = 2, ∂v and thus for function z = z(u, v) ∂z ∂x1 ∂x2 ∂x3 2 2 2 2 = fx1 + fx2 + fx3 = (2uv2 + 4u − 1)eu v +2(u +v )−(u+2v) , ∂u ∂u ∂u ∂u ∂z ∂x3 ∂x1 ∂x2 2 2 2 2 = fx1 + fx2 + fx3 = (2u2 v + 4v − 2)eu v +2(u +v )−(u+2v) . 
∂v ∂v ∂v ∂v Note that the same result is obtained if we first substitute the functions x1 = x1 (u, v), x2 = x2 (u, v) and x3 = x3 (u, v) into function f with z = f (x1 , x2 , x3 ) and determine the partial derivatives with respect to variables u and v. So far we have considered partial derivatives which characterize changes of the function value f (x) in a direction of the axis of one variable. In the following, we deal with the change of f (x) in a direction given by a certain vector r = (r1 , r2 , . . . , rn )T . This leads to the introduction of the directional derivative given in the following definition. Definition 11.7 Let function f : Df → R, Df ⊆ Rn , be continuously partially differentiable and r = (r1 , r2 , . . . , rn )T be a directional vector with |r| = 1. The term [grad f (x0 )]T · r = [∇f (x0 )]T · r = fx1 (x0 ) · r1 + fx2 (x0 ) · r2 + . . . + fxn (x0 ) · rn is called the directional derivative of function f at point x0 = (x10 , x20 , . . . , xn0 ) ∈ Df in the direction r. Notice that the directional derivative is a scalar obtained as the scalar product of the vectors grad f (x0 ) and r. The assumption |r| = 1 allows us to interpret the value of the directional derivative as the approximate change of the function value if we move one unit from point x0 in direction r. The directional derivative of function f at point x0 in the direction r is illustrated in Figure 11.6. It gives the slope of the tangent line fT (r) to function f at point x0 along the direction given by vector r. If vector ri = (0, . . . , 0, 1, 0, . . . , 0)T is the ith unit vector, the directional derivative in direction ri corresponds to the ith partial derivative, i.e. [grad f (x0 )]T · ri = fxi (x0 ). 400 Functions of several variables Figure 11.6 The directional derivative of function f at point x0 in the direction r. Thus, partial derivatives are special directional derivatives. For the directional derivative of function f at point x0 in the direction r, we also write ∂f 0 (x ). 
Example 11.12 The total cost function f : R³₊ → R of a firm producing the quantities x1, x2 and x3 of the three goods 1, 2 and 3 is given by

f(x1, x2, x3) = 20 + x1² + 2x2 + 3x3 + 2x1x3.

Let the current production quantities of the three goods be x1⁰ = 10, x2⁰ = 15 and x3⁰ = 20. Assume that the firm has the chance to increase its production by exactly five units, namely two products by two units each and the remaining product by one unit, and we want to know which change results in the smallest increase in cost. This means that we have to investigate the directional derivatives in the direction of the vectors

r1 = (1, 2, 2)ᵀ,   r2 = (2, 1, 2)ᵀ   and   r3 = (2, 2, 1)ᵀ.

First, we consider direction r1 = (1, 2, 2)ᵀ and determine the directional derivative. The gradient of function f is given by

grad f(x1, x2, x3) = (2x1 + 2x3, 2, 3 + 2x1)ᵀ

and therefore at point x0 = (x1⁰, x2⁰, x3⁰) = (10, 15, 20) by

grad f(x0) = (60, 2, 23)ᵀ.

Since r1 is not a unit vector, we have to divide vector r1 by its length |r1| = √(1² + 2² + 2²) = √9 = 3, which yields the unit vector

r1/|r1| = (1/3) · (1, 2, 2)ᵀ.

We obtain for the directional derivative ∂f/∂r1 = [grad f(x0)]ᵀ · r1/|r1| at point x0 in the direction r1:

∂f/∂r1 (x0) = [grad f(x0)]ᵀ · r1/|r1| = (60, 2, 23) · (1/3) · (1, 2, 2)ᵀ = (1/3) · (60 · 1 + 2 · 2 + 23 · 2) = 110/3.

How can we interpret the directional derivative ∂f/∂r1? When moving from x0 one unit in the direction of r1 (i.e. from x0 to x0 + r1/|r1|), the increase in cost is approximately equal to 110/3. Equivalently, if we increase the production of good 1 by one unit and of goods 2 and 3 by two units each, the approximate increase in the cost is 3 · 110/3 = 110 units.
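The computation for r1 (and analogously for r2 and r3 below) can be checked with a few lines of Python; the function and point are taken from Example 11.12, and this is only a verification sketch, not part of the book's method:

```python
import math

# Total cost function from Example 11.12
f = lambda x1, x2, x3: 20 + x1**2 + 2*x2 + 3*x3 + 2*x1*x3

x0 = (10, 15, 20)
grad = (2*x0[0] + 2*x0[2], 2, 3 + 2*x0[0])        # grad f(x0) = (60, 2, 23)
r1 = (1, 2, 2)
length = math.sqrt(sum(c * c for c in r1))          # |r1| = 3
dd = sum(g * c for g, c in zip(grad, r1)) / length  # directional derivative 110/3
```

Replacing r1 by (2, 1, 2) or (2, 2, 1) reproduces the values 56 and 49 obtained below.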
For the directional vector r2 = (2, 1, 2)ᵀ, we obtain first the unit vector

r2/|r2| = (1/3) · (2, 1, 2)ᵀ

and then the directional derivative at point x0 in the direction r2:

∂f/∂r2 (x0) = [grad f(x0)]ᵀ · r2/|r2| = (60, 2, 23) · (1/3) · (2, 1, 2)ᵀ = (1/3) · (60 · 2 + 2 · 1 + 23 · 2) = (1/3) · 168 = 56.

Accordingly, for the directional vector r3 = (2, 2, 1)ᵀ, we obtain first the unit vector

r3/|r3| = (1/3) · (2, 2, 1)ᵀ

and then the directional derivative at point x0 = (10, 15, 20) in the direction r3:

∂f/∂r3 (x0) = [grad f(x0)]ᵀ · r3/|r3| = (60, 2, 23) · (1/3) · (2, 2, 1)ᵀ = (1/3) · (60 · 2 + 2 · 2 + 23 · 1) = (1/3) · 147 = 49.

Thus the cost has the smallest increase in the case of extending production of the three goods according to vector r1 = (1, 2, 2)ᵀ, and the firm would choose this variant. It is worth noting that we can compare the values of the directional derivatives immediately here, since all vectors r1, r2, r3 considered have the same length 3.

11.5 PARTIAL RATE OF CHANGE AND ELASTICITY; HOMOGENEOUS FUNCTIONS

As in the case of a function of one variable, we can define a rate of change and the elasticity of a function with respect to exactly one of the n variables, keeping the remaining ones constant. For instance, if price and income may be considered as the two independent variables of a demand function, we can distinguish between price and income elasticity of demand. In this section, we generalize the rate of change and the elasticity of a function of one variable, introduced in Chapter 4.4, to the case of functions of several variables.

Definition 11.8 Let function f : Df → R, Df ⊆ Rⁿ, be partially differentiable and x0 = (x1⁰, x2⁰, …, xn⁰) ∈ Df with f(x0) ≠ 0. The term

ρf,xi(x0) = fxi(x0) / f(x0)

is called the partial rate of change of function f with respect to xi at point x0.

Example 11.13 Let function f : R²₊ \ {(0, 0)} → R with

f(x, y) = x · √y · e^(x/2 + y/4)

be given.
We determine the partial rates of change of function f at point (20, 4) and obtain

fx(x, y) = √y · e^(x/2 + y/4) + (1/2) · x · √y · e^(x/2 + y/4) = √y · e^(x/2 + y/4) · (1 + x/2),

fy(x, y) = (x / (2√y)) · e^(x/2 + y/4) + (x · √y / 4) · e^(x/2 + y/4) = ((2 + y) / (4√y)) · x · e^(x/2 + y/4).

This yields

ρf,x(x, y) = fx(x, y) / f(x, y) = (1 + x/2) / x = (2 + x) / (2x)

and

ρf,y(x, y) = fy(x, y) / f(x, y) = (2 + y) / (4y).

For point (20, 4), we obtain ρf,x(20, 4) = 0.55 and ρf,y(20, 4) = 0.375.

Definition 11.9 Let function f : Df → R, Df ⊆ Rⁿ, be partially differentiable and x0 = (x1⁰, x2⁰, …, xn⁰) ∈ Df with f(x0) ≠ 0. The term

εf,xi(x0) = xi⁰ · ρf,xi(x0) = xi⁰ · fxi(x0) / f(x0)

is called the partial elasticity of function f with respect to xi at point x0.

For point x0 ∈ Df, the value εf,xi(x0) is approximately equal to the percentage change in the function value caused by an increase of one per cent in the variable xi (i.e. from xi⁰ to 1.01 · xi⁰) while holding the other variables constant.

One important class of functions in economics is the class of so-called homogeneous functions, which we introduce next.

Definition 11.10 A function f : Df → R, Df ⊆ Rⁿ, is said to be homogeneous of degree k on Df if t > 0 and (x1, x2, …, xn) ∈ Df imply (tx1, tx2, …, txn) ∈ Df and

f(tx1, tx2, …, txn) = tᵏ · f(x1, x2, …, xn)

for all t > 0, where k can be positive, zero or negative.

According to the latter definition, a function is homogeneous of degree k if the multiplication of each of its independent variables by a constant t changes the function value by the proportion tᵏ. The value t can take any positive value provided that (tx1, tx2, …, txn) belongs to the domain Df of function f. We mention that many functions are not homogeneous of any degree. The notion of a homogeneous function is related to the notion of 'returns to scale' in economics.
It is said that a production function exhibits constant returns to scale if a proportional increase in all independent variables x1, x2, …, xn leads to the same proportional increase in the function value (output) f(x1, x2, …, xn). Thus, constant returns to scale mean that we have a homogeneous function of degree k = 1. If there is a larger increase in the function value than in the independent variables (i.e. taking t times the original values of the input variables, the production output increases by a bigger factor than t), it is said that there are increasing returns to scale. This corresponds to a homogeneous function of degree k > 1. Finally, if there is a production function with decreasing returns to scale, then it corresponds to a homogeneous function of degree k < 1.

We continue with two properties of homogeneous functions.

THEOREM 11.4 (Euler's theorem) Let function f : Df → R, Df ⊆ Rⁿ, be continuously partially differentiable, where inequality t > 0 and inclusion x = (x1, x2, …, xn) ∈ Df imply inclusion (tx1, tx2, …, txn) ∈ Df. Then function f is homogeneous of degree k on Df if and only if the equation

x1 · fx1(x) + x2 · fx2(x) + … + xn · fxn(x) = k · f(x)    (11.1)

holds for all points x = (x1, x2, …, xn) ∈ Df.

In gradient notation, we can express equation (11.1) in Euler's theorem as

xᵀ · grad f(x) = k · f(x).

Applying the definition of partial elasticity, we obtain the following corollary.

COROLLARY 11.1 Let function f : Df → R, Df ⊆ Rⁿ, with f(x) ≠ 0 for x ∈ Df, be homogeneous of degree k. Then the sum of the partial elasticities of function f is equal to k, i.e.

εf,x1(x) + εf,x2(x) + … + εf,xn(x) = k.
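Equation (11.1) and Corollary 11.1 can be checked numerically at a point. A minimal sketch, using a hypothetical function f(x1, x2) = x1² · x2 (homogeneous of degree k = 3) that is not from the book:

```python
# Hypothetical function f(x1, x2) = x1^2 * x2, homogeneous of degree k = 3
f   = lambda x1, x2: x1**2 * x2
fx1 = lambda x1, x2: 2 * x1 * x2    # partial derivative w.r.t. x1
fx2 = lambda x1, x2: x1**2          # partial derivative w.r.t. x2

x1, x2, k = 2.0, 5.0, 3

# Euler's theorem (11.1): x^T · grad f(x) = k · f(x)
euler_lhs = x1 * fx1(x1, x2) + x2 * fx2(x1, x2)
assert abs(euler_lhs - k * f(x1, x2)) < 1e-12

# Corollary 11.1: the partial elasticities sum to the degree k
eps1 = x1 * fx1(x1, x2) / f(x1, x2)
eps2 = x2 * fx2(x1, x2) / f(x1, x2)
```

Here eps1 = 2 and eps2 = 1 add up to the degree of homogeneity, as the corollary asserts.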
Example 11.14 Consider a Cobb–Douglas production function Q : R²₊ → R in the two variables K and L given by

Q(K, L) = A · K^a1 · L^a2,

where K denotes the quantity of capital and L denotes the quantity of labour used in the production of a firm. Determining the partial derivatives, we get

QK(K, L) = A · a1 · K^(a1−1) · L^a2   and   QL(K, L) = A · K^a1 · a2 · L^(a2−1).

The partial derivative QK is also known as the marginal product of capital and QL as the marginal product of labour. For the partial elasticities at point (K, L), we obtain

εQ,K(K, L) = K · A · a1 · K^(a1−1) · L^a2 / (A · K^a1 · L^a2) = a1

and

εQ,L(K, L) = L · A · K^a1 · a2 · L^(a2−1) / (A · K^a1 · L^a2) = a2.

Thus, the exponents in the Cobb–Douglas function correspond to the partial elasticities. Finally, from the equalities

Q(tK, tL) = A · (tK)^a1 · (tL)^a2 = t^(a1+a2) · Q(K, L),

we find that function Q is homogeneous of degree a1 + a2.

11.6 IMPLICIT FUNCTIONS

In our previous considerations, the function was defined explicitly, i.e. we considered a function of one or several variables in the form y = f(x) or z = f(x1, x2, …, xn), respectively. However, in many applications a relationship between two variables x and y may be given in the form F(x, y) = 0, or correspondingly F(x1, x2, …, xn; z) = 0. In the former case, we say that y is implicitly defined as a function of x if there exists a function f with y = f(x) such that

F(x, y) = F(x, f(x)) = 0.

Example 11.15 Let

x²/a² + y²/b² − 1 = 0.

The above equation defines the two elementary functions

y1 = +b · √(1 − x²/a²)   and   y2 = −b · √(1 − x²/a²).

Each of these functions describes half an ellipse (see Figure 11.7) with the half-axes a and b and the centre at point (0, 0). Inserting the above explicit representations into F(x, y) = 0, we obtain in both cases the identity

x²/a² + (1 − x²/a²) − 1 = 0.

Figure 11.7 The ellipse x²/a² + y²/b² − 1 = 0.
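The identity at the end of Example 11.15 can be confirmed in floating point: both explicit branches inserted into F(x, y) give (numerically) zero. A small verification sketch, with illustrative half-axes a = 3 and b = 2 chosen here for the test:

```python
import math

a, b = 3.0, 2.0                                   # example half-axes (assumed values)
F  = lambda x, y: x**2 / a**2 + y**2 / b**2 - 1   # implicit representation F(x, y) = 0
y1 = lambda x:  b * math.sqrt(1 - x**2 / a**2)    # upper half of the ellipse
y2 = lambda x: -b * math.sqrt(1 - x**2 / a**2)    # lower half of the ellipse

x = 1.5
ok = abs(F(x, y1(x))) < 1e-12 and abs(F(x, y2(x))) < 1e-12
```

Both branches satisfy F(x, y) = 0 up to rounding error, matching the algebraic identity above.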
However, not every implicit representation F(x, y) = 0 can be solved for y; e.g. the equation 2y⁶ + y − ln x = 0 cannot be solved for y.

Similarly, one can consider an explicitly given system in the form

y1 = f1(x1, x2, …, xn)
y2 = f2(x1, x2, …, xn)
⋮                                        (11.2)
ym = fm(x1, x2, …, xn).

Such a system of equations might be given in implicit form:

F1(x1, x2, …, xn; y1, y2, …, ym) = 0
F2(x1, x2, …, xn; y1, y2, …, ym) = 0
⋮                                        (11.3)
Fm(x1, x2, …, xn; y1, y2, …, ym) = 0.

Representation (11.2) is referred to as the reduced form of system (11.3). The next problem we deal with is to find an answer to the following question: under what conditions is it possible to put a system given in the form (11.3) into its reduced form (11.2)? This question is answered by the following theorem.

THEOREM 11.5 Let a system (11.3) of implicitly defined functions with continuous partial derivatives with respect to all variables be given, and let (x0; y0) = (x1⁰, x2⁰, …, xn⁰; y1⁰, y2⁰, …, ym⁰) be a point that satisfies system (11.3). If the matrix of the partial derivatives of the functions Fj with respect to the variables yk is regular at point (x0; y0), i.e. if the Jacobian determinant satisfies

|J(x0; y0)| = det( ∂Fj(x0; y0) / ∂yk ) ≠ 0,

then system (11.3) can be uniquely put into its reduced form (11.2) in a neighbourhood of point (x0; y0).

We illustrate the use of Theorem 11.5 by the following example.

Example 11.16 Consider the implicitly given system

F1(x1, x2; y1, y2) = x1y1 − x2 − y2 = 0    (11.4)
F2(x1, x2; y1, y2) = x2²y2 − x1 + y1 = 0,    (11.5)

and we investigate whether this system can be put into its reduced form.
Checking the Jacobian determinant, we obtain

|J(x1, x2; y1, y2)| = det( ∂F1/∂y1  ∂F1/∂y2 ; ∂F2/∂y1  ∂F2/∂y2 ) = det( x1  −1 ; 1  x2² ) = x1x2² + 1.

According to Theorem 11.5, we can conclude that the given system can be put into its reduced form for all points (x; y) = (x1, x2; y1, y2) with x1x2² + 1 ≠ 0 satisfying equations (11.4) and (11.5). Indeed, from equation (11.4) we obtain

y2 = x1y1 − x2.

Substituting the latter term into equation (11.5), we get

x2²(x1y1 − x2) − x1 + y1 = 0,

which can be rewritten as

(x1x2² + 1) · y1 = x1 + x2³.

For x1x2² + 1 ≠ 0, we get

y1 = (x1 + x2³) / (x1x2² + 1)

and

y2 = x1 · (x1 + x2³) / (x1x2² + 1) − x2 = (x1² + x1x2³ − x1x2³ − x2) / (x1x2² + 1) = (x1² − x2) / (x1x2² + 1).

If in the representation F(x1, x2, …, xn; z) = 0 the variable z can be eliminated, we can perform the partial differentiation with respect to some variable as discussed before. However, if it is not possible to solve this equation for z, we can nevertheless perform the partial differentiations by applying the following theorem about the differentiation of an implicitly given function.

THEOREM 11.6 (implicit-function theorem) Let function F : DF → R, DF ⊆ R^(n+1), with F(x; z) = F(x1, x2, …, xn; z) be continuous and F(x0; z0) = 0. Moreover, let function F be continuously partially differentiable with Fz(x0; z0) ≠ 0. Then in some neighbourhood Uε(x0) of point x0 with ε > 0, there exists a function f with z = f(x1, x2, …, xn), and function f is continuously partially differentiable for x ∈ Uε(x0) with

fxi(x) = fxi(x1, x2, …, xn) = − Fxi(x; z) / Fz(x; z).

For the special case of a function of two variables, i.e. F(x, y) = 0, the above implicit-function theorem can be formulated as follows. Let F(x, y) = 0 be an implicitly given function and (x0, y0) be a point that satisfies the latter equation.
Then, under the assumptions of Theorem 11.6, for points x of some interval (a, b) containing point x0, the first derivative of function f with y = f(x) is given by

y′(x) = − Fx(x, y) / Fy(x, y).    (11.6)

Example 11.17 We determine the derivative of function f with y = f(x) implicitly defined by

F(x, y) = 8x³ − 24xy² + 16y³ = 0

by means of the implicit-function theorem. This yields

Fx(x, y) = 24x² − 24y²,   Fy(x, y) = −48xy + 48y²,

and we obtain

y′ = − 24(x² − y²) / (−48y(x − y)) = (x + y) / (2y).

As an alternative, we can also directly differentiate the function F(x, y(x)) = 0 with respect to x. In this case, using the product rule for the second term of F(x, y) (since y depends on x), we get

24x² − 24y² − 48xyy′ + 48y²y′ = 0,

from which we obtain

48y′ · y(y − x) = 24(y² − x²)

and consequently

y′ = (x + y) / (2y).

The latter formula allows us to determine the value of the first derivative at some point (x, y) with F(x, y) = 0 without knowing the corresponding explicit representation y = f(x). This is particularly favourable when it is impossible or difficult to transform the implicit representation F(x, y) = 0 into an explicit representation y = f(x) of function f.

Using formula (11.6), we can determine the second derivative y″ of function f. By applying the chain rule, we get

y″ = − ( [Fxx(x, y) + Fxy(x, y) · yx] · Fy(x, y) − [Fyx(x, y) + Fyy(x, y) · yx] · Fx(x, y) ) / Fy²(x, y)

with yx = y′ = f′. Substituting yx = y′ according to formula (11.6), we obtain

y″ = − ( Fxx(x, y) · Fy²(x, y) − 2 · Fxy(x, y) · Fx(x, y) · Fy(x, y) + Fyy(x, y) · Fx²(x, y) ) / Fy³(x, y).    (11.7)

Example 11.18 Let function f : Df → R be implicitly given by

F(x, y) = x − y + 2 sin y = 0.

(Note that we cannot solve the above equation for y.) We determine y′ and y″. Using

Fx(x, y) = 1   and   Fy(x, y) = −1 + 2 cos y,

we get

y′ = − 1/(−1 + 2 cos y) = 1/(1 − 2 cos y).
Moreover, using Fxx(x, y) = Fxy(x, y) = 0 and Fyy(x, y) = −2 sin y, we obtain by means of formula (11.7)

y″ = − ( 0 · (−1 + 2 cos y)² − 2 · 0 · 1 · (−1 + 2 cos y) + (−2 sin y) · 1² ) / (−1 + 2 cos y)³ = 2 sin y / (2 cos y − 1)³.

11.7 UNCONSTRAINED OPTIMIZATION

11.7.1 Optimality conditions

In economics, one often has to look for a minimum or a maximum of a function depending on several variables. If no additional constraints have to be considered, it is an unconstrained optimization problem. As an introductory example we consider the maximization of the profit of a firm.

Example 11.19 For simplicity, assume that a firm produces only two goods X and Y. Suppose that this firm can sell each piece of product X for 45 EUR and each piece of product Y for 55 EUR. Denoting by x the quantity of product X and by y the quantity of product Y, we get the revenue R = R(x, y) with

R(x, y) = 45x + 55y,   DR = R²₊.

The production cost of the firm also depends on the quantities x and y. Suppose that this cost function C = C(x, y) is known to be

C(x, y) = 300 + x² + 1.5y² − 25x − 35y,   DC = R²₊.

One can easily show that the inequality C(x, y) ≥ 0 holds for all non-negative values of the variables x and y, and for the fixed cost of production one has C(0, 0) = 300. In order to determine the profit P = f(x, y) in dependence on both quantities x and y, we have to subtract the cost from the revenue, i.e.

f(x, y) = R(x, y) − C(x, y) = 45x + 55y − (300 + x² + 1.5y² − 25x − 35y) = −x² + 70x − 1.5y² + 90y − 300.

To maximize the profit, we have to determine the point (x, y) ∈ R²₊ that maximizes function f. We say that we look for an extreme point (in this case a maximum point) of function f. This is an unconstrained optimization problem since no other constraints on the production have to be taken into consideration (except the usual non-negativity constraints on the variables x and y).
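Before applying any calculus to a profit function of this kind, a coarse grid search already gives a useful first impression of where the maximum lies. A quick numeric sketch (not part of the book's method) for the profit function of Example 11.19:

```python
# Profit function from Example 11.19
f = lambda x, y: -x**2 + 70*x - 1.5*y**2 + 90*y - 300

# Coarse grid search over integer production quantities as a first check
best = max((f(x, y), x, y) for x in range(0, 71) for y in range(0, 71))
# best[0] is the largest profit on the grid, (best[1], best[2]) the quantities
```

The search returns a profit of 2,275 at (x, y) = (35, 30), which the first- and second-order conditions below confirm analytically.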
In the following, we look for necessary and sufficient conditions for the existence of extreme points of a function depending on n variables. First, we formally define a local extreme (i.e. minimum or maximum) point as well as a global extreme point.

Definition 11.11 A function f : Df → R, Df ⊆ Rⁿ, has a local maximum (minimum) at point x0 if there exists an ε-neighbourhood Uε(x0) ⊆ Df of point x0 with ε > 0 such that

f(x) ≤ f(x0)   (f(x) ≥ f(x0), respectively)    (11.8)

for all points x ∈ Uε(x0). Point x0 is called a local maximum (minimum) point of function f. If inequality (11.8) holds for all points x ∈ Df, function f has a global maximum (minimum) at point x0, and x0 is called a global maximum (minimum) point of function f.

THEOREM 11.7 (necessary first-order conditions) Let function f : Df → R, Df ⊆ Rⁿ, be partially differentiable. If x0 = (x1⁰, x2⁰, …, xn⁰) ∈ Df is a local extreme point (i.e. a local maximum or minimum point) of function f, then

grad f(x0) = grad f(x1⁰, x2⁰, …, xn⁰) = 0,

i.e. fx1(x0) = 0, fx2(x0) = 0, …, fxn(x0) = 0.

A point x0 with grad f(x0) = 0 is called a stationary point. According to Theorem 11.7 we can conclude that, if x0 = (x1⁰, x2⁰, …, xn⁰) is a local extreme point (assuming that function f is partially differentiable at this point), then it has to be a stationary point. In order to determine a global minimum or maximum point of a function f depending on n variables, we have to search among the stationary points (which can be found by means of differential calculus), the points on the boundary of the domain and possibly the points where the partial derivatives do not exist. This is similar to the case of a function of one variable (see Chapter 4). In the following, we focus on the determination of local extreme points by differential calculus. Next, we are looking for a sufficient condition for the existence of a local extreme point.
In the case of a function of one variable, we have presented a criterion which uses higher-order derivatives. In the following, we present a criterion that uses the second-order partial derivatives to answer the question of whether a given stationary point is indeed a local extreme point. We begin with the definition of the Hessian.

Definition 11.12 The matrix

Hf(x0) = ( fxixj(x0) ) =
⎛ fx1x1(x0)  fx1x2(x0)  ⋯  fx1xn(x0) ⎞
⎜ fx2x1(x0)  fx2x2(x0)  ⋯  fx2xn(x0) ⎟
⎜     ⋮          ⋮             ⋮     ⎟
⎝ fxnx1(x0)  fxnx2(x0)  ⋯  fxnxn(x0) ⎠

is called the Hessian, or the Hessian matrix, of function f at point x0 = (x1⁰, x2⁰, …, xn⁰) ∈ Df ⊆ Rⁿ.

Under the assumptions of Theorem 11.1 (i.e. continuous second-order partial derivatives), the Hessian is a symmetric matrix. The following theorems present sufficient conditions for so-called isolated (local or global) extreme points, for which the strict inequality holds in inequality (11.8) of Definition 11.11 for x ≠ x0.

THEOREM 11.8 (local and global sufficient second-order conditions) Let function f : Df → R, Df ⊆ Rⁿ, be twice continuously partially differentiable and let x0 = (x1⁰, x2⁰, …, xn⁰) ∈ Df be a stationary point of function f. Then:

(1) If the Hessian Hf(x0) is negative (positive) definite, then x0 is a local maximum (minimum) point of function f.
(2) If Hf(x0) is indefinite, then function f does not have a local extremum at point x0.
(3) If Hf(x) is negative (positive) definite for all points x ∈ Df, then x0 is a global maximum (minimum) point of function f over Df.

A stationary point x0 of function f that is neither a local maximum nor a local minimum point is called a saddle point of function f. Next, we give a sufficient condition for a stationary point to be either a local minimum point, a local maximum point or a saddle point in the case of a function of two variables.
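The definiteness tests used in Theorem 11.8 can be sketched in code via the leading principal minors (the criterion that Theorem 10.7, applied later in Example 11.23, is based on). A minimal pure-Python sketch, not from the book; the helper names are illustrative:

```python
def leading_principal_minors(H):
    """Determinants D1, ..., Dn of the upper-left 1x1, 2x2, ... submatrices."""
    def det(M):                                   # Laplace expansion along the first row
        if len(M) == 1:
            return M[0][0]
        return sum((-1)**j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
                   for j in range(len(M)))
    return [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]

def definiteness(H):
    """Classify a symmetric matrix by its leading principal minors:
    all Dk > 0 -> positive definite; alternating signs starting with
    D1 < 0 (D1 < 0, D2 > 0, ...) -> negative definite."""
    D = leading_principal_minors(H)
    if all(d > 0 for d in D):
        return "positive definite"
    if all((d < 0) if k % 2 == 0 else (d > 0) for k, d in enumerate(D)):
        return "negative definite"
    return "indefinite or semidefinite"

# Hessian of the profit function in Example 11.19 (constant in x and y):
kind = definiteness([[-2, 0], [0, -3]])
```

For the profit example this returns "negative definite", consistent with the maximum found below.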
THEOREM 11.9 (special case of Theorem 11.8 for n = 2) Let function f : Df → R, Df ⊆ R², be twice continuously partially differentiable and let (x0, y0) ∈ Df be a stationary point of function f. Then:

(1) If

|Hf(x0, y0)| = fxx(x0, y0) · fyy(x0, y0) − [fxy(x0, y0)]² > 0,

then (x0, y0) is a local extreme point of function f; in particular:
(a) if fxx(x0, y0) < 0 or fyy(x0, y0) < 0, it is a local maximum point;
(b) if fxx(x0, y0) > 0 or fyy(x0, y0) > 0, it is a local minimum point.
(2) If |Hf(x0, y0)| < 0, then (x0, y0) is a saddle point of function f.
(3) If |Hf(x0, y0)| = 0, then (x0, y0) could be a local maximum point, a local minimum point, or a saddle point of function f.

In case (1) of Theorem 11.9, we can take either of the second-order partial derivatives fxx or fyy in order to decide whether a stationary point is a local maximum or minimum point. Notice also that in the case of a function f depending on two variables, at a saddle point function f has a local maximum with respect to one of the independent variables and a local minimum with respect to the other independent variable.

Now we are able to find the extreme point of the profit function f with z = f(x, y) given in Example 11.19.

Example 11.19 (continued) We first establish the necessary first-order conditions according to Theorem 11.7 and obtain

fx(x, y) = −2x + 70 = 0
fy(x, y) = −3y + 90 = 0.

Thus, we get the only stationary point P1: (x1, y1) = (35, 30). In order to check the sufficient second-order conditions according to Theorem 11.9, we determine the second-order partial derivatives

fxx(x, y) = −2,   fyy(x, y) = −3,   fxy(x, y) = fyx(x, y) = 0,

set up the Hessian

Hf(x, y) = ( fxx(x, y)  fxy(x, y) ; fyx(x, y)  fyy(x, y) ) = ( −2  0 ; 0  −3 ),

and obtain

|Hf(35, 30)| = fxx(35, 30) · fyy(35, 30) − [fxy(35, 30)]² = (−2) · (−3) − 0² = 6 > 0.

Therefore, point P1 is a local extreme point.
It is worth noting that in this case the value of the determinant of the Hessian does not even depend on the concrete values of the variables x and y. Moreover, since fxx(35, 30) = −2 < 0 (or analogously, since fyy(35, 30) = −3 < 0), point P1 with (x1, y1) = (35, 30) is a local maximum point of the profit function. The maximal profit is equal to P = f(35, 30) = 2,275 units.

Example 11.20 Let function f : R² → R with

f(x, y) = (1/3)x³ + (1/2)x²y + x² − 4x − (1/6)y³

be given. We determine all local extreme points of function f. The necessary first-order conditions according to Theorem 11.7 for an extreme point are

fx(x, y) = x² + xy + 2x − 4 = 0    (11.9)
fy(x, y) = (1/2)x² − (1/2)y² = 0.    (11.10)

From equation (11.10) we obtain x² = y², which yields |x| = |y| or, equivalently, y = ±x.

Case 1 y = x. Substituting y = x into equation (11.9), we get 2x² + 2x − 4 = 0 or, accordingly, x² + x − 2 = 0. This quadratic equation has the two solutions x1 = −2 and x2 = 1. Thus, we get y1 = −2 and y2 = 1, which gives the two stationary points P1: (x1, y1) = (−2, −2) and P2: (x2, y2) = (1, 1).

Case 2 y = −x. In this case, we obtain from condition (11.9) the equation 2x − 4 = 0, which yields x3 = 2 and y3 = −2 and the stationary point P3: (x3, y3) = (2, −2).

Thus, we have three stationary points: P1: (−2, −2), P2: (1, 1) and P3: (2, −2). Next, we check the sufficient second-order conditions according to Theorem 11.9 and obtain

fxx(x, y) = 2x + y + 2,   fyy(x, y) = −y,   fxy(x, y) = fyx(x, y) = x.

This yields the Hessian

Hf(x, y) = ( fxx(x, y)  fxy(x, y) ; fyx(x, y)  fyy(x, y) ) = ( 2x + y + 2  x ; x  −y ),

and for the stationary points, we obtain

|Hf(−2, −2)| = fxx(−2, −2) · fyy(−2, −2) − [fxy(−2, −2)]² = (−4) · 2 − (−2)² = −12 < 0,
|Hf(1, 1)| = fxx(1, 1) · fyy(1, 1) − [fxy(1, 1)]² = 5 · (−1) − 1² = −6 < 0,
|Hf(2, −2)| = fxx(2, −2) · fyy(2, −2) − [fxy(2, −2)]² = 4 · 2 − 2² = 4 > 0.
Thus, according to Theorem 11.9, P1 and P2 are saddle points, and only point P3 is a local extreme point of function f. Due to fxx(2, −2) = 4 > 0 (or accordingly fyy(2, −2) = 2 > 0), point P3 is a local minimum point with the function value f(x3, y3) = −4.

We continue with another example of finding the maximum profit of a firm.

Example 11.21 We consider the profit maximization problem of a firm that uses two input factors F1 and F2 to produce one output. Let P be the constant price per unit of the output, x be the quantity of factor F1, y be the quantity of factor F2, and let p1 and p2 be the prices per unit of the two inputs. Moreover, function f : R²₊ → R is the production function of the firm, i.e. f(x, y) gives the quantity produced in dependence on the quantities x and y. Thus, P·f(x, y) denotes the revenue, from which the cost of production, i.e. p1x + p2y, has to be subtracted to get the profit function g : R²₊ → R. This problem can be written as

g(x, y) = P · f(x, y) − p1x − p2y → max!

The necessary first-order conditions (according to Theorem 11.7) for the optimal choices of the input factors F1 and F2 are as follows:

gx(x, y) = P · fx(x, y) − p1 = 0
gy(x, y) = P · fy(x, y) − p2 = 0.

Let us suppose that the production function f is of the Cobb–Douglas type given by

f(x, y) = x^(1/2) · y^(1/3).

Then the two necessary first-order optimality conditions according to Theorem 11.7 become

gx(x, y) = (1/2) · P · x^(−1/2) · y^(1/3) − p1 = 0
gy(x, y) = (1/3) · P · x^(1/2) · y^(−2/3) − p2 = 0.

Multiplying the first equation by x and the second one by y, we get

(1/2) · P · x^(1/2) · y^(1/3) − p1x = 0
(1/3) · P · x^(1/2) · y^(1/3) − p2y = 0.

Moreover, multiplying the first equation by −2/3 and summing up both equations gives

(2/3) · p1x − p2y = 0

and therefore

y = (2p1 / (3p2)) · x.

Substituting this into the above equation gx(x, y) = 0, we get

(1/2) · P · x^(−1/2) · ( (2p1/(3p2)) · x )^(1/3) − p1 = 0.
The latter can be rewritten as the equation

(1/2) · P · x^(−1/2 + 1/3) · (2p1/(3p2))^(1/3) = p1,

from which we obtain

x = [ (P/(2p1)) · (2p1/(3p2))^(1/3) ]⁶

and thus

x0 = (P/(2p1))⁶ · (2p1/(3p2))² = P⁶ / ((2p1)⁴ · (3p2)²) = (P/(2p1))⁴ · (P/(3p2))².

Furthermore, we get

y0 = (2p1/(3p2)) · x0 = (2p1/(3p2)) · (P/(2p1))⁴ · (P/(3p2))² = (P/(2p1))³ · (P/(3p2))³.

Thus, we have obtained the only stationary point

(x0, y0) = ( (P/(2p1))⁴ · (P/(3p2))², (P/(2p1))³ · (P/(3p2))³ )

for the quantities of both input factors F1 and F2. Checking the sufficient second-order conditions according to Theorem 11.9, we get

gxx(x, y) = −(1/4) · P · x^(−3/2) · y^(1/3),   gyy(x, y) = −(2/9) · P · x^(1/2) · y^(−5/3),
gxy(x, y) = gyx(x, y) = (1/6) · P · x^(−1/2) · y^(−2/3)

and consequently

|Hg(x, y)| = gxx(x, y) · gyy(x, y) − [gxy(x, y)]² = (2/36) · P² · x^(−1) · y^(−4/3) − (1/36) · P² · x^(−1) · y^(−4/3) = (1/36) · P² · x^(−1) · y^(−4/3).

Since x0 and y0 are positive, we have obtained the inequality |Hg(x0, y0)| > 0, which means that (x0, y0) is a local extreme point. Since gxx(x0, y0) < 0, point (x0, y0) is a local maximum point according to Theorem 11.9, and the corresponding supply is given by

f(x0, y0) = x0^(1/2) · y0^(1/3) = [ (P/(2p1))⁴ · (P/(3p2))² ]^(1/2) · [ (P/(2p1))³ · (P/(3p2))³ ]^(1/3) = (P/(2p1))³ · (P/(3p2))².

Example 11.22 Each of two enterprises E1 and E2 offers a single product. The relationships between the outputs x1, x2 and the prices p1, p2 are as follows:

x1 = 110 − 2p1 − p2,   x2 = 140 − p1 − 3p2.

The total costs of the enterprises are given by

C1(x1) = 120 + 2x1,   C2(x2) = 140 + 2x2.
(a) First, we determine the profit functions f1, f2 of both enterprises and the total profit function f = f1 + f2. We recall that the profit function fi, i ∈ {1, 2}, is obtained as the difference of the revenue xipi and the cost Ci = Ci(xi), i.e.

f1(p1, p2) = x1p1 − C1(x1) = (110 − 2p1 − p2) · p1 − [120 + 2(110 − 2p1 − p2)]
= 110p1 − 2p1² − p1p2 − 120 − 220 + 4p1 + 2p2
= −340 − 2p1² − p1p2 + 114p1 + 2p2,

f2(p1, p2) = x2p2 − C2(x2) = (140 − p1 − 3p2) · p2 − [140 + 2(140 − p1 − 3p2)]
= 140p2 − p1p2 − 3p2² − 140 − 280 + 2p1 + 6p2
= −420 − p1p2 − 3p2² + 2p1 + 146p2.

This yields the total profit function f = f1 + f2 as follows:

f(p1, p2) = −760 − 2p1² − 2p1p2 − 3p2² + 116p1 + 148p2.

(b) Next, we determine prices p1 and p2 such that the total profit is maximized. Applying Theorem 11.7, we get

fp1(p1, p2) = −4p1 − 2p2 + 116 = 0
fp2(p1, p2) = −2p1 − 6p2 + 148 = 0.

The latter system can be rewritten as

2p1 + p2 = 58
p1 + 3p2 = 74,

which has the unique solution p1⁰ = 20 and p2⁰ = 18. Checking the sufficient conditions according to Theorem 11.9, we obtain

|Hf(20, 18)| = fp1p1(20, 18) · fp2p2(20, 18) − [fp1p2(20, 18)]² = (−4) · (−6) − (−2)² = 20 > 0.

Therefore, the stationary point obtained is a local extreme point. From fp1p1(20, 18) = −4 < 0, we find that (p1⁰, p2⁰) = (20, 18) is a local maximum point with the total profit

f(20, 18) = −760 − 2 · 20² − 2 · 20 · 18 − 3 · 18² + 116 · 20 + 148 · 18 = 1,732.

(c) Following some problems between the enterprises, E2 fixes the price p2* = 22. Next, we want to investigate which price p1* would ensure a maximum profit f1 for enterprise E1 under this additional condition. In this case, we get a function f1 with

f1(p1, 22) = −340 − 2p1² − 22p1 + 114p1 + 44 = −296 − 2p1² + 92p1,

which now depends only on the variable p1. We denote F(p1) = f1(p1, 22).
Applying the criterion given in Chapter 4.5 (see Theorem 4.12), we get as a necessary condition

F′(p1) = −4p1 + 92 = 0,

which yields the only solution p1* = 23. Since F″(p1*) = −4 < 0, the solution obtained indeed gives the maximum profit of enterprise E1 under the above constraint p2* = 22. From the equality f(23, 22) = 1,642, the total profit of both enterprises is reduced by 1,732 − 1,642 = 90 units in this case.

Example 11.23 The profit function P : R³₊ → R for some agricultural product is a function of the quantities used of three sorts of chemical fertilizer. Let function P with

P(C1, C2, C3) = 340 − (1/3)C1³ + (1/2)C1² + 2C1C2 − C2² + C2C3 − (1/2)C3² + 2C1 + C2 + C3

be given, where Ci denotes the quantity of the ith chemical fertilizer used in tons, i ∈ {1, 2, 3}, and P(C1, C2, C3) is the resulting profit in thousands of EUR. We determine the quantities of the three sorts of chemical fertilizer such that the profit is maximized. Checking the necessary conditions of Theorem 11.7, we get

PC1 = −C1² + C1 + 2C2 + 2 = 0    (11.11)
PC2 = 2C1 − 2C2 + C3 + 1 = 0    (11.12)
PC3 = C2 − C3 + 1 = 0.    (11.13)

Adding equations (11.12) and (11.13), we obtain

C2 = 2C1 + 2 = 2(C1 + 1).    (11.14)

Inserting equation (11.14) into equation (11.11), we get

−C1² + C1 + 4C1 + 4 + 2 = 0,

which can be rewritten as

C1² − 5C1 − 6 = 0.

The latter quadratic equation has the two solutions C1⁽¹⁾ = 6 and C1⁽²⁾ = −1. The second root C1⁽²⁾ = −1 does not lead to a point belonging to the domain of function P. Using C1⁽¹⁾ = 6, we get from equations (11.14) and (11.13)

C2⁽¹⁾ = 2(C1⁽¹⁾ + 1) = 14   and   C3⁽¹⁾ = C2⁽¹⁾ + 1 = 15.

So, we have only one stationary point (6, 14, 15) within the domain of function P. Next, we check whether this point is indeed a local maximum point.
We get the Hessian HP(C1, C2, C3) of function P as follows:

HP(C1, C2, C3) = ( −2C1 + 1    2     0
                       2      −2     1
                       0       1    −1 ).

Thus, for point (C1^(1), C2^(1), C3^(1)) = (6, 14, 15), we obtain

HP(6, 14, 15) = ( −11    2     0
                    2   −2     1
                    0    1    −1 ).

Checking the sufficient conditions for a local maximum point, we determine the leading principal minors D1, D2 and D3 of matrix HP(6, 14, 15) and test the conditions of Theorem 11.8 using Theorem 10.7:

D1 = −11 < 0,

D2 = | −11    2 |
     |   2   −2 | = 18 > 0,

D3 = | −11    2    0 |
     |   2   −2    1 |
     |   0    1   −1 | = −7 < 0.

Thus, from Theorem 10.7 the Hessian HP(C1, C2, C3) is negative definite at point (6, 14, 15), and this point is indeed the only local maximum point found by differential calculus. Since function P has no larger objective function values on the boundary of the domain and since function P has continuous partial derivatives, the obtained solution is also the global maximum point with function value P(6, 14, 15) = 396.5, which means that the resulting profit is equal to 396,500 EUR.

11.7.2 Method of least squares

We consider the following application of the determination of an extreme point of a function of several variables. Assume that n points (xi, yi), i = 1, 2, . . . , n, are given. The aim is to determine a function f of a specified class (for instance a linear function or a quadratic function) such that f(x) describes the relationship between x and y 'as well as possible' for this class of functions. We apply the criterion of minimizing the sum of the squared differences between yi and f(xi) (known as the Gaussian method of least squares), i.e.

Q = Σ_{i=1}^n (f(xi) − yi)² → min!

It is assumed that we look for an approximation by a linear function f with f(x) = ax + b (see Figure 11.8).
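The criterion Q can be evaluated directly for any candidate line; a small sketch (the data and names are ours) illustrates that a line fitting the data well yields a much smaller Q than one that does not:

```python
# Sum of squared deviations Q for a candidate line f(x) = a*x + b
def Q(a, b, xs, ys):
    return sum((a*x + b - y)**2 for x, y in zip(xs, ys))

xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]       # data scattered roughly around y = 2x + 1

print(Q(2, 1, xs, ys))           # small: this line fits the points well
print(Q(0, 4, xs, ys))           # much larger for the horizontal line y = 4
```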
The objective formulated above leads to the following optimization problem with function Q = Q(a, b) depending on the two arguments a and b:

Q(a, b) = Σ_{i=1}^n (a xi + b − yi)² → min!

Figure 11.8 Approximation by a linear function f with f(x) = ax + b.

As we know from Theorem 11.7, the necessary conditions for an extreme point are that the first-order partial derivatives with respect to the variables a and b are equal to zero:

Qa(a, b) = 0 and Qb(a, b) = 0.

Determining the first-order partial derivatives of function Q, we get:

Qa(a, b) = 2 Σ_{i=1}^n xi (a xi + b − yi) = 0
Qb(a, b) = 2 Σ_{i=1}^n (a xi + b − yi) = 0.

This is a system of two linear equations with the two variables a and b, which can be solved by applying Gaussian elimination or the pivoting procedure (we leave this calculation to the reader). As a result, we obtain the values of the parameters a and b as follows:

a = [ n · Σ_{i=1}^n xi yi − (Σ_{i=1}^n xi) · (Σ_{i=1}^n yi) ] / [ n · Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ],    (11.15)

b = [ (Σ_{i=1}^n xi²) · (Σ_{i=1}^n yi) − (Σ_{i=1}^n xi) · (Σ_{i=1}^n xi yi) ] / [ n · Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² ].

The latter equality can be rewritten in terms of the parameter a calculated according to equation (11.15) as follows:

b = ( Σ_{i=1}^n yi − a · Σ_{i=1}^n xi ) / n.    (11.16)

One can show that the denominator of the value a is equal to zero if and only if all xi values are equal. Thus, we get a unique candidate for a local extreme point of function Q provided that not all xi are equal. Applying Theorem 11.9, we find that a and b obtained above satisfy the sufficient conditions for a local minimum point, and we have determined the 'best' straight line 'through' this set of n points according to the criterion of minimizing the sum of the squared differences between yi and the function values f(xi) of the determined straight line.
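Formulas (11.15) and (11.16) translate directly into code; a minimal sketch (the function name fit_line is ours):

```python
def fit_line(xs, ys):
    """Least-squares line f(x) = a*x + b via (11.15) and (11.16)."""
    n = len(xs)
    sx  = sum(xs)
    sy  = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # (11.15)
    b = (sy - a * sx) / n                           # (11.16)
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly:
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)   # 2.0 1.0
```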
Example 11.24 The steel production of Nowhereland during the period 1995–2001 (in millions of tons) was as shown in Table 11.1:

Table 11.1 Data for Example 11.24

Year        1995   1996   1997   1998   1999   2000   2001
Production  450.2  458.6  471.4  480.9  492.6  501.0  512.2

Drawing the relationship between the year and the steel production in a coordinate system, we see that it is justified to describe it by a linear function, and therefore we set f(x) = ax + b. First, we assign a value xi to each year as follows and compute the required values.

Table 11.2 Application of Gaussian method of least squares

Year   i    xi     yi        xi²    xi yi
1995   1    −3     450.2      9     −1,350.6
1996   2    −2     458.6      4     −917.2
1997   3    −1     471.4      1     −471.4
1998   4     0     480.9      0      0
1999   5     1     492.6      1      492.6
2000   6     2     501.0      4      1,002.0
2001   7     3     512.2      9      1,536.6
Sum          0     3,366.9   28      292.0

In Table 11.2, we have taken the year 1998 as origin, x4 = 0 (of course, this choice is arbitrary). With our choice, the x values are integers ranging from −3 to 3, i.e. they are symmetric around zero. Such a choice is recommended for cases when n is odd and the distances between successive x values are equal, since it simplifies the calculations. Using n = 7, we can immediately apply formulas (11.15) and (11.16), and we obtain

a = [ n · Σ xi yi − (Σ xi) · (Σ yi) ] / [ n · Σ xi² − (Σ xi)² ] = (7 · 292.0 − 0 · 3,366.9) / (7 · 28 − 0²) ≈ 10.429,

b = ( Σ yi − a · Σ xi ) / n = (3,366.9 − 10.429 · 0) / 7 ≈ 480.986.

The above values for the parameters a and b have been rounded to three decimal places. Thus, we have found function f with

f(x) ≈ 10.429x + 480.986

describing the steel production between 1995 and 2001 (i.e. for the years x ∈ {−3, −2, −1, 0, 1, 2, 3}). Using this trend function, we obtain as the expected steel production for the year 2006 (i.e. for x = 8):

f(8) ≈ 10.429 · 8 + 480.986 = 564.418 millions of tons.
Example 11.25 Assume that the following ordered pairs have been found experimentally: (x1, y1) = (1, 3), (x2, y2) = (2, 8), (x3, y3) = (−1, 0) and (x4, y4) = (−2, 1). Further, we assume that the relationship between the variables x and y should be expressed by a function f of the class f(x) = ax + bx². In this case, we cannot use formulas (11.15) and (11.16) for calculating the parameters a and b, but have to derive formulas for determining the parameters of the required function. This may be done in quite a similar way as for the approximation by a linear function. According to the Gaussian method of least squares we get the problem

Q(a, b) = Σ_{i=1}^n (a xi + b xi² − yi)² → min!

Using n = 4, we obtain the first-order necessary conditions for a local extreme point:

Qa(a, b) = 2 Σ_{i=1}^4 xi (a xi + b xi² − yi) = 0
Qb(a, b) = 2 Σ_{i=1}^4 xi² (a xi + b xi² − yi) = 0.

We can rewrite the above conditions as follows:

Σ_{i=1}^4 xi yi = a Σ_{i=1}^4 xi² + b Σ_{i=1}^4 xi³    (11.17)
Σ_{i=1}^4 xi² yi = a Σ_{i=1}^4 xi³ + b Σ_{i=1}^4 xi⁴.    (11.18)

For the given points we have Σ xi² = 10, Σ xi³ = 0, Σ xi⁴ = 34, Σ xi yi = 17 and Σ xi² yi = 39, so the two equations decouple. From equation (11.17) we obtain 10a = 17 and thus a = 1.7, and from equation (11.18) we obtain 34b = 39 and thus b = 39/34 ≈ 1.147. Hence the best function f of the chosen class according to Gauss's method of least squares is approximately given by

f(x) ≈ 1.7x + 1.147x².

11.7.3 Extreme points of implicit functions

Next, we give a criterion for the determination of local extreme points in the case of an implicitly defined function F(x, y) = 0.

THEOREM 11.10 Let function f : Df → R, Df ⊆ R, be implicitly given by F(x, y) = 0 and let function F be twice continuously partially differentiable.
If the four conditions

(1) F(x0, y0) = 0, (2) Fx(x0, y0) = 0, (3) Fy(x0, y0) ≠ 0 and (4) Fxx(x0, y0) ≠ 0

are satisfied, then (x0, y0) is a local extreme point of function f; in particular:

(1) If Fxx(x0, y0) / Fy(x0, y0) > 0, then (x0, y0) is a local maximum point.
(2) If Fxx(x0, y0) / Fy(x0, y0) < 0, then (x0, y0) is a local minimum point.

Example 11.26 Given

F(x, y) = −x² + 2y² − 8 = 0,

we determine local extreme points by applying Theorem 11.10. To find the stationary points, we check

Fx(x, y) = −2x = 0,

from which we obtain the only solution x1 = 0. Substituting this into F(x, y) = 0, we get the two solutions y1 = 2 and y2 = −2, i.e. there are two stationary points P1 : (x1, y1) = (0, 2) and P2 : (x1, y2) = (0, −2). Furthermore, we get

Fy(x, y) = 4y and Fxx(x, y) = −2.

For point P1, this yields

Fxx(0, 2) / Fy(0, 2) = −2/8 < 0,

i.e. point (0, 2) is a local minimum point. For point P2, we get

Fxx(0, −2) / Fy(0, −2) = −2/(−8) > 0,

i.e. point (0, −2) is a local maximum point. It should be noted that F(x, y) = 0 does not define a function here but only a relation; nevertheless, we can apply Theorem 11.10 for finding local extreme points.

11.8 CONSTRAINED OPTIMIZATION

So far, we have dealt with optimization problems without additional constraints. However, in economics one often looks for an optimum of a function subject to certain constraints. Such problems are known as constrained optimization problems. We have already considered problems where the output of a firm depends on the quantity of capital K and the quantity of labour L used. A natural question in economics is the maximization of the output subject to certain budget constraints. This means that the quantities of capital and labour which can be used in production are bounded by some given constants. In the case of a linear objective function and linear inequality constraints, we have already considered such problems in Chapter 9.
Here, we consider the minimization (or maximization) of a function subject to some equality constraints. Such a problem may arise if the output Q(K, L) has to be equal to a given level Q*, i.e. production takes place on a given isoquant. The production cost depends on the quantities of capital and labour used. In that case, the objective is the minimization of a cost function f depending on the variables K and L subject to the constraint Q(K, L) = Q*. In this section, we consider the general case of constrained optimization problems with non-linear objective and/or constraint functions.

11.8.1 Local optimality conditions

We consider the problem

z = f(x1, x2, . . . , xn) → min! (max!)
s.t. g1(x1, x2, . . . , xn) = 0
     g2(x1, x2, . . . , xn) = 0
     . . .
     gm(x1, x2, . . . , xn) = 0,

and we assume that m < n. Next, we discuss two basic methods of solving the above problem.

Optimization by substitution

If the constraints gi(x1, x2, . . . , xn) = 0, i = 1, 2, . . . , m, can be put into their reduced form (according to Theorem 11.6, this means that the Jacobian determinant at point x = (x1, x2, . . . , xn) is different from zero)

x1 = w1(xm+1, xm+2, . . . , xn)
x2 = w2(xm+1, xm+2, . . . , xn)
. . .
xm = wm(xm+1, xm+2, . . . , xn),

we can substitute the terms for x1, x2, . . . , xm into the given function f. As a result, we get an unconstrained problem with n − m variables

W(xm+1, xm+2, . . . , xn) → min! (max!),

which we can treat by the methods already presented. Necessary and sufficient optimality conditions for the case n = m + 1 (i.e. function W depends on one variable) have been given in Chapter 4.5, and for the case n > m + 1 (function W depends on several variables) in Chapter 11.7. This method is known as optimization by substitution. The procedure can also be applied to non-linear functions provided that it is possible to eliminate the variables x1, x2, . . . , xm.
This approach is mainly applied in the case when we look for the optimum of a function of two variables subject to one constraint (i.e. n = 2, m = 1).

Lagrange multiplier method

In what follows, we discuss an alternative general procedure known as the Lagrange multiplier method, which works as follows. We introduce a new variable λi for each constraint i (1 ≤ i ≤ m) and add all the constraints together with the multipliers to the function to be minimized (or maximized). Thus, we obtain:

L(x; λ) = L(x1, x2, . . . , xn; λ1, λ2, . . . , λm) = f(x1, x2, . . . , xn) + Σ_{i=1}^m λi · gi(x1, x2, . . . , xn).

The above function L is called the Lagrangian function and the parameters λi are known as Lagrangian multipliers. Note that function L depends on n + m variables. The following theorem gives a necessary condition for the existence of a local extreme point of the Lagrangian function.

THEOREM 11.11 (Lagrange's theorem) Let functions f : Df → R, Df ⊆ Rⁿ, and gi : Dgi → R, Dgi ⊆ Rⁿ, i = 1, 2, . . . , m, m < n, be continuously partially differentiable and let x⁰ = (x1⁰, x2⁰, . . . , xn⁰) ∈ Df be a local extreme point of function f subject to the constraints gi(x1, x2, . . . , xn) = 0, i = 1, 2, . . . , m. Moreover, let |J(x1⁰, x2⁰, . . . , xn⁰)| ≠ 0. Then

grad L(x⁰; λ⁰) = 0.    (11.19)

Condition (11.19) in Theorem 11.11 means that the Lagrangian function L(x; λ) has a stationary point at (x1⁰, x2⁰, . . . , xn⁰; λ1⁰, λ2⁰, . . . , λm⁰), and it can be written in detail as follows:

Lxj(x⁰; λ⁰) = fxj(x1⁰, x2⁰, . . . , xn⁰) + Σ_{i=1}^m λi⁰ · ∂gi(x1⁰, x2⁰, . . . , xn⁰)/∂xj = 0,  j = 1, 2, . . . , n,
Lλi(x⁰; λ⁰) = gi(x1⁰, x2⁰, . . . , xn⁰) = 0,  i = 1, 2, . . . , m.

The second part of the formula above, which requires that the partial derivatives of function L with respect to all variables λi must be equal to zero, guarantees that only vectors x are obtained which satisfy all constraints.
Assume that we have determined the stationary points of the Lagrangian function by means of Theorem 11.11. We now look for sufficient conditions to verify whether a stationary point is indeed an (isolated) local extreme point and, if so, whether it is a local minimum or a local maximum point. In the case of an unconstrained optimization problem, a criterion has already been given, where the Hessian has to be checked. As the following theorem shows, in the case of constrained optimization problems, we have to check the so-called bordered Hessian, which includes the usual Hessian in its lower right part as a submatrix.

THEOREM 11.12 (local sufficient conditions) Let functions f : Df → R, Df ⊆ Rⁿ, and gi : Dgi → R, Dgi ⊆ Rⁿ, i = 1, 2, . . . , m, m < n, be twice continuously partially differentiable and let (x⁰; λ⁰) with x⁰ ∈ Df be a solution of the system grad L(x; λ) = 0. Moreover, let

HL(x; λ) = (      0        · · ·       0         Lλ1x1(x; λ)   · · ·   Lλ1xn(x; λ)
                  ·                    ·              ·                     ·
                  0        · · ·       0         Lλmx1(x; λ)   · · ·   Lλmxn(x; λ)
             Lx1λ1(x; λ)   · · ·  Lx1λm(x; λ)    Lx1x1(x; λ)   · · ·   Lx1xn(x; λ)
                  ·                    ·              ·                     ·
             Lxnλ1(x; λ)   · · ·  Lxnλm(x; λ)    Lxnx1(x; λ)   · · ·   Lxnxn(x; λ) )

be the bordered Hessian and consider its leading principal minors |H̄j(x⁰; λ⁰)| of the order j = 2m + 1, 2m + 2, . . . , n + m at point (x⁰; λ⁰). Then:

(1) If all leading principal minors |H̄j(x⁰; λ⁰)|, 2m + 1 ≤ j ≤ n + m, have the sign (−1)^m, then x⁰ = (x1⁰, x2⁰, . . . , xn⁰) is a local minimum point of function f subject to the constraints gi(x) = 0, i = 1, 2, . . . , m.
(2) If all leading principal minors |H̄j(x⁰; λ⁰)|, 2m + 1 ≤ j ≤ n + m, alternate in sign, the sign of |H̄n+m(x⁰; λ⁰)| = |HL(x⁰; λ⁰)| being that of (−1)^n, then x⁰ = (x1⁰, x2⁰, . . . , xn⁰) is a local maximum point of function f subject to the constraints gi(x) = 0, i = 1, 2, . . . , m.
(3) If neither the conditions of (1) nor those of (2) are satisfied, then x⁰ is not a local extreme point of function f subject to the constraints gi(x) = 0, i = 1, 2, . . . , m. Here the case when one or several leading principal minors have value zero is not considered a violation of condition (1) or (2).

It is worth noting that the order of the bordered Hessian HL is n + m if the problem includes n variables xi and m constraints. We illustrate the application of Theorem 11.12 for two special cases. Assume first that n = 2 and m = 1, i.e. function f depends on two variables x, y and there is one constraint g(x, y) = 0. In that case, we have 2m + 1 = 3, and only the determinant |H̄3(x0, y0; λ0)| = |HL(x0, y0; λ0)| has to be evaluated at the stationary point (x0, y0; λ0). If |H̄3(x0, y0; λ0)| < 0, i.e. the sign of the determinant is equal to (−1)^m = (−1)¹ = −1, then (x0, y0) is a local minimum point according to part (1) of Theorem 11.12. If |H̄3(x0, y0; λ0)| > 0, i.e. the sign of the determinant is equal to (−1)^n = (−1)² = 1, then (x0, y0) is a local maximum point according to part (2) of Theorem 11.12. However, if |H̄3(x0, y0; λ0)| = 0, then a decision cannot be made on the basis of Theorem 11.12 since, due to the comment at the end of part (3), none of the conditions (1), (2) or (3) is satisfied (i.e. neither condition (1) nor condition (2) holds, nor does the violation mentioned in (3) occur).

Assume now that n = 3 and m = 1, i.e. function f depends on three variables and there is one constraint g(x) = 0. In this case, the bordered Hessian HL has order n + m = 4. Moreover, we have 2m + 1 = 3, i.e. the leading principal minors |H̄3(x⁰; λ⁰)| and |H̄4(x⁰; λ⁰)| = |HL(x⁰; λ⁰)| have to be evaluated. From Theorem 11.12 we can draw the following conclusions.
If |H̄4(x⁰; λ⁰)| > 0, point x⁰ is not a local extreme point due to part (3) of Theorem 11.12, since the sign of |H̄4(x⁰; λ⁰)| is different both from (−1)^m = (−1)¹ = −1 and from (−1)^n = (−1)³ = −1. If |H̄4(x⁰; λ⁰)| < 0 and |H̄3(x⁰; λ⁰)| < 0, i.e. both leading principal minors have the sign (−1)^m = (−1)¹ = −1, then according to part (1) of Theorem 11.12, x⁰ is a local minimum point. If |H̄4(x⁰; λ⁰)| < 0 but |H̄3(x⁰; λ⁰)| > 0, i.e. the leading principal minors alternate in sign and the sign of |H̄4(x⁰; λ⁰)| is equal to (−1)^n = (−1)³ = −1, then according to part (2) of Theorem 11.12, x⁰ is a local maximum point. If |H̄4(x⁰; λ⁰)| < 0 but |H̄3(x⁰; λ⁰)| = 0, then a decision about an extreme point cannot be made. (Neither condition (1) nor condition (2) is satisfied, but according to the remark at the end of part (3), none of the conditions is violated.) If |H̄4(x⁰; λ⁰)| = 0, then a decision about an extreme point cannot be made independently of the value of the leading principal minor |H̄3(x⁰; λ⁰)| (see the remark at the end of part (3) of Theorem 11.12).

Example 11.27 We want to determine all local minima and maxima of function f : R² → R with

z = f(x, y) = x² + 4y² − 3

subject to the constraint

g(x, y) = 4x² − 16x + y² + 12 = 0.

We formulate the Lagrangian function depending on the three variables x, y and λ, which gives

L(x, y; λ) = x² + 4y² − 3 + λ(4x² − 16x + y² + 12).

The necessary conditions for a local optimum according to Theorem 11.11 are

Lx(x, y; λ) = 2x + 8λx − 16λ = 0    (11.20)
Ly(x, y; λ) = 8y + 2λy = 0    (11.21)
Lλ(x, y; λ) = 4x² − 16x + y² + 12 = 0.    (11.22)

Factoring out 2y in equation (11.21), we obtain

2y · (4 + λ) = 0.

Thus, we have to consider the following two cases.

Case 1 y = 0. Substituting y = 0 into equation (11.22), we get 4x² − 16x + 12 = 0 and x² − 4x + 3 = 0, respectively.
This yields

x1 = 2 + √(4 − 3) = 3 and x2 = 2 − √(4 − 3) = 1,

and from equation (11.20) we obtain the corresponding λ-values λ1 = −3/4 and λ2 = 1/4. Hence point

P1 : (x1, y1) = (3, 0); with λ1 = −3/4

and point

P2 : (x2, y2) = (1, 0); with λ2 = 1/4

are candidates for a local extreme point in this case.

Case 2 4 + λ = 0, i.e. λ = −4. Then we obtain from (11.20) the equation

2x − 32x + 64 = 0,

which has the solution

x3 = 64/30 = 32/15.

Substituting the latter into equation (11.22), we obtain the equation

4 · (32/15)² − 16 · (32/15) + y² + 12 = 0,

which yields

y² = 884/225.

Hence we also obtain the following points

P3 : (x3, y3) = (32/15, √884/15 ≈ 1.98); with λ3 = −4

and

P4 : (x3, y4) = (32/15, −√884/15 ≈ −1.98); with λ4 = −4

as candidates for a local extreme point. Next, we check the local sufficient conditions for a local extreme point given in Theorem 11.12. We first determine

Lxx(x, y; λ) = 2 + 8λ
Lxy(x, y; λ) = 0 = Lyx(x, y; λ)
Lyy(x, y; λ) = 8 + 2λ
Lxλ(x, y; λ) = 8x − 16 = Lλx(x, y; λ)
Lyλ(x, y; λ) = 2y = Lλy(x, y; λ).

Thus, we obtain the bordered Hessian

HL(x, y; λ) = (    0       8x − 16      2y
               8x − 16     2 + 8λ       0
                  2y          0      8 + 2λ ).

Since we have two variables in function f and one constraint, we apply Theorem 11.12 with n = 2 and m = 1. This means that we have to check only the leading principal minor for j = 2m + 1 = n + m = 3, i.e. we have to check only the determinant of the bordered Hessian to apply Theorem 11.12.
We obtain for the four stationary points of the Lagrangian function:

|HL(3, 0; −3/4)| = | 0    8     0  |
                   | 8   −4     0  |
                   | 0    0    6.5 | = 6.5 · | 0   8 |
                                            | 8  −4 | = 6.5 · (−64) < 0,

|HL(1, 0; 1/4)| = |  0   −8     0  |
                  | −8    4     0  |
                  |  0    0    8.5 | = 8.5 · | 0   −8 |
                                            | −8   4 | = 8.5 · (−64) < 0,

|HL(32/15, √884/15; −4)| = |     0        16/15    2√884/15 |
                           |   16/15      −30         0     |
                           | 2√884/15      0          0     | > 0,

|HL(32/15, −√884/15; −4)| = |      0        16/15    −2√884/15 |
                            |    16/15      −30          0     |
                            | −2√884/15      0           0     | > 0.

Hence, due to part (1) of Theorem 11.12, the points (3, 0) and (1, 0) are local minimum points, since in both cases the sign of the determinant of the bordered Hessian is equal to (−1)^m = (−1)¹ = −1, and due to part (2) of Theorem 11.12, the points (32/15, √884/15) and (32/15, −√884/15) are local maximum points, since the sign of the determinant of the bordered Hessian is equal to (−1)^n = (−1)² = +1. Notice that in this problem the check of the conditions of Theorem 11.12 reduces to the determination of the sign of the determinant of the bordered Hessian (however, in the general case of a constrained optimization problem we might have to check the signs of several leading principal minors in order to verify a local extreme point). Some level curves of function f together with the constraint g(x, y) = 0 are given in Figure 11.9.
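The four sign tests above can be reproduced numerically (a sketch; det3 and HL are our helper names):

```python
import math

# 3x3 determinant by cofactor expansion along the first row
def det3(m):
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

# Bordered Hessian of L(x, y; lam) for Example 11.27
def HL(x, y, lam):
    return [[0,        8*x - 16,  2*y      ],
            [8*x - 16, 2 + 8*lam, 0        ],
            [2*y,      0,         8 + 2*lam]]

pts = [(3, 0, -0.75), (1, 0, 0.25),
       (32/15,  math.sqrt(884)/15, -4),
       (32/15, -math.sqrt(884)/15, -4)]
for x, y, lam in pts:
    # negative determinant -> local minimum, positive -> local maximum (n=2, m=1)
    print((round(x, 3), round(y, 3)), det3(HL(x, y, lam)) > 0)
```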
To illustrate, we consider the equation f(x, y) = 0 corresponding to x² + 4y² = 3, which can be rewritten as

(x/√3)² + (y/(√3/2))² = 1.

The latter is the equation of an ellipse with the half-axes a = √3 and b = √3/2. If we now consider f(x, y) = C with C ≠ 0, we also get ellipses with different half-axes, where the half-axes are larger the bigger the value of C is. The constraint g(x, y) = 0 can be rewritten as

4(x² − 4x) + y² + 12 = 4(x² − 4x + 4) + y² − 4 = 0,

which also corresponds to the equation of an ellipse:

(x − 2)² + (y/2)² = 1.

Figure 11.9 Level curves for function f with f(x, y) = x² + 4y² − 3.

From Figure 11.9 one can confirm that z1 = 6 and z2 = −2 are indeed the local minima and z3 = z4 = 3,885/225 ≈ 17.3 are the local maxima of function f subject to the given constraint.

Example 11.28 A utility function assigns to each point (vector) x = (x1, x2, . . . , xn), where the ith component xi describes the amount of the ith commodity (i = 1, 2, . . . , n; such a vector is also called a commodity bundle), a real number u(x1, x2, . . . , xn) which measures the degree of satisfaction or utility of the consumer with the given commodity bundle. Assume that a Cobb–Douglas utility function u : R²₊ → R₊ depending on the amounts of two commodities with

u(x, y) = x²y³

is given. We wish to maximize this utility function subject to the constraint 7x + 5y = 105, which describes the budget constraint. It is convenient to take logarithms of function u and work with the equation

ln u(x, y) = 2 ln x + 3 ln y.

This is possible since, due to the strict monotonicity of the logarithmic function, it has no influence on the optimal solution whether we maximize function u or function ln u. Thus, we have to solve the following constrained optimization problem:

f(x, y) = 2 ln x + 3 ln y → max!
subject to 7x + 5y = 105.

We consider two approaches to solving this problem.
The first way is to apply optimization by substitution, which means that we substitute the budget constraint into the function of the maximization problem from the beginning. Solving the budget constraint for y yields

y = w(x) = 21 − (7/5)x.

If we do this, the problem is as follows:

W(x) = 2 ln x + 3 ln(21 − (7/5)x) → max!

The first-order necessary optimality condition for this problem is as follows (see Chapter 4.5, Theorem 4.12):

W'(x) = 2/x − 3 · (7/5) · 5/(105 − 7x) = 0,

which can be written as

W'(x) = 2/x − 21/(105 − 7x) = 0.

The latter equation yields the solution x⁰ = 6. Substituting this back into the budget constraint, we get

y⁰ = 21 − (7/5) · 6 = 12.6.

To check the sufficient condition given in Chapter 4.5 (see Theorem 4.14), we determine

W''(x) = −2/x² − 147/(105 − 7x)².

Thus, we obtain for x⁰ = 6

W''(6) = −2/36 − 147/63² ≈ −0.0926 < 0,

and we can conclude that x⁰ = 6 is a local maximum point of function W.

The second way to solve this example is to apply Lagrange's multiplier method. We set up the Lagrangian function L with

L(x, y; λ) = 2 ln x + 3 ln y + λ(7x + 5y − 105)

and differentiate it to get the three first-order necessary optimality conditions according to Theorem 11.11:

Lx(x, y; λ) = 2/x + 7λ = 0    (11.23)
Ly(x, y; λ) = 3/y + 5λ = 0    (11.24)
Lλ(x, y; λ) = 7x + 5y − 105 = 0.    (11.25)

Let us solve this system first for λ and then for x and y. After cross-multiplying, we get from equations (11.23) and (11.24)

7λx = −2 and 5λy = −3.

We now add these two equations and use equation (11.25):

−5 = λ(7x + 5y) = 105λ,

which gives

λ⁰ = −1/21.

Just as before, substituting this back into equations (11.23) and (11.24) and solving them for x and y, we get x⁰ = 6 and y⁰ = 12.6.
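Both approaches can be checked with a few lines of code (a sketch; W and dW are our names):

```python
import math

# Substitution approach: W(x) = 2 ln x + 3 ln(21 - 7x/5)
def W(x):  return 2*math.log(x) + 3*math.log(21 - 7*x/5)
def dW(x): return 2/x - 21/(105 - 7*x)

x0 = 6
print(dW(x0))                         # 0.0 -> stationary point
d2W = -2/x0**2 - 147/(105 - 7*x0)**2
print(round(d2W, 4))                  # -0.0926 < 0 -> maximum

# Lagrange approach: lam = -1/21, then x = -2/(7*lam), y = -3/(5*lam)
lam = -1/21
x, y = -2/(7*lam), -3/(5*lam)
print(round(x, 6), round(y, 6))       # 6.0 12.6
```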
To check the sufficient conditions given in Theorem 11.12, we determine

Lxx(x, y; λ) = −2/x²
Lxy(x, y; λ) = 0 = Lyx(x, y; λ)
Lyy(x, y; λ) = −3/y²
Lxλ(x, y; λ) = 7 = Lλx(x, y; λ)
Lyλ(x, y; λ) = 5 = Lλy(x, y; λ),

and obtain the bordered Hessian:

HL(x, y; λ) = ( 0       7        5
                7    −2/x²       0
                5       0     −3/y² ).

Since n = 2 and m = 1, according to Theorem 11.12 we have to check only one leading principal minor for j = 2m + 1 = 3, which corresponds to the determinant of the bordered Hessian. For point (x⁰, y⁰; λ⁰) = (6, 12.6; −1/21), we get

HL(6, 12.6; −1/21) = ( 0       7          5
                       7    −1/18         0
                       5       0     −3/12.6² ).

Expanding along the first row, we obtain

|HL(6, 12.6; −1/21)| = (−7) · | 7      0     |
                              | 5  −3/12.6²  |  +  5 · | 7  −1/18 |
                                                       | 5    0   |  > 0.

Notice that we do not have to calculate the values of both determinants above; we only need their signs. Since the first summand obviously has a value greater than zero (the first determinant has a negative value, multiplied by the negative factor −7) and the second determinant also has a value greater than zero, the determinant of the bordered Hessian is greater than zero. According to part (2) of Theorem 11.12, we find that P⁰ : (x⁰, y⁰) = (6, 12.6) is a local maximum point, since the sign of the determinant of the bordered Hessian is equal to (−1)^n = (−1)² = +1.

11.8.2 Global optimality conditions

Next, we present a sufficient condition for a global maximum and a global minimum point.

THEOREM 11.13 (global sufficient conditions) Let functions f : Df → R, Df ⊆ Rⁿ, and gi : Dgi → R, Dgi ⊆ Rⁿ, i = 1, 2, . . . , m, m < n, be twice continuously partially differentiable and let (x⁰; λ⁰) with x⁰ ∈ Df be a solution of the system grad L(x; λ) = 0. If the Hessian

H(x; λ⁰) = ( Lx1x1(x; λ⁰)   · · ·   Lx1xn(x; λ⁰)
                  ·                      ·
                  ·                      ·
             Lxnx1(x; λ⁰)   · · ·   Lxnxn(x; λ⁰) )

(1) is negative definite for all points x ∈ Df, then x⁰ is the global maximum point of function f subject to the constraints gi(x) = 0, i = 1, 2, . . . , m;
(2) is positive definite for all points x ∈ Df, then x⁰ is the global minimum point of function f subject to the constraints gi(x) = 0, i = 1, 2, . . . , m.

Theorem 11.13 requires us to check whether the Hessian is positive or negative definite for all points x ∈ Df, which can be rather complicated or even impossible. If function f and the constraints gi contain only quadratic and linear terms, the second-order partial derivatives are constant, and an evaluation of the definiteness of H(x; λ⁰) is possible. We look at the following example.

Example 11.29 Given is the function f : Df → R, Df ⊆ R³, with

f(x, y, z) = 2x + 2y − 2z
s.t. x² + 2y² + 3z² = 66.

We determine all global optimum points by applying Theorems 11.11 and 11.13. Setting up the Lagrangian function, we get

L(x, y, z; λ) = 2x + 2y − 2z + λ(x² + 2y² + 3z² − 66).

Applying Theorem 11.11, we get the necessary optimality conditions

Lx = 2 + 2λx = 0    (11.26)
Ly = 2 + 4λy = 0    (11.27)
Lz = −2 + 6λz = 0    (11.28)
Lλ = x² + 2y² + 3z² − 66 = 0.    (11.29)

From equations (11.26) to (11.28), we obtain

x = −1/λ,  y = −1/(2λ)  and  z = 1/(3λ).

Substituting these terms for x, y and z into equation (11.29), we obtain

x² + 2y² + 3z² = 1/λ² + 2 · 1/(4λ²) + 3 · 1/(9λ²) = 66,

which can be rewritten as

(1/λ²) · (1 + 1/2 + 1/3) = (11/6) · (1/λ²) = 66.

This yields

λ² = 1/36,

which gives the two solutions

λ1 = 1/6 and λ2 = −1/6.

Using equations (11.26) to (11.28), we obtain the following two stationary points of the Lagrangian function:

P1 : x1 = −6, y1 = −3, z1 = 2; with λ1 = 1/6

and

P2 : x2 = 6, y2 = 3, z2 = −2; with λ2 = −1/6.

We now check the sufficient conditions for a global optimum point given in Theorem 11.13.
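A quick numerical check confirms that both candidate points satisfy the constraint, and gives their objective values (a sketch; names are ours):

```python
# Example 11.29: f(x, y, z) = 2x + 2y - 2z on the ellipsoid x^2 + 2y^2 + 3z^2 = 66
def f(x, y, z): return 2*x + 2*y - 2*z
def g(x, y, z): return x**2 + 2*y**2 + 3*z**2 - 66

for lam in (1/6, -1/6):
    x, y, z = -1/lam, -1/(2*lam), 1/(3*lam)
    print((round(x, 6), round(y, 6), round(z, 6)),
          round(g(x, y, z), 9), round(f(x, y, z), 6))
# lam =  1/6 -> (-6, -3, 2), constraint satisfied, f = -22
# lam = -1/6 -> ( 6,  3, -2), constraint satisfied, f =  22
```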
For point P1, we obtain

H(x, y, z; λ1) = ( 2λ1    0     0          ( 1/3    0     0
                    0    4λ1    0      =      0    2/3    0
                    0     0    6λ1 )          0     0     1 ).

(Notice that the above matrix does not depend on the variables x, y and z, since the second-order partial derivatives are constant.) All leading principal minors have a positive sign and, by Theorem 10.7, matrix H(x, y, z; λ1) is positive definite. Therefore, P1 is the global minimum point. Analogously,

H(x, y, z; λ2) = ( 2λ2    0     0          ( −1/3     0     0
                    0    4λ2    0      =       0    −2/3    0
                    0     0    6λ2 )           0      0    −1 ).

The leading principal minors alternate in sign, beginning with a negative sign of the first leading principal minor. Therefore, matrix H(x, y, z; λ2) is negative definite, and P2 is the global maximum point.

11.9 DOUBLE INTEGRALS

In this section, we extend the notion of the definite integral to functions of more than one variable. Integrals of functions of more than one variable have to be evaluated in probability theory and statistics in connection with multi-dimensional probability distributions. In the following, we restrict ourselves to continuous functions f depending on two independent variables.

Definition 11.13 The integrals

I1 = ∫_a^b ( ∫_{y1(x)}^{y2(x)} f(x, y) dy ) dx = ∫_a^b F2(x) dx

and

I2 = ∫_c^d ( ∫_{x1(y)}^{x2(y)} f(x, y) dx ) dy = ∫_c^d F1(y) dy

are called (iterated) double integrals.

For simplicity, the parentheses for the inner integral are often dropped. To evaluate integral I1, we first determine

F2(x) = ∫_{y1(x)}^{y2(x)} f(x, y) dy,

where x is (in the same way as for partial differentiation) treated as a constant, and then we evaluate the integral

I1 = ∫_a^b F2(x) dx

as in the case of integrating a function of one variable. To evaluate both integrals, we can apply the rules presented in Chapter 5.

Example 11.30 We evaluate the integral

I = ∫∫_R 2xy dy dx,

where R contains all points (x, y) with x² ≤ y ≤ 3x and 0 ≤ x ≤ 1 (see Figure 11.10).
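Iterated double integrals of this type can also be approximated numerically, which is useful as a cross-check of the exact evaluation. A minimal midpoint-rule sketch (the function name double_integral is ours):

```python
# Midpoint-rule approximation of the iterated integral
#   I = int_a^b ( int_{y_lo(x)}^{y_hi(x)} f(x, y) dy ) dx
def double_integral(f, a, b, y_lo, y_hi, nx=200, ny=200):
    hx = (b - a) / nx
    total = 0.0
    for i in range(nx):
        x = a + (i + 0.5) * hx            # midpoint in x
        c, d = y_lo(x), y_hi(x)
        hy = (d - c) / ny
        inner = sum(f(x, c + (j + 0.5) * hy) * hy for j in range(ny))
        total += inner * hx
    return total

# Region R of Example 11.30: x^2 <= y <= 3x, 0 <= x <= 1
I = double_integral(lambda x, y: 2*x*y, 0, 1, lambda x: x*x, lambda x: 3*x)
print(round(I, 3))   # 2.083 (exact value 25/12)
```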
We obtain

I = ∫₀¹ ( ∫_{x²}^{3x} 2xy dy ) dx = ∫₀¹ [xy²]_{y=x²}^{y=3x} dx
  = ∫₀¹ ( x · (3x)² − x · (x²)² ) dx = ∫₀¹ (9x³ − x⁵) dx
  = [ (9/4)x⁴ − (1/6)x⁶ ]₀¹ = 9/4 − 1/6 = 25/12.

Figure 11.10 Region R considered in Example 11.30.

Finally, we consider the case where both the lower and the upper limits of integration of both variables are constant. In this case, we get the following property.

THEOREM 11.14 Let function f : D_f → R, D_f ⊆ R², be continuous in the rectangle given by a ≤ x ≤ b and c ≤ y ≤ d. Then

∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_c^d ( ∫_a^b f(x, y) dx ) dy.

One application of double integrals is the determination of the volume of a solid bounded above by the surface z = f(x, y), below by the xy plane, and on the sides by the vertical walls defined by the region R. If we have f(x, y) ≥ 0 over a region R, then the volume of the solid under the graph of function f and over the region R is given by

V = ∫∫_R f(x, y) dx dy.

Example 11.31 We want to find the volume V under the graph of function f with f(x, y) = 2x² + y² and over the rectangular region given by 0 ≤ x ≤ 2 and 0 ≤ y ≤ 1. We obtain:

V = ∫∫_R (2x² + y²) dy dx = ∫₀² ( ∫₀¹ (2x² + y²) dy ) dx
  = ∫₀² [ 2x²y + y³/3 ]₀¹ dx = ∫₀² ( 2x² + 1/3 ) dx
  = [ (2/3)x³ + x/3 ]₀² = 16/3 + 2/3 = 18/3 = 6.

Thus, the volume of the solid described above is equal to six volume units. The considerations in this section can be extended to the case of integrals depending on more than two variables.

EXERCISES

11.1 (a) Given is a Cobb–Douglas function f : R²₊ → R with

z = f(x, y) = A · x₁^(α₁) · x₂^(α₂),

where A = 1, α₁ = 1/2 and α₂ = 1/2. Graph isoquants for z = 1 and z = 2 and illustrate the surface in R³.
(b) Given are the following functions f : D_f → R, D_f ⊆ R², with z = f(x, y):
(i) z = √(9 − x² − y²);    (ii) z = xy/(x − y);    (iii) z = x² + 4x + 2y.
Graph the domain of the function and isoquants for z = 1 and z = 2.

11.2 Find the first-order partial derivatives for each of the following functions:
(a) z = f(x, y) = x² sin² y;    (b) z = f(x, y) = x^(y²);
(c) z = f(x, y) = x^(√y) + y^(√x);    (d) z = f(x, y) = ln(√x · y);
(e) z = f(x₁, x₂, x₃) = 2x e^(x₁² + x₂² + x₃²);    (f) z = f(x₁, x₂, x₃) = x₁² + x₂² + x₃².

11.3 The variable production cost C of two products P₁ and P₂ depends on the outputs x and y as follows:

C(x, y) = 120x + 1,200,000/x + 800y + 32,000,000/y,

where (x, y) ∈ R² with x ∈ [20, 200] and y ∈ [50, 400].
(a) Determine the marginal production cost of products P₁ and P₂.
(b) Compare the marginal cost of P₁ for x₁ = 80 and x₂ = 120 and of P₂ for y₁ = 160 and y₂ = 240. Give an interpretation of the results.

11.4 Find all second-order partial derivatives for each of the following functions:
(a) z = f(x₁, x₂, x₃) = x₁³ + 3x₁x₂²x₃³ + 2x₂ + ln(x₁x₃);
(b) z = f(x, y) = (1 + xy)/(1 − xy);
(c) z = f(x, y) = ln((x + y)/(x − y)).

11.5 Determine the gradient of function f : D_f → R, D_f ⊆ R², with z = f(x, y) and specify it at the points (x₀, y₀) = (1, 0) and (x₁, y₁) = (1, 2):
(a) z = ax + by;    (b) z = x² + xy² + sin y;    (c) z = √(9 − x² − y²).

11.6 Given is the surface z = f(x, y) = x² sin² y with the domain D_f = R², where the xy plane is horizontal. Assume that a ball is located on the surface at point (x, y, z) = (1, 1, z). If the ball begins to roll, what is the direction of its movement?

11.7 Determine the total differential for the following functions:
(a) z = f(x, y) = sin(y/x);    (b) z = f(x, y) = x² + xy² + sin y;
(c) z = f(x, y) = e^(x² + y²);    (d) z = f(x, y) = ln(xy).

11.8 Find the surface of a circular cylinder with radius r = 2 m and height h = 5 m. Assume that the measurements of radius and height may vary as follows: r = 2 ± 0.05 and h = 5 ± 0.10. Use the total differential for an approximation of the change of the surface in this case.
Find the absolute and relative (percentage) error of the surface.

11.9 Let f : D_f → R, D_f ⊆ R², be a function with

z = f(x₁, x₂) = x₁² e^(x₂²),

where x₁ = x₁(t) and x₂ = x₂(t).
(a) Find the derivative dz/dt.
(b) Use the chain rule to find z′(t) if
(i) x₁ = t², x₂ = ln t²;    (ii) x₁ = ln t², x₂ = t².
(c) Find z′(t) by substituting the functions of (i) and (ii) for x₁ and x₂, and then differentiate them.

11.10 Given is the function z = f(x, y) = √(9 − x² − y²). Find the directional derivatives in the directions r₁ = (1, 0)ᵀ, r₂ = (1, 1)ᵀ and r₃ = (−1, −2)ᵀ at point (1, 2).

11.11 Assume that

C(x₁, x₂, x₃) = 20 + 2x₁x₂ + 8x₃ + x₂ ln x₃ + 4x₁

is the total cost function of three products, where x₁, x₂, x₃ are the outputs of these three products.
(a) Find the gradient and the directional derivative with the directional vector r = (1, 2, 3)ᵀ of function C at point (3, 2, 1). Compare the growth of the cost (marginal cost) in the direction of fastest growth with the directional marginal cost in the direction r. Find the percentage rate of cost reduction at point (3, 2, 1).
(b) The owner of the firm wants to increase the output by 6 units altogether. The owner can do it in the ratio of 1:2:3 or of 3:2:1 for the products x₁ : x₂ : x₃. Further conditions are x₁ ≥ 1, x₃ ≥ 1, and the output x₂ must be at least 4 units. Which ratio leads to a lower cost for the firm?

11.12 Success of sales z for a product depends on a promotion campaign in two media. Let x₁ and x₂ be the funds invested in the two media. Then the following function is to be used to reflect the relationship:

z = f(x₁, x₂) = 10√x₁ + 20 ln(x₂ + 1) + 50;    x₁ ≥ 0, x₂ ≥ 0.

Find the partial rates of change and the partial elasticities of function f at point (x₁, x₂) = (100, 150).

11.13 Determine whether the following functions are homogeneous. If so, what is the degree of homogeneity? Use Euler's theorem and interpret the result.
(a) z = f(x₁, x₂) = x₁³ + 2x₁x₂² + x₂³;    (b) z = f(x₁, x₂) = √(x₁x₂² + x₁²).

11.14 Let F(x, y) = 0 be an implicitly defined function. Find dy/dx by the implicit-function rule.
(a) F(x, y) = x²/a² − y²/b² − 1 = 0 (y ≥ 0);
(b) F(x, y) = xy − sin 3x = 0;
(c) F(x, y) = x − ln(xy) + x²y = 0.

11.15 We consider the representation of the variables x and y by so-called polar coordinates:

x = r cos ϕ,    y = r sin ϕ,

or equivalently, the implicitly given system

F₁(x, y; r, ϕ) = r cos ϕ − x = 0
F₂(x, y; r, ϕ) = r sin ϕ − y = 0.

Check by means of the Jacobian determinant whether this system can be put into its reduced form, i.e. whether the variables r and ϕ can be expressed in terms of x and y.

11.16 Check whether the function f : D_f → R, D_f ⊆ R², with

z = f(x, y) = x³y²(1 − x − y)

has a local maximum at point (x₁, y₁) = (1/2, 1/3) and a local minimum at point (x₂, y₂) = (1/7, 1/7).

11.17 Find the local extrema of the following functions f : D_f → R, D_f ⊆ R²:
(a) z = f(x, y) = x²y + y² − y;    (b) z = f(x, y) = x²y − 2xy + (3/4)e^y.

11.18 The variable cost of two products P₁ and P₂ depends on the production outputs x and y as follows:

C(x, y) = 120x + 1,200,000/x + 800y + 32,000,000/y,

where D_C = R²₊. Determine the outputs x₀ and y₀ which minimize the cost function, and determine the minimum cost.

11.19 Given is the function f : R³ → R with

f(x, y, z) = x² − 2x + y² − 2z³y + 3z².

Find all stationary points and check whether they are local extreme points.

11.20 Find the local extrema of function C : D_C → R, D_C ⊆ R³, with

C(x) = C(x₁, x₂, x₃) = 20 + 2x₁x₂ + 8x₃ + x₂ ln x₃ + 4x₁

and x ∈ R³, x₃ > 0.

11.21 The profit P of a firm depends on three positive input factors x₁, x₂, x₃ as follows:

P(x₁, x₂, x₃) = 90x₁x₂ − x₁²x₂ − x₁x₂² + 60 ln x₃ − 4x₃.

Determine the input factors which maximize the profit function and find the maximum profit.

11.22 Sales y of a firm depend on the expenses x of advertising. The following values xᵢ of expenses and corresponding sales yᵢ of the last 10 months are known:
xᵢ | 20 | 20 | 24 | 25 | 26 | 28 | 30 | 30 | 33 | 34
yᵢ | 180 | 160 | 200 | 250 | 220 | 250 | 250 | 310 | 330 | 280

(a) Find a linear function f by applying the criterion of minimizing the sum of the squared differences between yᵢ and f(xᵢ).
(b) What sales can be expected if x = 18, and if x = 36?

11.23 Given the implicitly defined function

F(x, y) = (x − 1)²/4 + (y − 2)²/9 − 1 = 0,

verify that F has local extrema at the points P₁ : (x = 1, y = 5) and P₂ : (x = 1, y = −1). Decide whether each is a local maximum or a local minimum point, and graph the function.

11.24 Find the constrained optima of the following functions:
(a) z = f(x, y) = x² + xy − 2y², s.t. 2x + y = 8;
(b) z = f(x₁, x₂, x₃) = 3x₁² + 2x₂ − x₁x₂x₃, s.t. x₁ + x₂ = 3 and 2x₁x₃ = 5.

11.25 Check whether the function f : D_f → R, D_f ⊆ R³, with

z = f(x, y, z) = x² − xz + y³ + y²z − 2z, s.t. x − y² − z − 2 = 0 and x + z − 4 = 0,

has a local extremum at point (11, −4, −7).

11.26 Find the dimensions of a box for washing powder with a double bottom so that the surface is minimal and the volume amounts to 3,000 cm³. How much cardboard is required if glued areas are not considered?

11.27 Find all the points (x, y) of the ellipse 4x² + y² = 4 which have a minimal distance from the point P₀ = (2, 0). (Use the Lagrange multiplier method first, then substitute the constraint into the objective function.)

11.28 The cost function C : R³₊ → R of a firm producing the quantities x₁, x₂ and x₃ is given by

C(x₁, x₂, x₃) = x₁² + 2x₂² + 2x₃² − 2x₁x₂ − 3x₁x₃ + 500.

The firm has to fulfil the constraint 2x₁ + 4x₂ + 3x₃ = 125. Find the minimum cost. (Use the Lagrange multiplier method as well as optimization by substitution.)

11.29 Evaluate

∫∫_R (2x + y + 1) dR,

where the region R is the triangle with the corners P₁ : (1, 1), P₂ : (5, 3) and P₃ : (5, 5).
11.30 Find the volume of a solid of constant height h = 2 which is bounded below by the xy plane and on the sides by the vertical walls defined by the region R, where R is the set of all points (x, y) of the xy plane which are enclosed by the curves y² = 2x + 1 and x − y − 1 = 0.

11.31 Evaluate the double integral

∫∫_R (x/y) dx dy,

where R is the region between the two parabolas x = y² and y = x².

12 Differential equations and difference equations

We start our considerations with the following introductory example.

Example 12.1 We wish to determine all functions y = y(x) having the rate of change 3√(2x + 1). Using the definition of the rate of change (see Chapter 4.4), this problem can be formulated as follows. Determine all functions y = y(x) which satisfy the equation

ρ_y(x) = y′(x)/y(x) = 3√(2x + 1).

The latter equation can be rewritten as

y′(x) = 3√(2x + 1) · y(x).

This is a relationship between the independent variable x, the function y(x) and its derivative y′(x). Such an equation is called a differential equation. In this chapter, we discuss some methods for solving special types of differential equations (and later related difference equations). In what follows, as well as y(x), y′(x), …, we also use the short forms y, y′, …. First, we formally define a so-called ordinary differential equation.

Definition 12.1 A relationship

F(x, y, y′, y″, …, y⁽ⁿ⁾) = 0    (12.1)

between the independent variable x, a function y(x) and its derivatives is called an ordinary differential equation. The order of the differential equation is determined by the highest order of the derivatives appearing in the differential equation.

If representation (12.1) can be solved for y⁽ⁿ⁾, we get the explicit representation

y⁽ⁿ⁾ = f(x, y, y′, y″, …, y⁽ⁿ⁻¹⁾)

of a differential equation of order n.

Differential and difference equations 445
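First-order equations like the introductory one can also be handled symbolically; the following is a minimal sketch assuming the sympy library is available (it is not part of the original text):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Introductory Example 12.1: y'(x) = 3*sqrt(2x + 1) * y(x)
ode = sp.Eq(y(x).diff(x), 3*sp.sqrt(2*x + 1)*y(x))
sol = sp.dsolve(ode, y(x))

# checkodesol substitutes the solution back into the ODE;
# it returns (True, 0) when the equation is satisfied identically
ok, residual = sp.checkodesol(ode, sol)
```

The substitution check is the important part: whatever closed form the solver returns, it must reduce the equation to zero identically.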
The notion of an ordinary differential equation indicates that only functions depending on one variable can be involved, in contrast to partial differential equations, which also include functions of several variables and their partial derivatives. Hereafter, we consider only ordinary differential equations and, for brevity, we skip 'ordinary'.

Definition 12.2 A function y(x) for which the relationship F(x, y, y′, y″, …, y⁽ⁿ⁾) = 0 holds for all values x ∈ D_y is called a solution of the differential equation. The set

S = {y(x) | F(x, y, y′, y″, …, y⁽ⁿ⁾) = 0 for all x ∈ D_y}

is called the set of solutions or the general solution of the differential equation.

12.1 DIFFERENTIAL EQUATIONS OF THE FIRST ORDER

A first-order differential equation can be written in implicit form:

F(x, y, y′) = 0

or in explicit form:

y′ = f(x, y).

In the rest of this section, we first discuss the graphical solution of a differential equation of the first order and then a special case of such an equation which can be solved using integration.

12.1.1 Graphical solution

Consider the differential equation y′ = f(x, y). At any point (x₀, y₀), the value y′ = f(x₀, y₀) of the derivative is given, which corresponds to the slope of the tangent at point (x₀, y₀). This means that we can roughly approximate the function in a rather small neighbourhood by the tangent line. To get an overview of the solution, we can determine the curves along which the derivative y′ has the same value (i.e. we can consider the isoquants with y′ being constant). In this way, we can 'approximately' graph the (infinitely many) solutions of the given differential equation by drawing the so-called direction field. This procedure is illustrated in Figure 12.1 for the differential equation y′ = x/2. For any x ∈ R (and arbitrary y), the slope of the tangent line is equal to x/2, i.e.
for x = 1, we have y′ = 1/2 (that is, along the line x = 1 the slope of the tangent line is 1/2), for x = 2, we have y′ = 1 (i.e. along the line x = 2 the slope of the tangent line is equal to 1), and so on. Similarly, for x = −1 we get y′ = −1/2, for x = −2 we get y′ = −1, and so on. Looking at many x values, we get a rough impression of the solution. In the given case, we get parabolas (in Figure 12.1 the parabolas going through the points (0, −2), (0, 0) and (0, 2) are drawn), and the general solution is given by

y = x²/4 + C

with C being an arbitrary real constant.

Figure 12.1 The direction field for y′ = x/2.

Example 12.2 We apply this graphical approach to the differential equation

y′ = −x/y.

Figure 12.2 The direction field for y′ = −x/y.

For each point on the line y = x, we have y′ = −1, and for each point on the line y = −x, we have y′ = 1. Analogously, for each point on the line y = x/2, we get y′ = −2, and for each point on the line y = −x/2, we get y′ = 2. Continuing in this way, we get the direction field given in Figure 12.2. Thus, the solutions are circles with the origin of the coordinate system as the centre, i.e.

x² + y² = C    (C > 0).

(In Figure 12.2 the circles with radius 2, i.e. C = 4, and radius 4, i.e. C = 16, are given.) Implicit differentiation of y² = C − x² confirms the result:

2y · y′ = −2x
y′ = −x/y.

12.1.2 Separable differential equations

We continue with a type of differential equation where a solution can be found by means of integration. Let us consider the following special case of a first-order differential equation:

y′ = f(x, y) = g(x) · h(y),

i.e. function f(x, y) can be written as the product of a function g depending only on the variable x and a function h depending only on the variable y.
In this case, function f = f(x, y) has a special structure, and we say that this is a differential equation with separable variables. Solving this type of differential equation requires only integration techniques. First, we rewrite

y′ = dy/dx = g(x) · h(y)

in the form

dy/h(y) = g(x) dx.

That is, all terms containing function y are now on the left-hand side and all terms containing the variable x are on the right-hand side (i.e. we have separated the variables). Then we integrate both sides

∫ dy/h(y) = ∫ g(x) dx

and obtain

H(y) = G(x) + C.

If we eliminate y, we get the general solution y(x) including the constant C. A particular (or definite) solution y_P(x) is obtained when C is assigned a particular value by considering an initial condition y(x₀) = y₀. The latter problem of finding a particular solution is also referred to as an initial value problem. We illustrate this approach by the following two examples.

Example 12.1 (continued) If we rewrite the equation

dy/dx = y′ = 3√(2x + 1) · y

in the form

dy/y = 3√(2x + 1) dx,

we get a differential equation with separable variables. Let us integrate both sides:

∫ dy/y = 3 ∫ √(2x + 1) dx.    (12.2)

On the left-hand side we get the natural logarithm function as an antiderivative. Considering the integral on the right-hand side, we apply the substitution u = 2x + 1, which yields du/dx = 2. Now we can find the integral:

3 ∫ √(2x + 1) dx = (3/2) ∫ √u du = (3/2) · (2/3) · u^(3/2) + C* = (2x + 1)^(3/2) + C*.

Therefore, equation (12.2) leads to

ln |y| = (2x + 1)^(3/2) + C*.

Thus, we get the general solution

|y| = e^((2x+1)^(3/2) + C*) = e^((2x+1)^(3/2)) · e^(C*),

which can be rewritten as

|y| = C · e^((2x+1)^(3/2)),    C > 0.

Here we have written the term e^(C*) as a new constant C > 0. By dropping the absolute values (they are now not necessary since the logarithmic terms have been removed), we can rewrite the solution obtained as

y(x) = C · e^((2x+1)^(3/2))

with C being an arbitrary constant.
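The general solution found for Example 12.1 can be verified by differentiating it directly; a short sympy sketch (not part of the original text):

```python
import sympy as sp

x, C = sp.symbols('x C')

# General solution of Example 12.1: y(x) = C * exp((2x+1)^(3/2))
y = C*sp.exp((2*x + 1)**sp.Rational(3, 2))

# The residual y' - 3*sqrt(2x+1)*y must vanish identically
residual = sp.simplify(y.diff(x) - 3*sp.sqrt(2*x + 1)*y)
```

Since the chain rule gives y′ = 3√(2x+1) · y exactly, the residual reduces to zero.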
If now the initial condition y(0) = e is also given, we obtain from the general solution e = C · e¹, i.e. C = 1. This yields the particular solution

y_P(x) = e^((2x+1)^(3/2))

of the initial value problem.

Example 12.3 We wish to determine all functions y = y(x) whose elasticity is given by the function

ε_y(x) = 2x + 1.

Applying the definition of the elasticity (see Chapter 4.4), we obtain

ε_y(x) = x · y′(x)/y(x) = 2x + 1.

We can rewrite this equation as

(dy/dx)/y = (2x + 1)/x

or, equivalently,

dy/y = (2 + 1/x) dx.

This is a differential equation with separable variables, and we can integrate the functions on both sides. This yields:

∫ dy/y = ∫ (2 + 1/x) dx
ln |y| = 2x + ln |x| + C*
|y| = e^(2x + ln|x| + C*) = e^(2x) · e^(ln|x|) · e^(C*)
|y| = C · |x| · e^(2x),    C > 0.

By dropping the absolute values, we can rewrite the solution obtained as

y(x) = C · x · e^(2x)

with C being an arbitrary real constant.

Finally, we mention that certain types of differential equations can be reduced by an appropriate substitution to a differential equation with separable variables. This can be done e.g. for a differential equation of the type

y′ = g(ax + by + c).

We illustrate this procedure by an example.

Example 12.4 Let the differential equation

y′ = (9x + y + 2)²

be given. (In this form it is not a differential equation with separable variables.) Applying the substitution u = 9x + y + 2, we get by differentiation

du/dx = 9 + dy/dx,

which can be rewritten as

y′ = dy/dx = du/dx − 9.

After substituting the corresponding terms, we get

du/dx − 9 = u².

This is a differential equation with separable variables, and we get

du/(u² + 9) = dx.

Integrating both sides, we obtain

(1/3) · arctan(u/3) = x + C

or, correspondingly,

arctan(u/3) = 3(x + C)
u/3 = tan 3(x + C)
u = 3 tan 3(x + C).

After substituting back, we get

9x + y + 2 = 3 tan 3(x + C)
y = 3 tan 3(x + C) − 9x − 2.
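The result of the substitution method in Example 12.4 can likewise be checked by differentiation; a sympy sketch (not part of the original text):

```python
import sympy as sp

x, C = sp.symbols('x C')

# Solution of Example 12.4: y = 3*tan(3(x + C)) - 9x - 2
y = 3*sp.tan(3*(x + C)) - 9*x - 2

# y must satisfy the original equation y' = (9x + y + 2)^2
residual = sp.simplify(y.diff(x) - (9*x + y + 2)**2)
```

Differentiating gives y′ = 9 tan² 3(x + C), which is exactly (9x + y + 2)², so the residual vanishes.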
12.2 LINEAR DIFFERENTIAL EQUATIONS OF ORDER n

Definition 12.3 A differential equation

y⁽ⁿ⁾ + a_{n−1}(x) y⁽ⁿ⁻¹⁾ + · · · + a₁(x) y′ + a₀(x) y = q(x)    (12.3)

is called a linear differential equation of order n.

Equation (12.3) is called linear since function y and all its derivatives occur only in the first power and there are no products of the n + 1 functions y, y′, …, y⁽ⁿ⁾. The functions a_i(x), i = 0, 1, …, n − 1, depend on the variable x and can be arbitrary.

Definition 12.4 If q(x) ≡ 0 in equation (12.3), i.e. function q = q(x) is identically equal to zero for all values x, the differential equation (12.3) is called homogeneous. Otherwise, i.e. if q(x) ≢ 0, the differential equation is called non-homogeneous. Function q = q(x) is also known as the forcing term.

Next, we present some properties of solutions of a linear differential equation of order n, and then we discuss an important special case.

12.2.1 Properties of solutions

Homogeneous differential equations

First, we present some properties of solutions of linear homogeneous differential equations of order n.

THEOREM 12.1 Let y₁(x), y₂(x), …, y_m(x) be solutions of the linear homogeneous differential equation

y⁽ⁿ⁾ + a_{n−1}(x)y⁽ⁿ⁻¹⁾ + · · · + a₀(x)y = 0.    (12.4)

Then the linear combination

y(x) = C₁y₁(x) + C₂y₂(x) + · · · + C_m y_m(x)

is a solution as well, with C₁, C₂, …, C_m ∈ R.

Definition 12.5 The solutions y₁(x), y₂(x), …, y_m(x), m ≤ n, of the linear homogeneous differential equation (12.4) are said to be linearly independent if

C₁y₁(x) + C₂y₂(x) + · · · + C_m y_m(x) = 0 for all x ∈ D_y

is possible only for C₁ = C₂ = · · · = C_m = 0. Otherwise, the solutions are said to be linearly dependent.

The next theorem gives a criterion to decide whether a set of solutions of the homogeneous differential equation is linearly independent.

THEOREM 12.2 The solutions y₁(x), y₂(x), …
, y_m(x), m ≤ n, of the linear homogeneous differential equation (12.4) are linearly independent if and only if

W(x) = det [ y₁(x), y₂(x), …, y_m(x) ;
             y₁′(x), y₂′(x), …, y_m′(x) ;
             … ;
             y₁⁽ᵐ⁻¹⁾(x), y₂⁽ᵐ⁻¹⁾(x), …, y_m⁽ᵐ⁻¹⁾(x) ] ≠ 0

for x ∈ D_y. W(x) is called Wronski's determinant.

It can be proved that, if W(x₀) ≠ 0 for some x₀ ∈ D_y = (a, b), then we also have W(x) ≠ 0 for all x ∈ (a, b). Under the assumptions of Theorem 12.2, it is therefore sufficient to investigate W(x₀) for some particular point x₀ ∈ D_y. If we have W(x₀) ≠ 0, the m solutions y₁(x), y₂(x), …, y_m(x) are linearly independent.

Example 12.5 We check whether the functions

y₁(x) = x,    y₂(x) = 1/x    and    y₃(x) = (ln x)/x

are linearly independent solutions of the differential equation

x³y‴ + 4x²y″ + xy′ − y = 0.

We first determine the required derivatives of the functions y_i, i ∈ {1, 2, 3}, and obtain

y₁′(x) = 1,                y₁″(x) = 0,                    y₁‴(x) = 0,
y₂′(x) = −1/x²,            y₂″(x) = 2/x³,                 y₂‴(x) = −6/x⁴,
y₃′(x) = (1 − ln x)/x²,    y₃″(x) = (−3 + 2 ln x)/x³,     y₃‴(x) = (11 − 6 ln x)/x⁴.

Next, we check whether each y_i, i ∈ {1, 2, 3}, is a solution of the differential equation. We get

y₁(x): x³ · 0 + 4x² · 0 + x · 1 − x = 0,
y₂(x): x³ · (−6/x⁴) + 4x² · (2/x³) + x · (−1/x²) − 1/x = −6/x + 8/x − 1/x − 1/x = 0,
y₃(x): x³ · (11 − 6 ln x)/x⁴ + 4x² · (−3 + 2 ln x)/x³ + x · (1 − ln x)/x² − (ln x)/x
      = (11 − 6 ln x)/x + (−12 + 8 ln x)/x + (1 − ln x)/x − (ln x)/x = 0.
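These checks can be automated; a sketch with sympy (assumed available; not part of the original text):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y1, y2, y3 = x, 1/x, sp.log(x)/x

# Each candidate must satisfy x^3 y''' + 4x^2 y'' + x y' - y = 0
for f in (y1, y2, y3):
    lhs = x**3*f.diff(x, 3) + 4*x**2*f.diff(x, 2) + x*f.diff(x) - f
    assert sp.simplify(lhs) == 0

# Wronski's determinant of the three solutions
W = sp.simplify(sp.wronskian([y1, y2, y3], x))
```

The built-in `wronskian` routine assembles and evaluates the same determinant that is worked out by hand in this example.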
We now investigate Wronski's determinant W(x). Expanding along the first column, we get

W(x) = det [ x, 1/x, (ln x)/x ;
             1, −1/x², (1 − ln x)/x² ;
             0, 2/x³, (−3 + 2 ln x)/x³ ]

     = x · det [ −1/x², (1 − ln x)/x² ; 2/x³, (−3 + 2 ln x)/x³ ]
       − 1 · det [ 1/x, (ln x)/x ; 2/x³, (−3 + 2 ln x)/x³ ]

     = x · (3 − 2 ln x − 2 + 2 ln x)/x⁵ − (−3 + 2 ln x − 2 ln x)/x⁴ = 1/x⁴ + 3/x⁴ = 4/x⁴.

Thus, Wronski's determinant is different from zero for all x ∈ D_y = (0, ∞). Therefore, the solutions y₁(x), y₂(x) and y₃(x) are linearly independent.

Now we can describe the general solution of a linear homogeneous differential equation of order n as follows.

THEOREM 12.3 Let y₁(x), y₂(x), …, y_n(x) be n linearly independent solutions of a linear homogeneous differential equation (12.4) of order n. Then the general solution can be written as

S_H = {y_H(x) | y_H(x) = C₁y₁(x) + C₂y₂(x) + · · · + C_n y_n(x), C₁, C₂, …, C_n ∈ R}.    (12.5)

The solutions y₁(x), y₂(x), …, y_n(x) in equation (12.5) constitute a fundamental system of the differential equation, and we also say that y_H(x) is the complementary function. Referring to Example 12.5, the general solution of the given homogeneous differential equation is

y_H(x) = C₁x + C₂ · (1/x) + C₃ · (ln x)/x.

Non-homogeneous differential equations

We present the following theorem that describes the structure of the general solution of a linear non-homogeneous differential equation.

THEOREM 12.4 Let

S_H = {y_H(x) | y_H(x) = C₁y₁(x) + C₂y₂(x) + · · · + C_n y_n(x), C₁, C₂, …, C_n ∈ R}

be the general solution of the homogeneous equation (12.4) of order n and y_N(x) a particular solution of the non-homogeneous equation (12.3).
Then the general solution of the linear non-homogeneous differential equation (12.3) can be written as

S = {y(x) | y(x) = y_H(x) + y_N(x), y_H(x) ∈ S_H}
  = {y(x) | y(x) = C₁y₁(x) + C₂y₂(x) + · · · + C_n y_n(x) + y_N(x), C₁, C₂, …, C_n ∈ R}.

Initial value problems

If an initial value problem for a linear differential equation of order n is considered, n initial conditions

y(x₀) = y₀,  y′(x₀) = y₁,  …,  y⁽ⁿ⁻¹⁾(x₀) = y_{n−1}    (12.6)

are given, from which the n constants C₁, C₂, …, C_n of the general solution can be determined. This yields the particular solution y_P(x). The following theorem gives a sufficient condition for the existence and uniqueness of a particular solution.

THEOREM 12.5 (existence and uniqueness of a solution) Let n initial conditions according to (12.6) for a linear differential equation of order n

y⁽ⁿ⁾ + a_{n−1}y⁽ⁿ⁻¹⁾ + · · · + a₁y′ + a₀y = q(x)    (12.7)

with constant coefficients a_i, i ∈ {0, 1, …, n − 1}, be given. Then this linear differential equation has exactly one particular solution y_P(x).

We only note that for arbitrary functions a_i(x), i ∈ {0, 1, …, n − 1}, an initial value problem does not necessarily have a solution.

12.2.2 Differential equations with constant coefficients

Now we consider the special case of a linear differential equation of order n with constant coefficients, i.e. a_i(x) = a_i for i = 0, 1, 2, …, n − 1.

Homogeneous equations

First, we consider the homogeneous differential equation. Thus, we have

y⁽ⁿ⁾ + a_{n−1}y⁽ⁿ⁻¹⁾ + · · · + a₁y′ + a₀y = 0.    (12.8)

This type of homogeneous differential equation can be easily solved. We set

y(x) = e^(λx)    with λ ∈ C

and obtain

y′(x) = λe^(λx),  y″(x) = λ²e^(λx),  …,  y⁽ⁿ⁾(x) = λⁿe^(λx).
Substituting the latter terms into equation (12.8) and taking into account that the exponential function e^(λx) has no zeroes, we obtain the so-called characteristic equation:

P_n(λ) = λⁿ + a_{n−1}λⁿ⁻¹ + · · · + a₁λ + a₀ = 0.    (12.9)

We know from Chapter 3 that the polynomial P_n(λ) has n (real or complex) zeroes λ₁, λ₂, …, λ_n. For each root λ_j of the characteristic equation, we determine a specific solution of the homogeneous differential equation (Table 12.1). It can be shown by Theorem 12.2 that the resulting set of solutions forms a fundamental system, i.e. the n solutions obtained in this way are linearly independent. It is worth noting that also in the case of complex roots of the characteristic equation, the solutions used are real. The solutions in cases (c) and (d) are based on Euler's formula:

e^((α_j + iβ_j)x) = e^(α_j x) · e^(iβ_j x) = e^(α_j x)(cos β_j x + i sin β_j x) = e^(α_j x) cos β_j x + i e^(α_j x) sin β_j x = y_j(x) + i y_{j+1}(x),

with y_j(x) and y_{j+1}(x) being real solutions.

Table 12.1 Specific solutions in dependence on the roots of the characteristic equation

Roots of the characteristic equation | Solution
(a) λ_j is a real root of multiplicity one | y_j(x) = e^(λ_j x)
(b) λ_j = λ_{j+1} = · · · = λ_{j+k−1} is a real root of multiplicity k > 1 | y_j(x) = e^(λ_j x), y_{j+1}(x) = x · e^(λ_j x), …, y_{j+k−1}(x) = x^(k−1) · e^(λ_j x)
(c) λ_j = α_j + β_j i, λ_{j+1} = α_j − β_j i is a pair of conjugate complex roots of multiplicity one | y_j(x) = e^(α_j x) · cos β_j x, y_{j+1}(x) = e^(α_j x) · sin β_j x
(d) λ_j = λ_{j+2} = · · · = λ_{j+2k−2} = α_j + β_j i, λ_{j+1} = λ_{j+3} = · · · = λ_{j+2k−1} = α_j − β_j i is a pair of conjugate complex roots of multiplicity k > 1 | y_j(x) = e^(α_j x) · cos β_j x, y_{j+1}(x) = e^(α_j x) · sin β_j x, y_{j+2}(x) = x · e^(α_j x) · cos β_j x, y_{j+3}(x) = x · e^(α_j x) · sin β_j x, …,
y_{j+2k−2}(x) = x^(k−1) · e^(α_j x) · cos β_j x, y_{j+2k−1}(x) = x^(k−1) · e^(α_j x) · sin β_j x

Non-homogeneous differential equations

Now we discuss how a particular solution y_N(x) of a non-homogeneous linear differential equation with constant coefficients can be found. We describe the method of undetermined coefficients and consider here only three specific types of forcing terms q. In these cases, a solution can be found by a special setting of the solution function y_N(x) in dependence on function q in equation (12.7).

Table 12.2 Settings for special forcing terms

Forcing term q(x) | Setting y_N(x)
a · e^(bx) | (a) A · e^(bx) if b is not a root of the characteristic equation (12.9); (b) A · x^k · e^(bx) if b is a root of multiplicity k of the characteristic equation (12.9)
b_n xⁿ + b_{n−1}xⁿ⁻¹ + · · · + b₁x + b₀ | (a) A_n xⁿ + A_{n−1}xⁿ⁻¹ + · · · + A₁x + A₀ if a₀ ≠ 0 in equation (12.9); (b) x^k · (A_n xⁿ + A_{n−1}xⁿ⁻¹ + · · · + A₁x + A₀) if a₀ = a₁ = · · · = a_{k−1} = 0 in (12.9), i.e. y, y′, …, y⁽ᵏ⁻¹⁾ do not occur in (12.8)
a · cos αx + b · sin αx (a or b can be equal to zero) | (a) A cos αx + B sin αx if αi is not a root of the characteristic equation (12.9); (b) x^k · (A · cos αx + B · sin αx) if αi is a root of multiplicity k of the characteristic equation (12.9)

It can be seen that the settings in Table 12.2 follow the structure of the forcing term given on the left-hand side. We emphasize that the small letters in Table 12.2 are fixed by the given function q = q(x), whereas the capital letters are parameters which have to be determined such that y_N(x) is a solution of the given differential equation. Using the corresponding setting, we insert y_N(x) into the non-homogeneous differential equation and determine the coefficients A, B and A_i, respectively, by a comparison of the coefficients of the left-hand and right-hand sides.
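The coefficient comparison itself can be delegated to a computer algebra system. The sketch below uses sympy on a small illustrative equation y″ + y = x, which is chosen only for illustration and does not appear in the text:

```python
import sympy as sp

x, A, B = sp.symbols('x A B')

# Forcing term q(x) = x is a polynomial of degree one and a0 = 1 != 0,
# so Table 12.2 suggests the setting yN = A*x + B
yN = A*x + B
residual = sp.expand(yN.diff(x, 2) + yN - x)

# Comparing the coefficients of x^1 and x^0 gives a linear system for A and B
coeffs = sp.solve([residual.coeff(x, 1), residual.coeff(x, 0)], [A, B])
```

For this toy equation the comparison gives A = 1 and B = 0, i.e. y_N(x) = x, mirroring the procedure carried out by hand in the examples that follow.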
It should be noted that the modifications described in cases (b) of Table 12.2 are necessary to guarantee that the suggested approach works (otherwise we would not be able to determine all coefficients by this approach). Moreover, we mention that this approach can also be used for sums or products of the above forcing terms, where the setting has to follow the structure of the given forcing term. We illustrate the determination of the general solution by the following examples.

Example 12.6 We solve the differential equation

y″ + y′ − 2y = 2x² − 4x + 1.    (12.10)

First, we consider the homogeneous differential equation

y″ + y′ − 2y = 0.

By setting y(x) = e^(λx) and substituting y(x), y′(x) and y″(x) into the homogeneous equation, we get the characteristic equation

λ² + λ − 2 = 0,

which has the two solutions

λ₁ = −1/2 + √(1/4 + 2) = 1    and    λ₂ = −1/2 − √(1/4 + 2) = −2.

Since we have two real roots of multiplicity one, case (a) in Table 12.1 applies for each root, and we get the complementary function

y_H(x) = C₁e^x + C₂e^(−2x).

Since function q is a polynomial of degree two, we use the setting

y_N(x) = Ax² + Bx + C

in order to find a particular solution of the non-homogeneous differential equation. We obtain

(y_N)′(x) = 2Ax + B,    (y_N)″(x) = 2A.

Substituting the above terms into the differential equation (12.10), we get

2A + (2Ax + B) − 2(Ax² + Bx + C) = 2x² − 4x + 1.

Comparing now the coefficients of all powers of x, i.e. the coefficients of x², x¹ and x⁰, we obtain:

x²: −2A = 2
x¹: 2A − 2B = −4
x⁰: 2A + B − 2C = 1.

From the first equation we obtain A = −1, then from the second equation B = 1 and, finally, from the third equation C = −1. The general solution of the non-homogeneous differential equation (12.10) is given by

y(x) = y_H(x) + y_N(x) = C₁e^x + C₂e^(−2x) − x² + x − 1.

Example 12.7 We solve the non-homogeneous differential equation

y‴ + 2y″ + 5y′ = cos 2x.
2 Example 12.7 We solve the non-homogeneous differential equation y001000100010 + 2y00100010 + 5y0010 = cos 2x. 458 Differential and difference equations Considering the homogeneous equation and again setting y = eλx , we obtain the characteristic equation λ3 + 2λ2 + 5λ = λ · (λ2 + 2λ + 5) = 0. From the latter equation we get √ λ1 = 0, λ2 = −1 + 1 − 5 = −1 + 2i and λ3 = −1 − √ 1 − 5 = −1 − 2i, i.e. there is a real root of multiplicity one and two (conjugate) complex roots of multiplicity one. Therefore, we get the complementary function yH (x) = C1 e0x + C2 e−x cos 2x + C3 e−x sin 2x = C1 + e−x (C2 cos 2x + C3 sin 2x). To find a particular solution yN of the non-homogeneous differential equation, we use the setting yN (x) = A cos 2x + B sin 2x and obtain (yN )0010 (x) = −2A sin 2x + 2B cos 2x (yN )00100010 (x) = −4A cos 2x − 4B sin 2x (yN )001000100010 (x) = 8A sin 2x − 8B cos 2x. Substituting yN (x) and its derivatives into the non-homogeneous differential equation, we obtain (8A sin 2x − 8B cos 2x) + 2(−4A cos 2x − 4B sin 2x) + 5(−2A sin 2x + 2B cos 2x) = cos 2x. Comparing now the coefficients of sin 2x and cos 2x on both sides of the above equation, we obtain: sin 2x : 8A − 8B − 10A = −2A − 8B = 0 cos 2x : −8B − 8A + 10B = −8A + 2B = 1. This is a system of two linear equations with the two variables A and B which can be easily solved: A=− 2 17 and B= 1 . 34 Therefore, we obtain the general solution y(x) = yH (x) + yN (x) = C1 + e−x (C2 cos 2x + C3 sin 2x) − 2 1 cos 2x + sin 2x. 17 34 Differential and difference equations 459 Example 12.8 Let the non-homogeneous differential equation y001000100010 − 4y0010 = e2x be given. In order to solve the homogeneous equation, we consider the characteristic equation λ3 − 4λ = λ(λ2 − 4) = 0 which has the three roots λ1 = 0, λ2 = 2 and λ3 = −2. This yields the complementary function yH (x) = C1 e0x + C2 e2x + C3 e−2x = C1 + C2 e2x + C3 e−2x . 
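Characteristic roots such as these can be double-checked numerically, e.g. with NumPy (a sketch; np.roots expects the coefficients in descending powers):

```python
import numpy as np

# The characteristic polynomial of the equation above is
# lambda^3 - 4*lambda, with coefficients given in descending powers.
roots = np.roots([1.0, 0.0, -4.0, 0.0])
print(np.sort(roots.real))  # [-2.  0.  2.]
```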
To get a particular solution of the non-homogeneous equation, we have to use case (b) with k = 1 for the exponential forcing term e^{2x} in Table 12.2, since λ_2 = 2 is a root of multiplicity one of the characteristic equation, and we set y_N(x) = Axe^{2x}. (Indeed, if we used case (a) and set y_N(x) = Ae^{2x}, we would be unable to determine coefficient A, as the reader can check.) We obtain

y_N′(x) = Ae^{2x} + 2Axe^{2x} = Ae^{2x}(1 + 2x)
y_N′′(x) = 2Ae^{2x}(1 + 2x) + 2Ae^{2x} = 2Ae^{2x}(2 + 2x) = 4Ae^{2x}(1 + x)
y_N′′′(x) = 8Ae^{2x}(1 + x) + 4Ae^{2x} = 4Ae^{2x}(3 + 2x).

Substituting the terms for y_N′ and y_N′′′ into the non-homogeneous equation, we get

4Ae^{2x}(3 + 2x) − 4Ae^{2x}(1 + 2x) = e^{2x}.

Comparing the coefficients of e^{2x} on both sides, we get 12A − 4A = 1, which gives A = 1/8 and thus the particular solution

y_N(x) = (1/8) xe^{2x}.

Therefore, the general solution of the given differential equation is given by

y(x) = y_H(x) + y_N(x) = C_1 + C_2 e^{2x} + C_3 e^{−2x} + (1/8) xe^{2x}.

Example 12.9 Consider the homogeneous differential equation

y′′′ − 4y′′ + 5y′ − 2y = 0

with the initial conditions

y(0) = 1,  y′(0) = 1  and  y′′(0) = 2.

The characteristic equation is given by

λ^3 − 4λ^2 + 5λ − 2 = 0.

Since the sum of the above four coefficients is equal to zero, we immediately get the root λ_1 = 1. Applying now Horner's scheme for dividing the polynomial of degree three by λ − 1, we get

λ^3 − 4λ^2 + 5λ − 2 = (λ − 1) · (λ^2 − 3λ + 2).

The quadratic equation λ^2 − 3λ + 2 = 0 has the two roots λ_2 = 1 and λ_3 = 2, i.e. there is one real root λ_1 = λ_2 = 1 of multiplicity two and one real root λ_3 = 2 of multiplicity one. Therefore, the following complementary function is obtained:

y_H(x) = C_1 e^x + C_2 xe^x + C_3 e^{2x}.

Now we determine a particular solution satisfying the given initial conditions.
We obtain:

y_H(0) = 1:   C_1 + C_3 = 1
y_H′(0) = 1:  C_1 + C_2 + 2C_3 = 1
y_H′′(0) = 2: C_1 + 2C_2 + 4C_3 = 2.

This is a system of three linear equations with three variables which has the unique solution

C_1 = 0,  C_2 = −1  and  C_3 = 1.

Hence we get the particular solution

y_P(x) = −xe^x + e^{2x}

of the initial value problem.

12.3 SYSTEMS OF LINEAR DIFFERENTIAL EQUATIONS OF THE FIRST ORDER

In this section, we briefly discuss the solution of systems of linear differential equations of the first order. We consider the system

y_1′(x) = a_{11} y_1(x) + a_{12} y_2(x) + · · · + a_{1n} y_n(x) + q_1(x)
y_2′(x) = a_{21} y_1(x) + a_{22} y_2(x) + · · · + a_{2n} y_n(x) + q_2(x)
. . .
y_n′(x) = a_{n1} y_1(x) + a_{n2} y_2(x) + · · · + a_{nn} y_n(x) + q_n(x)

or, equivalently, in matrix representation

y′(x) = A · y(x) + q(x),    (12.11)

where y(x) = (y_1(x), y_2(x), . . . , y_n(x))^T, q(x) = (q_1(x), q_2(x), . . . , q_n(x))^T and A = (a_{ij}) is the n × n matrix of the coefficients. A system is called homogeneous if all functions q_i(x), i = 1, 2, . . . , n, are identically equal to zero. We define a solution and the general solution of a system of linear differential equations of the first order in an analogous way as for a differential equation. Now a solution is characterized by a vector y(x) = (y_1(x), y_2(x), . . . , y_n(x))^T of n functions satisfying the given system of n linear differential equations. For an initial value problem, also given are the n initial conditions

y_1(x_0) = y_1^0,  y_2(x_0) = y_2^0,  . . . ,  y_n(x_0) = y_n^0,

and a particular solution y_P(x) is a vector of n functions in which the n constants from the general solution have been determined by means of the initial values. The following theorem gives a relationship between one linear differential equation of order n and a system of n linear differential equations of first order.
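A system in the matrix form (12.11) is also convenient for numerical treatment. A minimal sketch (the 2 × 2 system and its data are illustrative, not from the text) using a classical Runge-Kutta step:

```python
import math

# Illustrative homogeneous system in the form y' = A y with q = 0:
#   y1' = y2,  y2' = -y1,  y(0) = (1, 0),
# whose exact solution is y1 = cos x, y2 = -sin x.
def f(x, y):
    return [y[1], -y[0]]          # right-hand side A*y

def rk4(f, x0, y0, x1, n=1000):
    """Integrate y' = f(x, y) from x0 to x1 with n classical RK4 steps."""
    h = (x1 - x0) / n
    x, y = x0, list(y0)
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h/2, [y[i] + h/2 * k1[i] for i in range(2)])
        k3 = f(x + h/2, [y[i] + h/2 * k2[i] for i in range(2)])
        k4 = f(x + h,   [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]
        x += h
    return y

y = rk4(f, 0.0, [1.0, 0.0], math.pi)
print(y)  # close to [cos(pi), -sin(pi)] = [-1, 0]
```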
THEOREM 12.6 Let a differential equation of order n be linear with constant coefficients. Then it can be written as a system of n linear differential equations of first order with constant coefficients.

PROOF If a linear differential equation of order n is given in the form (12.7), we set

y_1(x) = y(x),  y_2(x) = y′(x),  . . . ,  y_n(x) = y^{(n−1)}(x)

and obtain a system of differential equations:

y_1′(x) = y_2(x)
y_2′(x) = y_3(x)
. . .
y_{n−1}′(x) = y_n(x)
y_n′(x) = −a_0 y_1(x) − a_1 y_2(x) − · · · − a_{n−1} y_n(x) + q(x).    □

Under additional assumptions (which we do not discuss here), there is even an equivalence between a linear differential equation of order n with constant coefficients and a system of n linear differential equations of the first order with constant coefficients, i.e. a linear differential equation of order n can be transformed into a system of n linear differential equations of the first order with constant coefficients, and vice versa. By the following example, we illustrate this process of transforming a given system into a differential equation of order n by eliminating systematically all but one of the functions y_i(x).

Example 12.10 Let the system of first-order differential equations

y_1′ = y_2 + y_3 + e^x
y_2′ = y_1 − y_3
y_3′ = y_1 + y_2 + e^x

be given with the initial conditions y_1(0) = 0, y_2(0) = 0 and y_3(0) = 1. We transform this system containing the three functions y_1(x), y_2(x), y_3(x) and their derivatives into one linear differential equation of order three. To this end, we differentiate the first equation, which yields

y_1′′ = y_2′ + y_3′ + e^x = (y_1 − y_3) + (y_1 + y_2 + e^x) + e^x = 2y_1 + y_2 − y_3 + 2e^x.

Differentiating the latter equation again, we get

y_1′′′ = 2y_1′ + y_2′ − y_3′ + 2e^x = 2(y_2 + y_3 + e^x) + (y_1 − y_3) − (y_1 + y_2 + e^x) + 2e^x = y_2 + y_3 + 3e^x.
Now we can replace the sum y_2 + y_3 by means of the first equation of the given system, and we obtain the third-order differential equation

y_1′′′ − y_1′ = 2e^x.

Using the setting y_1(x) = e^{λx}, we obtain the characteristic equation

λ^3 − λ = 0

with the roots λ_1 = 0, λ_2 = 1 and λ_3 = −1. This yields the following general solution of the homogeneous equation:

y_1^H = C_1 + C_2 · e^x + C_3 · e^{−x}.

In order to find a particular solution of the non-homogeneous equation, we set y_1^N = Axe^x (notice that the number 1 is a root of the characteristic equation, therefore we use case (b) in the setting for an exponential forcing term according to Table 12.2), which yields

(y_1^N)′ = Ae^x (x + 1),  (y_1^N)′′ = Ae^x (x + 2)  and  (y_1^N)′′′ = Ae^x (x + 3),

and therefore

Ae^x (x + 3) − Ae^x (x + 1) = 2e^x,

i.e. we obtain

e^x (Ax + 3A − Ax − A) = 2e^x,

from which we obtain A = 1. Therefore, the general solution is given by

y_1(x) = C_1 + C_2 e^x + C_3 e^{−x} + xe^x.

We still have to determine the solutions y_2(x) and y_3(x). This can be done by using the first equation of the given system and the equation obtained for y_1′′. We get

y_2 + y_3 = y_1′ − e^x
y_2 − y_3 = y_1′′ − 2y_1 − 2e^x.

Adding both equations and dividing the resulting equation by two gives

y_2(x) = (1/2) y_1′ + (1/2) y_1′′ − y_1 − (3/2) e^x.

Inserting the general solution y_1 and its derivatives

y_1′ = C_2 e^x − C_3 e^{−x} + e^x + xe^x
y_1′′ = C_2 e^x + C_3 e^{−x} + 2e^x + xe^x,

we get

y_2(x) = −C_1 − C_3 e^{−x}.

Subtracting the above equations for y_2 + y_3 and y_2 − y_3 and dividing the resulting equation by two yields

y_3(x) = (1/2) y_1′ − (1/2) y_1′′ + y_1 + (1/2) e^x.

Inserting the above terms for y_1′ and y_1′′, we obtain

y_3(x) = C_1 + C_2 e^x + xe^x.

Next, we consider the initial conditions and obtain

y_1(0) = 0:  C_1 + C_2 + C_3 = 0
y_2(0) = 0:  −C_1 − C_3 = 0
y_3(0) = 1:  C_1 + C_2 = 1.
This system of three linear equations with three variables has the unique solution C_1 = 1, C_2 = 0 and C_3 = −1. The particular solution of the initial value problem is

y_1^P(x) = 1 − e^{−x} + xe^x
y_2^P(x) = −1 + e^{−x}
y_3^P(x) = 1 + xe^x.

It is worth noting that not every initial value problem necessarily has a solution. Next, we discuss an approach for solving a system of linear differential equations of the first order directly, without using the comment after Theorem 12.6.

Homogeneous systems

We first discuss the solution of the homogeneous system, i.e. we assume q_i(x) ≡ 0 for i = 1, 2, . . . , n. We set

y(x) = (y_1(x), y_2(x), . . . , y_n(x))^T = (z_1, z_2, . . . , z_n)^T e^{λx} = z e^{λx}.    (12.12)

Using (12.12) and

y′(x) = (y_1′(x), y_2′(x), . . . , y_n′(x))^T = λ z e^{λx},

we obtain from system (12.11) the eigenvalue problem (A − λI) z = 0, written out:

(a_{11} − λ) z_1 + a_{12} z_2 + · · · + a_{1n} z_n = 0
a_{21} z_1 + (a_{22} − λ) z_2 + · · · + a_{2n} z_n = 0
. . .
a_{n1} z_1 + a_{n2} z_2 + · · · + (a_{nn} − λ) z_n = 0.

We determine the eigenvalues λ_1, λ_2, . . . , λ_n and the corresponding eigenvectors

z^1 = (z_1^1, z_2^1, . . . , z_n^1)^T,  z^2 = (z_1^2, z_2^2, . . . , z_n^2)^T,  . . . ,  z^n = (z_1^n, z_2^n, . . . , z_n^n)^T.

As in the case of one differential equation, we have to distinguish several cases. In the case of real eigenvalues λ_i of multiplicity one, the general solution y^H(x) is as follows:

y^H(x) = C_1 z^1 e^{λ_1 x} + C_2 z^2 e^{λ_2 x} + · · · + C_n z^n e^{λ_n x}.

The other cases can be handled analogously to the case of a differential equation of order n. We give only a few comments for the cases of a real eigenvalue of multiplicity k > 1 and of complex eigenvalues α_j ± iβ_j of multiplicity one.
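For eigenvalues of multiplicity one, the building blocks of this general solution can be obtained numerically. A sketch with NumPy (the matrix is illustrative; note that np.linalg.eig returns normalized eigenvectors, which agree with hand-computed ones only up to a scalar factor):

```python
import numpy as np

# A sketch for the case of eigenvalues of multiplicity one (the matrix
# is illustrative): every eigenpair (lambda_i, z^i) of A yields the
# solution z^i * exp(lambda_i * x) of y' = A y, since A z^i = lambda_i z^i.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
lams, Z = np.linalg.eig(A)       # eigenvalues; eigenvectors as columns of Z
for lam, z in zip(lams, Z.T):
    assert np.allclose(A @ z, lam * z)   # defining property of the eigenpair
print(np.sort(lams.real))  # [2. 3.]
```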
If λ_i is a real eigenvalue of multiplicity k, the eigenvectors z^i, z^{i+1}, . . . , z^{i+k−1} associated with this eigenvalue are not necessarily linearly independent. Therefore, we have to modify the term for finding the corresponding part of the general solution, and we use the following setting for an eigenvalue λ_i of multiplicity k:

C_i z^1 e^{λ_i x} + C_{i+1} z^2 xe^{λ_i x} + · · · + C_{i+k−1} z^k x^{k−1} e^{λ_i x}.

In the case of complex eigenvalues α_j ± iβ_j of multiplicity one, we take one of them, e.g. α_j + iβ_j, and determine the corresponding eigenvector z^j (which in general contains complex numbers as components in this case). Then we calculate the product

z^j · e^{(α_j + iβ_j)x} = z^j · e^{α_j x} · (cos β_j x + i sin β_j x) = z^∗.

The real part z̄^j and the imaginary part z̄^{j+1} of the resulting vector z^∗, respectively, yield two linearly independent solutions, and we use the setting C_j z̄^j + C_{j+1} z̄^{j+1} for the corresponding part of the general solution of the homogeneous differential equation.

Non-homogeneous systems

A particular solution of the non-homogeneous system can be determined in a similar way as for a differential equation of order n. However, all occurring specific forcing terms q_1(x), q_2(x), . . . , q_n(x) have to be considered in every function y_i^N(x). We finish this section with two systems of linear differential equations of the first order.

Example 12.11 We consider the system

y_1′ = 2y_1
y_2′ = y_1 + y_2 + 2y_3
y_3′ = −2y_1 − 4y_2 − 3y_3

with the initial conditions y_1(0) = 13, y_2(0) = 0 and y_3(0) = −4. We use the setting

(y_1(x), y_2(x), y_3(x))^T = (z_1, z_2, z_3)^T e^{λx}.

Inserting this and the derivatives y_i′, i ∈ {1, 2, 3}, into the system, we get the eigenvalue problem

(2 − λ) z_1 = 0
z_1 + (1 − λ) z_2 + 2z_3 = 0
−2z_1 − 4z_2 + (−3 − λ) z_3 = 0,

i.e. we have to look for the eigenvalues of the matrix

    ⎛  2   0   0 ⎞
A = ⎜  1   1   2 ⎟.
    ⎝ −2  −4  −3 ⎠

Determining the eigenvalues of matrix A and expanding the determinant along the first row, we investigate

|A − λI| = (2 − λ) · [(1 − λ)(−3 − λ) + 8] = (2 − λ) · (λ^2 + 2λ + 5) = 0.

This yields the eigenvalues

λ_1 = 2,  λ_2 = −1 + √(1 − 5) = −1 + 2i  and  λ_3 = −1 − √(1 − 5) = −1 − 2i.

Next, we determine for each eigenvalue a corresponding eigenvector. For λ_1 = 2, we obtain the system

0z_1^1 + 0z_2^1 + 0z_3^1 = 0
z_1^1 − z_2^1 + 2z_3^1 = 0
−2z_1^1 − 4z_2^1 − 5z_3^1 = 0.

The coefficient matrix of this system has rank two (row 1 can be dropped and the third equation is not a multiple of the second equation), so we can choose one variable arbitrarily. Choosing z_3^1 = 6, all components of the resulting solution are integers, and the corresponding eigenvector is

z^1 = (−13, −1, 6)^T.

For λ_2 = −1 + 2i, we get the following system:

(3 − 2i)z_1^2 + 0z_2^2 + 0z_3^2 = 0
z_1^2 + (2 − 2i)z_2^2 + 2z_3^2 = 0
−2z_1^2 − 4z_2^2 + (−2 − 2i)z_3^2 = 0.

From the first equation, we get z_1^2 = 0. This reduces the original system to the following system with two equations and two variables:

(2 − 2i)z_2^2 + 2z_3^2 = 0
−4z_2^2 + (−2 − 2i)z_3^2 = 0.

Obviously, both equations (i.e. both row vectors) are linearly dependent (since multiplying the first equation by (−1 − i) yields the second equation). Choosing z_2^2 = 2, we obtain z_3^2 = −2 + 2i. Thus, we have obtained the eigenvector

z^2 = (0, 2, −2 + 2i)^T.

Finally, for λ_3 = −1 − 2i, we get the system

(3 + 2i)z_1^3 + 0z_2^3 + 0z_3^3 = 0
z_1^3 + (2 + 2i)z_2^3 + 2z_3^3 = 0
−2z_1^3 − 4z_2^3 + (−2 + 2i)z_3^3 = 0.

As in the above case, we get from the first equation z_1^3 = 0. Solving the reduced system with two equations and two variables, we finally get the eigenvector

z^3 = (0, 2, −2 − 2i)^T.

Thus, we have obtained the solutions

(−13, −1, 6)^T e^{2x},  (0, 2, −2 + 2i)^T e^{(−1+2i)x}  and  (0, 2, −2 − 2i)^T e^{(−1−2i)x}.

In order to get two linearly independent real solutions, we rewrite the second specific solution as follows. (Alternatively, we could also use the third solution resulting from the complex conjugate.)

(0, 2, −2 + 2i)^T e^{(−1+2i)x} = (0, 2(cos 2x + i sin 2x), (−2 + 2i)(cos 2x + i sin 2x))^T e^{−x}
= (0, 2 cos 2x + 2i sin 2x, −2 cos 2x − 2 sin 2x + i(2 cos 2x − 2 sin 2x))^T e^{−x}.

In the first transformation step above, we have applied Euler's formula for complex numbers. As we know from the consideration of differential equations of order n, the real part and the imaginary part form specific independent solutions (see also Table 12.1), i.e. we obtain

(0, 2 cos 2x, −2(cos 2x + sin 2x))^T e^{−x}  and  (0, 2 sin 2x, −2(−cos 2x + sin 2x))^T e^{−x}.

From the considerations above, we obtain the general solution as follows:

(y_1(x), y_2(x), y_3(x))^T = C_1 (−13, −1, 6)^T e^{2x} + C_2^∗ (0, 2 cos 2x, −2(cos 2x + sin 2x))^T e^{−x} + C_3^∗ (0, 2 sin 2x, −2(−cos 2x + sin 2x))^T e^{−x}.

Using −2C_i^∗ = C_i for i ∈ {2, 3}, we can simplify the above solution and obtain

(y_1(x), y_2(x), y_3(x))^T = C_1 (−13, −1, 6)^T e^{2x} + C_2 (0, −cos 2x, cos 2x + sin 2x)^T e^{−x} + C_3 (0, −sin 2x, −cos 2x + sin 2x)^T e^{−x}.

It remains to determine a solution satisfying the initial conditions. We obtain

y_1(0) = 13:  −13C_1 = 13
y_2(0) = 0:   −C_1 − C_2 = 0
y_3(0) = −4:  6C_1 + C_2 − C_3 = −4,

which has the solution C_1 = −1, C_2 = 1 and C_3 = −1. Thus, the particular solution y^P(x) of the given initial value problem for a system of differential equations is as follows:

y^P(x) = (y_1^P(x), y_2^P(x), y_3^P(x))^T
= (13, 1, −6)^T e^{2x} + (0, −cos 2x, cos 2x + sin 2x)^T e^{−x} + (0, sin 2x, cos 2x − sin 2x)^T e^{−x}.

Example 12.12 We consider the following system of linear differential equations of the first order:

y_1′ = y_1 + e^{2x}
y_2′ = 4y_1 − 3y_2 + 3x.
In order to determine a solution of the homogeneous system

y_1′ = y_1
y_2′ = 4y_1 − 3y_2,

we use the setting

(y_1(x), y_2(x))^T = (z_1, z_2)^T e^{λx}

and determine first the eigenvalues of the matrix

A = ⎛ 1   0 ⎞
    ⎝ 4  −3 ⎠

formed by the coefficients of the functions y_i(x) in the given system of differential equations. We obtain

|A − λI| = (1 − λ)(−3 − λ) = 0,

which has the solutions λ_1 = 1 and λ_2 = −3. We determine the eigenvectors associated with both eigenvalues. For λ_1 = 1, we obtain the system

0z_1^1 + 0z_2^1 = 0
4z_1^1 − 4z_2^1 = 0,

which yields the eigenvector

z^1 = (z_1^1, z_2^1)^T = (1, 1)^T.

For λ_2 = −3, we obtain the system

4z_1^2 + 0z_2^2 = 0
4z_1^2 + 0z_2^2 = 0,

which yields the eigenvector

z^2 = (z_1^2, z_2^2)^T = (0, 1)^T.

Thus, we obtain the following general solution y^H(x) of the homogeneous system:

y^H(x) = (y_1^H(x), y_2^H(x))^T = C_1 (1, 1)^T e^x + C_2 (0, 1)^T e^{−3x}.

Next, we determine a particular solution y^N(x) of the non-homogeneous system. Since we have the two forcing terms 3x and e^{2x}, we use the setting

y^N(x) = (y_1^N(x), y_2^N(x))^T = (A_1 x + A_0 + Ae^{2x}, B_1 x + B_0 + Be^{2x})^T.

Notice that, according to the earlier comment, we have to use both the linear term and the exponential term in each of the functions y_1^N(x) and y_2^N(x). Differentiating the above functions, we get

((y_1^N)′(x), (y_2^N)′(x))^T = (A_1 + 2Ae^{2x}, B_1 + 2Be^{2x})^T.

Inserting now y_i^N and (y_i^N)′, i ∈ {1, 2}, into the initial non-homogeneous system, we obtain

A_1 + 2Ae^{2x} = A_1 x + A_0 + Ae^{2x} + e^{2x}
B_1 + 2Be^{2x} = 4A_1 x + 4A_0 + 4Ae^{2x} − 3B_1 x − 3B_0 − 3Be^{2x} + 3x,

which can be rewritten as

A_1 + 2Ae^{2x} = A_0 + A_1 x + (A + 1)e^{2x}
B_1 + 2Be^{2x} = (4A_0 − 3B_0) + (4A_1 − 3B_1 + 3)x + (4A − 3B)e^{2x}.
Now we compare, in each of the two equations, the coefficients of the terms x^1, x^0 and e^{2x}, which must coincide on the left-hand and right-hand sides:

x^1:  0 = A_1,  0 = 4A_1 − 3B_1 + 3,

which yields the solution A_1 = 0 and B_1 = 1. Furthermore,

x^0:  A_1 = A_0,  B_1 = 4A_0 − 3B_0.

Using the solution obtained for B_1 and A_1, we find A_0 = 0 and B_0 = −1/3. Finally,

e^{2x}:  2A = A + 1,  2B = 4A − 3B,

which yields the solution A = 1 and B = 4/5. Combining the results above, we get the particular solution y^N(x) of the non-homogeneous system:

y^N(x) = (y_1^N(x), y_2^N(x))^T = (e^{2x}, x − 1/3 + (4/5)e^{2x})^T.

Finally, the general solution y(x) of the non-homogeneous system of differential equations is given by the sum of y^H(x) and y^N(x):

y(x) = (y_1(x), y_2(x))^T = y^H(x) + y^N(x) = C_1 (1, 1)^T e^x + C_2 (0, 1)^T e^{−3x} + (e^{2x}, x − 1/3 + (4/5)e^{2x})^T.

12.4 LINEAR DIFFERENCE EQUATIONS

12.4.1 Definitions and properties of solutions

While differential equations consider a relationship between a function of a real variable and its derivatives, in several economic applications differences in function values have to be considered. A typical situation arises when the independent variable is a discrete variable which may take e.g. only integer values. (This applies when a function value is observed every day, every month, every year or in any other discrete periods.) For instance, if time is considered as the independent variable, and we are interested in the development of a certain function value over time, then it is often desired to give a relationship between several successive values y_t, y_{t+1}, . . . , y_{t+k}, which are described by a so-called difference equation. For the solution of special types of such equations we use a similar concept as for the solution of differential equations. We begin with the formal definition of a difference equation of order n.
Definition 12.6 If for arbitrary n + 1 successive terms of a sequence {y_t} the equation

y_{t+n} = a_{n−1}(t) y_{t+n−1} + · · · + a_1(t) y_{t+1} + a_0(t) y_t + q(t)    (12.13)

holds, it is called a linear difference equation. If a_0(t) ≠ 0, the number n denotes the order of the difference equation.

Definition 12.7 If we have q(t) ≡ 0 in equation (12.13), the difference equation is called homogeneous, otherwise it is called non-homogeneous.

Definition 12.8 Any sequence {y_t} satisfying the difference equation (12.13) is called a solution. The set S of all sequences {y_t} satisfying the difference equation (12.13) is called the set of solutions or the general solution.

First, we present some properties of solutions of linear homogeneous difference equations of order n.

Homogeneous difference equations

For a homogeneous difference equation of order n, we can state analogous results as for a homogeneous differential equation of order n given in Chapter 12.2.1.

THEOREM 12.7 Let {y_t^1}, {y_t^2}, . . . , {y_t^m} with t = 0, 1, . . . be solutions of the linear homogeneous difference equation

y_{t+n} = a_{n−1}(t) y_{t+n−1} + · · · + a_1(t) y_{t+1} + a_0(t) y_t.    (12.14)

Then the linear combination

y_t = C_1 y_t^1 + C_2 y_t^2 + · · · + C_m y_t^m

with C_1, C_2, . . . , C_m ∈ R is a solution as well.

Definition 12.9 The solutions {y_t^1}, {y_t^2}, . . . , {y_t^m}, m ≤ n, t = 0, 1, . . . , of the linear homogeneous difference equation (12.14) are said to be linearly independent if

C_1 y_t^1 + C_2 y_t^2 + · · · + C_m y_t^m = 0  for t = 0, 1, . . .

is only possible for C_1 = C_2 = · · · = C_m = 0. Otherwise, the solutions are said to be linearly dependent.

The next theorem gives a criterion to decide whether a set of solutions of the homogeneous difference equation is linearly independent.

THEOREM 12.8 The solutions {y_t^1}, {y_t^2}, . . . , {y_t^m}, m ≤ n, t = 0, 1, . . . , of the linear homogeneous difference equation (12.14) are linearly independent if and only if

       ⎛ y_t^1        y_t^2        . . .  y_t^m       ⎞
C(t) = ⎜ y_{t+1}^1    y_{t+1}^2    . . .  y_{t+1}^m   ⎟ ≠ 0  for t = 0, 1, . . . ,
   det ⎜ . . .                                        ⎟
       ⎝ y_{t+m−1}^1  y_{t+m−1}^2  . . .  y_{t+m−1}^m ⎠

where C(t) denotes the determinant of the above matrix. C(t) is known as Casorati's determinant. Now we can describe the general solution of a linear homogeneous difference equation of order n as follows.

THEOREM 12.9 Let {y_t^1}, {y_t^2}, . . . , {y_t^n} with t = 0, 1, . . . be n linearly independent solutions of a linear homogeneous difference equation (12.14) of order n. Then the general solution can be written as

S_H = { {y_t^H} | y_t^H = C_1 y_t^1 + C_2 y_t^2 + · · · + C_n y_t^n,  C_1, C_2, . . . , C_n ∈ R }.

Non-homogeneous difference equations

The following theorem describes the structure of the general solution of a linear non-homogeneous difference equation of order n.

THEOREM 12.10 Let

S_H = { {y_t^H} | y_t^H = C_1 y_t^1 + C_2 y_t^2 + · · · + C_n y_t^n,  C_1, C_2, . . . , C_n ∈ R }

be the general solution of a linear homogeneous difference equation (12.14) of order n and {y_t^N} be a particular solution of the non-homogeneous equation (12.13). Then the general solution of the linear non-homogeneous difference equation (12.13) can be written as

S = { {y_t} | y_t = y_t^H + y_t^N,  {y_t^H} ∈ S_H }
  = { {y_t} | y_t = C_1 y_t^1 + C_2 y_t^2 + · · · + C_n y_t^n + y_t^N,  C_1, C_2, . . . , C_n ∈ R }.

Initial value problems

If in addition n initial values of a linear difference equation of order n are given, we can present the following sufficient condition for the existence and uniqueness of a solution of the initial value problem.

THEOREM 12.11 (existence and uniqueness of a solution) Let n successive initial values y_0, y_1, . . . , y_{n−1} for a linear difference equation

y_{t+n} = a_{n−1} y_{t+n−1} + · · · + a_1 y_{t+1} + a_0 y_t + q(t)

with constant coefficients a_i, i ∈ {0, 1, . . . , n − 1}, be given. Then this linear difference equation has exactly one particular solution {y_t^P} with t = 0, 1, . . . .
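Theorem 12.11 mirrors the fact that, with constant coefficients and n successive initial values, all later terms are forced by the recursion itself. A sketch for a second-order equation (the coefficients and initial values are illustrative, not from the text):

```python
# Forward recursion for the illustrative second-order equation
#   y_{t+2} = 3*y_{t+1} - 2*y_t + 1,   y_0 = 0, y_1 = 1:
# each new term is uniquely determined by the two preceding ones.
def iterate(a1, a0, q, y0, y1, steps):
    ys = [y0, y1]
    for _ in range(steps):
        ys.append(a1 * ys[-1] + a0 * ys[-2] + q)
    return ys

ys = iterate(3, -2, 1, 0, 1, 4)
print(ys)  # [0, 1, 4, 11, 26, 57]
```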
We note that, if the n initial values are not given for successive values of t, it is possible that no solution or no unique solution exists. In the following, we describe the solution process for difference equations of the first order with arbitrary coefficients and for difference equations of the second order with constant coefficients.

12.4.2 Linear difference equations of the first order

Next, we consider a linear difference equation of the first order such as

y_{t+1} = a_0(t) y_t + q(t).

This difference equation can alternatively be written by means of the difference of the two successive terms y_{t+1} and y_t as

Δy = y_{t+1} − y_t = a(t) y_t + q(t),  where a(t) = a_0(t) − 1.

(The latter notation illustrates why such an equation is called a difference equation.)

Constant coefficient and forcing term

First, we consider the special case of a constant coefficient a_0(t) = a_0 and a constant forcing term q(t) = q. If we consider the homogeneous equation y_{t+1} = a_0 y_t, we obtain the following:

y_1 = a_0 y_0
y_2 = a_0 y_1 = a_0^2 y_0
. . .
y_t = a_0 y_{t−1} = a_0^t y_0.

If there is a constant forcing term, i.e. the non-homogeneous difference equation has the form y_{t+1} = a_0 y_t + q, we obtain:

y_1 = a_0 y_0 + q
y_2 = a_0 y_1 + q = a_0 (a_0 y_0 + q) + q = a_0^2 y_0 + a_0 q + q
y_3 = a_0 y_2 + q = a_0 (a_0^2 y_0 + a_0 q + q) + q = a_0^3 y_0 + q(a_0^2 + a_0 + 1)
. . .
y_t = a_0^t y_0 + q(a_0^{t−1} + · · · + a_0 + 1).

For a_0 ≠ 1, the latter equation can be written as follows (using the formula for the partial sums of a geometric sequence):

y_t = a_0^t y_0 + q · (a_0^t − 1)/(a_0 − 1),  t = 1, 2, . . . .

For a_0 = 1, we obtain y_t = y_0 + q · t.

Variable coefficient and forcing term

Now we consider the general case, where the coefficient and the forcing term depend on the variable t, i.e. the difference equation has the form y_{t+1} = a_0(t) y_t + q(t).
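The closed-form expression for the constant-coefficient case above can be cross-checked against direct iteration (a sketch; the values a_0 = 3, q = 2, y_0 = 1 are illustrative, not from the text):

```python
# Check y_t = a0^t * y0 + q * (a0^t - 1) / (a0 - 1)  (a0 != 1)
# against direct iteration of y_{t+1} = a0 * y_t + q.
a0, q, y0 = 3.0, 2.0, 1.0
y = y0
for t in range(1, 6):
    y = a0 * y + q
    closed = a0**t * y0 + q * (a0**t - 1) / (a0 - 1)
    assert abs(y - closed) < 1e-9   # both routes give the same y_t
print(y)  # 485.0
```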
Then we obtain:

y_1 = a_0(0) y_0 + q(0)
y_2 = a_0(1) y_1 + q(1) = a_0(1) [a_0(0) y_0 + q(0)] + q(1) = a_0(1) a_0(0) y_0 + a_0(1) q(0) + q(1)
y_3 = a_0(2) y_2 + q(2) = a_0(2) [a_0(1) a_0(0) y_0 + a_0(1) q(0) + q(1)] + q(2)
    = a_0(2) a_0(1) a_0(0) y_0 + a_0(2) a_0(1) q(0) + a_0(2) q(1) + q(2)
. . .
y_t = (∏_{i=0}^{t−1} a_0(i)) y_0 + Σ_{j=0}^{t−2} (∏_{i=j+1}^{t−1} a_0(i)) q(j) + q(t − 1),  t = 1, 2, . . . .

Example 12.13 Consider the following instance of the so-called cobweb model:

y_{t+1} = 1 + 2p_t
x_t = 15 − p_t
y_t = x_t,

where, for a certain product, y_t denotes the supply, x_t the demand and p_t the price at time t. Moreover, let the initial price p_0 = 4 at time zero be given. The first equation expresses that a producer fixes the supply of a product in period t + 1 in dependence on the price p_t of this product in the previous period (i.e. the supply increases with the price). The second equation means that the demand for a product in period t depends on its price in that period (i.e. the demand decreases with increasing price). The third equation gives the equilibrium condition: all products are sold, i.e. supply is equal to demand. Using the equilibrium y_{t+1} = x_{t+1}, we obtain

1 + 2p_t = 15 − p_{t+1},  i.e.  p_{t+1} = 14 − 2p_t,

which is a linear difference equation of the first order (with constant coefficient and forcing term). Applying the above formula and taking into account that a_0 = −2 ≠ 1, we obtain

p_t = a_0^t p_0 + q · (a_0^t − 1)/(a_0 − 1) = (−2)^t p_0 + 14 · ((−2)^t − 1)/(−2 − 1)
    = (−2)^t p_0 + (14/3) · (1 − (−2)^t) = 14/3 + (−2)^t (p_0 − 14/3).

Using the initial condition p_0 = 4, we get the solution

p_t = 14/3 + (−2)^t · (4 − 14/3) = 14/3 + (−2)^t · (−2/3).

For the initial value problem, we get the terms

p_0 = 4,
p_1 = 14/3 + (−2)^1 · (−2/3) = 18/3 = 6,
p_2 = 14/3 + (−2)^2 · (−2/3) = 6/3 = 2,
p_3 = 14/3 + (−2)^3 · (−2/3) = 30/3 = 10,
p_4 = 14/3 + (−2)^4 · (−2/3) = −18/3 = −6, . . . ,

which is illustrated in Figure 12.3.

Figure 12.3 The solution of Example 12.13.

12.4.3 Linear difference equations of the second order

Next, we consider a linear difference equation of the second order with constant coefficients:

y_{t+2} = a_1 y_{t+1} + a_0 y_t + q(t),    (12.15)

or equivalently,

y_{t+2} − a_1 y_{t+1} − a_0 y_t = q(t).

First, we consider the homogeneous equation, i.e. q(t) ≡ 0:

y_{t+2} − a_1 y_{t+1} − a_0 y_t = 0.

In order to determine two linearly independent solutions and then the general solution y_t^H according to Theorem 12.10, we use the setting y_t^H = m^t, which is substituted into the homogeneous difference equation. This yields the quadratic characteristic equation

m^2 − a_1 m − a_0 = 0,    (12.16)

which has the two solutions

m_{1,2} = a_1/2 ± √(a_1^2/4 + a_0).

We can distinguish the following three cases. Treatment of these cases is done in the same way as in Chapter 12.2.2 for linear differential equations (where the term y_{t+i} in a difference equation 'corresponds' to the derivative y^{(i)}(x) in the differential equation).

(1) Inequality a_1^2/4 + a_0 > 0 holds, i.e. there are two different real roots of the quadratic equation. Then we set

y_t = C_1 m_1^t + C_2 m_2^t.

(2) Equation a_1^2/4 + a_0 = 0 holds, i.e. there is a real root m_1 = m_2 of multiplicity two. Analogously to the treatment of a differential equation, we set

y_t = (C_1 + C_2 t) m_1^t.

(3) Inequality a_1^2/4 + a_0 < 0 holds, i.e. there are two complex roots m_{1,2} = a ± bi of the quadratic characteristic equation with

a = a_1/2  and  b = √(−a_1^2 − 4a_0)/2.

Then we get the real solutions

y_t = r^t · (C_1 cos αt + C_2 sin αt),

where

r = √(−a_0) = √(a^2 + b^2)

and α is such that

cos α = a/r = −(a_1/(2a_0)) · √(−a_0)  and  sin α = b/r = −√(4a_0^2 + a_0 a_1^2)/(2a_0).

Next, we consider the non-homogeneous equation, i.e. we have q(t) ≢ 0 in equation (12.15).
A particular solution y_t^N of a non-homogeneous difference equation of the second order can be found for a polynomial or an exponential function as forcing term q(t) by a specific setting as shown in Table 12.3. Analogously to differential equations, we can also use this method of undetermined coefficients for a sum or product of the above forcing terms. In such a case, the setting has to represent the corresponding sum or product structure of the forcing term. We now apply this method as before in the case of differential equations.

Table 12.3 Settings for specific forcing terms in the difference equation

Forcing term q(t) = b_n t^n + b_{n−1} t^{n−1} + · · · + b_1 t + b_0:
(a) y_t^N = A_n t^n + A_{n−1} t^{n−1} + · · · + A_1 t + A_0, if a_0 + a_1 ≠ 1 in the characteristic equation (12.16);
(b) y_t^N = t · (A_n t^n + A_{n−1} t^{n−1} + · · · + A_1 t + A_0), if a_0 + a_1 = 1 and a_1 ≠ 2 in (12.16);
(c) y_t^N = t^2 · (A_n t^n + A_{n−1} t^{n−1} + · · · + A_1 t + A_0), if a_0 + a_1 = 1 and a_1 = 2 in (12.16).

Forcing term q(t) = a · b^t:
(a) y_t^N = A · b^t, if b is not a root of the characteristic equation (12.16);
(b) y_t^N = A · t^k · b^t, if b is a root of multiplicity k of the characteristic equation (12.16).

Example 12.14 As an example, we consider an instance of the multiplier-accelerator model by Samuelson, which describes a relationship between the national income y_t, the total consumption c_t and the total investment i_t of a country at some time t. Let the equations

y_t = c_t + i_t + 50
c_t = (1/2) · y_{t−1}
i_t = c_t − c_{t−1}

together with the initial conditions y_0 = 55 and y_1 = 48 be given. The first equation shows that the income at time t is equal to the consumption plus the investment plus the constant government expenditure (in our example equal to 50 units). The second equation shows that the consumption at time t is half the income at time t − 1.
Finally, the third equation shows that the total investment at time t is equal to the difference in the total consumption at times t and t − 1. Substituting the equations for c_t and i_t into the first equation, we obtain

y_t = (1/2) · y_{t−1} + (c_t − c_{t−1}) + 50
    = (1/2) · y_{t−1} + ((1/2) · y_{t−1} − (1/2) · y_{t−2}) + 50
    = y_{t−1} − (1/2) · y_{t−2} + 50.

Using the terms y_{t+2}, y_{t+1}, y_t instead of y_t, y_{t−1}, y_{t−2}, we obtain the second-order difference equation

y_{t+2} = y_{t+1} − (1/2) · y_t + 50.

To find the general solution of the homogeneous equation, we use the setting y_t = m^t which, after substituting into the difference equation, yields the quadratic equation

m^2 − m + 1/2 = 0.

We get the two solutions

m_1 = 1/2 + √(1/4 − 1/2) = 1/2 + (1/2)i and m_2 = 1/2 − √(1/4 − 1/2) = 1/2 − (1/2)i.

Using a_1 = 1 and a_0 = −1/2, we get the following equations for r and α:

r = √(1/2) = (1/2)√2

and α is such that

cos α = −(a_1/(2a_0)) · √(−a_0) = −(1/(2 · (−1/2))) · √(1/2) = (1/2)√2

and

sin α = −√(4a_0^2 + a_0 a_1^2)/(2a_0) = −√(4 · (−1/2)^2 + (−1/2) · 1^2)/(2 · (−1/2)) = √(1 − 1/2) = (1/2)√2.

The latter two equations give α = π/4. Thus, the general solution of the homogeneous difference equation is given by

y_t^H = ((1/2)√2)^t · (C_1 cos (π/4)t + C_2 sin (π/4)t).

To find a particular solution of the non-homogeneous equation, we use the setting y_t = A. Using y_{t+2} = y_{t+1} = y_t = A and substituting the latter into the difference equation, we obtain

A = A − (1/2)A + 50.

This yields (1/2) · A = 50, i.e. A = 100, and thus the particular solution of the non-homogeneous equation is y_t^N = 100. Therefore, the general solution of the non-homogeneous difference equation is given by

y_t = y_t^H + y_t^N = ((1/2)√2)^t · (C_1 cos (π/4)t + C_2 sin (π/4)t) + 100.

Finally, we determine the solution satisfying the initial conditions. We get

y_0 = C_1 · 1 + C_2 · 0 + 100 = 55  ⟹  C_1 = −45,
y_1 = C_1 · (1/2)√2 · (1/2)√2 + C_2 · (1/2)√2 · (1/2)√2 + 100
    = C_1 · (1/2) + C_2 · (1/2) + 100
    = −45/2 + C_2 · (1/2) + 100 = 48  ⟹  C_2 = −59.
Thus, we get the particular solution y_t^P of the initial value problem

y_t^P = y_t^H + y_t^N = ((1/2)√2)^t · (−45 cos (π/4)t − 59 sin (π/4)t) + 100.

Using this solution, we get

y_2 = −45 · ((1/2)√2)^2 · cos(π/2) − 59 · ((1/2)√2)^2 · sin(π/2) + 100 = 70.5,
y_3 = 96.5, y_4 = 111.25, …

The solution is illustrated in Figure 12.4.

Figure 12.4 The solution of Example 12.14.

Example 12.15 Let the non-homogeneous difference equation

y_{t+2} = 2y_{t+1} − y_t + 1

be given. To solve the homogeneous equation, we get the characteristic equation

m^2 − 2m + 1 = 0

(notice that we have a_1 = 2 and a_0 = −1 in the characteristic equation), which has a real solution of multiplicity two: m_1 = m_2 = 1. Therefore, the general solution of the homogeneous equation is given by

y_t^H = C_1 · 1^t + C_2 · t · 1^t = C_1 + C_2 t.

To get a particular solution of the non-homogeneous equation, we have to use case (c) of a polynomial forcing term (since a_0 + a_1 = −1 + 2 = 1 and a_1 = 2), which means that we set y_t^N = At^2. (In fact, the reader may check that neither of the settings y_t^N = A and y_t^N = At is appropriate to determine the coefficient A.) Substituting y_t^N into the non-homogeneous equation, we get

A(t + 2)^2 = 2A(t + 1)^2 − At^2 + 1,

which can be rewritten as

A(t^2 + 4t + 4) − 2A(t^2 + 2t + 1) + At^2 = 1.

Comparing now the terms on both sides, we get 4A − 2A = 1, which gives the solution A = 1/2. Therefore, we have obtained y_t^N = (1/2)t^2, and the general solution of the difference equation is given by

y_t = y_t^H + y_t^N = C_1 + C_2 t + (1/2)t^2.

EXERCISES

12.1 Consider the differential equation y′ = y.
 (a) Draw the direction field.
 (b) Solve the differential equation by calculation. Find the particular solutions satisfying y(0) = 1 and y(0) = −1, respectively.
 (c) Draw the particular solutions from part (b) in the direction field of part (a).
12.2 Find the solutions of the following differential equations:
 (a) y′ = e^{x−y}; (b) (1 + e^x) · y · y′ = e^x with y(0) = 1.
12.3 Let a curve go through the point P : (1, 1). The slope (of the tangent line) of the function at any point of the curve should be proportional to the squared function value at this point. Find all curves which satisfy this condition.
12.4 The elasticity ε_f(x) of a function f : D_f → R is given by ε_f(x) = 2x^2 (ln x + 1/2). Find function f as the general solution of a differential equation and determine a particular function f satisfying the equality f(1) = 1.
12.5 Let y be a function of t and y′(t) = ay(T − t)/t, where 0 < t ≤ T and a is a positive constant. Find the solution of the differential equation for a = 1/2 and T = 200.
12.6 Check whether y_1 = x, y_2 = x ln x and y_3 = 1/x form a fundamental system of the differential equation x^3 y‴ + 2x^2 y″ − xy′ + y = 0. Determine the general solution of the given differential equation.
12.7 Find the solutions of the following differential equations:
 (a) y′ − 2y = sin x;
 (b) y″ − 2y′ + y = 0 with y(0) = 1 and y′(0) = 2;
 (c) 2y″ + y′ − y = 2e^x;
 (d) y″ − 2y′ + 10y = 10x^2 + 18x + 7.6 with y(0) = 0 and y′(0) = 1.2.
12.8 Solve the following differential equations:
 (a) y‴ − 2y″ + 5y′ = 2 cos x; (b) y‴ + 3y″ − 4y = 3e^x.
12.9 Find the solutions of the following systems of first-order differential equations:
 (a) y_1′ = ay_1 − y_2, y_2′ = y_1 + ay_2;
 (b) y_1′ = y_1 − 2y_2 − y_3, y_2′ = −y_1 + y_2 + y_3, y_3′ = y_1 − y_3 with y_1(0) = 12, y_2(0) = 6 and y_3(0) = −6.
12.10 A time-dependent process is described by the linear difference equation y_{t+1} = a(t)y_t + b(t).
 (a) Find the solution of the difference equation for a(t) = 2 and b(t) = b (b is constant) such that y_0 = 1.
 (b) Investigate how the forcing term b influences the monotonicity property of the solution.
 (c) Graph the solution for b = 1/2 and for b = −2.
12.11 Consider the following cobweb model. In period t, the supply of a product is given by y_t = −5 + 2p_{t−1}, and the demand is given by x_t = 7 − p_t, where p_t denotes the price of this product at time t. Moreover, equality y_t = x_t holds.
 (a) Formulate the problem as a difference equation.
 (b) Solve this difference equation with p_0 = 3.
 (c) Graph the cobweb.
12.12 Find the solutions of the following difference equations:
 (a) y_{t+2} = −2y_{t+1} − 4y_t;
 (b) y_{t+2} = 2y_{t+1} + 8y_t + 3t + 6 with y_0 = −2/3 and y_1 = 5;
 (c) y_{t+2} = −2y_{t+1} − y_t + 4t.
12.13 A company intends to modernize and enlarge its production facilities, which is expected to lead to an increase in the production volume. The intended increase in the production volume in year t + 1 is 40 per cent of the volume of year t plus 15 per cent of the volume of year t − 1. In year t, it is assumed that 440 units are produced, and in year t − 1, 400 units have been produced.
 (a) Formulate a difference equation for this problem.
 (b) Find the general solution of this difference equation.
 (c) What is the production volume after five years?
12.14 Given is the difference equation y_{t+1} = 3y_t + 2t + 1.
 (a) Find a solution by using the solution formula.
 (b) Find the solution as the sum of the general solution of the corresponding homogeneous difference equation and a particular solution of the non-homogeneous difference equation. Proceed as in the case of a linear difference equation of second order with constant coefficients.
 (c) Prove by induction that for t ∈ N, the solutions found in (a) and (b) coincide.
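Second-order recurrences such as the one derived in Example 12.14 are easy to cross-check numerically. A minimal Python sketch (ours, not part of the book) that iterates y_{t+2} = y_{t+1} − (1/2) · y_t + 50 from y_0 = 55, y_1 = 48 reproduces the values y_2 = 70.5, y_3 = 96.5, y_4 = 111.25 quoted in that example:

```python
def iterate(y0, y1, n):
    """Iterate y_{t+2} = y_{t+1} - 0.5*y_t + 50 (Example 12.14) for n further steps."""
    ys = [y0, y1]
    for _ in range(n):
        ys.append(ys[-1] - 0.5 * ys[-2] + 50)
    return ys

print(iterate(55.0, 48.0, 4))   # [55.0, 48.0, 70.5, 96.5, 111.25, 113.0]
```

Every intermediate value here is a dyadic rational, so the floating-point iteration is exact and can be compared directly with the closed-form solution.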
Selected solutions

1 INTRODUCTION

1.1 (a) false; (d) true; (e) false
1.2 D: A ∧ B ∧ C, E: A ∧ B ∧ C, F: A ∨ B ∨ C, G: (A ∧ B ∧ C) ∨ (A ∧ B ∧ C) ∨ (A ∧ B ∧ C), H: ¬F, I: ¬D
1.4 (a) A ⇔ B; (b) A ⇒ B; (c) B ⇒ A; (d) A ⇔ B; (e) A ⇒ B
1.5 (a) true; the negation ∃x: x^2 − 5x + 10 ≤ 0 is false; (b) false; the negation ∀x: x^2 − 2x ≤ 0 is true
1.8 (a) T, F, F, T; (b) T, F, T, T; (c) T, T, F, F; (d) F, T, T, T
1.9 A ∪ B = {1, 2, 3, 5, 7, 8, 9, 11}, |A ∪ B| = 8, A ∩ B = {1, 3, 7, 9}, |A ∩ B| = 4, |A| = 6, |B| = 6, A \ B = {5, 11}, |A \ B| = 2
1.10 subsets: ∅, {1}, {2}, {1, 2}; |P(P(A))| = 16
1.13 110 students have a car and a PC, 440 have a car but no PC, 290 have a PC but no car, and 840 students have a car or a PC
1.14 A × A = {(1, 1), (1, 2), (2, 1), (2, 2)}, A × B = {(1, 2), (1, 3), (2, 2), (2, 3)}, B × A = {(2, 1), (3, 1), (2, 2), (3, 2)}, A × C = {(1, 0), (2, 0)}, A × C × B = {(1, 0, 2), (1, 0, 3), (2, 0, 2), (2, 0, 3)}, A × A × B = {(1, 1, 2), (1, 2, 2), (2, 1, 2), (2, 2, 2), (1, 1, 3), (1, 2, 3), (2, 1, 3), (2, 2, 3)}
1.15 M1 × M2 × M3 = {(x, y, z) | (1 ≤ x ≤ 4) ∧ (−2 ≤ y ≤ 3) ∧ (0 ≤ z ≤ 5)}
1.16 4
1.17 479,001,600 and 21,772,800
1.18 56
1.19 125 and 60
1.20 n^2 − n
1.21 5
1.22 1,128
1.23 (a) (30y^2 − 7x^2)/(6xy + 12y^2); (b) (a + 2b)/(2a)
1.24 (a) {x ∈ R | x ≥ −1}; (b) {x ∈ R | (x < 2) ∨ (x ≥ 11/4)}; (c) {x ∈ R | (x < −1) ∨ (0 < x ≤ 2)}; (d) {x ∈ R | (x < −2) ∨ (2 < x < 4)}
1.25 (a) (−2, 2); (b) (1, 5); (c) (1/2, 5/2); (d) (0, ∞); (e) (−1, −1/3]
1.26 (a) −19/7; (b) 9c/b^6
1.27 (a) x = 15; (b) x_1 = 1, x_2 = 10, x_3 = 0.001; (c) x_1 = −2, x_2 ≈ 19.36, (x_3 ≈ −4.99)
1.28 (a) a = 2
1.29 (a) x = 6; (b) x = 1; (c) x = 9
1.30 (a) x_1 = √2, x_2 = −√2, x_3 = 1, x_4 = −1; (b) x_1 = 2i, x_2 = −2i, x_3 = 3i, x_4 = −3i
1.31 z_1 + z_2 = −1 + 5i, z_1 − z_2 = 3 + 3i, z_1 z_2 = −6 − 7i, z_1/z_2 = 2/5 − (9/5)i
1.32 (a) Re(z) = 0, Im(z) = −1; (b) Re(z) = −1, Im(z) = 0; (c) Re(z) = −√3, Im(z) = −3
1.33 (a) z = −i; (b) z = −1; (c) z = −√3 − 3i
1.34 (a) z = −i = 1 · (cos(3π/2) + i sin(3π/2)) = e^{3πi/2}; (b) z = −1 = 1 · (cos
π + i sin π) = e^{πi}; (c) z = 3 · (cos 60° + i sin 60°) = 3e^{πi/3}; (d) z = 5 · (cos 243° + i sin 243°) = 5 · e^{4.25i}
1.37 (a) z_1 = √3 + i, z_2 = −1 + √3 i, z_3 = −√3 − i, z_4 = 1 − √3 i; (b) z_1 = (8/5)i, z_2 = (4/5)(−√3 − i), z_3 = (4/5)(√3 − i)
1.38 a_1 = −32, a_2 = 1

2 SEQUENCES; SERIES; FINANCE

2.1 (a) strictly increasing; (b) unbounded; (c) a_{n+1} = a_n + 1/2
2.2 (a) a_101 = 815; (b) d = 2, a_1 = 7, a_n = 2n + 5
2.3 (a) a_1 = 3, n = 16; (b) a_1 = 18, q = 1/3
2.4 {a_n} is strictly decreasing and bounded, lim a_n = −5; {b_n} is not monotone but bounded, lim b_n = 0; {c_n} is decreasing and bounded, lim c_n = 0
2.5 (a) lim a_n = 2/e; (b) lim b_n = −∞ for a < 0, 1/3 for a = 0, ∞ for a > 0; (c) lim c_n = 0 (both for c_1 = 1 and c_1 = 4)
2.6 (a) shirts: a_1 = 2,000; a_2 = 2,500; a_3 = 3,000; a_10 = 6,500; trousers: a_1 = 1,000; a_2 = 1,200; a_3 = 1,440; a_10 = 5,159.78; (b) shirts: s_15 = 82,500; trousers: s_15 = 72,035.11
2.7 (a) s = −10/7; (b) s = 1
2.8 (a) series converges for −1 ≤ x < 1; (b) series converges for |x| > 1
2.9 (a) s_1 = −12, s_2 = −8, s_3 = −92/9, s_4 = −157/18, series converges; (b) s_1 = 1, s_2 = 3, s_3 = 29/6, s_4 = 31/16, series converges; (c) s_1 = 1/2, s_2 = 113/162, s_3 = s_2 + (3/4)^9, s_4 = s_3 + (4/5)^16, series converges; (d) s_1 = 0, s_2 = 2/3, s_3 = 6/5, s_4 = 25/12, series does not converge
2.10 (a) 17,908.47 EUR; (b) 17,900.51 EUR; (c) 18,073.83 EUR; (c) is the best
2.11 49,696.94 EUR
2.12 interest rate i = 0.06
2.13 (a) A_annually = 16,288.95 EUR; A_quarterly = 16,436.19 EUR; A_monthly = 16,470.09 EUR; (b) i_eff-quarterly = 0.050945; i_eff-monthly = 0.051162
2.14 5,384.35 EUR
2.15 (a) bank A: 2,466 EUR; bank B: 2,464.38 EUR; (b) bank A: 2,478 EUR; bank B: 2,476.21 EUR
2.16 (a) sinking fund deposit: 478.46 EUR; (b) sum of interest and deposit: 550.46 EUR
2.17 1,809.75 EUR
2.18 (a) V_1 = 62,277.95 EUR; (b) V_2 = 63,227.62 EUR
2.19 (a) A = 82,735.69 EUR; (b) P_120 = 58,625.33 EUR
2.20 (a) P = 22,861.15 EUR; total payment 228,611.50 EUR

Redemption table for answer 2.20(a) (EUR):

Period (year) | Annuity | Amortization instalment | Interest | Amount of the loan at the end
1 | 22,861.15 | 10,111.15 | 12,750.00 | 139,888.85
2 | 22,861.15 | 10,970.59 | 11,890.55 | 128,918.25
3 | 22,861.15 | 11,903.10 | 10,958.05 | 117,015.15
4 | 22,861.15 | 12,914.86 | 9,946.29 | 104,100.29
5 | 22,861.15 | 14,012.62 | 8,848.52 | 90,087.66
6 | 22,861.15 | 15,203.70 | 7,657.45 | 74,883.96
7 | 22,861.15 | 16,496.01 | 6,365.13 | 58,387.95
8 | 22,861.15 | 17,898.17 | 4,962.97 | 40,489.77
9 | 22,861.15 | 19,419.52 | 3,441.63 | 21,070.25
10 | 22,861.15 | 21,070.18 | 1,790.97 | 0.07

(b) total payment: 220,125.00 EUR

Redemption table for answer 2.20(b) (EUR):

Period (year) | Annuity | Amortization instalment | Interest | Amount of the loan at the end
1 | 27,750 | 15,000 | 12,750 | 135,000
2 | 26,475 | 15,000 | 11,475 | 120,000
3 | 25,200 | 15,000 | 10,200 | 105,000
4 | 23,925 | 15,000 | 8,925 | 90,000
5 | 22,650 | 15,000 | 7,650 | 75,000
6 | 21,375 | 15,000 | 6,375 | 60,000
7 | 20,100 | 15,000 | 5,100 | 45,000
8 | 18,825 | 15,000 | 3,825 | 30,000
9 | 17,550 | 15,000 | 2,550 | 15,000
10 | 16,275 | 15,000 | 1,275 | 0

(c) 15.1 years
2.21 (a) the project should not go ahead; (b) 1 per cent, 2 per cent; (c) yes
2.22 (a) depreciation amount in year 1: 7,000 EUR, in year 8: 7,000 EUR; (b) depreciation amount in year 1: 12,444.44 EUR, in year 8: 1,555.55 EUR; (c) depreciation amount in year 1: 9,753.24 EUR, in year 8: 4,755.21 EUR

3 RELATIONS; MAPPINGS; FUNCTIONS OF A REAL VARIABLE

3.1 (a) (graphs of R and S omitted)
(b) T, F, F, F; (c) R^{−1} = {(2, 1), (4, 1), (1, 3), (4, 3), (5, 3), (1, 5), (4, 5)}; (d) R: no mapping; S: mapping
3.2 (a) bijective mapping; (b) surjective mapping; (c) mapping; (d) injective mapping
3.3 (a) (graphs of f, g and f ∘ g omitted); (b) D_f = A, R_f = B; D_g = C, R_g = {1, 2} ⊆ A; D_{f∘g} = R_{f∘g} = C ⊆ B; (c) f: bijective, g: injective, f ∘ g: injective, g ∘ f: no mapping
3.4 F^{−1} is a mapping with domain D_{F^{−1}} = R and range R_{F^{−1}} = [−2, ∞). (graphs of F and F^{−1} omitted)
3.5 F is a function and F^{−1} exists. G is not a function; y = ±√(9 − 4.5x^2) describes an ellipse with midpoint (0, 0) and semi-axes a = √2, b = 3.
3.6 (g ∘ f)(x) = 4x^2 + 4x − 1; (f ∘ g)(x) = 2x^2 − 3
3.7 (a) f and g bijective; (b) f^{−1}(x) = ln x, g^{−1}(x) = −x; (c) f ∘ g = e^{−x}, g ∘ f = −e^x
3.8 a = −1, R_f = {y ∈ R | −4 ≤ y < ∞}, f^{−1}(x) = −1 + √(4 + x)
3.9 (a) D_f = {x ∈ R | x ≥ 0}, R_f = {y ∈ R | −1 ≤ y < 1}; f^{−1}: y = 16 · (x + 1)^2/(x − 1)^2, −1 ≤ x < 1; (b) D_f = R, R_f = R, f^{−1}: y = ∛x + 2
3.10 (a) (P5/P2)(x) = 2x^3 − 2x^2 − 12x; (b) P5(x) = (x − 1)(x − 1)(x + 2)(x − 3) · 2x
3.11 x_1, x_2, x_4 are zeroes; x_3 is not a zero; f(x) = (x − 1)(x + 1)(x + 1)(x + 2)(x − (1/2)(1 + √3 i))(x − (1/2)(1 − √3 i))
3.12 (a) f_1: D_{f1} = R, R_{f1} = [−1, 1], odd; f_2: D_{f2} = R, R_{f2} = [−2, 2], odd; f_3: D_{f3} = R, R_{f3} = [−1, 1], odd; f_4: D_{f4} = R, R_{f4} = [1, 3]; f_5: D_{f5} = R, R_{f5} = [−1, 1]; (b) f_1: D_{f1} = R, R_{f1} = {x ∈ R : x > 0}; f_2: D_{f2} = R, R_{f2} = {x ∈ R : x > 0}; f_3: D_{f3} = R, R_{f3} = {x ∈ R : x > 0}; f_4: D_{f4} = R, R_{f4} = {x ∈ R : x > 2}; f_5: D_{f5} = R, R_{f5} = {x ∈ R : x > 0}
3.13 (a) D_f = {x ∈ R | x ≠ 0}, R_f = R, f unbounded and even, f strictly decreasing for x < 0, f strictly increasing for x > 0; (b) D_f = {x ∈ R | x > 0}, R_f = R, f unbounded, f strictly increasing; (c) D_f = R, R_f = {y ∈ R | y ≥ 5}, f bounded from below and even, f strictly decreasing for x ≤ 0, f strictly increasing for x ≥ 0; (d) D_f = {x ∈ R | |x| ≤ 2}, R_f = {y ∈ R | 0 ≤ y ≤ 2}, f bounded and even, f strictly increasing for x ≤ 0, f strictly decreasing for x ≥ 0; (e) D_f = R, R_f = {y ∈ R | y > 1}, f bounded from below, f strictly decreasing; (f) D_f = R, R_f = {y ∈ R | y ≥ 0}, f bounded from below, f decreasing

4 DIFFERENTIATION

4.1 (a) lim_{x→x_0} f(x) = a; (b) lim_{x→0} f(x) = 0; (c) lim_{x→1} f(x) = 1; (d) lim_{x→1+0} f(x) = 2, lim_{x→1−0} f(x) = 1
4.2 (a) lim_{x→2} (x^3 − 3x^2 + 2x)/(x − 2) = 2, gap; (b) lim_{x→2+0} (x^3 − 3x^2)/(x − 2) = −∞, lim_{x→2−0} (x^3 − 3x^2)/(x − 2) = ∞, pole; (c) lim_{x→2} (x^3 − 3x^2)/(x − 2)^2 = −∞, pole
4.3 (a) f(2) not defined, x_0 = 2 gap; (b) continuous; (c) jump; (d) jump
4.4 (a) not
differentiable at x_0 = 5, differentiable at x_1 = 0; (b) differentiable at x_0 = 0, not differentiable at x_1 = 2
4.5 (a) y′ = 6x^2 − 5 − 3 cos x; (b) y′ = (4x^3 + 4) sin x + (x^4 + 4x) cos x; (c) y′ = (4x + 2x sin x + 2 sin x + sin^2 x − x^2 cos x + cos^2 x)/(2 + sin x)^2; (d) y′ = 4(2x^3 − 3x + ln x)^3 · (6x^2 − 3 + 1/x); (e) y′ = −4(x^3 + 3x^2 − 8)^3 (3x^2 + 6x) sin((x^3 + 3x^2 − 8)^4); (f) y′ = −4(3x^2 + 6x) cos^3(x^3 + 3x^2 − 8) sin(x^3 + 3x^2 − 8); (g) y′ = e^x cos e^x / (2√(sin e^x)); (h) y′ = 2x/(x^2 + 1)
4.6 (a) y′ = sin x + x cos x; (b) y′ = 1/cos^2 x; (c) y′ = (3/2)(x^{−2/3} + x^{−1/3})
4.7 (a) f′(x) = (tan x)^x · (ln(tan x) + x/(sin x cos x)); (b) f′(x) = (ln x + 1 − 1/x) · x^{x−1} cos(x^{x−1}); (c) f′(x) = ((x + 2)√(x − 1)/(x^3 (x − 2)^2)) · (1/(x + 2) + 1/(2(x − 1)) − 3/x − 2/(x − 2))
4.8 (a) f‴(x) = 6 cos x − 6x sin x − x^2 cos x; (b) f‴(x) = 3/x^4; (c) f‴(x) = −12 · (x^2 + 12x + 12)/(x − 2)^6; (d) f‴(x) = (x + 4)e^x
4.9 exact change: 28; approximation: 22
4.10 −10
4.11 ρ_f(x) = 1/2; ρ_g(x) = 1/(2x); ε_f(x) = x/2; ε_g(x) = 1/2; ρ_f(1) = ρ_f(100) = 1/2; ρ_g(1) = 1/2, ρ_g(100) = 1/200. Function f is inelastic for x ∈ (0, 2) and elastic for x > 2; function g is inelastic for x > 2. When f changes from x_0 = 1 to 1.01, the function value changes by 0.5 per cent; when f changes from 100 to 101, the function value changes by 50 per cent. When g changes from some x_0 by 1 per cent, the function value always changes by 0.5 per cent.
4.12 ε_D(p) = −4(p − 1)p; demand D is elastic for p > (1 + √2)/2 and inelastic for 0 < p < (1 + √2)/2; p = 1/2 (ε_D(1/2) = 1)
4.13 (a) local minima at P_1: (0, −5) and P_2: (2, −9), local maximum at P_3: (0.25, −4.98), global maximum at P_4: (−5, 1,020); (b) global maximum at P_1: (3, 4), global minimum at P_2: (−5, −4); (c) local and global maximum at P_1: (0, 1); (d) local minimum at P_1: (4, 8), local maximum at P_2: (0, 0), global maximum does not exist; (e) global minimum at P_0: (0, 0), global maximum at the endpoint of I, i.e. P_1: (5, 0.69)
4.14 local minimum at P_1: (1, 5/6), local maximum at P_2: (2, 2(2 − ln 2)/3) for a = −2/3, b = −1/6
4.15 (a) 1; (b) 1/4; (c) 1; (d) 1; (e) 1/6; (f) −1/6
4.16 (a) D_f = R \ {2}, no zero, discontinuity at x_0 = 2, local minimum at P_1: (−1/2, 1/5), inflection point at P_2: (−7/4, 13/45), lim_{x→±∞} f(x) = 1, f strictly
4.12 εD (p) = −4(p − 1)p; √ √ demand D is elastic for p > (1 + 2)/2 and inelastic for 0 < p < (1 + 2)/2; p = 1/2 (εD (1/2) = 1) 4.13 (a) local minima at P1 : (0, −5) and P2 : (2, −9), local maximum at P3 : (0.25; −4.98), global maximum at P4 : (−5; 1, 020); (b) global maximum at P1 : (3, 4), global minimum at P2 : (−5, −4); (c) local and global maximum at P1 : (0, 1); (d) local minimum at P1 : (4, 8), local maximum at P2 : (0; 0), global maximum does not exist; global minimum at P0 : (0, 0), global maximum at the endpoint of I , i.e. P1 : (5, 0.69) 4.14 local minimum at P1 : (1, 5/6), local maximum at P2 : (2, 2(2 − ln 2)/3 for a = −2/3, b = −1/6 4.15 (a) 1; (b) 1/4; (c) 1; (d) 1; (e) 1/6; (f) −1/6 4.16 (a) Df = R{2}, no zero, discontinuity at x0 = 2, local minimum at P1 : (−1/2, 1/5), inflection point at P2 : (−7/4, 13/45), lim f (x) = 1, f strictly (e) x→±∞ 494 Selected solutions decreasing on (−∞, −1/2] ∪ (2, ∞), f strictly convex for x ≥ −7/4 (b) Df = R {0, 1/2}, zero: x0 = 4/3, discontinuities: x1 = 0 and x2 = 1/2, lim f (x) = −3/2, f strictly decreasing on Df , f strictly convex for x > 1/2 x→±∞ (c) Df = R{0; 1}, zero: x0 = −1, discontinuities: x1 = 0 and x2 = 1, local minimum at P3 : (3.56, 8.82), local maximum at P4 : (−0.56, 0.06), inflection point at P5 : (−0.20, 0.02), lim f (x) = −∞, lim f (x) = ∞, f x→−∞ x→∞ strictly decreasing on [−0.56, 0) ∪ (1, 3.56], f strictly convex for x ≥ −0.2 (d) Df (x) = R, no zeroes, local maximum at P1 : (2, 1), inflection points at P2 : (1, e−1 ), P3 : (3, e−1 ), lim f (x) = 0, f strictly increasing on (−∞, 2], x→±∞ Selected solutions 495 f strictly concave for 1 ≤ x ≤ 3 (e) Df = {x ∈ R | x > 2}, no zero, local maximum at P1 : (4, −3 ln 2), inflection point at P2 : (6.83, −2.27), lim f (x) = −∞, lim f (x) = −∞, f strictly x→∞ x→2+0 decreasing for x ≥ 4, f strictly convex for x ≥ 6.83 (f) Df (x) = R, zeroes: x0 = 0, x1 = 2, local minimum at P2 : (0, 0), local maximum at P3 : (1.33, 1.06), inflection point at P4 : (2, 0), 
lim f (x) = ∞, x→−∞ lim f (x) = −∞, f strictly decreasing on (−∞, 0] ∪ (4/3, ∞), f strictly convex x→∞ for x ≥ 2 496 Selected solutions 4.17 (a) 2 π4 2 4 f (x) = 1 − π 32 (x000e − 2) + 6144 (x 000f− 2) + R5 , 6 6 R5 = − π6 · sin π4 (2 + λ(x − 2)) (x − 2) , 0 < λ < 1; 6! 4 n 2 3 4 f (x) = x − x2 + x3 − x4 + − . . . + (−1)n−1 · xn + Rn , (−1)n xn+1 Rn = , 0 < λ < 1; (λx + 1)n+1 (n + 1) (c) f (x) = 2x − 2x2 − 13 x3 + x4 + R4 , 5 R4 = x · e−λx [−41 sin(2λx) − 38 cos(2λx)], 0 < λ < 1 5! 4.18 T4 (− 15 ) = 1 + (− 15 ) + 1 (− 15 )2 + 1 (− 15 )3 + 1 (− 15 )4 = 0.81873 2! 3! 4! 4.19 (a) Newton’s method (b) xn 1 0 0.333333 0.339869 0.339877 (b) f (xn ) f 0010 (xn ) −3 +2 +0.037037 +4.29153 · 10−5 +4.29153 · 10−5 −3 −6 −5.66667 −5.65347 −5.65347 regula falsi xn 0 1 0.4 0.342466 0.339979 0.339881 0.339877 4.20 f (xn ) +2 −3 −0.336 −0.146291 −0.000577 −0.000023 −9.536 · 10−7 Newton’s method xn 5 4.5117973 4.5052428 4.505241496 f (xn ) 0.3905620 0.0051017 0.0000010 1.6 · 10−10 5 INTEGRATION 5.1 (a) esin x + C; (b) (c) (d) (e) − 45 ln |1 − 4x| + C; √ x2 + 1 + C; (f) (g) 1 2x x 2 ln(e + 1) − 2 arctan e + C; (h) 1 (ln x)2 + C; 2 1 e−3+2x + C; 2 1 √x2 + 1(x2 − 2) + C; 3 1 arcsin √3 x + C; 3 2 Selected solutions 497 1 (j) − − 1 − sin x + C; + C; sin x tan 2x 0010 0010 000e 000f x 1 x0010 1 0010 (k) ln 0010 tan 0010 + tan2 +C 4 2 2 2 5.2 (a) ex (x2 − 2x + 2) + C; (b) 12 ex (sin x + cos x) + C; (c) x tan x + ln | cos x| + C; (d) 12 (cos x sin x + x) + C; x3 x3 (e) + C; (f) x ln(x2 + 1) − 2x + 2 arctan x + C ln x − 9 3 5.3 (a) 3; (b) 4 − 2 ln 3; (c) 2/3; (d) 10/3; (e) 12 ln |2t − 1|; (f) 1/3; (g) π/4 (i) total cost C = 9, 349.99; total sales S = 15, 237.93; total profit P = 5, 887.94 (b) average sales Sa = 3, 809.48; average cost Ca = 2, 337.49; (c) P(t) = 10, 000e−t (−t 2 − 2t − 2) + 20, 000 − 1, 000(4t − 2 ln (et + 1) +2, 000 ln 2 5.5 (a) 4; (b) A = 32.75 5.6 (a) graph for q0 = 8, t0 = 4: 5.4 (a) (b) 0003 0004 2 + 48 ; q0 T 1 − T − 12T 300 5.7 (a) 0.784981; 5.8 (a) 1; (e) 4; (b) 
0.783333; (c) (c) 1; λ > 0 ; 0; λ = 0 −∞ ; λ < 0 (f) does not exist (b) 1/2; 5.9 1,374.13 5.10 PS = 7; CS = 4. (c) x = T /2 0.785398; (d) exact value: π/4 2/λ2 ; λ > 0 0; λ = 0 −∞ ; λ < 0 498 Selected solutions 6 VECTORS ⎛ ⎞ ⎛ ⎞ 1 5 6.1 (a) a + b − c = ⎝ −5 ⎠, a + 3b = ⎝ −11 ⎠, −9 −7 ⎛ ⎞ ⎛ ⎞ −3 −7 b − 4a + 2c = ⎝ −4 ⎠, a + 3(b − 2c) = ⎝ −23 ⎠ ; 14 −43 (b) a > b, c ≥ a, c > b; (c) aT · b = 0, aT · c = 0, bT · c = −18; a,b orthogonal; a,c orthogonal; ∠(b, c) ≈ 126.3◦ ⎛ ⎞ ⎛ ⎞ 0 −36 T T (d) (a · b)c = ⎝ 0 ⎠, a(b · c) = ⎝ −18 ⎠; 0 18 √ √ √ √ √ (e) |b + c| = 29, |b| + |c| = 21 + 44, |bT · c| = 18, |b||c| = 21 44 6.2 α = β − 2, β ∈ R √ 6.3 (a) 19; (b) a ≥ b : M = {(a1 , a2 ) ∈ R2 | a1 ≥ b1 ∧ a2 ≥ b2 }; 0017 0017 |a| ≥ |b| : M = {(a1 , a2 ) ∈ R2 | a21 + a22 ≥ b21 + b22 } all the vectors are linear combinations of a1 and a2 ; (0, 0.5)T is a convex combination 6.5 a4 = 21 a1 + 14 a2 + 14 a3 . 6.4 6.6 6.7 no no; no basis 6.8 6 5q 4q 3q 2q Y H HH HH 1 q q q q q HqH q q q q q -4 -3 -2 -1 q H 1H j2 3 4 5 q yes ⎛ 6.9 ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ Selected solutions 499 3 1 0 1 3 ⎠ = 0 ⎝ 0 ⎠ + 3 ⎝ 1 ⎠ + 3 ⎝ 0 ⎠; −3 3 0 −1 (a) ⎝ (b) (c) bases: a, a1 , a2 and a, a1 , a3 ; b = 2a1 − a2 + a 7 MATRICES AND DETERMINANTS ⎛ ⎞ 3 1 T 2 ⎠, no equal matrices; 7.1 (a) A = ⎝ 4 1 −2 0003 0004 0003 0004 6 5 1 0 3 1 (b) A + D = , A−D = , 5 4 0 −3 0 −4 ⎛ ⎞ 0003 0004 1 0 −2 −1 −1 0 ⎠, C − D = AT − B = ⎝ 3 −5 −1 −1 2 −2 0003 0004 −9 1 −2 T (c) A + 3(B − 2D) = −20 −4 −14 ⎛ ⎞ ⎛ ⎞ 2 0 1.5 0 −1 −1.5 ⎝ ⎠ ⎝ 0 1 −0.5 1 0 −0.5 ⎠ 7.2 A = + 1.5 −0.5 −2 1.5 0.5 0 0003 0004 0003 0004 26 32 0 45 51 58 7.3 (a) AB = ; (b) AB = ; 23 27 1 25 18 32 ⎛ ⎞ −6 −9 −12 −15 ⎜ 12 18 24 30 ⎟ ⎟ (c) AB = 10; BA = ⎜ ⎝ −6 −9 −12 −15 ⎠; 4 6 8 10 ⎛ ⎞ 0003 0004 −8 4 −12 0 0 3 ⎠; (d) AB = ; BA = ⎝ 2 −1 0 0 6 −3 9 0003 0004 2x+3y+ z (e) AB = x+5y+2z ⎛ ⎞ ⎛ ⎞ 0003 0004 2 −4 −8 2 6 2 1 −1 2 ⎠· 4 −1 −3 ⎠ = (BA)T , 7.4 AT BT = ⎝ −1 =⎝ 3 0 −2 3 −6 −12 3 9 ⎛ ⎞ 0003 0004 0003 0004 2 −4 2 1 −1 0 0 2 ⎠= B T AT = · ⎝ −1 = (AB)T 3 0 −2 0 0 3 −6 7.5 (a) ACB = 
D^{(3,1)}, i.e. a matrix of format 3 × 1; (b) (AC)B = A(CB) = (2, −178, 154)ᵀ
7.6 (a) A^2 = (0 0 46 14; 0 0 0 35; 0 0 0 0; 0 0 0 0), A^3 = (0 0 0 70; 0 0 0 0; 0 0 0 0; 0 0 0 0), A^k = 0 for k ≥ 4; (b) B^{2k} = I, B^{2k−1} = B, k ∈ N
7.7 (a) R_1: 74,000 units and R_2: 73,000 units; (b) 31 EUR for each unit of S_1, 29 EUR for each unit of S_2 and 20 EUR for each unit of S_3; 275 EUR for F_1 and 156 EUR for F_2
7.8 (a) A_12 = (−1 4; 0 1), A_22 = (2 0; 0 1); (b) |A_12| = −1, |A_22| = 2; (c) cofactor of a_11: −18, cofactor of a_21: −3, cofactor of a_31: 12; (d) |A| = −33
7.9 (a) −6; (b) 5; (c) 70; (d) 8; (e) 0; (f) (−1)^{n−1} · 3^n
7.10 (a) x = 2; (b) x_1 = −2, x_2 = 0
7.11 x_1 = 2, x_2 = 3, x_3 = −5
7.12 the zero vector is the kernel of the mapping
7.13 (x_1, x_2, x_3)ᵀ ∈ R^3 ↦ (u_1, u_2, u_3)ᵀ = (−1 4 9; 10 −14 −10; −1 −1 4) · (x_1, x_2, x_3)ᵀ ∈ R^3
7.14 (a) A^{−1} = (−1/11 3/11 −3/11; −4/11 1/11 10/11; 4/11 −1/11 1/11); (c) C^{−1} = (1 −3/2; −1 1/2); (d) D^{−1} = (2 −1 −1 1; 0 1/2 1 −1/2; 5 −4 −3 2; 2 −3/2 −1 1/2)
7.15 A^{−1} = (1 −2 5 21 −41; 0 1 −2 −9 16; 0 0 1 3 −5; 0 0 0 1 −2; 0 0 0 0 1)
7.16 (AB)^{−1} = B^{−1}A^{−1} = (1/17) · (−23 −37; −7 −12)
7.17 (a) X = BᵀA^{−1}; (b) X = B(A + 2I)^{−1}; (c) X = C^{−1}AB^{−1}; (d) X = A^{−1}CB^{−1}; (e) X = ((A + 4I)Cᵀ)^{−1}
7.18 (b) x ↦ (I − A)x = y; (c) no; (d) y ∈ R^4 ↦ x = (I − A)^{−1}y ∈ R^4
7.19 (a) q = (I − A)p with A = (a_ij) = (0 3 0 0 1; 0 0 0 0 5; 0 0 0 2 0; 0 0 0 0 2; 0 0 0 0 0)
solution; (d) x1 = 5, x2 = 4, x3 = 1 = x2 = x3 = x4 = 0; = −9t, x2 = 11t, x3 = −3t, x4 = t, t ∈ R; = 3, x2 = −3, x3 = 1, x4 = −1; = −6 − 9t, x2 = 8 + 11t, x3 = −2 − 3t, x4 = t, t ∈ R 13 (a) x1 = 5 + 3t, x2 = 25 − 52 t, x3 = 13 2 + 2 t, x4 = t, t ∈ R; 6 5 2 t, t ∈ R x1 = 2 + 13 t, x2 = 5 − 13 t, x3 = t, x4 = −1 + 13 (b) x1 = 4 − 3t1 − 2t2 , x2 = −1 + 2t1 − t2 , x3 = t1 , x4 = 1 + 3t2 , x5 = t2 ; x1 = 6 + 2t1 − 7t2 , x2 = t1 , x3 = t2 , x4 = −2 − 3t1 + 6t2 , x5 = −1 − t1 + 2t2 ; t1 , t2 ∈ R 8a − 46 −3a + 12 7 a = 31 6 : x = 6a − 31 , y = 6a − 31 , z = 6a − 31 the cases ‘no solution’ and ‘unique solution’ do not exist for any λ. λ = 2 : x = − 31 z − 23 y, y, z ∈ R; λ = 2 : x = − 32 y, y ∈ R; z = 0 (a) no solution for: a = 0, b ∈ R or a = −1/3, b = 1; unique solution for: a = 0, b = 1; general solution for: a = −1/3, b = 1 (b) x1 = 2 + t, x2 = −t, x3 = −3, x4 = t, t ∈ R; (c) x1 = 6, x2 = 0, x3 = 1, x4 = 4 x = 1, y = 2, z = 3 for a = 0, |a| = |b|; x = 1, y = t, z = 5 − t, t ∈ R, for a = 0, b = 0; x = 4 − t, y = 2, z = t, t ∈ R, for a = 0, a = b; x = −2 + t, y = 2, z = t, t ∈ R, for a = 0, a = −b (a) (i) x1 (ii) x1 (b) (i) x1 (ii) x1 ker(A) = (−3t/2, −t/2, t)T , t ∈ R 8.9 system 1: x1 = 1, x2 = 2, x3 = 0; system 2: x1 = 5, x2 = 0, x3 = 3 ⎛ ⎞ −4 4 1 −1 ⎝ 1 −2 −1 ⎠; 8.10 (a) A = (c) 1 1 1 8.8 ⎛ C −1 2 ⎜ 0 ⎜ =⎝ 5 2 −1 1/2 −4 −3/2 −1 1 −3 −1 ⎞ 1 −1/2 ⎟ ⎟ 2 ⎠ 1/2 502 Selected solutions 0003 0004 −6 78 51 ; (b) 8.11 (a) X = 19 −11 −52 −40 0003 0004 −10 10 −2 (c) X = 15 −11 1 ⎛ 5.79 17.80 14.19 ⎜28.99 29.67 9.46 ⎜ 8.12 (b) X = ⎝ 11.59 23.73 18.52 11.59 35.60 14.19 8.13 a belongs to the set; b does not belong to it. 
8.14 (b) 0003 (c) 4 0005 x y 0003 0004 = λ1 λi = 1, 0 0 0004 0003 + λ2 λi ≥ 0, 200 0 X = 15 0003 −11 2 8 −1 0004 ; ⎞ 10.18 30.54 ⎟ ⎟ 20.36 ⎠ 25.46 0004 0003 + λ3 0 240 0004 0003 0004 0003 + λ4 100 200 0004 , i = 1, 2, 3, 4; i=1 8.15 0003 x1 x2 4 0005 i=1 0004 0003 = λ1 λi = 1, 0 0 0004 0003 +λ2 λi ≥ 0, 0 3 0004 0003 +λ3 i = 1, 2, 3, 4, 1 5 +λ4 1 0 0004 µ1 , µ2 ≥ 0 0003 +µ1 1 1 0004 0003 +µ2 2 1 0004 , ⎛ 8.16 (a) ⎞ x1 ⎝ x2 ⎠ x3 ⎛ +λ5 ⎝ ⎛ (b) ⎞ x1 ⎝ x2 ⎠ x3 ⎛ + λ5 ⎝ Selected solutions 503 ⎞ ⎛ ⎞ 0 0 3/4 5/9 = λ1 ⎝ 0 ⎠ + λ2 ⎝ 0 ⎠ + λ3 ⎝ 0 ⎠ + λ4 ⎝ 7/9 ⎠ 0 3 0 0 ⎞ ⎛ ⎞ 0 0 6 0005 11/7 ⎠ + λ6 ⎝ 1 ⎠ , λi = 1, λi ≥ 0, i = 1, 2, . . . , 6; i=1 10/7 0 ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 3 0 0 0 = λ1 ⎝ 0 ⎠ + λ2 ⎝ 0 ⎠ + λ3 ⎝ 3 ⎠ + λ4 ⎝ 12/5 ⎠ 0 0 0 3/5 ⎞ 0 5 0005 0 ⎠, λi = 1, λi ≥ 0, i = 1, 2, . . . , 5 i=1 3/5 ⎛ ⎞ ⎛ ⎞ ⎛ 9 LINEAR PROGRAMMING 9.1 (a) (b) (c) 9.2 z infinitely many solutions = 10x1 x1 + 6x2 −3x1 x1 + + x2 x2 4x2 x1 , x2 → ≤ ≥ ≤ ≤ ≥ (d) max! 100 30 0 200 0 no optimal solution 504 Selected solutions 9.3 9.4 optimal solution: x1 = 80, x2 = 30; z = 980 (a) z¯ = −z = −x1 + 2x2 − x3 → max! = 7 s.t. x1 + x2 + x3 + x4 −3x1 + x2 − x3 + x5 = 4 xj ≥ 0, j = 1, 2, 3, 4, 5 (b) z¯ = −z = −x1∗ − 2x2 + 3x3 − x4∗ + x4∗∗ → max! s.t. −x1∗ + x2 − 12 x3 + 32 x4∗ − 32 x4∗∗ −x1∗ + 2x3 + x4∗ − x4∗∗ + x5 ∗ −2x1 − 2x3 + 3x4∗ − 3x4∗∗ + x6 x1∗ , x2 , x3 , x4∗ , x4∗∗ , x5 , x6 , x7 = 4 = 10 = 0 ≥ 0 (a) x1 = 8/5, x2 = 3/5; z = 11/5; (b) infinitely many optimal solutions: z = −11; 0003 0004 0003 0004 1 3 x=λ + (1 − λ) , 0≤λ≤1 5 4 (a) x1 = 0, x2 = 10, x3 = 5, x4 = 15; z = 155; (b) a (finite) optimal solution does not exist; 9.6 x1 = 80, x2 = 30, x3 = 20, x4 = 0, x5 = 210, x6 = 0; 9.7 (a) a (finite) optimal solution does not exist; (b) z = 24; ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ x1 5.5 10 ⎝ x2 ⎠ = λ ⎝ 4.5 ⎠ + (1 − λ) ⎝ 0 ⎠ , 0≤λ≤1 9.5 14 x3 9.5 9.8 z = 980; (c) the problem does not have a feasible solution. dual problem of problem 9.5 (a): dual problem of problem 9.7 (a): w = 400u5 + 30u6 + 5u7 → min! s.t. 
20u5 + u6 ≥ 10u5 + u6 ≥ 12u5 + u6 + u7 ≥ 16u5 + u6 ≥ uj ≥ 0; j = 5, 6, 7 7 4 5 6 w = 10u5 + 4u6 → max! s.t −3u4 + u5 u4 + 2u5 + u6 4u4 + + 3u6 uj ≥ 0; j = 4, 5, 6 ≤ 1 ≤ 1 ≤ −1 Selected solutions 505 dual problem of 9.7(c): w = 4u5 + 9u6 + 3u7 → min! s.t. u5 + 2u6 u5 + 3u6 + u7 −u5 − u6 + 2u7 −u5 − 2u6 u5 ∈ R; u6 ∈ R; u7 ≤ 0 9.9 ≥ ≥ ≥ ≥ 2 −1 1 0 primal problem: x1 = 0, x2 = 9/2, x3 = 1/2, x4 = 17/2; z = −18; dual problem: u5 = 1, u6 = 4, u7 = 1; w = −18; (b) primal problem: x1 = 17, x2 = 0, x3 = 9, x4 = 3; z = 63; dual problem: u5 = 25/4, u6 = 3/4, u7 = 35/4; w = 63 (a) optimal solution of problem (P): x1 = 4, x2 = 2; z = 14; optimal solution of problem (D): u3 = 2/3, u1 = 1/3, u5 = 0; 9.11 (c) optimal solutions: (1) x1 = 0, x2 = 100, x3 = 50, x4 = 40; z = 190; (2) x1 = 50, x2 = 0, x3 = 100, x4 = 40; z = 190 9.10 w = 14 10 EIGENVALUE PROBLEMS AND QUADRATIC FORMS 0003 10.1 A: λ1 = 3, λ2 = −2, B: λ1 = 3 + i, C: λ1 = 1, λ2 = 3 − i, λ2 = 2, ⎛ ⎞ 1 x1 = t1 ⎝ 0 ⎠ , 1 D: λ1 = 1, (c) 10.3 (a) 1 1 0004 , λ3 = −1, ⎛ ⎞ 2 x2 = t2 ⎝ −1 ⎠ , 2 ⎛ λ2 = λ3 = 2, t1 , t2 , t3 ∈ R; t1 , t2 , t3 = 0 10.2 (a) (b) 0003 0004 1 x2 = t2 ; −4 0003 0004 0003 0004 1 1 , x 2 = t2 ; x1 = t1 1+i 1−i x1 = t1 ⎛ ⎞ 3 x3 = t3 ⎝ 1 ⎠ 2 ⎞ ⎛ 0 x1 = t1 ⎝ 1 ⎠ , 0 ⎞ 0 x2 = t2 ⎝ −2 ⎠; 1 λ1 = 1.1 greatest eigenvalue with eigenvector xT = (3a/2, a), a = 0; based on a production level x1 = 3a/2 and x2 = a (a > 0) in period t, it follows a proportionate growth by 10 per cent for period t + 1: x1 = 1.65a, x2 = 1.1a; x1 = 6, 000; x2 = 4, 000; subsequent period: x1 = 6, 600; x2 = 4, 400; two periods later: x1 = 7, 260; x2 = 4, 840 A: λ1 = 1, ⎛ λ2 = 3, ⎞ 1 ⎜ 0 ⎟ ⎟ x1 = t1 ⎜ ⎝ 0 ⎠, 0 λ3 = −2, ⎛ ⎞ 1 ⎜ 1 ⎟ ⎟ x2 = t2 ⎜ ⎝ 0 ⎠, 0 λ4 = 5, ⎛ ⎞ 4/15 ⎜ −2/5 ⎟ ⎟ x3 = t3 ⎜ ⎝ 1 ⎠, 0 ⎛ ⎞ 1/4 ⎜ 0 ⎟ ⎟ x4 = t4 ⎜ ⎝ 0 ⎠; 1 506 Selected solutions λ1 ⎛ = λ2 ⎞ = 4, ⎛ λ3 = ⎞ −2, ⎛ ⎞ 1 0 0 ; x1,2 = t1 ⎝ 0 ⎠ + t2 ⎝ 1 ⎠ , x3 = t3 ⎝ 1 ⎠ 0 1 −1 t1 , t2 , t3 , t4 ∈ R {0} 0003 0004 2 −1/2 10.4 xT Bx = xT Bs x with Bs = , Bs is positive definite −1/2 
4 √ √ B: λ1,2 = (3 ± 17)/2, 10.5 (a) A: λ1,2 = 2 ± 2, √ D: λ1 = 1, λ2,3 = 3 ± 8; C: λ1 = −4, λ2 = 2, λ3 = 3, (b) A and D are positive definite, B and C are indefinite 10.6 (a) a1 = 3, a2 = 1, a3 ∈ R; (b) any vector x1 = (t, t, 0)T with t ∈ R, t = 0, is an eigenvector; (c) a3 > 1; (d) no B: 11 FUNCTIONS OF SEVERAL VARIABLES 11.1 (a) f (x1 , x2 ) = √ x 1 x2 surface in R3 isoquants (b) domains and isoquants (i) (ii) Df = {(x, y) ∈ R2 | x2 + y2 ≤ 9} Df = {(x, y) ∈ R | x = y} Selected solutions 507 (iii) Df = R 11.2 (b) fx = y2 x(y −1) , 2 fy = 2y x(y ) ln x; 1, (d) fx = 2x 1; fy = xy ln x + xy(x−1) ; fy = 2y 2 2 2 x1 , (e) fx = ex +y +z (2 + 4x2 ), (f) fx1 = 0017 2 x1 + x22 + x32 fy = 4xyex fz = 4xzex 11.3 (a) (b) 11.4 (a) (b) 2 +y 2 +z 2 2 +y 2 +z 2 , ; fx2 = 0017 fx3 = 0017 x2 x12 + x22 + x32 x3 , x12 + x22 + x32 1, 200, 000 32, 000, 000 , Cy = 800 − ; x2 y2 Cx (80) = − 67.5, Cx (120) ≈ 36.67, Cy (160) = − 450, Cx = 120 − fx1 x2 = fx1 x1 = fxx = fx2 x1 = 6x2 x33 , 6x1 − x1−2 , 2 4y , (1 − xy)3 fx1 x3 = fx3 x1 = 9x22 x32 , fx2 x2 = 6x1 x33 , 4x2 , fyy = (1 − xy)3 Cy (240) ≈ 244.5 fx2 x3 = fx3 x2 = 18x1 x2 x32 , fx3 x3 = 18x1 x22 x3 − x3−2 ; 2xy + 2 fxy = fyx = ; (1 − xy)3 4xy −2(x2 + y2 ) , fyy = fxx , fxy = fyx = (x2 − y2 )2 (x2 − y2 )2 T T grad f (1, 2) = (a, b) ; (a) grad f (1, 0) = (a, b) , (b) grad f (1, 0) = (2, 1)T , grad f (1, 2) = (3, 3.6)T ; T (c) grad f (1, 0) = (−1/2, 0) , grad f (1, 2) = (−1/2, −1)T (c) 11.5 2 fx = 2x sin2 y, fy = 2x2 sin y cos y; (c) fx = yx(y−1) + yx ln y, (a) fxx = 11.7 the direction of movement is −grad f (1, 1) = (−1.416, −0.909)T (a) dz = 1y cos yx dx − x2 cos yx dy; (b) dz = (2x + y2 ) dx + (2xy + cos y) dy; y 2 2 (c) dz = (2x dx + 2y dy)ex +y ; (d) dz = 1x dx + 1y dy 11.8 surface: S = 28π, 11.6 absolute error: 4.08, relative error: 4.6 per cent 508 Selected solutions dz = 2x ex2 dx1 + x2 ex2 dx2 ; 1 1 dt dt dt 0010 x 2 (b) (i) z = 2x1 e 2t + x12 ex2 2t = 6t 5 , 0012 0011 2 (ii) z 0010 = 2x1 ex2 2t + x12 ex2 2t = 8et 1t + t ln t ln 
t; (c) (i) z = t 6 ; z 0010 = 6t 5 , 2 2 (ii) z = (ln t 2 )2 et , z 0010 = 8et ( 1t + t ln t) ln t √ 5 1 3√ ∂f ∂f ∂f 11.10 ; (a) = − (a) = − 2; (a) = 2 4 2 ∂r1 ∂r2 ∂r3 ∂C (P ) = 13.36; 11.11 (a) grad C(3, 2, 1) = (8, 6, 10)T , ∂r 0 1 percentage rate of cost reduction: 5.52 per cent; (b) The first ratio is better or equal 11.12 ρf ,x1 = 0.002; ρf ,x2 = 0.00053; εf ,x1 = 0.2; εf ,x2 = 0.079 11.13 (a) partial elasticities: 2 x1 (3x1 2 + 2x2 2 ) , εf ,x1 = εf ,x2 = x23(4x1 x2 + 23x2 ) 3 3 2 3 2(x1 + 2x1 x2 + x2 ) 2(x1 + 2x1 x2 + x2 ) f homogeneous of degree r = 3/2, r > 1; (b) f is not homogeneous 11.9 (a) y(yxy − 1 + 2x2 y) y0010 = 0018 bx ; (b) y0010 = 3x cos 3x2− sin 3x ; (c) y0010 =− 2 2 x(yxy ln x − 1 + yx2 ) x a x −a 0004 0003 x y resp. ϕ = arccos 2 11.15 |J | = r; for r = 0 : r = x2 + y2 , ϕ = arctan x x + y2 11.16 local maximum at (x1 , y1 ) = (1/2, 1/3); no local extremum at (x2 , y2 ) = (1/7, 1/7) 11.17 (a) local minimum at (x1 , y1 ) = (0, 1/2) with z1 = −1/4; (b) local minimum at (x1 , y1 ) = (1, ln 43 ) with z1 = 1 − ln 43 11.14 (a) stationary point: (x0 , y0 ) = (100, 200); local minimum point with C(100, 200) = 824, 000 11.19 local minimum point: x1 = (1, 0, 0), x2 = (1, 1, 1), x3 = (1, −1, −1) are not local extreme points 11.20 no local extremum 11.21 stationary point x = (30, 30, 15) is a local maximum point with P(30, 30, 15) = 26, 777.5 11.22 (a) y = 10.05x − 28.25; (b) y(18) = 152.64, y(36) = 333.5 11.23 P1 : maximum; P2 : minimum 11.18 Selected solutions 509 (a) local maximum at (x1 , y1 ) = (4, 0) with z1 = 16; (b) local minimum at (x11 , x21 , x31 ) = (−1/12, 37/12, −30) with z1 = −73/48 11.25 local maximum point; values of the Lagrangian multipliers: λ1 = −13 and λ2 = −16 11.24 11.26 11.27 11.28 11.29 11.30 11.31 length = breadth = 12.6 cm, height = 18.9 cm local and global maximum of distance Dmax = 84/9 √ √ at (x1 , y1 ) = (− 32 , 32 5), (x2 , y2 ) = (− 32 , − 32 5); local minimum of distance D1 min = 3 at point (x3 , y3 ) = (−1, 0); local and 
global minimum of distance D2 min = 1 at point (x4 , y4 ) = (1, 0) stationary point (x1 , x2 , x3 ; λ) = (25, 7.5, 15; 5); local minimum point with C(25, 7.5, 15) = 187.5 136/3 32/3 3/8 12 DIFFERENTIAL EQUATIONS AND DIFFERENCE EQUATIONS 12.1 12.2 (a), (c) (b) y P = ex (a) y = ln |ex + C|; x (b) y2 = 1 + 2 ln 1 +2 e 12.3 ky · (x − 1) − y + 1 = 0 12.4 general solution y = Cxx , 12.5 y = Ct 100 e−t/2 2 particular solution yP = xx 2 12.6 The functions y1 , y2 , y3 form a fundamental system; y = C1 x + C2 x ln x + C3 1x 12.7 (a) y = Ce2x − 15 cos x − 52 sin x; (b) yP = ex (1 + x); 12.8 y = C1 e−x + C2 ex/2 + ex ; (d) y = −ex cos 3x + x2 + 2.2 x + 1 2 1 (a) y = C1 + e−x (C2 cos 2x + C3 sin 2x) + cos x + sin x; 5 5 1 (b) y = C1 ex + C2 e−2x + C3 xe−2x + xex 3 (c) 510 Selected solutions 12.9 12.10 (a) y1 = C1 eax cos x+ y2 = C1 (a) yt = eax sin x+ (1 + b)2t − b; C2 eax sin x (b) y1P C2 eax (b) cos x y2P y3P = 3e2x = −2e2x = e2x +9 +8e−x −16e−x + 9 strictly increasing for b > −1; (c) 12.11 (a) pt+1 = −2pt +12; (b) pt = −(−2)t +4 with p0 = 3, p1 = 6, p2 = 0, . . .; (c) 0003 0004 2 2 1 2 yt = 2t C1 cos πt + C2 sin πt ; (b) ytP = 4t − (−2)t − t − ; 3 3 3 3 1 (c) yt = C1 (−1)t + C2 t(−1)t + · 4t 100 12.13 (a) yt+2 = 1.4yt+1 + 0.15yt ; (b) yt = C1 (1.5)t + C2 (−0.1)t ; (c) 1,518.76 units 0005 t−i−1 ; 12.14 (a) yt = 3t y0 + t−1 (b) yt = 3t y0 + 3t − t − 1 i=1 (2i + 1)3 12.12 (a) Literature Anthony, M. and Biggs, N., Mathematics for Economics and Finance, Cambridge: Cambridge University Press, 1996. Bronstein, I.N. and Semandjajew, K.A., Taschenbuch der Mathematik, twenty-fifth edition, Stuttgart: Teubner, 1991 (in German). Chiang, A.C., Fundamental Methods of Mathematical Economics, third edition, New York: McGraw-Hill, 1984. Dück, W., Körth, H., Runge, W. and Wunderlich, L. (eds), Mathematik für Ökonomen, Berlin: Verlag Die Wirtschaft, 1979 (in German). Eichholz, W. and Vilkner, E., Taschenbuch der Wirtschaftsmathematik, second edition, Leipzig: Fachbuchverlag, 2000 (in German). 
Kalischnigg, G., Kockelkorn, U. and Dinge, A., Mathematik für Volks- und Betriebswirte, third edition, Munich: Oldenbourg, 1998 (in German). Luderer, B. and Würker, U., Einstieg in die Wirtschaftsmathematik, Stuttgart: Teubner, 1995 (in German). Mizrahi, A. and Sullivan, M., Mathematics. An Applied Approach, sixth edition, New York: Wiley, 1996. Mizrahi, A. and Sullivan, M., Finite Mathematics. An Applied Approach, seventh edition, New York: Wiley, 1996. Nollau, V., Mathematik für Wirtschaftswissenschaftler, third edition, Stuttgart and Leipzig: Teubner, 1999 (in German). Ohse, D., Mathematik für Wirtschaftswissenschaftler I–II, third edition, Munich: Vahlen, 1994 (in German). Opitz, O., Mathematik. Lehrbuch für Ökonomen, Munich: Oldenbourg, 1990 (in German). Rommelfanger, H., Mathematik für Wirtschaftswissenschaftler I–II, third edition, Hochschultaschenbücher 680/681, Mannheim: B.I. Wissenschaftsverlag, 1994 (in German). Rosser, M., Basic Mathematics for Economists, London: Routledge, 1993. Schmidt, V., Mathematik. Grundlagen für Wirtschaftswissenschaftler, second edition, Berlin and Heidelberg: Springer, 2000 (in German). Schulz, G., Mathematik für wirtschaftswissenschaftliche Studiengänge, Magdeburg: Otto-vonGuericke-Universität, Fakultät für Mathematik, 1997 (in German). Simon, C.P. and Blume, L., Mathematics for Economists, New York and London: Norton, 1994. Sydsaeter, K. and Hammond, P.J., Mathematics for Economic Analysis, Englewood Cliffs, NJ: Prentice-Hall, 1995. Varian, H.R., Intermediate Microeconomics. A Modern Approach, fifth edition, New York: Norton, 1999. Werner, F., Mathematics for Students of Economics and Management, sixth edition, Magdeburg: Otto-von-Guericke-Universität, Fakultät für Mathematik, 2004. 
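Several of the difference-equation answers above can be verified by direct iteration. As an illustrative sketch (Python is used here for illustration only and is not part of the original text), the closed form stated in solution 12.11, pt = −(−2)^t + 4, can be checked against the recurrence pt+1 = −2pt + 12 with p0 = 3:

```python
# Numerical check of solution 12.11:
# the difference equation p_{t+1} = -2*p_t + 12 with p_0 = 3
# should be solved by p_t = -(-2)**t + 4, giving p_0 = 3, p_1 = 6, p_2 = 0, ...

def iterate(p0, steps):
    """Iterate p_{t+1} = -2*p_t + 12 starting from p0, returning [p_0, ..., p_steps]."""
    p = p0
    seq = [p]
    for _ in range(steps):
        p = -2 * p + 12
        seq.append(p)
    return seq

closed_form = [-(-2) ** t + 4 for t in range(8)]
assert iterate(3, 7) == closed_form  # recurrence and closed form agree for t = 0, ..., 7
print(closed_form[:3])  # [3, 6, 0]
```

The same pattern, iterating the recurrence and comparing with the stated closed form, applies equally to the answers of 12.12 to 12.14.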
Index ε-neighbourhood of point 387 n-dimensional space 231 nth derivative 163 nth partial sum 71 absolute value 37 amortization installment 90, 93 amortization table 93 amount of annuity 85 annuity 85, 90; ordinary 85 antiderivative 197 apex 129 approximation: by rectangles 215; by trapeziums 215 Argand diagram 49 argument 117 artificial variable 338 augmented matrix 291 auxiliary objective function 350 back substitution 301 basic solution 293 basis of vector space 245 Bernoulli–l’Hospital’s rule 178 binomial coefficient 28 bordered Hessian 426 break-even point 172 canonical form 293, 339 Cartesian product 24, 25 Cauchy–Schwarz inequality 237 chain rule 160, 398 characteristic equation 369 Cobb–Douglas production function 383, 404 Cobb–Douglas utility function 431 cobweb model 476 coefficient 288 coefficient of the polynomial 126 cofactor 265 column vector 230 complex number 47 component of vector 230 composite mapping 114, 272 composition 114 composition of relations 108 conclusion 3, 9 conjunction 2 constant-factor rule 199 constrained optimization problem 424 consumer surplus 225 continuous future income flow 222 contradiction 1, 5 convex combination 240, 310 convex polyhedron 334 coordinate of vector 230 cosine function 141 cotangent function 141 Cramer’s rule 269 criterion: Leibniz 77; quotient 78; root 79 critical point 170 debt 90 definite solution 448 degeneration case 351 degree of freedom 292 demand-price-function 172 dependent variable 117 deposits 85 depreciation: arithmetic-degressive 101; degressive 101; digital 102; geometric-degressive 102; linear 101; table 101 derivative 156; directional 399; partial 387; second 163 determinant: Casorati’s 473; Wronski’s 452 determinant of matrix 264 difference equation: linear 472, of the first order 474, of second order 478 difference of vectors 233 difference quotient 155 difference set 17 differential equation 444; homogeneous 451; non-homogeneous 451; ordinary 444; with separable variables 447 differential
of function 164 differential quotient 156 dimension of a vector space 245 dimension of matrix 255 direction field 445 disjunction 2 domain of the function 110, 117 domain of the mapping 110 double integral 436 downward parabola 128 dual problem 358 duality 357; economic interpretation 361 effective rate of interest 83 eigenvalue 368 eigenvalue equation 369 eigenvector 368 elasticity 166, 183 elementary transformation 292 empty set 16 entering variable 342 equal matrices 256 equal vectors 232 equivalence 4 equivalent transformation 292 Euclidean distance 236 Euler’s theorem 404 extreme point 310 factor of the polynomial 131 Falk’s scheme 260 feasible region 309 first-derivative test 170 first-order differential equation 445 first-order partial derivative 390 forcing term 451 function 110; algebraic 136; antisymmetric 124; arccosine 143; arccotangent 143; arcsine 143; arctangent 143; bounded 123; bounded from: above 123, below 123; circular 140; complementary 453; concave 125, 175; constant 126; continuous 151, 387; continuously differentiable 156; convex 125, 175; cubic 126; decreasing 121, 168; differentiable 156; elastic 167; even 124; exponential 137; homogeneous of degree k 403; implicitly defined 405; increasing 121, 168; inelastic 167; inside 120; left-continuous 154; linear 126; logarithmic 138; non-decreasing 121, 168; non-increasing 121, 168; odd 124; outside 120; periodic 125; propositional 7; quadratic 126; rational 134, improper 134, proper 134; right-continuous 154; strictly concave 125, 175; strictly convex 125, 175; strictly decreasing 121, 168; strictly increasing 121, 168; symmetric 124; trigonometric 140 function of a real variable 117 fundamental system of the differential equation 453 fundamental theorem of algebra 130 Gauss–Jordan elimination 293 Gaussian elimination 293, 299 general solution of the differential equation 445, 448, 453 general solution of the system of linear equations 289 generalized chain rule 398 geometric
interpretation of an LPP 330 Gini coefficient 224 global maximum 169, 410 global maximum point 169, 410 global minimum 169, 410 global minimum point 169, 410 global sufficient conditions 434 gradient of function 392 Hessian matrix 411 higher-order derivative test 171 higher-order partial derivative 391 Horner’s scheme 132 hypothesis 3 identity matrix 257 imaginary part of the complex number 47 implication 3 implicit-function theorem 408 independent variable 117 indeterminate form 178 inflection point of function 176 initial value problem 448 inner product 235 input–output model 277 integral: definite 210; improper 219, 221; indefinite 198 integrand 198 integration by parts 204 integration by substitution 200 interest 80; compound 81; simple 80 inverse 273 inverse demand function 172 inverse element 244 investment project 97 isoquant 384, 445 Jacobian determinant 407 Kepler’s formula 217 kernel 271 Lagrange multiplier method 425 Lagrange’s theorem 425 Lagrangian function 425 Lagrangian multiplier 425 law: associative 6, 19, 235, 259; commutative 6, 19, 235, 259; distributive 6, 19, 235, 259; of de Morgan 6 leading principal minor 379 leaving variable 342 length of vector 236 Leontief model 277 limit of sequence 65 limit of function 148; left-side 149; right-side 149 linear combination 240 linear differential equation of order n 451 linear objective function 329 linear programming problem 329 linear space 244 linear substitution 200 linearly dependent vectors 241 linearly independent vectors 241 loan: amortized 90 loan repayments 90 local maximum 169, 410 local maximum point 169, 410, 423 local minimum 169, 410 local minimum point 169, 410, 423 local sufficient condition 426 logarithmic differentiation 162 Lorenz curve 224 mapping 110; bijective 112; identical 116, 273; injective 112; inverse 114; linear 271; surjective 112 marginal 156 marginal cost 213, 389 marginal function 156 marginal propensity to consume 159 market price 225 matrix 255; antisymmetric
256; diagonal 257; indefinite 377; inverse 273; invertible 273; lower triangular 257; negative definite 377; negative semi-definite 377; orthogonal 263; positive definite 377; positive semi-definite 377; symmetric 257; upper triangular 257 matrix difference 258 matrix product 260 matrix representation 288 matrix representation of an LPP 330 matrix sum 258 mean-value theorem 184, 214 method: of undetermined coefficients 456 minor 265 mixed LPP 360 monopolist 172 monotonicity of function 168 mortgage 94 multiplier-accelerator-model 480 necessary first-order conditions 411 negation 2 negative integer 32 neutral element 244, 258, 261 Newton’s method 189 Newton’s method of second order 190 Newton–Leibniz’s formula 210 non-negativity constraint 309, 329 norm 236 number: irrational 32; natural 32; rational 32; real 32 objective row 340 one-parametric set of solutions 299 one-to-one mapping 112 onto-mapping 112 operation: logical 1 optimal solution 330 optimality criterion 341 optimization by substitution 425 order of matrix 255 order of the differential equation 444 orthogonal vectors 239 parabola 128 partial differential 394 partial elasticity 403 partial rate of change 402 particular solution 448 Pascal’s triangle 29 payment: annual 85; periodic 85 period of function 125 periods for interest 85 permutation 26 pivot 300, 342 pivot column 342 pivot element 300, 342 pivot row 342 pivoting procedure 294, 339 polar form of complex number 49 pole of second order 222 polynomial 126 polynomial function 126 power function 136 power set 16 premises 9 present value of annuity 86 price–demand function 195 primal problem 358 principal 80 producer surplus 226 production function 383 profit function 416 proof: by induction 13; direct 10; indirect 10, of contradiction 10, of contrapositive 10 proportional rate of change 166 proposition: compound 1, 5; existential 8; false 1; open 7; true 1; universal 8 Pythagorean theorem 238 quadratic form 376; indefinite 377; negative
definite 377; negative semi-definite 377; positive definite 377; positive semi-definite 377 radian 141 range of the function 110, 117 range of the mapping 110 rank of matrix 290 rate of interest 80 real part of complex number 47 rectangle formula 248 rectangle rule 297 redemption table 90, 94 regula falsi 191 relation: binary 107; inverse 108; reflexive 107; symmetric 107 remainder 130 remainder theorem 130 rentability 98 return to scale 403; decreasing 404; increasing 404 Riemann integral 210 right-hand side 288 right-hand side vector 330 Rolle’s theorem 184 root function 136 root of the function 128 row vector 230 saddle point 412 Sarrus’ rule 265 scalar multiplication 233, 258 scalar product 235 sequence 61; arithmetic 62; bounded 65; decreasing 64; geometric 63; increasing 64; strictly decreasing 64; strictly increasing 64 series 73; alternating 77; geometric 75; harmonic 74 set 15; cardinality 15; complement 17; convex 310; disjoint 17; finite 15; infinite 15; intersection 17; union 16 set of feasible solutions 309 set of solutions 289 shadow price 361 short form of the tableau 341 simplex algorithm 343 simplex method 339 Simpson’s formula 218 sine function 141 slack variable 315, 337 smallest subscript rule 344 solution of systems of linear equations 289 solution of systems of linear inequalities 309; degenerate 314; feasible 309; non-degenerate 314 solution of the differential equation 445 solutions: linearly dependent 452, 473; linearly independent 452, 473 square matrix: non-singular 269; regular 269; singular 269 standard form of an LPP 336 stationary point 170, 411 Steinitz’s procedure 248 straight line 127 subset 16 sufficient second-order conditions 411 sum of the infinite series 73 sum of vectors 233 sum–difference rule 199 surface 384 surplus variable 337 system: consistent 291; homogeneous 289, 302; inconsistent 291; non-homogeneous 289 system of linear equations 287 system of linear inequalities 308 tangent function 141 tautology 5 Taylor 
polynomial 187 Taylor’s formula 187 total differential 394 transition matrix 262 transpose of matrix 256 transposed vector 230 trivial solution 292 truth table 2 unit vector 232 unknown 288 upward parabola 129 utility function 431 variable 288; basic 293; non-basic 293 vector 230 vector space 244, 259 Venn diagram 17 Vieta’s theorem 133 withdrawals 86 Young’s theorem 392 zero element 244 zero matrix 258 zero of the function 128

Advanced Mathematical Economics
Rakesh V. Vohra, Northwestern University, USA

As the intersection between economics and mathematics continues to grow in both theory and practice, a solid grounding in mathematical concepts is essential for all serious students of economic theory. In this clear and entertaining volume, Rakesh V. Vohra sets out the basic concepts of mathematics as they relate to economics. The book divides the mathematical problems that arise in economic theory into three types: feasibility problems, optimality problems and fixed-point problems. Of particular salience to modern economic thought are the sections on lattices, supermodularity, matroids and their applications. In a departure from the prevailing fashion, much greater attention is devoted to linear programming and its applications. Of interest to advanced students of economics as well as those seeking a greater understanding of the influence of mathematics on ‘the dismal science’, Advanced Mathematical Economics follows a long and celebrated tradition of the application of mathematical concepts to the social and physical sciences.

Series: Routledge Advanced Texts in Economics and Finance
November 2004: 208pp
Hb: 0-415-70007-8: £65.00
Pb: 0-415-70008-6: £22.99
eB: 0-203-79995-X: £22.99
Available as an inspection copy.
Routledge books are available from all good bookshops, or may be ordered by calling Taylor & Francis Direct Sales on +44 (0)1264 343071 (credit card orders).
For more information please contact David Armstrong on 020 7017 6028 or email [email protected]
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
www.routledge.com/economics