STANFORD UNIVERSITY PUBLICATIONS
SCHOOL OF EDUCATION
SPECIAL MONOGRAPH NO. 1

CHART TO FACILITATE THE CALCULATION OF PARTIAL COEFFICIENTS OF CORRELATION AND REGRESSION EQUATIONS

BY

TRUMAN L. KELLEY
Professor of Education

STANFORD UNIVERSITY, CALIFORNIA
PUBLISHED BY THE UNIVERSITY
1921

(Alignment Chart of Correlation Functions, 17 x 23, a supplement to this monograph, $1.00, may be secured separately.)

CONTENTS

Preface
Alignment Chart of Correlation Functions
Section I. The Meaning of Partial Correlation
Section II. Formulas Involved in Multiple Correlation
Section III. The Use of the Chart

PREFACE

In 1916 I published, as Bulletin 27 of the University of Texas, "Tables: To Facilitate the Calculation of Partial Coefficients of Correlation and Regression Equations." The present work, in terms of its purpose but not of its method, constitutes a second edition of that work. The preface to the "Tables" is as follows:

"The regression equation method has been so laborious, as well as involving such accuracy in and knowledge of statistical method, that it has not been used in many studies in which it alone could evaluate the data in such a manner as to answer the questions involved. It is hoped that the tables here presented have so materially decreased the labor of calculation that the method will be used extensively. If this has been accomplished a second edition will be demanded, and if such is called for two important improvements may be expected: first, that the tables be carried at least two decimal places further and, second, that entries for at least one of the variables be for every .001 instead of as at present for every .01. The shortcomings mentioned are well recognized. The author would be glad to hear of any others discovered by users."

That the "Tables" met a real need has been proven by the early exhaustion of the edition and by the insistent demands for republication. However, certain shortcomings have made it undesirable to republish. These are, first, the labor involved in double interpolation and, second, the presence of cumulative inaccuracies in case the work is not carried to a sufficient number of decimal places and in case several variables are involved. These defects would only in part be remedied by extending the tables as suggested in the first preface. A final and decisive reason for not reprinting has been the discovery of the adaptability of the alignment chart to regression equation calculation.

I have tried the following methods of calculation of regression coefficients: (a) slide rule; (b) logarithmic; (c) by use of my Tables; (d) by use of the Chart here given; (e) by use of determinants; and (f) by use of successive approximations which more and more nearly approach the true values.
(Method (f) will be described in detail in a forthcoming treatise.) I have found that the method which proves best depends upon the number of variables involved and in certain cases upon the degree of accuracy required. In general, if there are three or four variables, the Chart method will serve to advantage (also for five or six variables, if an inaccuracy in the regression coefficients in the neighborhood of .03 is not prohibitive); if five or six variables, the determinant method (utilizing a calculating machine rather than logarithms) will be most accurate and satisfactory; and if over six variables, the convergent series method of approximation will be the most expeditious and will result, by carrying the work the required number of steps, in any desired reliability. The chart herewith described is accordingly especially recommended for use in problems involving three or four variables.

A new notation and procedure for the calculation of partial regression coefficients are used which materially simplify both the theoretical treatment and the arithmetical calculation. The notation includes three new symbols, $z$, $k$, and $\beta$, and the procedure permits the calculation of the regression coefficient of a given order by means of regression coefficients of an order one lower, thus entirely obviating the calculation of partial correlation coefficients and partial standard deviations.

[Alignment Chart of Correlation Functions: parallel logarithmic scales $r_{13}$, product, and $r_{23}$, with supplementary scales $1/k$, $1/k^2$, and $1/K^2$; its use is described in Section III.]

Section I. The Meaning of Partial Correlation

It is assumed that the meaning of the Pearson product-moment coefficient of correlation is well known to the reader and that the following symbols require no further exposition:

$X_1$ is the magnitude of the first, the dependent, variable.
$X_2$ is the magnitude of the second variable.
$X_3$ is the magnitude of the third variable, etc.
$x_1$ is the magnitude of the first variable expressed as a deviation from its own mean. $x_2$, $x_3$, etc., have similar meanings for the second, third, etc., variables.
$\sigma_1$ is the standard deviation of the $x_1$'s, $\sigma_2$ of the $x_2$'s, etc.
$r_{12}$ is the Pearson product-moment coefficient of correlation between variables 1 and 2; accordingly

$$r_{12} = \frac{\Sigma x_1 x_2}{N \sigma_1 \sigma_2}$$

$r$'s with other subscripts have similar meanings.
$b_{12}$ is the regression coefficient of variable 1 on 2. $b$'s with other subscripts have similar meanings.

The magnitude $b_{12} x_2$ is an estimated $x_1$. Letting $\bar{x}_1$ stand for such estimates, we have $\bar{x}_1 = b_{12} x_2$. The regression coefficient $b_{12}$ is such a value that if in the correlation table (the scatter diagram) a straight line with this slope is drawn (in other words the line represented by the equation $\bar{x}_1 = b_{12} x_2$) the sum of the squares of the deviations of the observed $x_1$'s from this line will be a minimum. The $\bar{x}_1$'s are thus the closest estimates of the $x_1$'s which it is possible to obtain knowing the $x_2$'s and assuming a rectilinear relationship. The regression coefficient should be used only after a study of the correlation table has justified this assumption.
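(A modern illustration, not part of the original monograph: the least-squares property of $b_{12}$ just described can be verified numerically. The data below are arbitrary assumptions; the sketch checks that $b_{12} = \Sigma x_1 x_2 / \Sigma x_2^2$ beats any nearby slope.)

```python
import numpy as np

# Illustrative data: deviations from the mean (x-measures in Kelley's notation)
x2 = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
x1 = np.array([-2.0, -1.5, 0.5, 1.0, 2.0])

# b12 = sum(x1*x2) / (N * sigma2^2) = sum(x1*x2) / sum(x2^2)
b12 = np.sum(x1 * x2) / np.sum(x2 ** 2)

def sse(b):
    """Sum of squared deviations of the observed x1's from the line x1 = b * x2."""
    return np.sum((x1 - b * x2) ** 2)

# The least-squares slope yields a smaller sum of squares than nearby slopes
assert all(sse(b12) <= sse(b12 + d) for d in (-0.1, -0.01, 0.01, 0.1))
print(b12, sse(b12))
```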
Should such study definitely show curvilinear relationship, a simple transmutation of the scores of one of the variables can frequently be made resulting in rectilinear regression. In the following treatment rectilinear regression of the original scores, or of transmuted scores, is taken for granted. In case of non-rectilinear regression, $b_{12}$ still represents the slope of the best straight regression line which can be obtained, but the $\bar{x}_1$'s resulting would differ more from the observed $x_1$'s than would $\bar{x}_1$'s determined by means of the appropriate curvilinear regression line.

Let $x_1 - \bar{x}_1 = x_{1\cdot2}$. Then $x_{1\cdot2}$ is the "error of estimate" or "residual." If $N$ is the population, there are of course $N$ such magnitudes. Their mean equals zero and their standard deviation is given by the equation:

$$\sigma_{1\cdot2} = \sqrt{\frac{\Sigma x_{1\cdot2}^2}{N}}$$

$\sigma_{1\cdot2}$ is the "standard error of estimate" and is smaller than would have been the case had any value other than $b_{12}$ been used. This "best" value is given by the equation:

$$b_{12} = \frac{\Sigma x_1 x_2}{N \sigma_2^2}$$

If $x_2$ scores are estimated, knowing $x_1$ scores, the regression equation is $\bar{x}_2 = b_{21} x_1$, in which

$$b_{21} = \frac{\Sigma x_1 x_2}{N \sigma_1^2}$$

The difference between $x_2$ and $\bar{x}_2$ is $x_{2\cdot1}$, and the standard deviation of these residuals, $\sigma_{2\cdot1}$, is the standard error of estimate of the $x_2$'s.

If the standard deviations of the two variables are equal, $b_{12} = b_{21} = r_{12}$, a measure of mutual implication. It is desirable, whether standard deviations are equal or not, to have a measure of mutual implication, and the coefficient of correlation continues such, though in this case $b_{12} \neq r_{12} \neq b_{21}$.

The relation between $r_{12}$ and $b_{12}$ and $b_{21}$ is simple. Let us express each variable in terms of its own standard deviation and call the new variables obtained "standard measures," $z_1$ and $z_2$:

$$z_1 = \frac{x_1}{\sigma_1}, \qquad z_2 = \frac{x_2}{\sigma_2}$$

Substituting $z$'s for $x$'s in the equation

$$r_{12} = \frac{\Sigma x_1 x_2}{N \sigma_1 \sigma_2}$$

and remembering that $\sigma_1$ and $\sigma_2$ are constants, so that $\Sigma z_1\sigma_1 z_2\sigma_2 = \sigma_1\sigma_2 \Sigma z_1 z_2$, gives:

$$r_{12} = \frac{\Sigma z_1 z_2}{N}$$

and by a similar derivation the regression coefficient of $z_1$ on $z_2$ (and likewise of $z_2$ on $z_1$) is also $\Sigma z_1 z_2 / N$, since the standard deviation of a standard measure is unity. The common regression coefficient obtained when variables are expressed in terms of their own standard deviations is the coefficient of correlation, the desired measure of mutual implication. Therefore, as a matter of definition,

$$r_{12} = b_{12}\frac{\sigma_2}{\sigma_1} = b_{21}\frac{\sigma_1}{\sigma_2}$$

and as an immediate consequence

$$r_{12} = \sqrt{b_{12} b_{21}}$$

The correlation between variables 1 and 2 could be designated either as $r_{12}$ or $r_{21}$, but custom places the numerically smaller subscript first. It will be found serviceable in the following treatment to think of the correlation coefficient as simply the regression coefficient which exists when variables are expressed in terms of their own standard deviations.

If a third variable, $x_3$, is involved and it is desirable to obtain as accurate estimates of the $x_1$'s as possible, knowing the $x_2$'s and $x_3$'s, it is done by the equation:

$$\bar{\bar{x}}_1 = b_{12\cdot3}\,x_2 + b_{13\cdot2}\,x_3$$

(The two dashes over the $x_1$ distinguish it from the preceding $\bar{x}_1$.) In general $b_{12\cdot3}$ will not be identical with $b_{12}$, for it is now necessary to weight $x_2$ in such a manner that when combined with an appropriately weighted $x_3$ the two together will yield an estimated $x_1$, $\bar{\bar{x}}_1$, which will correlate as highly as possible with $x_1$. In other words, the errors of estimate, or residuals, $x_1 - \bar{\bar{x}}_1 = x_{1\cdot23}$, are to be as small as possible. More accurately stated, the standard error of estimate, $\sigma_{1\cdot23}$, is to be a minimum. Logically it is obvious that if $x_3$ has any value whatever independent of $x_2$ in indicating $x_1$ scores then $\sigma_{1\cdot23} < \sigma_{1\cdot2}$. Also $\sigma_{1\cdot23} < \sigma_{1\cdot3}$.
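(Another modern sketch, with randomly generated data, verifying the identities just derived: $r_{12} = \sqrt{b_{12}b_{21}}$, $r_{12} = b_{12}\sigma_2/\sigma_1$, and the two-variable standard error of estimate $\sigma_{1\cdot2} = \sigma_1\sqrt{1-r_{12}^2}$.)

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=1000)
x1 = 0.6 * x2 + rng.normal(size=1000)
x1 -= x1.mean(); x2 -= x2.mean()          # deviations from the mean
N = len(x1)
s1, s2 = x1.std(), x2.std()               # population standard deviations

b12 = np.sum(x1 * x2) / (N * s2 ** 2)     # regression of 1 on 2
b21 = np.sum(x1 * x2) / (N * s1 ** 2)     # regression of 2 on 1
r12 = np.sum(x1 * x2) / (N * s1 * s2)

assert np.isclose(r12, np.sqrt(b12 * b21))        # r12 = sqrt(b12 * b21)
assert np.isclose(r12, b12 * s2 / s1)             # r12 = b12 * sigma2 / sigma1
sigma_1_2 = np.std(x1 - b12 * x2)                 # standard error of estimate
assert np.isclose(sigma_1_2, s1 * np.sqrt(1 - r12 ** 2))
```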
It may readily be proven by calculus that if the standard error of estimate is a minimum:

$$b_{12\cdot3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}\cdot\frac{\sigma_1}{\sigma_2} \qquad\text{and}\qquad b_{13\cdot2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}\cdot\frac{\sigma_1}{\sigma_3}$$

It will aid thinking to have a concept corresponding to that part of $b_{12\cdot3}$ which does not depend upon the standard deviations of the variables. Calling that part $\beta_{12\cdot3}$ we have:

$$b_{12\cdot3} = \beta_{12\cdot3}\frac{\sigma_1}{\sigma_2} \qquad\text{and}\qquad b_{13\cdot2} = \beta_{13\cdot2}\frac{\sigma_1}{\sigma_3}$$

The regression equation may then be written either as

$$\bar{\bar{x}}_1 = b_{12\cdot3}\,x_2 + b_{13\cdot2}\,x_3 \qquad\text{or}\qquad \bar{\bar{z}}_1 = \beta_{12\cdot3}\,z_2 + \beta_{13\cdot2}\,z_3$$

The chart herewith given enables the ready calculation of constants of the type $\beta_{12\cdot3}$. From this point on, the general treatment will be in terms of $z$'s, and to return to $x$'s the simple substitution $z_1 = x_1/\sigma_1$, etc., is all that is necessary. The numerical illustration given is in terms of $x$'s.

The weights, or regression coefficients, $\beta_{12\cdot3}$ and $\beta_{13\cdot2}$, are such that $z_2$ is weighted according to its relationship with $z_1$ independent of whatever common relationship $z_3$ may have with both $z_1$ and $z_2$, and similarly $z_3$ is weighted in terms of its importance independent of $z_2$. That, in order to obtain the minimum standard error of estimate, the weightings of $z_2$ and $z_3$ must be according to their relationships with $z_1$ independent of each other, is probably a mere matter of logic which the keen philosopher can see directly and without elaborate exposition, but the writer has been convinced of this fact only as a result of mathematical analysis.

A numerical illustration in which the steps parallel those involved in the general analysis will serve to make clear the concrete meaning of the partial coefficient of correlation and the regression coefficient. Three variables will be considered and, to recapitulate, the notation will be as follows:

$X_1$ = gross score of variable 1
$M_1$ = the mean $X_1$ score
$x_1 = X_1 - M_1$
$\sigma_1$ = the standard deviation of the $x_1$'s
$z_1 = x_1/\sigma_1$
Variables 2 and 3 are similarly expressed.
$r_{12}$ = the correlation of variables 1 and 2
$b_{12}$ = the regression of $x_1$ on $x_2$ $= r_{12}\,\sigma_1/\sigma_2$
$\beta_{12}$ = the regression of $z_1$ on $z_2$ $= b_{12}\,\sigma_2/\sigma_1$
(In case of two variables $\beta_{12} = r_{12}$, but in the case of three variables $\beta_{12\cdot3} \neq r_{12\cdot3}$.)

Let $k_{12}$ be defined by the equation $r_{12}^2 + k_{12}^2 = 1$. $k$ is thus a magnitude which measures lack of relationship between variables 1 and 2. It will be called a "coefficient of alienation" and will be found to be as useful in interpreting data as the coefficient of correlation.¹

¹ See Kelley, Truman L., "Principles Underlying the Classification of Men," Journal of Applied Psychology, 3, 50-67, March, 1919.

$$\sigma_{1\cdot2} = \sqrt{\frac{\Sigma x_{1\cdot2}^2}{N}}, \quad\text{in which } x_{1\cdot2} = x_1 - \bar{x}_1 \;\; (\bar{x}_1 = b_{12}x_2)$$

$$\sigma_{1\cdot23} = \sqrt{\frac{\Sigma x_{1\cdot23}^2}{N}}, \quad\text{in which } x_{1\cdot23} = x_1 - \bar{\bar{x}}_1 \;\; (\bar{\bar{x}}_1 = b_{12\cdot3}x_2 + b_{13\cdot2}x_3)$$

Comparable meanings attach to $\sigma$'s with other subscripts. With this notation we will consider the relationships in the three accompanying series, $x_1$, $x_2$, $x_3$.

| $x_1$ | $x_2$ | $x_3$ | $\bar{x}_1$ | $x_{1\cdot3}$ | $x_{2\cdot3}$ | $\bar{x}_{1\cdot3}$ | $x_{1\cdot23}$ | $\bar{\bar{x}}_1$ |
| -12 | -2 | -3 | -9 | -3 | 1 | -.8 | -2.2 | -9.8 |
| -8 | -2 | -2 | -6 | -2 | 0 | .0 | -2.0 | -6.0 |
| -4 | -2 | -2 | -6 | 2 | 0 | .0 | 2.0 | -6.0 |
| 4 | -6 | -1 | -3 | 7 | -5 | 4.0 | 3.0 | 1.0 |
| 0 | -4 | -1 | -3 | 3 | -3 | 2.4 | .6 | -.6 |
| 0 | 0 | -1 | -3 | 3 | 1 | -.8 | 3.8 | -3.8 |
| -8 | 0 | 0 | 0 | -8 | 0 | .0 | -8.0 | .0 |
| -4 | 2 | 0 | 0 | -4 | 2 | -1.6 | -2.4 | -1.6 |
| -4 | 2 | 0 | 0 | -4 | 2 | -1.6 | -2.4 | -1.6 |
| 0 | 6 | 0 | 0 | 0 | 6 | -4.8 | 4.8 | -4.8 |
| 8 | -4 | 1 | 3 | 5 | -5 | 4.0 | 1.0 | 7.0 |
| 4 | 4 | 1 | 3 | 1 | 3 | -2.4 | 3.4 | .6 |
| 4 | 0 | 1 | 3 | 1 | -1 | .8 | .2 | 3.8 |
| 12 | 0 | 2 | 6 | 6 | -2 | 1.6 | 4.4 | 7.6 |
| 0 | 2 | 2 | 6 | -6 | 0 | .0 | -6.0 | 6.0 |
| 8 | 4 | 3 | 9 | -1 | 1 | -.8 | -.2 | 8.2 |
| Sums: .0 | .0 | .0 | .0 | .0 | .0 | .0 | .0 | .0 |

Standard deviations: $\sqrt{\tfrac{640}{16}}$, $\sqrt{\tfrac{160}{16}}$, $\sqrt{\tfrac{40}{16}}$, $\sqrt{\tfrac{360}{16}}$, $\sqrt{\tfrac{280}{16}}$, $\sqrt{\tfrac{120}{16}}$, $\sqrt{\tfrac{76.8}{16}}$, $\sqrt{\tfrac{203.2}{16}}$, $\sqrt{\tfrac{436.8}{16}}$, respectively.

For convenience the three series have been so chosen that the mean of each is zero, so that the measures recorded are $x$ measures. For these series $r_{13} = .75$ and $r_{23} = .50$.

We may use series 3 to estimate series 1 by the equation $\bar{x}_1 = b_{13}x_3$, in which $b_{13} = r_{13}\,\sigma_1/\sigma_3 = 3$. The values recorded in column $\bar{x}_1$ are such values. These estimated values of $x_1$ differ from the actual $x_1$ values by the amounts shown in column $x_{1\cdot3}$ (to be read "the errors of estimate of the $x_1$'s when estimated from the $x_3$'s," or "the residuals in the $x_1$'s after estimation from the $x_3$'s").

Thus far $x_2$ has not been used. We, of course, cannot use the $x_3$'s to estimate these residuals, $x_{1\cdot3}$, as the correlation between the $x_{1\cdot3}$'s and the $x_3$'s is of necessity zero. We must resort to an additional source of data, such as is series 2, to reduce the error of estimate still further. However, in so far as series 2 is related to series 3 it will not be of service. Instead, therefore, of correlating series 2 with the residuals, $x_{1\cdot3}$, we will correlate that part of series 2 which is independent of series 3 with these residuals. $x_{2\cdot3}$ is the part desired. These are the residuals in the $x_2$'s when $x_2$'s are estimated from $x_3$'s. They are obtained in a manner comparable to that of the $x_{1\cdot3}$'s, being given by the equation $x_{2\cdot3} = x_2 - b_{23}x_3$, in which $b_{23} = r_{23}\,\sigma_2/\sigma_3 = 1$.

These residuals, recorded in column $x_{2\cdot3}$, may be used to estimate the residuals, $x_{1\cdot3}$, thus leading to a closer approximation of the $x_1$ scores than is possible by utilizing series 3 only. The correlation between $x_{1\cdot3}$ and $x_{2\cdot3}$ is

$$r_{(1\cdot3)(2\cdot3)} = \frac{-96}{\sqrt{280}\sqrt{120}} = -.5237$$

Accordingly the regression coefficient of the $x_{1\cdot3}$'s upon the $x_{2\cdot3}$'s is

$$b_{(1\cdot3)(2\cdot3)} = r_{(1\cdot3)(2\cdot3)}\,\frac{\sigma_{1\cdot3}}{\sigma_{2\cdot3}} = \frac{-96}{120} = -.8$$

The correlation coefficient, $r_{(1\cdot3)(2\cdot3)}$, is usually expressed as $r_{12\cdot3}$ and may be read "the correlation between those parts of $x_1$ and $x_2$ which are independent of $x_3$"; in other words, "the partial correlation between $x_1$ and $x_2$ when the relationship of $x_3$ is eliminated." Or again, "the correlation of $x_1$ and $x_2$ independent of $x_3$," or "the correlation between $x_1$ and $x_2$ when $x_3$ is constant."

Using the regression coefficient just obtained, $-.8$, to estimate $x_{1\cdot3}$ residuals from the $x_{2\cdot3}$ residuals gives the measures in column $\bar{x}_{1\cdot3}$. The differences between these and the $x_{1\cdot3}$ residuals are recorded in column $x_{1\cdot23}$ and are the errors of estimate or residuals when both $x_2$ and $x_3$ have been utilized to their fullest extent in estimating $x_1$ scores. The standard deviation of these final residuals is

$$\sigma_{1\cdot23} = \sqrt{\frac{203.2}{16}} = 3.564$$

This is the standard error of estimate, $\sigma_{1\cdot23}$. The probable error of estimate $= .6745\,\sigma_{1\cdot23} = 2.4037$. Accordingly if $x_1$ scores are estimated from $x_2$ and $x_3$ scores, the resulting estimate probably will be in error by as much as 2.4037.
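(A modern sketch, not in the original, which reproduces the arithmetic of this illustration from the three series alone; every printed constant of the example is recovered.)

```python
import numpy as np

# The three series of the illustration (x-measures; each mean is zero).
x1 = np.array([-12, -8, -4, 4, 0, 0, -8, -4, -4, 0, 8, 4, 4, 12, 0, 8], float)
x2 = np.array([-2, -2, -2, -6, -4, 0, 0, 2, 2, 6, -4, 4, 0, 0, 2, 4], float)
x3 = np.array([-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3], float)
N = len(x1)

b13 = np.sum(x1 * x3) / np.sum(x3 ** 2)   # = 3
b23 = np.sum(x2 * x3) / np.sum(x3 ** 2)   # = 1
x1_3 = x1 - b13 * x3                      # residuals x_{1.3}
x2_3 = x2 - b23 * x3                      # residuals x_{2.3}

b = np.sum(x1_3 * x2_3) / np.sum(x2_3 ** 2)   # = -96/120 = -.8
r12_3 = np.sum(x1_3 * x2_3) / np.sqrt(np.sum(x1_3 ** 2) * np.sum(x2_3 ** 2))
x1_23 = x1_3 - b * x2_3                   # final residuals
sigma_1_23 = np.sqrt(np.sum(x1_23 ** 2) / N)  # = sqrt(203.2/16) = 3.564

print(b13, b23, b, round(r12_3, 4))           # 3.0  1.0  -0.8  -0.5237
print(round(sigma_1_23, 4), round(.6745 * sigma_1_23, 4))   # 3.5637  2.4037
```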
Finally, if the estimate of the residual $x_{1\cdot3}$, namely $\bar{x}_{1\cdot3}$, be added to the estimate of $x_1$, namely $\bar{x}_1$, a magnitude $\bar{\bar{x}}_1$ is obtained which is the estimate of the $x_1$'s obtained by utilizing both $x_2$ and $x_3$. $\bar{\bar{x}}_1$ is thus given by the equation:

$$\bar{\bar{x}}_1 = \bar{x}_1 + \bar{x}_{1\cdot3} = r_{13}\frac{\sigma_1}{\sigma_3}\,x_3 + r_{12\cdot3}\frac{\sigma_{1\cdot3}}{\sigma_{2\cdot3}}\,x_{2\cdot3}$$

It can be proven that the correlation between $x_1$ and $\bar{\bar{x}}_1$, designated by $r_{1\cdot23}$ and called the multiple correlation, is the maximum obtainable. The difference between $x_1$ and $\bar{\bar{x}}_1$ is of course $x_{1\cdot23}$, already obtained in a slightly different manner. The equation just given for $\bar{\bar{x}}_1$ could be used to estimate $x_1$ scores knowing $x_2$ and $x_3$ scores, but it would necessitate the calculation of magnitudes $x_{2\cdot3}$. Accordingly this equation has no practical utility, except as here given in illustrating how $x_2$'s and $x_3$'s may be fully utilized to estimate $x_1$'s. Algebraic reduction (involving the expressing of partials in terms of totals, e.g., $x_{2\cdot3} = x_2 - b_{23}x_3$, etc., for all factors involved in $\sigma_{1\cdot3}$, $\sigma_{2\cdot3}$ and $r_{12\cdot3}$) yields:

$$\bar{\bar{x}}_1 = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}\cdot\frac{\sigma_1}{\sigma_2}\,x_2 + \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}\cdot\frac{\sigma_1}{\sigma_3}\,x_3$$

(equivalent to $\bar{\bar{z}}_1 = \beta_{12\cdot3}z_2 + \beta_{13\cdot2}z_3$).

The problem before us, then, is primarily to determine such regression coefficients as $\beta_{12\cdot3}$. The chart here given was drawn up to facilitate such determinations. It, however, is equally serviceable in enabling the calculation of $r_{12\cdot3}$, of $\sqrt{1 - r_{12}^2}$, designated as $k_{12}$, and of other correlation functions.

Section II. Formulas Involved in Multiple Correlation

The basic problem of multiple correlation is to estimate the value of one variable, knowing the values of several others. Provided relationships are rectilinear, or approximately so, this problem is solved by means of an equation:

$$\bar{X}_1 = b_{12\cdot34\cdots n}X_2 + b_{13\cdot24\cdots n}X_3 + \cdots + b_{1n\cdot23\cdots(n-1)}X_n + c \qquad (1)$$

in which the $b$'s and the $c$ are constants, so chosen that when the variables $X_2, X_3, \cdots X_n$ are multiplied by the successive $b$'s and added to $c$ a final measure, $\bar{X}_1$, is obtained which is the most accurate estimate of $X_1$ possible of attainment.

If measures are expressed as deviations from their own means divided by their own standard deviations the above regression equation simplifies. Let

$$z_1 = \frac{X_1 - M_1}{\sigma_1}, \qquad z_2 = \frac{X_2 - M_2}{\sigma_2}, \text{ etc.,}$$

in which the $M$'s are successive means and the $\sigma$'s successive standard deviations. There is thus a one-to-one relation between the $z$'s and the $X$'s. The regression equation connecting the $z$'s is:

$$\bar{z}_1 = \beta_{12\cdot34\cdots n}z_2 + \beta_{13\cdot24\cdots n}z_3 + \cdots + \beta_{1n\cdot23\cdots(n-1)}z_n \qquad (2)$$

in which the $b$'s and $\beta$'s are connected by the relationships:

$$b_{12\cdot34\cdots n} = \beta_{12\cdot34\cdots n}\frac{\sigma_1}{\sigma_2}, \qquad b_{13\cdot24\cdots n} = \beta_{13\cdot24\cdots n}\frac{\sigma_1}{\sigma_3}, \text{ etc.} \qquad (3)$$

Also:

$$c = M_1 - b_{12\cdot34\cdots n}M_2 - b_{13\cdot24\cdots n}M_3 - \cdots - b_{1n\cdot23\cdots(n-1)}M_n \qquad (4)$$

The $\beta$ constants involved in equation (2) are the regression coefficients derived by aid of the chart. Knowing equation (2) and having relationships (3) and (4) it is but a step to secure equation (1), which is the most serviceable form of the regression equation for actual use. The probable error of the estimated $X_1$'s, i.e., of the $\bar{X}_1$'s, is $.6745 \times$ their standard error, or $.6745 \times$ their standard deviation, $\sigma_{1\cdot23\cdots n}$.
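(The passage from equation (2) to the working form (1) is purely mechanical; a minimal modern sketch, using the constants of the three-variable sample problem given in Section III below. The function name `raw_regression` is an illustrative invention, not the monograph's.)

```python
# Convert standardized regression weights (betas) into the raw-score
# equation X1-bar = b2*X2 + b3*X3 + ... + c of equation (1), via (3) and (4).
def raw_regression(betas, means, sigmas):
    """betas[i] weights z_{i+2}; means and sigmas are [M1, M2, ...], [s1, s2, ...]."""
    b = [beta * sigmas[0] / s for beta, s in zip(betas, sigmas[1:])]   # eq. (3)
    c = means[0] - sum(bi * mi for bi, mi in zip(b, means[1:]))        # eq. (4)
    return b, c

# Constants of the three-variable sample problem of Section III:
b, c = raw_regression(betas=[0.1366, 0.2200],
                      means=[68.15, 43.60, 52.20],
                      sigmas=[10.50, 12.24, 9.63])
print(b, c)   # approximately [0.1172, 0.2399] and 50.52, as in the monograph
```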
Coefficients of alienation, $k$, may be defined by the type equation:

$$k_{12\cdot34\cdots n} = \sqrt{1 - r_{12\cdot34\cdots n}^2} \qquad (5)$$

In the case of two variables the following continued equality may be written:

$$k_{12} = k_{1\cdot2} = \sqrt{1 - r_{12}^2} = \sqrt{1 - r_{1\cdot2}^2} \qquad (5a)$$

[for $k_{1\cdot2}$ and $r_{1\cdot2}$ see (6) and (7)] and in general $k^2 + r^2 = 1$, where the subscript of $k$ is identical with the subscript of $r$. It may be shown that:

$$\sigma_{1\cdot23\cdots n} = \sigma_1 k_{1\cdot23\cdots n} \qquad (6)$$

and

$$k_{1\cdot23\cdots n} = k_{12}\,k_{13\cdot2}\,k_{14\cdot23}\cdots k_{1n\cdot23\cdots(n-1)} \qquad (7)$$

in which $k_{12}$ may be read "the coefficient of alienation between 1 and 2." In the two-variable case $\sigma_{1\cdot2} = \sigma_1 k_{12}$. Accordingly $k_{12}$ is a measure of variability, or freedom, of variable 1 from variable 2 which is independent of the standard deviations of the variables. If 1 is the dependent variable and 2 the independent it will be convenient, in place of $k_{12}$, to use the notation $k_{1\cdot2}$, which may be read "the freedom of 1 from 2." Similarly $k_{1\cdot23}$ is the freedom of 1 from 2 and 3, or "the variability in 1 unaccounted for by 2 and 3." With this explanation the meaning of such symbols as $k_{12\cdot3}$, $k_{1\cdot23\cdots n}$, $k_{3\cdot124\cdots n}$, etc., will be obvious. Further, if we write

$$k_{1\cdot23}^2 + r_{1\cdot23}^2 = 1 \quad (\text{in general: } k_{1\cdot23\cdots n}^2 + r_{1\cdot23\cdots n}^2 = 1) \qquad (8)$$

$r_{1\cdot23}$ is the correlation between 1, and 2 and 3 when combined by the regression equation. The symbol here given, $r_{1\cdot23}$, is an extension of the notation introduced by Yule¹ in the symbol $\sigma_{1\cdot23}$. The relationship is

$$\sigma_{1\cdot23} = \sigma_1\sqrt{1 - r_{1\cdot23}^2}$$

¹ Yule, G. Udny, "An Introduction to the Theory of Statistics," Lippincott, 1912. The symbol $r_{1\cdot23\cdots n}$ differs from the symbol which Yule gives, but as capital $R$ has earlier and with much pertinence been used by Pearson and others to designate certain correlation determinants it is undesirable to use it as a multiple correlation coefficient. Pearson has used various symbols including $R_1$ and $\varepsilon$. Furthermore, the relationship between $\sigma_{1\cdot23\cdots n}$, $k_{1\cdot23\cdots n}$ and $r_{1\cdot23\cdots n}$ is an argument in favor of both symbols, $k_{1\cdot23\cdots n}$ and $r_{1\cdot23\cdots n}$.

The standard error of estimate, $\sigma_{1\cdot23\cdots n}$, is given in terms of $k$ in equation (6). $k$ with a single primary subscript, e.g., $k_{1\cdot23\cdots n}$, is given in terms of $k$'s with two primary subscripts in equation (7). $k$'s with two primary subscripts are given in terms of $r$'s in equations (5) and (5a) and in terms of $\beta$'s in the following equations:

$$k_{12} = \sqrt{1 - \beta_{12}\beta_{21}}$$

(when there are no secondary subscripts $\beta_{12} = \beta_{21} = r_{12}$),

$$k_{12\cdot3} = \sqrt{1 - \beta_{12\cdot3}\beta_{21\cdot3}},$$

and in general

$$k_{12\cdot34\cdots n} = \sqrt{1 - \beta_{12\cdot34\cdots n}\beta_{21\cdot34\cdots n}} \qquad (9)$$

Partial coefficients of correlation may be found from partial regression coefficients:

$$r_{12\cdot34\cdots n} = \sqrt{\beta_{12\cdot34\cdots n}\beta_{21\cdot34\cdots n}} \qquad (10)$$

These two $\beta$'s may appropriately be called "conjugate regression coefficients." The elements entering into them are the same and they are involved in a reciprocal manner. Whatever the sign of $\beta_{12\cdot34\cdots n}$, the partial regression of 1 upon 2, it is likewise the sign of $\beta_{21\cdot34\cdots n}$, the partial regression of 2 upon 1. This is the sign to be attached to $r_{12\cdot34\cdots n}$.

The foregoing formulas show that every correlation coefficient can be found if the $\beta$'s are known. The fundamental formula for expressing a $\beta$ of a given order in terms of $\beta$'s of lower order is:

$$\beta_{12\cdot34\cdots n} = \frac{\beta_{12\cdot45\cdots n} - \beta_{13\cdot45\cdots n}\beta_{32\cdot45\cdots n}}{1 - \beta_{23\cdot45\cdots n}\beta_{32\cdot45\cdots n}} \qquad (11)$$

Following Yule (who, however, deals with $b$'s instead of $\beta$'s) $\beta_{12\cdot34\cdots n}$ may be called a regression coefficient of the $n-2$ order, $\beta_{12\cdot45\cdots n}$ one of the $n-3$ order, etc., the order being determined by the number of secondary subscripts. $\beta_{12}$ is accordingly a regression coefficient of zero order. The order of the secondary subscripts is immaterial, but the order of the primary subscripts is definite.
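(Equation (11), applied repeatedly, amounts to an algorithm. A compact modern sketch, replacing Kelley's worksheet with a recursion; the zero-order $r$'s are those of the five-variable problem worked in Section III, Table I.)

```python
from functools import lru_cache

# Zero-order coefficients r_ab (Table I of Section III), keyed by
# unordered pair since r_ab = r_ba.
r = {frozenset(p): v for p, v in {(1, 2): .72, (1, 3): .62, (1, 4): .58,
     (1, 5): .63, (2, 3): .61, (2, 4): .61, (2, 5): .82, (3, 4): .66,
     (3, 5): .55, (4, 5): .59}.items()}

@lru_cache(maxsize=None)
def beta(a, b, sec=()):
    """beta_{ab.sec}: regression of z_a on z_b, the variables in sec eliminated."""
    if not sec:
        return r[frozenset((a, b))]     # zero order: beta_12 = r_12
    c, rest = sec[0], sec[1:]           # c plays the unique secondary subscript
    num = beta(a, b, rest) - beta(a, c, rest) * beta(c, b, rest)
    den = 1 - beta(b, c, rest) * beta(c, b, rest)   # equation (11)
    return num / den

print(round(beta(1, 2, (3, 4, 5)), 4))   # .4655, agreeing with Table IV below
print(round(beta(1, 3, (2, 4, 5)), 4))   # .2334
```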
In equation (11) one (called the unique secondary subscript) and only one of the secondary subscripts appearing in the $\beta$ in the left-hand member has disappeared from the secondary subscripts in the $\beta$'s in the right-hand member. Since all but one of the secondary subscripts appear as secondary subscripts in both members, the general principle may be illustrated by a $\beta$ of the second order:

$$\beta_{12\cdot34} = \frac{\beta_{12\cdot4} - \beta_{13\cdot4}\beta_{32\cdot4}}{1 - \beta_{23\cdot4}\beta_{32\cdot4}}$$

The first primary subscript in the left-hand member term becomes the first primary subscript in the first and second $\beta$'s in the numerator of the right-hand member. The second primary subscript in the left-hand member term becomes the second primary subscript in the first and third $\beta$'s in the numerator. The remaining two primary subscripts of the numerator $\beta$'s are identical and are the unique secondary subscript. The denominator $\beta$'s are the third numerator $\beta$ and its conjugate.

From these general directions it is obvious that there are as many different ways for expressing a regression coefficient of a given order as the order of the coefficient. $\beta_{12\cdot34\cdots n}$ may be expressed in $n-2$ ways. Equation (11) is one such, and the following is another:

$$\beta_{12\cdot34\cdots n} = \frac{\beta_{12\cdot35\cdots n} - \beta_{14\cdot35\cdots n}\beta_{42\cdot35\cdots n}}{1 - \beta_{24\cdot35\cdots n}\beta_{42\cdot35\cdots n}}$$

In practice it is desirable to calculate in at least two ways as a check. Formula (11) simplifies in case a partial regression coefficient of the first order is being calculated:

$$\beta_{12\cdot3} = \frac{\beta_{12} - \beta_{13}\beta_{32}}{1 - \beta_{23}\beta_{32}} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \qquad (11a)$$

By repeated use of formulas (11) and (11a) in calculating regression coefficients of a given order from those of an order one less, every regression coefficient may be obtained. By formulas (6), (7), and (9) the standard error of estimate, or the standard deviation of the residuals, may be found. Finally, by equation (8), which may be written in the form:

$$r_{1\cdot23\cdots n} = \sqrt{1 - k_{1\cdot23\cdots n}^2} \qquad (8)$$

the multiple coefficient of correlation between one variable and any number of others may be found.

The accompanying table indicates the partial regression coefficients of the first and second orders which will be needed to complete the solution of a four-variable problem. The outline provides for the calculation of each second order partial in two ways. The first time a coefficient appears in the table it is designated by a number in parentheses, or, in case it and its conjugate are both required, by a letter in parentheses. The lack of a number or letter before a coefficient indicates that it has appeared earlier in the table.

| NEEDED | MAY BE OBTAINED FROM | MAY ALSO BE OBTAINED FROM |
| β12·34 | (1) β12·4, (2) β13·4, (a) β32·4 | (3) β12·3, (4) β14·3, (b) β42·3 |
| β13·24 | β13·4, β12·4, β23·4 | (5) β13·2, (6) β14·2, (c) β43·2 |
| β14·23 | β14·3, β12·3, β24·3 | β14·2, β13·2, β34·2 |

In addition to the preceding, which suffice to obtain the regression coefficients, the following conjugates of coefficients already obtained will enable the calculation in two ways of the multiple alienation coefficient, $k_{1\cdot234}$, of the multiple correlation coefficient, $r_{1\cdot234}$, and of the standard error of estimate, $\sigma_{1\cdot234}$:

| β21·34 | β21·4, β23·4, β31·4 | β21·3, β24·3, β41·3 |

$k_{1\cdot234}^2$ is calculated before $r_{1\cdot234}$ or $\sigma_{1\cdot234}$:

$$k_{1\cdot234}^2 = (1 - \beta_{21\cdot34}\beta_{12\cdot34})(1 - \beta_{31\cdot4}\beta_{13\cdot4})(1 - \beta_{41}\beta_{14})$$

Also,

$$k_{1\cdot234}^2 = (1 - \beta_{21\cdot34}\beta_{12\cdot34})(1 - \beta_{41\cdot3}\beta_{14\cdot3})(1 - \beta_{31}\beta_{13})$$
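(A modern sketch of the $k^2$ product just given, computed exactly from zero-order $r$'s; the data are those of the four-variable sample problem in Section III below, and the printed constants are recovered.)

```python
import math

# k^2_1.234 for the four-variable sample problem of Section III, from the
# recursion of equation (11) and the product formula above.
r = {frozenset(p): v for p, v in {(1, 2): .225, (1, 3): .274, (1, 4): .134,
     (2, 3): .404, (2, 4): .060, (3, 4): .231}.items()}

def beta(a, b, sec=()):
    if not sec:
        return r[frozenset((a, b))]
    c, rest = sec[0], sec[1:]
    return ((beta(a, b, rest) - beta(a, c, rest) * beta(c, b, rest))
            / (1 - beta(b, c, rest) * beta(c, b, rest)))

# k^2_1.234 = (1 - b21.34*b12.34)(1 - b31.4*b13.4)(1 - b41*b14)
k2 = ((1 - beta(2, 1, (3, 4)) * beta(1, 2, (3, 4)))
      * (1 - beta(3, 1, (4,)) * beta(1, 3, (4,)))
      * (1 - beta(4, 1) * beta(1, 4)))
print(round(k2, 4))                  # .9033
print(round(math.sqrt(1 - k2), 4))   # r_1.234 = .3109
```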
(2)P,4.B (a)P42-5 (3)P.2.4 (4)P.B-4 (^')Pb2.4 (14)P,3-4 = (5)P.3-5 Px4-5 (c)P43.B (6)P.3.4 Pl5-4 («;)Pb3.4 (15)Px4-23 (7)P.4.2 (8)P,3-2 (^)p34-2 .(9)Pl4.3 (10)P...3 (/)P24.3 (16)P,...3 (ll)Pl.-2 Pl3-2 (^)p3 = -2 (12)P.B.3 Pl2-3 (^)P2B-, (17)P...34 Pl2-4 Pl3-4 (0P32-4 Pl2-3 Pl4-3 P42-8 (18)Pl,.3^ Pl5-4 Pl3-4 P36-4 PlB-3 Pl4-3' (y)P4B.s (19)Px3-2. Pl3-6 Pl2-5 (^)P23-|S Pl3-2 PlB-2 P63-2 (20) p,,.,. Pl4-6 Pl2-6 P24-5 Pl4-2 PlB-2 (/)Pb4.2 (»m)P23.„ P23-6 P24-5 P43-6 P23-4 P25-4 P6S-4 (M)P3..« P32-5 P34-5 P42-B P32-4 P35-4 Pb2-4 (w)P45-23 p45-2 P43-2 P35-2 P4B'3 P42-3 P25-3 (A^)P54-23 Ps4-2 P53-2 P34-2 P54-3 Pb2-3 P24-8 (o)P,».34 P25-4 P23-4 p3S-4 P23-3 P24-3 P45-S (0)P,,.34 P52-4 Pb3-4 P32-4 P62-3 P54-3 P42-S (/')P34.25 P34-5 P32-5 P24-B p34-2 P35-2 P64-2 (P)P43.25 P43-S P42-6 P2S-6 P43-2 P4B.2 P63-2 Pl2-345 (13)Pl2-45 (14)P,3-4B (M)P3,.« (17)P,2-34 (18)P.B-34 (0)pB2-34 Pl8-24B Pl3-45 Pl2-45 (W)P33.4B (19)P,3-2B (20)P,,.3B (^)P43-2B Pl4-23B (15)P.4-23 (16)P.B.23 (A^)Pb4.2S Pl4-25 Pl3-2B (/')P34.2B Pl5-234 PlB'23 Pl4-23 (m)P4B-23 Pl6-34 Pl2-34 (0)p25-34 In addition to the preceding, the following conjugates enable the calculation in two ways of ^1-2345 , ri.234o and *'l'2S4B • (13a)p2,.« (3a)P,, ■4 P25-4 (4a) P,,., (14a)p3x.4, (6a) p3. ■4 P36-4 Pb1.4 (17a)P,,.34 (10a)p2i •3 P24-3 (9a)P,,3 (18a)p„.34 (12a)P„ •3 P54-3 P41-! P21-345 P21 •45 P23-40 PsiMB Psi-ai P2B-34 P61-S4 '^l-2346 ^ (1-P21-34. iPl2-348)(l P31-4bPi3- 4b)(1-P41.bP,4-.)(1 — PlBpBl) Also, *i-3»4« = (1 P21-»4»Pia«4lHl Pei'84PlS-i4)(l P«l'4Pl«-4) (1 P41P14 ) THE USE OF THE CHART 19 Section III. The Use of the Chart. The directions here given apply to the small chart in this monograph and also to a large twenty-inch chart, which may be secured separately and which gives results of approximately the same degree of accuracy as a twenty-inch slide rule. The scales for r^^ and r^s are graduated according to the logarithms of numbers from 10 to 100, and the product scale is so graduated as to indicate the products of any two numbers on scales r^g and j-js when con- nected by a straight line. Accordingly all products and quotients, includ- ing squares and square roots, may be obtained. In all these operations the simplest way to keep track of the decimal point is to roughly carry the operation through in one's head and then place the point where it belongs. Scale l/k is graduated according to the logarithms of l/Vl — ''^ and scale l/k' according to the logarithms of 1/1 — r' . Scale l/K' is a continuation of scale l/k' . When values on scale l/K' are used, place a straight edge through this value and parallel to the base line [as explained in example (c)] and locate a point on scale l/k' . Then continue the cal- culation using the point so located on scale l/k' in lieu of the point on scale l/K" . The following magnitudes are needed in multiple correlation work : (a) Products, such as fi^r^s (b) Quotients, such as — (c) Square roots, such as VPi2-3P2i-s (d) Factors t — (= -) whichenter in partial coefficients of _ «i3 VI— ''L correlation (e) Coefficients of ahenation, such as ^13 (= VI — ''13) (/) Factors 7^ (= :j ^) which enter into regression coefficients ^23 ^ ^23 (g) Squares of coefficients of alienation, such as k'^o (= 1 — rls) (h) Partial regression coefficients, such as Pi2-3 (= ^" .2 " ^^ ) (4) Partial correlation coefficients, such as (= llL^l^ = VPa2-sP2..3 = VK 2'3''2l'S/' 20 CALCULATION OF PARTIAL COEFFICIENTS (/) Partial regression coefficients involving four variables P( . 
(j) Partial regression coefficients involving four variables:

$$\beta_{12\cdot34} = \frac{\beta_{12\cdot4} - \beta_{13\cdot4}\beta_{32\cdot4}}{k_{23\cdot4}^2} = \frac{\beta_{12\cdot3} - \beta_{14\cdot3}\beta_{42\cdot3}}{k_{24\cdot3}^2}$$

Since $k_{23\cdot4}^2 = 1 - \beta_{23\cdot4}\beta_{32\cdot4}$, and since the calculation which leads to $\beta_{23\cdot4}$ is changed in but one simple respect to obtain $\beta_{32\cdot4}$, it is convenient to write:

$$\beta_{12\cdot34} = \frac{\beta_{12\cdot4} - \beta_{13\cdot4}\beta_{32\cdot4}}{1 - \beta_{23\cdot4}\beta_{32\cdot4}}$$

(k) Partial regression coefficients involving more than four variables:

$$\beta_{12\cdot34\cdots n} = \frac{\beta_{12\cdot4\cdots n} - \beta_{13\cdot4\cdots n}\beta_{32\cdot4\cdots n}}{1 - \beta_{23\cdot4\cdots n}\beta_{32\cdot4\cdots n}}$$

The same procedure as in (j) is followed, but in this case the calculation which leads to $\beta_{23\cdot4\cdots n}$ does not, by one simple change, lead to $\beta_{32\cdot4\cdots n}$.

Examples:

(a) $.2 \times .4$ Place a straight edge on 20, scale $r_{13}$, and upon 40, scale $r_{23}$, and read the product, .08, on the product scale.

(b) $.2 \div .4$ Place a straight edge upon 20, product scale, and upon 40, scale $r_{23}$, and read the quotient, .50, on scale $r_{13}$.

(c) $\sqrt{.25}$ Place a straight edge on 25, product scale, and parallel to the base line of the chart (this can be done by rotating the straight edge until the readings on scales $r_{13}$ and $r_{23}$ are identical) and read the square root, .50, on either scale $r_{13}$ or $r_{23}$.

(d) $\dfrac{1}{\sqrt{1-.60^2}}$ Find 60 on scale $1/k$ and read the answer, 1.25, from the same point on scale $r_{23}$.

(e) $\sqrt{1-.60^2}$ Place a straight edge through 60, scale $1/k$, and 100, product scale, and read the answer, .80, on scale $r_{13}$.

(f) $\dfrac{1}{1-.60^2}$ Find 60 on scale $1/k^2$ and read the answer, 1.5625, from the same point on scale $r_{23}$.

(g) $1-.60^2$ Place a straight edge through 60, scale $1/k^2$, and 100, product scale, and read the answer, .64, on scale $r_{13}$.

(h) $\dfrac{.78-.60\times.80}{1-.80^2}$ Find the product of .60 and .80 by (a). On a separate scratch paper subtract this from .78, obtaining .30. Place a straight edge between 30, scale $r_{13}$, and 80, scale $1/k^2$, and read the answer, .833, on the product scale.

(i) $\dfrac{.78-.60\times.80}{\sqrt{1-.60^2}\,\sqrt{1-.80^2}}$, i.e., $r_{12\cdot3} = \sqrt{\beta_{12\cdot3}\beta_{21\cdot3}}$. Find $\dfrac{.78-.60\times.80}{1-.80^2}$ by (h), and $\dfrac{.78-.60\times.80}{1-.60^2}$ in the same manner. Multiply and extract the square root by (a) and (c), yielding the answer .625.

(j) Given: $\beta_{12\cdot4} = .70$; $\beta_{13\cdot4} = .60$; $\beta_{32\cdot4} = .80$; $\beta_{23\cdot4} = .5469$. Required:

$$\beta_{12\cdot34} = \frac{.70 - .60 \times .80}{1 - .80 \times .5469}$$

Find the numerator as in (h) and the denominator in the same manner. Then divide as in (b). This gives $\dfrac{.2200}{.5625} = .3911$.

If, as is frequently the case, $\beta_{32\cdot4}$ and $\beta_{23\cdot4}$ are nearly equal, $k_{23\cdot4}^2$ is closely given by:

$$k_{23\cdot4}^2 = 1 - \left(\frac{\beta_{32\cdot4} + \beta_{23\cdot4}}{2}\right)^2$$

In this case the procedure may be as follows:

$$\frac{.70 - .60 \times .80}{1 - .80 \times .76}$$

Find the numerator, .2200, as before. On scratch paper determine .78, the arithmetic average of .80 and .76. Place a straight edge between .78, scale $1/k^2$, and .22, scale $r_{13}$, and read the answer, .5618, on the product scale. This answer is in error by .0006, which is of the same order of magnitude as the error attendant upon the use of the large chart.

As a sample problem in three variables the following data are given:

TABLE OF CORRELATIONS, MEANS AND STANDARD DEVIATIONS

| Variables | 1 | 2 | 3 |
| 2 | .225 | | |
| 3 | .274 | .404 | |
| Means | 68.15 | 43.60 | 52.20 |
| σ's | 10.50 | 12.24 | 9.63 |

Solving: β12·3 = .1366; β21·3 = .1236; β13·2 = .2200; k²1·23 = .9093; r1·23 = .3011; σ1·23 = 10.01.

$$\bar{z}_1 = .1366\,z_2 + .2200\,z_3$$

$$\bar{X}_1 = .1172\,X_2 + .2399\,X_3 + 50.52$$

As a sample problem in four variables the following data are given:

| Variables | 1 | 2 | 3 | 4 |
| 2 | .225 | | | |
| 3 | .274 | .404 | | |
| 4 | .134 | .060 | .231 | |
| Means | 68.15 | 43.60 | 52.20 | 45.40 |
| σ's | 10.50 | 12.24 | 9.63 | 14.25 |

Solving: β12·34 = .1398; β21·34 = .1270; β13·24 = .1991; β14·23 = .0796; k²1·234 = .9033; r1·234 = .3109; σ1·234 = 9.980.

$$\bar{z}_1 = .1398\,z_2 + .1991\,z_3 + .0796\,z_4$$

$$\bar{X}_1 = .1199\,X_2 + .2171\,X_3 + .0587\,X_4 + 48.92$$
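(The three-variable sample problem can be solved by exact arithmetic in a few lines; a modern sketch follows. The small discrepancies against the printed constants, for instance .2188 here against the chart's .2200, are of the order of accuracy claimed for the chart.)

```python
import math

# Three-variable sample problem, computed arithmetically rather than
# from the chart.
r12, r13, r23 = .225, .274, .404
M1, M2, M3 = 68.15, 43.60, 52.20
s1, s2, s3 = 10.50, 12.24, 9.63

b12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)   # beta_12.3 = .1366
b21_3 = (r12 - r13 * r23) / (1 - r13 ** 2)   # beta_21.3 = .1236
b13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)   # beta_13.2 = .2188 (chart: .2200)

k2 = (1 - b12_3 * b21_3) * (1 - r13 ** 2)    # k^2_1.23 = .9093
print(round(math.sqrt(1 - k2), 4))           # r_1.23 = .3011
print(round(s1 * math.sqrt(k2), 2))          # sigma_1.23 = 10.01

b2 = b12_3 * s1 / s2                         # raw-score weights, eq. (3)
b3 = b13_2 * s1 / s3
c = M1 - b2 * M2 - b3 * M3                   # constant, eq. (4)
print(round(b2, 4), round(b3, 4), round(c, 2))   # ~.1172, ~.2386, ~50.59
```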
(k) The detailed steps in the solution of the accompanying five-variable problem are given. Such a scheme as shown for recording calculations will facilitate procedure.

TABLE I — CORRELATIONS

| Variables | 1 | 2 | 3 | 4 |
| 2 | .72 | | | |
| 3 | .62 | .61 | | |
| 4 | .58 | .61 | .66 | |
| 5 | .63 | .82 | .55 | .59 |

TABLE II — FIRST ORDER β'S, ETC.

| Coefficient | β | Conjugate | Average | k² |
| β12·3 | .5444 | .5552 | | |
| β12·4 | .5832 | .5518 | | |
| β12·5 | .6209 | .3373 | | .7906 |
| β13·2 | .2879 | | | |
| β13·4 | .4203 | .3574 | | .8498 |
| β13·5 | .3921 | .4535 | | .8222 |
| β14·2 | .2242 | | | |
| β14·3 | .3026 | .2774 | .2900 | |
| β14·5 | .3195 | .3454 | .3325 | |
| β15·2 | .1209 | | | |
| β15·3 | .4143 | .4695 | | |
| β15·4 | .4415 | .4337 | .4376 | |
| β23·4 | .3675 | .3303 | .3489 | |
| β23·5 | .2280 | .4853 | | .8894 |
| β24·3 | .3675 | .3303 | .3489 | |
| β24·5 | .1936 | .3852 | | .9254 |
| β25·3 | .6946 | .7716 | | .4640 |
| β25·4 | .7058 | .7328 | .7193 | |
| β34·2 | .4585 | .4585 | .4585 | |
| β34·5 | .5146 | .4810 | .4978 | |
| β35·2 | .1520 | .0793 | | .9879 |
| β35·4 | .2464 | .2845 | .2654 | |
| β45·2 | .2741 | .1430 | | .9608 |
| β45·3 | .3254 | .4022 | | .8691 |

TABLE III — SECOND ORDER β'S, ETC.

β14·23 = .1168
β15·23 = .0780
β13·25 = .2818
β14·25 = .2154
β12·34 = .5058, β21·34 = .4948; average = .5003
β12·45 = .5379, β21·45 = .3039; k²12·45 = .8365
β45·23 = .2069, β54·23 = .1350; k²45·23 = .9721
β34·25 = .4546, β43·25 = .4421; average = .4483
β15·34 = .3634, β51·34 = .3907; average = .3771
β13·45 = .3169, β31·45 = .3099; average = .3134
β25·34 = .6616, β52·34 = .7270; k²25·34 = .5190
β23·45 = .1792, β32·45 = .3102; k²23·45 = .9444

TABLE IV — THIRD ORDER β'S, ETC.

β12·345 = .4655, β21·345 = .2754; k²12·345 = .8718
β13·245 = .2334
β14·235 = .1093
β15·234 = .0554

The constants derived in Tables III and IV give the regression equation:

$$\bar{z}_1 = .4655\,z_2 + .2334\,z_3 + .1093\,z_4 + .0554\,z_5$$

Also $k_{1\cdot2345}^2 = .4218$; $r_{1\cdot2345} = .7604$; $k_{1\cdot2345} = .6495$; $\sigma_{1\cdot2345} = .6495\,\sigma_1$.

In Table II the first order β's are given. The entries upon a line have the following meanings: the first is the regression coefficient indicated, the second its conjugate, the third the arithmetic average of the two (calculated in case the two coefficients are nearly equal), and the fourth the k² derived from the two conjugate β's (calculated in case the two coefficients are quite unequal). Whenever there is a third entry there is no fourth, and vice versa, as but one of these two items is needed.

Table III is derived from Table II, each entry in Table III being calculated in two ways. Table IV is derived from Table III, each entry being calculated in two ways.
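(Kelley's "calculate in two ways" check can itself be automated. A final modern sketch, self-contained and repeating the recursion of the earlier sketches, shows that the two elimination orders for one entry of Table III agree exactly.)

```python
# Check the "two ways" rule for beta_45.23 of Table III: eliminate
# variable 3 first and then 2, and in the reverse order.
r = {frozenset(p): v for p, v in {(1, 2): .72, (1, 3): .62, (1, 4): .58,
     (1, 5): .63, (2, 3): .61, (2, 4): .61, (2, 5): .82, (3, 4): .66,
     (3, 5): .55, (4, 5): .59}.items()}

def beta(a, b, sec=()):
    if not sec:
        return r[frozenset((a, b))]
    c, rest = sec[0], sec[1:]
    return ((beta(a, b, rest) - beta(a, c, rest) * beta(c, b, rest))
            / (1 - beta(b, c, rest) * beta(c, b, rest)))

way1 = beta(4, 5, (3, 2))   # unique secondary subscript 3, then 2
way2 = beta(4, 5, (2, 3))   # unique secondary subscript 2, then 3
assert abs(way1 - way2) < 1e-12   # in exact arithmetic both orders agree
print(round(way1, 4))             # .2069, as in Table III
```

In hand computation the two ways differ slightly through rounding, which is precisely why the outline prescribes the duplicate calculation as a check.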