Directed Acyclic Graphs
David A. Bessler, Texas A&M University
November 20, 2002
Universidad Internacional del Ecuador, Quito, Ecuador

Outline
- Introduction
- Causal Forks
- Inverted Causal Forks
- D-separation
- Markov Property
- The Adjustment Problem
- Policy Modeling
- PC Algorithm

Outline Continued
- Example: Traffic Fatalities
- Correlation and Partial Correlation
- Forecasting Traffic Fatalities
- More Examples: US Money, Prices and Income; World Stock Markets
- Conclusion

Motivation
Oftentimes we are uncertain about which variables are causal in a modeling effort. Theory may tell us what our fundamental causal variables are in a controlled system; however, it is common that our data are not collected in a controlled environment. In fact, we are rarely involved with the collection of our data.

Observational Data
In the case where no experimental control is present in the generation of our data, such data are said to be observational (non-experimental) and usually secondary: not collected explicitly for our purpose, but rather for some other primary purpose.

Use of Theory
Theory is a good potential source of information about the direction of causal flow. However, theory usually invokes the ceteris paribus condition to achieve results. Data are usually observational (non-experimental), and thus the ceteris paribus condition may not hold. We may never know whether it holds, because of unknown variables operating on our system (see Malinvaud's econometrics text).

Experimental Methods
If we do not know the “true” system, but have an approximate idea that one or more variables operate on that system, then experimental methods can yield appropriate results. Experimental methods work because they use randomization, the random assignment of subjects to alternative treatments, to account for any additional variation associated with the unknown variables on the system.

Directed Graphs Can Be Used To Represent Causation
Directed graphs help us assign causal flows to a set of observational data. The problem under study and theory suggest that certain variables ought to be related, even if we do not know exactly how; i.e., we don't know the “true” system.

Causal Models Are Well Represented By Directed Graphs
One reason for studying causal models, represented here as X → Y, is to predict the consequences of changing the effect variable (Y) by changing the cause variable (X). The possibility of manipulating Y by way of manipulating X is at the heart of causation. Hausman (1998, page 7) writes: “Causation seems connected to intervention and manipulation: One can use causes to wiggle their effects.”

We Need More Than Algebra To Represent Cause
Linear algebra is symmetric with respect to the equal sign. We can rewrite y = a + bx as x = -a/b + (1/b)y. Either form is legitimate for representing the information conveyed by the equation. A preferred representation of causation would be the sentence x → y, or the words: “if you change x by one unit you will change y by b units, ceteris paribus.”
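A small simulation sketch can make this asymmetry concrete. It assumes a hypothetical linear model x → y with intercept 1, slope 2 and Gaussian noise (these numbers are illustrative, not from the slides), and it represents an intervention simply by overwriting a variable with a fixed value. A regression runs just as easily in either direction on the observational data, but only setting x moves y.

```python
import random

A, B = 1.0, 2.0   # hypothetical intercept and slope of the true model x -> y
NOISE_SD = 0.5    # hypothetical standard deviation of the disturbance on y
N = 10_000

def simulate(n, do_x=None, do_y=None, seed=0):
    """Draw n samples from the hypothetical model y = A + B*x + e, x ~ N(0, 1).

    Passing do_x (or do_y) overwrites that variable with a fixed value,
    which is how this sketch represents an intervention."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0) if do_x is None else do_x
        y = A + B * x + rng.gauss(0.0, NOISE_SD) if do_y is None else do_y
        xs.append(x)
        ys.append(y)
    return xs, ys

def mean(values):
    return sum(values) / len(values)

def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Observational data: a regression can be run in either direction.
xs, ys = simulate(N, seed=1)
slope_y_on_x = ols_slope(xs, ys)   # near B
slope_x_on_y = ols_slope(ys, xs)   # the reverse regression runs just as easily

# Interventions reveal the asymmetry.
_, y_at_0 = simulate(N, do_x=0.0, seed=2)
_, y_at_1 = simulate(N, do_x=1.0, seed=3)
effect_x_on_y = mean(y_at_1) - mean(y_at_0)    # near B: wiggling x moves y

x_at_lo, _ = simulate(N, do_y=-5.0, seed=4)
x_at_hi, _ = simulate(N, do_y=5.0, seed=5)
effect_y_on_x = mean(x_at_hi) - mean(x_at_lo)  # near 0: wiggling y leaves x alone

print(f"slope y on x: {slope_y_on_x:.2f}, slope x on y: {slope_x_on_y:.2f}")
print(f"set x 0 -> 1, change in mean y: {effect_x_on_y:.2f}")
print(f"set y -5 -> 5, change in mean x: {effect_y_on_x:.2f}")
```

Running it shows a forward slope near 2 and a change in mean y near 2 when x is set from 0 to 1, while setting y from -5 to 5 leaves the mean of x near 0: the data are symmetric in the way the algebra is, but the interventions are not.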

The algebraic statement suggests a symmetry that does not hold for causal statements.

Arrows Carry the Information
An arrow placed with its base at X and its head at Y indicates that X causes Y: X → Y. By the words “X causes Y” we mean that one can change the values of Y by changing the values of X. Arrows indicate a productive or genetic relationship between X and Y. Causal statements are asymmetric: x → y is not consistent with y → x.

Problems with Predictive Definitions of Cause
Definitions of the word “cause” that focus on prediction alone, without distinguishing between intervention (first) and subsequent realization, may mistakenly label as causal variables that are associated only through an omitted variable. Prediction is one attribute of the word “cause.” We must be careful not to make it the only attribute (more or less a summary of Bunge 1959).

Granger-type Causality
For example, Granger-type causality (Granger 1980) focuses solely on prediction, without considering intervention. If we can predict Y better by using past values of X than by not using past values of X, then X Granger-causes Y. The consequence of such a focus is to open oneself up to the frustration of unrealized expectations by attempting policy on the wrong set of variables.

Graph
A graph is an ordered triple <V, M, E>. V is a non-empty set of vertices (variables). M is a non-empty set of marks (symbols attached to the ends of undirected edges). E is a set of ordered pairs; each member of E is called an edge.

Vertices are variables; Edges are lines
Vertices connected by an edge are said to be adjacent. Given a set of vertices {A, B, C, D}, an undirected graph contains only undirected edges (e.g., A - B), and a directed graph contains only directed edges (e.g., C → D).

Directed Acyclic Graphs (DAGs)
A directed acyclic graph is a directed graph that contains no directed cyclic paths. An acyclic graph has no path that leads away from a variable only to return to that same variable. The path A → B → C → A is labeled “cyclic”, as here we move from A to B, but then return to A by way of C.

Graphs and Probabilities of Variables
Directed acyclic graphs are pictures (illustrations) for representing conditional independence as given by the recursive decomposition

$$\Pr(v_1, v_2, \ldots, v_n) = \prod_{i=1}^{n} \Pr(v_i \mid pa_i)$$

where Pr is the probability of the vertices (variables) v1, v2, ..., vn, and pa_i is the realization of some subset of the variables that precede (come before in a causal sense) v_i in the order (v1, v2, ..., vn). The symbol ∏ represents the product operation, with the index of the operation shown below (start) and above (finish) the symbol. Think of pa_i as the parents of variable i.

D-Separation
Let X, Y and Z be three disjoint subsets of variables in a directed acyclic graph G, and let p be any path between a vertex in X and a vertex in Y, where by path we mean any succession of edges, regardless of their directions. Z is said to block p if there is a vertex w on p satisfying one of the following: (i) w has converging arrows along p, and neither w nor any of its descendants is in Z; or (ii) w does not have converging arrows along p, and w is in Z. Furthermore, Z is said to d-separate X from Y in graph G, written (X ⊥ Y | Z)_G, if and only if Z blocks every path from a vertex in X to a vertex in Y.

Graphs and D-Separation
Geiger, Verma and Pearl (1990) show that there is a one-to-one correspondence between the set of conditional independencies X ⊥ Y | Z implied by the above factorization and the set of triples (X, Y, Z) that satisfy the d-separation criterion in graph G. If G is a directed acyclic graph with vertex set V, A and B are in V, and H is a subset of V, then G linearly implies that the correlation between A and B conditional on H is zero if and only if A and B are d-separated given H.

Colliders (Inverted Fork)
Consider three variables (vertices): A, B and C. A variable is a collider if arrows converge on it: A → B ← C. The vertex B is a collider, and A and C are d-separated given the null set. Intuitively, think of two trains, one starting at A, the other at C. Both move toward B. Unconditionally, they will crash at B. However, if we condition on B (if we build a switch station at B with side tracks), we open up the flow from A to C. Conditioning on B makes A and C d-connected (directionally connected).

Conditioning on Children (of Colliders) Opens Up Information Flows Too!
Amend the graph above to include a variable D as a child of B, such that A → B ← C and B → D. If we condition on D rather than B, we also open up the flow between A and C (Pearl, 2000, p. 17). This illustrates component (i) of the definition given above.

Common Causes (Causal Fork)
Say we have three vertices K, L and M, described by the graph K ← L → M. Here L is a common cause of K and M. The unconditional association (correlation) between K and M will be non-zero, as they have a common cause L. However, if we condition on L (know the value of L), the association between K and M disappears (Pearl, 2000, p. 17). Conditioning on common causes blocks the flow of information between effects.

Causal Chains
Finally, if our causal path is a chain (causal chain), condition (ii) in the above definition again applies. If D causes E and E causes F, we have the representational flow D → E → F. The unconditional association (correlation) between D and F will be non-zero, but the association (correlation) between D and F conditional on E will be zero. (For those in the audience familiar with Box and Jenkins time series methods, this is a property they exploited in testing for AR models.)

Example of an Inverted Causal Fork
In the example we study below, we take data from Peltzman (Journal of Political Economy, 1976). This is a study of traffic fatalities in the U.S. over the period 1947-1972. Roh, Bessler and Gilbert (1997) find the following (not a surprise):
Speed(t) → Traffic Fatalities(t) ← Alcohol Consumption(t)

What Should We Expect Based On The Previous Directed Graph?
Here year-to-year changes in speed and year-to-year changes in alcohol consumption are direct causes of year-to-year changes in traffic fatalities. The graph is an inverted fork. So we should expect Speed and Alcohol Consumption to be unrelated in unconditional tests of association. However, if we condition on Traffic Fatalities, we should see a non-zero measure of association between Speed and Alcohol Consumption.

OLS Regressions On An Inverted Fork (using OLS to measure association)
Regression #1: Speed(t) = .01 (.002) - .01 (.053) * Alcohol Consumption(t)
Estimated standard errors of the coefficients are in parentheses. Based on this regression we would say Speed(t) and Alcohol Consumption(t) are not related (note: |-.01/.053| < 2.0).

OLS Regressions On An Inverted Fork: Now We Condition on the Effect (Traffic Fatalities)
Regression #2: Speed(t) = .01 (.002) - .11 (.051) * Alcohol Consumption(t) + .15 (.046) * Traffic Fatalities(t)
Here conditioning on the common effect makes the two causes dependent (note: |-.11/.051| > 2.0).

Example of a Causal Chain
In another example, consider the relationship among GDP, Poverty and Malnutrition. Based on World Bank data for 80 less developed countries, we find:
GDP → Poverty → Malnutrition
We expect, from the directed graph theory given above, that Malnutrition and GDP will be related in unconditional tests. However, if we condition on Poverty they should be unrelated. Let's see!

Regressions with Causal Chains
Regression #1 (for i = 1, ..., 80 countries): Malnutrition(i) = 24.18 (1.91) - .003 (.0006) * GDP(i)
Note that the reported t-ratio of -5.38 (-.003/.0006 = -5.0 with the rounded figures shown) suggests that GDP is an important variable in moving levels of malnutrition.

Regressions with Causal Chains, continued
Regression #2 (for i = 1, ..., 80 countries): Malnutrition(i) = 7.52 (2.09) - .0013 (.0007) * GDP(i) + .289 (.055) * Poverty(i)
Note that the reported t-ratio of -1.78 (-.0013/.0007 ≈ -1.86 with the rounded figures shown) suggests, if we test at the 5% level, that GDP is not informative with respect to malnutrition once we have information about a country's poverty level.

Markov Property
Key to understanding these ideas is that d-separation allows us to write the probability of our variables X, Y and Z as the product of the conditional probabilities of each variable (X, Y or Z), where the conditioning factor is the immediate parents of each variable. We do not have to condition on grandparents, great-grandparents, aunts, uncles or children. (It is helpful and valid to refer to genealogical analogies when thinking about conditioning information.)

Some probabilities
The following directed graphs have these associated probability factorizations:
- A → B ← C: Pr(A, B, C) = Pr(A) Pr(C) Pr(B | C, A)
- D → E → F: Pr(D, E, F) = Pr(D) Pr(E | D) Pr(F | E)
- G → H → I ← J: Pr(G, H, I, J) = Pr(G) Pr(J) Pr(H | G) Pr(I | J, H)
- P and Q with no edge between them: Pr(P, Q) = Pr(P) Pr(Q)
Here Pr(.) refers to the probability of the variable(s) in parentheses.

Adjustment Problem (from Pearl 2000)
What must I measure if I want to know how X affects Y?
[Figure: original causal graph illustrating the “adjustment problem”, with vertices Z1 through Z11, X and Y]

D-Separation is Key to Solving the Adjustment Problem
Ask the question: can I get back to Y via the ancestors of X without running into converging arrows? Yes! I can take several paths from X to Y through X's ancestors:
- X - Z3 - Z1 - Z4 - Z7 - Y
- X - Z6 - Z4 - Z7 - Y
- X - Z6 - Z4 - Z2 - Z5 - Z9 - Y
- X - Z6 - Z4 - Z2 - Z7 - Y
I have to condition on variables to “block” the paths back to Y from X. There are several possibilities; one appears to be the pair Z7 and Z9. Below we give six steps for solving the “adjustment problem”.

Step 1. Z7 and Z9 should be non-descendants of X
[Figure: adjustment-problem graph]
Z11 will not work, as it is a child of X.

Step 2. Delete all non-ancestors of X, Y and Z
[Figure: the graph after Step 2]
Here Z is the set of candidate “blocking” variables, Z = {Z7, Z9}.

Step 3. Delete all arcs emanating from X
[Figure: the graph after Step 3]
Here we remove the X → Z11 edge, as Z11 is a child of X.

Step 4. Connect any two parents sharing a common child
[Figure: the graph after Step 4]
Here we use dotted lines to connect parents with a common child.

Step 5. Strip arrowheads from all edges
[Figure: the graph after Step 5]

Step 6. Delete lines into and out of Z7 and Z9
[Figure: the graph after Step 6; we cannot get from X to Y]
Here we delete all lines attached to the variables that we wish to condition on, Z7 and Z9.

Test
If X is disconnected from Y in the remaining graph, then Z7 and Z9 are sufficient measurements to condition on. By “disconnected” we mean that we cannot get from X to Y via the remaining lines. Z7 and Z9 pass the test, so we can perform an OLS regression of Y on X, Z7 and Z9 to find an unbiased estimate of the effect of X on Y (assuming the relationships are linear).

Another candidate: Let's try Z4 all by itself
If we try just Z4 as the sole candidate variable to condition on, our last figure is amended as follows:
[Figure: the graph after deleting the lines attached to Z4 instead of Z7 and Z9]
Clearly Z4 will not work: X can still reach Y via the remaining lines.
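The six steps and the final test can be sketched in Python as a check on a candidate conditioning set. Because the arrows in the slides' figure did not survive, the example runs on a small hypothetical graph instead, stored as a parent-to-children dictionary (a representation chosen for this sketch). `passes_adjustment_test` is an illustrative helper written here, not a library function.

```python
from itertools import combinations

# Hypothetical DAG for illustration only (NOT the graph on the slides),
# stored as parent -> list of children.
GRAPH = {
    "Z1": ["Z3", "X"],
    "Z2": ["Z3", "Y"],
    "Z3": ["X", "Y"],
    "X": ["Z4"],
    "Z4": ["Y"],
}

def descendants(graph, node):
    """Every vertex reachable from `node` by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def ancestors(graph, nodes):
    """`nodes` together with every vertex that has a directed path into them."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for parent, children in graph.items():
            if parent not in result and result & set(children):
                result.add(parent)
                changed = True
    return result

def passes_adjustment_test(graph, x, y, z):
    """Hypothetical helper: apply Steps 1-6 and the final test to candidate set z."""
    z = set(z)
    # Step 1: every candidate must be a non-descendant of X.
    if z & descendants(graph, x):
        return False
    # Step 2: keep only X, Y, Z and their ancestors.
    keep = ancestors(graph, {x, y} | z)
    # Step 3: drop the arcs emanating from X.
    arcs = [(p, c) for p, children in graph.items() for c in children
            if p in keep and c in keep and p != x]
    # Steps 4 and 5: connect parents that share a child, then forget arrow directions.
    lines = {frozenset(arc) for arc in arcs}
    parents_of = {}
    for p, c in arcs:
        parents_of.setdefault(c, set()).add(p)
    for parents in parents_of.values():
        for a, b in combinations(sorted(parents), 2):
            lines.add(frozenset((a, b)))
    # Step 6: delete every line into or out of a conditioning variable.
    lines = {line for line in lines if not (line & z)}
    # Test: search outward from X; Z passes if Y cannot be reached.
    neighbours = {}
    for a, b in (tuple(line) for line in lines):
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {x}, [x]
    while stack:
        for nb in neighbours.get(stack.pop(), ()):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return y not in seen

print(passes_adjustment_test(GRAPH, "X", "Y", {"Z1", "Z3"}))  # True
print(passes_adjustment_test(GRAPH, "X", "Y", {"Z3"}))        # False
print(passes_adjustment_test(GRAPH, "X", "Y", {"Z4"}))        # False (child of X)
```

In this hypothetical graph, Z3 alone fails because it is a collider on the path X ← Z1 → Z3 ← Z2 → Y: conditioning on it opens that path (the line Z1 - Z2 added in Step 4 is how the sketch detects this), and adding Z1 closes it again. Checking connectivity in the undirected graph avoids listing every path from X to Y by hand.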
