Page 154 - MODES of EXPLANATION

provides information about the flow out of the funnel Y, because X provides information about the
flow into the funnel Z. But given the flow out of the faucet Z, the setting of the handle X provides
no extra information about the flow from the funnel Y. That is the same dependency profile that is
associated with the common cause. By symmetry, the result is the same if we swap X for Y. So the
common effect pattern of conditional and non-conditional dependencies differs empirically from
the other three cases. Recognizing the common effect pattern in the data can potentially yield
genuine causal knowledge from cheap, abundant, moral, non-experimental data, as long as one
examines at least three variables.
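The dependency profiles above can be illustrated with a small simulation. This is a minimal sketch under assumptions not in the text: each variable is linear-Gaussian, "dependence" is read off a sample correlation, and "dependence given Z" is read off the linear partial correlation of X and Y given Z. The model functions, the sample size, and the crude cutoff `tol` are all hypothetical. A real causal discovery method would use a proper statistical independence test instead of a fixed threshold.

```python
import math
import random
from typing import List, Tuple

def corr(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x: List[float], y: List[float], z: List[float]) -> float:
    """Correlation of X and Y after linearly adjusting both for Z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def chain(rng: random.Random) -> Tuple[float, float, float]:
    # handle X -> faucet flow Z -> funnel outflow Y; returns (X, Y, Z)
    x = rng.gauss(0, 1)
    z = x + rng.gauss(0, 1)
    y = z + rng.gauss(0, 1)
    return x, y, z

def common_cause(rng: random.Random) -> Tuple[float, float, float]:
    # X <- Z -> Y
    z = rng.gauss(0, 1)
    return z + rng.gauss(0, 1), z + rng.gauss(0, 1), z

def common_effect(rng: random.Random) -> Tuple[float, float, float]:
    # X -> Z <- Y
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    return x, y, x + y + rng.gauss(0, 1)

def profile(model, n: int = 20_000, seed: int = 0, tol: float = 0.05) -> Tuple[bool, bool]:
    """Return (X, Y look dependent?, X, Y look dependent given Z?)."""
    rng = random.Random(seed)
    rows = [model(rng) for _ in range(n)]
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    zs = [r[2] for r in rows]
    return abs(corr(xs, ys)) > tol, abs(partial_corr(xs, ys, zs)) > tol

for name, model in [("chain", chain), ("common cause", common_cause),
                    ("common effect", common_effect)]:
    print(name, profile(model))
```

Under these assumptions, the chain and the common cause both show X and Y dependent but independent given Z. The common effect shows the reverse pattern, which is why only it stands out in three-variable data.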
Of course, all of that depends on the variables in question providing a complete causal
description of the situation, a condition that itself requires some heavy lifting from Ockham. It
also assumes that causal paths do not cancel perfectly: no inductive method can win against a
perfect illusion. But even given those assumptions, dependence is an effect: it is verifiable from
data, but not refutable (because the dependence could be arbitrarily small), so the above logic
concerning the problem of induction, simplicity, Ockham’s razor, and reversals of opinion applies.
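Why dependence is verifiable but not refutable can be made concrete with a rough sample-size calculation. This sketch assumes the standard large-sample Fisher z approximation, under which the standard error of atanh(r) is about 1/sqrt(n - 3). It also assumes a hypothetical detection rule of two standard errors. The function name and the cutoff `z_crit` are illustrative only.

```python
import math

def min_sample_size(rho: float, z_crit: float = 2.0) -> int:
    """Rough sample size at which a true correlation `rho` sits about `z_crit`
    standard errors from zero on the Fisher z scale (SE ~ 1 / sqrt(n - 3))."""
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be a nonzero correlation strictly between -1 and 1")
    return math.ceil(3 + (z_crit / math.atanh(abs(rho))) ** 2)

for rho in (0.1, 0.01, 0.001):
    print(rho, min_sample_size(rho))
```

A tenfold weaker dependence needs roughly a hundredfold more data (about 400, 40,000, and 4,000,000 observations here). So no finite sample, however large, rules out every nonzero dependence, while a genuine dependence of fixed size is eventually detected.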
It can be shown (Kelly and Mayo-Wilson 2010) that inferring causal directionality from
non-experimental data is subject to the kind of forcible reversals of opinion that were discussed
above in connection with the polynomial degree problem. No matter how strong a given causal
connection happens to be, you can never really guard against discovering new, arbitrarily small
effects that cause you to flip the orientation of that connection any finite number of times, given
that you can converge to its true orientation at all. Skepticism is one response to our argument,
but it is a luxury: sometimes one must make a policy decision, and experimental data will not be
forthcoming. The right response is that unavoidable reversals of opinion are justified because they
are unavoidable, and avoidable reversals are not justified because they are avoidable. The best
possible methods for causal discovery from non-experimental data are, therefore, those that
minimize causal reversals. And which methods are those? The Ockham efficiency theorem says: the
Ockham ones.
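The idea of forced reversals can be sketched as a toy effect-counting game. In this game, nature decides when to reveal each new effect, and we count how often a method changes its conjecture. Everything here is a hypothetical illustration, not the construction in Kelly and Mayo-Wilson: the method interface, the `make_impatient` rival, and nature's strategy of revealing an effect only while the method bets on the simplest answer. The sketch does not prove the efficiency theorem. It only shows how a method that runs ahead of the data can be made to pay an extra reversal per effect, while the Ockham method pays exactly one.

```python
from typing import Callable, List

# Hypothetical interface: a method maps (effects seen so far, stages since the
# most recent new effect) to a conjectured total number of effects.
Method = Callable[[int, int], int]

def ockham(seen: int, since_last: int) -> int:
    """Conjecture exactly the effects the data already require."""
    return seen

def make_impatient(lead: int) -> Method:
    """Hypothetical non-Ockham method: for `lead` stages after each new effect
    (and at the start) it posits one extra, unobserved effect, then falls back
    to the simplest answer so that it still converges to the truth."""
    def method(seen: int, since_last: int) -> int:
        return seen + 1 if since_last < lead else seen
    return method

def play(method: Method, n_effects: int, stages: int = 1_000) -> List[int]:
    """Adversarial data stream: nature reveals a new effect only while the
    method conjectures the simplest answer, up to `n_effects` effects."""
    seen, since_last, conjectures = 0, 0, []
    for _ in range(stages):
        guess = method(seen, since_last)
        conjectures.append(guess)
        if guess == seen and seen < n_effects:
            seen, since_last = seen + 1, 0
        else:
            since_last += 1
    return conjectures

def reversals(conjectures: List[int]) -> int:
    """Number of times the conjecture changes from one stage to the next."""
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

print(reversals(play(ockham, 3)), reversals(play(make_impatient(2), 3)))  # prints: 3 7
```

Both methods end up conjecturing the true count of 3. The impatient one must retract each premature guess before nature shows the next effect, so it reverses itself 2n + 1 times instead of n.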
Our analysis raises some real machine learning issues for causal discovery algorithms.
There are myriad causal theories, and the simplicity order over such theories branches massively.
Ockham’s horizontal razor is impractical to implement in that setting. However, the Ockham
efficiency theorems have some flexibility in application, because efficiency is relative to the
underlying simplicity concept, which can be understood more or less coarsely. It turns out that the
simplicity order over causal theories is ranked by the total number of individual causal
connections, in the sense that each step along a path in the order amounts to the addition of one
more causal connection between variables (Chickering and Meek 2002). If one thinks of simplicity
degrees as levels in that ranking (i.e., as the total number of causal connections), then Ockham’s
horizontal razor allows one to return the disjunction of the theories of least rank that are
consistent with the data, rather than all theories that are minimal in the order (which could include
many more). Moreover, that strategy is optimal in terms of worst-case reversals over each rank level
(efficiency is relative to what one takes simplicity to be). Finally, there is an attractive trade-off.
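The difference between returning all order-minimal theories and returning only the least-rank theories can be sketched on three variables. The simplifications here are assumptions. A theory is encoded as a set of undirected causal connections. The "simplicity order" is taken to be subset inclusion, which is only a rough stand-in for the order the text describes. The data are assumed to require some chain of connections linking X and Y, which is a hypothetical consistency test. All names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]
Theory = FrozenSet[Edge]  # toy encoding: a theory is a set of causal connections

EDGES: List[Edge] = [("X", "Y"), ("X", "Z"), ("Y", "Z")]

def all_theories(edges: List[Edge]) -> List[Theory]:
    """Every subset of the possible connections."""
    return [frozenset(c) for k in range(len(edges) + 1) for c in combinations(edges, k)]

def connects(theory: Theory, a: str, b: str) -> bool:
    """Hypothetical consistency test: is there a path of connections from a to b?"""
    frontier, reached = [a], {a}
    while frontier:
        node = frontier.pop()
        for u, v in theory:
            if node in (u, v):
                other = v if node == u else u
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return b in reached

def order_minimal(consistent: List[Theory]) -> List[Theory]:
    """Theories with no strictly simpler (proper-subset) consistent rival."""
    return [t for t in consistent if not any(u < t for u in consistent)]

def rank_minimal(consistent: List[Theory]) -> List[Theory]:
    """Theories with the least total number of connections."""
    least = min(len(t) for t in consistent)
    return [t for t in consistent if len(t) == least]

consistent = [t for t in all_theories(EDGES) if connects(t, "X", "Y")]
print(sorted(sorted(t) for t in order_minimal(consistent)))
print(sorted(sorted(t) for t in rank_minimal(consistent)))
```

In this toy, two theories are minimal in the order: the direct connection X-Y, and the pair X-Z, Y-Z. Only the direct connection has least rank, so the rank-based answer is a shorter disjunction and therefore says more.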
The rank version of horizontal Ockham’s razor licenses one to say more and is also easier to