
under consideration converge to the true answer to the problem at all, cycle efficiency is equivalent to Ockham’s vertical razor and reversal efficiency is equivalent to Ockham’s horizontal razor. So efficiency in both senses is equivalent to the very plausible conjunction of Ockham’s vertical razor and Ockham’s horizontal razor.
When Close Is Not Good Enough
In spite of all that has been said, it remains tempting to conclude, with the machine learning community, that complex effects do not really matter, if they are small. Who cares, for example, about missing tiny terms in a polynomial law? After all, it is routine in physics to expand a function in a Taylor series and to truncate all but the first few terms. We concede the point, if one merely wishes to predict what a passive observer would see. We also concede the point when one wishes to predict the results of an action or policy from experimental data, which are sampled from the modified world that the policy would produce. However, we disagree resolutely if one wishes to infer the effects of a policy or action from non-experimental data, as, for example, when a corporation hires a machine learning firm to mine customer data as a guide to designing new retail displays. Then, the planned policy may perturb the system under study, invalidating the usual machine learning claims about predictive accuracy from samples drawn from the non-perturbed system. Evidently, the causal truth can matter, even in (or especially in) the most mundane of contexts. For example, there is a statistical link (correlation) between ashtray frequency and lung cancer. One can estimate that link very accurately using standard statistical and machine learning techniques. Yet taking away the ashtrays does not cure lung cancer; instead, it destroys the link. That is why the causal truth matters for policy. The familiar moral: correlation between X and Y tells one nothing about the nature of the causal relationship between X and Y, which could be that X causes Y, that Y causes X, or that there is a confounding cause of both X and Y, as in this example.
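To make the ashtray example concrete, here is a minimal simulation sketch in Python, under an assumed toy model (the variable names, probabilities, and sample size are illustrative, not drawn from the text): a confounder S (smoking) drives both X (ashtray presence) and Y (lung cancer), so X and Y are correlated in observational data, yet removing the ashtrays by intervention would leave the cancer rate untouched.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # S: the confounder (smoking), which drives both X and Y.
    s = rng.random(n) < 0.3
    # X: ashtray present; caused by S only.
    x = rng.random(n) < np.where(s, 0.9, 0.1)
    # Y: lung cancer; also caused by S only, never by X.
    y = rng.random(n) < np.where(s, 0.2, 0.01)

    # Observational correlation between ashtrays and cancer: clearly positive.
    print(np.corrcoef(x.astype(float), y.astype(float))[0, 1])

    # Y was generated from S alone, so an intervention do(X = 0) that removes
    # every ashtray would leave this cancer rate exactly as it is; it would
    # only destroy the X-Y correlation.
    print(y.mean())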
It is, therefore, a potentially revolutionary discovery that causal orientation can be inferred from patterns of correlation in non-experimental data over three or more variables (Spirtes, Glymour, and Scheines 1993; Pearl 2009). Here is the basic idea. Consider a faucet Z governed by two handles X and Y (e.g., hot and cold) and suppose that is the complete causal story. Then the setting of handle X tells you nothing about the setting of handle Y; neither handle is causally connected to the other. But given the flow Z out of the faucet, the setting of handle X provides a great deal of information about the setting of handle Y. The moral: joint causes X, Y of a common effect Z are statistically independent, but become statistically dependent, given the common effect.
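The common-effect pattern is easy to check by simulation. Here is a minimal sketch assuming a simple linear model (the noise level and the conditioning value are illustrative choices): the handles are set independently, so they are uncorrelated; but among cases with nearly the same flow Z, knowing X pins down Y almost exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    x = rng.normal(size=n)                 # handle X, set independently
    y = rng.normal(size=n)                 # handle Y, set independently
    z = x + y + 0.1 * rng.normal(size=n)   # flow Z: common effect of X and Y

    # Unconditionally, one handle setting carries no information about the other.
    print(np.corrcoef(x, y)[0, 1])         # approximately 0

    # Condition on the flow by restricting attention to cases with nearly the
    # same Z. Now X + Y is pinned near 1, so X and Y become strongly dependent.
    mask = np.abs(z - 1.0) < 0.05
    print(np.corrcoef(x[mask], y[mask])[0, 1])   # strongly negative

Selecting a narrow band of Z values is a crude stand-in for exact conditioning, but it suffices to exhibit the reversal: independence without the condition, dependence with it.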
Next, suppose that we have a single handle Z that governs two faucets X, Y. That is the dual or opposite situation, in which we have a common cause of two effects. Then information about the flow from X provides information about the flow from the other faucet Y, because the flow from one faucet provides information about the setting of the handle Z, which governs both faucets. But given the setting of the handle, the flow from one faucet provides no further information about the flow from the other. Notice that the conditional and unconditional dependencies are exactly opposite to those in the common effect case.
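The dual case can be sketched under the same kind of assumed linear model: a common cause Z makes the two flows correlated, while conditioning on a narrow band of Z makes the residual dependence vanish.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    z = rng.normal(size=n)              # handle Z: the common cause
    x = z + 0.3 * rng.normal(size=n)    # flow from faucet X, driven by Z
    y = z + 0.3 * rng.normal(size=n)    # flow from faucet Y, driven by Z

    # Unconditionally, one flow is informative about the other (via Z).
    print(np.corrcoef(x, y)[0, 1])      # strongly positive (about 0.9 here)

    # Given the handle setting, the flows carry no further information about
    # each other: within a narrow band of Z, the correlation vanishes.
    mask = np.abs(z - 0.5) < 0.05
    print(np.corrcoef(x[mask], y[mask])[0, 1])   # approximately 0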
It remains for us to consider causal chains. Suppose that there is one handle X that governs a faucet Z that pours into a funnel Y. Then the setting of the handle X