under consideration converge to the true answer to the problem at all, cycle efficiency is equivalent to Ockham’s vertical razor and reversal efficiency is equivalent to Ockham’s horizontal razor. So efficiency in both senses is equivalent to the very plausible conjunction of Ockham’s vertical razor and Ockham’s horizontal razor.

In spite of all that has been said, it remains tempting to conclude, with the machine learning community, that complex effects do not really matter, if they are small. Who cares, for example, about missing tiny terms in a polynomial law? After all, it is routine in physics to expand a function in a Taylor series and to truncate all but the first few terms. We concede the point, if one merely wishes to predict what a passive observer would see. We also concede the point when one wishes to predict the results of an action or policy from experimental data, which are sampled from the modified world that the policy would produce. However, we disagree resolutely if one wishes to infer the effects of a policy or action from non-experimental data, as, for example, when a corporation hires a machine learning firm to mine customer data as a guide to designing new retail displays. Then, the planned policy may perturb the system under study, invalidating the usual machine learning claims about predictive accuracy from samples drawn from the non-perturbed system. Evidently, the causal truth does matter, even in (or especially in) the most mundane of contexts. For example, there is a statistical link (correlation) between ashtray frequency and lung cancer. One can estimate that link very accurately using standard statistical and machine learning techniques. Yet taking away the ashtrays does not cure lung cancer; instead, it destroys the link.

That is why the causal truth matters for policy. The familiar moral: correlation between two variables X and Y tells one nothing about the nature of the causal relationship between X and Y, which could be that X causes Y, that Y causes X, or that there is a confounding cause of both X and Y, as in this example.
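The ashtray case can be made concrete with a small simulation. This is a hypothetical sketch with made-up rates (the variable names and probabilities are illustrative assumptions, not real epidemiological data): smoking is a confounding cause of both ashtray presence and lung cancer, so the observed correlation is strong, yet intervening to remove ashtrays leaves the cancer rate untouched.

```python
import random

random.seed(0)

# Hypothetical confounding structure: smoking S causes both ashtray
# presence A and lung cancer C. All rates below are made up.
def world(intervene_A=None):
    S = random.random() < 0.3                       # 30% of people smoke
    A = S if intervene_A is None else intervene_A   # ashtrays track smoking,
                                                    # unless we intervene on A
    C = random.random() < (0.2 if S else 0.01)      # cancer risk driven by S only
    return A, C

n = 100_000
obs = [world() for _ in range(n)]

# Observed (non-experimental) data: cancer is far more common where there
# are ashtrays, and standard methods would estimate this link accurately.
p_c_given_a = sum(c for a, c in obs if a) / sum(a for a, c in obs)
p_c_given_not_a = sum(c for a, c in obs if not a) / sum(1 for a, c in obs if not a)

# Policy: remove all ashtrays (an intervention on A, not on S).
# The cancer rate is unchanged; only the correlation is destroyed.
post = [world(intervene_A=False) for _ in range(n)]
p_c_after_policy = sum(c for a, c in post) / n

print(p_c_given_a, p_c_given_not_a, p_c_after_policy)
```

The observed conditional rates differ by an order of magnitude, while the post-policy rate simply equals the population cancer rate, because the intervention severs the arrow from S to A without touching the arrow from S to C.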

It is, therefore, a potentially revolutionary discovery that causal orientation can sometimes be inferred from patterns of correlation in non-experimental data over several variables (Spirtes, Glymour, and Scheines 1993; Pearl 2009). Here is the basic idea. Consider a faucet Z governed by two handles X and Y (e.g., hot and cold), and suppose that this is the causal story. Then the setting of handle X tells you nothing about the setting of handle Y; neither handle is causally connected to the other. But given the flow Z out of the faucet, the setting of handle X provides a great deal of information about the setting of handle Y. The moral: joint causes X, Y of a common effect Z are statistically independent, but become statistically dependent, given the common effect.
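The faucet moral can be checked exactly by enumeration. In this sketch the two handles are modeled, for concreteness, as independent fair coins X and Y, and the flow as their common effect Z = X + Y; the exact joint distribution then exhibits unconditional independence of the handles and dependence given the flow.

```python
from itertools import product

# Handles X and Y are independent fair coins; the flow is their common
# effect Z = X + Y (a "collider"). Enumerate the exact joint distribution.
dist = {(x, y, x + y): 0.25 for x, y in product([0, 1], repeat=2)}

def p(pred):
    """Total probability of the event pred(x, y, z)."""
    return sum(pr for (x, y, z), pr in dist.items() if pred(x, y, z))

# Unconditionally, one handle tells us nothing about the other:
p_y1 = p(lambda x, y, z: y == 1)
p_y1_given_x1 = p(lambda x, y, z: x == 1 and y == 1) / p(lambda x, y, z: x == 1)
print(p_y1, p_y1_given_x1)  # 0.5 0.5 -> independent

# Given the flow Z = 1, handle X pins down handle Y exactly:
p_y1_given_z1 = p(lambda x, y, z: z == 1 and y == 1) / p(lambda x, y, z: z == 1)
p_y1_given_x1_z1 = (p(lambda x, y, z: z == 1 and x == 1 and y == 1)
                    / p(lambda x, y, z: z == 1 and x == 1))
print(p_y1_given_z1, p_y1_given_x1_z1)  # 0.5 0.0 -> dependent given Z
```

Conditioning on Z = 1 means exactly one handle is open, so learning that X = 1 forces Y = 0: the unconditional independence is reversed by conditioning on the common effect.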

Next, suppose that we have a single handle X that governs two faucets Y, Z. That is the dual or opposite situation, in which we have a common cause of two effects. Then information about the flow from one faucet Y provides information about the flow from the other faucet Z, because the flow from one faucet provides information about the setting of the handle X, which governs both faucets. But given the setting of the handle, the flow from one faucet provides no further information about the flow from the other. Notice that the conditional and unconditional dependencies are exactly the opposite of those in the common effect case. It remains for us to consider causal chains. Suppose that there is one handle X that governs a faucet Y that pours into a funnel Z. Then the setting of the handle
