Course Lectures, Spring 2017

Calling Bull, Spring 2017

We will record and post all ten lectures from our Spring 2017 course. Each will be presented here as it becomes available. We have divided each lecture into a set of shorter segments; each segment should more or less stand on its own. The full playlist of all course videos is available on the UW Information School's YouTube channel.

Lecture 1: An Introduction to Bull

March 29, 2017

1.1 Introduction to Bull.
Bull is everywhere, and we've had enough. We want to teach people to detect and defuse bull wherever it may arise.

1.2 Calling Bull on Ourselves.
Jevin uses data graphics to boast about explosive growth at our website — and Carl calls bull. Old-school bull versus new-school bull.

1.3 Brandolini's Bull Asymmetry Principle.
Brandolini's principle: "The amount of effort necessary to refute bull is an order of magnitude bigger than to produce it."

1.4 Classroom Discussion.
Students discuss: What is bull anyway?

1.5 The Philosophy of Bull.
How do we define bull? Does intention matter? Calling bull as a speech act.

Lecture 2: Spotting Bull

April 5, 2017

2.1 Spotting Bull.
Jevin discusses some ways to spot bull and challenges students to tell whether four nuggets of wisdom from the internet are true or bull.

2.2 Sounds Too Good to be True.
If a claim seems too good — or too bad — to be true, it probably is. An example involving recommendation letters, and the perils of confirmation bias.

2.3 Entertain Multiple Hypotheses.
The importance of generating and considering multiple alternative hypotheses. As an example, we consider why men cite themselves more than women do.

2.4 Fermi Estimation.
Using Fermi estimation to check the plausibility of claims, with an example of food stamp fraud. This example is treated in further detail in one of our case studies.

2.5 Unfair Comparisons.
In this segment on unfair comparisons, Carl explains why St. Louis and Detroit are not quite as bad as clickbait "most dangerous cities" lists portray them to be, and looks at the silly arguments over attendance at Trump's inauguration. Also: how to call bull on algorithms and statistics without a PhD in machine learning or statistics.

2.6 Assignment: Bull Inventory.
In our first assignment, we ask students to keep a week-long inventory of the bull they encounter, create, and debunk.

Lecture 3: Correlation and Causation

April 12, 2017

3.1 Correlation and Causation
Correlations are often used to make claims about causation. Be careful about the direction in which causality goes. For example: do food stamps cause poverty?

3.2 What are Correlations?
Jevin provides an informal introduction to linear correlations.
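As a rough companion to this segment, here is a minimal Python sketch of Pearson's correlation coefficient r, which we assume is the kind of linear correlation the segment introduces. The function name `pearson_r` and the toy inputs are illustrative only. The last call shows a perfect but nonlinear relationship (y = x²) that scores zero, a reminder that r measures only linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))         # 1.0: perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))         # -1.0: perfect negative line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) # 0.0: y = x**2 is not linear
```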

3.3 Spurious Correlations?
We look at Tyler Vigen’s silly examples of quantities that appear to be correlated over time, and note that scientific studies may accidentally pick up on similarly meaningless relationships.

3.4 Correlation Exercise
When is correlation all you need, and causation is beside the point? Can you figure out which way causality goes for each of several correlations?

3.5 Common Causes
We explain how common causes can generate correlations between otherwise unrelated variables, and look at the correlational evidence that storks bring babies. We also consider the need to think about multiple contributing causes. Finally, the fallacy of post hoc ergo propter hoc: the mistaken belief that if two events happen sequentially, the first must have caused the second.

3.6 Manipulative Experiments
We look at how manipulative experiments can be used to work out the direction of causation in correlated variables, and sum up the questions one should ask when presented with a correlation.

Lecture 4: Statistical Traps and Trickery

April 19, 2017

4.1 Right Censoring
We look at a graph of age at death for musicians in different genres, and use this to illustrate the problem of right-censored data. We consider this issue in further detail in one of our case studies.
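Right censoring can be reproduced with entirely made-up numbers. In this sketch every genre shares the same hypothetical lifespans, but anyone still alive in the study year drops out of the data. The newer genre then appears to die far younger, purely because its members have not yet had the chance to die old. The study year, lifespans, and birth years are illustrative assumptions, not data from the lecture.

```python
from statistics import mean

STUDY_YEAR = 2017                  # deaths after this year are not yet observed
LIFESPANS = [30, 45, 60, 75, 90]   # hypothetical: the SAME lifespans in every genre

def observed_mean_age_at_death(birth_years):
    """Mean age at death among musicians who died by STUDY_YEAR.

    Musicians still alive in STUDY_YEAR are right-censored: we never see their
    eventual age at death, so they silently drop out of the average.
    """
    observed = [age for born in birth_years for age in LIFESPANS
                if born + age <= STUDY_YEAR]
    return mean(observed)

old_genre = [1910, 1920, 1925]   # hypothetical birth years for an older genre
new_genre = [1975, 1980, 1985]   # hypothetical birth years for a newer genre

print(observed_mean_age_at_death(old_genre))  # 60: every death has been observed
print(observed_mean_age_at_death(new_genre))  # 30: only the early deaths are visible
```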

4.2 Means and Medians
Simple as it may sound, the difference between mean and median values offers fertile ground for cooking up misleading statistics.
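A quick illustration of how a single outlier separates the two: with hypothetical incomes for ten people, one of whom is a billionaire, the mean suggests a wildly rich group while the median still describes the typical person. The numbers are invented for illustration.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands of dollars; the last person is a billionaire.
incomes = [30, 35, 40, 42, 45, 48, 50, 55, 60, 1_000_000]

print(mean(incomes))    # 100040.5: "average income over $100 million"
print(median(incomes))  # 46.5: what a typical member of the group earns
```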

4.3 p-Values and the Prosecutor’s Fallacy
Carl presents what he thinks may be one of the most important segments in the whole course: a discussion of the prosecutor’s fallacy. This logical fallacy is not limited to the courtroom: it underlies a very common misinterpretation of the p values associated with scientific experiments.
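The core of the fallacy is confusing P(evidence | innocent) with P(innocent | evidence). Here is a minimal Bayes' rule sketch under assumed conditions: exactly one of a hypothetical 10 million possible sources is guilty, the guilty person always matches, everyone is equally likely a priori, and an innocent person matches by chance with probability one in a million. The same confusion appears when a p-value, P(data this extreme | null hypothesis), is read as P(null hypothesis | data).

```python
def prob_guilty_given_match(p_match_if_innocent, n_possible_sources):
    """P(guilty | match), assuming exactly one of n_possible_sources is guilty,
    the guilty person always matches, and everyone is a priori equally likely."""
    prior_guilty = 1 / n_possible_sources
    prior_innocent = 1 - prior_guilty
    # Bayes' rule: P(G|M) = P(M|G) P(G) / [P(M|G) P(G) + P(M|I) P(I)], with P(M|G) = 1
    numerator = 1.0 * prior_guilty
    denominator = numerator + p_match_if_innocent * prior_innocent
    return numerator / denominator

# A "one in a million" match sounds damning, but about 10 innocent people
# in a pool of 10 million would also match.
p = prob_guilty_given_match(1e-6, 10_000_000)
print(round(p, 3))  # 0.091: roughly 1 in 11, nowhere near "999,999 in a million"
```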

4.4 The Will Rogers Effect
Will Rogers purportedly quipped that when the Okies left Oklahoma for California, they raised the average intelligence in both states. The same phenomenon can arise in epidemiology and a host of other areas.
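The trick works whenever the movers score below the average of the group they leave but above the average of the group they join. In epidemiology, for example, reclassifying the healthiest patients of a sicker group into a healthier one can raise the apparent survival of both. This sketch uses invented scores and a hypothetical `move` helper; the overall average does not change at all.

```python
from statistics import mean

def move(value, source, dest):
    """Return new (source, dest) lists after moving one occurrence of value."""
    new_source = list(source)
    new_source.remove(value)
    return new_source, dest + [value]

# Hypothetical scores. The mover (6) is below Oklahoma's mean but above California's.
oklahoma = [6, 7, 8, 9]
california = [1, 2, 3, 4, 5]
print(mean(oklahoma), mean(california))   # 7.5 3

ok_after, ca_after = move(6, oklahoma, california)
print(mean(ok_after), mean(ca_after))     # 8 3.5: both averages rise

# Nobody got smarter: the pooled average is identical before and after.
print(mean(oklahoma + california) == mean(ok_after + ca_after))  # True
```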

4.5 Jevin's Turn
Jevin goes looking for bull and finds it — in Carl’s textbook. Jevin calls bull on Carl’s use of track and field records by age to illustrate senescence, and Carl tries to explain himself. This example is described further in another of our case studies.

Lecture 5: Big Data

April 26, 2017

5.1 Big Data Introduction
We briefly introduce big data and share a few of the cautionary tales surrounding this recent phenomenon. Beware of those ponies…

5.2 Garbage In, Garbage Out
You don’t need a PhD in statistics or machine learning to call bull on big data. Simply focusing on the input data and the results is often sufficient to refute a claim.

5.3 Big Data Hubris
We discuss the Google Flu Trends project and how it went from being a poster child for big data to providing an important cautionary tale.

5.4 Overfitting
We examine overfitting, the Achilles heel of machine learning. We illustrate overfitting visually and consider what to look out for.

5.5 Criminal Machine Learning
A recent paper claims that machine learning can determine whether or not you are a criminal from a photograph of your face. That's bull. This example is described further in one of our case studies.

5.6 Algorithmic Ethics
We discuss gender and racial biases inherent to many of the machine learning algorithms and recommender systems prevalent in today’s technology, and encourage others to call bull on machine injustice.

Lecture 6: Data Visualization

May 3, 2017

6.1 Dataviz in the Popular Media
Until recently, the popular media made minimal use of sophisticated data visualization. People have not necessarily had time to hone their bull detectors for application to data graphics.

6.2 Misleading Axes
One of the most common abuses of data visualization involves inappropriate ranges on the dependent variable (y) axis. Carl looks at a series of examples and explains why bar charts should include zero, whereas line graphs need not (and often should not) do so. This example is treated in further detail in one of our articles.

6.3 Manipulating Bin Sizes
By binning data in different ways, bar charts can be made to tell very different stories. Here we consider an example from the Wall Street Journal.

6.4 Dataviz Ducks
Edward Tufte uses the term “ducks” to refer to data graphics that put style ahead of substance. We explain why, and explore a number of examples.

6.5 Glass Slippers
We propose the term “glass slipper” to describe data visualizations in which the designer has taken a beautiful data design intended for very specific situations and tried to shoehorn entirely inappropriate types of data into it. Carl considers examples including a periodic table of data science, a subway map of corporate acquisitions, a phylogenetic tree of internet marketing, and numerous Venn diagrams.

6.6 The Principle of Proportional Ink
Our principle of proportional ink states that when a shaded region is used to represent a numerical value, the area of that shaded region should be directly proportional to the corresponding value. We look at graphs that violate this principle and discuss how such violations can be misleading. This example is treated in further detail in one of our articles.
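For a bar chart with equal bar widths, the shaded area is proportional to bar height, so the principle holds only if bars start at zero. This small sketch, with hypothetical values and a hypothetical `ink_ratio` helper, compares the ratio of the data with the ratio of ink actually drawn when the axis starts somewhere else.

```python
def ink_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of bar heights (and, for equal widths, ink) when the axis starts at axis_min."""
    return (value_a - axis_min) / (value_b - axis_min)

a, b = 52.0, 50.0  # hypothetical values that differ by 4%

print(a / b)                  # 1.04: the true ratio of the values
print(ink_ratio(a, b))        # 1.04: bars drawn from zero respect proportional ink
print(ink_ratio(a, b, 49.0))  # 3.0: an axis starting at 49 makes one bar look 3x bigger
```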

Lecture 7: Publication Bias

May 10, 2017

7.1 Duck hunting
For last week’s homework assignment, students searched for examples of “duck” and “glass slipper” data visualizations. Carl and Jevin look at a few of the best finds.

7.2 Science is amazing, but…
Science is probably the greatest human invention of all time, but that doesn’t mean it doesn’t come with its share of bull.

7.3 Reproducibility
Jevin discusses how spreadsheet errors reversed the conclusions of a high-profile paper that was used to justify austerity measures.

7.4 A Replication Crisis
Scientists have difficulty reproducing a surprisingly large fraction of the published literature. What is going on?

7.5 Publication Bias
Journals prefer to publish positive results, and scientists prefer to submit successful experiments. Because we can typically see only the published literature, this filtering can paint a misleading picture.
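A toy simulation shows how this filter manufactures an effect out of nothing. Everything here is an assumption for illustration: the true effect is zero, each study samples 20 observations with known standard deviation 1, and a hypothetical journal publishes only studies whose result is positive and significant at the two-sided 5% level. Across all studies the average estimate sits near zero; across the published ones it is comfortably positive.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(42)
TRUE_EFFECT = 0.0   # hypothetical: the treatment does nothing
N_PER_STUDY = 20
N_STUDIES = 2000
SE = 1 / math.sqrt(N_PER_STUDY)        # standard error with known sd = 1
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 5% test

all_estimates, published = [], []
for _ in range(N_STUDIES):
    sample = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_STUDY)]
    est = mean(sample)
    all_estimates.append(est)
    # Hypothetical journal: only "positive, significant" results get published.
    if est / SE > z_crit:
        published.append(est)

print(f"{len(published)} of {N_STUDIES} studies published")
print(f"mean estimate, all studies:       {mean(all_estimates):.3f}")
print(f"mean estimate, published studies: {mean(published):.3f}")
```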

7.6 Science is not bull
The subject matter of today’s lecture notwithstanding, science generally works pretty darn well. We can build airplanes and iPhones and save lives with antibiotics and vaccines, after all. Carl looks at five reasons why this is true.