In Part 1 of this question, we explored how the correlation coefficient is calculated, and how that calculation relies heavily on the covariance between two quantitative variables. We left off with a few questions: why is r bounded between -1 and 1, and why does a value of r near 0 indicate a weak association (and near an extreme indicate a strong one)? In this post, we will answer these questions!
A student asked me a really interesting question recently; a pair of questions, really. We had just discussed the correlation coefficient as a measure of the direction/strength of a linear association between two quantitative variables, and I demonstrated in class that this quantity, referred to by the letter r, can be found by the formula r = Σ(z_x · z_y) / (n-1), where z_x and z_y are the z-scores of a point’s x- and y-coordinates and n is the number of points.
In other words, for each point of a scatterplot, find the z-score for the x-coordinate and the y-coordinate of that point and multiply those together. Do this for all of the points in your scatterplot, add them together, and divide by n-1 to get your correlation coefficient.
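Here is a minimal sketch of that recipe in Python, in case seeing it as code helps. The function names and the hours/scores data are made up for illustration; only the steps (z-scores, multiply, add, divide by n-1) come from the description above.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n-1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation coefficient r computed from z-scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    # For each point: z-score of x times z-score of y, then add them all up.
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    # Divide by n-1 to get r.
    return total / (n - 1)


# Made-up data: hours studied vs. test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]
r = correlation(hours, scores)
print(round(r, 3))
```

If you feed it points that fall exactly on a rising line, such as x = 1, 2, 3, 4 and y = 2, 4, 6, 8, it should return 1 (up to rounding), and a falling line should return -1.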
We discussed various properties of this quantity, and my student asked me that question that teachers always hope for (if not without a bit of dread sometimes!): “Why?” Why does this formula produce a quantity that measures the strength of a linear association? Also, why must the value of r necessarily be bounded between -1 and 1? In this post, I seek to start an answer to these questions.
The thing of it is, he makes some really good points with this post. A lot of the things we learn in high school took mathematicians centuries to come to terms with. The concept of a complex or imaginary number, i = sqrt(-1), wasn’t really accepted until the 18th century (why do you think they’re called “imaginary numbers,” after all?). So don’t despair if it takes you a little while to understand something in math class. Mathematicians of old died before they could come to terms with it!
Correction: The cost of purchasing every possible ticket combination was miscalculated in the previous version of this post. It has been changed to the correct value.
Every so often, the news media becomes all abuzz when a particular lottery jackpot starts to grow really large. Right now is one of those times, with no winner on Saturday putting the jackpot for Wednesday’s drawing at around $1.3 billion, the largest lottery jackpot in US history.
My students sometimes ask me, as a math teacher and a guy who “knows numbers,” whether I play the lottery. Usually I just smile and tell them I buy the occasional scratch ticket for the fun of it, but almost never anything beyond that. It would require a “special occasion” or a “huge jackpot” for me to consider buying one.
This certainly seems like one of those special occasions.
To understand how to approach this question from a math standpoint, we first need to understand the probability of winning.
No, nothing about Homer or OJ (is that too much of a nineties reference?), this paradox is about a statistical phenomenon where analysis of pooled data can lead a researcher to a conclusion in direct contradiction to the one that the unpooled data would lead to. There have been several prominent examples of Simpson’s Paradox arising in areas of college admissions, treatment of kidney stones, and baseball batting averages.
The gist is this: Say you need to have a major operation done and there are two hospitals in your town where you could have it. You’re worried about post-surgery complications, so you do some research into the hospitals and find that in the past year, patients at the larger hospital suffered post-surgery complications in 130 out of 1000 cases, and patients at the smaller hospital suffered complications in only 30 out of 300. Based on these results, it looks like the smaller hospital is the better bet: only 10% of patients had complications after surgery there versus 13% at the larger hospital.
However, not all surgeries have the same rate of complications. Relatively minor surgeries are less invasive and would probably result in a lower complication rate. With that in mind, you look further at the data and find that, at the large hospital, 120 out of the 800 major surgery patients experienced complications compared to 10 out of 200 minor surgery patients, and at the small hospital, 10 of the 50 major surgery patients suffered complications compared to 20 out of 250 minor surgery patients. In other words, broken down by type of surgery, the complication rates at the large hospital were 15%/5% for major/minor surgeries while the small hospital saw rates of 20%/8%. We see now that the larger hospital has a lower rate of complication across the board, regardless of the type of procedure done.
So why the different conclusion? It has to do with how many of both types of procedures the hospitals did. The vast majority of the larger hospital’s 1000 surgeries in the last year were major surgeries, which have higher complication rates across the board. The majority of the smaller hospital’s 300 surgeries were more minor procedures, which generally have lower rates of complication. As a result of this imbalance, the overall, pooled complication rates for the two hospitals are biased: the larger hospital towards a higher rate and the smaller hospital towards a lower rate. So it only appears that the smaller hospital has a lower complication rate because most of the surgeries performed there are less likely to have complications.
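If you want to check the arithmetic yourself, here is a small Python sketch using the hospital numbers from the example. The dictionary layout and function names are just my own choices for illustration.

```python
# (complications, patients) for each hospital and surgery type,
# taken from the example above.
surgeries = {
    "large": {"major": (120, 800), "minor": (10, 200)},
    "small": {"major": (10, 50), "minor": (20, 250)},
}


def rate(complications, patients):
    """Complication rate as a fraction between 0 and 1."""
    return complications / patients


def pooled_rate(hospital):
    """Overall rate for a hospital, ignoring the type of surgery."""
    groups = surgeries[hospital].values()
    complications = sum(c for c, _ in groups)
    patients = sum(p for _, p in groups)
    return rate(complications, patients)


for hospital, groups in surgeries.items():
    print(f"{hospital} hospital, pooled: {pooled_rate(hospital):.0%}")
    for kind, (complications, patients) in groups.items():
        print(f"  {kind}: {rate(complications, patients):.0%}")
```

The pooled lines favor the small hospital, while every indented line favors the large one, which is exactly the reversal described above.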
Check out this website for another explanation of Simpson’s Paradox, as well as some clever interactive animations that demonstrate how and why it can arise. It’s an important lesson as consumers of data and statistics: while the saying may go “Less is More,” when it comes to how much detail to include in your research, sometimes less is wrong.
Happy Pi Day, dear readers! Try not to party too hard, and take some time to check out the following links!
- Sweet Number Pi – Pi music video
- One Million Digits of Pi – can you memorize them all?
- Official Guinness World Record for Most Memorized Digits of Pi – the record is 67,890 places!
- Search for Your Birthday in Pi – mine starts at the 2,373,070th decimal place!
- The Tau Manifesto – Pi is probably not correctly defined and should be twice its current value. Many mathematicians call this number “tau” and there is a convincing argument to be made about their point!
- Other Pi Day websites – PiDay.org, PiZone.com
Back in 2012, an opinion piece was written for the New York Times asking Is Algebra Necessary? The piece, written by Andrew Hacker, emeritus professor of political science from Queens College in New York City, suggested that making math education mandatory for all high school students:
Prevents us from discovering and developing young talent. In the interest of maintaining rigor, we’re actually depleting our pool of brainpower.
This post is not to explore the virtues or flaws with Professor Hacker’s arguments, but to point out what many bloggers have recently observed: that the New York Times answered their own question last week in an article about Sony’s controversial movie The Interview.
The movie in question, you may have heard, was pulled from theatrical release on the 25th after the studio’s computer network was hacked and threats were made against theaters showing the Seth Rogen/James Franco comedy that depicts the two actors as journalists asked by the CIA to use an upcoming interview with North Korean president Kim Jong Un as an opportunity to assassinate the dictatorial leader. After public outcry from major Hollywood figures and even President Obama, Sony released the film in independent theaters and online. The December 28th NYT article discusses the amount of money Sony earned off of online sales and rentals, but observes that Sony “did not say” how much of the $15 million revenue was from each source (sales vs. rentals).
It would appear that Algebra was not reporter Michael Cieply’s strongest subject either, as there is enough information in this article to set up and solve a simple system of equations to answer that exact question.
Let s = the number of $15 sales Sony made and r = the number of $6 digital rentals. From the $15 million headline, we can write the equation 15s + 6r = 15 000 000.
The second paragraph also tells us that there were about two million transactions overall. Therefore, we can make the second equation s + r = 2 000 000.
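Solving this system of equations is something any 9th grader can do:
Multiply the second equation by -6 and add vertically: -6s - 6r = -12 000 000 added to 15s + 6r = 15 000 000 gives 9s = 3 000 000, so s ≈ 333 333.
Substitute that value of s back into the original equation s + r = 2 000 000, and you get: r ≈ 1 666 667.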
So there you have it, New York Times. With about 2 minutes of high school-level algebra, we can see that The Interview saw about 300,000 downloads and 1.7 million rentals in its first four days. Maybe you should employ more ninth-graders…
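For anyone who would rather let a computer do the elimination, here is a rough Python sketch of the same system. The function and parameter names are my own; the prices and totals are the ones quoted from the article above.

```python
def solve_sales_and_rentals(total_revenue, total_transactions,
                            sale_price=15, rental_price=6):
    """Solve  sale_price*s + rental_price*r = total_revenue
              s + r = total_transactions
    by elimination."""
    # Multiplying the second equation by -rental_price and adding it to the
    # first cancels r, leaving (sale_price - rental_price) * s on the left.
    sales = (total_revenue - rental_price * total_transactions) / (sale_price - rental_price)
    # Substitute back into s + r = total_transactions.
    rentals = total_transactions - sales
    return sales, rentals


sales, rentals = solve_sales_and_rentals(15_000_000, 2_000_000)
print(round(sales), round(rentals))  # 333333 1666667
```

Since the article's figures are themselves rounded ("about two million transactions"), the answer is only good to a couple of significant digits.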
I ran across this listing on Amazon for A Million Random Digits, a book of random digits used in statistics and other fields for simulations. Clearly, not riveting reading. What is very entertaining, however, are the almost 500 user reviews. Some of the best reviews include one bemoaning the lack of an index, another suggesting the numbers be sorted in order “to better find the one I’m looking for,” and another suggesting that readers find the source in the original binary so as to not lose the most “significant digits” in the translation.
For another moment of bizarre, check out the “Also Viewed” section.
This is one of the central tenets of any Statistics course: just because two things appear to be related does not mean that one causes the other. This CNN news piece linking a variety of food consumption to depression falls into this trap. This is perhaps one of the most common mistakes in the popular understanding of statistics, with quite a few dire consequences. Part of this problem comes from the fact that you can often find a relationship between obviously unrelated things, just because their individual trends happen to line up by coincidence.
For an awesome example of this, check out Spurious Correlations, a website that takes real data and finds ridiculous correlations between them. For example, did you know that the marriage rate in Kentucky can be a very strong predictor of the number of people who drown after falling out of a fishing boat? Or that the United States decrease in oil imports from Norway seems to cause fewer drivers to die in a collision with a railway train? Or that there’s a clear link between the precipitation rate in Tompkins County and the number of trip/slip related deaths in male Texans?
You can try and find your own correlations as well. If you find something good, post it here!
It’s always fun to see what creative things people can do with a little bit of data and some statistical analysis. Designer and data scientist Matt Daniels analyzed the first 35,000 lyrics in the official works of more than 80 rap and hip hop artists and groups, and sorted them by who has the most extensive vocabularies. The data analysis may not be perfect — I’m not sure I approve of counting variations of the same word as distinct from each other — but the picture that emerges is quite entertaining. Check it out here, but be forewarned: this is an analysis of rap and hip hop music, so there is some strong language on this site.
This will be of principal interest to my statistics students (young and old!) but this is a nice summary of some of the poor ways science and scientific findings are reported in the news. Number 4 should look very familiar, as should number 7!
My general recommendation about reading these news articles is, when in doubt, go read the source. Don’t rely on other people to do your thinking for you. Go and seek out the information you need and make your own conclusions!
Here’s an amusing little time waster: http://gabrielecirulli.github.io/2048/
The goal of the game is simple: Get a tile of value 2048. The controls are also simple: press an arrow key and every tile that can move in that direction, will. If two tiles of the same value are next to each other, they’ll combine to one tile double that value. Also, with every move, a two or a four will appear at random in a free spot on the board. Seems easy, right?
Well keep in mind that to get a 2048 tile, you’ll need to create and combine two 1024 tiles. To get those two 1024 tiles, you’ll need four 512 tiles, which require eight 256 tiles, which require sixteen 128 tiles. All of these numbers are powers of two, which are the key to the concept of binary numbers, which at the most basic and fundamental level is how computers operate. A binary number is one of base two, in the same way that a “conventional,” decimal number is base 10. Consider the number 2048. You might remember from elementary school that this could be thought of as two 1000’s, zero 100’s, four 10’s, and eight 1’s. Those numbers – 1000, 100, 10, and 1 – are all powers of 10 (10^3, 10^2, 10^1, and 10^0, respectively). A number written in binary uses powers of 2, meaning there is a 2^0 = 1’s place, a 2^1 = 2’s place, a 2^2 = 4’s place, and so on.
Moreover, just as any one place value in a decimal number could be occupied by a digit from 0-9, giving you ten options, a place value in a binary number can only be occupied by two digits: 0 and 1. To write a decimal number like 459 in binary, you first need to figure out how to “assemble” the number using powers of two. The biggest power of two that fits is 256 (2^8). That leaves 203, meaning 128 (2^7) fits also. Subtracting that leaves 75, meaning we can take away 64 (2^6). This leaves only 11, from which we can subtract 8 (2^3), then 2 (2^1), with only 1 (2^0) remaining. The 32’s, 16’s, and 4’s places were never used, so they each get a 0, and the decimal number 459 can be rewritten as 111001011. Taken in the other direction, the binary number 110101 would be interpreted as one 1, one 4, one 16, and one 32, giving a decimal number of 53.
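The "subtract the biggest power of two that fits" method can be sketched in Python like this. The function names are mine, and Python's built-in bin() is used only as a way to double-check the result.

```python
def to_binary(number):
    """Convert a non-negative whole number to a string of binary digits
    by repeatedly subtracting the largest power of two that fits."""
    if number == 0:
        return "0"
    # Find the biggest power of two that fits into the number.
    power = 1
    while power * 2 <= number:
        power *= 2
    digits = []
    remaining = number
    # Walk down through the place values: ..., 8's, 4's, 2's, 1's.
    while power >= 1:
        if power <= remaining:
            digits.append("1")   # this power of two is used
            remaining -= power
        else:
            digits.append("0")   # this place value is skipped
        power //= 2
    return "".join(digits)


def from_binary(bits):
    """Add up the powers of two marked by a 1, reading right to left."""
    total = 0
    for place, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** place
    return total


print(to_binary(459))
print(to_binary(459) == bin(459)[2:])  # compare with Python's built-in
print(from_binary("110101"))
```

Notice that the places that get skipped still need a 0 written in them, just like the zero in 2048 holds the hundreds place.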
What’s important to note is that 53 and 110101 are referring to the same quantity; they are just different ways of representing that quantity. It’s the same way as how “the cat,” “el gato,” and “le chat” all refer to the same animal. Thinking of decimal and binary as different languages for numbers is actually a great analogy, because binary is how computers think of numbers. The reason why has to do with how computers are made. The circuits on the motherboard, inside the processor, and all throughout your computer are essentially tiny wires. At any instant, the wire either has an active electrical charge running through it or it doesn’t. If the wire is “on,” it is considered a 1. If it is “off,” it is considered a 0. The sender on one end of the wire will turn the current on and off extremely quickly in a manner much like Morse code, and the receiver on the other end of the wire will interpret the rapid fire of 1’s and 0’s as binary numbers that can be interpreted in any number of ways.
So far, the best I’ve been able to get in the game is a pair of 256 tiles that I wasn’t able to combine before blocking myself off, so my high score is only 3180. Think you can beat it?
Did you take the AMC 10 or AMC 12 on Tuesday? If you did, you might be interested in checking out the video solutions to the last few questions of both tests that the YouTube channel Art of Problem Solving just posted, viewable here.
I recently received the following email from my job at TC3…
2013 “POT OF GOLD” 50/50 RAFFLE
Our 17th year of making some lucky winner $1,000 richer!
$20 per ticket – only 100 tickets sold.
This got me to thinking: Is this 50/50 raffle worth the ticket? Are such raffles ever worth it? Is it even possible for it to be worth it?
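One way to start on that question is with an expected value. The sketch below is only a rough first pass and makes some assumptions that the email does not state: that all 100 tickets are sold, that you buy exactly one, and that the single $1,000 prize is the only payout. The function name is mine.

```python
def expected_net_gain(ticket_price, tickets_sold, prize):
    """Average amount one ticket gains (or loses) in the long run,
    assuming every ticket is equally likely to win a single prize."""
    win_probability = 1 / tickets_sold
    return win_probability * prize - ticket_price


# Numbers from the email: $20 per ticket, 100 tickets, $1,000 prize.
print(expected_net_gain(20, 100, 1000))
```

A negative result means that, on average, a ticket costs more than it pays back.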
First, check out this news article from Time Magazine. Give it a skim. Then come back.
A new largest prime number has been discovered through a program run by a mathematician at the University of Central Missouri. In case you’ve forgotten, a prime number is one whose factors are merely 1 and itself. The numbers 3, 5, 7, 11, and 13 are all prime, but 6 isn’t (it can factor into 2*3) and 15 isn’t (3*5). Prime numbers are really the bread-and-butter of many mathematicians, especially those who study a branch of mathematics called number theory.
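To make the definition concrete, here is a simple Python sketch that tests a number by trying possible factors one at a time. This is plain trial division, which I am using only for illustration; it is far too slow for record-sized primes and is not the method used in the search described in the article.

```python
def is_prime(n):
    """Return True if n has no factors other than 1 and itself."""
    if n < 2:
        return False  # 0 and 1 are not considered prime
    divisor = 2
    # Only divisors up to the square root of n need to be checked:
    # if n = a*b, one of a or b is no bigger than the square root.
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False  # found a factor, so n is not prime
        divisor += 1
    return True


print([n for n in range(2, 16) if is_prime(n)])
```

Running it lists the primes below 16, and you can see that 6 and 15 are left out.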
With the new semester, I want to try to actually live up to the second half of the purpose of this website. Yes, the primary purpose is to provide a location for you to find homework assignments that you missed, project deadlines that you’ve forgotten, and upcoming test dates that you do not yet know, but the name of my website is “Assignments and Mathematical Musings.” And while there has been a copious amount of the former, there has been none of the latter. I hope to, as regularly as I can, fix that. These posts will contain interesting mathematical tidbits that I will try to write so that all of my students could enjoy. If I encounter an interesting or significant news article, I might write about it here. If I come across a fun puzzle or nifty proof of an easy-to-understand idea, I’ll try to share it. If you find something that you think I might think interesting, please send it along and I’ll give you the proper credit.
**Please note: For those math super fans who may be reading these posts, remember that their intended audience is high school math students in 9th-12th grade. I will try to be mathematically accurate in all of my posts, but I may “fudge” some things here and there for the sake of clarity. If I make an egregious error, please call me on it, but otherwise permit me a bit of poetic license, as it were.
For my first post, I want to discuss the significance of the picture below, an example of a proof without words.
No, that isn’t a gong being suspended from the floor with weird ray beams coming out of it. I’ll explain what it actually is shortly, but first I want to talk about infinity.
A recent episode of Futurama featured the lovable alcoholic robot Bender creating duplicates of himself that are 60% his size. Later in the episode the duplicates continue to replicate at 60% size, until the sheer number of sub-atomic Benders starts to overwhelm the world and eat away at it. This is an actual end-of-world scenario called, as it is in this episode, the “grey goo,” but the episode also makes a point of talking about the generations of Benders and how their population increases without bound.
The question that popped into my mind is what generation of Bender would be so small as to influence matter at the subatomic level, as they do in the episode. In particular, the diminutive Benders are shown altering the molecular structure of water, and so we shall use that as a frame of reference. The size of a water molecule is 0.942 angstrom, about 94.2 picometers (1/10th a nanometer or 1.0 x 10^-10 meters). The Benders pulling apart the molecules appear to be roughly three times that size, so we will assume they are roughly 300 picometers in height.
A full sized Bender seems to be the same height as a typical human, which we will assume to be 1.73 meters. From this, we can derive the formula h(g) = 1.73(.6)^g where g is the number of generations and h(g) is the height of the g-th generation. If we want to know at what generation the height will reach 300 x 10^-10 meters, we can easily substitute it in and solve for g (I’ll leave that for you to calculate). According to my math, I get very near 35 generations, which we will use as our number.
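If you would like to check your answer, here is one way to solve h(g) = 1.73(.6)^g for g using logarithms in Python. The function name is my own, and the target height is plugged in exactly as it is written in the text above (300 x 10^-10 meters).

```python
from math import log


def generations_to_reach(target_height, start_height=1.73, scale=0.6):
    """Solve start_height * scale**g = target_height for g.

    Dividing both sides by start_height and taking logs gives
    g = log(target_height / start_height) / log(scale).
    """
    return log(target_height / start_height) / log(scale)


g = generations_to_reach(300e-10)  # target height as written in the text
print(round(g, 2))
```

The result lands just under a whole number, which is why rounding to the nearest generation is reasonable.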
So there were 35 generations of Benders out there to get to the sub-atomic sized Benders we saw in the show. How many Benders does this mean? We see that each time a Bender replicates himself, two copies are created. If we assume a Bender only copies himself one time, that means there are 2^35 = 34.36 billion Benders in the 35th generation! With all the Benders in all the other generations, this works out to be 2^0 + 2^1 + 2^2 + … + 2^35 = 68.72 billion Benders on the entire planet! And that’s if a Bender only copies itself one time! If each Bender copies itself twice, we would wind up with 4.7 x 10^21 (4.7 sextillion) Benders! It seems that Bender’s call for each of his descendants to perform “1 quintillionth” of a task was not too far off!
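The doubling counts are easy to verify with a few lines of Python. This sketch covers only the case where each Bender replicates once (two copies per Bender); the variable names are mine.

```python
generations = 35

# Each Bender makes two copies, so generation g has 2**g Benders.
last_generation = 2 ** generations

# Add up every generation from the original Bender (2**0) through the 35th.
total_benders = sum(2 ** g for g in range(generations + 1))

print(f"{last_generation:,}")
print(f"{total_benders:,}")

# A handy shortcut: 2^0 + 2^1 + ... + 2^n is always one less than 2^(n+1).
print(total_benders == 2 ** (generations + 1) - 1)
```

That shortcut is why the total is almost exactly double the size of the last generation.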
I recently read an article discussing how just 10 digits would be enough to end privacy as we know it. The article is a bit alarmist, but makes some interesting points that I’d like to discuss here.
Firstly, the article claims that a 10-digit code is sufficient to uniquely identify every person alive on earth. Where do they get that figure? It has to do with a tool in mathematics called a permutation, which is essentially an ordering of some sequence of numbers or objects. Consider sports jerseys.
A sports jersey has room for two digits, both of which can be any number from 0 to 9. There are, therefore, 10 options for both places, giving us a total of 100 possible jersey numbers – from 00 to 99. What we have just used here is something called the Fundamental Counting Principle (also known as the rule of product). Essentially, given a number of slots to fill and a number of choices for each slot, the total number of outcomes is equal to the product of the numbers of choices for each slot. Since a jersey has two slots with ten choices each, the total number of outcomes is 10*10 = 10^2 = 100.
If we instead have a string of ten digits, each place having ten options, the total number of outcomes is 10*10*10*10*10*10*10*10*10*10 = 10^10 = 10,000,000,000, or ten billion. Considering the world’s population is still less than 7 billion (though it is predicted to reach that mark in 2011), a quantity of 10 billion identification numbers would be more than enough to assign one to every living human being. The idea of having your entire persona and identity reduced to a string of numbers is a frightening thing to many people.
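Here is a short Python sketch of the counting principle at work. It lists every two-digit jersey number to confirm the count, then uses multiplication for the ten-digit case (listing ten billion codes would take far too long). The names jerseys and count_codes are my own.

```python
from itertools import product

digits = "0123456789"

# List every two-digit jersey number, from "00" to "99".
jerseys = ["".join(pair) for pair in product(digits, repeat=2)]
print(len(jerseys), jerseys[0], jerseys[-1])


def count_codes(slots, choices_per_slot=10):
    """Fundamental Counting Principle: multiply the number of choices
    for each slot. With the same choices in every slot, that is a power."""
    return choices_per_slot ** slots


print(f"{count_codes(2):,}")
print(f"{count_codes(10):,}")
```

The listing and the multiplication agree for two slots, which is a nice sanity check before trusting the formula for ten.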