Saturday, January 21, 2017

The Brodie Helmet


At the outbreak of World War I, none of the combatant nations provided steel helmets to their troops. Soldiers of most nations went into battle wearing cloth or leather hats that offered little protection from modern weapons. As a result, many soldiers suffered head injuries from exploding shrapnel.

In April of 1916, British soldiers began wearing a metal helmet, called the Brodie helmet, in battle, but authorities discovered that the proportion of head injuries then increased. Why should the incidence of head injuries increase when soldiers wore metal helmets rather than cloth caps? Click below to see the answer.




Saturday, January 14, 2017

Panama Canal


A ship sailed through the Panama Canal going from west to east. When it exited the canal, it entered the Pacific Ocean. (The ship did not double back.) How can this be so? Click below to see the answer.



Saturday, January 7, 2017

Rope Around the Earth


Suppose you tie a rope tightly around the Earth at the equator. (Assume the Earth is perfectly spherical, and that the surface is smooth so that the rope lies tight against the surface at all points.) Now suppose that you add an additional 6 feet to the length of the rope. How high off the surface would the rope lie? You could look up the Earth's circumference and do the math to come up with an exact answer, but can you quickly come up with an intuitive guess? (High enough to slide a piece of paper under? To wave your hand under? To walk under?) Click below to see a hint or the answer.









Sunday, January 1, 2017

Voyageurs

I've been reading The Revenant by Michael Punke and came across the following few passages. The main character, Hugh Glass, is embarking on a canoe trip up the Missouri River with a group of French Canadian fur traders known as voyageurs.
...For the rest of their voyage, Glass manned not a paddle but an enormous sponge, constantly bailing water as it pooled on the bottom of the canoe.
It was a full-time job, since the bâtard leaked steadily. The canoe reminded Glass of a floating quilt. Its patchwork skin of birch bark was sewn together with wattope, the fine root of a pine tree. The seams were sealed with pine tar, reapplied constantly as leaks appeared. As birch had become more difficult to find, the voyageurs were forced to use other materials in their patching and plugging. Rawhide had been employed in several spots, stitched on and then slathered in gum. Glass was amazed at the fragility of the craft. A stiff kick would easily puncture the skin, and one of La Vierge's main tasks as steersman was the avoidance of lethal, floating debris. At least they benefited from the relatively docile flow of the fall season. The spring floods could send entire trees crashing downstream.
If you've ever maintained a large code base, you probably already see where I'm going with this. The constant patching and plugging of leaks, the fragility of the craft, one man constantly bailing out water while several others row the boat guided by a steersman: these elements all remind me of several large software projects I've been on. The passage continues.
There was an upside to the bâtard's shortcomings. If the vessel was frail, it was also light, an important consideration as they labored against the current. Glass came quickly to understand the odd affection of voyageurs for their craft. It was a marriage of sorts, a partnership between the men who propelled the boat and the boat that propelled the men. Each relied upon the other. The voyageurs spent half their time complaining bitterly about the manifold ails of the craft, and half their time nursing them tenderly.
This reminds me not only of the relationship programmers have with our code, but also of the relationship we have with our tools. How much time do we spend complaining about an IDE or a framework? How much time configuring them? But after we've gotten comfortable using them, most of us will strongly resist switching to a new one. Finally...
They took great pride in the appearance of the bâtard, dressing it in jaunty plumes and bright paint. On the high prow they had painted a stag's head, its antlers tilted challengingly toward the flowing water. (On the stern, La Vierge had painted the animal's ass.)
This final bit surprises me the most, but in a way I suppose it shouldn't. I don't know much about boating, but I do know that you should fix the leaks in your boat before you bother to decorate it. But that isn't how we always approach software development, is it? I've seen people spend plenty of time refactoring and cleaning code that didn't really need to change, or adding test cases just to get a higher percentage in test coverage. At times I've been guilty of this myself. I guess it's worth it to ask yourself, before you make a change to your code, am I fixing a leak, or am I just painting a stag's ass on this canoe?



Quetico Superior Route, Passing a Waterfall by Frances Anne Hopkins

Friday, December 30, 2016

Love Triangle

Alice is in love with Bob, but Bob is in love with Carol. To complicate matters even further, Alice is married, but Carol is unmarried. Is a married person in love with an unmarried person in this love triangle?

Click below to see the answer.






Thursday, November 24, 2016

Think Negative

I was reading Artillery Through the Ages: A Short Illustrated History of Cannon, Emphasizing Types Used in America by Albert Manucy, when I came across the following passage.

There is one apocryphal tale, however, about an experiment with chain shot as anti-personnel missiles: instead of charging a single cannon with the two balls, two guns were used, side by side. The ball in one gun was chained to the ball in the other. The projectiles were to fly forth, stretching the long chain between them, mowing down a sizeable segment of the enemy. Instead, the chain wrapped the gun crews in a murderous embrace; one gun had fired late.

Whether the story is true or not, it teaches an important lesson. When designing a system, don't just think about the happy path. Make sure you think about all of the ways that things could go wrong, or risk being wrapped in the "murderous embrace" of your own design. (This is also known as being "hoist by one's own petard," a rather antiquated phrase, also explained in colorful detail in Artillery Through the Ages.)

Monday, August 22, 2016

Twitter's Favorite Films

If you were on Twitter at all last week, you probably couldn't help but notice a flurry of "Fav7" hashtags trending, including #Fav7Films, #Fav7Books, and #Fav7TVShows, where people were posting lists of their seven favorite things in each category.



I thought it would be fun to scrape the data to see what Twitter's favorite films are, and compare it to the top rated films on IMDb and Rotten Tomatoes. Here are the results.

Twitter's Top 25 Films
  1. The Dark Knight (9.0 IMDb, 94% RT)
  2. Pulp Fiction (8.9 IMDb, 94% RT)
  3. The Empire Strikes Back (8.8 IMDb, 94% RT)
  4. Goodfellas (8.7 IMDb, 96% RT)
  5. The Shawshank Redemption (9.3 IMDb, 91% RT)
  6. Fight Club (8.8 IMDb, 79% RT)
  7. The Godfather (9.2 IMDb, 99% RT)
  8. Back to the Future (8.5 IMDb, 96% RT)
  9. Inception (8.8 IMDb, 86% RT)
  10. Jurassic Park (8.1 IMDb, 93% RT)
  11. Forrest Gump (8.8 IMDb, 72% RT)
  12. The Big Lebowski (8.2 IMDb, 81% RT)
  13. Jaws (8.0 IMDb, 97% RT)
  14. Star Wars (8.7 IMDb, 93% RT)
  15. Raiders of the Lost Ark (8.5 IMDb, 94% RT)
  16. The Princess Bride (8.1 IMDb, 97% RT)
  17. Blade Runner (8.2 IMDb, 89% RT)
  18. Alien (8.5 IMDb, 97% RT)
  19. The Departed (8.5 IMDb, 91% RT)
  20. The Matrix (8.7 IMDb, 87% RT)
  21. Interstellar (8.6 IMDb, 71% RT)
  22. Aliens (8.4 IMDb, 98% RT)
  23. Good Will Hunting (8.3 IMDb, 97% RT)
  24. The Shining (8.4 IMDb, 88% RT)
  25. Die Hard (8.2 IMDb, 92% RT)

A few observations:

  • Fewer than half of the films in Twitter's top 25 are also in IMDb's top 25.
  • The Godfather (1972) is the oldest film on the list, while Interstellar (2014) is the newest.
  • Harrison Ford has starred in the most (4) of the top 25 films, while Steven Spielberg and Christopher Nolan are tied for directing the most (3 each).
  • Only three sequels appear in the top 25. For two of those, the original film also appears in the top 25.
  • Action/adventure and science fiction films dominate the list.
  • As popular as they are right now, only one film based on a comic book character is in the top 25 (although it did take the top spot).

The source code

Start by loading the required libraries, including twitteR for accessing the Twitter API, and setting up authentication. You'll need to sign up for free on Twitter Developers to get your own authentication keys and tokens. (If you've never done this before, see Bogdan Rau's Collecting Tweets Using R and the Twitter Search API for a more detailed guide.)

library(dplyr)
library(purrr)
library(twitteR)

# Download cacert file for Windows use.
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

consumer_key <- 'your key'
consumer_secret <- 'your secret'
access_token <- 'your access token'
access_secret <- 'your access secret'
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)

Next, query the Twitter search API for the "#Fav7Films" hashtag, and initialize a data frame with tweets.

requests <- 1 # keep count of how many requests are sent
num_tweets <- 3000 # number of tweets to fetch per request
delay <- 62.0 # add in a delay so the API doesn't block

fav_film_tweets <- searchTwitter("#Fav7Films\n", n=num_tweets)
Sys.sleep(delay) # be nice to the API
fav_film_df <- tbl_df(map_df(fav_film_tweets, as.data.frame))

fav_film_all <- fav_film_df[fav_film_df$isRetweet == FALSE, ]

Now we want to keep searching in a loop, until we've downloaded all the tweets we're interested in. To do that, we'll keep looping as long as the API returns as many tweets as we told it to. Once it returns fewer tweets, we know it ran out.

while(nrow(fav_film_df) == num_tweets) {
    max_id <- fav_film_df$id[num_tweets]
    requests <- requests + 1
    fav_film_tweets <- searchTwitter("#Fav7Films\n", n=num_tweets, maxID=max_id)
    fav_film_df <- tbl_df(map_df(fav_film_tweets, as.data.frame))
    fav_film_all <- rbind(fav_film_all, fav_film_df[fav_film_df$isRetweet == FALSE, ])

    Sys.sleep(delay) # be nice to the API
}

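Note that I added the maxID=max_id parameter to the request. This tells the search API to return only tweets with an ID at or below max_id, that is, tweets no newer than the last tweet in the previous batch. (The boundary tweet itself can come back a second time; removing multiple tweets per user later on takes care of that.) Also note that I added a delay in the loop. Twitter has set a rate limit on their search API of 15 requests every 15 minutes, so this delay is to avoid being blocked.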
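
To make the paging logic easier to follow without hitting the network, here is a minimal sketch that swaps the real API for a made-up fetch_page function. Everything in it (all_ids, fetch_page, page_size) is hypothetical, and it assumes that max_id is inclusive and that results arrive newest first.

```r
# Hypothetical stand-in for the search API: returns up to n IDs, newest
# first, that are at or below max_id (assuming an inclusive max_id).
all_ids <- 25:1  # pretend these are tweet IDs, newest first

fetch_page <- function(n, max_id = Inf) {
  head(all_ids[all_ids <= max_id], n)
}

page_size <- 10
page <- fetch_page(page_size)
collected <- page

# Keep paging while each request comes back full, as in the loop above.
while (length(page) == page_size) {
  max_id <- page[page_size]              # last (oldest) ID in this page
  page <- fetch_page(page_size, max_id)  # next page starts at that ID
  collected <- c(collected, page)
}

# The boundary ID of each page is fetched twice, so drop the repeats.
collected <- unique(collected)
```

The loop stops after the first short page, and unique() removes the one overlapping ID per page that an inclusive max_id produces.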

That will take a while, but once it's done we'll have over 100,000 tweets. To avoid going through all that again, I just saved the whole data frame to an R data file.

save(fav_film_all, file="Fav7FilmTweets.Rda")

You can download that file from GitHub at Fav7FilmTweets.Rda if you want to follow along from this point, or if you want to do your own analysis on this data set. Just use load("Fav7FilmTweets.Rda") to load the data frame from the file.

Next, we want to remove any retweets or multiple tweets from the same user.

fav_film_all <- fav_film_all[fav_film_all$isRetweet == FALSE, ]
fav_film_all <- fav_film_all[!duplicated(fav_film_all$screenName), ]
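
If it isn't obvious what those two lines keep and drop, here is a small sketch on a made-up data frame. The rows are invented; only the column names screenName, isRetweet, and text are taken from the code above.

```r
# Toy data frame using the same column names as the code above.
toy <- data.frame(
  screenName = c("alice", "bob", "alice", "carol"),
  isRetweet  = c(FALSE, TRUE, FALSE, FALSE),
  text       = c("first list", "RT someone", "second list", "only list"),
  stringsAsFactors = FALSE
)

toy <- toy[toy$isRetweet == FALSE, ]       # drops bob's retweet
toy <- toy[!duplicated(toy$screenName), ]  # keeps only alice's first row
print(toy)
```

Because duplicated() flags every occurrence after the first, each user keeps whichever of their tweets appears earliest in the data frame.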

Now we can start parsing the lists of film titles from the tweets. Most people formatted their titles on separate lines, so we'll assume that format. Any tweets that don't use that format will just fall to the bottom of the list of films once we rank them.

# remove the hashtag, ignoring case
fav_film_all$text <- gsub("#fav7films", "", fav_film_all$text, ignore.case=TRUE)

# remove list markers: a digit followed by a period, plus any ")" or "-"
fav_film_all$text <- gsub("\\d\\.|\\)|-", "", fav_film_all$text)

# convert to common case for all tweets
fav_film_all$text <- tolower(fav_film_all$text)

# trim any whitespace left over from earlier steps
fav_film_all$text <- trimws(fav_film_all$text)

At this point, we should have a bunch of lists of seven movie titles. What we want to do next is separate them all out into one large list of titles, count how many times each title appears, then sort the list. We'll also remove "A" and "The" from the beginning of any titles that include them, since many people included them, but many didn't.

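# split each tweet into lines and flatten into one vector of titles
titles <- unlist(strsplit(fav_film_all$text, split="\n"))

# trim each title; removing list markers leaves a leading space on most lines
titles <- trimws(titles)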

# remove leading 'a' and 'the' from titles
titles <- gsub("^a ", "", titles)
titles <- gsub("^the ", "", titles)

# remove empty titles
titles <- titles[titles != ""]

ranked_titles <- sort(table(titles), decreasing=TRUE)
top_25 <- head(ranked_titles, 25)

That's the final list. There are a lot of other conditioning steps that we could have taken, like looking for common abbreviations or misspellings, but I think this gets us pretty close to an accurate list.
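
To see how the cleaning and ranking steps behave on a small input, here is a sketch that runs them on three invented tweets. The tweets are made up for illustration, and each title is trimmed after splitting so that leftover spaces don't split the counts.

```r
# Three made-up tweets in the one-title-per-line format assumed above.
tweets <- c(
  "#Fav7Films\n1. The Godfather\n2. Jaws\n3. Alien",
  "Jaws\nGodfather\nA Few Good Men #fav7films",
  "#FAV7FILMS\n1. Jaws\n2. Alien"
)

# Same cleaning steps as above: hashtag, list markers, case.
clean <- gsub("#fav7films", "", tweets, ignore.case = TRUE)
clean <- gsub("\\d\\.|\\)|-", "", clean)
clean <- tolower(clean)

# Split into titles, trim each one, then drop leading articles and blanks.
titles <- trimws(unlist(strsplit(clean, split = "\n")))
titles <- gsub("^a ", "", titles)
titles <- gsub("^the ", "", titles)
titles <- titles[titles != ""]

ranked <- sort(table(titles), decreasing = TRUE)
print(ranked)
```

Here "jaws" ends up on top with three mentions, and "The Godfather" and "Godfather" are counted together as "godfather" once the leading article is removed.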

You can view the full R source code that I used to gather and analyze tweets for this project in my Fav7 GitHub repository. Feel free to fork that and use it to analyze other Twitter favorites, and leave me a comment if you do, or if you have any questions.