Oscar 2017 predictions

For the past few years, I have tried to predict the winners in all categories at the Academy Awards. And for the past two years, I’ve also used statistics and data analysis to inform my decisions in four categories: Best Picture, Best Director, Best Actor, and Best Actress.

As in the last two years, I am sticking to what the model tells me for my predictions in these categories. However, I’m skeptical about the predictions I have for the acting categories. First, for Best Actor, Denzel Washington and Casey Affleck are the only two front-runners–there is no way Ryan Gosling will win this award. Second, for Best Actress, although I would argue that Emma Stone is now the favourite, Isabelle Huppert definitely has a chance too. Therefore, I expect to miss one of these categories.

Finally, this year is definitely La La Land’s show: 14 nominations, tied with Titanic and All About Eve for the most ever. I don’t think it will break the record for most wins (which is 11), but I expect it to win around 10 awards. I don’t expect it to run the table in the sound/music categories–which would be a first for a musical. I think the only one of these four awards it could miss is Best Sound Editing, which I predict will go to Arrival.

My predictions are below, in bold. After the Academy Awards, I will update this post and point out the winners–I will indicate them in italics.

Update (2017/02/27): Wow! What a finale! As for my predictions, I did better than last year: 16/24.

Read more

US Presidential Inaugural Addresses

Earlier this week, on January 20th, 2017, Donald J. Trump was inaugurated as the 45th president of the USA. He gave what seemed like a very short inaugural address, so I was curious to see how short it really was compared to previous addresses. It was also an opportunity to have a quick look at other properties of his speech.

Read more

The Instability of Forward and Backward Selection

Classical statistics often assumes that the analyst knows which variables are important and which variables are not. Of course, this is a strong assumption, and therefore many variable selection procedures have been developed to address this problem. In this blog post, I want to focus on two subset selection methods, and I want to address their instability. In other words, I want to discuss how small changes in the data can lead to completely different solutions.
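
As a rough illustration of what this kind of instability can look like (this is not the analysis from the full post), here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for the example: the simulated data, in which x1 and x2 are nearly collinear and x3 is pure noise, and the helper names forward_selection, simulate, and bootstrap_selections. Forward selection is implemented by greedily adding the variable that most reduces the residual sum of squares, and the procedure is rerun on bootstrap resamples of a single dataset to see how often the selected subset changes.

```python
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def forward_selection(columns, y, n_select):
    """Greedy forward selection for a linear model with an intercept.

    columns maps a variable name to its list of values. At each step we add
    the variable giving the largest drop in residual sum of squares, then
    project it out of the response and of the remaining candidates.
    """
    resid_y = center(y)
    resid_x = {name: center(col) for name, col in columns.items()}
    selected = []
    for _ in range(min(n_select, len(columns))):
        def gain(name):
            x = resid_x[name]
            xx = dot(x, x)
            # RSS reduction from adding x is (x . r)^2 / (x . x)
            return dot(x, resid_y) ** 2 / xx if xx > 1e-12 else 0.0

        best = max(resid_x, key=gain)
        xb = resid_x.pop(best)
        xx = dot(xb, xb)
        if xx <= 1e-12:
            break  # nothing left to explain with the remaining columns
        selected.append(best)
        by = dot(xb, resid_y) / xx
        resid_y = [r - by * b for r, b in zip(resid_y, xb)]
        for name, x in resid_x.items():
            bx = dot(xb, x) / xx
            resid_x[name] = [a - bx * b for a, b in zip(x, xb)]
    return selected


def simulate(n, rng):
    """Hypothetical data: x1 and x2 nearly collinear, x3 independent noise."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    columns = {
        "x1": [a + 0.1 * rng.gauss(0, 1) for a in z],
        "x2": [a + 0.1 * rng.gauss(0, 1) for a in z],
        "x3": [rng.gauss(0, 1) for _ in range(n)],
    }
    y = [a + b + rng.gauss(0, 1) for a, b in zip(columns["x1"], columns["x2"])]
    return columns, y


def bootstrap_selections(columns, y, n_select, n_boot, rng):
    """Count which subsets forward selection picks across bootstrap resamples."""
    n = len(y)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        cols_b = {name: [col[i] for i in idx] for name, col in columns.items()}
        y_b = [y[i] for i in idx]
        counts[tuple(sorted(forward_selection(cols_b, y_b, n_select)))] += 1
    return counts


rng = random.Random(2017)
columns, y = simulate(50, rng)
counts = bootstrap_selections(columns, y, n_select=2, n_boot=200, rng=rng)
for subset, freq in counts.most_common():
    print(subset, freq)
```

Each printed line is a selected subset and the number of resamples in which it was chosen. If the procedure were stable, a single subset would account for nearly all resamples; how spread out the counts actually are depends on the seed and on the simulated correlation structure.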

Read more

Removing all R CMD check warnings

Making R packages is an important aspect of the statistician’s work. Or at least it should be: it is quite annoying when a new method appears in the literature but no implementation is readily available.

A favourite mantra of mine when making R packages is the following: an R package is more than the sum of its functions. A functioning R package needs to be able to interact properly with the R environment (through the NAMESPACE); a good R package also needs great documentation; a great R package will also include a vignette to guide new users and explain how all the functions interact with one another.

The main reference for how to make R packages is Writing R Extensions. Everything you need to know is there, if you know what you are looking for. Another very useful reference is Hadley Wickham’s book on R packages. This book explains the different components of an R package, and it also serves as an introduction to his devtools package.

In what follows, I don’t want to go over how to make an R package; the above references do a better job than I could hope to do. Rather, I want to share my experience with one of the most annoying parts of making an R package: passing the R CMD check. Removing the errors is the most important part, and what kind of errors you get really depends on the package (the log file is typically quite useful in figuring out what triggered the errors). On the other hand, you also want to minimize the number of warnings and notes, and most warnings you probably want to remove altogether.

Read more

By how much will Clinton win?

American politics is great for statistics: a huge number of polls are conducted every week, some positions are up for re-election every other year, and there are really only two parties. Moreover, the complicated nature of the whole election process, which for example involves the electoral college for the presidential election, makes it more interesting than elections in most democracies around the world. It’s for all these reasons that an incredible website like FiveThirtyEight is possible.

Read more