Guest Blog by Mary Pat Campbell, FSA, MAAA, PRM
I found myself nodding several times through Brian Fannin’s blog post “My Journey with R” – I particularly liked Brian’s remark:
Excel would return diagnostics about a statistical model, but I could tell that it had a greater interest in enabling changes to font size or attaching a file to an e-mail.
Excel is a general-purpose numerical tool. It can be useful and powerful in many ways, but too often it’s not built to implement analysis of data much more complicated than simple aggregation and trending. If one looks at how much “ribbon real estate” the various style and formatting elements take up, one sees that, indeed, a lot of space is devoted to making spreadsheets look pretty and not so much to numerical heavy lifting.
On the other hand, R is built specifically for heavy-duty statistical analysis. Originally developed in the early 1990s as an open-source descendant of the S language from Bell Labs, R has grown into a number-crunching powerhouse thanks to a statistically minded open-source community that, over more than two decades, has developed over 11,000 packages of functions and procedures. One can even find actuarial-specific packages in the CRAN repository.
I have found through the years that having a variety of tools at my disposal makes it easier to solve the problems I want to solve. Yes, I do most of my day-to-day work in Excel. I will even try linear fits on Excel data and explore various trends; however, the brutal truth is that it’s an awful tool for other types of analyses. I’ve used R, Python, C++, and other languages and software systems to solve specific pieces of a problem. I choose the tool based on the kind of problem – the amount of data, the type of data, the types of analyses to be performed (periodic and updating vs. a one-time exercise) – and often I combine the tools for the best results.
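As a toy illustration of why R is a better home for this kind of work: the linear fit that takes chart gymnastics or LINEST formulas in Excel is one line of base R. The data below are made up purely for demonstration.

```r
set.seed(42)                          # reproducible toy data
x <- 1:10
y <- 2.5 * x + rnorm(10, sd = 0.5)    # a known slope of 2.5 plus noise

fit <- lm(y ~ x)                      # ordinary least squares, one line
coef(fit)                             # intercept and slope estimates
summary(fit)$r.squared                # fit diagnostics on tap
```

From here, `summary(fit)` gives standard errors, p-values, and residual diagnostics – exactly the kind of model output Excel is reluctant to surface.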
The main place where Excel wins, though, is when a broad group of people need to be able to pick up my work product and either replicate or update it.
What’s great now, however, is that there are several online tools that make it easier than ever for others to pick up and adjust code in R and Python without having to download new software or spend too much time figuring out what needs to be adjusted where.
You can see examples here:
The concept is that one can run code in the cloud and have the results displayed in a browser, all without installing any software on one’s own machine. These online “notebooks” or “kernels” also make it easy for the creator of a file to explain code in a much more natural way than standard code commenting allows.
In my upcoming ACTEX Webinar on 11/30, Intro to Predictive Analytics for Actuaries, I will be using such a notebook so attendees can try out various predictive analytics techniques in R. It’s a great way to get started with a new tool – I don’t assume attendees know anything about R or have R installed on their machines.
Come and join me as we explore an easy way to learn a new tool and how to use it right!
Mary Pat Campbell, FSA, MAAA, PRM, is Vice President, Insurance Research at Conning in Hartford, Connecticut. She also teaches courses on computing (Excel, Access, and VBA) and business writing for actuarial science students at the University of Connecticut.
Mary Pat has had a long-standing interest in modeling techniques, having worked on models covering molecular physics, neuroscience, finance, population studies, signal processing, statistics, information retrieval, electronic logic games, and, of course, actuarial models. Mary Pat Campbell is a founding member of the SOA Modeling Section and is co-editor of its newsletter, The Modeling Platform.
Mary Pat wrote “Getting Started in Predictive Analytics: Books and Courses” in the December 2015 issue of the Predictive Analytics and Futurism newsletter and presented at the 2016 Life and Annuity Symposium: Session 16 — Predictive Analytics, Where Do I Even Start?