Actuaries Should Start Paying Attention to Python by Andrew Webster

The 2016 movie “Hidden Figures” highlights the 1960s careers of three brilliant mathematicians, Katherine Johnson, Dorothy Vaughan, and Mary Jackson, who manually performed the calculations necessary to launch astronaut John Glenn into orbit. At the time, IBM’s FORTRAN (FORmula TRANslation) programming language was growing in popularity and dramatically changing the role that mathematicians played in the U.S. space program. More than 50 years later, the same questions about technology and its impact on the actuarial and data science professions remain. Surprisingly, FORTRAN still makes up 24% of the source code of R and 25.7% of the source code of Python’s SciPy (Scientific Python) library!

As the founder of an actuarial technology company, I have been exposed to many programming languages and frameworks. Most are tailored for specific purposes, such as Facebook’s “React” for web programming or R for statistical analysis. There are low-level compiled languages optimized for speed, such as C, C++ and FORTRAN, as well as high-level interpreted languages such as JavaScript. But in my experience, Python is the one language that strikes the appropriate balance between low-level and high-level and is capable of solving almost any programming challenge.

Python is the “Swiss army knife” of the programming world. As a general-purpose programming language, it is suitable for many tasks, including non-actuarial tasks where automation may benefit the actuary. How much this versatility matters depends on the IT resources available at a company to support its actuarial team. For example, at Validate Health we gather hundreds of flat files each week containing healthcare claims data. We use Python to transfer and aggregate the files, and to interact with the SQL database that permanently stores the transferred data.
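To make the flat-file workflow concrete, here is a minimal sketch using only the Python standard library. It is not the Validate Health pipeline: the function name `load_claims`, the `claims` table, and the column names are hypothetical, and SQLite stands in for whichever SQL database a company actually uses.

```python
import csv
import sqlite3
from pathlib import Path


def load_claims(folder: Path, conn: sqlite3.Connection) -> int:
    """Append every CSV file in `folder` to a `claims` table; return rows loaded."""
    # Hypothetical schema, chosen only for illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims ("
        "claim_id TEXT, member_id TEXT, paid_amount REAL, source_file TEXT)"
    )
    loaded = 0
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as handle:
            # Each flat file is assumed to have a header row with these columns.
            rows = [
                (r["claim_id"], r["member_id"], float(r["paid_amount"]), path.name)
                for r in csv.DictReader(handle)
            ]
        conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
        loaded += len(rows)
    conn.commit()
    return loaded
```

Calling `load_claims(Path("incoming"), sqlite3.connect("claims.db"))` would load every CSV file in a folder named `incoming` (both names are placeholders). Recording the source file name with each row keeps the stored data traceable to the file it came from.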

Python is simple to learn for several reasons. The syntax is concise and used consistently throughout libraries. Indentation, rather than braces and other punctuation, delimits code blocks, making Python code highly readable. Python code is interpreted at runtime rather than compiled ahead of time*, and variables are dynamically typed at runtime. Both traits make it easier to develop and edit code but cause the program to execute more slowly. It is common for Python code to call low-level compiled C, C++ or FORTRAN code to gain speed. For example, the machine learning library scikit-learn exposes compiled C routines through a Python interface so that its algorithms run quickly.
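As a small illustration of the syntax points above, the following sketch discounts a stream of cash flows. The function name `present_value` and the figures are made up for the example.

```python
def present_value(cash_flows, rate):
    """Discount a list of year-end cash flows at a flat annual rate."""
    total = 0.0
    for year, amount in enumerate(cash_flows, start=1):
        # Indentation alone marks the body of the loop; no braces are needed.
        total += amount / (1 + rate) ** year
    return total


# No type declarations: the same name can hold an int, then a float.
rate = 0       # an int
rate = 0.05    # now a float; the type is resolved at runtime

pv = present_value([100, 100, 100], rate)
print(round(pv, 2))  # 272.32
```

Nothing in the function says whether `cash_flows` holds integers or floats. Python works that out while the code runs, which is convenient to write but is part of the reason interpreted code runs more slowly than compiled code.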

Python has an open philosophy and is interoperable with other programming languages. It is widely used for writing and consuming application programming interfaces (APIs). Some have called Python “programming glue” because it is used to connect so many disparate systems. Another important benefit for actuaries who learn Python is that they will share a common language with the IT department. This expands the pool of professionals available for peer review and makes it easier to implement actuarial analyses in production.
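One simple form of this “glue” role is passing data between systems as JSON, a text format that most languages and web APIs can read. The sketch below uses the standard-library `json` module. The `assumptions` dictionary and its contents are hypothetical.

```python
import json

# Hypothetical assumption set that a downstream system needs.
assumptions = {"trend": 0.06, "lines": ["medical", "pharmacy"], "credible": True}

# Serialize to text that a program written in any language can parse.
payload = json.dumps(assumptions, sort_keys=True)

# Parse it back, as a Python consumer on the other side would.
restored = json.loads(payload)

print(payload)  # {"credible": true, "lines": ["medical", "pharmacy"], "trend": 0.06}
```

Because the payload is plain text, the same assumptions could be handed to a web service, a database loader, or a colleague’s R script without either side needing to know how the other is implemented.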

*Python can also be compiled for performance improvements (e.g., with the Cython project).

If an actuary is interested in a non-traditional career as a data scientist, multiple industry signals indicate that Python is growing in popularity among data scientists. KDnuggets is a website that tracks the top analytics, data science and machine learning tools. In a 2018 KDnuggets poll of 2,300 data scientists, 65.6% used Python whereas 48.5% used R [1]. Kaggle is an online platform where data scientists compete in data science competitions to win prize money. Participating data scientists publish their source code as Kaggle Scripts. A 2016 analysis of Python vs. R in Kaggle Scripts [2] showed the two languages to be competitive with each other in terms of usage and ratings on Kaggle. The premier global “R in Insurance” conference was renamed the “Insurance Data Science” conference in 2018 after five successful years, and it now incorporates Python sessions as well as R sessions [3]. The strongest indicator of interest in Python is that the tech giants are investing heavily in analytics tools built in Python. For example, in 2015 Google open-sourced the deep learning library TensorFlow. By mid-2018, the TensorFlow code base was 48.2% C++ and 40.8% Python, with over 1,480 code contributors [4].

Ultimately, the software tools and programming languages that an actuary uses depend on familiarity with what was learned in school, compatibility with legacy actuarial software, and the availability of other actuaries knowledgeable in the same language to peer review and maintain code. If actuaries want to explore non-traditional opportunities as data scientists, having Python skills certainly provides a competitive advantage over other candidates.

Join ACTEX and me for a webinar on June 21st to delve deeper into the statistical modeling packages that Python offers, including NumPy, SciPy and scikit-learn. We’ll show how Python can be used to tackle commonly encountered regression and classification problems using linear regression, random forests, logistic regression and support vector machines. This is a continuation of the first two parts of our Python for Actuaries Webinar series, which introduced the core Python programming language and the Python data analysis library, pandas.

References: