Five Things to Do with That Outlier Data of 2020
December 03, 2021
Insurers and actuaries looking to develop predictive models or set assumptions based on data collected during the pandemic are in a quandary. Use it? Lose it? Modify it?
Has time stood still for the past 20 months, or does it just feel that way?
We’re all aware of just how different the world is since the pandemic upended our regularly scheduled lives in early 2020. Coping with the effects of COVID-19 on society has imposed uncommon restrictions on us all — we’ve had to cut back on our car trips and postpone routine doctor visits, for instance. We’re not commuting into the office as often, hitting the gym, or going to the movies or restaurants as much as we did pre-pandemic.
The aberration in our behavior patterns presents quite a dilemma for insurers and actuaries, as they well know. Typically, both base their forecasts and future pricing models on recent historical data and trends, in many cases collected over the most recent three- to five-year period.
But these are atypical times. The usual rules that apply to behavioral and claims data collected during temporary aberrations — like natural disasters, for instance — cannot simply be applied to the pandemic data. (Well, they can, but as you’ll see, the results can get pretty messy.)
How is an auto insurer supposed to use data collected during the pandemic if we weren’t tooling around in our cars as often? What should life insurers do to set their rates when there’s been an unprecedented — but temporary — spike in deaths and the effect on future trends in mortality rates is highly uncertain? Should annuities be cheaper to reflect lower life expectancies?
No worries if you find your head spinning over the quandary. The problem of how to handle outlier data from the pandemic period is enormous — and it’s only going to grow worse as that data becomes the new norm. Below are five ideas for handling the data, along with the downside of each.
Include the data.
In this scenario, 2020 data is factored in and forecasts are made based on a new, post-pandemic “normal.” The idea is that people have now established new patterns of behavior because of the pandemic. They may continue to travel less, for instance, drive less, visit their doctors less frequently and prioritize life or health insurance spending more. If that’s the case, then factoring in the data makes sense, because it reflects the future landscape of insurance.
Disaster coverage already reflects a shift to a new normal, as climate change continues to increase floods, hurricanes and wildfires in the United States. Why not adapt your model in a similar way?
But here’s the downside: Most insurance systems aren’t set up to deal with the kind of massive outliers that disaster insurers routinely handle. You’ll probably still have to make manual adjustments to the data using methods that may be new to you. (Realistically, the slope of the path out of the depths of the pandemic can’t be extrapolated into the future, so you can set this idea aside.)
Remove the data.
Looking over the past 20 months, it’s clear the relevant data set is so far outside others collected over the past 20 years that the easiest solution might be simply to remove it and splice together numbers from early 2020 and late 2021. That way, there’s no impression of a new trend that may not actually exist.
The downside? It may not reflect our new reality. (And depending on where you live and the actions that occurred during the pandemic, the splicing would vary.) On top of that, if you splice together, say, August 2021 as the month ‘following’ February 2020, you’ll invalidate all your seasonality models and trending models and require a completely new treatment for inflation.
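To see why splicing upsets seasonality, here is a minimal Python sketch. The date range and the removed window (March 2020 through July 2021) are assumptions chosen to match the February-to-August example above, and the helper names are hypothetical. It checks which months in the spliced series no longer sit where a naive 12-step seasonal index would expect them.

```python
from datetime import date


def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def splice(months, gap_start, gap_end):
    """Drop every month inside [gap_start, gap_end] and close the gap."""
    return [d for d in months if not (gap_start <= d <= gap_end)]


# Hypothetical history: January 2019 through December 2021.
months = month_range(date(2019, 1, 1), date(2021, 12, 1))

# Assumed pandemic window to remove: March 2020 through July 2021.
kept = splice(months, date(2020, 3, 1), date(2021, 7, 1))

# A naive seasonal model treats position i as calendar month (i % 12) + 1.
# After splicing, that assumption fails for every month after the gap.
misaligned = [d for i, d in enumerate(kept) if (i % 12) + 1 != d.month]

print(len(kept), "months kept;", len(misaligned), "misaligned")
print("first misaligned month:", misaligned[0].isoformat())
```

In this sketch August 2021 lands in the slot a seasonal index would read as March, which is the kind of distortion that forces seasonality and trend models to be rebuilt.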
Modify the data.
Consider the quandary of auto insurers. In a normal year, there might be 12 accidents for every 100 policies, and you can use that percentage to forecast for next year. But people drove far less in 2020 and there were fewer automobile accidents. Do you exclude that data or modify it in some way? What about the accountant who drove less frequently and the delivery person who drove more?
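One possible modification, sketched below in Python, is to rescale the 2020 accident frequency by miles driven. This is only an illustration: the policy counts, claim counts and mileage figures are invented, the function names are hypothetical, and the assumption that accidents scale linearly with mileage is a simplification that would need testing against real data.

```python
def claims_per_100_policies(claims, policies):
    """Observed accident frequency per 100 policies."""
    return 100.0 * claims / policies


def mileage_adjusted_frequency(claims, policies, avg_miles, baseline_miles):
    """Rescale observed frequency to a baseline mileage level.

    Assumes (as a simplification) that accidents scale linearly
    with the average miles driven per policy.
    """
    observed = claims_per_100_policies(claims, policies)
    return observed * baseline_miles / avg_miles


# Hypothetical figures, for illustration only.
normal_year = claims_per_100_policies(claims=1200, policies=10000)    # 12.0
observed_2020 = claims_per_100_policies(claims=900, policies=10000)   # 9.0
adjusted_2020 = mileage_adjusted_frequency(
    claims=900, policies=10000, avg_miles=9000, baseline_miles=12000
)                                                                     # 12.0

print(normal_year, observed_2020, adjusted_2020)
```

A single portfolio-wide mileage factor would not separate the accountant who drove less from the delivery person who drove more; that would need the adjustment applied per segment.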
Look at health insurance. With claims unusually low, is that reflective of reality or are there “stored up” claims or health issues that have become worse? How do you allow for the impact of the pandemic on non-COVID health services, such as missed cancer screening appointments? Will insurers find mental health claims increasing in importance in the coming years?
The downside when tinkering with data is that you introduce a new dimension into how the numbers are used. Doing so sets a precedent, making such adjustments a normal part of how the formulas are employed in the future.
Change the structure of data from absolute to relative.
A general insurer typically creates models drawn from a large number of component models. Perhaps you could restructure some of those combinations using different techniques specifically designed for COVID modeling. Instead of predicting car accidents or deaths for a cohort, for instance, you could predict the variation of that cohort against the population average.
The downside? It’s not as simple as it sounds. First you need to select cohorts in your population you believe this would work for, then you need to build new relative variation prediction models. To use the new models in the future, you would need to predict the population average so that you have something to apply the relative variation prediction against. Phew. The moving parts alone in this new system should be a warning flag in and of themselves.
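The moving parts can be sketched in a few lines of Python. Everything here is an assumption for illustration: the rates are invented, the function names are hypothetical, and a simple average of past relativities stands in for whatever relative variation model an insurer would actually build.

```python
def average_relativity(history):
    """Average ratio of cohort rate to population rate across past periods.

    history is a list of (cohort_rate, population_rate) pairs.
    """
    ratios = [cohort / population for cohort, population in history]
    return sum(ratios) / len(ratios)


def forecast_cohort_rate(history, projected_population_rate):
    """Apply the cohort's relativity to a separately projected population rate."""
    return average_relativity(history) * projected_population_rate


# Hypothetical claim frequencies for one cohort versus the whole population.
# The absolute rates fall in the third period (a stand-in for 2020),
# but the cohort stays about 1.25 times the population average throughout.
history = [(0.15, 0.12), (0.15, 0.12), (0.10, 0.08)]

relativity = average_relativity(history)

# The population rate for next year still has to come from somewhere;
# 0.11 is an assumed projection, not a result.
next_year = forecast_cohort_rate(history, projected_population_rate=0.11)

print(round(relativity, 4), round(next_year, 4))
```

The relativity looks stable in this toy example, but the forecast is only as good as the projected population average, which is the second model the paragraph above warns about.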
Use set theory, data classification or fuzzy logic on the data.
Why not let your imagination run free and use an entirely new way of looking at sets of cohorts? Take the population identifying as teachers. Within the set, you could pick out the 60 percent with certain familiar characteristics for whom you have a wealth of data, like high income, medium credit, multi-family and no children, and classify them as “urban professionals.” The remaining 40 percent become a population you estimate for using models that load the outcomes for your lower confidence.
Or perhaps you use fuzzy logic and set “membership” as a way to deal with the situation. An insurer might define risk-set populations by those who have common characteristics, like “professional staying at home” or “long commuter” or “weekend getaway driver.” For health insurance, it could be the “committed fitness enthusiast” or the “too-busy-to-exercise stressed-out-worker.”
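As a rough illustration of fuzzy set membership, the Python sketch below gives a policyholder a degree of membership between 0 and 1 in two of the risk sets named above, then blends per-set claim frequencies by those degrees. The thresholds, the frequencies and the function names are all hypothetical; nothing here is a tested insurance method.

```python
def ramp(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def memberships(commute_miles, home_days_per_week):
    """Degrees of membership in two illustrative risk sets (assumed thresholds)."""
    return {
        "long commuter": ramp(commute_miles, 10, 40),
        "professional staying at home": ramp(home_days_per_week, 1, 4),
    }


def blended_frequency(membership, set_frequencies):
    """Membership-weighted average of per-set claim frequencies.

    Returns None when the policyholder belongs to none of the sets.
    """
    total = sum(membership.values())
    if total == 0:
        return None
    weighted = sum(membership[name] * set_frequencies[name] for name in membership)
    return weighted / total


# Hypothetical policyholder: 25-mile commute, 2.5 days a week at home.
m = memberships(commute_miles=25, home_days_per_week=2.5)

# Assumed claims per 100 policies for each risk set.
frequencies = {"long commuter": 14.0, "professional staying at home": 8.0}

estimate = blended_frequency(m, frequencies)
print(m, estimate)
```

Unlike a hard classification, a policyholder here can be half "long commuter" and half "professional staying at home," which suits hybrid working patterns but adds thresholds that someone has to choose and defend.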
The downside? There’s almost no tested application of decision-making in the insurance sector for any of the above. Risk is everywhere.
These are tough decisions to make under extreme circumstances. With each passing day, the pressure grows to make decisions on forecasts and rates as competitors rush to market before year’s end. Time may feel like it’s standing still, but moving quickly is paramount.
It’s a challenging situation. But what isn’t these days?
© Copyright 2021. The views expressed herein are those of the author(s) and not necessarily the views of FTI Consulting, Inc., its management, its subsidiaries, its affiliates, or its other professionals.