The Impact of Automated Exploratory Data Analysis in Litigation
When litigation matters require complex data analytics, data scientists often face large volumes of data with little to no information about its contents, usually under tight deadlines. In such cases, exploratory data analysis (EDA) can be an effective tool to gain a thorough understanding of the data and how it can be used.
EDA starts with an initial assessment of the data’s structure and quality to better inform how it can be organized. Once the initial assessment has been completed, EDA moves on to look for illogical conditions that provide clues to the overall quality and reliability of the data. Data analytics experts use computer code to automate a large portion of the EDA process, producing rapid and precise results. Without automated EDA, gleaning insights across a large number of variables of unknown type and meaning would be challenging and time-consuming.
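As a rough illustration of what an automated first pass might look like, the sketch below profiles each field of a data set: what kinds of values it holds, how much of it is blank, and how many distinct values appear. It is a minimal example using only the Python standard library, not a description of any particular reporting package. The function names `infer_kind` and `profile_columns` are hypothetical, and the `YYYY-MM-DD` date format is an assumption.

```python
from collections import Counter
from datetime import datetime


def infer_kind(value):
    """Classify one raw string as 'missing', 'number', 'date', or 'text'."""
    text = value.strip() if value is not None else ""
    if text == "":
        return "missing"
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        # Assumed date format; real data often mixes several formats.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def profile_columns(rows):
    """Summarize each column of a list of dict rows (e.g. from csv.DictReader)."""
    profile = {}
    if not rows:
        return profile
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        kinds = Counter(infer_kind(v) for v in values)
        present = [v.strip() for v in values if infer_kind(v) != "missing"]
        profile[column] = {
            "kinds": dict(kinds),  # count of values per inferred kind
            "missing_share": kinds["missing"] / len(values),
            "distinct": len(set(present)),  # distinct non-blank values
        }
    return profile
```

A field whose values split across several kinds, or whose `missing_share` is close to 1, is a natural candidate for closer review before any analysis relies on it.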
Automated EDA and the review of standard reporting packages provide at least two key insights that make them a smart first step upon acquisition of any large data set. First, the reports may reveal that, while the data set appears robust at the outset, many fields are not viable for analysis. Second, a review of date fields may show that large, critical portions of a data set are missing. Either of these issues, if not discovered at the outset and quickly addressed with EDA, could result in costly delays and unnecessary labor at later stages of an analysis.
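The date-field review described above can be sketched as a simple coverage check: count records per calendar month and list the months between the earliest and latest record that have none. This is a minimal illustration under stated assumptions. The function name `missing_months` is hypothetical, the dates are assumed to be parsed already, and monthly granularity is an arbitrary choice that may be too coarse or too fine for a given matter.

```python
from collections import Counter
from datetime import date


def missing_months(dates):
    """Return (year, month) pairs with no records between the first and last date."""
    counts = Counter((d.year, d.month) for d in dates)
    if not counts:
        return []
    year, month = min(counts)  # earliest month that has any record
    last = max(counts)         # latest month that has any record
    gaps = []
    while (year, month) <= last:
        if (year, month) not in counts:
            gaps.append((year, month))
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps


# Illustrative dates only, not drawn from any real data set.
sample = [date(2019, 11, 3), date(2019, 12, 9), date(2020, 3, 1)]
print(missing_months(sample))
```

An empty result does not prove the data is complete; it only means every month in the observed range has at least one record. Unusually low counts in a month can matter as much as outright gaps.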
Reprinted with permission from the July 27, 2020 edition of Legaltech News. © 2020 ALM Media Properties, LLC. All rights reserved.