Mind the Machines: Time to Explore the Potential of Machine Learning

Why should law firms explore machine-learning software?

Law firms should use machine learning software so that they can provide better service to their clients and so that they can make more money. They can serve their clients better because they can forecast and categorize legal fees or tasks. They can make more money because they can sell more work or handle existing work more efficiently.

To explain this potential, we will address law firm data, the specialized computer programs that manipulate data, and the output models they generate. (This article refers to law firms but its key points have nearly equal applicability to large law departments.)

Law Firm Data Sets

Machine-learning software requires data, and law firms have data sets aplenty. They are awash in information about many things, such as the matters their lawyers have worked on. One of the deepest pools of that information is collected by a firm's time and billing system.

Think of a data set for matters as a spreadsheet where each row is one matter, and it has many columns of information about that matter. For instance, the data likely includes the start date of the matter, the type of matter (litigation, real estate, etc.), and the total fees incurred. A different data set might have to do with clients, such as how many different types of matters have been assigned to the law firm in the past two years, which partners have worked on the client's matters, how long the firm has represented the client, etc.
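To make the spreadsheet picture concrete, here is a minimal sketch in Python. The column names and every value are invented for illustration; a real extract from a time and billing system would look different.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical matter records: one dict per matter (a spreadsheet row),
# one key per column. All names and values are made up.
matters = [
    {"matter_id": 1, "start_date": "2015-01-12", "matter_type": "litigation", "total_fees": 48000},
    {"matter_id": 2, "start_date": "2015-03-02", "matter_type": "real estate", "total_fees": 12500},
    {"matter_id": 3, "start_date": "2015-06-20", "matter_type": "litigation", "total_fees": 72000},
    {"matter_id": 4, "start_date": "2016-02-08", "matter_type": "real estate", "total_fees": 9500},
]

# Collect the fee column by matter type, then average each group.
fees_by_type = defaultdict(list)
for row in matters:
    fees_by_type[row["matter_type"]].append(row["total_fees"])

average_fees = {matter_type: mean(fees) for matter_type, fees in fees_by_type.items()}
print(average_fees)
```

Even this simple summary (average fees by matter type) relies on the row-and-column layout that machine-learning software also expects.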

Starting with such internal data, firms can enhance the set with information from other sources, either elsewhere within the firm or outside it. For clients, supplemental information might include whether the client is publicly traded or whether it has an in-house law department. Or, the years of experience or academic degrees of timekeepers could be mixed in from the firm's HR database; revenue, number of employees, and SIC codes of clients can be "mashed up" from external sources.
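A "mash-up" of this kind amounts to joining two tables on a shared key. The sketch below assumes a hypothetical client ID that appears in both the firm's records and the outside source; the field names and values are illustrative only.

```python
# Hypothetical internal client records.
clients = [
    {"client_id": "C001", "name": "Acme Corp", "years_represented": 7},
    {"client_id": "C002", "name": "Birch LLC", "years_represented": 2},
]

# Hypothetical supplemental facts from an outside source, keyed by the same ID.
external_facts = {
    "C001": {"publicly_traded": True, "has_law_department": True, "employees": 5200},
}

# Placeholder used when the outside source has nothing on a client.
unknown = {"publicly_traded": None, "has_law_department": None, "employees": None}

# Merge each internal record with its external facts (or the placeholder).
enriched = [
    {**client, **external_facts.get(client["client_id"], unknown)}
    for client in clients
]

for row in enriched:
    print(row)
```

Clients with no match keep their internal columns and receive empty values for the new ones, which is exactly the kind of gap the data-checking step has to handle later.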

Whatever the kind, source, or quality of the data sets, each of them harbors insights that help managers within the firm (or law department) direct lawyers and paralegals more effectively and provide more value to their clients. Machine learning software, often referred to as "algorithms," finds patterns in data far beyond what humans can detect. By shaping complex data and calculating the relationships within it, the software extracts from data sets the kinds of management insights that benefit lawyers and their clients.

Machine Learning Software

It is possible to do some limited forms of machine learning, such as multiple regression, with Excel or other spreadsheet programs, but that is not the software solution of choice. Firms can license software that carries out machine-learning analyses, such as SPSS, SAS, and Mathematica. But there are also extremely powerful and completely free open-source choices such as the programming languages Python and R (yes, the language is called "R").

R comes with a vast, foundational set of capabilities known as "base R." It can handle almost anything mathematical or graphical. Moreover, R has thousands of "packages" that do various tasks (with more coming available all the time). Think of R packages as free add-on modules that can be downloaded easily.

Whatever the software, it can "fit models" where the data is in a spreadsheet format, and the data can be numeric, categorical (industry is a categorical element), or binary (publicly traded or not). The software (or, in R, additional packages) can test whether the data is appropriate for machine learning and help users correct the data, such as when there are missing or highly unusual values. Then too, some machine-learning software does "ensemble learning," where it quickly produces hundreds or thousands of models and combines their results, for example by averaging them. Finally, there is a whole array of R packages or capabilities within packages (and within the other choices) to display the results of the model fitting.
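As a rough illustration of the data preparation described above, the following Python sketch fills a missing numeric value with the median of the known values and converts a categorical column into 0/1 indicator columns. Both are common conventions, not the only options, and the rows and field names are hypothetical.

```python
from statistics import median

# Hypothetical rows mixing categorical, binary, and numeric columns.
rows = [
    {"industry": "banking", "publicly_traded": True, "total_fees": 40000.0},
    {"industry": "retail", "publicly_traded": False, "total_fees": None},  # missing value
    {"industry": "banking", "publicly_traded": False, "total_fees": 55000.0},
    {"industry": "energy", "publicly_traded": True, "total_fees": 61000.0},
]

# Assumption: fill a missing fee with the median of the known fees.
known_fees = [r["total_fees"] for r in rows if r["total_fees"] is not None]
fill_value = median(known_fees)

# One indicator column per industry, in a fixed (sorted) order.
industries = sorted({r["industry"] for r in rows})

def to_numeric(row):
    """Turn one row into a list of numbers an algorithm can use."""
    fees = row["total_fees"] if row["total_fees"] is not None else fill_value
    indicators = [1.0 if row["industry"] == name else 0.0 for name in industries]
    return indicators + [1.0 if row["publicly_traded"] else 0.0, fees]

features = [to_numeric(r) for r in rows]
for feature_row in features:
    print(feature_row)
```

Each output row now holds only numbers: three industry indicators, one publicly-traded flag, and the fee amount.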

Output Models

What can machine-learning algorithms produce? What the algorithms generate is called a "model." Once you have a model, you can extract information from it.

A model often takes in data and makes predictions regarding new cases, clients, matters, or other items. Think of a model as the software learning on a "training set" of data that has been labeled, such as settled for less than $10,000 or not, and applying that learning to predict something (maybe total fees) for a new case or example. With multiple regression, naïve Bayes, or neural nets, prediction is a common output. For example, given a few dozen instances of a type of lawsuit, any of those machine-learning algorithms could predict the likely cost of a new matter, once sufficient information is available, and tell you how probable that cost would be.
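A stripped-down version of fee prediction can be sketched with ordinary least squares and a single predictor. This is simple rather than multiple regression, chosen to keep the arithmetic visible; the predictor (number of depositions) and all figures are assumptions made up for the example.

```python
from math import sqrt

# Hypothetical training set: depositions taken and total fees for past lawsuits.
depositions = [2, 4, 6, 8]
total_fees = [30000, 50000, 65000, 90000]

n = len(depositions)
mean_x = sum(depositions) / n
mean_y = sum(total_fees) / n

# Ordinary least squares for one predictor: slope = Sxy / Sxx.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(depositions, total_fees))
    / sum((x - mean_x) ** 2 for x in depositions)
)
intercept = mean_y - slope * mean_x

def predict_fees(number_of_depositions):
    """Predicted total fees for a new matter."""
    return intercept + slope * number_of_depositions

# Residual standard error: a rough gauge of how far actual fees
# tend to fall from the fitted line.
residuals = [y - predict_fees(x) for x, y in zip(depositions, total_fees)]
residual_std = sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(predict_fees(5), residual_std)
```

The fitted line supplies the prediction, and the spread of the residuals gives a first indication of how much confidence that prediction deserves. With only four matters the estimate is fragile, which is why the article's "few dozen instances" is a more realistic minimum.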

Other models can also classify new observations into the most appropriately fitting group. With several types of algorithms, including K-Nearest Neighbor or Support Vector Machines, you can classify clients or other data. You would be able, for example, to identify publicly traded clients or clients likely to reach a certain realization level.
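The idea behind K-Nearest Neighbor fits in a few lines: find the labeled examples closest to the new one and let them vote. In this sketch the two features (annual fees in thousands of dollars and matters per year), the labels, and the data are all hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labeled clients: (annual fees in $000s, matters per year) -> label.
training = [
    ((120.0, 10.0), "high realization"),
    ((110.0, 8.0), "high realization"),
    ((130.0, 12.0), "high realization"),
    ((30.0, 3.0), "low realization"),
    ((25.0, 2.0), "low realization"),
    ((40.0, 5.0), "low realization"),
]

def classify(point, k=3):
    """Label a new client by majority vote among its k nearest neighbors."""
    nearest = sorted(training, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((115.0, 9.0)))
print(classify((35.0, 4.0)))
```

In practice the features would usually be rescaled first so that the one with the largest numbers does not dominate the distance; that step is omitted here for brevity.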

Other varieties of machine-learning software do not require labels. Their models cluster the data into groupings that will reveal something. For example, they might cluster a firm's clients by profitability. The K-Means algorithm can do this. And with Principal Components Analysis you can aggregate "variables" and find out which of them are more influential.
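Clustering by profitability can be illustrated with a bare-bones K-Means on a single variable. The profit margins below are invented, and starting the two centers at the smallest and largest values is a simplifying assumption; standard implementations typically choose starting centers at random.

```python
def k_means_1d(values, centers, max_iterations=100):
    """Minimal one-dimensional K-Means: assign each value to the nearest
    center, recompute the centers, and repeat until nothing changes."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical profit margins for six clients. No labels are supplied.
margins = [0.12, 0.15, 0.10, 0.42, 0.38, 0.45]

centers, clusters = k_means_1d(margins, centers=[min(margins), max(margins)])
print(centers)
print(clusters)
```

The algorithm is never told which clients are "profitable"; the two groups emerge from the data alone.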

The underlying mathematics that the software applies to produce predictions, clusters, or classifications is arcane. It helps to understand some aspects of it so you can appreciate the strengths and limitations of machine learning algorithms, but you certainly do not need to know linear algebra or matrix decomposition or master n-dimensional hyperspaces.

You do, however, have to make some decisions about the settings of algorithms, known as "parameters." To give one example, if you implement a neural net, you need to specify the learning rate (or accept the default rate parameter). When you code the software with a handful of parameters, it will compute everything under the hood, and with the small datasets of law firms, produce the model and results in less than 10 seconds.
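To see why a setting such as the learning rate matters, consider this toy sketch. It trains a single weight by gradient descent, which is a drastic simplification of a neural net, on made-up data where the right answer is known to be 2. The function name and the two learning rates are chosen purely for illustration.

```python
# Made-up data in which each target is exactly twice its input.
inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]

def train(learning_rate, steps=50):
    """Fit target = weight * input by gradient descent on mean squared error."""
    weight = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(
            2 * x * (weight * x - y) for x, y in zip(inputs, targets)
        ) / len(inputs)
        weight -= learning_rate * gradient
    return weight

print(train(0.05))  # a modest rate settles close to 2
print(train(0.3))   # too large a rate overshoots further on every step
```

A modest rate converges on the right weight, while an overly large one sends the estimate further away with each step. Default settings exist to spare users this trial and error, but they are not guaranteed to suit every data set.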

If machine learning software is not familiar to your lawyers, you might start with an introduction at an off-site or practice group meeting. To invest in this broad set of tools, lawyers need to appreciate its potential, see how it works, understand its limitations, and shed any impression that it is beyond mere mortals' comprehension.

As a next step, your firm might do a pilot study where you take a data set and apply several of the algorithms to it. For instance, you might collect data about the associates you had two years ago and use it to predict which associates will leave the firm next year.

During the next five years, machine learning software, a domain of the loosely-defined field of "artificial intelligence," is going to make its mark in the management toolbox of progressive law firms and law departments. Firms that wisely collect data and astutely choose the programming algorithm that suits it will surpass their competitors who rely on experience or impressions. Machine learning algorithms will become mainstays in the emerging world of law firm data analytics.

Rees Morrison is a principal with Altman Weil. One of his specialties is data analytics for law firms and corporate law departments. Contact him at rwmorrison@altmanweil.com.

This article originally appeared in Law Technology News, October 21, 2016. Copyright 2016. ALM Media Properties, LLC. All rights reserved.
