Pieter van der Made – Executive Chairman
Imprima AIR, Artificial Intelligence for M&A Due Diligence
As extensively covered in the media, the application of Artificial Intelligence (AI) is increasing rapidly across a wide range of use cases: from natural language translation and virtual assistants to robotics, self-driving cars, image recognition, and beyond.
AI also has a lot of potential for M&A Due Diligence, where ever-increasing amounts of data have to be analysed in ever-decreasing amounts of time. In a typical deal, a Virtual Data Room (VDR) can contain thousands of documents that need to be analysed by interested parties (bidders and their reviewers). Even more documents need to be reviewed on the sell side when collating what needs to go into the VDR. Manually analysing all of these documents is not only time-consuming, but also limited in accuracy. This is where AI can help, as it has the potential to be much more efficient and significantly more accurate.
Imprima started working on AI in 2016, with the purpose of building a system that helps our clients deal with these problems, by making their M&A Due Diligence processes more streamlined, cost-effective, and robust. At this juncture, we would like to share our design considerations for this system.
Overview of AI technologies and their pros and cons
First, it is important to understand which AI technologies could be relevant for M&A Due Diligence:
- Rule-based systems (not to be confused with rule-based machine learning) – where a set of rules, set up by expert users, is used to make decisions, retrieve information or classify data.
- Machine learning – where generic algorithms are used that are agnostic to the problem they need to solve. These algorithms become problem specific when trained by feeding them with large amounts of data. Specific forms of machine learning which are relevant for M&A Due Diligence are:
  - Supervised machine learning, where a machine learning algorithm is trained using data sets that have been classified (or “labelled”) by domain experts (e.g. lawyers in the case of legal documents). Thereafter the algorithm is applied to new data, in order to automatically classify it.
  - Unsupervised machine learning, where the machine learning algorithm is trained with data that has not been labelled.
  - Semi-supervised machine learning, where the machine learning algorithm is trained with data that has been partially labelled.
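To make the distinction between a rule-based system and supervised machine learning concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: the rule patterns, the labels, the example sentences and the very simple word-count model are all assumptions chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical expert-written rules: label -> regular expression.
RULES = {
    "termination": re.compile(r"\bterminat(e|ion)\b", re.IGNORECASE),
    "confidentiality": re.compile(r"\bconfidential(ity)?\b", re.IGNORECASE),
}

def classify_by_rules(text):
    """Rule-based: return the first label whose rule matches, else None."""
    for label, pattern in RULES.items():
        if pattern.search(text):
            return label
    return None

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_examples):
    """Supervised: build one word-count profile per expert-assigned label."""
    profiles = {}
    for text, label in labelled_examples:
        profiles.setdefault(label, Counter()).update(tokenize(text))
    return profiles

def classify_by_model(profiles, text):
    """Pick the label whose profile gives the words of the text the most weight."""
    words = tokenize(text)
    scores = {
        label: sum(profile[w] for w in words) / sum(profile.values())
        for label, profile in profiles.items()
    }
    return max(scores, key=scores.get)

# Hypothetical labelled training data (invented for this example).
labelled = [
    ("Either party may end this agreement with thirty days notice", "termination"),
    ("This agreement ends automatically upon insolvency", "termination"),
    ("The recipient shall not disclose any information to third parties", "confidentiality"),
    ("Information received must be kept secret", "confidentiality"),
]
profiles = train(labelled)

new_clause = "The supplier must not disclose information it receives"
rule_result = classify_by_rules(new_clause)             # no rule keyword present
model_result = classify_by_model(profiles, new_clause)  # learned from examples
```

In this toy example the new clause contains none of the keywords the rules look for, so the rule-based classifier returns nothing, while the trained model assigns it to the label whose examples used similar wording. The rules, on the other hand, work immediately and need no labelled examples at all.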
While (supervised) machine learning is currently often promoted as a panacea in many situations, it is important to consider the pros and cons of the various AI techniques first:
- Rule-based AI has been proven to work, and is very pragmatic and versatile. New rules can be created and put to work instantaneously, and do not require intensive training first. However, it requires people with expertise with such systems to create the rules. On the other hand, once they have been set up, the resulting queries can be used without needing any AI or other technical expertise going forward.
- Machine learning is potentially more accurate (i.e. higher “recall” and “precision”), and requires neither sets of rules created by technical experts nor AI expertise to train the algorithm. Training can therefore be carried out by non-technical domain experts (e.g. lawyers).
- However, the current approaches promoted in the M&A Due Diligence space rely heavily on “supervised” machine learning, which is completely dependent on large training data sets that are supposed to be classified (“labelled”), for each different use case, by expert lawyers. That is a big downside, as domain experts (e.g. lawyers) simply do not have the time to provide such data for all possible use cases.
- Unsupervised machine learning is, in principle, able to identify groups of documents of a similar nature without any learning data, but it still requires user interaction to classify these groups.
- Semi-supervised machine learning also requires large amounts of data for training, but only a fraction of the data needs to be labelled, making it much more feasible for practical usage. Obviously, however, the results of semi-supervised machine learning may be less accurate than with supervised machine learning.
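The accuracy measures mentioned above, “precision” and “recall”, can be computed as in the short sketch below. Precision is the share of flagged documents that are truly relevant; recall is the share of truly relevant documents that were flagged. The document ids are invented for the example.

```python
def precision_recall(flagged, relevant):
    """Compute precision and recall from two sets of document ids.

    flagged:  documents the system marked as relevant
    relevant: documents a domain expert considers relevant
    """
    true_positives = len(flagged & relevant)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical review: 4 documents flagged, 5 truly relevant, 3 in common.
flagged = {1, 2, 3, 4}
relevant = {2, 3, 4, 5, 6}
precision, recall = precision_recall(flagged, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

In a Due Diligence context, low recall means that relevant documents are missed, while low precision means that reviewers spend time on documents that turn out to be irrelevant.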
How can we make optimal use of these technologies in M&A?
At Imprima, our vision is as follows:
- The key to making machine learning work is to find an approach that does not require extensive training data prior to obtaining meaningful results. Therefore, supervised machine learning as a stand-alone solution is not practical.
- Machine learning is powerful, because it is agnostic to the problem at hand: it can “learn anything”, provided sufficient data is supplied to train the algorithm. However, that is also its weakness: it does not make use of what we already know about the problem at hand. For instance, if we are looking for a provision of a certain type, in the vast majority of contracts that provision can be found in a clause with the same or very similar title. That does not mean a simple search for clauses with such a title will suffice: provisions that don’t have that title will be missed, while machine learning could be successful in identifying such provisions. It does mean, however, that combining “prior knowledge” with machine learning should provide the best of both worlds.
- For both these reasons, joint application of multiple AI techniques holds much promise, as we can make use of the advantages of all methods, while offsetting their disadvantages. As such, an approach can be designed that does not require extensive learning data sets, or any at all, while still providing optimal accuracy.
- As each data set analysed by our users is different, the algorithms are likely to need fine-tuning in each case. Supervised machine learning is the logical choice for this part of the process, provided that it is embedded in the workflow that the user executes in the Due Diligence process anyway: the user should not be required to go through a separate training step.
- In the case of M&A Due Diligence, the VDR providers can play an important role, because they provide direct access to the data as well as user behaviour (the users’ actions when analysing the data as part of the regular Due Diligence process). A precondition is that the VDR provider gives direct access to machine learning algorithms, without having to move data back and forth between the VDR and AI systems. Moving data back and forth is not only user-unfriendly; it also prevents the user from fine-tuning the AI algorithms while analysing the data during the regular Due Diligence process.
- We believe that simple is best, both in terms of AI and the User Interface (UI). The UI for applying AI needs to be dead simple; the user does not want to be bothered with the technical details of the underlying AI technology.
- The AI algorithms also need to be kept as simple as possible, so that they are fast and can be applied interactively, but also to avoid “overfitting”. Overfitting is a phenomenon where a machine learning algorithm fits the training data set almost perfectly, but fails to generalise to new data that differs from the training data only in insignificant ways. There are various ways to avoid overfitting (such as regularization), but the most effective and practical method is to choose the simplest algorithm that explains the data – a principle often referred to as “Occam’s Razor”. Improvement of AI results is often not obtained by more complex algorithms, but by more data: more raw data, as well as additional labelled data.
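The idea of combining “prior knowledge” with machine learning that is fine-tuned during the regular review workflow can be sketched as follows. This is a deliberately simple illustration under stated assumptions, not Imprima's implementation: the title rule, the class name `HybridClauseFinder`, the perceptron-style update and all example clauses are hypothetical.

```python
import re
from collections import Counter

# Hypothetical prior knowledge: the provision usually sits under this title.
TITLE_RULE = re.compile(r"change of control", re.IGNORECASE)

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class HybridClauseFinder:
    """Title rule (prior knowledge) plus word weights learned from feedback."""

    def __init__(self, prior_weight=2.0):
        self.prior_weight = prior_weight
        self.word_weights = Counter()  # starts empty: no training data needed

    def score(self, title, body):
        prior = self.prior_weight if TITLE_RULE.search(title) else 0.0
        learned = sum(self.word_weights[w] for w in set(tokenize(body)))
        return prior + learned

    def is_match(self, title, body):
        return self.score(title, body) > 0

    def feedback(self, title, body, relevant):
        """Perceptron-style update, applied only when the prediction was wrong."""
        if self.is_match(title, body) == relevant:
            return
        step = 1 if relevant else -1
        for w in set(tokenize(body)):
            self.word_weights[w] += step

finder = HybridClauseFinder()

# 1. The title rule works immediately, without any training.
found_by_title = finder.is_match("Change of Control", "Consent of the lender is required.")

# 2. A relevant clause under an unexpected title is missed at first ...
hidden = ("Miscellaneous",
          "If a third party acquires a majority of the shares, "
          "the lender may demand repayment")
missed_before = not finder.is_match(*hidden)

# 3. ... until the reviewer marks it as relevant during the normal review.
finder.feedback(*hidden, relevant=True)
similar = ("General",
           "The lender may demand repayment if ownership of the shares changes")
found_after = finder.is_match(*similar)

# 4. Rejecting a false positive adjusts the weights in the other direction.
noise = ("Notices", "The parties shall send notices by post")
finder.feedback(*noise, relevant=False)
noise_rejected = not finder.is_match(*noise)
```

The sketch starts producing results from the title rule alone, and the learned part improves only through corrections the reviewer makes anyway, so no separate training step is needed. The model is kept intentionally simple, in line with the preference for simple algorithms stated above; a real system would need a more careful scoring and update scheme.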
Introducing Imprima AI
Imprima is leading the way in AI adoption for M&A Due Diligence and other use cases. Our comprehensive AI functionality is available under the Imprima AI brand. It caters to a range of use cases, including document classification, information retrieval to create document summaries, legal clause extraction, detection of legal red flags in documents, and much more.