Smart VDRs – It is all about Accuracy

17 January 2023

Last week we shared our blog on Smart VDRs with you. It included some accuracy stats on Smart Summaries, the key Machine Learning tool for extracting data points (such as dates, locations, etc.) from transaction documentation. Before that, we shared the accuracy stats of our Smart Redaction tool.

Today we would like to zoom in on these results and discuss what they mean for you.

AI powered redaction sounds good. So does fully automated redaction…

But none of that is of value if the results are not accurate: it needs to actually work. That is why, at Imprima, we share our test results with you. We seem to be unique in that. You may ask yourself why other vendors are not doing that. As one of our customers, who used Smart Redaction to redact sensitive data in their client’s data room, said: “The accuracy rates are high which was something we didn’t experience before when using other redaction tools – a welcome change indeed.”

Why is accuracy important? And what do we mean by “accuracy”?

First, let us reiterate the results we obtained:

Smart Redaction test results

Test results of Imprima Automatic Redaction on a combination of English, German, Italian and Dutch documents, randomly picked from Imprima VDRs. The objective was to automatically redact entities such as person names and addresses (see table), using Imprima’s machine-learning algorithm, a native tool in the Imprima VDR. As the table shows, very high recall is achieved for all entities in all languages: overall recall is higher than 90% for all items redacted. So, what does this mean? Recall is a measure of reliability: it tells you how many of the items that should be redacted are actually found. Although not shown in the table, this was achieved with very high precision as well (a measure of how few words were redacted that did not need to be redacted): only a tiny fraction of all words in the documents were “over-redacted”. If you would like more information on this experiment, please contact Imprima.

Smart Summaries test results

Test results on a Real Estate transaction with Dutch documents. As can be seen, both Precision (a measure of the absence of False Positives) and Recall (a measure of the absence of False Negatives) are high; F1 is a measure of overall accuracy, combining Recall and Precision. Average recall is 86%, meaning that 86% of the data points are found automatically. This was achieved after training on fewer than 100 examples per data point. When combined with manual review of the AI results (Technology Assisted Review), this results in both very significant time savings and higher reliability.
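For readers who want to see how Precision, Recall and F1 relate to each other, here is a minimal sketch in Python using the standard definitions. The function name and the counts are hypothetical and chosen purely for illustration; they are not Imprima's test data, and this is not the code used in the experiment.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard definitions of precision, recall and F1 from raw counts."""
    predicted = true_positives + false_positives   # everything the tool extracted
    actual = true_positives + false_negatives      # everything that should be found
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts (an assumption for illustration only):
# 100 data points exist, 86 are found correctly, 14 are missed,
# and 4 extracted values turn out to be wrong.
precision, recall, f1 = precision_recall_f1(
    true_positives=86, false_positives=4, false_negatives=14
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these illustrative numbers, recall is 86% because 86 of the 100 data points that exist were found, while precision is higher because only 4 of the 90 extracted values were wrong.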

Accuracy – what is it, and why is it important – Saving time and increased reliability

In essence it is all about how many data points can be automatically (and correctly!) retrieved. In other words: “Recall”¹.

So how does this all help you? If, as in the above cases, around 90% of the data points are automatically extracted or redacted, you only have to verify those results and add the missing ones.
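To make that concrete, the short Python sketch below works through the arithmetic. The project size of 200 data points and the function name are assumptions made up for this example; only the recall of around 90% comes from the results above.

```python
def review_workload(total_items, recall):
    """Split items into those found automatically and those left to add by hand."""
    found = round(total_items * recall)   # items the tool retrieves automatically
    missed = total_items - found          # items a reviewer still has to add
    return found, missed


# Hypothetical project: 200 data points to extract or redact, recall of 90%.
found, missed = review_workload(total_items=200, recall=0.90)
print(f"found automatically: {found}, to add manually: {missed}")
```

In this illustrative case the reviewer verifies 180 automatically found items and adds the 20 that were missed, rather than locating all 200 by hand.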

Obviously, that means that a lot of time is saved. Quoting yet another customer: “The AI-based redaction option is an excellent tool and saved us a tremendous amount of time.” In addition, the reliability of the end results is significantly improved. Humans are not infallible: research has shown that humans retrieving key data from documents achieve a recall of only 50%-85% (according to a 2011 scientific study by Grossman and Cormack).

Technology

The reason that we can achieve such high accuracy is that we use a Neural Net that:

Uses the context of the key data points rather than the data points themselves: only that way can it determine what a data point is about. That context covers not only the words that occur in it, but also the order in which they occur and their relevance to the key words to be extracted.
Allows training in one language while predicting in another. As a result, the training data from all languages benefits the accuracy of predictions in any language.

Ease of use

Finally, our technology is paired with – we dare say – a very slick user interface. One of our customers characterised it as “very powerful and intuitive” and further said that “it has certainly made our lives easier compared to other projects we have been involved in with other VDR providers.”

Conclusion

What conclusions might we draw? To begin with, we need to acknowledge that we asked ChatGPT to review this blog, and it came up with some useful suggestions. Why are we bothering to mention that? Because it serves to prove the point that not only is AI here to stay, but it is only going to become more pervasive, forming an integral part of our daily (working) lives. Whether that be in publicly available, general-purpose tools like ChatGPT, or in very specific, dedicated solutions like Imprima’s Smart VDR. We embrace this evolution and are committed to further developing and deploying AI to its fullest potential, helping our customers save time whilst making their lives easier.

Interested? Contact Imprima