AI-Driven Redaction – Balancing Recall and Precision

In our previous blog, we explored how Large Language Models (LLMs) power the automatic redaction of sensitive information, emphasising the potential for high accuracy. Today, we’ll dive deeper into the evidence behind this claim, focusing on the critical metrics of recall and precision in our LLM-driven Smart Redaction technology.

Understanding Recall

[Figure: PII items precision table]

As highlighted in an earlier blog, Smart VDRs: It’s All About Accuracy, recall is the cornerstone of effective automated redaction. Recall is the proportion of terms requiring redaction that the tool actually identifies, so high recall means that nearly all sensitive terms are caught. Without it, the tool fails its primary purpose. We tested our LLM-driven Smart Redaction technology on a set of data room documents, using 5-fold cross-validation so that the result reflects performance across the whole test set rather than a single split. The experiment yielded an average recall of 93%, meaning the tool identified the vast majority of terms that needed redaction.
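To make the averaging step concrete, here is a minimal Python sketch of how recall could be computed per fold and then averaged across five folds. The function names and the per-fold counts are hypothetical and purely illustrative. They are not the figures from our experiment, and this is not our evaluation code.

```python
from statistics import mean


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of terms needing redaction that were actually redacted."""
    total_sensitive = true_positives + false_negatives
    if total_sensitive == 0:
        raise ValueError("fold contains no terms that need redaction")
    return true_positives / total_sensitive


def average_recall(folds: list[tuple[int, int]]) -> float:
    """Mean recall over folds, each given as (found, missed) counts."""
    return mean(recall(found, missed) for found, missed in folds)


# Hypothetical (found, missed) counts for five folds, for illustration only.
example_folds = [(90, 10), (95, 5), (92, 8), (94, 6), (89, 11)]

print(f"Average recall: {average_recall(example_folds):.0%}")
```

Averaging over folds smooths out the luck of any single split, which is why a cross-validated figure is a steadier estimate than one test run.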

The Role of Precision

While recall is critical, precision is equally important when assessing a model’s accuracy. Precision measures the proportion of redacted terms that genuinely needed redaction; the lower it is, the more terms have been redacted incorrectly (False Positives). High precision minimises unnecessary redactions, preserving document readability.

Achieving perfect recall and perfect precision simultaneously is not realistic in practice, even with cutting-edge AI, and pushing one up tends to pull the other down. For example, redacting every word in a document would guarantee 100% recall but render the document useless, because precision would fall to the small share of terms that were genuinely sensitive. Our LLM-driven tool optimises for high recall while maintaining strong precision, typically in the 80-90% range. In our recent test, precision was at the higher end of this range.
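The trade-off can be sketched in a few lines of Python. The example below treats a document as a set of term positions and compares a "redact everything" strategy with a selective one. The helper name, the 100-term document and the chosen positions are all assumptions made up for illustration.

```python
def precision_recall(redacted: set[int], sensitive: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for a set of redacted term positions."""
    true_positives = len(redacted & sensitive)
    precision = true_positives / len(redacted) if redacted else 0.0
    recall = true_positives / len(sensitive) if sensitive else 0.0
    return precision, recall


# Hypothetical 100-term document in which 5 terms are sensitive.
all_terms = set(range(100))
sensitive = {3, 17, 42, 58, 91}

# Strategy 1: redact every term. Nothing is missed, but almost
# everything that was redacted did not need to be.
everything = precision_recall(all_terms, sensitive)

# Strategy 2: redact selectively. One sensitive term (91) is missed
# and one harmless term (7) is redacted by mistake.
selective = precision_recall({3, 17, 42, 58, 7}, sensitive)

print("redact everything:", everything)
print("selective:", selective)
```

In this toy document, redacting everything gives perfect recall, but precision drops to the share of terms that were sensitive in the first place, which is close to zero for a typical document.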

What This Means in Practice

Let’s break down the real-world impact. Suppose precision is at the lower end, say 80%. This means 20% of redacted terms were incorrectly flagged. If 5% of a document’s terms require redaction (a reasonable estimate) and nearly all of them are caught, then roughly 6% of the document’s terms end up redacted, and only about 1% of the document’s terms are redacted in error. This minimal error rate ensures the document remains highly readable.
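The arithmetic behind that estimate can be checked with a short Python sketch. The function name and parameters are hypothetical. The 5% and 80% inputs come from the example above, and the 93% recall figure is the one reported earlier in this post.

```python
def erroneous_share(sensitive_share: float, precision: float, recall: float = 1.0) -> float:
    """Share of ALL document terms that are redacted by mistake.

    sensitive_share: fraction of terms that genuinely need redaction
    precision:       fraction of redacted terms that needed redaction
    recall:          fraction of sensitive terms that were redacted
    """
    correctly_redacted = sensitive_share * recall
    total_redacted = correctly_redacted / precision
    return total_redacted - correctly_redacted


# 5% sensitive terms, 80% precision, every sensitive term caught.
upper = erroneous_share(0.05, 0.80)

# Same document and precision, with 93% recall.
with_recall = erroneous_share(0.05, 0.80, recall=0.93)

print(f"{upper:.2%} of terms redacted in error (recall 100%)")
print(f"{with_recall:.2%} of terms redacted in error (recall 93%)")
```

Both cases land a little above 1% of the document’s terms, which is consistent with the "about 1%" estimate.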

Moreover, our tools allow users to easily review and undo erroneous redactions. Stay tuned for next week’s blog, where we’ll explore these features in detail. 

Conclusion

AI-driven redaction, powered by Large Language Models, offers significant time savings and reduces human error compared to traditional automation methods. As demonstrated last week, only LLM-based AI delivers the high accuracy required for reliable redaction. With recall above 90% and precision between 80% and 90%, our Smart Redaction technology strikes an effective balance, ensuring both security and usability.

Stay tuned for more insights next week! 

Are you looking for a VDR with fully integrated redaction software that leverages AI? Speak to our sales team or check out our Smart Redaction page.
