Time and error
Redacting documents can be time-consuming, particularly when dealing with a large volume of documents or complex information. Time pressure can arise from impending deadlines, or from the need to share information quickly while ensuring data privacy. When there isn’t enough time allocated for redaction, the risk of errors increases. Redaction failures have caused not only financial but also reputational damage, as the highly publicised Manafort and Giuliani US court cases have shown.
Even with ample time, manual errors remain a significant challenge. Even well-trained professionals can inadvertently miss sensitive information or make mistakes during redaction. The cause may be simple human oversight in complex documents, or fatigue and distraction in even the simplest ones. Either way, the result can be unintended disclosure of confidential data. And the more documents there are, the more likely it becomes that something will be missed, as information can be buried deep in very long documents within increasingly large data sets. Verizon reports that 74% of data breaches involve a human element. IBM reports that “for 83% of companies, it’s not if a data breach will happen, but when.”
The problem with traditional search methods
Your organisation may have moved on from purely manual techniques and may be using the search functions available in Word or Adobe. It may even be using, or looking into, “regular expressions” (regex). In practice, these don’t cut it either. Two examples show why:
- Finding names: Whilst this might sound simple, in practice it is not. You probably don’t know which names are in the documents, so making a list of names and searching for them in all documents is a partial solution at best.
And even if you have a list of all the names you need to redact, there will always be exceptions. Names may appear in forms that are not on your pre-defined list, e.g. James Johnson, J. Johnson, Jim Johnson, Mr. Johnson. Capturing all of those is a daunting task! There may also be spelling errors or other inaccuracies caused by low-quality scans, which simple search methods will not catch.
- Finding structured PII (such as telephone numbers, social security numbers etc.): Whilst this is something regex can capture, it opens up yet another can of worms. Telephone numbers and SSNs have different formats in each country, so writing an individual regex for each format is quite a bit of work. And again, there are likely to be spelling and OCR errors when dealing with low-quality scans. As with search tools, regex can only get you so far.
With the likelihood of human error, the incomplete solution provided by search and regex techniques, and the consequences of improper redaction, it seems all too risky to rely on traditional processes.
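To make the two examples above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample text, the names, the telephone numbers, the one-entry name list and the single US-style regex are all invented, and real tools and documents will differ. It only shows how an exact-match name search and a one-format regex can leave sensitive items behind.

```python
import re

# Hypothetical sample text; the names and numbers are made up for illustration.
text = (
    "James Johnson met J. Johnson's lawyer. Jim Johnson and Mr. Johnson "
    "were reached on 555-123-4567, +44 20 7946 0958 and 555-l23-4567."
)

# Exact-match search against a pre-defined list of names (an assumed list).
known_names = ["James Johnson"]
name_hits = [name for name in known_names if name in text]

# How many times the surname actually appears, in any form.
surname_mentions = text.count("Johnson")

# A single regex for one US-style telephone format (NNN-NNN-NNNN).
us_phone = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
phone_hits = us_phone.findall(text)

print(name_hits)         # ['James Johnson']
print(surname_mentions)  # 4
print(phone_hits)        # ['555-123-4567']
```

In this sample, the name list finds one of four mentions of the same person, and the regex finds one of three telephone numbers: it misses the UK-style number and the one where a scan error has turned the digit 1 into the letter l. Adding more list entries and more patterns would narrow the gap, but only for the variants you thought of in advance.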
Ready to throw in the towel?
Don’t! This is where the benefits of using AI alongside human action really come into play, which we will discuss next week. Stay tuned!