
Personally Identifiable Information Redaction: Detect and Redact PII at Scale

Automatically detecting and redacting personally identifiable information is fundamental to GDPR compliance, DSAR response preparation, FOI disclosures, and any workflow that involves sharing documents containing personal data. At volume, manual redaction is the bottleneck.

What is Personally Identifiable Information?

Personally identifiable information (PII) is any data that can be used — either alone or in combination with other information — to identify a specific individual. The concept originated in US privacy law but is now used globally as shorthand for the personal data that privacy regulations require organisations to protect.

What counts as PII is context-dependent. A person's job title alone is not PII, but the same job title combined with a company name and a date of birth may identify them. Modern privacy regulations recognise this and treat identifiability as the key test, not any specific data type.

Types of Personally Identifiable Information

PII falls into two broad categories based on how it identifies a person:

Direct Identifiers

Data that explicitly names or identifies a person, standing alone:

  • Full name
  • National insurance or PPS number
  • Passport or driving licence number
  • Email address
  • Phone number
  • Bank account and card numbers
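Direct identifiers often follow recognisable formats, which is what makes pattern matching a useful first pass. The sketch below is a minimal illustration, not a description of any particular product: the pattern set, the `find_direct_identifiers` name, and the UK-style formats are assumptions chosen for the example, and real formats have more variants than these expressions cover.

```python
import re

# Illustrative patterns only (assumed for this example).
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # UK National Insurance number shape: two letters, six digits, suffix A-D.
    "ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
    # Loose UK mobile-style number starting with 0 or +44.
    "phone": re.compile(r"(?:\+44\s?|\b0)\d{4}\s?\d{6}\b"),
}


def find_direct_identifiers(text):
    """Return (label, start, end, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


sample = "Contact jane.doe@example.com or 07700 900123. NI: QQ 12 34 56 C."
for label, start, end, value in find_direct_identifiers(sample):
    print(label, start, end, value)
```

Names are deliberately absent from the pattern set: they have no fixed format, so detecting them generally needs contextual analysis rather than a regular expression.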

Indirect Identifiers

Data that identifies a person only when combined with other information:

  • Date of birth
  • Postcode or full address
  • IP address
  • Device identifiers and cookies
  • Employment history
  • Physical descriptions and photographs
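To see why combinations matter, the following sketch counts how many records in a small, made-up dataset become unique as more indirect identifiers are considered together. The records, field names, and the `unique_records` helper are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical records: no field names anyone directly.
records = [
    {"job_title": "Analyst", "postcode": "AB1", "birth_year": 1985},
    {"job_title": "Analyst", "postcode": "AB1", "birth_year": 1990},
    {"job_title": "Analyst", "postcode": "CD2", "birth_year": 1985},
    {"job_title": "Manager", "postcode": "AB1", "birth_year": 1990},
    {"job_title": "Manager", "postcode": "CD2", "birth_year": 1985},
    {"job_title": "Manager", "postcode": "CD2", "birth_year": 1990},
]


def unique_records(records, fields):
    """Count records whose combination of `fields` is shared with no other record."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)


print(unique_records(records, ["job_title"]))                            # 0
print(unique_records(records, ["job_title", "postcode"]))                # 2
print(unique_records(records, ["job_title", "postcode", "birth_year"]))  # 6
```

In this toy data no single field singles anyone out, yet all six records are unique once the three fields are combined, which is why redacting only the obvious identifiers can leave a person identifiable.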

Sensitive Personally Identifiable Information

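A subset of PII requires additional protection because of the harm its disclosure could cause. This sensitive PII overlaps with, but is not identical to, GDPR's “special category data” under Article 9. Financial account details, for example, are widely treated as sensitive but are not an Article 9 category: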

  • Health records and medical information
  • Biometric data (fingerprints, facial recognition, voice prints)
  • Genetic data
  • Racial or ethnic origin
  • Political opinions, religious beliefs, trade union membership
  • Sex life and sexual orientation
  • Financial account details and credit information

Personally Identifiable Information and GDPR

GDPR does not use the term “personally identifiable information” — it uses “personal data”, which is deliberately broader. Article 4(1) defines personal data as “any information relating to an identified or identifiable natural person”. This covers everything traditionally considered PII, plus a wide range of indirect identifiers that US privacy law historically excluded.

For practical purposes, all PII is personal data under GDPR. The difference matters when designing systems: a PII-focused approach can miss data that GDPR treats as personal, creating compliance risk. Modern redaction tooling must recognise the full scope of GDPR personal data, not just traditional PII categories.

UK GDPR and the Data Protection Act 2018

Following Brexit, the UK retained GDPR's core provisions as UK GDPR, supplemented by the Data Protection Act 2018. The definition of personal data is the same as under EU GDPR, and the obligations around redaction for DSAR responses, data sharing, and disclosures are materially identical. The GDPR redaction page covers the regulatory framework in detail.

Special Category Data Under Article 9

GDPR Article 9 creates an enhanced protection regime for special category data: racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to identify a person, health data, and data about sex life or sexual orientation. Processing special category data requires both a lawful basis under Article 6 and a separate condition under Article 9. When redacting documents, special category data demands particular care: accidental disclosure creates greater regulatory risk and potential harm to the data subject than disclosure of ordinary personal data.

What Personally Identifiable Information Should Be Redacted?

The specific PII that requires redaction depends on the context. There is no universal rule — the legal basis, purpose, and recipient all shape what must be removed before disclosure.

DSAR Responses

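When responding to a data subject access request, third-party personally identifiable information must be redacted unless disclosure is lawful, for example because the third party has consented or because it is reasonable in the circumstances to disclose without their consent. This includes names, contact details, and identifiers of anyone other than the requester.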
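As a rough illustration of that default, the sketch below keeps identifiers that belong to the requester and masks everything else that was flagged. It assumes detection has already produced non-overlapping character offsets; the `redact_third_parties` function and its parameters are hypothetical names for this example, and whether a given third-party detail may lawfully be disclosed remains a human judgment the code does not make.

```python
def redact_third_parties(text, detections, requester_values):
    """Mask flagged spans unless the flagged text belongs to the requester.

    detections: non-overlapping (start, end) offsets flagged as personal data.
    requester_values: identifiers known to belong to the requester.
    """
    keep = {value.lower() for value in requester_values}
    pieces = []
    cursor = 0
    for start, end in sorted(detections):
        pieces.append(text[cursor:start])
        value = text[start:end]
        # The requester's own data stays; everyone else's is masked by default.
        pieces.append(value if value.lower() in keep else "[REDACTED]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Alice Smith emailed Bob Jones about the claim."
bob = text.index("Bob Jones")
detections = [(0, len("Alice Smith")), (bob, bob + len("Bob Jones"))]
print(redact_third_parties(text, detections, ["Alice Smith"]))
```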

FOI Responses

Public bodies responding to freedom of information requests must redact personal data covered by statutory exemptions, commercially sensitive information, and other exempt material before release.

Data Sharing with Third Parties

Sharing documents with external parties — auditors, consultants, regulators — requires redaction of personal data that the recipient has no legitimate need to see, under the GDPR principle of data minimisation.

Document Publication

Documents published externally — research papers, annual reports, case studies — must have personal data redacted unless explicit consent has been obtained or another lawful basis applies.

Internal Document Distribution

Circulating sensitive documents within an organisation may require redaction of personal data that recipients do not need for their role, particularly special category data.

Manual vs Automated PII Redaction

The economics of PII redaction change sharply with document volume. A single document can be manually reviewed and redacted in a reasonable time. Hundreds or thousands of documents cannot — not consistently, not within regulatory deadlines, and not without significant specialist cost.

Manual Redaction

  • Hours to days per document set
  • Inconsistent redaction across reviewers
  • Risk of missed PII (false negatives)
  • Over-redaction through caution (false positives)
  • Limited audit trail for defence

Automated PII Detection & Redaction

  • Minutes per document set
  • Consistent rule application
  • Comprehensive PII detection
  • Human reviewer confirms edge cases
  • Full audit trail for every decision

How ComplyLoft Redaction Works

The ComplyLoft Redaction tool automates the detection and redaction of personally identifiable information across document sets. It combines pattern matching with contextual analysis to identify PII in any position — structured fields, free text, tables, images with OCR — and applies consistent redaction rules across every file.

  • Upload individual documents or bulk document sets for processing
  • Automated detection of direct and indirect PII, including special category data
  • Customer-defined redaction rules to tailor detection to your specific document types
  • Human review step: confirm or remove flagged redactions before applying
  • Permanent redaction that removes the underlying data — redactions cannot be reversed
  • Full audit trail documenting every redaction decision and the rule that triggered it

ComplyLoft automates the groundwork of personally identifiable information redaction. A qualified human must always review, confirm, and sign off on all redactions before disclosure. ComplyLoft does not guarantee compliance.
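The general shape of a detect, review, apply, and log workflow can be sketched in a few lines. This is an illustrative outline under stated assumptions, not ComplyLoft's implementation: the `Flag` class, the `apply_redactions` function, and the audit record fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    start: int
    end: int
    rule: str               # name of the detection rule that fired
    approved: bool = False  # set by the human reviewer


def apply_redactions(text, flags):
    """Apply only reviewer-approved flags; return (redacted_text, audit_log)."""
    audit_log = []
    pieces = []
    cursor = 0
    for flag in sorted(flags, key=lambda f: f.start):
        decision = "redacted" if flag.approved else "rejected_by_reviewer"
        # The log records position, rule, and decision, never the original value.
        audit_log.append(
            {"start": flag.start, "end": flag.end, "rule": flag.rule, "decision": decision}
        )
        if flag.approved:
            pieces.append(text[cursor:flag.start])
            pieces.append("[REDACTED]")  # the original characters are dropped, not covered
            cursor = flag.end
    pieces.append(text[cursor:])
    return "".join(pieces), audit_log


text = "Call Jane on 07700 900123 today."
name, phone, day = text.index("Jane"), text.index("07700"), text.index("today")
flags = [
    Flag(name, name + 4, "name", approved=True),
    Flag(phone, phone + 12, "phone", approved=True),
    Flag(day, day + 5, "date", approved=False),  # reviewer removed this flag
]
redacted, log = apply_redactions(text, flags)
print(redacted)
```

Because the output string is rebuilt without the flagged characters, nothing of the original value survives in it. For PDFs the same principle means removing the underlying text or image data rather than drawing a box over it.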

PII Redaction by Use Case

Personally identifiable information redaction supports several distinct compliance workflows. Each has its own rules, timelines, and exemption frameworks.

Data Subject Access Requests

One-month GDPR deadline, extendable by up to two further months for complex or numerous requests. Third-party PII must be redacted before disclosure to the data subject.

DSAR redaction guide →
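The response deadline is counted in calendar months rather than a fixed number of days, so the due date depends on when the request arrived. The sketch below assumes the approach described in ICO guidance: the corresponding date in the following month, the last day of that month if no such date exists, and the next working day if the result falls on a weekend. The `dsar_deadline` name is hypothetical, and public holidays and extensions are not handled.

```python
import calendar
from datetime import date, timedelta


def dsar_deadline(received):
    """Return the due date one calendar month after `received`."""
    year = received.year + (received.month // 12)
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    # Clamp to month end when the next month has no corresponding date.
    due = date(year, month, min(received.day, last_day))
    while due.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        due += timedelta(days=1)
    return due


print(dsar_deadline(date(2024, 1, 31)))  # 2024-02-29
print(dsar_deadline(date(2024, 5, 8)))   # 2024-06-10 (8 June 2024 is a Saturday)
```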

Freedom of Information

Public bodies must redact personal data and exempt material under FOI legislation before disclosure.

FOI redaction guide →

GDPR Compliance

Document redaction as part of broader GDPR obligations — data minimisation, data sharing, and publication.

GDPR redaction guide →

Audit Trail Evidence

Defensible redaction records for regulatory inquiries, ICO investigations, and internal review appeals.

Redaction audit trails →

Frequently Asked Questions

What is personally identifiable information (PII)?
Personally identifiable information (PII) is any data that can be used to identify a specific individual, either on its own or in combination with other information. Direct identifiers include names, national insurance numbers, passport numbers, and email addresses. Indirect identifiers include job titles, date of birth, location data, and IP addresses — each benign on its own but capable of identifying someone when combined.
What types of information are considered personally identifiable information?
Types of personally identifiable information fall into two broad categories. Direct identifiers explicitly name an individual: full name, national insurance number, passport number, driving licence number, email address, phone number. Indirect identifiers can identify someone when combined with other data: date of birth, postcode, IP address, device identifiers, employment history. Sensitive PII, including health records, biometric data, racial or ethnic origin, and financial account details, requires additional protection. Most of these types, though not financial account details, are special category data under GDPR Article 9.
What is the difference between personally identifiable information and personal data under GDPR?
Personally identifiable information is a concept rooted in US privacy law and is often applied narrowly to a fixed set of identifiers. Personal data under GDPR is broader: it covers any information relating to an identified or identifiable natural person, including online identifiers and location data. In practice, all PII is personal data under GDPR, but GDPR extends beyond traditional PII to cover data that could indirectly identify someone.
Can personal data be sensitive and confidential?
Yes. Personal data under GDPR can be both sensitive (special category data under Article 9, such as health, biometric, racial, or genetic data) and confidential (information the organisation has a duty to protect, such as commercially sensitive information combined with personal data). These categories overlap: a medical record is both special category personal data and confidential information. Processing special category data requires a separate condition under Article 9 in addition to a lawful basis under Article 6.
What personally identifiable information should be redacted in a DSAR response?
In a DSAR response, organisations must redact third-party personally identifiable information before disclosure. This includes names, contact details, and identifiers of any individuals other than the data subject making the request. Commercially sensitive information, legally privileged communications, and material covered by specific GDPR exemptions may also require redaction. Every redaction should be documented with a clear rationale.
How does automated PII redaction work?
Automated PII redaction uses pattern recognition and contextual analysis to identify personally identifiable information across documents. The software detects names, addresses, identifiers, financial data, and other PII categories, then applies redaction rules consistently. A human reviewer confirms or adjusts the flagged redactions before the final output is produced. ComplyLoft can reduce manual redaction effort by up to 90% while maintaining a full audit trail.
What types of documents can be redacted for personally identifiable information?
PII redaction applies to any document containing personal data: emails, reports, contracts, statements, call transcripts, forms, medical records, case files, and correspondence. ComplyLoft supports PDF documents including scanned PDFs with OCR, and maintains document structure through the redaction process. Multi-document sets can be processed in bulk with consistent rules applied across every file.
Is personally identifiable information redaction required by law?
PII redaction is not a standalone legal requirement, but it is essential for meeting obligations under GDPR, UK GDPR, the Data Protection Act 2018, and sector-specific regulations. DSAR responses generally require third-party PII to be redacted unless its disclosure is lawful. FOI responses must redact personal data covered by exemptions. Data sharing with third parties requires redaction to comply with data minimisation. Failure to redact properly can result in ICO enforcement action and fines.

Automate Your PII Redaction

Request a demo to see how ComplyLoft detects and redacts personally identifiable information across document sets.
