Data Cleaning
Upload qualitative data files for cleaning. The tool removes timestamps, filler words, and personally identifying information (PII): pattern matching catches structured identifiers, then an AI review catches names, locations, and contextual identifiers.
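As a rough illustration of the timestamp and filler-word removal mentioned above, here is a minimal Python sketch. The timestamp shapes, the filler-word list, and the function name strip_noise are assumptions made for this example; the tool's actual rules are not documented here and may differ.

```python
import re

# Assumed timestamp shapes: [00:12:34], 00:12:34, (12:34), optionally with milliseconds.
# Note: this would also remove clock times such as "10:30" from ordinary sentences.
TIMESTAMP = re.compile(r"[\[(]?\b\d{1,2}:\d{2}(?::\d{2})?(?:[.,]\d{1,3})?\b[\])]?")

# Assumed filler words; a real list would likely be longer.
FILLER = re.compile(r"\b(?:um+|uh+|erm+|hmm+)\b[,.]?\s*", re.IGNORECASE)


def strip_noise(text):
    """Remove transcript timestamps and filler words, then tidy spacing."""
    text = TIMESTAMP.sub("", text)
    text = FILLER.sub("", text)
    # Collapse runs of spaces left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

For example, strip_noise("[00:01:15] Um, I think, uh, it was fine.") returns "I think, it was fine.", while a word such as "umbrella" is left alone because the filler pattern only matches whole words.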
Before you start
Supported file types:
- .docx — Word documents (text is extracted automatically)
- .txt — Plain text files
- .csv — CSV files (you'll choose which columns to clean)
- .pdf — PDF documents (text is extracted automatically)
Not supported: Excel (.xlsx) — export as CSV first (File → Save As → CSV). Scanned or image-based PDFs cannot be processed — use a text-based PDF or convert to .docx first.
Time estimate: Pattern scanning is near-instant. AI de-identification adds 1–2 minutes per file.
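For CSV files, cleaning only the columns you choose can be pictured with the following Python sketch. The function name clean_csv_columns and its parameters are hypothetical, and the clean argument stands in for whatever cleaning step is applied; this is not the tool's actual implementation.

```python
import csv
import io


def clean_csv_columns(csv_text, columns, clean):
    """Apply `clean` to the named columns only; other columns pass through unchanged."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name in columns:
            # Skip columns that are missing from this row.
            if row.get(name) is not None:
                row[name] = clean(row[name])
        writer.writerow(row)
    return out.getvalue()
```

Calling clean_csv_columns(text, ["response"], some_cleaner) would rewrite the "response" column and leave columns such as participant IDs or dates untouched, which is why choosing the right columns matters.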
How your data is handled
Step 1 — Pattern scan: Regular expressions catch phone numbers, emails, postal codes, Social Insurance Numbers (SINs), URLs, social media handles, and long ID numbers.
Step 2 — AI review: A language model reviews the text for names, initials, locations, organisation names, job titles, and identifying combinations of details. Text is processed on EU servers with zero data retention.
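The pattern scan in Step 1 can be sketched in Python as an ordered list of regular expressions, each replaced by a placeholder tag. Everything specific here is an assumption for illustration: the patterns (North American phone numbers, Canadian postal codes, nine-digit SINs, IDs of eight or more digits), the placeholder tags, and the function name pattern_scan. The tool's real patterns are not published in this page and are likely more thorough.

```python
import re

# Order matters: URLs and emails are replaced first so that their parts are not
# picked up later as handles or ID numbers.
PATTERNS = [
    # URL, stopping before trailing sentence punctuation.
    ("URL", re.compile(r"\bhttps?://\S+?(?=[.,;:!?)]*(?:\s|$))")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # Assumed North American format: optional +1, then 3-3-4 digits.
    ("PHONE", re.compile(r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")),
    # Assumed SIN format: nine digits in groups of three.
    ("SIN", re.compile(r"(?<!\d)\d{3}[\s-]?\d{3}[\s-]?\d{3}(?!\d)")),
    # Assumed Canadian postal code format, e.g. V6B 1A1.
    ("POSTAL_CODE", re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")),
    ("HANDLE", re.compile(r"(?<![\w@])@\w{2,}")),
    # Assumed threshold: eight or more consecutive digits counts as a long ID.
    ("ID", re.compile(r"\b\d{8,}\b")),
]


def pattern_scan(text):
    """Replace each structured identifier with a bracketed placeholder tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Patterns like these only catch identifiers with a predictable shape. Names, places, and identifying combinations of details have no fixed format, which is why the AI review in Step 2 follows.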
Always review cleaned files before sharing. Automated de-identification is a strong first pass, not a replacement for human review.