Deep dive
How CSVPrune's deterministic cleaning pipeline works
CSVPrune models cleaning as an explicit, ordered pipeline rather than a pile of one-off edits. Each operation is a discrete, column-aware step: trim whitespace; normalize email-like values to lowercase; find/replace with literal text or a regular expression; coerce a column to a number, currency, date, or boolean; remove duplicates by whole row or by key; prune selected rows; or add a calculated column. The steps run in the order you arrange them, every time. Determinism is the point: the same input and the same recipe always produce the same output, which is what makes the work auditable and safe to automate.
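To make the idea of an ordered, deterministic pipeline concrete, here is a minimal Python sketch. It is not CSVPrune's implementation or recipe format; the names `trim_whitespace`, `lowercase`, `dedupe_by_key`, and `run_recipe` are hypothetical and exist only for illustration. Each step is a pure function from rows to rows, and a recipe is just a list of steps applied in order.

```python
from typing import Callable

# Hypothetical types for illustration: a row is a dict of column -> text,
# and a step is any pure function from a list of rows to a list of rows.
Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]


def trim_whitespace(column: str) -> Step:
    """Build a step that strips leading/trailing whitespace in one column."""
    def step(rows: list[Row]) -> list[Row]:
        return [{**row, column: row[column].strip()} for row in rows]
    return step


def lowercase(column: str) -> Step:
    """Build a step that lowercases every value in one column."""
    def step(rows: list[Row]) -> list[Row]:
        return [{**row, column: row[column].lower()} for row in rows]
    return step


def dedupe_by_key(column: str) -> Step:
    """Build a step that keeps the first row seen for each key value."""
    def step(rows: list[Row]) -> list[Row]:
        seen: set[str] = set()
        kept: list[Row] = []
        for row in rows:
            if row[column] not in seen:
                seen.add(row[column])
                kept.append(row)
        return kept
    return step


def run_recipe(rows: list[Row], recipe: list[Step]) -> list[Row]:
    """Apply each step in order. No step mutates its input or reads outside
    state, so the same rows and the same recipe give the same output."""
    for step in recipe:
        rows = step(rows)
    return rows


rows = [
    {"email": " Ada@Example.com "},
    {"email": "ada@example.com"},
    {"email": "bob@example.com"},
]

recipe = [trim_whitespace("email"), lowercase("email"), dedupe_by_key("email")]
cleaned = run_recipe(rows, recipe)

# Same steps, different order: deduplication runs before normalization.
reordered = [dedupe_by_key("email"), trim_whitespace("email"), lowercase("email")]
cleaned_reordered = run_recipe(rows, reordered)
```

With this sample data, `cleaned` holds two rows because the first two emails become identical before deduplication runs. `cleaned_reordered` still holds three rows, because deduplication ran while the values still differed in case and padding. That is why step order is part of the recipe rather than an incidental detail.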
The operations are also conservative. Type coercion, for instance, only converts values it confidently recognizes and leaves anything ambiguous untouched, so a cleanup never quietly destroys data it did not understand. As you work, a live stats panel shows exactly what happened — original rows, cleaned rows, duplicates removed, cells trimmed, emails normalized, cells replaced, cells coerced, rows pruned — so there is no mystery about the effect of a step. Column profiling complements this by inferring each column's type and surfacing blanks, distinct counts, ranges, and invalid cells, with a one-click filter to show only the rows that fail validation. When the sequence is right, you save it as a recipe and it becomes repeatable across future files, folders, and watch-folder automation — and an audited export can attach a full before/after record of every changed cell for compliance.
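The conservative coercion, stats counters, and before/after record described above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not CSVPrune's actual rules: the strict number pattern, the helper names `parse_number` and `coerce_column`, and the shape of the stats and audit entries are all hypothetical.

```python
from __future__ import annotations

import re

# Assumption for this sketch: only plain integers and decimals with a dot
# count as "confidently recognized". Anything else is left untouched.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_number(value: str) -> int | float | None:
    """Return a number for unambiguous input, or None when unsure."""
    text = value.strip()
    if not _NUMBER.fullmatch(text):
        return None
    return float(text) if "." in text else int(text)


def coerce_column(rows: list[dict], column: str):
    """Coerce one column, returning cleaned rows, counters, and an audit log."""
    cleaned: list[dict] = []
    audit: list[dict] = []
    for index, row in enumerate(rows):
        parsed = parse_number(row[column])
        if parsed is None:
            # Ambiguous or unrecognized: keep the original cell as-is.
            cleaned.append(dict(row))
            continue
        cleaned.append({**row, column: parsed})
        # Record every changed cell so the change can be reviewed later.
        audit.append(
            {"row": index, "column": column, "before": row[column], "after": parsed}
        )
    stats = {
        "original_rows": len(rows),
        "cleaned_rows": len(cleaned),
        "cells_coerced": len(audit),
    }
    return cleaned, stats, audit


rows = [
    {"amount": "42"},
    {"amount": " 3.50 "},
    {"amount": "1,200"},  # thousands separator or decimal comma? left alone
    {"amount": "n/a"},
]

cleaned, stats, audit = coerce_column(rows, "amount")
```

In this sample, `"42"` and `" 3.50 "` are converted, while `"1,200"` and `"n/a"` pass through unchanged. The counters and the audit list are derived from the same loop that makes the changes, so the reported numbers cannot drift from what was actually done to the data.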