SmartQueryTools

CSV vs Parquet

CSV and Parquet are the two most common formats in data analytics and data engineering. CSV is the universal plain-text export format; Parquet is a binary columnar format optimised for high-performance queries and compressed storage. Choosing between them — and knowing when to convert — is one of the most common decisions in a data pipeline.

What is CSV?

CSV (Comma-Separated Values) is a plain-text format where each line is a row and commas separate column values. It requires no special software — any text editor, spreadsheet, database, or programming language can read it. CSV is the default export format from databases, SaaS platforms, CRMs, and reporting tools worldwide.

CSV has no built-in schema. Column types are inferred on read, which introduces ambiguity — a column of "2024-01-15" values could be dates or strings depending on the parser. CSV files are also uncompressed by default, so a dataset on disk occupies the full size of its text representation.
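The type ambiguity is easy to see with Python's standard csv module (the sample rows here are illustrative): every value arrives as a string, and it is up to the reader to decide what each column means.

```python
import csv
import io

# A CSV cell is just text: "2024-01-15" carries no type information.
raw = "order_id,order_date,amount\n1001,2024-01-15,19.99\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value is a str until some parser decides otherwise.
print(type(rows[0]["order_date"]).__name__)  # str
print(type(rows[0]["amount"]).__name__)      # str
```

Whether "order_date" becomes a date and "amount" a float depends entirely on the tool that reads the file next.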

What is Parquet?

Apache Parquet is an open-source binary columnar storage format. Unlike CSV, which writes one complete row at a time, Parquet groups all values for each column together. This layout means analytical queries that read only a few columns can skip most of the data entirely — a critical performance advantage on large datasets.

Parquet applies compression automatically using codecs like Snappy or Zstandard. Combined with columnar encoding (dictionary encoding for repeated values, delta encoding for sorted integers), a CSV file typically compresses to 10–30% of its original size as Parquet. The column schema is embedded in the file footer. Parquet is the native format of AWS Athena, Google BigQuery, Apache Spark, Delta Lake, and Apache Iceberg.

CSV vs Parquet: Key Differences

Feature | CSV | Parquet
File type | Plain text | Binary
Human readable | Yes — opens in any text editor | No — requires a tool
Schema | None (types inferred on read) | Embedded in file footer
Compression | None by default | Built-in (Snappy, Zstd, Gzip)
Typical file size | 100% (baseline) | 10–30% of equivalent CSV
Columnar storage | No (row-oriented) | Yes
Query performance | Slow on large files (full scan) | Fast (column pruning + compression)
Tool support | Universal | Data engineering tools (DuckDB, Spark, Athena, Pandas)
Append records | Simple (append lines) | Requires rewriting the file

When to use CSV

  • Sharing data with colleagues who use Excel or Google Sheets
  • Exporting from a database or SaaS tool for a one-off analysis
  • Loading into a system that only accepts plain-text input
  • Working with small files (under ~10 MB), where compression savings are negligible
  • Debugging data — CSV is immediately readable in any editor

When to use Parquet

  • Storing data in a cloud data lake (S3, GCS, Azure Blob Storage)
  • Querying large datasets with DuckDB, Athena, BigQuery, Spark, or pandas
  • Archiving large exports to reduce storage costs (typically 3–8× compression)
  • Building a pipeline where downstream tools support Parquet natively
  • Preserving accurate column types (dates, integers, floats) across systems

Convert between CSV and Parquet

Convert files instantly in your browser — no upload, no account, no server.

More format comparisons