SmartQueryTools

Shuffle Parquet Files Online

Randomly shuffle the row order of Parquet files directly in your browser. Useful for randomising data before sampling or ML train/test splits.

About this tool

Drop a Parquet file, click Shuffle, and download a new file with the same rows in a random order. This is useful before a train/test split for machine learning, before random sampling, or to remove ordering bias from your data. Your data never leaves your device.

Frequently Asked Questions

Does shuffling a Parquet file change the data in any way?

No. Only the row order changes. All column values, data types, and column names remain exactly the same. Shuffling is a purely positional operation.
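
As a rough illustration of what "purely positional" means, here is a minimal Python sketch. It is not this tool's implementation: the table is a plain dict of lists standing in for Parquet columns, and `shuffle_columns` and `rows` are hypothetical helper names. The idea is to draw one random permutation of row positions and apply it to every column, so each row stays intact.

```python
import random
from collections import Counter


def shuffle_columns(columns, rng=None):
    """Return a copy of a column-oriented table with its rows in random order.

    `columns` maps column name -> list of values; all lists share one length.
    (Hypothetical stand-in for a Parquet table, for illustration only.)
    """
    if rng is None:
        rng = random.Random()  # unseeded, so each run gives a different order
    n_rows = len(next(iter(columns.values()), []))
    order = list(range(n_rows))
    rng.shuffle(order)  # one permutation, shared by every column
    return {name: [values[i] for i in order] for name, values in columns.items()}


def rows(columns):
    """View the table as a list of row tuples."""
    return list(zip(*columns.values()))


table = {"id": [1, 2, 3, 4], "label": ["a", "b", "c", "d"]}
shuffled = shuffle_columns(table)

# Same column names and the same set of rows; only positions may differ.
assert list(shuffled) == list(table)
assert Counter(rows(shuffled)) == Counter(rows(table))
```

Because the same permutation is applied to every column, values that started in the same row stay together after the shuffle.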

Is the shuffle reproducible?

No. Each shuffle uses a non-deterministic random seed, so every run produces a different order. For a reproducible shuffle, use the SQL Query tool with ORDER BY md5(CAST(rowid AS VARCHAR)) or a similar hash-based ordering. Sorting by a fixed hash of each row's number returns the same scrambled order every time.
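
The hash-based ordering mentioned above can be sketched in Python as follows. This only mirrors the idea of ORDER BY md5(CAST(rowid AS VARCHAR)): it assumes row numbers start at 0, and `hash_order` and its `salt` parameter are hypothetical names introduced here, not part of the SQL Query tool.

```python
import hashlib


def hash_order(n_rows, salt=""):
    """Return row indices 0..n_rows-1 sorted by the MD5 hex digest of each index.

    `salt` is a hypothetical extra: changing it gives a different order
    that is still repeatable from run to run.
    """
    def key(i):
        return hashlib.md5(f"{salt}{i}".encode("utf-8")).hexdigest()

    return sorted(range(n_rows), key=key)


data = ["r0", "r1", "r2", "r3", "r4"]
order = hash_order(len(data))
shuffled = [data[i] for i in order]

# Re-running gives the same order, and every row appears exactly once.
assert order == hash_order(len(data))
assert sorted(shuffled) == sorted(data)
```

The order looks random because MD5 digests of consecutive numbers are unrelated, but it depends only on the row numbers, so it repeats as long as the input rows keep the same positions.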

Why would I shuffle a Parquet file?

The most common reason is preparing data for machine learning. If a file is ordered by time, ID, or any other latent pattern, cutting it into train and test sets without shuffling gives each set a different slice of that ordering. Shuffling first makes the split random.
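
A minimal Python sketch of the shuffle-then-split step, assuming the rows are already loaded into a list. `shuffled_split`, `test_fraction`, and `seed` are hypothetical names chosen for this example.

```python
import random


def shuffled_split(rows, test_fraction=0.2, seed=None):
    """Shuffle a copy of `rows`, then cut it into (train, test)."""
    rng = random.Random(seed)  # pass a seed to make the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Rows ordered by day: cutting off the last 20% without shuffling
# would test only on the newest days.
rows_by_day = [{"day": d, "value": d * 10} for d in range(1, 11)]
train, test = shuffled_split(rows_by_day, test_fraction=0.2, seed=42)

assert len(train) == 8 and len(test) == 2
```

For data where time order matters, such as forecasting, a random split may not be what you want; the sketch covers only the case described above.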

Is my data private?

Yes. Your file is never uploaded to any server. Everything runs locally in your browser using WebAssembly, so processing happens entirely inside your tab. Once you close the tab, nothing is retained.

What is the maximum file size?

The free limit is 50 MB. For larger files, performance depends on your device's available memory; most modern machines handle 500 MB to 1 GB comfortably.
