  1. What are the pros and cons of the Apache Parquet format compared …

    Apr 24, 2016 · Parquet has gained significant traction outside of the Hadoop ecosystem. For example, the Delta Lake project is being built on Parquet files. Arrow is an important project that makes it easy …

  2. Reading / Fixing a corrupt parquet file - Stack Overflow

    Sep 3, 2024 · Asked 1 year, 3 months ago. Modified 6 months ago. Viewed 2k times.

  3. Is it possible to read parquet files in chunks? - Stack Overflow

    Nov 29, 2019 · The Parquet format stores the data in chunks, but there isn't a documented way to read it in chunks like read_csv. Is there a way to read parquet files in chunks?

  4. Inspect Parquet from command line - Stack Overflow

    How do I inspect the content of a Parquet file from the command line? The only option I see now is: $ hadoop fs -get my-path local-file, then $ parquet-tools head local-file | less. I would like to avoid …

  5. Extension of Apache parquet files, is it '.pqt' or '.parquet'?

    Oct 19, 2021 · I wonder if there is a consensus regarding the extension of parquet files. I have seen a shorter .pqt extension, which has the typical 3 letters (as in csv, tsv, txt, etc.), and then there is a rather …

  6. How do I get schema / column names from parquet file?

    Nov 24, 2015 · Also, Cloudera (which supports and contributes heavily to Parquet) has a nice page with examples on usage of hangxie's parquet-tools. An example from that page for your use case: …

  7. How to append new data to an existing parquet file?

    Apr 27, 2023 · I have parquet files with some data in them. I want to add more data to them frequently every day. I want to do this without having to load the object to memory and then concatenate and …

  8. Is it better to have one large parquet file or lots of smaller parquet ...

    Mar 28, 2017 · The only downside of larger parquet files is that they take more memory to create, so watch out in case you need to bump up Spark executors' memory. Row groups are a way for Parquet …

  9. Updating values in apache parquet file - Stack Overflow

    Mar 3, 2015 · I have a quite hefty parquet file where I need to change values for one of the columns. One way to do this would be to update those values in the source text files and recreate the parquet file, but I'm …

  10. How to read a Parquet file into Pandas DataFrame?

    How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data …
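
Several of the questions listed above (the corrupt file, command-line inspection) start from the same first check: does the file still have the outer layout of a Parquet file? A Parquet file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the footer length as a little-endian integer. The sketch below uses only the Python standard library to test that envelope. The function name `check_parquet_envelope` is hypothetical, and the check is a rough first test, not a full validation: it does not parse the footer metadata or any data pages.

```python
import struct
from pathlib import Path

MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def check_parquet_envelope(path):
    """Return the footer length in bytes, or raise ValueError.

    Hypothetical helper: it only inspects the first 4 and last 8 bytes.
    """
    size = Path(path).stat().st_size
    # Smallest possible layout: head magic + footer length + tail magic.
    if size < 12:
        raise ValueError("file too small to be Parquet")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # 8 bytes before end of file
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # Footer length is a 4-byte little-endian unsigned integer.
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

A file that fails this check was most likely truncated or is not Parquet at all. A file that passes can still be damaged internally, so a real reader such as pyarrow or parquet-tools is still needed to confirm it.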