What is Parquet?
- Stores data column by column, so only the columns you need have to be read
- Values of the same type sit together, so compression ratios are very high and storage use drops substantially
- Major analytics platforms such as Apache Spark, BigQuery, and Athena support it natively
- Reads can be tens of times faster than with CSV or JSON in some cases
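I've heard of Parquet, but how is it different from CSV?
The biggest difference is how the data is laid out. CSV stores data row by row, while Parquet stores it column by column. Say you want to aggregate only the "age" field across data for one million people: with CSV you have to read every row in full, but with Parquet you only need to read the age column.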
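To make the row-versus-column difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Parquet's actual on-disk format: the `people` records, the two store layouts, and the `cells_touched` counter are all made up for illustration.

```python
# Toy model of row-oriented vs. column-oriented layouts (not real Parquet).
people = [
    {"name": "Aoi", "age": 34, "city": "Tokyo"},
    {"name": "Ren", "age": 27, "city": "Osaka"},
    {"name": "Yui", "age": 45, "city": "Nagoya"},
]

# Row-oriented (CSV-like): each record is stored as one unit.
row_store = [(p["name"], p["age"], p["city"]) for p in people]

# Column-oriented (Parquet-like): each field is stored as its own list.
column_store = {
    "name": [p["name"] for p in people],
    "age": [p["age"] for p in people],
    "city": [p["city"] for p in people],
}


def average_age_from_rows(rows):
    """Walk every row in full, even though only the age field is needed."""
    cells_touched = 0
    total = 0
    for row in rows:
        cells_touched += len(row)  # the whole row is read
        total += row[1]            # index 1 holds the age
    return total / len(rows), cells_touched


def average_age_from_columns(columns):
    """Read only the age column and ignore the others."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


row_avg, row_cells = average_age_from_rows(row_store)
col_avg, col_cells = average_age_from_columns(column_store)
print(row_avg, row_cells)
print(col_avg, col_cells)
```

Both functions return the same average, but the column version touches one cell per person instead of every field of every row. With wide tables, that gap grows with the number of columns.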
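I see, so it's fast because it reads only the parts it needs!
Exactly. On top of that, storing by column means values of the same type sit next to each other, so compression works extremely well. In a "gender" column, for example, similar values line up like "male, female, male, male, ...", so the file can end up at one tenth the size of the CSV, or even less.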
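The compression point can be sketched with the standard library alone. This is only an illustration of the idea, assuming a made-up `genders` column with two distinct values. Real Parquet writers use their own encodings and codecs, and `zlib` here is just a stand-in for a general-purpose compressor.

```python
import random
import zlib

# Toy illustration of why a single column compresses well (not real Parquet).
random.seed(0)  # fixed seed so the run is repeatable
genders = [random.choice(["male", "female"]) for _ in range(10_000)]

# Dictionary encoding: store each distinct value once, then one small code per row.
dictionary = sorted(set(genders))              # ['female', 'male']
code_of = {value: i for i, value in enumerate(dictionary)}
codes = bytes(code_of[g] for g in genders)     # one byte per row

raw = ",".join(genders).encode("utf-8")        # the column as plain text
packed = zlib.compress(codes)                  # general-purpose compression on top

# Decoding restores the original column exactly.
decoded = [dictionary[c] for c in zlib.decompress(packed)]

print(len(raw), len(codes), len(packed))
```

The three printed sizes shrink from left to right: the plain-text column, the dictionary codes, and the compressed codes. The exact numbers depend on the data, so treat them as an illustration rather than a benchmark.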
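Wow! Then shouldn't we just make everything Parquet?
It's hard to beat for analytics, but it's a poor fit for workloads that append data one row at a time. Data has to be written in batches, column by column, so for database-like uses that write records one at a time in real time, CSV or JSON can be easier to handle.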
So picking the right tool for the job matters! Where is it actually used?
Mainly for analytics in data lakes and data warehouses. Apache Spark, AWS Athena, Google BigQuery, and others support Parquet natively. With Athena, for instance, you can leave Parquet files in S3 and query them directly with SQL.
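What does the inside of a Parquet file look like?
It's actually a pretty clever structure. Data is split into units called "row groups", and within each row group it's stored column by column. The end of the file also holds metadata (such as each column's minimum and maximum values), so at query time the engine can check that metadata and decide "no need to read this row group". Skipping data this way is what's called predicate pushdown.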
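Here is a minimal sketch of that skipping logic, assuming a single made-up `ages` column and a tiny hypothetical `ROW_GROUP_SIZE`. It models only the min/max check, not the real file layout or any query engine's implementation.

```python
# Toy model of row groups with min/max statistics (not the real file format).
ages = [18, 22, 25, 31, 38, 40, 47, 52, 63, 70, 75, 81]
ROW_GROUP_SIZE = 4  # hypothetical size; real row groups are far larger

# Split the column into row groups and record min/max for each one,
# much like the statistics kept in the file's metadata.
row_groups = []
for start in range(0, len(ages), ROW_GROUP_SIZE):
    chunk = ages[start:start + ROW_GROUP_SIZE]
    row_groups.append({"min": min(chunk), "max": max(chunk), "values": chunk})


def scan_older_than(groups, threshold):
    """Return ages > threshold, skipping groups whose max rules them out."""
    matches = []
    groups_read = 0
    for group in groups:
        if group["max"] <= threshold:
            continue  # statistics prove no value here can match
        groups_read += 1
        matches.extend(v for v in group["values"] if v > threshold)
    return matches, groups_read


matches, groups_read = scan_older_than(row_groups, 60)
print(matches, groups_read)
```

For the filter `age > 60`, only the last of the three row groups is read. The sample data is sorted on purpose: skipping works best when the filtered column is sorted or clustered, because each group's min/max range is then narrow.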
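Skipping reads based on metadata, that's clever!
It is. Parquet also embeds the schema in the file itself, so unlike CSV you never have to wonder "is this column a number or a string?". It pairs well with Apache Arrow too: Parquet data is decoded into Arrow's in-memory columnar format, which libraries can then share with zero copy. It really can be called the de facto standard for big data analytics.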
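The schema point can be shown with a small contrast. The CSV half uses Python's real `csv` module. The "self-describing" half is a hypothetical JSON layout invented for this sketch, since real Parquet stores its schema in binary file metadata.

```python
import csv
import io
import json

rows = [{"name": "Aoi", "age": 34}, {"name": "Ren", "age": 27}]

# CSV round trip: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))

# Toy self-describing format: the schema travels with the data.
# Values are kept as text on purpose so the schema has to do the typing.
CASTS = {"string": str, "int64": int}
document = json.dumps({
    "schema": {"name": "string", "age": "int64"},
    "columns": {"name": ["Aoi", "Ren"], "age": ["34", "27"]},
})

loaded = json.loads(document)
typed_columns = {
    column: [CASTS[loaded["schema"][column]](v) for v in values]
    for column, values in loaded["columns"].items()
}

print(type(csv_rows[0]["age"]).__name__, type(typed_columns["age"][0]).__name__)
```

After the CSV round trip the age is a string and the reader has to guess its type. With the schema stored alongside the data, the reader can restore integers without guessing.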