ã2026幎çãDuckDBã®å§ãæ¹ â ããŒã«ã«ã§è¶ é«éããŒã¿åæãããå®å šã¬ã€ã
- DuckDBã®ã€ã³ã¹ããŒã«ãšCLIèµ·å
- CSVãParquetãã¡ã€ã«ãžã®çŽæ¥ã¯ãšãª
- SQLiteã»PostgreSQLãšã®éããšäœ¿ãåã
- Python飿ºã®åºæ¬
ã€ã³ã¹ããŒã«æé
# macOS
brew install duckdb
# Pythonçµç±ïŒæãæè»œïŒ
pip install duckdb
# ããŒãžã§ã³ç¢ºèªã»CLIèµ·å
duckdb --version
duckdb # CLIãèµ·åïŒ.quit ã§çµäºïŒ
DuckDBã³ãã³ãã»ã¯ãšãªæ©èŠè¡š
-- CSVãã¡ã€ã«ãçŽæ¥ã¯ãšãª
SELECT * FROM 'data.csv' LIMIT 10;
SELECT COUNT(*), AVG(price) FROM 'sales.csv' WHERE year = 2025;
-- Parquetãã¡ã€ã«ãçŽæ¥ã¯ãšãª
SELECT * FROM 'data.parquet' WHERE category = 'A';
-- URLããããŒã¿ãååŸïŒhttpfsæ¡åŒµïŒ
INSTALL httpfs;
LOAD httpfs;
SELECT * FROM 'https://example.com/data.csv';
-- çµæããã¡ã€ã«ã«ä¿å
COPY (SELECT * FROM 'input.csv') TO 'output.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM 'input.csv') TO 'output.json' (FORMAT JSON);
-- ããŒãã«ãäœæ
CREATE TABLE sales AS SELECT * FROM 'sales.csv';
DESCRIBE sales;
Python飿º
import duckdb
# ã€ã³ã¡ã¢ãªDB
con = duckdb.connect()
# CSVãçŽæ¥ã¯ãšãª
df = con.execute("SELECT * FROM 'data.csv'").df()
# pandasãšã®é£æº
import pandas as pd
df = pd.read_csv('data.csv')
result = con.execute("SELECT category, SUM(amount) FROM df GROUP BY category").df()
print(result)
ããããè©°ãŸããã€ã³ã
Q: SQLiteãšäœãéãã®ïŒ â SQLiteã¯è¡å¿åïŒOLTPåãïŒã§ãããDuckDBã¯åå¿åïŒOLAPåãïŒã§ãã倧éããŒã¿ã®éèšã¯ãšãªã¯DuckDBãå§åçã«é«éã§ãããã¡ã€ã«ãçŽæ¥ã¯ãšãªã§ããç¹ã倧ããªéãã§ãã
Q: è€æ°ããã»ã¹ããåæã¢ã¯ã»ã¹ã§ããïŒ â DuckDBã¯åäžã©ã€ã¿ãŒã®ã¿èš±å¯ããŸããåæçšéïŒ1ããã»ã¹ãããŒã¿ãåŠçïŒã«ã¯åé¡ãããŸããããWebãµãŒããŒã®ãããªå€æ°ã®åææžã蟌ã¿ã«ã¯åããŸããã
Q: S3ã®ãã¡ã€ã«ãçŽæ¥ã¯ãšãªã§ããïŒ
â httpfs æ¡åŒµã䜿ãã°S3ã®CSV/ParquetãçŽæ¥ã¯ãšãªã§ããŸããSET s3_region='ap-northeast-1' ãªã©ã§èªèšŒæ
å ±ãèšå®ããŠãã ããã
æè¿ãDuckDBãã£ãŠããèããã©ãã©ããªããŒã¿ããŒã¹ãªã®ïŒ
DuckDBã¯ãããŒã«ã«ã§åãè¶ é«éãªåæçšããŒã¿ããŒã¹ãã ããSQLiteã£ãŠç¥ã£ãŠãããªïŒ ããã¯ãã©ã³ã¶ã¯ã·ã§ã³åŠçïŒOLTPïŒãåŸæã ãã©ãDuckDBã¯å€§éããŒã¿ã®éèšãåæïŒOLAPïŒã«ç¹åããããŒãžã§ã³ã¿ãããªã€ã¡ãŒãžã ãããµãŒããŒãç«ãŠãå¿ èŠããªããŠãèªåã®PCã ãã§å®çµããã®ããã€ã³ãã ãã
ãµãŒããŒãªãã§åæã§ããã®ã¯ãæè»œã ãïŒ ã©ããã£ãŠã€ã³ã¹ããŒã«ããã®ïŒ
ãã¡ããã¡ãç°¡åã ããPythonãªããpip install duckdbããMacãªããbrew install duckdbããNode.jsãªããnpm install duckdbãã§äžçºã ããCLIããããããã¿ãŒããã«ã§ãduckdbãã£ãŠæã€ã ãã§SQLã詊ããããäŸåé¢ä¿ãã»ãŒãªããããç°å¢æ§ç¯ã§ãããããšãã»ãšãã©ãªããã ã
ããã§å®éã«è©ŠããŠã¿ãŠãããpip install duckdbãã§ã€ã³ã¹ããŒã«ãããPythonã§ãimport duckdb; con = duckdb.connect(); print(con.execute('SELECT 42 AS answer').df())ããšå®è¡ããŠã¿ãããDuckDBãåããŠDataFrameãè¿ã£ãŠããã°æåã ãïŒ
CSVãã¡ã€ã«ããã®ãŸãŸåæã§ããã£ãŠæ¬åœãªã®ïŒ
æ¬åœã ãïŒ DuckDBã®äžçªããããšããã®äžã€ã ãããSELECT * FROM 'sales.csv' WHERE amount > 1000ãã¿ããã«ãCSVãã¡ã€ã«ããã®ãŸãŸããŒãã«ãšããŠSQLã§æ±ãããã ãParquetãã¡ã€ã«ãåãããã«ã¯ãšãªã§ããããJSONãã¡ã€ã«ã«ã察å¿ããŠããããããããŒã¿ãã€ã³ããŒãããæéããªããããã¡ãã£ãšããåæãããå§ãããããã
æ®éã®ããŒã¿ããŒã¹ããéãã£ãŠèãããã©ããªãã§ãããªã«éãã®ïŒ
DuckDBã¯ãåæåããšããããŒã¿ã®æã¡æ¹ãããŠããã ãæ®éã®ããŒã¿ããŒã¹ã¯è¡ããšã«ããŒã¿ãä¿åãããã©ãDuckDBã¯åããšã«ãŸãšããŠä¿åãããããšãã°ã売äžã®åèšããåºããšããè¡æåã ãšå šè¡ãèªãå¿ èŠããããã©ãåæåãªã売äžã®åã ãèªãã°ãããã ããSUMã»AVGã»GROUP BYã¿ãããªéèšåŠçãã³ã£ããããã»ã©éããã ããæ°åäžè¡ã®CSVã§ãæ°ç§ã§éèšã§ããããšããããã
PythonããDuckDBã䜿ãã®ã¯æ¬åœã«å¿«é©ã ãããimport duckdbãããŠãduckdb.sql("SELECT ...")ãã§çµæãè¿ã£ãŠããããããpandasã®DataFrameããã®ãŸãŸSQLã®ããŒãã«ãšããŠäœ¿ãããã ããduckdb.sql("SELECT * FROM df WHERE col > 100")ãã¿ããã«ããéã«DuckDBã®çµæãã.df()ãã§DataFrameã«å€æãã§ãããpandasã§é ãéèšåŠçãDuckDBã«çœ®ãæããã ãã§10å以äžéããªãããšããããã
Jupyter Notebookã§ã䜿ããã®ïŒ ããŒã¿åæã¯ããŒãããã¯ã§ããããšãå€ãããªã
ãã¡ãã䜿ãããïŒ JupyterãGoogle Colabã§ããã®ãŸãŸåããã ãã%%sqlãããžãã¯ã³ãã³ãã䜿ãã°ãã»ã«ã«SQLãçŽæ¥æžããŠå®è¡ããããšãã§ããããJupySQLæ¡åŒµãšçµã¿åããããšãå¯èŠåãŸã§ããŒãããã¯äžã§å®çµããããpandasã§ã¡ã¢ãªãè¶³ããªããªããããªå€§ããªããŒã¿ã»ããããDuckDBãªãé å»¶è©äŸ¡ã§å¹çããåŠçããŠããããã
ããŒã«ã«ã ããããªããŠãã¯ã©ãŠãã®ããŒã¿ãæ±ããã®ïŒ
ãã質åã ãïŒ DuckDBã¯Amazon S3ãGoogle Cloud Storageäžã®ãã¡ã€ã«ãçŽæ¥èªã¿èŸŒãããããSELECT * FROM 's3://bucket/data.parquet'ãã¿ããã«æžãã ãã§ãããŒã«ã«ã«ããŠã³ããŒãããªããŠãã¯ãšãªã§ãããã ãhttpfsãšããæ¡åŒµæ©èœã䜿ãã°HTTPSäžã®ãã¡ã€ã«ãèªãããããŒã¿ã¬ã€ã¯ãšã®çžæ§ãæçŸ€ãªãã ãã
ããã䟿å©ããã ãïŒ ã§ãDuckDBã«ãèŠæãªããšã£ãŠããã®ïŒ
ãã¡ããããããDuckDBã¯åæã¯ãšãªïŒèªã¿åãäžå¿ïŒãåŸæã ãã©ãWebã¢ããªã®ããã¯ãšã³ãã¿ããã«åæã«å€§éã®æžã蟌ã¿ãçºçãããŠãŒã¹ã±ãŒã¹ã«ã¯åããŠãªããã ãããããçšéã«ã¯PostgreSQLãMySQLãé©ããŠãããããšãã«ããŠãŒã¶ãŒã®åæã¢ã¯ã»ã¹ãèŠæãã§ããæå ã§ããŒã¿ãçŽ æ©ãåæããããšããçšéã§ã¯æåŒ·ã¯ã©ã¹ã ãã2026幎çŸåšãããŒã¿ãµã€ãšã³ãã£ã¹ããã¢ããªã¹ãã®éã§ã¯ããŸãDuckDBã§è©Šãããå®çªã«ãªãã€ã€ãããã