Search The Internet

Showing search results from dataengineeringcentral.substack.com.
https://dataengineeringcentral.substack.com/p/duckdb-delta-lake

DuckDB + Delta Lake. - by Daniel Beach

I’ve done a lot of playing around with Delta Lake in my day, enough to have fallen in and out of love a few different times. I think for those of us who grew up in the land of Kimball and SQL Server, before the demi-gods of Snowflake and Databricks came a

Js ⠆⣤⡯⣟⡉⠀⠀⠀
Terms appear in 22 positions

https://dataengineeringcentral.substack.com/p/duckdb-for-data-engineering

DuckDB for Data Engineering - by Daniel Beach

and now seem to be a common part of the Data Engineering commentary. It’s probably here to stay. Say what you will about the thing and what it can and cannot replace, the future will decide.

Js ⠆⣁⢹⠿⠇⠀⠀⠀
Terms appear in 19 positions

https://dataengineeringcentral.substack.com/p/duckdb-100-lets-kick-the-tires

DuckDB 1.0.0 - Let's Kick The Tires - by Daniel Beach

Sometimes you have to take your own medicine, eat your own words, and swallow that bitter pill. I keep wondering if I’m going to have to do that with my DuckDB takes. I mean if you watch talking heads at Databricks Data and AI Summit it looks like DuckDB

Js ⣂⢝⠿⡽⡅⠀⠀⠀
Terms appear in 23 positions

https://dataengineeringcentral.substack.com/p/duckdb-pyiceberg-lambda

DuckDB + PyIceberg + Lambda - by Daniel Beach

Every once in a great while, I feel it’s good to pay some penance, to do a thing that isn’t fun at all, that you find appalling and horrible. I tell my children regularly that it’s a good thing to struggle; to do a “hard thing makes ya’ stronger

Js ⡂⡀⠛⣼⣞⠀⠀⠀
Terms appear in 19 positions

https://dataengineeringcentral.substack.com/p/duckdb-processing-remote-s3-json

DuckDB processing remote (s3) JSON files. - by Daniel Beach

In a quest not to get too bored in the Data Engineering world, with Databricks and Snowflake being old hat these days it’s always good to go looking for interesting things to do. Interesting things yet obvious things, the simple things, things other peopl

Js ⡂⠺⡎⡙⠀⠀⠀⠀
Terms appear in 14 positions

https://dataengineeringcentral.substack.com/p/duckdb-inside-postgres

DuckDB inside Postgres!!?? - by Daniel Beach

Maybe it’s my age, I’m not totally sure, but it’s getting harder and harder for this old guy to keep up with all the new things that come trickling out of the quagmire that is the Data Engineering space these days. Some of the ideas that come into being s

Js ⠆⡌⠽⣂⢥⡏⠀⠀
Terms appear in 22 positions

https://dataengineeringcentral.substack.com/p/duckdb-vs-polars-thunderdome

DuckDB vs Polars - Thunderdome. - by Daniel Beach

Mar 04, 2024 If you know me, you know I like to stir the pot, the big boiling and smoldering cauldron of Data Tools pot. Yes, that’s the one, blackened and burned pot from years of conjurers pouring myriads of Modern Data Stack tools into it, which have s

Js ⠆⡐⠈⢸⣼⠆⠀⠀
Terms appear in 17 positions

https://dataengineeringcentral.substack.com/p/why-duckdb-is-losing-to-polars

Why DuckDB is losing to Polars - by Daniel Beach

Jan 29, 2024 In my never-ending quest to make angry I’m always on the lookout for Golden Calves and other Idols that I can smash to bits, and wait for the angry masses frothing at the mouth to hunt me down. Sorry, not sorry, after years of doing this, I k

Js ⡂⠤⣦⡟⡇⠀⠀⠀
Terms appear in 19 positions

https://dataengineeringcentral.substack.com/p/whats-all-the-hype-with-duckdb

What's all the hype with DuckDB? - by Daniel Beach

Dec 18, 2022 Have you been as confused as me about My goal with this article is to cut through all the crud and get to the bottom of the issue. I keep seeing the name DuckDB keep popping up in my feeds here and there in my varied and disjoined internet tr

Js ⡜⣅⣿⣿⡿⡽⠀⠀
Terms appear in 37 positions

https://dataengineeringcentral.substack.com/p/should-you-use-duckdb-or-polars

Should you use DuckDB or Polars? - by Daniel Beach

This is an interesting question indeed, is it not? What to use, what to use? Both DuckDB and Polars seem to be flying high at the moment, the new cool kids on the block. Everyone talking about them, and 1% of the people actually using them. Typical.

Js ⡤⠢⡨⡹⡇⠀⠀⠀
Terms appear in 17 positions

https://dataengineeringcentral.substack.com/p/aws-lambda-duckdb-and-delta-lake

AWS Lambda + DuckDB (and Delta Lake) - by Daniel Beach

I’m well aware volumes of words have been written about simplicity in the context of Software Engineering, ye ole’ KISS concept that your grandmother taught you is as old as time. Yet the siren call of the modern data stack has lured many a poor soul down

⡄⠠⣜⡽⢚⠱⠀⠀
Terms appear in 21 positions

https://dataengineeringcentral.substack.com/p/date-and-time-manipulation-with-duckdb

Date and Time Manipulation with DuckDB - by Daniel Beach

In my never-ending quest to don’t take it personally, I do it for the good of all Data Engineers) I realized I’d forgotten something oh-so-important. I mean there are few more tedious things that we have to do day in and day out than date and datetime man

Js ⡄⠀⠞⡚⠀⠀⠀⠀
Terms appear in 10 positions

https://dataengineeringcentral.substack.com/p/polars-and-duckdb-release-unity-catalog

Polars and DuckDB release Unity Catalog (Delta Lake) integrations. Who lied? Who didn't?

You know, just when you think things have finally settled down and our poor ears won’t have to hear another thing about The Great Catalog War, or The Great Lake House Format War it just keeps rolling in.

Js ⠊⠀⠀⢁⠾⠀⠀⠀
Terms appear in 9 positions

https://dataengineeringcentral.substack.com/p/10-billion-row-challenge-duckdb-vs

10 billion row challenge. DuckDB vs Polars vs Daft.

Sometimes I find myself lying in my sunroom, staring out the window in the blue sky above me while the sun plays on the maple tree, empty of most all but a few red leaves wondering what else I can do to make the already angry readers of my babbling even m

⡄⠀⠂⡾⠂⠀⠀⠀
Terms appear in 10 positions

https://dataengineeringcentral.substack.com/p/smallpond-distributed-duckdb

smallpond ... distributed DuckDB? - by Daniel Beach

When it comes to the AI hype, I pretty much have tried to ignore the constant roll of never ending models and other hoopla that is mostly meaningless. Better to let the dust settle.

Js ⡄⠐⠀⡖⠠⠁⠀⠀
Terms appear in 9 positions

https://dataengineeringcentral.substack.com/p/duckdb-inside-postgres/comments

Comments - DuckDB inside Postgres!!?? - by Daniel Beach

Thanks for the dedication to test the new shiny tool. We need people like you who is skeptical of enterprise blog posts and test yourself using your own platform. Im sure they are happy with the feedback and more people are aware of the new integration.

Js ⠂⠠⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://dataengineeringcentral.substack.com/p/why-duckdb-is-losing-to-polars/comments

Comments - Why DuckDB is losing to Polars - by Daniel Beach

Unsure if this "means" anything, but I happened to notice that as of Mar 17, 2025, DuckDB seems to have recently passed Polars in number of downloads per day and per week.

Js ⢢⣔⠁⠀⠀⠀⠀⠀
Terms appear in 8 positions

https://dataengineeringcentral.substack.com/p/polars-and-duckdb-release-unity-catalog/comments

Comments - Polars and DuckDB release Unity Catalog (Delta Lake) integrations. Who lied? Who didn't?

But you have to say what he did wrong too you're an idiot" is just the first step to becoming them people. Nice try though.

Js ⠄⠂⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://dataengineeringcentral.substack.com/p/10-billion-row-challenge-duckdb-vs/comments

Comments - 10 billion row challenge. DuckDB vs Polars vs Daft.

re DuckDB: it has gotten a lot better at larger-than-memory queries. However, in this case, you’re trying to create the database in memory and then run a query on it. suggest two options: a) run your aggregation query directly without the CTAS by specifyi

Js ⠄⠀⠨⠄⠀⠀⠀⠀
Terms appear in 4 positions

https://dataengineeringcentral.substack.com/p/whats-all-the-hype-with-duckdb/comments

Comments - What's all the hype with DuckDB?

An interesting article, thanks for it. However, I think one could expand your thoughts a little bit First of all, I assume that edge and serverless computing are two big areas where DuckDB could shine. Being able to easily examine a dataset inside an AWS

Js ⣄⡈⣭⠁⠀⠀⠀⠀
Terms appear in 12 positions

https://dataengineeringcentral.substack.com/p/delta-lake-vs-apache-iceberg-the

Delta Lake vs Apache Iceberg. The Lake House Squabble.

No solving world hunger, not building some fancy SaaS tool, nope, he was happily retired living inside a giant bespoke log mansion because there is apparently enough fighting that goes on in high places of power that kept him busy and rich, and probably s

⠀⠀⠀⠀⠀⣀⠈⠀
Terms appear in 3 positions

https://dataengineeringcentral.substack.com/p/aws-s3-tables-the-iceberg-cometh

AWS S3 Tables?! The Iceberg Cometh. - by Daniel Beach

Dec 05, 2024 Well, what is that old saying? Better late than never? Something like that. Weep, howl, and moan all ye Databricks and Snowflake padiwans, what you have greatly feared has come down upon you with a heavy hand. I can just see all the meetings

⠀⠀⠀⠀⠠⠈⠀⠀
Terms appear in 2 positions

https://dataengineeringcentral.substack.com/p/what-an-iceberg-catalog-that-works

What?! An Iceberg Catalog that works? - by Daniel Beach

May 20, 2025 I will be the first to admit, in an unapologetic way, that working with Apache Iceberg is far from a pleasant experience once you move past the “playing around on my laptop” stage. The tight, inflexible relationship between Iceberg and a cata

Js ⠀⠀⢀⡁⠀⠀⠀⠀
Terms appear in 3 positions

https://dataengineeringcentral.substack.com/p/hey-you-yes-you-come-here/comments

Hey. You. Yes you. Come here. - by Daniel Beach

We have opinions, what's yours? Tell me about what topics you want me to write about more. Do you want more indepth stuff with code? You want more high level stuff? No wait, maybe you want both? No, I know, you probably like it when I do data product revi

Js ⠀⡐⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://dataengineeringcentral.substack.com/p/replace-databricks-spark-jobs-using

Replace Databricks Spark Jobs (using Delta) with Polars

I’m usually not an advocate of being a dreamer, talking big talk but having it all be a pipe dream. I’m a dream killer most of the time. Maybe I’ve been around too long probably. Every tool in its place and every place needs a tool. It’s not that I’m agai

Js ⠀⢀⠀⠀⠀⠀⠀⠀
Terms appear in 1 position

https://dataengineeringcentral.substack.com/p/review-of-mageai-data-pipelines-for

Review of Mage.ai (data pipelines) for Data Engineers.

I know you are all gasping and covering your mouths in astonishment and disbelief, hardly able to contain yourself. Another data pipeline tool has arrived at your doorstep, promising to solve every known problem you have, and ones you don’t Part of my job

⠀⠀⠀⠀⠂⠀⠀⠀
Terms appear in 1 position

https://dataengineeringcentral.substack.com/p/introduction-to-daft-vs-polars

Introduction to Daft ( ... vs Polars) - by Daniel Beach

It’s been a while since I’ve kicked ye ole’ tires on something new. You know how much I love to pick and poke at things, I just can’t help it. But this one didn’t take any convincing on my part. I couldn’t even tell you where I ran into it. that is. Part

Js ⠀⠀⠀⠀⠐⠀⠀⠀
Terms appear in 1 position

Syntax

This is a keyword-based search engine. When entering multiple search terms, the search engine will attempt to match them against documents where the terms occur in close proximity.

Search terms can be excluded with a hyphen.

While the search engine at present does not allow full text search, quotes can be used to specifically search for names or terms in the title. Using quotes will also cause the search engine to be as literal as possible in interpreting the query.

Parentheses can be used to add terms to the query without giving weight to the terms when ranking the search results.

Samples

soup -chicken
Look for pages that contain soup, but not chicken.
"keyboard"
Look for pages containing the exact word keyboard, not keyboards or the like.
"steve mcqueen"
Look for pages containing the exact words steve mcqueen in that order, with no words in between.
apology (plato)
Look for pages containing apology and plato, but only rank them based on their relevance to apology.
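
The four constructs above (plain terms, hyphen exclusion, quoted phrases, and parenthesized terms) can be combined into one query string. The following is a minimal sketch only; `build_query` and its parameter names are hypothetical helpers for illustration and are not part of the search engine.

```python
def build_query(terms, exclude=(), exact=(), unweighted=()):
    """Compose a query string from the constructs described in the Syntax section.

    terms      -- plain keywords, matched when they occur in close proximity
    exclude    -- words to exclude, written with a leading hyphen
    exact      -- words or phrases to match literally, written in quotes
    unweighted -- words required in the page but ignored for ranking,
                  written in parentheses
    """
    parts = list(terms)
    parts += [f'"{phrase}"' for phrase in exact]
    parts += [f"-{word}" for word in exclude]
    parts += [f"({word})" for word in unweighted]
    return " ".join(parts)


# Reproduce three of the samples listed above.
print(build_query(["soup"], exclude=["chicken"]))      # soup -chicken
print(build_query([], exact=["steve mcqueen"]))        # "steve mcqueen"
print(build_query(["apology"], unweighted=["plato"]))  # apology (plato)
```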

Special Keywords

Several special keywords are supported by the search engine.

Keyword                     Meaning
site:example.com            Display site information about example.com
site:example.com keyword    Search example.com for keyword
browse:example.com          Show similar websites to example.com
ip:127.0.0.1                Search documents hosted at 127.0.0.1
links:example.com           Search documents linking to example.com
tld:edu keyword             Search documents with the top level domain edu
?tld:edu keyword            Prefer but do not require results with the top level domain edu. This syntax is also possible for links:..., ip:... and site:...
q>5                         The amount of javascript and modern features is at least 5 (on a scale 0 to 25)
q<5                         The amount of javascript and modern features is at most 5 (on a scale 0 to 25)
year>2005                   (beta) The document was ostensibly published in or after 2005
year=2005                   (beta) The document was ostensibly published in 2005
year<2005                   (beta) The document was ostensibly published in or before 2005
rank>50                     The ranking of the website is at least 50 in a span of 1 - 255
rank<50                     The ranking of the website is at most 50 in a span of 1 - 255
count>10                    The search term must appear in at least 10 results from the domain
count<10                    The search term must appear in at most 10 results from the domain
format:html5                Filter documents using the HTML5 standard. This is typically modern websites.
format:xhtml                Filter documents using the XHTML standard
format:html123              Filter documents using the HTML standards 1, 2, and 3. This is typically very old websites.
generator:wordpress         Filter documents with the specified generator, in this case wordpress
file:zip                    Filter documents containing a link to a zip file (most file-endings work)
file:audio                  Filter documents containing a link to an audio file
file:video                  Filter documents containing a link to a video file
file:archive                Filter documents containing a link to a compressed archive
file:document               Filter documents containing a link to a document
-special:media              Filter out documents with audio or video tags
-special:scripts            Filter out documents with javascript
-special:affiliate          Filter out documents with likely Amazon affiliate links
-special:tracking           Filter out documents with analytics or tracking code
-special:cookies            Filter out documents with cookies
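
As a rough illustration of the table, the sketch below attaches a few special keywords to a plain keyword query. The helper `restrict` and its parameter names are hypothetical. Only the `site:example.com keyword` combination is documented in the table; that the `year>` and `q<` keywords can be appended to the same query is an assumption.

```python
def restrict(query, site=None, year_from=None, max_js=None):
    """Attach special keywords to a plain keyword query (illustrative only).

    site      -- limit results to one domain (site:...)
    year_from -- documents published in or after this year (year>...)
    max_js    -- javascript/modern-feature score of at most this value (q<...)
    """
    parts = []
    if site is not None:
        parts.append(f"site:{site}")
    parts.append(query)
    # Assumption: these filters may be combined with site: in one query.
    if year_from is not None:
        parts.append(f"year>{year_from}")
    if max_js is not None:
        parts.append(f"q<{max_js}")
    return " ".join(parts)


# A site-restricted query of the documented "site:example.com keyword" form.
# The keyword "duckdb" is a guess; the page does not show the actual query.
print(restrict("duckdb", site="dataengineeringcentral.substack.com"))
```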

Results Legend

The estimated relevance of a search result is indicated by its color saturation, as well as by the order in which the results are presented.

Information about the position of the match is indicated using a dot matrix in the bottom bar of each search result. Each dot represents four sentences, and the dots are presented in order from top to bottom, left to right.

⣿⠃⠀⠀   — The terms occur heavily toward the beginning of the document.

⠠⠀⡄⠁   — The terms occur sparsely throughout the document.

⠀⠁⠀⠀   — The terms occur only in a single sentence.
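
The dot matrices are drawn with Unicode braille pattern characters, so they can be inspected programmatically. The sketch below is only an approximation: the bit layout of the braille block is standard Unicode, but treating each cell as "left column top-to-bottom, then right column" is an assumption about how the legend's reading order applies, and `dots_set` and `count_dots` are hypothetical helper names.

```python
BRAILLE_BASE = 0x2800  # U+2800 is the blank braille pattern

# Bit masks of the eight dots in a Unicode braille cell, listed
# left column top-to-bottom, then right column top-to-bottom.
DOT_BITS = [0x01, 0x02, 0x04, 0x40, 0x08, 0x10, 0x20, 0x80]


def dots_set(matrix):
    """Return, per braille cell, the indices (0-7) of the dots that are set."""
    cells = []
    for ch in matrix:
        bits = ord(ch) - BRAILLE_BASE
        if not 0 <= bits <= 0xFF:
            raise ValueError(f"not a braille pattern: {ch!r}")
        cells.append([i for i, mask in enumerate(DOT_BITS) if bits & mask])
    return cells


def count_dots(matrix):
    """Count how many dots are set across the whole matrix."""
    return sum(len(cell) for cell in dots_set(matrix))


# U+28FF U+2803 U+2800 U+2800: a full first cell, two dots in the second.
print(count_dots("\u28ff\u2803\u2800\u2800"))  # 10
# U+2800 U+2801 U+2800 U+2800: a single dot.
print(dots_set("\u2800\u2801\u2800\u2800"))    # [[], [0], [], []]
```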

Potential problems with the document are presented with a warning triangle, e.g. ⚠ 3. Desktop users can mouse over this to get a detailed breakdown.

Policies

This website complies with the GDPR by not collecting any personal information, and with the EU Cookie Directive by not using cookies.

Contact

Reach me at kontakt@marginalia.nu, or @MarginaliaNu on Twitter.

Open Source

The search engine is open source with an AGPL license. The sources can be perused at https://git.marginalia.nu/.

Data Sources

IP geolocation is sourced from the IP2Location LITE data available from https://lite.ip2location.com/ under CC-BY-SA 4.0.