Marginalia

Search The Internet

Showing search results from www.stackoverflow.com.
https://www.stackoverflow.com/questions/73053149

DuckDB Not saving huge database

DuckDB Not saving huge database We are trying to embed duckdb in our project but DuckDB doesn't seem to be able to save the database after closing the connection. Information: Database size: 16 GB Number of tables: 3 I searched for information about data not pers

Js Tr ⠎⡒⠀⠀⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/76493134

DuckDB slower than Polars in single table over + groupby context

DuckDB slower than Polars in single table over + groupby context For the following toy example which involves both calculations over window and groupby aggregations, DuckDB performs nearly 3x slower than Polars in Python. Both give exactly the same result

Js Tr ⣢⠢⡈⠀⠀⠀⠀⠀
Terms appear in 8 positions

https://www.stackoverflow.com/questions/76597352

Can DuckDB be used as Document Database?

Can DuckDB be used as Document Database? As far as I know, the DuckDB is columnar database and can process and store sparse data efficiently. So, would it be possible to use it as "tuple space" or "document database"? I don't expect to get top performance

Js Tr ⣊⢰⢕⡆⠀⠀⠀⠀
Terms appear in 14 positions

https://www.stackoverflow.com/questions/68546824

DuckDB python API: query composition

DuckDB python API: query composition Suppose I use DuckDB with python, for querying an Apache parquet file test.pq with a table containing two columns f1 and f2. r1 = duckdb.query(""" SELECT f1 FROM parquet_scan('test.pq') WHERE f2 > 1 """) Now I would l

Js Tr ⡆⢒⠀⠀⠀⠀⠀⠀
Terms appear in 6 positions
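
A minimal sketch of composing such queries with recent versions of the duckdb Python package, assuming the test.pq file from the snippet; a relation held in a Python variable can be referenced by name in a later query:

    import duckdb

    # The first query returns a relation; nothing is materialized yet.
    r1 = duckdb.query("SELECT f1 FROM parquet_scan('test.pq') WHERE f2 > 1")
    # The relation can be referenced by its variable name in a follow-up query.
    r2 = duckdb.query("SELECT count(*) AS n FROM r1")
    print(r2.fetchall())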

https://www.stackoverflow.com/questions/75712317

IMPORT and EXPORT in Duckdb due to change of version

IMPORT and EXPORT in Duckdb due to change of version I have been using duckdb and have a database but recently I updated duckdb and not able to use the duckdb and getting following error. duckdb.IOException: IO Error: Trying to read a database file with v

Js Tr ⡴⣻⠇⠀⠀⠀⠀⠀
Terms appear in 14 positions

https://www.stackoverflow.com/questions/76497937

DuckDB - Rank correlation is much slower than regular correlation

DuckDB - Rank correlation is much slower than regular correlation Comparing the following two code sections with the only difference as the second one first computes rank, the second section results in much slower performance than the first one (~5x). Alt

Js Tr ⡂⡂⠐⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/71952623

Reading partitioned parquet files in DuckDB

Reading partitioned parquet files in DuckDB Background: DuckDB allows for direct querying of parquet files. e.g. con.execute("Select * from 'Hierarchy.parquet'") Parquet allows files to be partitioned by column values. When a parquet file is partitioned a

Js Tr ⡄⢺⠀⠀⠀⠀⠀⠀
Terms appear in 7 positions
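
A sketch of querying a partitioned dataset, assuming an illustrative layout such as Hierarchy/year=2021/part0.parquet; the hive_partitioning flag turns the directory names into queryable columns:

    import duckdb

    con = duckdb.connect()
    # Glob across the partition directories; 'year' becomes a real column.
    df = con.execute("""
        SELECT *
        FROM parquet_scan('Hierarchy/*/*.parquet', hive_partitioning = true)
    """).df()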

https://www.stackoverflow.com/questions/74015798

Unable to access tables written to duckdb when starting new R session (but .duckdb file is not empty)

Unable to access tables written to duckdb when starting new R session (but .duckdb file is not empty) I am having trouble with Duckdb (through R) since I have changed computer and reinstalled all of my software. I have a local duckdb connection through wh

Js Tr ⢴⢈⢳⠁⠀⠀⠀⠀
Terms appear in 12 positions

https://www.stackoverflow.com/questions/69801372

Using DuckDB with s3?

Using DuckDB with s3? I'm trying to use DuckDB in a jupyter notebook to access and query some parquet files held in s3, but can't seem to get it to work. Judging on past experience, I feel like I need to assign the appropriate file system but I'm not sure

Js Tr ⡆⠅⠍⠀⠀⠀⠀⠀
Terms appear in 8 positions
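
A sketch of the httpfs route in Python; the region, credentials and bucket below are placeholders:

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_region = 'us-east-1'")
    con.execute("SET s3_access_key_id = 'YOUR_KEY'")
    con.execute("SET s3_secret_access_key = 'YOUR_SECRET'")
    # Query the parquet file in place; only the needed data is fetched.
    df = con.execute("SELECT * FROM read_parquet('s3://my-bucket/data.parquet')").df()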

https://www.stackoverflow.com/questions/76736297

Does DuckDB support multi-threading when performing joins?

Does DuckDB support multi-threading when performing joins? Does DuckDB support multi-threaded joins? I've configured DuckDB to run on 48 threads, but when executing a simple join query, only one thread is actively working. Here is an example using the CLI

Js Tr ⡊⠈⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/67912040

UnsatisfiedLinkError for DuckDb native code in Java

UnsatisfiedLinkError for DuckDb native code in Java When trying to open a connection to DuckDb on an EC2 instance: NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAM

Js Tr ⡊⡆⠀⠀⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/68656092

How do I limit the memory usage of duckdb in R?

How do I limit the memory usage of duckdb in R? I have several large R data.frames that I would like to put into a local duckdb database. The problem I am having is duckdb seems to load everything into memory even though I am specifying a file as the loca

Js Tr ⡨⠎⠫⠀⠀⠀⠀⠀
Terms appear in 10 positions
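
The limit is a database setting, so the same PRAGMA applies from any client API; sketched here in Python with an illustrative file name, while from R the equivalent is dbExecute(con, "PRAGMA memory_limit='2GB'"):

    import duckdb

    con = duckdb.connect("local.duckdb")  # illustrative database file
    # Cap how much memory DuckDB may use for query processing.
    con.execute("PRAGMA memory_limit = '2GB'")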

https://www.stackoverflow.com/questions/76478913

Polars is much slower than DuckDB in conditional join + groupby/agg context

Polars is much slower than DuckDB in conditional join + groupby/agg context For the following example, where it involves a self conditional join and a subsequent groupby/aggregate operation. It turned out that in such case, DuckDB gives much better perfor

Js Tr ⠬⢓⡅⠀⠀⠀⠀⠀
Terms appear in 10 positions

https://www.stackoverflow.com/questions/66292152

R: DuckDB DBconnect is very slow - Why?

R: DuckDB DBconnect is very slow - Why? I have a *.csv file containing columnar numbers and strings (13GB on disk) which I imported into a new duckdb (or sqlite) database and saved it so I can access it later in R. But reconnecting duplicates it and is v

Js Tr ⡒⡁⠀⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/76297471

How to read a csv file from google storage using duckdb

How to read a csv file from google storage using duckdb I'm using duckdb version 0.8.0 I have a CSV file located in google storage gs://some_bucket/some_file.csv and want to load this using duckdb. In pandas I can do pd.read_csv("gs://some_bucket/some_fil

Js Tr ⣸⡀⠀⠀⠀⠀⠀⠀
Terms appear in 6 positions
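
One possible route, assuming the fsspec and gcsfs packages are installed: the duckdb Python API can register an fsspec filesystem and read gcs:// paths through it:

    import duckdb
    import fsspec

    con = duckdb.connect()
    # Register the Google Cloud Storage filesystem implemented by gcsfs.
    con.register_filesystem(fsspec.filesystem("gcs"))
    df = con.execute(
        "SELECT * FROM read_csv_auto('gcs://some_bucket/some_file.csv')"
    ).df()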

https://www.stackoverflow.com/questions/76980632

does duckDB create a copy of an R data frame when I register it?

does duckDB create a copy of an R data frame when I register it? I am trying to learn about using DuckDB in R. In my reading of the docs and what people say online, it sounds as if, when I register a data frame as a virtual table, no copy is made. Rather,

Js Tr ⠪⢀⠇⠀⠀⠀⠀⠀
Terms appear in 7 positions

https://www.stackoverflow.com/questions/76989538

Deterministic random number generation in duckdb with dplyr syntax

Deterministic random number generation in duckdb with dplyr syntax How can I use duckdb's setseed() function (see reference doc) with dplyr syntax to make sure the analysis below is reproducible? # dplyr version 1.1.1 # arrow version 11.0.0.3 # duckdb 0.7

Js Tr ⣌⠀⣠⠀⠀⠀⠀⠀
Terms appear in 7 positions

https://www.stackoverflow.com/questions/75352219

Fix unimplemented Casting error in Duckdb Insert

Fix unimplemented Casting error in Duckdb Insert I am using Duckdb to insert data by Batch Insert While using following code conn.execute('INSERT INTO Main SELECT * FROM df') I am getting following error Invalid Input Error: Failed to cast value: Unimple

Js Tr ⡌⠊⠀⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/69408334

tableau how to connect duckdb

tableau how to connect duckdb I download the duckdb jdbc driver and copy it to the install directory: C:\Program Files\Tableau\Drivers\duckdb_jdbc-0.2.9.jar then I start Tableau, and choose the other JDBC drivers to connect, set the configuration li

Js Tr ⣄⠂⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/67462150

Transfer a SQLServer table directly to DuckDB in R

Transfer a SQLServer table directly to DuckDB in R I've been reading into DuckDB recently and most of the examples involve having some sort of data already in an R session, then pushing that data into DuckDB. Here is a basic example of that using the iris

Js Tr ⠬⡰⠀⠀⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/75727685

How do I get a list of table-like objects visible to duckdb in a python session?

How do I get a list of table-like objects visible to duckdb in a python session? I like how duckdb lets me query DataFrames as if they were sql tables: df = pandas.read_parquet("my_data.parquet") con.query("select * from df limit 10").fetch_df() I also l

Js Tr ⣰⡒⠀⠀⠀⠀⠀⠀
Terms appear in 7 positions

https://www.stackoverflow.com/questions/71467792

How to import a .sql file into DuckDB database?

How to import a .sql file into DuckDB database? I'm exploring DuckDB for one of my project. Here I have a sample Database file downloaded from https://www.wiley.com/en-us/SQL+for+Data+Scientists%3A+A+Beginner%27s+Guide+for+Building+Datasets+for+Analysis-p

Js Tr ⠬⡱⠀⠀⠀⠀⠀⠀
Terms appear in 7 positions

https://www.stackoverflow.com/questions/72945218

How to bulk load list values into DuckDB

How to bulk load list values into DuckDB I have a CSV file that looks like this: W123456,{A123,A234,A345} W2345567,{A789,A678,A543} I have python code that tries to load this csv file: import duckdb con = duckdb.connect(database='mydb.duckdb', read_only=

Js Tr ⠬⣈⡄⠀⠀⠀⠀⠀
Terms appear in 8 positions

https://www.stackoverflow.com/questions/74411716

Set read only connection to duckdb in dbeaver

Set read only connection to duckdb in dbeaver I'm working in python with duckdb and would like to use dbeaver alongside in read only mode. Where in dbeaver can I alter the config for duckdb, it doesn't appear in same location as Postgres ? What I've tried

Js Tr ⡬⠁⠀⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/76965197

Filter based on a list column using arrow and duckdb

Filter based on a list column using arrow and duckdb I'm using the R arrow package to interact with a duckdb table that contains a list column. My goal is to filter on the list column before collecting the results into memory. Can this be accomplished on

Js Tr ⣘⠐⠄⠀⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/75475994

How many threads is DuckDB using?

How many threads is DuckDB using? Using duckDB from within R, e.g. library(duckdb) dbname <- "sparsemat.duckdb" con2 <- dbConnect(duckdb(), dbname) dbExecute(con2, "PRAGMA memory_limit='1GB';") how can I find out how many threads the (separate process) i

Js Tr ⡜⠀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions
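
The thread count is exposed as a setting, so it can be inspected and changed with plain SQL; a sketch in Python, though the same statements work through dbExecute/dbGetQuery in R:

    import duckdb

    con = duckdb.connect()
    # Read the current value, then pin it explicitly.
    print(con.execute("SELECT current_setting('threads')").fetchone())
    con.execute("PRAGMA threads = 4")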

https://www.stackoverflow.com/questions/72604630

DuckDB deleting rows from dataframe error: RuntimeError: Binder Error: Can only delete from base table

DuckDB deleting rows from dataframe error: RuntimeError: Binder Error: Can only delete from base table I have just started using DuckDB in python jupyter notebook. So far everything has worked great. I can't figure out how to delete records from a datafra

Js Tr ⠪⡄⠀⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/75671499

DuckDB Binder Error: Referenced column not found in FROM clause

DuckDB Binder Error: Referenced column not found in FROM clause I am working in DuckDB in a database that I read from json. Here is the json: [{ "account": "abcde", "data": [ { "name": "hey", "amount":1,

Js Tr ⡒⠂⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/75921270

DuckDB beginner needs help: IOException error

DuckDB beginner needs help: IOException error I'm starting to learn DuckDB (on Windows) and I'm having some problems and I don't find much information about it on the internet. I'm following the following tutorial for beginners: https://marclamberti.com/b

Js Tr ⠎⢀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/76083306

DuckDB SQL Query ParserException: Error on executing SQL query with column name which includes # symbol

DuckDB SQL Query ParserException: Error on executing SQL query with column name which includes # symbol When I tried to execute a query on DuckDB which accesses parquet file from Azure Blob Storage. It is showing parse ParserException at column names Pati

Js Tr ⠪⡀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/72969276

How can I write raw binary data to duckdb from R?

How can I write raw binary data to duckdb from R? My best guess is that this simply isn't currently supported by the {duckdb} package, however I'm not sure if I'm doing something wrong/not doing it in the intended way. Here's a reprex which reproduces the (

Js Tr ⡘⢠⠅⠀⠀⠀⠀⠀
Terms appear in 7 positions

https://www.stackoverflow.com/questions/74186714

Add columns to a table or records without duplicates in Duckdb

Add columns to a table or records without duplicates in Duckdb I have the following code: import time from watchdog.observers import Observer from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler import duckdb path = "landing/pe

Js Tr ⡨⠠⠄⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/75389288

Export a SQLite table to Apache parquet without creating a dataframe

Export a SQLite table to Apache parquet without creating a dataframe I have multiple huge CSV files that I have to export based on Apache Parquet format and split them into smaller files based on multiple criteria/keys (= column values). As I understand A

Js Tr ⠀⠀⣰⠊⡄⠀⠀⠀
Terms appear in 8 positions

https://www.stackoverflow.com/questions/76509686

duckdb query takes too long to process and return inside Flask application

duckdb query takes too long to process and return inside Flask application I have a Flask app and want to use duckdb as a database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test my database with a simple

Js Tr ⡒⠈⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/68606131

One possibility would be to use DuckDB to perform the distinct count and then export the result to a pandas dataframe. Duckdb is a vectorized state-of-the-art DBMS for analytics and can run queries directly on the CSV file. It is also tightly integrated w

Js Tr ⢌⠁⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions
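
A sketch of the approach that answer describes, with illustrative file and column names:

    import duckdb

    # DuckDB scans the CSV directly; only the aggregate lands in pandas.
    df = duckdb.query(
        "SELECT count(DISTINCT some_column) AS n FROM read_csv_auto('data.csv')"
    ).df()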

https://www.stackoverflow.com/questions/74756192

Syntax for Duckdb > Python SQL with Parameter\Variable

Syntax for Duckdb > Python SQL with Parameter\Variable I am working on a proof of concept, using Python and Duckdb. I am wanting to use a variable\parameter inside the Duckdb SELECT statement. For example, y = 2 dk.query("SELECT * FROM DF WHERE x > y").to

Js Tr ⡲⠀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions
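
A sketch using prepared-statement parameters, with the df and y names from the snippet (df is assumed to be a pandas DataFrame in scope):

    import duckdb
    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3]})
    y = 2
    # '?' placeholders are bound at execution time, avoiding string interpolation.
    result = duckdb.execute("SELECT * FROM df WHERE x > ?", [y]).df()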

https://www.stackoverflow.com/questions/72053676

Unable to Install DuckDB using Python PIP

Unable to Install DuckDB using Python PIP Everything goes fine until the following lines: Installing collected packages: duckdb Running setup.py install for duckdb ... \ And it is stuck. Nothing moves. Please, I seek help from Python community members. Is

Js Tr ⡔⠂⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/74152013

df1 = pd.read_parquet("file1.parquet") This statement will read the entire parquet file into memory. Instead, I assume you want to read in chunks (i.e one row group after another or in batches) and then write the data frame into DuckDB. This is not possi

Js Tr ⠠⢠⠄⠀⠀⠀⠀⠀
Terms appear in 4 positions
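
A sketch of that batched route, combining pyarrow's iter_batches with DuckDB's ability to query a pyarrow table held in a Python variable; the file name is taken from the snippet, the rest is illustrative:

    import duckdb
    import pyarrow as pa
    import pyarrow.parquet as pq

    con = duckdb.connect("out.duckdb")  # illustrative target database
    pf = pq.ParquetFile("file1.parquet")
    for i, batch in enumerate(pf.iter_batches(batch_size=100_000)):
        tbl = pa.Table.from_batches([batch])  # visible to DuckDB by name
        if i == 0:
            con.execute("CREATE TABLE t AS SELECT * FROM tbl")
        else:
            con.execute("INSERT INTO t SELECT * FROM tbl")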

https://www.stackoverflow.com/questions/76909034

Fast upsert into duckdb

Fast upsert into duckdb I have a dataset where I need to upsert data (on conflict replace some value columns). As this is the bottleneck of an app, I want this to be fairly optimized. But duckdb is really slow compared to sqlite in this instance. What am

Js Tr ⠤⠀⡀⠀⠀⠀⠀⠀
Terms appear in 3 positions
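
For reference, DuckDB supports SQL-level upserts since version 0.7; a minimal sketch with illustrative table and column names:

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v INTEGER)")
    con.execute("INSERT INTO t VALUES (1, 10)")
    # On a key collision, update the value column instead of failing.
    con.execute(
        "INSERT INTO t VALUES (1, 20) ON CONFLICT (k) DO UPDATE SET v = excluded.v"
    )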

https://www.stackoverflow.com/questions/72310208

DuckDB: turn dataframe dictionary column into MAP column

DuckDB: turn dataframe dictionary column into MAP column I have a Pandas dataframe with a column containing dictionary values. I'd like to query this dataframe using DuckDB and convert the result to another dataframe, and have the type preserved across th

Js Tr ⢐⠈⠃⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/70123769

DuckDB R : Calculate mean and median for multiple columns

DuckDB R : Calculate mean and median for multiple columns I have a duckdb and want to calculate the mean and median of multiple columns at once: e.g. #This works: mtcars %>% summarise(across(everything(),list(mean, median)) #This doesn't tbl(con,"mtcars

Js Tr ⡊⠀⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/73706380

Does Duck DB support triggers?

Does Duck DB support triggers? I suspect the answer is no, but I just wanted to check if anyone has a way to implement triggers in DuckDB? I have a SQLite database that relies heavily on views with INSTEAD OF INSERT/ UPDATE/ DELETE triggers to mask the un

Js Tr ⢐⠌⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/73338009

Speeding up group_by operations dplyr

Speeding up group_by operations dplyr I have a tibble with a lot of groups, and I want to do group-wise operations on it (highly simplified mutate below). z <- tibble(k1 = rep(seq(1, 600000, 1), 5), category = sample.int(2, 3000000, replace =

Js Tr ⠀⠀⡖⠃⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/76240893

How to alter data constraint in duckdb R

How to alter data constraint in duckdb R I am trying to alter a Not Null constraint to a Null constraint in duckdb (R api) and can't get it to stick. Here is an example of the problem. drv<- duckdb() con<- dbConnect(drv) dbExecute(con, "CREATE TABLE db(a

Js Tr ⡴⠀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/74085530

How to update a table (accessed in pandas) in DuckDB Database?

How to update a table (accessed in pandas) in DuckDB Database? I'm working on a use case: I have a large volume of records created in a duckdb database table. These tables can be accessed as pandas dataframes, to do the data manipulations and send the

Js Tr ⡨⠠⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/76066470

How to show user schema in a Parquet file using DuckDB?

How to show user schema in a Parquet file using DuckDB? I am trying to use DuckDB to show the user-created schema that I have written into a Parquet file. I can demonstrate in Python (using the code example at Get schema of parquet file in Python) that th

Js Tr ⡘⠁⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/76288007

arrow R duration/difftime casting to float

arrow R duration/difftime casting to float I am working with a large set of datasets containing time-series. My time-series data include ID and a value for each day for several years (about 90Gb in total). What I am trying to do is to merge (Non-equi join

Js Tr ⠀⢄⠈⠃⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/68628271

Partially read really large csv.gz in R using vroom

Partially read really large csv.gz in R using vroom I have a csv.gz file that (from what I've been told) before compression was 70GB in size. My machine has 50GB of RAM, so anyway I will never be able to open it as a whole in R. I can load for example the

Js Tr ⠀⠀⢀⠤⠃⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/53982871

Pandas : Reading first n rows from parquet file?

Pandas : Reading first n rows from parquet file? I have a parquet file and I want to read first n rows from the file into a pandas data frame. What I tried: df = pd.read_parquet(path= 'filepath', nrows = 10) It did not work and gave me error: TypeError:

Js Tr ⠀⠀⡞⠀⠀⠀⠀⠀
Terms appear in 5 positions
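
pandas.read_parquet has no nrows argument; one workaround is to let DuckDB push the limit into the scan and hand the small result to pandas (path from the snippet):

    import duckdb

    # LIMIT stops the parquet scan early rather than reading the whole file.
    df = duckdb.query("SELECT * FROM read_parquet('filepath') LIMIT 10").df()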

https://www.stackoverflow.com/questions/74136530

Is there a tool to query Parquet files which are hosted in S3 storage?

Is there a tool to query Parquet files which are hosted in S3 storage? I have Paraquet files in my S3 bucket which is not AWS S3. Is there a tool that connects to any S3 service (like Wasabi, Digital Ocean, MinIO), and allows me to query the Parquet files

Js Tr ⠀⡓⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/55506633

NodeJS - reading Parquet files

NodeJS - reading Parquet files Does anyone know a way of reading parquet files with NodeJS? I tried node-parquet -> very hard (but possible) to install - it works most of the time but not working for reading numbers (numerical data types). Also tried parq

Js Tr ⠀⢸⠂⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/76964556

arrow::to_duckdb coerces int64 columns to doubles

arrow::to_duckdb coerces int64 columns to doubles arrow::to_duckdb() converts int64 columns to a double in the duckdb table. This happens if the .data being converted is an R data frame or a parquet file. How can I maintain the int64 data type? Example li

Js Tr ⠈⣄⠃⠀⠀⠀⠀⠀
Terms appear in 6 positions

https://www.stackoverflow.com/questions/76341201

Trying to do a docker build which fails at chromadb installation

Trying to do a docker build which fails at chromadb installation I am trying to build a docker image for my python flask project. Seems like there is some issue with the below packages on which Chromadb build is dependent duckdb, hnswlib Below are the con

Js Tr ⢀⠄⢠⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/76239414

Can you load a JSON object into a duckdb table with the Node.js API?

Can you load a JSON object into a duckdb table with the Node.js API? The duckdb Node.js API can load data from a JSON file. However, I don't see a way to load data from a JSON object, similar to the way duckdb Wasm ingestion works. Is there a way to do th

Js Tr ⣘⠀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/75777109

Unsupported result column Struct()[] for DuckDB 0.7.1 from_json

Unsupported result column Struct()[] for DuckDB 0.7.1 from_json I am trying to get a large set of nested JSON files to load into a table, each file is a single record and there are ~25k files. However when I try to declare the schema it errors out when tr

Js Tr ⡄⠁⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/72883083

Create an auto incrementing primary key in DuckDB

Create an auto incrementing primary key in DuckDB Many database engines support auto-incrementing primary keys, and I would like to use this approach in my new DuckDB approach, but I can't figure out how to set it up. For example, in MySQL: CREATE TABLE P

Js Tr ⠜⠀⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions
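
DuckDB's idiom for this is a sequence used as the column default; a minimal sketch with illustrative names:

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE SEQUENCE id_seq START 1")
    con.execute("""
        CREATE TABLE person (
            id INTEGER DEFAULT nextval('id_seq') PRIMARY KEY,
            name VARCHAR
        )
    """)
    con.execute("INSERT INTO person (name) VALUES ('Ada')")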

https://www.stackoverflow.com/questions/73457316

How to query bytearray data in pandas dataframe using duckdb?

How to query bytearray data in pandas dataframe using duckdb? df_image : is a pandas data frame with a column labelled 'bytes', which contains image data in bytearray format. I display the images as follows: [display(Image(copy.copy(BytesIO(x)).read(),wid

Js Tr ⡈⠈⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/49367372

GUI tools for viewing/editing Apache Parquet

GUI tools for viewing/editing Apache Parquet I have some Apache Parquet file. I know I can execute parquet file.parquet in my shell and view it in terminal. But I would like some GUI tool to view Parquet files in more user-friendly format. Does such kind

Js Tr ⠀⢠⠃⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/73711197

NodeJS Parquet write

NodeJS Parquet write I have a bunch of columns ( around 30). Out of which there are arrays, text fields with multiple line space (Word document) etc. I think CSV will not be an apt format because of multiple new lines. I am thinking of using Parquet forma

Js Tr ⠀⡨⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/66369926

How to combine two factors to make filtering faster?

How to combine two factors to make filtering faster? I have a data.frame of 1e8 rows which has a column of results that I would like to filter by the following two columns: Model and subModel. I would like to figure out how to join Model and subModel t

Js Tr ⠀⠀⠀⠨⠂⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/76442164

library(tidyverse) library(dbplyr) We can use duckdb to create a small in-memory test setup con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:") Let’s say we have the following three tables: table1 <- tibble( col1 = c("A", "B", "aBc"), col2 =

Js Tr ⠄⠀⠅⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/73383536

DuckDB - efficiently insert pandas dataframe to table with sequence

DuckDB - efficiently insert pandas dataframe to table with sequence CREATE TABLE temp ( id UINTEGER, name VARCHAR, age UINTEGER ); CREATE SEQUENCE serial START 1; Insertion with series works just fine: INSERT INTO temp VALUES(nextval('serial'

Js Tr ⡂⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/75017887

DuckDB `explain analyze` total time and operator time discrepancy

DuckDB `explain analyze` total time and operator time discrepancy When I use explain analyze to profile a join query: D create or replace table r1 as select range, (random()*100)::UINT8 as r from range(0,500000); D create or replace table r2 as select ran

Js Tr ⡂⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/75431182

DuckDB multi threading is not Working on Google Cloud Run with multiple CPU

DuckDB multi threading is not Working on Google Cloud Run with multiple CPU I have a relatively simple Gen2 cloud function, which is deployed using Cloud Run. Regardless of how many vCPUs I assign, DuckDB seems to be using only 1 CPU; the memory works fin

Js Tr ⡢⠁⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/63235906

What is default number of rows used by csv reader to decide on column types? The current behavior is that 10 chunks of 100 rows each are sampled. It can be further broken down into two scenarios. File has ~ 1000 rows or less (or is compressed): chunks are

Js Tr ⠀⢀⠃⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/75739177

Converting JSON to Parquet in a NodeJS lambda to write into S3

Converting JSON to Parquet in a NodeJS lambda to write into S3 I am running an AWS Lambda function with NodeJS as the language. This lambda receives some JSON input that I need to transform into Parquet format before writing it to S3. Currently, I'm using

Js Tr ⠀⠀⡨⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/58486414

I don't know the library, so I can't give a definite answer. I will be going by the code at https://github.com/cwida/duckdb. According to the error message, the problematic code is at line 332 of test/sql/capi/test_capi.cpp, which is: REQUIRE(stmt != NULL

Js Tr ⠀⠀⣤⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/33883640

How do I get schema / column names from parquet file?

How do I get schema / column names from parquet file? I have a file stored in HDFS as part-m-00000.gz.parquet I've tried to run hdfs dfs -text dir/part-m-00000.gz.parquet but it's compressed, so I ran gunzip part-m-00000.gz.parquet but it doesn't uncompre

Js Tr ⠀⠀⢠⠀⠀⠀⠀⠀
Terms appear in 2 positions
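
DuckDB can report a parquet file's schema without any Hadoop tooling; a sketch using the file name from the question:

    import duckdb

    # One row per column: name, physical type, logical type, and so on.
    duckdb.query("SELECT * FROM parquet_schema('part-m-00000.gz.parquet')").show()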

https://www.stackoverflow.com/questions/76456801

data.table::fread fails for larger file (long vectors not supported yet)

data.table::fread fails for larger file (long vectors not supported yet) fread() fails when reading large file ~335GB with this error. appreciate any suggestions on how to resolve this. opt$input_file <- "sample-009_T/per_read_modified_base_calls.txt" Err

Js Tr ⠀⠀⢀⠂⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76704746

https://www.stackoverflow.com/questions/76704746

The error is likely because your machine does not have sufficient memory (RAM) to process the file. I have 64GB of RAM, it took well over 6 minutes to read in the file, and the resulting object is over 13GB in R (the original file is over 20GB in size, uncompresse

Js Tr ⠀⠨⠇⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/17071871

How do I select rows from a DataFrame based on column values?

How do I select rows from a DataFrame based on column values? How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE column_name = some_value To select rows whose column value equal

Js Tr ⠀⠀⠀⠀⠀⠀⠀⡀
Terms appear in 1 position

https://www.stackoverflow.com/questions/76640352

How do you specify column compression algorithm in duckdb?

How do you specify column compression algorithm in duckdb? I've read DuckDB Lightweight Compression and understand that DuckDB is designed to choose the best compression strategy automatically, but would like to know if it is possible to give hints in CRE

Js Tr ⡘⠀⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/45865608

Executing an SQL query over a pandas dataset

Executing an SQL query over a pandas dataset I have a pandas data set, called 'df'. How can I do something like below; df.query("select * from df") Thank you. For those who know R, there is a library called sqldf where you can execute SQL code in R, my q

Js Tr ⠀⠀⠸⠀⠀⠀⠀⠀
Terms appear in 3 positions
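
DuckDB covers exactly this use case: a DataFrame in scope can be queried by its variable name. A minimal sketch:

    import duckdb
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})
    # Replacement scan: 'df' in the SQL resolves to the local DataFrame.
    print(duckdb.query("SELECT * FROM df WHERE a > 1").df())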

https://www.stackoverflow.com/questions/76513797

In DuckDB, how do I SELECT rows with a certain value in an array?

In DuckDB, how do I SELECT rows with a certain value in an array? I've got a table with a field my_array VARCHAR[]. I'd like to run a SELECT query that returns rows where the value ('My Term') I'm searching for is in "my_array" one or more times. These (a

Js Tr ⠊⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions
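
A sketch using DuckDB's list functions, with the my_array column from the question and an assumed table name:

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE t (my_array VARCHAR[])")
    con.execute("INSERT INTO t VALUES (['My Term', 'Other']), (['Nope'])")
    # list_contains is true when the value occurs one or more times.
    rows = con.execute(
        "SELECT * FROM t WHERE list_contains(my_array, 'My Term')"
    ).fetchall()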

https://www.stackoverflow.com/questions/76101671

You need to use some of DuckDB's text functions for your use case. https://duckdb.org/docs/sql/functions/char Normally, you can use DuckDB's string_split to separate your VARCHAR into a list of VARCHARs (or JSONs in your case). In your example, the comma

Js Tr ⠌⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions
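
The function mentioned in that answer, in runnable form:

    import duckdb

    # Splits a VARCHAR on the separator and returns a list of VARCHARs.
    print(duckdb.query("SELECT string_split('a,b,c', ',') AS parts").fetchall())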

https://www.stackoverflow.com/questions/59671329

How does DuckDB handle Sparse tables?

How does DuckDB handle Sparse tables? We are evaluating embedding duckdb in our applications. We deal with a lot of tables where the columns will be around 60-70 % sparse most of the time. Does duckdb fill them with default null values or does it support

Js Tr ⣪⠀⠀⠀⠀⠀⠀⠀
Terms appear in 5 positions

https://www.stackoverflow.com/questions/73608991

How can I fast outer-join and filter two vectors (or lists), preferably in base R?

How can I fast outer-join and filter two vectors (or lists), preferably in base R? ## outer join and filter outer_join <- function(x, y, FUN) { if (missing(y)) {y = x} cp <- list() for (d1 in x) { for (d2 in y) { if ( missing(FUN) || FUN(

Js Tr ⠀⠀⡄⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/73628387

not an answer (since I'm looking for one as well), but may still help. I think DuckDB may not recognize any index. If you do this: rel = conn.from_df(df) rel.create("a_table") result = conn.execute("select * from a_table").fetch_df() You will see that t

Js Tr ⠈⠂⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/50933429

How to view Apache Parquet file in Windows?

How to view Apache Parquet file in Windows? I couldn't find any plain English explanations regarding Apache Parquet files. Such as: What are they? Do I need Hadoop or HDFS to view/create/store them? How can I create parquet files? How can I view parquet f

Js Tr ⠀⠀⠀⠀⠆⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/73628489

One approach could be to use purrr::map_dfr + readr::read_csv for the reading, which allows you to assign an "id" column based on names assigned to the file paths, and then register that as a duckdb table: library(dplyr) purrr::map_dfr(c(year01 = path,

Js Tr ⠐⠈⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76424844

Iterating on rows in pandas DataFrame to compute rolling sums and a calculation

Iterating on rows in pandas DataFrame to compute rolling sums and a calculation I have a pandas DataFrame, I'm trying to (in pandas or DuckDB SQL) do the following on each iteration partitioned by CODE, DAY, and TIME: Iterate on each row to calculate the

Js Tr ⠠⠀⠀⠁⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/75607953

[SQL]: Efficient sampling from cartesian join

[SQL]: Efficient sampling from cartesian join I have two tables. What I want is a random sample from all the possible pairings. Say size of t1 is 100, and size of t2 is 200, and I want a sample of 300 pairings. The naive way of doing this (ran on the onli

Js Tr ⢀⠈⠈⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/66027598

how to vacuum (reduce file size) on duckdb

how to vacuum (reduce file size) on duckdb I am testing the duckdb database for analytics and I must say it is very fast. The issue is the database file is growing and growing, but I need to make it small to share it. In sqlite I recall using the VACUUM command,

Js Tr ⠌⠁⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions
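
DuckDB's VACUUM does not reclaim disk space the way SQLite's does; a commonly suggested workaround is to round-trip the data through EXPORT DATABASE / IMPORT DATABASE into a fresh file (all names illustrative):

    import duckdb

    con = duckdb.connect("big.duckdb")
    con.execute("EXPORT DATABASE 'dump_dir'")
    con.close()

    # A freshly written database contains no dead pages.
    fresh = duckdb.connect("small.duckdb")
    fresh.execute("IMPORT DATABASE 'dump_dir'")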

https://www.stackoverflow.com/questions/76854735

How to increase row output limit in DuckDB in Python?

How to increase row output limit in DuckDB in Python? I'm working with DuckDB in Python (in a Jupyter Notebook). How can I force DuckDB to print all rows in the output rather than truncating rows? I've already increased output limits in the Jupyter Notebo

Js Tr ⠜⠀⠀⠀⠀⠀⠀⠀
Terms appear in 3 positions

https://www.stackoverflow.com/questions/73197581

Select All Columns Except the Ones I Transformed?

Select All Columns Except the Ones I Transformed? Apologies, I am still a beginner at DBT. Is there a way to select all the columns that I didn't explicitly put in my select statement? Something like this: {{ config(materialized='view') }} with my_view a

Js Tr ⠀⠂⠀⠀⠀⠀⠀⠀
Terms appear in 1 position

https://www.stackoverflow.com/questions/76937626

SQLite Database File Invalidated from Query Being Interrupted (using DuckDB Python)

SQLite Database File Invalidated from Query Being Interrupted (using DuckDB Python) Connected to an SQLite DB file via DuckDB Python DB API in read_only mode. Ran a typical SELECT query, which was interrupted - I believe my python process was closed, I do

Js Tr ⡘⡀⠀⠀⠀⠀⠀⠀
Terms appear in 4 positions

https://www.stackoverflow.com/questions/68403015

Is there a way to group by intervals of 15 min in DuckDB?

Is there a way to group by intervals of 15 min in DuckDB? I made a table with create table counter ( createdat TIMESTAMP, tickets INT, id VARCHAR ) and I would like to group the rows by intervals of 15 min, so I am trying to do it with: SELECT S

Js Tr ⡐⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions
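
A sketch using time_bucket (available in recent DuckDB versions), against the counter table from the question:

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE counter (createdat TIMESTAMP, tickets INT, id VARCHAR)")
    rows = con.execute("""
        SELECT time_bucket(INTERVAL '15 minutes', createdat) AS bucket,
               sum(tickets) AS total
        FROM counter
        GROUP BY bucket
        ORDER BY bucket
    """).fetchall()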

https://www.stackoverflow.com/questions/74895737

Based on the error message, it seems unlikely that you can read the CSV file en toto into memory, even once. I suggest for analyzing the data within it, you may need to change your data-access to something else, such as: DBMS, whether monolithic (duckdb o

Js Tr ⠠⢀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76743694

OK... Just replacing import * as duckdb from 'duckdb' with import duckdb from 'duckdb' solved the issue. Otherwise, duckdb.default.Database should be used instead of duckdb.Database.

Js Tr ⡄⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76634451

You can use DuckDB with the dbt-duckdb plugin. When configured with your dbt model you can either use an existing DuckDB instance, or spin up an in memory instance that will complete the dbt transformations. By default memory is used and it is easy to rea

Js Tr ⠌⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76223088

You can read it with read_csv and write it to parquet with write_parquet import duckdb from io import BytesIO csv_data = BytesIO(b'col1,col2\n1,2\n3,4') duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet') Note - this does not work on

Js Tr ⡐⠀⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/72049264

How to determine cause of "RuntimeError: Resource temporarily unavailable" error in Python notebook

How to determine cause of "RuntimeError: Resource temporarily unavailable" error in Python notebook In a hosted Python notebook, I'm using the duckdb library and running this code: duckdb.connect(database=":memory:", read_only=False) This returns the fol

Js Tr ⠐⠂⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/50988026

Python - read parquet file without pandas

Python - read parquet file without pandas Currently I'm using the code below on Python 3.5, Windows to read in a parquet file. import pandas as pd parquetfilename = 'File1.parquet' parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'colum

Js Tr ⠀⠃⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76995634

It's not possible to use row_number() directly ... library(dplyr) arrow::arrow_table(iris) %>% mutate(rn = row_number()) %>% filter(Sepal.Width == 3.8) %>% collect() # Warning: Expression row_number() not supported in Arrow; pulling data into R #

Js Tr ⠀⠂⠀⡀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/73026431

How can I initialize `duckdb-wasm` within NextJS?

How can I initialize `duckdb-wasm` within NextJS? I'm working on a NextJS project that leverages a wasm package via npm; specifically this is duckdb-wasm. duckdb-wasm needs to initialize from a set of bundles (e.g. based on browser capability). this can b

Js Tr ⠀⠄⠂⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/73129253

Dealing with very large sas7bdat (>300GB) files with R

Dealing with very large sas7bdat (>300GB) files with R I have been searching for a solution to this problem without making any progress. I am looking for a way to deal with (manipulate, filter, etc) sas7bdat files using R without the need to load them to

Js Tr ⠀⡈⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76016847

chromadb.errors.NoIndexException: Index not found, please create an instance before querying

chromadb.errors.NoIndexException: Index not found, please create an instance before querying What does this mean? How can I load the following index? tree langchain/ langchain/ ├── chroma-collections.parquet ├── chroma-embeddings.parquet └── index ├─

Js Tr ⠀⢐⠀⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/51590671

PandaSQL very slow

PandaSQL very slow I'm currently switching from R to Python (anconda/Spyder Python 3) for data analysis purposes. In R I used to use a lot R sqldf. Since I'm good at sql queries, I didn't want to re-learn data.table syntax. Using R sqldf, I never had perf

Js Tr ⠀⠀⡄⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76249396

Efficient and Scalable Way to Handle Time Series Analysis with Large Datasets in Python

Efficient and Scalable Way to Handle Time Series Analysis with Large Datasets in Python I'm working with a very large dataset (over 100 million rows) of time-series data in Python. Each row represents a separate event with a timestamp, and there are multi

Js Tr ⠀⠀⠰⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://www.stackoverflow.com/questions/76953964

You can run complex SQL queries from data in a CSV file, if you have Java installed in your OS (that's pretty common) and by combining Ant (scripting) and H2 (in-memory database). For example, if you have the file my_file.csv as: "name", "sex", "age",

Js Tr ⠀⠀⠀⠁⠀⠀⠀⠀
Terms appear in 1 position

Filters

  • Remove Javascript
  • Reduce Adtech
  • Recent Results
  • Search In Title

Domains

  • No Filter
  • Small Web
  • Blogosphere
  • Academia

  • Vintage
  • Plain Text
  • ~tilde

  • Wiki
  • Forum
  • Docs
  • Recipes

Syntax

This is a keyword-based search engine. When entering multiple search terms, the search engine will attempt to match them against documents where the terms occur in close proximity.

Search terms can be excluded with a hyphen.

While the search engine at present does not allow full text search, quotes can be used to search specifically for names or terms in the title. Using quotes will also cause the search engine to be as literal as possible in interpreting the query.

Parentheses can be used to add terms to the query without giving weight to the terms when ranking the search results.

Samples

soup -chicken
Look for keywords that contain soup, but not chicken.
"keyboard"
Look for pages containing the exact word keyboard, not keyboards or the like.
"steve mcqueen"
Look for pages containing the exact words steve mcqueen in that order, with no words in between.
apology (plato)
Look for pages containing apology and plato, but only rank them based on their relevance to apology.

Special Keywords

Several special keywords are supported by the search engine.

Keyword                    Meaning
site:example.com           Display site information about example.com
site:example.com keyword   Search example.com for keyword
browse:example.com         Show similar websites to example.com
ip:127.0.0.1               Search documents hosted at 127.0.0.1
links:example.com          Search documents linking to example.com
tld:edu keyword            Search documents with the top level domain edu.
?tld:edu keyword           Prefer but do not require results with the top level domain edu.
                           This syntax is also possible for links:..., ip:... and site:...
q>5                        The amount of javascript and modern features is at least 5 (on a scale of 0 to 25)
q<5                        The amount of javascript and modern features is at most 5 (on a scale of 0 to 25)
year>2005                  (beta) The document was ostensibly published in or after 2005
year=2005                  (beta) The document was ostensibly published in 2005
year<2005                  (beta) The document was ostensibly published in or before 2005
rank>50                    The ranking of the website is at least 50 in a span of 1 to 255
rank<50                    The ranking of the website is at most 50 in a span of 1 to 255
count>10                   The search term must appear in at least 10 results from the domain
count<10                   The search term must appear in at most 10 results from the domain
format:html5               Filter documents using the HTML5 standard. This is typically modern websites.
format:xhtml               Filter documents using the XHTML standard
format:html123             Filter documents using the HTML standards 1, 2, and 3. This is typically very old websites.
generator:wordpress        Filter documents with the specified generator, in this case wordpress
file:zip                   Filter documents containing a link to a zip file (most file endings work)
file:audio                 Filter documents containing a link to an audio file
file:video                 Filter documents containing a link to a video file
file:archive               Filter documents containing a link to a compressed archive
file:document              Filter documents containing a link to a document
-special:media             Filter out documents with audio or video tags
-special:scripts           Filter out documents with javascript
-special:affiliate         Filter out documents with likely Amazon affiliate links
-special:tracking          Filter out documents with analytics or tracking code
-special:cookies           Filter out documents with cookies

Results Legend

The estimated relevance of each search result is indicated by the color saturation of the result, as well as by the order in which the results are presented.

Information about the position of the match is indicated using a dot matrix in the bottom bar of each search result. Each dot represents four sentences, and the dots are read top to bottom, left to right.

⣿⠃⠀⠀   — The terms occur heavily toward the beginning of the document.

⠠⠀⡄⠁   — The terms occur sparsely throughout the document.

⠀⠁⠀⠀   — The terms occur only in a single sentence.

Potential problems with the document are indicated with a warning triangle, e.g. ⚠ 3. Desktop users can mouse over this to get a detailed breakdown.

Policies

This website complies with the GDPR by not collecting any personal information, and with the EU Cookie Directive by not using cookies. More Information.

Contact

Reach me at kontakt@marginalia.nu, or @MarginaliaNu on Twitter.

Open Source

The search engine is open source with an AGPL license. The sources can be perused at https://git.marginalia.nu/.

Data Sources

IP geolocation is sourced from the IP2Location LITE data available from https://lite.ip2location.com/ under CC-BY-SA 4.0.