
Search The Internet

Showing search results from francoismichonneau.net.
https://francoismichonneau.net/2024/12/advent-of-sql/

Advent of SQL with DuckDB and R | François Michonneau, PhD

is a popular advent calendar of programming puzzles. I have attempted to do it in the past using R but I always gave up after a few days because it was taking too much of my time, and prefer programming puzzles that work with data. Last year, I had fun go

Js ⡅⢼⠈⢯⣰⣷⣟⣿
Terms appear in 41 positions

https://francoismichonneau.net/2023/06/duckdb-r-remote-data/

How to work with remote Parquet files with the duckdb R package? | François Michonneau, PhD

For large datasets, it is sometimes convenient to explore them without downloading them locally. With Arrow, you can work with these remote files if they are stored in AWS S3 or Google Cloud Storage. It is however not yet possible for files stored over H

Js ⡰⠜⡄⠀⠀⠀⠀⠀
Terms appear in 8 positions

https://francoismichonneau.net/hire-me/

Hire me | François Michonneau, PhD

I’m looking for my next role, if you are (or know someone who is) recruiting, you will find below my resume, a short “About me” and a longer version of my skills and accomplishments. At the end, you can read some testimonials from previous colleague

Js ⠀⠄⠄⠀⠀⠀⠀⠀
Terms appear in 2 positions

https://francoismichonneau.net/2022/10/import-big-csv/

How to use Arrow to work with large CSV files? | François Michonneau, PhD

A short practical guide to load a 15 GB dataset with Apache Arrow using R and Python.

Js ⠀⠀⠀⠀⠀⠘⠀⠀
Terms appear in 2 positions

https://francoismichonneau.net/blog/

Posts by Year | François Michonneau, PhD

A snippet to add to Emacs configuration file to use the new R code formatter Air An annotated list of solutions to the Advent of SQL challenges Learn how to work with Parquet files over HTTPS using duckdb and dplyr. A short practical guide to load a 15 GB d

Js ⠀⠁⠀⠀⠀⠀⠀⠀
Terms appear in 1 position

https://francoismichonneau.net/Michonneau_resume.pdf

François Michonneau, PhD

francois.michonneau@gmail.com linkedin.com/in/francois-michonneau https github.com/fmichonneau https francoismichonneau.net WORK EXPERIENCE Voltron Data Remote Senior Training Engineer, Customer Success 04/2023 07/2024 Liaised with customers to collect fe

⠀⠈⠀⠂⠀⠀⠀⠀
Terms appear in 2 positions

Filters

  • Remove Javascript
  • Reduce Adtech
  • Recent Results
  • Search In Title

Domains

  • No Filter
  • Small Web
  • Blogosphere
  • Academia

  • Vintage
  • Plain Text
  • ~tilde

  • Wiki
  • Forum
  • Docs
  • Recipes

Syntax

This is a keyword-based search engine. When entering multiple search terms, the search engine will attempt to match them against documents where the terms occur in close proximity.

Search terms can be excluded with a hyphen.

While the search engine at present does not allow full text search, quotes can be used to specifically search for names or terms in the title. Using quotes will also cause the search engine to be as literal as possible in interpreting the query.

Parentheses can be used to add terms to the query without giving weight to the terms when ranking the search results.

Samples

soup -chicken
Look for keywords that contain soup, but not chicken.
"keyboard"
Look for pages containing the exact word keyboard, not keyboards or the like.
"steve mcqueen"
Look for pages containing the exact words steve mcqueen in that order, with no words in between.
apology (plato)
Look for pages containing apology and plato, but only rank them based on their relevance to apology.
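
The samples above can be read as four kinds of token: plain terms, hyphen-excluded terms, quoted phrases, and parenthesized terms that match without affecting ranking. The following is a minimal sketch of how such a query string could be split into those groups. It is only an illustration of the documented syntax, not the search engine's actual parser; `ParsedQuery` and `parse_query` are hypothetical names introduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical container for the four token kinds described above."""
    include: list = field(default_factory=list)   # plain terms
    exclude: list = field(default_factory=list)   # terms prefixed with a hyphen
    phrases: list = field(default_factory=list)   # quoted, literal phrases
    advisory: list = field(default_factory=list)  # parenthesized, unweighted terms


# One alternative per token kind: "quoted phrase", (parenthesized group), bare word.
TOKEN = re.compile(r'"([^"]*)"|\(([^)]*)\)|(\S+)')


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for phrase, group, word in TOKEN.findall(query):
        if phrase:
            parsed.phrases.append(phrase)
        elif group:
            parsed.advisory.extend(group.split())
        elif word.startswith("-") and len(word) > 1:
            parsed.exclude.append(word[1:])
        elif word:
            parsed.include.append(word)
    return parsed


if __name__ == "__main__":
    for sample in ['soup -chicken', '"steve mcqueen"', 'apology (plato)']:
        print(sample, "->", parse_query(sample))
```

Keeping the four groups separate mirrors the description above: only the plain terms and phrases would contribute to ranking, while the advisory terms would act purely as a match condition.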

Special Keywords

Several special keywords are supported by the search engine.

Keyword                   Meaning
site:example.com          Display site information about example.com
site:example.com keyword  Search example.com for keyword
browse:example.com        Show similar websites to example.com
ip:127.0.0.1              Search documents hosted at 127.0.0.1
links:example.com         Search documents linking to example.com
tld:edu keyword           Search documents with the top level domain edu.
?tld:edu keyword          Prefer but do not require results with the top level domain edu. This syntax is also possible for links:..., ip:... and site:...
q>5                       The amount of javascript and modern features is at least 5 (on a scale 0 to 25)
q<5                       The amount of javascript and modern features is at most 5 (on a scale 0 to 25)
year>2005                 (beta) The document was ostensibly published in or after 2005
year=2005                 (beta) The document was ostensibly published in 2005
year<2005                 (beta) The document was ostensibly published in or before 2005
rank>50                   The ranking of the website is at least 50 in a span of 1 - 255
rank<50                   The ranking of the website is at most 50 in a span of 1 - 255
count>10                  The search term must appear in at least 10 results from the domain
count<10                  The search term must appear in at most 10 results from the domain
format:html5              Filter documents using the HTML5 standard. These are typically modern websites.
format:xhtml              Filter documents using the XHTML standard
format:html123            Filter documents using the HTML standards 1, 2, and 3. These are typically very old websites.
generator:wordpress       Filter documents with the specified generator, in this case wordpress
file:zip                  Filter documents containing a link to a zip file (most file-endings work)
file:audio                Filter documents containing a link to an audio file
file:video                Filter documents containing a link to a video file
file:archive              Filter documents containing a link to a compressed archive
file:document             Filter documents containing a link to a document
-special:media            Filter out documents with audio or video tags
-special:scripts          Filter out documents with javascript
-special:affiliate        Filter out documents with likely Amazon affiliate links
-special:tracking         Filter out documents with analytics or tracking code
-special:cookies          Filter out documents with cookies
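
The special keywords listed above fall into two shapes: numeric comparisons such as year>2005, and prefixed filters such as site:example.com, which can carry a leading ? (prefer) or - (exclude). As a rough sketch, a single query token could be classified as shown below. The function name `classify` and the tuple layout are assumptions made for illustration; the search engine's own handling may differ, and the sketch accepts some combinations the list does not document (for example q=5).

```python
import re

# Numeric comparisons: q, year, rank and count followed by <, > or = and a number.
COMPARISON = re.compile(r'^(q|year|rank|count)([<>=])(\d+)$')

# Prefixed filters, optionally marked with "?" (prefer) or "-" (exclude).
PREFIXED = re.compile(
    r'^(\??)(-?)(site|browse|ip|links|tld|format|generator|file|special):(\S+)$'
)


def classify(token: str) -> tuple:
    """Classify one whitespace-separated query token (hypothetical helper)."""
    m = COMPARISON.match(token)
    if m:
        return ("compare", m.group(1), m.group(2), int(m.group(3)))
    m = PREFIXED.match(token)
    if m:
        optional, negated, key, value = m.groups()
        mode = "prefer" if optional else "exclude" if negated else "require"
        return ("filter", mode, key, value)
    return ("term", token)


if __name__ == "__main__":
    for token in "?tld:edu year>2005 -special:scripts duckdb".split():
        print(token, "->", classify(token))
```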

Results Legend

The estimated relevance of a search result is indicated by the color saturation of the result, as well as by the order in which the results are presented.

Information about the position of the match is indicated using a dot matrix in the bottom bar of each search result. Each dot represents four sentences, and the dots are ordered top-to-bottom, left-to-right.

⣿⠃⠀⠀   — The terms occur heavily toward the beginning of the document.

⠠⠀⡄⠁   — The terms occur sparsely throughout the document.

⠀⠁⠀⠀   — The terms occur only in a single sentence.

Potential problems with the document are presented with a warning triangle, e.g. ⚠ 3. Desktop users can mouse-over this to get a detailed breakdown.
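
The dot matrix described in this legend is drawn with Unicode Braille pattern characters (U+2800 to U+28FF), where each character holds up to eight dots. The sketch below lists which dots are set, under the assumption that "top-to-bottom, left-to-right" means reading down the left column of each character, then down its right column, then moving on to the next character. That reading order and the function name `dot_positions` are assumptions, not something the page confirms; only the count of set dots is independent of the ordering.

```python
BRAILLE_BASE = 0x2800

# Unicode stores dots 1, 2, 3, 7 (left column) in bits 0, 1, 2, 6 and
# dots 4, 5, 6, 8 (right column) in bits 3, 4, 5, 7. Listing the bits in this
# order walks each cell down the left column first, then down the right column.
COLUMN_MAJOR_BITS = (0, 1, 2, 6, 3, 4, 5, 7)


def dot_positions(matrix: str) -> list:
    """Return the indices of set dots, eight slots per Braille character."""
    positions = []
    cell_index = 0
    for ch in matrix:
        offset = ord(ch) - BRAILLE_BASE
        if not 0 <= offset <= 0xFF:
            continue  # skip anything that is not a Braille pattern character
        for slot, bit in enumerate(COLUMN_MAJOR_BITS):
            if offset & (1 << bit):
                positions.append(cell_index * 8 + slot)
        cell_index += 1
    return positions


if __name__ == "__main__":
    # A single dot in the second character of a four-character matrix.
    print(dot_positions("\u2800\u2801\u2800\u2800"))
```

Since each dot stands for four sentences, multiplying an index by four gives an approximate sentence offset under the same assumed ordering. The number of set dots also appears to line up with the "Terms appear in N positions" line under each result.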

Policies

This website complies with the GDPR by not collecting any personal information, and with the EU Cookie Directive by not using cookies.

Contact

Reach me at kontakt@marginalia.nu, or @MarginaliaNu on Twitter.

Open Source

The search engine is open source with an AGPL license. The sources can be perused at https://git.marginalia.nu/.

Data Sources

IP geolocation is sourced from the IP2Location LITE data available from https://lite.ip2location.com/ under CC-BY-SA 4.0.