Awesome video of the feature: https://youtu.be/aFDUlyeMBc8
Disclaimer: I’m a co-founder at MotherDuck.
Feature request: enable tuning of when Instant SQL is run and displayed. The errors and flashing updates at nearly every keystroke while I'm expanding a query are distracting for me personally (my brain goes into troubleshooting mode instead of thinking mode). I like the feature (so I will keep it on by default), but I'd like a few modes for it depending on my working context (specifically, tuning the update frequency on separator characters [space, comma], end of statement [semicolon/newline], and injections [paste/autocomplete]).
Extension Autoloading Error: An error occurred while trying to automatically install the required extension 'ui': Failed to download extension "ui" at URL "http://extensions.duckdb.org/0069af20ab/osx_arm64/ui.duckdb_..." (HTTP 403) Extension "ui" is an existing extension.
Is it looking to download the preview version of the extension, but getting blocked/unauthorized (hence the 403 forbidden response)? Or is there something about the auto-loading behavior that I'm supposed to disable maybe?
Good to know a totally offline tool is being considered.
Thanks for the great tool BTW.
Example:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL > customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
I added a similar "get results as you type" feature to the SQLite integration in the Logfile Navigator (lnav) [2]. When entering PRQL [1] queries, the preview will show the results for the current and previous stages of the pipeline. When you move the cursor around, the previews update accordingly. I was waiting years for something like PRQL to implement this, since doing it with regular SQL requires more knowledge of the syntax and I didn't want to go down that path.
[1] - https://prql-lang.org
[2] - https://lnav.org/2024/03/29/prql-support.html
https://community-extensions.duckdb.org/extensions/prql.html
This is one of the reasons I'm excited about DuckDB's upcoming PEG parser [2]. If they can pull it off, we could have alternative dialects that run on DuckDB.
[1] https://www.malloydata.dev/
[2] https://duckdb.org/2024/11/22/runtime-extensible-parsers.htm...
That, I think, is most developers' real sticking point with SQL. It's not an object-relational impedance mismatch between their application and the data store; it's an imperative-declarative impedance mismatch between the paradigm they prefer (or have demonstrated talent in) and the one SQL demands. They are used to thinking about problems in exactly one way, so when they struggle to adapt to a different way of thinking about the problems, they assume the familiar way is the more correct one.
I think this is why the same developers insist that XML/HTML is "just a markup language." Feeding a document into an executable to produce output isn't really significantly different from feeding an imperative language into a compiler. The only real difference is that one is Turing complete, but Turing completeness is not a requirement of programming languages.
https://cloud.google.com/blog/products/data-analytics/simpli...
It's pretty neat:
FROM mydataset.Produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(*) AS num_sales
GROUP BY item;
Edit: formatting

Why is it a killer feature? First of all, LLMs complete text from left to right. That alone is a killer feature.
But for us meatboxes with less compute power, pipe syntax allows (much better) code completion.
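DuckDB's Python relational API has the same source-first, left-to-right shape, which is a decent way to see why completion gets easier; a small sketch (the produce table here is made up for illustration):

import duckdb

duckdb.sql("CREATE TABLE produce (item TEXT, sales INT)")
duckdb.sql("INSERT INTO produce VALUES ('apples', 10), ('kale', 0)")

# Source first, then transformations: at each step the editor (or an
# LLM completing left to right) already knows which columns are in scope.
rel = (
    duckdb.table("produce")
    .filter("sales > 0")
    .aggregate("item, SUM(sales) AS total_sales", "item")
)
print(rel)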
Pipe syntax is delightful to work with and makes going back to SQL a real bummer moment (please insert meme of Katy Perry kissing the earth here).
-- Select all orders for users registered in the last year, and compute average earnings per user
SELECT ...
import duckdb

# Each step is a named relation; DuckDB resolves the Python variable names in FROM.
input_data = duckdb.sql("SELECT * FROM read_parquet('...')")
step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...")
step_2 = duckdb.sql("SELECT ... FROM step_1")
final = duckdb.sql("SELECT ... FROM step_2;")
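A concrete, runnable version of the same pattern, with made-up table and column names standing in for the elided bits:

import duckdb

# Toy data standing in for the parquet file above.
duckdb.sql("CREATE TABLE events AS SELECT range AS id, range % 3 AS user_id, range * 1.5 AS amount FROM range(100)")

input_data = duckdb.sql("SELECT * FROM events")
step_1 = duckdb.sql("SELECT user_id, amount FROM input_data WHERE amount > 10")
step_2 = duckdb.sql("SELECT user_id, SUM(amount) AS total FROM step_1 GROUP BY user_id")
final = duckdb.sql("SELECT * FROM step_2 ORDER BY total DESC")
print(final)  # every intermediate step stayed inspectable along the way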
I think this would be better if it was combined with information about valid words at the cursor position, which would likely be a bit more involved but achievable by querying the schema and settling on a subset of SQL. It would help people who aren't already fluent in SQL extract the data they want. Perhaps allow them to click the suggestions to add them to the query.
I've done partial implementations of this too, that query the schema for table or column names. It's very cheap even on large, complex schemas, so it's fine to just throw every change at the database and check what drops out. In practice I didn't get much out of either beyond the fun of hacking up an ephemeral tool, or I would probably have built some small product around it.
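For what it's worth, that kind of schema lookup is only a few lines against DuckDB's catalog functions; a rough sketch of the idea (the prefix-matching here is my own illustration, not lnav's implementation):

import duckdb

def completion_candidates(prefix: str) -> list[str]:
    # Table and column names from the catalog that start with the
    # text under the cursor; cheap even on large schemas.
    rows = duckdb.execute("""
        SELECT DISTINCT name FROM (
            SELECT table_name AS name FROM duckdb_tables()
            UNION ALL
            SELECT column_name FROM duckdb_columns()
        ) WHERE name ILIKE ? || '%'
        ORDER BY name
    """, [prefix]).fetchall()
    return [r[0] for r in rows]

duckdb.sql("CREATE TABLE orders (order_id INT, order_date DATE)")
print(completion_candidates("ord"))  # ['order_date', 'order_id', 'orders']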
I spent the first two quarters of 2024 working on observability for a build-the-plane-as-you-fly-it style project. I can’t express how useful the CTE preview would have been for debugging.
will do the same!
We are working on how to make it easy to switch from Instant SQL -> run query -> Instant SQL
SELECT json_serialize_sql('SELECT 2');
[
{
"json_serialize_sql('SELECT 2')": {
"error": false,
"statements": [
{
"node": {
"type": "SELECT_NODE",
"modifiers": [],
"cte_map": {
"map": []
},
"select_list": [
{
"class": "CONSTANT",
"type": "VALUE_CONSTANT",
"alias": "",
"query_location": 7,
"value": {
"type": {
"id": "INTEGER",
"type_info": null
},
"is_null": false,
"value": 2
}
}
],
"from_table": {
"type": "EMPTY",
"alias": "",
"sample": null,
"query_location": 18446744073709551615
},
"where_clause": null,
"group_expressions": [],
"group_sets": [],
"aggregate_handling": "STANDARD_HANDLING",
"having": null,
"sample": null,
"qualify": null
},
"named_param_map": []
}
]
}
}
]
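The serialized tree is also usable from Python, and DuckDB's JSON extension can turn it back into SQL text; a minimal sketch, assuming json_deserialize_sql is available:

import duckdb, json

# Inspect the parse tree programmatically.
raw = duckdb.execute("SELECT json_serialize_sql('SELECT 2')").fetchone()[0]
tree = json.loads(raw)
print(tree["statements"][0]["node"]["type"])  # SELECT_NODE

# Round-trip the tree back into SQL text.
sql = duckdb.execute(
    "SELECT json_deserialize_sql(json_serialize_sql('SELECT 2'))"
).fetchone()[0]
print(sql)  # SELECT 2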
Are the MotherDuck editor features available on-prem? My understanding is that MotherDuck is a data warehouse SaaS.
-Customer software engineer at MotherDuck
It doesn't currently - the UI assets are loaded at runtime - but we do have an offline mode planned. See https://github.com/duckdb/duckdb-ui/issues/62.
Kinda like regex101, draw.io or excalidraw.
I'm assuming it's more of a user preference, like putting commas in front of the field instead of after it?
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless, though: the clauses don't depend on each other for meaning, so it's fine to allow them to be reordered without changing what the query means. But that's true of lots and lots of things in programming, and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
def
for x in whatever:
    print(x)
print_whatever(whatever):
There's nothing ambiguous about it, but why? If you are used to seeing it one way, it just makes it more confusing to read; and if you aren't used to seeing it the normal way, you should at least somewhat master something before you try to improve it through cosmetic tweaks.

I think you see this all the time: people try to impose their own comfort onto things for no actual improvement.
LINQ popularized it and others followed. It does what it says.
Btw: saying that "people try to impose their own comfort" is uncalled for.
If you're right about why they did it, it's even dumber than my reason: they changed a language's grammar to enable a much worse solution to the same problem.
Sure you can do something that is close enough, but the LINQ authors were looking for precision in the autocompletion and for the LINQ query to have the same ordering as expression syntax.
The goals of this syntax are very precise and people seem to like it. Once again: calling it dumb is uncalled for.
It doesn't require a time machine, just a basic understanding of statistics or probability.
On the other hand, statistical autocomplete is not as good as a precise autocomplete that doesn't require jumping around lines.
My point here is that different people enjoy different things. There is no need to shit on other people’s accomplishments or preferences.
If your tables have very heterogeneous column names, a single column will identify almost any table on average. There will be some duplicates, but the median is one or two columns, and you can generally complete those after just a few characters.
If your database has very homogeneous column names, you don't need to identify a single table for autocomplete to be precise, unless there is no correlation between which column names co-occur within tables. But if there is no correlation, you're back to needing only a very low number of columns to identify the table.
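That's easy to sanity-check against a live catalog; a quick sketch using DuckDB's duckdb_columns() function (the toy schema stands in for a real one):

import duckdb

duckdb.sql("CREATE TABLE customers (customer_id INT, email TEXT)")
duckdb.sql("CREATE TABLE orders (order_id INT, customer_id INT, amount DOUBLE)")

# For each column name, how many tables contain it? If the typical
# count is 1, a single column name usually pins down the table.
print(duckdb.sql("""
    SELECT column_name, COUNT(DISTINCT table_name) AS tables_with_column
    FROM duckdb_columns()
    GROUP BY column_name
    ORDER BY tables_with_column DESC, column_name
"""))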
> Getting the AST is a big step forward, but we still need a way to take your cursor position in the editor and map it to a path through this AST. Otherwise, we can’t know which part of the query you're interested in previewing. So we built some simple tools that pair DuckDB’s parser with its tokenizer to enrich the parse tree, which we then use to pinpoint the start and end of all nodes, clauses, and select statements. This cursor-to-AST mapping enables us to show you a preview of exactly the SELECT statement you're working on, no matter where it appears in a complex query.
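You can prototype a crude version of that mapping from the serialized tree alone, since many nodes carry a query_location offset (this sketch is my own illustration, not MotherDuck's enriched-AST tooling, which also computes node end positions):

import duckdb, json

def nodes_with_locations(node, out):
    # Walk the serialized parse tree, collecting every node that has a
    # query_location (a character offset into the SQL text).
    if isinstance(node, dict):
        loc = node.get("query_location")
        if isinstance(loc, int) and loc < 2**63:  # max uint64 means "no location"
            out.append((loc, node))
        for child in node.values():
            nodes_with_locations(child, out)
    elif isinstance(node, list):
        for child in node:
            nodes_with_locations(child, out)
    return out

query = "SELECT 1 + 2, upper('abc')"
tree = json.loads(duckdb.execute("SELECT json_serialize_sql(?)", [query]).fetchone()[0])
cursor = 18  # hypothetical cursor offset, inside upper('abc')
candidates = [c for c in nodes_with_locations(tree, []) if c[0] <= cursor]
loc, node = max(candidates, key=lambda c: c[0])
print(loc, node.get("class"), node.get("type"))  # nearest node starting at or before the cursor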
But that's exactly what they show in the blog post??
Except for system performance data.
You can check out our sandbox at
(129304 rows affected)
First, repeated data-analyst queries are a usage driver in SQL databases. Think of iterating on the code and executing it again.
Another huge factor in the same vein is running dev pipelines on limited data to validate that a change works when modelling complex data.
This is currently a front-end feature, but underneath lies effective caching.
The underlying tech is driving down usage cost which is a big thing for data practitioners.