Waiting for SQL:202y: Group by All (eisentraut.org)
59 points by ingve 36 days ago | 39 comments


Let me reference fields as I create them:

  select xxxxx as a
       , a * 2 as b
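
For comparison, a minimal sketch of the usual workaround in engines that do not let a select list refer to its own aliases: name the expression in a subquery and reuse it one level up. It runs against an in-memory SQLite database through Python's sqlite3 module; the table `t`, the column `x`, and the expression `x + 10` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,)])

# The inner query gives the expression the name `a`;
# the outer query can then build `b` on top of it.
rows = conn.execute(
    """
    SELECT a, a * 2 AS b
    FROM (SELECT x + 10 AS a FROM t)
    ORDER BY a
    """
).fetchall()
print(rows)
```

The extra nesting level is exactly the boilerplate the request above would remove.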


This will be great! One of the things ClickHouse has had since 2016.


SQL needs to have `select` as the _last_ part, not the first. LINQ has had this for 2 decades by now: "from table_a as a, table_b as b where ... select a.blah, b.duh".


This is not relevant to GP's point. This is a separate topic, which... I don't really care, but I know a lot of people want to be able to write SQL as you suggest, and it's not hard to implement, so, sure.

Though, I think it might have to be table sources, then `SELECT`, then `WHERE`, then ... because you might want to refer to output columns in the `WHERE` clause.


WHERE clauses are evaluated before the SELECT list is processed; that’s why HAVING exists.

The logical order, in full, is:

FROM

WHERE/JOIN (you can join using WHERE clauses and do FROM a,b still)

GROUP BY

HAVING

SELECT

ORDER BY
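
To make the WHERE/HAVING split concrete, here is a small sketch using Python's sqlite3 module and an in-memory database. The `sales` table and its values are invented for the example; the point is only that WHERE sees individual rows before grouping, while HAVING sees whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (grp INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (grp, amount) VALUES (?, ?)",
    [(1, 5), (1, 3), (2, 1), (2, 6)],
)

# WHERE drops the row (2, 1) first, then the remaining rows are grouped.
where_rows = conn.execute(
    "SELECT grp, SUM(amount) FROM sales "
    "WHERE amount > 2 GROUP BY grp ORDER BY grp"
).fetchall()

# HAVING groups all rows first, then drops groups whose total is too small.
having_rows = conn.execute(
    "SELECT grp, SUM(amount) FROM sales "
    "GROUP BY grp HAVING SUM(amount) > 7 ORDER BY grp"
).fetchall()

print(where_rows)
print(having_rows)
```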


That's the order in which the processing happens, but this doesn't need to be reflected in the language. The language has this ordering so that it reads like natural language, which is what SQL was designed to do.


See u/cyberax's comment below. It would be nice to be able to create scalar (as opposed to table-valued) bindings that can be referred to in a WHERE (or JOIN) clause. Currently it's SELECT that establishes such bindings, and... well, it's not terribly clear where they can be used (certainly in HAVING, but first you have to GROUP BY, no?). u/cyberax's idea is to have a LET for this that can come before WHERE and before SELECT.


I mean, I get it, but the big problem is, again, the different phases of execution. The projections you perform with a select can be absolutely arbitrary and do crazy ass things (like do more subqueries that return scalar values, and query planners are notoriously bad at pushing these down), which is why I was trying to say SELECT before WHERE (project before filtering) may be linguistically intuitive, but full of foot guns.

Something like a ‘let’ binding after the FROM/JOIN list would make sense, though - from the query planner's perspective it’s nothing more than a token substitution and everything would compile the same.


Ideally, it needs to be "from", then an arbitrary number of something like `let` statements that can introduce new variables, maybe interspersed with where-s, and then finally "select".

"select" can also be replaced with annotations, something like: `from table_1 t1 let t1.column_1 as @output_1 where ...` and then just collect all the @-annotated variables.

I need to write a lot of SQL, and it's so clumsy. Every time I need a CTE, I have to look into the documentation for the exact syntax.


> Ideally, it needs to be "from", then arbitrary number of something like `let` statements

Isn't that what a CTE is?


Not quite. u/cyberax wants scalar bindings, not table-valued bindings.

Something like

  FROM foo
  LET a = (x + y) * z
  SELECT a;

whereas CTEs are... Common Table Expressions.
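
For contrast, a rough sketch of how that scalar binding has to be spelled today: the `LET` syntax above is hypothetical, so the expression goes into a one-off CTE as a derived column, and only then can both WHERE and SELECT refer to it. The example uses Python's sqlite3 module with an in-memory database; the sample rows in `foo` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO foo (x, y, z) VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 2, 1), (0, 1, 10)],
)

# The CTE `bound` exists only to give (x + y) * z the name `a`,
# so that the outer query can filter on it and return it.
rows = conn.execute(
    """
    WITH bound AS (SELECT (x + y) * z AS a FROM foo)
    SELECT a FROM bound WHERE a > 5 ORDER BY a
    """
).fetchall()
print(rows)
```

A table-valued wrapper is needed just to name one scalar, which is the gap a `LET` would fill.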


That was kind of my first thought...


Use https://prql-lang.org/ and compile to SQL.


Thank you! This is indeed close to what I want from SQL!


The Pipe Query Syntax in GoogleSQL implements this elegantly as well:

https://docs.cloud.google.com/bigquery/docs/reference/standa...


Also in the Kusto Query Language (KQL) as used by Azure Log Analytics.


Also just let me reference the damn alias in a group by, FUCK


At least in PostgreSQL, both by alias and ordinal are possible:

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:15 PST 2025]
  > create table test (a_key integer primary key, a_group integer, a_val numeric);
  CREATE TABLE
  Time: 3.102 ms

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:25 PST 2025]
  > insert into test (a_key, a_group, a_val) values (1, 1, 5.5), (2, 1, 2.6), (3, 2, 1.1), (4, 2, 6.5);
  INSERT 0 4
  Time: 2.302 ms

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:02:58 PST 2025]
  > select a_group AS my_group, sum(a_val) from test group by my_group;
   my_group | sum
  ----------+-----
          2 | 7.6
          1 | 8.1
  (2 rows)
  
  Time: 4.124 ms

  localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Sun Nov 16 12:03:15 PST 2025]
  > select a_group AS my_group, sum(a_val) from test group by 1;
   my_group | sum
  ----------+-----
          2 | 7.6
          1 | 8.1
  (2 rows)
  
  Time: 0.360 ms


Some do. It would also be nice to reference by ordinal number similar to order by. Very handy for quick and dirty queries. I can see the issue though that people start to lean on it too much.


The problem with this and similar requests is that it would change the identifier scoping in incompatible ways and therefore potentially break a lot of existing SQL code.


I think it should be not only in GROUP BY, but in every context, e.g., inside expressions in SELECT, WHERE, etc.


PostgreSQL and DuckDB support this, which makes MSSQL feel like a dinosaur in context.


So, why not a SORT BY ALL or a GROUPSORT BY ALL, too? Not always what you want (e.g., when you're ranking on a summarized column), but often alphabetical order on the GROUP BY columns is just what the doctor ordered! :-)


The working group also discussed ORDER BY ALL, but for some reason most participants really did not like it.


Not directly related, but I recently saw this project, a data language by Google, which is quite cool: https://www.malloydata.dev/


Snowflake has that; once you start using it, it's painful to go back.


What about reusing a CTE? Let me import a CTE definition so that it can be used throughout my app, not just in the current context.


I believe that’s what we call a "view".
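
A minimal sketch of that idea, assuming an in-memory SQLite database driven from Python's sqlite3 module: the query that would otherwise be repeated as a CTE is stored once as a view and reused by separate statements. The `orders` table and the view name `customer_totals` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 7)],
)

# Define the shared query once, as a view stored in the database.
conn.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(total) AS total
    FROM orders
    GROUP BY customer
    """
)

# Two independent statements reuse the same definition.
ann_total = conn.execute(
    "SELECT total FROM customer_totals WHERE customer = 'ann'"
).fetchall()
ranking = conn.execute(
    "SELECT customer FROM customer_totals ORDER BY total DESC"
).fetchall()
print(ann_total, ranking)
```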


BigQuery has that and I've been loving using it since they introduced it



this seems to ignore the fact that you can group by a column that isn't in the select statement.

it's not something that i've found a particular use for, but it IS a thing you can do.
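
A small sketch of that case, run through Python's sqlite3 module on an in-memory database. The table mirrors the `test` table from the PostgreSQL session earlier in the thread, with made-up integer values: the query groups by `a_group` but returns only the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (a_key INTEGER PRIMARY KEY, a_group INTEGER, a_val INTEGER)"
)
conn.executemany(
    "INSERT INTO test (a_key, a_group, a_val) VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 2), (3, 2, 1), (4, 2, 9)],
)

# a_group drives the grouping (and the ordering) but is not in the select list.
sums = conn.execute(
    "SELECT SUM(a_val) FROM test GROUP BY a_group ORDER BY a_group"
).fetchall()
print(sums)
```

A GROUP BY ALL that derives its keys from the select list would have nothing to group by here, so a query like this would still need the explicit form.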


What's wrong with GROUP BY 1,2,3?


SELECT * EXCEPT(col_name) next please.
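
Until something like that is available everywhere, one client-side approximation is to read the column list from the catalog and build the select list yourself. The sketch below does this with Python's sqlite3 module and SQLite's `PRAGMA table_info`; the `users` table, its columns, and the helper name `select_all_except` are assumptions for illustration, not part of any proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann', 'secret')")


def select_all_except(conn, table, excluded):
    """Return (columns, rows) for every column of `table` not in `excluded`.

    `table` must be a trusted identifier: it is interpolated into the SQL.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    cols = [row[1] for row in info if row[1] not in excluded]
    col_list = ", ".join(f'"{c}"' for c in cols)
    rows = conn.execute(f'SELECT {col_list} FROM "{table}"').fetchall()
    return cols, rows


cols, rows = select_all_except(conn, "users", {"password"})
print(cols, rows)
```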


This was also discussed at the last SQL WG meeting but was postponed for further refinement. But it’s likely to be added soon.



Yes it needs to be in the standard though.


That might be nice for manual experimentation, but for application use, this seems brittle compared to specifying the columns you really want to have and process.


would be nice


What? No! I want GROUP BY * and more importantly GROUP BY mytable.*



