Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it, as far as I know. Hence, no tree-of-objects database engine provides the level of detail in their "EXPLAIN" output that SQLite provides.

I believe Microsoft SQL Server uses an object tree internally, and yet its query plan output is a table:

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-s...



It can also execute a query incrementally. In fact, certain features rely on this behavior. “Watch live data” (in SSMS) for extended events is actually an infinite table-valued function. It’s not exactly rocket science to do in a database, you just need to model data sources and operators as iterators.


Same with PostgreSQL. You can choose between text output (one table row per line), XML or JSON. But none of them are ordered according to a strict execution order like bytecode would be. To me that makes perfect sense though because if represents what can be parallelized in the plan. The bytecode representation in SQLite seems to have the inherent limitation that it is single threaded.


Dolt's EXPLAIN output prints the execution tree directly. E.g.:

    explain select * from xy join uv on (x = u and u  > 0) where u < 2;
    Project
     ├─ columns: [xy.x:2!null, xy.y:3, uv.u:0!null, uv.v:1]
     └─ LookupJoin
         ├─ IndexedTableAccess(uv)
         │   ├─ index: [uv.u]
         │   ├─ static: [{(0, 2)}]
         │   ├─ colSet: (3,4)
         │   ├─ tableId: 2
         │   └─ Table
         │       ├─ name: uv
         │       └─ columns: [u v]
         └─ IndexedTableAccess(xy)
             ├─ index: [xy.x]
             ├─ keys: [uv.u:0!null]
             ├─ colSet: (1,2)
             ├─ tableId: 1
             └─ Table
                 ├─ name: xy
                 └─ columns: [x y]


Actually, SAP HANA has a nice tool called PlanViz for that in Eclipse. A google image search gives you a good impression how that works. There are also some blogs about it, e.g. https://community.sap.com/t5/enterprise-resource-planning-bl...


I love how the example SQL query uses abbreviated columns that sound like runes:

    select
    --   count(*)
    a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
    --     *
    from       BSEG    as a
    innerjoin  ACDOCA  as b
    on    b.rclnt  = a.mandt
    and   b.rbukrs = a.bukrs
    and   b.gjahr  = a.gjahr
    and   b.belnr  = a.belnr
    where a.mandt  = '715'
    and   b.rldnr  = '0L'
    and   b.docln  = '000001'
    and   a.buzei  = '001'
    and   a.gjahr  = '2018'
    --and   a.gjahr = '2017'
    --and   a.gjahr = '2019'
    --order by a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
    limit 200;
Just realised this is probably from German too, "jahr" being year.


yeah, these accounting table field names have not changed since around 30 years, which is not necessarily a bad thing...

for a human readable view of this you can look e.g. here https://docsfortec.com/sap/S4/tables/ACDOCA


They're a bit of a bilingual mess.

Did they start out with German table names and decide to give new tables English names at some point?


Out of the guts, the only maybe english abbrev is docln = document line (?). Referring to the example, not the link.


Typically you get XML for showplan because it can represent the tree structure better.


You get XML for estimated execution plans if you use SET SHOWPLAN_XML ON. You get the actual execution plan XML along with your results if you use SET STATISTICS XML. You can see cached execution plans in the sys.dm_exec_query_plan dynamic management view.

The estimated/actual execution plan feature in SQL Server Management Studio was changed to use XML in SQL Server 2005. The SQL Server team are only adding new information to the XML representation, not the text representation.

The name "estimated" execution plan is a bit misleading. If you executed the statement(s) with the current indexes and statistics, that's how it would be executed. I think the "estimated" part of the name is because the number of rows returned from each node, and the execution time of each node, is an estimate.

Assuming that the indexes and statistics don't change, you should expect the "actual" execution plan to have the same shape. It will just be annotated with the number of rows that were actually retrieved as well as the estimated numbers. SQL Server doesn't re-plan if it turns out the number of rows was greater (or fewer) than expected.

As a SQLite and SQL Server user, I find that I would like something more detailed from SQLite than EXPLAIN QUERY PLAN (which just tells you the join order), but less detailed than the bytecode. Particularly I want to know why it chose that join order or to use that index.


I havent used SQL server much since the 2008 days but it used to happen quite often that the estimated and actual execution plan differed a lot. I think the estimated part also includes how many rows get returned from a subquery and that informs which joining algorithm is used.

I second your longing for something more detailed than explain query plan in Sqlite though.

In fact I complained about it here and some sqlite developer pointed me to these resources:

https://news.ycombinator.com/item?id=30405275

I've never actually tried yet. Queries are still too fast to do all that:)


I don’t doubt the author, but what is it that makes rendering a tree of objects to a table a difficult problem? Is that not what browsers do when they render a table element?


Quote from the article:

"A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. Any any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it"

To further elaborate on this important point.

There is an 'impedance mismatch' (conceptual difficulty mapping between the two logic models) between the tree abstract data type and the table abstract data type. Specifically, there are four key differences between the simple table data structure and the more complex tree data structure that makes mapping between them a non-trivial operation.

Hierarchical: A table has a flat representation; a tree has a hierarchical representation.

Order: The order of the rows in a table typically do not matter (they may have a unique rowid). The order of branches (nodes) and leaves in a tree is important, and the ordering itself in an encoding of valuable information.

Semi-structured: A table has a fixed structure (rows multiplied by columns). A tree has a flexible structure - an arbitrary combination of branches (internal nodes) and leaves (terminal nodes). Semi-structured data has a structure that may not necessarily be known in advance, the tree has irregular and variable formation; the tree may have branches with missing or supplementary nodes.

Meta-data: The information describing the meaning of the data in a table is typically stored separately from the table - consequently a schema is often mandatory. A schema is optional for a tree abstract data type.

As an aside, I have been visiting hacker news almost daily since 2010. This is my first comment on hacker news. I want to say thank you to the community for the many valuable insights over this years.


Congratulations for your first comment!

What I want to do is represent my tree-structure as a table in a relational database and then be able to efficiently get the tree structure back by transforming the table-representation back into a tree.

Further I would like to do that in plain standard SQL. This must be a common problem, any documented solutions out there?


Thank you.

Great question - you have touched on the key difference between a labeling scheme and an encoding scheme for tree data structures.

As mentioned previously, the tree is an abstract data type, that is to say, a conceptual model that defines the nodes in a tree, and their relationships.

To be able to evaluate a expression that processes a tree, one needs a labeling scheme. The purpose of a labeling scheme is to assign unique labels to each node in the tree and these labels must facilitate node ordering, and often (but not always) a labeling scheme will permit the reconstruction of the tree structure.

However, no labeling scheme captures the node type, names or the content stored at the nodes. For that we need an encoding scheme. An encoding scheme is constructed upon a labeling scheme and augments it with the information necessary to fully represent the tree in a table-like data structure. In answer to your question, it also permits the full transformation from the table representation to the original tree structure.

Thus, it sounds like what you are looking for is an encoding scheme.

There are many different labeling schemes out there for tree structure data, and virtually all of them can be augmented with additional information to construct a complete encoding scheme. Concerning documented solutions - I have not been active in this space for a number of years, so off the bat - I don't have a recommended documented solution to point you too.

But to help, I will put a link to my PhD thesis [1] which gives a more in-depth understanding of labeling schemes and encoding schemes for tree structured data with an example of a simple implementation of an encoding scheme enabling the full transformation from the table representation to the original tree structure (pages 5-9) and a survey of the advantages and disadvantages of existing labeling schemes concerning their usefulness to be part of an encoding scheme you could use in your solution (see chapter 2)

Caveat 1: My thesis was written in the context of updating dynamic (XML) trees but it addresses the transformation between tree and table data structures.

Caveat 2: The thesis was written 11 years ago, but every now and then I have kept in touch with the latest developments in the area, and to my knowledge, there have been no major developments since.

I hope it helps.

[1]: https://doras.dcu.ie/19316/


Thanks for the link


Hey, this is excellent. Thanks for the thorough response.


At least they're hard to display on a terminal. Reading EXPLAIN ANALYZE diagrams from psql does not spark joy.


It does in something like DataGrip. Honestly, I can't imagine using Postgres without it these days.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: