05 July 2011

UNION ALL and sorting - first objection for a SQL quiz! (3466)

The SQL quiz on GROUPING SETS (week of 25 June) raised the following objection from a player:

"I don't think the second choice (the "do-it-yourself implementation") is correct, because it relies on the fact, that the UNION ALL operator preserves the order of the rows returned by subqueries. This seems to be true in recent versions of Oracle database, but I cannot find it documented anywhere, so it should not be relied on. Or am I missing something here?"

Since I am fortunate enough to have authors and reviewers with much more depth in SQL than I working on the SQL quiz, I will post this objection and leave it to them (and the rest of you) to respond. Then we will conclude a next step for this quiz.

5 comments:

  1. It is true that the correctness of answer 6891 relies on UNION ALL preserving the order of the rows. It is also true that this is not documented.

    However it is a well known practice that using UNION ALL avoids doing an "outer" sorting (see for example this tip: http://www.oraclebrains.com/2007/09/sql-tipsuse-union-all-instead-of-union/ ) I cannot conceive of examples in present Oracle releases where the optimizer might decide to do otherwise for a UNION ALL.

    On the other hand it is a slight possibility that Oracle may change the implementation since they do not document that UNION ALL will preserve ordering (perhaps doing each subquery in parallel and merging results.) If that should ever happen, answer 6891 would become false.

    As of present Oracle releases and to the best of my knowledge and common sense about the optimizer - I will claim that the answer is correct right now. But I concede it relies on undocumented behaviour and as such could be suspect.

    For now I will add a warning to the quiz and show an alternative that should be safe "forever".

    And I ask for discussion: Do any of you have experience or cases where a UNION ALL does not preserve ordering? Other comments, please?

    ReplyDelete
  2. Hello All,
    I also thought a little bit over the same issue,
    and what Kim says is probably right, though undocumented ...

    To tell the truth, I found the last choice even more intriguing, because it presented a feature
    which also went undocumented as far as I know, namely, the fact that using aliases with same names
    as existing columns and specifying those aliases in an ORDER BY will in fact perform ordering by the alias,
    and not by the column ...

    If the old SQL-ists probably remember, using
    aliases in an ORDER BY was NOT an option at all
    in older versions ... it was even stronly documented that a UNION query can ONLY be ordered by
    specifying position numbers of columns from the select list, and never column names ...
    Adding an ORDER BY with column names was automatically considered as belonging to the LAST query in the UNION,
    and NOT to the overall UNION query ( I even remember a bug related to this, back somewhere in Oracle V6... ).

    What I really missed in this quiz was a choice
    with an ordering based on GROUPING or GROUPING_ID, which is in fact the recommended deterministic way
    of ordering such queries.

    Also, it looks like when specifying several groupings in a GROUPING SETS clause, but without specifying
    an ordering at all, then Oracle presents the results of each grouping set
    one after the other, without "mixing them up",
    therefore (probably), most of the examples I saw about GROUPING SETS do not contain an ORDER BY clause,
    the documentation just mentions in a phrase that the correct ordering would probably be based on using
    one or more GROUPING functions.

    In summary, a good quiz in my opinion.

    As a preparing quiz author for this competition,
    after I saw this verbose quiz categorized as Intermediate, I just started to wonder how an Advanced quiz
    will look along these lines ...
    It's true, long does not always means very complex, but however it takes up your time
    "to compile" it ... so I had a strong feeling that maybe some different scoring calculation should
    be applied to this competition,
    because, by its very nature, a good and complex SQL quiz will probably require some more
    "business/application level introduction" than we are used to from the PL/SQL quizzes.

    I'd like very much to also hear Steven's opinion
    on this issue, a one coming both from a specialist as from somebody who is
    auto-considering himself "not an SQL specialist"...

    Thanks a lot to all and hope for an interesting competition, the start was promising anyway :) :)

    Best Regards,
    Iudith Mentzel

    ReplyDelete
  3. Thanks for the comments, Iudith


    - About the ordering by column aliases:

    In the diagram for the ORDER BY clause in the documentation there are 3 possibilities: expr, position and c_alias.

    http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_10002.htm#i2168299

    Interestingly further down in the docs where ORDER BY clause is described in words, only expr and position is discussed. Looks like the c_alias was put into the flowchart only :-)

    Good point about a choice ordering by GROUPING - that would have been ideal if I had remembered that.


    - About the difficulty level:

    Intermediate was chosen for the topic (actually it was discussed among reviewers whether it should be beginner knowledge :-) I feel that knowing GROUPING SETS may not necessarily be beginner knowledge, but an intermediate player should know it. Advanced I would use for SQL MODEL clause or advanced analytic functions, for example.

    The verbocity of that quiz mostly is due to 6 choices with a good deal of code. But that is the same for all players, so it at most means that everybody scores on average a little lower for this quiz than another intermediate quiz with less text. Ranking would not be influenced. And the challenge scoring algorithm has always weighed correctness much higher than time anyways.

    So how much text is in a quiz should not (IMHO) influence the difficulty level. (Personally I prefer a slightly more verbose quiz if that means less ambiguities, but that is just my preference - I prefer answering correct rather than quickly :-)


    Hope you all keep trying the "SQL Challenge" - we will all endeavour to make the quizzes interesting :-)

    ReplyDelete
  4. I think many people here would have heard about Tom Kyte's opinion on this:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:75397449124988

    In short, he thinks we should never rely on sorting without 'ORDER BY'. And I agree. It is undocumented. It may be not correct. It is not a best practice.
    And if it's not best practice, we should not promote its usage, should we?

    ReplyDelete
  5. I agree, Scott.

    Problem is that I *do* have ORDER BY, and no documentation I have found states much about how ordering of inner selects may or may not be preserved.

    Yes, I admit it is not the very best answer I have concocted - we have added a warning and alternative answer to the quiz and I will try not make this same mistake in the future. I just do not believe it to be "wrong enough" that it warrants a re-score, that's all.

    But I may be overruled by general consensus - I do not claim infallibility :-)

    ReplyDelete