08 February 2012

Winner Selected for Hierarchical Queries Challenge (8471)

In October 2011, we posted a competition regarding hierarchical queries, with this introduction:

Here at the PL/SQL Challenge, we recently encountered the need to accumulate counts of questions up through our topics tree (explained further below). With the help of Kim Berg Hansen, we came up with a solution. In the process, however, Kim realized that (a) the first pass may not be the most optimal solution and (b) he also had a more complex scenario, with which he could use some help. So we decided to offer up this challenge to our players. Remember: you can come back to this problem as often as you'd like. You do not need to submit an answer right away!

Several dozen players submitted solutions, many of them obviously taking lots of time and effort on their parts. Kim went through all the proposed solutions and evaluated them. His analysis is available in the answer for this competition, as well as the "All Submitted Answers" archive.

He chose Milan Vontorcik as offering the best solution for the "complex" scenario and he has been awarded a prize of an O'Reilly Media ebook.

Thanks to all players who submitted a solution to this puzzle.

Steven Feuerstein

8 comments:

  1. Congrats to Milan Vontorcik! And thanks to Kim for the very interesting challenge :)

    ReplyDelete
  2. I'd also like to offer my personal thanks to all participants in this challenge. I'm quite certain I can nick several of your ideas for my work ;-)

    Apologies for taking some time evaluating the answers, I hope you haven't forgotten that you participated ;-)

    The archive (tar.gz) Steven links to in the post contains all the evaluated answers along with test runs, spool outputs, trace files and tkprof output along with a few text files with my evaluations and conclusions.

    Even though I selected what IMHO was the best answer to the challenge we set, there can quite easily be situations where one of the other solutions might be preferable. If you are interested, take a look at the various answers - can be quite educational to understand the different methods ;-)

    Congrats to Milan Vontorcik (and thanks to O'Reilly for prize sponsorship ;-)

    ReplyDelete
  3. Hello Kim, All,

    I would be glad if you could elaborate a little bit more about why the PUSH_PRED hint is needed at all in Milan's solution.
    The solution is based on a set of WITH queries that are built each one upon the preceding one,
    so Oracle should start from the first one, where the bind variable is already used at the innermost level.

    By the way, just two days ago I heared from Tom Kyte once again that WITH queries are implemented internally by using a temporary table, that is, by "materializing" the result set, which in our case seems pretty obvious due to the several successive WITH queries.

    As far as I am aware, the PUSH_PRED hint comes into play when we have a predicate like
    "domain_id = 10" located at an OUTER (or even OUTERMOST) level in a join query and want
    "to push that predicate down" to an inner or the innermost level.

    So, in our context, this would have meant "to carry" the domain_id column up to the outermost
    (last) query, like into a view or materialized view defining query, even one based on
    separate views for each specific domain_id, like in this solution,
    and have the bind variable predicate applied only when querying that outermost view.

    Second, I think that the central point here was to use "entire dataset trees",
    instead of summing each node by a separate scalar select.
    I expect this approach to be faster than the "node-by-node" aproach, though it will probably use
    more resources (nothing comes for free in life, you know ...).

    One of my solutions (the one based on the MODEL clause) also arrived at the point where an "inverted" tree was used for summing up and now I see that the winner solution also uses this idea very elegantly.

    Another consideration should also be whether you want "to calculate" ALL the trees
    (say in a periodic fashion, like a scheduled materialzed view refresh) OR, instead, you want to achieve the best performance whenever executing the "live query" for ONE single domain_id value chosen arbitrarily "on the fly".
    Think "Maratone" versus 100m runs !

    Anyway, the problem by itself was very interesting and instructive .

    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete
    Replies
    1. @Iudith

      I have never had the need to use PUSH_PRED hint myself, so I am not sure :-)

      I know that a WITH query (that I use a lot) is not necessarily materialized as a temporary table - only if it makes sense to the optimizer, for example in cases where the WITH alias is used more than once in the subsequent part of the selects. If the WITH query is just used once, the optimizer can often choose to rewrite as a simple inline view.

      As I see it the main "trick" is to be certain that only one of the three parts of the UNION ALL is actually executed depending on the content of the bind variable. Whether this would have worked even without PUSH_PRED I have not tested - maybe it would? Maybe Milan reads this and can comment further :-)

      But Milan did address the point of getting best performance for each of the domains, and I am thankful for seeing that there can be magnitudes differences in which solution works best for each "type" of domain. The method in v_domain_10 and v_domain_11 would work non-optimal for domain 12, and vice versa. That was great for me to know that it is worth while to NOT try and do a generic solution.

      I can make the final solution more easy to read and control, though, by doing a solution for each domain (there will by only 4-5 domains) like Milans seperate views. It just might be possible that one players solution will fit one of my domains, another players solution for a second domain, and so on :-)

      The purpose of the PUSH_PRED seems to me mainly to conform to the requirement set up in the quiz (that it should work with a bind variable or doing all domains with possibility of using domain as predicate,) but what I learned was that the requirement was a bad idea actually...

      Therefore I did not think harder on the PUSH_PRED other than as a neat trick to know :-)

      Delete
  4. Hello Kim,

    I also thought and even experienced some cases in which a WITH select was not materialized,
    except if we "helped" this materialization to happen by adding a /*+ MATERIALIZE */ hint or a ROWNUM trick.
    However, Tom Kyte just explained to one of the attendees that this is what Oracle does internally, at least in most cases.

    What I found about the PUSH_PRED hint is that if we have a non-mergeable view that is joined to another
    table and that other table has a good enough single table predicate, then the PUSH_PRED hint can help to "push" the join condition (with one of the sides replaced with values coming from the table) inside the view as a filter condition, or in other words, to perform a NESTED LOOPS join having
    the table as the outer and the view as the inner table and this operation even appears in the execution plan as a "VIEW PUSHED PREDICATE" operation.

    It just happens that a few days ago I had a very similar case at my work, where instead of the 3 views
    ( like in this quiz ) we had 4 WITH queries, each retrieving similar data from 4 different remote
    databases, and each query had a "constant filter" ( similar to the ":domain_id = 10" condition in Milan's solution ), and then a UNION ALL followed whose purpose was to effectively retrieve data from only one of the 4 sites.
    What happened was that the full query did however access ALL the 4 sites, though we expect from 3 of the sites to return immediately without any data, because the constant filter condition is "always false".
    It would be nice if static SQL would have some CASE-like construct for "chosing" one out of several
    tables in a FROM clause and being able to execute the query by accessing only that single table, based on the value of a bind variable, without the need to use dynamic SQL for this.

    Back to our problem, I just wondered what would happen with such a solution if the distribution of the data suddenly changes ( maybe in a real life environment, like your "big problem" case, rather than in this reduced one ), that is, the different syntax versions used to make each view
    to be as optimal as possible may happen to be reversed ?

    I also sensed this issue while working on the solutions, but did not go that far as to chose a "hard-coded values" solution for the specific cases, rather looked after a generic solution that could cover satisfactorily enough ALL the particular cases, and maybe, above all, to cover all of them
    in a single select, with domain_id available as a filtering column to be used or not on demand.

    We should probably start a deep study of Joe Celko's books on hierarchical queries, I only wish I could find more time to do it :) :) :)
    It's an issue that can keep you busy and preoccupied for a long time !


    Thanks again a lot for a really exciting experience !

    Iudith

    ReplyDelete
  5. Hello all.
    my solution uses (maybe overuses) a subquery factoring clause (a set of WITH queries). I love this feature of SQL because query is more readable and easier to maintain.
    Iudith is right - in my solution PUSH_PRED hint is not necessary. In the past I used more complex query in UNION ALL construct in subquery factoring clause and it didn't work as I supposed so when I tried to solve Kim's challenge I used PUSH_PRED hint and I didn't tested my query without PUSH_PRED hint. Now I tested my query without PUSH_PRED hint and execution plan is exactly same as execution plan for query with PUSH_PRED hint. So it's not needed.
    Milan

    ReplyDelete
  6. Hello Milan, All,

    Thanks a lot for these additions.

    Yes, as far as I know, in 10g and higher Oracle always tries "its best" to push predicates
    inside UNION ALL views.
    This does not happen always for UNION views, but I have also seen cases where a UNION is performed by a UNION ALL (eventually pushing a predicated), followed by a sorting for uniqueness.
    This highly depends on which other predicates are used in each of the UNION branches.

    WITH factoring is a very cool feature, first of all because it allows for shorter and more elegant code, and then, in 11gR2 for the really cool recursive queries.

    For making it really performant, though, Oracle has to somehow materialize the result of the WITH query, because otherwise this would mean to perform the same query more than one time.

    I would just add, as kind of a trick, that if you use a program which interprets SQL, like for example SQL*PLUS, then you can "fabricate" the table name to use "on the fly", based
    on a parameter, so you can even avoid accessing each of the views, even if all of them except one will not return effectively any rows.

    Just for who is curious, to our surprise, exactly today we were successful in using exactly the same trick in a Crystal Reports select statement, which also is interpreted
    (bad for bind variables and sharing, but useful in this particular case).

    Otherwise, of course, you always have PL/SQL at hand where you can "play-around" and choose exactly the table (or view) that is best for the different parameter values.

    Best Regards,
    Iudith

    ReplyDelete
  7. Congratulations to all the winners

    ReplyDelete