PL/SQL Challenge: Winner Selected for Hierarchical Queries Challenge (8471)

08 February 2012

Winner Selected for Hierarchical Queries Challenge (8471)

In October 2011, we posted a competition regarding hierarchical queries, with this introduction:

Here at the PL/SQL Challenge, we recently encountered the need to accumulate counts of questions up through our topics tree (explained further below). With the help of Kim Berg Hansen, we came up with a solution. In the process, however, Kim realized that (a) the first pass may not be the most optimal solution and (b) he also had a more complex scenario, with which he could use some help. So we decided to offer up this challenge to our players. Remember: you can come back to this problem as often as you'd like. You do not need to submit an answer right away!

Several dozen players submitted solutions, many of them obviously taking lots of time and effort on their parts. Kim went through all the proposed solutions and evaluated them. His analysis is available in the answer for this competition, as well as the "All Submitted Answers" archive.

He chose Milan Vontorcik as offering the best solution for the "complex" scenario and he has been awarded a prize of an O'Reilly Media ebook.

Thanks to all players who submitted a solution to this puzzle.

Steven Feuerstein

8 comments:

Jeff Kemp09 February, 2012 05:15
Congrats to Milan Vontorcik! And thanks to Kim for the very interesting challenge :)
ReplyDelete
Replies
Kim Berg Hansen09 February, 2012 07:25
I'd also like to offer my personal thanks to all participants in this challenge. I'm quite certain I can nick several of your ideas for my work ;-)

Apologies for taking some time evaluating the answers, I hope you haven't forgotten that you participated ;-)

The archive (tar.gz) Steven links to in the post contains all the evaluated answers along with test runs, spool outputs, trace files and tkprof output along with a few text files with my evaluations and conclusions.

Even though I selected what IMHO was the best answer to the challenge we set, there can quite easily be situations where one of the other solutions might be preferable. If you are interested, take a look at the various answers - can be quite educational to understand the different methods ;-)

Congrats to Milan Vontorcik (and thanks to O'Reilly for prize sponsorship ;-)
ReplyDelete
Replies
iudith09 February, 2012 09:00
Hello Kim, All,

I would be glad if you could elaborate a little bit more about why the PUSH_PRED hint is needed at all in Milan's solution.
The solution is based on a set of WITH queries that are built each one upon the preceding one,
so Oracle should start from the first one, where the bind variable is already used at the innermost level.

By the way, just two days ago I heared from Tom Kyte once again that WITH queries are implemented internally by using a temporary table, that is, by "materializing" the result set, which in our case seems pretty obvious due to the several successive WITH queries.

As far as I am aware, the PUSH_PRED hint comes into play when we have a predicate like
"domain_id = 10" located at an OUTER (or even OUTERMOST) level in a join query and want
"to push that predicate down" to an inner or the innermost level.

So, in our context, this would have meant "to carry" the domain_id column up to the outermost
(last) query, like into a view or materialized view defining query, even one based on
separate views for each specific domain_id, like in this solution,
and have the bind variable predicate applied only when querying that outermost view.

Second, I think that the central point here was to use "entire dataset trees",
instead of summing each node by a separate scalar select.
I expect this approach to be faster than the "node-by-node" aproach, though it will probably use
more resources (nothing comes for free in life, you know ...).

One of my solutions (the one based on the MODEL clause) also arrived at the point where an "inverted" tree was used for summing up and now I see that the winner solution also uses this idea very elegantly.

Another consideration should also be whether you want "to calculate" ALL the trees
(say in a periodic fashion, like a scheduled materialzed view refresh) OR, instead, you want to achieve the best performance whenever executing the "live query" for ONE single domain_id value chosen arbitrarily "on the fly".
Think "Maratone" versus 100m runs !

Anyway, the problem by itself was very interesting and instructive .

Thanks a lot & Best Regards,
Iudith
ReplyDelete
Replies
iudith09 February, 2012 17:48
Hello Kim,

I also thought and even experienced some cases in which a WITH select was not materialized,
except if we "helped" this materialization to happen by adding a /*+ MATERIALIZE */ hint or a ROWNUM trick.
However, Tom Kyte just explained to one of the attendees that this is what Oracle does internally, at least in most cases.

What I found about the PUSH_PRED hint is that if we have a non-mergeable view that is joined to another
table and that other table has a good enough single table predicate, then the PUSH_PRED hint can help to "push" the join condition (with one of the sides replaced with values coming from the table) inside the view as a filter condition, or in other words, to perform a NESTED LOOPS join having
the table as the outer and the view as the inner table and this operation even appears in the execution plan as a "VIEW PUSHED PREDICATE" operation.

It just happens that a few days ago I had a very similar case at my work, where instead of the 3 views
( like in this quiz ) we had 4 WITH queries, each retrieving similar data from 4 different remote
databases, and each query had a "constant filter" ( similar to the ":domain_id = 10" condition in Milan's solution ), and then a UNION ALL followed whose purpose was to effectively retrieve data from only one of the 4 sites.
What happened was that the full query did however access ALL the 4 sites, though we expect from 3 of the sites to return immediately without any data, because the constant filter condition is "always false".
It would be nice if static SQL would have some CASE-like construct for "chosing" one out of several
tables in a FROM clause and being able to execute the query by accessing only that single table, based on the value of a bind variable, without the need to use dynamic SQL for this.

Back to our problem, I just wondered what would happen with such a solution if the distribution of the data suddenly changes ( maybe in a real life environment, like your "big problem" case, rather than in this reduced one ), that is, the different syntax versions used to make each view
to be as optimal as possible may happen to be reversed ?

I also sensed this issue while working on the solutions, but did not go that far as to chose a "hard-coded values" solution for the specific cases, rather looked after a generic solution that could cover satisfactorily enough ALL the particular cases, and maybe, above all, to cover all of them
in a single select, with domain_id available as a filtering column to be used or not on demand.

We should probably start a deep study of Joe Celko's books on hierarchical queries, I only wish I could find more time to do it :) :) :)
It's an issue that can keep you busy and preoccupied for a long time !

Thanks again a lot for a really exciting experience !

Iudith
ReplyDelete
Replies
Milan Vontorcik12 February, 2012 17:35
Hello all.
my solution uses (maybe overuses) a subquery factoring clause (a set of WITH queries). I love this feature of SQL because query is more readable and easier to maintain.
Iudith is right - in my solution PUSH_PRED hint is not necessary. In the past I used more complex query in UNION ALL construct in subquery factoring clause and it didn't work as I supposed so when I tried to solve Kim's challenge I used PUSH_PRED hint and I didn't tested my query without PUSH_PRED hint. Now I tested my query without PUSH_PRED hint and execution plan is exactly same as execution plan for query with PUSH_PRED hint. So it's not needed.
Milan
ReplyDelete
Replies
iudith13 February, 2012 18:18
Hello Milan, All,

Thanks a lot for these additions.

Yes, as far as I know, in 10g and higher Oracle always tries "its best" to push predicates
inside UNION ALL views.
This does not happen always for UNION views, but I have also seen cases where a UNION is performed by a UNION ALL (eventually pushing a predicated), followed by a sorting for uniqueness.
This highly depends on which other predicates are used in each of the UNION branches.

WITH factoring is a very cool feature, first of all because it allows for shorter and more elegant code, and then, in 11gR2 for the really cool recursive queries.

For making it really performant, though, Oracle has to somehow materialize the result of the WITH query, because otherwise this would mean to perform the same query more than one time.

I would just add, as kind of a trick, that if you use a program which interprets SQL, like for example SQL*PLUS, then you can "fabricate" the table name to use "on the fly", based
on a parameter, so you can even avoid accessing each of the views, even if all of them except one will not return effectively any rows.

Just for who is curious, to our surprise, exactly today we were successful in using exactly the same trick in a Crystal Reports select statement, which also is interpreted
(bad for bind variables and sharing, but useful in this particular case).

Otherwise, of course, you always have PL/SQL at hand where you can "play-around" and choose exactly the table (or view) that is best for the different parameter values.

Best Regards,
Iudith
ReplyDelete
Replies
James06 March, 2012 17:41
Congratulations to all the winners
ReplyDelete
Replies

Add comment