20 January 2011

Cursor FOR loop, automatic optimization and use of BULK COLLECT (1924)

In the 19 January 2011 quiz, you were asked to choose the block that looped through employee data most efficiently. It was:
BEGIN
   FOR emp_rec IN (SELECT last_name FROM plch_employees)
   LOOP
      DBMS_OUTPUT.put_line (emp_rec.last_name);
   END LOOP;
END;
/
The primary lesson of the quiz is that in Oracle Database 10g and higher, with optimization set to at least level 2 (the default), the compiler optimizes cursor FOR loops so that they execute at BULK COLLECT-like levels of performance.
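
To make this concrete: the effect is roughly as if you had written an explicit BULK COLLECT with a LIMIT of 100 yourself. Here is a hand-written sketch of that equivalent (the compiler performs the transformation internally; this is only an illustration, not the code it actually generates):
DECLARE
   CURSOR emps_cur IS SELECT last_name FROM plch_employees;

   TYPE emps_nt IS TABLE OF emps_cur%ROWTYPE;

   l_emps   emps_nt;
BEGIN
   OPEN emps_cur;

   LOOP
      /* Fetch up to 100 rows at a time, as the optimized cursor FOR loop does. */
      FETCH emps_cur BULK COLLECT INTO l_emps LIMIT 100;

      FOR indx IN 1 .. l_emps.COUNT
      LOOP
         DBMS_OUTPUT.put_line (l_emps (indx).last_name);
      END LOOP;

      EXIT WHEN l_emps.COUNT < 100;
   END LOOP;

   CLOSE emps_cur;
END;
/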

Several players asked about documentation of this optimization feature. Others suggested that a better solution, not offered, is an explicit BULK COLLECT. I have invited them to post their comments here.

Regarding documentation, it looks like the official Oracle documentation does not spell out specific optimizations like this. The feeling seems to be that such optimizations can change over time (to paraphrase: "In version 10 we optimize cursor FOR loops; in version 12 we no longer do that, or we do it differently") and so providing a list of "promises" would not be helpful.

Regardless, Ask Tom discusses such optimizations here. I also cover it in my "Best of Oracle PL/SQL" training available here, as well as in Oracle PL/SQL Programming, the book.

So, dear players, please add your comments...

20 comments:

  1. I placed a dbms_monitor.session_trace_enable and dbms_monitor.session_trace_disable "wrapper" around the cursor FOR loop, and around the BULK COLLECT and display loop.

    From the tkprof of the cursor for loop:
    call     count       cpu    elapsed       disk      query    current       rows
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    Parse        1      0.00       0.00          0          0          0          0
    Execute      1      0.00       0.00          0          0          0          0
    Fetch      101      0.01       0.00          0        146          0      10000
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    total      103      0.01       0.00          0        146          0      10000
    (Note 101 fetch calls)

    Tkprof of the bulk_collect:
    call     count       cpu    elapsed       disk      query    current       rows
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    Parse        1      0.00       0.00          0          0          0          0
    Execute      1      0.00       0.00          0          0          0          0
    Fetch        1      0.01       0.01          0         47          0      10000
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    total        3      0.01       0.01          0         47          0      10000
    (Note only 1 fetch call, and fewer LIOs - logical I/Os)
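
    For anyone who wants to reproduce this, the wrapper was essentially of the following shape (a sketch only, not the exact test script); run tkprof on the resulting trace file to get figures like those above:

    BEGIN
       DBMS_MONITOR.session_trace_enable (waits => TRUE, binds => FALSE);

       FOR emp_rec IN (SELECT last_name FROM plch_employees)
       LOOP
          DBMS_OUTPUT.put_line (emp_rec.last_name);
       END LOOP;

       DBMS_MONITOR.session_trace_disable;
    END;
    /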

  2. Personally, since all the code seems to do is display the last name, I feel this code is the best way to do this:

    SELECT last_name FROM plch_employees ;

    Why even use PL/SQL in this case?

  3. None of the answers included BULK COLLECT, so I answered "none of the above".

    Oracle's recent optimization feature to convert a cursor FOR loop to a BULK COLLECT must be slightly slower than using the BULK COLLECT explicitly in the code (I haven't tried to see whether it is measurable). I learned the BULK COLLECT feature from you, Steve, years ago and have been an avid and delighted user of the feature ever since.

    Why force the compiler to perform its optimization step - an extra step - when it can be explicitly coded that way?

    Two words: Unfair quiz!

  4. This is such a simple block of code that, indeed, PL/SQL is not even needed to perform the operation. Assuming it was part of a large body of code, however, it remains important to be able to recognize when simplification (and full leveraging) of SQL can be done inside PL/SQL.

    Unfair? I don't really see that. Sure, you could also use BULK COLLECT and that would have been another good choice here, but I don't see why you would answer "None of the above" just because it wasn't there.

    As for "why force the compiler to perform its optimization step"....a cursor FOR loop is simpler to write and easier to maintain than a conversion to BULK COLLECT. And in many situations, it will be sufficiently fast.

    You can probably tease somewhat better performance out of an explicit BULK COLLECT, but the difference will in most cases (I believe) be unimportant/not noticed.

    I am happy to leave as many steps as possible for the compiler to take care of, automagically.

  5. I got it wrong also, but am still happy with this quiz:
    1. I actually chose "None" for not a bad reason (I thought of BULK COLLECT)
    2. Reading the solution, I was reminded of the automatic optimization. I did know about it, but forgot. So I actually learned something, and that is what it is all about. And writing BULK COLLECT and then looping again through a filled PL/SQL table is much more work than the automatically optimized FOR loop. The minimal benefits of a probably more efficient explicit BULK COLLECT do not outweigh the much simpler and more readable code.
    3. And in the end pretty happy it was "just" a beginners question so my mistake did not count for much. It is not a good Quarter for me so far, so I can use some luck ;-)

    Toine

  6. Oh yes (a follow-up on my previous post here), this reminds me of the following:
    in my former job my boss once looked at me and asked: "Why would you want to go to a presentation by Steven for the second time?"... (he was paying for it). Well:

    I heard about this automatic optimization for the first time from Steven... and now you see: I forget (I am getting older, must be that ;-))... so it is of great value to hear Steven talk more than once. Repetition pays off.

    Toine

  7. Hello All,
    I think the point here resides AGAIN in the wording of the question itself.

    I'll try to emphasize it like this:

    "Which choice contains a block of code that
    1. utilizes the most appropriate type of loop
    AND
    2. is the fastest way to display the last names of all the employees in the table?"

    Maybe asking it directly as:
    "Which of the choices below provides the fastest way to ..."
    would have removed the interpretations.

    Then:
    1. If we restrict ourselves to using LOOPS, then we have already excluded the possibility of using a single SELECT ... BULK COLLECT INTO ... (that is, WITHOUT using a loop at all), which for 10,000 rows is probably the FASTEST method.

    2. If we restrict ourselves to LOOPS, then the marked choice is indeed correct, in the sense that the FOR loop is definitely faster than ANY type of row-by-row loop (the 3 other choices practically all do the same thing, going row by row).

    But, if we consider that the DEFAULT array size used by Oracle's optimized cursor FOR loops is 100 (only!), then YES, an explicit loop using "FETCH ... BULK COLLECT INTO ... LIMIT size_greater_than_100" can definitely be considered as probably being faster!

    How much faster?

    Always faster if you make the array LIMIT higher, but then it's a trade-off between memory usage and performance.

    What exactly is the optimal size of the array for the LIMIT clause? Tom Kyte would say "IT DEPENDS", and that is probably the right answer.

    I can only tell you that in the last few days we successfully used a FETCH ... BULK COLLECT INTO ... LIMIT 10000 in one of our projects, which, together with a FORALL bulk insert, reduced the processing time dramatically!
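
    In skeleton form, the pattern was roughly the following (with hypothetical table names, not our actual project code; it assumes the target table has the same columns as the source):

    DECLARE
       CURSOR src_cur IS SELECT * FROM source_table;     -- hypothetical source table

       TYPE src_nt IS TABLE OF src_cur%ROWTYPE;

       l_rows   src_nt;
    BEGIN
       OPEN src_cur;

       LOOP
          FETCH src_cur BULK COLLECT INTO l_rows LIMIT 10000;

          /* Bulk-bind the fetched rows into the target in a single FORALL. */
          FORALL indx IN 1 .. l_rows.COUNT
             INSERT INTO target_table VALUES l_rows (indx);   -- hypothetical target table

          EXIT WHEN l_rows.COUNT < 10000;
       END LOOP;

       CLOSE src_cur;
       COMMIT;
    END;
    /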

    But, in the framework of the quiz, I think the cursor FOR loop being marked as the right choice is correct.

    I also don't remember having seen the default array size of 100 mentioned in the documentation; I learnt about it in one of the wonderful threads on AskTom :) :) :)

    Best Regards,
    Iudith

  8. I fall into the group that chose "none of the above" because I felt the best and fastest way was to use the bulk_collect. I actually chose the cursor FOR loop initially as the best choice of the ones listed, and then changed.

    The way I feel this question could be improved is to remove the "none of the above" option.

  9. Heya

    I agree with -Dan- (not with "unfair quiz" but with "none of the above").

    "Which choice contains a block of code that utilizes the most appropriate type of loop and is the fastest way to display the last names of all the employees in the table?" for 10000 records.

    None of those provided. The most appropriate type of loop and the fastest way for 10,000 rows is using BULK COLLECT. This always works fast, no matter what the optimization level is set to. So why should I bother about the optimization level setting if I can write code which always runs fast :)
    This option wasn't provided, so IMHO the right answer was "None of the above".
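
    For 10,000 rows, the kind of block I mean is simply this (a sketch, not one of the quiz choices):

    DECLARE
       TYPE names_nt IS TABLE OF plch_employees.last_name%TYPE;

       l_names   names_nt;
    BEGIN
       /* One fetch for all the rows, regardless of the optimization level. */
       SELECT last_name BULK COLLECT INTO l_names FROM plch_employees;

       FOR indx IN 1 .. l_names.COUNT
       LOOP
          DBMS_OUTPUT.put_line (l_names (indx));
       END LOOP;
    END;
    /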

    Regards
    Ludo

  10. Darn, darn, darn it!

    I must agree. What I MEANT by my question was "the fastest way among the choices presented". I did not, however, say that. As a result, it is perfectly reasonable for a player to decide that the optimal solution is not present.

    I will therefore do the following:

    1. Change the text of the question so that I ask for the best choice among those presented.

    2. Give anyone who chose "None of the above" 100% credit.

    3. Change their selection on this choice to "correct."

    Thanks for pointing this out, all....

    Steven

  11. As Steven has said, you can probably get a bit better performance out of an explicit bulk collect, but unless performance is critical you need to think about the ease of maintenance - a thorough bulk collect ... limit ... statement within a loop (to avoid issues with memory consumption with growing numbers of records) is more complicated to write and test than a nice simple cursor for loop.

  12. I don't understand the commotion about this quiz.

    Steven just wanted to point out (make clear or whatever) that you don't have to re-write a cursor-for-loop into a bulk collect.
    Oracle, the database, the compiler or whatever does it for you.

    When you're an experienced developer you should know this. When you don't know this and you took this quiz, you know it now (also by making a mistake).

    Oh...maybe you read this wrong...I should probably say "experienced PL/SQL developer", as a .Net developer will not know this.
    I'll do a re-score for this...

  13. > "Why force the compiler to perform its optimization step - an extra step-, when it can be explicitly coded that way?"

    It's a compile-time optimization. You don't generally care if the compiler takes slightly longer while it converts the code you wrote into something faster. There's no overhead at runtime.

    > Always faster if you make the array LIMIT higher, but then it's a trade-off between memory usage and and performance.

    Is it automatically faster? I'd like to see a demo. My understanding is that the streaming approach should make better use of system resources, and in any case I would expect diminishing returns after the low hundreds. As for FORALL, that certainly changes things - the compiler can't (yet!) re-write INSERT/UPDATE/DELETE logic within your loop.

    > So why should I bother about optimizer level setting if I can write a code which always runs fast

    You don't need to! It's the default setting, one that very few sites would ever change. It's basically just the way the PL/SQL compiler works. You don't need to change anything.

    As for documentation, this is just one of a whole set of optimizations that Oracle have been adding throughout the lifetime of PL/SQL (e.g. skipping loops if they contain only unreachable code, perhaps transforming "myvar := myvar +1" into some sort of internal "myvar++", who knows) and I wouldn't expect them to document every single one.

  14. Regarding the use of dbms_output to display the data: dbms_output stores its data in a package-level nested table, so it is not a good idea to use it to output 10,000 rows of varchar2(100).

  15. Hello All,

    To this already "fat" thread I'd like to add just one more comment:

    As Steven said in the answer to this quiz (and this appears also in Bryn Llewellyn's white paper), the automatic optimization of using BULK COLLECT ... LIMIT 100 "behind the scenes" is only applied to cursor FOR loops, and NOT also to regular OPEN ... FETCH ... CLOSE loops (explicit cursors).

    The explanation supplied is that the compiler is not doing it because "it cannot be sure that the result will logically be the same" - maybe you have an additional FETCH inside the loop, and so on.
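
    In other words, a loop written with an explicit cursor, like the following sketch, still fetches one row per FETCH call, even at PLSQL_OPTIMIZE_LEVEL = 2:

    DECLARE
       CURSOR emps_cur IS SELECT last_name FROM plch_employees;

       l_name   plch_employees.last_name%TYPE;
    BEGIN
       OPEN emps_cur;

       LOOP
          FETCH emps_cur INTO l_name;    -- single-row fetch, not rewritten by the compiler
          EXIT WHEN emps_cur%NOTFOUND;
          DBMS_OUTPUT.put_line (l_name);
       END LOOP;

       CLOSE emps_cur;
    END;
    /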

    In my opinion, the PL/SQL compiler should ALWAYS attempt to do this, regardless of the PL/SQL unit's logic. That is, each time it sees a "FETCH single-row" operation in the source code (that is, a FETCH ... INTO without a BULK COLLECT specification), it should ALWAYS try to "pre-fetch" an array of rows and buffer them in PGA memory, so that they are "ready" for subsequent FETCHes, if any, wherever those are performed in the code.

    That is, something similar to how it treats a CLOSE cursor statement, where behind the scenes it only performs a "soft close", leaving the cursor cached in the session memory for possible reuse in the same session, even though from the PL/SQL unit's point of view the cursor appears to be closed.

    Maybe we can expect such an optimization in the future, and I think it SHOULD NOT affect any existing code in any way. For those who are "very severe" about such a feature, maybe it could come with an option to set the feature ON or OFF (or maybe just a PLSQL_OPTIMIZE_LEVEL setting that would do it).

    Thanks & Best Regards,
    Iudith

  16. > In my opinion, I think that the PL/SQL compiler would/should ALWAYS attempt to do this, regardless of the PL/SQL unit's logic.
    Actually not quite true:

    This is a copy of the comment that I sent to Steven through the quiz feedback:

    This is the thread on AskTom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:588234700346069527 where Tom explains the difference in fetch count between the optimized "FOR" loop and a "SLOW BY SLOW" loop using the 10046 trace. I want to show with pure PL/SQL how we can see that optimization.

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options

    SQL> set serveroutput on
    SQL>
    SQL> create or replace type number_t is table of number;
    2 /

    Type created.

    First I created a package for storing variables. The variable pkg_variable.bol will be modified both by the pipelined function and by the PL/SQL block that uses this function inside the query of its "FOR" loop.

    SQL> create or replace package pkg_variable is
    2 bol boolean;
    3 cnt pls_integer;
    4 end;
    5 /

    Package created.

    Then I create a pipelined function, which pipes 300 rows and checks whether pkg_variable.bol was modified outside the function between pipes, and if so increases the variable pkg_variable.cnt by 1. This way we can count how many fetches were done.

    SQL> create or replace function check_limit return number_t pipelined is
    2 begin
    3 for i in 1 .. 300 loop
    4 if pkg_variable.bol then
    5 pkg_variable.cnt := pkg_variable.cnt + 1;
    6 pkg_variable.bol := false;
    7 end if;
    8 pipe row (i);
    9 end loop;
    10 end;
    11 /

    Function created.

    My first case is for plsql_optimize_level = 1, where the "FOR" loop optimization is not present. For every row that is piped from the pipelined function, the loop body is executed, so pkg_variable.bol = true before each subsequent pipe call in the function. After execution we get pkg_variable.cnt = 300.

    SQL> alter session set plsql_optimize_level = 1;

    Session altered.

    SQL>
    SQL> begin
    2 pkg_variable.cnt := 0;
    3 pkg_variable.bol := true;
    4 for rec in (select /*+ ABC_1 */ * from table(check_limit)) loop
    5 pkg_variable.bol := true;
    6 end loop;
    7 dbms_output.put_line('plsql_optimize_level = 1, pkg_variable.cnt = '||pkg_variable.cnt);
    8 end;
    9 /
    plsql_optimize_level = 1, pkg_variable.cnt = 300

    PL/SQL procedure successfully completed.

  17. My second case is for plsql_optimize_level = 2, which is the default. In this case the "FOR" loop optimization happens, and there are only 4 switches (3 + 1 after the last pipe) between fetching and loop-body execution.

    SQL> alter session set plsql_optimize_level = 2;

    Session altered.

    SQL>
    SQL> begin
    2 pkg_variable.cnt := 0;
    3 pkg_variable.bol := true;
    4 for rec in (select /*+ ABC_2 */ * from table(check_limit)) loop
    5 pkg_variable.bol := true;
    6 end loop;
    7 dbms_output.put_line('plsql_optimize_level = 2, pkg_variable.cnt = '||pkg_variable.cnt);
    8 end;
    9 /
    plsql_optimize_level = 2, pkg_variable.cnt = 3

    PL/SQL procedure successfully completed.

    Finally, I check the cursor statistics in v$sql, thanks to the little trick with the hint-like comments in the queries:

    SQL> select s.sql_text, s.executions, s.fetches from v$sql s where s.sql_text like 'SELECT%ABC_1%';

    SQL_TEXT                                      EXECUTIONS    FETCHES
    -------------------------------------------- ---------- ----------
    SELECT /*+ ABC_1 */ * FROM TABLE(CHECK_LIMIT)          1        301

    SQL> select s.sql_text, s.executions, s.fetches from v$sql s where s.sql_text like 'SELECT%ABC_2%';

    SQL_TEXT                                      EXECUTIONS    FETCHES
    -------------------------------------------- ---------- ----------
    SELECT /*+ ABC_2 */ * FROM TABLE(CHECK_LIMIT)          1          4

    In any case, it is difficult to come up with reasonable code that relies on the loop body executing between pipes, and there is always a way to disable the optimization by setting plsql_optimize_level = 1.

  18. I absolutely agree with William Robertson. The question did not tell us the expected number of rows in the table or other limits (e.g. memory), so if we don't know about them, it would be correct to trust the optimization to Oracle.

  19. "Our human resources application contains a table defined as you see below. It contains 10,000 rows."

    Regards
    Ludo

  20. Hello All,
    Regarding Nikotin's nice example, I would just say that pipelined functions are a very particular case, where the whole purpose is to return (pipe) the rows ONE-BY-ONE to the calling SELECT instead of waiting for ALL the rows to be collected (or produced in any way!) and stored as an entire collection.

    I am pretty sure that Oracle's default optimization of cursor FOR loops was NOT intended for pipelined function cursors,
    but for cursors that perform heavy database retrievals.

    In fact, as we see from your example, the results for the different optimization levels
    are different for a pipelined function, even for the case where a cursor FOR loop is used, that is, for the case that Oracle considers as "safe"
    for performing the array optimization,
    as being free of side effects.

    For the case of pipelined functions, either the default optimization SHOULD NOT be applied at all (even for PLSQL_OPTIMIZE_LEVEL=2), or it should maybe be applied only when the pipelined function has a certain degree of "purity", similar to what was imposed in previous versions on a function called from SQL (e.g. WNPS for your example).

    I dare suppose that Oracle's initiative to perform this default optimization arises from the many cases of slowness (the slow-by-slow-by-slow coding as Tom Kyte calls it) and from just trying to achieve additional performance without having to massively change code.

    There exist for sure other cases for which such functionality should be restricted, but there are also many other cases where it can help.

    Thanks & Best Regards,
    Iudith
