PL/SQL Challenge: 2011-11-27

03 December 2011

The Pain of a Missing WHERE Clause Predicate

I sent the following email out to 1243 players today:

Dear PL/SQL Challenge player,

One of things I like most about the PL/SQL Challenge is that it keeps me humble. I am reminded on an almost daily basis by one or more players that there is lots for me learn about PL/SQL and the wider Oracle technology stack.

And then there are the moments of deep, deep humility - such as when I discover a nasty bug that has corrupted data in the PL/SQL Challenge repository. Painful lessons, those.

Unfortunately, I write to you about precisely just such a moment. We discovered recently that a DELETE statement was missing a critical predicate in the WHERE clause (did a chill just run down your spine, as it did me when I realized what was happening?). As a result, if a player pressed the "Start Over" button while taking the Oracle Magazine quiz, all the choices made by players for this quiz were deleted.

This bug went undiscovered for quite a long time, so it is not possible to recover the data. Instead, we are going to "wipe the slate clean", extend the close date on these quizzes, and offer you the opportunity to take them again.

Look on the bright side: the second time around, you will likely answer faster and with higher accuracy!

Our deepest apologies for wasting even a few moments of your time. I hope you will continue to play the daily, weekly and monthly quizzes, thereby deepening your expertise in key Oracle technologies.

Trying to look on the bright side of things, this same problem would have occurred if anyone competing in the quarterly championship playoff had pressed that same "Start Over" button (it is only visible when the competition has multiple quizzes in it). That would have wreaked havoc - and required me to come up with five more playoff-worthy quizzes.

Still, I feel horrible that we would have made such a mistake and not noticed it testing; and that I would cause any player to have to re-do something they've done already.

I have extended the close date for the Sept/Oct quiz to the end of the year. I hope that you all will find the time to take the quiz before the end of the year.

Again, my apologies....

Steven Feuerstein

The Complex World of Indexing (10193)

The SQL quiz for the week ending 2 December tested your knowledge of the circumstances under which Oracle would use an index to optimize a SQL statement.

Two players registered concerns about scoring the first choice as correct. This choice was:

SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')

My intention was to isolate a small enough sampling of the total data set so as to invoke the index - and to also point out (through other choices) that if you use the column inside an expression, the index will not be used.

Here are the comment; I look forward to your thoughts.

Steven Feuerstein

1. I think, that the first choice for this quiz is ambiguous. The problem is, that the query selects about 3% of the rows. With the index in place, the database can choose either table full scan or index range scan followed by table access by rowid to run the query. Based on the assumptions, it's impossible to tell, which execution plan will be more efficient and hence whether the index improves query performance. In real world situation, we could expect, that the table is "almost" ordered by hire_date, which means that the clustering factor of the index would be low and the index access path would be more efficient than full table scan. However, there was no such assumption specified for this quiz. It would be better to include last_name in the index or select just a single month or a week instead of a year.

2. I have performed some research around the topic of this quiz, just to verify a little bit more deeper the case of choice number 1, which selects about 1/31 of the rows ( 1 year out of 31 years ) and it seemed to be the most obvious among all the choices. But query optimization is a huge topic, never ending and always dynamic, sometimes very frustrating and always exciting to research. I have performed some research around the topic of this quiz, just to verify a little bit more deeper the case of choice number 1, which selects about 1/31 of the rows ( 1 year out of 31 years ).

That is, though the query returns about 3% of the rows only, this is still not a complete guaranty that the index will be used in any case. It highly depends on the clustering factor of the index, or, in other words, it depends on how the rows were loaded into the table, in which degree they were loaded ordered by the HIRE_DATE column.

Below are two test cases, one in which the data is loaded with minimum clustering of the hire_date values (values generated in a circular fashion) and the other in which the values are highly clustered.

In both cases I used a direct-path insert for creating the test data faster.Then, for each case, I created the index and checked the execution plan:

a. without any statistics collected, which means that dynamic sampling will be used
b. with statistics collected using 20% sampling
c. with complete statistics collected.
Also, I use two slightly different queries:

1. One query identical with that of choice 1 in the quiz:

SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

2. One that selects the HIRE_DATE instead of the LAST_NAME

SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

The conclusions are as follows:

1. When the rows are loaded cyclically by HIRE_DATE (not clustered, but dispersed), then the index is NOT used. Here dynamic sampling accurately estimates cardinality.

2. When the rows are loaded clustered by HIRE_DATE, then the index is always used, even for high cardinality estimated. Here dynamic sampling highly underestimates cardinality.

3. The index is always used when the query can be resolved entirely using the index only ( ex. selecting HIRE_DATE instead of LAST_NAME ).

See below the two test cases:

SET TIMING ON
SET LINESIZE 130
SET PAGESIZE 0


----------------------------------------------------------
CASE 1 -- data loaded in a circular fashion
----------------------------------------------------------
CREATE TABLE plch_employees
(
   employee_id     INTEGER
,  last_name       VARCHAR2 (100)
,  first_name      VARCHAR2 (100)
,  hire_date       DATE
,  salary          NUMBER
)
/

Table created.

Elapsed: 00:00:02.02

-- create a test case (10,000,000 rows) using smaller steps to avoid error
-- ORA-30009: Not enough memory for CONNECT BY operation

DECLARE
   K  PLS_INTEGER;
BEGIN
   FOR I IN 1 .. 1000
   LOOP
       K := (I-1)*10000;

       INSERT /*+APPEND */ INTO plch_employees
       SELECT K+ROWNUM, 
              'ABC '||(K+ROWNUM),
              'XYZ '||(K+ROWNUM),
               TO_DATE('01011980','DDMMYYYY') + MOD(K+ROWNUM, 31*365),
               10000 + MOD(K+ROWNUM, 20000)
       FROM DUAL
       CONNECT BY LEVEL <= 10000 ;

       COMMIT;
   END LOOP;
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:01:59.05


CREATE INDEX i_plch_employees
ON plch_employees (hire_date)
/

Index created.

Elapsed: 00:00:50.02

-- check execution plans without any statistics (will use dynamic sampling)

/* TEST1: selecting LAST_NAME */

EXPLAIN PLAN SET STATEMENT_ID='T1' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.07


select plan_table_output 
  from table(dbms_xplan.display('plan_table','T1','serial'))
/

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |   326K|    50M|  6883  (10)| 00:00:27 |
|*  1 |  TABLE ACCESS FULL| PLCH_EMPLOYEES |   326K|    50M|  6883  (10)| 00:00:27 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement

19 rows selected.

Elapsed: 00:00:00.00

/* TEST2: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T2' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00


select plan_table_output 
  from table(dbms_xplan.display('plan_table','T2','serial'))
/

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   326K|  2866K|   869   (2)| 00:00:04 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |   326K|  2866K|   869   (2)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement

19 rows selected.

Elapsed: 00:00:00.00

-- gather statistics with 20% sampling
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,'PLCH_EMPLOYEES',
                                   cascade     => TRUE,
                                   method_opt  => 'FOR ALL INDEXED COLUMNS',
                                   estimate_percent => 20);
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:46.00

/* TEST3: selecting LAST_NAME */

EXPLAIN PLAN SET STATEMENT_ID='T3' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

Explained.

Elapsed: 00:00:01.04

select plan_table_output 
  from table(dbms_xplan.display('plan_table','T3','serial'))
/

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |   315K|    12M|  6867  (10)| 00:00:26 |
|*  1 |  TABLE ACCESS FULL| PLCH_EMPLOYEES |   315K|    12M|  6867  (10)| 00:00:26 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.03

/* TEST4: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T4' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

Explained.

Elapsed: 00:00:00.00

select plan_table_output 
  from table(dbms_xplan.display('plan_table','T4','serial'))
/

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   315K|  2463K|   907   (2)| 00:00:04 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |   315K|  2463K|   907   (2)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00

-- gather full statistics  -  almost same results as with 20%  
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,'PLCH_EMPLOYEES',
                                   cascade     => TRUE,
                                   method_opt  => 'FOR ALL INDEXED COLUMNS',
                                   estimate_percent => 99);
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:02:52.03

/* TEST5: selecting LAST_NAME */

EXPLAIN PLAN SET STATEMENT_ID='T5' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

select plan_table_output 
  from table(dbms_xplan.display('plan_table','T5','serial'))
/

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |   315K|    12M|  6867  (10)| 00:00:26 |
|*  1 |  TABLE ACCESS FULL| PLCH_EMPLOYEES |   315K|    12M|  6867  (10)| 00:00:26 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.03

/* TEST6: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T6' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

select plan_table_output 
  from table(dbms_xplan.display('plan_table','T6','serial'))
/

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   315K|  2462K|   853   (2)| 00:00:04 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |   315K|  2462K|   853   (2)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00


DROP TABLE plch_employees
/

----------------------------------------------------------
CASE 2 -- data loaded clustered by HIRE_DATE
----------------------------------------------------------
CREATE TABLE plch_employees
(
   employee_id     INTEGER
,  last_name       VARCHAR2 (100)
,  first_name      VARCHAR2 (100)
,  hire_date       DATE
,  salary          NUMBER
)
/

Table created.

Elapsed: 00:00:02.02

DECLARE
   K  PLS_INTEGER;
   N  PLS_INTEGER := TRUNC(10000000 / (31*365));
BEGIN
   FOR I IN 1 .. 1000
   LOOP
       K := (I-1)*10000;

       INSERT /*+APPEND */ INTO plch_employees
       SELECT K+ROWNUM, 
              'ABC '||(K+ROWNUM),
              'XYZ '||(K+ROWNUM),
               TO_DATE('01011980','DDMMYYYY') + FLOOR( (K+ROWNUM)/ N ),
               10000 + MOD(K+ROWNUM, 20000)
       FROM DUAL
       CONNECT BY LEVEL <= 10000 ;

       COMMIT;
   END LOOP;
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:01:51.00

-- check correct (expected) distribution
SELECT COUNT(*)
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

  COUNT(*)
----------
    322295

1 row selected.

Elapsed: 00:00:13.04

CREATE INDEX i_plch_employees ON plch_employees (hire_date)
/

Index created.

Elapsed: 00:00:47.01

/* TEST1: selecting LAST_NAME --

now index is used, though wrong cardinality estimation ! */

EXPLAIN PLAN SET STATEMENT_ID='T1' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |  2500 |   393K|    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PLCH_EMPLOYEES   |  2500 |   393K|    26   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_PLCH_EMPLOYEES |  2500 |       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement

19 rows selected.

Elapsed: 00:00:00.04


/* TEST2: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T2' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/

Explained.

Elapsed: 00:00:00.00

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |  2500 | 22500 |     9   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |  2500 | 22500 |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement

19 rows selected.

Elapsed: 00:00:00.01


-- gather statistics with 20% sampling
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,'PLCH_EMPLOYEES',
                                   cascade     => TRUE,
                                   method_opt  => 'FOR ALL INDEXED COLUMNS',
                                   estimate_percent => 20);
END;
/
PL/SQL procedure successfully completed.

Elapsed: 00:00:48.02


/* TEST3: selecting LAST_NAME - now with correct (much higher) cardinality, index still used ! */

EXPLAIN PLAN SET STATEMENT_ID='T3' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   318K|    12M|  3160   (2)| 00:00:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PLCH_EMPLOYEES   |   318K|    12M|  3160   (2)| 00:00:12 |
|*  2 |   INDEX RANGE SCAN          | I_PLCH_EMPLOYEES |   318K|       |   861   (2)| 00:00:04 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00


/* TEST4: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T4' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   318K|  2490K|   861   (2)| 00:00:04 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |   318K|  2490K|   861   (2)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00


-- gather full statistics  -  almost same results as with 20%  
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,'PLCH_EMPLOYEES',
                                   cascade     => TRUE,
                                   method_opt  => 'FOR ALL INDEXED COLUMNS',
                                   estimate_percent => 99);
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:02:42.05

/* TEST5: selecting LAST_NAME -

now with correct (much higher) cardinality, index still used ! */

EXPLAIN PLAN SET STATEMENT_ID='T5' FOR
SELECT last_name
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   321K|    12M|  3035   (2)| 00:00:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PLCH_EMPLOYEES   |   321K|    12M|  3035   (2)| 00:00:12 |
|*  2 |   INDEX RANGE SCAN          | I_PLCH_EMPLOYEES |   321K|       |   868   (2)| 00:00:04 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00


/* TEST6: selecting HIRE_DATE instead of LAST_NAME - index only access suffices */

EXPLAIN PLAN SET STATEMENT_ID='T6' FOR
SELECT hire_date
  FROM plch_employees
 WHERE hire_date BETWEEN TO_DATE ('01-01-2010', 'MM-DD-YYYY')
                     AND TO_DATE ('12-31-2010', 'MM-DD-YYYY')
/
Explained.

Elapsed: 00:00:00.00

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   321K|  2511K|   868   (2)| 00:00:04 |
|*  1 |  INDEX RANGE SCAN| I_PLCH_EMPLOYEES |   321K|  2511K|   868   (2)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIRE_DATE">=TO_DATE(' 2010-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIRE_DATE"<=TO_DATE(' 2010-12-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

15 rows selected.

Elapsed: 00:00:00.00

02 December 2011

Looking for a few good...SQL quizzes

Friends,

As many of you know, we now offer a weekly SQL quiz, in addition to the daily PL/SQL quizzes.

I have a nice solid backlog of PL/SQL quiz submissions from players - and I must use these very gradually due to their impact on rankings (the author gets maximum score possible for their quiz).

We are in need, though, of SQL quizzes. I can and will write them, but my SQL expertise is not very expert. I'd love to feature the experience and creativity of players from all over the world.

Just click on Submit Quiz and have a go at it. I am happy to help you "polish up" the text, so don't worry about the fine details. Just make sure you provide working verification code, a nice variety of chocies, and a solid explanation from which I can work.

Many thanks in advance,
Steven Feuerstein

Late Report:: Results of Q3 2011 Championship Playoff

[My apologies to players of the championship - I just discovered that I had neglected to publish this post way back in October.]

You will find below the results of the Q3 2011 championship playoff, played on 25 October. The number next to the player's name is the number of times that player has been in a playoff.

You can take a look at the quizzes that were in the competition through the Library page. These results will also be available through the Rankings page.

Congratulations to all players, of course, but let's give a special round of applause to our top three players:

1. Frank Schrader, who has played in an astonishing five playoffs (this is the 6th since we started the PL/SQL Challenge in April 2010) and wins a US$250 Amazon.com gift card.

2. Kevan Gelling, who has played in three playoffs (a fine accomplishment in and of itself) and wins a US$175 Amazon.com gift card.

3. Justin Cave, who has played in four playoffs (very impressive) and wins a US$100 Amazon.com gift card.

Rank	Name (Playoffs)	Country	Total Time	Total Score
1	Frank Schrader (5)	Germany	17 mins 08 secs	3052
2	Kevan Gelling (3)	Isle of Man	19 mins 57 secs	2651
3	Justin Cave (4)	United States	19 mins 59 secs	2635
4	Randy Gettman (4)	United States	17 mins 49 secs	2629
5	_Nikotin (3)	Russia	16 mins 28 secs	2566
6	mentzel.iudith (4)	Israel	19 mins 54 secs	2512
7	Jeff Kemp (6)	Australia	13 mins 25 secs	2342
8	Niels Hecker (5)	Germany	17 mins 09 secs	2267
9	Jerry Bull (2)	United States	19 mins 54 secs	2247
10	Janis Baiza (3)	Latvia	18 mins 45 secs	2235
11	Siim Kask (4)	Estonia	19 mins 27 secs	2201
12	james su (3)	Canada	16 mins 46 secs	2070
13	Anton Scheffer (2)	Netherlands	16 mins 04 secs	1974
14	Viacheslav Stepanov (3)	Russia	19 mins 32 secs	1954
15	Ludovic Szewczyk (2)	Belgium	19 mins 59 secs	1850
16	Joaquin Gonzalez (3)	Spain	16 mins 42 secs	1786
17	kowido (3)	No Country Set	19 mins 32 secs	1649
18	Ninoslav Čerkez (1)	Croatia	19 mins 22 secs	1638
19	Chad Lee (2)	United States	19 mins 56 secs	1596
20	Syed Ariful Bari (1)	Bangladesh	05 mins 59 secs	1550
21	Anna Onishchuk (3)	Ireland	17 mins 32 secs	1519
22	Dmitry Pushkashu (1)	Moldova	16 mins 24 secs	1462
23	Michal Cvan (4)	Slovakia	19 mins 41 secs	1426
24	Thierry Poels (2)	Belgium	18 mins 32 secs	1394
25	Rajesh Venkataramani (2)	India	19 mins 26 secs	1196
26	Dejan Topalovic (1)	Austria	14 mins 09 secs	1147
27	macabre (2)	Russia	19 mins 47 secs	1099
28	Alain Boulianne (1)	French Republic	15 mins 11 secs	1061
29	Spoon (2)	Singapore	19 mins 40 secs	957
30	dobloman (1)	Netherlands	16 mins 16 secs	700
31	John Hall (3)	United States	19 mins 41 secs	481

30 November 2011

Nuances of Deterministic Functions in PL/SQL (9588)

The 23 November quiz tested your knowledge of the little-known fact that as of Oracle Database 11g, the compiler will optimize the execution of deterministic functions in PL/SQL blocks. Previously, defining a function with the DETERMINISTIC clause led to optimizations when the function was called inside an SQL statement, but not in PL/SQL code itself.

_Nikotin, a devoted and high-ranking PL/SQL Challenge player, took the quiz as a launch point for an in-depth analysis of this optimization and came up with several interesting conclusions:

1. The optimization only occurs when the values passed as arguments to the function are compile-time constants, such as 1 or 'abc' or true or even TO_NUMBER('123') when the optimization level is set to at least 2. If you pass to one of the IN arguments an expression that cannot be determined at compile time to be a constant , no optimization occurs. Makes perfect sense, as this optimization occurs at compile time.

2.The optimization does not seem to occur outside of the context of a loop. In other words, calling the function N times with the same compile-time static arguments explicitly in your code results in the body of the function being executed N times.

3. Furthermore, if the loop contains non-deterministic code (that is, it is not simply calling deterministic functions with compile-time static constants), then the optimization also does not occur.

So at this point, it looks to _Nikotin and I that the applicability of this new optimization will be fairly narrow. Perhaps over time, the PL/SQL team will extend the circumstances under which the optimization will occur.

You will find below some details reinforcing the above points. In addition, I have attached a file to the list of resources for this quiz to demonstrate circumstances of optimization.

Cheers, Steven

DBMS_PROFILER Demonstrates No Optimization Outside of Loop (from _Nikotin)

def proc_name = PLCH_TEST_LOOPS
def func_name = PLCH_TEST_DETERM

create or replace function &func_name
(
i varchar2
) return varchar2
deterministic is
begin
return i;
end;
/

create or replace procedure &proc_name is
res varchar2(1);
begin
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
res := &func_name('a');
end;
/

var res number
exec :res := dbms_profiler.start_profiler('&proc_name');

exec plch_test_loops;
exec :res := dbms_profiler.stop_profiler;

col text format a60
set pagesize 100

select s.text, d.total_occur
from plsql_profiler_data d
   , user_source s
   , plsql_profiler_units u
where d.runid =
      (
        select max(runid) runid
        from plsql_profiler_runs
        where run_comment = '&proc_name'
      )
and d.runid = u.runid
and u.unit_name in ('&proc_name', '&func_name')
and d.unit_number = u.unit_number
and s.name = u.unit_name
and s.line = d.line#
order by u.unit_number, d.line#;

And the results:

TEXT                                        TOTAL_OCCUR
------------------------------------------- -----------
procedure PLCH_TEST_LOOPS is                                           0
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');            1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
res := PLCH_TEST_DETERM('a');              1
end;                                         1
function PLCH_TEST_DETERM                    0
return                                    10
end;                                        10

For comparison with loops:

def proc_name = PLCH_TEST_LOOPS
def func_name = PLCH_TEST_DETERM

create or replace function &func_name
(
i varchar2

) return varchar2
deterministic is
begin
return i;
end;
/

create or replace procedure &proc_name is
res varchar2(1);
begin
for i in 1 .. 10 loop
    res := &func_name('a');
end loop;
end;
/

var res number
exec :res := dbms_profiler.start_profiler('&proc_name');

exec plch_test_loops;
exec :res := dbms_profiler.stop_profiler;

col text format a60
set pagesize 140

select s.text, d.total_occur
from plsql_profiler_data d
   , user_source s
   , plsql_profiler_units u
where d.runid =
      (
        select max(runid) runid
        from plsql_profiler_runs
        where run_comment = '&proc_name'
      )
and d.runid = u.runid
and u.unit_name in ('&proc_name', '&func_name')
and d.unit_number = u.unit_number
and s.name = u.unit_name
and s.line = d.line#
order by u.unit_number, d.line#;

And result is:

TEXT                                    TOTAL_OCCUR

--------------------------------------- -----------

procedure PLCH_TEST_LOOPS is              0

  for i in 1 .. 10 loop                  11

    res := PLCH_TEST_DETERM('a');        10

end;                                      1

function PLCH_TEST_DETERM                 0

  r                                       1

end;                                      1

Impact of Non-Deterministic Code Inside Loop

The following code uses inserts into a table (via an autonomous transaction) to demonstrate how often the body of the deterministic function is executed. You might think that this logging means the function is not deterministic and therefore will turn off the optimization, but that is not the case. The PL/SQL compiler does not validate that a function is trulydeterministic; it does not make a decision on whether or not to avoid execution of the function based on the code inside the function. All of that is decided at the time the code is compiled, based solely on the way the function is called - and, in turns out, the other code executed in the loop.

CREATE TABLE plch_log (created_on DATE)
/

CREATE OR REPLACE PACKAGE plch_getdata
IS
   PROCEDURE log_count (title_in IN VARCHAR2);

   FUNCTION vc (vc VARCHAR2)
      RETURN VARCHAR2
      DETERMINISTIC;
END;
/

CREATE OR REPLACE PACKAGE BODY plch_getdata
IS
   PROCEDURE log_count (title_in IN VARCHAR2)
   IS
      l_count   PLS_INTEGER;
   BEGIN
      SELECT COUNT (*) INTO l_count FROM plch_log;
      DBMS_OUTPUT.put_line (title_in || '=' || l_count);
   END;

   PROCEDURE log_call
   IS
      PRAGMA AUTONOMOUS_TRANSACTION;
   BEGIN
      INSERT INTO plch_log VALUES (SYSDATE);
      COMMIT;
   END;

   FUNCTION vc (vc VARCHAR2)
      RETURN VARCHAR2 DETERMINISTIC
   IS
   BEGIN
      log_call ();
      RETURN vc;
   END;
END;
/

DECLARE
   vc   VARCHAR2 (100);
BEGIN
   FOR i IN 1 .. 10
   LOOP
      vc := plch_getdata.vc ('abc');
   END LOOP;

   plch_getdata.log_count ('VC 10 iterations in loop');

   FOR i IN 1 .. 10
   LOOP
      vc := plch_getdata.vc ('abc');
      vc := 'abc' || TO_CHAR (SYSDATE);
   END LOOP;

   plch_getdata.log_count (

'VC 10 iterations with non-deterministic code in loop');
END;
/

VC 10 iterations in loop=1
VC 10 iterations with non-deterministic code in loop=11

DROP TABLE plch_log
/

DROP PACKAGE plch_getdata
/

Effect on Inlining Optimization (level 3) - from _Nikotin

Because Oracle can inline the local subroutines with plsql_optimize_level = 3, the "non-deterministic" part of the loop can became deterministic:

def proc_name  = PLCH_TEST_LOOPS

def func_name = PLCH_TEST_DETERM

create or replace  function &func_name

(

  i varchar2

) return varchar2 

   deterministic is

begin

  return i;

end;

/

alter session set plsql_optimize_level  = 2;

create or replace procedure &proc_name is

  res  varchar2(1);

  procedure inline_proc (i number) is

  begin

     null;

  end;

begin

  for i in 1 .. 10 loop

    res :=  &func_name('A');

    inline_proc(1);

  end  loop;

end;

/

var res number

exec :res :=  dbms_profiler.start_profiler('&proc_name');

exec plch_test_loops;

exec  :res := dbms_profiler.stop_profiler;

col text format a60

set pagesize  140

select s.text, d.total_occur

from plsql_profiler_data d

   ,  user_source s

   , plsql_profiler_units u

where d.runid = 

       (

        select max(runid) runid 

        from plsql_profiler_runs  

        where run_comment = '&proc_name'

      )

  and d.runid =  u.runid

  and d.unit_number = u.unit_number

  and s.name = u.unit_name

  and s.line = d.line#

order  by u.unit_number, d.line#;

And result:

TEXT                                              TOTAL_OCCUR

------------------------------------------------  -----------

procedure PLCH_TEST_LOOPS  is                        1

  procedure inline_proc (i  number) is               0

     null;                                          10

   for i in 1 .. 10 loop                            11

     res := PLCH_TEST_DETERM('A');                  10

     inline_proc(1);                                 10

end;                                                 1

function PLCH_TEST_DETERM                            0

  return  i;                                         10

end;                                                10

And if uses:

alter session set plsql_optimize_level  = 3;

Then it will:

TEXT                                              TOTAL_OCCUR

------------------------------------------------  -----------

procedure PLCH_TEST_LOOPS  is                        0

  for i in 1 .. 10  loop                            11

    res :=  PLCH_TEST_DETERM('A');                  10

     inline_proc(1);                                 10

end;                                                 1

function PLCH_TEST_DETERM                            0

  return  i;                                          1

end;                                                                    1

And nothing with inline_proc.