28 April 2011

ORDER BY needed on SELECT for answers to be correct? (2243)

The 27 April quiz focused on associative arrays and creative use of string indexing to reorder data in a collection. It prompted a flurry of emails, most of which were along these lines:

"I think the bulk select should have an order by to ensure that one of the choices is either always right or always wrong. Without the order by, I don't think you can assume the code will work in a certain way.."

"The way I read this question is which block of code _will_ show the two dates from the hire column in the order shown. The query in the pl/sql block does not contain an order by clause - there is no way to guarantee the order of the rows selected by the query. None of the blocks of code can be guaranteed to show the rows in the order shown in the question. Had the question been phrased which block of code _may_ show the specified results, then some of the answers would be correct."

"The options are effectively: 1) order by to_char(hire_date, 'J') 2) order by to_char(hire_date, 'DD-MON-RR') as NLS assumptions have a DD-MON-RR date format 3) order by to_char(hire_date, 'YYYYMMDD') 4) no order by With the date provided (inserted in the order that it is), options 1 and 3 are definitely correct, option 2 is definitely incorrect and option 4 is technically indeterminate but practically incorrect given the insert order and no indexing (based on my prior experience - ie didn't test to confirm). While it is not a actual requirement, none of the options perform a true date sort (assuming BC data, although option 1 needs pre 7-Nov-1975 BC data)."

"While in this particular case it is almost 100% sure that performing the BULK SELECT without an ORDER BY will place the 2 rows into the l_plch_employee associative array in the same order as the 2 rows were inserted into the plch_employees table, this is in general NOT ensured by SQL. Therefore, the quiz choice with AnswerID=6054, that is most probably meant to be an incorrect one, is not 100% decidable, except if you add for example an "ORDER BY employee_id" to the BULK SELECT. Just a remark for SQL-purists, that might object on this issue."

And that wasn't all! One player wrote:

"Today's quiz appeared to have no correct answer. All of these answers, except for the 3 answer, contain DECLARE...BEGIN...END blocks. These cannot be used within the BEGIN...END section of an anonymous block. The 3rd answer does nothing to specify the order of the hire dates, and is therefore inadequate."

Finally, a couple of players wrote to complain about the sheer volume of code that had to be read to answer the quiz.

My responses:

1. Regarding the lack of ORDER BY: I'd actually rather invite answers from players first, before I offer my view on this. I am curious to see what you think.

2. Regarding the inclusion of DECLARE...BEGIN...END inside another BEGIN...END: actually, that is absolutely acceptable in PL/SQL. That is an example of a nested block. I generally avoid nested blocks in my code and instead create a nested or local subprogram (procedure or function). This means that instead of the following:
BEGIN
   some_code;

   DECLARE
      l_var   NUMBER;
   BEGIN
      l_var := 100;
   END;

   more_code;
END;
I would write the following:
DECLARE
   PROCEDURE set_l_var
   IS
      l_var   NUMBER;
   BEGIN
      l_var := 100;
   END;
BEGIN
   some_code;

   set_l_var;

   more_code;
END;
I find this approach much more readable and easier to maintain.

3. Regarding volume of code: generally I try to avoid lots of code repetition in the choices by isolating the common parts, putting them in the question and just including a comment in the question code that should be replaced by the code in the choices. In this case, however, three of them were very similar but the fourth was quite different, so I could not easily avoid the situation.

Looking forward to your thoughts....SF

20 comments:

  1. I was among those who whined about the long choices. While your explanation about the last choice being different makes sense, I still spent more time parsing the diffs than actually pondering the topics being tested.

    On the ORDER BY kerfuffle, I recognized that the lack of ORDER BY meant the resulting order was indeterminate, something you and others have hammered home repeatedly over the years. So I knew that choice was wrong.

    Overall I'd say this quiz was long but fair. But given how much parsing I had to do, I was pleasantly surprised to have gotten 100%.

    - Stew

    ReplyDelete
  2. Hello All,
    The lack of the ORDER BY can only be an issue for AnswerID=6054, that is, for the choice that relies on the order in which the first array was filled by the BULK SELECT.

    For all the other choices, the arrays were reordered by a very precise criteria,
    which DOES NOT depend on the ordering of the initial array, and the result of this second ordering was in fact the whole point of the quiz.
    That is, I don't think there is any problem here.

    Regarding the code volume, I think that the complexity of this quiz was not that high as to justify any objection.
    There were other quizes in the past with a similar volumn of code, but with much higher complexity, and I don't remember having seen similar complaints.

    Best Regards,
    Iudith

    ReplyDelete
  3. The order by of the bulk collect does not matter.

    The associative array is indexed by varchar2. The loop that outputs data is using FIRST and NEXT to read through that associative array - thus that loop WILL be ordered by the varchar2 value.

    The order by of the bulk collect can only influence the order of which data is PUT into the associative array, not the order in which the data is READ from the associative array, since FIRST and NEXT works in the order of the array INDEX, not the order by which the array is populated.

    (As for comment 2 and 3 I have nothing to add to Stevens comments :-)

    ReplyDelete
  4. If you're already issuing a SQL query, and it is feasible to define the sort order in the ORDER BY clause, then it's almost always better to use the power of SQL to sort the results before processing them in PL/SQL.

    The point of this quiz, however, was to show that sorting can be done without using a SQL sort. I think the quiz would have been better had it avoided the table and query altogether, and just started, say, an array of dates - in which case using an associated array indexed by varchar2 is a fast solution (although it uses more memory to do it).

    The sheer amount of code to read through could have been reduced by moving more of the bits and pieces, that were the same in each answer, into the question text. For example, you could remove the nested block (DECLARE..BEGIN..END) by having the declarations in the question text (even though one answer doesn't actually use all the declared bits):

    DECLARE
    TYPE plch_employees_aat IS TABLE OF plch_employees%ROWTYPE
    INDEX BY PLS_INTEGER;
    TYPE emps_by_hd_aat IS TABLE OF plch_employees%ROWTYPE
    INDEX BY VARCHAR2 (100);

    l_plch_employees plch_employees_aat;
    l_emps_by_hd emps_by_hd_aat;
    l_index VARCHAR2 (100);
    BEGIN
    SELECT *
    BULK COLLECT INTO l_plch_employees
    FROM plch_employees;

    /*MORE NEEDED*/
    END;
    /

    (sorry if the above isn't formatted well)

    On another unrelated point: I almost objected to the 'J' answer, thinking that since the code is sorting an integer using string sorting it would fail for some dates (e.g. 1/1/2011BC is 986541, whereas 1/1/2011AD is 2455563; as strings, 986541 would be sorted after 2455563, thus sorting these dates incorrectly). Of course I checked - 'J' left-pads the number with zeroes to seven digits, so 1/1/2011BC is formatted as '0986541', and so string sorting still works for all Oracle dates.

    ReplyDelete
  5. The quiz asked "which choice will," not "which choice might," produce the indicated result. The absence of an ORDER BY clause makes the briefest of the choices (answer ID 6054) indeterminate and thus not a valid solution. Although that choice is likely to return the selected rows in a consistent order in each specific environment, that order cannot be guaranteed. One of the comments you quoted suggested that the rows would likely be returned in the order in which they were inserted; however, my experience has been that rows from non-indexed tables are often returned with the last inserted row first followed by the remaining rows in the order they were inserted.

    I, too, generally avoid using nested blocks; however, I will use them on occasion to trap specific exceptions that can be safely ignored or locally handled (e.g., intercepting a NO_DATA_FOUND exception when the absence of matching data does not indicate an application failure state).

    I agree with Iudith, although the volume of code was greater than usual, it was not very complex and could be quickly evaluated. So long as syntactic defects and misleading formatting are not the basis for rejecting choices, the volume of code should not be an issue.

    ReplyDelete
  6. Jeff,

    Thanks for your idea about refactoring the code - but your point "even though one answer doesn't actually use all the declared bits" is exactly why I didn't do that. I don't want to provide hints or possibly extraneous code. That could actually be confusing.

    Cheers, SF

    ReplyDelete
  7. Regarding the lack of ORDER BY.
    Steven and reviewers, could swapping of inserts change correctness or not? Don't be cunning.

    ReplyDelete
  8. The order in which data is inserted should not make any difference; we are not relying on any sort of order of data in the table in this quiz.

    ReplyDelete
  9. The "order by" objection would be absolutely valid if the quiz will rely on an order in which the initial select returns rows. But the quiz does not rely on this order. So it is 100% valid.

    A code volume is somewhat too high - but anyway acceptable (while almost hits limits as for me).

    ReplyDelete
  10. Now output doesn't look as questioned. It's the second (or first of twos) reason for incorrectness. It is a CUNNING.

    I want know is it possible that matched output (i.e. "correct" output) will be marked as incorrect choice because of it is unreliable solution?

    Early THE INDETERMINACY did not play the game. But eyes did.
    Is it changed at now?

    ReplyDelete
  11. Kim: "The order by of the bulk collect does not matter."
    al0: "But the quiz does not rely on this order."

    Seems, you didn't answer the shortest choice with [Answer ID: 6054]. Did you?

    ReplyDelete
  12. Jeff, you mentioned that you almost rejected the Julian date choice until you checked and saw that the string would be left padded to seven digits. Even if the string weren't padded, this choice would still have been correct: the quiz didn't ask for a general solution, it asked for a solution for two specific dates (in this case both already have seven digits).

    Vitally, determinacy is always important, but only for relevant characteristics. For this quiz both value and order were significant. For some quizzes only the set of values is significant and order is unimportant.

    ReplyDelete
  13. Hi,

    I had the same thought as Vitaliy when I was solving the quiz. This is not the first time when I saw 2 possible results of a code depending on "different" conditions. In such situations I am not sure if the solution should be valid in all cases or there is just one valid case enough.

    You can compare this to quiz "15 April 2011" where I found a case where the result was correct and case where the result was incorrect. Original anwer was "not correct" but after I found one valid case it was rescored.
    In this quiz I see similar situation. The anwer is counted "not correct" but there is obviously a case when we can get a valid result.

    Unluckily I decided for the wrong option.
    What worries me is when there will be no clear definition to this, I (and others) will most probably have the same issue in future quizes.

    Regards,
    Michal Cvan

    ReplyDelete
  14. The discrepancy of both camps raises from different interpretations of the following sentence: "Which choice will show the following text after execution?"

    While the first camp can understand it as: "Which choice will certainly show the following text after execution?", the second camp can understand it as: "Which choice has a chance to show the following text after execution?"

    I tend to the second variant.

    ReplyDelete
  15. "Will" usually infers that it will always be this way (I belong in the first camp). However, future quizzes may be less confusing if they use wording like "Will definitely", "Will never" or "Could possibly" to better distinguish what the question is after.

    ReplyDelete
  16. Interesting. I really didn't expect the discussion to veer into this semantic analysis.

    I realize that English is not the first language of everyone playing the quiz, and I need to be careful about my choice of words.

    But I would have thought that the meaning of "will" is clear enough and in clear contrast to "could".

    Will - it is always this way. No exceptions.

    Could/can - you can identify a scenario in which this is true.

    I will see about using additional qualifying terms to avoid any misunderstanding, but I will not be changing the scoring of this quiz.

    ReplyDelete
  17. @jhall62: you're right - my objection would not have nullified that answer for this quiz.

    WRT "will" - I agree with Steven.

    I have a feeling the problem here is not one of English comprehension, but of understanding that the dates here are being sorted reliably (deterministically, if you will) by the table indexed by varchar2.

    ReplyDelete
  18. Thank you, now it is clear to me.

    ReplyDelete
  19. @Vitaliy

    Seems, you didn't answer the shortest choice with [Answer ID: 6054]. Did you?
    Definitely, I have not marked [Answer ID: 6054] as correct - as it is not. Will means will, not could.

    ReplyDelete
  20. Hi,

    the missing order by is not crucial for this quiz and the meaning of the correct answer shows 100% the problem of the behaviour. I would also not change the scoring on this.

    One other thing about this quiz is, that it would perfectly fit into an SQL Challenge as PlSql is not needed in any case to show this behaviour.

    Br Martin

    ReplyDelete