24 March 2013

Defining "the same results" for "unnecessary code" quizzes

The discussions around choice 4 of the 19 March quiz, in particularly those titled "Choice 4's commit is unnecessary" and 'The "last_name" column is unnecessary in Choice 4', made me realize that I need to offer a more explicit definition of what it means for two blogs to have the "same result."

Some players interpreted this to mean "same output displayed on the screen." Others engaged in a very deep analysis of whether or not the commit of a row is the "same" as an uncommitted row in a single session, etc.

Just like all of you, the players, I do not want to have debates on these sorts of issues after quizzes. I would much rather discuss the Oracle technology behind the quizzes.

So I am going to come up with a definition of the "same result" for these quizzes.

And I will start by first offering a definition or way to prove that two blocks do not have the same result.

I run the same code (let's call it T for test code) after each block (let's call them A and B). If T displays "A" after running the A block, and displays "B" after running the B block, then A and B do not have the "same result."

Expressed algebraically:

If A = B, then A + C = B + C.

And, conversely:

If A + C != B + C, then A != B.

Let's apply this rule to the following block:

BEGIN
   INSERT INTO plch_employees
        VALUES (1, 'Splog', 1000000);

   INSERT INTO plch_employees
        VALUES (200, 'Rogash', 1000000);

   COMMIT;
END;
/

Several players argued that COMMIT is unnecessary. I claim that it is necessary. I will now prove it using the proposed rule.

I run the following block immediately after the above block (call it A) and another without the commit (call it B):

DECLARE
   l_count INTEGER; 
BEGIN
   ROLLBACK;
 
   SELECT COUNT(*) into l_count
     FROM plch_employees;
 
   IF l_count = 2 
   THEN 
      DBMS_OUTPUT.PUT_LINE ('A');
   ELSE 
      DBMS_OUTPUT.PUT_LINE ('B');
   END IF;
END;
/

I will not see the same output displayed on the screen. Those two blocks, then, do not have the same result. The COMMIT is, therefore, necessary code.

This rule gives us all a way to demonstrate that two blocks are not the same. So if anyone claims that a certain chunk of code is not necessary, that claim can now be dis-proven objectively - as long as a block of test code can be devised.

Note that the rules state:

A change in the resources needed to execute the choice (CPU, memory, etc.) does not, for this quiz, constitute a change in the choice. In other words, if the removal results in a choice that is slower or consumes more memory, but otherwise accomplishes the same work (inserts a row, displays text, etc.), then the choice does contain unnecessary code.

So test code cannot display "A" vs "B" based on a "change in the resources" as defined in this rule.

Well, I will start with this rule and see what you all think.

10 comments:

  1. Hello Steven,

    Nice idea !

    However, strictly logically speaking, it is much easier to prove that two blocks DO NOT have the same result,
    because, for such proof, only one single such "C" block is needed for which (A+C) <> (B+C).

    On the other hand, for proving that the output IS the same, you will theoretically have to prove that (A+C) = (B+C)
    for each and every possible C ... which is pretty impossible ...

    The fact that no such "C" was found
    (yet ... who knows ? in 100 years other players will read our quiz repository :):), no ? )
    does not mean that such a "C" does not exist ...

    Well ... we cannot completely ignore "the built-in common sense" in defining when two "results" are considered the same.

    I don't think that any player will argue that a non-committed transaction is different from a committed one,
    the possible argument was only whether this REAL difference is indeed a difference as by the rules of this quiz type.

    I can only hope that this blog will be shorter than the previous one, to give us at least a tiny feeling of progress towards understanding and agreement :) :)

    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete
  2. Hi Steven,

    This is again Nicolas from the PL/SQL Challenge. The one who start the "COMMIT" issue.. can of worms should I say.
    I never intend to make that an issue, and I apologize for it. Just want to have this point to be clarified properly in rules, so we can make your challenge better.

    I'm sorry, but, unless there is something I do not understand in the wording of your rule (I'm not native English speaker), your example is not solving the issue.

    You put an explicit ROLLBACK here, so of course, I would never say that the COMMIT is unnecessary... but the ROLLBACK become unnecessary as there is nothing to rollback after a COMMIT.
    I could even say that BOTH ROLLBACK and COMMIT are unnecessary if I still look at the context of the single session - remove them both and the block will still say 'A', regardless of the state of the transaction.

    The matter of the question I raise originally is really a matter of perception on how we play the game:

    - Should we think as part of the Database (we are the "server" process - so state of the transaction matter to us),
    - or simply look at our SQL*Plus window (we are a "client" process - state of the transaction doesn't matter to us, this is the role of the server- only what we see on the output matter to us - SQL*Plus automatically commit upon exit) ?

    I will use another example based on yours to state the difference - where there is no explicit rollback

    BEGIN
    INSERT INTO plch_employees
    VALUES (1, 'Splog', 1000000);

    INSERT INTO plch_employees
    VALUES (200, 'Rogash', 1000000);

    COMMIT;
    END;
    /


    DECLARE
    PRAGMA AUTONOMOUS_TRANSACTION;
    l_count INTEGER;
    BEGIN
    SELECT COUNT(*) into l_count
    FROM plch_employees;

    IF l_count = 2
    THEN
    DBMS_OUTPUT.PUT_LINE ('A');
    ELSE
    DBMS_OUTPUT.PUT_LINE ('B');
    END IF;
    END;
    /

    In that case, the COMMIT is really necessary, because the second block is explicitly an AUTONOMOUS TRANSACTION, so the fact we commit or not commit the first block change the state of the table and therefore change the output of the display of the second block.
    But if I look to the output only (I'm a client process), I could still say there is unnecessary code - if I remove BOTH the commit and the pragma, the block will still say 'A' .

    For me, you simply need to set the rule explicitly: either

    - The status of the transaction matter - even is not explicitly visible on the output, a COMMITTED transaction is different than transaction that has not been committed yet.
    - We only look at the output of the execution of the block, regardless of the state of the transaction (i.e. A commit inside a pl/sql block is not impacting the output, a commit OUTSIDE a pl/sql block impact the output)

    Be sure I perfectly understand your point of view here (You think as the "Server"), and I respect it, I simply want to make the game better.


    Note: an idea for a next quizz, using several blocks and Pragma autonomous transaction on some could be an interesting situation to play with :)

    ReplyDelete
  3. Iudith writes "I don't think that any player will argue that a non-committed transaction is different from a committed one, the possible argument was only whether this REAL difference is indeed a difference as by the rules of this quiz type."

    Nicolas writes: "For me, you simply need to set the rule explicitly: either the status of the transaction matters...[or] we only look at the output of the execution of the block, regardless of the state of the transaction"

    I reply:

    I am getting the impression that the problem players have had with this issue is that so many of the "regular" PL/SQL quizzes use system output (text displayed on the screen) as a kind of proxy or representation of behavior of the code. I suppose that's why players might think in terms of the "client process" vs. the server. I certainly haven't been thinking that way; PL/SQL code executes in the database and can change the state of the database in various ways.

    It is true that in the unnecessary code quiz, I did not define "result" to be limited to system output. I can, therefore, argue that players should not have assumed that, but clearly lots of you have developed that view about the quizzes.

    You will need to change that thinking for these quizzes, and I need to make sure there is no ambiguity about it.

    As Iudith points out, my post so far only offers a tool to prove the "negative" - that removal of code causes a change in results. That does not, however, offer a general definition of "the same result".

    I really do not want to have to come up with a list of rules defining the "same." Nicolas wants a rule regarding transaction state. What about application context values, database pipe contents, .... and who knows what else?

    So I will keep on thinking. Thanks for your feedback!

    ReplyDelete
  4. Hmm... Maybe something along the lines of:

    "After removal of code, the choice must still leave the session and database state identical, so the result does not change no matter what happens afterwards - even if the session disconnects immediately afterwards due to network error or similar. Any code that executes after the choice must not be influenced by the removal. If a piece of code can be demonstrated to behave differently when executed after the changed choice compared to executed after the original choice, then the code was necessary to keep the state and could not be removed."

    Or in other words - if just a single C can be found that makes A+C != B+C, then the code was necessary. C might be code that executes after A or B, or C might be an unexpected disconnection (some players have argued that since quiz environment is SQL*Plus then we autocommit on exit, but that is only if exiting properly :-).

    ReplyDelete
  5. I like the simplicity of statement in "A+C != B+C"

    However, it doesn't address splitting hairs about what defines the "state".

    Part of the rules say resource consumption is not considered part of state. So, exactly what elements of session/database state should count? If I query the data dictionary after A and B and find something "C" that has changed. Am I right? or will my C be thrown out as unrelated performance information?

    I also really, really hate the idea of "after code" as a proving criteria.
    The quiz is over, all code has completed but right/wrong is not determined by the possibility of running something more?

    That's just begging for after-the-fact adjustments,.rescores and everybody-wins.

    For the sake of the quiz integrity, I think the sql or dbms_output results should be the only criteria.
    It makes for the most unambiguous grading criteria. I expect X, did you get X? yes/no.

    For "unnecessary quizzes", this codes produces X, I remove Y, do I still get X? yes/no

    If you want a quiz to challenge/teach something about client vs server action, or session vs database state, then write a quiz about those.

    The continued effort to make more and more refined rules is self defeating. For one it's unfair to players that might not be comfortable with English. Second, it's unlikely the rules will ever be complete, because somebody will always be able to find something different between two executions and the after-arguments will be nothing more than an arbitrary list of attributes that should or should not be counted.

    I'd like to renew my suggestion from the original discussion to SIMPLIFY the rules, with one adjustment:
    Only allow "unnecessary" quizzes to have output generated by pl/sql.
    If you want to run sql, you can always capture those results in pl/sql and print them.
    The advantage of this limit on quiz construction is you can simplify the grading criteria: Only the results returned from dbms_output will count.

    Then going on the rule of SIMPLIFY:
    Allow removal of any non-white space text and check if the dbms_output results differ.
    If they do, the code was unnecessary, if they don't, the code was necessary.

    If you want to check commit status, put that into the quiz so it shows up in the results
    If you want to check performance, put that into the quiz so it shows up in the results
    If you want to check session state, put that into the quiz so it shows up in the results
    If you want to check database state, put that into the quiz so it shows up in the results

    If you really want to protect some specific element of the quiz text, then make it explicit in that quiz.
    For instance, if you don't want players to remove PLS_ from PLS_INTEGER, then make it explicit in the quiz text that any non-whitespace text may be removed except "PLS_"

    Or better yet, write the quiz so the difference between PLS_INTEGER and INTEGER will show up in the results thereby making the PLS_ implicitly necessary.

    If it's simple then it's MUCH easier to make the rules unambiguous.



    ReplyDelete
    Replies
    1. This comment has been removed by a blog administrator.

      Delete
  6. Sean, you are very good at giving one pause. I have pause.

    I certainly never expected that my "unnecessary code" quiz would require anything so elaborate, and it does beg the question of "Is there a better way?" I am not yet entirely comfortable with what you suggest, but I will keep thinking about it.

    By the way, you write "I also really, really hate the idea of 'after code' as a proving criteria." Actually, the "after code" would not be used to prove, but to dis-prove. If a player tried to argue that some code was unnecessary, I would produce a script that demonstrates otherwise. And of course others could do the same.

    Kim, thanks for your formulation. I like it; now we just need to see if I will use this approach for the next one.

    ReplyDelete
  7. well, I'm glad I can help, if in fact I am helping.

    The "really, really hate" part is a bit odd since it's always going to be a facet of the quiz. Quiz claims A, somebody writes code to demonstrate B. Scores are adjusted.

    That's ok, because it's supposed to be an exception. Here it seems like an expectation.

    With these new ones, you've run all the code in the quiz. But, the necessity of your answer is determined by code that may or may not be run afterward. To me that feels like skirting edges of trickery. Also, I think it makes the quizzes less useful as a lesson since the quiz isn't complete. You don't know the truth unless you also run something else and that something else isn't actually part of the quiz.

    For example: I miss a question because I forget about a commit - fine, my mistake I know about commits but I brain-fart.

    Next guy misses the same question because he doesn't know the importance of a commit.
    While I'm sure after-the-fact commentary will be available to help explain it, it seems to me the quiz itself should be self-contained. That's part of the reason I recommend a visible output be used as the criteria. Not only is it unambiguous, but it also makes the solutions self-describing.

    I think the answer is A. I run the code I get A - great, I knew it!
    or
    I think the answer is A. I run the code I get B - oops, better rethink that.

    Even if you're clueless about what the code does, the answer is staring you in the face. It gives you something to work from.

    With the new quizzes that require a mystery-after-code:

    I think the code generates A. I remove something, I still get A - so the something wasn't necessary - am I right?

    I don't know, maybe.

    I think the answer is A. I remove something, I get B - so the something was necessary - am I right?

    I don't know, maybe.

    In either case, if a player is unsure of the results, the rules or their own modification they can't determine if they are right or wrong without an external ruling.
    You either forgot some case, or your removal was deemed illegal.

    That's why the current model seem less useful to me.

    ReplyDelete
  8. Whew. I decided to try a third time with one of these quizzes, applying what I've learned from these discussions. Set up the quiz, four choices, three of them with 100% necessary code.

    Or so I thought. Elic found unnecessary code in all three of them.

    So I give up. This is too complicated and difficult (just to write a good quiz!).

    So I am taking a different approach: I will show a block of code and ask you to decide which of the choices contain code that CAN be removed.

    Bottom line is that I will be, in essence, following Sean's advice. Thanks, Sean!

    ReplyDelete
  9. Hello Steven, All,

    I think this approach is a much "happier and precise" one than the previous, can we start to celebrate ?

    As I have predicted, for almost any piece of code somebody will be "clever enough" and find something
    that can be removed, especially if the code is "complicated enough".

    Asking whether a precisely specified "element X" can be removed or not is a much better focused quiz,
    it is supposed indeed to require and check PL/SQL knowledge to understand the result of that removal
    and not just "wild imagination" to try to find by all means anything that can be removed,
    however illogical that piece might be, as we have seen in several cases discussed up to now.

    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete