29 August 2011

Impact of NULLs and Analyzing Choice Correctness (5926)

The SQL quiz that started on 13 August has prompted a number of players to raise questions about the impact of NULL values (sorry about the delay in writing about this!). And, though it might seem odd, their comments also relate to how you should read, interpret and choose correct answers for our quizzes.

The quiz asked the following question of the plch_employees table:

We want to retrieve the total earnings (i.e. salary + commission) of an employee. Which queries give us the correct result?

The rows in the table included some NULL values for commission, while none of the salary values were NULL.

Players questioned our scoring as correct any of the choices, because all the queries assumed that the salary will never be NULL.

I will make a few comments, and then invite both our SQL author/reviewers and players to offer their own perspectives.  My comments:

1. It would have been better if we'd put a NOT NULL constraint on the employee_salary column. That would avoid the issue altogether (in fact, I plan to do this after we finish this discussion).

2. It is important to draw a distinction between answering a quiz and developing "bullet-proof" code that handles all possible scenarios.

When you take a quiz, you answer the question in the context of the information provided in that question - and nothing else. In this case, none of the salaries were NULL and so the issue of the impact of NULL on a formula like "salary + commission" only affects the commission "side" of the formula.

So I do not believe that any change in scoring needs to take place. But, as noted earlier, I do think we should add a NOT NULL constraint on the salary column.

1. I would argue that the problem is not just the fact that employee_salary might be null, but also that every answer scored as correct assumes that an employee_comm value of NULL should be interpreted as 0. Nowhere in the instructions did it indicate this, so I don't think that assumption can be made.

More simply put, if MILLER has an employee_salary of 1300 and employee_comm of NULL then her total earnings (employee_salary + employee_comm) are NULL. None of the answers give that result so none of them are correct.

2. Hello All,
The issue here is to define what is the meaning of
"total earning". If it is a "life common sense"
meaning, then the scoring is correct.
If it is the "SQL" meaning, then it is not.

Anyway, adding the NOT NULL is welcome.

On a more general plan, this also relates to the missing of an ORDER BY clause from a SELECT.
In my opinion, NOT specifying it where it is supposed to be specified also renders a choice as incorrect, even if it might happen that for the given sample data, the output itself could be correct anyway.

So, maybe we can all agree upon the fact that
a correct choice means a "correct solution"
and not just a "correct result" .
Of course, if the solution is correct then
the result is also always correct .

Thanks & Best Regards,
Iudith

3. This sounds to me like the question: "Is the following table with the presented data in third Normal Form"? The correct answer can only be given in the light of the present data. And off course if you add new records, the answer could change. It is not really valid to say that a specific example could be there. The real world domain must be taken into consideration. And while it is probable that the NULL value could be there and must be considered for all possible values in the table. It is not part of the question.

Ok, I admit, I've studied mathematics, database theory and logistics. And just saw "The Big Bang Theory" for three hours and Sheldon is my favorite.

4. Goran Stefanovic30 August, 2011 10:06

Dan Kennedy:
"More simply put, if MILLER has an employee_salary of 1300 and employee_comm of NULL then her total earnings (employee_salary + employee_comm) are NULL. None of the answers give that result so none of them are correct."

I strongly disagree with this. If you (in real life) have a salary of 1300, and did not recieve any commission (which means you literally recieved nothing; you DID NOT recieve 0!), your total earnings are definitely 1300.

Translated to SQL/PLSQL: in this case NULL should be treated as 0 - it is logical assumption based on real life experience. If you do not do that, you lose all the money from the programming language's point of view. I do not think MILLER would agree with that. :-)

Steven Feuerstein:
"Players questioned our scoring as correct any of the choices, because all the queries assumed that the salary will never be NULL."

It does not matter how the salary will change over time (although it is logical to assume it will never be null, right?) because you gave us INSERT statements with the snapshot of salaries and commissions at that precise moment. These values are world of their own for this quiz, and other possible values should not be considered (in my opionion).

Of course, in the real world domain this could be fatal, and nullable columns should always be treated as "could be NULL". If you did not give us INSERT statements it would have been completely different story, but in this case I think it is the right way to answer questions based ONLY on the specified data.

5. Goran: "If you (in real life) have a salary of 1300, and did not recieve any commission (which means you literally recieved nothing; you DID NOT recieve 0!), your total earnings are definitely 1300."

How can you say MILLER didn't receive any commission? My point is that, lacking any other documentation of what NULL means in this environment, we don't know what MILLER's commission was.

So if I told you that your salary next year was going to be \$100k but I don't know what your commission is going to be, you tell me: what is your total compensation going to be?

Btw, this is a SQL test not a real life test, right? I (strongly) think the answers should be driven not by assumptions made in real life but only by what is provided in the question.

6. When salary is null a man is a volunteer not an employee.
Adding NOT NULL constraint will stop strange objections.

7. Goran Stefanovic30 August, 2011 14:44

Dan Kennedy:
"How can you say MILLER didn't receive any commission? My point is that, lacking any other documentation of what NULL means in this environment, we don't know what MILLER's commission was.

So if I told you that your salary next year was going to be \$100k but I don't know what your commission is going to be, you tell me: what is your total compensation going to be?"

My assumption was that the commission is past value for that precise moment (in the MILLER's example, I assumed that his salary up to this moment was 1300, and that his commission up to this moment was NULL, meaning "nothing" and not "undefined value"). I am pretty confident that I can say MILLER's commission WAS nothing. Now, what MILLER's commission WILL BE is clearly undefined, as is his future salary. I don't think potential future values should have an impact on what was asked in the quiz, which is basically what were the total earnings for these people based on the provided data.

Of course, this is just a matter of opinion. It could be said that both our statements are correct - it is logical that NULL commission means nothing, but it is also a fact that NULL commission means undefined commission, unless described differently. It is, as usual, on Steven to decide. :-)

8. I have added NOT NULL to all columns except commission.

9. The question said nothing about past value for that precise moment and that's not one of the standard assumptions. I think the question was too ambiguous, and the "correct" answers were based on unstated assumptions. Just my opinion...

10. Dan's argument has merit. NULL is not a value. It is not unusual for NULL to be treated as zero in some domains; however, there was nothing in the wording of this quiz to indicate that such a convention was to be applied. Because there were candidate choices that replaced NULLs with zeroes, I guessed that the author intended a NULL commission to mean "no commission" and answered accordingly; however, those options could also have been intended to be traps for the unwary. The semantics of NULLs are ambiguous without explicit direction regarding their interpretation.

It is important to understand the methods available to handle NULLs in SQL. It is even more important to understand the meaning of a NULL so that one may choose an appropriate method.

11. Hello All,
The issue here is to decide whether the term "total earnings" DOES or DOES NOT imply the assumption that a NULL commission is to be considered as a ZERO commission, that is, one that contributes by zero to the "total earnings".

This quiz, including its sample data is maybe the first or even almost single example of
NULL usage used in SQL beginner's classes,
so this exact example is probably the one that makes everybody consider ONLY ZERO to be a correct "replacement" for a NULL.

Maybe the quiz could have stated it explicitly,
instead of just relying on the common meaning of the "total earnings" term, but then the quiz would have probably lost its main point.

Anyway, what we clearly can infer from such cases is that problems, especially SQL problems,
are better to be formulated IN WORDS, not just in sample data terms.

"Tell me in plain English what do you try to achieve" is what Tom Kyte always asks from his site visitors who are asking questions ...

Thanks & Best Regards,
Iudith

12. Hi,

I must say that I totally disagree with Dans point here. The way I understand, all quizzes are "as is" quizzes, which I interpret in such way that when you start the quiz you have a fully set up database that has no user generated tables, procedures etc. Hence the quiz is about the data that you have before you and nothing else. In my mind, in this particular quiz NULL means nothing and hence is zero. In another quiz it can mean something else, but that would be indicated by something else.

The way I read the quiz first, I saw the NULLs more like N/A, this person does not receive commission at all. Hence null should equal 0.

And I am not sure it was neccessary to put not null on the other columns for the quiz in question, even though in the real world it is a good practice, as none of the data entered there had null anywhere else than in commission column.

Regards,
Ingimundur K. Gudmundsson

13. @Ingimundur: I'm sorry, but if all quizzes are "as-is" and "the quiz is about the data that you have before you and nothing else" then how can you possibly conclude that, in this case, "NULL means nothing and hence is zero"? I completely disagree.

Have a look at this Oracle Magazine article (http://www.oracle.com/technetwork/issue-archive/2005/05-jul/o45sql-097727.html) which, funny enough, discusses this very example of salaries and commissions. It says: "Does a null commission mean 'no commission,' or might it mean 'commission unknown'? The answer depends on your application, and there might not be a clear answer at all."