22 October 2013

Questions Arising from First Database Design Quiz


[This post is from Chris Saxon, manager of the weekly Database Design quiz.]
 
The database design quiz for the 12th-18th October covered implementation of  a preferred address feature for customers. The first choice in the answers added a DEFAULT_ADDRESS_ID column to the customers table. 

This answer was marked as correct, however many players selected this choice as incorrect and a few objected on the basis that no foreign key was defined (on the DEFAULT_ADDRESS_ID column). 

The requirements in the question stated that customers could select "at most one" of the addresses to be their preferred address. By definition the PLCH_CUSTOMERS.DEFAULT_ADDRESS_ID meets this, as it's only possible to store one default address per customer. In my opinion, adding a FK to this column to an address table while a very desirable action isn't strictly necessary to meet the "at most one" requirement. 

This also brings up a wider concern I have with these quizzes regarding foreign keys. This week's quiz (18th-25th October 2013) includes a CURRENCY_CD field. If I was designing "for real", there would definitely be a FK on this column to a CURRENCIES table. I excluded the FK and additional table from the question however to keep it shorter and easier to read. There are a couple of other "missing" FKs and tables from the question, which also aren't strictly necessary to answer the question. 

Based on the response to the default address in the previous quiz, some players may feel these are necessary for some of the choices to be correct. 

So I would like to gather your thoughts on how we should handle these issues. 

The questions I have are: 
  • If a quiz includes a choice with fields that reference a key from another table, is it necessary to define a FK for the choice to be correct? (If the question states that we must do something, for example to use valid addresses, I certainly agree that a FK would be necessary) 
  • In general, should questions include as many FKs and tables as possible? Or should FKs only be listed where directly impact the correctness of some of the choices? 
Please let us know your thoughts on this!

Also, remember that we will not publish any comments that reference the details of this week's quiz! No spoiler alerts!

8 comments:

  1. Hello!
    If PLCH_CUSTOMERS.DEFAULT_ADDRESS_ID is not associated with a FK, how can we be sure it points to a valid address?
    It is possible to argue that a value stored here but having no correspondent acts exactly like a NULL value. It is true up to the point where we need extra and not common sense information to deal with this sort of problem.
    Because it raises useless complications about data quality I think it is a mistake not to have a corresponding Foreign Key (FK).
    I have seen too many times when dis-considering the FK importance and simplicity had as consequence data issues and also other sort of issues.
    On the contrary I have seen no situation when having a FK was a bad idea. Yes, it might have some negative performance impact, but it is a long way from "some performance impact" up to "a bad idea".

    ReplyDelete
  2. Hello Chris, all

    I think the quiz text should be kept as simple and short as possible, therefore the constraints and the database objects, that are not relevant to the essence (main topic) of the question should be omitted. On the other hand, if the quiz concentrates on the FKs, all the FKs should be listed.

    ReplyDelete
  3. Dear Chris,

    In my opinion, a quiz should contain the minimum amount of code necessary for the quiz. So if the omission of a FK violates best practices, but the FK is not strictly necessary for the quiz, I'd suggest you omit it. Of course, in the quiz solution, there should be a mention of the missing FK and why it is recommended.

    Kind regards,
    Frank

    ReplyDelete
  4. I vote for shortness and readability. If you add a foreign key as a nice-to-have addition, then you should also have an index on that column for good practices. And that might already be questionable.

    ReplyDelete
  5. Hello Chris, All,

    I understand your concern, and, at least as I can conclude at this point based on these last two quizzes only, I would say the following:

    1. For the previous quiz, about customer addresses, I think that the FK is necessary,
    because of the following:

    a) Defining a default address for a customer and possible ways to accomplish this was
    the central point/focus of the quiz.

    b) The parent table that the FK was supposed to reference WAS PRESENT in the quiz setup,
    which somehow "demanded" for that FK constraint to be defined as part of a correct solution
    to the question asked.

    c) The presence of another choice that DOES create the (correct) FK also naturally suggests
    a choice that omits doing this to be considered as incorrect.
    Of course, this can be different if the quiz clearly specifies that a choice containing just a
    partial but correct step from a possible complete solution should also be regarded as correct.


    2. For the last quiz of the current week, the central point of the quiz does *NOT* lie in the correctness
    of the values of CURRENCY_CD or any other column specified, but rather in the right placement of
    that column in one of the two tables that DO APPEAR in the quiz ( one in the setup and
    the second one created by Choice 1 ), while a CURRENCIES table is not present,
    though, as you say, naturally understood.

    The quiz DID NOT ask about how to implement a full solution for an INVOICE data model,
    or even not for one of the columns in its entirety, but focused on the design of the one
    (or two) tables and the relationship between those two tables.

    If by moving a column like CURRENCY_CD from one table to the other its integrity requirements
    were changed, then yes, maybe we would be missing something at this point,
    but I think it is not the case here.


    So, in my opinion, these two quizzes are basically different in what they are asking.


    Considering that probably the data models that will appear in the following quizzes will not
    always be too simple, I would NOT go so far as to suggest defining ALL the FK-s
    and present ALL the tables for a full implementation, but, however,
    if possible "to focus" the quiz around a "closed in itself" fragment or aspect of a complete solution
    to a problem, then for that specific fragment or aspect, then yes, I would go for including
    ALL the tables and ALL the relationships among them, either in the quiz introduction,
    or, if it suits the quiz, then in one or more of the Choices themselves.

    I hope that I was able to convey my idea in this answer.

    This discussion is very interesting by itself, and I am sure that it will continue,
    we will become "more enlightened" as time passes and more quizzes will be played.


    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete
  6. For the first question, only if it is imperative for the quiz specification. The second question is no to first part and yes to the second part. We have gone through this discussion before on the pl/sql side. There we assume nothing, only the text of the quiz applies. It does not matter that the choice is stupid or wrong to do generally, if it applies to the specification stated in the quiz. And this one did.

    Why would this choice incorrect. It is certainly correct for the some developers who write all the application logic in a java web application that then writes into the table, and uses only available addresses as source. This way, the application takes care of the problem and the developer doesn't have to first check if this referential integrity is violated.

    So use the same assumption as in PL/SQL, never assume anything about what may or may happen outside the scope of the quiz, it is only information in the quiz that matter.

    Regards,
    Ingimundur K. Gudmundsson

    ReplyDelete
  7. Personally I don't think those FKs are necessary. From personal experience, we don't always add them in the database - we implement the constraint in the UI layer so there is no need to duplicate it within the DB layer (which not everyone will agree to, but it works!). The meaning of the column is understood, so any DB designer should not need all the additional info that does not any value to the question.

    Kev Scott

    ReplyDelete
  8. Thanks for your thoughts all - glad most feel that keeping the questions as short as possible is a desireable goal!

    Ingimundur and Kev have both pointed out a reason I feel that we can't say that a missing foreign key "automatically" makes a choice incorrect - many applications manage to work perfectly well without any defined! Having no FKs is not something I would advise - it's likely to lead to data quality issues and unexpected behaviour.

    So we can be clear about the correctness of future quizzes where having foreign keys forms part of the answer, I propose the following:

    Questions including the words "must" and "valid" (e.g. "column x must refer to valid customers") when stating the business requirements need a foreign key for choices to be considered correct. If these words are not present, then options that have no referential integrity defined can still be considered correct (providing they meet the requirements in the quiz).

    Does that sound like a workable solution?

    ReplyDelete