17 August 2011

Finding Holes in Algorithms - "Number in String" (5979)

The 16 August quiz stated the following:

I need to write a function named plch_num_instr that returns the number of times a substring occurs within a string. Here is the header of the function:
FUNCTION plch_num_instr (
   string_in IN VARCHAR2, substring_in IN VARCHAR2)
We then asked you:

"Which of the choices implement plch_num_instr so that the following block displays these five lines after execution..."

One player wrote after taking this quiz:

"Fine quiz but I think that none of the solutions would correctly implement your requirement (although I found the three correct answers for the quiz). Just try "plch_num_instr( 'Steieimer Meieieier', 'eiei')". The correct answer is 3 but all of your solutions just return 2 (haven't tried it before, but I have thought that the RegExp_Count() should return the correct value). The error lies in the implementation of the answer with id 7628 where you add the length of the substring to the actual starting position instead of adding just 1. This is no request for a scoring adjustment just a little bit to think about."  He also offered a corrected implementation, which you will find at the end of this posting.

And another player wrote with a similar concern: "In the given function calls, it give us correct answers, but there are cases when this implementation returns incorrect answer. For example: plch_num_instr ('123123123123', '123123'). Call to this implementation will return 2, but substring '123123' occurs 3 times within a string."

And finally as one player so succinctly expressed it: "Just out of interest, given this specification what should the following block display? BEGIN DBMS_OUTPUT.put_line (plch_num_instr ('steeeeeeven', 'ee')); END;"

All players are correct; the choices did not contain an implementation that handled all cases (nor did we claim that they would). I will add the implementation you see below to the answer, along with the points noted above. But no change will be made in scoring.
CREATE OR REPLACE
FUNCTION InStrCount (pString  IN VARCHAR2,
                     pSearch  IN VARCHAR2)
                    RETURN BINARY_INTEGER
/*====================================================================
    Returns the number of occurrences of a search-string in a string
  --------------------------------------------------------------------
    pString:  String to search for the occurrences of "pSearch"
    pSearch:  String to search in "pString"
  --------------------------------------------------------------------
    Result:  number of occurrences of "pSearch" in "pString"
  --------------------------------------------------------------------
    History:  000-00  nhecker
  ====================================================================*/
IS

    Result  BINARY_INTEGER := 0;
    iPos    BINARY_INTEGER;
    iStart  BINARY_INTEGER := 1;

BEGIN
  LOOP
    iPos := InStr( pString, pSearch, iStart);
    EXIT WHEN (iPos = 0);

    Result := Result + 1;
    iStart := iPos + 1;
  END LOOP;
  RETURN (Result);
END InStrCount;
/

7 comments:

  1. Beside problems how many times is a substring in a string (some characters are counted twice, is it really another different substring?) is it an idea to make it more clear that the Oracle version is not 10 as is the default? I did not notice, so I missed an answer (reg_count) as I thought, that it was not existing. If I had noticed the minimum version I had started google on this one. Bold, red comes to mind.

    ReplyDelete
  2. Hello All,
    Maybe what was missing from the quiz text was the clear specification that occurrences that are counted should be always supposed as non-overlapping.

    In fact, this is (more or less "silently") the basic assumption of how all the Oracle built-ins work, for example REPLACE, REGEXP_COUNT and INSTR (the 4-th parameter).

    Therefore, the 100% complete solution would be to add a 3-rd BOOLEAN parameter to the function,
    with TRUE=allow overlaps, FALSE=do not allow overlaps and make the code count accordingly.

    Adapting the above solution:

    CREATE OR REPLACE
    FUNCTION InStrCount (pString IN VARCHAR2,
    pSearch IN VARCHAR2,
    pAllowOverlaps IN BOOLEAN DEFAULT FALSE)
    RETURN BINARY_INTEGER
    /*====================================================================
    Returns the number of occurrences of a search-string in a string
    --------------------------------------------------------------------
    pString: String to search for the occurrences of "pSearch"
    pSearch: String to search in "pString"
    --------------------------------------------------------------------
    Result: number of occurrences of "pSearch" in "pString"
    --------------------------------------------------------------------
    History: 000-00 nhecker
    000-01 iudith
    ====================================================================*/
    IS

    Result BINARY_INTEGER := 0;
    iPos BINARY_INTEGER;
    iStart BINARY_INTEGER := 1;

    BEGIN
    LOOP
    iPos := InStr( pString, pSearch, iStart);
    EXIT WHEN (iPos = 0);

    Result := Result + 1;
    -- iStart := iPos + 1;
    iStart := iPos +
    CASE WHEN pAllowOverlaps THEN 1
    ELSE LENGTH(pSearch)
    END;
    END LOOP;
    RETURN (Result);
    END InStrCount;
    /

    This way everybody will be satisfied :) :)

    Thanks & Best Regards,
    Iudith

    ReplyDelete
  3. Thanks, Iudith. I will use your modified version in the answer text.

    Wim - I am reluctant to highlight the version and/or other aspects of the quiz. We present all the information on the page. It seems like when a person takes the quiz, they would start at the top, read all the information, then dive into the details of the quiz. If players hurriedly skip over information on the page...well...that's a good lesson learned, isn't it?

    ReplyDelete
  4. Agree, Steven - I ought to know by now that I cannot just skip the headers and dive right into the question :-)

    I did notice "version 11" and tried to be alert. But I failed on REGEXP_COUNT anyway because 11.1 documentation was not as up-to-date as 11.2 documentation. This page (http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#i88893) does not mention REGEXP_COUNT, whereas this one does (http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/functions002.htm). My lesson learned: Go for the newest docs :-)

    ReplyDelete
  5. Hello Kim,
    The interesting is that the PDF version
    of the SQL Language reference downloadable from the same 11.1 page that you indicated does contain REGEXP_COUNT :) :)

    The "gold dream" ?
    To use the documentation all the time, except while answering the quizzes :) :) :)

    Best Regards,
    Iudith

    ReplyDelete
  6. Yes, indeed a good lesson learned. Yesterday and today the first thing I did was check the minimal version.

    ReplyDelete
  7. Thanks to you Iudith for enhancing my function.
    Niels

    ReplyDelete