30 January 2011

Even our assumptions help develpers!

I thought I would share this email received from a player last week:

Steven,

I took an extra moment today to look as the updated assumptions on the PL/SQL Quiz and found the DBMS_SESSION notes really useful to resolve a curious problem we’ve been having ...

The problem – for various reasons – came down to the simple reproducible SQL statement:-
select * from dual
where regexp_like('aa', '[ -~]');
If you run the following SQL:-
exec DBMS_SESSION.SET_NLS ('NLS_LANGUAGE'  ,'DUTCH');
then the regexp_like command fails with:
ORA-12728: invalid range in regular expression
Our problem was that, for the DBMS_JOB that uses this regular expression, the database server is running on a Dutch system so that task does not run automatically and has to be done manually. We can now use DBMS_SESSION to set it to American to enable it to run automatically.

Now it would be rather evil to pose a PL/SQL Quiz on this topic ...

I’ve no doubt there’s an explanation for the problem buried somewhere in how Oracle handles ‘~’ as a combining character in a UTF-8 database when the language used for the session is one that has such combined characters ...

Anyway – thought it would provide some amusement and I wanted to say “thanks”.
Tim

4 comments:

  1. Tim, maybe I can provide part of the reason.

    Try this:

    create table plch_dutch (c varchar2(10))
    /

    declare
    TIC constant char(1) := chr(39);
    procedure get_values(p_lang in varchar2) is
    s varchar2 (30) := p_lang;
    begin
    for r in (select c from plch_dutch order by c) loop
    s := s || TIC||r.c||TIC||' ';
    end loop;
    dbms_output.put_line(s);
    end;
    begin
    insert into plch_dutch values (' ');
    insert into plch_dutch values ('-');
    insert into plch_dutch values ('~');
    commit;

    dbms_session.set_nls('NLS_LANGUAGE','AMERICAN');
    get_values('American: ');

    dbms_session.set_nls('NLS_LANGUAGE','DUTCH');
    get_values('Dutch: ');

    end;
    /

    American: ' ' '-' '~'
    Dutch: '-' '~' ' '

    And why is that?
    You can query NLS_SESSION_PARAMETERS to see the effects of NLS_LANGUAGE and NLS_TERRITORY.
    One of the parameters affected by NLS_LANGUAGE is NLS_SORT. American=BINARY, Dutch=DUTCH

    Query NLS_DATABASE_PARAMETERS to see the database default settings.

    Why your particular combination of characters raise the error, I do not know. Despite its undisputed usefulness, I am very regexp illeterate. Maybe someone else can jump in an provide the explanation?

    Mike

    ReplyDelete
  2. I thought it would be something like that: the RE is looking for any character that needs to be encoded in an email and is, in the application, more complex than the above sample. What your code clearly shows is that in the character range between space and tilde in American is reversed in Dutch - like asking it to look for all numbers between 127 and 32, instead of 32 and 127.

    The DBMS_SESSION.SET_NLS prevents us having to ask the Dutch DBAs to run their database in English.

    Another lesson learnt - and glad that wasn't a quiz lesson !!

    Tim

    ReplyDelete
  3. Yes, the NLS_SORT can be quite tricky when a programmers mind often assumes BINARY :-)

    We (a danish company) had a swedish company writing some code for us - did not work as specified in production :-(

    Turns out that for some reason in NLS_SORT=DANISH letters are sorted BEFORE digits, but NLS_SORT=SWEDISH acts like NLS_SORT=BINARY in that letters are sorted AFTER digits.

    Example:

    SQL> with s as (
    2 select 'A' txt from dual
    3 union all
    4 select '1' txt from dual
    5 )
    6 select s.txt
    7 from s
    8 order by
    9 nlssort(s.txt,'NLS_SORT=BINARY');

    T
    -
    1
    A

    SQL> with s as (
    2 select 'A' txt from dual
    3 union all
    4 select '1' txt from dual
    5 )
    6 select s.txt
    7 from s
    8 order by
    9 nlssort(s.txt,'NLS_SORT=DANISH');

    T
    -
    A
    1

    SQL> with s as (
    2 select 'A' txt from dual
    3 union all
    4 select '1' txt from dual
    5 )
    6 select s.txt
    7 from s
    8 order by
    9 nlssort(s.txt,'NLS_SORT=SWEDISH');

    T
    -
    1
    A


    Advice: Be very aware of your NLS_SORT setting :-)

    Sometimes it can make sense to use NLS_SORT=BINARY for both your database and client settings - even though you may have different NLS_TERRITORY/NLS_LANG settings. It depends on what your application expects your sorting to be...

    ReplyDelete
  4. Yes, it may help "develpers" but does it helps true "developErs" :) ?

    ReplyDelete