Public Report
Report From: Delphi-BCB/RTL/Delphi/ConvUtils
Report #:  100685   Status: Open
Ord and Chr got out of sync due to Unicode switch
Project:  Delphi Build #:  XE2
Version:    16.0 Submitted By:   Dmitry Burov
Report Type:  Feature Specification issue Date Reported:  11/3/2011 6:05:58 AM
Severity:    Infrequently encountered problem Last Updated: 3/20/2012 2:24:39 AM
Platform:    All versions Internal Tracking #:   288670
Resolution: None (Resolution Comments) Resolved in Build: : None
Duplicate of:  None
Voting and Rating
Overall Rating: No Ratings Yet
0.00 out of 5
Total Votes: 10
Description
Actually XE2 Update 1. The build list does not offer a single update, even after Update 2 was released.


In standard Pascal, Chr and Ord were considered mutually inverse functions.

Chr(Ord(x)) = x and Ord(Chr(i)) = i: either both hold, or they fail only because of out-of-range values.
http://writerguy.users.btopenworld.com/Pascal/pascal3.html


With Chr now returning WideChar and having no overload that returns AnsiChar, this became broken!

This obviously means that char constants and char variables are of DIFFERENT types !!!
Maybe that is also true for string constants and string variables.


Related bugs:
http://qc.embarcadero.com/wc/qcmain.aspx?d=100687
http://qc.embarcadero.com/wc/qcmain.aspx?d=100688

Below, some letters were changed to question marks.
To see the Cyrillic glyphs and their order, please open http://en.wikipedia.org/wiki/Cyrillics
The "Letters" paragraph has a table "The early Cyrillic alphabet".
The first 5 letters have not changed to this day, and they are exactly the ones used in the example code.
Steps to Reproduce:
'?' here stands for a Cyrillic letter in the Windows-1251 codepage. IDE used: XE2 Upd1 on Windows 7 with Russian language; the .pas source files are in "ANSI" file format.

А and АБВГД: let me specify them as direct HTML entities, since the QC web front-end fails at Unicode :-(

Put 8 buttons on a form.
They should list the first 5 letters of the Russian alphabet: ?????
Instead they list Central European modifications of Latin A: A with diacritics.

This proves that Chr(Ord(Cyrillic "A")) == Latin "A" with diacritics <> original Cyrillic "A"
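
A minimal console sketch of the arithmetic behind this claim, assuming a Unicode Delphi (2009 or later) where Chr returns a WideChar. It is not part of the attached project; the program name and the helper RoundTrips are made up for illustration. It uses numeric codes instead of character literals, so the source file encoding does not matter. Windows-1251 stores Cyrillic "A" as byte $C0, Unicode places it at U+0410, and U+00C0 is Latin "A" with grave.

```pascal
program ChrOrdCodeUnits;
{$APPTYPE CONSOLE}

// Hypothetical helper: True when Chr and Ord invert each other for Code.
function RoundTrips(Code: Word): Boolean;
begin
  Result := Ord(Chr(Code)) = Code;
end;

var
  I: Integer;
begin
  for I := 0 to 4 do
  begin
    // Within WideChar the round trip holds for both ranges...
    Assert(RoundTrips($C0 + I));
    Assert(RoundTrips($0410 + I));
    // ...but the two ranges name different characters.
    Assert(Chr($C0 + I) <> Chr($0410 + I));
    Writeln($C0 + I, ' (Windows-1251 byte) vs ', $0410 + I, ' (Unicode code point)');
  end;
end.
```

So the round trip itself is intact once both sides agree on the type; what goes wrong in the steps below would be consistent with Ord receiving the one-byte code while Chr produces a two-byte character.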

--------------

procedure TForm2.btn1Click(Sender: TObject);
var i: Byte;
begin
  for i := 0 to 5 {33} do
     showmessage( IntToStr(i) + ' => ' +  Chr(Ord('?') + i));
end;

procedure TForm2.btn2Click(Sender: TObject);
var i: integer;
begin
  for i := 0 to 5 {33} do
     showmessage( IntToStr(i) + ' => ' +  Chr(Ord('?') + i));
end;

procedure TForm2.btn4Click(Sender: TObject);
var i: Byte;
begin
for i := 0 to 5 {33} do
    showmessage(  IntToStr(i) + ' => ' + Chr(Ord(AnsiChar('?')) + i));
end;

procedure TForm2.btn5Click(Sender: TObject);
var i: byte;
begin
for i := 0 to 5 {33} do
    showmessage(  IntToStr(i) + ' => ' +  Chr(Byte((Ord('?')) + i)));
end;

procedure TForm2.btn6Click(Sender: TObject);
var i: byte;
begin
for i := ord('?') to Ord('?') do
    showmessage(  IntToStr(i) + ' => ' +  Chr(i));
end;
Workarounds
See above about QC deleting the important Cyrillic letters here.

(*** We are hackers. Let us ignore the Pascal standard and force a typecast of blittable types ***)
procedure TForm2.btn7Click(Sender: TObject);   var i: byte;
begin
for i := 0 to 5 {33} do
    showmessage( IntToStr(i) + ' => ' + AnsiChar(Byte((Ord('?')) + i)) ) ;
end;


(**** forcing compiler to acknowledge that char constants and char variables should belong to same type ***)
procedure TForm2.btn8Click(Sender: TObject);
var i: byte;
begin
  for i := 0 to 5 {33} do
     showmessage( IntToStr(i) + ' => ' +  Chr(Ord(WideChar('?')) + i));
end;

(*** same as above, yet less obvious but also less boilerplate ****)
procedure TForm2.btn3Click(Sender: TObject);
var i: Byte;
begin
for i := 0 to 5 {33} do
   showmessage(  IntToStr(i) + ' => ' + Chr($410 + i));
end;
Attachment
QC100685 - Unicode_Chr.zip
Comments

Dmitry Burov at 11/3/2011 6:06:58 AM -
This problem with Apple cross-compiling may be related.

https://forums.embarcadero.com/thread.jspa?messageID=406229&tstart=0#406229

Dmitry Burov at 11/3/2011 6:09:57 AM -
Changing the source file format to UCS-2 or UTF-8 does not change a thing.
The compiler still treats constants and variables as different types!

Dmitry Burov at 11/3/2011 6:22:52 AM -
I wanted to disable Unicode and try plain ANSI to see if the bug goes away.

Alas, I could not do it.


ms-help://embarcadero.rs_xe2/rad/Unicode_in_RAD_Studio.html
This says Unicode can be disabled:
"by default, the type string is now a Unicode string " - which means there is a non-default mode.


https://forums.embarcadero.com/message.jspa?messageID=15778
This says Unicode cannot be disabled.

Dmitry Burov at 11/3/2011 6:32:10 AM -
Related bugs:

http://qc.embarcadero.com/wc/qcmain.aspx?d=100687
http://qc.embarcadero.com/wc/qcmain.aspx?d=100688

Dmitry Burov at 11/3/2011 7:50:30 AM -

(**** For char in set - failure  ***)
procedure TForm2.btn10Click(Sender: TObject); var c:Char;
begin
  for c in ['?'..'?'] do showmessage(c);
end;

(**** For AnsiChar in 'WideSet': no Unicode - no problem ***)

procedure TForm2.btn13Click(Sender: TObject); var c: AnsiChar;
begin
  for c in [WideChar('?')..WideChar('?')] do showmessage(c);
end;


(* for char in 'WideSet' - unexpected failure
   Maybe the WideChar values were narrowed down to AnsiChar before
   set creation ? But then there SHOULD be a compiler warning !!!

   I can see no other explanation *)

procedure TForm2.btn12Click(Sender: TObject); var c: Char;
begin
  for c in [WideChar('?')..WideChar('?')] do showmessage(c);
end;



(********  for char in string - sudden success  *****)
(**** maybe due to string.CodePage that Char lacks
  or maybe because  string ==UnicodeString  ***)

procedure TForm2.btn9Click(Sender: TObject); var c: Char;
begin
  for c in '??????' do showmessage(c);
end;

(************  for AnsiChar in AnsiString: same here  ******)
procedure TForm2.btn11Click(Sender: TObject); var c: AnsiChar;
begin
  for c in AnsiString('??????') do showmessage(c);
end;

Tomohiro Takahashi at 11/6/2011 8:21:26 PM -
> [Workaround]
> ...
>     showmessage( IntToStr(i) + ' => ' + AnsiChar(Byte((Ord('?')) + i)) ) ;
Is your issue related to this article?
[Delphi in a Unicode World Part III: Unicodifying Your Code - Use of the Chr Function]
http://edn.embarcadero.com/article/38693

Dmitry Burov at 11/6/2011 11:58:35 PM -
In some distant sense, it is.

> Certain uses of the Chr function may result in the following error:
> [DCC Error] PasParser.pas(169): E2010 Incompatible types: 'AnsiChar' and 'Char'

Never met it. And for the snippets above, I would much rather see this error than get incorrect code generated.


> Can be changed to MyChar := AnsiChar(i);

Obvious to me, though maybe not so for others.
But I have a background in assembler and in C (in C, Byte and Char are one and the same type, not 2 distinct types as in Pascal/Delphi).

Chr/Ord were required in original Pascal for typecasting, which was missing from the language.
Since Turbo Pascal introduced manual type overriding, many built-in functions became alternative ways to achieve the same typecasts.
Also, standard library functions should be more portable (platform-independent) than typecasts (which are an intentional break of compiler-enforced type safety). They are expected to work even when a direct typecast fails and has to be split into separate, platform-specific $IfDef'ed codepaths.



I think, overall, this article would be a good find in the offline help, linked to from char-related types and functions.

This, however, does not mean that Chr(Ord(char const)) should no longer work.
From what I observe, I believe the main reason for the failure is that the compiler treats char literal constants like "?" as being AnsiChar, rather than WideChar. Hence, it has no direct relation to the quotes above.
This is in sharp contrast with types like PChar, char and string, and with functions like Chr, etc.: all the latter are treated as WideChar-based, rather than AnsiChar. This split is very counter-intuitive and leads to unexpected results.

The article also has "Using Character Literals" section.
>  if Edit1.Text[1] = #128 then
> will evaluate to False in Delphi 2009 because ... #128 is ... a control character in Unicode
>  if Edit1.Text[1] = '?' then
> will ... also work (i.e., recognize the Euro) in Delphi 2009 (where '?' is #$20AC)

This implies that #128 and '?' are treated as WideChar in if-conditions.
Not as AnsiChar implicitly cast to WideChar (if so, the first if would still work).
Nor is '?' an AnsiChar (because the article says it does not depend on the codepage, and because of the above).

If so, then the contrast with Ord('?') is even more mind-breaking.
That the same literal is treated so vastly differently in different contexts just seems to make no sense.

PS: I see Unicode characters are broken in this comment again.
QC is incapable of handling Unicode-related tickets, two years after Unicode support became a selling point for Delphi. Pity. It makes communication much harder than it should be :-(

Dmitry Burov at 11/7/2011 12:22:40 AM -
Added a few more tests to the sample app.
Direct string/char comparison works.
But as soon as Ord gets in the way, it breaks.


[Window Title] Project1
[Content] Ord values: 1041 == 193
[OK]
This math is admirable.


procedure TForm2.btn14Click(Sender: TObject);
begin
  if lbl1.Caption[Length(lbl1.Caption) - 3] = '?'
     then ShowMessage('Correct')  (** <--- here **)
     else ShowMessage ('Broken');

  if '?' = lbl1.Caption[Length(lbl1.Caption) - 3]
     then ShowMessage('Correct')  (** <--- here **)
     else ShowMessage ('Broken');

  if Ord(lbl1.Caption[Length(lbl1.Caption) - 3]) = Ord('?')
     then ShowMessage('Correct')
     else ShowMessage ('Broken');  (** <--- there **)

  if Chr(Ord(lbl1.Caption[Length(lbl1.Caption) - 3])) = '?'
     then ShowMessage('Correct')  (** <--- here **)
     else ShowMessage ('Broken');

  if lbl1.Caption[Length(lbl1.Caption) - 3] = Chr(Ord('?'))
     then ShowMessage('Correct')
     else ShowMessage ('Broken');  (** <--- there **)

  if lbl1.Caption[Length(lbl1.Caption) - 3] = Char(Ord('?'))
     then ShowMessage('Correct')
     else ShowMessage ('Broken');  (** <--- there **)

  {expected} ShowMessage ( lbl1.Caption[Length(lbl1.Caption) - 3] + #13#10 + '?' );
  {expected} ShowMessage ( '?' );
  {unexpected} ShowMessageFmt(' Ord values: %d == %d', [Ord(lbl1.Caption[Length(lbl1.Caption) - 3]) , Ord('?')]);
end;
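
The two numbers in the dialog above line up with one letter seen through two encodings: 1041 is U+0411 (Cyrillic capital Be) and 193 is $C1, the Windows-1251 byte for the same letter. A hedged sketch of bridging them explicitly through the code page instead of through Chr, assuming Windows with code page 1251 available and SysUtils.TEncoding (Delphi 2009 or later). The program name and the helper DecodeCp1251 are made up for illustration and are not part of the attached project.

```pascal
program DecodeCp1251Demo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: decodes one Windows-1251 byte into a UTF-16 Char.
function DecodeCp1251(B: Byte): Char;
var
  Enc: TEncoding;
  S: string;
begin
  // GetEncoding returns a new instance that the caller must free.
  Enc := TEncoding.GetEncoding(1251);
  try
    S := Enc.GetString(TBytes.Create(B));
    Result := S[1];
  finally
    Enc.Free;
  end;
end;

begin
  // $C1 is the byte value reported by Ord on the literal in the dialog above.
  Assert(Ord(DecodeCp1251($C1)) = $0411);
  Writeln(Ord(DecodeCp1251($C1)));
end.
```

This only shows how the two values relate. It does not change how the compiler types the literal, which is the actual subject of this report.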
