Public Report
Report From: Delphi-BCB/RTL/Delphi/Other Classes/TEncoding
Report #: 79042
Status: Open
Remove MB_ERR_INVALID_CHARS flag from TEncoding.UTF8
Project: Delphi
Build #: 14.0.3615.26342
Version: 14.0
Submitted By: Remy Lebeau (TeamB)
Report Type: Minor failure / Design problem
Date Reported: 10/27/2009 4:24:51 PM
Severity: Commonly encountered problem
Last Updated: 3/20/2012 2:24:39 AM
Platform: All platforms
Internal Tracking #: 273480
Resolution: None
Resolved in Build: None
Duplicate of: None
Voting and Rating
Overall Rating: No Ratings Yet (0.00 out of 5)
Total Votes: None
Description
TEncoding.UTF8 is the only encoding object that passes the MB_ERR_INVALID_CHARS flag to MultiByteToWideChar(). When a byte buffer passed to GetCharCount() or GetChars() ends with an incomplete character sequence (because the sequence straddles a buffer boundary), the entire decode operation fails, even though the rest of the buffer contains fully decodable sequences.
An encoding object obtained from TEncoding.GetEncoding(65001) does not use the MB_ERR_INVALID_CHARS flag, so GetCharCount() and GetChars() process the complete sequences correctly and ignore any partial sequence at the end of the byte buffer.
MB_ERR_INVALID_CHARS should be removed from the SysUtils.TUTF8Encoding class.
Steps to Reproduce:
// using TEncoding.UTF8...
var
  utf8: TBytes;
  utf16: TCharArray;
begin
  SetLength(utf8, 8);
  utf8[0] := Ord('T');
  utf8[1] := Ord('e');
  utf8[2] := Ord('s');
  utf8[3] := Ord('t');
  utf8[4] := Ord(' ');
  // UTF-8 encoding of Greek capital letter Pi (U+03A0), for example
  utf8[5] := $CE;
  utf8[6] := $A0;
  utf8[7] := 0;
  // decode only the first 6 bytes, so the input ends after the lead byte $CE
  utf16 := TEncoding.UTF8.GetChars(utf8, 0, 6);
  // utf16 is completely empty!
end;
// using TEncoding.GetEncoding(65001)...
var
  utf8: TBytes;
  utf16: TCharArray;
  enc: TEncoding;
begin
  SetLength(utf8, 8);
  utf8[0] := Ord('T');
  utf8[1] := Ord('e');
  utf8[2] := Ord('s');
  utf8[3] := Ord('t');
  utf8[4] := Ord(' ');
  // UTF-8 encoding of Greek capital letter Pi (U+03A0), for example
  utf8[5] := $CE;
  utf8[6] := $A0;
  utf8[7] := 0;
  enc := TEncoding.GetEncoding(65001);
  try
    // decode only the first 6 bytes, so the input ends after the lead byte $CE
    utf16 := enc.GetChars(utf8, 0, 6);
  finally
    enc.Free;
  end;
  // utf16 contains 'Test ' as expected...
end;
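Not part of the original report: for code that reads UTF-8 in chunks and wants to keep using TEncoding.UTF8 as it stands, one possible approach is to decode only the bytes that form complete sequences and carry the trailing partial sequence over to the next buffer. The sketch below assumes a hypothetical helper, CompleteUtf8PrefixLength, which is not an RTL function. It only inspects the tail of the range and leaves genuinely malformed input for the decoder to reject.

```pascal
program TrimUtf8Demo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // TBytes, TCharArray, TEncoding

// Hypothetical helper (not part of the RTL): returns how many leading bytes of
// Bytes[Index .. Index + Count - 1] can be decoded without cutting the last
// UTF-8 sequence in half.
function CompleteUtf8PrefixLength(const Bytes: TBytes; Index, Count: Integer): Integer;
var
  I, Needed: Integer;
  Lead: Byte;
begin
  Result := Count;
  // Walk back over at most three trailing continuation bytes (10xxxxxx).
  I := Index + Count - 1;
  while (I >= Index) and (Index + Count - 1 - I < 3) and
        ((Bytes[I] and $C0) = $80) do
    Dec(I);
  if I < Index then
    Exit; // empty range, or only continuation bytes: nothing to trim here

  // Work out how many bytes the last lead byte announces.
  Lead := Bytes[I];
  if (Lead and $80) = 0 then
    Needed := 1
  else if (Lead and $E0) = $C0 then
    Needed := 2
  else if (Lead and $F0) = $E0 then
    Needed := 3
  else if (Lead and $F8) = $F0 then
    Needed := 4
  else
    Exit; // not a lead byte: leave the data for the decoder to reject

  // Drop the final sequence only if it is shorter than announced.
  if (Index + Count - I) < Needed then
    Result := I - Index;
end;

var
  utf8: TBytes;
  utf16: TCharArray;
  Len: Integer;
begin
  SetLength(utf8, 8);
  utf8[0] := Ord('T');
  utf8[1] := Ord('e');
  utf8[2] := Ord('s');
  utf8[3] := Ord('t');
  utf8[4] := Ord(' ');
  utf8[5] := $CE; // lead byte of Greek capital letter Pi
  utf8[6] := $A0;
  utf8[7] := 0;

  // The first 6 bytes end in a lone lead byte, so only 5 bytes are complete.
  Len := CompleteUtf8PrefixLength(utf8, 0, 6);
  utf16 := TEncoding.UTF8.GetChars(utf8, 0, Len);
  Writeln(Len); // prints 5
  // The remaining 6 - Len bytes would be prepended to the next buffer.
end.
```

With this approach the strict MB_ERR_INVALID_CHARS checking still applies to everything that is actually decoded, so invalid bytes in the middle of the data are reported rather than skipped.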
Workarounds
None
Attachment
None
Comments
Michiel Spoor at 1/17/2013 12:07:44 AM
Proposed solution does not look good. It silently ignores the unconvertible token.
Please NEVER silently ignore errors!
Item #111980 seems related