Public Report
Report From: Delphi-BCB/Compiler/Delphi/Other Compiler
Report #: 119501
Status: Open
[iOS, Android] Add support for 8-bit COW strings on mobile platforms
Project: Delphi
Build #: 19.0.13476.4176
Version: 19.0
Submitted By: Dalija Prasnikar
Report Type: Basic functionality failure
Date Reported: 10/4/2013 1:00:37 AM
Severity: Serious / Highly visible problem
Last Updated: 2/5/2014 6:22:04 PM
Platform: Apple mobile OS
Internal Tracking #: 47291
Resolution: None
Resolved in Build: None
Duplicate of: None
Voting and Rating
Overall Rating: 4.90 out of 5 (59 Total Ratings)
Total Votes: 748
Description
Please add support for 8-bit COW strings on mobile platforms.

8-bit strings are an important part of the Delphi language on desktop;
not having them on mobile platforms is not an improvement of the language but a crippling of it.

Changing existing code that uses 8-bit strings and that could run unchanged on mobile platform is a very hard and costly process. Accommodating the changes can also lead to suboptimal performance of that code. All that can be a serious showstopper for using Delphi on mobile platforms.
Steps to Reproduce:
See:
https://forums.embarcadero.com/thread.jspa?threadID=93670&tstart=0&start=0

http://blog.marcocantu.com/blog/strings_immutability_cow_ansistrings.html
Workarounds
None
Attachment
None
Comments

Roger Connell at 10/8/2013 3:23:18 PM -
The whole attraction of Delphi Mobile to me is the ability to use existing code and existing applications with a new mobile interface.

The total lack of support for 8 bit strings seems to block this path. If I wanted to make new mobile developments from scratch it is unlikely Delphi would be my product of choice. The potential value rests mainly in the potential compatibility with existing applications and code.

Dalija Prasnikar at 10/9/2013 1:56:45 AM -
Same here. The ability to reuse existing code is completely broken for me because of the lack of 8-bit strings.

Jared Davison at 10/8/2013 6:51:12 PM -
The semantics of using an array of byte versus a copy-on-write (COW) byte string are different. Converting an AnsiString to an array of byte means that the = operator no longer tests the contents of the strings for equality but compares the references. Finding all the tests for equality in our code is a lot of work!
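A minimal Delphi sketch of the difference described above, assuming a console project on Delphi XE2 or later: `=` on two TBytes values compares the array references, so a migration from AnsiString to TBytes needs an explicit content comparison. `SameBytes` is a hypothetical helper written for this example; `CompareMem` is the routine from System.SysUtils.

```pascal
program ByteEqualityDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: content equality for byte arrays, which is what
// '=' gave us for AnsiString values.
function SameBytes(const A, B: TBytes): Boolean;
begin
  Result := (Length(A) = Length(B)) and
    ((Length(A) = 0) or CompareMem(@A[0], @B[0], Length(A)));
end;

var
  A, B: TBytes;
begin
  A := TEncoding.ASCII.GetBytes('abc');
  B := TEncoding.ASCII.GetBytes('abc');
  WriteLn(A = B);            // FALSE: two distinct arrays, compared by reference
  WriteLn(SameBytes(A, B));  // TRUE: same length and same bytes
end.
```

Every `=` between byte arrays in migrated code would have to become a call like this, which is why finding those comparisons matters.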

Dmitry Nikolaenko at 10/9/2013 4:59:16 AM -
Let's look through the source code of System.pas

  {$IFDEF NEXTGEN}
   ...
  _AnsiStr = _AnsiString;
   ...
  {$ELSE}
   ...
  _AnsiStr = AnsiString;
   ...
  {$ENDIF}

And further in the code:

  TMarshal = class(TObject)
    class function AsAnsi(const S: string): _AnsiStr; overload; inline; static;

  VarToLStrProc: procedure (var Dest: _AnsiStr; const Source: TVarData) = nil;   // for internal use only

  procedure _LGetDir(D: Byte; var S: _AnsiStr);

  procedure WideCharLenToStrVar(Source: PWideChar; SourceLen: Integer; var Dest: _AnsiStr); overload;

procedure OleStrToStrVar(Source: PWideChar; var Dest: _AnsiStr); overload;
function StringToOleStr(const Source: _AnsiStr): PWideChar; overload;

procedure _ReadLString(var t: TTextRec; var s: _AnsiStr; CodePage: Word);

function _WriteLString(var t: TTextRec; const s: _AnsiStr; width: Longint): Pointer;
function _Write0LString(var t: TTextRec; const s: _AnsiStr): Pointer;

function _NewAnsiString(CharLength: LongInt; CodePage: Word): Pointer;

function _LStrLen(const S: _AnsiStr): Longint; inline;

...and so on, and so forth.

I.e., AnsiString is available in System.pas for internal use (and the compiler supports AnsiString), but this capability is closed to everyone else. In my opinion, this is not quite fair.

Joe Petrie at 1/31/2014 1:15:25 AM -
Yes.  Very unfair.


I just wasted 2 hours trying to figure out how to pass a Unicode string to a C library that will be included with my Android app and that, of course, uses C strings. A forum user brought MarshaledAString to my attention, which is just a PAnsiChar. It worked great for sending constant string values to the library and reading strings from the library. However, I still had to do this to convert a Unicode string variable before passing it to the library:

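function GetBytesZero(const S: string): TBytes;
var
  Len: Integer;
begin
  // Bytes needed for S in the ANSI code page (can exceed Length(S) for multibyte code pages)
  Len := TEncoding.ANSI.GetByteCount(S);
  // One extra byte for the #0 terminator that C code expects
  SetLength(Result, Len + 1);
  // The third parameter of GetBytes is a character count, not a byte count
  TEncoding.ANSI.GetBytes(S, Low(S), Length(S), Result, 0);
  Result[High(Result)] := 0;
end;

MarshaledAString(GetBytesZero(S))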

Really??  It's not like I could have had the third party vendor update their library to Unicode to compensate for the fact that the vendor of my programming tool decided to take away fundamental string types.

All 8-bit string types need to be put back where they belong.
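A possible alternative to the hand-written GetBytesZero, sketched on the assumption that the TMarshaller record declared in System.pas (XE4 and later) behaves as it is commonly used: `AsUtf8` returns a TPtrWrapper to a #0-terminated UTF-8 copy of the string, and that memory is released when the TMarshaller variable goes out of scope. `CStrLen` and `Utf8LengthSeenByC` are hypothetical names; `CStrLen` is a Pascal stand-in for the C library routine so the sketch runs without the library. For a library that expects the ANSI code page, `AsAnsi` would take the place of `AsUtf8`.

```pascal
program MarshallerDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical stand-in for a C function expecting a #0-terminated 8-bit
// string; a real app would declare the library routine as external cdecl.
function CStrLen(P: MarshaledAString): Integer;
begin
  Result := 0;
  while PByte(P)[Result] <> 0 do  // PByte has pointer math enabled
    Inc(Result);
end;

// Returns the number of bytes the C side sees for S once it is UTF-8 encoded.
function Utf8LengthSeenByC(const S: string): Integer;
var
  M: TMarshaller;  // owns the temporary buffer, released when M goes out of scope
begin
  // AsUtf8 makes a #0-terminated UTF-8 copy; ToPointer exposes its address
  Result := CStrLen(M.AsUtf8(S).ToPointer);
end;

begin
  WriteLn(Utf8LengthSeenByC('abc'));
  WriteLn(Utf8LengthSeenByC(#$00F1 + 'ino'));  // U+00F1 is two bytes in UTF-8
end.
```

The buffer is only valid while `M` is in scope, so the pointer must not be kept by the C side after the call returns.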

Joe Petrie at 1/31/2014 1:23:50 AM -
So basically, what should have been a non-issue, getting some already-written desktop code to compile into my Android app, cost me 2 hours of unnecessary work.

Everest Software International Director at 10/9/2013 9:47:46 PM -
You cannot wish ANSI or UTF8 away and so we still need to work with this data.

In fact UTF-8 is the default string type on the iPhone and on the Web.

All available awkward workarounds like:
TMarshal.ReadStringAsUtf8(TPtrWrapper.Create(buffer))
make code harder to write and read.

This reduces our ability to serve our customers and makes us waste time on workarounds.

And replacing compiler support for these types with any RTL surrogates is just not good enough. It has to be supported in the compiler.

Alexander Snigerev at 10/11/2013 4:50:32 AM -
There actually IS support for ANSI characters:

  PAnsiChar=  MarshaledAString;
  AnsiChar=^MarshaledAString;

---
so stop compiler guys' slovenliness and profanity. Stop lame excuses.
We want compatibility and flexibility back.

Alysson Cunha at 10/11/2013 5:24:37 AM -
Java, C# and JavaScript work only with UTF-16 in memory, just like Delphi's NEXTGEN compiler. They use ANSI or UTF-8 only when storing a string somewhere or loading it from somewhere; in memory it is just UTF-16.

There are several benefits of working with UTF-16 instead of ANSI or UTF-8:

UTF-16 vs ANSI: internationalization.
UTF-16 vs UTF-8: with UTF-8 it is hard to determine the length of the string and hard to locate characters correctly by index.

And by using just one charset when working with strings in memory, the compiler becomes less complex.

I think it's time to evolve! Everybody is evolving!
It's not too hard to change the code to Load/Store strings between different charsets in Delphi.

Alysson Cunha at 10/11/2013 5:27:08 AM -
But even if you REALLY need to work with ANSI/UTF-8 strings in memory, the Delphi compiler has a really, reallyyyyyyyyyyy nice feature that will help you deal with this easily, with nicely readable code:

Records + class operators.

I will make a sample here for you, guys...

Alysson Cunha at 10/11/2013 6:12:28 AM -
Please see this! It will probably help you guys A LOT!

I implemented a TAnsiString type (a record with class operators) that will work with AnsiStrings in-memory in a clean way, with nice readable code.


TAnsiString type implementation: http://pastebin.com/aduEzg8M
TAnsiString use demo: http://pastebin.com/RNz4Yhc0


I wrote it really quickly, just as an example. It does not have COW behaviour, but with a little effort you can add COW behaviour to TAnsiString.

You can add more Implicit/Explicit class operators to support conversions to other types, and it's easy!

I hope this helps you. As you saw, it's easy to implement, easy to customize and easy to suit to different situations!
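To make the record-plus-class-operators idea concrete without reproducing the pastebin code (which is not quoted here), a hypothetical minimal `TAnsiStr` for Delphi XE2 or later: it keeps its bytes in the system ANSI code page via TEncoding.ANSI, converts to and from `string` through Implicit operators, and restores content equality through an Equal operator. Like the version described above it has no copy-on-write: copies share one byte array, which is harmless here only because the record offers no way to modify the bytes in place.

```pascal
program AnsiRecordDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical minimal 8-bit string record, not a drop-in AnsiString.
  TAnsiStr = record
  private
    FBytes: TBytes;  // ANSI code page bytes, no #0 terminator
  public
    class operator Implicit(const S: string): TAnsiStr;
    class operator Implicit(const A: TAnsiStr): string;
    class operator Equal(const A, B: TAnsiStr): Boolean;
    function ByteLength: Integer;
  end;

class operator TAnsiStr.Implicit(const S: string): TAnsiStr;
begin
  Result.FBytes := TEncoding.ANSI.GetBytes(S);  // encode once, on assignment
end;

class operator TAnsiStr.Implicit(const A: TAnsiStr): string;
begin
  Result := TEncoding.ANSI.GetString(A.FBytes);  // decode back to UTF-16
end;

class operator TAnsiStr.Equal(const A, B: TAnsiStr): Boolean;
begin
  // Compare contents, as '=' does for AnsiString, not the array references
  Result := (Length(A.FBytes) = Length(B.FBytes)) and
    ((Length(A.FBytes) = 0) or
     CompareMem(@A.FBytes[0], @B.FBytes[0], Length(A.FBytes)));
end;

function TAnsiStr.ByteLength: Integer;
begin
  Result := Length(FBytes);
end;

var
  X, Y: TAnsiStr;
  S: string;
begin
  X := 'hello';
  Y := 'hel' + 'lo';
  S := X;
  WriteLn(X = Y, ' ', X.ByteLength, ' ', S);  // TRUE 5 hello
end.
```

Each assignment from `string` encodes and each read back decodes, so this trades the compiler's native handling for explicit conversions at the record boundary.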

Alexander Snigerev at 10/11/2013 7:12:25 AM -
Those samples are many times SLOWER than the built-in implementation. That matters A LOT on mobile!

No, we NEED a first-class solution! A compiler-backed one, just as on desktop.
For God's sake! UTF-8 is the SYSTEM string type of the Linux kernel/Android! It deserves compiler support.

Alysson Cunha at 10/11/2013 12:01:55 PM -
I think you misunderstood what UTF-8 is. The "8" in the name UTF-8 does not mean that each character in the UTF-8 encoding occupies 1 byte/8 bits. Each UTF-8 encoded char may occupy 1, 2, 3 or 4 bytes. It varies according to the char.
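For reference, RFC 3629 limits UTF-8 to 1 to 4 bytes per code point (the original design allowed sequences of up to 6 bytes, which are no longer valid). A small Delphi sketch of that rule, assuming XE2 or later; `Utf8SeqLen` is a name made up for this example.

```pascal
program Utf8SeqLenDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Bytes needed to encode one code point in UTF-8 (RFC 3629).
// Surrogate code points (U+D800..U+DFFF) are not rejected here, for brevity.
function Utf8SeqLen(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $80 then
    Result := 1          // 0xxxxxxx: ASCII
  else if CodePoint < $800 then
    Result := 2          // 110xxxxx 10xxxxxx
  else if CodePoint < $10000 then
    Result := 3          // 1110xxxx + 2 continuation bytes
  else if CodePoint <= $10FFFF then
    Result := 4          // 11110xxx + 3 continuation bytes
  else
    raise EArgumentOutOfRangeException.Create('Not a Unicode code point');
end;

begin
  WriteLn(Utf8SeqLen(Ord('A')));  // 1
  WriteLn(Utf8SeqLen($00F1));     // 2 (n with tilde)
  WriteLn(Utf8SeqLen($20AC));     // 3 (euro sign)
  WriteLn(Utf8SeqLen($1F600));    // 4 (outside the BMP)
end.
```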

Knowing that UTF-8 encoded chars vary its size, there are 3 huge issues when dealing with it.

1 - It's not possible to discover the Length of a UTF-8 encoded string without interpreting all chars of the sequence. (Performance issue)


2 - Imagine the following assignment on a imaginary native UTF8 String type:
  Str[512] := UpCase(Str[512]);

Getting the 512th character of the string is really tricky with UTF-8. Since chars may occupy different sizes in memory, to identify the 512th character you must scan the string from the beginning until you reach the 512th char.

Now imagine this issue inside a loop, and the performance the code would have. A simple "for i := 0 to Length(Str) - 1 do" would become a terrible O(N^2) algorithm instead of a simple O(N) one.

3 - The worst of all issues:
     Str[512] := UpCase(Str[512]);

Suppose that the 512th char occupies 2 bytes in memory, but we are trying to assign a 4-byte encoded character to it (suppose that the char returned by UpCase occupies 4 bytes in memory).

Ta da! What will you do? We have a 2-byte slot in memory, but we need a 4-byte slot. Reallocate the entire sequence to open a 4-byte slot before the assignment? Now imagine this situation inside a loop again. A really problematic performance issue.
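A sketch of issue 2, assuming UTF-8 data held in a TBytes (the XE5 mobile compilers have no UTF-8 string type) and Delphi XE2 or later: locating the N-th code point means scanning from the first byte, so calling a helper like the hypothetical `Utf8OffsetOf` once per index inside a loop costs O(N^2) in total, while walking the bytes once is O(N). Issue 3 follows from the same layout: a replacement with a different byte length would have to move every byte after it.

```pascal
program Utf8IndexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Byte offset of the code point with 0-based index N, or -1 if out of range.
// The cost grows with the offset: all earlier bytes must be examined.
function Utf8OffsetOf(const Bytes: TBytes; N: Integer): Integer;
var
  I, Seen: Integer;
begin
  Seen := -1;
  for I := 0 to High(Bytes) do
    if (Bytes[I] and $C0) <> $80 then  // not a continuation byte (10xxxxxx)
    begin
      Inc(Seen);
      if Seen = N then
        Exit(I);
    end;
  Result := -1;
end;

var
  Data: TBytes;
  N: Integer;
begin
  Data := TEncoding.UTF8.GetBytes('a' + #$00F1 + 'b');  // bytes 61 C3 B1 62
  // Quadratic pattern: every lookup rescans from byte 0
  for N := 0 to 2 do
    Write(Utf8OffsetOf(Data, N), ' ');  // 0 1 3
  WriteLn;
end.
```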

----------------------------------------

Don't get me wrong, UTF-8 is very useful. It can store international strings in less memory. It's really GREAT when you are exchanging data with another party (like over HTTP connections) or when you are storing strings somewhere (like in HTML files or in the file system).

UTF-8 is used only for exchanging or storing data. When working with or processing strings, the UTF-8 encoded string is first decoded to some fixed-length char encoding.

Java, Linux, JavaScript, C# and Delphi (YES!) use this idea. Almost every Linux distribution uses UTF-32 as its working charset. Java (even on Android), C#, JavaScript and Delphi use UTF-16 as their working charset.

And YES, Delphi brought to us a 1st-class way to deal with UTF-8. TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString are very similar to the way Java and C# encode strings.
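A sketch of the decode, process, encode pattern with the TEncoding calls named above, assuming Delphi XE3 or later (for the TStringHelper method `ToUpper`): the UTF-8 input is decoded once, processed as an ordinary UTF-16 string, and encoded once, so the whole operation stays O(N). `Utf8UpperCase` is a hypothetical name.

```pascal
program Utf8RoundTripDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Decode UTF-8 once, work on a UTF-16 string, encode back once.
function Utf8UpperCase(const Utf8: TBytes): TBytes;
var
  S: string;
begin
  S := TEncoding.UTF8.GetString(Utf8);   // UTF-8 bytes -> UTF-16 string
  S := S.ToUpper;                        // ordinary string processing
  Result := TEncoding.UTF8.GetBytes(S);  // UTF-16 string -> UTF-8 bytes
end;

var
  Output: TBytes;
begin
  Output := Utf8UpperCase(TEncoding.UTF8.GetBytes('hello'));
  WriteLn(TEncoding.UTF8.GetString(Output));  // HELLO
end.
```

The cost is two full conversions and a temporary UTF-16 copy, which is the overhead the 8-bit string proponents in this thread object to.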

Alexander Snigerev at 10/12/2013 3:03:11 AM -
>I think you misunderstood what UTF-8 is.
You are wrong.
>Each UTF-8 encoded char may occupy 1, 2, 3 or 4 bytes. It varies
>according to the char.
JUST AS IN UTF-16. There are surrogates.
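The point in code, assuming Delphi XE3 or later (for `Low`/`High` on strings): a code point above U+FFFF takes two UTF-16 code units (a surrogate pair), so `Length` on a Delphi string counts code units, not code points. The sketch applies the standard UTF-16 formula; `ToSurrogatePair` is a hypothetical helper name.

```pascal
program SurrogateDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Encode a supplementary code point (U+10000..U+10FFFF) as a UTF-16 pair.
function ToSurrogatePair(CodePoint: Cardinal): string;
begin
  Assert((CodePoint >= $10000) and (CodePoint <= $10FFFF));
  Dec(CodePoint, $10000);                        // 20 bits remain
  Result := Char($D800 + (CodePoint shr 10)) +    // high surrogate: top 10 bits
            Char($DC00 + (CodePoint and $3FF));   // low surrogate: bottom 10 bits
end;

var
  S: string;
begin
  S := ToSurrogatePair($1F600);
  WriteLn(Length(S));  // 2 code units for 1 code point
  WriteLn(IntToHex(Ord(S[Low(S)]), 4), ' ', IntToHex(Ord(S[High(S)]), 4));  // D83D DE00
end.
```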

>Knowing that UTF-8 encoded chars vary its size, there are 3 huge
>issues when dealing with it.

>1 - It's not possible to discover the Length of a UTF-8 encoded
>string without interpreting all chars of the sequence.
>(Performance issue)
So what?
n~ino (n + combining tilde + ino): 5 characters, 4 glyphs, 4 meaningful characters!
[ñino]
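The example can be reproduced with the two Unicode spellings of "ñino": the precomposed U+00F1, or n followed by U+0303 COMBINING TILDE. A sketch for Delphi XE2 or later (the variable names are illustrative) showing that neither UTF-16 `Length` nor the UTF-8 byte count equals the number of visible glyphs, and that `=` compares code units, so the two spellings are unequal.

```pascal
program CombiningDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Precomposed, Decomposed: string;
begin
  Precomposed := #$00F1 + 'ino';       // U+00F1 LATIN SMALL LETTER N WITH TILDE
  Decomposed := 'n' + #$0303 + 'ino';  // 'n' + U+0303 COMBINING TILDE
  // Both render as the same 4 glyphs
  WriteLn(Length(Precomposed), ' ', TEncoding.UTF8.GetByteCount(Precomposed));  // 4 5
  WriteLn(Length(Decomposed), ' ', TEncoding.UTF8.GetByteCount(Decomposed));    // 5 6
  WriteLn(Precomposed = Decomposed);   // FALSE: no normalization is applied
end.
```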

>2 - Imagine the following assignment on a imaginary native UTF8

This is the same issue as the 1st one. Again, UTF-16 is NOT a solution, keeping in mind accents, diacritical marks, and other Unicode surrogates.

>3..
These are all lame excuses. There are numerous ways to shoot yourself in the foot when dealing with true Unicode. UTF-16 won't save you.
---
Don't get me wrong. UTF-8 is a MUST. How am I supposed to work with code such as the GOOGLE GUMBO PARSER? Convert UTF-16->UTF-8 and back? That's what I call a "Performance issue", dude.

Again, UTF-16 IS NOT FIXED-LENGTH, goto [school] and discover that.
And the Linux kernel IS UTF-8 based, stop lying.
http://stackoverflow.com/questions/10051559/which-unicode-encoding-does-the-linux-kernel-use

UTF-8 is the native way to deal with LINUX/UNIX systems at THE low LEVEL, which is where Delphi works.
>And YES, Delphi brought to us a 1st-class way to deal with
>UTF-8.
AND NO, this is a lame way to do things, and I will not put up with it.
I need the freedom of C++, not a JAVA or C# cage.

Alysson Cunha at 10/12/2013 4:57:42 AM -
I know that UTF-16 is not fixed size, and I also know about the surrogates in UTF-16, but as you know, 16 bits (code points U+0000 to U+D7FF and U+E000 to U+FFFF) are enough to cover the BMP (Basic Multilingual Plane, see http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane)

"The BMP contains characters for almost all modern languages, and a large number of special characters"... The Surrogates are rarely used, because the most commonly used characters are all in the Basic Multilingual Plane

So, dealing with UTF-16 basically as fixed size (with surrogates, as many programming languages do) has enormous advantages.
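What "basically fixed size, with surrogates" means in practice can be checked directly: a string is fixed-width in UTF-16 exactly when it contains no surrogate code units. A sketch for Delphi XE2 or later with two hypothetical helpers: one tests for the BMP-only case, the other counts code points by skipping low surrogates (it assumes well-formed UTF-16).

```pascal
program BmpDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// True if every Char of S is a whole BMP code point (no surrogates),
// i.e. Length(S) equals the number of code points.
function IsBmpOnly(const S: string): Boolean;
var
  C: Char;
begin
  for C in S do
    if (C >= #$D800) and (C <= #$DFFF) then
      Exit(False);
  Result := True;
end;

// Code points in well-formed UTF-16: each low surrogate completes a pair
// that was already counted at its high surrogate.
function CodePointCount(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if (C < #$DC00) or (C > #$DFFF) then
      Inc(Result);
end;

begin
  WriteLn(IsBmpOnly('Delphi'), ' ', CodePointCount('Delphi'));  // TRUE 6
  WriteLn(IsBmpOnly('a'#$D83D#$DE00), ' ',
          CodePointCount('a'#$D83D#$DE00));                     // FALSE 2
end.
```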

I do not see why Embarcadero should spend time dealing "with the exception", since the current implementation solves 99.99% of the problems with good performance. (TEncoding.UTF8.GetBytes/GetString deal with surrogates, so they support UTF-8 nicely in 99.999% of situations.)

Is Delphi's UTF-16 implementation correct? Technically speaking, I believe not. But I am totally satisfied, because it solves the BMP case easily and with good performance.

---------------------------

About Linux: following the link you gave, go to http://www.xsquawkbox.net/xpsdk/mediawiki/Unicode#Linux
and you will see there is a note: "Ben says: can anyone verify the above notes or fix it?" So that statement should not be taken as established fact.

I will try to find where I read that most Linux builds use UTF-32 as their in-memory working encoding and store/exchange strings as UTF-8.

Alexander Snigerev at 10/12/2013 8:15:05 AM -
The Linux API is often said to be Unicode-agnostic, which effectively means 8-bit. Which means UTF-8 for the Unicode case!

see http://www.man7.org/linux/man-pages/man7/unicode.7.html
"Under GNU/Linux, the C type wchar_t is a signed 32-bit integer type."
that is, UTF-32!

next:

"UCS/Unicode can be used just like ASCII in input/output streams,  terminal communication, plaintext files, filenames, and environment variables in the ASCII compatible UTF-8 multibyte encoding"

This is the COMMON CASE for kernel functions like fopen; see System.pas, where calling, say, AssignFile in POSIX environments leads us to
_UTF8Str(f.Name)

So stop lying about UTF-16 at the Linux low level, it's NON-EXISTENT.

Alysson Cunha at 10/13/2013 4:14:21 AM -
I never said that Linux uses UTF-16; I said that it uses UTF-32. As you showed us, wchar_t is 32 bits on Linux. I have never seen the inner code of Linux, but I believe it converts from UTF-8 to UTF-32 before processing strings in many situations.

We are talking about UTF-8 decoding when code traverses the string to get the glyphs/characters. Because a UTF-8 char may occupy more than 4 bytes, you would need a 64-bit integer to hold and interpret a UTF-8 glyph without decoding it, and I have never heard of anyone doing this.

Take a UTF-8 UpperCase function: decoding, processing and encoding back still occur. The difference is that the decoding/encoding is done char by char instead of decoding the entire string before processing and encoding the entire string back afterwards.

That's why I truly believe that Linux does what I said: it uses UTF-8 as the storage and exchange encoding, but when dealing with the content of the string, a UTF-8 to UTF-32 conversion is made.

Alexander Snigerev at 10/13/2013 1:40:04 PM -
That's utterly irrelevant, whatever encoding Linux uses internally: to exchange strings, Linux and the GNU libraries do use UTF-8.

And there are a lot of scenarios where UTF-8 is the best approach to dealing with strings, say HTML/XML, which is commonly UTF-8, too.
You can parse the structure as if it were ASCII, since all tags are basic characters; fast and effective. You can extract inner portions of text to UnicodeString or whatever.
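The "parse it as if it were ASCII" technique works because, in UTF-8, every byte of a multi-byte sequence is $80 or above, so the bytes for '<' ($3C) and '>' ($3E) can only be real delimiters. A sketch for Delphi XE2 or later, assuming the HTML already sits in a TBytes buffer: the hypothetical `TextRuns` scans the raw bytes and decodes only the text between tags with TEncoding.UTF8.GetString. It ignores comments, CDATA and attribute values containing '>', so it is not a real HTML parser.

```pascal
program Utf8ScanDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns the text runs found outside <...> in a UTF-8 buffer.
function TextRuns(const Html: TBytes): TArray<string>;
var
  I, Start, Count: Integer;
  B: Byte;
  InTag: Boolean;
begin
  Result := nil;
  Count := 0;
  Start := 0;
  InTag := False;
  for I := 0 to Length(Html) do
  begin
    if I = Length(Html) then
      B := Ord('<')  // sentinel: flush trailing text at the end of the buffer
    else
      B := Html[I];
    if not InTag and (B = Ord('<')) then
    begin
      if I > Start then
      begin
        // Only the text run is decoded; tag bytes are never converted
        SetLength(Result, Count + 1);
        Result[Count] := TEncoding.UTF8.GetString(Html, Start, I - Start);
        Inc(Count);
      end;
      InTag := True;
    end
    else if InTag and (B = Ord('>')) then
    begin
      InTag := False;
      Start := I + 1;
    end;
  end;
end;

var
  Run: string;
begin
  for Run in TextRuns(TEncoding.UTF8.GetBytes('<p>a' + #$00F1 + '</p>x')) do
    WriteLn(Run);  // two runs: 'a' + U+00F1, then 'x'
end.
```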

We don't need UnicodeString to disappear from Delphi; rather, we want AnsiString and UTF8String back, to ensure flexibility AND COMPATIBILITY with DESKTOP.

You think those chickens from Emb would dare to remove UTF8String from the desktop compiler? Never! They're aware that this would be a direct loss of money and customers.

So what the heck are we talking about?
We old-schoolers need compatibility between desktop and mobile! That's the whole point of Delphi mobile. Embarcadero needs it too, since they don't have many customers except old classic Delphi users.
So stop arguing, no good comes of it.
This request has already gained 400+ votes; this is the TOPMOST issue to fix.
Either they DO FIX it, or we will know that Embarcadero products, and the future plans we have connected with them, do not deserve trust, since the management has become deaf and unable to deal with customer challenges.

Everest Software International Director at 10/12/2013 3:44:09 AM -
Alysson, most of what you say is incomplete or incorrect.

UTF8 can span up to 7 bytes, but at the same time it's actually much easier to count the chars in UTF8 than in UTF16, exactly because you do not need to decipher every char, it's the power of the format.
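The counting argument can be made precise: every UTF-8 code point begins with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so the code-point count is the number of such bytes and nothing needs decoding. A sketch for Delphi XE2 or later (`Utf8CodePointCount` is a hypothetical helper); it is still one pass over the bytes, and it counts code points, not glyphs.

```pascal
program Utf8CountDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Counts UTF-8 code points by counting lead bytes; continuation bytes
// (10xxxxxx) are skipped, so no code point is actually decoded.
function Utf8CodePointCount(const Bytes: TBytes): Integer;
var
  B: Byte;
begin
  Result := 0;
  for B in Bytes do
    if (B and $C0) <> $80 then
      Inc(Result);
end;

begin
  WriteLn(Utf8CodePointCount(TEncoding.UTF8.GetBytes('abc')));          // 3
  WriteLn(Utf8CodePointCount(TEncoding.UTF8.GetBytes(#$00F1 + 'ino')));  // 4
end.
```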

And taking Delphi down to the Java level is not progress, it's degrading. Delphi strings in general have more power than just a text type; their application to binary data is unmatched. Taking that away would reduce our power significantly.

Surrogate types are the obvious band-aid. I used them for Android development in XE5 because I had to, but it's awkward and ugly: it takes longer to write, is harder to read, runs slower, and it's never going to be as elegant as built-in AnsiString, UTF8String & RawByteString with compiler support.

The fact that the lesser languages never had the power of Pascal string type is no reason to forfeit our best advantage.

And if we need to receive UTF8 XML, process it and send it back in UTF8, then all these unnecessary calls and conversions and inconvenient typecasts and utility function calls are really a horrible overhead for no benefit at all.

Alysson Cunha at 10/12/2013 5:16:10 AM -
I really thought that UTF-8 chars could grow up to 4 bytes only. But you are correct, it can grow further than this.

Please, see what I commented about the BMP (Basic Multilingual Plane). Considering  that dealing with UTF-16 basically as fixed size can satisfy "almost all modern languages, and a large number of special characters", there's a lot advantages doing it so.

And speaking of performance, I bet you that decoding UTF-8 to UTF-16, process, and code back to UTF-8 (when exchanging data or storing/loading data) is faster than working with UTF-8 directly in many cases.

Alexander Snigerev at 10/12/2013 8:30:11 AM -
>And speaking of performance, I bet you that decoding UTF-8 to
>UTF-16, process, and code back to UTF-8 (when exchanging data or
>storing/loading data) is faster than working with UTF-8 directly
>in many cases.

Not in all cases, obviously. And we don't need those bets. We have our own goals and considerations.

We are working on custom software dealing with complex scripts and communicating with complex C++ libraries (which return, say, parsed HTML in UTF-8 and, say, char indices), etc.

We need all the flexibility and famous power of Delphi strings, which was stolen from us.

We need backward compatibility, a single desktop/mobile code base, AND freedom to choose.

Alysson Cunha at 10/13/2013 3:15:38 AM -
If you are dealing with complex scripts, there is one more reason to stop and think a little more about the complexity of directly handling UTF-8 strings.

For example, a simple algorithm like

for i := 0 to Length(Str) - 1 do
  DoSomeProcessing with Str[i];

would become an O(N^2) algorithm. With fixed-size chars (or something close to fixed-size) the same loop is O(N). What does this mean? Why does it matter?

In a string with 10 characters, the algorithm would be 10 times slower. In a string with 1000 characters, it would be 1000x slower. In a string with 128k chars, it would become 128,000x slower. The algorithm would scale poorly. That means that if 1 second is needed to scan a fixed-length 128k string, scanning the same string in UTF-8 directly would take an unbelievable 35 hours (hello, O(N^2) complexity).
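The scaling argument as a worked estimate, assuming each Str[i] on a UTF-8 string scans from the first byte and that one scanned byte costs the same as one step of the fixed-width loop (c is that per-step cost):

```latex
T_{\text{fixed}}(N) = cN, \qquad
T_{\text{UTF-8}}(N) = \sum_{i=1}^{N} c\,i = c\,\frac{N(N+1)}{2}, \qquad
\frac{T_{\text{UTF-8}}(N)}{T_{\text{fixed}}(N)} = \frac{N+1}{2}
```

For N = 128,000 the ratio is about 64,000, so 1 second becomes about 64,000 seconds, roughly 18 hours; the 35-hour figure above corresponds to a slowdown factor of N rather than N/2. Either way the growth is quadratic.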

To avoid that unbelievable time, you would need some kind of sequential string scan class/utility, and with that you lose the nice way that Delphi deals with strings today. And, of course, this is not what you want.

Code that decodes the UTF-8 to UTF-16, loops over the string, and encodes back to UTF-8 is still an O(N) algorithm. This means that it scales better (much better) than O(N^2).

There are technical issues when handling UTF-8 directly. I believe it's almost impossible to handle UTF-8 strings with high performance the same way Delphi handles UTF-16 strings.

Arnaud BOUCHEZ at 3/15/2014 8:57:24 AM -
You totally missed the point, sorry.

There are no such "fixed-size chars" in Unicode, neither in UTF-8 nor in UTF-16. Both may have diacritics, and a lot of Unicode glyphs in UTF-16 may consist of 2 WideChars (= 4 bytes).

On the contrary, UTF-8 is known to be faster than UTF-16 when you just want to loop over all its content, when speed matters, e.g. for XML or JSON, which consist mainly of 7-bit ASCII content.

I do not have a problem with string = UnicodeString being encoded as UTF-16.
But I need the ability to handle UTF-8 strings and memory buffers (PUTF8Char) natively in some parts of my code.

Everest Software International Director at 10/13/2013 8:13:35 PM -
These arguments are not really relevant. The same arguments could be used to justify dropping UTF-16 support from the compiler. And in fact, that would be better than the current direction in a number of ways.

But the simple fact is that the strings of all these types still exist and we still need to work with them.

And of course we want to work with compiler-supported built-in types, because alternatives are simply unjustifiable.

Alysson Cunha at 10/14/2013 11:20:00 AM -
Dealing with UTF-16 as basically fixed-size solves the Basic Multilingual Plane, which contains almost all modern languages and a lot of special symbols. This is what Delphi (and a lot of other compilers) does, and it solves 99.999% of the cases with excellent performance. So there's no good reason to change it (yet).

But I was discussing native handling of UTF-8 strings. The UTF8String type was simply strange. Treating UTF-8 strings as 1-byte chars is completely strange. For the UTF8String type, Length returns the byte length of the string, not the count of characters. Var[Index] returns/sets a byte of the string, not a character (so you could corrupt a UTF8String or get an invalid character). UTF8String was creepy, and I am thankful for its removal, because it will save newbies in Delphi programming many hours when they deal with UTF-8.
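The UTF8String behaviour described here, as a sketch for the desktop compilers of that era (the only ones that had UTF8String; with the default 1-based strings): `Length` counts bytes and indexing returns single bytes, so `U[1]` below is only the first half of the two-byte encoding of U+00F1.

```pascal
program Utf8StringDemo;

{$APPTYPE CONSOLE}

// Desktop compilers only: UTF8String was not available on the XE5
// mobile (NEXTGEN) compilers this report is about.
var
  S: string;
  U: UTF8String;
begin
  S := #$00F1 + 'ino';  // n-tilde + 'ino': 4 UTF-16 code units
  U := UTF8String(S);   // explicit conversion to UTF-8
  WriteLn(Length(S));   // 4
  WriteLn(Length(U));   // 5: Length counts bytes, and U+00F1 takes two
  WriteLn(Ord(U[1]));   // 195 ($C3): the first byte of U+00F1, not a character
end.
```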

AnsiString I do miss sometimes. I am OK with AnsiString. I don't need it anymore, because my software must be internationalized. But I understand that AnsiString may be useful in many situations.


But please don't fool yourselves into thinking that bringing back AnsiString will make the "code run unchanged on mobile platform". Even if AnsiString is revived, you will need to analyze your code to adapt it (I have already ported many units of my software to become desktop-and-mobile compatible).

NextGen string is 0-based, and Desktop's is 1-based. You will need to change your code to deal with this.
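One way to write loops that survive this difference, assuming Delphi XE3 or later: `Low(S)` and `High(S)` on a string follow the active `{$ZEROBASEDSTRINGS}` setting (ON by default on the XE5 mobile compilers, OFF on desktop), so the same loop indexes correctly under both conventions. `CountChar` is a hypothetical example function.

```pascal
program StringIndexDemo;

{$APPTYPE CONSOLE}

// Low(S)/High(S) follow the active {$ZEROBASEDSTRINGS} setting, so this loop
// is correct whether strings are 0-based or 1-based.
function CountChar(const S: string; Ch: Char): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := Low(S) to High(S) do
    if S[I] = Ch then
      Inc(Result);
end;

begin
  WriteLn(CountChar('banana', 'a'));  // 3
end.
```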

ARC will demand work from you too. You will need to analyze the relationships of your objects and put some [Weak] attributes in your code. Without this, your code would leak memory. And you will need to prepare your code for the [weak] behaviour when an object is destroyed (the [weak] var will be set to nil automatically, so you may need to add some "ifs" to avoid access violations).
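A sketch of the typical parent/child case, assuming the XE5 iOS/Android (ARC) compilers: the child's back-reference is marked [Weak], so it does not keep the parent alive and is set to nil when the parent is destroyed. The desktop compilers of that time do not reference-count objects, so there [Weak] changes nothing and the Free calls do the actual freeing. The class and function names are illustrative.

```pascal
program WeakRefDemo;

{$APPTYPE CONSOLE}

type
  TParent = class;

  TChild = class
  public
    // Weak back-reference: not counted under ARC, and set to nil
    // automatically when the parent is destroyed.
    [Weak] Parent: TParent;
    constructor Create(AParent: TParent);
  end;

  TParent = class
  public
    Child: TChild;  // strong reference: the parent keeps its child alive
    constructor Create;
    destructor Destroy; override;
  end;

constructor TChild.Create(AParent: TParent);
begin
  inherited Create;
  Parent := AParent;
end;

constructor TParent.Create;
begin
  inherited Create;
  Child := TChild.Create(Self);
end;

destructor TParent.Destroy;
begin
  Child.Free;  // desktop: frees the child; ARC: drops the strong reference
  inherited;
end;

function ChildSeesParent: Boolean;
var
  P: TParent;
begin
  P := TParent.Create;
  try
    Result := P.Child.Parent = P;
  finally
    P.Free;  // under ARC, without [Weak] the parent/child cycle would leak
  end;
end;

begin
  WriteLn(ChildSeesParent);  // TRUE
end.
```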

Knowing this, will you ask Embarcadero to remove ARC and bring back 1-based strings too? It's time to evolve!

I think Delphi for Mobile was made to make it easy to produce desktop-and-mobile compatible code, but you should not expect OLD code to be multiplatform compatible without some changes.

Trust me, I ported a huge portion of code, and it was easy.

Dalija Prasnikar at 10/14/2013 12:02:15 PM -
Sorry, but what exactly is your problem? Obviously you don't need 8-bit strings, good for you. Nobody is forcing you to use them.

But we, who do need them, know why and when we are using them and why they are critical to our code.

Alysson Cunha at 10/14/2013 2:04:45 PM -
My "problem" is that I am trying to defend the concept of a next-generation compiler, and trying to keep a newborn compiler clean of unnecessary legacy stuff. Remember, it's a newborn compiler we are talking about, and it will last for years to come!

I know that the existence of AnsiString would help you, but I said "unnecessary" because there are other means to do the same thing. And it's not hard to use these other means, but it is hard for the compiler to natively support many string types with different encodings.

Markus Humm at 10/18/2013 12:59:22 AM -
Hello,

we do not need EMBT to waste precious time with implementing and testing workarounds for already existing and even better working (means more time efficient) solutions.

That's counterproductive.

We do like to have a language which provides us with best of class solutions, but taking away such types is heading into the wrong direction.

Greetings

Markus

Alysson Cunha at 10/14/2013 2:22:51 PM -
Here are my final argumentation/suggestions, and I really hope Embarcadero take this into consideration.

----------------
We are talking about a newborn next-generation compiler. Embarcadero should keep it clean of unnecessary legacy stuff, because it will last for years to come!

Unnecessary, because there are other means to do the same thing. (And it's not hard to use these other means, but it is hard for the compiler to natively support many string types with different encodings.)

But, to keep compatibility, I suggest Embarcadero create something like this http://pastebin.com/aduEzg8M (usage demonstration: http://pastebin.com/RNz4Yhc0), name it AnsiString (probably also create a fake AnsiChar type, and do the same for UTF8String) and put these in a unit with a nice name like System.DeprecatedStrings. The AnsiString type should also emit a deprecation warning when it is used.

AnsiString as a record with class operators will create a nice compatibility with legacy code with a really nice performance, but, at the same time, will encourage the programmers to use the nextgen String type.  AnsiString as record+operators would, basically, do the same thing the compiler does natively with Desktop's AnsiString, that's why I truly believe it will deliver a nice performance.

If Embarcadero revive the AnsiString/UTF8String natively, the programmers will NEVER abandon it, and Emb will live with this legacy stuff forever, increasing the complexity of the compiler.

Everest Software International Director at 10/14/2013 9:43:09 PM -
You are assuming that ansistring & utf8string are not needed and that getting rid of them is progress, while it's exactly the other way around.

What you are saying is much like "let's drop the unnecessary BYTE, WORD and DWORD types from the compiler to keep it clean, because we now have modern QBYTE on 64-bit CPUs and it works faster and you can pretty much do the same things with QBYTE".

ansistring & utf8string are natural basic data types which cannot be ignored. Workarounds would pollute the language and rob it of its power.

I would suggest that those who cannot see any use for pointers, mutable strings or utf8strings just switch to the language that does not have these benefits, like Java.

Because if Delphi becomes a Java-like language, it would lose all of its attractiveness to a lot of people. And it would in fact stop being a flavour of Pascal entirely.

Anyway, this bug report is here specifically to return these necessary data types back, so your comments really look out of place here.

Arnaud BOUCHEZ at 3/15/2014 9:03:49 AM -
Yes!

Why not just drop the whole concept of integers, and use floating point value everywhere, like in JavaScript?

So it will induce a lot of issues about performance and execution complexity, just like with modern JavaScript engines! V8 had to "invent" a 31 bit signed integer execution-time type, which may fallback into a double later.

Less types do not induce an easier to learn language.

Just use the one you need, and if you need something more specific, you have it at hand.

For instance, there are so many third-party APIs that rely not on UTF-16 but on UTF-8, especially outside the Windows platform (remember that Delphi is cross-platform now)... So getting rid of direct and native UTF-8 string support is just a PITA.

Dalija Prasnikar at 10/15/2013 12:37:52 AM -
Well said.

Arnaud BOUCHEZ at 3/15/2014 8:41:17 AM -
In fact, the feature is still there, but disabled.

There is no need to "add" something new to the compiler.
Just "re-enable" a feature which is still there, at compiler and RTL level.

See http://andy.jgknet.de/blog/2013/12/system-bytestrings-support-for-xe5-update-2

Marvin Colgin at 3/17/2014 12:14:21 PM -
My thanks to Andreas Hausladen. It seems we can always count on you to help fix what Borland->Inprise->CodeGear->EmbT breaks.

Christopher Burke at 3/16/2014 6:09:20 AM -
We require 8 bit strings to support the numerous 8 bit interfaces used in retail systems. Without 8 bit strings, we are forced into some pretty weird magic to get those interfaces working.

Leif Uneus at 3/16/2014 3:11:59 PM -
The same requirements are also for many industrial 8 bit protocol interfaces. Modbus and NMEA are some examples. We are supporting at least 30 more industrial standards.
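As an illustration of this kind of 8-bit protocol work, a sketch of the NMEA 0183 checksum, which is the XOR of every byte between '$' and '*', sent as two hex digits after the '*'. It operates on TBytes so it compiles on the mobile compilers too; `NmeaChecksum` is a hypothetical name, and the test sentence is made up, not real GPS data.

```pascal
program NmeaDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// XOR of the bytes after '$' up to (not including) '*'.
function NmeaChecksum(const Sentence: TBytes): Byte;
var
  I: Integer;
  Started: Boolean;
begin
  Result := 0;
  Started := False;
  for I := 0 to High(Sentence) do
  begin
    if Sentence[I] = Ord('*') then
      Break;
    if Started then
      Result := Result xor Sentence[I]
    else if Sentence[I] = Ord('$') then
      Started := True;
  end;
end;

begin
  // Made-up body 'GPXX': $47 xor $50 xor $58 xor $58 = $17
  WriteLn(IntToHex(NmeaChecksum(TEncoding.ASCII.GetBytes('$GPXX*17')), 2));  // 17
end.
```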

Marvin Colgin at 3/17/2014 12:12:26 PM -

+Marco Cantù 670 votes on the QC, 47 comments on your blog; these comments represent developers with millions of lines of code. I'm one of them.

Yes, these applications use AnsiStrings and its predecessor String, as it's a type that has existed in Pascal since the beginning. This codebase is why Delphi still exists, as the "contract" with developers, established decades ago, allowed my code (RTL) from 20 years ago to compile today. Arbitrarily deciding to drop support for a fundamental type breaks this contract and represents a point in time where Delphi no longer supports my Pascal code.

You could continue to ask us to prove why we need AnsiStrings, like it's some kind of academic interview question, but I'll always say it's because my existing Pascal codebase needs to compile in Delphi. Why? Because developer-time == $$$. The labor required to audit the codebase and make changes represents a significant amount of money. Much, much more than what EMBT would charge for their Recharging, Upgrading, SLA, etc.

Developer labor is more expensive than your product, and I can't let a product determine how I spend my labor. Delphi is a tool, and if Delphi tells me I need to use a nail gun vs a hammer, then I'll make my own determination and most likely go with the hammer if I don't need a nail gun.

I fear Borland -> Inprise -> CodeGear -> Embt has lost the memory of how they have "stayed in business" all these years. It's certainly not because of how the product is managed, and I think that needs to be evaluated.

Eugene Kotlyarov at 4/21/2014 9:53:15 PM -
There is no way to downvote this, so I'll just add a comment that I totally support removing 8-bit strings.
My argument is that the string type is for something that is supposed to be readable by humans, so Unicode support is essential.
As a user of non-english programs I've seen too many applications broken because of using 8 bit strings.

Everest Software International Director at 3/26/2015 10:49:09 PM -
Yes, as Dalija already mentioned, this comment was really irrelevant to the discussion at hand.

We are now 2 years later, XE8 should be coming out shortly, but this issue is still unresolved ;-(

Dalija Prasnikar at 5/6/2014 1:35:28 PM -
8-bit strings also cover UTF8String that supports Unicode. Problems you have experienced are not directly connected with existence of 8-bit strings but how they are used. I am also non-english user and Unicode is very important to me. But so are 8-bit strings, especially UTF8String and RawByteString.

This request is not about removing Unicode, but allowing different string types so developers can choose best one for their job. In my line of work processing large amounts of UTF8 encoded data requires memory efficiency that 8-bit string provide.
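A rough illustration of the memory argument, assuming mostly-ASCII data such as JSON and Delphi XE2 or later: the same text as UTF-8 bytes versus the UTF-16 payload of a Delphi string (string header overhead ignored). The sample text is made up.

```pascal
program MemoryDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: string;
begin
  S := '{"name":"' + #$00F1 + 'ino"}';  // 15 UTF-16 code units
  WriteLn('UTF-8 bytes:  ', TEncoding.UTF8.GetByteCount(S));  // 16
  WriteLn('UTF-16 bytes: ', Length(S) * SizeOf(Char));        // 30
end.
```

For ASCII-heavy data the UTF-16 form takes close to twice the memory, which is the overhead a native 8-bit string type avoids when large UTF-8 inputs are kept in memory.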
