Watch, Follow, &
Connect with Us
Public Report
Report From: Delphi-BCB/RTL/Delphi/Pascal Strings    [ Add a report in this area ]  
Report #:  111103   Status: Open
Inefficient loop in Pos() for purepascal
Project:  Delphi Build #:  17.0.4625.53395
Version:    17.0 Submitted By:   Leif Uneus
Report Type:  Suggestion / Enhancement Request Date Reported:  12/5/2012 2:31:48 PM
Severity:    Commonly encountered problem Last Updated: 1/8/2014 7:09:29 AM
Platform:    All platforms Internal Tracking #:  
Resolution: None (Resolution Comments) Resolved in Build: : None
Duplicate of:  None
Voting and Rating
Overall Rating: (1 Total Rating)
5.00 out of 5
Total Votes: 16
Description

Updated.

A discussion about the slow purepascal Pos() implementation in XE3 (and XE4,XE5) arised in this thread, https://forums.embarcadero.com/thread.jspa?messageID=565913#565913.

The conclusion was to use the FastCode purepascal PosEx_Sha_Pas_2 instead.
It is a factor of 8 faster than System.Pos for x64.

I submit the code here:

{code}
function PosEx_Sha_Pas_2(const SubStr, S: string; Offset: Integer = 1): Integer;
Type
  PInteger =^Integer;
var
  len, lenSub: Integer;
  ch: char;
  p, pSub, pStart, pStop: pchar;
label
  Loop0, Loop4,
  TestT, Test0, Test1, Test2, Test3, Test4,
  AfterTestT, AfterTest0,
  Ret, Exit;
begin;
  pSub := pointer(SubStr);
  p := pointer(S);

  if (p = nil) or (pSub = nil) or (Offset < 1) then
  begin;
    Result := 0;
    goto Exit;
  end;

  lenSub := PLongInt(PByte(pSub) - 4)^ - 1; // <- Modified
  len := PLongInt(PByte(p) - 4)^; // <- Modified
  if (len < lenSub + Offset) or (lenSub < 0) then
  begin;
    Result := 0;
    goto Exit;
  end;

  pStop := p + len;
  p := p + lenSub;
  pSub := pSub + lenSub;
  pStart := p;
  p := p + Offset + 3;

  ch := pSub[0];
  lenSub := -lenSub;
  if p < pStop then
    goto Loop4;
  p := p - 4;
  goto Loop0;

Loop4:
  if ch = p[-4] then
    goto Test4;
  if ch = p[-3] then
    goto Test3;
  if ch = p[-2] then
    goto Test2;
  if ch = p[-1] then
    goto Test1;
Loop0:
  if ch = p[0] then
    goto Test0;
AfterTest0:
  if ch = p[1] then
    goto TestT;
AfterTestT:
  p := p + 6;
  if p < pStop then
    goto Loop4;
  p := p - 4;
  if p < pStop then
    goto Loop0;
  Result := 0;
  goto Exit;

Test3:
  p := p - 2;
Test1:
  p := p - 2;
TestT:
  len := lenSub;
  if lenSub <> 0 then
    repeat
      ;
      if (pSub[len] <> p[len + 1]) or (pSub[len + 1] <> p[len + 2]) then
        goto AfterTestT;
      len := len + 2;
    until len >= 0;
  p := p + 2;
  if p <= pStop then
    goto Ret;
  Result := 0;
  goto Exit;

Test4:
  p := p - 2;
Test2:
  p := p - 2;
Test0:
  len := lenSub;
  if lenSub <> 0 then
    repeat
      ;
      if (pSub[len] <> p[len]) or (pSub[len + 1] <> p[len + 1]) then
        goto AfterTest0;
      len := len + 2;
    until len >= 0;
  Inc(p);
Ret:
  Result := p - pStart;
Exit:
end;
{code}

Even in x32 bit mode, the PosEx_Sha_Pas_2 routine is faster than System.Pos if the search result is above 20-25 characters.


Steps to Reproduce:
{$APPTYPE CONSOLE}

uses
  SysUtils, Diagnostics;

var
  i, j: Integer;
  Stopwatch: TStopwatch;
  str: string;
  P: PWideChar;

const
  N = 50000000;

begin
  str := 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do '
    + 'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim '
    + 'ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut '
    + 'aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit '
    + 'in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur '
    + 'sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt '
    + 'mollit anim id est laborum.';

  Stopwatch := TStopwatch.StartNew;
  for i := 1 to N do
  begin
    j := Pos('tempor', str);
    if j=0 then
      Beep;
  end;
  Writeln('Pos: ' + IntToStr(Stopwatch.ElapsedMilliseconds));

  Stopwatch := TStopwatch.StartNew;
  for i := 1 to N do
  begin
    j := PosEx_Sha_Pas_2('tempor', str);
    if j=0 then
      Beep;
  end;
  Writeln('PosEx_Sha: ' + IntToStr(Stopwatch.ElapsedMilliseconds));

  Readln;
end.

In x64 bit mode the results are:
System.Pos         18427 ms
PosEx_Sha_Pas_2     2282 ms

In x32 bit mode:
System.Pos           2171 ms
PosEx_Sha_Pas_2      1868 ms
Workarounds
None
Attachment
None
Comments

Leif Uneus at 5/27/2013 1:10:52 AM -
Se discussion in this thread, https://forums.embarcadero.com/thread.jspa?messageID=565913#565913.

Rudy found that the FastCoder's purepascal versions of PosSHA and PosSHA2 outperforms the proposed improvements.

There are other Pos suggestions in that thread as well, but PosSHA and PosSHA2 would be the best candidate.

Leif Uneus at 1/8/2014 8:18:21 AM -
I updated the report with the PosEx_Sha_Pas_2 algorithm and a test routine.

Leif Uneus at 1/6/2014 12:30:09 PM -
I submitted a version of the FastCode PosEx_Sha_Pas_2 here: http://stackoverflow.com/a/20947429/576719.

It implements a 32/64 bit purepascal version of the Pos system function (For unicode strings).
Performance in x64 bit mode is equal to System.Pos in 32 bit mode. It even surpasses System.Pos in x32 bit mode.

Server Response from: ETNACODE01