Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward

Adhemerval Zanella adhemerval.zanella@linaro.org
Wed Jul 4 12:08:00 GMT 2018



On 03/07/2018 20:30, Carlos O'Donell wrote:
> On Tue, Jul 3, 2018 at 4:10 PM, jrmarsha <jrmarsha@mtu.edu> wrote:
>>
>>>> Hello all,
>>>>
>>>> I've been implementing a new version of Timsort in C++ using more
>>>> standard
>>>> facilities.  One thing I've come across which adds a divide between a
>>>> larger
>>>> codebase and faster code is the implementation of std::inplace_merge with
>>>> sufficient memory.  This goes to the specific functions
>>>> __move_merge_adaptive, and __move_merge_adaptive_backward.  In
>>>> particular,
>>>> there is an optimization where each insertion searches for the maximum
>>>> number of elements it can move.  A brief in context explanation is here:
>>>> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such a
>>>> trick have a place in glibc or is the overhead likely too great in the
>>>> general case?
>>>
>>> It's possible. We have a tunable framework for changing runtime
>>> parameters like this based on application needs.  The interface only
>>> allows you to make the choice once at startup. So it may not be
>>> sufficient to meet the needs of all users. However some choice in this
>>> area may be better than no choice.
>>>
>>> Cheers,
>>> Carlos.
>>
>> How would I get new code incorporated into this and make that decision
>> intelligently at compile time?  What would be a good way to add this
>> inplace_merge variant in?
> 
> If you are looking specifically at std::inplace_merge, then you need to talk
> to the GCC developers, since libstdc++ is part of GCC.
> Please see: https://gcc.gnu.org/
> 
> If you are interested in similar strategies for the C library, then you can look
> at the qsort() API, which can be implemented in any way really as long as
> it meets the API requirements (and is written in C). In glibc you might use a
> tunable to switch between one algorithm and another. You can see this
> implementation in stdlib/qsort.c in glibc, which does implement quicksort.

In fact I think we should use a O(1) space algorithm for qsort() instead
of current strategy of quicksort plus mergesort.  This allows qsort() to
be fully AS-Safe and AC-Safe.  

I would also avoid adding a tunable for qsort, if the idea is to provide 
different algorithm I would rather use FreeBSD strategy to provide explicit
implementations [1].  I think it is not a good expectation to set the
algorithms expected complexity, in both runtime and memory consumption, 
by an external environment (for instance user might use qsort in a signal
handler and it might start to fail due malloc usage if the tunable is
set).

I have sent patchset upstream for review to refactor qsort [2]. It changed
to use the internal quicksort implementation as default (since our 
implementation has the optimization to use O(1) space) plus some small
optimization to the generic case (which also seems to most used in my
analysis).  Ondrej has raised some concerns about setting qsort to use
quicksort as default which has O(n^2) in worst-case performance as a 
potential issue and I replied that we might try to get quicksort to get 
O(n*ln(n)) statistically as a QoI. I my experiments I used the Mersenner 
Twister implementation I used for benchmark as the source of entropy for 
pivot selection, but we might also try to use the new arc4random instead 
(although Florian has stated it might require more work to use internally).

Another option, which I considering for next submission for 2.29, is to
just smoothsort algorithm as an alternative implementation.  It has also
O(1) space utilization, however it is faster only for already sorted input 
being slower for random, mostly sorted or repeated inputs. For reference
I have pushed the implementation I measured against on personal branch [4].

Or we can just use a simple heapsort, as Linux kernel does internally,
which is slower than both heapsort and quicksort, but the code is a lot
simpler than both.

There is also the block sort, which have O(n*ln(n)) worse-case complexity
and O(1) utilization, but I have not evaluated it. Its description seems 
rather complex, so I am not sure if it worth the trouble.

[1] https://www.freebsd.org/cgi/man.cgi?query=mergesort&sektion=3
[2] https://sourceware.org/ml/libc-alpha/2018-01/msg00626.html
[3] https://sourceware.org/ml/libc-alpha/2018-02/msg00358.html
[4] https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=shortlog;h=refs/heads/azanella/qsort-smooth



More information about the Libc-help mailing list