[PATCH] x86: Add thresholds for "rep movsb/stosb" to tunables

Carlos O'Donell carlos@redhat.com
Fri Jul 3 17:43:52 GMT 2020


On 7/3/20 12:54 PM, H.J. Lu wrote:
> On Fri, Jul 03, 2020 at 12:14:01PM -0400, Carlos O'Donell wrote:
>> On 7/2/20 3:08 PM, H.J. Lu wrote:
>>> On Thu, Jul 02, 2020 at 02:00:54PM -0400, Carlos O'Donell wrote:
>>>> On 6/6/20 5:51 PM, H.J. Lu wrote:
>>>>> On Fri, Jun 5, 2020 at 3:45 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jun 04, 2020 at 02:00:35PM -0700, H.J. Lu wrote:
>>>>>>> On Mon, Jun 1, 2020 at 7:08 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Jun 1, 2020 at 6:44 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>>> Tunables are designed to pass info from user to glibc, not the other
>>>>>>>>> way around.  When __libc_main is called, init_cacheinfo is never
>>>>>>>>> called.  I can call init_cacheinfo from __libc_main.  But there is no
>>>>>>>>> interface to update min and max values from init_cacheinfo.  I don't
>>>>>>>>> think --list-tunables will work here without changes to tunables.
>>>>>>>>
>>>>>>>> You have a dynamic threshold.
>>>>>>>>
>>>>>>>> You have to tell the user what that minimum is, otherwise they can't
>>>>>>>> use the tunable reliably.
>>>>>>>>
>>>>>>>> This is the first instance of a min/max that is dynamically determined.
>>>>>>>>
>>>>>>>> You must fetch the cache info ahead of the tunable initialization, that
>>>>>>>> is you must call init_cacheinfo before __init_tunables.
>>>>>>>>
>>>>>>>> You can initialize the tunable data dynamically like this:
>>>>>>>>
>>>>>>>> /* Dynamically set the min and max of glibc.foo.bar.  */
>>>>>>>> tunable_id_t id = TUNABLE_ENUM_NAME (glibc, foo, bar);
>>>>>>>> tunable_list[id].type.min = lowval;
>>>>>>>> tunable_list[id].type.max = highval;
>>>>>>>>
>>>>>>>> We do something similar for maybe_enable_malloc_check.
>>>>>>>>
>>>>>>>> Then once the tunables are parsed, and the cpu features are loaded
>>>>>>>> you can print the tunables, and the printed tunables will have meaningful
>>>>>>>> min and max values.
>>>>>>>>
>>>>>>>> If you have circular dependency, then you must process the cpu features
>>>>>>>> first without reading from the tunables, then allow the tunables to be
>>>>>>>> initialized from the system, *then* process the tunables to alter the existing
>>>>>>>> cpu feature settings.
>>>>>>>>
>>>>>>>
>>>>>>> How about this?  I got
>>>>>>>
>>>>>>
>>>>>> Here is the updated patch, which depends on
>>>>>>
>>>>>> https://sourceware.org/pipermail/libc-alpha/2020-June/114820.html
>>>>>>
>>>>>> to add "%d" support to _dl_debug_vdprintf.  I got
>>>>>>
>>>>>> $ ./elf/ld.so ./libc.so --list-tunables
>>>>>> glibc.elision.skip_lock_after_retries: 3 (min: -2147483648, max: 2147483647)
>>>>>> glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.malloc.perturb: 0 (min: 0, max: 255)
>>>>>> glibc.cpu.x86_shared_cache_size: 0x100000 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.elision.tries: 3 (min: -2147483648, max: 2147483647)
>>>>>> glibc.elision.enable: 0 (min: 0, max: 1)
>>>>>> glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.elision.skip_lock_busy: 3 (min: -2147483648, max: 2147483647)
>>>>>> glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.cpu.x86_shstk:
>>>>>> glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.malloc.mmap_max: 0 (min: -2147483648, max: 2147483647)
>>>>>> glibc.elision.skip_trylock_internal_abort: 3 (min: -2147483648, max: 2147483647)
>>>>>> glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.cpu.x86_ibt:
>>>>>> glibc.cpu.hwcaps:
>>>>>> glibc.elision.skip_lock_internal_abort: 3 (min: -2147483648, max: 2147483647)
>>>>>> glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0xffffffff)
>>>>>> glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.cpu.x86_data_cache_size: 0x8000 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0xffffffff)
>>>>>> glibc.pthread.mutex_spin_count: 100 (min: 0, max: 32767)
>>>>>> glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0xffffffff)
>>>>>> glibc.malloc.check: 0 (min: 0, max: 3)
>>>>>> $
>>>>>>
>>>>>> Ok for master?
>>>>>>
>>>>>
>>>>> Here is the updated patch.  To support --list-tunables, a target should add
>>>>>
>>>>> CPPFLAGS-version.c = -DLIBC_MAIN=__libc_main_body
>>>>> CPPFLAGS-libc-main.S = -DLIBC_MAIN=__libc_main_body
>>>>>
>>>>> and start.S should be updated to define __libc_main and call
>>>>> __libc_main_body:
>>>>>
>>>>> extern void __libc_main_body (int argc, char **argv)
>>>>>   __attribute__ ((noreturn, visibility ("hidden")));
>>>>>
>>>>> when LIBC_MAIN is defined.
>>>>
>>>> I like where this patch is going, but the __libc_main wiring up means
>>>> we'll have to delay this until glibc 2.33 opens for development and
>>>> give the architectures time to fill in the required pieces of assembly.
>>>>
>>>> Can we split this into:
>>>>
>>>> (a) Minimum required to implement the feature e.g. just the tunable without
>>>>     my requested changes.
>>>>
>>>> (b) A second patch which implements the --list-tunables that users can
>>>>     then use to know what the values they can choose are.
>>>>
>>>> That way we can commit (a) right now, and then commit (b) when we
>>>> reopen for development?
>>>>
>>>
>>> Like this?
>>
>> Almost.
>>
>> Why do we still use a constructor?
>>
>> Why don't we accurately set the min and max?
>>
>> +#if HAVE_TUNABLES
>> +  TUNABLE_UPDATE (x86_non_temporal_threshold, long int,
>> +		  __x86_shared_non_temporal_threshold, 0,
>> +		  (long int) -1);
>> +  TUNABLE_UPDATE (x86_rep_movsb_threshold, long int,
>> +		  __x86_rep_movsb_threshold,
>> +		  minimum_rep_movsb_threshold, (long int) -1);
>> +  TUNABLE_UPDATE (x86_rep_stosb_threshold, long int,
>> +		  __x86_rep_stosb_threshold, 0, (long int) -1);
>>
>> A min and max of 0 and -1 respectively could have been set in the tunables
>> list file and are not dynamic?
>>
>> I'd expect your patch would do everything except actually implement
>> --list-tunables.
> 
> Here is the followup patch which does it.
> 
>>
>> We need a manual page, and I accept that showing a "lower value" will
>> have to wait for --list-tunables.
>>
>> Otherwise the patch is looking ready.
> 
> 
> Are these 2 patches OK for trunk?

Could you please post the patches in a distinct thread with a clear
subject, that way I know exactly what I'm applying and testing.
I'll review those ASAP so we can get something in place.

-- 
Cheers,
Carlos.



More information about the Libc-alpha mailing list