[PATCH] x86: Add thresholds for "rep movsb/stosb" to tunables

H.J. Lu hjl.tools@gmail.com
Fri Jul 3 17:53:43 GMT 2020


On Fri, Jul 3, 2020 at 10:43 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 7/3/20 12:54 PM, H.J. Lu wrote:
> > On Fri, Jul 03, 2020 at 12:14:01PM -0400, Carlos O'Donell wrote:
> >> On 7/2/20 3:08 PM, H.J. Lu wrote:
> >>> On Thu, Jul 02, 2020 at 02:00:54PM -0400, Carlos O'Donell wrote:
> >>>> On 6/6/20 5:51 PM, H.J. Lu wrote:
> >>>>> On Fri, Jun 5, 2020 at 3:45 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>>>>>
> >>>>>> On Thu, Jun 04, 2020 at 02:00:35PM -0700, H.J. Lu wrote:
> >>>>>>> On Mon, Jun 1, 2020 at 7:08 PM Carlos O'Donell <carlos@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Jun 1, 2020 at 6:44 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>>>>>>>> Tunables are designed to pass info from user to glibc, not the other
> >>>>>>>>> way around.  When __libc_main is called, init_cacheinfo is never
> >>>>>>>>> called.  I can call init_cacheinfo from __libc_main.  But there is no
> >>>>>>>>> interface to update min and max values from init_cacheinfo.  I don't
> >>>>>>>>> think --list-tunables will work here without changes to tunables.
> >>>>>>>>
> >>>>>>>> You have a dynamic threshold.
> >>>>>>>>
> >>>>>>>> You have to tell the user what that minimum is, otherwise they can't
> >>>>>>>> use the tunable reliably.
> >>>>>>>>
> >>>>>>>> This is the first instance of a min/max that is dynamically determined.
> >>>>>>>>
> >>>>>>>> You must fetch the cache info ahead of the tunable initialization, that
> >>>>>>>> is you must call init_cacheinfo before __init_tunables.
> >>>>>>>>
> >>>>>>>> You can initialize the tunable data dynamically like this:
> >>>>>>>>
> >>>>>>>> /* Dynamically set the min and max of glibc.foo.bar.  */
> >>>>>>>> tunable_id_t id = TUNABLE_ENUM_NAME (glibc, foo, bar);
> >>>>>>>> tunable_list[id].type.min = lowval;
> >>>>>>>> tunable_list[id].type.max = highval;
> >>>>>>>>
> >>>>>>>> We do something similar for maybe_enable_malloc_check.
> >>>>>>>>
> >>>>>>>> Then once the tunables are parsed, and the cpu features are loaded
> >>>>>>>> you can print the tunables, and the printed tunables will have meaningful
> >>>>>>>> min and max values.
> >>>>>>>>
> >>>>>>>> If you have circular dependency, then you must process the cpu features
> >>>>>>>> first without reading from the tunables, then allow the tunables to be
> >>>>>>>> initialized from the system, *then* process the tunables to alter the existing
> >>>>>>>> cpu feature settings.
> >>>>>>>>
> >>>>>>>
> >>>>>>> How about this?  I got
> >>>>>>>
> >>>>>>
> >>>>>> Here is the updated patch, which depends on
> >>>>>>
> >>>>>> https://sourceware.org/pipermail/libc-alpha/2020-June/114820.html
> >>>>>>
> >>>>>> to add "%d" support to _dl_debug_vdprintf.  I got
> >>>>>>
> >>>>>> $ ./elf/ld.so ./libc.so --list-tunables
> >>>>>> glibc.elision.skip_lock_after_retries: 3 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.malloc.perturb: 0 (min: 0, max: 255)
> >>>>>> glibc.cpu.x86_shared_cache_size: 0x100000 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.elision.tries: 3 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.elision.enable: 0 (min: 0, max: 1)
> >>>>>> glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.elision.skip_lock_busy: 3 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.cpu.x86_shstk:
> >>>>>> glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.malloc.mmap_max: 0 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.elision.skip_trylock_internal_abort: 3 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.cpu.x86_ibt:
> >>>>>> glibc.cpu.hwcaps:
> >>>>>> glibc.elision.skip_lock_internal_abort: 3 (min: -2147483648, max: 2147483647)
> >>>>>> glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0xffffffff)
> >>>>>> glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.cpu.x86_data_cache_size: 0x8000 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0xffffffff)
> >>>>>> glibc.pthread.mutex_spin_count: 100 (min: 0, max: 32767)
> >>>>>> glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0xffffffff)
> >>>>>> glibc.malloc.check: 0 (min: 0, max: 3)
> >>>>>> $
> >>>>>>
> >>>>>> Ok for master?
> >>>>>>
> >>>>>
> >>>>> Here is the updated patch.  To support --list-tunables, a target should add
> >>>>>
> >>>>> CPPFLAGS-version.c = -DLIBC_MAIN=__libc_main_body
> >>>>> CPPFLAGS-libc-main.S = -DLIBC_MAIN=__libc_main_body
> >>>>>
> >>>>> and start.S should be updated to define __libc_main and call
> >>>>> __libc_main_body:
> >>>>>
> >>>>> extern void __libc_main_body (int argc, char **argv)
> >>>>>   __attribute__ ((noreturn, visibility ("hidden")));
> >>>>>
> >>>>> when LIBC_MAIN is defined.
> >>>>
> >>>> I like where this patch is going, but the __libc_main wiring up means
> >>>> we'll have to delay this until glibc 2.33 opens for development and
> >>>> give the architectures time to fill in the required pieces of assembly.
> >>>>
> >>>> Can we split this into:
> >>>>
> >>>> (a) Minimum required to implement the feature e.g. just the tunable without
> >>>>     my requested changes.
> >>>>
> >>>> (b) A second patch which implements the --list-tunables that users can
> >>>>     then use to know what the values they can choose are.
> >>>>
> >>>> That way we can commit (a) right now, and then commit (b) when we
> >>>> reopen for development?
> >>>>
> >>>
> >>> Like this?
> >>
> >> Almost.
> >>
> >> Why do we still use a constructor?
> >>
> >> Why don't we accurately set the min and max?
> >>
> >> +#if HAVE_TUNABLES
> >> +  TUNABLE_UPDATE (x86_non_temporal_threshold, long int,
> >> +              __x86_shared_non_temporal_threshold, 0,
> >> +              (long int) -1);
> >> +  TUNABLE_UPDATE (x86_rep_movsb_threshold, long int,
> >> +              __x86_rep_movsb_threshold,
> >> +              minimum_rep_movsb_threshold, (long int) -1);
> >> +  TUNABLE_UPDATE (x86_rep_stosb_threshold, long int,
> >> +              __x86_rep_stosb_threshold, 0, (long int) -1);
> >>
> >> A min and max of 0 and -1 respectively could have been set in the tunables
> >> list file and are not dynamic?
> >>
> >> I'd expect your patch would do everything except actually implement
> >> --list-tunables.
> >
> > Here is the followup patch which does it.
> >
> >>
> >> We need a manual page, and I accept that showing a "lower value" will
> >> have to wait for --list-tunables.
> >>
> >> Otherwise the patch is looking ready.
> >
> >
> > Are these 2 patches OK for trunk?
>
> Could you please post the patches in a distinct thread with a clear
> subject, that way I know exactly what I'm applying and testing.
> I'll review those ASAP so we can get something in place.
>

Done:

https://sourceware.org/pipermail/libc-alpha/2020-July/115759.html

-- 
H.J.


More information about the Libc-alpha mailing list