This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] aarch64: Hoist ZVA check out of the memset function


On Wednesday 04 October 2017 09:39 PM, Szabolcs Nagy wrote:
>> +/* Memset entry point.  */
>> +ENTRY_ALIGN (MEMSET, 6)
>>  
>>  	DELOUSE (0)
>>  	DELOUSE (2)
>> @@ -46,9 +147,9 @@ ENTRY_ALIGN (__memset, 6)
>>  	add	dstend, dstin, count
>>  
>>  	cmp	count, 96
>> -	b.hi	L(set_long)
>> +	b.hi	MEMSET_L(set_long)
>>  	cmp	count, 16
>> -	b.hs	L(set_medium)
>> +	b.hs	MEMSET_L(set_medium)
>>  	mov	val, v0.D[0]
>>  
>>  	/* Set 0..15 bytes.  */
>> @@ -68,9 +169,9 @@ ENTRY_ALIGN (__memset, 6)
>>  3:	ret
>>  
>>  	/* Set 17..96 bytes.  */
>> -L(set_medium):
>> +MEMSET_L(set_medium):
> 
> why do we need different label names for the different memsets?
> 
> i'd expect a localized change where there are 4 variants of a
> piece of code under different ifdefs.

All of the variants are built in the same source file (memset_zva) which
is why they need different sets of labels.  I chose that so that I don't
blow up the number of files, but I can split it into individual files
and get rid of this macro.

> now it's hard to see what's going on, e.g. how code alignment of
> various parts of memset is affected.

Code alignments are not the focus of this change, I'll fine tune those
in further iterations because they give minor comparative gains.

> i don't think we need this global,
> i'd just do mrs	dczid_el0 here.
> (this code should be unreachable on current cpus)

Fair enough.  FWIW, I test this routine unconditionally.

>> +# undef MEMSET
>> +# undef MEMSET_ZVA
>> +# undef MEMSET_L
>> +# define MEMSET __memset_zva_default
>> +# define MEMSET_ZVA 1
>> +# define MEMSET_L(label) L(label ## _zvadef)
>> +# include <sysdeps/aarch64/memset.S>
>> +#endif
> 
> i think this is not ok, these should be in different files.
> 
> you put code that is never used together in the same object file.
> 
> (ideally the ifunc code for various cpus should be grouped
> together by cpu and on different pages for each cpu so you
> dont even load the code at runtime that is not for your cpu)

Yeah but these are not different CPU types.  Anyway, I don't have a
strong opinion on this, so I'll just split it up.  It'll get rid of the
macro you got confused about too.

>> diff --git a/sysdeps/aarch64/multiarch/rtld-memset.S b/sysdeps/aarch64/multiarch/rtld-memset.S
>> new file mode 100644
>> index 0000000..969cf14
>> --- /dev/null
>> +++ b/sysdeps/aarch64/multiarch/rtld-memset.S
>> @@ -0,0 +1,25 @@
>> +/* Memset for aarch64 for use within the dynamic linker.
>> +   Copyright (C) 2017 Free Software Foundation, Inc.
>> +
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library.  If not, see
>> +   <http://www.gnu.org/licenses/>.  */
>> +
>> +#if IS_IN (rtld)
>> +# define MEMSET memset
>> +# define INTERNAL_MEMSET
>> +# define MEMSET_L(label) L(label)
>> +# include <sysdeps/aarch64/memset.S>
>> +#endif
> 
> is this performance critical?
> (i don't know if it's only used for early dynlinker startup code
> or all libc internal memsets)
> 
> if not then we don't need it to be different variant, it could be
> just a copy of the _nozva/_defaultzva case

This is used only in rtld, the libc internal one is __memset_generic.
It is not really performance critical, but given that I'll be including
one or the other anyway, I chose to include memset.S instead of
memset_nozva.S.  I can do the latter.

> from this code it's not clear to me that cpu_features->zva_size
> is always initialized.

cpu_features is static, so it is always initialized to zero.

Thanks, I'll post an updated version.

Siddhesh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]