Maybe we should get rid of ifuncs

Michael Meissner meissner@linux.ibm.com
Thu May 2 02:59:37 GMT 2024


On Sat, Apr 27, 2024 at 07:24:05PM -0500, Peter Bergner wrote:
> On 4/24/24 9:43 AM, Zack Weinberg wrote:
> > I'm very curious what the plan for function multiversioning in GCC
> > and LLVM is, and how close to declarative it gets.
> 
> GCC (at least on powerpc) already supports it via the target_clones
> attribute.  See gcc/testsuite/gcc.target/powerpc/clone*.c for examples.
> Basically, it looks like (from clone3.c):
> 
> __attribute__((target_clones("cpu=power10,cpu=power9,default")))
> long mod_func (long a, long b)
> {
>   return (a % b) + s;
> }
> 
> long mod_func_or (long a, long b, long c)
> {
>   return mod_func (a, b) | c;
> }
> 
> 
> Mike knows how this works better than I, but GCC automatically emits an
> ifunc resolver for the different clones and looks to use the HWCAP*
> architecture mask associated with the cpu we're compiling for.
> The "default" function being called in the case our ifunc resolver
> doesn't match any of the HWCAP* masks from the cpus we're compiling
> for.

Sorry, I've been in and out of the hospital with my wife.

> Mike, it seems like this is more of a "cpu" clone and not a true HWCAP
> test, so this specific thing doesn't (at least currently) work for
> something like __attribute__((target_clones("vsx,mma,default"))) ?
> Or did I misread the code?

There are 3 things GCC provides:

1) The ability to write an ifunc function.  Any call to func is always
indirect.  The loader calls the resolver when the program or shared library is
loaded to get the address of the function to use:

	extern int func_power10 (void);
	extern int func_power9 (void);
	extern int func_default (void);

	int func (void) __attribute__ ((__ifunc__ ("resolver")));

	void *
	resolver (void)
	{
	  if (__builtin_cpu_supports ("arch_3_1"))
	    return (void *) func_power10;

	  else if (__builtin_cpu_supports ("arch_3_00"))
	    return (void *) func_power9;

	  else
	    return (void *) func_default;
	}
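
The snippet above only declares the three implementations as extern.  For
experimenting on any machine, the same selection logic can be approximated
with an ordinary function pointer that is filled in on first use.  This is
only a portable sketch and not the ifunc mechanism itself (the loader is not
involved).  The stub bodies, their return values, and the names select_impl
and func_dispatch are assumptions made for illustration.  On non-PowerPC
builds the __builtin_cpu_supports checks are compiled out and the default is
always chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Stub implementations.  A real program would build each one with
   different target options; the return values are placeholders.  */
int func_power10 (void) { return 10; }
int func_power9 (void)  { return 9; }
int func_default (void) { return 0; }

typedef int (*func_ptr) (void);

/* Same checks as the resolver above, but usable on any target.  */
func_ptr
select_impl (void)
{
#if defined (__GNUC__) && (defined (__powerpc__) || defined (__powerpc64__))
  if (__builtin_cpu_supports ("arch_3_1"))
    return func_power10;

  if (__builtin_cpu_supports ("arch_3_00"))
    return func_power9;
#endif

  return func_default;
}

/* Resolve once, on the first call, then always call through the pointer.  */
int
func_dispatch (void)
{
  static func_ptr selected = NULL;

  if (selected == NULL)
    selected = select_impl ();

  assert (selected != NULL);
  return selected ();
}
```

With a real ifunc the pointer lookup is done by the dynamic loader rather than
by a check inside the function, so callers pay only for the indirect call.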

2) The ability to change the target defaults for a particular function:

	int func_power10 (void) __attribute__((__target__("cpu=power10")));

	int func_power10 (void)
	{
	  // this function will be compiled for power10
	}

GCC allows the name inside __attribute__ to be written with or without 2
leading and 2 trailing underscores.  I prefer to always use the underscores
just in case the user defined a 'target' macro (i.e. the stuff within
attributes is subject to macro replacement).
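
A small sketch of the macro problem follows.  It uses the noinline attribute
rather than target so that it compiles on any architecture; the macro
definition and the add_one function are hypothetical.

```c
/* Hypothetical user macro that happens to share a name with an attribute.  */
#define noinline 0

/* Writing __attribute__ ((noinline)) here would be preprocessed into
   __attribute__ ((0)), which is not a valid attribute.  The underscore
   spelling is not touched by the macro.  */
__attribute__ ((__noinline__)) int
add_one (int x)
{
  return x + 1;
}
```

The same reasoning applies to __target__, __ifunc__ and __target_clones__.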

An alternative is to use #pragmas to change the defaults for a bit:

	#pragma GCC push_options
	#pragma GCC target ("cpu=power10")

	int func_power10 (void)
	{
	  // compiled with power10 options
	}

	#pragma GCC target ("cpu=power9")

	int func_power9 (void)
	{
	  // compiled with power9 options
	}

	#pragma GCC pop_options

	int func_default (void)
	{
	  // compiled with the default options
	}


3) The ability to use target clones, where the compiler constructs the ifunc
function, and recompiles the function multiple times with different target
defaults.

	extern int func (void)
	  __attribute__((__target_clones__("cpu=power10,cpu=power9,default")));

	int func (void) {
	  // 3 versions of func are compiled along with an ifunc resolver.
	}

Note, 'default' must always be listed in the target clones.  You can only
specify one option (i.e. you can't do something like compile -mcpu=power9 and
-mtune=power10 into one option).  So in practice, only -mcpu=<xxx> options are
useful.

If we need finer-grained support, we could have -mcpu options that add or
subtract individual options.

The automatic ifunc resolver only looks at hwcap/hwcap2 bits, and it sorts the
clones so that it checks for power10 first, etc.  At present, we have target
clone support for:

	power6
	power7
	power8
	power9
	power10

Note since there is no real hwcap bit for power11, with my current patches for
power11, if you do:

	extern int func (void)
	  __attribute__((__target_clones__("cpu=power11,cpu=power10,cpu=power9,default")));

it will compile both power11 and power10 clones, but the resolver will only
call the power10 clone because we don't have a separate hwcap bit for power11
(that I know of).  If we do have a separate hwcap bit, it is easy to add
support for power11.
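
The selection order and the power11 case described above can be modelled in
plain C.  This is a sketch of the behaviour only, not the code GCC generates.
The CAP_* values are placeholder bits and NOT the real hwcap/hwcap2 values,
and the names clone_id, clones and pick_clone are made up for illustration.

```c
#include <stddef.h>

/* Placeholder capability bits (assumed values, not the kernel's).  */
#define CAP_POWER6   0x01u
#define CAP_POWER7   0x02u
#define CAP_POWER8   0x04u
#define CAP_POWER9   0x08u
#define CAP_POWER10  0x10u

enum clone_id
{
  CLONE_DEFAULT,
  CLONE_POWER6,
  CLONE_POWER7,
  CLONE_POWER8,
  CLONE_POWER9,
  CLONE_POWER10,
  CLONE_POWER11
};

struct clone
{
  enum clone_id id;
  unsigned int cap;	/* bit to test; 0 means there is no bit to test */
};

/* Newest first, mirroring the order in which the resolver checks.  */
static const struct clone clones[] =
{
  { CLONE_POWER11, 0 },		/* compiled, but never selected */
  { CLONE_POWER10, CAP_POWER10 },
  { CLONE_POWER9,  CAP_POWER9 },
  { CLONE_POWER8,  CAP_POWER8 },
  { CLONE_POWER7,  CAP_POWER7 },
  { CLONE_POWER6,  CAP_POWER6 },
};

/* Return the first clone whose bit is set in CAPS, else the default.  */
enum clone_id
pick_clone (unsigned int caps)
{
  size_t i;

  for (i = 0; i < sizeof (clones) / sizeof (clones[0]); i++)
    if (clones[i].cap != 0 && (caps & clones[i].cap) != 0)
      return clones[i].id;

  return CLONE_DEFAULT;
}
```

In this model a machine that reports the power10 bit always gets the power10
clone, whether or not a power11 clone was compiled.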

Now one thing that I thought had been done, but that no longer appears to be
done, is updating the predefined macros tested by #ifdefs (i.e. _ARCH_PWR10,
etc.) for each clone: they aren't changed when compiling the target clone.
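
As a sketch of what that would mean in practice, assuming the macros really
are left at the values set by the command line: a function body like the one
below would take the same preprocessor branch in every clone, chosen by the
-mcpu option given for the whole file.  The function name arch_level and its
return values are made up, and the target_clones attribute is left out so the
sketch builds on any architecture.

```c
/* In the scenario described, this function would also carry
   __attribute__ ((__target_clones__ ("cpu=power10,cpu=power9,default"))).  */
int
arch_level (void)
{
#if defined (_ARCH_PWR10)
  return 10;	/* file compiled with -mcpu=power10 or newer */
#elif defined (_ARCH_PWR9)
  return 9;	/* file compiled with -mcpu=power9 */
#else
  return 0;	/* anything else, including non-PowerPC targets */
#endif
}
```

Code that needs a different body per cpu would then have to use separate
functions with the target attribute or pragma instead of #ifdefs inside one
cloned function.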

> 
> I'll note I'm pretty sure we (IBM/powerpc) have added ifunc usage to
> OpenBLAS and some other libraries outside of glibc.
> 
> 
> Peter
> 
> 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

