[RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines

Christoph Muellner christoph.muellner@vrull.eu
Tue Feb 7 00:15:59 GMT 2023


From: Christoph Müllner <christoph.muellner@vrull.eu>

This RFC series introduces ifunc support for RISC-V and adds
optimized routines of memset(), memcpy()/memmove(), strlen(),
strcmp(), strncmp(), and cpu_relax().

The ifunc mechanism decides based on the following hart features:
- Available extensions
- Cache block size
- Fast unaligned accesses

Since there is currently no interface to get this information from
the kernel, this series reads it from environment variables instead.
For that reason the series should not be considered for upstream
inclusion and is explicitly tagged as RFC.

The environment variables are:
- RISCV_RT_MARCH (e.g. "rv64gc_zicboz")
- RISCV_RT_CBOZ_BLOCKSIZE (e.g. "64")
- RISCV_RT_CBOM_BLOCKSIZE (e.g. "64")
- RISCV_RT_FAST_UNALIGNED (e.g. "1")
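
To illustrate how these four variables could be consumed, here is a
minimal user-space sketch. It is not the code from the series: the
struct and function names (hart_features_sketch, env_to_ulong,
read_hart_features) are made up for this example, and it uses
getenv()/strtoul(), which may not be usable as early during startup
as the real lookup has to run.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container for the parsed values; the structure used in
   the series (hart-features.h) may look different.  */
struct hart_features_sketch
{
  const char *march;            /* RISCV_RT_MARCH, e.g. "rv64gc_zicboz" */
  unsigned long cboz_blocksize; /* RISCV_RT_CBOZ_BLOCKSIZE, 0 if unknown */
  unsigned long cbom_blocksize; /* RISCV_RT_CBOM_BLOCKSIZE, 0 if unknown */
  bool fast_unaligned;          /* RISCV_RT_FAST_UNALIGNED != 0 */
};

/* Parse a decimal environment variable.  Returns 0 if the variable is
   unset, empty, or not a plain decimal number.  */
static unsigned long
env_to_ulong (const char *name)
{
  const char *s = getenv (name);
  if (s == NULL || *s == '\0')
    return 0;

  char *end;
  unsigned long v = strtoul (s, &end, 10);
  return *end == '\0' ? v : 0;
}

/* Fill HF from the environment; unset variables yield conservative
   defaults (no ISA string, unknown block sizes, no fast unaligned).  */
static void
read_hart_features (struct hart_features_sketch *hf)
{
  hf->march = getenv ("RISCV_RT_MARCH");
  hf->cboz_blocksize = env_to_ulong ("RISCV_RT_CBOZ_BLOCKSIZE");
  hf->cbom_blocksize = env_to_ulong ("RISCV_RT_CBOM_BLOCKSIZE");
  hf->fast_unaligned = env_to_ulong ("RISCV_RT_FAST_UNALIGNED") != 0;
}
```

Treating an unset or malformed variable as "feature not available" is
an assumption of this sketch; it keeps the generic routines selected
whenever the environment gives no usable information.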

The environment variables are looked up and parsed early during
startup, where other architectures query similar properties from
the kernel or the CPU.
The ifunc implementation can use test macros to select a matching
implementation (e.g. HAVE_RV(zbb) or HAVE_FAST_UNALIGNED()).
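
The selection step can be sketched roughly as follows. This is only
an illustration under stated assumptions: a plain function pointer
stands in for a real ifunc resolver, march_has_ext() is a hypothetical
stand-in for what a macro like HAVE_RV(zbb) would test, and the
*_sketch functions are placeholders rather than the routines from the
series. The ISA string handling only looks at multi-letter extensions
separated by underscores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the multi-letter extension EXT appears as a complete
   '_'-separated token in MARCH (e.g. "zbb" in "rv64gc_zba_zbb").
   The base token before the first '_' (e.g. "rv64gc") is skipped.  */
static bool
march_has_ext (const char *march, const char *ext)
{
  if (march == NULL)
    return false;

  size_t ext_len = strlen (ext);
  const char *p = strchr (march, '_');
  while (p != NULL)
    {
      p++; /* Skip the '_' separator.  */
      const char *end = strchr (p, '_');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);
      if (len == ext_len && strncmp (p, ext, len) == 0)
        return true;
      p = end;
    }
  return false;
}

typedef size_t (*strlen_fn) (const char *);

/* Placeholder for the generic C implementation.  */
static size_t
strlen_generic_sketch (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* Placeholder for a Zbb-accelerated routine; it simply forwards to the
   generic code so that the sketch runs on any host.  */
static size_t
strlen_zbb_sketch (const char *s)
{
  return strlen_generic_sketch (s);
}

/* Resolver-style selection: pick the Zbb variant only if the ISA
   string advertises it, otherwise fall back to the generic one.  */
static strlen_fn
select_strlen (const char *march)
{
  return march_has_ext (march, "zbb") ? strlen_zbb_sketch
                                      : strlen_generic_sketch;
}
```

With this sketch, select_strlen ("rv64gc_zbb") yields the Zbb
placeholder, while "rv64gc_zicboz" or a missing ISA string yields the
generic one.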

The following optimized routines exist:
- memset
- memcpy/memmove
- strlen
- strcmp
- strncmp
- cpu_relax

The following optimizations have been applied:
- Aggressive loop unrolling
- Zbb's orc.b instruction
- Zbb's ctz instruction
- Zicboz/Zic64b: ability to clear a cache block in memory
- Fast unaligned accesses (while keeping exception guarantees intact)
- Fast overlapping accesses
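
To show why orc.b and ctz are useful for string routines, here is a
portable C sketch of locating the first NUL byte in a 64-bit word.
It is not taken from the assembly in this series: orc_b() emulates
the Zbb instruction in plain C, __builtin_ctzll() is the GCC/Clang
builtin standing in for ctz, and first_nul_index() is a name invented
for this example. Byte index 0 is the least significant byte, which
matches memory order on a little-endian hart.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulation of Zbb's orc.b: every byte of the result is 0xff if the
   corresponding input byte has any bit set, and 0x00 otherwise.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Return the index (0..7) of the first zero byte in WORD, counting
   from the least significant byte, or 8 if WORD has no zero byte.  */
static size_t
first_nul_index (uint64_t word)
{
  /* After inversion, exactly the zero bytes of WORD are 0xff.  */
  uint64_t nul_mask = ~orc_b (word);
  if (nul_mask == 0)
    return 8;

  /* The count of trailing zero bits is a multiple of 8 here, so
     dividing by 8 gives the byte index.  */
  return (size_t) __builtin_ctzll (nul_mask) / 8;
}
```

A word-at-a-time strlen() or strcmp() can apply this check to each
loaded word and only needs the ctz step once, on the word that
contains the terminator.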

The series was developed more than a year ago and has been tested as
part of a vendor SDK since then. One of the areas where this patchset
was used is benchmarking (e.g. SPEC CPU2017).
The optimized string functions have also been tested with the
corresponding glibc tests.

The first patch of the series does not strictly belong to this series,
but was required to build and test SPEC CPU2017 benchmarks.

To build a cross-toolchain that includes these patches,
the riscv-gnu-toolchain or any other cross-toolchain
builder can be used.

Christoph Müllner (19):
  Inhibit early libcalls before ifunc support is ready
  riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol
  riscv: Add ENTRY_ALIGN() macro
  riscv: Add hart feature run-time detection framework
  riscv: Introduction of ISA extensions
  riscv: Adding ISA string parser for environment variables
  riscv: hart-features: Add fast_unaligned property
  riscv: Add (empty) ifunc framework
  riscv: Add ifunc support for memset
  riscv: Add accelerated memset routines for RV64
  riscv: Add ifunc support for memcpy/memmove
  riscv: Add accelerated memcpy/memmove routines for RV64
  riscv: Add ifunc support for strlen
  riscv: Add accelerated strlen routine
  riscv: Add ifunc support for strcmp
  riscv: Add accelerated strcmp routines
  riscv: Add ifunc support for strncmp
  riscv: Add an optimized strncmp routine
  riscv: Add __riscv_cpu_relax() to allow yielding in busy loops

 csu/libc-start.c                              |   1 +
 elf/dl-support.c                              |   1 +
 sysdeps/riscv/dl-machine.h                    |  13 +
 sysdeps/riscv/ldsodefs.h                      |   1 +
 sysdeps/riscv/multiarch/Makefile              |  24 +
 sysdeps/riscv/multiarch/cpu_relax.c           |  36 ++
 sysdeps/riscv/multiarch/cpu_relax_impl.S      |  40 ++
 sysdeps/riscv/multiarch/ifunc-impl-list.c     |  70 +++
 sysdeps/riscv/multiarch/init-arch.h           |  24 +
 sysdeps/riscv/multiarch/memcpy.c              |  49 ++
 sysdeps/riscv/multiarch/memcpy_generic.c      |  32 ++
 .../riscv/multiarch/memcpy_rv64_unaligned.S   | 475 ++++++++++++++++++
 sysdeps/riscv/multiarch/memmove.c             |  49 ++
 sysdeps/riscv/multiarch/memmove_generic.c     |  32 ++
 sysdeps/riscv/multiarch/memset.c              |  52 ++
 sysdeps/riscv/multiarch/memset_generic.c      |  32 ++
 .../riscv/multiarch/memset_rv64_unaligned.S   |  31 ++
 .../multiarch/memset_rv64_unaligned_cboz64.S  | 217 ++++++++
 sysdeps/riscv/multiarch/strcmp.c              |  47 ++
 sysdeps/riscv/multiarch/strcmp_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strcmp_zbb.S          | 104 ++++
 .../riscv/multiarch/strcmp_zbb_unaligned.S    | 213 ++++++++
 sysdeps/riscv/multiarch/strlen.c              |  44 ++
 sysdeps/riscv/multiarch/strlen_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strlen_zbb.S          | 105 ++++
 sysdeps/riscv/multiarch/strncmp.c             |  44 ++
 sysdeps/riscv/multiarch/strncmp_generic.c     |  32 ++
 sysdeps/riscv/multiarch/strncmp_zbb.S         | 119 +++++
 sysdeps/riscv/sys/asm.h                       |  14 +-
 .../unix/sysv/linux/riscv/atomic-machine.h    |   3 +
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c   |  62 +++
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h   |  46 ++
 sysdeps/unix/sysv/linux/riscv/hart-features.c | 356 +++++++++++++
 sysdeps/unix/sysv/linux/riscv/hart-features.h |  58 +++
 .../unix/sysv/linux/riscv/isa-extensions.def  |  72 +++
 sysdeps/unix/sysv/linux/riscv/libc-start.c    |  29 ++
 .../unix/sysv/linux/riscv/macro-for-each.h    |  24 +
 37 files changed, 2610 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/riscv/multiarch/Makefile
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax.c
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax_impl.S
 create mode 100644 sysdeps/riscv/multiarch/ifunc-impl-list.c
 create mode 100644 sysdeps/riscv/multiarch/init-arch.h
 create mode 100644 sysdeps/riscv/multiarch/memcpy.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memmove.c
 create mode 100644 sysdeps/riscv/multiarch/memmove_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset.c
 create mode 100644 sysdeps/riscv/multiarch/memset_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned_cboz64.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/strlen.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strncmp.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/isa-extensions.def
 create mode 100644 sysdeps/unix/sysv/linux/riscv/libc-start.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/macro-for-each.h

-- 
2.39.1


