Bug 12587 - sysconf(_SC_*CACHE) returns 0 for all caches on some CPUs.
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.13
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
Reported: 2011-03-15 18:48 UTC by John Haxby
Modified: 2014-06-27 13:40 UTC
fweimer: security-


Attachments
A patch (848 bytes, patch)
2011-03-22 11:36 UTC, H.J. Lu
A new patch (1.29 KB, patch)
2011-03-25 12:32 UTC, John Haxby

Description John Haxby 2011-03-15 18:48:58 UTC
I have a machine for which /proc/cpuinfo shows this:

  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 44
  model name      : Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
  stepping        : 2

and "getconf -a | grep CACHE" shows zero for all the matching entries.

For what it's worth, __cpuid(2, eax, ebx, ecx, edx) puts 0x5b0b101,
0x5657f0, 0x0 and 0x2cb43049 in eax, ebx, ecx and edx respectively and
these do not correspond to any entries in intel_02_known[] in
sysdeps/x86_64/cacheinfo.c or sysdeps/i386/i686/cacheinfo.c.

The values in /sys/devices/system/cpu/cpuN/cache/index*/* seem to be
correct for this machine.
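
For reference, the numbers that "getconf -a | grep CACHE" prints can also be read directly through sysconf(). The following is a minimal sketch, assuming a libc that provides the _SC_LEVEL*_CACHE_* names (they are extensions, not POSIX). The struct and function names are made up for illustration and do not come from glibc.

```c
#include <stdio.h>
#include <unistd.h>

/* One row of the report: a printable label and the sysconf() name. */
struct cache_query {
    const char *label;
    int name;               /* _SC_* constant passed to sysconf() */
};

/* A subset of the cache parameters; the _SC_LEVEL* names are libc
   extensions rather than POSIX. */
static const struct cache_query cache_queries[] = {
    { "LEVEL1_ICACHE_SIZE",     _SC_LEVEL1_ICACHE_SIZE },
    { "LEVEL1_DCACHE_SIZE",     _SC_LEVEL1_DCACHE_SIZE },
    { "LEVEL1_DCACHE_LINESIZE", _SC_LEVEL1_DCACHE_LINESIZE },
    { "LEVEL2_CACHE_SIZE",      _SC_LEVEL2_CACHE_SIZE },
    { "LEVEL3_CACHE_SIZE",      _SC_LEVEL3_CACHE_SIZE },
};

/* Print each parameter and return how many came back as 0 or -1,
   i.e. how many the library could not determine. */
int report_cache_params(void)
{
    int unknown = 0;
    size_t n = sizeof cache_queries / sizeof cache_queries[0];

    for (size_t i = 0; i < n; i++) {
        long value = sysconf(cache_queries[i].name);
        printf("%-24s %ld\n", cache_queries[i].label, value);
        if (value <= 0)
            unknown++;
    }
    return unknown;
}
```

On a machine showing the behaviour reported here, every row would print 0 and the function would return the number of rows.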
Comment 1 John Haxby 2011-03-16 12:00:21 UTC
My apologies: those CPUID results are for a different CPU, the actual problem ones are 0x55035a01 0xf0b2ff 0x0 0xca0000.
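
To make the register values above easier to read, here is a small sketch that splits CPUID leaf 2 output into its one-byte descriptors and checks for the 0xff descriptor. It assumes the usual leaf 2 layout (low byte of eax is not a descriptor, bit 31 of a register marks it as invalid). The function names are hypothetical and this is not the code in cacheinfo.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the non-zero descriptor bytes from CPUID leaf 2 output.
 * regs[] holds eax, ebx, ecx, edx in that order.  Returns the number of
 * descriptors written to out[], which must have room for 15 bytes.
 */
size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;

    for (int r = 0; r < 4; r++) {
        /* Bit 31 set means the register holds no valid descriptors. */
        if (regs[r] & 0x80000000u)
            continue;
        for (int b = 0; b < 4; b++) {
            /* The low byte of eax is not a descriptor. */
            if (r == 0 && b == 0)
                continue;
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0)
                out[n++] = d;
        }
    }
    return n;
}

/* Returns 1 if descriptor 0xff is present, i.e. leaf 2 carries no cache
   descriptors and the cache parameters have to come from leaf 4. */
int leaf2_defers_to_leaf4(const uint32_t regs[4])
{
    uint8_t desc[15];
    size_t n = leaf2_descriptors(regs, desc);

    for (size_t i = 0; i < n; i++)
        if (desc[i] == 0xff)
            return 1;
    return 0;
}
```

Fed with the corrected values (0x55035a01, 0xf0b2ff, 0x0, 0xca0000), the low byte of ebx is 0xff, so leaf2_defers_to_leaf4() returns 1. The values from the original description contain no 0xff byte.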
Comment 2 Ulrich Drepper 2011-03-20 12:15:41 UTC
I've implemented something without the ability to test it.  Check it and reopen if necessary.
Comment 3 H.J. Lu 2011-03-22 11:34:42 UTC
The fix doesn't work since the loop may not terminate.  It goes into an infinite
loop on Xeon X5670.  Also we should use __cpuid_count like the rest of
sysdeps/x86_64/cacheinfo.c.
Comment 4 H.J. Lu 2011-03-22 11:36:12 UTC
Created attachment 5322
A patch
Comment 5 Ulrich Drepper 2011-03-22 15:49:03 UTC
I've checked in a simpler patch.  The only problem was the missing increment.
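
The non-terminating loop discussed in comments 3 and 5 can be illustrated with a sketch of a leaf 4 enumeration in which the subleaf index is advanced on every pass and the loop is also bounded. This is not the checked-in glibc code. The callback type, the bound of 32 and the fake query function are assumptions made so that the loop can be exercised on any machine. On x86 with GCC, the callback would typically wrap __cpuid_count(4, subleaf, eax, ebx, ecx, edx) from <cpuid.h>.

```c
#include <stdint.h>

/* Callback that fills regs[] (eax, ebx, ecx, edx) for one leaf 4 subleaf. */
typedef void (*leaf4_query_fn)(uint32_t subleaf, uint32_t regs[4]);

/* Arbitrary safety bound; an assumption of this sketch. */
#define MAX_LEAF4_SUBLEAVES 32

/*
 * Count the caches reported by leaf 4.  Returns the count, or -1 if the
 * bound was reached without seeing a terminating "cache type 0" entry.
 */
int count_leaf4_caches(leaf4_query_fn query)
{
    /* The subleaf++ in the loop header is the increment whose absence
       would make the loop query subleaf 0 forever. */
    for (uint32_t subleaf = 0; subleaf < MAX_LEAF4_SUBLEAVES; subleaf++) {
        uint32_t regs[4] = { 0, 0, 0, 0 };

        query(subleaf, regs);
        /* eax bits 4:0 hold the cache type; 0 means no more caches. */
        if ((regs[0] & 0x1f) == 0)
            return (int)subleaf;
    }
    return -1;
}

/* Stand-in for the real instruction: reports fake_cache_count unified
   caches (type 3) followed by null entries.  Purely illustrative. */
uint32_t fake_cache_count = 4;

void fake_leaf4(uint32_t subleaf, uint32_t regs[4])
{
    regs[0] = subleaf < fake_cache_count ? 3u : 0u;
    regs[1] = regs[2] = regs[3] = 0;
}
```

With the default fake_cache_count of 4, count_leaf4_caches(fake_leaf4) returns 4. A query that never reports a null entry makes it return -1 instead of spinning.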
Comment 6 John Haxby 2011-03-25 12:31:06 UTC
I moved the cpuid4 code to handle_intel() so that it is used whenever the cpuid4 is available.

This seems to be a better place to put the test because it avoids a cpuid2 which is not going to be useful and it also works on a greater range of hardware which makes it a bit easier to test (the Xeon X5670 I have access to is on a different continent).

I also replaced the asm with __cpuid_count: the previous version didn't use the macro and so didn't take into account the "#if defined(__i386__) && defined(__PIC__)" that the macro definition uses.
Comment 7 John Haxby 2011-03-25 12:32:42 UTC
Created attachment 5324
A new patch

Use cpuid4 in preference to cpuid2 when possible.
Comment 8 John Haxby 2011-03-30 15:30:06 UTC
Any comments on the updated patch, Ulrich?   I've had it working on  a Xeon 5670:

# getconf -a | grep -i cache
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                4
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 32768
LEVEL1_DCACHE_ASSOC                8
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  262144
LEVEL2_CACHE_ASSOC                 8
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  12582912
LEVEL3_CACHE_ASSOC                 16
LEVEL3_CACHE_LINESIZE              64
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

and I've also carefully tested the cpuid4 vs cpuid2 thing on a variety of machines (including checking that the cpuid2 code still works).
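
The sizes in the getconf output above follow from the geometry fields that CPUID leaf 4 returns: size = ways * partitions * line size * sets, with each field stored minus one. The sketch below decodes those fields from ebx and ecx. The struct and function names are hypothetical. The register values mentioned in the note after the code are constructed from the getconf output assuming one partition per cache; they were not captured from the hardware.

```c
#include <stdint.h>

/* Geometry of one cache as described by a CPUID leaf 4 subleaf. */
struct leaf4_geometry {
    uint32_t line_size;     /* bytes per cache line */
    uint32_t partitions;    /* physical line partitions */
    uint32_t ways;          /* ways of associativity */
    uint32_t sets;          /* number of sets */
};

/* Decode ebx and ecx of a leaf 4 subleaf; every field is stored minus one. */
struct leaf4_geometry leaf4_decode(uint32_t ebx, uint32_t ecx)
{
    struct leaf4_geometry g;

    g.line_size  = (ebx & 0xfff) + 1;           /* ebx bits 11:0  */
    g.partitions = ((ebx >> 12) & 0x3ff) + 1;   /* ebx bits 21:12 */
    g.ways       = ((ebx >> 22) & 0x3ff) + 1;   /* ebx bits 31:22 */
    g.sets       = ecx + 1;
    return g;
}

/* Total cache size in bytes. */
uint64_t leaf4_cache_size(uint32_t ebx, uint32_t ecx)
{
    struct leaf4_geometry g = leaf4_decode(ebx, ecx);

    return (uint64_t)g.ways * g.partitions * g.line_size * g.sets;
}
```

For example, 8 ways, 1 partition, 64-byte lines and 64 sets give 8 * 1 * 64 * 64 = 32768, matching LEVEL1_DCACHE_SIZE, and 16 ways with 12288 sets give 16 * 1 * 64 * 12288 = 12582912, matching LEVEL3_CACHE_SIZE.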
Comment 9 John Haxby 2011-03-30 15:32:23 UTC
I think I forgot to mention: using cpuid4 will mean an end to updates to the intel_02_known table: any new processors support cpuid4 so even if there are cache descriptors for cpuid2 for them, you won't need to add them.
Comment 10 Witold Baryluk 2011-03-30 23:59:32 UTC
For your information, this error is the same error I reported in Debian some time ago:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609389

Debian eglibc 2.13-0exp5, which includes Ulrich's patch, fixed problem in my case.

Thanks.
Comment 11 John Haxby 2011-03-31 09:18:08 UTC
(In reply to comment #10)
> For your information, this error is the same error I reported in Debian some
> time ago:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=609389
> 
> Debian eglibc 2.13-0exp5, which includes Ulrich's patch, fixed problem in my
> case.

This is highly surprising.  That bug is for a machine whose cpuid level is 2 so the cpuid 4 instruction that is in both Ulrich's original patch and my variation on it would not be used.  If Ulrich's patch fixed this then you must be using it on some different processor.

That bug would appear to be as a result of missing entries in the intel_02_known table which would have been added for that processor quite some time ago.

The code re-ordering in my variation of Ulrich's fix has the advantage that the intel_02_known table need never be updated again (assuming that it covers all the old CPUs that have cpuid level < 4).  I know that some recent processors with 24-way associative caches needed an update to this table, but those same processors would also have a cpuid level > 4 and so had we been using cpuid 4 deterministic cache parameters then the table would not have needed to be extended for them.  For that old Pentium M processor, though, none of the patches referred to here would have made any difference at all.  That's right, isn't it, Ulrich?
Comment 12 Ulrich Drepper 2011-04-02 00:13:38 UTC
Intel's documentation doesn't say that leaf 4 is guaranteed to contain the cache information if the largest standard function number is >= 4.  It only says that, if the data is available, that leaf can be queried, and that if leaf 2 indicates that leaf 4 contains the data, it can be found there.

If you want the patch to be added you have to get Intel to clarify their documentation.
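
As read from comments 6 and 12, the two positions differ only in the condition that selects leaf 4. The sketch below puts the two conditions side by side. The enum and function names are hypothetical, and neither function is taken from glibc.

```c
#include <stdint.h>

enum cache_source { USE_LEAF2_TABLE, USE_LEAF4 };

/* Condition argued for in comment 12: use leaf 4 only when leaf 2
   explicitly defers to it through the 0xff descriptor. */
enum cache_source choose_when_leaf2_defers(uint32_t max_leaf, int leaf2_has_ff)
{
    return (max_leaf >= 4 && leaf2_has_ff) ? USE_LEAF4 : USE_LEAF2_TABLE;
}

/* Condition proposed in comment 6: use leaf 4 whenever the largest
   standard function number says it exists. */
enum cache_source choose_prefer_leaf4(uint32_t max_leaf)
{
    return max_leaf >= 4 ? USE_LEAF4 : USE_LEAF2_TABLE;
}
```

The two functions agree whenever leaf 2 carries the 0xff descriptor or the cpuid level is below 4. They differ only for a processor with cpuid level >= 4 whose leaf 2 still lists cache descriptors, which is the case the disagreement is about.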
Comment 13 John Haxby 2011-04-05 13:50:23 UTC
(In reply to comment #12)
> If you want the patch to be added you have to get Intel to clarify their
> documentation.

I have three references for you: the commit for the corresponding code in the linux kernel and two extracts from the "Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 2A".

The original commit for the kernel code is in the historic kernel repo (git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git):

-------------------------------------------------------------------------
commit 7b502b56175499c472103e1d99346d3b5de7d53f
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed Mar 30 16:34:11 2005 -0800

    [PATCH] x86, x86_64: reading deterministic cache parameters and exporting
            it in /sysfs
    
    The attached patch adds support for using cpuid(4) instead of cpuid(2), to
    get CPU cache information in a deterministic way for Intel CPUs, whenever
    supported.  The details of cpuid(4) can be found here
    
    IA-32 Intel Architecture Software Developer's Manual (vol 2a)
    (http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
    and
    Prescott New Instructions (PNI) Technology: Software Developer's Guide
    (http://www.intel.com/cd/ids/developer/asmo-na/eng/events/43988.htm)
    
    The advantage of using the cpuid(4) ('Deterministic Cache Parameters Leaf')
    are:
    
    - It provides more information than the descriptors provided by cpuid(2)
    
    - It is not table based as cpuid(2).  So, we will not need changes to the
      kernel to support new cache descriptors in the descriptor table (as is
      the case with cpuid(2)).
 
    The patch also adds a bunch of interfaces under
    /sys/devices/system/cpu/cpuX/cache, showing various information about the
    caches.  Most useful field being shared_cpu_map, which says what caches are
    shared among which logical cpus.
    
    The patch adds support for both i386 and x86-64.
    
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-------------------------------------------------------------------------

I realise that this is merely a corroborating precedent rather than a definitive statement even though it did originate from Intel a few years ago.

However, in my copy of the software developer's manual (Order Number: 253666-033US, December 2009) I find this under the description of the CPUID instruction.  First, on page 3-320, describing the result from "INPUT EAX = 2: TLB/Cache/Prefetch Information Returned in EAX, EBX, ECX, EDX":

-------------------------------------------------------------------------
Note also a processor may report a general descriptor type (FFH) and not
report any byte descriptor of “cache type” via CPUID leaf 2.
-------------------------------------------------------------------------

and under "INPUT EAX = 04H: Returns Deterministic Cache Parameters for Each Level":
-------------------------------------------------------------------------
Software can enumerate the deterministic cache parameters for each level
of the cache hierarchy starting with an index value of 0, until the
parameters report the value associated with the cache type field is 0.
-------------------------------------------------------------------------

Note that "deterministic" is used here to mean that the cache parameters are looked up directly rather than having to consult an external table (not parameters for a "deterministic cache").

This seems unambiguous: cpuid-4 enumerates each level of the cache hierarchy, not just some parts of it.  The 0xFF from cpuid-2 simply says that there is no cache information provided here (and indeed, the problematic Xeon 5670 that started this only has descriptors for TLBs).   The fact that the kernel has been using this scheme for six years now and that the original code was contributed by Intel corroborates this view.

H.J. Lu: can you respond if you disagree please?

Ulrich: in view, especially, of the second of the advantages cited by the commit above, can we have this in glibc please?
Comment 14 Ulrich Drepper 2011-04-06 11:30:03 UTC
None of that means at all that a maximum function number >= 4 implies that caches are guaranteed to be reported in leaf 4.  Why are you wasting my time if you have nothing to show but completely irrelevant data?  The current code works and unless Intel really screws up is guaranteed to work.
Comment 15 John Haxby 2011-04-06 12:20:32 UTC
(In reply to comment #14)
> None of that means at all that a maximum function number >= 4 implies that
> caches are guaranteed to be reported in leaf 4.  Why are you wasting my time if
> you have nothing to show but completely irrelevant data?  The current code
> works and unless Intel really screws up is guaranteed to work.

I'm sorry, Ulrich, but "Software can enumerate the deterministic cache parameters for each level of the cache hierarchy[...]" means exactly that; it does not mean that only some cache parameters are enumerated, now or in the future.  I know that you can spot ambiguity where I cannot (I have seen the evidence of that in the past), but I do not believe that you have seen something that the rest of us have missed this time.

It does mean, however, that some current or planned CPU that does not have its cache parameters described by the intel_02_known[] table will fail whereas use of cpuid-4 will continue to work.

However, if you would prefer to check the intel_02_known[] table for each new CPU that Intel produces, that is your choice, but personally I would prefer to fix this problem just once.
Comment 16 Ulrich Drepper 2011-04-09 15:10:35 UTC
There is nothing to fix here.  No further maintenance effort needed, nothing.  And your interpretation of what the manual says is far too optimistic.  Don't reopen this bug, the code works.
Comment 17 John Haxby 2011-04-09 15:23:50 UTC
(In reply to comment #8)
> [...]  I've had it working on  a Xeon
> 5670:
> 
> # getconf -a | grep -i cache
> LEVEL1_ICACHE_SIZE                 32768
> [...]

For other readers of this bug: this is the test result of the new patch; I did not test the one checked in at all.