[PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

Thu Jul 9 13:46:02 GMT 2020

Hi Richard,

Thanks for the review,

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, July 9, 2020 1:35 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
> >
> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
> > long.  While this was enough in the past with the increase in architecture
> extensions it is
> > no longer enough.   It results in two bugs:
> >
> > 1) No option string longer than 127 characters is correctly parsed.  Features
> >    that are supported are silently ignored.
> >
> > 2) It incorrectly enables features that are not present on the machine:
> >   a) It checks for substring matching instead of full word matching.  This
> makes
> >      it incorrectly detect sb support when ssbs is provided instead.
> >   b) Due to the truncation at the 127 char border it also incorrectly enables
> >      features due to the full feature being cut off and the part that is left
> >      accidentally enables something else.
> >
> > This breaks -mcpu=native detection on some of our newer system.
> >
> > The patch fixes these issues by reading full lines up to the \n in a string.
> > This gives us the full feature line.  Secondly it creates a set from
> > this string
> > to:
> >
> >  1) Reduce matching complexity from O(n*m) to O(n*logm).
> >  2) Perform whole word matching instead of substring matching.
> >
> > To make this code somewhat cleaner I also changed from using char* to
> > using std::string and std::set.
> >
> > Note that I have intentionally avoided the use of ifstream and
> > stringstream to make it easier to backport.  I have also not change
> > the substring matching for the initial line classification as I cannot
> > find a documented cpuinfo format which leads me to believe there may
> > be kernels out there that require this which may be why the original code
> does this.
> >
> > I also do not want this to break if the kernel adds a new line that is
> > long and indents the file by two tabs to keep everything aligned.  In
> > short I think an imprecise match is the right thing here.
> >
> > Test for this is added as the last thing in this series as it requires
> > some changes to be made to be able to test this.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Sorry to be awkward.  I know Kyrill's already approved this, but I kind-of
> object.
> 
> I think we should split this into two: fixing the limit bug, and fixing the
> complexity.  It's easier to justify backporting the fix for the limit bug than the
> fix for the complexity.

It was never the intention to fix the complexity. The change in complexity is just
simply because I split the feature strings into actual words now. I do so because
in my opinion this is simpler than doing memcmp and checking the the character
after the one you matched is a space or a null character, and then checking that you are
one character away from the previous space or at the start of the line.

Do you know of a simpler way to do word matches in C?

> 
> For fixing the limit bug, it seems better to use an obstack to read the file
> instead.  This should actually be much easier than what the patch does. :-)  It
> should also be more efficient.
> 

Sorry.. I genuinely don't understand. I had looked at obstack and the interface seemed to be
more work to me.  I am probably wrong since this interface has zero documentation
but it looks like to use obstack I have to (based on guessing from what other code are doing)

1. call gcc_obstack_init
2. call obstack_alloc
3. call obstack_grow any time I need a bigger buffer.
4. call obstack_free

So I am missing why this is simpler than calling realloc... 
But obviously I am missing something.

> For fixing the complexity: does this actually make things noticeably faster in
> practice?  The “m” in the O(n*m) is a fairly small constant.
> The cost of making it O(n*logm) this way involves many more memory
> allocations, so it isn't obvious that it's actually more efficient in practice.  Do
> you have any timings?

Well, no.. I didn't run any benchmarks. Did you have any in mind that you wanted me to run?

As I mentioned, the complexity change was just a byproduct of doing a much easier to see correct
word matching.  But if you want me to just add extra checks to the existing implementation to see
if it meets all the criteria of being a standalone word then sure I can do that.  I don't particularly find
that implementation easier to read and I would have thought correctness was to be favored over memory
allocation in something that is called once.

> 
> If search time is a problem, building a hash_map might be better.

I would have thought the red-black tree of a set is fine.. But if I'm only allowed to use GCC internal structures then
I will change it...

Thanks,
Tamar

> 
> Thanks,
> Richard