Proposal: format GDB Python files with black

Tue May 11 11:38:18 GMT 2021

On 5/10/21 11:55 PM, Simon Marchi via Gdb-patches wrote:
> Changing the subject, so this gets more visibility.
> 
> On 2021-05-08 12:00 p.m., Tom Tromey wrote:
>>>>>>> "Simon" == Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
>>
>> Simon> A few people I talked to about this and I have good experience
>> Simon> with the tool black to auto-format Python code.
>>
>> Inspired by this, I took another look at clang-format.  If we could use
>> this, I think it would be a decent productivity improvement, because it
>> would mean contributors would no longer need to worry about the minutiae
>> of the GNU style.  We could also stop reviewing this.
> 
> I would love this.
> 

I'm fine with using a tool to autoformat things. I wouldn't mind getting 
used to a slightly different formatting scheme and I feel it would save 
me quite a bit of time by not having to bother with little formatting 
details.

>> My github has my current hacks, on the t/clang-format branch.  This
>> branch just has .clang-format, I didn't reformat the sources.  I'm using
>> clang-format 10.0.1.
>>
>> I managed to tweak the penalty values in the format to avoid the
>> bin-packing problem I was seeing previously.  I haven't extensively
>> tested this (I usually only reformat gdb_bfd.c and look at it, since it
>> exercised several bugs before...), so I recommend that others play with
>> it a bit.
> 
> I formatted the whole code base and checked a few files in Meld.  In
> general, I would say that it does more good (fixing little mistakes,
> finding split lines that could actually be one line) than harm.  I
> didn't see anything outrageously badly formatted (and if there is, we
> always have the option to disable formatting for a section of a file).
> 
> There are a lot of changes that are neutral, where the before and after
> would be equally good.
> 
> And of course, some changes where it does things not exactly like we
> would, like:
> 
>    gcc_type
>    compile_cplus_instance::convert_reference_base
>      (gcc_type base, enum gcc_cp_ref_qualifiers rquals)
> 
> which becomes:
> 
>    gcc_type
>    compile_cplus_instance::convert_reference_base (
>      gcc_type base, enum gcc_cp_ref_qualifiers rquals)
> 
> But really, I don't think these should be blockers.  It would take
> someone some really bad faith to say that the above changes how they
> read the code.
> 
> Is the following what you don't like?  In gdbsupport/common-debug.h,
> before:
> 
> 		debug_prefixed_printf (m_module, m_func, "%s: <%s debugging was not enabled on entry>",
> 				       m_end_prefix, m_module);
> 
> after:
> 
> 		debug_prefixed_printf (
> 		  m_module, m_func,
> 		  "%s: <%s debugging was not enabled on entry>", m_end_prefix,
> 		  m_module);
> 
> Note: the current line is too long, my bad.
> 
> I have only seen this when we make clang-format's life difficult by
> using long names or long string literals.  If we don't like how it
> formats some piece of code, we always have the choice of tweaking it to
> make its like easier.  It can be a good motivation to try to reduce
> indentation levels by splitting the code into functions, for example.
> 
> I'm surprised you didn't use "UseTab: Always", that gives pretty much
> our scheme of tabs + spaces for indentation.  By default it uses just
> spaces (I wouldn't mind using either, but since the goal is to get
> something close to our current style).
> 
> I would use "IndentPPDirectives: AfterHash".  Unlike what we use
> currently, it will indent with $IndentWidth columns, whereas we
> typically we indent these with a single column when we do it manually.
> But, I think I prefer 2 columns actually.  You can try formatting
> gdbsupport/gdb_locale.h for a good example.
> 
> Speaking of which, in order to be usable from gdbsupport and gdbserver,
> .clang-format would need to be in the top-level directory.  Or maybe it
> could be in gdbsupport, and gdb/gdbserver symlink to it.
> 
> We have a number of places that put the namespace brace on the same
> line:
> 
>    namespace foo {
> 
> We could keep that by using "BreakBeforeBraces: Custom".  But I don't
> think it's worth the trouble.  It adds one extra line for the opening
> brace, but it's typically not in a region of the file that you care much
> about.
> 
> I noticed this:
> 
>    btrace_block (CORE_ADDR begin, CORE_ADDR end) : begin (begin), end (end)
>    {
>      /* Nothing.  */
>    }
> 
> The initializer list is put on the same line as the constructor's
> prototype.  If there's a way to tell it that we don't want that, it
> would be closer to our style.
> 
> For AlignEscapedNewlines, I'd probably choose Right or DontAlign.  The
> reason being that if you modify the longest line of a macro (make that
> line longer or shorter), all the macros line will change as a result.
> So it will be harder to spot what changed when reading the hunk in the
> diff.  I have a slight preference for Right, because it puts the
> backslashes out of sight, thus improving readability.  It's the kind of
> thing that is quite annoying to maintain by hand, but it's not if a tool
> does it.
> 
> I would suggest setting SplitEmptyFunction, SplitEmptyRecord and
> SplitEmptyNamespace to false.  We already do it for empty methods.
> 
> Enough for now.
> 
>> I did see a few issues, though perhaps they are minor.
>>
>> 1. It removes all the Control-L page breaks from the source.
>>     I know these often confuse newcomers, so maybe it's fine.
>>     I feel that Someone out there will cheer this :)
> 
> Who dat?
> 
>>
>> 2. clang-format can't handle GNU-style goto label out-denting.
>>     There's a clang-format bug on this topic
>>     https://bugs.llvm.org/show_bug.cgi?id=24118
> 
> Probably not a big deal for us, as we don't use goto much?
> 
>>
>> 3. In gdb if we have a large 'if' condition that consists of expression
>>     chained with "&&" or "||", we often put each sub-condition on its own
>>     line.  clang-format packs them instead, and I didn't see a way to
>>     counteract this.
> 
> I hate too long conditions anyway.  If the condition spans too many
> lines or has deep parenthesis nesting, it would deserve to be refactored
> anyway.  Like, put that code in a function and just have:
> 
>    if (my_new_function (...))
> 
> With a descriptive name for the function, the code ends up more readable.
> 
>> 4. Gettext invocations like _("text") have a space added, like _ ("text").
>>     I didn't see a way to change this.
> 
> A very little inconvenience IMO, compared to not having to think about
> formatting.  We could live with this, and file a bug at the same time.
> There could be an option to add exceptions to SpaceBeforeParens.
> 
>> Another possible problem is that, unlike with 'black', ensuring that
>> everyone has the same version is a pain.
> 
> Indeed.  It's perhaps easy for me to say, because I get to choose what
> Linux distro and version I work on (so I opt for something recent), but
> I would still lean towards just following whatever the current latest
> stable version is.  There might be new options in the latest stable
> version we want to use, it would be nice not to have to wait years
> before we can use them.  And I suppose there's a not too painful way to
> get it for all the major distros out there (and for Windows and macOS,
> there are binary releases).  And you can always run it in Docker or
> something.

I suppose we could have scripts to automate this sort of task. Something 
that checks for the latest stable release and downloads it? A helper to 
make things easier and more consistent.

Can we do this server-side and take the burden off of the developers to 
have to do this as an additional step before pushing changes?

> 
> But we would probably encourage people to use git-clang-format, which
> only reformats the lines you touched.  So, it wouldn't be a big problem
> for people to use different versions, as it wouldn't create spurious
> changes elsewhere in the file.  The differences in formatting would be
> in lines you touch anyway. >
> Speaking of git-clang-format: from my experience, it is not super cool
> when the code base isn't already formatted with the target formatting
> style.  It causes discrepancies in style, because you have a mix of new
> style (whatever lines were touched) and old style (whatever lines were
> not touched) which can be annoying and distracting.  Or, it causes
> slightly bigger diffs than you would get otherwise.  For example, I
> ran git-clang-format on this commit here:
> 
>      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=10e578d7e00d
> 
> ... to see what would have been the resulting patch had the author used
> git-clang-format.  This would have been the patch:
> 
>      https://pastebin.com/pxrZtnDU
> 
> If you check the changes in mi/mi-cmd-break.c, you can see that
> git-clang-format re-formatted the whole enum and array initializer.
> Which is nice in itself, the resulting formatting is nice.  But, the
> patch would be a bit harder to read.
> 
> The other option is to start by doing a big re-format and then tell
> people to use git-clang-format.  This way, there might be some small
> incoherences here and there due to different versions, but I think
> they'll be insignificant.
> 
> With the Python code, we did a massive re-format.  I don't think it did
> / will cause a big inconvenience, because there aren't many patches to
> the Python code.  But for the C++ code, it might be different, as there
> are more patches in everyone's pipeline that touch that code.
> 
> Another question is: would we re-format the C/C++ code in the testsuite?
> It might require a few adjustments to test cases (like there was one for
> black), but I would be fine with that.  My concern is more if there are
> some tests where the formatting of the code was specifically important.
> We could disable formatting for them, but the difficulty is to find
> them.

I think reformatting the test sources has the potential to break things 
on testcases with hardcoded assumptions. I wouldn't touch them at first.

> 
> Simon
>