git diff --color-words doesn't work properly

Marc Aldorasi marc@groundctl.com
Tue Dec 13 18:22:00 GMT 2016


When a .gitattributes file specifies a diff driver and the locale is UTF-8,
"git diff --color-words" fails with the message "fatal: Invalid
regular expression
[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*|[-+*/<>%&^|=!]=|--|\+\+|<<=?|>>=?|&&|\|\||::|->\*?|\.\*|[^[:space:]]|[<C0>-<FF>][<80>-<BF>]+".
This does not happen with Git for Windows.  To reproduce it, run the
following commands in an empty directory:

git init
echo "* diff=cpp" > .gitattributes
git add .gitattributes
# This works
LC_ALL=C git diff --staged --color-words
# This fails
LC_ALL=en_US.UTF-8 git diff --staged --color-words
# It also fails if the locale is set to any other UTF-8 locale (e.g.
# en_GB.UTF-8 or ja_JP.UTF-8).

The issue appears to be in regcomp.c's wgetnext function, which calls
mbrtowc, which fails because the regex contains raw bytes that are not
valid UTF-8.
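
As a rough illustration of why a multibyte decoder rejects the pattern, the
sketch below writes the final alternative of the regex as raw bytes and asks
iconv whether they form valid UTF-8. It assumes an iconv binary is on the
PATH; the function name pattern_tail is made up for this example, and iconv
only stands in for the mbrtowc call, it is not what regcomp.c runs.

```shell
# Emit the tail of the pattern as raw bytes, the way it is stored in the regex.
# printf octal escapes: \300 = 0xC0, \377 = 0xFF, \200 = 0x80, \277 = 0xBF.
pattern_tail() {
    printf '[\300-\377][\200-\277]+'
}

# iconv exits non-zero when its input is not valid in the source encoding.
# Here 0xC0 is followed by an ASCII '-', which is not a continuation byte.
if pattern_tail | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    tail_is_utf8=yes
else
    tail_is_utf8=no
fi

echo "pattern tail is valid UTF-8: $tail_is_utf8"
```

A decoder that reads the pattern character by character in a UTF-8 locale
would hit the same invalid sequence at the first range endpoint.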

The easy fix is probably to either remove the non-ASCII bytes
from that regex (it's defined in git's userdiff.c) or change it to a
Unicode code point range (i.e. U+0080-U+10FFFF), but I don't know if
that would break anything else.
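
A minimal sketch of the first option, assuming the fix is simply to drop the
last alternative: the variable below holds the pattern from the error message
without its non-ASCII tail. The name ascii_word_regex is hypothetical, and
iconv and grep are used only as stand-ins to check the pattern outside of git;
this is not a tested patch to userdiff.c.

```shell
# The pattern from the error message with the final
# "|[<C0>-<FF>][<80>-<BF>]+" alternative removed (ASCII only).
ascii_word_regex='[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*|[-+*/<>%&^|=!]=|--|\+\+|<<=?|>>=?|&&|\|\||::|->\*?|\.\*|[^[:space:]]'

# Pure ASCII is always valid UTF-8, so a multibyte-aware parser has no
# invalid sequence to trip over.
if printf '%s' "$ascii_word_regex" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    ascii_is_utf8=yes
else
    ascii_is_utf8=no
fi

# Use the pattern as an extended regex to split a sample line into words.
# LC_ALL=C keeps this check independent of which locales are installed.
tokens=$(printf 'x += 0x1F;\n' | LC_ALL=C grep -oE "$ascii_word_regex")

echo "ASCII-only pattern is valid UTF-8: $ascii_is_utf8"
printf '%s\n' "$tokens"
```

Git also accepts a word regex per invocation (--word-diff-regex=<regex>) or
per driver (diff.<driver>.wordRegex), so a pattern like this could be tried as
a local workaround. Whether that avoids the failure here is untested, and
dropping the tail means non-ASCII text falls back to one word per byte or
character.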

The attached cygcheck.out has my email address redacted, but is
otherwise unmodified.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygcheck.out
Type: application/octet-stream
Size: 72148 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin/attachments/20161213/ffb88d68/attachment.obj>