Cygwin, Unicode and "long" path names

Vadim vad@syping.de
Sat Jun 26 01:53:29 GMT 2021


Ah, this beautiful topic. Windows 7 x64.

This summary was written as a postscript; the tests and findings follow below:

1) Cygwin limits individual names to 255 bytes. Windows seems to count 
UTF-16 chars instead and works fine: a name of 256 bytes in 108 
characters works.

Basically, this becomes a bytes vs characters story.

2) Bash file name auto-expansion detects the file of that name, but the 
expanded name gets truncated to 255 bytes. find behaves the same way 
("No such file or directory", because it tries to access a truncated 
name that does not exist).

2.1) If you try to correct the above mistake by adding the truncated 
characters back by hand, then the program (cat) will complain about 
"File name too long".

2.2) If there exists a folder with a 255-byte name, equal to the 
truncated name, then "find ." will do a listing on that folder twice 
(effectively hiding the long-named folder from tools without leaving an 
error message)

3) UNC Paths get the same treatment: File name too long.
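
To make the bytes vs characters point from 1) concrete, here is a minimal 
sketch that measures a name both ways. The helper names are made up for 
illustration, and the sample name is synthetic (not one of the real file 
names from this post). It assumes the "bytes" in question are UTF-8 bytes 
and that the limit is 255 in both units, as described above.

```python
# Hypothetical helpers: compare the two ways of measuring a file name.
NAME_LIMIT = 255  # per-name limit discussed in this post (bytes vs UTF-16 units)


def utf8_bytes(name: str) -> int:
    """Length of the name when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Number of UTF-16 code units (a surrogate pair counts as two)."""
    return len(name.encode("utf-16-le")) // 2


def describe(name: str) -> dict:
    """Report both lengths and whether each one fits under NAME_LIMIT."""
    b, u = utf8_bytes(name), utf16_units(name)
    return {
        "utf8_bytes": b,
        "utf16_units": u,
        "fits_byte_limit": b <= NAME_LIMIT,
        "fits_utf16_limit": u <= NAME_LIMIT,
    }


if __name__ == "__main__":
    # Synthetic name: 85 CJK characters (3 UTF-8 bytes each) plus one ASCII letter.
    sample = "點" * 85 + "a"
    print(describe(sample))
```

A name like the synthetic one is over the limit when counted in bytes but 
far below it when counted in UTF-16 units, which matches the mismatch 
seen here.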

I expected Cygwin to handle these names without problems just like 
Windows, Explorer, cmd etc. do. Is this particular problem new or known? 
All I could find on the mailing list is around the time when Cygwin 
hadn't yet implemented Unicode support (UTF-8?), ~2004-2008.

These names were created by youtube-dl.exe executed from within Cygwin.

- Vadim

---

This file name is 255 bytes long and works:

s123點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt

This one is 256 bytes and works perfectly normally in Windows (Explorer; 
I can paste it and run "dir <name>" in cmd, despite cmd showing [] block 
chars), but not in the Cygwin terminal (I used s123/s1234 as a prefix for 
easy auto-expansion):

s1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt


If I try to use tab-expansion in the terminal (mintty, bash) the problem 
becomes apparent ("xt" missing at the end):

$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.t
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.t': No such file or directory


However, with one fewer byte it expands properly:

$ cat s123點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
hello


MAX_PATH? Not quite: the limit hit here is 255 bytes per name (NAME_MAX), 
while MAX_PATH covers the whole path. Why then does the full file/folder 
name work in Windows? This is the full name (a folder), 257 bytes:

20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞

And it can get longer! In fact, I can bump the total path to 396 bytes, 
or "Column 249" as Notepad++ counts the characters (the individual folder 
name is 359 bytes or 211 chars, "column 212"):

D:/abcdefgh/Local_TEMP/cygwinunicode/1_123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789020210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞


NTFS allows up to 255 UTF-16 code units for an individual path segment, 
and this seems to align with the Windows tooling: cmd and Explorer can 
browse these folders fine, but the file inside the folder spills beyond 
the limit and you run into the usual 'total path too long' problem.
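
The two limits in play here (per segment and whole path) can be checked 
with a small sketch like the following. The function name and the sample 
path are hypothetical. The 255 figure is the per-segment limit mentioned 
above; 260 is my assumption for the classic Windows MAX_PATH, counted in 
UTF-16 units including the terminating NUL.

```python
# Hypothetical sketch: per-segment and whole-path length report.
MAX_SEGMENT_UNITS = 255  # per-segment limit in UTF-16 code units
MAX_PATH_UNITS = 260     # assumed classic MAX_PATH, including the trailing NUL


def units(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def report(path: str) -> dict:
    """Split a path into segments and measure each one both ways."""
    segments = [s for s in path.replace("\\", "/").split("/") if s]
    # One row per segment: (segment, UTF-8 bytes, UTF-16 units)
    rows = [(s, len(s.encode("utf-8")), units(s)) for s in segments]
    total = units(path)
    return {
        "segments": rows,
        "total_units": total,
        "segment_too_long": any(u > MAX_SEGMENT_UNITS for _, _, u in rows),
        # +1 accounts for the terminating NUL in the MAX_PATH budget
        "path_exceeds_max_path": total + 1 > MAX_PATH_UNITS,
    }


if __name__ == "__main__":
    sample = "D:/tmp/" + "點" * 200 + "/" + "a" * 100 + ".txt"
    for key, value in report(sample).items():
        print(key, value)
```

With a path like the sample, every segment is fine on its own, yet the 
path as a whole is over the budget, which is the situation described 
above for the file inside the long-named folder.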

Whether you manually add the missing "xt" to the tab-completion or use 
UNC paths, the result is the same:

$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long
$ cat '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt'
cat: '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long
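
To find out which names in a tree are affected, something like this 
sketch could list every name above a given byte limit. The function name 
is made up, and I am assuming it is run with an interpreter that can 
actually enumerate the long names (for example a native Windows Python); 
whether a Cygwin-built Python can see them is not something I have 
tested.

```python
import os


def find_long_names(root: str, limit: int = 255):
    """Yield (directory, name, byte_length) for names longer than `limit` bytes.

    Hypothetical helper: byte_length is the size of the name in the
    filesystem encoding as reported by os.fsencode (UTF-8 on most setups).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            size = len(os.fsencode(name))
            if size > limit:
                yield dirpath, name, size


if __name__ == "__main__":
    # Scan the current directory and print every over-long name.
    for directory, name, size in find_long_names("."):
        print(f"{size} bytes: {os.path.join(directory, name)}")
```

The limit is a parameter so the same scan can be tried with a smaller 
value on a test folder first.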


