Bug? wcsxfrm causing memory corruption
Duncan Roe
duncan_roe@acslink.net.au
Sun May 21 17:18:00 GMT 2017
On Wed, May 10, 2017 at 11:30:46AM +0200, Erik Bray wrote:
> Greetings--
>
> In the process of fixing the Python test suite on Cygwin I ran across
> one test that was consistently causing segfaults later on, not
> directly local to that test. The test involves wcsxfrm so that's
> where I focused my attention.
>
> The attached test demonstrates the bug. Given an output buffer of N
> wide characters, wcsxfrm byte-swaps memory beyond the end of that
> buffer. I believe it might actually be a bug in the underlying
> LCMapStringW workhorse (this is on Windows 10; have not tested other
> versions).
>
> According to its docs [1], the cchDest argument (size of the
> destination buffer) is treated as a *byte* count when using
> LCMAP_SORTKEY. However, for the purposes of applying the
> LCMAP_BYTEREV transformation it seems to be treating the output size
> (in bytes) as a character count. So in the example I give, where the
> output sort key is 7 bytes (including the null terminator), it swaps
> *14* bytes--the bytes including the sort key as well as the next 7
> adjacent bytes. This is obviously a problem if the destination buffer
> is allocated out of some larger memory pool.
>
> This definitely has to be a bug, right? Or at least very poorly
> documented on MS's part. A workaround would either be to not use
> LCMAP_BYTEREV and just swap the bytes manually, or in a second call to
> LCMapStringW with LCMAP_BYTEREV and the correct character count...
>
> Thanks,
> Erik
>
>
> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/dd318700(v=vs.85).aspx
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <locale.h>
> #include <wchar.h>
> #include <string.h>
> #include <windows.h>
>
> #define SIZE 32
>
>
> /* Fill the buffer with the pattern a[idx] = idx so overwrites show up. */
> void fill_bytes(uint8_t *a, int n) {
>     int idx;
>     for (idx=0; idx<n; idx++) {
>         a[idx] = idx;
>     }
> }
>
>
> /* Hex-dump n bytes, eight per row. */
> void print_bytes(uint8_t *a, int n) {
>     int idx;
>     for (idx=0; idx<n; idx++) {
>         printf("0x%02x ", a[idx]);
>         if ((idx + 1) % 8 == 0) printf("\n");
>     }
> }
>
> int main(void) {
>     wchar_t *a, *b;
>     uint8_t *aa;
>     size_t ret;
>     LCID collate_lcid;
>     collate_lcid = 1033;  /* en-US */
>     b = L"b";
>     a = (wchar_t*) malloc(SIZE);
>     if (a == NULL) return 1;
>     aa = (uint8_t*) a;
>
>     setlocale(LC_ALL, "en_US.UTF-8");
>
>     printf("using wcsxfrm:\n");
>     fill_bytes(aa, SIZE);
>     printf("before:\n");
>     print_bytes(aa, SIZE);
>     ret = wcsxfrm(a, b, 4);
>     printf("after (%d):\n", (int) ret);
>     print_bytes(aa, SIZE);
>
>     printf("\nusing LCMapStringW directly:\n");
>     fill_bytes(aa, SIZE);
>     printf("before:\n");
>     print_bytes(aa, SIZE);
>
>     /* cchDest is a byte count when LCMAP_SORTKEY is used */
>     ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY | LCMAP_BYTEREV, b, -1, a, 8);
>     printf("after (%d):\n", (int) ret);
>     print_bytes(aa, SIZE);
>
>     printf("\nwithout LCMAP_BYTEREV:\n");
>     fill_bytes(aa, SIZE);
>     printf("before:\n");
>     print_bytes(aa, SIZE);
>
>     ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY, b, -1, a, 8);
>     printf("after (%d):\n", (int) ret);
>     print_bytes(aa, SIZE);
>     free(a);
>
>     return 0;
> }
Hi Erik,

I get

using wcsxfrm:
before:
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
after (3):
0x09 0x0e 0x01 0x01 0x01 0x01 0x00 0x00
0x09 0x08 0x0b 0x0a 0x0d 0x0c 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f

using LCMapStringW directly:
before:
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
after (7):
0x09 0x0e 0x01 0x01 0x01 0x01 0x07 0x00
0x09 0x08 0x0b 0x0a 0x0d 0x0c 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f

without LCMAP_BYTEREV:
before:
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
after (7):
0x0e 0x09 0x01 0x01 0x01 0x01 0x00 0x07
0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
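
To compare dumps without eyeballing them, something like the following might help. It is only a sketch: touched_extent() is a name I made up, and it relies on the a[idx] = idx pattern that fill_bytes() writes, so a byte that happens to be overwritten with its own index value would go unnoticed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return one past the last offset where buf[i] no longer equals
 * (uint8_t)i, the pattern written by fill_bytes(). Returns 0 if
 * nothing changed. A byte rewritten with its own pattern value
 * looks untouched, so treat the result as a lower bound. */
size_t touched_extent(const uint8_t *buf, size_t n)
{
    size_t extent = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != (uint8_t)i)
            extent = i + 1;
    }
    return extent;
}
```

Fed the first 16 bytes of the LCMAP_BYTEREV dump above, this gives 14, versus 7 for the dump without LCMAP_BYTEREV, which matches the 14 swapped bytes you describe.
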
Is that the same as you see?
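
If it is, the manual swap you suggest as a workaround could look roughly like the sketch below. swap_byte_pairs() is a hypothetical helper, not anything in Cygwin or the Windows API, and it assumes the caller passes the byte count that LCMapStringW returned without LCMAP_BYTEREV. What to do with a trailing odd byte (the key here is 7 bytes) is left open; this version leaves it alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap each adjacent byte pair within the first nbytes of buf.
 * A trailing odd byte is left as-is, and nothing at or past
 * buf[nbytes] is read or written. */
void swap_byte_pairs(uint8_t *buf, size_t nbytes)
{
    assert(buf != NULL || nbytes == 0);
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint8_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Applied with nbytes = 7 to the bytes from the run without LCMAP_BYTEREV, only the first six bytes are swapped in pairs and everything from offset 6 onwards stays as it was.
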
Cheers ... Duncan.