Bug 20938 - In variable-width charsets, _IO_wfile_sync passes a negative buffer size to __codecvt_do_length on certain inputs to fgetws resulting in SIGSEGV
Summary: In variable-width charsets, _IO_wfile_sync passes a negative buffer size to _...
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: stdio (show other bugs)
Version: 2.27
: P1 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-06 13:20 UTC by cat stevens
Modified: 2020-02-21 01:31 UTC (History)
3 users (show)

See Also:
Host: Ubuntu 18.04
Target:
Build: Unknown
Last reconfirmed:


Attachments
Fix wfileops.c to never use a negative delta (swap the pointer subtraction and always use ssize_t) (607 bytes, patch)
2016-12-07 21:47 UTC, cat stevens
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description cat stevens 2016-12-06 13:20:22 UTC
/* 

See this Stack Overflow question: https://stackoverflow.com/questions/40975450

The following code demonstrates the problem: 

*/ 

#include <locale.h>
#include <wchar.h>
#include <stdlib.h>
#include <stdio.h>

int main(const int argc, const char* const * const argv) {
  (void) argc;

  setlocale(LC_ALL, argv[1]);

  const size_t len = (size_t) atoi(argv[2]);

  wchar_t *buf = (wchar_t *) malloc(sizeof (wchar_t) * len),
         *stat = fgetws(buf, (int) len, stdin);

  wprintf(L"[%ls], [%ls]\n", stat, buf);

  free(buf);

  wprintf(L"we are still: %ls\n", L"alive");

  return EXIT_SUCCESS;
}

/* 

compile it like: `gcc -std=c11 main.c -o main` (-std=c++ >= 11 has the same behaviour)
Then, try running it like: `./main C 3`, and give it 10 bytes of input. The program works and ends normally, and Valgrind reports no memory errors.

On the other hand, if it is run with a variable-width (UTF) locale: `./main en_US.utf8 3` and given 10 bytes of STDIN the program behaves normally, but crashes on exiting main. Valgrind reports the following:

==2639== Invalid read of size 8
==2639==    at 0x4EAF575: _IO_wfile_sync (wfileops.c:534)
==2639==    by 0x4EB6DB1: _IO_default_setbuf (genops.c:523)
==2639==    by 0x4EB2FC8: _IO_file_setbuf@@GLIBC_2.2.5 (fileops.c:459)
==2639==    by 0x4EB79B5: _IO_unbuffer_all (genops.c:921)
==2639==    by 0x4EB79B5: _IO_cleanup (genops.c:966)
==2639==    by 0x4E73282: __run_exit_handlers (exit.c:96)
==2639==    by 0x4E73339: exit (exit.c:105)
==2639==    by 0x4E593F7: (below main) (libc-start.c:325)
==2639==  Address 0x18 is not stack'd, malloc'd or (recently) free'd

The aforementioned Stack Overflow question and answer came to the conclusion that:

Inside _IO_wfile_sync, fp->_wide_data->_IO_read_end points to the end of the input string L"5555555555\n", while fp->_wide_data->_IO_read_ptr is at the third character (after reading two), and the difference is -9.

__codecvt_do_length expects an argument _IO_size_t max for a buffer size and receives (_IO_ssize_t) -9 which is turned into, presumably, 2^word_size - 9. Among the first lines in this function is the declaration wchar_t to_buf[max];.

This results in *increasing* the stack pointer instead of decreasing it, and data that should have been safely stored there (among them the pointer fp of _IO_wfile_sync(), as it ends up stored in a register rbx) is overwritten at the first opportunity.

After the return from the function, fp is overwritten with something that does not make sense (NULL, in this case), even though it has never been exposed to it, and dereferencing it on line 534 of wfileops.c causes a SIGSEGV, as the backtrace tells us.

I don't know the exact build date as I have not built libc from source (I do not know how to tell GCC to use a completely different libc).

The GLibC version is 2.24-3ubuntu1, but it does not seem this exists only in Ubuntu repositories as it is here, too:
https://code.woboq.org/userspace/glibc/libio/wfileops.c.html#500

*/
Comment 1 cat stevens 2016-12-06 13:32:41 UTC
Reproduced on Fedora 24 with GLibC 2.23. (https://stackoverflow.com/q/40975450/#comment69201316_40978967)
Comment 2 cat stevens 2016-12-06 13:33:56 UTC
Reproduced on Fedora 24 Linux with GLibC 2.23.
Comment 3 cat stevens 2016-12-07 21:47:52 UTC
Created attachment 9693 [details]
Fix wfileops.c to never use a negative delta (swap the pointer subtraction and always use ssize_t)

I figured out how to run my test program under a different libc, so I cloned glibc 2.24 and made the changes specified in the patch. I annotated it with a comment, but the important part is that the end pointer will always be greater than the 'ptr' which points to the beginning of the string.

I'm not sure why a variable called 'delta' has a signed type (since difference is usually absolute) but __codecvt_do_length takes a 'size_t' as its last argument, not a 'ssize_t'. Since the subtraction of these pointers should always result in an unsigned number, it should be okay to store it in a 'size_t'.

The output of the program in my initial comment was:

$ ~/projects/c/misc/fgetws/f en_US.utf8 3
dddddddddddddddddddddddddddddddddd
[dd], [dd]
still works: yes
Segmentation fault (core dumped)

A segfault after main.

Now, it runs like this: 
$ ./testrun.sh ~/projects/c/misc/fgetws/f en_US.utf8 3 dddddddddddddddddddddddddddddddddd
[dd], [dd]
still works: yes

No segfault, hooray!
Comment 4 cat stevens 2016-12-07 21:50:03 UTC
Comment on attachment 9693 [details]
Fix wfileops.c to never use a negative delta (swap the pointer subtraction and always use ssize_t)

>diff --git a/libio/wfileops.c b/libio/wfileops.c
>index d88d08a..5b68100 100644
>--- a/libio/wfileops.c
>+++ b/libio/wfileops.c
>@@ -498,14 +498,15 @@ libc_hidden_def (_IO_wfile_overflow)
> wint_t
> _IO_wfile_sync (_IO_FILE *fp)
> {
>-  _IO_ssize_t delta;
>+  _IO_size_t delta;
>   wint_t retval = 0;
> 
>   /*    char* ptr = cur_ptr(); */
>   if (fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base)
>     if (_IO_do_flush (fp))
>       return WEOF;
>-  delta = fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_end;
>+  /* subtract read_ptr from read_end because read_end will always be greater */
>+  delta = fp->_wide_data->_IO_read_end - fp->_wide_data->_IO_read_ptr;
>   if (delta != 0)
>     {
>       /* We have to find out how many bytes we have to go back in the
>@@ -530,7 +531,8 @@ _IO_wfile_sync (_IO_FILE *fp)
>          fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
>          nread = (*cv->__codecvt_do_length) (cv, &fp->_wide_data->_IO_state,
>                                              fp->_IO_read_base,
>-                                             fp->_IO_read_end, delta);
>+                                             fp->_IO_read_end,
>+                delta);
>          fp->_IO_read_ptr = fp->_IO_read_base + nread;
>          delta = -(fp->_IO_read_end - fp->_IO_read_base - nread);
>        }
Comment 5 Dr. Frisco Rose 2020-02-21 00:53:30 UTC
Seeing this bug on Debian with

GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24

Occurs when exiting main that calls to fgetwc( stdin ) and does not consume all of the input. A gdb back trace
(gdb) bt
#0  __GI__IO_wfile_sync (fp=0x7f4c5cd236fe <do_length+126>) at wfileops.c:534
#1  0x00007f4c5cd29de2 in _IO_default_setbuf (fp=fp@entry=0x7f4c5d04e8c0 <_IO_2_1_stdin_>, p=0x0, len=0) at genops.c:523
#2  0x00007f4c5cd26a79 in _IO_new_file_setbuf (fp=0x7f4c5d04e8c0 <_IO_2_1_stdin_>, p=<optimized out>, len=<optimized out>) at fileops.c:459
#3  0x00007f4c5cd2a736 in _IO_unbuffer_all () at genops.c:921
#4  _IO_cleanup () at genops.c:966
#5  0x00007f4c5ccea8e3 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:96
#6  0x00007f4c5ccea99a in __GI_exit (status=<optimized out>) at exit.c:105
#7  0x00007f4c5ccd52e8 in __libc_start_main (main=0x561cb839b4f8 <main>, argc=1, argv=0x7fff700f7908, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff700f78f8) at ../csu/libc-start.c:325
#8  0x0000561cb839a74a in _start ()

If I can help, just let me know how. I would very much like to build a UTF8 aware application but this seems to be a show stopper.
Comment 6 Dr. Frisco Rose 2020-02-21 01:31:23 UTC
Addendum: fclose(stdin) is a work around fix for anyone else who ends up here.
REF: www.linuxquestions.org/questions/programming-9/segfault-after-return-from-main-4175659749/