The bug is that `sscanf` (`vsscanf` actually) is very slow on large strings. It _seems_ like `vsscanf` is first trying to find the end of the string (with `rawmemchr`), which is expensive given that most of the string is not going to be read. Here's a test code: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #include <errno.h> #define N 50000 static int _debug_helper(const char *src, int *a, int *n) { #if 1 return sscanf(src, "%d%n", a, n); #else char *end; errno = 0; *a = strtol(src, &end, 10); *n = end - src; return errno == 0 && *n > 0; #endif } int main() { int i; int a; int n; int so_far = 0; char *big_string = malloc(N * 4 + 1); for (i = 0; i < N; ++i) strcpy(big_string + i * 4, "123 "); big_string[N * 4] = '\0'; while (1) { if (_debug_helper(big_string + so_far, &a, &n) != 1) break; so_far += n; } return 0; } Compile with: gcc -Wall -g -O0 main.c Running this code with `N = 50000` and using `sscanf`, I get: $ time ./a.out real 0m1.602s user 0m1.596s sys 0m0.000s Running it with `N = 50000` and using the substitute code, I get: $ time ./a.out real 0m0.002s user 0m0.000s sys 0m0.000s Which is considrably smaller. Note that this shows that the part with `malloc` and initialization take very small time. Running callgrind shows that almost all of the time when using `sscanf` is spent in `rawmemchr`. Indeed, using gdb and randomly hitting CTRL+C, you always end up with a stack trace like this: #0 __rawmemchr_ia32 () at ../sysdeps/i386/i686/multiarch/../../rawmemchr.S:167 #1 0xb7e78a06 in _IO_str_init_static_internal () at strops.c:44 #2 0xb7e5c857 in __GI___isoc99_vsscanf () at isoc99_vsscanf.c:41 #3 0xb7e5c7cf in __isoc99_sscanf () at isoc99_sscanf.c:31 #4 0x08048494 in _debug_helper () at main.c:11 #5 0x08048517 in main () at main.c:41 This means that `rawmemchr` is slowing down `sscanf` by an unnecessary degree. To further prove this point (and to confirm my guess that `rawmemchr` is reading the whole string), here are a couple more tests: With `N = 25000` and using `sscanf`: $ time ./a.out real 0m0.407s user 0m0.404s sys 0m0.000s With `N = 12500` and using `sscanf`: $ time ./a.out real 0m0.106s user 0m0.104s sys 0m0.000s This clearly shows an `O(N^2)` behavior. The main loop of the program is `O(N)`, which means `sscanf` is running at `O(N)`. For large `N`, this is significant. On the other hand, the actual behavior of `sscanf` should be to read from the string according to the format string and no more, which in this case (using `%d` and "123" as values) is of constant time.
Does the C standard prescribe a big-O for sscanf? That is, while clearly this could be faster, is the current implementation compliant (in which case it’s a bug in the C standard) or is it just an implementation that’s compliant but could be better?
(In reply to BenFrantzDale from comment #1) > Does the C standard prescribe a big-O for sscanf? That is, while clearly > this could be faster, is the current implementation compliant (in which case > it’s a bug in the C standard) or is it just an implementation that’s > compliant but could be better? There is no complexity class requirement in the standard.
Of course the standard does not make any guarantees anywhere about complexity class, but this is a pretty serious QoI issue that's being hit in the real world and makes sscanf largely unusable for parsers. I'd like to see it fixed.
This seems to be related to the CPU architecture. Running the posted code on a virtualised pristine installation of Ubuntu 14.04 in i386 mode gives the following results : $ time ./a.out real 0m0.718s user 0m0.716s sys 0m0.000s However, running the same code on a separate installation of 14.04, but in x86_64 mode gives drastically different results : $ time ./a.out real 0m0.086s user 0m0.085s sys 0m0.001s This is reproducible with current versions of GCC and glibc (10.2.0 and 2.33 respectively), this time on actual hardware : i386 (-m32) : $ time ./a.out real 0m0.688s user 0m0.688s sys 0m0.000s x86_64 : $ time ./a.out real 0m0.042s user 0m0.042s sys 0m0.000s The CPU is Ryzen 3700X.
That is a difference in constant factor, on 64-bit it uses AVX2 rawmemchr (selected by ifunc), but on 32-bit sscanf invokes basic ia32 rawmemchr implementation directly.