Bug 17577 - sscanf extremely slow on large strings
Summary: sscanf extremely slow on large strings
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: stdio (show other bugs)
Version: 2.19
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-11 13:49 UTC by Shahbaz Youssefi
Modified: 2021-03-15 21:56 UTC (History)
6 users (show)

See Also:
Host: Linux (Ubuntu 14.04)
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Shahbaz Youssefi 2014-11-11 13:49:11 UTC
The bug is that `sscanf` (`vsscanf` actually) is very slow on large strings. It _seems_ like `vsscanf` is first trying to find the end of the string (with `rawmemchr`), which is expensive given that most of the string is not going to be read.

Here's a test code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>

#define N 50000

static int _debug_helper(const char *src, int *a, int *n)
{
#if 1
 return sscanf(src, "%d%n", a, n);
#else
 char *end;

 errno = 0;
 *a = strtol(src, &end, 10);
 *n = end - src;

 return errno == 0 && *n > 0;
#endif
}

int main()
{
 int i;
 int a;
 int n;
 int so_far = 0;

 char *big_string = malloc(N * 4 + 1);
 for (i = 0; i < N; ++i)
  strcpy(big_string + i * 4, "123 ");
 big_string[N * 4] = '\0';

 while (1)
 {
  if (_debug_helper(big_string + so_far, &a, &n) != 1)
   break;
  so_far += n;
 }

 return 0;
}

Compile with: gcc -Wall -g -O0 main.c

Running this code with `N = 50000` and using `sscanf`, I get:

$ time ./a.out

real 0m1.602s
user 0m1.596s
sys 0m0.000s

Running it with `N = 50000` and using the substitute code, I get:

$ time ./a.out

real 0m0.002s
user 0m0.000s
sys 0m0.000s

Which is considrably smaller. Note that this shows that the part with `malloc` and initialization take very small time. Running callgrind shows that almost all of the time when using `sscanf` is spent in `rawmemchr`. Indeed, using gdb and randomly hitting CTRL+C, you always end up with a stack trace like this:

#0 __rawmemchr_ia32 () at ../sysdeps/i386/i686/multiarch/../../rawmemchr.S:167
#1 0xb7e78a06 in _IO_str_init_static_internal () at strops.c:44
#2 0xb7e5c857 in __GI___isoc99_vsscanf () at isoc99_vsscanf.c:41
#3 0xb7e5c7cf in __isoc99_sscanf () at isoc99_sscanf.c:31
#4 0x08048494 in _debug_helper () at main.c:11
#5 0x08048517 in main () at main.c:41

This means that `rawmemchr` is slowing down `sscanf` by an unnecessary degree. To further prove this point (and to confirm my guess that `rawmemchr` is reading the whole string), here are a couple more tests:

With `N = 25000` and using `sscanf`:

$ time ./a.out

real 0m0.407s
user 0m0.404s
sys 0m0.000s

With `N = 12500` and using `sscanf`:

$ time ./a.out

real 0m0.106s
user 0m0.104s
sys 0m0.000s

This clearly shows an `O(N^2)` behavior. The main loop of the program is `O(N)`, which means `sscanf` is running at `O(N)`. For large `N`, this is significant. On the other hand, the actual behavior of `sscanf` should be to read from the string according to the format string and no more, which in this case (using `%d` and "123" as values) is of constant time.
Comment 1 BenFrantzDale 2021-03-03 02:26:24 UTC
Does the C standard prescribe a big-O for sscanf? That is, while clearly this could be faster, is the current implementation compliant (in which case it’s a bug in the C standard) or is it just an implementation that’s compliant but could be better?
Comment 2 Florian Weimer 2021-03-03 03:35:28 UTC
(In reply to BenFrantzDale from comment #1)
> Does the C standard prescribe a big-O for sscanf? That is, while clearly
> this could be faster, is the current implementation compliant (in which case
> it’s a bug in the C standard) or is it just an implementation that’s
> compliant but could be better?

There is no complexity class requirement in the standard.
Comment 3 Rich Felker 2021-03-03 19:31:56 UTC
Of course the standard does not make any guarantees anywhere about complexity class, but this is a pretty serious QoI issue that's being hit in the real world and makes sscanf largely unusable for parsers. I'd like to see it fixed.
Comment 4 Daniel Kozar 2021-03-15 20:38:13 UTC
This seems to be related to the CPU architecture. Running the posted code on a virtualised pristine installation of Ubuntu 14.04 in i386 mode gives the following results :

$ time ./a.out 

real	0m0.718s
user	0m0.716s
sys	0m0.000s

However, running the same code on a separate installation of 14.04, but in x86_64 mode gives drastically different results :

$ time ./a.out 

real	0m0.086s
user	0m0.085s
sys	0m0.001s

This is reproducible with current versions of GCC and glibc (10.2.0 and 2.33 respectively), this time on actual hardware :

i386 (-m32) :

$ time ./a.out 

real	0m0.688s
user	0m0.688s
sys	0m0.000s

x86_64 :

$ time ./a.out 

real	0m0.042s
user	0m0.042s
sys	0m0.000s

The CPU is Ryzen 3700X.
Comment 5 Alexander Monakov 2021-03-15 21:56:46 UTC
That is a difference in constant factor, on 64-bit it uses AVX2 rawmemchr (selected by ifunc), but on 32-bit sscanf invokes basic ia32 rawmemchr implementation directly.