Bug 19348 - re_search matches $ much slower than .*
Summary: re_search matches $ much slower than .*
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: regex (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-09 14:40 UTC by alex_y_xu
Modified: 2015-12-10 15:19 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description alex_y_xu 2015-12-09 14:40:18 UTC
$ echo {1..5000000} > file # adjust based on CPU speed
    $ time sed -e 's/$/stuff/' file >/dev/null # logical way to append to lines
    sed -e 's/$/stuff/' file > /dev/null  2.91s user 0.09s system 99% cpu 3.007 total
    $ time sed -e 's/.*/&stuff/' file >/dev/null
    sed -e 's/.*/&stuff/' file > /dev/null  1.62s user 0.34s system 99% cpu 1.972 total

musl via busybox sed was tested to be 2x faster in the first case than in the second.

intuitively, this does not make sense. .* should be slower because it needs to match the entire string whereas $ can skip to the end of the line (since sed must already find the new line in order to run the commands).

however, glibc spends an inordinate amount of time inside of check_halt_state_context, re_state_reconstruct, and re_string_context_at, according to callgrind.

I am unsure whether this qualifies as a glibc bug or how to fix it, but I think it is useful to have on the record.