[Cygwin] package-grep on Sourceware
ASSI
Stromeko@nexgo.de
Sun Dec 8 11:05:00 GMT 2019
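Hi Frank,
I've done some more testing locally. I have confirmed my suspicion that the
output buffering will cause wrong results to be produced when the output gets
longer than a page: the parallel grep processes all write into the same pipe,
so a buffer flush can cut a file name in two and output from another process
can end up in between. This should explain the odd results of a search that
causes all file names to be returned on the server.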
I propose that (GNU) grep be run in line-buffered mode instead, which, at
least on my local system, doesn't materially impact the runtime. I also
suggest that you use find instead of ls to create the list of files, which
reduces the time to produce the file list slightly. If you really want to max
out the parallelism, you will also need to limit the number of arguments fed
into each instance (otherwise xargs ends up starting only 13 processes).
Feeding in 1715 names on each invocation ends up starting 20 processes, so
that should help get the most out of the available number of cores.
# xargs can't execute a VAR=value assignment itself, so set the locale via env
find "$dir" -mindepth 2 -maxdepth 2 -type f -not -name .htaccess |
xargs -L1715 -P16 env LC_ALL=C grep -l --line-buffered -- "$param_grep" |
sort > "$tmpfile"
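To check whether a result list was hit by the interleaving described above, one could verify that every line names an existing file, since a name cut in two by a buffer flush typically no longer matches anything on disk. This is a minimal sketch; `check_results` is a hypothetical helper name, and it assumes the paths in `$tmpfile` are valid relative to the current directory (as they are when produced by the find pipeline).

```shell
# check_results is a hypothetical helper name, not part of any existing script.
# Prints every line of the result file that does not name an existing
# regular file, and returns failure if there was at least one such line.
check_results() {
  local f bad=0
  while IFS= read -r f; do
    if [ ! -f "$f" ]; then
      printf 'suspicious line: %s\n' "$f" >&2
      bad=$((bad + 1))
    fi
  done < "$1"
  [ "$bad" -eq 0 ]
}
```

For example, `check_results "$tmpfile" || echo "corrupted output in $tmpfile" >&2` after the search would flag the problem instead of silently returning bogus names.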
In a later iteration the list of files to be searched could be cached (in a
file $dir.lst, say). This already helps in the local case, but is likely more
effective on a system that has a lot more IO load than I can produce locally.
<"$dir.lst" xargs -L1715 -P16 env LC_ALL=C grep -l --line-buffered -- "$param_grep" |
sort > "$tmpfile"
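Producing $dir.lst could be done with the same find expression, writing to a temporary file first and renaming it so that a search running at the same time never reads a half-written list. A minimal sketch, assuming the cache lives next to the directory as `$dir.lst`; `refresh_file_list` is a hypothetical name and could be called from a cronjob or after each listing update.

```shell
# refresh_file_list is a hypothetical helper name, not an existing script.
# Rebuilds "$1.lst" using the same find expression as the search pipeline.
refresh_file_list() {
  local dir=$1 tmp
  # Create the temporary file next to the target so that mv is a rename
  # within one filesystem and readers see either the old or the new list.
  tmp=$(mktemp "$dir.lst.XXXXXX") || return 1
  if find "$dir" -mindepth 2 -maxdepth 2 -type f -not -name .htaccess > "$tmp"; then
    chmod 644 "$tmp"   # mktemp creates the file with mode 0600
    mv -f "$tmp" "$dir.lst"
  else
    rm -f "$tmp"
    return 1
  fi
}
```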
The cache file would need to be refreshed each time the listing directories
get updated (although you probably could just run find in a cronjob every few
minutes and nobody would notice a difference). Having a cache file would also
make determining the optimal input length easier as you could just count the
number of lines in order to calculate how to best split them among multiple
processes.
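With a cache file, the per-invocation count could be derived from the line count instead of being hard-coded. A minimal sketch; `batch_size` is a hypothetical helper that rounds up so no leftover batch is created, and the number of batches is whatever you pick (20 in the figures above).

```shell
# batch_size is a hypothetical helper name.
# Prints the number of lines per xargs invocation needed to split file $1
# into at most $2 batches of roughly equal size (never less than 1).
batch_size() {
  local lines size
  lines=$(wc -l < "$1") || return 1
  size=$(( (lines + $2 - 1) / $2 ))
  [ "$size" -ge 1 ] || size=1
  echo "$size"
}
```

For example, `n=$(batch_size "$dir.lst" 20)` and then `-L"$n"` instead of `-L1715`. GNU xargs documents -L as implying -x, so each batch must still fit the command-line length limit (`xargs --show-limits` prints it); otherwise xargs exits instead of splitting further.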
Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs
More information about the Overseers
mailing list