[Cygwin] package-grep on Sourceware

ASSI Stromeko@nexgo.de
Sun Dec 8 11:05:00 GMT 2019


Hi Frank,

I've done some more testing locally and can confirm my suspicion: the output 
buffering produces wrong results once the output gets longer than a page.  
That should explain the odd results on the server for a search that returns 
all file names.

I propose running (GNU) grep in line-buffered mode instead, which at least 
on my local system doesn't materially impact the runtime.  I also suggest 
using find instead of ls to create the list of files, which slightly reduces 
the time needed to produce it.  If you really want to max out the parallelism, 
you will also need to limit the number of arguments fed into each instance 
(otherwise xargs ends up starting only 13 processes).  Feeding in 1715 names 
per invocation starts 20 processes, which should help get the most out of 
the available cores.

  # env is needed: xargs would otherwise try to execute "LC_ALL=C" itself
  find "$dir" -mindepth 2 -maxdepth 2 -type f -not -name .htaccess |
    xargs -L1715 -P16 env LC_ALL=C grep -l --line-buffered -- "$param_grep" |
    sort > "$tmpfile"

In a later iteration the list of files to be searched could be cached (in a 
file $dir.lst, say).  This already helps in the local case, but is likely more 
effective on a system that has a lot more IO load than I can produce locally.

  # env is needed: xargs would otherwise try to execute "LC_ALL=C" itself
  <"$dir.lst" xargs -L1715 -P16 \
    env LC_ALL=C grep -l --line-buffered -- "$param_grep" |
    sort > "$tmpfile"
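
To illustrate how such a list file might be produced, here is a minimal 
sketch.  The function name refresh_list is made up for this example, and the 
write-to-temporary-then-rename step is my assumption about how to keep a 
running search from reading a half-written list; it is not something the 
server script does today.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the cached file list "$1.lst" for the
# listing directory $1.  The list is written to a temporary file first and
# renamed afterwards, so a concurrent search never sees a partial list.
refresh_list() {
    dir=$1
    # Temporary file next to the final list, so mv stays on one filesystem.
    tmp=$(mktemp "$dir.lst.XXXXXX") || return 1
    if find "$dir" -mindepth 2 -maxdepth 2 -type f -not -name .htaccess > "$tmp"
    then
        mv -f "$tmp" "$dir.lst"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Note that mktemp creates the file readable by its owner only, so a chmod may 
be needed before the rename if the search runs under a different user.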

The cache file would need to be refreshed each time the listing directories 
get updated (although you could probably just run find from a cron job every 
few minutes and nobody would notice the difference).  A cache file would also 
make it easier to determine the optimal input length, since you could simply 
count its lines to calculate how best to split them among multiple processes.
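
The calculation could look roughly like the sketch below.  The function name 
batch_size and the choice of 20 batches are only assumptions for the sake of 
the example; it simply divides the line count by the desired number of 
invocations and rounds up.

```shell
#!/bin/sh
# Hypothetical helper: number of lines to hand to each xargs invocation so
# that $1 file names are split into at most $2 batches (ceiling division).
batch_size() {
    total=$1
    batches=$2
    # xargs -L needs a positive count, so fall back to 1 for an empty list.
    if [ "$total" -le 0 ]; then
        echo 1
        return
    fi
    echo $(( (total + batches - 1) / batches ))
}

# Possible use with the cached list (not run here):
#   n=$(batch_size "$(wc -l < "$dir.lst")" 20)
#   <"$dir.lst" xargs -L"$n" -P16 env LC_ALL=C grep -l --line-buffered \
#     -- "$param_grep" | sort > "$tmpfile"
```

With a list of 34300 names and 20 batches this yields the 1715 used above; 
rounding up ensures the last batch is never larger than the others.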



Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs




