Filesystem enumeration performance improvement
Jürgen Wagner
juergen@wagner.is
Sun Sep 30 19:50:00 GMT 2018
Hi Marco,
Since you bypass the Cygwin APIs and call the Windows APIs directly,
changes to the way stat()/readdir() or related functions operate in
Cygwin are not a plausible explanation for your code running faster.
I also doubt that printf() could have improved enough to provide such
a dramatic speed-up.
In my experience, such effects usually have one of two causes:
- Caching is involved, either in Windows or at the disk level. Run the
benchmark with empty caches or with caching disabled.
- Your virus scanner has improved, so that querying file status no
longer triggers excessive scans. This is a bit harder to verify or
test.
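One rough way to check the caching hypothesis is to time several
consecutive passes over the same tree: if the first pass is much slower
than the following ones, caching is probably involved. The sketch below
is only an illustration in Python, not Marco's enumerator. The names
count_entries and time_passes are made up for this example, and the
script does not flush any cache itself, so the first pass is only
"cold" right after a reboot or on a tree that has not been read
recently.

```python
import os
import sys
import time


def count_entries(root):
    """Walk root with os.scandir and return the number of entries seen."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    count += 1
                    # Do not follow symlinks, to avoid cycles.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directories are skipped rather than aborting.
            continue
    return count


def time_passes(root, passes=3):
    """Return one (seconds, entries) tuple per consecutive pass."""
    results = []
    for _ in range(passes):
        start = time.perf_counter()
        entries = count_entries(root)
        results.append((time.perf_counter() - start, entries))
    return results


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for i, (secs, entries) in enumerate(time_passes(root), 1):
        print(f"pass {i}: {entries} entries in {secs:.3f} s")
```

If all passes take roughly the same time, caching is a less likely
explanation and the virus scanner becomes the more interesting suspect.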
Did you compare your program's performance with that of Cygwin's "find"?
Did that also show such a dramatic increase in throughput?
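To make that comparison concrete, both programs can be timed the same
way. The following is a minimal sketch, assuming Python is available.
time_command is a name invented here, and both the root directory and
the enumerator path are placeholders. Output is discarded so that
terminal speed does not distort the numbers.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return (seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, proc.returncode


if __name__ == "__main__":
    # Placeholder command lines: adjust the root and the enumerator path.
    commands = [
        ["find", "/cygdrive/c", "-print"],
        ["./my_enumerator", "/cygdrive/c"],  # hypothetical program name
    ]
    for argv in commands:
        try:
            secs, code = time_command(argv)
        except OSError as exc:
            print(f"{argv[0]}: could not run ({exc})")
            continue
        print(f"{argv[0]}: {secs:.2f} s, exit code {code}")
```

Running each command more than once and alternating the order helps
separate a real difference from a cache warmed by the previous run.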
There is a free and quite fast disk space analyzer called RidNacs
(ScanDir backwards). If the magic you observe comes from improved
caching, this program should benefit as well.
Cheers,
--J.
On 30.09.2018 20:41, Marco Mason wrote:
> I recently upgraded from Cygwin v2.10 to v2.11.1 and noticed that one of my
> programs got a tremendous speed boost. It's a custom filesystem
> enumeration program whose output I feed to frcode to update the
> /var/locatedb database. It used to take quite a bit of time (15-20
> minutes?), and now runs in about a minute. Since the program seems to work
> well, just many times faster, I'm rather happy with the changes.
>
> The reason I'm writing is that I don't see *why* I should have any timing
> changes at all! The reason I have my own file enumerator for locatedb is
> that the original went through the POSIX layer and was pretty slow,
> especially for remote-mounts. As I only needed enough for locate, I wrote
> my own enumerator against the Windows API for speed. Since my loop is
> essentially just using FindFirstFile/FindNextFile and printf(), I don't
> know why file gathering would be any faster.
>
> So either printf() has gotten remarkably faster, or there are some
> interactions between Cygwin and Windows in the file enumeration area that
> are surprising me. Can someone please clue me in to what might be causing
> the speed increases?
>
> Looking at the git log and mailing list history, my best guess would be
> that it's related to the email threads "Why does readdir() open files ?"
> (Ben Rubson 2018-03-28) and "Why does (stat() ?) open files ?" (Ben Rubson
> 2018-04-09). However, I can't seem to pin down which git commits are
> relevant to those threads. If anyone can provide a little insight, I'd
> really appreciate it.
>
> --marco
>
> --
> Problem reports: http://cygwin.com/problems.html
> FAQ: http://cygwin.com/faq/
> Documentation: http://cygwin.com/docs.html
> Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple