This is the mail archive of the cygwin mailing list for the Cygwin project.


NTFS fragmentation redux

Some time back (~Aug), there was a discussion about NTFS's file fragmentation problem.

Some notes at the time:

From: Vladimir Dergachev
I have encountered a rather puzzling fragmentation that occurs when writing files using Cygwin.
a small Tcl script that, when run, creates files fragmented into about 300 pieces on my system)

On 03 August 2006 18:50, Vladimir Dergachev wrote:
I guess this means that sequential writes are officially broken on NTFS. Anyone has any idea for a workaround? It would be nice if a simple
tar zcvf a.tgz * does not result in a completely fragmented file.

On Aug  3 14:54, Vladimir Dergachev wrote:
What I am thinking about is modifying cygwin's open and write calls so that they preallocate files in chunks of 10MB (configurable by an environment variable).

The "fault" is the behavior of the file system.
I compared NTFS with ext3 & xfs on Linux (jfs & reiser hide how many
fragments a file is divided into).

NTFS falls in the middle as far as fragmentation goes.  My disk
is usually defragmented, but the built-in Windows defragmenter doesn't
defragment free space.

I took a 64M file and copied it to a destination file using
various utilities.

With xfs (Linux), I wasn't able to fragment the target file.  Even
when writing 1K chunks in append mode, the target file always ended
up as a single 64M fragment.

With ext3 (also Linux), the copy method didn't seem to matter: cp,
dd (block size 64M), and rsync all produced a target file with
2473 fragments.

On NTFS under cygwin, the number of fragments varies with the tool
writing the output:

"cp" produced the most, at 515 fragments.
"rsync" came next, with 19 fragments.
"dd" (using bs=32M or bs=64M) did best, at 1 fragment.
"dd" with a block size of 8k produced the same result as "cp".

It appears cygwin does exactly the right thing as far as file
writes are concerned -- it writes the output using the block size
specified by the client program you are running. If you use a
small block size, NTFS allocates space for each write that you do.
If you use a big block size, NTFS appears to look for the first
place where the entire write will fit. Back in the DOS days, the
built-in COPY command buffered as much data as would fit in memory
and then wrote it out, so it was likely to create the output with
a minimal number of fragments.

If you want your files to be unfragmented, you need to use a
file copy (or file write) util that uses a large buffer size --
one that, if possible, writes the entire file in 1 write.
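
A minimal sketch of such a copy, assuming GNU dd (the one cygwin
ships). The helper name copy_big and the 64M default are illustrative
choices, not an existing tool, and whether the result really ends up
in one fragment still depends on the free space NTFS has to work with.

```shell
#!/bin/sh
# copy_big SRC DST [BLOCKSIZE]
# Hypothetical helper: copy a regular file with dd using one large
# block size, so each write hands the file system a big request.
copy_big() {
    src=$1
    dst=$2
    bs=${3:-64M}    # assumed default, matching the 64M test file
    # dd's transfer statistics go to stderr; discard them
    dd if="$src" of="$dst" bs="$bs" 2>/dev/null
}
```

For example, "copy_big src.dat dst.dat 32M" copies src.dat using 32M
reads and writes.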

In the "tar zcvf a.tgz *" case, I'd suggest piping the output of
tar into "dd" and using a large block size.
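
A sketch of that pipeline, assuming GNU tar and GNU dd, with one
caveat: when dd reads from a pipe and only bs= is given, each short
read is passed straight through as an equally short write, so the
large block size would not take effect. Giving obs= instead makes dd
collect input until a full output block is ready (GNU dd's
iflag=fullblock together with bs= is an alternative). The helper name
tar_big and the 64M figure are illustrative.

```shell
#!/bin/sh
# tar_big ARCHIVE FILE...
# Hypothetical helper: create a gzip-compressed tar archive, but let
# dd do the writing to disk in large output blocks.
tar_big() {
    archive=$1
    shift
    # ibs= sets the size of reads from the pipe; obs= (rather than bs=)
    # makes dd accumulate those reads into 64M writes
    tar zcf - "$@" | dd of="$archive" ibs=64k obs=64M 2>/dev/null
}
```

For example, "tar_big a.tgz *" replaces "tar zcvf a.tgz *". The exit
status is that of dd, the last command in the pipeline.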

