This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: NTFS fragmentation


> From: Vladimir Dergachev <vdergachev@rcgardis.com>
> To: cygwin@cygwin.com
> Subject: Re: NTFS fragmentation
> Date: Thu, 3 Aug 2006 14:54:33 -0400
> 
> On Thursday 03 August 2006 2:37 pm, Dave Korn wrote:
> > On 03 August 2006 18:50, Vladimir Dergachev wrote:
> > > On Thursday 03 August 2006 5:18 am, Dave Korn wrote:
> > > > On 03 August 2006 00:46, Vladimir Dergachev wrote:
> > > >
> > > >     Hi Vladimir,
> > > >
> > > > > Please CC me - I am not on the list.
> > > >
> > > >   Done :)
> > > >
> > > I guess this means that sequential writes are
> > > officially broken on NTFS.
> > >
> > > Does anyone have an idea for a workaround?  It
> > > would be nice if a simple tar zcvf a.tgz * did
> > > not result in a completely fragmented file.
> >
> >   I can only think of one thing worth trying off
> > the top of my head: what happens if you open a file
> > (in non-sparse mode) and immediately seek to the
> > file size, then seek back to the start and actually
> > write the contents?  Or perhaps after seeking to
> > the end you'd need to write (at least) a single
> > byte, then seek back to the beginning?
> 
> I am not sure that I understand: if one creates the
> file and then seeks to +1G, wouldn't the file pointer
> still be at 0, as the file size is 0?
> 
> What I am thinking about is modifying Cygwin's open
> and write calls so that they preallocate files in
> chunks of 10MB (configurable by an environment
> variable).
> 
> This way we still get some fragmentation, but it
> would not be so bad - assuming 50MB/sec disk read
> speed, reading 10MB will take 200ms, while a seek is
> at worst 20ms (usually around 10-15ms).
> 
>                                      best
> 
>                                             Vladimir Dergachev

It turns out that to actually allocate the file
blocks, you need to write some data. Seeking to the
desired size doesn't (or didn't use to) actually
allocate the intervening blocks. As Dave suggests, you
need to seek to the end and actually write something
to get the file blocks allocated. If you try this for
a very large file (several gigabytes), you had better
be prepared to go and have a nice meal while you wait
for the block allocation to complete. Windows'
security policy requires that the blocks not only be
allocated, but that they be written with data as well
- ostensibly to prevent malicious code from reading
old data it shouldn't have access to.

Granted, there are better ways to do this - zero-fill
on attempts to read from allocated but uninitialized
file space or at the very least, throw some kind of
exception when an application attempts to read
uninitialized file data. Since Windows supports sparse
files, the basic mechanism is there somewhere.

Windows doesn't (or didn't use to) allow preallocation
of files without actually writing data UNLESS you know
the proper incantation to prove you're a good guy
(your application needs to do a dance to grant itself
the "SeManageVolumePrivilege" privilege so it can
issue the "SetFileValidData" call).
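The "dance" could be sketched roughly as below. This is Win32-only C,
an illustration of the sequence (enable the privilege, set EOF, then
declare the range valid) rather than production code; error handling
is trimmed and it only succeeds for accounts that actually hold
SeManageVolumePrivilege (administrators, typically):

```c
/* Sketch: enable SeManageVolumePrivilege for the current process,
 * then use SetFileValidData so Windows skips the zero-fill pass on
 * the preallocated range.  Old on-disk data in that range becomes
 * readable, which is exactly why the privilege is required. */
#include <windows.h>

static BOOL enable_manage_volume_privilege(void)
{
    HANDLE tok;
    TOKEN_PRIVILEGES tp;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &tok))
        return FALSE;
    if (!LookupPrivilegeValue(NULL, SE_MANAGE_VOLUME_NAME,
                              &tp.Privileges[0].Luid)) {
        CloseHandle(tok);
        return FALSE;
    }
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    BOOL ok = AdjustTokenPrivileges(tok, FALSE, &tp, 0, NULL, NULL)
              && GetLastError() == ERROR_SUCCESS;
    CloseHandle(tok);
    return ok;
}

int main(void)
{
    LARGE_INTEGER size;
    size.QuadPart = 100 * 1024 * 1024;  /* 100MB, for example */

    if (!enable_manage_volume_privilege())
        return 1;  /* caller lacks the privilege */

    HANDLE h = CreateFileA("big.dat", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* Set EOF first, then mark the whole range as valid data so no
     * zero-fill happens when it is later read or overwritten. */
    if (!SetFilePointerEx(h, size, NULL, FILE_BEGIN) ||
        !SetEndOfFile(h) ||
        !SetFileValidData(h, size.QuadPart)) {
        CloseHandle(h);
        return 1;
    }
    CloseHandle(h);
    return 0;
}
```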



--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

