Created attachment 7761
Sample posix_fallocate program (cc -o posix_fallocate posix_fallocate.c)

While I was working on a libvirt bug, one of the reviews suggested that I may have found a glibc bug. Hopefully I explain things well enough. At the very least, the links below show some of the history. If you need more data, I'd be happy to oblige. I did try googling to see whether something like this had been reported before, but didn't find anything.

I'm running an F20 environment, which I try to keep up to date. "yum list installed | grep glibc" returns:

    glibc.i686            2.18-12.fc20   @updates
    glibc.x86_64          2.18-12.fc20   @updates
    glibc-common.x86_64   2.18-12.fc20   @updates
    glibc-devel.x86_64    2.18-12.fc20   @updates
    glibc-headers.x86_64  2.18-12.fc20   @updates

First, the libvirt bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1077068

That led me to submit some patches, which drew questions I thought I had answered well enough:

http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html

However, after making some changes in response to my first patch, I got the following review:

http://www.redhat.com/archives/libvir-list/2014-August/msg01074.html

That review led me to create a test program, try to explain what I think the bug is, and see what I get.

Essentially, a tester was creating files of specific sizes in an NFS-backed libvirt pool. The tester then ran a libvirt command that essentially does a stat() on the resulting file to check that the size created matched the size specified. This worked for every pool type except NFS. The issue may be related to NFS itself, but I'm not sure. As it turns out, NFSv4 allows "larger" wsize values and will autonegotiate one on your behalf if you don't provide a size. In my case, that is a wsize of 1 MiB.
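For reference, here is a minimal sketch of the sequence the attached program appears to follow. It is not the attachment itself. The sketch creates the file, calls posix_fallocate(), then runs fstat() and computes the allocation the way libvirt does, counting st_blocks in 512-byte units. The helper names fallocate_and_stat and alloc_kib are hypothetical. The open() in the traces below passes a mode argument (03017576725020) that looks uninitialized. The sketch passes an explicit mode instead, as O_CREAT requires.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Allocation as libvirt reports it: st_blocks counts 512-byte units. */
static long long alloc_kib(blkcnt_t st_blocks)
{
    return (long long)st_blocks * 512 / 1024;
}

/* Hypothetical helper: create PATH (which must not exist yet), reserve
   LEN bytes with posix_fallocate(), then fstat() the file into *ST.
   Returns 0 on success or an errno value on failure. */
static int fallocate_and_stat(const char *path, off_t len, struct stat *st)
{
    /* O_CREAT needs an explicit mode; 0600 is an arbitrary choice. */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno;

    /* posix_fallocate() returns an error number; it does not set errno. */
    int rc = posix_fallocate(fd, 0, len);
    if (rc == 0 && fstat(fd, st) != 0)
        rc = errno;

    close(fd);
    return rc;
}
```

With a 10 MiB request, alloc_kib(st.st_blocks) corresponds to the "size in libvirt KiB" lines in the traces. The reporter saw 10240 on the local /home file system and 44 over the NFS mount.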
It seems as though this wsize is used in some manner by posix_fallocate() in the math that generates the "sized" file the tester wanted. In the example from the bz above, a 10 MiB file returned an st_blocks value of 88 from stat(). libvirt would then report the volume's allocation as 88*512 = 45056 bytes (44.0 KiB), which is not quite what was expected. For a non-NFS file, st_blocks was 20480 (20480*512 = 10485760 bytes), which is the right size. If I modify the NFS server startup to change the wsize, I get different results all the way down to 4096, at which point I get the "expected" results.

Using the attached program, I've more or less reproduced the sequence of calls libvirt makes. The resulting trace is similar to the one shown in the links above:

    % strace ./posix_fallocate /home/nfs_pool/target/bug10mib.img 10
    ...
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fabd997d000
    write(1, "Open '/home/nfs_pool/target/bug1"..., 74Open '/home/nfs_pool/target/bug10mib.img' for fallocate of 10485760 bytes
    ) = 74
    open("/home/nfs_pool/target/bug10mib.img", O_RDWR|O_CREAT|O_EXCL, 03017576725020) = 3
    write(1, "Before fallocate_posix\n", 23Before fallocate_posix
    ) = 23
    fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISVTX, st_size=0, ...}) = 0
    write(1, "stat st_blocks=0 st_blksize=1048"..., 46stat st_blocks=0 st_blksize=1048576 st_size=0
    ) = 46
    write(1, "size in libvirt KiB would be 0\n", 31size in libvirt KiB would be 0
    ) = 31
    fallocate(3, 0, 0, 10485760) = -1 EOPNOTSUPP (Operation not supported)
    fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISVTX, st_size=0, ...}) = 0
    fstatfs(3, {f_type="NFS_SUPER_MAGIC", f_bsize=1048576, f_blocks=137675, f_bfree=101444, f_bavail=94428, f_files=8962048, f_ffree=8174933, f_fsid={0, 0}, f_namelen=255, f_frsize=1048576}) = 0
    pwrite(3, "\0", 1, 1048575) = 1
    pwrite(3, "\0", 1, 2097151) = 1
    pwrite(3, "\0", 1, 3145727) = 1
    pwrite(3, "\0", 1, 4194303) = 1
    pwrite(3, "\0", 1, 5242879) = 1
    pwrite(3, "\0", 1, 6291455) = 1
    pwrite(3, "\0", 1, 7340031) = 1
    pwrite(3, "\0", 1, 8388607) = 1
    pwrite(3, "\0", 1, 9437183) = 1
    pwrite(3, "\0", 1, 10485759) = 1
    write(1, "fallocate ret=0 errno=0\n", 24fallocate ret=0 errno=0
    ) = 24
    write(1, "After fallocate_posix\n", 22After fallocate_posix
    ) = 22
    fstat(3, {st_mode=S_IFREG|S_ISVTX, st_size=10485760, ...}) = 0
    write(1, "stat st_blocks=88 st_blksize=104"..., 54stat st_blocks=88 st_blksize=1048576 st_size=10485760
    ) = 54
    write(1, "size in libvirt KiB would be 44\n", 32size in libvirt KiB would be 44
    ) = 32
    close(3) = 0
    exit_group(0) = ?
    %

My NFS server is, yes, on the same host as the client program. I didn't do anything special to set it up. I used the following steps:

    mkdir /home/nfs_pool
    mkdir /home/nfs_pool/nfs-export
    mkdir /home/nfs_pool/nfs

Edit /etc/idmapd.conf to change/add the uncommented line:

    Domain = localhost.localdomain

Edit /etc/exports and add:

    /home/nfs_pool/nfs-export localhost(rw,no_root_squash)

Then run 'service nfs restart':

    % exportfs -v
    /home/nfs_pool/nfs-export localhost(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
    %

I used libvirt to start the pool, but all it does is essentially mount the export, so mount -v shows:

    % mount -v
    ...
    localhost:/home/nfs_pool/nfs-export on /home/nfs_pool/target type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,port=0,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)
    ...
    %

FWIW, compare this with the same program writing to my plain /home partition:

    % strace ./posix_fallocate /home/bug10mib.img 10
    ...
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcb8c3eb000
    write(1, "Open '/home/bug10mib.img' for fa"..., 58Open '/home/bug10mib.img' for fallocate of 10485760 bytes
    ) = 58
    open("/home/bug10mib.img", O_RDWR|O_CREAT|O_EXCL, 03017576725020) = 3
    write(1, "Before fallocate_posix\n", 23Before fallocate_posix
    ) = 23
    fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISVTX, st_size=0, ...}) = 0
    write(1, "stat st_blocks=0 st_blksize=4096"..., 43stat st_blocks=0 st_blksize=4096 st_size=0
    ) = 43
    write(1, "size in libvirt KiB would be 0\n", 31size in libvirt KiB would be 0
    ) = 31
    fallocate(3, 0, 0, 10485760) = 0
    write(1, "fallocate ret=0 errno=0\n", 24fallocate ret=0 errno=0
    ) = 24
    write(1, "After fallocate_posix\n", 22After fallocate_posix
    ) = 22
    fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISVTX, st_size=10485760, ...}) = 0
    write(1, "stat st_blocks=20480 st_blksize="..., 54stat st_blocks=20480 st_blksize=4096 st_size=10485760
    ) = 54
    write(1, "size in libvirt KiB would be 102"..., 35size in libvirt KiB would be 10240
    ) = 35
    close(3) = 0
    exit_group(0) = ?
    %
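The two traces differ at fallocate(). Locally it succeeds. Over NFS it fails with EOPNOTSUPP, and posix_fallocate() then falls back to stepping through the file by f_bsize (1048576 here), writing a single zero byte at the end of each 1 MiB chunk, ten pwrite() calls in all. The sketch below is a simplified model reconstructed from that pwrite() pattern, not glibc's source. fallback_write_offsets is a hypothetical name. The model also ignores any handling of pre-existing file data, which does not come into play here because st_size was 0. Only ten bytes are written, so the file would stay mostly sparse on the server, which would be consistent with the 44 KiB allocation reported over NFS.

```c
#include <stddef.h>
#include <sys/types.h>

/* Simplified model of the one-byte-per-block fallback seen in the NFS
   trace.  Stores up to MAX write offsets in OUT and returns the total
   number of one-byte writes the fallback would issue. */
static size_t fallback_write_offsets(off_t offset, off_t len, off_t increment,
                                     off_t *out, size_t max)
{
    size_t n = 0;

    if (len <= 0 || increment <= 0)
        return 0;

    /* Align the first write so the last one lands on offset + len - 1. */
    off_t pos = offset + (len - 1) % increment;
    for (off_t left = len; left > 0; left -= increment, pos += increment) {
        if (n < max)
            out[n] = pos;
        n++;
    }
    return n;
}
```

With increment = 1048576 and len = 10485760, the model yields the same ten offsets as the NFS trace, from 1048575 through 10485759. With increment = 4096, it yields 2560 writes, one per 4 KiB block. That matches the wsize at which the expected allocation was observed.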
Commit 7fe9e2e089f4990b7d18d0798f591ab276b15f2b in 2.22 fixes the NFS issue by ignoring very large block sizes.
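The fix is described as ignoring very large block sizes. A minimal sketch of that idea follows, assuming a cap of 4096 bytes. That value is chosen here because a wsize of 4096 gave the expected results earlier in this report. The sketch does not reproduce the logic of commit 7fe9e2e089f4990b7d18d0798f591ab276b15f2b. choose_increment, writes_needed and MAX_FALLBACK_INCREMENT are hypothetical names. The 512-byte default for f_bsize == 0 is also an assumption.

```c
#include <sys/types.h>

/* Hypothetical cap on the fallback stride (assumption: 4096 bytes). */
#define MAX_FALLBACK_INCREMENT 4096UL

/* Pick the stride for the one-byte-per-block fallback from fstatfs()'s
   f_bsize, ignoring very large values such as NFS's 1 MiB. */
static unsigned long choose_increment(unsigned long f_bsize)
{
    if (f_bsize == 0)
        return 512;                      /* no usable block size reported */
    if (f_bsize > MAX_FALLBACK_INCREMENT)
        return MAX_FALLBACK_INCREMENT;   /* treat large sizes as 4096 */
    return f_bsize;
}

/* Number of one-byte writes needed to touch every INCREMENT-sized chunk
   of LEN bytes.  INCREMENT must be nonzero. */
static unsigned long writes_needed(off_t len, unsigned long increment)
{
    return (unsigned long)((len + (off_t)increment - 1) / (off_t)increment);
}
```

The trade-off is more, smaller writes. A 10 MiB file takes 2560 one-byte pwrite() calls instead of 10, in exchange for touching every 4 KiB block rather than one byte per MiB.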