Here is how to reproduce the problem:

$ dd if=/dev/zero of=file2G bs=1M count=2049
2049+0 records in
2049+0 records out
2148532224 bytes (2.1 GB) copied, 47.9032 s, 44.9 MB/s
$ ar q ar2G.ar file2G
$ od -a ar2G.ar
0000000   !   <   a   r   c   h   >  nl   f   i   l   e   2   G   /  sp
0000020  sp  sp  sp  sp  sp  sp  sp  sp   1   3   2   4   4   6   6   2
0000040   8   0  sp  sp   1   0   0   0  sp  sp   1   0   0   0  sp  sp
0000060   1   0   0   6   4   4  sp  sp   -   2   1   4   6   4   3   5
0000100   0   7   `  nl nul nul nul nul nul nul nul nul nul nul nul nul
^C

Note that the archive claims that the 'file2G' size is negative: -214643507. This results in an invalid archive that cannot be extracted:

$ ar xv ar2G.ar
x - file2G
ar: ar2G.ar is not a valid archive

As a consequence, it is impossible to generate Debian packages bigger than 2GB (for instance for applications that have a large dataset).

Obviously the file size was stored in a signed 32bit variable. Reading the source code shows that it is actually a long, which means there will be further issues if such an archive is moved from a 32bit system to a 64bit one.

More precisely, the archive file format is a linked list of element headers and relies on these having accurate size information to find the position of the next element header. Since the archive format allocates 10 characters for the file size, it should be able to handle files up to 10GB. However:

* Files between 2GiB and 4GiB
  The file size is stored as a negative number. The archive cannot be extracted by either the 32bit ar or the 64bit one.

* Files between 4GiB and 10GB
  Only the low 32 bits are taken into account, so ar will write a size of 0.1GiB for a 4.1GiB file. As a result, during extraction ar will think there is an archive element header in the middle of the file, resulting in an error (if not worse). There are also sign issues between 6GiB and 8GiB.

* Files bigger than 10GB
  ar will silently truncate the file size to its first 10 decimal digits. Decoding will fail for the same reason as above.
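To illustrate why one wrong size field breaks extraction, here is a minimal Python sketch of walking the element headers. It is not the BFD code. The 60-byte layout (name 16, date 12, uid 6, gid 6, mode 8, size 10, then a backquote and newline) is read off the od dump above. The names parse_header, walk and make_element are hypothetical, and long file names and symbol tables are ignored.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name 16, date 12, uid 6, gid 6, mode 8, size 10, end marker 2


def parse_header(header: bytes) -> dict:
    """Split one 60-byte element header into the fields used here."""
    if len(header) != HEADER_LEN or header[58:60] != b"`\n":
        raise ValueError("not an archive element header")
    return {
        "name": header[0:16].decode("ascii").rstrip(),
        "size": int(header[48:58].decode("ascii")),
    }


def walk(archive: bytes):
    """Yield (name, size) for each element by following the size fields."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("missing archive magic")
    offset = len(AR_MAGIC)
    while offset < len(archive):
        fields = parse_header(archive[offset:offset + HEADER_LEN])
        if fields["size"] < 0:
            raise ValueError("negative element size")
        yield fields["name"], fields["size"]
        # Element data is padded to an even length before the next header.
        offset += HEADER_LEN + fields["size"] + (fields["size"] & 1)


def make_element(name: str, data: bytes, size_field: str) -> bytes:
    """Build one element; size_field is written verbatim (hypothetical helper)."""
    header = (f"{name + '/':<16}{0:<12}{0:<6}{0:<6}{'100644':<8}"
              f"{size_field:<10}").encode("ascii") + b"`\n"
    return header + data + (b"\n" if len(data) & 1 else b"")
```

With a correct size field, walk() lands exactly on the next header. If the field understates the size, the next "header" is read from the middle of the element data and parse_header() rejects it, which is the failure mode described for files above 4GiB.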
Even 64bit systems are not immune to these issues due to the file sizes being stored in 'unsigned int' variables in various places.
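The arithmetic behind the failure cases described above can be checked with a short Python sketch. It only models the behaviour in this report (a 32bit signed long, a 32bit 'unsigned int', and a fixed 10-character field) and is not taken from the BFD sources. The function names are hypothetical.

```python
def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def as_uint32(value: int) -> int:
    """Keep only the low 32 bits, as a 32-bit unsigned int would."""
    return value & 0xFFFFFFFF


def size_field(value: int, width: int = 10) -> str:
    """Decimal text cut to the first `width` characters, then space padded."""
    return str(value)[:width].ljust(width)


GIB = 1 << 30
MIB = 1 << 20

# The 2049 MiB test file: wraps negative, and the sign eats one digit.
print(as_int32(2148532224))              # -2146435072
print(size_field(as_int32(2148532224)))  # -214643507

# A 4GiB + 100MiB file keeps only its low 32 bits.
print(as_uint32(4 * GIB + 100 * MIB))    # 104857600

# Between 6GiB and 8GiB the low 32 bits have the sign bit set again.
print(as_int32(7 * GIB))                 # -1073741824

# Above 10GB the decimal text no longer fits in 10 characters.
print(size_field(12345678901))           # 1234567890
```

The second value matches the -214643507 seen in the od dump: the real wrapped value has 11 characters including the sign, so the last digit is lost.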
Created attachment 6123 bfd: Fix writing the size of 2+GB elements in the archive.
Created attachment 6124 bfd: Refuse to create an invalid archive when an archive element is too big.
Created attachment 6125 bfd: Fix parsing the size of archive elements larger than 2GB.
Created attachment 6126 bfd: Always use bfd_size_type to manipulate the size of an archive element.
Created attachment 6127 ar: Fix handling of archive elements larger than 2GB.
I attached a set of 5 patches that fix this issue (at least for me). I hope they're ok. If not let me know.
CVSROOT:	/cvs/src
Module name:	src
Changes by:	nickc@sourceware.org	2012-01-20 14:42:57

Modified files:
	bfd            : ChangeLog archive.c archive64.c bfdio.c libbfd-in.h libbfd.h

Log message:
	PR binutils/13534
	* archive.c (_bfd_ar_sizepad): New function. Correctly install and
	pad the size field in an archive header.
	(_bfd_generic_read_ar_hdr_mag): Use the correct type and scan
	function for the archive size field.
	(bfd_generic_openr_next_archived_file): Likewise.
	(do_slurp_coff_armap): Likewise.
	(_bfd_write_archive_contents): Likewise.
	(_bfd_bsd44_write_ar_hdr): Use the new function.
	(bfd_ar_hdr_from_filesystem): Likewise.
	(_bfd_write_archive_contents): Likewise.
	(bsd_write_armap): Likewise.
	(coff_write_armap): Likewise.
	* archive64.c (bfd_elf64_archive_write_armap): Likewise.
	* bfdio.c (bfd_bread): Use correct type for archive element sizes.
	* ar.c (open_inarch): Likewise.
	(extract_file): Likewise.
	* libbfd-in.h (struct areltdata): Use correct types for parsed_size
	and extra_size fields.
	Prototype _bfd_ar_sizepad function.
	* libbfd.h: Regenerate.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/ChangeLog.diff?cvsroot=src&r1=1.5591&r2=1.5592
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/archive.c.diff?cvsroot=src&r1=1.80&r2=1.81
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/archive64.c.diff?cvsroot=src&r1=1.14&r2=1.15
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/bfdio.c.diff?cvsroot=src&r1=1.31&r2=1.32
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/libbfd-in.h.diff?cvsroot=src&r1=1.95&r2=1.96
http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/libbfd.h.diff?cvsroot=src&r1=1.265&r2=1.266
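The log message says that _bfd_ar_sizepad installs and pads the size field. The real function is C code in archive.c and is not reproduced here. As a rough illustration of the idea only, a Python sketch might look like the following. The name size_pad and the choice to raise an exception instead of returning a failure status are assumptions made for this example.

```python
AR_SIZE_WIDTH = 10  # width of the size field in an element header


def size_pad(size: int, width: int = AR_SIZE_WIDTH) -> str:
    """Return size as left-justified, space-padded decimal text.

    Raises ValueError instead of producing a field that is negative or
    too wide, since either one would make the archive unreadable.
    """
    if size < 0:
        raise ValueError("element size cannot be negative")
    text = str(size)
    if len(text) > width:
        raise ValueError(f"element size {size} does not fit in {width} characters")
    return text.ljust(width)
```

For the 2049 MiB test file, size_pad(2148532224) gives "2148532224", which fills the field exactly. A size of 10**10 bytes or more is rejected instead of being silently cut to its first 10 digits.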
Hi Francois,

Thanks for reporting this problem, and supplying a patch to fix it. I have checked in your patch with one minor change - I made _bfd_ar_sizepad a boolean function - and one slightly more important change - I created a changelog entry.

Cheers
Nick