This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Remote sync filesystem to handle distribution space usage


On Mon, Mar 05, 2018 at 09:43:50AM -0800, Carlos O'Donell wrote:
> On 03/05/2018 06:40 AM, Zack Weinberg wrote:
> > On Mon, Mar 5, 2018 at 8:39 AM, Mike FABIAN <mfabian@redhat.com> wrote:
> >> I added this to the wiki for the 2.28 release notes:
> >> https://sourceware.org/glibc/wiki/Release/2.28#The_locale-archive_file_is_much_bigger
> > 
> > Thanks for writing this up.
> > 
> >> As LC_COLLATE makes up the bulk of the locale data, the size of
> >> the locales increased a lot. The locale-archive file which contains the
> >> data for all locales grew from 126 MiB to 206 MiB.
> > 
> > I wonder if we should spend some time thinking about ways to compact
> > this data or factor it out.  I realize it's not as simple as putting
> > the compiled form of iso14651_t1_common in its own file that all the
> > locales refer to, because of the "locale specific rules", but maybe it
> > could be _almost_ that simple?  Alternatively, perhaps some simple
> > compression could be applied?
>  
> Right, the problem is that the tables and weights for collation are built
> as a singular set, and any additional rules would perturb the tables and
> their weights. However, I haven't looked that closely at seeing how much
> of the 3-level tables are the same across similar locales with variant
> collation rules.
> 
> Note that a user is free to delete certain locales from the
> locale-archive, so we must be able to revert such changes with minimal
> metadata overhead.
> 
> I'm going to go back to C.UTF-8 in a couple of weeks to look over finalizing
> those changes and the full code-point sorting fixes I have, and I'll see
> if I can come up with any ideas.
> 

This reminds me of an idea that I had but didn't pursue, because it
needs an established distribution to start using it. It would simplify
handling of locale files as well as debug symbols, man pages and
similar use cases.

For files of this type, add a mechanism that downloads them when they
are actually needed, that is, when open is called on the file. With a
fast enough connection the file would be downloaded faster than the
application reads it.

It would work like this: if one tries to open a nonexistent file or
directory at absolute_path, there would be a check for
/network-files/absolute_path (only writable by root).

If that exists, open would connect to a daemon that downloads from the
URL stored in /network-files/absolute_path, copies the result into the
correct location when finished, and returns a handle for a temporary
file. It is recommended to use an md5 hash as part of the URL.
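
The lookup described above could be sketched roughly as follows. This is
only an illustration under assumptions, not an implementation: the
function names are hypothetical, the fetch callable stands in for the
download daemon, and the stub is assumed to be a text file holding
nothing but the URL. Directories are not handled here.

```python
import os
import tempfile

STUB_ROOT = "/network-files"  # path from the proposal; writable by root only


def stub_path_for(path, stub_root=STUB_ROOT):
    """Map an absolute path to its stub file under stub_root."""
    return os.path.join(stub_root, path.lstrip("/"))


def open_with_fallback(path, fetch, stub_root=STUB_ROOT):
    """Open path; if it is missing, download it via the URL in its stub.

    fetch(url) -> bytes is a stand-in for the hypothetical daemon.
    """
    try:
        return open(path, "rb")
    except FileNotFoundError:
        pass
    # The proposal only covers absolute paths.
    if not os.path.isabs(path):
        raise FileNotFoundError(path)
    try:
        with open(stub_path_for(path, stub_root), "r") as f:
            url = f.read().strip()
    except FileNotFoundError:
        raise FileNotFoundError(path) from None

    data = fetch(url)

    # Handle returned to the caller: an anonymous temporary file.
    tmp = tempfile.TemporaryFile()
    tmp.write(data)
    tmp.seek(0)

    # Copy into the correct location; write to a sibling file first and
    # rename it, so other readers never see a partial file.
    target_dir = os.path.dirname(path)
    os.makedirs(target_dir, exist_ok=True)
    fd, partial = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    os.replace(partial, path)
    return tmp
```

After the first successful call the file exists at its real location, so
later opens never reach the stub or the network.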

With a secure connection to the server that should be enough for
security. For mirroring one could do something like include a hash of
each 4k of data.
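
A per-block hash list for mirrors might look like the sketch below. The
choice of SHA-256 and the function names are assumptions made for the
example; the mail only says "hash of each 4k".

```python
import hashlib

BLOCK_SIZE = 4096  # the "4k" from the mail


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one hex digest per block; the last block may be short."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def first_bad_block(data, expected, block_size=BLOCK_SIZE):
    """Return the index of the first block that does not match the
    trusted hash list, or None if everything matches."""
    actual = block_hashes(data, block_size)
    for i, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return i
    if len(actual) != len(expected):
        # Data is truncated or has extra blocks after the common prefix.
        return min(len(actual), len(expected))
    return None
```

The hash list itself would come from the trusted server, so a client
can reject a bad block from an untrusted mirror without downloading the
whole file first.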

Directories are handled to reduce the space usage of /network-files, as
it only needs to mirror existing directories. For a nonexistent
directory it downloads a file that tells how to create the directory in
/network-files as well as the content of all files in it.
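
One way to picture the directory case, purely as an assumption about
what the downloaded description could contain: a mapping from file name
to URL, expanded into stub files. The manifest format and the function
name are made up for the example.

```python
import os


def expand_directory_stub(stub_root, dir_path, manifest):
    """Create dir_path under stub_root and one stub file per entry.

    manifest maps a plain file name to the URL its stub should hold
    (hypothetical format).
    """
    # Reject names that would escape the directory before touching disk.
    for name in manifest:
        if "/" in name or name in ("", ".", ".."):
            raise ValueError("bad entry name: %r" % name)

    stub_dir = os.path.join(stub_root, dir_path.lstrip("/"))
    os.makedirs(stub_dir, exist_ok=True)
    created = []
    for name, url in sorted(manifest.items()):
        stub = os.path.join(stub_dir, name)
        with open(stub, "w") as f:
            f.write(url + "\n")
        created.append(stub)
    return created
```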


It would need a slightly more complex package manager: to replace
those files it would need to check whether the downloaded file exists
and delete it if it is unchanged.
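
The package manager check could be as small as the sketch below. It
assumes the package manager knows the digest of the version it is
replacing; SHA-256 and the names are again only illustrative.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file in chunks so large files are not read in one piece."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def drop_stale_download(path, old_digest):
    """On upgrade, delete a downloaded copy only if it is unchanged.

    Returns "absent" if the file was never downloaded, "removed" if it
    matched the old version and was deleted, "kept" if it was modified.
    """
    if not os.path.exists(path):
        return "absent"
    if file_digest(path) == old_digest:
        os.remove(path)
        return "removed"
    return "kept"
```

Once the unchanged copy is gone, the next open falls back to the
updated stub and downloads the new version.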

