This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCHv3] Protect _dl_profile_fixup data-dependency order [BZ #23690]



On 22/10/2018 21:17, Carlos O'Donell wrote:
> On 10/18/18 3:40 PM, Adhemerval Zanella wrote:
>> I disagree, each possible user option we support incurs in extra
>> maintainability and in this case the possible combination of current 
>> trampoline types and arch-specific code increases even more the burden
>> of not only provide, but to ensure correctness and testability.
> 
> I agree with you on this.
> 
>>> Thus enabling auditing should have as little impact on the underlying
>>> application deployment as possible.
>>>
>>> Forcing immediate binding for LD_AUDIT has an impact we cannot measure,
>>> because we aren't the user with the application.
>>
>> I agree, but I constantly I hear that lazy-binding might show performance
>> advantages without much data to actually to back this up. Do we have actual
>> benchmarks and data that show it still a relevant feature?
> 
> There are two issues at hand.
> 
> (1) Lazy-binding provides a hook for developer tooling.
> 
> (2) Lazy-binding speeds up application startup.
> 
> We have concrete evidence for (1), it's LD_AUDIT, and latrace/ltrace, and
> a bunch of other smaller developer tooling.
> 
> There is even production systems using it like Spindle:
> https://computation.llnl.gov/projects/spindle
> 
> Spindle has immediate examples of where all aspects of the dynamic loading
> process are slowed down by large scientific workloads.

Correct me if I am wrong, but from the paper it seems it intercepts the 
the file operations and use a shared caching mechanism to avoid duplicate
the loading time. My understanding is the issue they are trying to solve
is not relocation runtime overhead, but rather parallel file system operations
when multiple processes loads a bulk of shared libraries and python modules
incurring in I/O concurrency overhead.

Also, it says rtld-audit PLT interposition is in fact a performance issue
which they had to actually make the spindle client to handle the GOT
setup. They do seems to use the symbol binding to intercept open* calls
to handle script languages loading.

I understand that they adapted the rtld-audit to their needs, however it
does not really require lazy-binding to intercept the library calls to
intercept the file operations (readdir for instance). Also I think it
would be feasible to call la_symbind* on first symbol resolution for
non-lazy mode.

> 
> However, we don't have any good microbenchmarks to show the difference
> between lazy and non-lazy. I should write some so we can have a concrete
> discussion.
> 
> I see rented cloud environments as places where lazy-binding would help
> reduce CPU usage costs.
> 
> I see distribution usage of BIND_NOW as a security measure that while
> important is not always relevant to users running services inside their
> own networks. Why pay the performance cost of security relevant features
> if you don't need them?

I do agree with you, but my point is 1. maybe the performance gains do not
really outweigh the code complexity and its maintainability costs and 2. the
other factor (security in this case) might be more cost effectively.

> 
>>>
>>> The point of these features is to allow for users to customize their choices
>>> to meet their application needs. It is not a one-siz-fits-all.
>>>
>>>> More and more distributions are set bind-now as default build option and
>>>> audition already implies some performance overhead (not considering the
>>>> lazy-resolution performance gain might also not represent true in real
>>>> world cases).
>>>  
>>> Distribution choices are different from user application choices.
>>>
>>> Sometimes we make unilateral choices, but only if it's a clear win.
>>>
>>> The most recent case was AArch64 TLSDESC, where Arm decided that TLSDESC
>>> would always be resolved non-lazily (Szabolcs will correct me if I'm wrong).
>>> This was a case where the synchronization required to update the TLSDESC
>>> was so costly on a per-function-call basis that it was clearly always a
>>> win to force TLSDESC to always be immediately bound, and drop the required
>>> synchronization (a cost you always had to pay).
>>>
>>> Here the situation is less clear, and we have less data with which to make
>>> the choice. Selection of lazy vs. non-lazy is still a choice we give users
>>> and it is independent of auditing.
>>>
>>> In summary:
>>>
>>> - Selection of lazy vs non-lazy binding is presently an orthogonal user
>>>   choice from auditing.
>>>
>>> - Distribution choices are about general solutions that work best for a
>>>   large number of users.
>>>
>>> - Lastly, a one-size-fits-all solution doesn't work best for all users.
>>>
>>> Unless there is a very strong and compelling reason to force non-lazy-binding
>>> for LD_AUDIT, I would not recommend we do it. It's just a question of user
>>> choice.
>>
>> My point is since we have limited resources, specially for synchronization
>> issues which required an extra level of carefulness; I see we should prioritize
>> better and revaluate some taken decisions. Some decisions were made to handle a 
>> very specific issue in the past which might not be relevant for current usercases,
>> where the trade-off of performance/usability/maintainability might have changed.
> 
> Agreed. I think we need some benchmarks here to have a real discussion.

Agreed.

> 
>> We already had some lazy-bind issues in the past (BZ#19129, BZ#18034, BZ#726),
>> still have some (BZ#23296, BZ#23240, BZ#21349, BZ#20107), and might still contain
>> some not accounted for in bugzilla for not so widespread used options (ld audit,
>> ifunc, tlsdesc, etc.). These are just the one I got from a very basic bugzilla 
>> search, we might have more.
> 
> I agree, it is compilcated by the fact that multiple threads resolve the symbols
> at the same time.
> 
>> This lead to ask me if lazy-bind still worth all the required internal complexity
>> and which real world gains we are trying to obtain besides just the option for
>> itself. I do agree that giving more user choices are a better thing, but we
>> need to balance usefulness, usability, and maintenance.
> 
> I don't disagree, *but* if we are going to get rid of lazy-binding, something
> we have supported for a long time, it's going to have to be with good evidence
> to show our users that it really doesn't matter anymore.
> 
> I hope that makes my position clearer.
> 
> In summary:
> 
> - If we are going to make a change to remove lazy-binding it has to be in an
>   informed manner with results from benchmarking that allow us to give
>   evidence to our users.
> 

Your position is clear and also to make mine clearly I just want to check
if removing lazy-binding as default might be an option. I also don't want
to block this patch, so might move this discussion no another thread.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]