This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Sat, 2014-07-19 at 17:02 -0700, Roland McGrath wrote: > Separate from the implementation details of the workaround you propose, I > think we would like to understand more clearly what is going on to confuse > helgrind. I have attached the (under review) valgrind patch. Whatever better solution we might have in a future glibc (see below discussion), we should have a solution for old/current glibc. Note that this patch applies to the valgrind SVN (it also contains the test program that shows the tls false positive). > Though it's all unclear to me so far, I tend to suspect both > that this workaround is not what we'd consider the ideal fix and that the > underlying problem is likely to have other symptoms (for other parts of the > code, other usage patterns, and/or other code outside of libc that rolls > its own synchronization primitives). Yes, helgrind has several tricks (kludges?) to avoid raising false positive in various "low level" code: * ld.so is not instrumented, otherwise this causes huge nr of false positive to (expensively) suppress. * the default valgrind error suppression file contains very general suppression entries, suppressing all races that have their top frame in libc or libpthread. Search for 'not very clever' or 'ugly' in valgrind default.supp. A.o., default.supp mentions that helgrind does not understand stdio locking. The 'not clever/ugly' is mostly because these supp can cause false negative. * several more precise suppression entries for various false + in glibc and others. The above techniques do not work with the false positive caused by the stack cache/tls internal structures, as (some of) the races appears in user code. Also, pthread cond var cannot be properly handled by helgrind. For sure, it would be nice to have a general solution for all that (or at least a solution that decreases the nr of needed kludges in helgrind). An approach could be to add in glibc specific helgrind requests to describe the low level synchronisation primitives (e.g. describe the 'create lock/lock/unlock'operations). This has some drawbacks: * it introduces a few instructions in the 'hot path' of the low level glibc primitives. (this might imply to have library versions compiled with/without these requests, which makes it more cumbersome for the end user, which has then to point at the good library when using helgrind). * Such annotations are valgrind specific. They will probably work also for the other valgrind race detector (drd). But there are other race detection tools which might have similar difficulties. * It is not sure that all helgrind kludges can be eliminated by using these requests. Someone on valgrind dev mailing list suggested to have specific debug information used to mark synchronisation primitives. Interesting idea, but not sure if/how that can work (and that is for sure a very long term solution). In summary, have a better support for race detection tools in glibc is attractive, but looks not trivial. So, several various things can be done: * short term: the valgrind hack patch to disable stack cache in to support old and current glibc * when glibc provides a tuning infrastructure, replace the hack by a cleaner way to disable the cache * more longer term: see how glibc could "describe" its behaviour to race detection tools * Maybe at somewhat shorter term, solve the problem of pthread cond var, for which there is currently no known hack/kludge (see helgrind user manual section 7.5 for a description of the difficulty) Thanks Philippe
Attachment:
patch_stack_cache_hack.txt
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |