Unregistering many thousands of kprobes can take several minutes. (Registration seems to be quick.) This could be because of extensive synchronization for each inserted/removed probe. Please investigate whether such synchronization could be done once for a batch of many kprobes.
Specifically, it seems to be the rcu synchronization after every unregistration that takes so long when multiplied by thousands.
Assigning to Masami at Satoshi's request.
Created attachment 1612 [details] kprobe fast unregistration Here is a series of patches which introduce two kinds of interfaces for speeding up unregistering process of kprobes. Currently, the unregister_*probe() function takes a long time to unregister specified probe because it waits the rcu synchronization after each unregistration. So we need to introduce new method for unregistering a lot of probes at once. I'd like to suggest introducing following two interfaces for this issue. The first interface is unregister_*probe_fast(). This removes breakpoint instruction from kernel code and adds specified probe to the unregistering list. The second interface is commit_kprobes(). This waits the rcu synchronization and clean up each probe on the unregistering list. This patch also adds a list_head member to the kprobe data structure for linking to the unregistering list. If using these interfaces, the probe handlers of unregistering probes might be called after unregister_*probe_fast() is called. So, you MUST call commit_kprobes() after calling unregisnter_*probe_fast() for all probes. It is safe even if several threads unregister probes simultaneously, because the unregistration process is protected by kprobe_lock mutex. If one thread calls commit_kprobes(), it will cleanup not only its probes but also the other thread's probes. And it is transparently done. If there is no unregistering probes, commit_kprobes() do nothing. Here is an example code. -- struct kprobes *p; for_each_probe(p) { unregister_kprobe_fast(p); } commit_kprobes(); -- Thanks,
Created attachment 1613 [details] [stap] support fast unregistration This patch add support fast unregistration to stap. Here, I show how the waiting time is reduced by using this feature. This is the test script. --fastunreg.stp-- probe syscall.*, syscall.*.return { printf("test probe\n"); } probe begin {exit();} ---- Without this patch, stap has to wait more than 7 seconds for unregistration all probes. $ stap -v fastunreg.stp -o /dev/null Pass 1: parsed user script and 54 library script(s) in 760usr/20sys/806real ms. Pass 2: analyzed script: 579 probe(s), 1 function(s), 3 embed(s), 0 global(s) in 2040usr/70sys/2174real ms. Pass 3: translated to C into "/tmp/stapwW3jDW/stap_dc844f44127e7d8537f8aebb8b404d5b_76493.c" in 20usr/0sys/33real ms. Pass 4: compiled C into "stap_dc844f44127e7d8537f8aebb8b404d5b_76493.ko" in 2100usr/160sys/2323real ms. Pass 5: starting run. WARNING: Number of errors: 0, skipped probes: 32 Pass 5: run completed in 20usr/50sys/7169real ms. With this patch, stap wait less than 0.2 seconds for unregistration all probes. $ stap -v fastunreg.stp -o /dev/null Pass 1: parsed user script and 54 library script(s) in 930usr/0sys/964real ms. Pass 2: analyzed script: 579 probe(s), 1 function(s), 3 embed(s), 0 global(s) in 1630usr/60sys/1737real ms. Pass 3: translated to C into "/tmp/stapPZkh0h/stap_e1a907cd3ec467ddb8d95221c1e4aae1_76493.c" in 20usr/0sys/33real ms. Pass 4: compiled C into "stap_e1a907cd3ec467ddb8d95221c1e4aae1_76493.ko" in 2080usr/200sys/2310real ms. Pass 5: starting run. Pass 5: run completed in 20usr/40sys/145real ms.
Thank you, looks really good. Someone needs to send it upstream, at which point we can ask it be included in RHEL/Fedora/etc. From the point of view of the translator, it would be ideal if it could emit an #ifdef/#else/#endif to generate both _fast and non-_fast kinds of unregistration calls. In other words, it would be helpful if the kernel patch added a #define ARCH_KPROBES_UNREGISTER_FAST or somesuch, so the translator could work transparently with unpatched and patched kernels.
(In reply to comment #5) > Thank you, looks really good. > > Someone needs to send it upstream, at which point we can ask it be included in > RHEL/Fedora/etc. OK, I'll do that. > From the point of view of the translator, it would be ideal if it could emit an > #ifdef/#else/#endif to generate both _fast and non-_fast kinds of unregistration > calls. In other words, it would be helpful if the kernel patch added a #define > ARCH_KPROBES_UNREGISTER_FAST or somesuch, so the translator could work > transparently with unpatched and patched kernels. OK. I'll propose some kinds of new flag. Kprobes have never had such kind of flags. So we should introduce such flags or macros to not only for this but every interface changes.
(In reply to comment #6) > OK. I'll propose some kinds of new flag. Actually, with the new autoconf hack in the module makefile, this is no longer necessary.
I am not very comfortable with the current approach (yet). I guess the request is to add a batch unregister facility. - Would that mean, unregister *all* currently install kprobes? - If not, we'll need to specify a subset. If it is indeed a subset, I like Christoph's idea of a NULL terminated array supplied to the unregister functin. Also, from an RCU perspective, "batching" unregistration of multiple kprobes is a simple matter. You can chain multiple unregister requests and then do the necessary stuff (put back original instruction, other housekeeping) for all the kprobes and *then* call synchronize_sched(). Thus, you'll have just one grace period for all of the probes and RCU will do the right thing. I see this being done, but its more complex than required. I agree that unregistration needs speedup, but it shouldn't be at the cost of usability in the normal case - if we can preserve the current API and/or simplicity, I am all for it. I wouldn't want users to make multiple calls to unregister and assume that they'll do the right thing. Thanks, Ananth
> - Would that mean, unregister *all* currently install kprobes? In systemtap's case, we'd probably use it to unregister all probes on a per-module or per-kernel basis. > - If not, we'll need to specify a subset. If it is indeed a subset, I like > Christoph's idea of a NULL terminated array supplied to the unregister functin. Yes, that is a natural option, but it requires the client to allocate and fill in an array just for purposes of shutdown. (We don't already have arrays of naked struct kprobes.)
(In reply to comment #8) > I am not very comfortable with the current approach (yet). I guess the request > is to add a batch unregister facility. > > - Would that mean, unregister *all* currently install kprobes? > - If not, we'll need to specify a subset. If it is indeed a subset, I like > Christoph's idea of a NULL terminated array supplied to the unregister functin. Basically, I agree with Frank's concern. I think it costs high, because we have to allocate/manage another array and set probes which we want to unregister. Thus, the API itself becomes simple, but the code of the user side will become a bit complex. In contrast, my preparing/commit style interface just requires changing function name and calling an additional function. > Also, from an RCU perspective, "batching" unregistration of multiple kprobes is > a simple matter. You can chain multiple unregister requests and then do the > necessary stuff (put back original instruction, other housekeeping) for all the > kprobes and *then* call synchronize_sched(). Thus, you'll have just one grace > period for all of the probes and RCU will do the right thing. I see this being > done, but its more complex than required. > > I agree that unregistration needs speedup, but it shouldn't be at the cost of > usability in the normal case - if we can preserve the current API and/or > simplicity, I am all for it. I wouldn't want users to make multiple calls to > unregister and assume that they'll do the right thing. Sure, IMHO, we'll not get rid of the current API, and user can use either it or unregister_fast style APIs. In this way, only if the user want to speedup unregistration of hundreds of probes, he can choose the new APIs.
But, if you were to unregister on a per-module basis, would it be too much to ask that all kprobes be part of an array? Surely, systemtap can possibly be able to transparently allocate all probes of a module from an array. From a normal user perspective, I don't see this as a big problem. Normal (non-systemtap) kprobe users would generally have, one, or at most a few kprobes in one module. Unless there is *no other way* to implement this feature with the current API (or maybe *one* new API, say, kprobes_unregister_batch() or somesuch), I would be very reluctant to go with any interface that requires multiple calls.
How about this for an idea for bulk unregistration (on a per-module basis): - struct kprobe will have a new element - struct list_head chain; - Modules can be made to list_add() every new kprobe registered. - Modules also need to have a reference to the list_head - We provide one new API unregister_kprobes_list() or somesuch where in the unregister routine, list_for_each_entry() is used to iterate through the kprobes chain and unregistered - Call synchronize_sched() once this list is processed. Does this sound ok? Masami?
(In reply to comment #12) > How about this for an idea for bulk unregistration (on a per-module basis): > > - struct kprobe will have a new element - struct list_head chain; > - Modules can be made to list_add() every new kprobe registered. > - Modules also need to have a reference to the list_head > [...] This looks simple to the kernel, but again it requires the client to do more pointer manipulation. It's possible, but it doesn't make the process *overall* simpler nor more robust.
(In reply to comment #12) > How about this for an idea for bulk unregistration (on a per-module basis): > > - struct kprobe will have a new element - struct list_head chain; > - Modules can be made to list_add() every new kprobe registered. > - Modules also need to have a reference to the list_head > - We provide one new API unregister_kprobes_list() or somesuch where in the > unregister routine, list_for_each_entry() is used to iterate through the kprobes > chain and unregistered > - Call synchronize_sched() once this list is processed. > > Does this sound ok? Masami? It seems good to me. Frank, I think we can make user's code more simple if we provide additional API register_kprobes_list(). Here is an example(usage) code which I think. --- struct list_head probe_list; struct kprobe *kp; INIT_LIST_HEAD(&probe_list); for (i = 0; i < NR_PROBES; i++) { kp = kmalloc(sizeof(struct kprobe), GFP_KERNEL); INIT_LIST_HEAD(&kp->chain); list_add(&probe_list, &kp->chain); } ... register_kprobes_list(&probe_list); ... unregister_kprobes_list(&probe_list); --- What do you think about this?
(In reply to comment #14) > (In reply to comment #12) > Frank, I think we can make user's code more simple if > we provide additional API register_kprobes_list(). > [...] That is, make the kprobe client maintain a linked list for both registration and unregistration. This seems like making things more rather than less complicated. We already minimize dynamic memory allocation in favour of static (bss/data) structures. We like to iterate over larger structures that contain kprobes. We like to informally bunch kprobes on a per-module basis since our compiled probes tolerate the absence or presence of probed modules. This will be even more useful when we allow module init/exit code to be probed, and add/remove kprobes on-the-fly during module loading and unloading. At the same time, I don't want to have to thread multiple linked lists. While of course systemtap could be made to emit such code, this is not making *its* job easier, and it certainly isn't making it easier for the dinosaurs writing kprobes C code by hand. :-) I hope we can think of another way to defer the slow system-wide synchronization. That was the only problem with the API. How about a new pair of functions to bracket the existing unregister calls: like a begin_ & end_registration_group? Then, with a begin_* active, registration/unregistration synch would be deferred until the end_* is called. Or, how about implementing unregistration in a deferred way, but which allows the client's struct kprobe memory to be reused or freed immediately?
Subject: Re: request new batch registration/unregistration API On Tue, Apr 03, 2007 at 12:19:42PM -0000, fche at redhat dot com wrote: > > ------- Additional Comments From fche at redhat dot com 2007-04-03 13:19 ------- > (In reply to comment #12) > > How about this for an idea for bulk unregistration (on a per-module basis): > > > > - struct kprobe will have a new element - struct list_head chain; > > - Modules can be made to list_add() every new kprobe registered. > > - Modules also need to have a reference to the list_head > > [...] > > This looks simple to the kernel, but again it requires the client to do more > pointer manipulation. It's possible, but it doesn't make the process *overall* > simpler nor more robust. In practice, the client doesn't have to do any more than list_add_tail_rcu(&new_kp, &list_head) while adding a probe and list_del_rcu(&unreg_kp). I don't see how it can be made more simpler than this. Any reason why you think this process is less robust?
Is there any upstreaming news on this patch?
Masami, Any updates? Aside, see what is done with global disable/enable of kprobes. All gets done at one shot with just one synchronize_sched(). This is similar to the scheme I suggested - while the global scheme works on the global list, we'll need a module local list for the batch case. Ananth
Sorry for not writing you soon. There have been many suggestions. And now, I think we have two options. (A) register_/unregister_kprobe_list() (B) begin_/end_registration_group() (Both methods need to introduce a list member to the kprobe data structure.) I ported my orignal patch and developed (B) idea on 2.6.21-mm1. And currently, I'm trying to developed (A) idea. So, which idea is more useful for you? In my opinion, (A) seems more general way, and (B) is easier to implement because it is similer to my original implementation... Thanks,
Subject: Re: request new batch registration/unregistration API On Wed, May 09, 2007 at 01:01:22PM -0000, masami dot hiramatsu dot pt at hitachi dot com wrote: > > ------- Additional Comments From masami dot hiramatsu dot pt at hitachi dot com 2007-05-09 14:01 ------- > Sorry for not writing you soon. > > There have been many suggestions. And now, I think we have two options. > (A) register_/unregister_kprobe_list() > (B) begin_/end_registration_group() > (Both methods need to introduce a list member to the kprobe data structure.) > > I ported my orignal patch and developed (B) idea on 2.6.21-mm1. > And currently, I'm trying to developed (A) idea. > > So, which idea is more useful for you? > > In my opinion, (A) seems more general way, and (B) is easier to implement > because it is similer to my original implementation... I very much prefer (A). (B) (I feel) is an overengineered solution to a problem that can be solved more simply by (A). Ananth
(In reply to comment #20) > > There have been many suggestions. And now, I think we have two options. > > (A) register_/unregister_kprobe_list() > > (B) begin_/end_registration_group() > > (Both methods need to introduce a list member to the kprobe data structure.) > > > > I ported my orignal patch and developed (B) idea on 2.6.21-mm1. > > And currently, I'm trying to developed (A) idea. > > > > So, which idea is more useful for you? > > > > In my opinion, (A) seems more general way, and (B) is easier to implement > > because it is similer to my original implementation... > > I very much prefer (A). (B) (I feel) is an overengineered solution to a > problem that can be solved more simply by (A). OK, I'm continuing to develop the patches which provides register_/unregister_kprobe_list(). After debugging it, I'll post it as soon as possible. Thanks,
Another way may be the idea from this posting re. markers: http://lkml.org/lkml/2007/5/10/475 That is: defer synchronization until the last probe in a given module is unregistered.
Yes, that is possible if you ensure that all of the probes registered by the said module are going to be unregistered at the same time. The RCU semantics won't allow for postponing synchronize_sched() to later, if that is not the case. In such a deferred/intermittant unregistration, results are unpredictable. The one advantage of the list based approach is that it'll handle both register and unregister with just one call, in contrast to 'n' calls in the refcounted solution. The refcount solution doesn't handle the registration case (not that it is a bottleneck).
Created attachment 1845 [details] [kprobe] batch registration take 2 Here is a set of patches which introduce new list interfaces for speeding up registration/unregistration process of kprobes. Currently, the unregister_*probe() function takes a long time to unregister specified probe because it waits the rcu synchronization after each unregistration. So we need to introduce new method for unregistering a lot of probes at a time. So, these patches introduce batch registration/unregistration interfaces. It also introduce a "chain" list_head structure as a new member of the kprobe structure. To use these interfaces, at first, add each "chain" member of all probes which you want to register/unregister at a time to a list_head structure. And, invoke *register_*probes_list() to register/unregister all probes on the list. Note that you MUST add only same type of probes(kprobe, kretprobe, and jprobe) into a list, because these registration/unregistration functions can't identify the type of probes on the list automatically. If one of probes on a list gets an error, batch registration functions unregister all registered probes on the list. And it returns error code. Here is an example code. --- struct list_head probe_list; struct kprobe *kp; INIT_LIST_HEAD(&probe_list); for (i = 0; i < NR_PROBES; i++) { kp = kzalloc(sizeof(struct kprobe), GFP_KERNEL); INIT_LIST_HEAD(&kp->chain); list_add(&probe_list, &kp->chain); } ... register_kprobes_list(&probe_list); ... unregister_kprobes_list(&probe_list); --- Thanks,
Created attachment 1846 [details] [test] batch (un)registration test program Here is a test program for the batch registration interface. You can check that interfaces work correctly as below. $ make $ sudo ./batch_test.sh
Created attachment 1847 [details] [stap] support batch unregistration of kprobes This patch add new batch unregistration support feature to systemtap. This also includes autoconf support.
Very quick eyeballing over the kernel patches: - __unregister_kprobes_list_part is calling synchronize_sched() with kprobe_mutex held. That is incorrect. - Some codingstyle issues: - if (num > 0 && ++count >= num) break; -> needs to be broken into two lines here and other places - In newer routines added, please leave a blank line between local declarations and code - + if (p->post_handler){ -> need a space between ) and { here and elsewhere - Can some code reordering be done to eliminate forward declarations? Anil, can you please do a review?
(In reply to comment #27) > Very quick eyeballing over the kernel patches: Thank you for reviewing. > - __unregister_kprobes_list_part is calling synchronize_sched() with > kprobe_mutex held. That is incorrect. > - Some codingstyle issues: > - if (num > 0 && ++count >= num) break; -> needs to be broken into two lines > here and other places > - In newer routines added, please leave a blank line between local > declarations and code > - + if (p->post_handler){ -> need a space between ) and { here > and elsewhere > > - Can some code reordering be done to eliminate forward declarations? I fixed these issues except for is_kretprobe() forward declaration, because it depends on the pre_handler_kretprobe(). I think the pre_handler_kretprobe() can be moved just below the cleanup_rp_inst(). What would you think about it?
One additional consideration may be a lost cause, but I'll still mention it: by changing struct kprobe, it becomes difficult or impossible to backport this work into an ABI-frozen release stream like RHEL.
(In reply to comment #26) > Created an attachment (id=1847) > [stap] support batch unregistration of kprobes > This patch add new batch unregistration support feature to systemtap. > This also includes autoconf support. Nicely done. One issue: the linked list chaining may need to consider the rc from the k[ret]probe_register call. Do we want failed probes on that list?
(In reply to comment #29) > One additional consideration may be a lost cause, but I'll still mention it: > by changing struct kprobe, it becomes difficult or impossible to backport > this work into an ABI-frozen release stream like RHEL. Perhaps it's worth comparing this linked-list method to an ABI-compatible one where the new k[ret]_unregister_list functions take an array of k[ret]probe pointers. Then the k[ret]probe structs don't need to be changed, and thus backporting would be easy. It would also simplify the translator changes. (I argued against this approach earlier, hoping to make the kprobes layer do all this automatically. But if it can't, then at least we can try to make it as pure an extension as possible.)
The refcount approach is good if all probe unregister calls happen in one go. Sadly, Mathieu's refcount patch is marker specific and will need work to make it generic (I think that can be done - make the refcount agnostic to instrumentation - static or dynamic). Also, the patch is contained in module.c - which means its only useful for probe "modules" - again, not that big a deal, 'cos we don't (currently) have non-modular probe users. Yes, it'd be interesting to see if we can handle all of this at the module level, without much duplication between markers and kprobes.
(In reply to comment #31) > (In reply to comment #29) > > One additional consideration may be a lost cause, but I'll still mention it: > > by changing struct kprobe, it becomes difficult or impossible to backport > > this work into an ABI-frozen release stream like RHEL. > > Perhaps it's worth comparing this linked-list method to an ABI-compatible one > where the new k[ret]_unregister_list functions take an array of k[ret]probe > pointers. Then the k[ret]probe structs don't need to be changed, and thus > backporting would be easy. It would also simplify the translator changes. Good idea. I think that it is reasonable to use a pointer array of *probes. Since we can't identify the type of probes on the list before registering it, we have to make lists for each types of probes. If we take symmetry of API into consideration, we may not merge those list. So, we don't need flexibility of those list. I mean, we may not add additional probes to this list, and may not delete some probes from it.
(In reply to comment #30) > One issue: the linked list chaining may need to consider the rc > from the k[ret]probe_register call. Do we want failed probes > on that list? Thanks, I'll fix that.
Created attachment 1860 [details] [kprobe] batch registration take 3 Hi Frank and Ananth, I modified my patches to suit to Frank's suggestion and attached here. However, I think it might become a bit complex for this purpose. In my opinion, we might as well apply this patch only to the ABI-frozen kernels, and push the patches based on the take2 to the upstream. What would you think about this idea? Descriptions: Here is a set of patches which introduce new list interfaces for speeding up registration/unregistration process of kprobes. Currently, the unregister_*probe() function takes a long time to unregister specified probe because it waits the rcu synchronization after each unregistration. So we need to introduce new method for unregistering a lot of probes at a time. So, these patches introduce batch registration/unregistration interfaces. It also introduce "probe_container" data structure for listing up the probes which will be registered/unregistered. To use these interfaces, at first, allocate probe_container structures for each probes, and assign the address of the kprobe structure to the probe_container.kp. After that, add those probe_containers to a list_head structure. Finally, invoke *register_*probes_list() to register/unregister all probes on the list. Note that you MUST add only same type of probes(kprobe, kretprobe, and jprobe) into a list, because these registration/unregistration functions can't identify the type of probes on the list automatically. If one of probes on a list gets an error, batch registration functions unregister all registered probes on the list. And it returns error code. Here is an example code. --- struct list_head probe_list; struct probe_container *pc; INIT_LIST_HEAD(&probe_list); for (i = 0; i < NR_PROBES; i++) { pc = kzalloc(sizeof(struct probe_container), GFP_KERNEL); pc->kp = kzalloc(sizeof(struct kprobe), GFP_KERNEL); INIT_LIST_HEAD(&pc->list); list_add(&probe_list, &pc->list); } ... register_kprobes_list(&probe_list); ... unregister_kprobes_list(&probe_list); --- Thanks,
Created attachment 1861 [details] [test] batch (un)registration test program for take3 patchs.
Thank you very much for your ongoing efforts on this. I am sorry that I was not completely clear in what I was proposing for the ABI-frozen case, which is the same thing someone else on lkml said: void unregister_kprobe_list (struct kprobe** array, size_t count); That does not require any extra structures or lists, and does not need to change how registration is done. What do you think?
Thank you for your advice. (In reply to comment #37) > Thank you very much for your ongoing efforts on this. > > I am sorry that I was not completely clear in what I was > proposing for the ABI-frozen case, which is the same thing > someone else on lkml said: > > void unregister_kprobe_list (struct kprobe** array, size_t count); > > That does not require any extra structures or lists, and does > not need to change how registration is done. What do you think? > Would you mean introducing only unregister_kprobes_list() for batch unregistration, and no need to introduce register_*probes_list() for batch registration? I think that is one idea. Since the type of probes could be identified when we unregistering probes, we can integrate all probes in an array. However, from the viewpoint of symmetry of interfaces, I think we should introduce registration interfaces. What would you think?
> Would you mean introducing only unregister_kprobes_list() for batch > unregistration, and no need to introduce register_*probes_list() for > batch registration? I know of no performance requirement for a batch registration call, but of course symmetry suggests that there be one. It would have to be carefully written to handle mid-array registration errors. If it existed, systemtap could use it, though it wouldn't accomplish much.
Is there any progress on posting at least the batch-unregistration API on LKML?
Created attachment 2291 [details] [kprobe] batch registration for 2.6.25-rc2-mm1 I updated a series of patches against for 2.6.25-rc2-mm1. This patchset includes a bugfix of register_kretprobe.
Created attachment 2292 [details] [stap] support batch unregistration of kprobes take 2 This patch adds batch unregistration support to systemtap.
Created attachment 2293 [details] [test] batch (un)registration test program for new patchs. Test tools for new batch registration/unregistration interface.
Looks great (including the systemtap patch). Please feel free to commit to systemtap as soon as upstream kernel accepts the API.
Thanks Masami for pursuing this. Overall, the changes look good. Some observations: a. Andrew has taken in my patch to introduce CONFIG_KRETPROBES; your patches may have to be updated to accomodate that (mostly cosmetic change) b. I guess its good to indicate in the doc *how* one can use this interface - probably indicate the array allocation and usage. c. The first patch in this series can go upstream independent of the bulk registration set.
Masami I successfully tested your patch on x86_64, But Iam not able to run your testcase on powerpc systems, as registering of probe fails. ================================================== root@llm27lp1 batchtest-20080226]# ./batch_test.sh insmod: error inserting './batch_test.ko': -1 Operation not permitted test failed (case 1) <<test case 1>> UNEXPECTED: probe install error: -22 ===================================================
(In reply to comment #45) > Thanks Masami for pursuing this. > > Overall, the changes look good. Some observations: > a. Andrew has taken in my patch to introduce CONFIG_KRETPROBES; your patches may > have to be updated to accomodate that (mostly cosmetic change) OK, but could you tell me how can I get the latest -mm tree which includes your patches? > b. I guess its good to indicate in the doc *how* one can use this interface - > probably indicate the array allocation and usage. Sure, I'll add it. > c. The first patch in this series can go upstream independent of the bulk > registration set. OK, thanks.
(In reply to comment #46) > Masami > I successfully tested your patch on x86_64, But Iam not able to run your > testcase on powerpc systems, as registering of probe fails. > ================================================== > root@llm27lp1 batchtest-20080226]# ./batch_test.sh > insmod: error inserting './batch_test.ko': -1 Operation not permitted > test failed (case 1) > <<test case 1>> > UNEXPECTED: probe install error: -22 > =================================================== Unfortunately, I don't have the powerpc system. But the test case #1 just use register_kprobe(), so I think it comes from some architecture differences. For example, this testtool might not work on ia64, because kp.addr should be initialized by *(void **)func on ia64, instead of (void*)func. I might better use symbol_name field instead of addr field...
Masami I built stap from systemtap patch attached here and then executed the systemtap tests on powerpc system without any problem. Thanks Srinivasa Ds
Here's a review. A lot of these comments are nits, but I do have some concerns around the RCU stuff. See below. General: I like this approach. In particular, arrays of pointers-to-probes make the most sense to me. The top/bottom division works nicely. [un]register_bulk_*probes is kind of an ugly name. Why not just [un]register_*probes? kprobe-fix-kretprobe-nullcheck.patch =================================== OK. I like the code cleanup, too. kprobe-batch_registration.patch =============================== > + /* Initialize multiprobe list for __unregister_kprobe_bottom() */ > + INIT_LIST_HEAD(&p->list); No need for the comment. Initializing a data structure shouldn't require justification. __unregister_kprobe_bottom: Your code here propagates what seems to be a bug in the existing code, related to locking and/or RCU. Specifically, the existing unregister_kprobe() code calls list_del_rcu() at one point that is AFTER the call to synchronize_sched() (and no synchronization call follows) and is also not protected by kprobe_mutex. In your code, this is the list_del_rcu() call in __unregister_kprobe_bottom(). According to my (admittedly intermittent) understanding of RCU, the proper usage is: lock list_del_rcu(...) unlock synchronize But what I see is lock ... unlock synchronize if (cleanup) { ... list_del_rcu(...) } I think the list_del_rcu() call in question could be changed to a plain old list_del() because old_p has already been disconnected from the probe table at that point. For that same reason, I think it's OK that kprobe_mutex isn't held. But the way it's coded, it looks bad. In any case, a comment is warranted. > + __unregister_kprobe_bottom(kp, list_empty(&kp->list) || > + (kp->list.next == kp->list.prev)); Once in the existing code and 3 times in the patches, there is this check to see whether the probe being unregistered is the last probe connected to an aggregate probe. It might be clearer and simpler if you abstract this check into a separate function -- e.g.: static inline int singleton_list(const struct list_head *head) { return (head->next == head->prev && !list_empty(head)); } Then the above changes to __unregister_kprobe_bottom(kp, list_empty(&kp->list) || singleton_list(&kp->list)); and p->list.next == &old_p->list && p->list.prev == &old_p->list)) { changes to singleton_list(&old_p->list)) { In unregister_kprobe: > + if (cleanup >= 0) { > + synchronize_sched(); This looks wrong. __unregister_kprobe_top() can call list_del_rcu() AND return cleanup==0. So shouldn't you ALWAYS call synchronize_sched()? In register_bulk_kprobes: > + for (i = 0; i < num; i++) { > + ret = __register_kprobe(kprobes[i], > + (unsigned long)__builtin_return_address(0)); I don't know how much __builtin_return_address() costs on the various architectures, but shouldn't you compute it just once, outside the loop? kretprobe-batch_registration.patch ================================== > + for (i = 0; i < num; i++) { > + ret = __register_kretprobe(rps[i], > + (unsigned long)__builtin_return_address(0)); Again, move the __builtin_return_address() call outside the loop. jprobe-batch_registration.patch =============================== > + for (i = 0; i < num; i++) { > ... > + ret = __register_kprobe(&jp->kp, > + (unsigned long)__builtin_return_address(0)); Again, move the __builtin_return_address() call outside the loop. doc-batch_registration.patch ============================ > +There are also register_/unregsiter_bulk_*probes() functions for s/unregsiter/unregister/ > +batch registration/unregistration of a group of *probes. These > +functions can speed unregistration process up when you have to functions can speed up the unregistration process when you have to Suppose somebody calls unregister_bulk_kprobes(kps, 2); What should happen if kps[1] is registered but kps[0] isn't? Do you stop because of the error on kps[0], or keep going? This would be a good thing to document. kprobes.txt currently doesn't make any promises about unregistering an unregistered probe, but users may have different expectations regarding bulk unregistration. > +Inserts specified probes into each specified place. If any error > +occurs while registering, all registered probes on the array are > +safely unregistered before returning. How about this? Registers each of the num probes in the specified array. If any error occurs during registration, all probes in the array, up to the bad probe, are safely unregistered before the register_bulk_* function returns. Also, I think the first paragraph in the API Reference section should be updated. The first sentence is: The Kprobes API includes a "register" function and an "unregister" function for each type of probe. I'd add the following sentence: The API also includes "register_bulk" and "unregister_bulk" functions for (un)registering arrays of probes. batch_test.c: ============= > * boost probe bench > * Copyright (c) 2006 Hitachi,Ltd., Adjust copied-and-pasted comment. > #define CALC_TIME(title, cmd) {cycles_t c; \ > c = get_cycles(); \ > cmd; \ > c = get_cycles() - c; \ > printk(title ":%lld mc\n", c);} Typical kernel parlance (to eliminate the "};" at the end that results from a typical invocation) would be: #define CALC_TIME(title, cmd) do { \ cycles_t c = get_cycles(); \ cmd; \ c = get_cycles() - c; \ printk(title ":%lld mc\n", c); \ } while (0) > /* one probe inserted in different places without batch function */ > printk("<<test case 1>>\n"); Fix comment. It's just 1 probe in 1 place. > /*---- two probes inserted in different places ----*/ > printk("<<test case 3>>\n"); ... > /*---- two probes inserted in different places ----*/ > printk("<<test case 6>>\n"); ... > /*---- two probes inserted in different places ----*/ > printk("<<test case 9>>\n"); These look like the same thing. Maybe change "probes" to "kprobes" or "kretprobes" or "jprobes" as appropriate. Ditto elsewhere. > /*---- two probes inserted in a same place ----*/ > printk("<<test case 10>>\n"); Clarify why you expect this to fail: because you can't insert multiple jprobes at the same address. BTW, English nit: s/a same place/the same place/
Subject: Re: request new batch registration/unregistration API > OK, but could you tell me how can I get the latest -mm tree which includes your > patches? Andrew is yet to release it; you'll probably have to wait for 2.6.25-rc3-mm1
Hi Jim, (In reply to comment #50) > Here's a review. A lot of these comments are nits, but I do have > some concerns around the RCU stuff. See below. Thank you for detailed comment! > [un]register_bulk_*probes is kind of an ugly name. Why not > just [un]register_*probes? I think it is not a noticeable change. Users will be easily mistype and misread it. If you have another good name, please let me know. > kprobe-batch_registration.patch > =============================== > > + /* Initialize multiprobe list for __unregister_kprobe_bottom() */ > > + INIT_LIST_HEAD(&p->list); > No need for the comment. Initializing a data structure shouldn't > require justification. OK, I'll remove it. > __unregister_kprobe_bottom: > Your code here propagates what seems to be a bug in the existing code, > related to locking and/or RCU. > > Specifically, the existing unregister_kprobe() code calls > list_del_rcu() at one point that is AFTER the call to synchronize_sched() > (and no synchronization call follows) and is also not protected by > kprobe_mutex. In your code, this is the list_del_rcu() call in > __unregister_kprobe_bottom(). Thank you for finding it. I'll fix that. > I think the list_del_rcu() call in question could be changed to a > plain old list_del() because old_p has already been disconnected > from the probe table at that point. For that same reason, I think > it's OK that kprobe_mutex isn't held. But the way it's coded, > it looks bad. In any case, a comment is warranted. I'll check whether it is safety. > > + __unregister_kprobe_bottom(kp, list_empty(&kp->list) || > > + (kp->list.next == kp->list.prev)); > Once in the existing code and 3 times in the patches, there is this > check to see whether the probe being unregistered is the last probe > connected to an aggregate probe. It might be clearer and simpler if > you abstract this check into a separate function -- e.g.: > static inline int singleton_list(const struct list_head *head) > { > return (head->next == head->prev && !list_empty(head)); > } That is a good idea. > In unregister_kprobe: > > + if (cleanup >= 0) { > > + synchronize_sched(); > This looks wrong. __unregister_kprobe_top() can call list_del_rcu() > AND return cleanup==0. So shouldn't you ALWAYS call synchronize_sched()? Please don't worry, I use ">=";-). when cleanup == 0(or 1), it always calls synchronize_sched(). if cleanup < 0, there is an error in __unregister_kprobe_top(). > In register_bulk_kprobes: > > + for (i = 0; i < num; i++) { > > + ret = __register_kprobe(kprobes[i], > > + (unsigned long)__builtin_return_address(0)); > I don't know how much __builtin_return_address() costs on the > various architectures, but shouldn't you compute it just once, > outside the loop? Indeed. I'll get it outside. > doc-batch_registration.patch > ============================ [snip] > Suppose somebody calls > unregister_bulk_kprobes(kps, 2); > What should happen if kps[1] is registered but kps[0] isn't? Do you > stop because of the error on kps[0], or keep going? This would be > a good thing to document. kprobes.txt currently doesn't make any > promises about unregistering an unregistered probe, but users may > have different expectations regarding bulk unregistration. Actually, that case is not checked, so it will break the kernel. (NOTE: register_kprobe also does not check it.) > > +Inserts specified probes into each specified place. If any error > > +occurs while registering, all registered probes on the array are > > +safely unregistered before returning. > How about this? > Registers each of the num probes in the specified array. If any > error occurs during registration, all probes in the array, up to > the bad probe, are safely unregistered before the register_bulk_* > function returns. Thanks, I'll update. > > Also, I think the first paragraph in the API Reference section > should be updated. The first sentence is: > The Kprobes API includes a "register" function and an "unregister" > function for each type of probe. > I'd add the following sentence: > The API also includes "register_bulk" and "unregister_bulk" > functions for (un)registering arrays of probes. Sure. > batch_test.c: > ============= > > * boost probe bench > > * Copyright (c) 2006 Hitachi,Ltd., > Adjust copied-and-pasted comment. Oh, OK. I'll fix this test code. Thank you again,
(In reply to comment #52) > > [un]register_bulk_*probes is kind of an ugly name. Why not > > just [un]register_*probes? > > I think it is not a noticeable change. > Users will be easily mistype and misread it. But the signatures of the two functions are different, so it is an easy compile-time error to mix them up.
(In reply to comment #53) > (In reply to comment #52) > > > > [un]register_bulk_*probes is kind of an ugly name. Why not > > > just [un]register_*probes? > > > > I think it is not a noticeable change. > > Users will be easily mistype and misread it. > > But the signatures of the two functions are different, > so it is an easy compile-time error to mix them up. OK, there are already same kinds of naming like as alloc_pages/alloc_page in the kernel. BTW, I think it seems better to inline register_*probe as below; --- static inline int register_kprobe(struct kprobe *p) { return register_kprobes(&p, 1); } static inline void unregister_kprobe(struct kprobe *p) { unregister_kprobes(&p, 1); } --- Thanks,
Subject: Re: request new batch registration/unregistration API > (In reply to comment #53) > > (In reply to comment #52) > > > > > > [un]register_bulk_*probes is kind of an ugly name. Why not > > > > just [un]register_*probes? > > > > > > I think it is not a noticeable change. > > > Users will be easily mistype and misread it. > > > > But the signatures of the two functions are different, > > so it is an easy compile-time error to mix them up. > > OK, there are already same kinds of naming > like as alloc_pages/alloc_page in the kernel. > > BTW, I think it seems better to inline register_*probe as below; > --- > static inline int register_kprobe(struct kprobe *p) > { > return register_kprobes(&p, 1); > } > static inline void unregister_kprobe(struct kprobe *p) > { > unregister_kprobes(&p, 1); > } You can't EXPORT_SYMBOL(_GPL) an inline function.. can you? :-) Ananth
(In reply to comment #55) > > BTW, I think it seems better to inline register_*probe as below; > > --- > > static inline int register_kprobe(struct kprobe *p) > > { > > return register_kprobes(&p, 1); > > } > > static inline void unregister_kprobe(struct kprobe *p) > > { > > unregister_kprobes(&p, 1); > > } > > You can't EXPORT_SYMBOL(_GPL) an inline function.. can you? > :-) Even though, register_*probes are exported as GPL symbol. So, you can't use register_*probe from non-GPL module. Anyway, it just comes from code maintainability, so we can deal with it separately(ex. as a cleanup patch etc.). Thanks,
Subject: Re: request new batch registration/unregistration API > > > > You can't EXPORT_SYMBOL(_GPL) an inline function.. can you? > > :-) > > Even though, register_*probes are exported as GPL symbol. > So, you can't use register_*probe from non-GPL module. Aside from whether they are _GPL or not, its not the right thing to do. > Anyway, it just comes from code maintainability, so we > can deal with it separately(ex. as a cleanup patch etc.). Nope! I don't think we'd want to make currently exported functions inline.
(In reply to comment #57) > > Anyway, it just comes from code maintainability, so we > > can deal with it separately(ex. as a cleanup patch etc.). > > Nope! I don't think we'd want to make currently exported functions > inline. Would you concern about ABI compatibility? I also concern about maintainability, because (un)register_*probes can process 1 to n probes. I think, these would better be merged into 1 implementation internally.
(In reply to comment #55) > You can't EXPORT_SYMBOL(_GPL) an inline function.. can you? (I might mis-explain below...) In my idea, those inline functions are defined in linux/kprobes.h. So, you don't need to export those symbols. However, this change breaks ABI compatibility.
Subject: Re: request new batch registration/unregistration API > > You can't EXPORT_SYMBOL(_GPL) an inline function.. can you? > > (I might mis-explain below...) > In my idea, those inline functions are defined in linux/kprobes.h. > So, you don't need to export those symbols. > However, this change breaks ABI compatibility. 1. I don't want to break the ABI unless its absolutely necessary. 2. Just the sheer size of the register/unregister_* functions makes them a bad candidate for inlining. 3. Just dropping the EXPORT_SYMBOLs without a lead time is frowned upon in the community. And removing them => KABI breakage. There have been cases recently where bigger functions that were initially inline, have been converted to calls; we seem to be doing it backwards here. We have *probes as an _interface_ to the kernel, and I'd like to leave it that way, rather than make them just header defined inline functions.
(In reply to comment #60) > 1. I don't want to break the ABI unless its absolutely necessary. > 2. Just the sheer size of the register/unregister_* functions makes them > a bad candidate for inlining. > 3. Just dropping the EXPORT_SYMBOLs without a lead time is frowned upon > in the community. And removing them => KABI breakage. > > There have been cases recently where bigger functions that were > initially inline, have been converted to calls; we seem to be doing it > backwards here. > > We have *probes as an _interface_ to the kernel, and I'd like to leave > it that way, rather than make them just header defined inline functions. > Thank you. so I'll leave these ABIs and symbols. And, I'll change just the implementation of unregister_*probe to use unregister_*probes for avoiding code duplication. Thanks again,
(In reply to comment #61) I promised Ananth and Masami to comment on this, so here goes. > (In reply to comment #60) > > 1. I don't want to break the ABI unless its absolutely necessary. I don't see how changing an exported function to an inline function in kprobes.h breaks the ABI. The function is still available. Are you concerned that it is now available to non_GPL modules? The main danger (an unlikely one) that I see in changing a __kprobes function to an inline function is that gcc may at some time decide NOT to inline it, which would leave it open for probing. Some time back, a kprobes maintainer had to change a bunch of inline functions to __kprobes at akpm's instigation, for that reason. > > 2. Just the sheer size of the register/unregister_* functions makes them > > a bad candidate for inlining. But the register/unregister_* functions would become very small -- just one-line calls to the corresponding bulk-registration functions. > > 3. Just dropping the EXPORT_SYMBOLs without a lead time is frowned upon > > in the community. And removing them => KABI breakage. > > > > There have been cases recently where bigger functions that were > > initially inline, have been converted to calls; we seem to be doing it > > backwards here. > > > > We have *probes as an _interface_ to the kernel, and I'd like to leave > > it that way, rather than make them just header defined inline functions. > > > > Thank you. so I'll leave these ABIs and symbols. > And, I'll change just the implementation of unregister_*probe to use > unregister_*probes for avoiding code duplication. > > Thanks again, I think it's fine to leave the [un]register_*probe functions as __kprobe symbols, and reimplement them as one-line calls to the bulk functions. I'm always in favor of code simplification and reduction of code duplication, which I think will be served here. But be sure we don't lose track of who the real caller is -- i.e., remember to pass around the right value from __builtinr_retunr_address().
Created attachment 2312 [details] [kprobe] batch registration for 2.6.25-rc3-mm1 Here is the updated patchset. - Change interfaces to (un)register_*probes. - Add a patch for list_singleton - Check failures in unregister_*probes. Thanks
(In reply to comment #62) > I think it's fine to leave the [un]register_*probe functions as __kprobe > symbols, and reimplement them as one-line calls to the bulk functions. I'm > always in favor of code simplification and reduction of code duplication, which > I think will be served here. But be sure we don't lose track of who the real > caller is -- i.e., remember to pass around the right value from > __builtinr_retunr_address(). Hmm, so how about this? --- static int __register_kprobes(struct kprobe **kps, int num, unsigned long addr) { ...(real implementation) } int __kprobes register_kprobe(struct kprobe *p) { return __register_kprobes(&p, 1, __builtin_return_address(0)); } int __kprobes register_kprobes(struct kprobe **kps, int num) { return __register_kprobes(kps, num, __builtin_return_address(0)); } --- (new patchset already reimplemented unregister_*probe.)
(In reply to comment #64) ... > Hmm, so how about this? > --- > static int __register_kprobes(struct kprobe **kps, int num, unsigned long addr) { > ...(real implementation) > } > int __kprobes register_kprobe(struct kprobe *p) > { > return __register_kprobes(&p, 1, __builtin_return_address(0)); > } > int __kprobes register_kprobes(struct kprobe **kps, int num) > { > return __register_kprobes(kps, num, __builtin_return_address(0)); > } > --- > (new patchset already reimplemented unregister_*probe.) > That looks fine to me. Thanks.
Proposal in comment 64 is fine. I don't like the idea of exporting a core kernel functionality through just a header file; so removing the EXPORT_SYMBOL is out of the question. Wrapping the main calls to call the same routine internally is OK. Also, patch 1 of the series (add_list_singleton.patch) can just go upstream independently.
Created attachment 2319 [details] [kprobe] batch registration for 2.6.25-rc5-mm1 Updated patches against 2.6.25-rc5-mm1 - reimplement register_*probe as wrapping register_*probes - fix a bug in register_jprobes. Thanks,
Created attachment 2320 [details] [test] batch (un)registration test program for new patchs (for *register_*probes) Updated testtool for (un)register_*probes().
Created attachment 2321 [details] [stap] support batch unregistration of kprobes take 3 Updated patch for systemtap.
Overall the patches look fine. Thanks! Any reason why there is code duplication in jprobe-batch_registration.patch? Basically the portions that assign entry, pre_ and break_handlers are duplicated.
Created attachment 2322 [details] Fix testtool to run on powerpc also Masami, I needed this patch to get the test module to run seamlessly on powerpc
Created attachment 2323 [details] Fix testtool to run on powerpc also Masami, I needed this patch to get the test module to run seamlessly on powerpc
Created attachment 2324 [details] [kprobe] batch registration for 2.6.25-rc5-mm1 (fixed) (In reply to comment #70) > Overall the patches look fine. Thanks! > > Any reason why there is code duplication in jprobe-batch_registration.patch? > Basically the portions that assign entry, pre_ and break_handlers are duplicated. Oops, it was my fault. I uploaded an old patches... Here is new (and correct) one.
Created attachment 2334 [details] [test-kprobe] add batch registration testcases to test_kprobes.c These patches add batch registration testcases to test_kprobes.c and introduce indirect function calls instead of a tricky code for preventing gcc optimization.
Masami, Isn't it preferable to have a dummy inline asm like asm ("") in the probed function instead? Check 38332cb98772f5ea757e6486bed7ed0381cb5f98 in Linus' kernel for an example.
(In reply to comment #75) > Masami, > Isn't it preferable to have a dummy inline asm like > > asm ("") > > in the probed function instead? Check 38332cb98772f5ea757e6486bed7ed0381cb5f98 > in Linus' kernel for an example. Ananth, I just worried about portability. Is that inline asm supported on any arch? If so, I can use the idea. (Anyway, it may require some comment for explaining its purpose)
Subject: Re: request new batch registration/unregistration API On Tue, Mar 25, 2008 at 03:40:43PM -0000, mhiramat at redhat dot com wrote: > > Ananth, I just worried about portability. > Is that inline asm supported on any arch? If so, I can use the idea. > (Anyway, it may require some comment for explaining its purpose) Yes, it will require a comment to explain its purpose, but if you see the commit I cited in the earlier update, that change is in include/linux/<file> which is arch agnostic. Basic asm directives without reference to specific arch registers are supposedly portable. Maybe its worth a try similar to the time.h change.
Created attachment 2338 [details] [test-kprobe] add batch registration testcases to test_kprobes.c take2 Ananth, I updated the patches to use dummy asm(""). Could you test it on ppc64?
Masami, With just kprobes-fix-test.patch the test passes on powerpc which means the asm directive is working. However, with all the patches applied, I get: Kprobe smoke test failed: kprobe pre_handler2 not called Kprobe smoke test failed: jprobe handler2 not called Kprobe smoke test failed: kretprobe handler2 not called BUG: Kprobe smoke test: 3 error(s) running handlers Haven't been able to dig further...
Re comment #79, I may have spoken too early. If kprobe_target*() is modified to use the older get_kprobe() instead, tests pass. It looks like asm() works, but not all the time :-( Masami, batchtest-20080321 that uses function pointers instead works fine. Lets go with that. Sorry for having led you down the wrong path.
- kernel patch was already merged to 2.6.25-mm1 (still in the mm tree). - systemtap patch was committed. We can use it on 2.6.25-mm1 with the latest systemtap. Thanks,