Review of the patch "abipkgdiff parallelization"
Ondrej Oprala
ooprala@redhat.com
Thu Jan 1 00:00:00 GMT 2015
On 04.11.2015 12:53, Dodji Seketeli wrote:
> Hello,
>
> This is the review of the patch named "abipkgdiff parallelization",
> attached to the enhancement request
> https://sourceware.org/bugzilla/show_bug.cgi?id=19081.
>
> At the end of this message, I am attaching the initial patch to this
> email as it is otherwise inaccessible via email because bugzilla doesn't
> send us attachments along with the bug mails we get.
>
> I am also replying to the comment accompanying the bugzilla attachment
> here, as I believe it's more convenient to use email to discuss these
> things.
>
>> --- Comment #5 from Ondrej Oprala <ooprala at redhat dot com> ---
>> So here's my take on this issue.
>> I've been using the rpms of vtk to test this throughout.
>> Prior to any threading attempts, running this:
>> abipkgdiff vtk-6.1.0-26.fc23.x86_64.rpm vtk-6.2.0-8.fc23.x86_64.rpm --d1
>> vtk-debuginfo-6.1.0-26.fc23.x86_64.rpm --d2
>> vtk-debuginfo-6.2.0-8.fc23.x86_64.rpm
>>
>> took a bit less than 4 mins:
>> 230,08s user 6,60s system 101% cpu 3:53,03 total
>> threads make the numbers much nicer \o/
> Whoah! I cannot wait to see this patch land in master ;-)
>
> Thank you for working on this.
>
>> 381,17s user 14,15s system 492% cpu 1:20,23 total
>>
>> I tried to be as verbose as possible with the commit message, but if you
>> think I missed
>> anything anywhere, please tell me, I want this to be as good as it can
>> get.
> Nah, the cover letter is great, thank you for taking the time to put
> that together. It's incredibly useful for maintenance to have extensive
> context like that.
>
>> Helgrind only complains about elf_version (and elf_begin), which we
>> already discussed on the IRC.
> Good, good.
>
> So, I like the spirit of the patch, really.
>
> I just have some comments about its letter.
>
> There we go.
>
>> diff --git a/doc/manuals/abipkgdiff.rst b/doc/manuals/abipkgdiff.rst
>> index 7b90d88..4ca6f6a 100644
>> --- a/doc/manuals/abipkgdiff.rst
>> +++ b/doc/manuals/abipkgdiff.rst
>> @@ -89,6 +89,10 @@ Options
>> Do not search the *package2* for the presence of suppression
>> files.
>>
>> + * ``--no-parallel``
>> +
>> + Do not extract packages or run comparisons in parallel.
> Maybe, right before this sentence, it would be cool to say that by
> default, if the system has several processors, the tool uses them to
> perform several tasks in parallel.
>
> [...]
>
>
>> diff --git a/tools/abipkgdiff.cc b/tools/abipkgdiff.cc
> [...]
>
> Please, add a comment that describes the whole picture, saying what is
> performed concurrently, what is performed sequentially, so that future
> readers have the whole "modus operandi" in their heads. Something like:
>
> As this program uses libpthread to perform several tasks
> concurrently, here is a coarse grain description of the sequence
> of actions performed, including where things are done
> concurrently.
>
> (steps 1/ to /5 are performed in sequence)
>
> 1/ the first package and its debug info are extracted concurrently.
> One thread extracts the package (and maps its content) and another one
> extracts the debug info package.
>
> 2/ the second package and its debug info are extracted in parallel.
> One thread extracts the package (and maps its content) and another one
> extracts the debug info package.
>
> 3/ the file system trees of extracted packages are traversed to
> identify existing pairs and a list of arguments for future comparison
> is made. The trees are traversed in concurrently.
>
> 4/ comparisons are performed concurrently.
>
> 5/ the reports are then emitted to standard output, always in the same
> order.
>
> [...]
Thanks for the great description, I used it and stated you as the co-author.
>> -/// This contains the set of files of a given package. It's populated
>> +/// The key for getting the thread-local elf_file_paths vector, which
>> +/// contains the set of files of a given package. The vector is populated
>> /// by a worker function that is invoked on each file contained in the
>> -/// package. So this global variable is filled by the
>> +/// package, specifically by the
>> /// {first,second}_package_tree_walker_callback_fn() functions. Its content
>> -/// is relevant only during the time after which the current package has been
>> -/// analyzed and before we start analyzing the next package.
>> -static vector<string> elf_file_paths;
>> +/// is relevant only until the mapping of the packages elf files is done.
>> +static pthread_key_t thr_key_elf_map;
> I wouldn't call this variable a *map. How about calling it
> elf_file_paths_tls_key or something similar?
>
> [...]
>
>> +/// This map is used to keep environments for differing corpora
>> +/// referenced.
>> +static map<corpus_diff_sptr, abigail::ir::environment_sptr> env_map;
> It would be informative to add that an environment needs to be kept
> alive during the life time of all the libabigail objects that depend
> on it.
>
> [...]
>
>> +/// Arguments passed to the package extraction functions
>> +struct extract_args
> Please call this type package_descriptor. It really designates the
> package as well as the the context that is necessary to extract it and
> map its content. This type is thus used by the functions that extract
> and map the package and its associated debug info.
>
> I believe the package_descriptor name is more specific (has more
> specific meaning) than the more generic extract_args name.
>
>> +{
>> + package &pkg;
>> + const options& opts;
>> + ftw_cb_type callback;
>> +};
>> +
> Please add a comment describing the type below, just like what you did
> for the type above.
>
>> +struct compare_args
>> +{
>> + const elf_file elf1;
>> + const string& debug_dir1;
>> + const elf_file elf2;
>> + const string& debug_dir2;
>> + const options& opts;
>> +
>> + /// Constructor for compare_args, which is used to pass
>> + /// information to the comparison threads.
>> + ///
>> + /// @param elf1 the first elf file to consider.
>> + ///
>> + /// @param debug_dir1 the directory where the debug info file for @p
>> + /// elf1 is stored.
>> + ///
>> + /// @param elf2 the second elf file to consider.
>> + ///
>> + /// @param debug_dir2 the directory where the debug info file for @p
>> + /// elf2 is stored.
>> + ///
>> + /// @param opts the options the current program has been called with.
>> + compare_args(const elf_file elf1, const string& debug_dir1,
>> + const elf_file elf2, const string& debug_dir2,
>> + const options& opts)
> I think elf1 and elf2 arguments should be passed by reference, to
> avoid a useless copy at function invocation time.
>
>> + : elf1(elf1), debug_dir1(debug_dir1), elf2(elf2),
>> + debug_dir2(debug_dir2), opts(opts)
>> + {}
>> +};
>> +typedef shared_ptr<compare_args> compare_args_sptr;
> Please add a comment for this typedef.
>
> [...]
>
>> @@ -596,10 +660,14 @@ erase_created_temporary_directories_parent()
>>
>> /// Extract the content of a package.
>> ///
>> -/// @param package the package we are looking at.
>> -static bool
>> -extract_package(const package& package)
>> +/// @param pkg the package we are looking at.
>> +/// @return via pthread_exit() a pointer to a boolean value of true upon
>> +/// successful completion, false otherwise.
>> +static void
>> +extract_package(const package* pkg)
> Rather than changing this function, I would add a wrapper to it,
> called pthread_routine_extract_package() or something similar. That
> wrapper would call extract_package as it is right now, and either and
> pthread_exit() with the appropriate value, depending on the return
> value of extract_package.
>
> The idea is that if we later want to use extract a package
> independently from the pthread boilerplate, we could just call
> extract_package. Inherently, extracting a package is a basic action
> that shouldn't not depend on pthread.
>
>> {
>> + bool *ret = new bool (true);
>> + const package& package = *pkg;
>> switch(package.type())
>> {
>> case abigail::tools_utils::FILE_TYPE_RPM:
>> @@ -607,13 +675,13 @@ extract_package(const package& package)
>> if (!extract_rpm(package.path(), package.extracted_dir_path()))
>> {
>> cerr << "Error while extracting package" << package.path() << "\n";
>> - return false;
>> + goto FALSE;
>> }
>> - return true;
>> + goto TRUE;
>> #else
>> cerr << "Support for rpm hasn't been enabled. Please consider "
>> "enabling it at package configure time\n";
>> - return false;
>> + goto FALSE;
>> #endif // WITH_RPM
>> break;
>> case abigail::tools_utils::FILE_TYPE_DEB:
>> @@ -621,20 +689,20 @@ extract_package(const package& package)
>> if (!extract_deb(package.path(), package.extracted_dir_path()))
>> {
>> cerr << "Error while extracting package" << package.path() << "\n";
>> - return false;
>> + goto FALSE;
>> }
>> - return true;
>> + goto TRUE;
>> #else
>> cerr << "Support for deb hasn't been enabled. Please consider "
>> "enabling it at package configure time\n";
>> - return false;
>> + goto FALSE;
>> #endif // WITH_DEB
>> break;
>>
>> case abigail::tools_utils::FILE_TYPE_DIR:
>> // The input package is just a directory that contains binaries,
>> // there is nothing to extract.
>> - break;
>> + goto TRUE;
>>
>> case abigail::tools_utils::FILE_TYPE_TAR:
>> #ifdef WITH_TAR
>> @@ -642,20 +710,23 @@ extract_package(const package& package)
>> {
>> cerr << "Error while extracting GNU tar archive "
>> << package.path() << "\n";
>> - return false;
>> + goto FALSE;
>> }
>> - return true;
>> + goto TRUE;
>> #else
>> cerr << "Support for GNU tar hasn't been enabled. Please consider "
>> "enabling it at package configure time\n";
>> - return false;
>> + goto FALSE;
>> #endif // WITH_TAR
>> break;
>>
>> default:
>> - return false;
>> + break;
>> }
>> - return true;
>> +FALSE:
>> + *ret = false;
>> +TRUE:
>> + pthread_exit(ret);
>> }
> [...]
>
>> @@ -742,145 +821,158 @@ set_diff_context_from_opts(diff_context_sptr ctxt,
>>
>> /// Compare the ABI two elf files, using their associated debug info.
>> ///
>> -/// The result of the comparison is emitted to standard output.
>> -///
>> -/// @param elf1 the first elf file to consider.
>> -///
>> -/// @param debug_dir1 the directory where the debug info file for @p
>> -/// elf1 is stored.
>> +/// The result of the comparison is saved to a global corpus map.
>> ///
>> -/// @param elf2 the second eld file to consider.
>> +/// @args the list of argument sets used for comparison
>> ///
>> -/// @param debug_dir2 the directory where the debug info file for @p
>> -/// elf2 is stored.
>> -///
>> -/// @param opts the options the current program has been called with.
>> -///
>> -/// @return the status of the comparison.
>> +/// @return the status of the comparison via pthread_exit().
>> static abidiff_status
>> -compare(const elf_file& elf1,
>> - const string& debug_dir1,
>> - const elf_file& elf2,
>> - const string& debug_dir2,
>> - const options& opts)
>> +compare_binaries(list<compare_args_sptr> *args)
>> {
> Likewise, rather than making this compare() function know about
> pthread boiler plate, I'd rather leave it alone, and write a wrapper
> function named pthread_routine_compare() that would take a vector of
> compare_args_sptr (why a list, by the way?). OK, of course, this
> compare() function should be modified to take an environment in
> parameter, and use that. It should also "return" the resulting
> corpus_diff_sptr, by parameter, rather than emitting the report to
> standard output itself. By parameter, I mean that the function should
> take a reference to corpus_diff_sptr by parameter, and set that
> parameter to the corpus_diff_sptr resulting from the comparison. The
> caller (here the pthread routine) would then arrange for
> creating/handling the environment, pass it to compare(), and the
> resulting corpus_diff_sptr appropriately. Does that make sense?
>
> [...]
I originally used list, because it had its own sort method, I rewrote
it to use the generic sort from algorithm and a vector.
>> }
> [...]
>
>>
>> /// Extract the content of a package and map its content.
>> +/// Also extract its accompanying debuginfo package.
>> ///
>> -/// The extracting is done to a temporary
>> -///
>> -/// @param package the package to consider.
>> +/// The extracting is done to a temporary directory.
>> ///
>> -/// @param opts the options the current package has been called with.
>> +/// @param a the set of arguments needed for successful extraction;
>> +/// specifically the package itself, the options the current package has been
>> +/// called with and a callback to traverse the directory structure.
>> ///
>> -/// @return true upon successful completion, false otherwise.
>> -static bool
>> -extract_package_and_map_its_content(package& package,
>> - const options& opts,
>> - ftw_cb_type callback)
>> +/// @return via pthread_exit() true upon successful completion, false
>> +/// otherwise.
>> +static void
>> +prepare_package_set(struct extract_args *a)
> I would prefer that we call this function
>
> pthread_routine_extract_pkg_and_map_its_content(), as it says more
> specifically what it does; it also says that it's to be used as a
> pthread routine.
>
> Of course, keep the extended comment that you added, saying that it
> also extracts the accompanying debug info.
>
> Also, please drop the 'struct' keyword in 'struct extra_args' (that
> should now be package_descriptor) because it's not necessary here.
>
>> {
>> - if (!extract_package(package))
>> - return false;
>> + pthread_t thr_pkg, thr_debug;
>> + package& package = a->pkg;
>> + const options& opts = a->opts;
>> + ftw_cb_type callback = a->callback;
>> + bool is_debug, result = true;
>> + bool *child_result;
>> +
>> +#define PTHREAD_JOIN(thr,thr_ret) \
>> + do { \
>> + if (!pthread_join(thr, reinterpret_cast<void **>(&thr_ret))) \
>> + { \
>> + result &= *thr_ret; \
>> + delete thr_ret; \
>> + } \
>> + else \
>> + result = false; \
>> + } while (0)
> Why not replacing this define by a function that would take the thread
> id, a bool** thread_return_value, and a bool& result?
>
>> +
>> + // The debug-info package usually takes longer to extract than the main
>> + // package plus that package's mapping for ELFs and optionally suppression
>> + // specs, so we run it ASAP.
>> + if ((is_debug = !!package.debug_info_package()))
> Here, the is_debug variable should be named 'has_debug_info_pkg',
> because package.debug_info_package() doesn't say anything about the
> fact that the package *itself* is a debug info package; rather it
> returns the *associated* debug info package, if it's present.
>
> Also, I don't think the '!!' is necessary, because the returned value
> of package.debug_info_package() should be implicitly converted to
> bool. Or am I missing something?
Oops, the !! was necessary before, then I changed the code and forgot to
remove it.
> [...]
>
>> }
> [...]
>
>>
>> /// Prepare the packages for comparison.
>> @@ -984,19 +1139,86 @@ prepare_packages(package& first_package,
>> package& second_package,
>> const options& opts)
>> {
>> - if (!extract_package_and_map_its_content(first_package, opts,
>> - first_package_tree_walker_callback_fn)
>> - || !extract_package_and_map_its_content(second_package, opts,
>> - second_package_tree_walker_callback_fn))
>> - return false;
>> + bool ret = true;
>> + bool *thread_retval;
>> + const int npkgs = 2;
>> + pthread_t extract_thread[npkgs];
>> +
>> +#define PTHREAD_JOIN(thr,thr_ret) \
>> + do { \
>> + if (!pthread_join(thr, reinterpret_cast<void **>(&thr_ret))) \
>> + { \
>> + ret &= *thr_ret; \
>> + delete thr_ret; \
>> + } \
>> + else \
>> + ret = false; \
>> + } while (0)
> If what I said for the previous incarnation of this macro makes sense,
> we should replace this one by a function as well.
>
>> +
>> + extract_args ea[npkgs] =
> On one hand you are using a variable (npkgs) here, but you still need
> to know that the variable's value is '2', otherwise, the
> initialization won't pass compilation, because the number of members
> of the array is '2'.
Yeah, that's redundant. I'll let the compiler do the math ;)
> So I'd rather write:
>
> extract_args ea[] =
>
> (okay this should be package_descriptor, rather than extract_args).
>
>> + {
>> + {
>> + .pkg = first_package,
>> + .opts = opts,
>> + .callback = first_package_tree_walker_callback_fn
>> + },
>> + {
>> + .pkg = second_package,
>> + .opts = opts,
>> + .callback = second_package_tree_walker_callback_fn
>> + },
>> + };
> [...]
>
>> +}
>> +/// Compare the added sizes of a ELF pair specified by @a1
>> +/// with the sizes of a ELF pair from @a2.
>> +///
>> +/// Larger filesize strongly raises the possibility of larger debug-info,
>> +/// hence longer diff time. For a package containing several relatively
>> +/// large and small ELFs, it is often more efficient to start working on
>> +/// the larger ones first. This function is used to order the pairs by
>> +/// size, starting from the largest.
>> +///
>> +/// @a1 the first set of arguments containing also the size information about
>> +/// the ELF pair being compared.
>> +///
>> +/// @a2 the second set of arguments containing also the size information about
>> +/// the ELF pair being compared.
>> +bool
>> +elf_less(compare_args_sptr &a1, compare_args_sptr &a2)
> This is really a nit, but throughout the project, function names must
> have a verb in them, for better legibility. So please call this
> elf_size_is_less() or something like that.
>
I renamed it to elf_size_is_greater, since that's what is actually returned,
"elf_pair1 greater elf_pair2 ?" Assembly veterans will surely forgive me for
using "greater" rather than the unsigned "above".
>> +{
>> + off_t s1 = a1->elf1.size + a1->elf2.size;
>> + off_t s2 = a2->elf1.size + a2->elf2.size;
>> +
>> + return s1 > s2;
>> }
>>
>> /// Compare the ABI of two packages
>> @@ -1037,6 +1259,8 @@ compare(package& first_package,
>>
>> abidiff_status status = abigail::tools_utils::ABIDIFF_OK;
>>
>> + list<compare_args_sptr> elf_pairs;
> Why not using a vector here? I mean, it takes less size in memory,
> and I am not sure there really is a speed penalty incurred by adding
> elements to this container.
>
> [...]
>
>
> And, for the record, here is the original patch that you attached to
> bugzilla.
Thanks for the review, all the issues should now be addressed in the
attached patch.
Cheers,
Ondrej.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Bug-19081-abipkgdiff-parallelization.patch
Type: text/x-patch
Size: 31151 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libabigail/attachments/20150101/317d14ae/attachment.bin>
More information about the Libabigail
mailing list