[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: a new tool fedabipkgdiff

To: Chenxiong Qi <cqi@redhat.com>
Subject: Re: a new tool fedabipkgdiff
From: Dodji Seketeli <dodji@redhat.com>
Date: Thu, 31 Mar 2016 13:03:02 +0200
Authentication-results: sourceware.org; auth=none
Cc: libabigail@sourceware.org
Delivered-to: listarch-libabigail@sourceware.org
Delivered-to: mailing list libabigail@sourceware.org
In-reply-to: <56E76073.8030002@redhat.com> (Chenxiong Qi's message of "Tue, 15 Mar 2016 09:08:03 +0800")
List-help: <mailto:libabigail-help@sourceware.org>
List-post: <mailto:libabigail@sourceware.org>
List-subscribe: <mailto:libabigail-subscribe@sourceware.org>
Mailing-list: contact libabigail-help@sourceware.org; run by ezmlm
Organization: Me, myself and I
References: <56C44DA7.9040609@redhat.com> <56E76073.8030002@redhat.com>
Sender: libabigail-owner@sourceware.org
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Hello Chenxiong,

Sorry for my being late to review this.  It just felt through the cracks
while I was looking as something else.  Oh well.

So I reviewed it and I must say this is good stuff :-)

I looked at it in details and the comments I have now are mostly about
documentations and the like.

Please find my comments below.

>     * autoconf-archive/ax_compare_version.m4: New file copied from the

Each of these lines should start with a 'tab' character (you know, the
\t character), as specified in the file COMMIT-LOG-GUIDELINES:

    The subsequent lines should have the form of the Body of a GNU ChangeLog
    entry, i.e:

	    * file1.c (func1): Changed foo in this function.
	    (func2): Changed blah in that function
	    * file2.c (func_foo): Changed something here.

    Note that before the '*', there is a tab that is 8 spaces long.  Also
    note that right after the '*', there is a space.

To ease the task, I have edited your ChangeLog.  Here is the result,
that you can add to your patch:

	* autoconf-archive/ax_compare_version.m4: New file copied from the
	autoconf-archive project.
	* autoconf-archive/ax_prog_python_version.m4: Likewise.
	* autoconf-archive/ax_python_module.m4: Likewise.
	* Makefile.am: Add the new files above to the source distribution.
	* configure.ac: Include the new m4 macros from the autoconf
	archive. Add a new --enable-fedabipkgdiff option. Update the
	report at the end of the configure process to show the status of
	the fedabipkgdiff feature. Add check for prerequisite python
	modules itertools, shutil, unittest and mock.  These are necessary
	for the unit test of fedabipkgdiff. Generate
	tests/runtestfedabipkgdiff.py into the build directory, from the
	tests/runtestfedabipkgdiff.py.in input file.
	* tools/Makefile.am: Include the fedabipkgdiff to the source
	distribution and install it if the "fedabipkgdiff" feature is
	enabled.
	* tests/Makefile.am: Rename runtestfedabipkgdiff.sh into
	runtestfedabipkgdiff.py.  Add the new runtestfedabipkgdiff.py.in
	autoconf template file in here.
	* tests/runtestfedabipkgdiff.py.in: New unit test file.
	* tools/fedabipkgdiff: New tool fedabipkgdiff.
[...]

> diff --git a/tests/runtestfedabipkgdiff.py.in b/tests/runtestfedabipkgdiff.py.in
> new file mode 100755
> index 0000000..3087212
> --- /dev/null
> +++ b/tests/runtestfedabipkgdiff.py.in
> @@ -0,0 +1,450 @@
> +#!/usr/bin/python
> +# -*- coding: utf-8 -*-
> +# -*- Mode: Python
> +#
> +# This file is part of the GNU Application Binary Interface Generic
> +# Analysis and Instrumentation Library.  This program is free
> +# software; you can redistribute it and/or modify it under the terms
> +# of the GNU General Public License as published by the Free Software
> +# Foundation; either version 3, or (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# General Lesser Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this program; see the file COPYING-LGPLV3.  If
> +# not, see <http://www.gnu.org/licenses/>.

This is a program, not a library.  In the project, by default,
programs are GPL.  It's only the library that is LGPL.  So if this is
an oversight, I guess we should fix it and change this license to GPL.

> +#
> +# Author: Chenxiong Qi
> +
> +import os
> +import itertools
> +import shutil
> +import unittest
> +
> +import koji

In general, I find that this file lacks comments.  There should be a
general comment at the beginning of the file that introduces the kind
of tests that are performed here.  For instance,
tests/test-diff-pkg.cc has:

    /// @file
    ///
    /// This test harness program computes the ABI changes between ELF
    /// binaries present inside input packages.  Some of the input
    /// packages have debuginfo, some don't.  The resulting ABI change
    /// report is then compared with a reference one.
    ///
    /// The set of input files and reference reports to consider should be
    /// present in the source distribution, which means they must be
    /// referenced in tests/data/Makefile.am by the EXTRA_DIST variable.

And if you look into that file, there are various other comments that
describe how the test is organized etc.  I think we should try and do
the same for all the tests that we add.

For instance:

[...]

> +counter = itertools.count(0)

What is this global variable for?  We should have a comment for this.


> +class UtilsTest(unittest.TestCase):

We should have an introductory comment for this, I believe.

> +
> +    def test_is_fedora_distro(self):

Same for this test.  What is the philosophy behind this?  What is the
kind of "distro strings" we want to accept? etc.

> +        distro = 'fc5'
> +        self.assertTrue(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'f5'
> +        self.assertFalse(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'fc23'
> +        self.assertTrue(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'fc'
> +        self.assertFalse(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'fc234'
> +        self.assertFalse(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'el7'
> +        self.assertTrue(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +        distro = 'el7_2'
> +        self.assertFalse(fedabipkgdiff_mod.is_distro_valid(distro))
> +
> +
> +class RPMTest(unittest.TestCase):
> +    """Test case for RPM class"""

For instance, I would have expected this comment to say something
introductory about the tests performed on the fedabipkgdiff.RPM
class.

> +
> +    def setUp(self):

Here, I would have expected a comment about what this
debuginfo_rpm_info data member is, saying that it must be the same
type as the parameter type expected by the constructor of the
fedabipkgdiff.RPM class.  So if that constructor's parameter type
changes, this test must be updated.  Something like that.

> +        self.debuginfo_rpm_info = {
> +            'arch': 'i686',
> +            'name': 'httpd-debuginfo',
> +            'release': '1.fc22',
> +            'version': '2.4.18'
> +            }

Likewise.

> +        self.rpm_info = {
> +            'arch': 'x86_64',
> +            'name': 'httpd',
> +            'release': '1.fc22',
> +            'version': '2.4.18'
> +            }
> +
> +    def test_attribute_access(self):

Please add a comment saying what this test does.  At least its
intent.  For instance:

    This test enforces the exported interface of the fedabipkgdiff.RPM
    class.  If a new data member is added to that type, removed or
    changed in that class, please update this test accordingly.

Maybe repeat here that the data members of the RPM class are the same
as the data members of the poorly documented return type of the koji
getRPM request, and so whenever that return type changes, we need to
update this test.  You know, something like that.  Otherwise, there is
too much implicit knowledge here, and this makes maintenance painful
for anyone would would have to take over tomorrow.  And this is
especially true for a program in python where types are so loosely
defined and thus where type mis-matches won't be caught until the
program's runtime.

etc.

> +        rpm = fedabipkgdiff_mod.RPM(self.debuginfo_rpm_info)
> +        self.assertEquals(self.debuginfo_rpm_info['arch'], rpm.arch)
> +        self.assertEquals(self.debuginfo_rpm_info['name'], rpm.name)
> +        self.assertEquals(self.debuginfo_rpm_info['release'], rpm.release)
> +        self.assertEquals(self.debuginfo_rpm_info['version'], rpm.version)

[...]

I think all test cases should be documented similarly.  And also, all
the classes that are helper classes for the tests should be
documented.


> --- /dev/null
> +++ b/tools/fedabipkgdiff
> @@ -0,0 +1,805 @@
> +#!/usr/bin/env python
> +# -*- coding: utf-8 -*-
> +# -*- Mode: Python
> +#
> +# This file is part of the GNU Application Binary Interface Generic
> +# Analysis and Instrumentation Library.  This program is free
> +# software; you can redistribute it and/or modify it under the terms
> +# of the GNU General Public License as published by the Free Software
> +# Foundation; either version 3, or (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# General Lesser Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this program; see the file COPYING-LGPLV3.  If
> +# not, see <http://www.gnu.org/licenses/>.

Likewise, I believe this should be GPL.

> +#
> +# Author: Chenxiong Qi
> +

Please add an introductory comment here, explaining the intend of the
tool.  Just like what is done for in tools/abidw.cc:

    /// @file
    ///
    /// This program reads an elf file, try to load its debug info (in
    /// DWARF format) and emit it back in a set of "text sections" in native
    /// libabigail XML format.


[...]

Generally speaking, I would like to see all the function parameter
*types* and return values to be documented.

[...]

> +def log_call(func):
> +    def proxy(*args, **kwargs):

Please add a comment that says that this is a decorator.  And please
tell what we expect *args and **kwargs to be.  (the arguments of func,
as well as the key word arguments, if any).

[...]


> +class RPM(object):
> +    """Represeting a RPM"""
> +
> +    def __init__(self, data):
> +        """Initialize a RPM object
> +
> +        :param dict data: a dict representing a RPM information got from koji
> +            API, either listRPMs or getRPM

What are the keys and values expected in that dict?  Granted the koji
API is so "lightly" documented, it doesn't really help.  But I think
we could write an example of the dict keys/values that *we* expect
here, to ease maintenance.

Also, you could say something about the fact that the set of data
members expected by this class is enforced in the unit test case 

[...]

> +    @property
> +    def nvra(self):
> +        return '%(name)s-%(version)s-%(release)s.%(arch)s' % self.data
> +
> +    @property
> +    def filename(self):
> +        return '{0}.rpm'.format(self.nvra)
> +
> +    @property
> +    def is_debuginfo(self):
> +        """Check if a RPM is a debuginfo"""

I would have said:

    "Check if the name of the current RPM denotes a debug info
     package"

> +        return koji.is_debuginfo(self.data['name'])
> +
> +    @property
> +    def download_url(self):
> +        """Get the URL from where to download from koji"""
> +        build = session.getBuild(self.build_id)
> +        return os.path.join(pathinfo.build(build), pathinfo.rpm(self.data))
> +
> +    @property
> +    def downloaded_file(self):
> +        """Get a pridictable downloaded file name with absolute path"""
> +        # arch should be removed from the result returned from PathInfo.rpm
> +        filename = os.path.basename(pathinfo.rpm(self.data))
> +        return os.path.join(get_download_dir(), filename)
> +
> +    @property
> +    def is_downloaded(self):

Please add a comment here.

> +        return os.path.exists(self.downloaded_file)
> +
> +
> +class LocalRPM(RPM):
> +    """Representing a local RPM
> +
> +    Local RPM means the one that could be already downloaded or built from
> +    where I can find it
> +    """
> +
> +    def __init__(self, filename):

Please add a comment here.  What is the type of the "filename"
parameter etc...

> +        self.local_filename = filename
> +        self.data = koji.parse_NVRA(os.path.basename(filename))
> +
> +    @property
> +    def downloaded_file(self):

Comments.

> +
> +class Brew(object):
> +    """Proxy to kojihub XMLRPC with additional extensions to fedabipkgdiff
> +
> +    kojihub XMLRPC APIs are well-documented in koji's source code. For more
> +    details information, please refer to class RootExports within kojihub.py.
> +    """

Please add an html link to the koji source code where the XMLRPC APIs
are documented.

> +
> +    def __init__(self, baseurl):
> +        """Initialize Brew that is a proxy to koji.ClientSession"""
> +        self.session = koji.ClientSession(baseurl)
> +
> +    @log_call
> +    def listRPMs(self, selector=None, **kwargs):
> +        """Proxy to kojihub.listRPMs
> +
> +        :param selector: to adjust if a RPM should be selected
> +        :type selector: a callable object
> +        :param kwargs: keyword parameters accepted by kojihub.listRPMs

Here if you could give examples of the keywords (and their arguments)
that we'd pass, it would be good.  I know the reader can always go dig
into kojihub.listRPMs, but examples would help.  I think redundancy is
good here.

> +        :type kwargs: dict
> +        :return: a list of RPMs, each of them is a dict object

Again, an example of the dict object that we expect would be good.

> +        :rtype: list
> +        """
> +        if selector:
> +            assert hasattr(selector, '__call__'), 'selector should be callable'
> +        rpms = self.session.listRPMs(**kwargs)
> +        if selector:
> +            rpms = [rpm for rpm in rpms if selector(rpm)]
> +        return rpms
> +
> +    @log_call
> +    def getRPM(self, *args, **kwargs):

Comments.

> +        rpm = self.session.getRPM(*args, **kwargs)
> +        if rpm is None:
> +            raise RpmNotFound('Cannot find RPM {0}'.format(args[0]))
> +        return rpm
> +
> +    @log_call
> +    def listBuilds(self, topone=None, selector=None, order_by=None,
> +                   reverse=None, **kwargs):
> +        """Proxy to kojihub.listBuilds to list completed builds

Please document here (yeah I know it's redundant) the parameters that
kojihub.listBuilds expects.

> +
> +        Suport additional two keyword parameters:
> +
> +        :param bool topone: whether to return the top first one
> +        :param selector: a callable object used to select specific subset of
> +            builds
> +        :type selector: callable object
> +        :param str order_by: the attribute name by which to order the builds,
> +            for example, name, version, or nvr.
> +        :param bool reverse: whether to order builds reversely
> +        :param dict kwargs: keyword parameters accepted by kojihub.listBuilds
> +        :return: a list of builds, even if just return only one build
> +        :rtype: list
> +        """
> +        if 'state' not in kwargs:
> +            kwargs['state'] = koji.BUILD_STATES['COMPLETE']
> +
> +        if selector is not None and not hasattr(selector, '__call__'):
> +            raise TypeError(
> +                '{0} is not a callable object.'.format(str(selector)))
> +
> +        if order_by is not None and not isinstance(order_by, basestring):
> +            raise TypeError('order_by {0} is invalid.'.format(order_by))
> +
> +        builds = self.session.listBuilds(**kwargs)
> +        if selector is not None:
> +            builds = [build for build in builds if selector(build)]
> +        if order_by is not None:
> +            # FIXME: is it possible to sort builds by using opts parameter of
> +            # listBuilds
> +            builds = sorted(builds,
> +                            key=lambda item: item[order_by],
> +                            reverse=reverse)
> +        if topone:
> +            builds = builds[0:1]
> +
> +        return builds
> +
> +    @log_call
> +    def getPackage(self, name):
> +        """Proxy to kojihub.getPackage
> +
> +        :param str name: package name

Please give example of the package name that we expect, so that a user
of this function has an idea of the correct form expected.

> +        :return: a dict object representing a package
> +        :rtype: dict
> +        """
> +        package = self.session.getPackage(name)
> +        if package is None:
> +            package = self.session.getPackage(name.rsplit('-', 1)[0])
> +            if package is None:
> +                raise KojiPackageNotFound(
> +                    'Cannot find package {0}.'.format(name))
> +        return package
> +
> +    @log_call
> +    def getBuild(self, *args, **kwargs):
> +        """Proxy to kojihub.getBuild"""

Please describe the (keyword) arguments expected.

> +        return self.session.getBuild(*args, **kwargs)
> +

> +    @log_call
> +    def get_rpm_build_id(self, name, version, release, arch=None):
> +        """Get build ID that contains a rpm with specific nvra
> +
> +        If arch is omitted, a rpm can be identified easily by its N-V-R-A.
> +
> +        If arch is omitted, name is used to get associated package, and then
> +        to get the build.
> +
> +        :param str name: name of a rpm
> +        :param str version: version of a rpm
> +        :param str release: release of a rpm

I think that here, example of what you expect as release (at least)
would be interesting.

[...]

> +
> +    @log_call
> +    def get_package_latest_build(self, package_name, distro):
> +        """Get latest build from a package
> +
> +        :param str package_name: from which package to get the latest build
> +        :param str distro: which distro the latest build belongs to

The form of what is expected as distro would be important too.  That
would help to maintain the test suite.

[...]

+@log_call
+def get_session():

Comments.

+    return Brew(global_config.koji_server)

[...]

+def get_download_dir():
+    """Return the directory holding all downloaded rpms"""
+    download_dir = os.path.join(HOME_DIR, 'downloads')

I would use the standard $XDG_CACHE_HOME place, to store the
downloaded packages:
https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html.
That is, ~/.local/fedabipkgdiff.

I could also use a fedabipkgdiff --clean-cache option to clean the
cache cache, but we can add that this later, after the patch is in.

+    if not os.path.exists(download_dir):
+        os.makedirs(download_dir)
+    return download_dir
+

> +@log_call
> +def abipkgdiff(pkg_info1, pkg_info2):
> +    """Run abipkgdiff against found two RPM packages"""

Please document the expected types of the parameters here.

+    print 'ABI check on {0} and {1}'.format(pkg_info1.package.filename,
+                                            pkg_info2.package.filename)

Please, rather print:

'Comparing the ABI of binaries between {0} and {1}:'

[...]

> +def magic_construct(rpms):
> +    """Construct RPMs into a magic structure
> +
> +    Convert list of
> +
> +    foo-1.0-1.fc22.i686
> +    foo-debuginfo-1.0-1.fc22.i686
> +    foo-devel-1.0-1.fc22.i686
> +
> +    to list of
> +
> +    (foo-1.0-1.fc22.i686, foo-debuginfo-1.0-1.fc22.i686)
> +    (foo-devel-1.0-1.fc22.i686, foo-debuginfo-1.0-1.fc22.i686)
> +    """
> +    debuginfo = None
> +    packages = []
> +    for rpm in rpms:
> +        if rpm.is_debuginfo:
> +            debuginfo = rpm
> +        else:
> +            packages.append(rpm)
> +    return [PkgInfo(package, debuginfo) for package in packages]
> +
> +
> +@log_call
> +def run_abipkgdiff(pkg1_infos, pkg2_infos):
> +    """Run abipkgdiff
> +
> +    If one of the executions finds ABI differences, the return code is the
> +    return code from abipkgdiff.
> +
> +    :param dict pkg1_infos: a dict mapping from arch to list of rpms, that is
> +        returned from method make_rpms_usable_for_abipkgdiff

Rather than refer to what make_rpms_usable_for_abipkgdiff returns,
please redundantly describe the dict here, giving a real example, so
that a user willing to use this function can see what she should feed
it with.

[...]

> +@log_call
> +def diff_local_rpm_with_latest_rpm_from_koji():
> +    """Diff against local rpm and remove latest rpm
> +
> +    This operation handles a local rpm and debuginfo rpm and remote ones
> +    located in remote Koji server, that has specific distro specificed by
> +    argument --from.
> +
> +    1/ Suppose the packager has just locally built a package named
> +    foo-3.0.fc24.rpm. To compare the ABI of this locally build package with the
> +    latest stable package from Fedora 23, one would do:
> +
> +    fedabipkgdiff --from f23 ./foo-3.0.fc24.rpm
> +    """

Great comment, thanks.

[...]


> +@log_call
> +def make_rpms_usable_for_abipkgdiff(rpms):
> +    """
> +    Construct result that contains mappings from arch to download url and
> +    downloaded rpm filename of rpm and debuginfo rpm
> +
> +    :return: a mapping from an arch to a list of rpms

Please give an example of this dict.

[...]


> +@log_call
> +def diff_rpms_with_nvra(name, version, release, arch=None,
> +                        all_subpackages=None):

Comments.

> +    build_id = session.get_rpm_build_id(name, version, release, arch)
> +    rpms = session.select_rpms_from_a_build(build_id, name, arches=arch,
> +                                            select_subpackages=all_subpackages)
> +    return make_rpms_usable_for_abipkgdiff(rpms)
> +

Also, I wouldn't call this function "diff_rpms_with_nvra", because the
function doesn't perform any diffing, does it?  Rather, my
understanding is that it constructs the two sets of package
descriptors that would be passed to abipkgdiff.


> +@log_call
> +def diff_two_nvras_from_koji():
> +    """Diff two nvras from koji
> +
> +    The arch probably omits, that means febabipkgdiff will diff all arches. If

I would say "the arch is probably omitted ..."

[...]

+def main():

[...]


+    else:
+        print >>sys.stderr, 'Unknown arguments. Please refer to -h.'
+        returncode = 1

Please, refer to --help, rather -h just like what is done in the other
libabigail tools.

+
+    return returncode

All in all, I am wondering if we shouldn't add some progress
indication to the user, just like what "dnf" does.  You know, saying
things like "we are downloading this package etc, and now we are
comparing these packages, etc."  Otherwise, the time taken can seem
quite long.  But we can add this later, after the patch has been
committed, I guess.

Also, we should arrange for the downloads to happen in parallel,
somehow.  Again, this can happen after the patch is in.

Now, the other big thing missing from this patch is a manual.

That is really important.  Every single libabigail tool has a
documentation in docs/manuals.  Documentation is in the restructured
text format, and we use python-sphinx to generate html, man and info
variants for it.  The manual should also say where the packages are
downloaded.

If you don't feel like adding a manual for this tool, please tell me,
I'll help.

I love this patch very much!  Thank you for your hard work on this.
This is simply awesome.  I cannot wait for it to get in :-)

Cheers,

-- 
		Dodji
References:
- a new tool fedabipkgdiff
  - From: Chenxiong Qi <cqi@redhat.com>
- Re: a new tool fedabipkgdiff
  - From: Chenxiong Qi <cqi@redhat.com>
Prev by Date: [Bug default/19885] ELF/DWARF reader cannot associates a function DIE to a symbol on powerpc64
Previous by thread: Re: a new tool fedabipkgdiff
Next by thread: [Bug default/19658] New: Type canonicalization slow for the 2nd binary being loaded
Index(es):
- Date
- Thread