[ITP] python-license-expression and cygport PoC patch

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Sun Jun 23 20:12:03 GMT 2024


On 2024-06-23 08:26, Jon Turney via Cygwin-apps wrote:
> On 06/06/2024 20:03, Brian Inglis via Cygwin-apps wrote:
>> I found the github/nexB/license-expression Python package for SPDX licence 
>> checks; it is developed by the same team that does SPDX-toolkit for SPDX, uses 
>> the same current data, and the team works with Fedora folks et al.
> 
> Thanks for taking a look at this problem.
> 
> Having a package for this seems fine, but: this package is what calm uses, and 
> still has the drawbacks I mentioned:
> 
> * embeds the SPDX license data, doesn't dynamically fetch it
> * can't really handle LicenseRef reasonably

Thanks Jon,

They appear to be looking at splitting the code and data packages, as the rules 
appear to be settling down somewhat, whereas the data will only expand, and they 
seem to want to make it easier to keep up the release cadence.

[But looking back at the zoneinfo tzdb packages tzcode and tzdata: they used to 
be split, and still are logically, but every release now covers both 
simultaneously, and for our distro it makes no sense not to release tzcode, 
including the utilities, with tzdata, as that way we always pick up the latest 
tweaks.]

Given the exception-trapping problem you solved below, it should be possible, in 
the SPDX instantiation, to trap the unpackaged LicenseRef-* and ExceptionRef-* 
entries and downgrade their reporting to warnings.
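
A minimal sketch of that downgrade, assuming the validator hands back a plain 
list of the symbols it did not recognise. The function name 
triage_unknown_symbols and the prefix pattern are my own illustrative choices, 
not part of license-expression, and the accepted prefixes are simply the two 
mentioned above:

```python
import re

# Assumption: only these two custom-identifier prefixes are tolerated.
# The character class follows the usual SPDX idstring (letters, digits, ".", "-").
REF_PATTERN = re.compile(r"^(LicenseRef|ExceptionRef)-[A-Za-z0-9.-]+$")


def triage_unknown_symbols(unknown):
    """Split unrecognised licence symbols into (warnings, errors).

    Symbols that look like LicenseRef-*/ExceptionRef-* are reported as
    warnings; anything else stays an error.
    """
    warnings, errors = [], []
    for symbol in unknown:
        if REF_PATTERN.match(symbol):
            warnings.append(f"warning: unregistered reference {symbol}")
        else:
            errors.append(f"error: unknown licence identifier {symbol}")
    return warnings, errors
```

Called with ["LicenseRef-IANA-TZ-Public-Domain", "GPL-9.9"], the first entry 
would land in the warnings list and the second in the errors list, so only the 
latter needs to fail the check.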

[I do not agree with their packaging of ...Refs based on the reporting or 
requesting organization, for example, ScanCode, Fedora, etc., rather than the 
package source: for example, I use LicenseRef-IANA-TZ-Public-Domain rather than 
something like LicenseRef-ScanCode-Public-Domain. They also only allow 
ExceptionRefs after WITH, even when, for example, Google grants IP rights in 
addition to those in the licence, tying both together, so they are *NOT* 
independent.

Having raised a few issues and made a few points about various public domain 
packages and licences, I find that SPDX appear to be fixated on licence texts as 
the embodiment of each variety of public domain licence, despite there probably 
being only a few major sources: expired copyrights (coming up for really ancient 
sources), US government department sources, and individual US developers. There 
may be occasional non-US individual developers, but as there is no real public 
domain concept elsewhere, they have often come up with equivalent expressions, 
like WTFPL, etc.]

I am also not sure we want to have to jump through SPDX hoops for each Cygwin 
package licence we hit before Fedora does?

>> Successful attempt to package Python license-expression (without tests):
>>
>>      https://cygwin.com/cgi-bin2/jobs.cgi?id=8210
>>
>> log at:
>>
>>      https://github.com/cygwin/scallywag/actions/runs/9293093201
>>
>> cygport attached and at:
>>
>> https://cygwin.com/cgit/cygwin-packages/playground/commit/?id=3626386b10c967f780547d1703ad23bd50f6331a
>>
>> The package installs and runs using PoC attached in spdx-license-expression.py 
>> script hooked into /usr/share/cygport/lib/pkg_pkg.cygpart license hint 
>> addition patch attached.
> 
> I'm not super-keen on adding a cygport dependency on python, just to do this check.
> 
> It would probably be preferable to do this check initially after the .cygport is 
> read, rather than only telling you about problems when you get around to doing 
> the package step.

Add it after the mandatory variable checks for LICENSE, etc.?
It could be an optional additional package: install a 
python-SPDX-licen[cs]e-expression package, which depends on 
python-license-expression, to do the checks; cygport would look for it and run 
the licence checks only if it is present?
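
On the Python side, a helper script could degrade the same way. This is only a 
sketch under my own assumptions: have_module and spdx_check_mode are 
hypothetical names, and the only library call used is the standard 
importlib.util.find_spec, which returns None when a top-level module is not 
installed:

```python
import importlib.util


def have_module(name):
    """Return True if the named top-level module could be imported."""
    return importlib.util.find_spec(name) is not None


def spdx_check_mode(module="license_expression"):
    """Return "check" when the optional checker module is installed, else "skip".

    The caller runs the licence check only in "check" mode, so a missing
    optional package never makes the build fail.
    """
    return "check" if have_module(module) else "skip"
```

The caller would then print a short note and carry on when the result is 
"skip", so nothing else has to depend on the optional package.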

>> I also ran a test of the Python script and module against all package source 
>> cygport files declaring licences which I maintain or ever looked at, including 
>> a git/cygwin-packages/*.cygport download from 2023-02, showing the results in 
>> the attached log.
>> I also attempted to trap the exceptions in the script, but that does not seem 
>> to work in any documented obvious manner, but I do not know enough Python to 
>> address this fully.
> 
> Yeah, the way validate() handles parse errors is bizarre and unhelpful.
> 
> What I ended up doing is calling parse() first to catch those errors, so 
> something like:
> 
>      try:
>          licensing.parse(expression)
>          errs = licensing.validate(expression).errors
>      except (ExpressionError, ExpressionParseError) as e:
>          print(e, file=sys.stderr)
>          return 2

Thanks for that tip; I will have another look at calm, and see if I can work on 
adding that, also checking for ...Refs, and warning or erroring (non-fatal) as 
appropriate, in python-SPDX-license-expression.
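
As a rough, self-contained version of the parse-before-validate pattern quoted 
above: check_expression and its 0/1/2 status codes are my own illustrative 
choices, and the licensing object is passed in so that the sketch does not care 
where it comes from. My understanding is that ExpressionParseError is a subclass 
of ExpressionError in license-expression, so catching the latter should cover 
both; treat that as an assumption to verify.

```python
try:
    # Names exported by the license-expression package.
    from license_expression import ExpressionError, get_spdx_licensing
except ImportError:
    # Package not installed: stand-ins keep this sketch importable.
    ExpressionError = Exception
    get_spdx_licensing = None


def check_expression(licensing, expression):
    """Return (status, messages).

    status 0: expression is valid
    status 1: it parsed, but validate() reported errors
    status 2: it could not be parsed at all
    """
    try:
        # Parse first so that syntax problems surface as exceptions here.
        licensing.parse(expression)
        errors = list(licensing.validate(expression).errors)
    except ExpressionError as exc:
        return 2, [f"cannot parse {expression!r}: {exc}"]
    if errors:
        return 1, errors
    return 0, []
```

With the package installed, the licensing argument would presumably be the 
object returned by get_spdx_licensing(), and the messages from status 1 could 
then be sorted into warnings and errors for the ...Refs.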

After that, I will look at another approach to detect and run the check early 
in cygport.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

