Zdenek Kabelac [Wed, 26 Oct 2022 12:33:09 +0000 (14:33 +0200)]
device_mapper: add parser for vdo metadata
Add very simplistic parser of vdo metadata to be able to obtain
logical_blocks stored within vdo metadata - as lvm2 may
submit smaller value due to internal aligment rules.
To avoid creation of mismatching table line - use this number
instead the one provided by lvm2.
Tony Asleson [Tue, 18 Oct 2022 17:30:09 +0000 (12:30 -0500)]
lvmdbusd: Always leverage udev
Previously we utilized udev until we got a dbus notification from lvm
command line tools. This however misses the case where something outside
of lvm clears the signatures on a block device and we fail to refresh the
state of the daemon. Change the behavior so we always monitor udev events,
but ignore those udev events that pertain to lvm members.
Note: --udev command line option no longer does anything and simply
outputs a message that it's no longer used.
Tony Asleson [Wed, 19 Oct 2022 17:46:45 +0000 (12:46 -0500)]
lvmnotify.c: Check to see if dbus daemon is running
The lvm dbus daemon will auto activate on dbus API calls. To
prevent the dbus daemon starting when lvm command line tools are
being used we will check to see if the daemon is running first.
If the daemon is not running, we will not notify the daemon.
corubba [Wed, 12 Oct 2022 14:19:01 +0000 (09:19 -0500)]
lvmlockd: Fix sanlock lvmlock lv size calculation
The number of extents for the sanlock lvmlock lv is calculated using
integer division, which rounds towards zero. With a physical extent size
of 129M, instead of the requested 256M the lv is only 129M (1 extent).
With any physical extent size greater than 256M the lv creation fails
because the number of extents is zero.
This is fixed by replacing the integer division with a division macro
that rounds up and thus guarantees that the size of the lv will always
be equal or greater than the requested size. Using the examples above, a
pes of 129M will result in a 258M lv (2 extents), pes of 300M in a 300M
lv (1 extent).
The re-calculation of the lv size in bytes and megabytes is only so the
debug output shows the correct values. The size in mb there is still
not byte-perfect-accurate, but good enough for a human-readable estimate;
and the exact size in bytes and extents is right next to it.
Peter Rajnoha [Wed, 12 Oct 2022 12:41:58 +0000 (14:41 +0200)]
toollib: do not process just created historical LV
When executing process_each_lv_in_vg, we process live LVs first and
after that, we process any historical LVs. In case we have just removed
an LV, which also means we have just made it "historical" and so it
appears as fresh item in vg->historical_lvs list, we have to skip it
when we get to processing historical LVs inside the same process_each_lv_in_vg
call.
The simplest approach here, without introducing another LV list, is to
simply mark such historical LVs as "fresh" directly in struct
historical_logical_volume when we have just removed the original LV
and created the historical LV for it. Then, we just need to check the
flag when processing historical LVs and skip it if it is "fresh".
When we read historical LVs out of metadata, they are marked as
"not fresh" and so they can be processed as usual.
This was mainly an issue in conjuction with -S|--select use:
(In this example, a thin pool with lvol1 thin LV and lvol2 and lvol3 snapshots.)
# lvs -H vg -o name,pool_lv,full_ancestors,full_descendants
LV Pool FAncestors FDescendants
lvol1 pool lvol2,lvol3
lvol2 pool lvol1 lvol3
lvol3 pool lvol2,lvol1
pool
...here, the historical LV lvol2 should not have been removed because
we have just removed its original non-historical lvol2 and the fresh
historical lvol2 must not be included in the same processing spree.
David Teigland [Wed, 31 Aug 2022 21:05:07 +0000 (16:05 -0500)]
device id: add new types using values from vpd_pg83
The new device_id types are: wwid_naa, wwid_eui, wwid_t10.
The new types use the specific wwid type in their name.
lvm currently gets the values for these types by reading
the device's vpd_pg83 sysfs file (this could change in the
future if better methods become available for reading the
values.)
If a device is added to the devices file using one of these
types, prior versions of lvm will not recognize the types
and will be unable to use the devices.
When adding a new device, lvm continues to first use sys_wwid
from the sysfs wwid file. If the device has no sysfs wwid file,
lvm now attempts to use one of the new types from vpd_pg83.
If a devices file entry with type sys_wwid does not match a
given device's sysfs wwid file, the sys_wwid value will also
be compared to that device's other wwids from its vpd_pg83 file.
If the kernel changes the wwid type reported from the sysfs
wwid file, e.g. from a device's t10 id to its naa id, then lvm
should still be able to match it correctly using the vpd_pg83
data which will include both ids.
David Teigland [Wed, 31 Aug 2022 20:59:28 +0000 (15:59 -0500)]
device id: change space handling in t10 wwid
t10 wwids are now edited in the same way that multipath does,
which is replacing a series of spaces with one _. Previously
lvm replaced every space with one _. Devices file entries
with the old form will be converted to the new shorter form.
lvmlockd: purge the lock resources left in previous lockspace
If lvmlockd in cluster is killed accidently or any other reason, the
lock resources will become orphaned in the VG lockspace. When the
cluster manager tries to restart this daemon, the LVs will probably
become inactive because of resource schedule policy and thus the lock
resouce will be omited during the adoption process. This patch will
try to purge the lock resources left in previous lockspace, so the
following actions can work again.
David Teigland [Wed, 28 Sep 2022 16:02:17 +0000 (11:02 -0500)]
lvresize: exclude new fs handling at build time
Exclude the new fs resizing capabilities at build time
(rather than run time) if the necessary libblkid features
are not available. When excluded, all fs resizing options
are translated to resize_fsadm. Accessing the new
features now requires rebuilding lvm if libblkid is
upgraded.
Tony Asleson [Thu, 22 Sep 2022 22:10:13 +0000 (17:10 -0500)]
lvmdbustest: Re-work setUp
Place the addCleanup at the end as we don't want to go through clean up
if we don't make it through setUp. If we don't do this we can remove VGs
that we didn't create in the unit test.
Tony Asleson [Wed, 21 Sep 2022 15:05:36 +0000 (10:05 -0500)]
lvmdbusd: Correct lvm shell signal & child process handling
Previously when the __del__ method ran on LVMShellProxy we would blindly
call terminate(). This was a race condition as the underlying process
may/maynot be present. When the process is still present the SIGTERM will
end up being seen by lvmdbusd too. Re-work the code so that we
first try to wait for the child process to exit and only then if it hasn't
exited will we send it a SIGTERM. We also ensure that when this is
executed we will briefly ignore a SIGTERM that arrives for the daemon.
If we plan to use dm throttling for mirror targets - we actually
have to check whether kernel runs with CONFIG_HZ_1000 - if it does
not the whole idea of throttling is actually not working in the
testsuite as within a single 'tick' with HZ 100 way too much date
is being moved on any modern hardware - and since there is no plan
to change this in kernel - we simply avoid using throttling on such
kernel and test needs to work differently - either ignore results
or use much larger mirror sizes...
Tony Asleson [Tue, 20 Sep 2022 16:40:15 +0000 (11:40 -0500)]
lvmdbusd: Correct get_object_path_by_uuid_lvm_id
When checking to see if the PV is missing we incorrectly checked that the
path_create was equal to PV creation. However, there are cases where we
are doing a lookup where the path_create == None. In this case, we would
fail to set lvm_id == None which caused a problem as we had more than 1
PV that was missing. When this occurred, the second lookup matched the
first missing PV that was added to the object manager. This resulted in
the following:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 25, in _main_thread_load
(changes, remove) = load_pvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/pv.py", line 46, in load_pvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 55, in common
del existing_paths[dbus_object.dbus_object_path()]
Because we expect to find the object in existing_paths if we found it in
the lookup.
Tony Asleson [Tue, 6 Sep 2022 21:23:06 +0000 (16:23 -0500)]
lvmdbusd: Use pseudo tty to get "lvm>" prompt again
When lvm is compiled with editline, if the file descriptors don't look like
a tty, then no "lvm> " prompt is done. Having lvm output the shell prompt
when consuming JSON on a report file descriptor is very useful in
determining if lvm command is complete.
Tony Asleson [Wed, 31 Aug 2022 20:04:59 +0000 (15:04 -0500)]
lvmdbusd: Instruct lvm to output debug to file for fullreport
Historically we have seen a few different errors which occur when we call
fullreport. Failing exit code and JSON which is missing one or more keys.
Instruct lvm to dump the debug to a file during fullreport calls when we
fork & exec lvm. If we encounter an error, ouput the debug data.
The reason this isn't being done when lvmshell is used is because we
don't have an easy way to test the error paths.
This change is complicated by the following:
1. We don't know if fullreport was good until we evaluate all the JSON.
This is done a bit after we have called into lvm and returned.
2. We don't want to orphan the debug file used by lvm if the daemon is
killed. Thus we try to minimize the window where the debug file hasn't
already been unlinked. A RFE to pass an open FD to lvm for this
purpose is outstanding.
The temp. file is:
-rw------. 1 root root /tmp/lvmdbusd.lvm.debug.XXXXXXXX.log
Tony Asleson [Wed, 31 Aug 2022 16:20:49 +0000 (11:20 -0500)]
lvmdbusd: Re-work error handling
Introduce an exception which is used for known existing issues with lvm.
This is used to distinguish between errors between lvm itself and lvmdbusd.
In the case of lvm bugs, when we simply retry the operation we will log
very little. Otherwise, we will dump a full traceback for investigation
when we do the retry.
Tony Asleson [Fri, 26 Aug 2022 18:01:05 +0000 (13:01 -0500)]
lvmdbusd: Re-work error handling for run_cmd
Instead of lumping all the exceptions, break them out to handle the dbus
exceptions separately, to reduce the amount of debug information that ends
up in the journal that has questionable value.
Tony Asleson [Thu, 25 Aug 2022 14:33:50 +0000 (09:33 -0500)]
lvmdbusd: Don't report recoverable error
Lvm occasionally fails to return all the request JSON keys in the output of
"fullreport". This happens very rarely. When it does the daemon was reporting
the resulting informational exception:
MThreadRunner: exception
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 40, in _main_thread_load
(lv_changes, remove) = load_lvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 143, in load_lvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 95, in lvs_state_retrieve
l['vdo_operating_mode'],
KeyError: 'vdo_operating_mode'
The daemon retries the operation, which usually works and the daemon continues.
However, simply reporting this informational stack trace is causing CI and other
automated tests to fail as they expect no tracebacks in the log output.
Remove the reporting of this code path unless it persists and causes the daemon
to give up and exit.
Tony Asleson [Wed, 17 Aug 2022 22:24:08 +0000 (17:24 -0500)]
lvmdbusd: Add debug circular buffer
When the daemon isn't started with --debug we will keep a circular
buffer of the past N number of debug messages which we will output
when we encounter an issue.
Tony Asleson [Wed, 17 Aug 2022 22:21:19 +0000 (17:21 -0500)]
lvmdbusd: fix hangs on SIGINT
Rather than trying to bubble up return codes that get us to exit cleanly
it's better to just raise an exception to bail. In some cases functions
don't have return codes, so they cannot be checked.
Tony Asleson [Wed, 17 Aug 2022 17:08:16 +0000 (12:08 -0500)]
lvmdbusd: Make sure to set cfg.got_external_event
We were incorrectly only setting this if --udev wasn't present on the
command line. In all cases when we see a manager.ExternalEvent we want
to set this.
Tony Asleson [Wed, 17 Aug 2022 17:05:06 +0000 (12:05 -0500)]
lvmdbusd: Handle no lastlog
Depending on when an occurs, it maynot have any information available for
lastlog. In this case try to grab an error message from the original
response.
Tony Asleson [Tue, 9 Aug 2022 22:43:00 +0000 (17:43 -0500)]
lvmdbusd: Re-work flight recorder data
Introduce a new lock for the flight recorder, so that we can dump it when
a command is block waiting for lvm to complete. Also in all paths we will
addthe metadata to the flight recorder before it's done, so we will have
it when a command hangs and we dump the flight recorder. Add the missing
bits after the command has finished.