Zdenek Kabelac [Sat, 6 May 2023 18:35:42 +0000 (20:35 +0200)]
tests: check for lvmdbusd running in the system
Check for a (possibly leftover) lvmdbusd running in the
system, as this daemon may interfere with this test: in that
case both would be operating on the same 'live' data in /run/lvm.
David Teigland [Tue, 2 May 2023 21:12:23 +0000 (16:12 -0500)]
lvreduce: make _lvseg_get_stripes handle integrity layer
lvreduce uses _lvseg_get_stripes() which was unable to get raid stripe
info with an integrity layer present. This caused lvreduce on a
raid+integrity LV to fail prematurely when checking stripe parameters.
An unhelpful error message about stripe size would be printed.
David Teigland [Tue, 25 Apr 2023 19:46:36 +0000 (14:46 -0500)]
lvmcache: fix valgrind error when dropping md duplicate
When lvmcache info is dropped because it's an md component,
then the lvmcache vginfo can also be dropped, but the list
iterator was still using the list head in vginfo, so break
from the loop earlier to avoid it.
The previous commit meant that pvmoves could actually be started in an
unexpected order - so make sure we do not start a new pvmove in the same
VG until the previous one has started.
Not quite sure if this helps anything; some of the testing
machines can't reliably remove scsi_debug, reporting
that it is in use - but it's not easily reproducible...
tests: handle multiple devs with wait_pvmove_lv_ready
aux wait_pvmove_lv_ready() now handles multiple pvmove LVs
in one go - which allows somewhat faster checking - although
at some point we may need to switch to using delayed devs,
since mirror throttling no longer seems to work well:
CPUs are getting so fast that most of the data is already
pvmoved before throttling has any chance to do something...
When a 'brd' device can be removed (it is unused, i.e. not opened),
remove the device and use it again for testing.
Let's assume the user has no unused brd device left in the system.
When the tests sometimes fail to clean up devices, with this
change further cleanup from a later test may eventually release
the brd device and make it available for testing.
There is no easy way to detect whether a device supports zeroing,
and the kernel also zeroes the device when it's not directly supported,
but with an extra message:
operation not supported error, dev X, sector Y op 0x9:(WRITE_ZEROES)...
So to avoid generating such a message with every 'lvcreate', use
a standard write of a zeroed page for zeroing of up to 8K.
(Maybe we can go with even larger sizes.)
Convert the test to use only ext4 instead of XFS, which demands 300M.
Shorten the 'B' files to 4K and use a 4K stripe size with >raid1 arrays
so we do not risk spreading a file across stripes.
Also use the easier 'aux corrupt_dev()' method to introduce bit
corruption into a block device with integrity.
Zdenek Kabelac [Mon, 6 Mar 2023 13:52:59 +0000 (14:52 +0100)]
vdo: use fixed size vdopool wrapper
Instead of using the size of the 'empty header' in the vdopool, use a fixed
size of 4K for the 'wrapping' vdo-pool device.
This fixes the issue when a user tried to activate a vdo-pool after
a conversion from a vdo managed device with 'vgchange -ay' - where
this command activated all LVs, including the 'vdo-pool' wrapping device,
but this converted pool uses a 0-length header.
This 4K size should usually prevent other tools like 'blkid' from recognizing
such a device as anything - so it shouldn't cause any problems with
duplicate identification of devices.
Tony Asleson [Thu, 30 Mar 2023 15:10:23 +0000 (10:10 -0500)]
lvmdbusd: Correct seg. fault on s390x ELN
Syscall 186 is specific to 64-bit x86. As the number differs from arch
to arch, and between word sizes of the same arch, we will only grab the
thread ID using built-in Python support, and only if it is supported.
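
A minimal sketch of the approach described above, not the actual lvmdbusd
code. It assumes the "built-in python support" is threading.get_native_id()
(available in Python 3.8 and later on supported platforms); the helper name
get_tid and the -1 fallback value are hypothetical.

```python
import threading


def get_tid():
    """Return the kernel thread ID, or -1 when the interpreter cannot supply it."""
    # Probe for threading.get_native_id() instead of issuing a raw syscall,
    # because syscall numbers differ between architectures and word sizes.
    native_id = getattr(threading, "get_native_id", None)
    if native_id is None:
        return -1  # hypothetical "unknown" marker for older interpreters
    return native_id()
```

Callers that only use the value for log messages can treat -1 as "unknown"
rather than failing.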
Tony Asleson [Thu, 9 Mar 2023 17:29:58 +0000 (11:29 -0600)]
lvmdbusd: Add retries during initial load
When the daemon is starting we do an initial fetch of lvm state. If we
happened to get some type of failure with lvm during this time we would
exit. During error injection testing this happened enough that
the unit tests were unable to finish. Add retries to ensure we can get
started during error injection testing.
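
The retry idea can be sketched roughly as follows. This is an illustration
under assumptions, not the daemon's code: load_with_retries, its parameters
and the choice of a fixed delay are all hypothetical.

```python
import time


def load_with_retries(load, retries=5, delay=0.1):
    """Call load() until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(retries):
        try:
            return load()
        except Exception as error:  # a real daemon would catch narrower errors
            last_error = error
            time.sleep(delay)  # brief pause before the next attempt
    # Every attempt failed: report it, keeping the last failure as the cause.
    raise RuntimeError("initial load failed after %d attempts" % retries) from last_error
```

Bounding the number of attempts keeps a persistent failure visible instead of
looping forever at startup.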
Tony Asleson [Thu, 9 Mar 2023 17:25:58 +0000 (11:25 -0600)]
lvmdbustest: Only inject 1 missing key error
Previously we were injecting a missing key in the lv, vg, and pv.
Given the order of processing in lvmdbusd, this prevented us from
exercising all the error paths. Change to returning just 1 instead.
Tony Asleson [Thu, 9 Mar 2023 17:21:27 +0000 (11:21 -0600)]
lvmdbusd: Handle missing key in get_key
When we sort the LVs, we can stumble on a missing key; protect against
this as well.
Seen in error injection testing:
Traceback (most recent call last):
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 198, in update_thread
    num_changes = load(*_load_args(queued_requests))
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 83, in load
    rc = MThreadRunner(_main_thread_load, refresh, emit_signal).done()
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/utils.py", line 726, in done
    raise self.exception
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/utils.py", line 732, in _run
    self.rc = self.f(*self.args)
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/fetch.py", line 40, in _main_thread_load
    (lv_changes, remove) = load_lvs(
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 148, in load_lvs
    return common(
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/loader.py", line 37, in common
    objects = retrieve(search_keys, cache_refresh=False)
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 72, in lvs_state_retrieve
    lvs = sorted(cfg.db.fetch_lvs(selection), key=get_key)
  File "/home/tasleson/projects/lvm2/daemons/lvmdbusd/lv.py", line 35, in get_key
    pool = i['pool_lv']
KeyError: 'pool_lv'
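
One way to make a sort key tolerate an absent field is sketched below. It
does not reproduce the real get_key() logic in lv.py: the function name
safe_sort_key, the 'lv_name' field and the ordering it produces are
assumptions chosen for illustration. Only 'pool_lv' comes from the traceback.

```python
def safe_sort_key(lv):
    """Sort key for an LV record that tolerates absent fields."""
    # dict.get() returns the default instead of raising KeyError, so an
    # incomplete record still sorts (using an empty string for that field).
    return (lv.get("pool_lv", ""), lv.get("lv_name", ""))


lvs = [
    {"lv_name": "thin1", "pool_lv": "pool0"},
    {"lv_name": "broken"},  # 'pool_lv' missing, as in the injected error
    {"lv_name": "root", "pool_lv": ""},
]
ordered = sorted(lvs, key=safe_sort_key)
```

With a plain i['pool_lv'] lookup the second record would raise KeyError
inside sorted(), which is the failure shown in the traceback.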
David Teigland [Wed, 8 Feb 2023 19:34:35 +0000 (13:34 -0600)]
vg_read: keep MISSING_PV when device with no mda reappears
Remove old code that became incorrect at some point.
It's probably a fragment of an old condition that was left
behind because it wasn't understood. We don't want to drop
the MISSING_PV flag just because the PV has no mda in use.
The device that was missing may have stale data, so the user
needs to decide if the device should be removed or restored.
David Teigland [Tue, 7 Feb 2023 21:25:46 +0000 (15:25 -0600)]
tests: vg-raid-takeover
Different sequences of steps that could be used to handle raid LVs
after VG takeover (what would happen in cluster failover) combined
with the loss of a disk.
Peter Rajnoha [Tue, 7 Mar 2023 13:45:06 +0000 (14:45 +0100)]
toollib: fix segfault if using -S|--select with log/report_command_log=1 setting
When we are using -S|--select for non-reporting tools while using command log
reporting (log/report_command_log=1 setting), we need to create an internal
processing handle to handle the selection itself. In this case, the internal
processing handle to execute the selection (to process the -S|--select) has
a parent handle (that is processing the actual non-reporting command).
When this parent handle exists, we can't destroy the command log report
in destroy_processing_handle as there's still the parent processing to
finish. The parent processing may still generate logs which need to be
reported in the command log report. If the command log report was
destroyed prematurely together with destroying the internal processing
handle for -S|--select, then any subsequent log request from processing
the actual command (and hence an attempt to access the command log report)
ended up with a segfault.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=2175220