David Teigland [Thu, 15 Oct 2020 19:11:08 +0000 (14:11 -0500)]
pvscan: rework to improve PVs without metadata
Restructure the pvscan code, and add new temporary files
that list pvids in a VG, used for processing PVs that
have no metadata.
The new temp files, in /run/lvm/pvs_lookup/<vgname>, allow a
proper pvscan --cache to be done on PVs that have no metadata.
pvscan --cache <dev> is only supposed to read <dev>, but when
<dev> has no metadata, this had not been possible. The
command had to fall back to scanning all devices to read all
VG metadata to get the list of all PVIDs needed to check for
a complete VG. Now, the temp file can be used in place of
reading metadata from all PVs on the system.
David Teigland [Thu, 15 Oct 2020 19:05:45 +0000 (14:05 -0500)]
add label_read_pvid
To read the lvm headers and set dev->pvid if the
device is a PV. Difference from label_scan_ functions
is this does not read any vg metadata or add any info
to lvmcache.
David Teigland [Tue, 20 Oct 2020 20:10:08 +0000 (15:10 -0500)]
scanning: improve filtering control
Filtering in label_scan was controlled indirectly by
the fact that bcache was not yet set up when label_scan
first ran. The result is that filters that needed data
would not run and would return -EAGAIN, which would
result in the dev flag FILTER_AFTER_SCAN being set.
After the dev header was read for checking the label,
filters would be rechecked because of FILTER_AFTER_SCAN.
All filters would be checked this time because bcache
was now set up, and the filters needing data would
largely use data already scanned for reading the label.
This design worked but is hard to adjust for future
cases where bcache is already set up.
Replace this method (based on setting up bcache, or not)
with a new cmd flag filter_nodata_only. When this flag
is set filters that need data will not run. This allows
the same label_scan behavior when bcache has been set up.
There are no expected changes in behavior.
Zdenek Kabelac [Tue, 20 Oct 2020 20:26:44 +0000 (22:26 +0200)]
memlock: allocate at most halve of rlimit stack
Touch of stack allocation validated given size with rlimit
and if the reserved_stack was above rlimit, its been completely
ignored - now we will always touch stack upto rlimit/2 size.
Zdenek Kabelac [Tue, 20 Oct 2020 20:22:52 +0000 (22:22 +0200)]
lvmcmdlib: lvm2_init_threaded
cmd context has 'threaded' value that used be set
by clvmd - and allowed proper memory locking management.
Reuse same bit for dmeventd.
Since dmeventd is using 300KiB stack per thread,
we will ignore any user settings for allocation/reserved_stack
until some better solution is find.
This avoids crashing of dmevend when user changes this value
and because in most cases lvm2 should work ok with 64K stack
size, this change should not cause any problems.
Zdenek Kabelac [Mon, 19 Oct 2020 14:43:50 +0000 (16:43 +0200)]
cov: split check for type assignment
Check that type is always defined, if not make it explicit internal
error (although logged as debug - so catched only with proper lvm.conf
setting).
This ensures later type being NULL can't be dereferenced with coredump.
Zdenek Kabelac [Fri, 16 Oct 2020 18:58:58 +0000 (20:58 +0200)]
dm: remove created devices on error path
DM tree keeps track of created device while preloading a device tree.
When fail occures during such preload, it will now try to remove
all created and preloaded device. This makes it easier to maintain
stacking of device, since we do not need to check in-depth for
existance of all possible created devices during the failure.
Zdenek Kabelac [Fri, 2 Oct 2020 17:19:30 +0000 (19:19 +0200)]
tests: aux hides zero and error device
When ERR_DEV and ZERO_DEV are used, they are automatically
taken down when the last user no longer needs them,
so hide them from 'forgotten' device check.
Zdenek Kabelac [Fri, 2 Oct 2020 17:17:36 +0000 (19:17 +0200)]
tests: rename shown debug trace
As there could be few invokes of stacktrace, avoid
repeatedly display logs from commands.
So after first display rename debug.log* -> debug_log
so the file still can remain for reading in test dir.
Zdenek Kabelac [Fri, 2 Oct 2020 17:32:28 +0000 (19:32 +0200)]
wipe_lv: use BLKZEROOUT when possible
Since BLKZEROOUT ioctl should be supposedly fastest
way how to clear block device start using this ioctl
for zeroing a device. Commonly we do zero typically
small portion of a device (8KiB) - however since we now
also started to zero metadata devices, in the case
of i.e. thin-pool metadata this can go upto ~16GiB
and here the performance starts to be noticable.
Zdenek Kabelac [Fri, 2 Oct 2020 17:26:58 +0000 (19:26 +0200)]
wipe_lv: drop label_scan_invalidate on error path
Since dev_set_bytes() now closes dev on error path itself,
remove this unneeded call now (introduced few commits back
in history thus removing comment from WHATS_NEW)
Zdenek Kabelac [Fri, 2 Oct 2020 15:16:14 +0000 (17:16 +0200)]
bcache: support interrupts when waiting on IO
Since lvm2 normally block signals during protected
phase where it does not want to be interrupted.
Support interruptible processing when allowed
in section between sigint_allow() ... sigint_restore())
and let the 'io_getenvents()' finish with EINTR.
Zdenek Kabelac [Fri, 2 Oct 2020 15:42:50 +0000 (17:42 +0200)]
bcache: fix busy loop with too many errors
When bcache tries to write data to a faulty device,
it may get out of caching blocks and then just busy-loops
on a CPU - so this check protects this by checking
if there is already max_io (~64) errored blocks.
Zdenek Kabelac [Fri, 2 Oct 2020 15:18:12 +0000 (17:18 +0200)]
bcache: fix waiting problem for completed IO
Call _wait_all() which does check whether there is still
some pending IO before sleep. Otherwise it may happen
our submitted IO operations have been already dispatched
and this call then endlessly waits for IO which are all done.
This can be reproduced when device returns quickly errors
on write requests.
David Teigland [Thu, 11 Jun 2020 18:33:40 +0000 (13:33 -0500)]
writecache: use two step detach
When detaching a writecache, use the cleaner setting
by default to writeback data prior to suspending the
lv to detach the writecache. This avoids potentially
blocking for a long period with the device suspended.
Detaching a writecache first sets the cleaner option, waits
for a short period of time (less than a second), and checks
if the writecache has quickly become clean. If so, the
writecache is detached immediately. This optimizes the case
where little writeback is needed.
If the writecache does not quickly become clean, then the
detach command leaves the writecache attached with the
cleaner option set. This leaves the LV in the same state
as if the user had set the cleaner option directly with
lvchange --cachesettings cleaner=1 LV.
After leaving the LV with the cleaner option set, the
detach command will wait and watch the writeback progress,
and will finally detach the writecache when the writeback
is finished. The detach command does not need to wait
during the writeback phase, and can be canceled, in which
case the LV will remain with the writecache attached and
the cleaner option set. When the user runs the detach
command again it will complete the detach.
To detach a writecache directly, without using the cleaner
step (which has been the approach previously), add the
option --cachesettings cleaner=0 to the detach command.
David Teigland [Tue, 23 Jun 2020 18:19:11 +0000 (13:19 -0500)]
pvcreate/pvremove: reimplement device checks
Reorganize checking the device args for pvcreate/pvremove
to prepare for future changes. There should be no change
in behavior. Stop the inverted use of process_each_pv,
which pulled in a lot of unnecessary processing, and call
the check functions on each device directly.
Since we detect already transaction if before starting
to build dm tree - this extra check is a duplicate
that would only capture very tiny 'race' and we later
validate transaction_id with suspended snapshot origin.
Introduce structures lv_status_thin_pool and
lv_status_thin (pair to lv_status_cache, lv_status_vdo)
Convert lv_thin_percent() -> lv_thin_status()
and lv_thin_pool_percent() + lv_thin_pool_transaction_id() ->
lv_thin_pool_status().
This way a function user can see not only percentages, but also
other important status info about thin-pool.
TODO:
This patch tries to not change too many other things,
but pool_below_threshold() now uses new thin-pool info to return
failure if thin-pool cannot be actually modified.
This should be handle separately in a better way.
LVM2 is distributed under GPLv2 only. The readline library changed its
license long ago to GPLv3. Given that those licenses are incompatible
and you follow the FSF in their interpretation that dynamically linking
creates a derivative work, distributing LVM2 linked against a current
readline version might be legally problematic.
Add support for the BSD licensed editline library as an alternative for
readline.
David Teigland [Mon, 28 Sep 2020 18:21:44 +0000 (13:21 -0500)]
tests: add case for metadata checksum differences
Cover the case where two copies of metadata have the
same seqno but different checksums. Also elaborate
on an existing fixme in the code for this case, since
we should be doing something better for this case.
This had been uncovering an issue with reopening
fds in readwrite mode.
David Teigland [Fri, 25 Sep 2020 17:29:10 +0000 (12:29 -0500)]
lvpoll: don't use hints
There's a bug when lvpoll attempts to write new hints,
related to the fact that lvpoll does not follow the same
scanning process as standard commands.
Fix by disabling the use of hints in lvpoll. We may want
to renable hints in lvpoll in a way that they can be used,
if valid, but not updated if they don't exist or are invalid.
Improve error response and reporting, when creating thin snapshots.
If the thin pool kernel metadata already have device with ID lvm2
tries to create, give more meanigful error message and also properly
restore transaction id to the value known to thin-pool in this case.
Before it's been possible to divert by one from kernel TID value,
and lvm2 stacked delete message for such thin device.
Shorten running time of the test.
Fix some issues in invoked resizing script to it returns
correct return code and dmeventd can be a little bit quicker
in this test.
When user tries to extend vdo pool - he needs to go always
at least by 1 full VDO slab (defined as vdo_slab_size_mb).
To avoid all trouble around find 'workable' size - lvm2 automatically
increases the passed (or by --use-policies calculated) extension size
(and informs a user about sometimes possibly large increase as slab
size can go upto 32GiB)
With VDO users need to always 'think-big' anyway and expect such
operation to be in GiB domain range.
When thetable reload fails during suspend() - we were only calling
plain resume() - and this will reload only those devices,
which were left suspend, but will not try to restore
metadata state according to lvm2 reverted metadata.
So if we were reloading device tree - we have restored
only top-level LV and rest of reverted device manipulation
were left alone and possibly mismatched what is in committed
metadata.
FIXME: There are several cases were such revert will likely not work
properly anyway as some operation are currenly handled in single commit,
while they need multiple commits, but it's step towards better correctness.
At least we catch there errors now earlier.
When loop can't handle sector-size option - failure caused double fail
for access of unbound variable
Also fix expression for 'rm' and remove loops after loop release.