David Teigland [Fri, 11 Dec 2020 21:56:04 +0000 (15:56 -0600)]
partial flag for writecache and integrity
When a writecache sublv or an integrity metadata sublv
are partial (missing a dev), set the partial flag on
the upper level LV also, as is done for other sublvs.
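A minimal sketch of the propagation described above, written in Python rather than lvm2's C. The LV class, its fields, and is_partial() are hypothetical names used only for illustration; they are not lvm2 structures or functions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LV:
    """Hypothetical stand-in for a logical volume; not an lvm2 type."""
    name: str
    missing_dev: bool = False  # True if this LV itself sits on a missing device
    sublvs: List["LV"] = field(default_factory=list)


def is_partial(lv: LV) -> bool:
    """An LV is partial if it, or any sublv below it, is missing a device."""
    return lv.missing_dev or any(is_partial(sub) for sub in lv.sublvs)


# Illustrative names only: a writecache LV whose cachevol sublv lost its device.
fast = LV("fast_cvol", missing_dev=True)
origin = LV("main_wcorig")
main = LV("main", sublvs=[origin, fast])
```

Here is_partial(main) is true because one sublv is missing a device, while is_partial(origin) stays false.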
David Teigland [Thu, 10 Dec 2020 21:37:23 +0000 (15:37 -0600)]
writecache: fix uncache for two step detach
Fix the two-step writecache detach in commit c32d7fed4f78b.
In the case of uncache, the cachevol is removed after
detaching the writecache. When the detach is finished
in the second step, the remove must wait until then.
David Teigland [Wed, 9 Dec 2020 23:36:09 +0000 (17:36 -0600)]
cache: activation cache_check on cachevol
When using cache with a cachevol, the cache_check tool was
not being run on the cache metadata during activation.
cache_check clears the needs_check flag in the cache
metadata, so if the flag was set due to an unclean
shutdown, the activation would fail.
Zdenek Kabelac [Mon, 7 Dec 2020 15:16:55 +0000 (16:16 +0100)]
fsadm: fix unbound variable usage
When 'fsadm resize vg/lv' is used without a size, it should just
resize the filesystem to match the device. Since we now check
for unbound variables in bash, the previous usage no longer
works and needs an explicit check.
David Teigland [Wed, 11 Nov 2020 21:13:46 +0000 (15:13 -0600)]
tests: integrity mismatch checks for all raid levels
Verify that corruption is corrected for raid levels other
than raid1. For other raid levels, attempt to corrupt the
given file pattern on each underlying device, since we don't
know which device contains the file being corrupted.
This ensures that corruption is actually introduced
when testing the other raid levels.
Verify that corruption is being corrected by checking
the integritymismatches count is non-zero for the raid LV,
which includes the total from all images (since we don't
know which image will have the corruption.)
David Teigland [Tue, 27 Oct 2020 20:42:08 +0000 (15:42 -0500)]
pvck: fix dev filtering
filters needing io weren't being run because bcache
wasn't set up. Read the first 4k of the device
before doing filtering or reading ondisk structs to
reduce reads.
David Teigland [Tue, 27 Oct 2020 19:28:54 +0000 (14:28 -0500)]
pvck: handle first mda at non-4096 offset
It's possible for a machine with a non-4k page size
to create a PV with an mda_header at an offset other
than 4k. Fix pvck --dump to work with these other
mda offsets. pvck --repair will write a new first
mda at 4096 but lvm with other page sizes will work
with this.
David Teigland [Fri, 23 Oct 2020 18:53:52 +0000 (13:53 -0500)]
pvcreate: clean up opening and filtering of args
The args for pvcreate/pvremove (and vgcreate/vgextend
when applicable) were not efficiently opened, scanned,
and filtered. This change reorganizes the opening
and filtering in the following steps:
- label scan and filter all devs
. open ro
. standard label scan at the start of command
- label scan and filter dev args
. open ro
. uses full md component check
. typically the first scan and filter of pvcreate devs
- close and reopen dev args
. open rw and excl
- repeat label scan and filter dev args
. using reopened rw excl fd
- wipe and write new headers
. using reopened rw excl fd
Zdenek Kabelac [Sun, 25 Oct 2020 19:19:31 +0000 (20:19 +0100)]
fsadm: better check for getsize64 support
Older blockdev tools return a failure exit code with --help,
and since fsadm now aborts on command failure, detect
missing --getsize64 support directly by running the
command and checking whether it returns something usable.
It is unlikely that a system would have such an old
blockdev tool together with a newer compiled lvm2.
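The idea of probing for support by checking the command's output can be sketched as follows. This is an illustration in Python, not the fsadm shell code; parse_getsize64() is a hypothetical helper, and treating any non-numeric output as "unsupported" is an assumption.

```python
from typing import Optional


def parse_getsize64(output: str) -> Optional[int]:
    """Return the size in bytes if the output looks usable, else None.

    "Usable" is assumed here to mean a plain decimal number.
    """
    text = output.strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return None
```

A caller would run something like `blockdev --getsize64 <dev>`, pass its standard output to this helper, and fall back to another method when the result is None.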
Zdenek Kabelac [Fri, 23 Oct 2020 22:29:45 +0000 (00:29 +0200)]
tests: fsadm test continue after fs repair
Test the case where the filesystem has been corrected via fsck.
In that case fsck returns '1' as success, which should be
handled in the same way as '0' since the fs is correct.
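The exit code handling can be sketched in a few lines. This is an illustration in Python, not the fsadm code; fsck_succeeded() is a hypothetical helper name.

```python
def fsck_succeeded(exit_code: int) -> bool:
    """Treat 0 (no errors) and 1 (errors corrected) as success."""
    return exit_code in (0, 1)
```

Any other exit code, for example 4 (errors left uncorrected), is still treated as a failure.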
Zdenek Kabelac [Fri, 23 Oct 2020 23:13:42 +0000 (01:13 +0200)]
fsadm: enhance error handling
Set more secure bash failure mode for pipelines.
Avoid using unset variables.
Enhance error reporting for failing commands.
Avoid using error via 'case..esac || error'.
David Teigland [Wed, 21 Oct 2020 21:21:50 +0000 (16:21 -0500)]
get dev size when setting pv device
In some cases the dev size may not have been read yet
in set_pv_devices(). In this case get the dev size
before comparing the dev size with the pv size.
David Teigland [Thu, 15 Oct 2020 19:11:08 +0000 (14:11 -0500)]
pvscan: rework to improve PVs without metadata
Restructure the pvscan code, and add new temporary files
that list pvids in a VG, used for processing PVs that
have no metadata.
The new temp files, in /run/lvm/pvs_lookup/<vgname>, allow a
proper pvscan --cache to be done on PVs that have no metadata.
pvscan --cache <dev> is only supposed to read <dev>, but when
<dev> has no metadata, this had not been possible. The
command had to fall back to scanning all devices to read all
VG metadata to get the list of all PVIDs needed to check for
a complete VG. Now, the temp file can be used in place of
reading metadata from all PVs on the system.
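A rough sketch of the completeness check that the temp file makes possible. The commit does not describe the file format, so the one-PVID-per-line layout below is an assumption, and read_pvids() and vg_is_complete() are hypothetical helpers, not lvm2 functions.

```python
from typing import Iterable, Set


def read_pvids(lookup_lines: Iterable[str]) -> Set[str]:
    """Collect PVIDs, assuming (hypothetically) one PVID per non-empty line."""
    return {line.strip() for line in lookup_lines if line.strip()}


def vg_is_complete(lookup_lines: Iterable[str], online_pvids: Set[str]) -> bool:
    """The VG is complete when every listed PVID has been seen online."""
    wanted = read_pvids(lookup_lines)
    return bool(wanted) and wanted <= online_pvids
```

With a list like this available, the check needs only the set of PVIDs already seen, rather than metadata read from every PV on the system.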
David Teigland [Thu, 15 Oct 2020 19:05:45 +0000 (14:05 -0500)]
add label_read_pvid
To read the lvm headers and set dev->pvid if the
device is a PV. Difference from label_scan_ functions
is this does not read any vg metadata or add any info
to lvmcache.
David Teigland [Tue, 20 Oct 2020 20:10:08 +0000 (15:10 -0500)]
scanning: improve filtering control
Filtering in label_scan was controlled indirectly by
the fact that bcache was not yet set up when label_scan
first ran. The result is that filters that needed data
would not run and would return -EAGAIN, which would
result in the dev flag FILTER_AFTER_SCAN being set.
After the dev header was read for checking the label,
filters would be rechecked because of FILTER_AFTER_SCAN.
All filters would be checked this time because bcache
was now set up, and the filters needing data would
largely use data already scanned for reading the label.
This design worked but is hard to adjust for future
cases where bcache is already set up.
Replace this method (based on setting up bcache, or not)
with a new cmd flag filter_nodata_only. When this flag
is set filters that need data will not run. This allows
the same label_scan behavior when bcache has been set up.
There are no expected changes in behavior.
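A toy model of the flag, in Python rather than lvm2's C. The Filter tuple, run_filters(), and the two example filters are hypothetical. The "deferred" result stands in for the FILTER_AFTER_SCAN dev flag, and the assumption that a skipped data filter still marks the device for a later recheck follows from the "same behavior" statement above.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Filter(NamedTuple):
    """Hypothetical filter description; not an lvm2 structure."""
    name: str
    needs_data: bool
    passes: Callable[[Optional[bytes]], bool]


def run_filters(filters: List[Filter], data: Optional[bytes],
                nodata_only: bool) -> Tuple[bool, bool]:
    """Return (passed, deferred).

    deferred is True when a data-needing filter was skipped, so the
    device must be filtered again once its data has been read.
    """
    deferred = False
    for f in filters:
        if f.needs_data and nodata_only:
            deferred = True  # plays the role of FILTER_AFTER_SCAN
            continue
        if not f.passes(data):
            return False, deferred
    return True, deferred


# Illustrative filters: one needs no data, one inspects the device header.
filters = [
    Filter("always_ok", False, lambda data: True),
    Filter("no_foreign_signature", True,
           lambda data: data is not None and not data.startswith(b"FOREIGN")),
]
```

The first pass runs with nodata_only=True and defers the data filter; the second pass, after the header has been read, runs every filter.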
Zdenek Kabelac [Tue, 20 Oct 2020 20:26:44 +0000 (22:26 +0200)]
memlock: allocate at most half of rlimit stack
Touching the stack validated the given size against rlimit,
and if reserved_stack was above rlimit, it was completely
ignored. Now we always touch the stack up to rlimit/2 in size.
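The clamping rule can be sketched as below. This is an illustration in Python, not the lvm2 C code; stack_touch_size() is a hypothetical helper, and returning the requested size unchanged when the limit is unlimited is an assumption.

```python
import resource


def stack_touch_size(reserved_stack: int, rlimit_stack: int) -> int:
    """Clamp the requested stack reservation to at most half the stack rlimit."""
    if rlimit_stack == resource.RLIM_INFINITY:
        # Assumption: with no limit there is nothing to clamp against.
        return reserved_stack
    return min(reserved_stack, rlimit_stack // 2)
```

A caller could obtain the limit with resource.getrlimit(resource.RLIMIT_STACK) and pass the soft value as rlimit_stack.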
Zdenek Kabelac [Tue, 20 Oct 2020 20:22:52 +0000 (22:22 +0200)]
lvmcmdlib: lvm2_init_threaded
cmd context has a 'threaded' value that used to be set
by clvmd and allowed proper memory locking management.
Reuse the same bit for dmeventd.
Since dmeventd is using a 300KiB stack per thread,
we will ignore any user settings for allocation/reserved_stack
until some better solution is found.
This avoids crashing dmeventd when the user changes this value,
and because in most cases lvm2 should work ok with a 64K stack
size, this change should not cause any problems.
Zdenek Kabelac [Mon, 19 Oct 2020 14:43:50 +0000 (16:43 +0200)]
cov: split check for type assignment
Check that type is always defined; if not, make it an explicit internal
error (although logged as debug, so caught only with proper lvm.conf
setting).
This ensures a NULL type can't later be dereferenced, causing a coredump.
Zdenek Kabelac [Fri, 16 Oct 2020 18:58:58 +0000 (20:58 +0200)]
dm: remove created devices on error path
DM tree keeps track of created devices while preloading a device tree.
When a failure occurs during such a preload, it will now try to remove
all created and preloaded devices. This makes it easier to maintain
stacking of devices, since we do not need to check in depth for the
existence of all possible created devices during the failure.
Zdenek Kabelac [Fri, 2 Oct 2020 17:19:30 +0000 (19:19 +0200)]
tests: aux hides zero and error device
When ERR_DEV and ZERO_DEV are used, they are automatically
taken down when the last user no longer needs them,
so hide them from 'forgotten' device check.
Zdenek Kabelac [Fri, 2 Oct 2020 17:17:36 +0000 (19:17 +0200)]
tests: rename shown debug trace
As there could be several invocations of stacktrace, avoid
repeatedly displaying logs from commands.
So after the first display, rename debug.log* -> debug_log
so the file can still remain for reading in the test dir.
Zdenek Kabelac [Fri, 2 Oct 2020 17:32:28 +0000 (19:32 +0200)]
wipe_lv: use BLKZEROOUT when possible
Since the BLKZEROOUT ioctl should supposedly be the fastest
way to clear a block device, start using this ioctl
for zeroing a device. We typically zero only a
small portion of a device (8KiB). However, since we now
also zero metadata devices, in the case
of e.g. thin-pool metadata this can go up to ~16GiB,
and here the performance difference starts to be noticeable.
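For reference, a minimal sketch of issuing BLKZEROOUT from Python rather than from lvm2's C code. The ioctl number and the two-element uint64 {start, length} argument come from the Linux <linux/fs.h> interface; zeroout_arg() and zero_range() are hypothetical helper names, and the 512-byte alignment check is an assumption (the real requirement is the device's logical block size).

```python
import fcntl
import struct

BLKZEROOUT = 0x127F  # _IO(0x12, 127) in <linux/fs.h>


def zeroout_arg(start: int, length: int, sector: int = 512) -> bytes:
    """Pack the uint64_t range[2] = {start, length} argument for BLKZEROOUT."""
    if start % sector or length % sector:
        # Assumption: 512-byte sectors; real devices may need larger alignment.
        raise ValueError("start and length must be sector aligned")
    return struct.pack("=QQ", start, length)


def zero_range(fd: int, start: int, length: int) -> None:
    """Ask the kernel to zero [start, start + length) on an open block device."""
    fcntl.ioctl(fd, BLKZEROOUT, zeroout_arg(start, length))
```

zero_range() needs a file descriptor opened for writing on a block device, so only the argument packing can be exercised without one.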
Zdenek Kabelac [Fri, 2 Oct 2020 17:26:58 +0000 (19:26 +0200)]
wipe_lv: drop label_scan_invalidate on error path
Since dev_set_bytes() now closes dev on the error path itself,
remove this unneeded call now (introduced a few commits back
in history, thus removing the comment from WHATS_NEW).
Zdenek Kabelac [Fri, 2 Oct 2020 15:16:14 +0000 (17:16 +0200)]
bcache: support interrupts when waiting on IO
lvm2 normally blocks signals during protected
phases where it does not want to be interrupted.
Support interruptible processing when allowed
(in the section between sigint_allow() ... sigint_restore())
and let 'io_getevents()' finish with EINTR.
Zdenek Kabelac [Fri, 2 Oct 2020 15:42:50 +0000 (17:42 +0200)]
bcache: fix busy loop with too many errors
When bcache tries to write data to a faulty device,
it may run out of cache blocks and then just busy-loop
on a CPU. Protect against this by checking
whether there are already max_io (~64) errored blocks.
Zdenek Kabelac [Fri, 2 Oct 2020 15:18:12 +0000 (17:18 +0200)]
bcache: fix waiting problem for completed IO
Call _wait_all(), which checks whether there is still
some pending IO before sleeping. Otherwise it may happen that
our submitted IO operations have already been dispatched
and this call then waits endlessly for IO that is all done.
This can be reproduced when a device quickly returns errors
on write requests.
David Teigland [Thu, 11 Jun 2020 18:33:40 +0000 (13:33 -0500)]
writecache: use two step detach
When detaching a writecache, use the cleaner setting
by default to writeback data prior to suspending the
lv to detach the writecache. This avoids potentially
blocking for a long period with the device suspended.
Detaching a writecache first sets the cleaner option, waits
for a short period of time (less than a second), and checks
if the writecache has quickly become clean. If so, the
writecache is detached immediately. This optimizes the case
where little writeback is needed.
If the writecache does not quickly become clean, then the
detach command leaves the writecache attached with the
cleaner option set. This leaves the LV in the same state
as if the user had set the cleaner option directly with
lvchange --cachesettings cleaner=1 LV.
After leaving the LV with the cleaner option set, the
detach command will wait and watch the writeback progress,
and will finally detach the writecache when the writeback
is finished. The detach command does not need to wait
during the writeback phase, and can be canceled, in which
case the LV will remain with the writecache attached and
the cleaner option set. When the user runs the detach
command again it will complete the detach.
To detach a writecache directly, without using the cleaner
step (which has been the approach previously), add the
option --cachesettings cleaner=0 to the detach command.
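The control flow described above can be sketched as follows. This is an illustrative Python model, not the lvm2 implementation: set_cleaner, is_clean, and detach are hypothetical callbacks, and the quick_wait and poll intervals are assumed values.

```python
import time
from typing import Callable


def detach_writecache(set_cleaner: Callable[[], None],
                      is_clean: Callable[[], bool],
                      detach: Callable[[], None],
                      use_cleaner: bool = True,
                      quick_wait: float = 0.5,
                      poll: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Two-step detach: set cleaner, wait for writeback, then detach."""
    if not use_cleaner:
        detach()  # direct detach, as with --cachesettings cleaner=0
        return
    set_cleaner()      # step 1: start writing back cached data
    sleep(quick_wait)  # short wait; a nearly clean cache finishes here
    while not is_clean():
        sleep(poll)    # interrupting here leaves the cleaner option set
    detach()           # step 2: cache is clean, detach it
```

If the loop is interrupted, no detach happens and the cleaner setting stays in place, which matches the described behavior of running the detach command again to finish.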
David Teigland [Tue, 23 Jun 2020 18:19:11 +0000 (13:19 -0500)]
pvcreate/pvremove: reimplement device checks
Reorganize checking the device args for pvcreate/pvremove
to prepare for future changes. There should be no change
in behavior. Stop the inverted use of process_each_pv,
which pulled in a lot of unnecessary processing, and call
the check functions on each device directly.
Since we already detect the transaction_id before starting
to build the dm tree, this extra check is a duplicate
that would only capture a very tiny 'race', and we later
validate transaction_id with suspended snapshot origin.
Introduce structures lv_status_thin_pool and
lv_status_thin (pair to lv_status_cache, lv_status_vdo)
Convert lv_thin_percent() -> lv_thin_status()
and lv_thin_pool_percent() + lv_thin_pool_transaction_id() ->
lv_thin_pool_status().
This way a function user can see not only percentages, but also
other important status info about thin-pool.
TODO:
This patch tries to not change too many other things,
but pool_below_threshold() now uses new thin-pool info to return
failure if thin-pool cannot be actually modified.
This should be handled separately in a better way.