David Teigland [Tue, 12 Jan 2021 17:53:02 +0000 (11:53 -0600)]
cachevol: allow forced detaching of damaged or invalid cachevol
A cachevol can be forcibly detached when it's missing devices.
Also allow this if it's damaged/invalid and unrepairable.
This would be needed to recover data from the origin LV after
a cachevol is lost or damaged beyond repair.
David Teigland [Tue, 12 Jan 2021 20:41:38 +0000 (14:41 -0600)]
writecache: let block_size setting override device block sizes
In cases where lvconvert does not detect a fs block size on the
device, it falls back to choosing a writecache block size based
on the device's LBS and PBS (tries to match those.)
If the user specifies a writecache block size on the command
line (--cachesettings block_size=4096|512), lvconvert currently
fails and reports an error if the user-specified value does not
match the value lvconvert would have chosen based on LBS and PBS.
The purpose of allowing a user-specified value on the command line
is to override what lvconvert would otherwise do, so change this
to just print a warning that the user value does not match the
value that would be chosen based on the LBS/PBS, and then take
the user-specified value as the writecache block size.
Zdenek Kabelac [Mon, 1 Feb 2021 09:24:19 +0000 (10:24 +0100)]
allocation: report allocation error instead of crash
Current allocation limitation requires to fit metadata/log LV on
a single PV. This is usually not a big problem, but since
thin-pool and cache-pool is using this for allocating extents
for their metadata LVs it might be eventually causing errors
where the remaining free spaces for large metadata size is spread
over several PV.
Zdenek Kabelac [Fri, 29 Jan 2021 22:20:18 +0000 (23:20 +0100)]
pool: limit pmspare to 16GiB
There is not much point to let allocate more then this size
even when i.e. converted LV is bigger then 16GiB (%extent_size)
ATM neither thin-pool nor cache-pool supports bigger metadata.
Zdenek Kabelac [Tue, 12 Jan 2021 16:59:29 +0000 (17:59 +0100)]
thin: improve 16g support for thin pool metadata
Initial support for thin-pool used slightly smaller max size 15.81GiB
for thin-pool metadata. However the real limit later settled at 15.88GiB
(difference is ~64MiB - 16448 4K blocks).
lvm2 could not simply increase the size as it has been using hard cropping
of the loaded metadata device to avoid warnings printing warning of kernel
when the size was bigger (i.e. due to bigger extent_size).
This patch adds the new lvm.conf configurable setting:
allocation/thin_pool_crop_metadata
which defaults to 0 -> no crop of metadata beyond 15.81GiB.
Only user with these sizes of metadata will be affected.
Without cropping lvm2 now limits metadata allocation size to 15.88GiB.
Any space beyond is currently not used by thin-pool target.
Even if i.e. bigger LV is used for metadata via lvconvert,
or allocated bigger because of to large extent size.
With cropping enabled (=1) lvm2 preserves the old limitation
15.81GiB and should allow to work in the evironement with
older lvm2 tools (i.e. older distribution).
Thin-pool metadata with size bigger then 15.81G is now using CROP_METADATA
flag within lvm2 metadata, so older lvm2 recognizes an
incompatible thin-pool and cannot activate such pool!
Users should use uncropped version as it is not suffering
from various issues between thin_repair results and allocated
metadata LV as thin_repair limit is 15.88GiB
Users should use cropping only when really needed!
Patch also better handles resize of thin-pool metadata and prevents resize
beoyond usable size 15.88GiB. Resize beyond 15.81GiB automatically
switches pool to no-crop version. Even with existing bigger thin-pool
metadata command 'lvextend -l+1 vg/pool_tmeta' does the change.
Patch gives better controls 'coverted' metadata LV and
reports less confusing message during conversion.
Patch set also moves the code for updating min/max into pool_manip.c
for better sharing with cache_pool code.
David Teigland [Wed, 27 Jan 2021 22:23:00 +0000 (16:23 -0600)]
writecache: use cleaner message instead of table reload
When detaching writecache, make the first stage send a message
to dm-writecache to set the cleaner option. This is instead of
reloading the dm table with the cleaner option set. Reloading
the table causes udev to process/probe the dm dev, which gets
stalled because of the writeback activity, and the stalled udev
in turn stalls the lvconvert command when it tries to sync with
udev events.
When getting writecache status we do not need to get
open_count or read_head info, which can cause extra steps.
In case legs of a raid0 LV are removed, the lvdisplay command still
reports 'available' though raid0 is not providing any resilience
compared to the other raid levels.
Also lvdisplay does not display '(partial)' in case of missing raid0
legs as oposed to the lvs command.
Enhance lvdisplay to report "NOT available" for any RaidLV type in case
too many legs are inaccessible hence causing data loss. I.e. any leg
for raid0, all for raid1, more than 1 for raid4/5, more than 2 for raid6
and in case of completely lost mirror groups for raid10.
Zdenek Kabelac [Thu, 21 Jan 2021 20:15:33 +0000 (21:15 +0100)]
pvscan: ensure read buffer ends with 0
Read buffersize - 1 so the last byte is always 0.
Simplify init of 0 buffers.
Check snprintf result for error and report internal error as it could
happen only via bad compile parameters.
David Teigland [Wed, 13 Jan 2021 19:33:19 +0000 (13:33 -0600)]
integrity: fix segfault on error path when replacing images
When adding replacement raid+integrity images (lvconvert --repair
after a raid image is lost), various errors can cause the function
to exit with an error. On this exit path, the function attempts
to revert new images that had been created but not yet used. The
cleanup failed to account for the fact that not all images needed
to be reverted.
Zdenek Kabelac [Tue, 12 Jan 2021 14:58:07 +0000 (15:58 +0100)]
alloc: enhance estimation of sufficient_pes_free
Since commit 77fdc17d70e62cab75efaaf0dad02493b948610d always include
log_len size into needed extents - however now we may need sometimes
more extents then necessary - mainly when multiple PVs are involved
into allocation.
Add logs_still_needed into calculation of sufficient_pes_free()
David Teigland [Fri, 11 Dec 2020 21:56:04 +0000 (15:56 -0600)]
partial flag for writecache and integrity
When a writecache sublv or an integrity metadata sublv
are partial (missing a dev), set the partial flag on
the upper level LV also, as is done for other sublvs.
David Teigland [Thu, 10 Dec 2020 21:37:23 +0000 (15:37 -0600)]
writecache: fix uncache for two step detach
Fix the two-step writecache detach in commit c32d7fed4f78b.
In the case of uncache, the cachevol is removed after
detaching the writecache. When the detach is finished
in the second step, the remove must wait until then.
David Teigland [Wed, 9 Dec 2020 23:36:09 +0000 (17:36 -0600)]
cache: activation cache_check on cachevol
When using cache with a cachevol, the cache_check tool was
not being run on the cache metadata during activation.
cache_check clears the needs_check flag in the cache
metadata, so if the flag was set due to an unclean
shutdown, the activation would fail.
Zdenek Kabelac [Mon, 7 Dec 2020 15:16:55 +0000 (16:16 +0100)]
fsadm: fix unbound variable usage
When 'fsadm resize vg/lv' is used without size, it should just
resize filesystem to match device - but since we now check
for unbound variable in bash - the previous usage no longer
works and needs explicit check.
David Teigland [Wed, 11 Nov 2020 21:13:46 +0000 (15:13 -0600)]
tests: integrity mismatch checks for all raid levels
Verify that corruption is corrected for raid levels other
than raid1. For other raid levels, attempt to corrupt the
given file pattern on each underlying device, since we don't
know which device contains the file being corrupted.
This ensures that corruption is actually be introduced
when testing the other raid levels.
Verify that corruption is being corrected by checking
the integritymismatches count is non-zero for the raid LV,
which includes the total from all images (since we don't
know which image will have the corruption.)
David Teigland [Tue, 27 Oct 2020 20:42:08 +0000 (15:42 -0500)]
pvck: fix dev filtering
filters needing io weren't being run because bcache
wasn't set up. Read the first 4k of the device
before doing filtering or reading ondisk structs to
reduce reads.
David Teigland [Tue, 27 Oct 2020 19:28:54 +0000 (14:28 -0500)]
pvck: handle first mda at non-4096 offset
It's possible for a machine with a non-4k page size
to create a PV with an mda_header at an offset other
than 4k. Fix pvck --dump to work with these other
mda offsets. pvck --repair will write a new first
mda at 4096 but lvm with other page sizes will work
with this.
David Teigland [Fri, 23 Oct 2020 18:53:52 +0000 (13:53 -0500)]
pvcreate: clean up opening and filtering of args
The args for pvcreate/pvremove (and vgcreate/vgextend
when applicable) were not efficiently opened, scanned,
and filtered. This change reorganizes the opening
and filtering in the following steps:
- label scan and filter all devs
. open ro
. standard label scan at the start of command
- label scan and filter dev args
. open ro
. uses full md component check
. typically the first scan and filter of pvcreate devs
- close and reopen dev args
. open rw and excl
- repeat label scan and filter dev args
. using reopened rw excl fd
- wipe and write new headers
. using reopened rw excl fd
Zdenek Kabelac [Sun, 25 Oct 2020 19:19:31 +0000 (20:19 +0100)]
fsadm: better check for getsize64 support
Older blockdev tool return failure error code with --help,
and since now the tool abort on command failure, lets
detect missing --getsize64 support directly by running
command and check if it returns something usable.
It's likely very hard to have the system with
such old blockdev tool and newer lvm2 compiled.
Zdenek Kabelac [Fri, 23 Oct 2020 22:29:45 +0000 (00:29 +0200)]
tests: fsadm test continue after fs repair
Test case where filesystem has been corrected via fsck.
In such case fsck returns '1' as success and should be
handled in a same way as '0' since fs is correct.
Zdenek Kabelac [Fri, 23 Oct 2020 23:13:42 +0000 (01:13 +0200)]
fsadm: enhance error handling
Set more secure bash failure mode for pipilines.
Avoid using unset variables.
Enhnace error reporting for failing command.
Avoid using error via 'case..esac || error'.
David Teigland [Wed, 21 Oct 2020 21:21:50 +0000 (16:21 -0500)]
get dev size when setting pv device
In some cases the dev size may not have been read yet
in set_pv_devices(). In this case get the dev size
before comparing the dev size with the pv size.
David Teigland [Thu, 15 Oct 2020 19:11:08 +0000 (14:11 -0500)]
pvscan: rework to improve PVs without metadata
Restructure the pvscan code, and add new temporary files
that list pvids in a VG, used for processing PVs that
have no metadata.
The new temp files, in /run/lvm/pvs_lookup/<vgname>, allow a
proper pvscan --cache to be done on PVs that have no metadata.
pvscan --cache <dev> is only supposed to read <dev>, but when
<dev> has no metadata, this had not been possible. The
command had to fall back to scanning all devices to read all
VG metadata to get the list of all PVIDs needed to check for
a complete VG. Now, the temp file can be used in place of
reading metadata from all PVs on the system.
David Teigland [Thu, 15 Oct 2020 19:05:45 +0000 (14:05 -0500)]
add label_read_pvid
To read the lvm headers and set dev->pvid if the
device is a PV. Difference from label_scan_ functions
is this does not read any vg metadata or add any info
to lvmcache.
David Teigland [Tue, 20 Oct 2020 20:10:08 +0000 (15:10 -0500)]
scanning: improve filtering control
Filtering in label_scan was controlled indirectly by
the fact that bcache was not yet set up when label_scan
first ran. The result is that filters that needed data
would not run and would return -EAGAIN, which would
result in the dev flag FILTER_AFTER_SCAN being set.
After the dev header was read for checking the label,
filters would be rechecked because of FILTER_AFTER_SCAN.
All filters would be checked this time because bcache
was now set up, and the filters needing data would
largely use data already scanned for reading the label.
This design worked but is hard to adjust for future
cases where bcache is already set up.
Replace this method (based on setting up bcache, or not)
with a new cmd flag filter_nodata_only. When this flag
is set filters that need data will not run. This allows
the same label_scan behavior when bcache has been set up.
There are no expected changes in behavior.
Zdenek Kabelac [Tue, 20 Oct 2020 20:26:44 +0000 (22:26 +0200)]
memlock: allocate at most halve of rlimit stack
Touch of stack allocation validated given size with rlimit
and if the reserved_stack was above rlimit, its been completely
ignored - now we will always touch stack upto rlimit/2 size.
Zdenek Kabelac [Tue, 20 Oct 2020 20:22:52 +0000 (22:22 +0200)]
lvmcmdlib: lvm2_init_threaded
cmd context has 'threaded' value that used be set
by clvmd - and allowed proper memory locking management.
Reuse same bit for dmeventd.
Since dmeventd is using 300KiB stack per thread,
we will ignore any user settings for allocation/reserved_stack
until some better solution is find.
This avoids crashing of dmevend when user changes this value
and because in most cases lvm2 should work ok with 64K stack
size, this change should not cause any problems.
Zdenek Kabelac [Mon, 19 Oct 2020 14:43:50 +0000 (16:43 +0200)]
cov: split check for type assignment
Check that type is always defined, if not make it explicit internal
error (although logged as debug - so catched only with proper lvm.conf
setting).
This ensures later type being NULL can't be dereferenced with coredump.