Zdenek Kabelac [Fri, 10 Nov 2017 12:28:33 +0000 (13:28 +0100)]
reload: do not take backup with suspended devices
If the suspend/resume sequence would leave some device suspended
for a possible later resume, a backup cannot be taken (the fs holding
backups could still be frozen inside critical_section()).
Zdenek Kabelac [Thu, 9 Nov 2017 12:00:28 +0000 (13:00 +0100)]
raid: protect raid4 activation
Move the check for the presence of raid4 into the right place,
so there is no way to activate any LV with raid4 on a kernel
which does not support it.
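For example (hypothetical names), on a kernel whose dm-raid target
lacks raid4 support, an activation attempt now fails cleanly:
  lvchange -ay vg/lv_raid4   # rejected instead of loading an unsupported table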
raid: reject message for 2-legged raid4/5 -> striped
Commit 763db8aab02d7df551a3e8500d261ef6c9651bdb rejects 2-legged
conversions to striped/raid0, but displays different messages for
raid0 and striped. This commit makes the rejection messages consistent.
raid: reject conversion request to striped/raid0 on 2-legged raid4/5
raid4/5 LVs may only be converted to striped or raid0/raid0_meta
when they have at least 3 legs. 2-legged raid4/5 LVs are the result
of either converting a raid1 to raid4/5 (takeover) or converting
a raid4/5 with more than 2 legs to a raid1 with 2 legs (reshape).
The raid4/5 personalities map those as raid1,
so conversion to striped/raid0 is rejected.
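As a sketch with hypothetical names, the rejected sequence is roughly:
  lvconvert --type raid4 vg/lv     # raid1 -> raid4 takeover, still 2 legs
  lvconvert --type striped vg/lv   # rejected: a 2-legged raid4 maps as raid1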
Eric Ren [Mon, 30 Oct 2017 12:53:20 +0000 (20:53 +0800)]
clvmd: suppress ENOENT error on testing connection
In an HA cluster, we have a "clvm" resource agent to manage the clvmd
daemon. The agent invokes clvmd like: "clvmd -T90 -d0", which always
prints a scary error message:
"""
local socket: connect failed: No such file or directory
"""
When the "-d" option is specified, clvmd makes a test connection to
check whether an instance of the clvmd daemon is already running.
The connect() fails with ENOENT in that case, so suppress the error
message there.
TODO: add the missing error reaction code - after log_error(), the
program is not supposed to continue running (log_error() is for
reporting problems that stop the program).
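As a minimal sketch of the probe the agent relies on (the local socket
path may differ per build):
  clvmd -T90 -d0
  # with no daemon running, connect() on the local socket returns ENOENT;
  # this is expected and its error message is now suppressed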
testsuite: Fix problem when checking RAID4/5/6 for mismatches.
The lvchange-raid[456].sh test checks that mismatches can be detected
properly. It does this by writing garbage to the back half of one of
the legs directly. When performing a "check" or "repair" of mismatches,
MD does a good job going directly to disk and bypassing any buffers that
may prevent it from seeing mismatches. However, in the case of RAID4/5/6
we have the stripe cache to contend with and this is not bypassed. Thus,
mismatches which have /just/ happened to an area that now populates the
stripe cache may be overlooked. This isn't a serious issue, however,
because the stripe cache is short-lived and reasonably small. So, while
there may be a small window of time between the disk changing underneath
the RAID array and when you run a "check"/"repair" - causing a mismatch
to be missed - that would be no worse than if a user had simply run a
"check" a few seconds before the disk changed. IOW, it simply isn't worth
making a fuss over dropping the stripe cache before beginning a "check" or
"repair" (which we actually did attempt to do a while back).
So, to get the test running smoothly, we simply deactivate and reactivate
the LV to force the stripe cache to be dropped and then proceed. We could
just as easily wait a few seconds for the stripe cache to empty also.
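In test terms (hypothetical names), the sequence is roughly:
  lvchange -an $vg/$lv                 # deactivate to drop the stripe cache
  lvchange -ay $vg/$lv
  lvchange --syncaction check $vg/$lv  # now sees the on-disk mismatches
  lvs -o raid_mismatch_count $vg/$lv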
testsuite: Add and document a 'should' for "idle" -> "recover" RAID test
When a "recover" is just starting for a RAID LV, it is possible to get
"idle" for the sync action if the status is issued quickly enough. This
is fine, the MD thread just hasn't gotten things going yet. However,
the /need/ for a "recover" should be marked in md->recovery and it would
be simple enough to fix the kernel so this doesn't happen. May eventually
want a separate bug for this, but for now it fits with RHBZ 1507719.
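A rough sketch of the racy check (hypothetical names; 'should' marks a
test that is allowed to fail):
  lvconvert -m 1 $vg/$lv   # starts a "recover"
  should check lv_field $vg/$lv raid_sync_action "recover"   # may still be "idle"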
Bastian Blank [Wed, 1 Nov 2017 14:25:54 +0000 (15:25 +0100)]
systemd: remove Install sections from socket-activated services
We have always preferred and recommended socket activation for our
services, so remove the Install section from the related .service
units, where it is unused in this case, and keep the Install section
only in the associated .socket units.
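With socket activation, only the socket unit gets enabled; for example
(one of the affected units):
  systemctl enable lvm2-lvmetad.socket   # not lvm2-lvmetad.service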
Zdenek Kabelac [Tue, 31 Oct 2017 23:51:39 +0000 (00:51 +0100)]
lv_manip: hide layered LV temporarily
Since vg_validate() now rejects LVs without segments and
insert_layer_for_segments_on_pv() receives a just-created
'layer_lv' without a segment, the LV needs to be hidden
from vg->lvs while _align_segment_boundary_to_pe_range() runs,
as this function calls lv_validate() and now requires
the VG to be consistent. The LV is then put back into vg->lvs.
Jonathan Brassow [Tue, 31 Oct 2017 02:58:38 +0000 (21:58 -0500)]
test: clean-up failing test case and document 'should' cases
There are two known bugs in the lvconvert-raid-status-validation.sh
test. The first one I consider to be more of an annoyance (1507719).
The second one I consider to be more serious (1507729).
RHBZ 1507719 simply documents the fact that the three RAID status
fields may not always be coherent due to the way they are set and
unset when the MD thread is shutting down and starting up. For
example, the sync ratio may be 100% but the sync action may not
yet have switched to "idle" and the health characters may not yet
all be 'A's (i.e. the devices set to InSync).
RHBZ 1507729 is more serious. The sync ratio can be 100% for a
short period of time after upconverting linear -> RAID1. It is
reset to 0 once the MD sync thread gets to work on it. It does
this because, technically, the array /is/ in-sync if the new
devices are excluded - i.e. the data is 100% available and
consistent. I'm not sure what to do about this problem, but we'd
much rather not have this state that looks exactly like the
end of the process when the sync ratio is 100% because the
"recover" process finished, but the sync action and health
characters haven't been updated yet. Put simply, the problem
is that we can't tell if a sync is starting or finished based
on the status output.
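The ambiguous window can be observed with something like (hypothetical
names):
  lvconvert -m 1 $vg/$lv   # linear -> RAID1 upconvert starts a "recover"
  lvs -o sync_percent,raid_sync_action,lv_health_status $vg/$lv
  # may briefly report 100% before the MD sync thread resets it to 0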
Since 4fa5add6b1bd4d7f7313f2950021a09e4130ad08 ("pvcreate: Wipe cached
bootloaderarea when wiping label.") label_remove is responsible
for the lvmcache_del. (toollib and liblvm need fixing to share
the code.)
Zdenek Kabelac [Mon, 30 Oct 2017 10:00:31 +0000 (11:00 +0100)]
thin: fix merging messages
Correct the reported message when a thin snapshot has already been
merged, so lvm2 no longer reports "Merging of snapshot X will
occur..." (even with swapped names).
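E.g. when a merge of an open origin was postponed and the command is
repeated (hypothetical names; a sketch, not the exact test case):
  lvconvert --merge vg/snap   # merge postponed, origin in use
  lvconvert --merge vg/snap   # now reports the correct, already-merging state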
When an ignored metadata area gets flagged for use again, make sure the
code doesn't try to parse its old metadata. Firstly by trying to detect
this situation and skipping the read (while still remembering the
position reached in the circular buffer), and secondly by clearing the
invalid live metadata location on disk as a precaution when subsequently
writing out the precommitted metadata.
Problems showed up when a metadata area in one VG got moved to
another VG in ignored state (still holding metadata for the original
VG) and then later got brought into use in the new VG - only the header
should be read in this case, not any of the metadata content.
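The scenario can be sketched roughly as (hypothetical devices):
  pvchange --metadataignore y /dev/sdb   # mda keeps stale vg1 metadata
  vgreduce vg1 /dev/sdb
  vgextend vg2 /dev/sdb                  # ignored mda moves to vg2 as-is
  pvchange --metadataignore n /dev/sdb   # flagged for use again in vg2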
vgsplit shares the vg_rename code so that must only set the PV_MOVED_VG
flag introduced in commit 486ed108481cfbe4336f0fa590cbae251188ecc6
("vgmerge: Fix intermediate metadata corruption") on PVs that moved.
When converting from one raid level to another (takeover), no changes
of stripes or stripesize can be requested, because those
are subject to reshaping. I.e. the process requires a
takeover first, followed by a second request for raid algorithm,
stripe or stripesize changes.
Ignore any such related change requests, display warnings
and proceed with the takeover.
Without this patch, a takeover requesting a
stripesize change causes data corruption!
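Roughly, with hypothetical names, instead of the single corrupting step:
  lvconvert --type raid6 --stripesize 256k vg/lv   # stripesize now ignored, with a warning
the change must be requested in two steps:
  lvconvert --type raid6 vg/lv        # takeover first
  lvconvert --stripesize 256k vg/lv   # then the reshape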
Zdenek Kabelac [Thu, 26 Oct 2017 11:58:43 +0000 (13:58 +0200)]
tests: allow override of LVM_LOG_FILE_MAX_LINES
Just like with other vars, support this:
make check_local T=xyz LVM_LOG_FILE_MAX_LINES=10000000
This allows the existing line limit to be overridden easily.
Also increase the limit on the size of logs per command, since some of
our commands are becoming very verbose...
Zdenek Kabelac [Thu, 26 Oct 2017 11:55:36 +0000 (13:55 +0200)]
log: better message when reached log limit
Add an explanatory message when a command is aborted because it
reached the configured line count limit (LVM_LOG_FILE_MAX_LINES)
for logging (used mainly with testing).
Zdenek Kabelac [Thu, 26 Oct 2017 12:02:44 +0000 (14:02 +0200)]
WHATS_NEW: missed
The last patch failed to mention that we improved/fixed the generated
paths in units and init.d shell scripts when lvm2 was plainly
configured with just e.g. --prefix.
Note: some distros might have fully specified --sbindir and
--usrsbindir - thus they were not seeing problems in the generated
paths.
Zdenek Kabelac [Tue, 24 Oct 2017 14:34:25 +0000 (16:34 +0200)]
snapshot: improve validation
Do not allow taking a snapshot of a mirror/raid leg, log or metadata LV.
This was actually never supported, but the user was able to create one,
and this put the device stack into a state that is hard to fix
(manual work is needed).
This prevents such a creation from passing.
Also improve validation when recreating a snapshot volume type
from origin and COW volumes.
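E.g. this is now rejected (hypothetical names; _rimage_0 is a raid leg
sub-LV):
  lvcreate -s -L 100M -n snap vg/lv_rimage_0   # snapshot of a raid leg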
Zdenek Kabelac [Tue, 24 Oct 2017 12:56:42 +0000 (14:56 +0200)]
lvconvert: fix extraction of vgname
Correct the function for extracting the vgname out of lvconvert
parameters.
Avoid repeating some checks.
Add code to handle generic options which may provide a vgname in their
argument, and compare them all so they match a single vgname (otherwise
it's an error).
Extract the default (envvar) vgname only when neither a positional nor
an optional vgname is found.
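For example (hypothetical names), every supplied vgname must agree:
  lvconvert --type thin-pool --poolmetadata vg/meta vg/pool    # OK, both "vg"
  lvconvert --type thin-pool --poolmetadata foo/meta vg/pool   # error: vgnames differ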
Ondrej Kozina [Tue, 24 Oct 2017 09:55:14 +0000 (11:55 +0200)]
test: add regression test for fsadm bug
The bug in fsadm's LUKS grow/shrink decision was
masked by the fact that the default LVM2 extent size
was larger than the default LUKS1 data offset for the dm-crypt
mapping. The new test addresses this bug.
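The test sets up something along these lines (hypothetical names),
where the LUKS1 data offset now matters for the grow/shrink decision:
  lvcreate -L 16M -n lv vg
  cryptsetup luksFormat /dev/vg/lv
  cryptsetup open /dev/vg/lv crypt_lv
  mkfs.ext4 /dev/mapper/crypt_lv
  fsadm resize /dev/vg/lv 32M   # must account for the dm-crypt offset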
Zdenek Kabelac [Mon, 23 Oct 2017 09:36:51 +0000 (11:36 +0200)]
lvcreate: skip checking for name restriction for caching
lvcreate supports a 'conversion' when caching an LV.
This normally worked fine; however, when the passed LV was a
thin-pool's data LV with the suffix _tdata, we failed too early.
The easiest fix looks to be dropping the validation of the name when
the caching type is selected - such a name check will happen later,
once the VG is opened again, and will properly detect whether an LV
with a protected name already exists and can be converted,
or whether it will be rejected as an ambiguous operation requiring the
user to specify --type cache | --type cache-pool.
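E.g. caching a thin-pool's data sub-LV no longer fails early on the
reserved name (hypothetical names):
  lvcreate --type cache -L 1G vg/pool_tdata   # validated later, with the VG open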
device: Separate errors for dev not found and filtered.
Replaced the confusing device error message "not found (or ignored by
filtering)" with either "not found" or "excluded by a filter".
(Later we should be able to say which filter.)
Zdenek Kabelac [Wed, 11 Oct 2017 10:41:28 +0000 (12:41 +0200)]
activation: fix activation lock
The activation lock's primary purpose is to serialize the locking of an
individual LV when there is no other mechanism protecting against
parallel execution.
However, when an activated LV is composed of several other LVs,
no one should be able to manipulate those LVs either.
This patch adds a very 'naive' global VG activation lock for this case.
In the future we may introduce a smarter function detecting minimal
closed graph components, if this turns out to be a bottleneck.
The patch checks whether the VG write lock is held - in that case we
do not need any more locking, as the command has exclusive access to
the VG.
When we have a clustered VG and we are activating an LV which does not
need other LVs, we also do not need any more locks.
In all other cases take the respective lock - for a single LV use the
lvid, for complex LVs use the vgname.