Zdenek Kabelac [Mon, 4 Dec 2017 14:05:44 +0000 (15:05 +0100)]
libdm: watch for failing _info_by_dev
Separate handling of error code from _info_by_dev.
This error can only happeng when we are running out of memory.
In such case there is urgent need to stop any futher proceeding
of command and run to error ASAP.
Avoiding "$(get first_extent_sector "$d")" in the loop
allows the test to succeed in the cluster. Further cluster
analysis needed to get to the core reason.
The lvm2 test suite aims at small test resource footprints
(few PVs, small PV sizes) to run on tmpfs backed loop device.
OTOH, lvconvert-reshape-raid.sh aims to test the maxima of
supported total stripes of 64. This patch adds a prerequisite
conditional to skip tests using more than 14 stripes.
It requires the target version 1.13.1 to avoid deadlocks.
If the recovery of the repleced leg(s) of a RaidLV created without
initial resynchronization (i.e. "lvcreate --nosync ...") got
interrupted, it can't be extended because of the < 100% sync rate.
raid: ignore --stripesize on raid4/5 conversion to 1 stripe
In case caller passes in changed stripe size when reshaping raid4/5
to 1 stripe aiming to convert to raid1 and optionally to linear,
ignore it to prevent data corruption.
Zdenek Kabelac [Thu, 30 Nov 2017 12:26:44 +0000 (13:26 +0100)]
activation: avoid rechecking pvmove node
Use new 3rd. state of trace_pvmove_deps == 2.
In this state we know, we have already seen the node and can skip futher
testing. Remainging value 1 signals we want to track, and value 0
is for ignoring tracking, but node is still checking in this case.
Zdenek Kabelac [Tue, 28 Nov 2017 22:11:20 +0000 (23:11 +0100)]
activation: extend resume validation
Check also all snapshosts when resume is requested,
the origin volume is already resume, but possibly
some subLV or snapshot LV could be suspended if
we are still in critical_section.
Zdenek Kabelac [Fri, 1 Dec 2017 10:50:14 +0000 (11:50 +0100)]
activation: split priority from memory locking
When entering any critical section, lvm2 used to lock process memory
and raised task priority to avoid problem with page swapping and minimize
time of having non-resumed devices in table.
With this patch, memory locking which which is expensive is only used when
entering 'suspending' section as only in this section there is risk
lvm could be suspending a device which later can be needed for paging.
Raised priority is still kept for all section entrances as this is
low-cost operation and may accelerate table resumes - although the real
impact can be still considered later.
Zdenek Kabelac [Wed, 29 Nov 2017 21:19:46 +0000 (22:19 +0100)]
pvmove: add missing segment merging
When pvmove is finished and metadata are updated, the code missed
to merge possible mergable segments - so add explicit merging
call after pvmoved volumes are unlocked.
This avoids weird results where i.e. lvs could have been reporting
non-matching segments as lvs upon metadata read is doing silent segment
merging while dm table left after pvmove was still preserving
non-merged segments.
Zdenek Kabelac [Mon, 27 Nov 2017 09:21:21 +0000 (10:21 +0100)]
cmdline: avoid overrun on very large numbers.
When large size number (>2^31) is given on command line it could be
misdetected and in certain cases lead to wrongly casted number.
So make sure all cases always do set _MAX number in case the value would
not fit within the supported range instead of getting some random value
within the range.
In most cases this was not a problem to detect, but i.e. stripesize
parameter might have been fooled by certain large numbers.
Zdenek Kabelac [Mon, 27 Nov 2017 09:26:35 +0000 (10:26 +0100)]
toollib: improve stripes args reading
Rewrite validation of stripes and stripe_size args into more readable
sequential code.
Extend reading of stripes & stripes_size args so it better knows
defaults for types like striped raid.
TODO: this should really be a value obtained for segtype structure and
all the weird conditions and modification of stripes and stripe_size
around lvm2 code should be dropped.
Zdenek Kabelac [Sat, 25 Nov 2017 23:28:33 +0000 (00:28 +0100)]
pvmove: enhance delayed_resume logic
ATM we want to support delayed resume purely in pvmove case.
So have libdm logic internal to recognize difference beween
pvmove and other targets that do use delayed resume.
This fixes problem introduced with commit aa68b898ff9c51dcb
for mirror-on-mirror or snapshot-on-mirror problem.
TODO: likely added new API call and let libdm user select
delayed nodes explicitely.
Zdenek Kabelac [Fri, 24 Nov 2017 12:55:42 +0000 (13:55 +0100)]
activation: automaticaly discover pvmove holders
When pvmove is finished and does 'suspend/resume' on PVMOVE LV,
on resume path committed metadata are already showing 'standalone'
pvmove LV prepared just for removal.
However code should be able to 'resume' preloaded LV there were
participating in pvmove operation.
Previously this was all done in the 'tools' part of lvm2 code.
So the lvconvert upon pvmove finish had to explicitely call 'resume' on every such LV.
Now 'smarted' activation code is able to deduce and combine all information from
the active dm table and committed metadata so single call resolves
it all in one go.
Internally holders are detected by reading sysfs directory to capture
all needed UUID which are then looked in lvm2 metadata and all such
LVs are automatically collected into dmtree.
Zdenek Kabelac [Fri, 24 Nov 2017 12:58:23 +0000 (13:58 +0100)]
mirror: use lv_update_and_reload_origin
Replace complex code with standard lv_update_and_reload_origin().
Extra suspend should not be necessary.
(If they would be - dependency tree would have bug for fixing).
Zdenek Kabelac [Fri, 24 Nov 2017 12:57:22 +0000 (13:57 +0100)]
libdm: preload propagates delayed resume
Propagate delayed resume at least for preload case in a simple way.
Currently PVMOVE depends on internal logic where 'mirror' with
corelog is 'possible' PVMOVE. In such case resume of 'created'
node is 'delayed'.
This is mostly an ugly internal hack - but for the moment being when we
add propagation for preload - it does work reasonable.
TODO: provide standard API and avoid this internal 'guessing'.
Zdenek Kabelac [Fri, 24 Nov 2017 12:53:02 +0000 (13:53 +0100)]
resume: secure critical section
Only thin-pool with origin_only suspend is allowed to be not suspending anything.
In such case pairing resume will 'decrement' critical section counter.
When sanlock or dlm lock managers return an error number
that we don't recognize, replace it with a generic -ELMERR
which is defined in the set of special lvmlockd error
numbers. Otherwise, an unknown lock manager error number
could be misinterpreted for something else if it happened
to overlap another set of error numbers (which they have
not thus far.)
David Teigland [Wed, 15 Nov 2017 21:34:42 +0000 (15:34 -0600)]
lvmlockd: retry on other sanlock errors
These less common errors returned from sanlock should
also cause sanlock to retry the lock acquire:
- i/o timeout occurs during sanlock_acquire().
other i/o on the same disk as the leases can cause
sanlock i/o timeouts.
- low level disk paxos contention between hosts naturally
causes one host to not acquire the lease. There are a
couple special error numbers associated with these cases
that should just be recognized as a normal failure to
acquire the lease.
Zdenek Kabelac [Wed, 15 Nov 2017 11:08:33 +0000 (12:08 +0100)]
activation: suspend pvmove using lv.
Whenever pvmove tree is going to be generated for suspend
and such LV has a user - use this 'using LV' to generate
correct dm tree holding all components.
Zdenek Kabelac [Fri, 10 Nov 2017 20:15:50 +0000 (21:15 +0100)]
activation: check subLV before skipping resume
LV is asked for resume, and its already resume and tool
is inside 'critical_section()' check if there is any suspended sub LV.
In that case 'resume' operation will not be skipped.
Only lv_committed() now uses vg->vg_committed and it appears redundant
if its contents match the enclosing VG so don't waste cycles creating it
when that's known to be true when no write lock is held so the struct
won't get modified.
- Use 'lvmcache' consistently instead of 'metadata cache'
- Always use 5 characters for source line number
- Remember to convert uuids into printable form
- Use <no name> rather than (null) when VG has no name.
The persistent filter should not be imported by any command that doesn't
use it so take addtional note of REQUIRES_FULL_LABEL_SCAN (for vgrename)
and introduce IGNORE_PERSISTENT_FILTER for vgscan and pvscan.
Zdenek Kabelac [Fri, 10 Nov 2017 12:28:33 +0000 (13:28 +0100)]
reload: do not take backup with suspended devices
If the suspend/resume sequence would leave some device in suspend
for possible later resume, backup cannot be takes (fs holding backups
could be still frozen in critical section())
Zdenek Kabelac [Thu, 9 Nov 2017 12:00:28 +0000 (13:00 +0100)]
raid: protect raid4 activation
Move check for presence of raid4 into the right place
so there is no way how to hit activation of any LV
with raid4 on kernel which does not support it.
raid: reject message for 2-legged raid4/5 -> striped
Commit 763db8aab02d7df551a3e8500d261ef6c9651bdb rejects 2-legged
conversions to striped/raid0 but different messages are displayed
for raid0 or striped. This commit provides the same rejection messages.
raid: reject conversion request to striped/raid0 on 2-legged raid4/5
raid4/5 LVs may only be converted to striped or raid0/raid0_meta
in case they have at least 3 legs. 2-legged raid4/5 are a result
of either converting a raid1 to raid4/5 (takeover) or converting
a raid4/5 with more than 2 legs to raid1 with 2 legs (reshape).
The raid4/5 personalities map those as raid1,
thus reject conversion to striped/raid0.
Eric Ren [Mon, 30 Oct 2017 12:53:20 +0000 (20:53 +0800)]
clvmd: supress ENOENT error on testing connection
In HA cluster, we have "clvm" resource agent to manage clvmd daemon.
The agent invokes clvmd like: "clvmd -T90 -d0", which always prints
a scaring error message:
"""
local socket: connect failed: No such file or directory
"""
When specifed with "-d" option, clvmd tries to check if an instance
of the clvmd daemon is already running through a testing connection.
The connect() will fail with this ENOENT error in such case, so supress
the error message in such case.
TODO: add missing error reaction code - since ofter log_error, program
is not supposed to continue running (log_error() is for reporting
stopping problems).