lvconvert: position dedicated parity device in raid4 conversions porperly
On conversions between striped/raid0* and raid4, the kernel expects
the dedicated raid4 parity SubLVs in the first segment area rather than
in the last it's been allocated to, thus the data mapping ain't proper.
Enhance lvconvert (lib/metadata/raid_manip.c) to shift the dedicated
parity SubLVs on conversions from striped/raid0* to raid4 and vice-versa.
In case of raid0_meta -> raid4 where the MD raid0 personality already has
stored RAID array device positions in the superblocks, the MetaLVs have to
be cleared so that the kernel doesn't fail validating the array positions
after lvm has shifted them up by one.
Add more tests to lvconvert-raid-takeover.sh including one to check for
mapping flaws by converting a created raid4 with filesystem -> striped
and fsck it.
Whilst on it:
- add missing direct striped -> raid4 conversion to the takeover array
to avoid an intermim conversion from striped -> raid0*
- clean up the takeover array
- allow lvconvert to actually call lv_raid_convert() on all takeover requests
in order to check parameters and display messages provided by takeover
functions rather than just "...not supported" from within lvconvert
- fix a typo
dmsetup: Produce partial output if dev disappears.
If a device disappears after obtaining the list of devices but before
processing it as a member of that list, dmsetup exits with a failure code.
Most commands still produce what output they can in these circumstances,
but 'ls --tree' and 'info -c' with fields depending on device dependencies
didn't. Change this.
Zdenek Kabelac [Wed, 12 Oct 2016 09:12:48 +0000 (11:12 +0200)]
lvconvert: still use strcmp for now
Keep for now function logic making its decision on string content.
We need bigger patch converting all things to bit-checks later.
This needs however bigger refactoring.
So this commit reverts some changes from: c8b6c130158e1cc1a639afc21f5fd6306286fdb3
Commit 088b3d036a73a7d947f6e9b12e8fad8a9f40d97f allowed repair on cache origin RAID LVs
and restricted lvconvert actions on RAID SubLVs to change number of mirrors, repair,
replace and type changes in order to avoid unsuitable coversions on them.
This introduced a regression prohibiting --splitmirrors on any RAID SubLVs
(e.g. of cache or thin LVs; lvconvert-{cache,thin}-raid.sh tests failing).
introduced an abort if lvm logs too much. In the case of utilizing
lvm shell this is a pretty normal occurance, so we will disable
when we use lvm shell.
Zdenek Kabelac [Tue, 11 Oct 2016 11:19:44 +0000 (13:19 +0200)]
cleanup: use already set values
When we have already decoded arg_is_set into a local var
or already set segment type - already use these
values instead of repeating calls and string checks.
Zdenek Kabelac [Tue, 11 Oct 2016 09:59:43 +0000 (11:59 +0200)]
lvconvert: fix error value
Seems some error path where not converted to 'new' ECMD return value.
Fix them to always 'goto out'.
Also drop unneeded 'ret = 0' when ret already is 0.
Tony Asleson [Fri, 7 Oct 2016 19:55:36 +0000 (14:55 -0500)]
lvmdbustest.py: Ensure we exit non-zero on fail
If you run multiple runs of unittest.main, unless you don't pass exit=true
the test case always ends with a 0 exit code. Add ability to store the
result of each invocation of the test and exit with a non-zero exit code
if anyone of them fail.
Zdenek Kabelac [Mon, 3 Oct 2016 13:39:54 +0000 (15:39 +0200)]
dmeventd: pthread_sigmask in single function
Integrate back _unblock_sigalrm() and check for error code of
pthread_sigmask() function so we do not use uninitialized
sigmask_t on error path (Coverity).
Tony Asleson [Tue, 27 Sep 2016 03:02:08 +0000 (22:02 -0500)]
lvmdbusd: Allow PV device names to be '[unknown]'
When a PV device is missing lvm will return '[unknown]' for the device
path. The object manager keeps a hash table lookup for uuid and for PV's
device name. When we had multiple PVs with the same device path we
we only had 1 key in the table for the lvm id (device path). This caused
a problem when the PV device transitioned from '[unknown]' to known as any
subsequent transitions would cause an exception:
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/lvmdbusd/request.py", line 66, in run_cmd
result = self.method(*self.arguments)
File "/usr/lib/python3.5/site-packages/lvmdbusd/manager.py", line 205, in _pv_scan
cfg.load()
File "/usr/lib/python3.5/site-packages/lvmdbusd/fetch.py", line 24, in load
cache_refresh=False)[1]
File "/usr/lib/python3.5/site-packages/lvmdbusd/pv.py", line 48, in load_pvs
emit_signal, cache_refresh)
File "/usr/lib/python3.5/site-packages/lvmdbusd/loader.py", line 80, in common
cfg.om.remove_object(cfg.om.get_object_by_path(k), True)
File "/usr/lib/python3.5/site-packages/lvmdbusd/objectmanager.py", line 153, in remove_object
self._lookup_remove(path)
File "/usr/lib/python3.5/site-packages/lvmdbusd/objectmanager.py", line 97, in _lookup_remove
del self._id_to_object_path[lvm_id]
KeyError: '[unknown]'
when trying to delete a key that wasn't present. In this case we don't add a
lookup key for the device path and the PV can only be located by UUID.
The dm_stats_delete_region() call removes a region from the bound
device, and, if the region is grouped, from the group leader
group descriptor stored in aux_data.
To do this requires a listed handle: previous versions of the
library do not since no dependencies exist between regions without
grouping.
This leads to strange behaviour when a command built against an old
version of the library is used with one supporting groups. Deleting
a region with dmstats succeeds, but logs errors:
# dmstats list
Name RgID RgSta RgSiz #Areas ArSize ProgID
vg_hex-root 0 0 1.00g 1 1.00g dmstats
vg_hex-root 1 1.00g 1.00g 1 1.00g dmstats
vg_hex-root 2 2.00g 1.00g 1 1.00g dmstats
# dmstats delete --regionid 2 vg_hex/root
Region ID 2 does not exist
Could not delete statistics region.
Command failed
# dmstats list
Name RgID RgSta RgSiz #Areas ArSize ProgID
vg_hex-root 0 0 1.00g 1 1.00g dmstats
vg_hex-root 1 1.00g 1.00g 1 1.00g dmstats
This happens because the call to dm_stats_delete_region() is inside
a dm_stats_walk_*() iterator: upon entry to the call, the iterator
is at its end conditions and about to terminate. Due to the call to
dm_stats_list() inside the function, it returns with an iterator at
the beginning of a walk and performs a further iteration before
exiting. This final loop makes a further attempt to delete the
(already deleted) region, leading to the confusing error messages.
libdm: fix stats walk compatibility with older dmsetup
The current dmsetup.c handles DR_STATS and DR_STATS_META reports
separately in _display_info_cols(), meaning that the stats walk
functions are never called for these report types.
Versions before v2.02.159 have a loop using dm_stats_walk_do() and
dm_stats_walk_while(), that executes once for non-stats reports,
and once per region, or area, for DR_STATS/DR_STATS_META reports.
This older behaviour relies on the documented behaviour that the
walk functions will accept a NULL pointer as the struct dm_stats*
argument.
This was broken by commit f1f2df7b: the NULL test on dms and
dms->regions were incorrectly moved from the dm_stats_walk_end()
wrapper to the internal '_stats_walk_end()' helper.
Since the pointer is dereferenced in between these points, using
an older dmsetup with current libdm results in a segfault when
running a non-stats report:
# dmsetup info -c vg00/lvol0
Segmentation fault (core dumped)
Restore the NULL checks to the wrapper function as intended.
Peter Rajnoha [Tue, 27 Sep 2016 08:47:54 +0000 (10:47 +0200)]
systemd: disable service start rate limiting for lvm2-pvscan@.service
We shouldn't be losing pvscans just because of the fact that the
underlying device (PV) appears and disappears quickly in the system,
otherwise lvmetad may not see the device if it appears again (or it may
still keep the device in cache even it's already gone).
Peter Rajnoha [Fri, 23 Sep 2016 12:51:15 +0000 (14:51 +0200)]
toolcontext: read all configuration sources when checking config values in lvm2-activation-generator through lighweight toolcontext handler
We added lightweight toolcontext handle to avoid useless initialization
of some parts of the context and also to avoid problems when using the
handle very soon at system boot, like in lvm2-activation-generator
through lvm2app interface. However, we missed reading all the other
config sources like lvmlocal.conf as well as any tag config - we need to
read these too to get the final config value which may be overriden in
any of these additional config sources.
Currently, we use this lightweight toolcontext handle to read
global/use_lvmetad and global/use_lvmpolld config values in
lvm2-activation-generator using lvm2app interface (lvm_config_find_bool
lvm2app function).
tests: fix raid rebuild tests to work with older target versions
Pre 1.9 dm-raid targets status output was racy, which caused
the device status chars to be unreliable _during_ synchronization.
This shows paritcularly with tiny test devices used.
Enhance lvchange-rebuild-raid.sh to not check status
chars _during_ synchronization. Just check afterwards.
dmsetup: ensure --filemap histogram bounds are freed
Make sure that the temporary dm_histogram used for the bounds
argument is freed in the case that the user provided a --bounds
argument on the command line.
The dm-raid target now rejects device rebuild requests during ongoing
resynchronization thus causing 'lvconvert --repair ...' to fail with
a kernel error message. This regresses with respect to failing automatic
repair via the dmeventd RAID plugin in case raid_fault_policy="allocate"
is configured in lvm.conf as well.
Previously allowing such repair request required cancelling the
resynchronization of any still accessible DataLVs, hence reasoning
potential data loss.
Patch allows the resynchronization of still accessible DataLVs to
finish up by rejecting any 'lvconvert --repair ...'.
It enhances the dmeventd RAID plugin to be able to automatically repair
by postponing the repair after synchronization ended.
More tests are added to lvconvert-rebuild-raid.sh to cover single
and multiple DataLV failure cases for the different RAID levels.
Commit 199697accff0658a11420e263c1f85fb74cd39d3 rerouted funtion
for priting cache volume origin to lvm2app app function - which
however had a bug. So restore the original functionality
and print correct LV as cache origin LV.
Tony Asleson [Mon, 19 Sep 2016 14:54:24 +0000 (09:54 -0500)]
lvmdbusd: Fix dbus object with only properties and no method
Gris debugged that when we don't have a method the introspection
data is missing the interface itself eg.
<interface name="<your_obj_iface_name>" />
When adding the properties to the dbus object introspection we will
add the interface too if it's missing. This now allows us the
ability to have a dbus object with only properties.
Tony Asleson [Sat, 17 Sep 2016 05:11:59 +0000 (00:11 -0500)]
lvmdbustest.py: Test with/without timeout values
Because of the different code paths we need to test job handling with
all operations. Test now runs virtually everything with timeout == 0
and timeout == 15 so that we test both use cases.
Note: If a client passes -1 for the timeout value they need to make
sure that their connection time out is also infinite. Otherwise, if the client
times out the service side hangs any new dbus calls until the job
that is in progress completes. Not sure why this behavior is occuring
at this time, but it appears a limitation/bug of the dbus-python library.
Tony Asleson [Sat, 17 Sep 2016 05:10:46 +0000 (00:10 -0500)]
lvmdbusd: Set value to non none on error
When we register a failure we need to use a valid value which will be
returned with the object manager. Otherwise we will raise an Exception
because we are trying to construct an object path from None.
Tony Asleson [Fri, 16 Sep 2016 19:00:14 +0000 (14:00 -0500)]
lvmdbusd: Vg.LvCreate* methods, correct return type
The methods were returning an instance of the object instead of the
object path which was causing an exception when the result was returned
with the job object as we are explicity trying to return an object path.
Unit test added which re-creates the issue and verifies the fix.
Reload of thin-pool origin_only is designed to only post messages
to a thin-pool. It's not intended to be used for reload of thin-pool
table. Fix it by using standard call 'lv_update_and_reload()'.
thin: enforce there is some free space in thin pool metadata
Unconditionally guard there is at least 1/4 of metadata volume
free (<16Mib) or 4MiB - whichever value is smaller.
In case there is not enough free space do not let operation proceed and
recommend thin-pool metadata resize (in case user has not
enabled autoresize, manual 'lvextend --poolmetadatasize' is needed).
thin: report pool as holder when no active thin volume
In the case there is no active thin volume, report thin pool
as lock holder. This fixed function like lvextend
which either expecte lock holder LV is some active thin
or 'possibly' inactive thin pool.
David Teigland [Tue, 13 Sep 2016 21:50:47 +0000 (16:50 -0500)]
lvmlockd: fix metadata validation when rescanning VG
When rescanning a VG from disk, the metadata read from
each PV was compared as a sanity check. The comparison
is done by exporting the vg metadata from each dev to
a config tree, and then comparing the config trees.
The function to create the config tree inserts
extraneous information along with the actual VG metadata.
This extra info includes creation_time. The config
trees for two devs can easily be created one second
apart in which case the different creation_times would
cause the metadata comparison to fail. The fix is to
exclude the extraneous info from the metadata comparison.