Fix bug in cmirror that caused incorrect status info to print on some nodes.
Looking at the code in cmirrord/local.c, we can see the various different
request types handled in different ways. Some information that is non-changing
does not need to go around the cluster and can be short-circuited. For
example, once the cluster mirror is in-sync, it is pointless to continue
sending that query around the cluster. We can save network bandwidth and reply
directly back to the kernel. When it comes to status information, there are
two types 'TABLE' and 'INFO'. The 'TABLE' information never changes and
belongs to the group of requests that can be safely short-circuited. The
'STATUS' information can change - and will change if a device fails. Thus it
cannot be short-circuited, but this is exactly what was found. The 'STATUS'
information request was being short-circuited and therefore never reporting the
failure condition to anyone other than the "server" that experienced it
directly.
Allow a subset of failed devices to be replaced in RAID LVs.
If two devices in an array failed, it was previously impossible to replace
just one of them. This patch allows for the replacement of some, but perhaps
not all, failed devices.
Prevent resume from creating error devices that already exist from suspend.
Thanks to agk for providing the patch that prevents resume from attempting
(and then failing) to create error devices which already exist; having been
created by a corresponding suspend operation.
In some occasional case dmevent restart was experiencing problems
with obtaining pid lockfile. So this patch tries to send several more kill
message until daemon kills itself so there is would reponse.
With this small loop the restart seems to work reliable,
although the loopsize and usleep are just randomly picked for now.
Peter Rajnoha [Tue, 24 Apr 2012 08:00:55 +0000 (08:00 +0000)]
Rename (Blk)DevNames header to (Blk)DevNamesUsed in dmsetup info -c output.
Just to make it clearer since there is the "dmsetup info -c -o blkdevname"
as well that shows the "block device name for this mapping", having a
"BlkDevName" header on output.
It's a bit confusing then if the "dmsetup info -c -o devs_used,blkdevs_used"
is named with a plural "DevNames"/"BlkDevNames" but at the same time having
a totally different meaning than the singular form "BlkDevName".
Unlike 'mirror' segtype, 'raid1' should perform flush on suspend.
The 'mirror' segtype and 'raid1' segtype both set the 'MIRRORED' flag.
However, due to differences in the way these device-mapper targets behave
'mirror' must be suspended with the 'noflush' option and 'raid1' does not
have to be.
This patch ensures that when the 'MIRRORED' flag is checked to see if
'noflush' is needed that it does not also set it for 'raid1' by mistake.
Remove 'up' from rounding message that sometimes rounds down.
Detect reduction of 0 after rounding for stripes and avoid warning of potential data loss.
Fix inability to split RAID1 image while specifying a particular PV.
The logic for resuming the original and newly split LVs was not properly
done to handle situations where anything but the last device in the array
was split. It did not take into account the possible name collisions that
might occur when the original LV undergoes the shifting and renaming of its
sub-LVs.
When resizing thin pool - we need to use strip info from _tdata volume.
In future more generic solution will be necessary once we start to support
lvconvert (resize of stacked devices and stay properly aligned).
For now we just allow striped or linear LV so this code will work.
Peter Rajnoha [Wed, 11 Apr 2012 09:12:02 +0000 (09:12 +0000)]
Change message severity to log_very_verbose for missing dev info in udev db.
Libudev does not provide transactions when querying udev database - once we
get the list of block devices (devices/obtain_device_list_from_udev=1) and
we iterate over the list to get more detailed information about device node
and symlink names used etc., the device could be removed just in between we
get the list and put a query for more info. In this case, libudev returns
NULL value as the device does not exist anymore.
Recently, we've added a warning message to reveal such situations. However,
this could be misleading if the device is not related to the LVM action
we're just processing - the non-related block device could be removed in
parallel and this is not an error but a possible and normal operation.
(N.B. This "missing info" should not happen when devices are related to
the LVM action we're just processing since all such processing should be
synchronized with udev and the udev db must always be in consistent state
after the sync point. But we can't filter this situation out from others,
non-related devices, so we have to lower the message verbosity here for a
general solution.)
RAID LVs could not handle a down-convert if a device other than the last one
in the array was specified for removal. This change addresses that (bz806111).
Commit ID 46a75dedb4f6aa815a804f27cafbd3fd16a62011 consolidated code from the
various dmeventd plug-ins into a new function called 'dmeventd_lvm2_command',
but the new function did not strip off the "_mlog" extentions that the
mirror plug-in had been doing. This created bug 794904 - failure to replace
devices in a redundant log.
The test suite did catch this scenario because it performs repair tests (mainly)
through the CLI and not dmeventd. It's also not easy to test because the test
itself will hang if the bug is encountered.
Add a couple new functions to gdbinit file and decode a couple lv->status flags
New functions:
- seg_pvs: Print a list of PVs in a seg_pvs list
- pv_dev_name: print name of a PV
Zdenek Kabelac [Fri, 30 Mar 2012 14:59:35 +0000 (14:59 +0000)]
Fix unlocking in error path of vgreduce
When vg_read fails, it internally unlocks VG if it's been locked,
so in error path we should skip unlock_vg for this case.
(user would see ugly internal warning)
devel/~ # lvconvert --splitmirrors 1 -n abc/splitted_one vg/mirrored_one
Please use a single volume group name ("vg" or "abc")
Run `lvconvert --help' for more information.
Zdenek Kabelac [Wed, 28 Mar 2012 11:10:08 +0000 (11:10 +0000)]
Improve test suite
Add make help target.
Add LVM_TEST_PARALLEL to support parallel runs of tests
Work around the problem the dmsetup table/info may return error
by using dmtable and dminfo function that will use 'should'.
(Error happens when some concurently running process removes table
entry while dmsetup command resolves table entries inside the loop.)
Milan Broz [Tue, 27 Mar 2012 15:53:45 +0000 (15:53 +0000)]
Fix exclusive lvmchange -aey to fail if volume is active on different node.
Activation on remote node should be tried only if it is masked by tags
locally (like when hosttags enabled, IOW activate_lv_excl_local()
doesn't return error.)
Introduced change caused that lvchange -aey succeeded even if volume was
activated exclusively remotely.
Peter Rajnoha [Tue, 27 Mar 2012 11:04:46 +0000 (11:04 +0000)]
Add 'vgscan --cache' functionality for consistency with 'pvscan --cache'.
Calling vgscan alone should reuse information from the lvmetad (if running).
The --cache option should initiate direct device scan and update lvmetad
appropriately (if running).
This is mainly for vgscan to behave consistently compared to pvscan.
Milan Broz [Mon, 26 Mar 2012 20:32:58 +0000 (20:32 +0000)]
Do not allow pvmove if some affected LVs are activated
locally or on more nodes while others are activated exclusively.
Current pvmove code can either use local mirror (for exclusive
activation) or cmirror (for clustered LVs).
Because the whole intenal pvmove LV is just segmented LV containing
segments of several top-level LVs, code cannot properly handle
situation if some segment need to be activated exclusively.
Previously, it wrongly activated exclusive LV on all nodes
(locing code allowed it) but now this is no lnger possible.
If there is exclusively activated LV, pvmove is only
possible if all affected LVs are aslo activated exclusively.
(Note that in non-exclusive mode pvmove still activates LVs
on other nodes during move.)
# lvchange -aly vg_test/lv1
# lvchange -aey vg_test/lv2
# pvmove -i 1 /dev/sdc
Error locking on node bar-01: Device or resource busy
Error locking on node bar-03: Volume is busy on another node
...
Failed to activate lv2
Zdenek Kabelac [Fri, 23 Mar 2012 09:58:04 +0000 (09:58 +0000)]
Update and fix monitoring of thin pool devices
Code adds better support for monitoring of thin pool devices.
update_pool_lv uses DMEVENTD_MONITOR_IGNORE to not manipulate with monitoring.
vgchange & lvchange are checking real thin pool device for existance
as we are using _tpool real device and visible LV pool device might not
be even active (_tpool is activated implicitely for any thin volume).
monitor_dev_for_events is another _lv_postorder like code it might be worth
to think about reusing it here - for now update the code to properly
monitory thin volume deps.
For unmonitoring add extra code to check the usage of thin pool - in case it's in use
unmonitoring of thin volume is skipped.
Zdenek Kabelac [Tue, 20 Mar 2012 17:42:19 +0000 (17:42 +0000)]
Fix regression in thin monitoring
Patch https://www.redhat.com/archives/lvm-devel/2012-February/msg00118.html
removed initilization of thin volume monitoring, leaving it only for
thin pool - but missed the code move part for monitoring of thin pools.
Effectively making thin pools not monitorable.
Zdenek Kabelac [Tue, 20 Mar 2012 10:51:57 +0000 (10:51 +0000)]
Update testing scripts
Make the teardown really usable - it will try down to remove all the left
devices even from previous test runs
(the only missing piece is probably proper mdadm teardown)
Add few more local vars
Try to setup PATH and LD_LIBRARY_PATH just once.
Try shorter sleeps.
Zdenek Kabelac [Tue, 20 Mar 2012 10:48:59 +0000 (10:48 +0000)]
Update test for dmevent restart
Actually restart was failing for different reason - so pass in proper
location of dmeventd for restart from lvm command and avoid using
the one from /sbin location.
Zdenek Kabelac [Tue, 20 Mar 2012 10:47:02 +0000 (10:47 +0000)]
Support improperly formated device numbers
There are kernel drivers (smblk) which set '-1' as their device major number.
This number is listed in /proc/devices then - but the kernel itself is using
just 12 bits - thus device is accessible via 4095 - there is posted patch
for 3.4 to fix this behavior (0 for auto allocation was mean to be used).
However to still allow using such devices with older kernels add some code
to use same behavior - so cut 12 bits from the major number from /proc/devices.
For now use log_warn() - maybe the severity of the message could be lowered
to just verbose level.
Zdenek Kabelac [Fri, 16 Mar 2012 12:59:43 +0000 (12:59 +0000)]
Update test.sh check.sh utils.sh get.sh
Indent
Better shell usage
Function simplification
More usage of 'get' functions
Don't use valgrind tracing for check and get function (faster)
Update shell debugging (PS4, better stacktrace)
Support paths with spaces
Export SCRIPTNAME for external usage
Watch for dmeventd unexpectedly started during test
Zdenek Kabelac [Fri, 16 Mar 2012 12:59:02 +0000 (12:59 +0000)]
Update aux.sh lvmwrapper
Indent
Add valgrind support:
env LVM_TEST_VALGRIND={0123} (the higher level, more commands tested)
env LVM_TEST_CLVMD=1 runs clvmd within valgrind.
env VALGRIND script name executed for each lvm command (def. is valg).
Smarted teardown - should minimize occurence of left dev entries
(using dmsetup remove -f only as last resort)
sort removed devices by open count before actual removal
Use "" around string that may contain spaces.
Set log/verbose and activation/retry_deactivation to defined value.
Remove debug.log after successful lvm command (easier to check output).
Zdenek Kabelac [Fri, 16 Mar 2012 12:56:29 +0000 (12:56 +0000)]
Allow to use also special prefixed names for test
Currently we could not test special prefixes in our test suite.
As teardown will not find such device and basicaly busyloops here,
as at cannot remove such names.
This patch adds possibity to use:
vgcreate V_$vg1 $dev
Note: you still need to use $PREFIX somewhere in the name.
(And of course, it's really bad idea to use $PREFIX (=LVMTEST)
for normally used LVs)
The only purpose of this patch is to allow testing cluster with
special vg names that begins with V_ , P_....
Zdenek Kabelac [Fri, 16 Mar 2012 12:53:05 +0000 (12:53 +0000)]
Fix string parsing
Fix propagation of -e option - pass it via internal shell variable.
Fix parsing of /proc/mounts files (don't check for substrings).
as reported by O.Mangold with suggested patch:
https://www.redhat.com/archives/linux-lvm/2012-February/msg00030.html
Properly pass arguments with spaces ("$@")
Add validation for YES and EXTOFF variable content.
Fix name conflicts that prevent down-converting RAID1 when specifying a device
When down-converting a RAID1 device, it is the last device that is extracted
and removed when the user does not specify a particular device. However,
when a device is specified (and it is not the last), the device is removed and
the remaining sub-LVs are "shifted down" to fill the hole. This cause problems
when resuming the LV because if the shifted devices were resumed (and thus
renamed) before the sub-LV being extracted, there would be a name conflict.
The solution is to resume the extracted sub-LVs first so that they can be
properly renamed preventing a possible conflict.
Zdenek Kabelac [Wed, 14 Mar 2012 17:12:05 +0000 (17:12 +0000)]
Improve thin_check option passing
Update a way we handle option passing - so we now support path and options
with space inside.
Fix dm name usage for thin pools with '-' in name.
Use new lvm.conf option thin_check_options to pass in options as string array.
Peter Rajnoha [Wed, 14 Mar 2012 15:51:51 +0000 (15:51 +0000)]
Use SD_ACTIVATION env. var. in systemd units to better detect systemd in use.
LISTEN_PID and LISTEN_FDS environment variables are defined only during systemd
"start" action. But we still need to know whether we're activated during
"reload" action as well - we use the reload action to call "dmeventd -R"/"lvmetad -R"
for statefull daemon restart. We can't use normal "restart" as that is simply
composed of "stop" and "start" and we would lose any state the daemon has.