Alasdair Kergon [Wed, 16 May 2012 12:50:14 +0000 (12:50 +0000)]
Re-enable partial activation of non-thin LVs until it can be fixed. (2.02.90)
- The test should be checking the LV as a whole, not just individual segments.
Alasdair Kergon [Fri, 11 May 2012 22:19:12 +0000 (22:19 +0000)]
Fix allocation policy loop so it doesn't continue beyond cling using later
policies it shouldn't be using when --alloc cling is specified but no tags
are defined.
Zdenek Kabelac [Wed, 9 May 2012 12:17:06 +0000 (12:17 +0000)]
Initial support for lvconvert for thin pool volumes.
Support has many limitations and lots of FIXMEs inside,
however it makes initial task when user creates a separate LV for
thin pool data and thin metadata already usable, so let's enable
it for testing.
Easiest API:
lvconvert --chunksize XX --thinpool data_lv metadata_lv
More functionality extensions will follow up.
TODO: Code needs some rework since a lot of same code is getting copied.
Fix up-convert when mirror activation is controled by volume_list and tags.
When mirrors are up-converted, a transient mirror layer is put in so that
only the new devices are sync'ed. That transient layer must carry the tags
of the original mirror LV, otherwise it will fail to activate when activation
is regulated by lvm.conf:activation/volume_list. The conversion would then
fail.
The fix is to do exactly the same thing that is being done for linear ->
mirror converting (lib/metadata/mirror.c:_init_mirror_log()). We copy the
tags temporarily for the new LV and remove them after the activation.
Snapshots of RAID logical volumes are allowed (including "raid1"). However,
snapshots of "mirror" logical volumes has been disallowed due to unsolvable
issues inherent to the design. The fact that mirroring (dm-raid1.c) must
stop all I/O as the result of a failure and wait for userspace intervention
can lead to a circular dependency if userspace is simultaneously waiting for
snapshots (on mirrors) to make an I/O update before proceeding.
Various snapshot on mirror tests have been removed as a result.
Fix bug in cmirror that caused incorrect status info to print on some nodes.
Looking at the code in cmirrord/local.c, we can see the various different
request types handled in different ways. Some information that is non-changing
does not need to go around the cluster and can be short-circuited. For
example, once the cluster mirror is in-sync, it is pointless to continue
sending that query around the cluster. We can save network bandwidth and reply
directly back to the kernel. When it comes to status information, there are
two types 'TABLE' and 'INFO'. The 'TABLE' information never changes and
belongs to the group of requests that can be safely short-circuited. The
'STATUS' information can change - and will change if a device fails. Thus it
cannot be short-circuited, but this is exactly what was found. The 'STATUS'
information request was being short-circuited and therefore never reporting the
failure condition to anyone other than the "server" that experienced it
directly.
Allow a subset of failed devices to be replaced in RAID LVs.
If two devices in an array failed, it was previously impossible to replace
just one of them. This patch allows for the replacement of some, but perhaps
not all, failed devices.
Prevent resume from creating error devices that already exist from suspend.
Thanks to agk for providing the patch that prevents resume from attempting
(and then failing) to create error devices which already exist; having been
created by a corresponding suspend operation.
In some occasional case dmevent restart was experiencing problems
with obtaining pid lockfile. So this patch tries to send several more kill
message until daemon kills itself so there is would reponse.
With this small loop the restart seems to work reliable,
although the loopsize and usleep are just randomly picked for now.
Peter Rajnoha [Tue, 24 Apr 2012 08:00:55 +0000 (08:00 +0000)]
Rename (Blk)DevNames header to (Blk)DevNamesUsed in dmsetup info -c output.
Just to make it clearer since there is the "dmsetup info -c -o blkdevname"
as well that shows the "block device name for this mapping", having a
"BlkDevName" header on output.
It's a bit confusing then if the "dmsetup info -c -o devs_used,blkdevs_used"
is named with a plural "DevNames"/"BlkDevNames" but at the same time having
a totally different meaning than the singular form "BlkDevName".
Unlike 'mirror' segtype, 'raid1' should perform flush on suspend.
The 'mirror' segtype and 'raid1' segtype both set the 'MIRRORED' flag.
However, due to differences in the way these device-mapper targets behave
'mirror' must be suspended with the 'noflush' option and 'raid1' does not
have to be.
This patch ensures that when the 'MIRRORED' flag is checked to see if
'noflush' is needed that it does not also set it for 'raid1' by mistake.
Remove 'up' from rounding message that sometimes rounds down.
Detect reduction of 0 after rounding for stripes and avoid warning of potential data loss.
Fix inability to split RAID1 image while specifying a particular PV.
The logic for resuming the original and newly split LVs was not properly
done to handle situations where anything but the last device in the array
was split. It did not take into account the possible name collisions that
might occur when the original LV undergoes the shifting and renaming of its
sub-LVs.
When resizing thin pool - we need to use strip info from _tdata volume.
In future more generic solution will be necessary once we start to support
lvconvert (resize of stacked devices and stay properly aligned).
For now we just allow striped or linear LV so this code will work.
Peter Rajnoha [Wed, 11 Apr 2012 09:12:02 +0000 (09:12 +0000)]
Change message severity to log_very_verbose for missing dev info in udev db.
Libudev does not provide transactions when querying udev database - once we
get the list of block devices (devices/obtain_device_list_from_udev=1) and
we iterate over the list to get more detailed information about device node
and symlink names used etc., the device could be removed just in between we
get the list and put a query for more info. In this case, libudev returns
NULL value as the device does not exist anymore.
Recently, we've added a warning message to reveal such situations. However,
this could be misleading if the device is not related to the LVM action
we're just processing - the non-related block device could be removed in
parallel and this is not an error but a possible and normal operation.
(N.B. This "missing info" should not happen when devices are related to
the LVM action we're just processing since all such processing should be
synchronized with udev and the udev db must always be in consistent state
after the sync point. But we can't filter this situation out from others,
non-related devices, so we have to lower the message verbosity here for a
general solution.)
RAID LVs could not handle a down-convert if a device other than the last one
in the array was specified for removal. This change addresses that (bz806111).
Commit ID 46a75dedb4f6aa815a804f27cafbd3fd16a62011 consolidated code from the
various dmeventd plug-ins into a new function called 'dmeventd_lvm2_command',
but the new function did not strip off the "_mlog" extentions that the
mirror plug-in had been doing. This created bug 794904 - failure to replace
devices in a redundant log.
The test suite did catch this scenario because it performs repair tests (mainly)
through the CLI and not dmeventd. It's also not easy to test because the test
itself will hang if the bug is encountered.
Add a couple new functions to gdbinit file and decode a couple lv->status flags
New functions:
- seg_pvs: Print a list of PVs in a seg_pvs list
- pv_dev_name: print name of a PV
Zdenek Kabelac [Fri, 30 Mar 2012 14:59:35 +0000 (14:59 +0000)]
Fix unlocking in error path of vgreduce
When vg_read fails, it internally unlocks VG if it's been locked,
so in error path we should skip unlock_vg for this case.
(user would see ugly internal warning)
devel/~ # lvconvert --splitmirrors 1 -n abc/splitted_one vg/mirrored_one
Please use a single volume group name ("vg" or "abc")
Run `lvconvert --help' for more information.
Zdenek Kabelac [Wed, 28 Mar 2012 11:10:08 +0000 (11:10 +0000)]
Improve test suite
Add make help target.
Add LVM_TEST_PARALLEL to support parallel runs of tests
Work around the problem the dmsetup table/info may return error
by using dmtable and dminfo function that will use 'should'.
(Error happens when some concurently running process removes table
entry while dmsetup command resolves table entries inside the loop.)
Milan Broz [Tue, 27 Mar 2012 15:53:45 +0000 (15:53 +0000)]
Fix exclusive lvmchange -aey to fail if volume is active on different node.
Activation on remote node should be tried only if it is masked by tags
locally (like when hosttags enabled, IOW activate_lv_excl_local()
doesn't return error.)
Introduced change caused that lvchange -aey succeeded even if volume was
activated exclusively remotely.
Peter Rajnoha [Tue, 27 Mar 2012 11:04:46 +0000 (11:04 +0000)]
Add 'vgscan --cache' functionality for consistency with 'pvscan --cache'.
Calling vgscan alone should reuse information from the lvmetad (if running).
The --cache option should initiate direct device scan and update lvmetad
appropriately (if running).
This is mainly for vgscan to behave consistently compared to pvscan.
Milan Broz [Mon, 26 Mar 2012 20:32:58 +0000 (20:32 +0000)]
Do not allow pvmove if some affected LVs are activated
locally or on more nodes while others are activated exclusively.
Current pvmove code can either use local mirror (for exclusive
activation) or cmirror (for clustered LVs).
Because the whole intenal pvmove LV is just segmented LV containing
segments of several top-level LVs, code cannot properly handle
situation if some segment need to be activated exclusively.
Previously, it wrongly activated exclusive LV on all nodes
(locing code allowed it) but now this is no lnger possible.
If there is exclusively activated LV, pvmove is only
possible if all affected LVs are aslo activated exclusively.
(Note that in non-exclusive mode pvmove still activates LVs
on other nodes during move.)
# lvchange -aly vg_test/lv1
# lvchange -aey vg_test/lv2
# pvmove -i 1 /dev/sdc
Error locking on node bar-01: Device or resource busy
Error locking on node bar-03: Volume is busy on another node
...
Failed to activate lv2
Zdenek Kabelac [Fri, 23 Mar 2012 09:58:04 +0000 (09:58 +0000)]
Update and fix monitoring of thin pool devices
Code adds better support for monitoring of thin pool devices.
update_pool_lv uses DMEVENTD_MONITOR_IGNORE to not manipulate with monitoring.
vgchange & lvchange are checking real thin pool device for existance
as we are using _tpool real device and visible LV pool device might not
be even active (_tpool is activated implicitely for any thin volume).
monitor_dev_for_events is another _lv_postorder like code it might be worth
to think about reusing it here - for now update the code to properly
monitory thin volume deps.
For unmonitoring add extra code to check the usage of thin pool - in case it's in use
unmonitoring of thin volume is skipped.
Zdenek Kabelac [Tue, 20 Mar 2012 17:42:19 +0000 (17:42 +0000)]
Fix regression in thin monitoring
Patch https://www.redhat.com/archives/lvm-devel/2012-February/msg00118.html
removed initilization of thin volume monitoring, leaving it only for
thin pool - but missed the code move part for monitoring of thin pools.
Effectively making thin pools not monitorable.
Zdenek Kabelac [Tue, 20 Mar 2012 10:51:57 +0000 (10:51 +0000)]
Update testing scripts
Make the teardown really usable - it will try down to remove all the left
devices even from previous test runs
(the only missing piece is probably proper mdadm teardown)
Add few more local vars
Try to setup PATH and LD_LIBRARY_PATH just once.
Try shorter sleeps.
Zdenek Kabelac [Tue, 20 Mar 2012 10:48:59 +0000 (10:48 +0000)]
Update test for dmevent restart
Actually restart was failing for different reason - so pass in proper
location of dmeventd for restart from lvm command and avoid using
the one from /sbin location.
Zdenek Kabelac [Tue, 20 Mar 2012 10:47:02 +0000 (10:47 +0000)]
Support improperly formated device numbers
There are kernel drivers (smblk) which set '-1' as their device major number.
This number is listed in /proc/devices then - but the kernel itself is using
just 12 bits - thus device is accessible via 4095 - there is posted patch
for 3.4 to fix this behavior (0 for auto allocation was mean to be used).
However to still allow using such devices with older kernels add some code
to use same behavior - so cut 12 bits from the major number from /proc/devices.
For now use log_warn() - maybe the severity of the message could be lowered
to just verbose level.