fmt1 doesn't have a separate commit function: updates take effect
immediately vg_write is called, so we must update lvmetad at this
point if we're going to go on and ask lvmetad for the VG metadata
again before calling the commit function (though that's probably an
unsupported and pointless thing to do anyway as the client must
already have that data and it cannot have changed because it's locked
and with devs suspended we shouldn't be communicating with lvmetad;
so when that's fixed properly, this fix here can be reverted).
This problem showed up as an internal error when lvremoving an LVM1
snapshot.
Peter Rajnoha [Fri, 21 Dec 2012 09:54:31 +0000 (10:54 +0100)]
pvscan: synchronize with udev if pvscan --cache is used.
We need to call sync_local_dev_names directly as pvscan uses
VG_GLOBAL lock and this one *does not* cause the synchronization
(sync_dev_names) to be called on unlock (VG_GLOBAL is not a real VG):
define unlock_vg(cmd, vol)
do { \
if (is_real_vg(vol)) \
sync_dev_names(cmd); \
(void) lock_vol(cmd, vol, LCK_VG_UNLOCK); \
} while (0)
Without this fix, we end up without udev synchronization for the
pvscan --cache (mainly for -aay that causes the VGs/LVs to be
autoactivated) and also udev synchronization cookies are then left
in the system since they're not managed properly (code before sets
up udev sync cookies, but we have to call dm_udev_wait at least once
after that to do the wait and cleanup).
Peter Rajnoha [Fri, 21 Dec 2012 09:34:48 +0000 (10:34 +0100)]
activation: fix autoactivation to not trigger on each PV change
Before, the pvscan --cache -aay was called on each ADD and CHANGE
uevent (for a device that is not a device-mapper device) and each CHANGE
event (for a PV that is a device-mapper device).
This causes troubles with autoactivation in some cases as CHANGE event
may originate from using the OPTION+="watch" udev rule that is defined
in 60-persistent-storage.rules (part of the rules provided by udev
directly) and it's used for all block devices
(except fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md* devices). For example, the
following sequence incorrectly activates the rest of LVs in a VG if one
of the LVs in the VG is being removed:
[root@rhel6-a ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[root@rhel6-a ~]# vgcreate vg /dev/sda
Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol0" created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol1" created
[root@rhel6-a ~]# vgchange -an vg
0 logical volume(s) in volume group "vg" now active
...so the vg was deactivated, then lvol1 removed, and we end up with
lvol1 removed (which is ok) BUT with lvol0 activated (which is wrong)!!!
This is because after lvol1 removal, we need to write metadata to the
underlying device /dev/sda and that causes the CHANGE event to be
generated (because of the WATCH udev rule set on this device) and this
causes the pvscan --cache -aay to be reevaluated.
We have to limit this and call pvscan --cache -aay to autoactivate
VGs/LVs only in these cases:
--> if the *PV is not a dm device*, scan only after proper device
addition (ADD event) and not with any other changes (CHANGE event)
--> if the *PV is a dm device*, scan only after proper mapping
activation (CHANGE event + the underlying PV in a state "just
activated")
Jonathan Brassow [Tue, 18 Dec 2012 20:40:42 +0000 (14:40 -0600)]
RAID: Limit replacement of devices when array is not in-sync.
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync. The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
Andy Grover [Mon, 17 Dec 2012 22:14:38 +0000 (14:14 -0800)]
lvm2app: No special behavior for 0 for max_snap_size in lvm_lv_snapshot()
It isn't possible to choose a sane default for snapshot size, so just
play it straight and use the passed size instead of adding special
behavior for 0.
Also revert change to Python lib, size parameter must be supplied.
Petr Rockai [Sun, 16 Dec 2012 23:43:18 +0000 (00:43 +0100)]
lvmetad: Fix a possible race in remove_metadata.
All operations on shared hash tables need to be protected by mutexes. Moreover,
lookup and subsequent key removal need to happen atomically, to avoid races (and
possible double free-ing) between multiple threads trying to manipulate the same
VG.
Zdenek Kabelac [Fri, 14 Dec 2012 15:35:26 +0000 (16:35 +0100)]
lvmetad: unlock vg on out-of-memory path
If we fail to get memory for mutex, hash the mutex
or fail somewhere along pthread function calls
return allocated resources back and unlock vg_lock_map mutex.
Zdenek Kabelac [Fri, 14 Dec 2012 00:00:36 +0000 (01:00 +0100)]
libdaemon: check for strdup result
Detect failure of dm_pool_strdup() and print error in fail path.
Save one extra strchr call - since we already know the distance
for the '=' character.
Zdenek Kabelac [Fri, 14 Dec 2012 12:57:01 +0000 (13:57 +0100)]
log: move abort past syslog
When the abort_on_internal_errors is enabled, we aborted prior
the syslog logging output.
Since such fatal error gets level _LOG_FATAL it should
not be blocked by debug_level() check so lets move it further,
to get abort error logged also via syslog.
Peter Rajnoha [Thu, 13 Dec 2012 10:15:37 +0000 (11:15 +0100)]
lvconvert: also allow --type with --stripes
We can also use this for conversion between different mirror segment
types. Each new segment type converter then needs to check itself
whether the --stripes is applicable.
Petr Rockai [Wed, 12 Dec 2012 13:39:52 +0000 (14:39 +0100)]
toollib: Avoid a global lock in process_each_pv if lvmetad is used.
The motivation to grab the global lock is to avoid a scan and metadata parsing
for each PV, but the cost of obtaining metadata is _mostly_ mitigated by having
lvmetad around. Not taking the global lock improves throughput when multiple pvs
or related commands are running in parallel, like in RHEV.
Petr Rockai [Wed, 12 Dec 2012 11:51:28 +0000 (12:51 +0100)]
lvmetad: Fix autoactivation for MDA-less PVs.
Calling pvscan --cache with -aay on a PV without an MDA would spuriously fail
with an internal error, because of an incorrect assumption that a parsed VG
structure was always available. This is not true and the autoactivation handler
needs to call vg_read to obtain metadata in cases where the PV had no MDAs to
parse. Therefore, we pass vgid into the handler instead of the (possibly NULL)
VG coming from the PV's MDA.
Zdenek Kabelac [Mon, 10 Dec 2012 09:22:48 +0000 (10:22 +0100)]
thin: fix test for dicards ignore settings
Arghh, this was bad last-minute shortening of if() expression
in the commit 1ef98310187a7.
dm_tree_node_set_thin_pool_discard() must not run in the same
expression as check for non-power-2 discard, otherwise
there are 2 calls for dm_tree_node_set_thin_pool_discard
and whole setting of discards is missinterpretted.
pvmove/RAID: Disallow pvmove on RAID LVs until properly handled
Attempting pvmove on RAID LVs replaces the kernel RAID target with
a temporary pvmove target, ultimately destroying the RAID LV. pvmove
must be prevented on RAID LVs for now.
Use 'lvconvert --replace old_pv vg/lv new_pv' if you want to move
an image of the RAID LV.
In case we don't want to activate, autoactivate or have the
VG/LV read-only. Primarily targeted for the auto_activation_volume_list,
but it makes no harm for other settings (the part of the code
that reads these three settings is shared, but there's no
reason to separate it only for this change).
Zdenek Kabelac [Sun, 2 Dec 2012 15:32:42 +0000 (16:32 +0100)]
libdm: deactivate failed node in preload
If the resume of preloaded node fails, do not leave such
node in the table - since it may not be easy to detach such
node later when the node is i.e. internal.
i.e. failing activation of the thin pool with mismatching
chunk size may leave -tpool device in the table, which
could have been then removed only by dmsetup command.
Jonathan Brassow [Fri, 30 Nov 2012 22:47:02 +0000 (16:47 -0600)]
man page (lvcreate): Better explain stripes option for RAID 4/5/6.
Do a better job explaining the '--stripes/-i' option to lvcreate
when it comes to RAID 4/5/6. The extra devices needed for parity
are implicitly added to the argument given. So a 5-device RAID6
logical volume is created with '-i 3' - indicating 3 stripes plus
the implicit 2 devices needed for RAID6.
Peter Rajnoha [Thu, 29 Nov 2012 14:50:52 +0000 (15:50 +0100)]
udev: add a warning message if DM_DISABLE_UDEV set and udev running
$ export DM_DISABLE_UDEV=1
$ dmsetup create test --table "0 1 zero"
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
$ lvchange -ay vg/lvol0
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
Setting this environment variable will cause a full fallback
to old direct node and symlink management in libdevmapper and lvm2.
It means:
- disabling udev synchronization
(--noudevsync in dmsetup and --noudevsync + activation/udev_sync=0
lvm2 config)
- disabling dm and any subsystem related udev rules
(--noudevrules in dmsetup and activation/udev_rules=0 lvm2 config)
- management of nodes/symlinks under /dev directly by libdevmapper/lvm2
(--verifyudev in dmsetup and activation/verify_udev_operations=1
lvm2 config)
- not obtaining any device list from udev database
(devices/obtain_device_list_from_udev=0 lvm2 config)
Note: we could set all of these before - there's no functional change!
However the DM_DISABLE_UDEV environment variable is a nice shortcut
to make it easier for libdevmapper users so that one can switch off all
of the udev management off at one go directly on the command line,
without a need to modify any source or add any extra switches.
Peter Rajnoha [Thu, 29 Nov 2012 12:59:12 +0000 (13:59 +0100)]
udev: do not verify udev operations for --noudevsync
If udev synchronization is disabled by means of --noudevsync
option, we should disable just the synchronization and nothing else.
The udev fallback (verifying udev operations and fixing the
nodes/symlinks if found incorrect) is orthogonal and controlled
by a separate activation/verify_udev_operations configuration option.
Zdenek Kabelac [Mon, 26 Nov 2012 22:45:35 +0000 (23:45 +0100)]
thin: allow restore with --force
Allow restoring metadata with thin pool volumes.
No validation is done for this case within vgcfgrestore tool -
thus incorrect metadata may lead to destruction of pool content.
Zdenek Kabelac [Mon, 26 Nov 2012 10:04:00 +0000 (11:04 +0100)]
thin: detect discards for non-power-2
Check if target supports discards for chunk sizes,
that are not power of 2 (just multiple of 64K),
and enable it in case it's supported by thin kernel target.
Jonathan Brassow [Thu, 22 Nov 2012 00:46:52 +0000 (18:46 -0600)]
RAID: If no stripes argument is given for RAID10 create, default to 2
Similar to the way the 'mirror', 'raid1' and 'raid10' segment types set
the number of mirrors to 2 ('-m 1') if the argument is not specified,
here we set the number of stripes to 2 if not given on the command line
when creating a RAID10 LV.
Commit bf2741376d47411994d4065863acab8e405ff5c7 started to use
lv_is_active() instead of call for lv_info & info.exists so
we cover also cluster activated devices.
For snapshost the conversion was not correct and introduced
regression by blocking creation of snapshot of inactive LV.
Fix it by assigning lv_is_active() directly.
Note: we still have minor issue to fix - to make
lv_is_???? function able to return error states since
lv_info() may fail.
Zdenek Kabelac [Thu, 15 Nov 2012 13:48:32 +0000 (14:48 +0100)]
thin: add common pool functions
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
It is getting complicated and even few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert code has been splitted
into 2 common functions that allow some future extension.