Preserve exclusive activation of cluster mirror when converting.
This patch to the suspend code - like the similar change for resume -
queries the lock mode of a cluster volume and records whether it is active
exclusively. This is necessary for suspend due to the possibility of
preloading targets. Failure to check to exclusivity causes the cluster target
of an exclusively activated mirror to be used when converting - rather than
the single machine target.
Zdenek Kabelac [Thu, 19 Jan 2012 15:39:41 +0000 (15:39 +0000)]
Thin disable snapshot creation when pool is over the threshold.
Since snapshot needs to suspend origin - it might lead to pool userspace
deadlock (as the pool will wait for new space in case it would be overfilled,
but dmeventd would not be able to resize it, as the lvcreate operation would
have kept the VG lock.)
To minimize the risk of such scenario - we prevent to create new snapshot
in case we are over the threshold - but beware, there is still small timewindow,
so keep threshold at some reasonable level!
Zdenek Kabelac [Thu, 19 Jan 2012 15:27:54 +0000 (15:27 +0000)]
Thin add function to read thin volume percent
This value returns percentage of 'mapped' size compared with total LV size.
(Without passed seg pointer it return highest mapped size - but it's
not used yet.)
Petr Rockai [Mon, 16 Jan 2012 08:25:32 +0000 (08:25 +0000)]
Beef up the lvmetad code with more functionality and a bunch of bugfixes. There
used to be a few mis-ordered memory accesses (release and access in the next
block). Fix that set_flag could have sometimes corrupted the flags being
modified.
A few issues with metadata tracking are sorted out as well now, and there are
only a few problems remaining before we can integrate lvmetad, mostly on the
client side:
- metadata areas need to be tracked in lvmetad (most likely to be addressed
through an extension of metadata, meaning no special support in lvmetad would
be needed)
- non-udev scanning code needs to be taught about telling lvmetad about device
disappearance (pvscan most importantly)
- this last item also needs to mesh with metadata inconsistencies and
suddenly-incomplete volume groups (aux disable_dev in tests); udev-based
scanning should address this separately and more elegantly
Petr Rockai [Sun, 15 Jan 2012 11:17:16 +0000 (11:17 +0000)]
Unfortunately, blank lines are sometimes produced by config serializer, and
this interferes with their role as message separator in the lvmetad
protocol. Switch to using "##" on an otherwise blank line as a separator.
Peter Rajnoha [Wed, 11 Jan 2012 12:46:19 +0000 (12:46 +0000)]
Support different device name types on output of dmsetup deps, ls and info -c command.
Add 'blkdevname' and 'blkdevs_used' field to dmsetup info -c -o.
Add 'blkdevname' option to dmsetup ls --tree to see block device names.
Add '-o options' to dmsetup deps and ls to select device name type on output.
Peter Rajnoha [Wed, 11 Jan 2012 12:34:44 +0000 (12:34 +0000)]
Add dm_device_get_name to get map name or block device name for given devno.
This is accomplished by reading associated sysfs information. For a dm device,
this is /sys/dev/block/major:minor/dm/name (supported in kernel version >= 2.6.29,
for older kernels, the behaviour is the same as for non-dm devices).
For a non-dm device, this is a readlink on /sys/dev/block/major:minor, e.g.
/sys/dev/block/253:0 --> ../../devices/virtual/block/dm-0.
The last component of the path is a proper kernel name (block device name).
One can request to read only kernel names by setting the 'prefer_kernel_name'
argument if needed.
Zdenek Kabelac [Mon, 9 Jan 2012 12:26:14 +0000 (12:26 +0000)]
Use sysfs to set/get of read-ahead
If we know major:minor number of device (which is known after resume) we will
try to use sysfs to set/get read ahead parameters of device.
This avoid potential problem of blocking commands like 'dmsetup info' awaiting
for device being usable for open/close - i.e. overfilled thin pool may block
such command.
Zdenek Kabelac [Thu, 5 Jan 2012 15:38:18 +0000 (15:38 +0000)]
Support rounding of percentage upward
We want to keep this logic -
when LV is extend - extend the LV by at least given amount,
when LV is reduced - reduce the LV by at most given amount.
So for this the rounding needs to be used.
Current logic which seems to satisfy give rule is to round up all
extent values for LV resize upward except for values with '-' sign
that are round downward.
This patch also fixes the problem when lvextend --use-polices tried
to extend LV the by i.e. 20% - but the resulting 20% were smaller
the extent size thus before this patch no extension happened.
Zdenek Kabelac [Thu, 22 Dec 2011 16:37:01 +0000 (16:37 +0000)]
Use new dmeventd_lvm2_command function in dmeventd plugins.
For snapshot, prepare whole command in front into private buffer.
Add also some missing '\n' for syslog messages.
For raid and mirror only convert creation of command line string.
This should avoid any unbound growth of mempool for dm_split_names.
Zdenek Kabelac [Thu, 22 Dec 2011 15:57:29 +0000 (15:57 +0000)]
Thin use helper function
Fix some minor outstading issue from thin plugin introduction -
Call dmeventd_lvm2_exit() in failpath for registration.
Add some missing '\n' in syslog messages.
Zdenek Kabelac [Wed, 21 Dec 2011 13:24:24 +0000 (13:24 +0000)]
Drop extra stat before open of device
Since the !(dev->flags & DEV_REGULAR) code path just called
dev_name_confirmed() which has just called 'stat()' inside,
remove duplicate second stat() call here.
Zdenek Kabelac [Wed, 21 Dec 2011 13:21:09 +0000 (13:21 +0000)]
Do not lstat common path prefix
When both path have identical prefix i.e. /dev/disk/by-id
skip 2 x lstat() for /dev /dev/disk /dev/disk/by-id
and directly lstat() only different part of the path.
Reduces amount of lstat calls on system with lots of devices.
Add policy based automated repair of RAID logical volumes
The RAID plug-in for dmeventd now calls 'lvconvert --repair' to address failures
of devices in a RAID logical volume. The action taken can be either to "warn"
or "allocate" a new device from any spares that may be available in the
volume group. The action is designated by setting 'raid_fault_policy' in
lvm.conf - the default being "warn".
Don't allow two images to be split and tracked from a RAID LV at one time
Also, don't allow a splitmirror operation on a RAID LV that is already tracking
a split, unless the operation is to stop the tracking and complete the split.
Example:
~> lvconvert --splitmirrors 1 --trackchanges vg/lv /dev/sdc1
# Now tracking changes - image can be merged back or split-off for good
~> lvconvert --splitmirrors 1 -n new_name vg/lv /dev/sdc1
# ^ Completes split ^
If a split is performed on a RAID that is tracking an already split image and
PVs are provided, we must ensure that
1) the already split LV is represented in the PVs
2) we are careful to split only the tracked image
Support the ability to replace specific devices in a RAID array.
RAID is not like traditional LVM mirroring. LVM mirroring required failed
devices to be removed or the logical volume would simply hang. RAID arrays can
keep on running with failed devices. In fact, for RAID types other than RAID1,
removing a device would mean substituting an error target or converting to a
lower level RAID (e.g. RAID6 -> RAID5, or RAID4/5 to RAID0). Therefore, rather
than removing a failed device unconditionally and potentially allocating a
replacement, RAID allows the user to "replace" a device with a new one. This
approach is a 1-step solution vs the current 2-step solution.