Jonathan Brassow [Wed, 10 Oct 2012 16:33:10 +0000 (11:33 -0500)]
[lv|vg]change: Allow limited metadata changes when PVs are missing
A while back, the behavior of LVM changed from allowing metadata changes
when PVs were missing to not allowing changes. Until recently, this
change was tolerated by HA-LVM by forcing a 'vgreduce --removemissing'
before trying (again) to add tags to an LV and then activate it. LVM
mirroring requires that failed devices are removed anyway, so this was
largely harmless. However, RAID LVs do not require devices to be removed
from the array in order to be activated. In fact, in an HA-LVM
environment this would be very undesirable. Device failures in such an
environment can often be transient and it would be much better to restore
the device to the array than synchronize an entirely new device.
There are two methods that can be used to setup an HA-LVM environment:
"clvm" or "tagging". For RAID LVs, "clvm" is out of the question because
RAID LVs are not supported in clustered VGs - not even in an exclusively
activated manner. That leaves "tagging". HA-LVM uses tagging - coupled
with 'volume_list' - to ensure that only one machine can have an LV active
at a time. If updates are not allowed when a PV is missing, it is
impossible to add or remove tags to allow for activation. This removes
one of the most basic functionalities of HA-LVM - site redundancy. If
mirroring or RAID is used to replicate the storage in two data centers
and one of them goes down, a server and a storage device are lost. When
the service fails-over to the alternate site, the VG will be "partial".
Unable to add a tag to the VG/LV, the RAID device will be unable to
activate.
The solution is to allow vgchange and lvchange to alter the LVM metadata
for a limited set of options - --[add|del]tag included. The set of
allowable options are ones that do not cause changes to the DM kernel
target (like --resync would) or could alter the structure of the LV
(like allocation or conversion).
Peter Rajnoha [Wed, 10 Oct 2012 15:03:47 +0000 (17:03 +0200)]
dmsetup: also apply 'mangle' command for UUIDs
Compared to names, UUIDs can't be renamed once they are created
for a device. The 'mangle' command will just issue an error message
about a need for manual intervention in this case - reactivating the
device (remove + create) does the job as the defualt mangling mode
used is "auto" and that will assign a correct mangled form the UUID.
Peter Rajnoha [Wed, 10 Oct 2012 14:59:47 +0000 (16:59 +0200)]
libdm: add dm_task_get_uuid_mangled/unmangled
Just like we already have existing mangling support for
device-mapper names, we need exactly the same for device-mapper
UUIDs as their character whitelist is wider than what udev supports.
In case udev is used to create entries in /dev based on UUIDs
and these UUIDs contain characters not supported by udev,
we'll end up with incorrect /dev content for such devices.
So we need to mangle them to a form that is supported by udev.
The mangling used for UUIDs follows the mangling used for names
(that is already supported and used throughout). That means,
setting the name mangling mode via dm_set_name_mangling_mode
affects mangling used for UUIDs in exactly the same manner.
It would be useless to add a new and separate
dm_set_uuid_mangling_mode fn, we'll reuse existing interface.
Peter Rajnoha [Mon, 8 Oct 2012 14:49:54 +0000 (16:49 +0200)]
systemd: remove ExecStartPost from lvm2-lvmetad.service.
The ExecStartPost with pvscan --cache in lvm2-lvmetad.service
is not needed now as this is called transparently within the
first LVM command that queries lvmetad.
Zdenek Kabelac [Mon, 8 Oct 2012 12:46:44 +0000 (14:46 +0200)]
test: move raid test to separate tests
Revert changes to origin lvcreate-large test and use separate
test scripts for raid - so they can be properly skipped when
kernel doesn't support raid targets.
Zdenek Kabelac [Fri, 5 Oct 2012 09:06:08 +0000 (11:06 +0200)]
lvconvert: disable convertion of thin to mirrors
For now this convertions is not supported, thus disabled.
The only supported conversion for now is to create mirrored thin pools
from mirrored devices.
RAID: Do not allow RAID LVs in a cluster volume group.
It would be possible to activate a RAID LV exclusively in a cluster
volume group, but for now we do not allow RAID LVs to exist in a
clustered volume group at all. This has two components:
1) Do not allow RAID LVs to be created in a clustered VG
2) Do not allow changing a VG from single-machine to clustered
if there are RAID LVs present.
Zdenek Kabelac [Mon, 14 May 2012 11:57:30 +0000 (13:57 +0200)]
thin: lvconvert
Update code for lvconvert.
Change the lvconvert user interface a bit - now we require 2 specifiers
--thinpool takes LV name for data device (and makes the name)
--poolmetadata takes LV name for metadata device.
Fix type in thin help text -z -> -Z.
Supported is also new flag --discards for thinpools.
Patch clears the flag if thin pool is stacked over mirror.
Since thin pool could be used to stack device over mirrors,
it needs resume properly i.e. mirrors with corelog which are otherwise
unconditionally skipped (for pvmove functionality).
Jonathan Brassow [Thu, 27 Sep 2012 21:51:22 +0000 (16:51 -0500)]
RAID: Fix problems with creating, extending and converting large RAID LVs
MD's bitmaps can handle 2^21 regions at most. The RAID code has always
used a region_size of 1024 sectors. That means the size of a RAID LV was
limited to 1TiB. (The user can adjust the region_size when creating a
RAID LV, which can affect the maximum size.) Thus, creating, extending or
converting to a RAID LV greater than 1TiB would result in a failure to
load the new device-mapper table.
Again, the size of the RAID LV is not limited by how much space is allocated
for the metadata area, but by the limitations of the MD bitmap. Therefore,
we must adjust the 'region_size' to ensure that the number of regions does
not exceed the limit. I've added code to do this when extending a RAID LV
(which covers 'create' and 'extend' operations) and when up-converting -
specifically from linear to RAID1.
Petr Rockai [Sat, 11 Aug 2012 08:37:28 +0000 (10:37 +0200)]
lib/cache/lvmetad: Refactor to use dm_config_tree in requests.
We were using daemon_send_simple until now, but it is no longer adequate, since
we need to manipulate requests in a generic way (adding a validity token to each
request), and the tree-based request interface is much more suitable for this.
Petr Rockai [Sat, 11 Aug 2012 08:33:53 +0000 (10:33 +0200)]
libdaemon: Extend and refactor APIs.
- move common dm_config_tree manipulation functions from lvmetad-core to
daemon-shared
- add config-tree-based request manipulation APIs to daemon-client
- factor out _v (va_list) variants of most variadic functions in libdaemon
Jonathan Brassow [Wed, 19 Sep 2012 16:09:32 +0000 (11:09 -0500)]
mirror: 'lvconvert --resync' should reset LV_NOTSYNCED on corelog mirror
When reformatting the 'lvchange_resync' code in commit 05131f5853e86419d9c726faa961b8d012298d9c, a '!' should have been removed
from the condition that checks for the LV_NOTSYNCED flag on a corelog
mirror LV. The presence of this '!' caused the LV_NOTSYNCED flag to be
cleared when it wasn't present and left when it was present.
It is not allowed to add images to a 'mirror' or 'raid1' LV if the
LV_NOTSYNCED flag is set. We add some up-convert tests to ensure this
behavior is being enforced and that the LV_NOTSYNCED flag is being
properly cleared by 'lvchange --resync'.
(Not updating WHATS_NEW because this is intrarelease.)
Jonathan Brassow [Fri, 14 Sep 2012 21:26:53 +0000 (16:26 -0500)]
RAID1: Clear the LV_NOTSYNCED flag when a RAID1 LV is converted to linear
Failing to clear the LV_NOTSYNCED flag when converting a RAID1 LV to
linear can result in the flag being present after an upconvert - even
if the sync is performed when upconverting.
Jonathan Brassow [Fri, 14 Sep 2012 21:12:52 +0000 (16:12 -0500)]
RAID1: Like mirrors, do not allow adding images to LV created w/ --nosync
Mirrors do not allow upconverting if the LV has been created with --nosync.
We will enforce the same rule for RAID1. It isn't hugely critical, since
the portions that have been written will be copied over to the new device
identically from either of the existing images. However, the unwritten
sections may be different, causing the added image to be a hybrid of the
existing images.
Also, we are disallowing the addition of new images to a RAID1 LV that has
not completed the initial sync. This may be different from mirroring, but
that is due to the fact that the 'mirror' segment type "stacks" when adding
a new image and RAID1 does not. RAID1 will rebuild a newly added image
"inline" from the existant images, so they should be in-sync.
Peter Rajnoha [Wed, 12 Sep 2012 09:30:13 +0000 (11:30 +0200)]
systemd: depend on systemd-udev-settle unit in activation unit
The "fedora-wait-storage.service" that the "lvm2-activation.service"
had as a dependency (which was fedora-specific solution anyway)
is obsolete now as this unit called "modprobe scsi_wait_scan"
which is not used anymore.
The "fedora-wait-storage.service" had "systemd-udev-settle" as
its dependency, so let's depend on this one directly now,
bypassing the out-dated "fedora-wait-storage.service".
Jonathan Brassow [Tue, 11 Sep 2012 18:09:35 +0000 (13:09 -0500)]
RAID: Properly handle resync of RAID LVs
Issuing a 'lvchange --resync <VG>/<RAID_LV>' had no effect. This is
because the code to handle RAID LVs was not present. This patch adds
the code that will clear the metadata areas of RAID LVs - causing them
to resync upon activation.
Jonathan Brassow [Tue, 11 Sep 2012 18:01:05 +0000 (13:01 -0500)]
cleanup: Restructure code that handles mirror resyncing
When an LV is to be resynced, the metadata areas are cleared and the
LV is reactivated. This is true for mirroring and will also be true
for RAID LVs. We restructure the code in lvchange_resync() so that we
keep all the common steps necessary (validation of ability to resync,
deactivation, activation of meta/log devices, clearing of those devices,
etc) and place the code that will be divergent in separate functions:
detach_metadata_devices()
attach_metadata_devices()
The common steps will be processed on lists of metadata devices. Before
RAID capability is added, this will simply be the mirror log device (if
found).
This patch lays the ground-work for adding resync of RAID LVs.
Jonathan Brassow [Tue, 11 Sep 2012 17:55:17 +0000 (12:55 -0500)]
cleanup: Reduce indentation by short-circuiting function
By changing the conditional for resyncing mirrors with core-logs a
bit, we can short-circuit the rest of the function for that case
and reduce the amount of indenting in the rest of the function.
This cleanup will simplify future patches aimed at properly handling
the resync of RAID LVs.
Jonathan Brassow [Mon, 10 Sep 2012 22:15:20 +0000 (17:15 -0500)]
RAID: Disallow addition of RAID images while array is not in-sync
We cannot add images to a RAID array while it is not in-sync. The
kernel will simply reject the table, saying:
'rebuild' specified while array is not in-sync
Now we check to ensure the LV is in-sync before attempting image
additions.
RAID: '--test' should not cause a valid create command to fail
It is necessary when creating a RAID LV to clear the new metadata areas.
Failure to do so could result in a prepopulated bitmap that would cause
the new array to skip syncing portions of the array. It is a requirement
that the metadata LVs be activated and cleared in the process of creating.
However in test mode, this requirement should be lifted - no new LVs should
be created or written to.
cleanup: Use segtype->ops->name() instead of segtype->name where applicable
When printing a message for the user and the lv_segment pointer is available,
use segtype->ops->name() instead of segtype->name. This gives a better
user-readable name for the segment. This is especially true for the
'striped' segment type, which prints "linear" if there is an area_count of
one.
Peter Rajnoha [Mon, 27 Aug 2012 13:39:08 +0000 (15:39 +0200)]
make: fix subdir order for distclean
The 'test' subdir needs to be processed before 'tools' subdir
for distclean as all the cmd names are read from 'tools/.commands'
file. Otherwise we'd end up with dangling symlinks in 'tools' subdir.