Jonathan Brassow [Fri, 30 Nov 2012 22:47:02 +0000 (16:47 -0600)]
man page (lvcreate): Better explain stripes option for RAID 4/5/6.
Do a better job explaining the '--stripes/-i' option to lvcreate
when it comes to RAID 4/5/6. The extra devices needed for parity
are implicitly added to the argument given. So a 5-device RAID6
logical volume is created with '-i 3' - indicating 3 stripes plus
the implicit 2 devices needed for RAID6.
Peter Rajnoha [Thu, 29 Nov 2012 14:50:52 +0000 (15:50 +0100)]
udev: add a warning message if DM_DISABLE_UDEV set and udev running
$ export DM_DISABLE_UDEV=1
$ dmsetup create test --table "0 1 zero"
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
$ lvchange -ay vg/lvol0
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
Setting this environment variable will cause a full fallback
to old direct node and symlink management in libdevmapper and lvm2.
It means:
- disabling udev synchronization
(--noudevsync in dmsetup and --noudevsync + activation/udev_sync=0
lvm2 config)
- disabling dm and any subsystem related udev rules
(--noudevrules in dmsetup and activation/udev_rules=0 lvm2 config)
- management of nodes/symlinks under /dev directly by libdevmapper/lvm2
(--verifyudev in dmsetup and activation/verify_udev_operations=1
lvm2 config)
- not obtaining any device list from udev database
(devices/obtain_device_list_from_udev=0 lvm2 config)
Note: we could set all of these before - there's no functional change!
However the DM_DISABLE_UDEV environment variable is a nice shortcut
to make it easier for libdevmapper users so that one can switch off all
of the udev management off at one go directly on the command line,
without a need to modify any source or add any extra switches.
Peter Rajnoha [Thu, 29 Nov 2012 12:59:12 +0000 (13:59 +0100)]
udev: do not verify udev operations for --noudevsync
If udev synchronization is disabled by means of --noudevsync
option, we should disable just the synchronization and nothing else.
The udev fallback (verifying udev operations and fixing the
nodes/symlinks if found incorrect) is orthogonal and controlled
by a separate activation/verify_udev_operations configuration option.
Zdenek Kabelac [Mon, 26 Nov 2012 22:45:35 +0000 (23:45 +0100)]
thin: allow restore with --force
Allow restoring metadata with thin pool volumes.
No validation is done for this case within vgcfgrestore tool -
thus incorrect metadata may lead to destruction of pool content.
Zdenek Kabelac [Mon, 26 Nov 2012 10:04:00 +0000 (11:04 +0100)]
thin: detect discards for non-power-2
Check if target supports discards for chunk sizes,
that are not power of 2 (just multiple of 64K),
and enable it in case it's supported by thin kernel target.
Jonathan Brassow [Thu, 22 Nov 2012 00:46:52 +0000 (18:46 -0600)]
RAID: If no stripes argument is given for RAID10 create, default to 2
Similar to the way the 'mirror', 'raid1' and 'raid10' segment types set
the number of mirrors to 2 ('-m 1') if the argument is not specified,
here we set the number of stripes to 2 if not given on the command line
when creating a RAID10 LV.
Commit bf2741376d47411994d4065863acab8e405ff5c7 started to use
lv_is_active() instead of call for lv_info & info.exists so
we cover also cluster activated devices.
For snapshost the conversion was not correct and introduced
regression by blocking creation of snapshot of inactive LV.
Fix it by assigning lv_is_active() directly.
Note: we still have minor issue to fix - to make
lv_is_???? function able to return error states since
lv_info() may fail.
Zdenek Kabelac [Thu, 15 Nov 2012 13:48:32 +0000 (14:48 +0100)]
thin: add common pool functions
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
It is getting complicated and even few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert code has been splitted
into 2 common functions that allow some future extension.
Jonathan Brassow [Wed, 14 Nov 2012 20:58:47 +0000 (14:58 -0600)]
mirror: Mirrored log should be fixed before mirror when double fault occurs
This patch is intended to fix bug 825323 - FS turns read-only during a double
fault of a mirror leg and mirrored log's leg at the same time. It only
affects a 2-way mirror with a mirrored log. 3+-way mirrors and mirrors
without a mirrored log are not affected.
The problem resulted from the fact that the top level mirror was not
using 'noflush' when suspending before its "down-convert". When a
mirror image fails, the bios are queue until a suspend is recieved. If
it is a 'noflush' suspend, the bios can be safely requeued in the DM
core. If 'noflush' is not used, the bios must be pushed through the
target and if a device is failed for a mirror, that means issuing an
error. When an error is received by a file system, it results in it
turning read-only (depending on the FS).
Part of the problem was is due to the nature of the stacking involved in
using a mirror as a mirror's log. When an image in each fail, the top
level mirror stalls because it is waiting for a log flush. The other
stalls waiting for corrective action. When the repair command is issued,
the entire stacked arrangement is collapsed to a linear LV. The log
flush then fails (somewhat uncleanly) and the top-level mirror is suspended
without 'noflush' because it is a linear device.
This patch allows the log to be repaired first, which in turn allows the
top-level mirror's log flush to complete cleanly. The top-level mirror
is then secondarily reduced to a linear device - at which time this mirror
is suspended properly with 'noflush'.
Peter Rajnoha [Thu, 1 Nov 2012 12:33:49 +0000 (13:33 +0100)]
systemd: do not remove lvm2-activation.service
Fix previous commit 360c569ce8f0bfe936d59ca91de2716958550524.
Remove only fedora-storage-init/fedora-storage-init-late.service, but
not lvm2-activation.service.
fedora-storage-init.service fedora-storage-init-late.service
Peter Rajnoha [Tue, 30 Oct 2012 19:36:49 +0000 (20:36 +0100)]
systemd: various updates and fixes
Don't use lvmetad in lvm2-monitor.service ExecStop to avoid a systemd issue.
- a systemd design issue while processing dependencies
with socket-based activation that ends up with a hang
- https://bugzilla.redhat.com/show_bug.cgi?id=843587
(also tracker bug https://bugzilla.redhat.com/show_bug.cgi?id=871527)
- not using lvmetad in this case is just a workaround, once the bug
above is resolved, we should enable the lvmetad in that specific case
Remove dependency on fedora-storage-init.service in lvm2 systemd units.
- fedora-storage-init.service and fedora-storage-init-late.service is
going to be separated into respective units that belong to each block
device subsystem:
- mpath + mdraid activated via udev solely
- dmraid with its own dmraid-activation.service unit
- lvm2 with the lvm2-activation-generator to generate the
activation units runtime if lvmetad disabled
(global/use_lvmetad=0 set in lvm.conf) and activation done
via udev+lvmetad if lvmetad enabled (global/use_lvmetad=1 set
in lvm.conf)
Depend on lvm2-lvmetad.socket in lvm2-monitor.service systemd unit.
- as lvm2-monitor uses lvmetad if lvmetad is enabled
Tony Asleson [Thu, 25 Oct 2012 22:31:11 +0000 (17:31 -0500)]
python-lvm: Memory leaks & seg. fault fixes
Issues found (thus far) in unit test developemnt for python bindings.
Added Py_DECREF(ptr) in liblvm_lvm_vg_open & liblvm_lvm_vg_create
in error paths so that we correctly clean up memory.
Added a call to lvm_vg_close when we remove a vg. The code was
clearing out the vg pointer which prevented us from actually
calling lvm_vg_close in the close path.
liblvm_lvm_vg_create_lv_linear was not initializing
lvobj->parent_vgobj and if lvm_vg_create_lv_linear failed
we went through liblvm_lv_dealloc on clean up and tried to
Py_DECREF an invalid pointer.
Jonathan Brassow [Thu, 25 Oct 2012 05:42:45 +0000 (00:42 -0500)]
mirror: Avoid reading mirrors with failed devices in mirrored log
Commit 9fd7ac7d035f0b2f8dcc3cb19935eb181816bd76 did not handle mirrors
that contained mirrored logs. This is because the status line of the
mirror does not give an indication of the health of the mirrored log,
as you can see here:
[root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
Thus, the possibility for LVM commands to hang still persists when mirror
have mirrored logs. I discovered this while performing some testing that
does polling with 'pvs' while doing I/O and killing devices. The 'pvs'
managed to get between the mirrored log device failure and the attempt
by dmeventd to repair it. The result was a very nasty block in LVM
commands that is very difficult to remove - even for someone who knows
what is going on. Thus, it is absolutely essential that the log of a
mirror be recursively checked for mirror devices which may be failed
as well.
Despite what the code comment says in the aforementioned commit...
+ * _mirrored_transient_status(). FIXME: It is unable to handle mirrors
+ * with mirrored logs because it does not have a way to get the status of
+ * the mirror that forms the log, which could be blocked.
... it is possible to get the status of the log because the log device
major/minor is given to us by the status output of the top-level mirror.
We can use that to query the log device for any DM status and see if it
is a mirror that needs to be bypassed. This patch does just that and is
now able to avoid reading from mirrors that have failed devices in a
mirrored log.
Jonathan Brassow [Wed, 24 Oct 2012 04:10:33 +0000 (23:10 -0500)]
mirror: Avoid reading from mirrors that have failed devices
Addresses: rhbz855398 (Allow VGs to be built on cluster mirrors),
and other issues.
The LVM code attempts to avoid reading labels from devices that are
suspended to try to avoid situations that may cause the commands to
block indefinitely. When scanning devices, 'ignore_suspended_devices'
can be set so the code (lib/activate/dev_manager.c:device_is_usable())
checks any DM devices it finds and avoids them if they are suspended.
The mirror target has an additional mechanism that can cause I/O to
be blocked. If a device in a mirror fails, all I/O will be blocked
by the kernel until a new table (a linear target or a mirror with
replacement devices) is loaded. The mirror indicates that this condition
has happened by marking a 'D' for the faulty device in its status
output. This condition must also be checked by 'device_is_usable()' to
avoid the possibility of blocking LVM commands indefinitely due to an
attempt to read the blocked mirror for labels.
Until now, mirrors were avoided if the 'ignore_suspended_devices'
condition was set. This check seemed to suggest, "if we are concerned
about suspended devices, then let's ignore mirrors altogether just
in case". This is insufficient and doesn't solve any problems. All
devices that are suspended are already avoided if
'ignore_suspended_devices' is set; and if a mirror is blocking because
of an error condition, it will block the LVM command regardless of the
setting of that variable.
Rather than avoiding mirrors whenever 'ignore_suspended_devices' is
set, this patch causes mirrors to be avoided whenever they are blocking
due to an error. (As mentioned above, the case where a DM device is
suspended is already covered.) This solves a number of issues that weren't
handled before. For example, pvcreate (or any command that does a
pv_read or vg_read, which eventually call device_is_usable()) will be
protected from blocked mirrors regardless of how
'ignore_suspended_devices' is set. Additionally, a mirror that is
neither suspended nor blocking is /allowed/ to be read regardless
of how 'ignore_suspended_devices' is set. (The latter point being the
source of the fix for rhbz855398.)
Jonathan Brassow [Wed, 24 Oct 2012 02:19:27 +0000 (21:19 -0500)]
RAID: Make RAID 4/5/6 display sync status under heading s/Copy%/Cpy%Sync
The heading 'Copy%' is specific to PVMOVE volumes, but can be generalized
to apply to LVM mirrors also. It is a bit awkward to use 'Copy%' for
RAID 4/5/6, however - 'Sync%' would be more appropriate. This is why
RAID 4/5/6 have not displayed their sync status by any means available to
'lvs' yet.
Jonathan Brassow [Wed, 24 Oct 2012 01:33:54 +0000 (20:33 -0500)]
mirror/raid: Move 'copy_percent' to common code (mirror.c -> lv_manip.c)
The 'copy_percent' function takes the 'extents_copied' field from each
segment in an LV to create the numerator for the ratio that is to
become the copy_percent. (Otherwise known as the 'sync' percent for
non-pvmove uses, like mirror LVs and RAID LVs.) This function safely
works on RAID - not just mirrors - so it is better to have it in
lv_manip.c rather than mirror.c.
There's a lot of different functions that do a lot of different things
in lv_manip.c, so I placed the function near a function in lv_manip.c
that it was close to in metadata-exported.h. Different placement in the
file or a different name for the function may be useful.
Tony Asleson [Tue, 23 Oct 2012 19:50:14 +0000 (14:50 -0500)]
python-lvm: seg. fault in liblvm_lvm_percent_to_float
The first parameter needs to be present even if it isn't being
used, otherwise the one any only parameter we get is null
which causes PyArg_ParseTuple to seg. fault.
Peter Rajnoha [Tue, 23 Oct 2012 09:40:53 +0000 (11:40 +0200)]
udev_sync: cookie_set=1 on each dm_task_set_cookie call
cookie_set variable found in the struct dm_task should be always
set to 1 after dm_task_set_cookie_call, even if udev_sync is disabled
as the cookie itself carries synchronization informations *as well as*
extra flags to control other aspects of udev support.
For example, one could disable the synchronization itself, but still
direct the libdm code to disable library fallback via
DM_UDEV_DISABLE_LIBRARY_FALLBACK flag. These extra flags still need
to be carried out!
A concrete example:
$ dmsetup create test --table "0 1 zero" --noudevsync
This disables synchronization with udev. As the --verifyudev option is
not used, we don't want to do any corrections. In other words, we
need DM_UDEV_DISABLE_LIBRARY_FALLBACK flag to be used. However,
with --noudevsync this was not the case - the flag was ignored!
This patch fixes the case when noudevsync is used but there are still
some extra flags passed within the cookie flag part. The synchronization
part of the cookie stays zero (which is ok as dm_udev_wait call on such a
cookie is simply a NOOP).
Andy Grover [Wed, 17 Oct 2012 19:55:25 +0000 (12:55 -0700)]
python-lvm: Implement proper refcounting for parent objects
Our object nesting:
lib -> VG -> LV -> lvseg
-> PV -> pvseg
Implement refcounting and checks to ensure parent objects are not
dealloced before their children. Also ensure calls to self or child's
methods are handled cleanly for objects that have been closed or removed.
Ensure all functions that are object methods have a first parameter named
'self', for consistency
Rename local vars that reference a Python object to '*obj', in order to
differentiate from liblvm handles
Andy Grover [Mon, 15 Oct 2012 20:26:01 +0000 (13:26 -0700)]
python-lvm: Remove liblvm object
Instead of requiring users to create a liblvm object, and then calling
methods on it, the module acquires a liblvm handle as part of
initialization. This makes it impossible to instantiate a liblvm object
with a different systemdir, but there is an alternate envvar method for
that obscure use case.
Jonathan Brassow [Mon, 15 Oct 2012 20:43:15 +0000 (15:43 -0500)]
TEST: Re-add testing of lvconvert-raid for kernels < 3.2
I'm not sure what 'BUG's were being encountered when the restriction
to limit lvconvert-raid.sh tests to kernels > 3.2 was added. I do know
that there were BUG's that could be triggered when testing snapshots and
some of the earliest DM RAID available in the kernel. I've taken out
the 3.2 kernel restriction and added a dm-raid >= 1.2 restriction instead.
This will allow the test to run on patched production kernels.
Jonathan Brassow [Mon, 15 Oct 2012 20:09:05 +0000 (15:09 -0500)]
Clean-up: Adjust message to be clearer on action taken and why
A message is printed when the region_size of a RAID LV is adjusted
to allow for large (> ~1TB) LVs. The message wasn't very clear.
Hopefully, this is better.
Try to bring the lvmetad usage text and man page closer to the code.
There seem to be 3 useful ways to use -d with lvmetad at the moment:
-d all
-d wire
-d debug
(They can also be comma-separated like -d wire,debug.)
Prior to the last release, -d, -dd and -ddd were supported.
Fail if an unrecognised debug arg is supplied on the command line.
Change -V to report the same version as the lvm binary: previously it
just reported version 0.