Milan Broz [Tue, 5 Jan 2010 15:58:11 +0000 (15:58 +0000)]
Resume volumes in reverse order to preserve memlock pairing.
If renaming snapshot with virtual origin, the origin is renamed too.
But the code must resume LVs in reverse order to properly
pair memlock (in cluster locking).
(The resume of snapshot resumes origin too and later resume
is ignored otherwise.)
Milan Broz [Fri, 18 Dec 2009 12:44:20 +0000 (12:44 +0000)]
Remove missing flag if PV reappeared and is empty.
When PV device reappears with old metadata, it is
always updated to new version byt atutomatic metadata
repair.
Remove missing flag if device is empty.
If device contains allocated extents, issue warning that
user must remove volumes and re-add this PV before
manipulating with this volume.
This partially solves bug 547842 when one PV (log) is failed,
dmeventd removes that device and later this device reappears and
is wrongly added into VG marked missing.
Milan Broz [Wed, 9 Dec 2009 19:53:39 +0000 (19:53 +0000)]
Call explicitly suspend for temporary mirror layer.
The memlock_inc() fix is wrong, memlock count is not
propagated to long living process (clvmd) and just
it underflow there.
Also suspend is needed to pre-load precommited metadata
on other nodes (remapping to error taget in this case).
With explicit suspend we generate lock request and code
can update memlock count.
(Infinitely "locked" memory caused that fs_unlock() was not
called properly and on cluster nodes remains
old links in /dev/mapper for not active devices.)
(N.B. failing of suspend call here is not handled as fatal
error - the LV is going to be removed later anyway.)
Milan Broz [Wed, 9 Dec 2009 19:29:04 +0000 (19:29 +0000)]
Allow manipulation with precommited metadata even when a PV is missing.
The new recovery code first tries to repair LV and then removes failed PV
from VG. It means that during operation there can be VG with PV missing,
and vg_read code handles it like not consistent VG.
We already allows returning "inconsistent" commited metadata,
for mirror repair we need this for precommited too.
(The suspend call prepares precommited metadata to inactive table on
other cluster nodes.)
"Inconsistent" here means - correct metadata, just with some metadata areas
not found (obviously on missing or failed PVs).
Milan Broz [Wed, 9 Dec 2009 19:00:16 +0000 (19:00 +0000)]
Never ever use distributed lock for LV in non-clustered VG.
The LV locks make sense only for clustered LVs.
Properly check cluster flag and never issue cluster lock here.
There are several places in code, where it is already checked, this
patch add this check to all needed calls.
In previous code the lock behaviour was inconsistent,
for example, the pre/post callback can take lock even for local volume,
but deactivate call do not released this lock and it remains held forever.
The local LV lock request now just let run the underlying activation code
on local node, the same process like in local locking.
(Again, this is important for new mirror repair calls, here for local
mirrors but with cluster locking enabled.)
Milan Broz [Wed, 9 Dec 2009 18:42:02 +0000 (18:42 +0000)]
Get rid of magic masks in cluster locking code - clvmd part.
- do_command and lock_vg expect flags (no change here)
Bug fixes:
- lock_vg should check for NONBLOCK on lock_cmd, flags have this bit masked-out
- do_pre/post_command expect do not mask flag at all, this causes that
the code inside is never run! (see following patches, these functions
expect plain command without flags)
Milan Broz [Wed, 9 Dec 2009 18:28:27 +0000 (18:28 +0000)]
Get rid of magic masks in cluster locking code.
Patch should not cause any problems, only real change is
removing LCK_LOCAL bit from lock type flag, it is never used there.
(LCK_LOCAL is part arg[1] bits anyway.)
Milan Broz [Wed, 9 Dec 2009 18:09:52 +0000 (18:09 +0000)]
Remove newly created log volume if initial deactivation fails.
If there is problem deactivate LV and
_init_mirror_log is called with remove_on_failure = 1,
remove the newly created log LV from metadata.
(This can happen if there is active device with the same name
but different UUID.)
The main reason for this "workaround" patch is to
- do not keep _mlog volume in metadata, so user can repeat the action
- print better error message describing the real problem
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
/dev/vg_bar/lv1_mlog: not found: device not cleared
Aborting. Failed to wipe mirror log.
Error locking on node bar-01: Input/output error
Unable to deactivate mirror log LV. Manual intervention required.
Failed to create mirror log.
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Aborting. Unable to deactivate mirror log.
Failed to initialise mirror log.
I see "Deactivated" message when I activate and "Activated" message when
I deactivate. The code uses "activate" as boolean but it can be any one
of the enum values from CHANGE_AY, CHANGE_AN, CHANGE_AE, etc.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Peter Rajnoha [Mon, 7 Dec 2009 12:03:47 +0000 (12:03 +0000)]
Disable udev rules on change event with DISK_RO=1.
There's a new change udev event generated since kernel 2.6.32 that
notifies userspace about a change in read-only attribute for block
devices (with DISK_RO=1 environment variable set).
We need to detect this and disable the rule application so the
meaning of this change event is not interchanged with the regular
change event used while resuming/renaming DM devices.
If there's anybody awaiting this notification in foreign rules,
he can still check for this env var and do the appropriate actions
separately.
Update a few more uint64_t's related to the 64-bit status change.
At this point they probably do not matter but going forward they
may - depends on future patches for replicator, etc. I think
these probably got missed because they were 'flags' so I changed
the name to 'status' to be consistent. So the on-disk
things 'flags' and the in structure 'status' (bits).
NOTE: WHATS_NEW already has entry for this in current release.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com>
Petr Rockai [Mon, 30 Nov 2009 17:17:11 +0000 (17:17 +0000)]
Optionally abort on internal errors (and leverage this option in the
testsuite). (This is showing a problem in the pvmove test for me, so I expect
the tests to start failing -- this needs to be fixed separately though.)
Dave Wysochanski [Wed, 25 Nov 2009 20:44:07 +0000 (20:44 +0000)]
Remove unnecessary / duplicate dm_list macros and functions.
These are no longer used by anyone. The dm_list defines are all in
libdevmapper.h and libdm/datastruct/list.c contains any function definitions.
There is some code in "old-tests" that still use this but this code is not
being maintained.
Mike Snitzer [Tue, 24 Nov 2009 22:55:55 +0000 (22:55 +0000)]
Switch status from 32-bit to 64-bit
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. The
movement of some members in 'volume_group' and 'lv_segment' eliminates
holes. The 'physical_volume' structure still has one 4-byte hole after
'pe_size'; the other structures no longer have any holes. Each
structures' size has not changed.
Milan Broz [Tue, 24 Nov 2009 16:10:25 +0000 (16:10 +0000)]
Refresh device filters before full device rescan in lvmcache.
The sysfs filter initialise hash of available devices using
scan of /sys/block. We need to refresh even this hash
when performing full scan otherwise the newly appeared
device could be rejected, because there is no entry
in sysfs filter.
This easily could happen when attaching new device
to cluster node. (Only force refresh of context
in clvmd -R works here now).
Unfortunately consequences of this are much worse,
missing device part on that node is replaced with missing segment
(even when no partial arg is selected) and this directly
lead to data corruption.
See https://bugzilla.redhat.com/show_bug.cgi?id=538515
Simply fix it by refreshing device filters in lvmcache
before performing the full device scan.
Milan Broz [Tue, 24 Nov 2009 16:08:49 +0000 (16:08 +0000)]
Return error status if vgchange fails to activate some volume.
(on one node a storage connection failed):
# vgchange -a y vg_bar ; echo $?
Error locking on node bar-02: Refusing activation of partial LV lv1. Use --partial to override.
1 logical volume(s) in volume group "vg_bar" now active
0
So activation fails on one node, error is correctly printed but
status code is wrong.
This patch fixes the top level (vgchange) to return proper code
(and print # of activated LVs).
Milan Broz [Mon, 23 Nov 2009 10:55:14 +0000 (10:55 +0000)]
Fix memory lock imbalance in locking code.
(This affects only cluster locking because only cluster
locking module set LCK_PRE_MEMLOCK.)
With currect code you get
# vgchange -a n
Internal error: _memlock_count has dropped below 0.
when using cluster locking.
It is caused by _unlock_memory calls here
if ((flags & (LCK_SCOPE_MASK | LCK_TYPE_MASK)) == LCK_LV_RESUME)
memlock_dec();
Unfortunately it is also (wrongly) called in immediate unlock
(when LCK_HOLD is not set) from lock_vol
(LCK_UNLOCK is misinterpreted as LCK_LV_RESUME).
Avoid this by comparing original flags and provide memlock
code type of operation (suspend/resume).
Petr Rockai [Thu, 19 Nov 2009 01:11:57 +0000 (01:11 +0000)]
Fix another bug in memlock handling, this time the "global" dmeventd memlock
was preventing device scans in lvconvert --repair running from inside dmeventd.
Peter Rajnoha [Fri, 13 Nov 2009 12:43:21 +0000 (12:43 +0000)]
Support udev flags even when udev_sync is disabled or not compiled in.
This provides better support for environments where udev rules are installed
but udev_sync is not compiled in (however, using udev_sync is highly
recommended). It also provides consistent and expected functionality even
when '--noudevsync' option is used.
There is still requirement for kernel >= 2.6.31 for the flags to work though
(it uses DM cookies to pass the flags into the kernel and set them in udev
event environment that we can read in udev rules).
Peter Rajnoha [Fri, 13 Nov 2009 12:33:27 +0000 (12:33 +0000)]
Remove 'last_rule' from udev rules.
'last_rule' option has been removed from udev (version >= 147).
From now on, we require foreign rules to check and honor
ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG} instead. Foreign
rules should be skipped totally when this flag is set.
Rename pvcreate_params processing functions to better match <object><action>.
Rename fill_default_pvcreate_params to pvcreate_params_set_defaults.
Rename pvcreate_validate_restore_params to pvcreate_restore_params_validate.
Rename pvcreate_validate_params to pvcreate_params_validate.
Peter Rajnoha [Sun, 1 Nov 2009 18:01:31 +0000 (18:01 +0000)]
More cleanup in udev rules:
- add copyright notice for 10-dm.rules.in,
- set DM_UDEV_DISABLE_{DISK, OTHER}_RULES_FLAG in 11-dm-lvm.rules directly
for inappropriate and non-top-level subdevices in case we use older kernels
where DM_COOKIE is not used (and therefore there are no flags passed from
the LVM process itself). This applies for older kernels (version < 2.6.31),
- remove unnecessary filters in 95-dm-notify.rules - the DM_COOKIE env var
itself is set for change/remove udev events and for DM devices only so
there's no need to double-check this.