Mike Snitzer [Wed, 10 Feb 2010 14:38:24 +0000 (14:38 +0000)]
Add 'fail_if_percent_unsupported' arg to _percent() and _percent_run().
We unfortunately don't yet _know_, in dev_manager_snapshot_percent(), if
a snapshot-merge target is active (activation is deferred if dev is
open); so we can't short-circuit origin devices based purely on existing
LVM LV attributes.
Set 'fail_if_percent_unsupported' in dev_manager_snapshot_percent() for
a merging origin LV, otherwise passing unsupported LV types to _percent
will lead to a default successful return with percent_range as
PERCENT_100.
For a merging origin, PERCENT_100 will result in a polldaemon that runs
infinitely (because completion is PERCENT_0).
Mike Snitzer [Mon, 8 Feb 2010 23:28:06 +0000 (23:28 +0000)]
Remove false "failed to find tree node for <lv>" error from _cached_info().
When activating a merging origin it is valid, and expected, to not have
a node in the deptree for both the origin and its merging snapshot. The
_cached_info() caller is only concerned with whether a device is open.
If there isn't a node in the tree the associated device is definitely
not open.
Mike Snitzer [Fri, 5 Feb 2010 22:47:22 +0000 (22:47 +0000)]
Switch lvconvert_single() over to using get_vg_lock_and_logical_volume()
This change was deferred to help ease the review of previous refactoring
related to using process_each_lv() for lvconvert's merge support. Not
that doing so _really_ helped but...
Mike Snitzer [Fri, 5 Feb 2010 22:44:37 +0000 (22:44 +0000)]
lvconvert --merge @tag support
Switch lvconvert's --merge code over to using process_each_lv(). Doing
so adds support for a single 'lvconvert --merge' to start merging
multiple LVs (which includes @tag expansion).
Add 'lvconvert --merge @tag' testing to test/t-snapshot-merge.sh
Adjust man/lvconvert.8.in to reflect these expanded capabilities.
The lvconvert.c implementation requires rereading the VG each iteration
of process_each_lv(). Otherwise a stale VG instance associated with
the LV passed to lvconvert_single_merge() would result in stale VG
metadata being written back out to disk. This overwrote new metadata
that was written when a previous snapshot LV finished merging (via
lvconvert_poll). This is only an issue when merging multiple LVs that
share the same VG (a single VG is typical for most LVM configurations on
system disks).
In the end this new support is very useful for performing a "system
rollback" that requires multiple snapshot LVs be merged to their
respective origin LV.
The yum-utils 'fs-snapshot' plugin tags all snapshot LVs that it creates
with a common 'snapshot_tag' that is unique to the yum transaction.
Rolling back a yum transaction, that created LVM snapshots with the tag
'yum_20100129133223', is as simple as:
lvconvert --merge @yum_20100129133223
Adding a new mimage (leg/copy) to a mirror behaves differently
depending on if the mirror has a 'core' or 'disk' log. When there
is a disk log, the new leg is added by stacking a new mirror on
top of the old (one leg is the old mirror and the other leg is the newly
added device). When the log is a 'core' log, the new leg is simply added
to the existing mirror and all the devices are re-synced.
The logic that handles collapsing the stacked 'disk' log mirror was
having the effect of causing 'core' logged mirrors to begin resync'ing
for a second time. I have used the 'CONVERTING' flag to indicate that
a mirror is converting by way of stacking. This is no longer set for
up-converting core logs. The final 'collapse' logic can safely be skipped
for 'core' log mirrors - getting rid of the second resync.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Peter Rajnoha [Wed, 3 Feb 2010 14:08:39 +0000 (14:08 +0000)]
This is related to liblvm and its lvm_list_vg_names() and lvm_list_vg_uuids() functions
where we should not expose internal VG names/uuids (the ones with "#" prefix )through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
Mike Snitzer [Wed, 3 Feb 2010 03:58:08 +0000 (03:58 +0000)]
Add %ORIGIN support to lv{create,extend,reduce,resize} --extents option
Allow the number of logical extents to be expressed (for a snapshot) as
a percentage of the total space in the Origin Logical Volume with the
suffix %ORIGIN.
Update the relevant man pages accordingly. Eliminate inconsistencies
between the man pages and tools/commands.h
Was using dm_list_iterate_items when I should have been using
*_safe. This had the effect of segfaulting the log daemon when
converting a mirror from one log type to another.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Milan Broz [Wed, 27 Jan 2010 13:29:11 +0000 (13:29 +0000)]
Fix pvmove abort when temporary mirror fails to be cluster-aware.
When activation of pvmove mirror fails on cluster, some nodes
still possibly succeeded in activation.
- Explicitly deactivate that mirror to be sure
- properly pair suspend/resume calls to not cause memory lock problems in clvmd
Code cannot simply call _finish_pvmove on cluster in this situation, because
changed LVs are suspended twice (causing memory inbalance) and also temporary
mirror is activated when it is not expected (and we know that it failed already).
Patch prepares special function which remove temporary mirror references from
metadata and then resumes changed LVs.
Mike Snitzer [Fri, 22 Jan 2010 21:59:42 +0000 (21:59 +0000)]
Default to checking LV's progress before waiting in _wait_for_single_lv.
Support "wait before testing" using '+' in pvmove and lvconvert
interval. Doing so overrides the new default of sleeping after checking
the LV's progress.
Sleeping before checking progress can lead to extraneous polldaemons
being left running. These polldaemons would have otherwise exited had
they checked before sleeping. Checking progress before sleeping helps
workaround the subtly unreliable nature of "finished" state checking
in _percent_run.
Update test/t-mirror-names.sh to use '+' when providing its lvconvert
interval.
Dave Wysochanski [Thu, 21 Jan 2010 21:04:44 +0000 (21:04 +0000)]
Remove useless memory allocation for pv->vg_name in _alloc_pv().
All this seems to do is provide a memory leak so remove it.
The only caller of _alloc_pv() later explicitly sets
pv->vg_name = fmt->orphan_vg_name so clearly this allocation
should be removed. I also saw no where in the code where
strncpy was used to assign pv->vg_name - only direct assignments
and strdup's.
zkabelac [Thu, 21 Jan 2010 13:41:39 +0000 (13:41 +0000)]
Reset released pointer and counters.
DSO is currently not dl_close-ing pluing during it is unregister handling,
so clear structure and related counter, so there are no memory problems.
Futher fixes are needed.
Mike Snitzer [Wed, 20 Jan 2010 21:53:10 +0000 (21:53 +0000)]
Preload the origin prior to suspend IFF snapshot(s) still exist after a
merge completes. This narrows the scope of this "hack" (which still
needs a proper fix within the deptree).
This stops dmeventd from trying to access snapshot devices that were
already removed.
Mike Snitzer [Tue, 19 Jan 2010 16:44:57 +0000 (16:44 +0000)]
Add a common way to establish a scsi_debug-based 4K drive for use by an
LVM2 test (rather than using the traditional loop device).
prepare_scsi_debug_dev currently assumes exclussive access to the
scsi_debug module. Any script that tries to use prepare_scsi_debug_dev
when scsi_debug is unavailable or already loaded into the kernel will be
skipped.
t-topology-support.sh shows how prepare_scsi_debug_dev function can be
used repeatedly (within a script) to test LVM2 ontop of a ramdisk-based
SCSI device w/ arbitrary scsi_debug features.
Mike Snitzer [Tue, 19 Jan 2010 15:59:34 +0000 (15:59 +0000)]
update test/t-pvcreate-operation-md.sh attempt loading raid0.ko if raid0
isn't already available (in /proc/mdstat).
switch to requiring 2.6.33 for the alignment_offset tests; 2.6.{31,32}
alignment_offset values aren't reliable. 2.6.33 _should_ have mkp's
alignment_offset fixes but so far it doesn't (as of 2.6.33-rc4).
Mike Snitzer [Fri, 15 Jan 2010 22:58:25 +0000 (22:58 +0000)]
Change dev_manager_mirror_percent()'s 'struct logical_volume *' to be
'const'. Be consistent with its use (and dev_manager_snapshot_percent()).
Pass 'lv' from dev_manager_snapshot_percent() to _percent() to
_percent_run(). _percent_run() always dereferenced 'lv' (when
initializing segh) even though it may have been NULL (as was the case
until now for dev_manager_snapshot_percent()).
If a "snapshot-origin" LV (snapshot-merge whose merge was deferred
becuase it was open) was passed to _percent_run() it would always return
100%.
Update _percent_run() to NOT return PERCENT_100 et. al. if
->target_percent() wasn't ever called and supplied 'lv' is a merging
origin. A default return of 100% does not work for snapshot-merge.
Also tweak a related lvconvert log_error() to include "Aborting merge."
When moving the cluster log server into the LVM tree, the in memory
bitmap tracking was switched from the e2fsprogs implementation to
the device-mapper implementation (dm_bitset_t). The latter has a
leading uin32_t field designed to hold the number of bits that are
being tracked. The code was not properly handling this change in
all places. Specifically, when getting the bitmap to/from disk.
Endian adjustments will likely need to be made on the accounting
field as well, since bitmaps are passed between machines on
start-up.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Off-by-one count was causing not all the mirror table parameters
that were necessary to be passed on to userspace.
The cluster mirror table (log portion only) used to look like this:
clustered-disk <parm_count> <disk> <region_size> <uuid> \
[[no]sync] [block_on_error]
Now it looks like this:
userspace <parm_count> <uuid> clustered-disk <disk> <region_size> \
[[no]sync]
So, there is one extra argument in the latter case - this was
unaccounted for.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Mike Snitzer [Wed, 13 Jan 2010 01:56:18 +0000 (01:56 +0000)]
Rename segment and lv status flag from SNAPSHOT_MERGE to MERGING.
Eliminate 'merging_snapshot' from 'struct logical_volume' and just use
'snapshot' for origin lv's reference to the merging snapshot; also set
MERGING in the origin lv's status.
Mike Snitzer [Wed, 13 Jan 2010 01:54:34 +0000 (01:54 +0000)]
Merge on activate support.
If either the origin or snapshot that is to be merged is open the merge
will not start; only the merge metadata will be written. The merge will
start on the next activation of the origin (or via lvchange --refresh)
IFF both the origin and snapshot are closed.
Merge on activate is particularly important if we want to merge over a
mounted filesystem that cannot be unmounted (until next boot) --- for
example root.
Mike Snitzer [Wed, 13 Jan 2010 01:52:58 +0000 (01:52 +0000)]
When turning merging origin into non-merging origin, there is bad sequence:
snapshots are suspended, new origin is created, snapshots are resumed, new
origin is resumed. So it allocates memory while suspended.
To fix it, move vg_commit after suspend_lv, so that the suspend code will
treat it as precommitted vg and will preload new origin prior to suspend.
NOTE: agk doesn't like this "hack"; need to revisit and fix
Mike Snitzer [Wed, 13 Jan 2010 01:49:22 +0000 (01:49 +0000)]
When there is merging snapshot, report percentage on the origin LV.
Because the snapshot LV will be hidden this is needed so the user can
see merging progress with "lvs" command.
Mike Snitzer [Wed, 13 Jan 2010 01:48:38 +0000 (01:48 +0000)]
Report merging snapshot as 'S' instead of 's':
This is useful for when the snapshot is still active and merging hasn't
started yet; it shows a merge is pending. Once merging starts the
merging snapshot will be hidden but can still be displayed with 'lvs -a'
Report snapshot origin with merging snapshot as 'O' instead of 'o':
Before merge starts this shows that a merge is pending. While merging
the snapshot will be hidden, 'O' enables a user to see that there is a
snapshot merging.
Mike Snitzer [Wed, 13 Jan 2010 01:44:37 +0000 (01:44 +0000)]
Merging device is loaded with "-cow" suffix and with base name of the
origin. This is needed so that "-cow" device can be found and removed
when lvremove is performed.
Mike Snitzer [Wed, 13 Jan 2010 01:39:44 +0000 (01:39 +0000)]
Add support for "snapshot-merge" target.
Introduces new libdevmapper function dm_tree_node_add_snapshot_merge_target
Verifies that the kernel (dm-snapshot) provides the 'snapshot-merge'
target.
Activate origin LV as snapshot-merge target. Using snapshot-origin
target would be pointless because the origin contains volatile data
while a merge is in progress.
Because snapshot-merge target is activated in place of the
snapshot-origin target it must be resumed after all other snapshots
(just like snapshot-origin does) --- otherwise small window for data
corruption would exist.
Ideally the merging snapshot would not be activated at all but if it is
to be activated (because snapshot was already active) it _must_ be done
after the snapshot-merge. This insures that DM's snapshot-merge target
will perform exception handover in the proper order (new->resume before
old->resume). DM's snapshot-merge does support handover if the reverse
sequence is used (old->resume before new->resume) but DM will fail to
resume the old snapshot; leaving it suspended.
To insure the proper activation sequence dm_tree_activate_children() was
updated to accommodate an additional 'activation_priority' level. All
regular snapshots are 0, snapshot-merge is 1, and merging snapshot is 2.