Add:
- missing conditions to support any types to function with
it in lv_raid_convert(); temporary workaround used until
cli validation patches get merged
- tests requesting "-R " to lvconvert-raid-takeover.sh
involving a cleanup of the script
lvconvert: add segtype raid5_n and conversions to/from it (cleanup)
Cleanups as of Jons review:
- enhance comment about mandatory raid4 <-> raid5_n activation w/o metadata SubLVs
- remove bogus segment flag setting
- fix to sync related comments on conversions to raid0/striped and amongst raid4/5
- add missing error message for non-synced conversion to raid0/striped
Add:
- support to change region size of existing RaidLVs
(all RAID LV types but raid0/raid0_meta)
- lvconvert-raid-regionsize.sh with test variations
for different RAID types and region sizes
Zdenek Kabelac [Mon, 6 Feb 2017 12:15:39 +0000 (13:15 +0100)]
tests: drop zeroing
Well waiting for zeroing may take enough time to finish 'raid' sync.
So make the test running faster without zeroing and better avoid race
to have chance to happen (i.e. lvcreate is finished after array
gets already in sync).
lvconvert: add segtypes raid6_{ls,rs,la,ra}_6 and conversions to/from it
Add:
- support for segment types raid6_{ls,rs,la,ra}_6
(striped raid with dedicated last Q-Syndrome SubLVs)
- conversion support from raid5_{ls,rs,la,ra} to/from raid6_{ls,rs,la,ra}_6
- setting convenient segtypes on conversions from/to raid4/5/6
- related tests to lvconvert-raid-takeover.sh factoring
out _lvcreate,_lvconvert funxtions
lvconvert: add segtype raid6_n_6 and conversions to/from it
Add:
- support for segment type raid6_n_6 (striped raid with dedicated last parity/Q-Syndrome SubLVs)
- conversion support from striped/raid0/raid0_meta/raid4 to/from raid6_n_6
- related tests to lvconvert-raid-takeover.sh
lvconvert: add segtype raid5_n and conversions to/from it
Add:
- support for segment type raid5_n (striped raid with dedicated last parity SubLVs)
- conversion support from striped/raid0/raid0_meta/raid4 to/from raid5_n
- related tests to lvconvert-raid-takeover.sh
Bryn M. Reeves [Sun, 11 Dec 2016 13:26:34 +0000 (13:26 +0000)]
dmstats: allow --filemap groups to be updated
Add a new update_filemap command to dmstats that allows a filemap
group to be updated:
# dmstats update_filemap --groupid 0 vm.img
/var/lib/libvirt/images/vm.img: Updated group ID 0 with 137 region(s).
This will update the set of regions mapped to the file to reflect
the current file system allocation.
Currently this needs to be run manually - a future update will add
support for monitoring file maps via a daemon, allowing them to be
automatically updated when the underlying file is modified.
Bryn M. Reeves [Sun, 11 Dec 2016 12:53:04 +0000 (12:53 +0000)]
libdm: add dm_stats_update_regions_from_fd()
Add a call to update the regions corresponding to a file mapped
group of regions. The regions to be updated must be grouped, to
allow us to correctly identify extents that have been deallocated
since the map was created.
Tables are built of the file extents, and the extents currently
mapped to dmstats regions: if a region no longer has a matching
file extent, it is deleted, and new regions are created for any
file extents without a matching region.
The FIEMAP call returns extents that are currently in-memory (or
journaled) and awaiting allocation in the file system. These have
the FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC flag bits set
in the fe_flags field - these extents are skipped until they
have a known disk location.
Since it is possile for the 0th extent of the file to have been
deallocated this must also handle the possible deletion and
re-creation of the group leader: if no other region allocation
is taking place the group identifier will not change.
Bryn M. Reeves [Wed, 25 Jan 2017 15:10:59 +0000 (15:10 +0000)]
libdm: test for DM_STATS_GROUP_NOT_PRESENT in _stats_group_id_present
If the group_id passed to _stats_group_id_present is equal to the
special value DM_STATS_GROUP_NOT_PRESENT there is no need to perform
any further tests: return false immediately.
Zdenek Kabelac [Fri, 20 Jan 2017 22:07:05 +0000 (23:07 +0100)]
dmeventd_thin: new logic for calling commands
For more advanced support we need to ensure better logic for calling
external much more advanced script for maintanance of thin-pool.
So this new code ensures:
When thin-pool data or metadata is bigger then 50%,
then with each 5% increment, action is called.
This is independent from autoextend_threshold.
This action always happens when thin-pool is over threshold,
(so no action when it's exactly i.e. 60%).
The only exception is 100% full thin-pool - which invokes 'last'
action.
Since thin-pool occupancy may change also downward, code needs
to also handle possibly reduction of occupancy of thin-pool.
So when usage drop from 90% to 50%, thin-pool will start to call
again action when it will pass 55% threshold.
This give external commands lot of option i.e. to call 'fstrim'
before actual resize is needed.
Zdenek Kabelac [Fri, 20 Jan 2017 22:06:45 +0000 (23:06 +0100)]
dmeventd_thin: drop umounting on error path
Default internal logic will stop trying to do any 'rescue' action
when executed command fails.
This will be now fully in hands of external script if such
behaviour is needed.
Zdenek Kabelac [Fri, 20 Jan 2017 21:53:55 +0000 (22:53 +0100)]
dmeventd_thin: rework failure handling
Instead of stopping monitoring after couple failing retries,
keep monitoring forever, just make larger delays between command
retries (ATM upto ~42 minutes).
So syslog is not spammed too often, yet commands have a chance to
be retried and succeed eventually...
Zdenek Kabelac [Fri, 20 Jan 2017 20:50:23 +0000 (21:50 +0100)]
dmeventd_thin: init command
When dmeventd configured command does not start with 'lvm ' prefix,
it's going to be an 'external' command.
In this case we split command by spaces to argv strings.
Zdenek Kabelac [Mon, 9 Jan 2017 15:30:49 +0000 (16:30 +0100)]
libdm: add human R|readable units
When showing sizes with 'H|human' units we do use standard rounding.
This however is confusing users from time to time,
when the printed number uses some biger units i.e. GiB and there is just
tiny fraction of space missing.
So here is some real-life example with new 'r' unit.
Meaning is - lvol1 has 'slightly' less then 2.00g - from sign '<' user
can be aware the LV doesn't have full 2.00GiB in size so he
will be less surpriced allocation of 2G volume will not succeed.
Zdenek Kabelac [Fri, 6 Jan 2017 11:41:38 +0000 (12:41 +0100)]
mirror: relax internal error for a while
With recent commit d6a74025df1afb3d76bec435bc6a40d649217b42 using
INTERNAL_ERROR while cheking layer LV - it's been noticed mirror
logic currently doesn't do a correct thing during upconversion and
does a full-try instead of checking only allocator capabilities.
This leads to invalid usage of layer.
To keep existing code running before providing a fix, relax
INTERNAL_ERROR just an error and keep the 'code' running.
Once mirror code is fixed, these all check should be switched
to internal errors.
Zdenek Kabelac [Thu, 5 Jan 2017 14:52:00 +0000 (15:52 +0100)]
report: report merged state for inactive LV
This was missing piece in 77997c7673bfca56f51ae4eb55a50bc76e40fe79.
When merging origin is inactive (while driver is loaded) we
could already report merge in progress values as there is
no way to activate 'old state' now.
Zdenek Kabelac [Thu, 5 Jan 2017 14:49:07 +0000 (15:49 +0100)]
debug: show proper error message for layer mismatch
Show proper internal error for failing command when there are some
inconsitencies in sizes of LV and its layer instead of rather
meaningless error code 5.
(Could be hit i.e. if user tried to 'resize' cached LV and then
uncache such LV.)
Zdenek Kabelac [Thu, 5 Jan 2017 14:32:25 +0000 (15:32 +0100)]
cache: resize is still unsupported
During rework of resize code this validation check
has been lost (in my resize branch). Upstream
is still not supporting resize of any cache type LV
so needs to be prevented.
Zdenek Kabelac [Tue, 3 Jan 2017 13:47:46 +0000 (14:47 +0100)]
cache: add missing udev wait
When we need to clear dirty cache content of cached LV, there
is table reload which usually is shortly followed by next metadata
change. However udev can't (as of now) process udev event
while device is 'suspended'.
So whenever sequence of 'suspend/resume/suspend' is needed,
we need to wait first for finishing of 'resume' processing before
starting next 'suspend'. Otherwise there is 'race' danger of triggering
unwantend umount by systemd as such event will trigger
SYSTEMD_READY=0 state for a moment for such changed device.
Such race is pretty ugly to trace so we may need to review more
sequencies for missing 'sync'.
(Other option is to enhnace 'udev' rules processing to avoid
such dramatic actions to be happening for suspended devices).
The only reasonable behaviour here is to error on
any number out of accepted range (i.e. now numbers
wrapping around with some hidden logic).
As this is plain bug there is no support for
backward compatibility since noone should
set numbers >UINT32_MAX and expect 0 or error
depending on how big number was used....
Zdenek Kabelac [Tue, 3 Jan 2017 12:02:52 +0000 (13:02 +0100)]
lvmcmdline: support uint32
Add simple function to wrap usage for only uint32 numbers.
Unlike 'int_arg' which accepts full range of 64bit number
this function will error on numbers out of this range:
lvchange: allow a transiently failed RaidLV to be refreshed
Add to commits 87117c2b2546 and 0b8bf73a63d8 to avoid refreshing two
times altogether, thus avoiding issues related to clustered, remotely
activated RaidLV. Avoid need to repeat "lvchange --refresh RaidLV"
two times as a workaround to refresh a RaidLV. Fix handles removal
of temporary *-missing-* devices created for any missing segments
in RAID SubLVs during activation.
Because the kernel dm-raid target isn't able to handle transiently
failing devices properly we need
"[dm-devel][PATCH] dm raid: fix transient device failure processing"
as well.
test: add lvchange-raid-transient-failures.sh
and enhance lvconvert-raid.sh
Zdenek Kabelac [Thu, 22 Dec 2016 22:28:04 +0000 (23:28 +0100)]
thin: refresh status when error processing fails
When thin-pool processes event and 'lvextend --use-policies' fails
rather capture up-to-date new info as the fullness percentage may
have jumped noticable. This way we could use 'more' correct numbers
when checking for thresholds.
Zdenek Kabelac [Thu, 22 Dec 2016 18:51:35 +0000 (19:51 +0100)]
report: show proper info for merging origin
When there is 'merging' of an origin in progress, but metadata stil
do provide both origin and snapshot, we should show data from merged
snapshot. This is important mainly for thin case, where there was
a window, where i.e. 'lvs -o+device_id' would report information
about 'already gone' origin thin LV.
This race window is usually hard to trigger but can be ocasionally hit.
Usually shortly after activation, but before polling process manages
to update metadata after merge.
Zdenek Kabelac [Tue, 20 Dec 2016 14:59:11 +0000 (15:59 +0100)]
validation: rework segment validation
Move individual segment validation to a separate function
executed for 'complete_vg'.
Move some 'extra' validation bits from 'raid' validation to global
segtype validation (so extending existing normal validation)
TODO: still some test are left to be moved.
Reduce some duplication in validation process - there are still
some left thought so still room for improving overal speed.
Tony Asleson [Wed, 14 Dec 2016 21:30:01 +0000 (15:30 -0600)]
lvmdbusd: Use timeout_add instead
The function timeout_add_seconds has quite a bit of variability. Using
timeout_add which specifies the timeout in ms instead of seconds. Testing
shows that this is much more consistent which should improve clients that
are using shorter timeouts for the API and the connection.
Zdenek Kabelac [Mon, 19 Dec 2016 13:06:55 +0000 (14:06 +0100)]
lvconvert: fix shown lv name for snapshot split
We can't keep 'display_lvname' for too long - it's using
ringbuffer and keeps limited number of names. So it's
safe only per few simple tests, but can't be used anymore
after some function calls..
(Fixes 00e641ef37a977129acc503f3fa1b67f556ac5eb)
Bryn M. Reeves [Thu, 15 Dec 2016 19:03:42 +0000 (19:03 +0000)]
libdm: clear region table in dm_stats_list()
Call _stats_regions_destroy() from dm_stats_list() if dms->regions
is non-NULL. This avoids leaking any pool allocations and ensures
the handle is in a known state: if an error occurs during the list,
dms->regions will be NULL and the handle will appear empty.
Zdenek Kabelac [Sat, 17 Dec 2016 21:41:27 +0000 (22:41 +0100)]
lvconvert: support cache to external origin conversion
Add this functionality to lvconvert:
'lvconvert --thin cachedLV --thinpool vg/poll'
Converts cachedLV to external origin (which will be read-only).
New thin volume is created in thinpool LV and it's using external
origin as source for unprovisioned chunks.
This conversion happens online (while volume is in use).
Thin LV remains fully writable.
Cached external origin no longer could be written so cache will be used
ONLY for read operations. For this limitation we require cache mode
to be writethrough (as writeback cannot write to read-only volumes).
When thinLV is later removed cached external origin is again
fully usable, just note, LV remain in 'read-only' mode.
When read-write is needed, 'lvchange -prw' has to be used.
Single external origin could be user by multiple thinLV in
multiple differen thin pool.
Zdenek Kabelac [Sun, 18 Dec 2016 14:05:31 +0000 (15:05 +0100)]
cache: improve activation with -real
When cache volume may be converted from normal to -real layer LV
we need to improve logic for call cache_check.
With this patch, we register call for cache_check only when metadata LV
is not yet present in active table slot (should match initial table
load).
This avoids unwanted checking when cache would become layer device
online.
Zdenek Kabelac [Sat, 17 Dec 2016 21:40:59 +0000 (22:40 +0100)]
libdm: drop callback on revert path
The system is likely in some very inconsisten state.
Do not try to make it even more problematic with trying
to invoke tools like thin_check via callback.
Zdenek Kabelac [Sat, 17 Dec 2016 21:40:14 +0000 (22:40 +0100)]
lv: fix lock holder for external origin
External origin could be reloaded via more locks.
It's actually even more complex then thin-pool,
as it may be active on more nodes for linear LVs
(and maybe even more types).
External origin is always read-only thus unmodifiable
device so there should not be a problem accesing it
through multiple nodes.
Also for thin-pool check first presence of active thin-pool.
FIXME:
It's not easy to detect on which nodes this device is active
Thus manipulation with such device may require checking every
node and it active state and refresh.
But since such setup is quite complex to prepare and use,
hopefully there are not user trying to 'explore' this usage yet.
Zdenek Kabelac [Sat, 17 Dec 2016 20:52:27 +0000 (21:52 +0100)]
cache: prepare status checking for layer
To be ready to show status of cache volume, call the status
with layer. Layer is automatically detected in this case when
cache volume is used in 'layered' form (needs -real suffix).