Zdenek Kabelac [Tue, 13 Jan 2015 14:23:03 +0000 (15:23 +0100)]
thin: errrorwhenfull support
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
Zdenek Kabelac [Wed, 14 Jan 2015 09:31:24 +0000 (10:31 +0100)]
cleanup: update API for segment reporting
API for seg reporting is breaking internal lvm coding - it cannot
use vgmem mem pool for allocation of reported value.
So use separate pool instead of 'vgmem' for non vg related allocations
Add consts for many function params - but still many other are left
for now as non-const - needs deeper level of change even on libdm side.
raid_manip: fix multi-segment misallocation on 'lvconvert --repair'
An 'lvconvert --repair $RAID_LV" to replace a failed leg of a multi-segment
RAID10/4/5/6 logical volume can lead to allocation of (parts of) the replacement
image component pair on the physical volume of another image component
(e.g. image 0 allocated on the same PV as image 1 silently impeding resilience).
Patch fixes this severe resilince issue by prohibiting allocation on PVs
already holding other legs of the RAID set. It allows to allocate free space
on any operational PV already holding parts of the image component pair.
Peter Rajnoha [Mon, 12 Jan 2015 14:16:57 +0000 (15:16 +0100)]
tests: pvscan --cache DevicePath does not fail if the device is just filtered
It's not an error if the device is filtered out and hence cleared from
lvmetad cache - "pvscan --cache DevPath" has now the same behaviour in
this case as "pvscan --cache major:minor" (which is more consistent).
Before, the tests expected failure return code for "pvscan --cache DevicePath"
if the device was filtered (which is a different situation if the device
is missing in the system completely!).
Peter Rajnoha [Mon, 12 Jan 2015 13:02:57 +0000 (14:02 +0100)]
dev-type: filter out partitioned device-mapper devices as unsuitable for use as PVs
Normally, if there are partitions defined on top of device-mapper
device, there should be a device-mapper device created for each
partiton on top of the old one and once the underlying DM device
is used by another devices (partition mappings in this case),
it can't be used as a PV anymore.
However, sometimes, it may happen the partition mappings are
missing - either the partitioning tool is not creating them if
it does not contain full support for device-mapper devices or
the mappings were removed.
Better safe than sorry - check for partition header on DM devs
and filter them out as unsuitable for PVs in case the check is
positive. Whatever the user is doing, let's do our best to prevent
unwanted corruption (...by running pvcreate on top of such device
that would corrupt the partition header).
Peter Rajnoha [Mon, 12 Jan 2015 12:50:11 +0000 (13:50 +0100)]
pvscan: notify lvmetad about device that is gone and pvscan is run with device path instead of major:minor pair
If pvscan is run with device path instead of major:minor pair and this
device still exists in the system and the device is not visible anymore
(due to a filter that is applied), notify lvmetad properly about this.
This makes it more consistent with respect to existing pvscan with
major:minor which already notifies lvmetad about device that is gone
due to filters.
However, if the device is not in the system anymore, we're not able
to translate the original device path into major:minor pair which
lvmetad needs for its action (lvmetad_pv_gone fn). So in this case,
we still need to use major:minor pair only, not device path. But at
least make "pvscan --cache DevicePath" as near as possible to "pvscan
--cahce <major>:<minor>" functionality.
Also add a note to pvscan man page about this difference when using
pvscan --cache with DevicePath and major:minor pair.
Peter Rajnoha [Fri, 9 Jan 2015 15:34:16 +0000 (16:34 +0100)]
scripts: clvmd: replace awk functionality with LVM's selection
No need to use awk now to get appropriate VGs/LVs, use LVM's
own --select - it's quicker, it removes a need for external
dependency on awk and it's also more readable.
Peter Rajnoha [Fri, 9 Jan 2015 13:04:44 +0000 (14:04 +0100)]
metadata: add "Failed to write VG <vg_name>." on failed vg_write and revert previous patch
Better than previous patch which changed log_warn to log_error -
we can have multiple MDAs and if one of them fails to be written,
we can still continue with other MDAs if we're in a mode where
we can handle missing PVs - so keep the log_warn for single
failed MDA write as it was before.
However, add log_error with "Failed to write VG <vg_name>." in
case we're not handling missing PVs or no MDA was written at all
during VG write process. This also prevents an internal error in
which the vg_write fails and we're not issuing any other log_error
in vg_write caller or above, so we end up with:
"Internal error: Failed command did not use log_error".
Peter Rajnoha [Fri, 9 Jan 2015 10:24:16 +0000 (11:24 +0100)]
dev_manager: do not mark snapshot origins as unusable devices just because of possible blocked mirror underneath
At first, all snapshot-origins where marked as unusable unconditionally
here, but we can't cut off whole snapshot-origin use in a stack just
because of this possible mirror state. This whole "device_is_usable"
check was even incorrectly part of persistent filter before commit a843d0d97c66aae1872c05b0f6cf4bda176aae2 (where filter cleanup was
done).
The persistent filter is used only if obtain_device_list_from_udev=0,
which means that the former check for snapshot-origin here had not even
been hit with default configuration for a few years before commit a843d0d97c66aae1872c05b0f6cf4bda176aae2 (the check for snapshot-origin and
skipping of this LV was introduced with commit a71d6051ed3d72af6895733c599cc44b49f24dbb
back in 2010).
The obtain_device_list_from_udev=1 (and hence not using persistent
filter and hence not hitting this check for snapshot-origins and skipping) has been
in action since commit edcda01a1e18af6599275801a8237fe10112ed6f (that is 2011).
So for 3 years this condition was not even checked with default configuration,
making it superfluous.
This all changed in 2014 with commit 8a843d0d97c66aae1872c05b0f6cf4bda176aae2
where "filter-usable" is introduced and since then all snapshot-origins
have been marked as unusable more often than before and making snapshot-origins
practically unusable in a stack.
If we want to fix this eventually in a correct way, we need to look
down the stack and if snapshot-origin is hit and there's a blocked
mirror underneath, only then mark the device as unusable. But mirrors
in stack are not supported anymore so it's questionable whether it's
worth spending more time on this at all...
Before:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_mimage_0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_mimage_1_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Incorrect name: lvol0_mimage_0_rimage_0
With this patch applied:
$ lvs -a vg
LV VG Active Attr LSize Cpy%Sync Layout Role
lvol0 vg active rwi-a-r--- 4.00m 100.00 raid,raid1 public
[lvol0_rimage_0] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rimage_1] vg active iwi-aor--- 4.00m linear private,raid,image
[lvol0_rmeta_0] vg active ewi-aor--- 4.00m linear private,raid,metadata
[lvol0_rmeta_1] vg active ewi-aor--- 4.00m linear private,raid,metadata
Peter Rajnoha [Wed, 7 Jan 2015 09:52:38 +0000 (10:52 +0100)]
mirror: do not try to reactivate inactive mirror when removing its LVs which have missing PVs
When mirror has missing PVs and there are mirror images on those missing
PVs, we delete the images and during this delete operation, we also
reactivate the LV. But if we're trying to reactivate the LV in cluster
which is not active and at the same time cmirrord is not running (which
is OK since we may have created the mirror LV as inactive), we end up
with:
"Error locking on node <node_name>: Shared cluster mirrors are not available."
That is because we're trying to activate the mirror LV without cmirrord.
However, there's no need to do this reactivation if the mirror LV (and
hence it's sub LVs) were not activated before.
This issue caused failure in mirror-vgreduce-removemissing.sh test
recently with this sequence (excerpt from the test script):
The important thing about that test is that we're not running cmirrord,
we're activating the mirror with "-an" so it's inactive and then
vgreduce --removemissing tries to reactivate the mirror images
as part of the _delete_lv function call inside and since cmirrord
is not running, we end up with the "Shared cluster mirrors are not
available." error.
Peter Rajnoha [Tue, 6 Jan 2015 08:59:04 +0000 (09:59 +0100)]
cmirror: do not check for cmirror availability when creating deactivated cluster mirrors
When creating cluster mirrors while they're not supposed to be activated
immediately after creation, we don't need to check for cmirrord availability.
We can just create these mirrors and let the check to be done on activation
later on. This is addendum for commit cba6186325f0d5806cf1ddec276b3bb8e178687a.
Peter Rajnoha [Mon, 5 Jan 2015 15:45:30 +0000 (16:45 +0100)]
cmirror: check for cmirror availability during cluster mirror creation and activation
When creating/activating clustered mirrors, we should have cmirrord
available and running. If it's not, we ended up with rather cryptic
errors like:
$ lvcreate -l1 -m1 --type mirror vg
Error locking on node 1: device-mapper: reload ioctl on failed: Invalid argument
Failed to activate new LV.
$ vgchange -ay vg
Error locking on node node 1: device-mapper: reload ioctl on failed: Invalid argument
This patch adds check for cmirror availability and it errors out
properly, also giving a more precise error messge so users are able
to identify the source of the problem easily:
$ lvcreate -l1 -m1 --type mirror vg
Shared cluster mirrors are not available.
$ vgchange -ay vg
Error locking on node 1: Shared cluster mirrors are not available.
Exclusively activated cluster mirror LVs are OK even without cmirrord:
$ vgchange -aey vg
1 logical volume(s) in volume group "vg" now active
Peter Rajnoha [Fri, 19 Dec 2014 08:17:16 +0000 (09:17 +0100)]
libdm: report: add more comments about helper macros to get reserved values
Since GET_FIELD_RESERVED_VALUE always returns a pointer, don't reference
it with "&" when used - we already have that pointer value (this is an
addendum to recent commit 028ff309472834e82fe4b849ea4c243feb5098b9).
Only GET_TYPE_RESERVED_VALUE needs to be referenced with "&" as it
returns directly the value of that type.
Peter Rajnoha [Thu, 18 Dec 2014 15:12:50 +0000 (16:12 +0100)]
report: fix segfault on NULL value hit in cache_settings field
We have to use empty list, not NULL if we want to denote that the list
has no items. Otherwise, the code further can segfault as it expects
there's always a sane value (= some list), including empty list,
but never NULL.
- print "" if the cache_policy value is undefined (the first name for this reserved value is "")
- recognize "undefined" reserved name as synonym to ""
(so statements like "lvs -S cache_policy=undefined" are still recognized)
Peter Rajnoha [Thu, 18 Dec 2014 13:52:16 +0000 (14:52 +0100)]
cleanup: use helper macros to get reserved value from values.h for vg_mda_copies and lv_read_ahead fields
Avoid making a copy of the keyword which is already registered in
values.h for "unmanaged" (vg_mda_copies field) and "auto" reserved
value (lv_read_ahead field). Also use helper macros to handle these
reserved - this is the correct approach - just do not copy the same
thing again and do not mix it! The GET_FIELD_RESERVED_VALUE and
GET_FIRST_RESERVED_NAME macros guarantees this - use it!
In addition to that, rename reserved values:
vg_mda_copies --> vg_mda_copies_unmanaged
lv_read_ahead --> lv_read_ahead_auto
So the field reserved values follows this scheme:
"<field_name>_<reserved_value_name>".
The same applies for type reserved values with this scheme:
"<report type name in lowercase>_<reserved_value_name>"
Add a comment about this scheme for others to follow as well
when adding new fields and their reserved values. This makes
it a bit easier to read the code then.
Also add GET_FIELD_RESERVED_VALUE(id) macro to get per-field reserved value.
This makes it much more readable and hopefully it'll make it
easier to use these helper macros when adding new reporting
fields with reserved values if needed.
Peter Rajnoha [Thu, 18 Dec 2014 10:54:40 +0000 (11:54 +0100)]
report: dup cache policy name string for report in cache_policy field
The cache policy name taken as LV segment property must be duped
for report as the VG/LV/seg structure is destroyed after processing,
reporting happens later:
$ valgrind lvs -o+cache_policy
...
==16589== Invalid read of size 1
==16589== at 0x54ABCC3: dm_report_compact_fields
(libdm-report.c:1739)
==16589== by 0x153FC7: _report (reporter.c:619)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
==16589== Address 0x7d465f2 is 8,338 bytes inside a block of size
16,384 free'd
==16589== at 0x4C2ACE9: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==16589== by 0x54B8C85: _free_chunk (pool-fast.c:318)
==16589== by 0x54B84FB: dm_pool_destroy (pool-fast.c:78)
==16589== by 0x1E59C7: _free_vg (vg.c:78)
==16589== by 0x1E5A6D: release_vg (vg.c:95)
==16589== by 0x159B6E: _process_lv_vgnameid_list (toollib.c:1967)
==16589== by 0x159DD7: process_each_lv (toollib.c:2030)
==16589== by 0x153ED8: _report (reporter.c:598)
==16589== by 0x1540A6: lvs (reporter.c:641)
==16589== by 0x148021: lvm_run_command (lvmcmdline.c:1452)
==16589== by 0x1495CB: lvm2_main (lvmcmdline.c:1907)
==16589== by 0x164712: main (lvm.c:21)
Peter Rajnoha [Thu, 18 Dec 2014 10:29:48 +0000 (11:29 +0100)]
libdm: report: also check whether field type is supported for field-specific reserved value
We only checked global per-report-type reserved values for compatibility
with selection code. This patch also adds a check for per-report-field
reserved values. This avoids problems where unsupported report type is
used as reserved value which could cause hard to debug problems
otherwise. So this additional check stops from registering unsupported
and unhandled per-field reserved values.
Registerting such unsupported reserved value is a programmatic error,
so report internal error in this case to stop us from making a mistake
here in the future or even today where STR_LIST fields can't have
reserved values yet.
Peter Rajnoha [Thu, 11 Dec 2014 14:16:04 +0000 (15:16 +0100)]
tools, man: --binary option is available with -C for {pv,vg,lv}display
The {pv,vg,lv}display *do* use reporting in case "-C|--columns" is used.
The man page was correct, the recognition for the --binary was missing
in the code though!
Peter Rajnoha [Thu, 11 Dec 2014 09:40:28 +0000 (10:40 +0100)]
man: remove reference to --binary in {pv,vg,lv}display man page
The {pv,vg,lv}display commands don't use reporting capabilites and
as such they can't use --binary. This got into the man pages by
mistake - the display commands do not recognize --binary option.
Peter Rajnoha [Wed, 10 Dec 2014 12:43:54 +0000 (13:43 +0100)]
vgimportclone: also notify lvmetad about changes if it's used
All the LVM commands are run in mode without lvmetad use (since lvmetad
can't handle duplicates). When we're finished with vgimportclone, we
need to notify lvmetad about changes.
Before this patch (/dev/sda and /dev/sdb contains a copy VG called "vg"):
$ vgimportclone --basevgname vg_snap /dev/sdb
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Physical volume "/tmp/snap.zcJ8LCmj/vgimport0" changed 1 physical volume changed / 0 physical volumes not changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: Activation disabled. No device-mapper interaction will be attempted.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully changed
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Reading all physical volumes. This may take a while...
Found volume group "vg" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
With this patch applied:
$ vgimportclone --basevgname vg_snap /dev/sdb
...
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Volume group "vg" successfully renamed to "vg_snap"
Notifying lvmetad about changes since it was disabled temporarily.
Reading all physical volumes. This may take a while...
Found volume group "vg_snap" using metadata type lvm2
Found volume group "fedora" using metadata type lvm2
Found volume group "vg" using metadata type lvm2
The "restart lvmetad before enabling it" message is a bit misleading
here - we should probably suppress this one, but we can't suppress
warning messages selectively at the moment and we don't want to lose
other warning/error messages printed...
Peter Rajnoha [Wed, 10 Dec 2014 12:25:23 +0000 (13:25 +0100)]
vgimportclone: replace awk with dumpconfig to generate temporary lvm.conf for vgimportclone
With current dumpconfig, we can generate lvm.conf easily - we can merge
current lvm.conf with the config given on cmd line:
lvm dumpconfig --mergedconfig --config "..."
This is a bit simpler than using awk and it also avoids problems when some of
the configuration is missing in existing lvm.conf file and hardcoded defaults
are used instead. The dumpconfig handles this transparently.
The problem here is the use of --ubuffered together with regex used in
selection criteria. If the report output is not buffered, each row is
discarded as soon as it is reported. The bug is in the use of report
handle's memory - in the example above, what happens is:
1) report handle is initialized together with its memory pool
2) selection tree is initialized from selection criteria string
(using the report handle's memory pool!)
2a) this also means the regex is initialized from report handle's mem pool
3) the object (row) is reported
3a) any memory needed for output is intialized out of report handle's mem pool
3b) selection criteria matching is executed - if the regex is checked the
very first time (for the very first row reported), some more memory
allocation happens as regex allocates internal structures "on-demand",
it's allocating from report handle's mem pool (see also step 2a)
4) the report output is executed
5) the object (row) is discarded, meaning discarding all the mem pool
memory used since step 3.
Now, with step 5) we have discarded the regex internal structures from step 3b.
When we execute reporting for another object (row), we're using the same
selection criteria (step 3b), but tihs is second time we're using the regex
and as such, it's already initialized completely. But the regex is missing the
internal structures now as they got discarded in step 5) from previous
object (row) reporting (because we're using "unbuffered" reporting).
To resolve this issue and to prevent any similar future issues where each
object/row memory is discarded after output (the unbuffered reporting) while
selection tree is global for all the object/rows, use separate memory pool
for report's selection.
This patch replaces "struct selection_node *selection_root" in struct
dm_report with new struct selection which contains both "selection_root"
and "mem" for separate mem pool used for selection.
We can change struct dm_report this way as it is not exposed via libdevmapper.
(This patch will have even more meaning for upcoming patches where selection
is used even for non-reporting commands where "internal" reporting and
selection criteria matching happens and where the internal reporting is
not buffered.)
Zdenek Kabelac [Thu, 4 Dec 2014 13:24:44 +0000 (14:24 +0100)]
raid: properly rename split image
When we split leg from raid - we take a proper new lock for a new LV.
However for now activation checks only 'existince' of device UUID,
but it's not validating device has a proper name.
As a quick fix call suspend()/resume() to rename after split mirror.
Peter Rajnoha [Fri, 5 Dec 2014 10:42:43 +0000 (11:42 +0100)]
libdm: report: add dm_report_compact_fields
Add new dm_report_compact_fields function to cause report outout
(dm_report_output) to ignore fields which don't have any value set
in any of the rows reported. This provides support for compact report
output where only fields which have something to report are displayed.
The dm_report_set_output_selection was not implemented in the end -
we have dm_report_init_with_selection instead. This is just a remnant
from development code that got into libdevmapper.h by mistake.
Peter Rajnoha [Wed, 26 Nov 2014 10:30:01 +0000 (11:30 +0100)]
coverity: remove dead code in lv_info_with_seg_status
Just call return 0 directly on error path, without using
"goto" - the code is short, no need to use it this way
(the dead code appeared as part of further changes in this
function).
Zdenek Kabelac [Wed, 26 Nov 2014 08:27:40 +0000 (09:27 +0100)]
thin: add missing 64KB rounding
When chunk size needs to be estimated, the code missed to round
to proper 64kb boundaries (or power of 2 for older thin pool driver).
So for some data and metadata size (i.e. 10GB and 4MB) it resulted
in incorrect chunk size (not being a multiple of 64KB)
Fix it by adding proper rounding and also use 1 routine for 2 places
where the same calculation is made.
Fix also incorrect printed warning that has used 'ffs()'
(which returns first 'least significant' bit in word)
and it was not really giving any useful size info and replace it
with properly estimated chunk size.