David Teigland [Tue, 13 Nov 2018 21:00:11 +0000 (15:00 -0600)]
Place the first PE at 1 MiB for all defaults
. When using default settings, this commit should change
nothing. The first PE continues to be placed at 1 MiB
resulting in a metadata area size of 1020 KiB (for
4K page sizes; slightly smaller for larger page sizes.)
. When default_data_alignment is disabled in lvm.conf,
align pe_start at 1 MiB, based on a default metadata area
size that adapts to the page size. Previously, disabling
this option would result in mda_size that was too small
for common use, and produced a 64 KiB aligned pe_start.
. Customized pe_start and mda_size values continue to be
set as before in lvm.conf and command line.
. Remove the configure option for setting default_data_alignment
at build time.
. Improve alignment related option descriptions.
. Add section about alignment to pvcreate man page.
Previously, DEFAULT_PVMETADATASIZE was 255 sectors.
However, the fact that the config setting named
"default_data_alignment" has a default value of 1 (MiB)
meant that DEFAULT_PVMETADATASIZE was having no effect.
The metadata area size is the space between the start of
the metadata area (page size offset from the start of the
device) and the first PE (1 MiB by default due to
default_data_alignment 1.) The result is a 1020 KiB metadata
area on machines with 4KiB page size (1024 KiB - 4 KiB),
and smaller on machines with larger page size.
If default_data_alignment was set to 0 (disabled), then
DEFAULT_PVMETADATASIZE 255 would take effect, and produce a
metadata area that was 188 KiB and pe_start of 192 KiB.
This was too small for common use.
This is fixed by making the default metadata area size a
computed value that matches the value produced by
default_data_alignment.
David Teigland [Mon, 26 Nov 2018 18:49:39 +0000 (12:49 -0600)]
pvscan systemd service for event based activation
The pvscan systemd service for autoactivation was
mistakenly dropped along with the lvmetad related
services.
The activation generator program now looks at the new
lvm.conf setting "event_activation" (default 1) to
switch between event activation and direct activation.
Previously, the old use_lvmetad setting was used to
switch between event and direct activation.
David Teigland [Wed, 21 Nov 2018 21:16:23 +0000 (15:16 -0600)]
writecache: set block_size using --cachesettings
instead of a separate --writecacheblocksize option.
writecache block_size is not technically a setting,
but it can borrow the option as a special case.
David Teigland [Fri, 16 Nov 2018 19:09:29 +0000 (13:09 -0600)]
bcache: sync io fixes
fix lseek error check
fix read/write error checks
handle zero return from read and write
don't return an error for short io
fix partial read/write loop
David Teigland [Fri, 16 Nov 2018 18:21:20 +0000 (12:21 -0600)]
io: use sync io if aio fails
io_setup() for aio may fail if a system has reached the
aio request limit. In this case, fall back to using
sync io. Also, lvm use of aio can be disabled entirely
with config setting global/use_aio=0.
The system limit for aio requests can be seen from
/proc/sys/fs/aio-max-nr
The current usage of aio requests can be seen from
/proc/sys/fs/aio-nr
The system limit for aio requests can be increased by
setting fs.aio-max-nr using sysctl.
Zdenek Kabelac [Mon, 19 Nov 2018 17:06:39 +0000 (18:06 +0100)]
tests: update required raid target
For embeded reshaping operation require higher driver version.
(otherwise we get:
Converting vg/LV1 from raid6 (same as raid6_zr) is directly possible to the following layouts:
raid6_nc
raid6_nr
raid6_la_6
raid6_ls_6
raid6_ra_6
raid6_rs_6
raid6_n_6
Zdenek Kabelac [Sun, 18 Nov 2018 20:38:56 +0000 (21:38 +0100)]
tests: skip portion of test for lvmpolld
lvmpolld ATM is not desingned to preserve interval checking
in the same way the 'lvconvert' tool is doing - so the passed
'-i 40' is not respected and lvmpolld autonomously checks
state of conversion and updates lvm2 metadata and dm tables
when needed.
So skip portion of test that relayed on this and preserve this logic
only for command line invocation and forking of polling process
where the interval of checking is under full control.
Zdenek Kabelac [Sat, 17 Nov 2018 00:38:39 +0000 (01:38 +0100)]
tests: prefer internal header
Although we primarily want to check externally used libdevmapper
library, out internal all.h is still keeping all symbols
as the original library has - so for simpler compilation keep
using this private copy for defining needed symbols.
Zdenek Kabelac [Fri, 16 Nov 2018 14:54:09 +0000 (15:54 +0100)]
libdm: do not add params for resume and remove
DM_DEVICE_CREATE with table is doing several ioctl operations,
however only some of then takes parameters.
Since _create_and_load_v4() reused already existing dm task from
DM_DEVICE_RELOAD it has also kept passing its table parameters
to DM_DEVICE_RESUME ioctl - but this ioctl is supposed to not take
any argument and thus there is no wiping of passed data - and
since kernel returns buffer and shortens dmi->data_size accordingly,
anything past returned data size remained uncleared in zfree()
function.
This has problem if the user used dm_task_secure_data (i.e. cryptsetup),
as in this case binary expact secured data are erased from main memory
after use, but they may have been left in place.
This patch is also closing the possible hole for error path,
which also reuse same dm task structure for DM_DEVICE_REMOVE.
Zdenek Kabelac [Mon, 12 Nov 2018 14:28:45 +0000 (15:28 +0100)]
tests: add wait loop
Add a little wait loop - since lvconvert started background process
and we need to wait till this bg task initiate its work -
adding ~1s loop should give reasonable enough time to start mirroring.
Zdenek Kabelac [Mon, 12 Nov 2018 14:22:44 +0000 (15:22 +0100)]
devicemapper: retry mirror leg deactivation
This could be seen as continuation of 6cee8f1b063dcf5d809e14de38ba489ce5b8f562.
Some test maching with old udev system shows problem,
where udev 'jumps on' leg device after mirror target
releases its legs - since udev does not (in this old case) skips
such device from scanning - it opens device - and this prevent
leg device to be deactivated - effectively such device stays
'leaked' in DM table invisibly to lvm2 command.
So to 'combat' this issue - if the device has '_mimage' in its name,
the retry of deactivation is automatically assumed.
NOTE: wider impact is unexpected - as it's touching only old mirror
target which is nowadays replaced with 'raid'.
In case there will be some problem identified - probably both patches
should be reverted.
Zdenek Kabelac [Thu, 8 Nov 2018 11:12:58 +0000 (12:12 +0100)]
devicemapper: retry remove even for subLVs
With older systems and udevs we don't have control over scanning of lvm2
internal devices - so far we set retry-removal only for top-level LVs,
but in occasional cases udev can be 'fast enough' to open device for
scanning and prevent removal of such device from DM table.
So to combat this case - try to pass 'retry' flag also for removal of
internal device so see how many races can go away with this simple
patch.
Note: patch is applied only to internal version of libdm so the external
API remains working in the old way for now.
Zdenek Kabelac [Thu, 8 Nov 2018 09:02:28 +0000 (10:02 +0100)]
activation: trimming string is expected
Commit 813347cf84617a946d9184f44c5fbd275bb41766 added extra validation,
however in this particular we do want to trim suffix out so rather ignore
resulting error code here intentionaly.
David Teigland [Fri, 17 Aug 2018 20:45:52 +0000 (15:45 -0500)]
Allow dm-cache cache device to be standard LV
If a single, standard LV is specified as the cache, use
it directly instead of converting it into a cache-pool
object with two separate LVs (for data and metadata).
With a single LV as the cache, lvm will use blocks at the
beginning for metadata, and the rest for data. Separate
dm linear devices are set up to point at the metadata and
data areas of the LV. These dm devs are given to the
dm-cache target to use.
The single LV cache cannot be resized without recreating it.
If the --poolmetadata option is used to specify an LV for
metadata, then a cache pool will be created (with separate
LVs for data and metadata.)
Usage:
$ lvcreate -n main -L 128M vg /dev/loop0
$ lvcreate -n fast -L 64M vg /dev/loop1
$ lvs -a vg
LV VG Attr LSize Type Devices
main vg -wi-a----- 128.00m linear /dev/loop0(0)
fast vg -wi-a----- 64.00m linear /dev/loop1(0)
$ lvconvert --type cache --cachepool fast vg/main
$ lvs -a vg
LV VG Attr LSize Origin Pool Type Devices
[fast] vg Cwi---C--- 64.00m linear /dev/loop1(0)
main vg Cwi---C--- 128.00m [main_corig] [fast] cache main_corig(0)
[main_corig] vg owi---C--- 128.00m linear /dev/loop0(0)
Zdenek Kabelac [Tue, 6 Nov 2018 14:01:52 +0000 (15:01 +0100)]
tests: add wait for udev
Since the test is currently directly working with live directory,
which can be getting updates from system's udev - add wait
for settling so removal of all known PVs happens after that.
But still this has major influce on behavior of running system,
so the test should never be executed on a user used box.
Zdenek Kabelac [Thu, 1 Nov 2018 14:33:34 +0000 (15:33 +0100)]
cov: split check for type assignment
Check that type is always defined, if not make it explicit internal
error (although logged as debug - so catched only with proper lvm.conf
setting).
This ensures later type being NULL can't be dereferenced with coredump.
David Teigland [Fri, 2 Nov 2018 17:11:09 +0000 (12:11 -0500)]
lvmlockd: fix handling of sanlock release error
When sanlock_release returns an error because of an i/o
timeout releasing the lease on disk, lvmlockd should just
consider the lock released. sanlock will continue trying
to release the lease on disk after the original request
times out.
David Teigland [Mon, 22 Oct 2018 20:45:23 +0000 (15:45 -0500)]
lvmlockd: use new sanlock sector/align interface
The choice about sector size and lease align size is
now made by the sanlock user, in this case lvmlockd.
This will allow lvmlockd to use other lease sizes in
the future. This also prevents breakage if hosts
report different sector sizes, or the sector size
reported by a device changes.
Bryn M. Reeves [Thu, 1 Nov 2018 16:49:05 +0000 (16:49 +0000)]
dmsetup: fix stats report command output
Since the stats handle is neither bound nor listed before the
attempt to call dm_stats_get_nr_regions(), it will always return
zero: this prevents reporting of any dmstats regions on any
device.
Remove the dm_stats_get_nr_regions() check and instead rely on
the correct return status from dm_stats_populate() which only
returns 0 in the case that there are regions to inspect (and
which logs a specific error for all other cases).
Bryn M. Reeves [Thu, 1 Nov 2018 16:47:56 +0000 (16:47 +0000)]
libdm-stats: move no regions warning after dm_stats_list()
It doesn't make sense to test or warn about the region count until
the stats handle has been listed: at this point it may or may not
contain valid information (but is guaranteed to be correct after
the list).
David Teigland [Mon, 29 Oct 2018 21:53:17 +0000 (16:53 -0500)]
metadata: prevent writing beyond metadata area
lvm uses a bcache block size of 128K. A bcache block
at the end of the metadata area will overlap the PEs
from which LVs are allocated. How much depends on
alignments. When lvm reads and writes one of these
bcache blocks to update VG metadata, it can also be
reading and writing PEs that belong to an LV.
If these overlapping PEs are being written to by the
LV user (e.g. filesystem) at the same time that lvm
is modifying VG metadata in the overlapping bcache
block, then the user's updates to the PEs can be lost.
This patch is a quick hack to prevent lvm from writing
past the end of the metadata area.