David Teigland [Mon, 15 Apr 2019 16:27:49 +0000 (11:27 -0500)]
pvscan: handle case of scanning PV without metadata last
Handle the case where pvscan --cache -aay (with no dev args)
gets to the final PV, completing the VG, but that final PV does not
have VG metadata. In this case, we need to use VG metadata from a
previously scanned PV in the same VG, which we saved for this
possibility. Using this saved metadata, we can find which VG
this PVID belongs to, and then check if that VG is now complete,
and if so add the VG name to the list of complete VGs to be
autoactivated.
David Teigland [Thu, 11 Apr 2019 16:49:18 +0000 (11:49 -0500)]
hints: fix non-empty hints list when not using hints
When hints are invalid and ignored, the list of hints
could be non-empty (from additions before an invalid
hint was found). This confused the calling code which
was checking for an empty list to see if hints were used.
Ensure the list is empty when hints are not used.
Peter Rajnoha [Thu, 11 Apr 2019 10:18:02 +0000 (12:18 +0200)]
systemd: put back DefaultDependencies=no for lvmpolld socket unit
Previous commit 0cab341e1d0e8f9089d3c62d3adbec24dfd5e124 removed this
by mistake. We have to keep DefaultDependencies=no because
sockets.target is ordered after sysinit.target.
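For illustration, the directive being restored in the socket unit (a
minimal excerpt, not the full shipped lvm2-lvmpolld.socket):

    [Unit]
    DefaultDependencies=no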
Peter Rajnoha [Tue, 9 Apr 2019 10:10:17 +0000 (12:10 +0200)]
systemd: add missing Before=shutdown.target to LVM2 services to fix shutdown ordering
We already used Conflicts=shutdown.target to stop LVM2 services on shutdown.
But we were still missing the ordering: shutdown.target should be reached
only after all the services are really stopped.
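A minimal excerpt of the resulting unit directives (directive names are
exactly as systemd defines them):

    [Unit]
    Conflicts=shutdown.target
    Before=shutdown.target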
David Teigland [Fri, 5 Apr 2019 21:44:00 +0000 (16:44 -0500)]
pvscan: ignore device with incorrect size
If a device looks like a PV, but its size does not
match the PV size in the metadata, then skip it for
purposes of autoactivation. It's probably not the correct
device for the PV.
David Teigland [Tue, 26 Feb 2019 22:48:29 +0000 (16:48 -0600)]
pvscan: remove initialization case
In the past, the first 'pvscan --cache -aay dev' command
to run on the system would initialize the pvs_online dir
by scanning all devs and creating online files for all pvs
it found, and then autoactivating the VG (if complete) for
the named dev. The idea was that the system may not have
been able to run pvscan commands for early devices, so the
first pvscan to run would need to "make up" for any devices
that had appeared previously, which the system was unable to
scan. The idea of making up for missed scans is
historical and should no longer be needed, so remove this
special init case.
David Teigland [Tue, 26 Feb 2019 22:39:43 +0000 (16:39 -0600)]
pvscan: for init only autoactivate vg for named dev
When pvscan is run for the initialization case (the first
pvscan run on the system), it scans all devs and creates
online files for all PVs it finds. Previously it would
then autoactivate every complete VG, but change this to
only autoactivate the (complete) VG corresponding to the
named device arg(s).
David Teigland [Thu, 4 Apr 2019 19:36:28 +0000 (14:36 -0500)]
man: updates to lvmlockd
- remove reference to locking_type which is no longer used
- remove references to adopting locks which has been disabled
- move some sanlock-specific info out of a general section
- remove info about doing automatic lockstart by the system
since this was never used (the resource agent does it)
- replace info about lvextend and manual refresh under gfs2
with a description about the automatic remote refresh
When data are growing, adapt also the size of metadata.
We get way too many reports from users doing huge growth of the
data portion while keeping metadata small and avoiding monitoring.
So to enhance the user experience, when a user requests growth of a
thin-pool (without passing a PV list for the growth), lvm2 will
automatically grow the metadata part of the thin-pool as well (if possible).
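For example (hypothetical VG and pool names), a plain data extension now
also extends the pool's metadata LV when free space allows:

    # grows vg/thinpool's data by 100GiB and, if possible, grows the
    # pool metadata LV to a size adequate for the new data size
    lvextend -L+100G vg/thinpool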
Add a function for estimating the thin-pool metadata size for a given
size of data. The function uses an already existing internal API, so it
can be reused when resizing thin-pool data.
David Teigland [Fri, 22 Mar 2019 20:01:29 +0000 (15:01 -0500)]
lvextend: refresh shared LV with vgname as arg
Update the previous commit to leave the vgname as
an arg instead of moving it into the select option
(the compound select option rule was confusing the
dlm arg processing).
David Teigland [Fri, 22 Mar 2019 19:28:02 +0000 (14:28 -0500)]
lvextend: refresh shared LV using select option
Using --select 'lvname=LV && vgname=VG' avoids the problem
of the lvchange exit code not distinguishing an actual error
result vs the VG or LV not existing. (This is in case there
is an odd dlm/gfs2 setup where some nodes are running the dlm
but do not have access to the VG.)
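For illustration, the remote refresh invoked with this approach looked
roughly like the following (LV and VG are placeholders; note that the
commit above this one then moved the vgname back to a positional arg):

    lvchange --refresh --select 'lvname=LV && vgname=VG'

On a node where the LV does not exist, the selection simply matches
nothing instead of producing an error exit code.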
David Teigland [Wed, 20 Mar 2019 18:20:26 +0000 (13:20 -0500)]
lvextend: refresh shared LV remotely using dlm/corosync
When lvextend extends an LV that is active with a shared
lock, use this as a signal that other hosts may also have
the LV active, with gfs2 mounted, and should have the LV
refreshed to reflect the new size. Use the libdlmcontrol
run api, which uses dlm_controld/corosync to run an
lvchange --refresh command on other cluster nodes.
David Teigland [Thu, 7 Mar 2019 17:20:41 +0000 (11:20 -0600)]
warn about changes to an active lv with shared lock
When an LV is active with a shared lock, a command can be
run to change the LV with --lockopt skiplv (to override the
exclusive lock the command ordinarily requires, which is not
compatible with the outstanding shared lock.)
In this case, other hosts may have the LV active and may
need to refresh the LV, so print a warning stating this.
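A sketch of the sequence (hypothetical VG/LV names; the LV is active
with a shared lock on other hosts):

    # override the exclusive lock this command would normally take:
    lvchange --permission r --lockopt skiplv vg/lv
    # hosts that have vg/lv active may then need to run:
    lvchange --refresh vg/lv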
Zdenek Kabelac [Wed, 6 Feb 2019 11:37:47 +0000 (12:37 +0100)]
activation: synchronize before removing devices
Udev runs udev-rule actions upon 'resume'.
However, in a special case, lvm2 replaces a 'soon-to-be-removed'
device with the 'error' target for the resume, and the actual removal
follows. The sequence is usually quick, so when udev starts its
action, it can result in a 'strange' error message in the kernel log
like:
Process '/usr/sbin/dmsetup info -j 253 -m 17 -c --nameprefixes --noheadings --rows -o name,uuid,suspended' failed with exit code 1.
To avoid this, we need to ensure there is a synchronization wait for
udev between the 'resume' and 'remove' parts of this process.
However, the existing code put a strict requirement on avoiding
udev synchronization inside a critical section - but this originally
came from the requirement to not do anything special while there could
be devices in a suspended state. Now we are able to see the difference
between a critical section with or without suspended devices. For udev
synchronization, only suspended devices are prohibited - so slightly
relax the condition and allow calling 'fs_sync()' even inside a
critical section, as long as there is no suspended device.
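A minimal sketch of the relaxed rule, assuming lvm2's internal
fs_sync() entry point named above and libdevmapper's
dm_get_suspended_counter(), which counts devices this process has left
suspended; this is an illustration, not a drop-in patch:

    #include <libdevmapper.h>

    void fs_sync(void);  /* lvm2's internal udev/fs sync point (assumed) */

    static void maybe_sync_in_critical_section(void)
    {
        /* udev synchronization is now allowed inside the critical
         * section, provided no device is currently suspended */
        if (!dm_get_suspended_counter())
            fs_sync();
    }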
Zdenek Kabelac [Thu, 24 Jan 2019 13:12:26 +0000 (14:12 +0100)]
vdo: enable caching for vdopool LV and vdo LV
Allow using caching with VDO.
The user can cache either a single vdopool or a vdo LV; the difference
is where the caching layer is inserted, which depends on the use case,
and it's up to the user to decide which kind of speed is expected.
Zdenek Kabelac [Wed, 13 Mar 2019 12:02:09 +0000 (13:02 +0100)]
filter: enhance mpath detection
Internal detection of a SCSI device being in use by DM mpath has been
performed several times for each component device - this could
eventually be racy - so instead remember the first checked result
of whether a device is mpath and use it consistently over the filter
runtime.
Move the DM usage into the dev_manager.c source file.
Also convert the STATUS ioctl to an INFO ioctl, as that is enough
to obtain the UUID - this also avoids issuing an unwanted flush on the
DM device being checked for mpath.
David Teigland [Tue, 5 Mar 2019 21:19:05 +0000 (15:19 -0600)]
pvscan: ignore online for shared and foreign PVs
Activation would not be allowed anyway, but we can
check for these cases early and avoid wasted time in
pvscan managing online files and attempting activation.
David Teigland [Mon, 4 Mar 2019 17:18:34 +0000 (11:18 -0600)]
io: increase the default io memory from 4 to 8 MiB
This is the default bcache size that is created at the
start of the command. It needs to be large enough to
hold a single copy of metadata for a given VG, or the
VG cannot be read or written (since the entire VG metadata
would not fit into available memory).
Increasing the default reduces the chances of anyone
needing to raise it to use their VG.
The size can be set in lvm.conf global/io_memory_size;
the lower limit is 4 MiB and the upper limit is 128 MiB.
David Teigland [Mon, 4 Mar 2019 18:13:09 +0000 (12:13 -0600)]
io: warn when metadata size approaches io memory size
When a single copy of metadata gets within 1MB of the
current io_memory_size value, begin printing a warning
that the io_memory_size should be increased.
David Teigland [Fri, 1 Mar 2019 19:55:59 +0000 (13:55 -0600)]
config: add new setting io_memory_size
which defines the amount of memory that lvm will allocate
for bcache. Increasing this setting is required if it is
smaller than a single copy of VG metadata.
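A minimal lvm.conf sketch (the value below is an illustration; the
setting takes a size in KiB, within the limits noted above):

    global {
        # 16 MiB of bcache memory; must be at least as large as one
        # copy of the VG metadata (valid range is 4 MiB to 128 MiB)
        io_memory_size = 16384
    }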
David Teigland [Wed, 30 Jan 2019 15:55:34 +0000 (09:55 -0600)]
Use "cachevol" to refer to cache on a single LV
and "cachepool" to refer to a cache on a cache pool object.
The problem was that the --cachepool option was being used
to refer to both a cache pool object, and to a standard LV
used for caching. This could be somewhat confusing, and it
made it less clear when each kind would be used. By
separating them, it's clear when a cachepool or a cachevol
should be used.
Previously:
- lvm would use the cache pool approach when the user passed
a cache-pool LV to the --cachepool option.
- lvm would use the cache vol approach when the user passed
a standard LV in the --cachepool option.
Now:
- lvm will always use the cache pool approach when the user
uses the --cachepool option.
- lvm will always use the cache vol approach when the user
uses the --cachevol option.
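Roughly, the two attach paths look like this in current command syntax
(VG/LV names are placeholders):

    # cache pool approach: 'fast' is a cache-pool object
    lvconvert --type cache --cachepool fast vg/main
    # cachevol approach: 'fast' is a plain LV used directly as the cache
    lvconvert --type cache --cachevol fast vg/main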
David Teigland [Tue, 26 Feb 2019 20:31:44 +0000 (14:31 -0600)]
logging: new config settings to specify debug fields
For users who do not want all of the fields included
in debug lines, let them specify in lvm.conf which
fields to include. timestamp, command[pid], and
file:line fields can all be disabled.
Use the correct loop variable within the loop, instead of reusing the
initial value. Otherwise, table lines after the first don't get
terminated in the right place.
David Teigland [Wed, 13 Feb 2019 20:21:56 +0000 (14:21 -0600)]
pvscan: autoactivate a VG once
When a VG has multiple PVs, and all those PVs come online
at the same time, concurrent pvscans for each PV will all
create the individual pvid files, and all will often see
the VG is now complete. This causes each of the pvscan
commands to think it should activate the VG, so there
are multiple activations of the same VG. The vg lock
serializes them, and only the first pvscan actually does
the activation, but there is still a lot of extra overhead
and time used by the other pvscans that attempt to
activate the already active VG. This can lead to a backlog
of pvscans and timeouts.
To fix this, this adds a new /run/lvm/vgs_online/ dir that
works like the existing /run/lvm/pvs_online/ dir. Each pvscan
that wants to activate a VG will first try to exclusively create
the file vgs_online/<vgname>. Only the first pvscan will
succeed, and that one will do the VG activation. The other
pvscans will find the vgname file exists and will not do the
activation step.
When a PV goes offline, the vgs_online file for the corresponding
VG is removed. This allows the VG to be autoactivated again
when the PV comes online again. This requires that the vgname be
stored in the pvid files.
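A minimal, self-contained sketch of the claim step described above
(path and semantics taken from this message; error handling trimmed):

    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Returns 1 if this pvscan won the right to activate the VG,
     * 0 if another pvscan already created the file, -1 on error. */
    static int try_claim_vg_activation(const char *vgname)
    {
        char path[PATH_MAX];
        int fd;

        snprintf(path, sizeof(path), "/run/lvm/vgs_online/%s", vgname);

        fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd < 0)
            return (errno == EEXIST) ? 0 : -1;

        close(fd);
        return 1;
    }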
David Teigland [Wed, 13 Feb 2019 19:35:24 +0000 (13:35 -0600)]
pvscan: fix autoactivation from concurrent pvscans
Use a file lock to ensure that only one pvscan will do
initialization of pvs_online, otherwise multiple concurrent
pvscans may all see an empty pvs_online directory and
do initialization.
The pvscan that is doing initialization should also only
attempt to activate complete VGs.
David Teigland [Wed, 13 Feb 2019 21:23:43 +0000 (15:23 -0600)]
hints: fix recreating hints from pvscan
When -aay was included in the pvscan --cache command,
the activation part was complaining about the unusual
state of the hint file, since it had been recreated
just prior.
David Teigland [Tue, 5 Feb 2019 16:15:40 +0000 (10:15 -0600)]
apply obtain_device_list_from_udev to all libudev usage
udev_dev_is_md_component and udev_dev_is_mpath_component
are not used for obtaining the device list, but they still
use libudev for device info. When there are problems with
udev, these functions can get stuck. So, use the existing
obtain_device_list_from_udev config setting to also control
whether these "is component" functions are used, which gives
us a way to avoid using libudev entirely when it's causing
problems.
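A minimal lvm.conf sketch of the opt-out (the setting name is the
existing one; 0 now disables both the udev device listing and the
udev-based component checks):

    devices {
        obtain_device_list_from_udev = 0
    }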
Zdenek Kabelac [Mon, 28 Jan 2019 18:50:41 +0000 (19:50 +0100)]
thin: select chunk size as power of 2
Whenever the thin-pool chunk size is unspecified and left to lvm's
calculation, try to select the size as the nearest higher power of 2
instead of just a multiple of 64KiB.
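For illustration, rounding a computed size up to the next power of 2
can be done with the classic bit trick (a standalone sketch, not the
actual lvm2 code):

    #include <stdint.h>

    /* Round v up to the nearest power of 2 (v must be nonzero). */
    static uint64_t round_up_pow2(uint64_t v)
    {
        v--;
        v |= v >> 1;
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;
        v |= v >> 32;
        return v + 1;
    }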
Zdenek Kabelac [Mon, 21 Jan 2019 14:37:51 +0000 (15:37 +0100)]
lv_manip: better work with PERCENT_VG modifier with lvresize
Fixing recent commit 022ebb0cfebee4ac8fdbe4e0c61e85db1038a115.
A resize already has an existing size that needs to be counted in,
otherwise an upsizing operation could turn into a size reduction.
Zdenek Kabelac [Sun, 20 Jan 2019 23:48:05 +0000 (00:48 +0100)]
vdo: minor API cleanup
Since parse_vdo_pool_status() became part of the vdo_manip API,
and there will be no matching 'dm' status parser,
the API can be simplified here to closely match the thin API.
Zdenek Kabelac [Fri, 18 Jan 2019 21:27:02 +0000 (22:27 +0100)]
lv_manip: better work with PERCENT_VG modifier
When using 'lvcreate -l100%VG' and there is a big disproportion
between the real available space and the requested setting,
automatically fall back to 100%FREE.
The difference can be seen when the VG is big and most of its space is
already allocated: the requested 100%VG can end up (and by the spec
for the % modifier this is correct) as an LV with a size of 1%VG.
Usually this is not a big problem - but in some cases - like cache-pool
allocation, this can make a big difference for chunk size selection.
With this patch, the behavior more closely matches common-sense logic
without the need to reiterate too-big changes in the lvm2 core ATM.
TODO: in the future there should be allocator solving all allocations
in a single call.
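A hypothetical illustration of the disproportion: in a VG with 1000
extents of which 990 are already allocated, '-l100%VG' asks for 1000
extents but allocation can only deliver the 10 free ones, so internal
sizing decisions (such as cache-pool chunk size) were made for a
request 100x larger than the resulting LV. Treating the request as
'-l100%FREE' instead bases those decisions on the 10 extents that can
actually be allocated.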