Peter Rajnoha [Thu, 30 Aug 2018 10:35:58 +0000 (12:35 +0200)]
scripts: add After=rbdmap.service to {lvm2-activation-net,blk-availability}.service
We need to have Ceph RBD devices mapped first before use in a stack
where LVM is on top so make sure rbdmap.service is called before
generated lvm2-activation-net.service.
On shutdown, we need to stop blk-availability first before we stop the
rbdmap.service.
Zdenek Kabelac [Mon, 27 Aug 2018 08:18:26 +0000 (10:18 +0200)]
dmeventd: lvm2 plugin uses envvar registry
The thin plugin started to use a configurable setting to allow configuring
the usage of external scripts. However, to read this value it needed to
execute an internal command, as dmeventd itself has no access to lvm.conf
and the API for dmeventd plugins has been kept stable.
The call of the command itself was not normally 'a big issue' until users
started to use a higher number of monitored LVs and execution of the command
got stuck because another monitored resource had already started to execute
some other lvm2 command and became blocked waiting on the VG lock.
This scenario revealed the necessity to somehow avoid calling an lvm2 command
during resource registration, but this requires bigger changes. So
meanwhile this patch tries to minimize the possibility of hitting this race
by obtaining any configurable setting just once. Such a patch is small
and covers the majority of the problem, yet a better solution needs to be
introduced, likely with a bigger rework of dmeventd.
TODO: Avoid blocking registration of resource with execution of lvm2
commands since those can get stuck waiting on mutexes.
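The "obtain the setting just once" idea can be sketched as below. This is
a minimal illustration, not the plugin's code: the variable name
EXAMPLE_PLUGIN_SETTING, the returned path and both function names are
hypothetical, and the sketch assumes the result is cached in the process
environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical variable name, used only for this sketch. */
#define SETTING_ENV "EXAMPLE_PLUGIN_SETTING"

/* Counts how often the expensive lookup actually ran. */
int lookup_calls;

/* Stand-in for executing an external command to read a config value. */
const char *expensive_lookup(void)
{
	lookup_calls++;
	return "/usr/sbin/example_script";
}

/* Return the setting, running the expensive lookup at most once. */
const char *get_setting(void)
{
	const char *val = getenv(SETTING_ENV);

	if (!val) {
		/* First caller pays for the lookup and publishes the result. */
		if (setenv(SETTING_ENV, expensive_lookup(), 1) != 0)
			return NULL;
		val = getenv(SETTING_ENV);
	}

	return val;
}
```

Later registrations then only call getenv(). Note that setenv() is not
thread-safe, so a multi-threaded daemon would need its own locking around
the first call.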
David Teigland [Mon, 27 Aug 2018 16:42:25 +0000 (11:42 -0500)]
lvmetad: fix pvs for many devices
When using lvmetad, 'pvs' still evaluates full filters
on all devices (lvmetad only provides info about PVs,
but pvs needs to report info about all devices, at
least sometimes.)
Because some filters read the devices, pvs still reads
every device, even with lvmetad (i.e. lvmetad is no help
for the pvs command.) Because the device reads are not
being managed by the standard label scan layer, but only
happen incidentally through the filters, there is nothing
to control and limit the bcache content and the open file
descriptors for the devices. When there are a lot of devs
on the system, the number of open fd's exceeds the limit
and all opens begin failing.
The proper solution for this would be for pvs to really
use lvmetad and not scan devs, or for pvs to do a proper
label scan even when lvmetad is enabled. To avoid any
major changes to the way this has worked, just work around
this problem by dropping bcache and closing the fd after
pvs evaluates the filter on each device.
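The workaround pattern (release the fd right after each device has been
checked, so the number of open fds stays constant) can be sketched as
follows. The functions filter_passes() and count_passing() are
hypothetical stand-ins, not lvm2 functions, and the sketch leaves out the
bcache part entirely.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical filter: passes if the device opens and a read does not fail. */
int filter_passes(const char *path)
{
	char buf[1];
	int fd = open(path, O_RDONLY);
	int ok;

	if (fd < 0)
		return 0;

	ok = read(fd, buf, sizeof(buf)) >= 0;

	/* Close immediately so fds do not accumulate across devices. */
	close(fd);

	return ok;
}

/* Evaluate the filter on every path; at most one fd is open at a time. */
size_t count_passing(const char *const *paths, size_t n)
{
	size_t i, passed = 0;

	for (i = 0; i < n; i++)
		if (filter_passes(paths[i]))
			passed++;

	return passed;
}
```

With this shape the loop can visit far more devices than the open file
limit allows, because no fd outlives its own iteration.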
David Teigland [Mon, 27 Aug 2018 16:15:35 +0000 (11:15 -0500)]
lvmetad: improve scan for pvscan all
For 'pvscan --cache' avoid using dev_iter in the loop
after the label_scan by passing the necessary devs back
from the label_scan for the continued pvscan.
The dev_iter functions reapply the filters which will
trigger more io when we don't need or want it. With
many devs, incidental opens from the filters (not controlled
by the label scan) can lead to too many open files.
David Teigland [Fri, 24 Aug 2018 19:46:51 +0000 (14:46 -0500)]
bcache: reduce MAX_IO to 256
This is the number of concurrent async io requests that
the scan layer will submit to the bcache layer. There
will be an open fd for each of these, so it is best to
keep this well below the default limit for max open files
(1024), otherwise lvm may get EMFILE from open(2) when
there are around 1024 devices to scan on the system.
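The relation between the io limit and the open file limit can be
illustrated with a small sketch. The commit itself only lowers the
constant; deriving a value from getrlimit(2) as below is an assumption
made for illustration, and clamp_max_io(), max_io_for_process() and the
reserve of 64 fds are hypothetical.

```c
#include <sys/resource.h>

#define EXAMPLE_MAX_IO 256	/* the value chosen by the commit */

/* Cap concurrent ios so that 'reserve' fds stay free for other uses. */
unsigned int clamp_max_io(unsigned long soft_limit, unsigned int reserve)
{
	if (soft_limit <= reserve)
		return 1;

	if (soft_limit - reserve < EXAMPLE_MAX_IO)
		return (unsigned int)(soft_limit - reserve);

	return EXAMPLE_MAX_IO;
}

/* Apply the cap to the soft RLIMIT_NOFILE of the current process. */
unsigned int max_io_for_process(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
		return EXAMPLE_MAX_IO;

	return clamp_max_io((unsigned long)rl.rlim_cur, 64);
}
```

With the default soft limit of 1024 this yields 256, which leaves roughly
three quarters of the fd table free for everything else the command opens.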
"lvconvert --type linear RaidLV" on striped and raid4/5/6/10
have to provide the convenient interim layouts. Fix involves
a cleanup to the convenience type function.
As a result of testing, add missing sync waits to
lvconvert-raid-reshape-linear_to_raid6-single-type.sh.
Zdenek Kabelac [Tue, 7 Aug 2018 08:34:17 +0000 (10:34 +0200)]
mirror: fix splitmirrors for mirror type
With the improved mirror activation code, a --splitmirrors issue popped up
since proper preload code and deactivation were missing
for the split mirror leg.
David Teigland [Thu, 2 Aug 2018 16:26:59 +0000 (11:26 -0500)]
mirrors: fix read_only_volume_list
If a mirror LV is listed in read_only_volume_list, it would
still be activated rw. The activation would initially be
readonly, but the monitoring function would immediately
change it to rw. This was a regression from commit
David Teigland [Wed, 1 Aug 2018 15:26:28 +0000 (10:26 -0500)]
vgcreate: close exclusive fd after pvcreate
When vgcreate does an automatic pvcreate, it opens the
dev with O_EXCL to ensure no other subsystem is using
the device. This exclusive fd remained in bcache and
prevented activation parts of lvm from using the dev.
This appeared with vgcreate of a sanlock VG because of
the unique combination where the dev is not yet a PV,
so pvcreate is needed, and the vgcreate also creates
and activates an internal LV for sanlock.
Fix this by closing the exclusive fd after it's used
by pvcreate so that it won't interfere with other
bits of lvm that may try to use the device.
Bryn M. Reeves [Thu, 28 Jun 2018 13:25:30 +0000 (14:25 +0100)]
dmsetup: fix error propagation in _display_info_cols()
Commit 3f35146 added a check on the value returned by the
_display_info_cols() function:
	if (!_switches[COLS_ARG])
		_display_info_long(dmt, &info);
	else
		r = _display_info_cols(dmt, &info);

	return r;
This exposes a bug in the dmstats code in _display_info_cols:
the fact that a device has no regions is explicitly not an error
(and is documented as such in the code), but since the return
code is not changed before leaving the function it is now treated
as an error leading to:
# dmstats list
Command failed.
When no regions exist.
Set the return code to the correct value before returning.
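The class of bug can be shown with a reduced sketch. It assumes the
convention that 1 means success and 0 means failure; both functions and
the nr_regions parameter are hypothetical and do not reproduce the
dmsetup code.

```c
/* Buggy shape: the "nothing to do" exit leaves r at its failure default. */
int display_cols_buggy(int nr_regions)
{
	int r = 0;

	if (!nr_regions)
		goto out;	/* meant as "not an error", but r is still 0 */

	/* ... report the regions ... */
	r = 1;
out:
	return r;
}

/* Fixed shape: set the return code before taking the early exit. */
int display_cols_fixed(int nr_regions)
{
	int r = 0;

	if (!nr_regions) {
		r = 1;		/* no regions is explicitly not an error */
		goto out;
	}

	/* ... report the regions ... */
	r = 1;
out:
	return r;
}
```

As long as the caller ignored the return value the two variants behaved
the same; the difference only shows once the caller starts checking it.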
lvconvert: reject conversions of LVs under snapshot
Conversions of LVs under snapshot to thinpool or cachepool
correctly fail but leave them inactive and provide cryptic
error messages like 'Internal error: #LVs (10) != #visible
LVs (2) + #snapshots (1) + #internal LVs (5) in VG VG'.
David Teigland [Mon, 23 Jul 2018 16:08:12 +0000 (11:08 -0500)]
lvconvert: restrict command matching for no option variant
The 'lvconvert LV' command def has caused multiple problems
for command matching because it matches the required options
of any lvconvert command. Any lvconvert with incorrect options
ends up matching 'lvconvert LV', which then produces an error
about incorrect options being used for 'lvconvert LV'. This
prevents suggestions from nearest-command partial command matches.
Add a special case for 'lvconvert LV' so that it won't be used
as a partial match for a command that has options specified.
When an lvm2 command is executed in test mode, the discard ioctl is skipped.
Issuing it could even cause data loss if discards for released
areas were enabled and the user 'tested' lvreduce.
When allocating a thin-pool with more than 1 device - try to
allocate 'metadataLV' with reuse of log-type allocation for mirror LV.
It should naturally be placed on a different device than 'dataLV'.
However, due to somewhat hard-to-follow allocation logic code,
allocation was rejected in cases where there was not
enough space for data or metadata on a single PV, so to succeed,
usage of segments was mandatory.
to enforce separate meta and data LV - on default settings, this is not
enabled, thus segment allocation is meant to work.
NOTE:
As already said - the original intention of this whole 'if()' is unclear,
so try to split this test into multiple more simple tests that are more readable.
Zdenek Kabelac [Sat, 30 Jun 2018 09:05:14 +0000 (11:05 +0200)]
memlock: extend exception list
The number of linked libraries grows.
Most of them we don't need to lock in, since we are not using
them in locked section, so skip locking them in memory.
David Teigland [Tue, 26 Jun 2018 16:58:11 +0000 (11:58 -0500)]
scan: reopen RDWR during rescan
Commit a30e6222799:
"scan: work around udev problems by avoiding open RDWR"
had us reopen a device RDWR in the write function. Since
we know earlier that the command intends to write to devices
in the VG, we can reopen the VG's devices RDWR during the
rescan instead of waiting until the writes happen.
lvconvert: support linear <-> striped convenience conversions
"lvconvert --type {linear|striped|raid*} ..." on a striped/linear
LV provides convenience interim type to convert to the requested
final layout similar to the given raid* <-> raid* convenience types.
Whilst on it, add missing raid5_n convenience type from raid5* to raid10.
Zdenek Kabelac [Tue, 12 Jun 2018 14:27:42 +0000 (16:27 +0200)]
systemd: add conflicting sockets
Since we are using "DefaultDependencies=no" we do not get automatic STOP
job on socket connection - so automatically refuse connection on
shutdown by adding this Conflict definition to socket Unit.
David Teigland [Wed, 20 Jun 2018 16:32:45 +0000 (11:32 -0500)]
scan: work around udev problems by avoiding open RDWR
udev creates a train wreck of events if we open devices
with RDWR. Until we can fix/disable/scrap udev, work around
this by opening RDONLY and then closing/reopening RDWR when
a write is needed. This invalidates the bcache blocks for
the device before writing so it can trigger unnecessary
rereading.
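The open-RDONLY-then-reopen-RDWR pattern can be sketched as follows. The
struct example_dev and the example_dev_* functions are hypothetical and
only mirror the idea; the real code also has to invalidate cached blocks
at the point marked in the comment.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical device handle for this sketch. */
struct example_dev {
	const char *path;
	int fd;
	int flags;
};

/* Open read-only first, so that scanning never holds a writable fd. */
int example_dev_open_ro(struct example_dev *dev, const char *path)
{
	dev->path = path;
	dev->flags = O_RDONLY;
	dev->fd = open(path, O_RDONLY);
	return dev->fd >= 0;
}

/* Close and reopen read-write; a no-op if the fd is already writable. */
int example_dev_set_rdwr(struct example_dev *dev)
{
	if (dev->flags == O_RDWR)
		return 1;

	/* A real cache layer would invalidate this device's blocks here. */
	close(dev->fd);
	dev->fd = open(dev->path, O_RDWR);
	if (dev->fd < 0)
		return 0;

	dev->flags = O_RDWR;
	return 1;
}

/* Write path: upgrade the fd on demand, then write. */
int example_dev_write(struct example_dev *dev, const void *buf, size_t len)
{
	if (!example_dev_set_rdwr(dev))
		return 0;

	return write(dev->fd, buf, len) == (ssize_t)len;
}
```

Only commands that actually write pay for the extra close and open;
read-only commands never open the device RDWR at all.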
David Teigland [Mon, 18 Jun 2018 19:10:48 +0000 (14:10 -0500)]
clvmd: fix leak of saved_vg struct
Commit c016b573ee32f "clvmd: separate saved_vg from vginfo"
created a separate hash table for the saved_vg structs.
The vg's referenced by the saved_vg struct were all being
freed properly, but the svg wrapper struct itself was not
being freed.
David Teigland [Wed, 13 Jun 2018 20:54:39 +0000 (15:54 -0500)]
lvmlockd: update method for changing clustered VG
The previous method for forcibly changing a clustered VG to
a local VG involved using -cn and --config locking_type=0.
Add an alternative that is consistent with other forced
lock type changes:
vgchange --locktype none --lockopt force.