When merging, splitting or renaming VGs, use a new PV status flag
PV_MOVED_VG to mark the PVs that hold metadata with the old VG name and
use this to provide PV-level granularity instead of incorrectly assuming
all PVs in the VG are the same.
This will cause the dmeventd to be properly stopped at shutdown (after
all the dmeventd clients unregistered their devices from monitoring)
with dm-event.service's stop action (there's no direct action defined
for the "stop" so systemd sends SIGTERM instead).
Before, we let dmeventd to get killed only as part of the very last
SIGTERM/SIGKILL for all the remaining processes late in the shutdown
sequence so we may have missed some logs if dmeventd encountered an
error during its shutdown (logging facilities are already off at this
late time in shutdown sequence).
Zdenek Kabelac [Wed, 4 Oct 2017 11:58:21 +0000 (13:58 +0200)]
dmeventd: schedule exit on break
When dmeventd receives SIGTERM/INT/HUP/QUIT it validates if exit is possible.
If there was any device still monitored, such exit request used to
be ignored/refused. This 'usually' worked reasonably well, however if there
is very short time period between last device is unmonitored and signal
reception - there was possibility such EXIT was ignored, as dmeventd has
not yet got into idle state even commands like 'vgchange -an' has already
finished.
This patch changes logic towards scheduling EXIT to the nearest
point when there is no monitored device.
EXIT is never forgotten.
NOTE: if there is only a single monitored device and someone sends
SIGTERM and later someone uses i.e. 'lvchange --refresh' after
unmonitoring dmeventd will exit and new instance needs to be
started.
Tony Asleson [Mon, 25 Sep 2017 20:20:03 +0000 (15:20 -0500)]
lvmdbusd: thread stacks dump support
If you send a SIGUSR1 (10) to the daemon it will dump all the
threads current stacks to stdout. This will be useful when the
daemon is apparently hung and not processing requests.
Tony Asleson [Fri, 22 Sep 2017 14:59:50 +0000 (09:59 -0500)]
lvmdbusd: Main thread exception logging
Make sure that any and all code that executes in the main thread is
wrapped with a try/except block to ensure that at the very least
we log when things are going wrong.
Changing the VG of a PV uses the same on-disk mechanism as vgrename.
This relies on recognising both the old and new VG names. Prior to this
patch the vgsplit code incorrectly provided the new VG name twice
instead of the old and new ones. This lead the low-level mechanism not
to recognise the device as already belonging to a VG and so paying no
attention to the location of its existing metadata, sometimes partly
overwriting it and then later trying to read the corrupt metadata and
issuing a checksum error.
Tony Asleson [Wed, 20 Sep 2017 21:39:35 +0000 (16:39 -0500)]
lvmdbusd: Ensure vg_uuid is present
In some cases we are seeing where there are no VGs, but the data returned from
lvm shows that the PVs have the following for the VG:
"vg_name":"[unknown]", "vg_uuid":""
The code was only checking for the exitence of the VG name and we called into
the function get_object_path_by_uuid_lvm_id which requires both the VG name and
the LV name to exist (asserts this) which results in the following stack trace:
Traceback (most recent call last):
File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 563, in runner
obj._run()
File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 584, in _run
self.rc = self.f(*self.args)
File "/home/tasleson/lvm2/daemons/lvmdbusd/fetch.py", line 26, in
_main_thread_load
cache_refresh=False)[1]
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 48, in load_pvs
emit_signal, cache_refresh)
File "/home/tasleson/lvm2/daemons/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 40, in
pvs_state_retrieve
p["pv_attr"], p["pv_tags"], p["vg_name"], p["vg_uuid"]))
File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 84, in __init__
vg_uuid, vg_name, vg_obj_path_generate)
File "/home/tasleson/lvm2/daemons/lvmdbusd/objectmanager.py", line 318,
in get_object_path_by_uuid_lvm_id
assert uuid
AssertionError
Tony Asleson [Wed, 20 Sep 2017 20:46:16 +0000 (15:46 -0500)]
lvmdbusd: Fix hang in MThreadRunner
When executing in the main thread, if we encounter an exception we
will bypass the notify_all call on the condition and the calling thread
never wakes up.
@staticmethod
def runner(obj):
# noinspection PyProtectedMember
Exception thrown here
----> obj._run()
So the following code doesn't run, which causes calling thread to hang
with obj.cond:
obj.function_complete = True
obj.cond.notify_all()
Additionally for some unknown reason the stderr is lost.
Best guess is it's something to do with scheduling a python function
into the GLib.idle_add. That made finding issue quite difficult.
Peter Rajnoha [Thu, 21 Sep 2017 13:23:24 +0000 (15:23 +0200)]
blkdeactivate: also try to unmount /boot on blkdeactivate -u if on top of supported device
There's nothing special about /boot other than it's used during boot.
But when blkdeactivate is called either on all devices or including a
device where the /boot is on top, we should also include this mount
point when doing unmount before deactivation of supported devices.
Peter Rajnoha [Thu, 21 Sep 2017 13:04:45 +0000 (15:04 +0200)]
blkdeactivate: add blkdeactivate -r wait option to wait for MD resync/recovery/reshape
The new blkdeactivate -r|mdraidoption wait causes blkdeactivate to wait
for any resync/recovery/reshape that is currently in progress before
deactivating the device.
If this option is used, blkdeactivate calls mdadm -W|--wait before
mdadm -S|--stop.
We're canonicalizing/escaping the names here and we're reusing the
variable name so the code doesn't need to use extra variables and
further assignments that may confuse us. Let's keep the code simple.
The
local name=(...$name)
is not the same as
local name
name=(...$name)
(I know various code-checking tools fuss about this and recommend
the 2nd way, but let's ignore those tools' nitpicking here please.)
Peter Rajnoha [Thu, 21 Sep 2017 15:03:42 +0000 (17:03 +0200)]
blkdeactivate: fix --{dm,lvm,mpath}options option name recognition
There was a typo in blkdeactivate --dmoption/--lvmoption/mpathoption,
it had missing "s" at the end and it was not recognized properly, only
short names for the options (-d/-l/-m).
David Teigland [Wed, 20 Sep 2017 15:51:52 +0000 (10:51 -0500)]
improve error messages when command rules fail
When certain cmd def RULE's fail, the error messages can
sometimes be confusing. This expands the error messages
to help clarify why the rule failed, especially in cases
where options are used incorrectly.
David Teigland [Tue, 19 Sep 2017 18:08:41 +0000 (13:08 -0500)]
pvmove: require LV name in a shared VG
In a shared VG, only allow pvmove with a named LV,
so that only PE's used by the LV will be moved.
The LV is then activated exclusively, ensuring that
the PE's being moved are not used from another host.
Previously, pvmove was mistakenly allowed on a full PV.
This won't work when LVs using that PV are active on
other hosts.
Enable handling of --poolmetadataspare so if user can prevent
creation of _pmspare volume during --repair operation (just
like during actual lvcreate or lvconvert) for pool volumes.
David Teigland [Thu, 14 Sep 2017 17:20:29 +0000 (12:20 -0500)]
lvcreate: use cmd defs to deny unspported lockd cases
In a shared VG, lvconvert must be used to create thin pools
and cache pools, not the lvcreate variants of those commands.
Deny these cases early in lvcreate using the new command defs.
Denying these cases deeper in the code was missing some
cleanup of the partially completed command.
Zdenek Kabelac [Fri, 25 Aug 2017 09:55:38 +0000 (11:55 +0200)]
daemonize: more unified code
ATM we have several instances of daemonizing code.
Each has its 'special' logic so not completely easy
to unify them all into a single routine.
Start to unify them and use one strategy for rediricting
all input/outpus to /dev/null - use 'dup2' function for this
and open /dev/null before fork to make sure it's available.
Zdenek Kabelac [Fri, 25 Aug 2017 09:59:19 +0000 (11:59 +0200)]
locking: avoid descriptor leak for nonblocking mode
When file-locking mode failed on locking, such description was leaked
(typically not an issue since command usually exists afterwards).
So shirt close() at the end of function and use it in all error paths.
Also make sure, when interrrupt is detected, it's really not holding
lock and returns 0.
Zdenek Kabelac [Wed, 16 Aug 2017 12:12:48 +0000 (14:12 +0200)]
lvmlockd: shorter code
gcc warns here about storring 69 bytes in 64 byte array (losing
potentially 4 bytes from 'ls->name').
lvmlockd-core.c:2657:36: warning: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 60 [-Wformat-truncation=]
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~
lvmlockd-core.c:2657:2: note: ‘snprintf’ output between 5 and 69 bytes into a destination of size 64
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replaced with slightly better code - but it still misses error path what
to do if the name would be truncated... - so added FIXME.
Also using all bytes for snprintf() buffer size
(as the size is with \0 included)
David Teigland [Tue, 15 Aug 2017 16:53:21 +0000 (11:53 -0500)]
lvmlockd: zero extended lvmlock LV
After the internal lvmlock LV (holding sanlock leases) is
extended to hold more leases, it needs to be zeroed.
sanlock expects to see either zeroed blocks or blocks
initialized with leases.
currently, lvcreate for sanlock find the free lock offset
from the beginning of the lvmlock every time.
after created thousands of lvs, it will issue thousands of read
ios for lvcreate to find free lock offset.
remeber the last free lock offset will greatly reduce the impact
Peter Rajnoha [Tue, 15 Aug 2017 11:23:51 +0000 (13:23 +0200)]
pvcreate: fix check for 2nd mda at end of disk fits if using pvcreate --restorefile
Fix code checking that the 2nd mda which is at the end of disk really
fits the available free space and avoid any DA and MDA interleaving when
we already have DA preallocated. This mainly applies when we're restoring
a PV from VG backup using pvcreate --restorefile where we may already have
some DA preallocated - this means the PV was in a VG before with already
allocated space from it (the LVs were created). Hence we need to avoid
stepping into DA - the MDA can never ever be inside in such case!
The code responsible for this calculation was already in
_text_pv_add_metadata_area fn, but it had a bug in the calculation where
we subtracted one more sector by mistake and then the code could still
incorrectly allocate the MDA inside existing DA. The patch also renames
the variable in the code so it doesn't confuse us in future.
Also, if the 2nd mda doesn't fit, don't silently continue with just 1
MDA (at the start of the disk). If 2nd mda was requested and we can't
create that due to unavailable space, error out correctly (the patch
also adds a test to shell/pvcreate-operation.sh for this case).
pvcreate: Use maximum metadata area size with --restorefile
If the PV was originally created with a larger-than-default
metadata area the restored one wasn't and might not even be
large enough to hold the metadata!
pvcreate: Wipe cached bootloaderarea when wiping label.
Previously the cache remembered an existing bootloaderarea and
reinstated it (without even checking for overlap) when asked to
write out the PV. pvcreate could write out an incorrect layout.
Zdenek Kabelac [Tue, 1 Aug 2017 16:17:06 +0000 (18:17 +0200)]
configure: tune BUILD_DMEVENTD
Drop 'DMEVENT' make variable and use BUILD_DMEVENTD like with other daemons.
NOTE: by default we do not build dmeventd - maybe time to change
as lot of build targets basically do need dmeventd...
Switch to use SYSTEMD_LIBS and avoid linking systemd library with
every linked tool when dbus notification are enabled.
(TODO add missing testing for lib presence)
Zdenek Kabelac [Tue, 1 Aug 2017 10:41:45 +0000 (12:41 +0200)]
configure: improve test for realtime clock
Check first if we need to even link -lrt - since clock functions
are normally emebeded with recent glibc (>=2.17)
Use standard RT_LIBS name.
Avoid duplicate test for realtime clock with lvmlockd
Show better error message when realtime clock support is missing or
disabled.
Link RT_LIBS explicitely with lvmlockd and lvmetad.
Avoid adding -g more then once for debug builds.
Avoid enabling DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS
Introduce PROGS_FLAGS to add option to pass flags for external libs.
Link lvm2 internally library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally
Set shell failure mode on couple places.
lvm2 warned about zeroing and too big chunksize (>=512KiB), but
only during lvconvert, so lvcreate was creating thin-pools
without any warning about possible slowness of thin provisioning
because of zeroing.