Zdenek Kabelac [Fri, 25 Aug 2017 09:55:38 +0000 (11:55 +0200)]
daemonize: more unified code
ATM we have several instances of daemonizing code.
Each has its 'special' logic so not completely easy
to unify them all into a single routine.
Start to unify them and use one strategy for rediricting
all input/outpus to /dev/null - use 'dup2' function for this
and open /dev/null before fork to make sure it's available.
Zdenek Kabelac [Fri, 25 Aug 2017 09:59:19 +0000 (11:59 +0200)]
locking: avoid descriptor leak for nonblocking mode
When file-locking mode failed on locking, such description was leaked
(typically not an issue since command usually exists afterwards).
So shirt close() at the end of function and use it in all error paths.
Also make sure, when interrrupt is detected, it's really not holding
lock and returns 0.
Zdenek Kabelac [Wed, 16 Aug 2017 12:12:48 +0000 (14:12 +0200)]
lvmlockd: shorter code
gcc warns here about storring 69 bytes in 64 byte array (losing
potentially 4 bytes from 'ls->name').
lvmlockd-core.c:2657:36: warning: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 60 [-Wformat-truncation=]
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~
lvmlockd-core.c:2657:2: note: ‘snprintf’ output between 5 and 69 bytes into a destination of size 64
snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replaced with slightly better code - but it still misses error path what
to do if the name would be truncated... - so added FIXME.
Also using all bytes for snprintf() buffer size
(as the size is with \0 included)
David Teigland [Tue, 15 Aug 2017 16:53:21 +0000 (11:53 -0500)]
lvmlockd: zero extended lvmlock LV
After the internal lvmlock LV (holding sanlock leases) is
extended to hold more leases, it needs to be zeroed.
sanlock expects to see either zeroed blocks or blocks
initialized with leases.
currently, lvcreate for sanlock find the free lock offset
from the beginning of the lvmlock every time.
after created thousands of lvs, it will issue thousands of read
ios for lvcreate to find free lock offset.
remeber the last free lock offset will greatly reduce the impact
Peter Rajnoha [Tue, 15 Aug 2017 11:23:51 +0000 (13:23 +0200)]
pvcreate: fix check for 2nd mda at end of disk fits if using pvcreate --restorefile
Fix code checking that the 2nd mda which is at the end of disk really
fits the available free space and avoid any DA and MDA interleaving when
we already have DA preallocated. This mainly applies when we're restoring
a PV from VG backup using pvcreate --restorefile where we may already have
some DA preallocated - this means the PV was in a VG before with already
allocated space from it (the LVs were created). Hence we need to avoid
stepping into DA - the MDA can never ever be inside in such case!
The code responsible for this calculation was already in
_text_pv_add_metadata_area fn, but it had a bug in the calculation where
we subtracted one more sector by mistake and then the code could still
incorrectly allocate the MDA inside existing DA. The patch also renames
the variable in the code so it doesn't confuse us in future.
Also, if the 2nd mda doesn't fit, don't silently continue with just 1
MDA (at the start of the disk). If 2nd mda was requested and we can't
create that due to unavailable space, error out correctly (the patch
also adds a test to shell/pvcreate-operation.sh for this case).
pvcreate: Use maximum metadata area size with --restorefile
If the PV was originally created with a larger-than-default
metadata area the restored one wasn't and might not even be
large enough to hold the metadata!
pvcreate: Wipe cached bootloaderarea when wiping label.
Previously the cache remembered an existing bootloaderarea and
reinstated it (without even checking for overlap) when asked to
write out the PV. pvcreate could write out an incorrect layout.
Zdenek Kabelac [Tue, 1 Aug 2017 16:17:06 +0000 (18:17 +0200)]
configure: tune BUILD_DMEVENTD
Drop 'DMEVENT' make variable and use BUILD_DMEVENTD like with other daemons.
NOTE: by default we do not build dmeventd - maybe time to change
as lot of build targets basically do need dmeventd...
Switch to use SYSTEMD_LIBS and avoid linking systemd library with
every linked tool when dbus notification are enabled.
(TODO add missing testing for lib presence)
Zdenek Kabelac [Tue, 1 Aug 2017 10:41:45 +0000 (12:41 +0200)]
configure: improve test for realtime clock
Check first if we need to even link -lrt - since clock functions
are normally emebeded with recent glibc (>=2.17)
Use standard RT_LIBS name.
Avoid duplicate test for realtime clock with lvmlockd
Show better error message when realtime clock support is missing or
disabled.
Link RT_LIBS explicitely with lvmlockd and lvmetad.
Avoid adding -g more then once for debug builds.
Avoid enabling DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS
Introduce PROGS_FLAGS to add option to pass flags for external libs.
Link lvm2 internally library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally
Set shell failure mode on couple places.
lvm2 warned about zeroing and too big chunksize (>=512KiB), but
only during lvconvert, so lvcreate was creating thin-pools
without any warning about possible slowness of thin provisioning
because of zeroing.
Table lines are separated by commas.
Devices are separated by semi-colons.
Flags is currently 'ro' or 'rw' (and might be extended in a
yet-to-be-defined way in future).
Any comma, semi-colon or backslash within a field is quoted by a
preceding backslash.
The format can later be supplied as input to dmsetup or even to the
booting kernel as an alternative way to set up devices.
Based on patches submitted by
Enric Balletbo i Serra <enric.balletbo@collabora.com>.
David Teigland [Wed, 12 Jul 2017 21:03:41 +0000 (16:03 -0500)]
vgchange: separate change locktype and allow recovery
Add an independent command definition for "vgchange --locktype",
and split the implementation out of the set of common metadata
changes. It is unlike normal metadata changes, and can only
be run by itself. (Changing the lock type is similar in
principle to changing the VG name or the VG system ID; it
effects the ability of any host to see or access the VG.)
At some point this command lost the ability to forcibly change
the lock type of a shared VG to "none" (making it a local VG).
This can be necessary to repair shared VGs (e.g. recovery steps
that occur in vg_read are disabled for shared VGs because
they are not locked properly, or recovering sanlock locks
when the PV holding them is lost.)
"vgchange --locktype none --lockopt force VG" is used as the
method of forcing the shared VG to become local so that it
can be repaired.
Since _deactivate_and_remove_lvs() is used in more then one place,
move the needed udev synchronization into this function so other
users automatically get correct fs state before next dm manipulation.
Assumption here is that this udev synchronization 'delay' may also
prevent to 'early' table reloads which might cause kernel problems
for md-core - but we may need more generic time-limited reload
frequency for raid devices.
Note: on udev-less system there will be almost no delay.
During test do a more close selection of visible devices.
If some test leaks a device with LVMTEST prefix, next
test should not be influnced (or parallel running one).
If the test is running in non-/dev dir - it's already protected
by full path with $PREFIX in it - however it test is running
in real /dev dir - there was no such protection and test
were confused when they have seen such leaked device.
Commit 9b4b5d449ef7045d33b8d8d7e69011d16f4bb8ab started to accept
rather way to wide set of strings we do not want to take for size.
i.e. lvresize -t vg0/lvol0 -LNaNM
Restore check for 'digit || locales defined 'dot' | '.'
(as we tend to take '.' even if locales uses ',')
API for strtod() or strtoul() needs reset of errno, before it's being
called. So add missing resets in missing places and some also some
errno validation for out-of-range numbers.
Since we are reading size as (double) we can get way bigger
number then just plain int64. So to make this check actually
more valid and usable do a maxsize compare in 'double'.
Avoid reading already released memory and do a continue directly.
Invalid read of size 1
at 0x1201B0: main_loop (clvmd.c:931)
by 0x11F640: main (clvmd.c:666)
Address 0x72ddef0 is 32 bytes inside a block of size 224 free'd
at 0x4C30D18: free (vg_replace_malloc.c:530)
by 0x54D6FD1: dm_free_wrapper (dbg_malloc.c:357)
by 0x122E6E: process_work_item (clvmd.c:2034)
by 0x123003: lvm_thread_fn (clvmd.c:2085)
by 0x590A3A8: start_thread (pthread_create.c:465)
by 0x5C3C7FE: clone (in /usr/lib64/libc-2.25.90.so)
Block was alloc'd at
at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
by 0x54D6EF1: dm_malloc_aux (dbg_malloc.c:286)
by 0x54D6F1C: dm_zalloc_aux (dbg_malloc.c:291)
by 0x54D6F96: dm_zalloc_wrapper (dbg_malloc.c:345)
by 0x11F89C: local_rendezvous_callback (clvmd.c:731)
by 0x1203D2: main_loop (clvmd.c:964)
by 0x11F640: main (clvmd.c:666)
Initialize mutex upfront any debugging and fix this report:
Mutex reinitialization: mutex 0x485d20, recursion count 0, owner 1.
at 0x4C38480: pthread_mutex_init_intercept (drd_pthread_intercepts.c:821)
by 0x4C38480: pthread_mutex_init (drd_pthread_intercepts.c:830)
by 0x11F359: main (clvmd.c:562)
mutex 0x485d20 was first observed at:
at 0x4C38F63: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:885)
by 0x4C38F63: pthread_mutex_lock (drd_pthread_intercepts.c:898)
by 0x11E920: debuglog (clvmd.c:254)
by 0x11F1D8: main (clvmd.c:527)