Zdenek Kabelac [Tue, 27 Aug 2019 10:18:47 +0000 (12:18 +0200)]
activation: use cmd pending mem for pending_delete
Since we need to preserve allocated strings across 2 separate
activation calls of '_tree_action()', we need to use a mem pool
other than dm->mem. But since cmd->mem is released between
individual lvm2 locking calls, we instead introduce a new separate
mem pool just for pending deletes, with an easy-to-see life-span
(not using 'libmem', as it would basically keep allocations over
the whole lifetime of clvmd).
This patch fixes the previous commit, where the memory was
improperly used after the pool was released.
Zdenek Kabelac [Mon, 26 Aug 2019 15:19:16 +0000 (17:19 +0200)]
configure: check for prlimit
Update configure and make code compilable if prlimit() is not present.
Since the code is suspicious, do not attempt its replacement
with set/getrlimit() yet.
Zdenek Kabelac [Mon, 26 Aug 2019 11:28:17 +0000 (13:28 +0200)]
lv_manip: add synchronizations
New udev in rawhide seems to be 'dropping' udev rule operations for
devices that no longer exist. While this is 'probably' a bug, it
reveals places in lvm2 that likely should not run in a single
transaction, where we should wait for a cookie before submitting more work.
TODO: it seems more 'error' paths should always include synchronization
before starting to deactivate 'just activated' devices.
We should probably figure out some 'automatic' solution for this instead
of placing sync_local_dev_name() all over the place...
Zdenek Kabelac [Mon, 26 Aug 2019 11:28:00 +0000 (13:28 +0200)]
cache: improve vgremove loop
Support internal removal of the 'cache origin' volume, which we
do not normally expose to a user; however, internal processing
loops may hit this condition (depending on the order of LVs in the list).
So when this operation is requested internally, we automatically
try to remove its 'holding' LV (the cache LV), which will also
remove the origin.
Zdenek Kabelac [Mon, 26 Aug 2019 13:13:55 +0000 (15:13 +0200)]
snapshot: always activate
Drop the 'cluster-only' optimization so we resume ALL devices
before we wait on the cookie ahead of a 'removal' operation.
This is a more correct order of operations, although possibly slightly
less efficient, but until we have a correct list of operations
'in-progress' we can't do anything better.
However, we have operations like 'snapshot merge' where we
resume the device tree in 2 subsequent activation calls, so
the 1st such call will still have suspended devices and no chance
to push the 'remove' ioctl.
Since we currently cannot easily solve this by doing just a single
activation call (which would be the preferred solution), we introduce
preservation of pending_delete via the command structure and
then restore it on the next activation call.
This way we still remove the devices later, although it might
not be the best moment; this may need further tuning.
Also, we don't keep the list of operations in 1 transaction
(unless we verify udev symlinks); keeping them together could
probably also make it more correct in terms of which 'remove'
can be combined with an already running 'resume'.
Zdenek Kabelac [Fri, 16 Aug 2019 21:49:38 +0000 (23:49 +0200)]
dmsetup: debug print
Udev debugging is a bit tricky, so to pair the cookie ID (the
lowest 16 bits) more easily, print the cookie as a hex number.
This simplifies pairing of processed cookies while the 'higher bit
flags' change for the same cookie.
Zdenek Kabelac [Fri, 16 Aug 2019 21:49:59 +0000 (23:49 +0200)]
activation: add synchronization point
Resuming an 'error' table entry followed by its direct removal
is now troublesome with the latest udev, as it may skip processing
udev rules for already 'dropped' device nodes.
As we cannot 'synchronize' with udev while we know we have devices
in a suspended state, rework 'cleanup' so it collects nodes
for removal into a pending_delete list and processes the list with
synchronization once no nodes are suspended.
Zdenek Kabelac [Tue, 20 Aug 2019 10:23:08 +0000 (12:23 +0200)]
pvmove: add missing synchronization
Between 'resume' and 'remove' we need to wait for udev to synchronize,
otherwise udev may 'skip' resume event processing if the udev node
is already gone.
Zdenek Kabelac [Tue, 20 Aug 2019 10:30:25 +0000 (12:30 +0200)]
pvmove: correcting read_ahead setting
When pvmove is finished, we do a tricky operation, since we try to
resume multiple different devices that were all joined into 1 big tree.
Currently we use the information from the existing live DM table,
where we can get the list of all holders of the pvmove device.
We look for these nodes (by uuid) in the new metadata, and we now do a
full regular device add into the dm tree structure. All devices should
already be PRELOADed with the correct table before entering the suspend
state; however, for readahead to work correctly we need to put the
correct info into the RESUME tree as well. Since the tables are
preloaded, an identical table is skipped and only resumed, but the
correct read ahead is now set.
David Teigland [Thu, 1 Aug 2019 20:04:10 +0000 (15:04 -0500)]
improve duplicate pv handling for md components
Eliminate md components at the start so they don't
interfere with actual duplicates, and don't need
to be removed later. This also allows for choosing
no copy of a PVID if they all happen to be md
components.
David Teigland [Thu, 1 Aug 2019 19:43:19 +0000 (14:43 -0500)]
md component detection addition in vg_read
Usually md components are eliminated in label scan and/or
duplicate resolution, but they could sometimes get into
the vg_read stage, where set_pv_devices compares the
device to the PV.
If set_pv_devices runs an md component check and finds
one, vg_read should eliminate the components.
In set_pv_devices, always run an md component check
if the PV is smaller than the device (this is not
very common). If the PV is larger than the device
(more common), do the component check when the config
setting is "auto" (the default).
dmeventd: avoid bail out preventing repair in raid plugin
Problem:
even though dead raid component devices are detected, the
raid plugin is bailing out thus preventing a repair attempt.
Rationale:
in case of component device errors, the MD resynchronization
thread runs in parallel with the thrown event being processed
by the raid plugin. The plugin retrieves the raid device status,
but that still reports insync regions as 0 (when it should already
equal total regions) because the MD thread hasn't updated it yet.
Solution:
Remove the insync regions check and let lvconvert carry out its
pre-repair checks and optionally carry out a repair attempt.
Zdenek Kabelac [Mon, 17 Jun 2019 20:47:35 +0000 (22:47 +0200)]
tests: replaces grep -q usage
Since we use 'set -euE -o pipefail' for shell execution,
a failure of any command in a 'piped' shell chain can result
in failure of the whole executed chain. 'grep -q' exits at the
first match, so the command still writing into the pipe may die
with SIGPIPE, typically resulting in an unusual test skip that
went unnoticed.
Since the checked commands usually have short output, the simplest
fix seems to be to let grep parse the whole output instead
of quitting after the first match.
Fix versioning for updated symbols dm_stats_create_region
and dm_stats_create_region.
Only the latest symbol should have global entry.
Since I'm not sure what is currently the best option for
old symbols, we added support for easily commenting them out
(so we do not lose the information about when the symbol appeared
for the first time).
Note: some old, already deleted symbols should have been
restored as comments.
David Teigland [Thu, 1 Aug 2019 15:06:47 +0000 (10:06 -0500)]
vgcreate/vgextend: restrict PVs with mixed block sizes
Avoid having PVs with different logical block sizes in the same VG.
This prevents LVs from having mixed block sizes, which can produce
file system errors.
The new config setting devices/allow_mixed_block_sizes (default 0)
can be changed to 1 to return to the unrestricted mode.
David Teigland [Fri, 26 Jul 2019 19:21:08 +0000 (14:21 -0500)]
Fix rounding writes up to sector size
Do this at two levels, although one would be enough to
fix the problem seen recently:
- Ignore any reported sector size other than 512 or 4096.
If either sector size (physical or logical) is reported
as 512, then use 512. If neither are reported as 512,
and one or the other is reported as 4096, then use 4096.
If neither is reported as either 512 or 4096, then use 512.
- When rounding up a limited write in bcache to be a multiple
of the sector size, check that the resulting write size is
not larger than the bcache block itself. (This shouldn't
happen if the sector size is 512 or 4096.)
David Teigland [Mon, 1 Jul 2019 20:00:34 +0000 (15:00 -0500)]
metadata: extend writes to zero space
Previously, consecutive copies of metadata would have garbage
data in the space between them. After metadata wrapping,
the garbage would be portions of old metadata. This made
analysis of the metadata area more difficult.
This would happen because the start of a new copy of metadata
is advanced from the end of the last copy to the
next 512 byte boundary.
Zero the space between consecutive copies of metadata by
extending each metadata write to end at the next 512 byte
boundary. The size of the metadata itself is not extended,
only the write. The buffer being written contains the
metadata text followed by the necessary number of zeros.
David Teigland [Tue, 9 Jul 2019 19:48:31 +0000 (14:48 -0500)]
enable full md component detection at the right time
An active md device with an end superblock causes lvm to
enable full md component detection. This was being done
within the filter loop instead of before, so the full
filtering of some devs could be missed.
Also incorporate the recently added config setting that
controls the md component detection.
which caused pvscan to not consider a PV online (for purposes
of event based activation) if the PV and device sizes differed.
This helped to avoid mistaking MD components for PVs, and is
replaced by triggering an md component check when PV and device
sizes differ (which happens in set_pv_device).
lvconvert: allow --stripes/--stripesize in 'mirror' conversions
This allows the creation of striped mirror legs during upconvert
by adding the lvconvert command line options --stripes/--stripesize
for 'mirror' to tools/command-lines.in.
In case multiple mirror legs are being added, all will have the
same requested striped layout.
Peter Rajnoha [Thu, 4 Jul 2019 10:57:55 +0000 (12:57 +0200)]
udev: do not overwrite ID_MODEL in 69-dm-lvm-metad.rules
We've been assigning this in 69-dm-lvm-metad.rules:
ENV{ID_MODEL}="LVM PV $env{ID_FS_UUID_ENC} on /dev/$name"
This was for the description to appear for each systemd device
unit representing this device, for example:
$ systemctl -a | grep "LVM PV"
dev-block-252:2.device loaded active plugged LVM PV JhxC7B-YTgk-3jIU-5GVo-c4gV-W8t3-UUz06p on /dev/vda2 2
dev-disk-by\x2did-lvm\x2dpv\x2duuid\x2dJhxC7B\x2dYTgk\x2d3jIU\x2d5GVo\x2dc4gV\x2dW8t3\x2dUUz06p.device loaded active plugged LVM PV JhxC7B-YTgk-3jIU-5GVo-c4gV-W8t3-UUz06p on /dev/vda2 2
...
However, there could be an actual ID_MODEL that people are interested in
more than the fact that this is an LVM PV and so we shouldn't overwrite
the value.
Also, we already have a symlink /dev/disk/by-id/lvm-pv-uuid-<PV_UUID>
created, which is then reflected as a device unit (all of a device's
symlinks have a systemd device unit representation), so we can still
reach this information in systemd unit listings even without setting ID_MODEL.
David Teigland [Tue, 2 Jul 2019 15:59:40 +0000 (10:59 -0500)]
cache: warn and prompt for writeback with cachevol
The cache repair utility does not yet work with a cachevol
(where metadata and data exist on the same LV). So, warn
and prompt if writeback is specified with a cachevol.
David Teigland [Fri, 21 Jun 2019 18:37:11 +0000 (13:37 -0500)]
exported vg handling
The exported VG checking/enforcement was scattered and
inconsistent. This centralizes it and makes it consistent,
following the existing approach for foreign and shared
VGs/PVs, which are very similar to exported VGs/PVs.
The access policy that now applies to foreign/shared/exported
VGs/PVs is that if a foreign/shared/exported VG/PV is named
on the command line (i.e. explicitly requested by the user),
and the command is not permitted to operate on it because it
is foreign/shared/exported, then an access error is reported
and the command exits with an error. But, if the command is
processing all VGs/PVs, and happens to come across a
foreign/shared/exported VG/PV (that is not explicitly named on
the command line), then the command silently skips it and does
not produce an error.
A command using tags or --select handles inaccessible VGs/PVs
the same way as a command processing all VGs/PVs, and will
not report/return errors if these inaccessible VGs/PVs exist.
The new policy fixes the exit codes on a somewhat random set of
commands that previously exited with an error if they were
looking at all VGs/PVs and an exported VG existed on the system.
There should be no change to which commands are allowed/disallowed
on exported VGs/PVs.
Certain LV commands (lvs/lvdisplay/lvscan) would previously not
display LVs from an exported VG (for unknown reasons). This has
not changed. The lvm fullreport command would previously report
info about an exported VG but not about the LVs in it. This
has changed to include all info from the exported VG.
David Teigland [Tue, 11 Jun 2019 21:17:24 +0000 (16:17 -0500)]
scanning: open devs rw when rescanning for write
When vg_read rescans devices with the intention of
writing the VG, the label rescan can open the devs
RW so they do not need to be closed and reopened
RW in dev_write_bytes.
David Teigland [Tue, 18 Jun 2019 21:10:06 +0000 (16:10 -0500)]
metadata: include description with command in metadata areas
Previously the VG metadata description field (which contains
the command line) was only included in backup/archive copies
of the metadata. Now also include it in the metadata written
to the metadata areas.
David Teigland [Fri, 14 Jun 2019 14:26:08 +0000 (09:26 -0500)]
fix man page generation
The man page generation for pvchange/lvchange/vgchange was
incorrect (leaving out some option listings) as a result of
commit e225bf5 "fix command definition for pvchange -a"
Zdenek Kabelac [Tue, 11 Jun 2019 14:40:44 +0000 (16:40 +0200)]
tests: correct checked target name
When a target name happened to be a suffix of another one
(e.g. dm-cache and dm-writecache), the grep matched the wrong
line, so match at the start of the line instead.
David Teigland [Mon, 10 Jun 2019 16:35:26 +0000 (11:35 -0500)]
fix command definition for pvchange -a
The -a was being included in the set of "one or more"
options instead of an actual required option. Even
though the cmd def was not implementing the restrictions
correctly, the command internally was.
Adjust the cmd def code, which did not support a command
with both real required options and a set of "one or more"
options.
David Teigland [Fri, 7 Jun 2019 19:30:03 +0000 (14:30 -0500)]
vgsplit: simplify vg creation
Since this command now takes the global lock followed
by a label scan, it can simply check whether the new VG
name exists and, if not, lock it and create it.
David Teigland [Mon, 10 Jun 2019 15:07:30 +0000 (10:07 -0500)]
locking: reset global_ex flag at end of cmd
These two flags may not be reset at the end of
the command when the unlock is implicit, which
is a problem if the cmd struct is reused.
Clear the flags in the general fin_locking.
Marian Csontos [Mon, 10 Jun 2019 15:05:04 +0000 (17:05 +0200)]
Merge remote-tracking branch 'origin/master'
* origin/master: (22 commits)
tests: add metadata-bad-mdaheader.sh
tests: add metadata-bad-text.sh
tests: add outdated-pv.sh
tests: add metadata-old.sh
tests: add missing-pv missing-pv-unused
metadata.c: removed unused code
improve reading and repairing vg metadata
add a warning message when updating old metadata
vgcfgbackup add error messages
vgck --updatemetadata is a new command
move pv header repairs to vg_write
process_each_pv handle outdated pvs
move wipe_outdated_pvs to vg_write
create separate lvmcache update functions for read and write
fix vg_commit return value
change args for text label read function
add mda arg to add_mda
keep track of which mdas have old metadata in lvmcache
ability to keep track of outdated pvs in lvmcache
ability to keep track of bad mdas in lvmcache
...