Shift refresh of mirror table right into monitor_dev_for_events().
Use !vg_write_lock_held() to recognize use of lvchange/vgchange.
(this shall change if this would no longer work, but requires
futher some API changes).
With this patch dm mirror table is only refreshed when necassary.
Also update WARNING message about mirror usage without monitoring
and display LV name.
David Teigland [Thu, 26 Apr 2018 19:41:57 +0000 (14:41 -0500)]
clvmd: drop old saved_vg when returning new saved_vg
In some pvmove tests, clvmd uses the new (precommitted)
saved_vg, but then requests the old saved_vg, and
expects that the new saved_vg be returned instead of
the old. So, when returning the new saved_vg, forget
the old one so we don't return it again.
David Teigland [Thu, 26 Apr 2018 19:48:13 +0000 (14:48 -0500)]
scan: refresh filters before scan
The filters save information about devices that should
be ignored, so if we need to repeat a scan (unusual,
but happens in clvmd), we need to update the filters.
David Teigland [Wed, 25 Apr 2018 21:23:55 +0000 (16:23 -0500)]
pvmove: in fork mode destroy bcache in child
When pvmove was run in background mode and forks
instead of using lvmpolld, the child pvmove process
was not clearing the bcache from the parent, so all
the aio ops in the child were failing.
David Teigland [Tue, 24 Apr 2018 19:48:30 +0000 (14:48 -0500)]
clvmd: skip dev rescan after full scan
When clvmd does a full label scan just prior to
calling _vg_read(), pass a new flag into _vg_read
to indicate that the normal rescan of VG devs is
not needed.
David Teigland [Tue, 24 Apr 2018 18:19:48 +0000 (13:19 -0500)]
clvmd: reuse a vg struct for sequential LV operations
After reading a VG, stash it in lvmcache as "saved_vg".
Before reading the VG again, try to use the saved_vg.
The saved_vg is dropped on VG lock operations.
David Teigland [Tue, 24 Apr 2018 17:18:22 +0000 (12:18 -0500)]
lvmcache: rename suspended_vg to saved_vg
The copy of the VG which clvmd stashes in lvmcache should
not only be used between suspend and resume, but between
sequential LV operations in clvmd, so that clvmd does not
need to reread the VG for each one. Prepare for that by
renaming the stashed VG as "saved_vg".
Instead of using delayer device user 'zero' device and let mirror
do some real work which takes some time.
In case the test machine is too fast - mirror might need to be made bigger
to meet needed criteria.
Also move all test needed this 'zero' PV trick to the end of test
so $dev2 and $dev4 are covered with 'zero' and can take any amount of
write without consuming any real space.
David Teigland [Wed, 18 Apr 2018 21:29:42 +0000 (16:29 -0500)]
scan: skip device rescan in vg_read
For reporting commands (pvs,vgs,lvs,pvdisplay,vgdisplay,lvdisplay)
we do not need to repeat the label scan of devices in vg_read if
they all had matching metadata in the initial label scan. The
data read by label scan can just be reused for the vg_read.
This cuts the amount of device i/o in half, from two reads of
each device to one. We have to be careful to avoid repairing
the VG if we've skipped rescanning. (The VG repair code is very
poor, and will be redone soon.)
David Teigland [Mon, 9 Apr 2018 18:57:44 +0000 (13:57 -0500)]
bcache: intercept test mode before write
Don't allow writes in test mode. test mode should be
more sophisticated than just faking writes, and this
should be a last defense for cases where test mode is
not being checked correctly.
David Teigland [Mon, 9 Apr 2018 18:40:49 +0000 (13:40 -0500)]
lvmcache: simplify
Recent changes allow some major simplification of the way
lvmcache works and is used. lvmcache_label_scan is now
called in a controlled fashion at the start of commands,
and not via various unpredictable side effects. Remove
various calls to it from other places. lvmcache_label_scan
should not be called from anywhere during a command, because
it produces an incorrect representation of PVs with no MDAs,
and misclassifies them as orphans. This has been a long
standing problem. The invalid flag and rescanning based on
that is no longer used and removed. The 'force' variation is
no longer needed and removed.
David Teigland [Thu, 1 Mar 2018 18:43:29 +0000 (12:43 -0600)]
clvm: fix bcache scan handling
We can't let clvmd keep all scanned devs open,
which prevents them from being removed. So
drop the bcache data (and close fds) affter
doing a label scan.
Also set up bcache before the clvm-specific
vg_read (which needs to rescan the vg's devs
using bcache) and destroy the bcache after.
David Teigland [Thu, 1 Mar 2018 16:17:32 +0000 (10:17 -0600)]
bcache: fix error handling
The error handling code wasn't working, but it
appears that just removing it is what we need.
The doesn't really need any different behavior
related to bcache blocks on an io error, it just
wants to know if there was an error.
David Teigland [Tue, 27 Feb 2018 18:35:45 +0000 (12:35 -0600)]
scan: create bcache with minimum number of blocks
In some odd cases (e.g. tests) there are very few devices
which results in creating too few blocks in bcache, so
create bcache with a minimum number of blocks.
David Teigland [Fri, 16 Feb 2018 16:37:09 +0000 (10:37 -0600)]
scan: drop bcache and close fd for LV with stacked PV
When a PV is stacked on an LV, the LV will be kept in
bcache, and the open fd on the LV may interfere with
processing the LV. So, drop/close a bcache fd for
an LV before processing the LV.
David Teigland [Wed, 14 Feb 2018 21:45:31 +0000 (15:45 -0600)]
scan: setup bcache for commands using lvmetad
Commands using lvmetad will not begin with a proper
label_scan which initializes bcache, but may later
decide they need to scan a set of devs, in which case
they'll need bcache set up at that point.
David Teigland [Wed, 14 Feb 2018 20:47:28 +0000 (14:47 -0600)]
vgremove: fix force remove on devs with damaged metadata
The improved detection of bad metadata when scanning
(where errors were ignored before) means we now have to
override some errors when forcibly erasing damaged metadata.
David Teigland [Wed, 14 Feb 2018 19:49:56 +0000 (13:49 -0600)]
scan: skip extra scan in vg_read
Drop an extra label scan in the recovery part
of vg_read. This is a temporary improvement
until the pending replacement for the broken
recovery code burried in vg_read.
David Teigland [Tue, 13 Feb 2018 18:50:44 +0000 (12:50 -0600)]
scan: add a dev to bcache before each read to handle write path
This is a temporary hacky workaround to the problem of
reads going through bcache and writes not using bcache.
The write path wants to read parts of data that it is
incrementally writing to disk, but the reads (using
bcache) don't work because the writes are not in the
bcache. For now, add a dev to bcache before each attempt
to read it in case it's being used on the write path.
David Teigland [Tue, 13 Feb 2018 14:58:35 +0000 (08:58 -0600)]
scan: use separate fd for bcache
Create a new dev->bcache_fd that the scanning code owns
and is in charge of opening/closing. This prevents other
parts of lvm code (which do various open/close) from
interfering with the bcache fd. A number of dev_open
and dev_close are removed from the reading path since
the read path now uses the bcache.
With that in place, open(O_EXCL) for pvcreate/pvremove
can then be fixed. That wouldn't work previously because
of other open fds.
David Teigland [Fri, 9 Feb 2018 17:24:40 +0000 (11:24 -0600)]
process_each_label: use lvmcache
In the same way as the other process_each functions.
In the common case all the info that's needed can be
used from lvmcache after a label scan. But this means
that unchosen devs for duplicate PVs need to be handled
explicitly.
David Teigland [Wed, 1 Nov 2017 14:35:40 +0000 (09:35 -0500)]
lvmcache: simplify metadata cache
The copy of VG metadata stored in lvmcache was not being used
in general. It pretended to be a generic VG metadata cache,
but was not being used except for clvmd activation. There
it was used to avoid reading from disk while devices were
suspended, i.e. in resume.
This removes the code that attempted to make this look
like a generic metadata cache, and replaces with with
something narrowly targetted to what it's actually used for.
This is a way of passing the VG from suspend to resume in
clvmd. Since in the case of clvmd one caller can't simply
pass the same VG to both suspend and resume, suspend needs
to stash the VG somewhere that resume can grab it from.
(resume doesn't want to read it from disk since devices
are suspended.) The lvmcache vginfo struct is used as a
convenient place to stash the VG to pass it from suspend
to resume, even though it isn't related to the lvmcache
or vginfo. These suspended_vg* vginfo fields should
not be used or touched anywhere else, they are only to
be used for passing the VG data from suspend to resume
in clvmd. The VG data being passed between suspend and
resume is never modified, and will only exist in the
brief period between suspend and resume in clvmd.
suspend has both old (current) and new (precommitted)
copies of the VG metadata. It stashes both of these in
the vginfo prior to suspending devices. When vg_commit
is successful, it sets a flag in vginfo as before,
signaling the transition from old to new metadata.
resume grabs the VG stashed by suspend. If the vg_commit
happened, it grabs the new VG, and if the vg_commit didn't
happen it grabs the old VG. The VG is then used to resume
LVs.
This isolates clvmd-specific code and usage from the
normal lvm vg_read code, making the code simpler and
the behavior easier to verify.
Sequence of operations:
- lv_suspend() has both vg_old and vg_new
and stashes a copy of each onto the vginfo:
lvmcache_save_suspended_vg(vg_old);
lvmcache_save_suspended_vg(vg_new);
- vg_commit() happens, which causes all clvmd
instances to call lvmcache_commit_metadata(vg).
A flag is set in the vginfo indicating the
transition from the old to new VG:
vginfo->suspended_vg_committed = 1;
- lv_resume() needs either vg_old or vg_new
to use in resuming LVs. It doesn't want to
read the VG from disk since devices are
suspended, so it gets the VG stashed by
lv_suspend:
vg = lvmcache_get_suspended_vg(vgid);
If the vg_commit did not happen, suspended_vg_committed
will not be set, and in this case, lvmcache_get_suspended_vg()
will return the old VG instead of the new VG, and it will
resume LVs based on the old metadata.
David Teigland [Mon, 6 Nov 2017 18:09:52 +0000 (12:09 -0600)]
label_scan: remove extra label scan and read for orphan PVs
When process_each_pv() calls vg_read() on the orphan VG, the
internal implementation was doing an unnecessary
lvmcache_label_scan() and two unnecessary label_read() calls
on each orphan. Some of those unnecessary label scans/reads
would sometimes be skipped due to caching, but the code was
always doing at least one unnecessary read on each orphan.
The common format_text case was also unecessarily calling into
the format-specific pv_read() function which actually did nothing.
By analyzing each case in which vg_read() was being called on
the orphan VG, we can say that all of the label scans/reads
in vg_read_orphans are unnecessary:
1. reporting commands: the information saved in lvmcache by
the original label scan can be reported. There is no advantage
to repeating the label scan on the orphans a second time before
reporting it.
2. pvcreate/vgcreate/vgextend: these all share a common
implementation in pvcreate_each_device(). That function
already rescans labels after acquiring the orphan VG lock,
which ensures that the command is using valid lvmcache
information.
David Teigland [Thu, 26 Oct 2017 19:32:30 +0000 (14:32 -0500)]
vgcreate: improve the use of label_scan
The old code was doing unnecessary label scans when
checking to see if the new VG name exists. A single
label_scan is sufficient if it is done after the
new VG lock is held.
David Teigland [Thu, 26 Oct 2017 15:58:23 +0000 (10:58 -0500)]
lvmetad: use new label_scan for update from lvmlockd
When lvmlockd indicates that the lvmetad cache is out of
date because of changes by another node, lvmetad_pvscan_vg()
rescans the devices in the VG to update lvmetad. Use the
new label_scan in this function to use the common code and
take advantage of the new aio and reduced reads.
David Teigland [Wed, 25 Oct 2017 18:39:46 +0000 (13:39 -0500)]
label_scan: fix independent metadata areas
This fixes the use of lvmcache_label_rescan_vg() in the previous
commit for the special case of independent metadata areas.
label scan is about discovering VG name to device associations
using information from disks, but devices in VGs with
independent metadata areas have no information on disk, so
the label scan does nothing for these VGs/devices.
With independent metadata areas, only the VG metadata found
in files is used. This metadata is found and read in
vg_read in the processing phase.
lvmcache_label_rescan_vg() drops lvmcache info for the VG devices
before repeating the label scan on them. In the case of
independent metadata areas, there is no metadata on devices, so the
label scan of the devices will find nothing, so will not recreate
the necessary vginfo/info data in lvmcache for the VG. Fix this
by setting a flag in the lvmcache vginfo struct indicating that
the VG uses independent metadata areas, and label rescanning should
be skipped.
In the case of independent metadata areas, it is the metadata
processing in the vg_read phase that sets up the lvmcache
vginfo/info information, and label scan has no role.
David Teigland [Wed, 7 Feb 2018 19:26:37 +0000 (13:26 -0600)]
scan: do scanning at the start of a command
Move the location of scans to make it clearer and avoid
unnecessary repeated scanning. There should be one scan
at the start of a command which is then used through the
rest of command processing.
Previously, the initial label scan was called as a side effect
from various utility functions. This would lead to it being called
unnecessarily. It is an expensive operation, and should only be
called when necessary. Also, this is a primary step in the
function of the command, and as such it should be called prominently
at the top level of command processing, not as a hidden side effect
of a utility function. lvm knows exactly where and when the
label scan needs to be done. Because of this, move the label scan
calls from the internal functions to the top level of processing.
Other specific instances of lvmcache_label_scan() are still called
unnecessarily or unclearly by specific commands that do not use
the common process_each functions. These will be improved in
future commits.
During the processing phase, rescanning labels for devices in a VG
needs to be done after the VG lock is acquired in case things have
changed since the initial label scan. This was being done by way
of rescanning devices that had the INVALID flag set in lvmcache.
This usually approximated the right set of devices, but it was not
exact, and obfuscated the real requirement. Correct this by using
a new function that rescans the devices in the VG:
lvmcache_label_rescan_vg().
Apart from being inexact, the rescanning was extremely well hidden.
_vg_read() would call ->create_instance(), _text_create_text_instance(),
_create_vg_text_instance() which would call lvmcache_label_scan()
which would call _scan_invalid() which repeats the label scan on
devices flagged INVALID. lvmcache_label_rescan_vg() is now called
prominently by _vg_read() directly.
David Teigland [Wed, 7 Feb 2018 19:14:08 +0000 (13:14 -0600)]
scan: use new label_scan for lvmcache_label_scan
To do label scanning, lvm code calls lvmcache_label_scan().
Change lvmcache_label_scan() to use the new label_scan()
based on bcache.
Also add lvmcache_label_rescan_vg() which calls the new
label_scan_devs() which does label scanning on only the
specified devices. This is for a subsequent commit and
is not yet used.