Leo Yan [Thu, 3 Jun 2021 09:59:10 +0000 (17:59 +0800)]
tests: Support multiple backing devices
In current implementation, the option "LVM_TEST_BACKING_DEVICE" only
supports to specify one backing device; this patch is to extend the
option to support multiple backing devices by using comma as separator,
e.g. below command specifies two backing devices:
make check_lvmlockd_idm LVM_TEST_BACKING_DEVICE=/dev/sdj3,/dev/sdk3
This can allow the testing works on multiple drives and verify the
locking scheme if can work as expected for multiple drives case. For
example, for Seagate IDM locking scheme, if a VG uses two PVs, every PV
is resident on a drive, thus the locking operations will be sent to two
drives respectively; so the extension for "LVM_TEST_BACKING_DEVICE" can
help to verify different drive configurations for locking.
Leo Yan [Thu, 3 Jun 2021 09:59:09 +0000 (17:59 +0800)]
tests: Enable the testing for IDM locking scheme
This patch is to introduce testing option LVM_TEST_LOCK_TYPE_IDM, with
specifying this option, the Seagate IDM lock manager will be launched as
backend for testing. Also add the prepare and remove shell scripts for
IDM.
David Teigland [Wed, 2 Jun 2021 21:29:54 +0000 (16:29 -0500)]
pvchange: fix file locking deadlock
Calling clear_hint_file() to invalidate hints would acquire
the hints flock before the global flock which could cause deadlock.
The lock order requires the global lock to be taken first.
pvchange was always invalidating hints, which was unnecessary;
only invalidate hints when changing a PV uuid. Because of the
lock ordering, take the global lock before clear_hint_file which
locks the hints file.
Leo Yan [Fri, 21 May 2021 02:56:37 +0000 (10:56 +0800)]
configure: Add macro LOCKDIDM_SUPPORT
The macro LOCKDIDM_SUPPORT is missed in configure.h.in file, thus when
execute "configure" command, it has no chance to add this macro in the
automatic generated header include/configure.h.
This patch adds macro LOCKDIDM_SUPPORT into configure.h.in.
Leo Yan [Fri, 7 May 2021 02:25:15 +0000 (10:25 +0800)]
lib: locking: Parse PV list for IDM locking
For shared VG or LV locking, IDM locking scheme needs to use the PV
list assocated with VG or LV for sending SCSI commands, thus it requires
to use some places to generate PV list.
In reviewing the flow for LVM commands, the best place to generate PV
list is in the locking lib. So this is why this patch parses PV list as
shown. It iterates over all the PV nodes one by one, and compare with
the VG name or LV prefix string. If any PV matches, then the PV is
added into the PV list. Finally the PV list is sent to lvmlockd daemon.
Here as mentioned, it compares LV prefix string with the format
"lv_name_", the reason is it needs to find out all relevant PVs, e.g.
for the thin pool, it has LVs for metadata, pool, error, and raw LV, so
we can use the prefix string to find out all PVs belonging to the thin
pool.
For the global lock, it's not covered in this patch. To avoid the egg
and chicken issue, we need to prepare the global lock ahead before any
locking can be used. So the global lock's PV list is established in
lvmlockd daemon by iterating all drives with partition labeled with
"propeller".
Leo Yan [Fri, 7 May 2021 02:25:14 +0000 (10:25 +0800)]
lib: locking: Add new type "idm"
We can consider the drive firmware a server to handle the locking
request from nodes, this essentially is a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
complies with client-server model for locking operations. This is
why IDM and DLM are similar with each other for their wrappers.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Leo Yan [Fri, 7 May 2021 02:25:13 +0000 (10:25 +0800)]
lvmlockd: idm: Hook Seagate IDM wrapper APIs
To allow the IDM locking scheme be used by users, this patch hooks the
IDM wrapper; it also introducs a new locking type "idm" and we can use
it for global lock with option '-g idm'.
To support IDM locking type, the main change in the data structure is to
add pvs path arrary. The pvs list is transferred from the lvm commands,
when lvmlockd core layer receives message, it extracts the message with
the keyword "path[idx]". Finally, the pv list will pass to IDM lock
manager as the target drives for sending IDM SCSI commands.
Leo Yan [Fri, 7 May 2021 02:25:12 +0000 (10:25 +0800)]
lvmlockd: idm: Introduce new locking scheme
Alongside the existed locking schemes of DLM and sanlock, this patch is
to introduce new locking scheme: In-Drive-Mutex (IDM).
With the IDM support in the drive, the locks are resident in the drive,
thus, the locking lease is maintained in a central place: the drive
firmware. We can consider this is a typical client-server model,
every host (or node) in the server cluster launches the request for
leasing mutex to a drive firmware, the drive firmware works as an
arbitrator to grant the mutex to a requester and it can reject other
applicants if the mutex has been acquired. To satisfy the LVM
activation for different modes, IDM supports two locking modes:
exclusive and shareable.
Every IDM is identified with two IDs, one is the host ID and another is
the resource ID. The resource ID is a unique identifier for what the
resource it's protected, in the integration with lvmlockd, the resource
ID is combined with VG's UUID and LV's UUID; for the global locking,
the bytes in resource ID are all zeros, and for the VG locking, the
LV's UUID is set as zero. Every host can generate a random UUID and
use it as the host ID for the SCSI command, this ID is used to clarify
the ownership for mutex.
For easily invoking the IDM commands to drive, like other locking
scheme (e.g. sanlock), a daemon program named IDM lock manager is
created, so the detailed IDM SCSI commands are encapsulated in the
daemon, and lvmlockd uses the wrapper APIs to communicate with the
daemon program.
This patch introduces the IDM locking wrapper layer, it forwards the
locking requests from lvmlockd to the IDM lock manager, and returns the
result from drives' responding.
One thing should be mentioned is the IDM's LVB. IDM supports LVB to max
7 bytes when stores into the drive, the most significant byte of 8 bytes
is reserved for control bits. For this reason, the patch maps the
timestamp in macrosecond unit with its cached LVB, essentially, if any
timestamp was updated by other nodes, that means the local LVB is
invalidate. When the timestamp is stored into drive's LVB, it's
possbile to cause time-going-backwards issue, which is introduced by the
time precision or missing synchronization acrossing over multiple nodes.
So the IDM wrapper fixes up the timestamp by increment 1 to the latest
value and write back into drive.
Currently LVB is used to track VG changes and its purpose is to notify
lvmetad cache invalidation when detects any metadata has been altered;
but lvmetad is not used anymore for caching metadata, LVB doesn't
really work. It's possible that the LVB functionality could be useful
again in the future, so let's enable it for IDM in the first place.
While we heavily try to spot arrays that are not yet in-sync,
some kernels tends to block our lvm2 command in kernel,
while we resume these smaller raid arrays even for 5 seconds.
But since the result is not really wrong - report these
check failures only as TEST WARNING.
Missed -l option in man page, although users should prefer
lvresize -r when the also want to do a volume management,
as there they can specify i.e. extents for allocation.
Also mention dm-crypt support in command description.
If the 'act' has been already processed by add_client_result()
it could have been possibly release - so avoid accessin 'act->'
afterward and go for next item directly.
This case happens when i.e. we convert LV to another type,
when we change existing LV into a different type - so change
to debug level and avoid confusing users with message about
Device path not match.
We may eventually enhnace caching code to drop cached info
after taking lock and reading VG.
With commit 0b18c25d934564015402de33e15a267045ed1b8c there
was introduced 'zalloc()' for allocation of outdates pvs,
but no matching 'free()' is present.
Switch to use cmd mempool instead of adding free() code into
several places.
David Teigland [Tue, 20 Apr 2021 22:03:09 +0000 (17:03 -0500)]
commands: use AUTOTYPE in definitions
If a cmd def implies an LV type without --type
in the required options, then include the implied
type in the cmd def as AUTOTYPE: <type>
instead of including the redundant --type foo
in the OO list of options.
Including an implied --type in the OO list would
often cause multiple cmd defs to potentially be
identical when options were used, and a user
command could match more than one cmd def.
The AUTOTYPE values are listed in man page and
help output as
[ --type foo (implied) ]
If a user command includes --type, it will usually
match a cmd def with --type in the required options.
But, if the user command matches a cmd def with
AUTOTYPE, then the specifed --type and AUTOTYPE must
match.
The man-generator program has a new --check
option that compares cmd defs to find any cmd defs
that are equivalent with the use of options,
and should have their options adjusted.
Correcting some usage of Bold and Italics (files).
Adding some missing SEE ALSO.
Fixing missed replaceable paths that are configurable.
Be careful about .P in .TP sections - need to use .sp for space line.
Use .UR/.UE for URL references.
Add missing VG into description of thin pool creation command.
Remove one duplicated thin-pool creation command.
Remove options --discards and --errorwhenfull from the list when the command describes
only creation of a thin volume - as these options do apply for thin-pool.
Also use here more correct name OO_LVCONVERT_THINPOOL instead of OO_LVCONVERT_THIN.
Reorder extra options for cache & thin-pool before common pool options.
Order consistenly --stripes and --stripesize after --extents option
so the options related to pools are better together.
Remove invalid snapshot creation description - since this case is
handled through our configurable spare volume creation.
Add some missing optional --type parameters for few command instancies.
Emit .ad l / .ad b less frequently around larger blocks
we want to keep left aligned.
Avoid emittting empty lines.
Reduce .HP usage and replace it with .TP.
However keep .HP for all option listings, as i.e. html rendering
can't handle well combintion of .TP an .HP together and .TP alone
is not indenting 2nd. line of long option line.
(For .TP line we don't need to emit .br)
Surround .SH with dots for better look.
For some .TP use plain more readable .I for a line.
Support rendering of optional [Number] (for --units).
Use better markup for units and instead of long markup string,
show individual units with markup.
Enhance man typography decoration of optional option
prefixes like --[raid]writebeind and use regular font to render []
as these are not part of the option name itself.
David Teigland [Wed, 14 Apr 2021 21:34:04 +0000 (16:34 -0500)]
man/help: change LV type listing
Previously, accepted LV types were presented as a series of suffixes
after the "LV" on the command line. The addition of many new types
resulted in this becoming too long, e.g
For man pages, move these types from the command line to a new line
dedicated to listing accepted LV types:
lvconvert --type cache --cachepool LV LV1
...
LV1 types: linear striped thinpool vdo vdopool vdopooldata raid
The special "LV1" is used as a reference to avoid confusion
with other LVs that may appear on the command line. There
are currently no commands with more than one typed LV, but
if there are cases with more, then "LV2" could also be used.
For command line usage/-h output, drop the LV types from the
command line specification. The more detailed is not needed
in the help output and can be found in the man page.
With to use .TP where it's easy and doesn't change layout
(since .HP is marked as deprecated) - but .TP is not always perfetc match.
Avoid submitting empty lines to troff and replace them mostly with .P
and use '.' at line start to preserve 'visual' presence of empty line
while editing man page manually when there is no extra space needed.
Fix some markup.
Add some missing SEE ALSO section.
Drop some white-space at end-of-lines.
Improve hyphenation logic so we do not split options.
Use '.IP numbers' only with first one the row (others in row
automatically derive this value)
Actully this was bad idea - to make it on pair.
-Zn for thin-pools is already used - so here user must have
create new pool and swap existing thin-pool metadata into.
So reverting this commit to avoid any possible regression.
It would be complicated to handle ',' alignment after hyphenation
changes ATM, but these commas seems to be there rather unneeded
so remove them and make the man output more clear.