Peter Rajnoha [Wed, 28 Jul 2010 10:30:28 +0000 (10:30 +0000)]
Revert unsuccessful table load preparation in combined "create, load and resume" scenario.
The "revert" call was missing in the _create_and_load_v4 function for the case
where preparing the table load fails during the create/load/resume sequence.
Without it, we could end up with a device that was created, but neither
table-loaded nor resumed.
Even though the table is not loaded and the device is not resumed at this
stage, we still need to synchronize with udev when calling the revert
"remove" ioctl - there's still a remove uevent generated! The "revert"
code does exactly that.
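The control flow described above can be sketched roughly as follows. This is only an illustration, not the libdm code: the struct and every sketch_* function are hypothetical stand-ins for the real create, load and remove ioctl paths, and the udev synchronization is reduced to a flag.

```c
#include <stdbool.h>

/* Hypothetical device state; not a libdm structure. */
struct sketch_dev {
	bool created;
	bool loaded;
	bool resumed;
	bool udev_synced_on_remove;
};

/* Stand-in for the "create" ioctl. */
static bool sketch_create(struct sketch_dev *d)
{
	d->created = true;
	return true;
}

/* Stand-in for preparing the table load; fails when table_ok is false. */
static bool sketch_prepare_load(struct sketch_dev *d, bool table_ok)
{
	(void) d;
	return table_ok;
}

/*
 * Stand-in for the revert path: remove the device again and wait for
 * udev, because the remove still generates a uevent.
 */
static void sketch_revert(struct sketch_dev *d)
{
	d->created = false;
	d->udev_synced_on_remove = true;
}

/* Combined create, load and resume; returns 1 on success, 0 on failure. */
int sketch_create_and_load(struct sketch_dev *d, bool table_ok)
{
	if (!sketch_create(d))
		return 0;

	if (!sketch_prepare_load(d, table_ok)) {
		/* The previously missing step: undo the create. */
		sketch_revert(d);
		return 0;
	}

	d->loaded = true;
	d->resumed = true;
	return 1;
}
```

With table_ok set to false the function returns 0 and leaves no created device behind, which is the behaviour the fix is after.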
Building without the '--enable-cmirrord' option means that
CMIRRORD_PIDFILE is not defined. This makes the build fail.
Therefore, we need to make the check for cmirrord conditional
on whether CMIRRORD_PIDFILE is defined.
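A minimal sketch of such a conditional check is shown here. The function name is hypothetical, testing for the pidfile's existence is only a stand-in for a real "is the daemon running" test, and returning 1 when CMIRRORD_PIDFILE is undefined is an assumption made for the sketch rather than a statement about what the tree does.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch only: returns 1 if nothing speaks against using cluster mirrors,
 * 0 if the log daemon appears to be missing.
 */
int sketch_cluster_mirror_available(void)
{
#ifdef CMIRRORD_PIDFILE
	/* Only compiled when the build defines CMIRRORD_PIDFILE. */
	if (access(CMIRRORD_PIDFILE, F_OK) != 0) {
		fprintf(stderr, "Cluster mirror log daemon is not running\n");
		return 0;
	}
#endif
	/* Without the macro there is no pidfile to test (assumed result). */
	return 1;
}
```

Because the reference to CMIRRORD_PIDFILE sits inside the #ifdef, a build without '--enable-cmirrord' never sees the undefined symbol.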
It's not enough to check for the kernel module in the case of cluster
mirrors; we must also check that the log daemon (cmirrord) is running.
The log module can be auto-loaded, but the daemon cannot be
"auto-started". Failing to check for the daemon produces cryptic
messages that customers have a hard time deciphering. (The system
messages do report that the log daemon is not running, but people
don't seem to find this message easily.)
Here are examples of what is printed when the module is available,
but the log daemon has not been started.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg
Shared cluster mirrors are not available.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v
Setting logging type to disk
Finding volume group "vg"
Archiving volume group "vg" metadata (seqno 3).
Creating logical volume lv
Executing: /sbin/modprobe dm-log-userspace
Cluster mirror log daemon is not running
Shared cluster mirrors are not available.
Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).
Fix reversal of LV list before performing a split mirror.
When splitting off mirror images from a mirror, we always take
LVs from the end of a list. For example, if the mirror sub-devices
are lv_mimage_[012], we should select lv_mimage_2 if splitting off
one image. However, lv_mimage_0 was being selected instead.
The problem came from calling '_move_removable_mimages_to_end'
when it was unnecessary to do so. When the user /does/ specify
specific devices to be removed, this function properly moved the
appropriate LVs to the end of the list for extraction. However,
if the user /doesn't/ give any specific PVs, the function should
do nothing. '_move_removable_mimages_to_end' was keying off of
whether 'removable_pvs' was NULL or not and this value was
improperly being populated with the set of all available PVs.
This was causing '_move_removable_mimages_to_end' to completely
reverse the list, which in turn caused us to extract the LVs
that had previously been at the front of the list.
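The intended selection behaviour can be illustrated with a small sketch. It is not the LVM2 implementation: the sketch_* names are hypothetical, images are plain strings rather than LV structures, and the removable set is given as image names instead of PVs to keep the example short. It also does not reproduce the list reversal, only the intended result.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_IMAGES 16

/* Return 1 if name appears in the NULL-terminated removable array. */
static int sketch_is_removable(const char *name, const char *const *removable)
{
	size_t i;

	for (i = 0; removable[i]; i++)
		if (!strcmp(name, removable[i]))
			return 1;
	return 0;
}

/*
 * Move the images listed in removable to the end of the array, keeping
 * the relative order of everything else. Does nothing when removable is
 * NULL, i.e. when the user named no specific devices.
 */
void sketch_move_removable_to_end(const char **images, size_t count,
				  const char *const *removable)
{
	const char *tmp[SKETCH_MAX_IMAGES];
	size_t i, n = 0;

	if (!removable || count > SKETCH_MAX_IMAGES)
		return;

	for (i = 0; i < count; i++)
		if (!sketch_is_removable(images[i], removable))
			tmp[n++] = images[i];
	for (i = 0; i < count; i++)
		if (sketch_is_removable(images[i], removable))
			tmp[n++] = images[i];

	memcpy(images, tmp, count * sizeof(*images));
}

/* Pick the image to split off: always the last entry of the list. */
const char *sketch_pick_split_image(const char **images, size_t count,
				    const char *const *removable)
{
	if (!count)
		return NULL;

	sketch_move_removable_to_end(images, count, removable);
	return images[count - 1];
}
```

With images lv_mimage_0, lv_mimage_1 and lv_mimage_2 and a NULL removable set, the sketch selects lv_mimage_2. Passing a removable set that names lv_mimage_0 moves that image to the end first, so it is selected instead.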
Fix for bug 612311: Split of linear provides no error msg
An unhandled condition allowed the command to terminate
cleanly without a warning. Added a check for the
'--splitmirrors' argument so that execution reaches the
lower-level function that checks whether the user is
trying to split a linear device. You should now see a
message if you try to use --splitmirrors on a linear device.
The main problem with these bugs was that the newly split
off LV was not being suspended properly. This meant that
the memlock count was not being balanced, the DM devices
were not being renamed, and some DM devices which should
have been removed were not.
I've also renamed some of the variables and added comments
to make things clearer as to what is going on. (I can break
this patch in two if it means easier review.)
Dave Wysochanski [Tue, 13 Jul 2010 15:04:23 +0000 (15:04 +0000)]
Minor man page updates related to metadataignore and vgmetadatacopies.
pvchange: Add --metadataignore description
vgchange: Fix minor formatting
pvcreate: Update metadataignore description to refer to pvchange
lvm.conf: Refer to pvcreate and pvchange for metadata options.
Add dm_create_lockfile to libdm to handle pidfiles for all daemons.
Switch dmeventd to use dm_create_lockfile and drop duplicate code.
Allow clvmd pidfile to be configurable.
Switch cmirrord and clvmd to use dm_create_lockfile.
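For readers unfamiliar with the pattern, a shared pidfile helper of this kind typically creates the file, takes an exclusive lock on it and writes the daemon's PID. The sketch below shows that general technique with POSIX calls only. It is not the body of dm_create_lockfile; the function name and the exact steps are assumptions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch only: create path, lock it exclusively and store our PID in it.
 * Returns 1 on success, 0 on failure. On success the descriptor is left
 * open on purpose so the lock is held for the lifetime of the process.
 */
int sketch_create_pidfile(const char *path)
{
	struct flock lock;
	char buf[32];
	int fd, len;

	if ((fd = open(path, O_CREAT | O_WRONLY, 0644)) < 0)
		return 0;

	memset(&lock, 0, sizeof(lock));
	lock.l_type = F_WRLCK;
	lock.l_whence = SEEK_SET;	/* l_start = l_len = 0: whole file */

	/* Fails if another process already holds the lock. */
	if (fcntl(fd, F_SETLK, &lock) < 0)
		goto fail;

	if (ftruncate(fd, 0) < 0)
		goto fail;

	len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
	if (write(fd, buf, (size_t) len) != (ssize_t) len)
		goto fail;

	return 1;

fail:
	close(fd);
	return 0;
}
```

Keeping this logic in one place means each daemon only has to supply its own pidfile path.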
Peter Rajnoha [Mon, 12 Jul 2010 11:37:49 +0000 (11:37 +0000)]
Add more verbose messages while checking volume_list and hosttags settings.
This should reduce confusion when stale settings are left in place and
people forget about them and then run into problems. These messages
should give them a hint of what's really going on.
Failed to test for the case where a log was requested to be removed
even though there was no log. A simple run through the in-tree test
suite would have caught this. :(
- if (lv_is_mirrored(detached_log_lv) &&
+ if (detached_log_lv && lv_is_mirrored(detached_log_lv) &&
Also, made some cosmetic changes suggested by kabi after my last check-in
(e.g. s/return 0/return_0/ and adding an error message).
Finish fix for bug 607347: failing both redundant mirror log legs...
A previous check-in added logic to handle the case where both images
of a mirrored log failed. It solved the problem by simply removing
the log entirely - leaving the parent mirror with a 'core' log. This
worked for most cases. However, if there was a small delay between
the failures of the two mirrored log devices, the mirror would hang,
LVM would hang, and no additional LVM commands could be issued.
When the first leg of the log fails, it signals the need for repair.
Before 'lvconvert --repair' is run by dmeventd, the second leg fails.
'lvconvert' would see both devices as failed and try to remove the
log entirely. When it came time to suspend the parent mirror to
update the configuration, the suspend would hang because it couldn't
get any I/O through the mirrored log, which was plugged waiting for
corrective action. The solution is to replace the log with an error
target to clear any pending writes before removing it. This allows
the parent mirror to suspend and make the proper changes.
Pass metadataignore to pv_create, pv_setup, _mda_setup, and add_mda.
Pass metadataignore through PV creation / setup paths.
As a result of this cleanup, we can remove the unnecessary setting
of mda_ignore bits inside pvcreate_single(), after call to pv_create.
For now, just set metadataignore to '0' in some places. This is
equivalent to the prior functionality, although the 0 is given
by the caller not hardcoded in _mda_setup() call.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Cleanups for configure:
Indent updates.
Use AC_HELP_STRING for help string.
Start help string with lowercase letter.
Add [] around some default values i.e. [TYPE=internal].
Skip some "" around shell assignment when not needed.
Fix typo --with-device-gid=UID string.
Add prompt if using --metadataignore argument with vgmetadatacopies.
When using vgmetadatacopies value other than "unmanaged" (0), prompt
the user if the usage of --metadataignore would change the value of
vgmetadatacopies. The main 2 cases are:
1) pvchange --metadataignore
2) vgextend --metadataignore
We leave the prompt check in the tools, and do not change anything
if the user says 'n'.
Examples:
vgextend --metadataignore y vgtest /dev/loop0
Setting metadataignore will override preferred number of copies of VG vgtest metadata.
Are you sure? [y/n]: y
No physical volume label read from /dev/loop0
Physical volume "/dev/loop0" successfully created
Volume group "vgtest" successfully extended
pvchange --metadataignore y /dev/loop3
Setting metadataignore on /dev/loop3 will override preferred number of copies of VG vgtest metadata.
Are you sure? [y/n]: y
WARNING: Changing preferred number of copies of VG vgtest metadata from 3 to 2
Physical volume "/dev/loop3" changed
1 physical volume changed / 0 physical volumes not changed
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
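The prompt decision described in this commit can be sketched as a small predicate. All names are hypothetical, and the way "would change the value" is computed here (comparing the number of metadata areas that would remain in use against the preferred count) is an assumption made for illustration, not the check used by the tools.

```c
/* 0 means "unmanaged", as stated in the commit message. */
#define SKETCH_VGMETADATACOPIES_UNMANAGED 0U

/*
 * Return 1 if the user should be prompted before metadataignore is
 * applied, 0 otherwise.
 *
 * vg_mda_copies:     preferred number of metadata copies for the VG
 * mdas_in_use_after: copies that would be in use after the change
 *                    (assumed input for this sketch)
 */
int sketch_needs_prompt(unsigned vg_mda_copies, unsigned mdas_in_use_after)
{
	if (vg_mda_copies == SKETCH_VGMETADATACOPIES_UNMANAGED)
		return 0;	/* nothing is being managed, so no prompt */

	return mdas_in_use_after != vg_mda_copies;
}

/*
 * Apply the user's answer: on 'y' the preferred count follows the new
 * state, on anything else nothing is changed. Returns 1 if changed.
 */
int sketch_apply_answer(unsigned *vg_mda_copies, unsigned mdas_in_use_after,
			char answer)
{
	if (answer != 'y')
		return 0;

	*vg_mda_copies = mdas_in_use_after;
	return 1;
}
```

In the pvchange example above, the preferred count of 3 would drop to 2, so the predicate returns 1 and the user is asked before anything changes.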
Test failed commit of mda on new pv - failed vgextend.
Test the auto-repair capability when we fail committing to an mda
on a new PV being added to a VG. This test should fail until we fix
the auto-repair in this case.
Peter Rajnoha [Wed, 7 Jul 2010 11:22:46 +0000 (11:22 +0000)]
Use "nowatch" udev rule for known inappropriate devices.
For now, this is just a precaution. Normally, all the other (non-dm) rules
should check DM_UDEV_DISABLE_OTHER_RULES_FLAG and therefore avoid setting
any inotify watches as well. But let's make sure.
Support for final assignment of the "nowatch" rule (the use of ":=") will
appear in the next udev release, v160. This should also work in previous udev
versions, but the setting won't be sealed there, so any later OPTIONS="watch"
will always prevail.
We may want to add more specific "nowatch" rules later if needed.
Adjust auto-metadata repair and caching logic to try to cope with empty mdas.
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
recovery didn't kick in. But leave the behaviour alone when using
precommitted metadata because of a warning in a confusing FIXME.
In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do. (How many other tools fail similarly now?)
And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock as soon as it is
discovered.
This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
Fix for bug 607347: failing both redundant mirror log legs...
Rather than attempting to remove all the images of a mirrored
log volume via remove_mirror_images, simply remove the log
if all its devices have failed.
Taka was the first to report that there is still an outstanding
issue with handling this case. I've managed to reproduce it
only very rarely, and am still working on identifying the problem.
Failing to handle the problem rarely is better than not handling
the scenario at all, so I'm checking this in.
Milan Broz [Thu, 1 Jul 2010 21:23:47 +0000 (21:23 +0000)]
Remove superfluous suspended device counter from clvmd.
Moreover, in the current mirror handling, when activate is called on a
removed but still suspended detached log, this counter drops below zero
and confuses the debug log.
Petr Rockai [Wed, 30 Jun 2010 21:40:27 +0000 (21:40 +0000)]
Maintain memlock balance in clvmd.
When a mirror is being downconverted in a cluster, a series of suspends and
resumes is executed.
With the change to using UUIDs in dev_manager instead of names, the behaviour
has changed with regard to including an _mlog in the deptree of a logical
volume. In the old (pre-UUID-enabled) code, the _mlog would appear in a deptree
of any volume purely based on a name match: a linear volume foo would include
foo_mlog in its dependencies if that happened to exist. This behaviour was
fixed and the mlog is now only included for mirrors.
By a coincidence, this mlog bug had been hiding a different bug in clvmd. When
a mirror is being dismantled (and converted to a linear volume), it is first
suspended as a whole, then later resumed in parts. Nevertheless, the overall
memlock balance is maintained in this operation. The problem arises because
even though the mirror log was suspended as part of the mirror, when the
dismantled mirror is resumed again, it is no longer a mirror and therefore the
mirror log stays suspended. This would not be a problem in itself, since
_delete_lv (from metadata/mirror.c) is called on it subsequently, which does an
activate/deactivate cycle and removes the LV. The activate/deactivate cycle
correctly prompts clvmd to resume the device: however, in doing this, it will
issue an unpaired resume operation (the suspend that caused the mirror log to
be suspended is paired with resuming the dismantled mirror later). We have
concluded that the path in clvmd should never affect memlock_count, since there
should never be an unmatched explicit suspend preceding this resume.
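The balance argument can be made concrete with a toy counter. This is a sketch under stated assumptions: the counter and the sketch_* functions are hypothetical and only model the bookkeeping, not the clvmd code paths themselves.

```c
/* Hypothetical stand-in for the memlock counter. */
static int sketch_memlock_count;

/* An explicit suspend takes one memlock reference. */
static void sketch_suspend(void)
{
	sketch_memlock_count++;
}

/* The resume that is paired with an explicit suspend drops it again. */
static void sketch_resume_paired(void)
{
	sketch_memlock_count--;
}

/*
 * Resume issued as a side effect of the activate/deactivate cycle.
 * No explicit suspend precedes it, so it must leave the counter alone.
 */
static void sketch_resume_via_activate(void)
{
	/* intentionally no change to sketch_memlock_count */
}

/* Walk through the mirror downconvert sequence; returns the final count. */
int sketch_downconvert_balance(void)
{
	sketch_memlock_count = 0;

	sketch_suspend();		/* mirror suspended as a whole */
	sketch_resume_paired();		/* dismantled (linear) LV resumed */
	sketch_resume_via_activate();	/* mirror log resumed during removal */

	return sketch_memlock_count;
}
```

If the last step also decremented the counter, the sequence would end at -1 instead of 0, which is the imbalance this commit removes.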