Move RAID convert tests to new file, t-lvconvert-raid.sh
There is duplicate code in t-lvconvert-raid.sh and t-lvcreate-raid.sh.
This should be moved into a common file which is then sourced by these two
files. I'll wait to move the duplicate code until I can talk to mornfall.
Add support for m-way to n-way up-convert in RAID1 (no linear to n-way yet)
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way. It does not yet support upconverting linear to n-way.
The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild. This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'. (There is no
mirror-on-mirror layering required.)
Add the ability to split an image from the mirror and track changes.
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
Add --splitmirrors support for RAID1 (1 image only)
Users already have the ability to split an image from an LV of "mirror"
segtype. This patch extends that ability to LVs of "raid1" segtype.
This patch only allows a single image to be split off, however. (The
"mirror" segtype allows an arbitrary number of images to be split off.
e.g. 4-way => 3-way/linear, 2-way/2-way, linear,3-way)
When down-converting RAID1, don't activate sub-lvs between suspend/resume
of top-level LV.
We can't activate sub-lv's that are being removed from a RAID1 LV while it
is suspended. However, this is what was being used to have them show-up
so we could remove them. 'sync_local_dev_names' is a sufficient and
proper replacement and can be done after the top-level LV is resumed.
Compiler warning fixes, better error messaging, and cosmetic changes.
1) add new function 'raid_remove_top_layer' which will be useful
to other conversion functions later (also cleans up code)
2) Add error messages if raid_[extract|add]_images fails
3) Add function prototypes to prevent compiler warnings when
compiling with '--with-raid=shared'
Various code clean-ups (s/malloc/zalloc/, new msgs, etc)
Fix a couple more issues that kabi found.
- Add some error messages in failure cases
- s/malloc/zalloc/
- use vg->vgmem for lv names instead of vg->cmd->mem
Zdenek Kabelac [Thu, 11 Aug 2011 17:34:30 +0000 (17:34 +0000)]
Lock memory for shared VG
Use debug pool locking functionality. So the command could check,
whether the memory in the pool has not been modified.
For lv_postoder() instead of unlocking and locking for every changed
struct status member do it once when entering and leaving function.
(mprotect would trap each such memory access).
Currently lv_postoder() does not modify other part of vg structure
then status flags of each LV with flags that are reverted back to
its original state after function exit.
Zdenek Kabelac [Thu, 11 Aug 2011 17:29:04 +0000 (17:29 +0000)]
Add memory pool locking functions
Adding debuging functionality to lock and unlock memory pool.
2 ways to debug code:
crc - is default checksum/hash of the locked pool.
It gets slower when the pool is larger - so the check is only
made when VG is finaly released and it has been used more then
once.Thus the result is rather informative.
mprotect - quite fast all the time - but requires more memory and
currently it is using posix_memalign() - this could be
later modified to use dm_malloc() and align internally.
Tool segfaults when locked memory is modified and core
could be examined for faulty code section (backtrace).
Only fast memory pools could use mprotect for now -
so such debug builds cannot be combined with DEBUG_POOL.
Zdenek Kabelac [Thu, 11 Aug 2011 17:24:23 +0000 (17:24 +0000)]
Cache and share generated VG structs
Extend vginfo cache with cached VG structure. So if the same metadata
are use, skip mda decoding in the case, the same data are in use.
This helps for operations like activation of all LVs in one VG,
where same data were decoded giving the same output result.
Patch adds 1-to-1 connection between volume_group and lvmcache_vginfo.
Compiler complaining that meta_lv could be used uninitialized. (Not true
because it is protected by 'clear_metadata'.) I switched to using 'lv->vg',
as it makes no difference to vg_[write|commit].
Milan Broz [Wed, 10 Aug 2011 16:07:53 +0000 (16:07 +0000)]
If anything bad happens and unlocking fails
(here clvmd crashed in the middle of operation),
lock is not removed from cache - here is one example:
locking/cluster_locking.c:497 Locking VG V_vg_test UN (VG) (0x6)
locking/cluster_locking.c:113 Error writing data to clvmd: Broken pipe
locking/locking.c:399 <backtrace>
locking/locking.c:461 <backtrace>
Internal error: Volume Group vg_test was not unlocked
Code should always remove lock info from lvmcache and update counters
on unlock, even if unlock fails.
Peter Rajnoha [Tue, 9 Aug 2011 11:44:57 +0000 (11:44 +0000)]
Suppress low-level locking errors and warnings while using --sysinit.
Today, we use "suppress_messages" flag (set internally in init_locking fn based
on 'ignorelockingfailure() && getenv("LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES")'.
This way, we can suppress high level messages like "File-based locking
initialisation failed" or "Internal cluster locking initialisation failed".
However, each locking has its own sequence of initialization steps and these
could log some errors as well. It's quite misleading for the user to see such
errors and warnings if the "--sysinit" is used (and so the ignorelockingfailure
&& LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable). Errors and
warnings from these intermediary steps should be suppressed as well if requested.
This patch propagates the "suppress_messages" flag deeper into locking init
functions. I've also added these flags for other locking types for consistency,
though it's not actually used for no_locking and readonly_locking.
Zdenek Kabelac [Thu, 4 Aug 2011 15:18:10 +0000 (15:18 +0000)]
Remove unused inconsistent_seqno
Last usage was removed in Petr's commit related to VG mda repair fix
where relaxed check starts to ignore inconsistencies coming from
PVs that are marked MISSING - thus removing unused variable.
Zdenek Kabelac [Thu, 4 Aug 2011 12:40:24 +0000 (12:40 +0000)]
Minor memory leak fix
Defer the test of the function return value after the string memory is released.
Otherwise in this error path the string would present memory leak.
(Thought in this case we are already out of memory...)
Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
Peter Rajnoha [Tue, 2 Aug 2011 10:49:57 +0000 (10:49 +0000)]
Change DEFAULT_UDEV_SYNC to 1 so udev_sync is used even without any config.
This should be set by default! Normally we have "activation/udev_sync = 1"
in lvm.conf (example.conf.in). But if we use lvm2 without any config file
(or without a definition within '--config' option) the DEFAULT_UDEV_SYNC
is used instead. Together with verify_udev_operations=0 (when we rely on
udev fully), this can cause races as the node could be missing when needed.
(See also https://bugzilla.redhat.com/show_bug.cgi?id=723144)
Peter Rajnoha [Thu, 28 Jul 2011 13:06:50 +0000 (13:06 +0000)]
Add support for systemd file descriptor handover in dmeventd.
Systemd preloads file descriptors for us and passes them in for
newly spawned daemon when using on-demand fifo (or socket)
based activation.
This patch adds checks for file descriptors preloaded by
systemd and uses them instead of opening the FIFOs again
to properly support on-demand FIFO-based activation.
(We'll change FIFOs to sockets soon - but still this
part of the code will stay almost the same.)
Peter Rajnoha [Thu, 28 Jul 2011 13:03:37 +0000 (13:03 +0000)]
Add support for new oom killer adjustment interface (oom_score_adj).
The filename to adjust the oom score was changed in 2.6.36.
We should use oom_score_adj instead of oom_adj (which is still
there under /proc, but it's scheduled for removal in August 2012).
New oom_score_adj uses a range from -1000 (OOM_SCORE_ADJ_MIN,
disable oom killing) to 1000 (OOM_SCORE_ADJ_MAX).
Petr Rockai [Mon, 25 Jul 2011 17:59:50 +0000 (17:59 +0000)]
lvmetad: Edit the MISSING_PV flags only after making a "reply" copy of the
metadata, which is then serialised and discarded. This fixes a couple of
outstanding TODO items about handling the MISSING flags correctly.
Petr Rockai [Mon, 25 Jul 2011 15:51:51 +0000 (15:51 +0000)]
lvmetad: Check integrity of multiple metadata copies, i.e. ensure that seqno
equality implies metadata equality. Signal error in response to any update
requests that try to overwrite metadata without providing a higher seqno.
Compare also file size to detect changed config file
Clvmd detects modifed config file before it takes lv_lock.
If the config file is changed rapidly - the change was ignored within
a seocnd ranged. This patch adds also compare of file size.
So change like some flag for 0 to 1 would pass unnoticed - but
it's quick fix for failing test suite.
Petr Rockai [Wed, 20 Jul 2011 21:33:41 +0000 (21:33 +0000)]
lvmetad: Obliterate vg_status by returning the same information from
update_pv_status, saving a dozen lines of code and execution time of one
walkthrough of the PV list.
Petr Rockai [Wed, 20 Jul 2011 21:23:43 +0000 (21:23 +0000)]
First stab at making lvmetad-core threadsafe. The current design should allow
very reasonable amount of parallel access, although the hash tables may become
a point of contention under heavy loads. Nevertheless, there should be orders
of magnitude less contention on the hash table locks than we currently have on
block device scanning.
Petr Rockai [Wed, 20 Jul 2011 16:46:40 +0000 (16:46 +0000)]
Make lvmetad also report VGID in reply when adding a PV without MDAs (this
obviously only works for VGs that already had at least some MDA discovered).