Move cascade inside libdm etc.
Makes dumpconfig whole-section output wrong in a different way from before,
but we should be able to merge cft_cmdline properly into cmd->cft now and
remove cascade.
Fix for bug 732142: Unsafe table load during mirror image split
There was a bad sequence:
*) Make changes to LV layout to split images (e.g. 4-way -> 2-way/2-way)
1) vg_write, suspend_lv(original_mirror), vg_commit
2) activate_lv(newly_split_lv)
3) resume_lv(original_mirror)
Step #2 is not allowed. However, without it, the resume of the original
mirror will also resume its former sub-LVs - making it impossible to
activate the newly split LV due to the changes in layering, pointers, and
names that had already been made. Additionally, the resume or the original
brings the sub-lv's online with names that differ from the metadata on disk -
also a no-no. Thus, the split must be done in stages such that the active LVs
always reflect what is in the committed LVM metadata.
First, alter the original mirror by releasing the images. The images are made
visible and independent as an intermediate stage. (This way, we can have
consistency between LVM metadata and active LVs.) The second stage collects
the recently split LVs, deactivates them, forms them into a mirror if necessary,
and then activates them. It is a bit of a circuitous method, but it is the only
way to split a mirror from a mirror and obey these general rules:
1) Never [de]activate sub-lvs when the top-level LV is suspended
2) Avoid having active LVs that differ from the description in the LVM metadata
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Petr Rockai [Wed, 31 Aug 2011 15:19:19 +0000 (15:19 +0000)]
Replace const usage of dm_config_find_node with more appropriate value-lookup
functionality. A number of bugs (copied and pasted all over the code) should
disappear:
- most string lookup based on dm_config_find_node would segfault when
encountering a non-zero integer (the intention there was to print an
error message instead)
- check for required sections in metadata would have been satisfied by
values as well (i.e. not sections)
- encountering a section in place of expected flag value would have
segfaulted (due to assumed but unchecked cn->v != NULL)
Petr Rockai [Wed, 31 Aug 2011 12:39:58 +0000 (12:39 +0000)]
Fix warnings and constness handling in lvmetad-core (adjusting the
dm_config_find_node to give non-const node pointer, since that better reflects
the contract of that function).
Petr Rockai [Wed, 31 Aug 2011 11:31:57 +0000 (11:31 +0000)]
A compromise integration of LVMetaD into the build: I have kept all the
daemon/common code in a single libdaemon.a, which is completely private. This
is currently linked into the lvmetad binary, and will be linked into LVM (the
client part, since static linking only picks up only symbols that are actually
used). I have also added --enable/disable-lvmetad to ./configure; although the
current default is off, I expect this to be flipped to on shortly. There's no
LVM-side support yet, but when there is, even when built, it'll still need to
be enabled by an lvm.conf option.
Petr Rockai [Tue, 30 Aug 2011 14:55:15 +0000 (14:55 +0000)]
Move the core of the lib/config/config.c functionality into libdevmapper,
leaving behind the LVM-specific parts of the code (convenience wrappers that
handle `struct device` and `struct cmd_context`, basically). A number of
functions have been renamed (in addition to getting a dm_ prefix) -- namely,
all of the config interface now has a dm_config_ prefix.
Peter Rajnoha [Mon, 29 Aug 2011 13:37:36 +0000 (13:37 +0000)]
Directly allocate buffer memory in a pvck scan instead of using a mempool.
There's a very high memory usage when calling _pv_analyse_mda_raw (e.g. while
executing pvck) that can end up with "out of memory".
_pv_analyse_mda_raw scans for metadata in the MDA, iteratively increasing the
size to scan with SECTOR_SIZE until we find a probable config section or we're
at the edge of the metadata area. However, when using a memory pool, we're also
iteratively chasing for bigger and bigger mempool chunk which can't be found
and so we're always allocating a new one, consuming more and more memory...
This patch just changes the mempool to direct memory allocation in this
problematic part of the code.
Move RAID convert tests to new file, t-lvconvert-raid.sh
There is duplicate code in t-lvconvert-raid.sh and t-lvcreate-raid.sh.
This should be moved into a common file which is then sourced by these two
files. I'll wait to move the duplicate code until I can talk to mornfall.
Add support for m-way to n-way up-convert in RAID1 (no linear to n-way yet)
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way. It does not yet support upconverting linear to n-way.
The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild. This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'. (There is no
mirror-on-mirror layering required.)
Add the ability to split an image from the mirror and track changes.
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
Add --splitmirrors support for RAID1 (1 image only)
Users already have the ability to split an image from an LV of "mirror"
segtype. This patch extends that ability to LVs of "raid1" segtype.
This patch only allows a single image to be split off, however. (The
"mirror" segtype allows an arbitrary number of images to be split off.
e.g. 4-way => 3-way/linear, 2-way/2-way, linear,3-way)
When down-converting RAID1, don't activate sub-lvs between suspend/resume
of top-level LV.
We can't activate sub-lv's that are being removed from a RAID1 LV while it
is suspended. However, this is what was being used to have them show-up
so we could remove them. 'sync_local_dev_names' is a sufficient and
proper replacement and can be done after the top-level LV is resumed.
Compiler warning fixes, better error messaging, and cosmetic changes.
1) add new function 'raid_remove_top_layer' which will be useful
to other conversion functions later (also cleans up code)
2) Add error messages if raid_[extract|add]_images fails
3) Add function prototypes to prevent compiler warnings when
compiling with '--with-raid=shared'
Various code clean-ups (s/malloc/zalloc/, new msgs, etc)
Fix a couple more issues that kabi found.
- Add some error messages in failure cases
- s/malloc/zalloc/
- use vg->vgmem for lv names instead of vg->cmd->mem
Zdenek Kabelac [Thu, 11 Aug 2011 17:34:30 +0000 (17:34 +0000)]
Lock memory for shared VG
Use debug pool locking functionality. So the command could check,
whether the memory in the pool has not been modified.
For lv_postoder() instead of unlocking and locking for every changed
struct status member do it once when entering and leaving function.
(mprotect would trap each such memory access).
Currently lv_postoder() does not modify other part of vg structure
then status flags of each LV with flags that are reverted back to
its original state after function exit.
Zdenek Kabelac [Thu, 11 Aug 2011 17:29:04 +0000 (17:29 +0000)]
Add memory pool locking functions
Adding debuging functionality to lock and unlock memory pool.
2 ways to debug code:
crc - is default checksum/hash of the locked pool.
It gets slower when the pool is larger - so the check is only
made when VG is finaly released and it has been used more then
once.Thus the result is rather informative.
mprotect - quite fast all the time - but requires more memory and
currently it is using posix_memalign() - this could be
later modified to use dm_malloc() and align internally.
Tool segfaults when locked memory is modified and core
could be examined for faulty code section (backtrace).
Only fast memory pools could use mprotect for now -
so such debug builds cannot be combined with DEBUG_POOL.
Zdenek Kabelac [Thu, 11 Aug 2011 17:24:23 +0000 (17:24 +0000)]
Cache and share generated VG structs
Extend vginfo cache with cached VG structure. So if the same metadata
are use, skip mda decoding in the case, the same data are in use.
This helps for operations like activation of all LVs in one VG,
where same data were decoded giving the same output result.
Patch adds 1-to-1 connection between volume_group and lvmcache_vginfo.
Compiler complaining that meta_lv could be used uninitialized. (Not true
because it is protected by 'clear_metadata'.) I switched to using 'lv->vg',
as it makes no difference to vg_[write|commit].
Milan Broz [Wed, 10 Aug 2011 16:07:53 +0000 (16:07 +0000)]
If anything bad happens and unlocking fails
(here clvmd crashed in the middle of operation),
lock is not removed from cache - here is one example:
locking/cluster_locking.c:497 Locking VG V_vg_test UN (VG) (0x6)
locking/cluster_locking.c:113 Error writing data to clvmd: Broken pipe
locking/locking.c:399 <backtrace>
locking/locking.c:461 <backtrace>
Internal error: Volume Group vg_test was not unlocked
Code should always remove lock info from lvmcache and update counters
on unlock, even if unlock fails.
Peter Rajnoha [Tue, 9 Aug 2011 11:44:57 +0000 (11:44 +0000)]
Suppress low-level locking errors and warnings while using --sysinit.
Today, we use "suppress_messages" flag (set internally in init_locking fn based
on 'ignorelockingfailure() && getenv("LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES")'.
This way, we can suppress high level messages like "File-based locking
initialisation failed" or "Internal cluster locking initialisation failed".
However, each locking has its own sequence of initialization steps and these
could log some errors as well. It's quite misleading for the user to see such
errors and warnings if the "--sysinit" is used (and so the ignorelockingfailure
&& LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable). Errors and
warnings from these intermediary steps should be suppressed as well if requested.
This patch propagates the "suppress_messages" flag deeper into locking init
functions. I've also added these flags for other locking types for consistency,
though it's not actually used for no_locking and readonly_locking.
Zdenek Kabelac [Thu, 4 Aug 2011 15:18:10 +0000 (15:18 +0000)]
Remove unused inconsistent_seqno
Last usage was removed in Petr's commit related to VG mda repair fix
where relaxed check starts to ignore inconsistencies coming from
PVs that are marked MISSING - thus removing unused variable.