Dave Wysochanski [Mon, 28 Jun 2010 20:35:17 +0000 (20:35 +0000)]
Update _vg_read and _text_create_text_instance to use fid_add_mda[s].
When we are constructing the vg, we may need to adjust the list of
metadata_areas if there are ignored mdas. At label read time, we
do not read the metadata of ignored mdas, and as a result, they do
not get placed on vg->fid->metadata_areas inside _text_create_text_instance
since lvmcache does not have these areas attached to vginfo->infos.
However, when we're checking the pvids inside _vg_read, after having
read another metadata area from another PV, we do have the opportunity
to update the metadata_area and metadata_areas_ignored lists based
on the read metadata_area. We need accurate mda lists for the reporting
functions that count the ignored mdas, as well as general correctness
of mda balancing.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:34:58 +0000 (20:34 +0000)]
Use mdas_empty_or_ignored() in place of checks for empty mda list.
With the addition of ignored mdas, we replace all checks for an empty
mda list with a new function to look for either an empty mda list or
ignored mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:34:40 +0000 (20:34 +0000)]
Add mdas_empty_or_ignored() helper function.
Add a helper function to consolidate checking for an empty mdas list
or ignored mdas. Ignored mdas should behave almost identically to
an empty mda list - the metadata areas should not be read or written
to. This function will make it easier to implement metadata balancing
and easier to track pvs with an empty mda list or ignored mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:34:24 +0000 (20:34 +0000)]
Implement ignore of mda if bit set by skipping r/w of metadata.
We implement ignore of an mda at label_read time by checking for
the ignore bit, and then skipping the reading of the vgname and
other information in the metadata. This will have an effect similar
to a PV found with no mdas. Thus, it will look like an orphan in the
cache until we scan the rest of the system and find a PV with
metadata, and the mda will not be on the vg->fid->metadata_areas
list so no read/writes will be done to the metadata area.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:33:58 +0000 (20:33 +0000)]
Add --metadataignore to pvchange, allowing for ignoring of metadata areas.
This patch just modifies pvchange to call the underlying ignore
functions for mdas. Ensure special cases do not reflect changes
in metadata (PVs with 0 mdas, setting ignored when already ignored,
clearing ignored when not ignored).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:33:44 +0000 (20:33 +0000)]
Define new functions and vgs/pvs fields related to mda ignore.
Define a new pvs field, pv_mda_used_count, and a new vgs field,
vg_mda_used_count to match the existing pv_mda_count and vg_mda_count.
These new fields count the number of mdas that have the 'ignored' bit
clear (they are in use on the PV / VG). Also define various supporting
functions to implement the counting as well as setting the ignored
flag and determining if an mda is ignored. These high level functions
call into the lower level location independent mda ignore functions
defined by earlier patches.
Note that counting ignored mdas in a vg requires traversing both lists
and checking for the ignored bit on the mda. The count of 'ignored'
mdas then is defined by having the bit set, not by which list the mda
is on. The list does determine whether LVM actually does read/write to
the mda, though we must count the bits in order to return accurate numbers
for the various counts. Also, pv_mda_set_ignored must search both vg
lists for ignored mda. If the state changes and needs to be committed
to disk, the ignored mda will be on the non-ignored list.
Note also in pv_mda_set_ignored(), we must properly manage the mda lists.
If we change the ignored state of an mda, we must change any mdas on
vg->fid->metadata_areas that correspond to this pv. Also, we may
need to allocate a copy of the mda, as is done when fid->metadata_areas
is populated from _vg_read(), if we are un-ignoring an ignored mda.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:31:59 +0000 (20:31 +0000)]
Add mda location specific mda_copy constructor.
Because of the way mdas are handled internally, where a PV in a VG
has mdas on both info->mdas and vg->fid->metadata_areas list, we
need a location independent copy constructor for struct
metadata_area. Break up the existing format-text specific copy
constructor into a format independent piece and a format dependent
piece.
This function is necessary to properly implement pv_set_mda_ignored().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:31:38 +0000 (20:31 +0000)]
Add mda_locns_match() internal library function for mapping pv/device to VG mda.
A metadata_area is defined independent of the location. One downside
is that there is no obvious mapping from a pv to an mda. For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv. This function provides
us a way to go down the list of PVs on a VG, and identify which ones
match a particular PV.
I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.
An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV. However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:31:01 +0000 (20:31 +0000)]
Allow raw_read_mda_header to be called from text_label.c.
We'd like to pass in mda_header to vgname_from_mda(). In order to
do this, we need to call raw_read_mda_header() from text_label.c,
_text_read(), which gets called from the label_read() path, and
peers into the metadata and update vginfo cache. We should check
the disable bit here, and if set, not peer into the vg metadata,
thus reducing the I/O to disk.
In the process, move vgname_from_mda() to layout.h, since the fn
only gets called from format_text code, and we need the mda_header
definition from the private layout.h.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
This refactoring moves the device open/close up one level to the caller of
_vg_read_raw_area(). Should be no functional change and facilitate future
changes related to metadata balancing.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:30:14 +0000 (20:30 +0000)]
Add location independent flag and functions to ignore mdas.
First we add a 'flags' field to the location independent
metadata_area structure, and a MDA_IGNORE flag. The
mda_is_ignored and mda_set_ignored functions are added to
manage the flag. Adding the flag and functions gives a
library interface to ignore metadata areas independent of
the underlying location (disk, file, etc). The location
specific read/write functions must then handle the specifics
of what this flag means to the location.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:29:57 +0000 (20:29 +0000)]
Add text format specific 'rlocn' ignore flag and access functions.
Adding a flag to the 'rlocn' structure in the mda header of the
text format allows us to flip a bit to ignore an area on disk that
stores the metadata via the text format specific mda_header.
This patch defines the flag and access functions to manage the flag.
Other patches will manage the ignore on a format-independent basis,
by using a flag in the metadata_area structure.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Dave Wysochanski [Mon, 28 Jun 2010 20:29:42 +0000 (20:29 +0000)]
Change 'filler' to 'flags' in on-disk 'raw_locn' structure.
Future patches will make use of a specific flag in the on-disk 'raw_locn'
structure to enable/disable metadata areas, and facilitate metadata
balancing.
Note that 'filler' is always set to '0' (see add_mda() - memset),
so use of this area as a non-zero flags field is a safe way to
provide future code features.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The same region size is used for both mirror volume and mirrored
log volume, but when the physical extent size is bigger than region size,
the size of mirror leg for mirrored log is smaller than the region size
and lvcreate command fails.
This patch adjusts a region size of mirrored log to a smaller value of
region size or physical extent size.
[This patch ensures that the region_size of the mirrored log does not
exceed the size of the mirrored log itself, which would violate the
kernel constraint: (region_size <= ti->len).]
Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Zdenek Kabelac [Thu, 24 Jun 2010 08:29:30 +0000 (08:29 +0000)]
Preload libc locale messages.
Preload libc.mo file for localized lvm before taking memory lock - this way
we prevent disk access for some error paths in libdm, that prints localized
errno messages while they are still in memory locked state.
Committing Taka's patch... He found a problem during
the failure of a device that contained both a image of
a mirror and an image of the mirrored log. The order
of the handling of those faults was important (and
wrong), this patch corrects that.
Peter Rajnoha [Wed, 23 Jun 2010 17:00:32 +0000 (17:00 +0000)]
Fix udev rules to handle spurious events properly.
We can use DM_UDEV_PRIMARY_SOURCE_FLAG to identify the spurious events
and use it as an indication that the device has already been activated before
(and hence we can find this property in udev database).
WARNING: This change requires udev startup script to preserve udev database
from initrd. All the information stored there during activation of devices
is important for the initial "udevadm trigger --action=add" call that is
used in udev startup script. If not done this way, udev startup script needs
to define DM_UDEV_PRIMARY_SOURCE_FLAG=1 property for any ADD events it uses.
The function that runs to compress a stacked mirror after
converting from 2-way to 3-way mirror (collapse_mirrored_lv)
was calling '_remove_mirror_images' with the 'remove_log'
parameter set. When the code was put in to fix 599898 to
honor log parameters during conversion, this argument was
suddenly being honored. Thus, when someone would convert from
a 2-way to 3-way mirror, the log would get removed.
'collapse_mirrored_lv' should not be calling '_remove_mirror_images'
with 'remove_log' set.
Mirrors can be layered - as in the case of an converting 2-way
to 3-way mirror. When conversion operations are performed on
these types of mirrors, log options can be confused/ignored.
In the case of a converting 3-way mirror, we have a top-level
2-way corelog mirror whose legs are 1) a 2-way disk-log mirror
and 2) a linear device. If we wish to convert this 3-way mirror
to a 2-way mirror, the linear device is removed and the extra
top layer is eliminated. If we also wished to convert the disk
log to a core log in the same step, ambiguity creeps in. It is
somewhat obvious what the user wants - a 2-way mirror with a
corelog. However, looking at the top level mirror before
compression, it seems that the mirror already has a core log.
This is why the operation seemed to fail.
This patch simply re-evaluates what mirrored_seg points to after
a compression and then considers the log argument.
Peter Rajnoha [Mon, 21 Jun 2010 08:54:32 +0000 (08:54 +0000)]
Use early udev synchronisation and update of dev nodes for clustered mirrors.
When using clustered mirrors, we need device nodes to be created during
processing of device tree, not at its end like we normally do (we need to
access the nodes in cmirror prematurely). Therefore we use a new flag called
"immediate_dev_node" stored in deptree's load_properties struct to instruct the
device tree processing code to immediately synchronize with udev and flush all
stacked node operations so the nodes are prepared for use.
For now, the immediate_dev_node is used for clustered mirrors during
processing the dm_tree_preload_children code only. We can add more later if
needed.
daemons/cmirrord/functions.c (part of cmirrord) was referencing
linux/kdev_t.h even though it wasn't needed. Strangely, it seems
to be causing problems on various architectures (i686) in the
function daemons/cmirrord/functions.c:disk_status_info()->sprintf.
I'm not sure why this is a problem since none of the macros in
kdev_t.h are used in that code, but it certainly doesn't hurt to
pull an unnecessary header and it seems to fix the problem.
Milan Broz [Thu, 17 Jun 2010 12:48:54 +0000 (12:48 +0000)]
Clean up cluster lock mode and flags definition.
Code is mixing up internal DLM and LVM definitions of lock
modes and flags.
OpenAIS and singlenode locking do not depend on DLM but
code currently cannot be compiled without libdlm.h!
LCK_* flags is LVM abstraction, used through all the code.
Only low-level backend (clvmd-cman etc) should use DLM definitions,
also this code should do all needed conversions.
Because there are two DLM flags used in generic code
(NOQUEUE, CONVERT) we define it similar way like lock modes.
(So all needed binary-compatible flags are on one place in locking.h)
Zdenek Kabelac [Mon, 7 Jun 2010 14:31:59 +0000 (14:31 +0000)]
Fix wrong usage of exec_prefix from previous patch introducing LVM_PATH define
Introduce lvm_exec_prefix with resolved exec_prefix.
(using same ac_default_prefix as for CLVMD_PATH)
Use lvm_exec_prefix instead of dmeventd_prefix (fixes missing ac_default_prefix)
Note: This patch is rather hot-fix as currently generate code
does not create correct code for make exec_prefix=
Milan Broz [Fri, 4 Jun 2010 12:59:30 +0000 (12:59 +0000)]
Fix restart of clvmd using -S switch
- allocate environment dynamically (still missing some limit?)
- try to recover, if destroy failed (do not destroy lvm here) and free memory
- check strdup() return codes
- report failure to log
- do not print NULL in exclusive lock loop
Peter Rajnoha [Tue, 1 Jun 2010 16:08:13 +0000 (16:08 +0000)]
Add support for dm-mod module autoload.
A kernel patch is on its way for 2.6.35 adding support for dm-mod module
autoload. Udev v155 and higher is able to read static node information given
in modules.devname (extracted by depmod before) and will create such nodes
at its start. The first access to such node will load the module automatically
(directly in kernel) before the actual read/write operation is processed.
Zdenek Kabelac [Mon, 24 May 2010 09:03:39 +0000 (09:03 +0000)]
Replicator: update activate code for vgchange
Activate only the first replicator-dev LV, that activates all other
related LVs from Replicator. In case of error during this activation,
it will not retry again for other heads (less confusing error log).
Mikulas Patocka [Fri, 21 May 2010 15:28:16 +0000 (15:28 +0000)]
Fix scripts/relpath.awk to work with mawk
length(array) is specific to GNU awk and doesn't work in mawk.
Use a return value of "split" function to indicate array size, this is
supported in both gawk and mawk.
This patch fixes the following errors during "make install" when mawk is
installed as a default awk.
mawk: scripts/relpath.awk: line 25: illegal reference to array from
mawk: scripts/relpath.awk: line 25: illegal reference to array to
mawk: scripts/relpath.awk: line 27: illegal reference to array from
mawk: scripts/relpath.awk: line 32: illegal reference to array to