Cluster log daemon (clogd): use LVM bitops in place of ext2 bitops
Eliminate dependency on outside library, since the same functionality
exists in our tree.
[It is important that the bitops work in the same way, as the bitmaps
must remain backwards compatible. I haven't tested every architecture,
but the x86* archs work. My test involved using the old e2fsprogs
bitops, memcpy'ing the bits over to the LVM bitset array and ensuring
that only the bits set via the old methods were set.]
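
A minimal sketch of the kind of compatibility check described above, not the
actual test that was run. It assumes the old bitops address the map byte by
byte and the new bitset uses an array of 32-bit words; byte_set_bit(),
word_set_bit(), layouts_match() and host_is_little_endian() are hypothetical
names, and any header word a real bitset implementation carries is left out.

```c
#include <stdint.h>
#include <string.h>

#define NBITS 64	/* size of the sketch bitmaps, in bits */

/* Byte-addressed layout (assumed for the old ext2-style bitops). */
static void byte_set_bit(unsigned char *map, unsigned nr)
{
	map[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}

/* 32-bit-word layout (assumed for the LVM-style bitset array). */
static void word_set_bit(uint32_t *map, unsigned nr)
{
	map[nr >> 5] |= UINT32_C(1) << (nr & 31);
}

/*
 * Set the same bits through both layouts, memcpy the old bytes over a
 * word array and compare. Returns 1 if the in-memory images are equal.
 * Every entry of bits[] must be below NBITS.
 */
static int layouts_match(const unsigned *bits, size_t count)
{
	unsigned char old_map[NBITS / 8] = { 0 };
	uint32_t new_map[NBITS / 32] = { 0 };
	uint32_t copied[NBITS / 32];
	size_t i;

	for (i = 0; i < count; i++) {
		byte_set_bit(old_map, bits[i]);
		word_set_bit(new_map, bits[i]);
	}

	memcpy(copied, old_map, sizeof(copied));
	return memcmp(copied, new_map, sizeof(new_map)) == 0;
}

/* The two layouts above only coincide on little-endian hosts. */
static int host_is_little_endian(void)
{
	const uint32_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

Under these assumptions the images match on little-endian hosts such as x86,
which is why other architectures would need their own check.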
Petr Rockai [Thu, 13 Aug 2009 13:23:51 +0000 (13:23 +0000)]
Refactor file locking, lifting the flock wrapper code into separate
functions. Also fixes a bug where a nonblocking lock could, in certain race
situations, succeed without actually obtaining the lock.
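
A minimal sketch of a flock wrapper that only reports success when the lock
is really held. lock_file() and unlock_file() are hypothetical names, and the
inode comparison is one assumed way of catching a file that was replaced
while waiting; it is not claimed to be the code this commit introduced.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: returns a locked file descriptor, or -1 if the
 * lock was not obtained.
 */
static int lock_file(const char *path, int exclusive, int nonblock)
{
	int op = (exclusive ? LOCK_EX : LOCK_SH) | (nonblock ? LOCK_NB : 0);
	struct stat by_fd, by_path;
	int fd, r;

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -1;

	/* Retry only when interrupted by a signal. */
	do {
		r = flock(fd, op);
	} while (r < 0 && errno == EINTR);

	if (r < 0) {
		/* Includes EWOULDBLOCK for a nonblocking request. */
		close(fd);
		return -1;
	}

	/*
	 * Race check: if the path now names a different file, the lock we
	 * hold is on a stale file and must not count as success.
	 */
	if (stat(path, &by_path) < 0 || fstat(fd, &by_fd) < 0 ||
	    by_fd.st_dev != by_path.st_dev || by_fd.st_ino != by_path.st_ino) {
		close(fd);
		return -1;
	}

	return fd;
}

static void unlock_file(int fd)
{
	flock(fd, LOCK_UN);
	close(fd);
}
```

A blocking caller would probably loop and retry after a failed race check
instead of giving up; the sketch keeps the single attempt for brevity.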
The changes to remove LCK_NONBLOCK from the LVM locks broke clvmd because the
code was clearly wrong but working anyway! The constant was being masked rather
than the variable that was supposed to match against it.
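
The class of bug described above can be illustrated as follows. The exact
expression in clvmd is not shown in this log, so the flag values and the
function names is_excl_buggy() and is_excl_fixed() are illustrative
assumptions only.

```c
#include <stdint.h>

/* Illustrative flag values only; not copied from the LVM headers. */
#define LCK_TYPE_MASK 0x07u
#define LCK_EXCL      0x05u
#define LCK_NONBLOCK  0x10u

/*
 * Wrong: the mask is applied to the constant, so the whole flags word is
 * compared and the test only passes when no other flag bit is set.
 */
static int is_excl_buggy(uint32_t flags)
{
	return flags == (LCK_EXCL & LCK_TYPE_MASK);
}

/* Right: mask the variable, then compare against the constant. */
static int is_excl_fixed(uint32_t flags)
{
	return (flags & LCK_TYPE_MASK) == LCK_EXCL;
}
```

Code like the first form can appear to work for a long time if callers happen
to pass flag words that contain nothing but the type bits.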
Mike Snitzer [Tue, 4 Aug 2009 16:02:39 +0000 (16:02 +0000)]
Added basic pvcreate --dataalignmentoffset testing to t-pvcreate-usage.sh
Added topology testing via new test/t-pvcreate-operation-md.sh
- requires mdadm and rawhide kernel for full test coverage
Mike Snitzer [Sat, 1 Aug 2009 17:11:02 +0000 (17:11 +0000)]
Retrieve MD sysfs attributes for MD partitions
Rename private _primary_dev() to a public get_primary_dev() and reuse it
to allow retrieval of the MD sysfs attributes (raid level, etc) for MD
partitions.
Mike Snitzer [Sat, 1 Aug 2009 17:09:48 +0000 (17:09 +0000)]
Improve ability to look up primary device associated with a partition
Improve lib/device/device.c:_primary_dev()'s ability to look up the
primary device associated with all partitions; including blkext
(e.g. partitions directly on MD). The same will also work for obscure
sysfs paths; e.g.: paths with mangled names like the HP cciss driver
uses: /sys/block/cciss!c0d0/cciss!c0d0p1/
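
A small sketch of the name handling involved, assuming the partition
directory sits directly below its primary device in /sys/block and that '!'
in a sysfs name stands for '/' in the device name. The helper
primary_name_from_sysfs() is hypothetical and is not the implementation in
lib/device/device.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Given "/sys/block/<disk>/<part>/", copy <disk> into out, translating
 * the sysfs '!' back to '/'. Returns 0 on success, -1 otherwise.
 */
static int primary_name_from_sysfs(const char *part_path, char *out,
				   size_t out_len)
{
	static const char prefix[] = "/sys/block/";
	const char *disk, *end;
	size_t len, i;

	if (strncmp(part_path, prefix, sizeof(prefix) - 1) != 0)
		return -1;

	disk = part_path + sizeof(prefix) - 1;
	end = strchr(disk, '/');

	/* Require a non-empty disk name followed by a partition component. */
	if (!end || end == disk || end[1] == '\0')
		return -1;

	len = (size_t)(end - disk);
	if (len + 1 > out_len)
		return -1;

	for (i = 0; i < len; i++)
		out[i] = (disk[i] == '!') ? '/' : disk[i];
	out[len] = '\0';

	return 0;
}
```

With the cciss path quoted above this yields "cciss/c0d0".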
Mike Snitzer [Sat, 1 Aug 2009 17:08:43 +0000 (17:08 +0000)]
Add devices/data_alignment_detection to lvm.conf.
Adds 'data_alignment_detection' config option to the devices section of
lvm.conf. If your kernel provides topology information in sysfs (linux
>= 2.6.31) for the Physical Volume, the start of data area will be
aligned on a multiple of the 'minimum_io_size' or 'optimal_io_size'
exposed in sysfs.
minimum_io_size is used if optimal_io_size is undefined (0). If both
md_chunk_alignment and data_alignment_detection are enabled the result
of data_alignment_detection is used.
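
The selection rule above can be sketched as follows. All function names are
hypothetical, the values are assumed to share one unit (bytes here), and the
fallback to the MD chunk size when detection reports nothing is an assumption
rather than something this log states.

```c
#include <stdint.h>

/* Prefer optimal_io_size; use minimum_io_size when it is undefined (0). */
static uint64_t detected_alignment(uint64_t minimum_io_size,
				   uint64_t optimal_io_size)
{
	return optimal_io_size ? optimal_io_size : minimum_io_size;
}

/*
 * When both md_chunk_alignment and data_alignment_detection are enabled,
 * the detected value is used. Returns 0 if no alignment is known.
 */
static uint64_t choose_alignment(int md_enabled, uint64_t md_chunk,
				 int detect_enabled, uint64_t detected)
{
	if (detect_enabled && detected)
		return detected;
	if (md_enabled && md_chunk)
		return md_chunk;
	return 0;
}

/* Round start up to the next multiple of alignment (0 means "leave"). */
static uint64_t align_up(uint64_t start, uint64_t alignment)
{
	if (!alignment)
		return start;
	return ((start + alignment - 1) / alignment) * alignment;
}
```

For example, with minimum_io_size 65536 and optimal_io_size 0, a data area
that would start at byte 100000 moves to 131072.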
Mike Snitzer [Sat, 1 Aug 2009 17:07:36 +0000 (17:07 +0000)]
Add devices/data_alignment_offset_detection to lvm.conf.
If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be shifted by the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).
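
A minimal sketch of the precedence described above, using hypothetical
function names and treating all values as being in the same unit:

```c
#include <stdint.h>

/*
 * An explicit --dataalignmentoffset wins; otherwise the sysfs
 * 'alignment_offset' is used unless detection is disabled.
 */
static uint64_t effective_align_offset(int option_given, uint64_t option_value,
				       int detection_enabled,
				       uint64_t sysfs_alignment_offset)
{
	if (option_given)
		return option_value;
	if (detection_enabled)
		return sysfs_alignment_offset;
	return 0;
}

/* The offset shifts the already aligned start of the data area. */
static uint64_t shifted_data_start(uint64_t aligned_start,
				   uint64_t align_offset)
{
	return aligned_start + align_offset;
}
```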
Mike Snitzer [Thu, 30 Jul 2009 21:15:17 +0000 (21:15 +0000)]
Disable the "new pe_start policy"
Documented which use-cases force the reinstatement of the nuanced
handling of pe_start. As soon as orphan PVs are eliminated much of this
will no longer be a concern ('preserve_pe_start' can be reenabled in
.pv_setup).
Added defensive 'if (pv->pe_align)' check in _text_pv_write()'s pe_start
loop.
Mike Snitzer [Thu, 30 Jul 2009 18:40:22 +0000 (18:40 +0000)]
Revert 'preserve_pe_start' related code in _text_pv_setup
If pv_setup was given a non-zero pe_start it would short-circuit
establishing a default pv->pe_align. pv->pe_align=0 would result
in a divide by zero in _mda_setup(). 'vgconvert -M2 $vgname' hit this.
.pv_write still properly preserves pe_start if it was supplied.
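
The failure mode can be sketched like this. struct pv_sketch,
DEFAULT_PE_ALIGN and both functions are hypothetical stand-ins, not the real
_text_pv_setup() or _mda_setup(); the point is only that the alignment is
established before any caller-supplied pe_start is considered, and that
later arithmetic does not divide by an unset alignment.

```c
#include <stdint.h>

#define DEFAULT_PE_ALIGN 128	/* hypothetical default, in sectors */

struct pv_sketch {
	uint64_t pe_start;
	uint64_t pe_align;
};

/* Establish pe_align first, then honour a supplied pe_start. */
static void pv_setup_sketch(struct pv_sketch *pv, uint64_t requested_pe_start)
{
	if (!pv->pe_align)
		pv->pe_align = DEFAULT_PE_ALIGN;

	pv->pe_start = requested_pe_start ? requested_pe_start : pv->pe_align;
}

/* Defensive division: an unset alignment yields 0 instead of a crash. */
static uint64_t alignment_units(const struct pv_sketch *pv, uint64_t sectors)
{
	return pv->pe_align ? sectors / pv->pe_align : 0;
}
```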
Mike Snitzer [Thu, 30 Jul 2009 17:45:28 +0000 (17:45 +0000)]
Add --dataalignmentoffset to pvcreate to shift start of aligned data area
Adds pe_align_offset to 'struct physical_volume'; it is initialized with
set_pe_align_offset(). After pe_start is established, pe_align_offset is
added to it.
Mike Snitzer [Thu, 30 Jul 2009 17:41:01 +0000 (17:41 +0000)]
Remove legacy support for preserving pe_start if a PV already has data
areas.
This preserved pe_start would quickly be readjusted to follow the first
mda anyway. An example use-case that hit this code path is: running
pvcreate on an already existing PV _without_ a preceding pvremove.
Mike Snitzer [Thu, 30 Jul 2009 17:18:03 +0000 (17:18 +0000)]
Formalize pe_start policy as split between .pv_setup and .pv_write.
Document existing pe_start policy.
Fix issue in _text_pv_setup() where existing pe_start case could have
the pv->pe_start set to pv->pe_align even though pe_start shouldn't ever
change.
vgconvert and pvcreate have a facility to preserve the existing start
of the on-disk data extents, known as pe_start.
They indicate this by passing the existing value to the pvsetup function
which must preserve it.
This patch avoids one particular case where the value could get
changed incorrectly now that the alignment settings are configurable.
Making adjustments to go along with the changes to the kernel.
A patch to the kernel, adding the 'luid' field to dm_ulog_request,
will allow us to properly identify log instances. We will now
be able to definitively identify which logs are to be removed/
suspended/resumed. This replaces the old faulty behavior of
assuming the logs were the same if they had the same UUID and
incrementing/decrementing a reference count.
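
A minimal sketch of looking up a log by both identifiers. struct log_sketch,
UUID_LEN and find_log() are hypothetical; only the idea that 'luid'
distinguishes instances sharing a UUID comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 129	/* illustrative buffer size */

struct log_sketch {
	char uuid[UUID_LEN];
	uint64_t luid;			/* per-instance identifier */
	struct log_sketch *next;
};

/* Two logs with the same UUID are still distinct if their luid differs. */
static struct log_sketch *find_log(struct log_sketch *head, const char *uuid,
				   uint64_t luid)
{
	for (; head; head = head->next)
		if (head->luid == luid && !strcmp(head->uuid, uuid))
			return head;

	return NULL;
}
```

With an exact match available, a remove, suspend or resume request can act on
one specific instance and no reference count is needed to guess which one.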
Dave Wysochanski [Tue, 28 Jul 2009 15:14:56 +0000 (15:14 +0000)]
Add an open_mode to the vg struct for liblvm - enforce read / write semantics.
For now, a simple way to enforce the read/write semantics is to just save the
open mode of the VG. If the caller uses lvm_vg_create, the mode is write.
The caller using lvm_vg_open can use either read or write to open the VG.
Once we have this, we enforce the permissions on each API call and don't allow
a caller to modify a VG that has not been opened properly.
This may be better combined with the locking mode, but I view that as future
cleanup, past this initial release. The initial release should enforce the
basic object semantics though, as described in the lvm.h file.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
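
The enforcement described in this commit could look roughly like the sketch
below. struct vg_sketch and the *_sketch functions are hypothetical stand-ins
for the liblvm handle and API calls, and the 'r'/'w' encoding of the mode is
an assumption.

```c
#include <stdint.h>

struct vg_sketch {
	char open_mode;		/* 'r' or 'w', saved when the handle is made */
	uint32_t extent_size;
};

/* Opening lets the caller choose read or write. */
static struct vg_sketch vg_open_sketch(char mode)
{
	struct vg_sketch vg = { mode, 4096 };

	return vg;
}

/* Creating a VG always implies write mode. */
static struct vg_sketch vg_create_sketch(void)
{
	return vg_open_sketch('w');
}

/* Every modifying call checks the saved mode before touching the VG. */
static int vg_set_extent_size_sketch(struct vg_sketch *vg, uint32_t new_size)
{
	if (vg->open_mode != 'w')
		return -1;

	vg->extent_size = new_size;
	return 0;
}
```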
Dave Wysochanski [Tue, 28 Jul 2009 13:17:04 +0000 (13:17 +0000)]
Add lvm_vg_get_seqno, updating lvm.h and unit test.
Adding the ability to get the seqno is important for an application to
determine if something has changed in a VG. Otherwise, the only way to
know is to open the VG with write permission and hold the handle.
Dave Wysochanski [Tue, 28 Jul 2009 00:36:58 +0000 (00:36 +0000)]
Update lvm.h to address feedback.
This addresses a large amount of Alasdair's review. Subsequent patches
will address remaining issues.
Addressed:
// FIXME Mention that's also required on error.
// FIXME Be consistent in terminology. It's called "system_dir" then last sentence says "system directory setting". Is it referring to "system_dir" there or something else?
// FIXME Mention it frees all resources and cannot be used subsequently?
// FIXME What does "any system configuration" mean?
// FIXME Expand on that explanation a bit, now that we know what the other fns look like.
// FIXME Not sure about that - it needs to scan sometimes. "will not" or "might not" ?
// FIXME: That's a FIXME in the code!!!
// FIXME What does "copied" mean in this context???
// FIXME Say what struct the returned struct dm_list is a list of...
// FIXME "This API" ? This function creates an object in memory?
// FIXME This function commits the Volume Group object referenced by the VG handle to disk?
// FIXME Where is "Name" defined? Absolute pathname?
Outstanding:
// FIXME Version function first? No structs or handles needed for that.
// FIXME Sort out this alignment. "Set an" directly below "system_dir" looks awful. Indent differently? More blank lines?
// FIXME Check how doxygen processes this. E.g. "return: LVM handle. You must use lvm_error() to check there were no errors and confirm that the handle is valid for passing to other functions."
// FIXME Find a better name. lvm_init.
// FIXME Consider renaming according to the new name for lvm_create.
// FIXME Please can we use dm_malloc throughout?