Jonathan Brassow [Mon, 15 Apr 2013 18:59:46 +0000 (13:59 -0500)]
RAID: Add writemostly/writebehind support for RAID1
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
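To clear the flag again, the same syntax is used with the 'n' suffix, e.g.:
lvchange --writemostly /dev/sdb1:n vg/raid1_lv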
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
  LV               VG   Attr      #Str Type   SSize
  raid1            vg   Rwi---r-m    2 raid1  500.00m
  [raid1_rimage_0] vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1] vg   Iwi---r-w    1 linear 500.00m
  [raid1_rmeta_0]  vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]  vg   ewi---r--    1 linear   4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
  LV               VG   Attr      #Str Type   SSize
  raid1            vg   rwi---r-p    2 raid1  500.00m
  [raid1_rimage_0] vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1] vg   Iwi---r-p    1 linear 500.00m
  [raid1_rmeta_0]  vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]  vg   ewi---r-p    1 linear   4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--     512
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
The original code also handled len==1, which the new code doesn't.
Press <TAB> in the lvm shell to get a list of the possible
flag completions for a single hyphen.
Jonathan Brassow [Fri, 12 Apr 2013 16:30:04 +0000 (11:30 -0500)]
CLEAN-UP: Better string checking to avoid substring matches
Commit 9fd7ac7d035f0b2f8dcc3cb19935eb181816bd76 introduced a method of
avoiding reading from mirrors with a device failure. If a device was
found to be dead, the mapping table was checked for 'handle_errors' or
'block_on_error'. These strings were searched for in the table string
via 'strstr', which could also match substrings like 'no_handle_errors'
or 'no_block_on_error'. No such strings exist, but we don't want problems
in the future if they do. So, we now check for ' <string>{'\0'|' '}' -
i.e. the parameter preceded by a space and followed by either a space or
the end of the string.
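As an illustration of that kind of whole-word check (a sketch only - the
helper name is made up and this is not the actual lvm2 code):

#include <string.h>

/* Return 1 if 'word' appears in 'table' as a whole, space-delimited
 * parameter, so "handle_errors" never matches "no_handle_errors". */
static int _table_has_param(const char *table, const char *word)
{
        size_t len = strlen(word);
        const char *p = table;

        while ((p = strstr(p, word))) {
                if ((p == table || p[-1] == ' ') &&
                    (p[len] == '\0' || p[len] == ' '))
                        return 1;
                p += len;
        }

        return 0;
}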
Move common code for changing activation state from vgchange and
lvchange into one function.
Fix the order of checks - so we always implicitly activate snapshots
and thin volumes in exclusive mode, and we do not allow local
deactivation for them.
Jonathan Brassow [Thu, 11 Apr 2013 20:57:14 +0000 (15:57 -0500)]
RAID: Revert previous commit that allowed identical table loads.
Revert commit 31c24dd9f2ad7b5f7913a18c9f11a00d7b3474a1. This commit
was used to force a RAID device-mapper table to be loaded into the
kernel despite the fact that it was identical to the one already
loaded. The effect allowed a RAID array with a transiently failed
device to refresh and reintegrate the failed device. This operation
is better done in the kernel on a 'resume'. Since
'lvchange --refresh' already performs a suspend/resume cycle, the
above commit is not needed once the kernel change is made. Reverting
the commit removes an unnecessary (at least for now) change to the
device-mapper interface.
Jonathan Brassow [Thu, 11 Apr 2013 20:33:59 +0000 (15:33 -0500)]
RAID: Add scrubbing support for RAID LVs
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
lvchange --syncaction {check|repair} vg/raid_lv
RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent. 'lvchange' can
now initiate the two scrubbing operations: "check" and "repair". "check"
will go over the array and record the number of discrepancies but not
repair them. "repair" will correct the discrepancies as it finds them.
'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'. The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.
Additional reporting has been added for 'lvs' to support the new
operations. Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches". These
can be accessed using the '-o' option to 'lvs', like:
lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing. It can be one of the following:
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
The "mismatches" field with print the number of descrepancies found during
a check or repair operation.
The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.
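For example, to start a scrub and then check its outcome using the new
fields:
lvchange --syncaction check vg/raid_lv
lvs -o +syncaction,mismatches vg/raid_lv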
Finally, the lv_attr field has changed to accommodate the scrubbing operations
as well. The role of the 'p'artial character in the lv_attr report field
has expanded. "Partial" is really an indicator of the health of a
logical volume, and it makes sense to extend this to include other health
indicators as well, specifically:
'm'ismatches: Indicates that there are discrepancies in a RAID
LV. This character is shown after a scrubbing
operation has detected that portions of the RAID
are not coherent.
'r'efresh : Indicates that a device in a RAID array has suffered
a failure and the kernel regards it as failed -
even though LVM can read the device label and
considers the device to be ok. The LV should be
'r'efreshed to notify the kernel that the device is
now available, or the device should be 'r'eplaced
if it is suspected of failing.
Jonathan Brassow [Wed, 10 Apr 2013 21:47:04 +0000 (16:47 -0500)]
mirror: Fix overly-concerning warning on mirror up-convert failure.
Attempting to up-convert an inactive mirror when there is insufficient
space leads to the following message:
Unable to allocate extents for mirror(s).
ABORTING: Failed to remove temporary mirror layer inactive_mimagetmp_3.
Manual cleanup with vgcfgrestore and dmsetup may be required.
This is caused by a failure of the 'deactivate_lv' call in the error
path: the deactivate returns an error because the LV is already
inactive. This patch checks whether the LV is active and calls
deactivate_lv only if it is. This allows the error cleanup code to work
properly in this condition.
It wasn't that big of a deal anyway, since there was no previous vg_commit
that needed to be reverted. IOW, no harm was done if the allocation failed.
The message was scary and useless.
RAID: Capture new RAID kernel sync_action status fields
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0. It is backwards compatible with the
old status line - initializing the new fields to '0'. The new
structure is also more amenable to future changes. It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features. It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable). This allows future fields to be appended.
The new fields that are available are:
- sync_action : shows what the sync thread in the kernel is doing
(idle, frozen, resync, recover, check, repair, or
reshape)
- mismatch_count: shows the number of discrepancies which were
found or repaired by a "check" or "repair"
process, respectively.
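A rough sketch of the updated structure and the parsing call, based on the
description above (the field order and the pre-existing field names are
approximate):

struct dm_status_raid {
        uint64_t reserved;       /* currently 0; could hold feature flags later */
        uint64_t total_regions;
        uint64_t insync_regions;
        uint64_t mismatch_count; /* new: discrepancies found by check/repair */
        uint32_t dev_count;
        char *raid_type;         /* strings are now pointers, not inline space */
        char *dev_health;
        char *sync_action;       /* new: idle, frozen, resync, recover, check,
                                    repair, reshape */
};

int dm_get_status_raid(struct dm_pool *mem, const char *params,
                       struct dm_status_raid **status);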
Peter Rajnoha [Mon, 25 Mar 2013 13:30:40 +0000 (14:30 +0100)]
pv_write: clean up non-orphan format1 PV write
...to not pollute the common and format-independent code in the
abstraction layer above.
The format1 pv_write has common code for writing both the metadata and
the PV header by calling the "write_disks" fn. When rewriting only the
header itself (e.g. just for the purpose of changing the PV UUID)
during the pvchange operation, we had to tweak this functionality for
the format1 case and temporarily assign the PV the orphan state.
This patch removes the need for this format1 tweak: it calls
write_disks with an appropriate flag indicating whether this is
a PV write call or a VG write call, allowing for a metadata update
in the latter case.
Also, a side effect of the former tweak was that it effectively
invalidated the cache (even for the non-format1 PVs) as we
assigned it the orphan state temporarily just for the format1
PV write to pass.
Also, that tweak made it difficult to directly detect whether
a PV was part of a VG or not because the state was incorrect.
Also, it's not necessary to back up and restore some PV fields
when doing a PV write.
Peter Rajnoha [Tue, 19 Mar 2013 13:17:53 +0000 (14:17 +0100)]
cleanup: remove unused 'pv_by_path' fn
The pv_by_path fn could also be dangerous to use as it did not take
into account any metadata areas other than the ones found on the PV
itself. If metadata was not found on the PV referenced by the path,
it returned no PV, even though that PV might have been referenced by
metadata stored elsewhere (on other PVs...).
Peter Rajnoha [Tue, 19 Mar 2013 13:13:44 +0000 (14:13 +0100)]
vgextend: do not allow PV with 0 MDAs to be added while already in a VG
If extending a VG and including a PV with 0 MDAs that was already
a part of a VG, the vgextend allowed that PV to be added and we
ended up *with one PV in two VGs*!
The vgextend code used the 'pv_by_path' fn that returned a PV for
a given path. However, when the PV did not have any metadata areas,
the fn just returned a PV without any reference to existing VG.
Consequently, any checks for the existing VG failed.
[0] raw/~ # vgcreate vg2 /dev/sdc
Volume group "vg2" successfully created
Before this patch (incorrect):
[0] raw/~ # vgextend vg2 /dev/sda
Volume group "vg2" successfully extended
With this patch (correct):
[0] raw/~ # vgextend vg2 /dev/sda
Physical volume '/dev/sda' is already in volume group 'vg1'
Unable to add physical volume '/dev/sda' to volume group 'vg2'.
Peter Rajnoha [Tue, 19 Mar 2013 13:05:19 +0000 (14:05 +0100)]
metadata: add 'allow_orphan' arg to find_pv_by_name fn
Before, the find_pv_by_name call always failed if the PV found was orphan.
However, we might use this function even for a PV that is not part of any VG.
This patch adds 'allow_orphan' arg to find_pv_by_name fn that allows that.
The only callers of the underscored variants were their wrappers
without the underscore. No other part of the code referenced the
underscored variants.
Zdenek Kabelac [Mon, 11 Mar 2013 08:49:10 +0000 (09:49 +0100)]
thin: rework lvconvert
Using a layer was not the best plan here - for a proper device stack
we have to keep a correct reference in the volume_group structure and
make the new thin pool LV appear as a new volume.
Peter Rajnoha [Tue, 12 Mar 2013 11:37:24 +0000 (12:37 +0100)]
dmsetup: fix 'splitname -o' to not fail if used without '-c'
This was a regression introduced with e33fd978a8a56228677c14612fce99b3b3588125
(libdm v1.02.68/lvm2 v2.02.89) with the introduction of new output
fields blkdevname and blkdevs_used for ls and deps dmsetup commands.
A new common '_process_options' fn was added with that commit, but the
fn was called prematurely, which then broke processing of
'dmsetup splitname -o'. That command should implicitly use the '-c'
option, but it was failing after the commit:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
Option not recognised: lv_name
Couldn't process command line.
The '-c' had to be used for correct operation:
alatyr/~ $ dmsetup splitname -c -o lv_name /dev/mapper/vg_data-test
LV
test
Now fixed to work as it did before:
alatyr/~ $ dmsetup splitname -o lv_name /dev/mapper/vg_data-test
LV
test
test: Fix the way in_sync calculates the sync field location
'in_sync' was using the last field in the RAID status output as
the location for the sync ratio field. The sync ratio may not always
be the last field, but it will always be the 7th field. So we switch
to using the absolute field position rather than computing the index
of the last field.
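For reference, an illustrative dm-raid status line (values made up);
counting from the start of the line, the sync ratio is the 7th field:
0 1024000 raid raid1 2 AA 1024000/1024000 idle 0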
RAID: Code changes missing from previous commit (bbc6378)
Previous commit included changes to WHATSNEW, but the code changes
were missing. Here is the description from the previous commit:
commit bbc6378b73246d7ef00c16274fea38118531ae95
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Thu Feb 21 11:31:36 2013 -0600
RAID: Make 'lvchange --refresh' restore transiently failed RAID PVs
A new function (dm_tree_node_force_identical_table_reload) was added to
avoid the suppression of identical table reloads. This allows RAID LVs
to reload the on-disk superblock information that contains which devices
have failed and the bitmaps. If the failed device has returned, this has
the effect of restoring the device and initiating recovery. Without this
patch, the user had to completely deactivate their RAID LV and re-activate
it in order to restore the failed device. Now they simply need to
suspend and resume (which is done by 'lvchange --refresh').
The identical table suppression is only avoided if the LV is not PARTIAL
(i.e. all of its devices can be seen and read by LVM) and the kernel
status of the array contains failed devices. In other words, the function
will only be called in the case where we may have success in restoring
a failed device in the array.
Peter Rajnoha [Tue, 5 Mar 2013 17:21:13 +0000 (18:21 +0100)]
dumpconfig: add --withcomments and --withversions switch
lvm dumpconfig [--withcomments] [--withversions]
The --withcomments causes the comments to appear on output before each
config node (if they were defined in config_settings.h).
The --withversions causes a one line extra comment to appear on output
before each config node with the version information in which the
configuration setting first appeared.
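For example, dumping a single setting with both switches might look roughly
like this (the comment text and version shown are illustrative, not verbatim
output):
lvm dumpconfig --withcomments --withversions devices/dir
# Directory where LVM looks for its device nodes.
# Since version 1.0.0.
dir="/dev"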
Peter Rajnoha [Tue, 5 Mar 2013 17:02:13 +0000 (18:02 +0100)]
config: add support for enhanced config node output
It is now possible to associate a dm_config_node with an ID, which in
our case is used to reference the configuration definition ID from
config_settings.h - in effect linking struct dm_config_node with
struct cfg_def_item.
This patch also adds support for enhanced config node output besides
the existing "output line by line". It adds the possibility to register
a callback that gets called *before* the config node is processed line
by line (for example to include any headers on output) and *after* the
config node is processed line by line (to include any footers on
output). Also, it passes the config node reference itself as an extra
callback arg, making it possible to extract more information from the
config node if needed when processing the output callback (e.g. the
key name, the id, or whether this is a section or a value etc.).
If the config node from lvm.conf/--config tree is recognized and valid,
it's always coupled with the config node definition ID from
config_settings.h:
For example if the dm_config_node *cn holds "devices/dev" configuration,
then the cn->id holds "devices_dev_CFG" ID from config_settings.h, -1 if
not found in config_settings.h and 0 if matching has not yet been done.
To support the enhanced config node output, a new structure has been
defined in libdevmapper to register it:
struct dm_config_node_out_spec {
        dm_config_node_out_fn prefix_fn; /* called before processing config node lines */
        dm_config_node_out_fn line_fn;   /* called for each config node line */
        dm_config_node_out_fn suffix_fn; /* called after processing config node lines */
};
(so, in comparison to the existing callbacks for config node output,
it has an extra dm_config_node *cn arg)
This patch also adds these functions to libdevmapper:
- dm_config_write_node_out
- dm_config_write_one_node_out
...which have exactly the same functionality as their counterparts
without the "out" suffix. The "*_out" functions add the extra hooks
for enhanced config output (the prefix_fn and suffix_fn mentioned above).
One can still use the old interface for config node output, this is
just an enhancement for those who'd like to modify the output more
extensively.
This patch adds the above-mentioned args to lvm dumpconfig and maps them
to the creation and writing out of a configuration tree of a specific
type (see also the previous commit):
- current maps to CFG_TYPE_CURRENT
- default maps to CFG_TYPE_DEFAULT
- missing maps to CFG_TYPE_MISSING
- new maps to CFG_TYPE_NEW
If --type is not defined, dumpconfig defaults to "--type current"
which is the original behaviour of dumpconfig before all these changes.
The --validate option just validates current configuration tree
(lvm.conf/--config) and it writes a simple status message:
"LVM configuration valid" or "LVM configuration invalid"
Peter Rajnoha [Tue, 5 Mar 2013 16:36:10 +0000 (17:36 +0100)]
config: use config checks and add support for creating trees from config definition (config_def_create_tree fn)
Configuration checking is initiated during config load/processing
(_process_config fn) which is part of the command context
creation/refresh.
This patch also defines 5 types of trees that could be created from
the configuration definition (config_settings.h), the cfg_def_tree_t:
- CFG_DEF_TREE_CURRENT that denotes a tree of all the configuration
nodes that are explicitly defined in lvm.conf/--config
- CFG_DEF_TREE_MISSING that denotes a tree of all missing
configuration nodes for which default values are used since they're
not explicitly set in lvm.conf/--config
- CFG_DEF_TREE_DEFAULT that denotes a tree of all possible
configuration nodes with default values assigned, no matter what
the actual lvm.conf/--config is
- CFG_DEF_TREE_NEW that denotes a tree of all new configuration nodes
that appeared in a given version
- CFG_DEF_TREE_COMPLETE that denotes a tree of the whole configuration
tree that is used in LVM2 (a combination of CFG_DEF_TREE_CURRENT +
CFG_DEF_TREE_MISSING). This is not implemented yet, it will be added
later...
The new config_def_create_tree function creates the definition tree of
a given type, where the "spec" argument specifies the tree type to be
created:
struct config_def_tree_spec {
        cfg_def_tree_t type;   /* tree type */
        uint16_t version;      /* tree at this LVM2 version */
        int ignoreadvanced;    /* do not include advanced configs */
        int ignoreunsupported; /* do not include unsupported configs */
};
This tree can be passed to already existing functions that write
the tree on output (like we already do with cmd->cft).
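A minimal sketch of that flow (the config_def_create_tree prototype and the
meaning of version = 0 are assumptions here; error handling omitted):

static int _putline(const char *line, void *baton)
{
        printf("%s\n", line);
        return 1;
}

static void _dump_default_tree(void)
{
        struct config_def_tree_spec spec = {
                .type = CFG_DEF_TREE_DEFAULT, /* all settings with default values */
                .version = 0,                 /* assumed: no version filtering */
                .ignoreadvanced = 0,
                .ignoreunsupported = 1,
        };
        struct dm_config_tree *cft;

        if ((cft = config_def_create_tree(&spec))) { /* prototype assumed */
                dm_config_write_node(cft->root, _putline, NULL);
                dm_config_destroy(cft);
        }
}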
There is a new lvm.conf section called "config" with two new options:
- config/checks which enables/disables checking (enabled by default)
- config/abort_on_errors which enables/disables aborts on any type of
mismatch found in the config (disabled by default)
Peter Rajnoha [Tue, 5 Mar 2013 16:14:18 +0000 (17:14 +0100)]
config: add support for configuration check (config_def_check fn)
Add support for configuration checking - type checking and recognition
of registered configuration settings that LVM2 understands, and also
checking the structure of the configuration. An error is logged on any
mismatch found.
A hash over all allowed configuration paths is created which helps
with matching the exact configuration (lvm.conf/--config tree) with
the configuration item definition from config_settings.h in an
efficient and one-step way.
Two more helper flags are introduced for each configuration definition
item:
- CFG_USED which marks the item as being used (lvm.conf/--config)
This helps with identifying missing configuration options
(and for which defaults were used) when traversing the tree later.
- CFG_VALID which denotes that the item has already been checked and
it was found valid. This improves performance, so if the check
is called once again on the same tree which was not reloaded, we
can just return the state from previous check (with a possibility
to force the check if needed).
The new function that config.h exports and which is going to be used
to perform the configuration checking is:
int config_def_check(struct cmd_context *cmd, int force, int skip, int suppress_messages)
Peter Rajnoha [Tue, 5 Mar 2013 16:10:30 +0000 (17:10 +0100)]
libdevmapper: add dm_config_value_is_bool function
Export this functionality from libdevmapper just for
convenience and general use when reading boolean values
which could be defined either in a numeric way with 0/1
or by using strings with "true"/"false", "yes"/"no",
"on"/"off", "y"/"n".
So we refer to the named configuration ID instead of passing the
configuration path, and the default value is taken from the central
config definition in config_settings.h automatically.
Peter Rajnoha [Tue, 5 Mar 2013 15:49:42 +0000 (16:49 +0100)]
config: add structs to represent config definition and register config_settings.h content
This patch adds basic structures that encapsulate the config_settings.h
content - it takes each item and puts it in structures:
- cfg_def_type_t to define config item type
- cfg_def_value_t to define config item (default) value
- flags used to define the nature and use of the config item:
- CFG_NAME_VARIABLE for items with variable names (e.g. tags)
- CFG_ALLOW_EMPTY for items where empty value is allowed
- CFG_ADVANCED for items which are considered as "advanced settings"
- CFG_UNSUPPORTED for items which are not officially supported
(config options mostly for internal use and testing/debugging)
- cfg_def_item_t to encapsulate the whole definition of the config
item itself
Each config item is referenced by named ID, e.g. "devices_dir_CFG"
instead of directly typing the path "devices/dir" as it was before.
This patch also adds cfg_def_get_path helper function to get the
config setting path up to the root for a given config ID
(it returns the path in the form "abc/def/.../xyz" where "abc"
is the topmost element).
Peter Rajnoha [Tue, 5 Mar 2013 15:42:32 +0000 (16:42 +0100)]
config: add config_settings.h
This file centrally defines all recognized LVM2 configuration
sections and settings. Each item here has its parent, set of
allowed types, default value, brief comment, version the setting
first appeared in and flags that further define the nature of
the configuration setting and its use.
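An illustrative entry (the macro name and exact argument order here are
approximate, not copied from the file):

cfg(devices_dir_CFG, "dir", devices_CFG_SECTION, 0, CFG_TYPE_STRING,
    DEFAULT_DEV_DIR, vsn(1, 0, 0), "Directory used for LVM device nodes.")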
Peter Rajnoha [Wed, 27 Feb 2013 12:47:57 +0000 (13:47 +0100)]
config: fix config node lookup inside empty sections to not return the section itself
When a section was empty in a configuration tree (no children - this is
allowed) and we were looking for a config node inside that section, the
_find_config_node function incorrectly returned the section itself if
the node inside that section was not found.
For example the configuration below:
The config:
abc {
}
And a function call to get the "def" node inside "abc" section:
_find_config_node(..., "abc/def")
...returned the "abc" node instead of NULL ("def" not found).
This in turn caused segfaults in code doing lookups in such a
configuration tree, as we (correctly) expected that the node returned
was always either the one we were looking for or NULL if not found.
But if an incorrect node was returned instead, we processed it as if
it were the node we were looking for and so we processed its value as
well. But sections don't have values => segfault.
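The corrected descent, roughly (an illustrative sketch, not the actual
lvm2 code):

#include <string.h>

/* Descend one path component: a missing child must yield NULL,
 * never the enclosing section itself. */
static const struct dm_config_node *_find_child(const struct dm_config_node *section,
                                                const char *name)
{
        const struct dm_config_node *cn;

        for (cn = section ? section->child : NULL; cn; cn = cn->sib)
                if (!strcmp(cn->key, name))
                        return cn;

        return NULL;    /* e.g. looking up "abc/def" inside an empty "abc {}" */
}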
Peter Rajnoha [Wed, 27 Feb 2013 09:47:24 +0000 (10:47 +0100)]
cleanup: remove struct pv_header_extension reference from struct pv_header
This is just to prevent accidental and improper use when reading the
layout from disk, because the already existing disk_areas_xl[0] lists
are variable in size - we can read the pv_header_extension only after
we know exactly where those lists end...
Peter Rajnoha [Sat, 16 Feb 2013 14:10:47 +0000 (15:10 +0100)]
report: add reporting fields for Embedding Area start and size
There are new reporting fields for Embedding Area: ea_start and ea_size.
An example of 1m Embedding Area and relevant reporting fields:
raw/~ # pvs -o pv_name,pe_start,ea_start,ea_size
  PV         1st PE  EA start  EA size
  /dev/sda    2.00m     1.00m    1.00m
Peter Rajnoha [Fri, 15 Feb 2013 10:14:26 +0000 (11:14 +0100)]
tools: add embeddingareasize arg to pvcreate and vgconvert
To create an Embedding Area during PV creation (pvcreate or as part of
the vgconvert operation), we need to define the Embedding Area size.
The Embedding Area start will be calculated automatically by the tools.
This patch adds --embeddingareasize argument to pvcreate and vgconvert.
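For example (matching the 1m Embedding Area shown in the reporting example
above):
pvcreate --embeddingareasize 1m /dev/sda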
Peter Rajnoha [Fri, 15 Feb 2013 10:02:53 +0000 (11:02 +0100)]
pv_header_extension: add support for writing PV header extension (flags & Embedding Area)
The PV header extension information (PV header extension version, flags
and list of Embedding Area locations) is stored just beyond the PV header base.
When calculating the Embedding Area start value (ea_start), the same logic is
used as when calculating the pe_start value for Data Area - the value must
follow exactly the same alignment restrictions for its start value
(the alignment detected automatically or provided via command line using
the --dataalignment and --dataalignmentoffset arguments).
The Embedding Area is placed at the very start of the PV, starting at
ea_start. The Data Area starting at pe_start is placed next. The pe_start is
still properly aligned. Due to the pe_start alignment, it's possible that the
resulting Embedding Area size (ea_size) ends up bigger in size than requested
(but never less than requested).
Peter Rajnoha [Thu, 14 Feb 2013 15:04:35 +0000 (16:04 +0100)]
pv_header_extension: add support for reading PV header extension (flags & Embedding Area)
New tools with PV header extension support will read the extension
if it exists and it's not an error if it does not exist (so old PVs
will still work seamlessly with new tools).
Old tools without PV header extension support will just ignore any
extension.
As for the Embedding Area location information (its start and size),
there are actually two places where this is stored:
- PV header extension
- VG metadata
The VG metadata contains a copy of what's written in the PV header
extension about the Embedding Area location (NULL value is not copied):
physical_volumes {
    pv0 {
        id = "AkSSRf-difg-fCCZ-NjAN-qP49-1zzg-S0Fd4T"
        device = "/dev/sda"    # Hint only
        ...
The new metadata fields are "ea_start" and "ea_size".
This is mostly useful when restoring the PV by using existing
metadata backups (e.g. pvcreate --restorefile ...).
New tools do not require these two fields to exist in VG metadata;
they're not compulsory. Therefore, reading old VG metadata which doesn't
contain any Embedding Area information will not end up in any kind of
error, only a debug message that the ea_start and ea_size values were
not found.
Old tools just ignore these extra fields in VG metadata.
PV header extension comes just beyond the existing PV header base:
PV header base (existing):
- uuid
- device size
- null-terminated list of Data Areas
- null-terminated list of MetaData Areas
PV header extension:
- extension version
- flags
- null-terminated list of Embedding Areas
This patch also adds "eas" (Embedding Areas) list to lvmcache (lvmcache_info)
and it also adds support for common operations on the list (just like for
already existing "das" - Data Areas list):
- lvmcache_add_ea
- lvmcache_update_eas
- lvmcache_foreach_ea
- lvmcache_del_eas
Also, add ea_start and ea_size to struct physical_volume for processing
PV Embedding Area location throughout the code (currently only one
Embedding Area is supported, though the definition on disk allows for
more if needed in the future...).
Also, define FMT_EAS format flag to mark that the format actually
supports Embedding Areas (currently format-text only).
Peter Rajnoha [Mon, 18 Feb 2013 09:51:32 +0000 (10:51 +0100)]
cleanup: add struct pvcreate_restorable_params and move relevant items from pvcreate_params
Extract restorable PV creation parameters from struct pvcreate_params into
a separate struct pvcreate_restorable_params for clarity and also for better
maintainability when adding any new items later.
It will convert volume 'origin' into a thin volume, which will use
'renamed_origin' as an external read-only origin.
All reads and writes to 'origin' will go via 'pool'.
The 'renamed_origin' volume is a read-only volume that can be activated
only in read-only mode and cannot be modified.
Jonathan Brassow [Thu, 21 Feb 2013 17:31:36 +0000 (11:31 -0600)]
RAID: Make 'lvchange --refresh' restore transiently failed RAID PVs
A new function (dm_tree_node_force_identical_table_reload) was added to
avoid the suppression of identical table reloads. This allows RAID LVs
to reload the on-disk superblock information that contains which devices
have failed and the bitmaps. If the failed device has returned, this has
the effect of restoring the device and initiating recovery. Without this
patch, the user had to completely deactivate their RAID LV and re-activate
it in order to restore the failed device. Now they simply need to
suspend and resume (which is done by 'lvchange --refresh').
The identical table suppression is only avoided if the LV is not PARTIAL
(i.e. all of its devices can be seen and read by LVM) and the kernel
status of the array contains failed devices. In other words, the function
will only be called in the case where we may have success in restoring
a failed device in the array.
Jonathan Brassow [Wed, 20 Feb 2013 22:28:26 +0000 (16:28 -0600)]
vgimport: Allow '--force' to import VGs with missing PVs.
When there are missing PVs in a volume group, most operations that alter
the LVM metadata are disallowed. It turns out that 'vgimport' is one of
those disallowed operations. This is bad because it creates a circular
dependency. 'vgimport' will complain that the VG is inconsistent and that
'vgreduce --removemissing' must be run. However, 'vgreduce' cannot be run
because the VG has not been imported. Therefore, 'vgimport' must be one of
the operations allowed to change the metadata when PVs are missing. The
'--force' option is the way to make 'vgimport' happen in spite of the
missing PVs.
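For example:
vgimport --force <vg_name>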
Peter Rajnoha [Thu, 21 Feb 2013 13:47:49 +0000 (14:47 +0100)]
pvcreate: fix alignment to incorporate alignment offset if PV has 0 MDAs
If zero metadata copies are used, the further recalculation of PV
alignment that normally happens when adding metadata areas to the PV -
and which actually calculates the alignment correctly - is skipped.
So fix this for the "PV without MDA" case as well.
Before this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m --metadatacopies 1 /dev/sda
  Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
  PV         1st PE
  /dev/sda   12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m --metadatacopies 0 /dev/sda
  Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
  PV         1st PE
  /dev/sda    8.00m
After this patch:
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m --metadatacopies 1 /dev/sda
  Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
  PV         1st PE
  /dev/sda   12.00m
[1] raw/~ # pvcreate --dataalignment 8m --dataalignmentoffset 4m --metadatacopies 0 /dev/sda
  Physical volume "/dev/sda" successfully created
[1] raw/~ # pvs -o pv_name,pe_start
  PV         1st PE
  /dev/sda   12.00m
Also, remove a superfluous condition "pv->pe_start < pv->pe_align" in:
  if (pe_start == PV_PE_START_CALC && pv->pe_start < pv->pe_align)
          pv->pe_start = pv->pe_align ...
This part of the condition is not reachable because, with PV_PE_START_CALC,
pv->pe_start is always 0 from the PV struct initialisation
(...the pv->pe_start value is just being calculated).
Jonathan Brassow [Wed, 20 Feb 2013 21:10:04 +0000 (15:10 -0600)]
RAID: Add new 'raid10_segtype_default' setting in lvm.conf
If '--mirrors/-m' and '--stripes/-i' are used together when creating
a logical volume, mirrors-over-stripes is currently chosen. The user
can override this by using the '--type raid10' option on creation.
However, we want a place where we can set the default behavior to
'raid10' explicitly - similar to the "mirror" and "raid1" tunable,
mirror_segtype_default.
A follow-on patch should use this new setting to change the default
from "mirror" to "raid10", as this is the preferred segment type.
Jonathan Brassow [Wed, 20 Feb 2013 20:58:56 +0000 (14:58 -0600)]
RAID: Allow remove/replace of sub-LVs composed of error segments.
When a device fails, we may wish to replace those segments with an
error segment. (Like when a 'vgreduce --removemissing' removes a
failed device that happens to be a RAID image/meta.) We are then left
with images that we will eventually want to remove or replace.
This patch allows us to pull out these virtual "error" sub-LVs. This
allows a user to 'lvconvert -m -1 vg/lv' to extract the bad sub-LVs.
Sub-LVs with error segments are considered for extraction before other
possible devices so that good devices are not accidentally removed.
This patch also adds the ability to replace RAID images that contain error
segments. The user will still be unable to run 'lvconvert --replace'
because there is no way to address the 'error' segment (i.e. no PV
that it is associated with). However, 'lvconvert --repair' can be
used to replace the image's error segment with a new PV. This is also
the most appropriate way to do it, since the LV will continue to be
reported as 'partial'.
Jonathan Brassow [Wed, 20 Feb 2013 20:52:46 +0000 (14:52 -0600)]
RAID: Make 'vgreduce --removemissing' work with RAID LVs
Currently it is impossible to remove a failed PV which has a RAID LV
on it. This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs. Once there is no longer
a RAID LV using the PV, it can be removed.
Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to remove the failed device.
Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
Jonathan Brassow [Wed, 20 Feb 2013 20:40:17 +0000 (14:40 -0600)]
clean-up: Rename lvm.conf setting 'mirror_region_size' to 'raid_region_size'
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes. Since
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting. We must still check for the old setting and
warn the user that it is being overridden by the new setting if both
happen to be present.
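Illustrative lvm.conf usage of the renamed setting (the section and value
shown are assumptions, matching the old mirror_region_size placement):
activation {
    raid_region_size = 512
}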