LVM device fault handling
=========================

Introduction
------------
This document is to serve as the definitive source for information
regarding the policies and procedures surrounding device failures
in LVM. It codifies LVM's responses to device failures as well as
the responsibilities of administrators.

Device failures can be permanent or transient. A permanent failure
is one where a device becomes inaccessible and will never be
revived. A transient failure is one that can be recovered
from (e.g. a power failure, intermittent network outage, block
relocation, etc). The policies for handling both types of failure
are described herein.

Users need to be aware that there are two implementations of RAID1 in LVM.
The first is defined by the "mirror" segment type; the second by the
"raid1" segment type. Which of the two is used by default for LVM
operations is controlled by the 'mirror_segtype_default' setting in
lvm.conf.

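The default implementation can be inspected or changed in lvm.conf; a
minimal fragment (the value shown is illustrative, not a recommendation):

```
global {
    # "raid1" selects the md-raid based implementation,
    # "mirror" selects the older dm-mirror implementation.
    mirror_segtype_default = "raid1"
}
```
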
Available Operations During a Device Failure
--------------------------------------------
When there is a device failure, LVM behaves somewhat differently because
only a subset of the available devices will be found for the particular
volume group. The number of operations available to the administrator
is diminished. It is not possible to create new logical volumes while
PVs cannot be accessed, for example. Operations that create, convert, or
resize logical volumes are disallowed, such as:
- lvcreate
- lvresize
- lvreduce
- lvextend
- lvconvert (unless '--repair' is used)
Operations that activate, deactivate, remove, report, or repair logical
volumes are allowed, such as:
- lvremove
- vgremove (will remove all LVs, but not the VG until consistent)
- pvs
- vgs
- lvs
- lvchange -a [yn]
- vgchange -a [yn]
Operations specific to the handling of failed devices are allowed and
are as follows:

- 'vgreduce --removemissing <VG>': This action is designed to remove
  the reference of a failed device from the LVM metadata stored on the
  remaining devices. If there are (portions of) logical volumes on the
  failed devices, the ability of the operation to proceed will depend
  on the type of logical volumes found. If an image (i.e. leg or side)
  of a mirror is located on the device, that image/leg of the mirror
  is eliminated along with the failed device. The result of such a
  mirror reduction could be a no-longer-redundant linear device. If
  a linear, stripe, or snapshot device is located on the failed device,
  the command will not proceed without a '--force' option. The result
  of using the '--force' option is the entire removal and complete
  loss of the non-redundant logical volume. If an image or metadata area
  of a RAID logical volume is on the failed device, the affected sub-LV is
  replaced with an error target device - appearing as <unknown> in 'lvs'
  output. RAID logical volumes cannot be completely repaired by vgreduce -
  'lvconvert --repair' (listed below) must be used. Once this operation is
  complete on volume groups not containing RAID logical volumes, the volume
  group will again have a complete and consistent view of the devices it
  contains. Thus, all operations will be permitted - including creation,
  conversion, and resizing operations. The currently preferred method is
  to call 'lvconvert --repair' on the individual logical volumes to repair
  them, followed by 'vgreduce --removemissing' to remove the failed
  physical volume's representation in the volume group.

- 'lvconvert --repair <VG/LV>': This action is designed specifically
  to operate on individual logical volumes. If, for example, a failed
  device happened to contain the images of four distinct mirrors, it would
  be necessary to run 'lvconvert --repair' on each of them. The ultimate
  result is to leave the faulty device in the volume group, but have no
  logical volumes referencing it. (This allows 'vgreduce --removemissing'
  to remove the physical volume cleanly.) In addition to removing mirror or
  RAID images that reside on failed devices, 'lvconvert --repair' can also
  replace the failed device if there are spare devices available in the
  volume group. When run from the command line, the user is prompted
  whether to simply remove the failed portions of the mirror or to also
  allocate a replacement. Optionally, the '--use-policies' flag can be
  specified, which will cause the operation not to prompt the user, but
  instead respect the policies outlined in the LVM configuration file -
  usually /etc/lvm/lvm.conf. Once this operation is complete, the logical
  volumes will be consistent. However, the volume group will still be
  inconsistent - due to the referenced-but-missing device/PV - and
  operations will still be restricted to the aforementioned actions until
  either the device is restored or 'vgreduce --removemissing' is run.
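
The preferred sequence described above can be sketched as a small script.
The VG and LV names below ('vg00', 'lv_mirror1', 'lv_mirror2') are
hypothetical, and each command is echoed as a dry run; drop the 'echo'
to actually perform the repair:

```shell
#!/bin/sh
# Repair each affected LV first, then remove the failed PV's entry
# from the VG metadata. All names here are hypothetical examples.
for lv in lv_mirror1 lv_mirror2; do
    echo lvconvert --repair "vg00/$lv"    # dry run: prints the command
done
echo vgreduce --removemissing vg00
```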

Device Revival (transient failures):
------------------------------------
During a device failure, the above section describes what limitations
a user can expect. However, if the device returns after a period of
time, what to expect will depend on what has happened during the time
period when the device was failed. If no automated actions (described
below) or user actions were necessary or performed, then no change in
operations or logical volume layout will occur. However, if an
automated action or one of the aforementioned repair commands was
manually run, the returning device will be perceived as having stale
LVM metadata. In this case, the user can expect to see a warning
concerning inconsistent metadata. The metadata on the returning
device will be automatically replaced with the latest copy of the
LVM metadata - restoring consistency. Note that while most LVM commands
will automatically update the metadata on a restored device, the
following possible exceptions exist:
- pvs (when it does not read/update VG metadata)

Automated Target Response to Failures:
--------------------------------------
The only LVM target types (i.e. "personalities") that have an automated
response to failures are the mirror and RAID logical volumes. The other
target types (linear, stripe, snapshot, etc) will simply propagate the
failure. [A snapshot becomes invalid if its underlying device fails, but
the origin will remain valid - presuming the origin device has not failed.]

Starting with the "mirror" segment type, there are three types of errors
that a mirror can suffer - read, write, and resynchronization errors.
Each is described in depth below.

Mirror read failures:
If a mirror is 'in-sync' (i.e. all images have been initialized and
are identical), a read failure will only produce a warning. Data is
simply pulled from one of the other images and the fault is recorded.
Sometimes - as in the case of bad block relocation - read errors can
be recovered from by the storage hardware. Therefore, it is up to the
user to decide whether to reconfigure the mirror and remove the device
that caused the error. Managing the composition of a mirror is done with
'lvconvert', and removing a device from a volume group can be done with
'vgreduce'.

If a mirror is not 'in-sync', a read failure will produce an I/O error.
This error will propagate all the way up to the applications above the
logical volume (e.g. the file system). No automatic intervention will
take place in this case either. It is up to the user to decide what
can be done/salvaged in this scenario. If the user is confident that the
images of the mirror are the same (or they are willing to simply attempt
to retrieve whatever data they can), 'lvconvert' can be used to eliminate
the failed image and proceed.

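Eliminating a failed image as described above might look like the
following sketch. The names 'vg00/lv_mirror' and '/dev/sdb1' are
hypothetical, and the commands are echoed as a dry run:

```shell
#!/bin/sh
# Drop the mirror image that lives on the failed device, reducing a
# 2-way mirror to linear (or an n-way mirror to n-1 images).
# Hypothetical names; 'echo' makes this a dry run.
echo lvconvert -m-1 vg00/lv_mirror /dev/sdb1
# Afterwards the dead PV can be dropped from the VG:
echo vgreduce --removemissing vg00
```
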
Mirror resynchronization errors:
A resynchronization error is one that occurs when trying to initialize
all mirror images to be the same. It can happen due to a failure to
read the primary image (the image considered to have the 'good' data), or
due to a failure to write the secondary images. This type of failure
only produces a warning, and it is up to the user to take action in this
case. If the error is transient, the user can simply reactivate the
mirrored logical volume to make another attempt at resynchronization.
If attempts to finish resynchronization fail, 'lvconvert' can be used to
remove the faulty device from the mirror.

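A transient resynchronization error can thus be retried by deactivating
and reactivating the mirror. A sketch with hypothetical names
('vg00/lv_mirror', '/dev/sdc1'), echoed as a dry run:

```shell
#!/bin/sh
# Reactivate the mirror to trigger another resynchronization attempt;
# if that keeps failing, remove the faulty image instead.
# Hypothetical names; 'echo' makes this a dry run.
echo lvchange -an vg00/lv_mirror
echo lvchange -ay vg00/lv_mirror
echo lvconvert -m-1 vg00/lv_mirror /dev/sdc1
```
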
TODO...
Some sort of response to this type of error could be automated.
Since this document is the definitive source for how to handle device
failures, the process should be defined here. If the process is defined
but not implemented, it should be noted as such. One idea might be to
make a single attempt to suspend/resume the mirror in an attempt to
redo the sync operation that failed. On the other hand, if there is
a permanent failure, it may simply be best to wait for the user or the
automated response that is sure to follow from a write failure.
...TODO

Mirror write failures:
When a write error occurs on a mirror constituent device, an attempt
to handle the failure is automatically made. This is done by calling
'lvconvert --repair --use-policies'. The policies implied by this
command are set in the LVM configuration file. They are:
- mirror_log_fault_policy: This defines what action should be taken
  if the device containing the log fails. The available options are
  "remove" and "allocate". Either of these options will cause the
  faulty log device to be removed from the mirror. The "allocate"
  policy will attempt the further action of trying to replace the
  failed disk log by using space that might be available in the
  volume group. If the allocation fails (or the "remove" policy
  is specified), the mirror log will be maintained in memory. Should
  the machine be rebooted or the logical volume deactivated, a
  complete resynchronization of the mirror will be necessary upon
  the following activation - such is the nature of a mirror with a
  'core' log. The default policy for handling log failures is
  "allocate". The service disruption incurred by replacing the failed
  log is negligible, while the benefits of having a persistent log are
  pronounced.
- mirror_image_fault_policy: This defines what action should be taken
  if a device containing an image fails. Again, the available options
  are "remove" and "allocate". Both of these options will cause the
  faulty image device to be removed - adjusting the logical volume
  accordingly. For example, if one image of a 2-way mirror fails, the
  mirror will be converted to a linear device. If one image of a
  3-way mirror fails, the mirror will be converted to a 2-way mirror.
  The "allocate" policy takes the further action of trying to replace
  the failed image using space that is available in the volume group.
  Replacing a failed mirror image will incur the cost of
  resynchronizing - degrading the performance of the mirror. The
  default policy for handling an image failure is "remove". This
  allows the mirror to still function, but gives the administrator the
  choice of when to incur the extra performance costs of replacing
  the failed image.

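Both policies live in the activation section of lvm.conf; a minimal
fragment showing the defaults described above:

```
activation {
    # "allocate": also try to find replacement space in the VG;
    # "remove": only drop the failed device from the mirror.
    mirror_log_fault_policy = "allocate"
    mirror_image_fault_policy = "remove"
}
```
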
RAID logical volume device failures are handled differently from the
"mirror" segment type. Discussion of this can be found in lvm2-raid.txt.