=======================
= LVM RAID Design Doc =
=======================

#############################
# Chapter 1: User-Interface #
#############################

***************** CREATING A RAID DEVICE ******************

01: lvcreate --type <RAID type> \
02:          [--regionsize <size>] \
03:          [-i/--stripes <#>] [-I/--stripesize <size>] \
04:          [-m/--mirrors <#>] \
05:          [--[min|max]recoveryrate <kB/sec/disk>] \
06:          [--stripecache <size>] \
07:          [--writemostly <devices>] \
08:          [--maxwritebehind <size>] \
09:          [[no]sync] \
10:          <Other normal args, like: -L 5G -n lv vg> \
11:          [devices]

Line 01:
I don't intend for there to be shorthand options for specifying the
segment type. The available RAID types are:
        "raid0"    - Stripe [NOT IMPLEMENTED]
        "raid1"    - should replace DM Mirroring
        "raid10"   - striped mirrors, [NOT IMPLEMENTED]
        "raid4"    - RAID4
        "raid5"    - Same as "raid5_ls" (Same default as MD)
        "raid5_la" - RAID5 Rotating parity 0 with data continuation
        "raid5_ra" - RAID5 Rotating parity N with data continuation
        "raid5_ls" - RAID5 Rotating parity 0 with data restart
        "raid5_rs" - RAID5 Rotating parity N with data restart
        "raid6"    - Same as "raid6_zr"
        "raid6_zr" - RAID6 Rotating parity 0 with data restart
        "raid6_nr" - RAID6 Rotating parity N with data restart
        "raid6_nc" - RAID6 Rotating parity N with data continuation
The exception to 'no shorthand options' will be where the RAID implementations
can displace traditional targets. This is the case with 'mirror' and 'raid1'.
In this case, "mirror_segtype_default" - found under the "global" section in
lvm.conf - can be set to "mirror" or "raid1". The segment type inferred when
the '-m' option is used will be taken from this setting. The default segment
types can be overridden on the command line by using the '--type' argument.

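As an illustration of the above (LV and VG names are arbitrary), the first
command below always creates a "raid1" LV, while the second infers its segment
type from "mirror_segtype_default" and would create an old-style mirror if
that setting is "mirror":

  ~> lvcreate --type raid1 -m 1 -L 5G -n lv vg
  ~> lvcreate -m 1 -L 5G -n lv vg
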
Line 02:
Region size is relevant for all RAID types. It defines the granularity at
which the bitmap tracks the active areas of the disk. The default is currently
4MiB. I see no reason to change this unless it is a problem for MD performance.
MD does impose a restriction of 2^21 regions for a given device, however. This
means two things: 1) we should never need a metadata area larger than
8kiB+sizeof(superblock)+bitmap_offset (IOW, pretty small) and 2) the region
size will have to be upwardly revised if the device is larger than 8TiB
(assuming defaults).

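To make the 8TiB figure concrete: 2^21 regions * 4MiB per region = 8TiB, so an
LV larger than that needs a proportionally larger region size. A hypothetical
invocation for a 16TiB RAID1 LV (names and sizes are placeholders) might
therefore look like:

  ~> lvcreate --type raid1 -m 1 --regionsize 8M -L 16T -n big_lv vg
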
Line 03/04:
The '-m/--mirrors' option is only relevant to RAID1 and will be used just like
it is today for DM mirroring. For all other RAID types, -i/--stripes and
-I/--stripesize are relevant. The former will specify the number of data
devices that will be used for striping. For example, if the user specifies
'--type raid0 -i 3', then 3 devices are needed. If the user specifies
'--type raid6 -i 3', then 5 devices are needed. The -I/--stripesize may be
confusing to MD users, as they use the term "chunksize". I think they will
adapt without issue and I don't wish to create a conflict with the term
"chunksize" that we use for snapshots.

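Putting the raid6 example above into a full command line (size and names are
only illustrative), a 3-stripe RAID6 LV with a 64KiB stripe size would be
requested as:

  ~> lvcreate --type raid6 -i 3 -I 64 -L 10G -n lv vg

and would require 5 devices (3 data + 2 parity).
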
Line 05/06/07:
I'm still not clear on how to specify these options. Some are easier than
others. '--writemostly' is particularly hard because it involves specifying
which devices shall be 'write-mostly' and thus, also have 'max-write-behind'
applied to them. It has been suggested that a '--readmostly'/'--readfavored'
or similar option could be introduced as a way to specify a primary disk vs.
specifying all the non-primary disks via '--writemostly'. I like this idea,
but haven't come up with a good name yet. Thus, these will remain
unimplemented until future specification.

Line 09/10/11:
These are familiar.

Further creation related ideas:
Today, you can specify '--type mirror' without an '-m/--mirrors' argument
being necessary. The number of devices defaults to two (and the log defaults
to 'disk'). A similar thing should happen with the RAID types. All of them
should default to having two data devices unless otherwise specified. This
would mean a total number of 2 devices for RAID 0/1, 3 devices for RAID 4/5,
and 4 devices for RAID 6/10.

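Under that proposal (this is the suggested default behaviour, not something
implemented yet), a command giving neither '-i' nor '-m', such as:

  ~> lvcreate --type raid5 -L 5G -n lv vg

would allocate 2 data devices plus 1 parity device - 3 devices in total.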

***************** CONVERTING A RAID DEVICE ******************

01: lvconvert [--type <RAID type>] \
02:           [-R/--regionsize <size>] \
03:           [-i/--stripes <#>] [-I/--stripesize <size>] \
04:           [-m/--mirrors <#>] \
05:           [--merge] \
06:           [--splitmirrors <#> [--trackchanges]] \
07:           [--replace <sub_lv|device>] \
08:           [--[min|max]recoveryrate <kB/sec/disk>] \
09:           [--stripecache <size>] \
10:           [--writemostly <devices>] \
11:           [--maxwritebehind <size>] \
12:           vg/lv
13:           [devices]

lvconvert should work exactly as it does now when dealing with mirrors -
even if (when) we switch to MD RAID1. Of course, there are no plans to
allow the presence of the metadata area to be configurable (e.g. --corelog).
It will be simple enough to detect if the LV being up/down-converted is
new or old-style mirroring.

If we choose to use MD RAID0 as well, it will be possible to change the
number of stripes and the stripesize. It is therefore conceivable to see
something like 'lvconvert -i +1 vg/lv'.

Line 01:
It is possible to change the RAID type of an LV - even if that LV is already
a RAID device of a different type. For example, you could change from
RAID4 to RAID5 or RAID5 to RAID6.
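
With the syntax proposed above, such a change would presumably be requested
simply as (names illustrative):

  ~> lvconvert --type raid6 vg/lv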

Line 02/03/04:
These are familiar options - all of which would now be available as options
for change. (However, it'd be nice if we didn't have regionsize in there.
It's simple on the kernel side, but is just an extra - often unnecessary -
parameter to many functions in the LVM codebase.)

Line 05:
This option is used to merge an LV back into a RAID1 array - provided it was
split for temporary read-only use by '--splitmirrors 1 --trackchanges'.

Line 06:
The '--splitmirrors <#>' argument should be familiar from the "mirror" segment
type. It allows RAID1 images to be split from the array to form a new LV.
Either the original LV or the split LV - or both - could become a linear LV as
a result. If the '--trackchanges' argument is specified in addition to
'--splitmirrors', an LV will be split from the array. It will be read-only.
This operation does not change the original array - except that it uses an empty
slot to hold the position of the split LV which it expects to return in the
future (see the '--merge' argument). It tracks any changes that occur to the
array while the slot is kept in reserve. If the LV is merged back into the
array, only the changes are resync'ed to the returning image. Repeating the
'lvconvert' operation without the '--trackchanges' option will complete the
split of the LV permanently.
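
For example, a temporary split-and-merge cycle using this interface might look
like the following (LV/VG names and the image index are illustrative):

  ~> lvconvert --splitmirrors 1 --trackchanges vg/lv
  ... use vg/lv_rimage_1 read-only, e.g. for a backup ...
  ~> lvconvert --merge vg/lv_rimage_1

whereas a permanent split of one image into a new LV would instead be:

  ~> lvconvert --splitmirrors 1 --name new_lv vg/lv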

Line 07:
This option allows the user to specify a sub_lv (e.g. a mirror image) or
a particular device for replacement. The device (or all the devices in
the sub_lv) will be removed and replaced with different devices from the
VG.

Line 08/09/10/11:
It should be possible to alter these parameters of a RAID device. As with
lvcreate, however, I'm not entirely certain how to best define some of these.
We don't need all the capabilities at once though, so it isn't a pressing
issue.

Line 12:
The LV to operate on.

Line 13:
Devices that are to be used to satisfy the conversion request. If the
operation removes devices or splits a mirror, then the devices specified
form the list of candidates for removal. If the operation adds or replaces
devices, then the devices specified form the list of candidates for allocation.

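For instance (again with illustrative names), the trailing device list plays
two different roles in these two hypothetical commands:

  ~> lvconvert -m 2 vg/lv /dev/sd[de]1   # up-convert: sdd1/sde1 are allocation candidates
  ~> lvconvert -m 1 vg/lv /dev/sde1      # down-convert: sde1 is the removal candidate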


###############################################
# Chapter 2: LVM RAID internal representation #
###############################################

The internal representation is somewhat like mirroring, but with alterations
for the different metadata components. LVM mirroring has a single log LV,
but RAID will have one for each data device. Because of this, I've added a
new 'areas' list to the 'struct lv_segment' - 'meta_areas'. There is exactly
a one-to-one relationship between 'areas' and 'meta_areas'. The 'areas' array
still holds the data sub-lv's (similar to mirroring), while the 'meta_areas'
array holds the metadata sub-lv's (akin to the mirroring log device).

The sub_lvs will be named '%s_rimage_%d' instead of '%s_mimage_%d' as it is
for mirroring, and '%s_rmeta_%d' instead of '%s_mlog'. Thus, you can imagine
an LV named 'foo' with the following layout:
        foo
        [foo's lv_segment]
         |
         |-> foo_rimage_0 (areas[0])
         |      [foo_rimage_0's lv_segment]
         |-> foo_rimage_1 (areas[1])
         |      [foo_rimage_1's lv_segment]
         |
         |-> foo_rmeta_0 (meta_areas[0])
         |      [foo_rmeta_0's lv_segment]
         |-> foo_rmeta_1 (meta_areas[1])
         |      [foo_rmeta_1's lv_segment]

LVM Meta-data format
====================
The RAID format will need to be able to store parameters that are unique to
RAID and unique to specific RAID sub-devices. It will be modeled after that
of mirroring.

Here is an example of the mirroring layout:
lv {
    id = "agL1vP-1B8Z-5vnB-41cS-lhBJ-Gcvz-dh3L3H"
    status = ["READ", "WRITE", "VISIBLE"]
    flags = []
    segment_count = 1

    segment1 {
        start_extent = 0
        extent_count = 125      # 500 Megabytes

        type = "mirror"
        mirror_count = 2
        mirror_log = "lv_mlog"
        region_size = 1024

        mirrors = [
            "lv_mimage_0", 0,
            "lv_mimage_1", 0
        ]
    }
}

The real trick is dealing with the metadata devices. Mirroring has an entry,
'mirror_log', in the top-level segment. This won't work for RAID because there
is a one-to-one mapping between the data devices and the metadata devices. The
mirror devices are laid out in sub-device/le pairs. The 'le' parameter is
redundant since it will always be zero. So for RAID, I have simply put the
metadata and data devices in pairs without the 'le' parameter.

RAID metadata:
lv {
    id = "EnpqAM-5PEg-i9wB-5amn-P116-1T8k-nS3GfD"
    status = ["READ", "WRITE", "VISIBLE"]
    flags = []
    segment_count = 1

    segment1 {
        start_extent = 0
        extent_count = 125      # 500 Megabytes

        type = "raid1"
        device_count = 2
        region_size = 1024

        raids = [
            "lv_rmeta_0", "lv_rimage_0",
            "lv_rmeta_1", "lv_rimage_1",
        ]
    }
}

The metadata also must be capable of representing the various tunables. We
already have a good example of one from mirroring: region_size.
'max_write_behind', 'stripe_cache', and '[min|max]_recovery_rate' could also
be handled in this way. However, 'write_mostly' cannot be handled in this
way, because it is a characteristic associated with the sub_lvs, not the
array as a whole. In these cases, the status field of the sub-lv's themselves
will hold these flags - the meaning being only useful in the larger context.

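Sketching what that might look like (the field and flag names below are only
illustrative - none of them are settled by this document): array-wide tunables
would sit next to 'region_size' in the top-level raid segment, while a
write-mostly marking would live in the affected sub-lv's own status field:

    segment1 {
        ...
        type = "raid1"
        device_count = 2
        region_size = 1024
        max_write_behind = 256
        ...
    }

    lv_rimage_1 {
        status = ["READ", "WRITE", "WRITEMOSTLY"]
        ...
    }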

##############################################
# Chapter 3: LVM RAID implementation details #
##############################################

New Segment Type(s)
===================
I've created a new file 'lib/raid/raid.c' that will handle the various
RAID types. While there will be a unique segment type for each RAID variant,
they will all share a common backend - segtype_handler functions and
segtype->flags = SEG_RAID.

I'm also adding a new field to 'struct segment_type', parity_devs. For every
segment_type except RAID4/5/6, this will be 0. This field facilitates
allocation and size calculations. For example, the lvcreate for RAID5 would
look something like:
~> lvcreate --type raid5 -L 30G -i 3 -n my_raid5 my_vg
or
~> lvcreate --type raid5 -n my_raid5 my_vg /dev/sd[bcdef]1

In the former case, the stripe count (3) and device size are computed, and
then 'segtype->parity_devs' extra devices are allocated of the same size. In
the latter case, the number of PVs is determined and 'segtype->parity_devs' is
subtracted off to determine the number of stripes.

This should also work in the case of RAID10 and doing things in this manner
should not affect the way size is calculated via the area_multiple.

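As a worked example of the latter case (device names illustrative): a command
like '~> lvcreate --type raid6 -n my_raid6 my_vg /dev/sd[b-f]1' names 5 PVs;
with 'segtype->parity_devs' being 2 for raid6, the resulting stripe count
would be 5 - 2 = 3.
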
Allocation
==========
When a RAID device is created, metadata LVs must be created along with the
data LVs that will ultimately compose the top-level RAID array. For the
foreseeable future, the metadata LVs must reside on the same device as the
data LV (or at least on one of the devices that compose it). We use this
property to simplify the allocation process. Rather than allocating for the
data LVs and then asking for a small chunk of space on the same device (or the
other way around), we simply ask for the aggregate size of the data LV plus the
metadata LV. Once we have the space allocated, we divide it between the
metadata and data LVs. This also greatly simplifies the process of finding
parallel space for all the data LVs that will compose the RAID array. When
a RAID device is resized, we will not need to take the metadata LV into
account, because it will already be present.

Apart from the metadata areas, the other unique characteristic of RAID
devices is the parity device count. The number of parity devices does not
affect the calculation of size-per-device. The 'area_multiple' means nothing
here. The parity devices will simply be the same size as all the other devices
and will also require a metadata LV (i.e. they are treated no differently than
the other devices).

Therefore, to allocate space for RAID devices, we need to know two things:
1) how many parity devices are required and 2) does an allocated area need to
be split out for the metadata LVs after finding the space to fill the request.
We simply add these two fields to the 'alloc_handle' data structure as
'parity_count' and 'alloc_and_split_meta'. These two fields get set in
'_alloc_init'. The 'segtype->parity_devs' holds the number of parity
drives and can be directly copied to 'ah->parity_count', while
'alloc_and_split_meta' is set when a RAID segtype is detected and
'metadata_area_count' has been specified. With these two variables set, we
can calculate how many allocated areas we need. Also, the routines that
find the actual space stop not when they have found ah->area_count areas,
but when they have found (ah->area_count + ah->parity_count) areas.

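As an illustration of those counts (the numbers are only an example): for
'lvcreate --type raid6 -i 3 ...', ah->area_count would be 3 and
ah->parity_count would be 2, so the space-finding routines look for 5 parallel
areas; with 'alloc_and_split_meta' set, each of those areas is then split into
a metadata/data (rmeta/rimage) pair on the same device.
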
Conversion
==========
RAID -> RAID, adding images
---------------------------
When adding images to a RAID array, metadata and data components must be added
as a pair. It is best to perform as many operations as possible before writing
new LVM metadata. This allows us to error-out without having to unwind any
changes. It also makes things easier if the machine should crash during a
conversion operation. Thus, the actions performed when adding a new image are:
        1) Allocate the required number of metadata/data pairs using the method
           described above in 'Allocation' (i.e. find the metadata/data space
           as one unit and split the space between them once found - this keeps
           them together on the same device).
        2) Form the metadata/data LVs from the allocated space (leave them
           visible) - setting required RAID_[IMAGE | META] flags as appropriate.
        3) Write the LVM metadata.
        4) Activate and clear the metadata LVs. The clearing of the metadata
           requires the LVM metadata be written (step 3) and is a requirement
           before adding the new metadata LVs to the array. If the metadata
           is not cleared, it may carry residual superblock state from a
           previous array the device may have been part of.
        5) Deactivate new sub-LVs and set them "hidden".
        6) Expand the 'first_seg(raid_lv)->areas' and '->meta_areas' arrays
           for inclusion of the new sub-LVs.
        7) Add new sub-LVs and update 'first_seg(raid_lv)->area_count'.
        8) Commit new LVM metadata.
Failure during any of these steps will not affect the original RAID array. In
the worst scenario, the user may have to remove the new sub-LVs that did not
yet make it into the array.

RAID -> RAID, removing images
-----------------------------
To remove images from a RAID, the metadata/data LV pairs must be removed
together. This is pretty straight-forward, but one place where RAID really
differs from the "mirror" segment type is how the resulting "holes" are filled.
When a device is removed from a "mirror" segment type, it is identified, moved
to the end of the 'mirrored_seg->areas' array, and then removed. This action
causes the other images to shift down and fill the position of the device which
was removed. While "raid1" could be handled in this way, the other RAID types
could not be - it would corrupt the ordering of the data on the array. Thus,
when a device is removed from a RAID array, the corresponding metadata/data
sub-LVs are removed from the 'raid_seg->meta_areas' and 'raid_seg->areas' arrays.
The slots in these 'lv_segment_area' arrays are set to 'AREA_UNASSIGNED'. RAID
is perfectly happy to construct a DM table mapping with '- -' if it comes across
an area assigned in such a way. The pair of dashes is a valid way to tell the
RAID kernel target that the slot should be considered empty. So, we can remove
devices from a RAID array without affecting the correct operation of the RAID.
(It also becomes easy to replace the empty slots properly if a spare device is
available.) In the case of RAID1 device removal, the empty slot can be safely
eliminated. This is done by shifting the higher indexed devices down to fill
the slot. Even the names of the images will be renamed to properly reflect
their index in the array. Unlike the "mirror" segment type, you will never have
an image named "*_rimage_1" occupying the index position 0.

As with adding images, removing images holds off on committing LVM metadata
until all possible changes have been made. This reduces the likelihood of bad
intermediate stages being left due to a failure of operation or machine crash.

RAID1 '--splitmirrors', '--trackchanges', and '--merge' operations
------------------------------------------------------------------
This suite of operations is only available to the "raid1" segment type.

Splitting an image from a RAID1 array is almost identical to the removal of
an image described above. However, the metadata LV associated with the split
image is removed and the data LV is kept and promoted to a top-level device.
(i.e. It is made visible and stripped of its RAID_IMAGE status flags.)

When the '--trackchanges' option is given along with the '--splitmirrors'
argument, the metadata LV is left as part of the original array. The data LV
is set as 'VISIBLE' and read-only (~LVM_WRITE). When the array DM table is
being created, it notices the read-only, VISIBLE nature of the sub-LV and puts
in the '- -' sentinel. Only a single image can be split from the mirror and
the name of the sub-LV cannot be changed. Unlike '--splitmirrors' on its own,
the '--name' argument must not be specified. Therefore, the name of the newly
split LV will remain the same '<lv>_rimage_<N>', where 'N' is the index of the
slot in the array with which it is associated.

When an LV which was split from a RAID1 array with the '--trackchanges' option
is merged back into the array, its read/write status is restored and it is
set as "hidden" again. Recycling the array (suspend/resume) restores the sub-LV
to its position in the array and begins the process of sync'ing the changes that
were made since the time it was split from the array.

RAID device replacement with '--replace'
----------------------------------------
This option is available to all RAID segment types.

The '--replace' option can be used to remove a particular device from a RAID
logical volume and replace it with a different one in one action (CLI command).
The device to be removed is specified as the argument to the '--replace'
option. This option can be specified more than once in a single command,
allowing multiple devices to be replaced at the same time - provided the RAID
logical volume has the necessary redundancy to allow the action. The devices
to be used as replacements can also be specified in the command, similar to the
way allocatable devices are specified during an up-convert.

Example> lvconvert --replace /dev/sdd1 --replace /dev/sde1 vg/lv /dev/sd[bc]1

RAID '--repair'
---------------
This 'lvconvert' option is available to all RAID segment types and is described
under "RAID Fault Handling".


RAID Fault Handling
===================
RAID is not like traditional LVM mirroring (i.e. the "mirror" segment type).
LVM mirroring required failed devices to be removed or the logical volume would
simply hang. RAID arrays can keep on running with failed devices. In fact, for
RAID types other than RAID1, removing a device would mean substituting an error
target or converting to a lower level RAID (e.g. RAID6 -> RAID5, or RAID4/5 to
RAID0). Therefore, rather than removing a failed device unconditionally, the
user has a couple of options to choose from.

The automated response to a device failure is handled according to the user's
preference defined in lvm.conf:activation.raid_fault_policy. The options are:
    # "warn"     - Use the system log to warn the user that a device in the RAID
    #              logical volume has failed. It is left to the user to run
    #              'lvconvert --repair' manually to remove or replace the failed
    #              device. As long as the number of failed devices does not
    #              exceed the redundancy of the logical volume (1 device for
    #              raid4/5, 2 for raid6, etc) the logical volume will remain
    #              usable.
    #
    # "remove"   - NOT CURRENTLY IMPLEMENTED OR DOCUMENTED IN example.conf.in.
    #              Remove the failed device and reduce the RAID logical volume
    #              accordingly. If a single device dies in a 3-way mirror,
    #              remove it and reduce the mirror to 2-way. If a single device
    #              dies in a RAID 4/5 logical volume, reshape it to a striped
    #              volume, etc - RAID 6 -> RAID 4/5 -> RAID 0. If devices
    #              cannot be removed for lack of redundancy, fail.
    #              THIS OPTION CANNOT YET BE IMPLEMENTED BECAUSE RESHAPE IS NOT
    #              YET SUPPORTED IN linux/drivers/md/dm-raid.c. The superblock
    #              does not yet hold enough information to support reshaping.
    #
    # "allocate" - Attempt to use any extra physical volumes in the volume
    #              group as spares and replace faulty devices.

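For reference, the policy would be selected with a stanza along these lines in
lvm.conf (a sketch; "warn" shown as the chosen value):

    activation {
        raid_fault_policy = "warn"
    }
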
If manual intervention is taken, either in response to the automated solution's
"warn" mode or simply because dmeventd hadn't run, then the user can call
'lvconvert --repair vg/lv' and follow the prompts. They will be prompted
whether or not to replace the device and cause a full recovery of the failed
device.

If replacement is chosen via the manual method or "allocate" is the policy taken
by the automated response, then 'lvconvert --replace' is the mechanism used to
attempt the replacement of the failed device.

'vgreduce --removemissing' is ineffectual at repairing RAID logical volumes. It
will remove the failed device, but the RAID logical volume will simply continue
to operate with an <unknown> sub-LV. The user should clear the failed device
with 'lvconvert --repair'.