News

DDRaid was demoed at LCA last week, running on a 12 node personal cluster that Peter Anvin of Orion Multisystems hand-carried halfway around the world expressly for the demo.

Bad news and good news. The bad news: ddraid crashed in the demo. The good news: ddraid crashed only because a buggy old ddraid driver that can't handle an Ext3 mkfs was on the machine. Otherwise it worked pretty well and performed to spec (on every try other than the actual demo). Slides for the LCA DDRaid presentation are here. The mpeg of the balloon ride is too big to post here.

Introduction

This is the home page of the Distributed Data Raid block device project. DDRaid is a device mapper extension that lets you run a higher-level software raid array in which each member of the raid is a separate cluster node rather than a local disk. It also works for ordinary software raid, and is actually pretty efficient.

A cluster raid device together with a cluster filesystem like GFS or OCFS2 creates a distributed data cluster (the "dd" in "ddraid") that does not rely on a single shared disk. The cluster raid 3.5 array is redundant at the node level, so one data node can fail without losing any data. The cluster raid device will automatically find another node in the cluster to replace the missing one, in order to restore the safety factor.

Besides extra safety, the cluster raid array offers increased performance, particularly for linear IO loads; random IO loads perform no worse than a single raw disk. In other words, performance is never worse than a single disk, and often far better.

Similar to the existing device mapper mirror device, a ddraid array keeps a persistent record on disk of which regions of the array are currently being written, so that in the event of a system crash only those regions need to be resynced (by recomputing parity blocks).

A cluster may contain both data nodes, which are members of the distributed data array, and ordinary nodes, which have full access to the array data via the cluster filesystem. Data nodes themselves may also access the shared filesystem.

DDRaid is based on the raid 3.5 model, which I investigated a couple of years ago but did not implement. A ddraid array can only have certain numbers of members in practice: 2, 3, 5 or 9, that is, one parity member plus a power-of-two number of data members. However, each of these members can itself be an array, so any multiple of these numbers is possible. DDRaid arrays can also be joined together linearly, and do not have to have the same number of members, so there is considerable flexibility in how a distributed data cluster may be configured.

A ddraid device consists of three components: the device mapper target itself, which runs in the kernel; a user space synchronization server (ddraid-server); and a user space cluster agent that hooks the other two up to whatever cluster infrastructure happens to be running.
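As an aside, the node-level redundancy comes down to ordinary XOR parity, as in raid 3. Here is a minimal sketch of the arithmetic (illustrative code, not ddraid's actual implementation): the parity block of each stripe is the XOR of its data blocks, and any one missing block is rebuilt by XORing the parity with the survivors.

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative only: XOR parity as used by raid 3-style arrays.
     * Each stripe has some data blocks plus one parity block. */

    /* Recompute the parity block of one stripe from its data blocks. */
    static void compute_parity(uint8_t *const data[], unsigned data_members,
                               uint8_t *parity, size_t blocksize)
    {
            for (size_t i = 0; i < blocksize; i++) {
                    uint8_t x = 0;
                    for (unsigned m = 0; m < data_members; m++)
                            x ^= data[m][i];
                    parity[i] = x;
            }
    }

    /* Rebuild one missing data block from the parity block and the
     * surviving data blocks; this is why a single member (node) can
     * fail without losing any data. */
    static void rebuild_block(uint8_t *const survivors[], unsigned count,
                              const uint8_t *parity, uint8_t *missing,
                              size_t blocksize)
    {
            for (size_t i = 0; i < blocksize; i++) {
                    uint8_t x = parity[i];
                    for (unsigned m = 0; m < count; m++)
                            x ^= survivors[m][i];
                    missing[i] = x;
            }
    }

Crash resync is the same computation, applied only to the regions the dirty log says were in flight.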
A consequence of this infrastructure independence is that you can try out the ddraid device without installing any cluster patches. You only need the tarball linked below. [This isn't quite true at the moment, because the non-cluster ddraid agent still has a few gdlm dependencies, which will be fixed soon.]

Project Team

Once again, that would be me!

Status

First benchmarks on a realistic configuration were achieved recently, using an Ext2 filesystem. They show that a five member array can be up to 62% faster than a single, raw disk. Degraded mode operation (one member failed) was recently implemented. This project began life as a cluster mirror, and the server still only knows about mirrors at this point, not how to reconstruct parity blocks. Some work remains to be done on client failure recovery and server failover. Disk errors are not handled yet, though the mechanism for continuing to operate with a failed disk is in place and tested. There are several known bugs. In other words, don't use ddraid for real data.

Hackers and tire kickers are cordially invited to download the tarball and try it out.

Source code

This is ddraid.0.0.5.tgz, the tarball that should have been used last week at LCA; with it, the demo would not have hung while making the Ext3 filesystem.
It has also been (lightly) tested with Ext2, ReiserFS, GFS and
OCFS2. Device bringup is still somewhat manual, as is failover
because ddraid-cman-agent.c hasn't been coded yet. Assignment of
the dirty map partition is also manual (part of the ddraid-server
command line). This ddraid snapshot does exhibit the linear performance acceleration claimed above, achieving close to the aggregate platter speed of the array, less the parity disk (for a five member array, that ceiling is the combined bandwidth of the four data disks). The code lets you turn off global synchronization entirely, or just turn off dirty logging. This makes a modest difference to performance, but of course you then have to do the synchronization yourself, which could get a little tedious (though for an active/passive Ext2/3 configuration, turning off global synchronization might be just the thing to get that last drop of performance).
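To make the dirty logging tradeoff concrete, here is a rough sketch of how a region dirty log of this general kind works (hypothetical names and layout, not ddraid's actual structures): a region's dirty bit must reach disk before the first write to that region proceeds, while clearing bits can happen lazily.

    #include <stdint.h>

    /* Hypothetical sketch of a region dirty log, in the spirit of the
     * device mapper mirror log that ddraid's logging resembles. One
     * bit per region; the bitmap lives on the dirty map partition. */

    #define REGIONS 1024

    struct dirty_log {
            uint8_t bits[REGIONS / 8];
            int flush_needed;       /* bitmap must hit disk before writing */
    };

    static void mark_dirty(struct dirty_log *log, unsigned region)
    {
            log->bits[region >> 3] |= 1 << (region & 7);
            log->flush_needed = 1;  /* sync the log, then let the write go */
    }

    static void clear_dirty(struct dirty_log *log, unsigned region)
    {
            /* May be written back lazily: a stale dirty bit only costs
             * a little extra resync work after a crash. */
            log->bits[region >> 3] &= ~(1 << (region & 7));
    }

    static int is_dirty(const struct dirty_log *log, unsigned region)
    {
            return log->bits[region >> 3] & (1 << (region & 7));
    }

After a crash, only regions with set bits need their parity recomputed. Turning dirty logging off removes the pre-write log flush, but gives up that bounded resync.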
Here is the latest tarball, to be built against kernel 2.6.11.3. This code is suitable for unit testing and experimenting, not live use.
Tracing output is on by default and needs to be turned off for any reasonable performance; follow the directions in the README.
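For what it's worth, this kind of trace switch is usually a compile-time toggle along the following lines (illustrative names; see the source for the real switch):

    #include <stdio.h>

    /* Illustrative compile-time trace toggle: point `trace` at trace_on
     * while debugging, at trace_off for benchmark runs. */
    #define trace_on(args)  do { printf args; } while (0)
    #define trace_off(args) do { } while (0)

    #define trace trace_on  /* change to trace_off for performance */

    int main(void)
    {
            trace(("syncing region %u\n", 42u));
            return 0;
    }

With trace_off selected, the argument list is discarded at compile time, so disabled tracing costs nothing at runtime.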
The ddraid tarball 0.0.6 is around here somewhere and will be posted soon. It's, um... better :-)

Note Note Note! This is still pre-alpha. Do not use it on a filesystem you care about.

To get the latest source from CVS: at the login password prompt, just hit enter. But as of today, there isn't anything in CVS! Soon.

Documentation

Mailing lists

linux-cluster is the mailing list for cluster-related questions and discussion. Whenever the development source code repository is updated, email is sent to the cluster-cvs mailing list.

IRC

Channel #linux-cluster on freenode

Links