Piranha - Load-balanced generic service clustering environment

Description

Piranha is not a single tool but a collection of programs arranged to provide a more cohesive environment for administrators who wish to set up clustering services under Linux.

Requirements

Base requirements:

The recommended base kernel is linux-2.2.14 with the ipvs patch (0.9.10) applied. You should compile and install this patched kernel on your designated clustering machines and reboot onto it. The ipvs patch also contains the source code to build ipvsadm, the workhorse of the LVS system: it is the utility used to manipulate the kernel's virtual server tables. Note that the kernel patch MUST be applied before you attempt to compile the utility, as the patch adds some extra defines to some header files. If in doubt, follow the README file included in the ipvs patch kit.
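
As a rough guide, the whole procedure looks something like the sketch below. The patch and directory names are illustrative only; follow the README in the ipvs patch kit for the exact names on your system.

    # Apply the ipvs patch to the 2.2.14 kernel source tree (names illustrative).
    cd /usr/src/linux
    patch -p1 < /usr/src/ipvs-0.9.10/ipvs-0.9.10-2.2.14.patch

    # Enable the IPVS options in the kernel configuration, then rebuild,
    # install and reboot onto the new kernel in the usual way.
    make menuconfig && make dep && make bzImage && make modules && make modules_install

    # Build ipvsadm only AFTER the kernel patch has been applied, since it
    # relies on definitions that the patch adds to the kernel headers.
    cd /usr/src/ipvs-0.9.10/ipvsadm
    make && make install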

By following the above instructions you should have the basics required for using virtual IP services under Linux. At the time of writing, the RH6.2 release of the kernel is built with IPVS support, though it uses a slightly older version of the IPVS patch.

LVS:

The piranha kit revolves around a single configuration file, /etc/lvs.cf. All components of piranha use this file as the definition of the cluster. Piranha provides a daemon called 'lvs' that runs on the primary and backup nodes. This process controls Piranha and supports communication among its components.

To help determine if a node in the cluster is still alive, another daemon, 'pulse', runs on the primary and backup nodes. This process is normally started from the rc scripts as '/etc/rc.d/init.d/pulse start'.
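
On a Red Hat style system this amounts to something like the following (a sketch; the chkconfig lines assume the pulse package installed its init script in the usual way):

    # Start pulse by hand on both the primary and the backup node:
    /etc/rc.d/init.d/pulse start

    # Arrange for pulse to be started automatically at boot:
    chkconfig --add pulse
    chkconfig pulse on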

Another daemon that runs on all nodes in the cluster is 'nanny'. Through this process, the primary LVS node determines whether a host service is alive and should continue to receive job assignments.
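
Conceptually, nanny's liveness test resembles the manual check below, which uses the send/expect strings defined for a service in /etc/lvs.cf. This is only a rough sketch of the idea, not the actual nanny implementation, and it assumes the netcat utility (nc) is available.

    # Send the test string to a real server's web port and look for the
    # expected reply; seeing "HTTP" in the output suggests the service is alive.
    printf 'GET / HTTP/1.0\r\n\r\n' | nc 192.168.10.2 80 | grep HTTP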

Piranha GUI:

There are actually two GUIs provided: an X11 GTK-based one and a web-based application. The GTK version is in the process of being phased out and should not be used, though it is provided for those who have used this GUI before and feel more comfortable with it. The GTK version has not been expanded to include the newer features of the piranha clustering facilities, so it is a good idea to use the web-based version. Please note that both of these applications are used primarily to help the administrator generate a valid /etc/lvs.cf file. For those who are experienced, it is perfectly acceptable to edit the lvs.cf file with your favorite editor. The GUIs are provided to speed up the generation of the lvs.cf file, as building one from scratch can be quite laborious.

The WEB based GUI relies on several components that may or may not be installed on your system. Those components are:

The piranha GUI typically runs as a separate web process not attached to the local machine's web service. It normally runs as user piranha (as opposed to user nobody) and listens on port 3636, though this is configurable from the ~piranha/conf/httpd.conf file.
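
The relevant part of that file would look something like the excerpt below (illustrative, using the defaults just described); the GUI itself is then reached with a browser at http://<lvs-node>:3636/.

    # Excerpt from ~piranha/conf/httpd.conf (illustrative defaults):
    Port 3636
    User piranha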

lvs.cf:

/etc/lvs.cf is where the cluster definition is held. A typical cluster configuration will look as follows. Please pay attention to the comment fields, which describe the meaning of the various items.
    # This file is generated by the piranha GUI.  Do not hand edit.  All
    # modifications created by any means other than the use of piranha will
    # not be supported.
    #
    # This file has 3 sections. Section 1 is always required, then EITHER
    # section 2 or section 3 is to be used.
    #       1. LVS node/router definitions needed by the LVS system.
    #       2. Virtual server definitions, including lists of real servers.
    #       3. Failover service definitions (for any services running on the
    #          LVS primary or backup node instead of on virtual servers).
    #          NOTICE: Failover services are an upcoming feature of piranha and
    #          are not provided in this release.
    
    
    
    
    # SECTION 1 - GLOBAL SETTINGS
    #
    # The LVS is a single point of failure (which is bad).  To protect against
    # this machine breaking things, we should have a redundant/backup LVS node.
    #       service:        Either "lvs" for Virtual Servers  or "fos" for
    #                       Failover Services (defaults to "lvs" if missing)
    #       primary:        The IP of the main LVS node/router
    #       backup:         The IP of the backup LVS node/router
    #       backup_active:  Set this to 1 if using a backup LVS node/router
    #       heartbeat:      Use heartbeat between LVS nodes
    #       keepalive:      Time between heartbeats between LVS machines.
    #       deadtime:       Time without response before node failure is assumed.
    
    
    service = lvs
    primary = 207.175.44.150
    backup = 207.175.44.196
    backup_active = 1
    heartbeat = 1
    heartbeat_port = 1050
    keepalive = 6
    deadtime = 18
    
    # All nodes must have either appropriate .rhosts files set up for all nodes in
    # the cluster, or use some equivalent mechanism. Default is rsh, but you
    # may set an alternate command (which must be equivalent to rsh) here (ssh
    # is the most common).
    
    rsh_command = rsh
    
    # lvs server configuration environments: NAT, Direct Routing, and Tunneling.
    # NAT (Network Address Translation) is the simplest to set up and works well
    # in most situations.
    #
    # network = direct
    # network = tunnel
    
    network = nat
    nat_router = 192.168.10.100 eth1:1
    
    
    
    # SECTION 2 - VIRTUAL SERVERS
    #
    # Information we need to keep track of for each virtual server is:
    # scheduler:    pcc, rr, wlc, wrr (default is wlc)
    # persistent:   time (in seconds) to allow a persistent service connection to
    #               remain active.  If missing or set to 0, persistence is turned
    #               off.
    # pmask:        If persistence is enabled, this is the netmask to apply.
    #               Default is 255.255.255.255
    # address:      IP address of the virtual server (required)
    # active:       Simple switch if node is on or off
    # port:         port number to be handled by this virtual server (default
    #               is 80)
    # load_monitor: Tool to check load average on real server machines.
    #               Possible tools include rup, ruptime, uptime.
    # timeout:      Time (in seconds) between service activity queries
    # reentry:      Time (in seconds) a service must be alive before it is allowed
    #               back into the virtual server's routing table after leaving the
    #               table via failure.
    # send:         [optional] test string to send to port
    # expect:       [optional] test string to receive from port
    # protocol:     tcp or udp (defaults to tcp)
    #
    # This is the needed information for each real server for each Virtual Server:
    # address:      IP address of the real server.
    # active:       Simple switch if node is on or off
    # weight:       relative measure of server capacity
    
    virtual server1 {
            address = 207.175.44.252 eth0:1
            active = 1
            load_monitor = uptime
            timeout = 5
            reentry = 10
            port = http
            send = "GET / HTTP/1.0\r\n\r\n"
            expect = "HTTP"
            scheduler = wlc
            persistent = 60
            pmask = 255.255.255.255
            protocol = tcp
    
            server Real1 {
                    address = 192.168.10.2
                    active = 1
                    weight = 1
            }
    
            server Real2 {
                    address = 192.168.10.3
                    active = 1
                    weight = 1
            }
    }
    
    
    virtual server2 {
            address = 207.175.44.253 eth0:1
            active = 0
            load_monitor = uptime
            timeout = 5
            reentry = 10
            port = 21
            send = "\n"
    
            server Real1 {
                    address = 192.168.10.2
                    active = 1
            }
    
            server Real2 {
                    address = 192.168.10.3
                    active = 1
            }
    } 
    
    
    
    
    # SECTION 3 - FAILOVER SERVICES
    #
    # LVS node Service failover. This section applies only to services running
    # on the primary and backup LVS nodes (instead of being part of a virtual
    # server setup). You cannot currently use these services and virtual
    # servers in the same setup, and you must have at least a 2 node cluster
    # (a primary and backup) in order to use these failover services. All
    # nodes must be identically configured Linux systems.
    #
    # Failover services provide the most basic form of fault recovery. If any
    # of the services on the active node fail, all of the services will be
    # shutdown and restarted on a backup node. Services defined here will
    # automatically be started & stopped by LVS, so a backup node is
    # considered a "warm" standby. This is due to a technical restriction that
    # a service can only be operational on one node at a time, otherwise it may
    # fail to bind to a virtual IP address that does not yet exist on that
    # system or cause a networking conflict with the active service. The
    # commands provided for "start_cmd" and "stop_cmd" must work the same for
    # all nodes. Multiple services can be defined.
    #
    # Information here is similar in meaning and format to the virtual server
    # section. Failover Services and Virtual Servers cannot both be used on
    # a running system, so the "service = xxx" setting in the first section
    # of this file indicates which to use when starting the cluster.
    
    
    failover web1 {
         active = 1
         address = 207.175.44.242 eth0:1
         port = 1010
         send = "GET / HTTP/1.0\r\n\r\n"
         expect = "HTTP"
         timeout = 10
         start_cmd = "/etc/rc.d/init.d/httpd start"
         stop_cmd = "/etc/rc.d/init.d/httpd stop"
    }
    
    
    failover ftp {
         active = 0
         address = 207.175.44.252 eth0:1
         port = 21
         send = "\n"
         timeout = 10
         start_cmd = "/etc/rc.d/init.d/inet start"
         stop_cmd = "/etc/rc.d/init.d/inet stop"
    }
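
When the cluster is started with this configuration, the lvs daemon programs the kernel's virtual server tables through ipvsadm. For "virtual server1" above, the resulting rules would look roughly like the following. This is a hand-written sketch of what piranha normally does automatically; the '-m' flag selects masquerading (NAT) to match the 'network = nat' setting.

    # Virtual service on the public address, weighted least-connections
    # scheduling with 60 seconds of persistence:
    ipvsadm -A -t 207.175.44.252:80 -s wlc -p 60

    # The two real servers behind the NAT router, weight 1 each:
    ipvsadm -a -t 207.175.44.252:80 -r 192.168.10.2:80 -m -w 1
    ipvsadm -a -t 207.175.44.252:80 -r 192.168.10.3:80 -m -w 1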

References

The original piranha howto
Whitepaper on piranha
The High-Availability Linux Project homepage