Conga - A Management Platform for Cluster and Storage Systems
Introduction
Conga is a next-generation management interface for cluster and storage configuration and monitoring, targeted for inclusion in RHEL5. It seeks to draw together several disparate management systems into one seamless, intuitive interface.
Conga is not a monolithic, end-to-end system. While it can function as a complete remote management solution, it also affords many opportunities for users to replace components with their own elements more appropriate to their business needs, and to scale and customize Conga to their requirements. In the Architecture section below, it is demonstrated that entry into Conga can be made at a variety of places and commitment levels.
The objectives of the Conga project include:
- Integrating Cluster and Storage management into one streamlined web-based user interface.
- Incorporating features of deploy-tool into Conga so that creating or extending a cluster is greatly simplified.
- Providing a customizable management interface that persists user preferences.
- Displaying Critical Events and Logs for the systems being monitored at user specifiable levels.
- Offering a way to monitor multiple systems or clusters, and browse to individual nodes on a cluster, without having to constantly re-authenticate.
- Aggregating monitoring data for a cluster or storage server asynchronously and persisting that data so that it is always at an administrator's fingertips.
- Integrating context sensitive online support documentation into the various taskflows of the user interface.
- Supporting fine-grained user permissions by creating roles and assigning them to user tasks.
Use cases and User Scenarios:
An extensive set of Use Cases can be found here.
User Scenario #1:
Jack is a system administrator for his company. His primary responsibility is to keep his company's file server and outward-facing webservers up and running. When he arrives at work in the morning, he starts a browser and selects a bookmark for his company's file server cluster. After logging into the management portal, he is greeted with a message that two critical events have occurred since he was last logged in:
- Node02 in the fileserver cluster was fenced at 4:31AM that morning. The name Node02 is highlighted as a link, and selecting the link, Jack is taken to the monitoring and configuration page for Node02. He clicks on the 'View Logs' link, and notices that the eth0 interface went down at 4:30AM. He then clicks the 'View Cluster' link for the cluster that this node is a member of, and notices that the NFS service and the Samba service that were running on Node02 are now running on Node03. Failover went smoothly.
- Jack is also informed by the event monitoring system within the management portal that one of the machines he monitors storage on has reached 90% usage on a critical Logical Volume (Jack has a 'Watch' set on this volume). Clicking on the highlighted name of the Logical Volume, he is taken to the Storage Management page for that system, with the nearly full volume centered on the page. He clicks a button labeled 'Extend Volume' and is greeted with a form for extending the size of the volume.
Choosing a new disk that is not yet online, he initializes it, adds it to the appropriate volume group, and then resizes both the Logical Volume and the file system running above it, with one click of a confirmation dialog.
Architecture
Conga is composed of the following:
- A component called ricci, which loosely stands for 'remote configuration interface' and is installed as an agent on any machine to be monitored or configured. Details on ricci follow later.
- A component called luci, which provides the view and control front end for ricci.
- A development and build plan.
- A deployment strategy.
- Use cases which motivate the need for the system.
Figure 1. General Conga Architecture
ricci
ricci is an agent that runs on any machine or cluster node to be administered. It is written in C++ for speed and consists of several components. The following illustration shows the components of ricci and how they are deployed; the ricci pieces are in light blue.
In the diagram above, luci makes contact with the ricci dispatcher via a secure connection. The nature of this connection is detailed in this use case, and also described later in the Security section of this document. ricci is a daemon listening on a specific port; in our prototype code, it uses port 11111. The final choice of port would need to be registered, of course.
All communication between ricci and luci is done through XML-RPC. If luci needs to obtain storage configuration information for a system that a user is monitoring, or to effect a change in the configuration of that system, the request is expressed in a versioned XML schema. The ricci dispatcher parses the incoming request, discovers that it is bound for its LVM module, and then hands the XML over to the module via DBUS and Oddjob. The LVM module is an application written in C that takes a string buffer (the incoming XML content) as input. It parses the request, acts on it, and generates a response that is handed back to the ricci dispatcher via DBUS and Oddjob, and finally to the luci server that initiated the original request.
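To make the transport concrete, here is a minimal sketch, in Python for brevity, of how a luci-style client might open the SSL connection to the ricci dispatcher on port 11111 and submit a versioned XML request. The element and attribute names (ricci_request, module, version, function) are illustrative assumptions, not the actual schema, and the real protocol frames its messages rather than reading a single buffer.

    import socket
    import ssl
    import xml.etree.ElementTree as ET

    RICCI_PORT = 11111  # port used by the prototype dispatcher

    def send_request(host, request_xml):
        """Open an SSL connection to the ricci dispatcher and exchange XML."""
        context = ssl.create_default_context()
        # The dispatcher presents its own cert, which luci pins after the
        # initial password challenge, so ordinary CA verification is
        # disabled in this sketch.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

        with socket.create_connection((host, RICCI_PORT)) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                tls.sendall(request_xml.encode("utf-8"))
                # The real protocol frames each XML document; this sketch
                # simply reads one response buffer.
                reply = tls.recv(65536)
        return ET.fromstring(reply.decode("utf-8"))

    # Hypothetical request asking the LVM module for a storage report.
    request = ('<ricci_request module="LVM" version="1.0">'
               '<function name="get_report"/>'
               '</ricci_request>')
    # response = send_request("fileserver-node02.example.com", request)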
DBUS is a system messaging bus that allows applications to communicate. It has shipped in Red Hat distributions for several releases and is a mature technology. Applications register their interfaces through config files in /etc/dbus-1, and other apps can then connect to DBUS and send messages with a destination address and method. DBUS has both C and Python bindings available for its use.
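As a rough illustration of this bus hop, the snippet below uses the dbus-python bindings to hand an XML buffer to a module over the system bus. The bus name, object path, interface, and method name shown here are invented for the example; the real addresses are whatever each ricci module registers in its DBUS config file.

    import dbus

    # Connect to the system-wide message bus that ricci and Oddjob use.
    bus = dbus.SystemBus()

    # Hypothetical registration details for an LVM module; the actual bus
    # name, object path, and interface come from the module's config file.
    proxy = bus.get_object("com.redhat.ricci.modules.LVM",
                           "/com/redhat/ricci/modules/LVM")
    module = dbus.Interface(proxy, dbus_interface="com.redhat.ricci.Module")

    request_xml = '<ricci_request module="LVM" version="1.0"/>'
    # The module parses the XML buffer, performs the work, and returns a
    # response document as a string (the method name is an assumption).
    response_xml = module.process(request_xml)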
Oddjob is tightly integrated with DBUS, and is a secure way for a non-root process (in our case, the ricci dispatcher) to run root-privileged commands. Oddjob provides a framework for registering a specific process, owned by a specific user, with a specific root privilege; establishing this trust requires root privilege to configure.
A couple of notes about the ricci modules:
- Only those ricci modules that are desired need to be installed. Each ricci module is its own RPM (with an associated XML schema), and when it is installed, it registers itself by adding an entry to a flat file containing the module name, its DBUS address, and its XML identification tag, so that the ricci dispatcher will automatically discover it and begin using it. (A routing sketch based on this registry follows this list.) In fact, the ricci dispatcher has a specific set of XML-RPC directives just for this purpose, including downloading and installing additional ricci modules. If a sysadmin wishes to add a machine to a running cluster, through luci he or she can specify the machine name and the conf file for the target cluster, and ricci will pull down all necessary Cluster Suite RPMs, install them, install the ricci cluster module, copy the cluster.conf to /etc/cluster, and then start the cluster daemons!
- New modules can be created and added as needed. If it were desirable to configure network parameters on a system, a network config module could be added along with a versioned XML-RPC schema and then registered with ricci.
- In the introduction, it was mentioned that Conga was designed with flexibility as a key attribute, and that there are multiple entry points for users wishing to employ some or all of its capability. In the ricci diagram, there are three numbered balloons indicating entry points into ricci.
- Balloon #1: Any process running on the system that knows the XML schema for a particular module, and has root privilege (ricci modules run only as root), can exec a ricci module directly. This is akin to exposing a new XML interface for an application like LVM.
- Balloon #2: A process running on the machine can also connect to DBUS, and with knowledge of the Oddjob interface and address for the desired module, can invoke a ricci module in this manner. For example, a customer portal project is underway at Red Hat. If the customer portal developers would prefer not to connect to ricci, but would rather have their own agent on the system, they could still use the ricci modules. Note: Oddjob can be configured with different privileges for different interfaces. It would be trivial to allow certain users read-only XML-RPC commands.
- Balloon #3: Custom external view/controller components are free to handshake with ricci and use the ricci modules; it is not necessary to use luci. Custom components would need to use a secure connection to ricci, know the root password on the system when challenged, and be able to store a key.
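The module registration note above can be made concrete with a short sketch of how the dispatcher might route an incoming request using that flat-file registry. The file path, field order, and attribute names below are assumptions for illustration; the real dispatcher is written in C++ and its file layout may differ.

    import xml.etree.ElementTree as ET

    REGISTRY = "/var/lib/ricci/modules.registry"   # hypothetical path

    def load_registry(path=REGISTRY):
        """Return {xml_tag: dbus_address} for every installed module."""
        routes = {}
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # one module per line: name, DBUS address, XML tag
                name, dbus_address, xml_tag = line.split()
                routes[xml_tag] = dbus_address
        return routes

    def route_request(request_xml, routes):
        """Pick the module whose registered tag matches the request."""
        root = ET.fromstring(request_xml)
        module_tag = root.get("module")          # e.g. "LVM" or "cluster"
        try:
            return routes[module_tag]
        except KeyError:
            raise ValueError("no module registered for %r" % module_tag)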
- ricci would provide several important new levels of functionality to Cluster Suite:
- ricci provides a way through a UI to start a cluster or stop a cluster with one click.
- ricci provides a way, by incorporating deploy-tool functionality, of removing the tedium of cluster package installation and of copying cluster.conf to new nodes.
- ricci provides a way of knowing that a node has been fenced, or to explicitly fence a node remotely -- handy for initial deployment and testing.
- ricci provides a way for the new SNMP/CIM-based cluster monitoring component to asynchronously upload monitoring data to a luci server for aggregation and processing.
- ricci provides a way of collecting general log information and setting watches on cluster parameters, such as when a service fails over or when a locking problem has become evident.
- There would be a way of discovering existing clusters in a programmatic manner. When pinged, the ricci dispatcher could return the simplest of "I'm Here!" messages; and if the machine was a member of a Red Hat Cluster Suite cluster, it could return the name of the cluster. luci, or a compatible web UI, could scan a subnet or address range looking for clusters, and when "Create a new node" is initiated, the user could choose from a list of clusters. (A simple scanning sketch follows this list.)
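A discovery pass of this kind could be as simple as the sketch below, which probes an address range for the ricci port and records which hosts answer. The address range is an example, and the follow-up "ping" exchange is only described in comments because its message format is not defined here.

    import socket

    RICCI_PORT = 11111

    def probe(host, timeout=0.5):
        """Return True if something is listening on the ricci port."""
        try:
            with socket.create_connection((host, RICCI_PORT), timeout=timeout):
                return True
        except OSError:
            return False

    def scan(prefix="192.168.1.", first=1, last=254):
        """Scan a /24 for candidate ricci hosts (addresses are examples)."""
        return [prefix + str(i) for i in range(first, last + 1)
                if probe(prefix + str(i))]

    # hosts = scan()
    # A fuller implementation would then send the ping request and read the
    # "I'm Here!" reply, including the cluster name if the node is a member
    # of a Red Hat Cluster Suite cluster.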
luci
luci is an application server that provides a web interface for ricci. At this writing, the GPL-licensed Plone package is a strong contender for this important component. Plone is a highly configurable portal/content management system built on Zope (which is distributed under a license that the FSF agrees is GPL-compatible). Both elements are written in Python and integrate well with our existing Python UI model and CGI code.
The biggest advantage of using Plone is the user/role management capability that comes with it. With luci built upon Plone, a luci administrator can create user accounts and create roles that those accounts are privileged to employ. This allows the administrator to configure which systems a user is able to manage, and which ricci modules on a system a user is allowed to access.
luci has a database associated with it that will do the following (a small persistence sketch follows the list):
- Persist secure keys exchanged with each ricci module it encounters. This means a luci user needs to enter the root password for a ricci-enabled machine only once: single sign-on. Only the top-level administrator at a luci site may add a new ricci system into the database.
- Save user portal preferences. Each luci user is free to customize their view of luci as they desire: portlet choice, colors, watches, and so on, all within their user roles, of course.
- Collect monitoring information and display it quickly when needed.
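As a rough sketch of the kind of persistence involved (not a description of the actual luci/Plone storage, which would live in Zope's own object database), the snippet below keeps per-system certificates and per-user preferences in SQLite. The table and column names are invented for the example.

    import sqlite3

    def open_store(path="luci.db"):
        """Create the (hypothetical) tables luci would need and return a handle."""
        db = sqlite3.connect(path)
        db.executescript("""
            CREATE TABLE IF NOT EXISTS systems (
                hostname   TEXT PRIMARY KEY,
                ricci_cert TEXT NOT NULL        -- pinned SSL certificate (PEM)
            );
            CREATE TABLE IF NOT EXISTS preferences (
                username TEXT,
                key      TEXT,                  -- e.g. 'portlets', 'colors'
                value    TEXT,
                PRIMARY KEY (username, key)
            );
        """)
        return db

    def remember_system(db, hostname, cert_pem):
        """Persist the cert so the user never re-enters the root password."""
        db.execute("INSERT OR REPLACE INTO systems VALUES (?, ?)",
                   (hostname, cert_pem))
        db.commit()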
luci also provides a way for us to display context-aware cluster documentation in a doc portlet. If a user is unsure of a fence device field, the explanation of the field is at their fingertips in an adjacent portlet.
Security
The Ricci Agent:
Ricci is written in C++. It is a daemon that runs as a non-root system account and listens on port 11111. The SELinux policy for ricci only allows it to talk to DBUS and PAM. When it receives an authentication request, it passes the incoming password to PAM for authentication. If successful, ricci stores the SSL cert for the calling luci server in /var/lib/luci/certs. Only the ricci user can write to this directory.
Because ricci runs under a system account, it will not dump a core file. The incoming password is not persisted anywhere and resides only in volatile memory. Even though the chance of a memory page from a system account being viewed is slim, we scramble the memory used for the password string: the password is processed with a C++ string class whose custom allocators/deallocators scramble the password's memory after use.
After authentication, requests and commands are sent to the ricci agent as XML over the SSL connection. Ricci reads the XML, determines which ricci module it is bound for, and then delivers it to the module via DBUS and Oddjob (Nalin's excellent trusted authority layer for DBUS). Note that the buffer holding the incoming XML is scrambled immediately after dispatch, because fence agent passwords as well as iSCSI passwords may be embedded within the XML.
After the incoming XML's destination is determined, the XML makes its way via DBUS and Oddjob to the stdin of the target module. The ricci modules are simply C applications that are exec'd by Oddjob as root-privileged applications, able to invoke needed root-privileged tools such as lvm2, the cluster tools, chkconfig, etc. The module generates a response and returns it via DBUS to ricci, which in turn passes it on to the luci server that originated the request. Ricci modules are prevented from exec'ing each other by SELinux policy.
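To show the module contract end to end, here is a minimal module skeleton, written in Python for brevity even though the real modules are C applications: it reads an XML document from stdin, dispatches on a function name, and writes an XML response to stdout for Oddjob/DBUS to carry back. The element names, the "get_report" function, and the choice of lvs output are assumptions for illustration.

    #!/usr/bin/python
    # Minimal sketch of the ricci module contract: XML in on stdin,
    # XML out on stdout.  Element and attribute names are assumptions.
    import sys
    import subprocess
    import xml.etree.ElementTree as ET

    def get_report():
        """Gather a trivial storage report by running a root-privileged tool."""
        # Oddjob executes the module as root, so privileged tools are usable.
        out = subprocess.run(["lvm", "lvs", "--noheadings"],
                             capture_output=True, text=True).stdout
        return out.strip()

    def main():
        request = ET.fromstring(sys.stdin.read())
        response = ET.Element("ricci_response",
                              module=request.get("module", ""))
        func = request.find("function")
        if func is not None and func.get("name") == "get_report":
            ET.SubElement(response, "report").text = get_report()
            response.set("status", "ok")
        else:
            response.set("status", "unknown_function")
        sys.stdout.write(ET.tostring(response, encoding="unicode"))

    if __name__ == "__main__":
        main()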