Conga - A Management platform for Cluster and Storage Systems


Introduction

Conga is a next-generation management interface for cluster and storage configuration/monitoring, targeted for inclusion in RHEL5. It seeks to draw together several disparate management systems into one seamless, intuitive interface.

Conga is not a monolithic, end-to-end system. While it can function as a complete remote management solution, it also allows users to replace components with elements better suited to their business needs, and to scale and customize Conga to their requirements. As the Architecture section below demonstrates, Conga can be adopted at a variety of entry points and commitment levels.

The objectives of the Conga project include:

Use cases and User Scenarios: An extensive set of Use Cases can be found here.

User Scenario #1: Jack is a system administrator for his company. His primary responsibility is to keep his company's file server and outward-facing webservers up and running. When he arrives at work in the morning, he starts a browser and selects a bookmark for his company's file server cluster. After logging into the management portal, he is greeted with a message that two critical events have occurred since he was last logged in:

  1. Node02 in the fileserver cluster was fenced at 4:31AM that morning. The name Node02 is highlighted as a link, and selecting the link, Jack is taken to the monitoring and configuration page for Node02. He clicks on the 'View Logs' link, and notices that the eth0 interface went down at 4:30AM. He then clicks the 'View Cluster' link for the cluster that this node is a member of, and notices that the NFS service and the Samba service that were running on Node02 are now running on Node03. Failover went smoothly.
  2. Jack is also informed by the event monitoring system within the management portal that one of the machines he monitors storage on reached 90% usage on a critical Logical Volume (Jack has a 'Watch' set on this volume). Clicking on the highlighted name of the Logical Volume, he is taken to the Storage Management page for that system, with the nearly full volume centered on the page. He clicks a button labelled 'Extend Volume' and is greeted with a form for extending the size of the volume. Choosing a new disk that is not yet online, he initializes it, adds it to the appropriate volume group, and then resizes both the Logical Volume and the file system running above it, with one click of a confirmation dialog.

Architecture

Conga is composed of the following:
  1. A component called ricci, which loosely stands for 'remote configuration interface' and is installed as an agent on any machine to be monitored or configured; details on ricci follow later.
  2. A component called luci, which provides the view and control layer on top of ricci.
  3. A development and build plan.
  4. A deployment strategy.
  5. Use cases which motivate the need for the system.


Figure 1. General Conga Architecture

ricci
ricci is an agent that runs on any machine or cluster node to be administered. It is written in C for speed, and consists of several components. The following illustration shows the components of ricci and how they are deployed. The ricci pieces are in light blue.

In the diagram above, luci makes contact with the ricci dispatcher via a secure connection. The nature of this connection is detailed in this use case, and also described later in the security section of this document. ricci is a daemon listening on a specific port. In our prototype code, it uses port 11111. The final choice of port would need to be registered, of course.

All communication between ricci and luci is done through XML-RPC. If luci needs to obtain storage configuration information for a system that a user is monitoring, or to effect a change in the configuration of that system, the request is expressed in a versioned XML schema. The ricci dispatcher parses the incoming request, discovers that it is bound for its LVM module, and then hands the XML over to the module via DBUS and Oddjob. The LVM module is an application written in C that takes a string buffer (the incoming XML content) as input. It parses the request, acts on it, and generates a response that is handed back to the ricci dispatcher via DBUS and Oddjob, then finally to the luci server that initiated the original request.
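The request/parse/route/respond flow described above can be sketched in miniature. This is an illustrative model only: the schema, element names, and module registry here are hypothetical, and plain function calls stand in for the DBUS/Oddjob transport that carries the XML to the real C modules.

```python
import xml.etree.ElementTree as ET

def lvm_module(request_xml: str) -> str:
    """Stand-in for the LVM module: parse the XML request and acknowledge it.
    (A real module would perform the LVM operation here.)"""
    req = ET.fromstring(request_xml)
    func = req.find("function_call").get("name")
    resp = ET.Element("ricci_response",
                      version=req.get("version"), success="0")
    ET.SubElement(resp, "function_response", name=func)
    return ET.tostring(resp, encoding="unicode")

# Hypothetical module registry: the dispatcher routes by the module
# named in the request.
MODULES = {"storage": lvm_module}

def dispatch(request_xml: str) -> str:
    """Model of the ricci dispatcher: parse the versioned request,
    discover which module it is bound for, and hand the XML over."""
    req = ET.fromstring(request_xml)
    return MODULES[req.get("module")](request_xml)

request = ('<ricci_request version="1.0" module="storage">'
           '<function_call name="get_volume_groups"/></ricci_request>')
print(dispatch(request))
```

The key property this models is that the dispatcher never interprets module-specific content; it only reads enough of the envelope to route the raw XML onward, which is what lets new modules be added without touching the dispatcher.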

DBUS is a system messaging bus that allows applications to communicate. It has been in Red Hat Linux since the RH7.0 days, and is a mature technology. Applications register their interfaces through config files in /etc/dbus-1, and then other apps can connect to DBUS and send messages with a destination address and method. DBUS has both C and Python bindings available for its use.
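A hypothetical example of such a registration: a DBUS system bus policy file of the kind described above, declaring who may own a bus name and who may send messages to it. The service name here is illustrative, not Conga's actual configuration.

```xml
<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <policy user="root">
    <!-- Let root own and export the (hypothetical) module service. -->
    <allow own="com.example.ricci.Modules"/>
  </policy>
  <policy context="default">
    <!-- Let other local processes send method calls to that service. -->
    <allow send_destination="com.example.ricci.Modules"/>
  </policy>
</busconfig>
```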

Oddjob is tightly integrated with DBUS, and is a secure way for a non-root process (in our case, the ricci dispatcher) to run root-privileged commands. Oddjob provides a framework for registering a specific process, owned by a specific user, with a specific root privilege. Configuring such a trust relationship itself requires root privilege.
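A sketch of what such a registration might look like as an oddjobd configuration fragment: it ties a root-privileged helper program to a bus method and allows one specific unprivileged account to invoke it. The service name, helper path, and user account below are all made up for illustration; they are not Conga's actual configuration.

```xml
<oddjobconfig>
  <service name="com.example.oddjob_ricci">
    <object name="/com/example/oddjob_ricci">
      <interface name="com.example.oddjob_ricci">
        <method name="run_module">
          <!-- The helper runs as root; oddjobd mediates access to it. -->
          <helper exec="/usr/libexec/ricci-module-helper" arguments="1"/>
          <!-- Only the (hypothetical) ricci account may call this method. -->
          <allow user="ricci"/>
        </method>
      </interface>
    </object>
  </service>
</oddjobconfig>
```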

A couple of notes about the ricci modules: