Welcome to CAPforge!
The Cluster Administration Package (CAP) is meant to ease integration, configuration, and systems management for clusters. It is the "glue" that lets cluster administrators leverage existing technologies within a functional framework. It does this by delivering functionality in three component categories.
The first is "Information Management." Functions in this category can generate standard Unix configuration files or produce a file arrangement suitable for other clustering technologies.
The next category is "Control." It lets an administrator control the cluster as one system, using the same methods used to control a single system, such as power and console. With the addition of management devices, a cluster administrator can also control UIDs or gather system sensor data where available.
The last category is "Installation." When installing a multitude of systems, a cluster administrator wants a common method or set of methods to ensure that a set of functionality is delivered to each node in the cluster. By leveraging existing technologies, CAP can adapt to fit the needs of the other installation methods available for clusters.
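To make the "Information Management" category a little more concrete, here is a minimal sketch of a function that generates /etc/hosts-style lines from a simple node description. It is not CAP code: the function name, the node naming scheme, and the `cluster.example` domain are all assumptions made up for illustration.

```python
from ipaddress import IPv4Address


def hosts_lines(prefix, count, first_ip, domain="cluster.example"):
    """Return /etc/hosts-style lines for nodes named prefix1..prefixN.

    Hypothetical helper: the naming scheme and domain are illustrative only.
    """
    base = IPv4Address(first_ip)
    lines = []
    for i in range(count):
        name = f"{prefix}{i + 1}"
        # Adding an int to an IPv4Address yields the next address.
        address = str(base + i)
        lines.append(f"{address}\t{name}.{domain}\t{name}")
    return lines


if __name__ == "__main__":
    print("\n".join(hosts_lines("worker", 4, "10.0.0.1")))
```

The same pattern (one node inventory, many generated files) could feed other standard files, but which files CAP actually generates is not stated here.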
Objective:
Deliver the right tool sets and methods for managing and maintaining a diverse set of High Performance Computing (HPC) Linux Cluster configurations and architectures.
CAP Mantra!
- based on this http://www.hpcwire.com/hpc/1231732.html
Many organizations have used leadership analogies to describe systems. The Sandia Cplant Team used the concept of leaders and an SMU. ANL used the concept of towns and mayors. We are going to settle on the business org chart model, but we want to give credit to those we took this idea from. Thanks guys!
| Type | Business Org Chart Logical Mapping |
| CAP Configuration/Installation Root | Director |
| Scheduling/Logging | Associate Director |
| Admin/Boot | Manager |
| Login/Compute/Viz/IO Node | Workers |
This is a logical mapping of services. It doesn't mean you need a node for every role. In general we have used the following:
| Worker Node Count | Configuration |
| 256 - 512 | 2 managers (one being a director as well) |
| 1024 | 1 director node, 4 manager nodes |
| 2048 | 1 director node, 2 associate director nodes, 8 manager nodes |
| 4096 | 2 director nodes, 2 associate director nodes, 16 manager nodes |
In general we try to match 1 manager to at most 256-384 workers and figure a department (SMU/Town) needs to split when approaching 1024 workers. We believe this method allows us to grow into large, complex configurations that can still be run efficiently. In general, a node that is only a manager can't tell another department what to do. If only real world org charts worked this way!
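The sizing rule of thumb can be sketched as a small calculation. This is only an illustration, not part of CAP: the function name and defaults are assumptions, it uses the lower bound of 256 workers per manager, and it treats "approaching 1024" as "more than 1024". It reproduces the manager counts in the 1024, 2048, and 4096 rows of the table above, but it does not model director or associate director counts, or the extra manager used in the smallest configuration.

```python
import math


def plan_layout(workers, workers_per_manager=256, department_limit=1024):
    """Split workers into departments and count managers per department.

    Returns a list of (department_size, manager_count) tuples.
    Hypothetical helper; the defaults follow the rule of thumb in the text.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    departments = math.ceil(workers / department_limit)
    # Spread workers as evenly as possible across departments.
    base, extra = divmod(workers, departments)
    sizes = [base + 1 if i < extra else base for i in range(departments)]
    return [(size, math.ceil(size / workers_per_manager)) for size in sizes]


if __name__ == "__main__":
    for count in (1024, 2048, 4096):
        layout = plan_layout(count)
        managers = sum(m for _, m in layout)
        print(count, "workers:", len(layout), "departments,", managers, "managers")
```

Passing `workers_per_manager=384` gives the sparser end of the stated range.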
Below is a diagram that outlines how we perceive workflow for this layout.
#!graphviz
digraph orgchart {
node [shape=box]; Directors AssociateDirectors;
node [shape=circle]; Workers Managers;
Directors -> AssociateDirectors
Directors -> Managers
AssociateDirectors -> Managers
Managers -> Workers
}
Here is a layout for a 4096 node system with:
- 4 departments (A-D) of 1024 workers each
- each group of 256 workers being managed by one manager (4 managers to a department)
#!graphviz
digraph orgchart {
node [shape=box]; Director1 Director2 AssociateDirector1 AssociateDirector2;
node [shape=circle]; A_Managers B_Managers C_Managers D_Managers;
node [shape=circle]; A_Workers B_Workers C_Workers D_Workers;
Director1 -> AssociateDirector1
Director1 -> AssociateDirector2
Director2 -> AssociateDirector1
Director2 -> AssociateDirector2
AssociateDirector1 -> A_Managers
AssociateDirector1 -> B_Managers
AssociateDirector1 -> C_Managers
AssociateDirector1 -> D_Managers
AssociateDirector2 -> A_Managers
AssociateDirector2 -> B_Managers
AssociateDirector2 -> C_Managers
AssociateDirector2 -> D_Managers
A_Managers -> A_Workers
B_Managers -> B_Workers
C_Managers -> C_Workers
D_Managers -> D_Workers
}
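For other department counts, a graph of the same shape can be generated rather than written by hand. The following is a rough sketch under that assumption. The function name and parameters are hypothetical, and it simply mirrors the node names and edge pattern of the 4096-node diagram above.

```python
from itertools import product
from string import ascii_uppercase


def orgchart_dot(departments=4, directors=2, associates=2):
    """Build DOT text with the same structure as the 4096-node layout."""
    if not 1 <= departments <= len(ascii_uppercase):
        raise ValueError("departments must be between 1 and 26")
    depts = ascii_uppercase[:departments]
    dirs = [f"Director{i}" for i in range(1, directors + 1)]
    assocs = [f"AssociateDirector{i}" for i in range(1, associates + 1)]
    managers = " ".join(f"{d}_Managers" for d in depts)
    workers = " ".join(f"{d}_Workers" for d in depts)

    lines = ["digraph orgchart {"]
    lines.append("node [shape=box]; " + " ".join(dirs + assocs) + ";")
    lines.append("node [shape=circle]; " + managers + ";")
    lines.append("node [shape=circle]; " + workers + ";")
    # Every director can direct every associate director.
    for d, a in product(dirs, assocs):
        lines.append(f"{d} -> {a}")
    # Every associate director can direct every department's managers.
    for a, dept in product(assocs, depts):
        lines.append(f"{a} -> {dept}_Managers")
    # Managers only direct the workers in their own department.
    for dept in depts:
        lines.append(f"{dept}_Managers -> {dept}_Workers")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(orgchart_dot())
```

With the defaults, the generated edges are the same 16 edges as in the diagram above; the last loop is where the "a manager can't tell another department what to do" rule shows up.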
Supported Platforms
CAP is not dependent upon a specific vendor platform; however, it can take advantage of features available only from certain vendors, such as:
- CLI interface on HP Integrity servers
- Integrated Lights-Out (iLO) on HP ProLiant servers
- Dell Remote Access Card (DRAC) on Dell PowerEdge servers
- Advanced Systems Management Service Processor Network on IBM Netfinity servers
- IPMI interfaces
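As an illustration of the kind of per-node control these interfaces make possible, the sketch below builds `ipmitool` command lines for a list of management controllers. It does not show CAP's own interface, which is not described here: `ipmitool` is used as a stand-in, and the function names, action names, host names, and user name are all hypothetical. The commands are only constructed, not executed.

```python
# Map friendly action names (made up for this sketch) to ipmitool subcommands.
ACTIONS = {
    "power-status": ["chassis", "power", "status"],
    "power-on": ["chassis", "power", "on"],
    "power-off": ["chassis", "power", "off"],
    "console": ["sol", "activate"],
    "sensors": ["sensor", "list"],
}


def ipmitool_argv(bmc_host, user, action):
    """Return the argv list for one ipmitool call over the lanplus interface.

    The -E flag tells ipmitool to read the password from the
    IPMI_PASSWORD environment variable instead of the command line.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E"] + ACTIONS[action]


def fan_out(bmc_hosts, user, action):
    """Build the same command for every management controller in the list."""
    return [ipmitool_argv(host, user, action) for host in bmc_hosts]


if __name__ == "__main__":
    for argv in fan_out(["worker1-bmc", "worker2-bmc"], "admin", "power-status"):
        print(" ".join(argv))
```

Each argv list could be handed to `subprocess.run`, but whether a given platform accepts these commands depends on its management controller.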
CAP is meant to run independently of the architecture. It has been run on the following types:
- x86_64: EM64T, Opteron (AMD64)
- IA32: Pentium II, III, 4, Xeon
- IA64: Itanium I, II
License
CAP is licensed under the GPL v2. More information is available at: http://www.gnu.org/copyleft/gpl.html
Enjoy!
The CAPforge Team
Starting Points
- CAPforge News!!
- SourceCodeCheckout -- CAPforge Source Code Repositories
- Documentation Directory (Work in progress)
- Distribution Downloads
- Mailing lists
- Meeting Minutes
- Work Order Concept
- Device API (could also fit in pdsh?)
Other information
HOWTO Trac plugin guide for 64 bit OS installations
- Developer Pages
- TracGuide -- Built-in Documentation
- The Trac project -- Trac Open Source Project
- Trac FAQ -- Frequently Asked Questions
- CAPforge Support -- Trac Support
- scratchpad
Upcoming Meetings
cafe express
