NodeMon is a resource utilization monitor tailored to the Altix architecture but applicable to any Linux system or cluster. It supports distributed resource monitoring via the Growler software infrastructure. It is modular, with existing modules for monitoring CPU, memory, network, and NUMAlink activity.
Its most notable feature is its consolidation of a large number of statistics into a single graphical window. By displaying system behavior during a simulation's run on Columbia, NodeMon lets scientists see in real time how their codes use Columbia's resources, potentially highlighting bottlenecks and opportunities for further optimization. The tool is currently used for real-time monitoring of Columbia and of the Scientific Visualization group's graphics clusters. NodeMon was designed to provide a compact, complete overview of the Columbia system, with low-latency, high-frequency visual feedback about the system's nodes and the jobs running on them.
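
To make the modular design concrete, the following is a minimal sketch of how per-resource monitoring modules on a Linux node might be organized. It is not NodeMon's actual code or API, which this description does not specify: the names CpuModule, MemoryModule, sample_all, and the Reader callback are hypothetical, and the only real interfaces assumed are the standard Linux /proc/stat and /proc/meminfo files.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A Reader maps a file path to its text contents. Passing it in as a
# parameter (a hypothetical design choice) lets modules be exercised
# with canned data instead of a live /proc filesystem.
Reader = Callable[[str], str]


def read_file(path: str) -> str:
    """Default reader: return the contents of a real file."""
    with open(path) as f:
        return f.read()


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat.

    Only the first eight counters (user .. steal) are summed, because the
    guest counters that follow are already included in user and nice.
    """
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields)
    return total - idle, total


class CpuModule:
    """Hypothetical module: busy fraction between consecutive samples."""

    name = "cpu"

    def __init__(self) -> None:
        self._prev: Optional[Tuple[int, int]] = None

    def sample(self, read: Reader = read_file) -> Dict[str, float]:
        curr = parse_cpu_line(read("/proc/stat").splitlines()[0])
        prev, self._prev = self._prev, curr
        # The first sample has no baseline, so report 0.0 rather than guess.
        if prev is None or curr[1] == prev[1]:
            return {"cpu_busy_fraction": 0.0}
        return {"cpu_busy_fraction": (curr[0] - prev[0]) / (curr[1] - prev[1])}


class MemoryModule:
    """Hypothetical module: fraction of memory in use."""

    name = "memory"

    def sample(self, read: Reader = read_file) -> Dict[str, float]:
        values: Dict[str, int] = {}
        for line in read("/proc/meminfo").splitlines():
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                values[key] = int(parts[0])  # values are reported in kB
        total = values["MemTotal"]
        # Prefer MemAvailable when the kernel provides it; otherwise MemFree.
        free = values.get("MemAvailable", values["MemFree"])
        return {"mem_used_fraction": (total - free) / total}


def sample_all(modules: List, read: Reader = read_file) -> Dict[str, Dict[str, float]]:
    """Collect one sample from every registered module, keyed by module name."""
    return {m.name: m.sample(read) for m in modules}
```

In a sketch like this, a display loop would call sample_all at a fixed interval and redraw a single window from the returned dictionary; supporting another resource, such as network or NUMAlink counters, would mean adding one more class with the same sample method.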