Again, the scope of this competency has been difficult for me to determine. The only other seemingly reputable LFCE guide on the Internet (TecMint) hasn’t yet published its post on this subject, and as with the previous post (Monitoring Network Performance), I’m very interested to see how they handle it. Unlike the previous post, the scope of this competency seems able to extend almost infinitely. “System use,” after all, describes anything the system can do, and knowing how to generate reports on absolutely anything the system might do certainly covers a wide range of skills. “Outages” could be any disruption of service (a process dies unexpectedly, the NIC fails, RAM corruption, who knows?) and “user requests” could be, well, anything (as you probably know if you work in information technology).
If you ask me, the hands-down most important utility for understanding system behavior is the auditing subsystem: auditd. Auditd allows you to build rules which observe the kernel at the system call level, so you get deep introspection into the operation of the system. You can observe below every application, so regardless of what crazy code runs on your system, you get to see its effects at the lowest level observable by the user. Understanding how to use auditctl to configure auditing rules and control auditd behavior, aureport to view summaries of the auditd logs, and ausearch to locate specific events gives a system administrator a huge amount of interrogative power over a system.
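As a rough sketch of how the three tools fit together (the watched path and the key name here are just illustrations, and all of these commands require root):

```shell
# Watch a file for writes (w) and attribute changes (a); tag matching
# events with a key so they are easy to find later.
auditctl -w /etc/passwd -p wa -k passwd-watch

# List the rules currently loaded into the kernel.
auditctl -l

# Summarize audited events -- here, a per-file summary.
aureport --file --summary

# Pull up the specific events matching our key.
ausearch -k passwd-watch
```

The key (`-k`) is the glue: you attach it when building the rule with auditctl, and you filter on it later with ausearch.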
As an example, I once used auditd to determine why the Mirth application failed to run properly during a vendor installation. The process simply wouldn’t start, and the logs the vendor chose to reference apparently showed nothing of value (I myself did not check). I assumed it was attempting to access a file to which it did not have permission. The vendor was ready to give up for the day, but it only took me about five minutes to add an auditing rule looking for EPERM error values, execute the vendor script, and then search the auditd logs for the incidents occurring during the relevant timeframe. With the problematic directory’s group permission properly set, we were able to move on; this simple capability saved us hours or even days of delay.
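That troubleshooting session can be sketched roughly like this (the syscall list, the key name, and the 64-bit arch filter are assumptions for illustration; adjust them to your situation):

```shell
# Audit every open/openat call that fails with EPERM
# (assumes a 64-bit system; the key "perm-fail" is arbitrary).
auditctl -a always,exit -F arch=b64 -S open -S openat -F exit=-EPERM -k perm-fail

# ... run the failing installer or script here ...

# Pull the matching events from the recent window, with fields interpreted
# into human-readable form (-i).
ausearch -k perm-fail --start recent -i

# Remove the rule when finished (same specification, -d instead of -a).
auditctl -d always,exit -F arch=b64 -S open -S openat -F exit=-EPERM -k perm-fail
```

The interpreted output shows the failing process, the user, and the path it attempted to touch, which is usually all you need to identify the permission problem.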
My guess at the major suite of tools to be aware of for the LFCE examination is sysstat. Installed on Red Hat Enterprise Linux 6 by default, the sysstat package is also available on CentOS 6 as part of the base repository. By default on RHEL 6, an /etc/cron.d/sysstat file is put in place to write a binary-format daily record of system activity at ten-minute intervals to /var/log/sa/sadd (where dd is the day of the month) along with a plain-text daily summary of that activity at /var/log/sa/sardd. You’ll see that these cron jobs are specified using the sa1 and sa2 scripts provided with sysstat – see their man pages for information on their purposes and uses.
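For reference, the stock cron file looks roughly like this (exact paths and times may differ between sysstat versions; on 32-bit systems the scripts live under /usr/lib/sa instead):

```
# /etc/cron.d/sysstat (typical RHEL 6 contents)
# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A
```

sa1 takes an interval and a count (here, one 1-second sample per invocation), while sa2 -A produces the daily plain-text report from everything sa1 collected.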
The package includes the sar (“system activity reporter”) utility for reporting the contents of activity counters in the Linux kernel. It lends itself readily to generating reports and can access a very wide range of information – CPU utilization, RAM utilization, disk I/O, paging statistics, interrupt requests (!), power management statistics, and some really interesting network statistics, including NFS-oriented information such as the number of RPC requests made per second, broken down into read/write/access/getattr if specified. See this handy table for all of the sar functionality broken out in an easily understood manner.
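A few illustrative invocations (interval and count values are arbitrary, and the saved-file path assumes the RHEL 6 default location):

```shell
# CPU utilization, sampled every 2 seconds, 5 times.
sar -u 2 5

# Memory utilization.
sar -r 2 5

# Paging statistics.
sar -B 2 5

# NFS client activity: RPC calls per second, including
# read/write/access/getattr breakdowns.
sar -n NFS 2 5

# Read back a previous day's binary record (here, the 15th of the month).
sar -u -f /var/log/sa/sa15
```

The interval/count pair means "sample live"; -f means "replay a recorded file" – the same report options work in both modes.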
In addition to sar, the sysstat package includes some other handy utilities. Perhaps the most useful is pidstat, which allows the administrator to specify processes of interest and scrutinize their resource usage.
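For instance (the process name and PID here are placeholders):

```shell
# CPU usage for processes whose command name matches a pattern,
# sampled every second, 5 samples.
pidstat -C mirth 1 5

# Disk I/O (-d) and memory/page-fault (-r) statistics for a single PID.
pidstat -d -p 1234 1 5
pidstat -r -p 1234 1 5

# Include per-thread statistics for that PID.
pidstat -t -p 1234 1 5
```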
Despite the above guess (made primarily because of the sysstat package’s relative simplicity and accessibility), the performance-reporting software of choice in the RHEL/CentOS 6 environment is Performance Co-Pilot (see the User’s and Administrator’s Guide for everything you need to know). It is recommended specifically in the Red Hat Enterprise Linux 6 Performance Tuning Guide, and it is extremely versatile and feature-rich. Learning how to use it is valuable not only for mastering an excellent system usage and performance monitoring and reporting tool, but also for learning how to think about the act of monitoring and reporting performance and usage in general (which is also something learned from studying sysstat).
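To give a flavor of the PCP command-line tools (the metric names below are standard PCP metrics, but what is actually available depends on the pmcd daemon running and the agents installed):

```shell
# List available performance metrics under a namespace -- here, disk metrics.
pminfo disk

# Describe a single metric, including its one-line and full help text.
pminfo -tT disk.all.read

# Sample a metric's current value once per second, five times.
pmval -s 5 kernel.all.load
```

Where sar reports fixed groups of counters, PCP exposes a browsable metric namespace, which is part of what makes it so versatile.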
Those are the most valuable tools I use in the scope of this competency. There is also, of course, the slew of basic tools for discovering rudimentary information: uptime reveals how long the system has been running since boot (along with load averages), last gives a quick-and-dirty list of user login events, w shows the users currently logged into the system and their present activity, and top is the basic real-time system usage monitor. All of these tools are relatively straightforward (though top is more versatile than the others) and take very little time to understand and use effectively. They are frequently used for quick system insight, but their functionality is also subsumed by the far more robust auditd and sysstat/pcp packages.
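For completeness, the quick-look versions of those commands:

```shell
# Time since boot, user count, and load averages.
uptime

# The ten most recent login events (read from /var/log/wtmp).
last -n 10

# Who is logged in right now and what they are running.
w

# One batch-mode snapshot of top, suitable for piping or logging.
top -b -n 1 | head -n 15
```

top's batch mode (-b) is worth remembering: it turns an interactive monitor into something you can capture in a script or cron job.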
- Manual pages
  - Generally, section 8 of the manual pages corresponding to the commands below, but also:
  - pmie (very interesting, but likely outside the scope of the LFCE)
- Your choice of location
- Monitor specific system metrics for a predetermined time and duration.
- Monitor specific system metrics for an activity which can be started and stopped by the administrator to observe the performance implications.
- Configure routine performance monitoring to occur at regular intervals every day.
- Examine historical performance monitoring logs for one or multiple hosts and extract information relevant to a given inquiry.
- Correlate performance metrics between multiple hosts engaged in interrelated activities.
- Examine performance monitoring data and recommend system improvements to optimize performance.
If you have a Fedora workstation at home (and I really recommend it if you work with Linux professionally), you can install the sysstat or pcp software and start logging away. Being a performance optimization junkie myself, I enjoy seeing performance reports during gaming or virtual machine execution, for example. There are a million ways to monitor your system, so just get started! Focus on the high-level stuff like CPU, memory, and disk utilization and go from there.
My recommendation is that you challenge yourself to run high-end software on low-end hardware. On my $400 system sporting a simple AMD APU (A10-6800K), for example, I still manage to play Left 4 Dead 2 with my friends, most of whom have $1000+ systems. Do I max out all the graphics settings? No. But performance monitoring can tell me where I’m hitting bottlenecks so that I can incrementally improve my experience.
If you don’t have a Linux workstation, you may be hard-pressed to generate enough usage on a home-made Linux server to be very interesting, but if that’s what you have, go for it! If you’re not sufficiently using your server, find software that interests you and see if you can get more usage out of the system with it.
Use sar to kick off a monitoring session (make it a background job) for a gaming session, or perhaps a Phoronix Test Suite benchmark run. Build an NFS system (that’ll be part of a later competency!) and use the NFS-specific metrics to monitor the server while you have a client utilize its services. The basic approach here is to do something interesting with your system and log the relevant metrics. Challenging yourself to determine the proper metrics to investigate and subsequently implement the monitoring solution is the way to train for this competency.
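A background monitoring session along those lines might look like this (the file name and the 10-second/one-hour sampling window are arbitrary choices):

```shell
# Record system activity to a binary file (-o) every 10 seconds,
# 360 times (one hour), in the background.
sar -o /tmp/gaming-session.sa 10 360 &

# ... play the game / run the benchmark ...

# Afterwards, replay the recorded counters: CPU, then per-interface network.
sar -u -f /tmp/gaming-session.sa
sar -n DEV -f /tmp/gaming-session.sa
```

Because the recording is binary and replayable, you can slice the same session several different ways after the fact without re-running the workload.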