EYE Frequently Asked Questions
------------------------------
(February 2010)

EYE.FAQ VERSION = '5.3.0'

The latest version of this document can always be found at
http://eye.glacierconsulting.net/public/FAQ

Sections:
  * GENERAL
  * IDIST
  * IHEALTH
  * IREAD
  * IGREP
  * ICOL
  * PROBLEMS

GENERAL
-------

Q: What is EYE?

EYE is Glacier Consulting's suite of operating system analysis and support
utilities. A data collection utility is run on each target system, and the
data is analysed either through IHEALTH or manually using various utilities.

Q: How do the EYE utilities select targets?

To specify an individual set of servers, use:

    > idist OPERATION -h SERVER1,SERVER2,SERVER3 ...

To specify a group of servers, stipulate the group at the end of the
command:

    > idist OPERATION -g group1,group2,groupN

To target the hosts of a particular managed system (see note *):

    > idist OPERATION -m system1,system2,systemN

* This requires that data collection of each individual host and its
  associated HMC has taken place at least once.

Q: It appears as if only IDIST and IHEALTH support the above?

That is correct, since IDIST and IHEALTH target multiple hosts. IREAD, IGREP
and IDOC need a host or file argument, since they target individual hosts.

Q: Why are there different versions of the program for AIX?

The perl libraries for each release of AIX are different.

IDIST
-----

Q: Why is there a need for IDIST? What does it do?

IDIST is a utility that executes a program or command in parallel on
multiple remote servers. In the context of health checks, IDIST can run a
health check in parallel on multiple servers and retrieve the collected
output files to a local Central Monitoring Server (CMS). It can also install
new versions of the collector utility and expire old collector files on each
remote server, and it can run commands or a script on multiple servers at
once.

Q: How many tasks will IDIST run in parallel at a time?

By default, 10.
This can easily be changed in the ICONFIG tool under Global Settings.

Q: Why does IDIST start tasks in a different order every time it is run?

The program intentionally submits the previously slowest jobs first. By
scheduling the slowest tasks first, they effectively get a longer period of
parallel execution time, which in turn reduces the overall run-time.

Q: The collector directory on my client machines is getting full, can it be
   managed?

You can expire the older collector output files by running the following
from your CMS:

    > idist expire -x 7 -r /var/eye all

In this example all collector files older than 7 days are expired. It is
best to add this command to a cron-job and schedule it once a week.

IHEALTH
-------

Q: What is the purpose of IHEALTH?

IHEALTH is an operating system analysis tool. It raises alerts for
conditions that may result in downtime or that deviate from best practice
standards. It is not intended as a complete replacement for system audit
tools, since its primary focus is availability.

Q: What filtering options does IHEALTH have?

Severity filtering can be configured so that only HIGH, MEDIUM or LOW
severity exceptions are displayed. There are also options to suppress AUDIT,
SECURITY and PERFORMANCE related exceptions.

The IHEALTH utility also has a DISPLAYMODE setting, which can be set to
"brief" or "full". As an example, IHEALTH will only show "14 insecure
network options" in "brief" mode, whereas a full description of each
insecure network option will be displayed in "full" mode.

For more info, see the section on ihealth.rc in README.ihealth.

Q: Do I really have to type the long collector file name as an argument to
   IHEALTH?

No. You simply specify the host name of a server.

Q: What if I disagree with the reported exceptions?

Hey, we're not perfect. Any incorrect results will be fixed with high
priority. Tests are conducted on hundreds of systems before releasing
IHEALTH.

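Note on the IDIST scheduling order (see "Why does IDIST start tasks in a
different order every time it is run?" in the IDIST section): the effect of
submitting the slowest jobs first can be illustrated with a small
simulation. The Python sketch below is not IDIST code. The function name,
the run times and the three parallel slots are made-up values that keep the
example small (IDIST itself defaults to 10 parallel tasks).

```python
import heapq


def makespan(durations, workers):
    """Total run time when tasks start in the given order on `workers` slots.

    Hypothetical model for illustration: each task goes to whichever slot
    becomes free first.
    """
    slots = [0] * workers          # time at which each slot becomes free
    heapq.heapify(slots)
    for duration in durations:
        start = heapq.heappop(slots)            # earliest free slot
        heapq.heappush(slots, start + duration)
    return max(slots)


# Made-up run times (seconds) per host, as if recorded during a previous run.
previous_run = [5, 5, 5, 5, 5, 5, 60]

fastest_first = makespan(sorted(previous_run), workers=3)
slowest_first = makespan(sorted(previous_run, reverse=True), workers=3)
print(fastest_first, slowest_first)
```

With these made-up numbers the fastest-first order finishes after 70
seconds and the slowest-first order after 60, because the 60-second job no
longer starts last.
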
Q: IHEALTH does not seem to report on resources I'm interested in?
Q: I require a comprehensive audit for my systems.

It is also possible to submit your collector file to Glacier's team of
experts for further analysis. Run 'icol -r 2' and send the output file for
analysis. Glacier's industry-acknowledged team specialises in clustering,
security, backup and RAS (Reliability, Availability and Serviceability) on
several different UNIX platforms. Contact details are provided below.

Q: Does IHEALTH allow for any specific/custom thresholds?

Yes, these are documented in ihealth.rc and ICONFIG.

Q: Why do the error-log (errpt) results seem much smaller than my system
   error log?

IHEALTH condenses errpt entries into unique errors. It counts duplicate
errors, but does not display them.

Q: What exceptions does IHEALTH check for? Why is the report so small?

IHEALTH only reports on exceptions, and will by default also only generate
graphs for the exceptions that were identified. The reports are usually
smaller if filtering is active. The following categories are checked:

    Platform     - Firmware and platform specific checks.
    Sensors      - System hardware sensors.
    Availability - System downtime.
    Capacity     - Filesystem checks, incl. free space.
    Processes    - Analysis of the active process state.
    HACMP        - HACMP related sanity checks.
    Services     - Subsystem checks.
    Performance  - CPU, memory, disk & network. SAR spike and trend analysis.
    Memory       - VMM related checks.
    System logs  - Scans for exceptions.
    Security     - Scans logs & settings.
    Network      - Scans system counters & interfaces for exceptions.
    Storage      - Checks for LVM, internal & external disk subsystem issues.
    LPPs         - Checks LPP configuration & performs APAR analysis.

IHEALTH includes over 200 individual checks.

Q: IHEALTH reports exceptions which warrant investigation, but I cannot seem
   to interpret them?

Contact Glacier Consulting.

Q: I want to see the collected data for myself?

See the IREAD and IGREP sections.

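To picture how condensing errpt entries into unique errors shrinks the
report, here is a minimal Python sketch. It is not IHEALTH code: the
function name, the choice of (identifier, resource, description) as the
"unique error" key, and the sample lines (which only imitate the errpt
summary layout) are assumptions made for illustration.

```python
from collections import Counter

# Made-up lines in the errpt summary layout; identifiers are placeholders.
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAA0001   0210093010 P H hdisk1         DISK OPERATION ERROR
AAAA0001   0210093110 P H hdisk1         DISK OPERATION ERROR
BBBB0002   0210094510 T S sysplanar0     EXAMPLE TEMPORARY ERROR
AAAA0001   0210100010 P H hdisk1         DISK OPERATION ERROR
"""


def condense(report):
    """Count occurrences of each unique error, in order of first appearance."""
    counts = Counter()
    for line in report.splitlines()[1:]:        # skip the header line
        fields = line.split(None, 5)            # description may contain spaces
        if len(fields) < 6:
            continue                            # ignore malformed lines
        ident, _timestamp, _type, _class, resource, description = fields
        counts[(ident, resource, description)] += 1
    return counts


for (ident, resource, description), n in condense(SAMPLE).items():
    print(f"{n:3d} x {ident} {resource}: {description}")
```

Four log entries become two report lines here, each with a count, which is
why the condensed view is much shorter than the raw error log.
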
IREAD
-----

Q: IREAD only seems to display the names of commands, and not the actual
   content?

This is the default mode of operation. You have to specify the record-type
and "-r" to read the actual output.

Q: What record-types can be used?

Supported record-types are: commands (-c), files (-f), stat (-s). Links are
not currently supported.

Q: Do I have to type the full path of each command?

For commands you only need to type the command name and its arguments. For
files, you have to type the full path.

IGREP
-----

Q: Is this similar to UNIX grep? What does it do?

IGREP allows a user to scan the command names and all output of a collector
file for a given pattern.

Q: Is it possible to find & display the full output of all "uname" commands?

Yes.

    > igrep.exe -i -c 'uname' TARGETSERVER

Q: Is it possible to only view the names of the commands, instead of all the
   output?

Yes.

    > igrep.exe -c 'uname' TARGETSERVER

Q: Is it possible to scan ALL the collected data for a pattern?

Yes. We call this extensive search-pattern matching. Examples follow:

    > igrep.exe -e PATTERN TARGETSERVER      # show matching lines for each pattern match
    > igrep.exe -x -e PATTERN TARGETSERVER   # only display record names for matches
    > igrep.exe -i -e PATTERN TARGETSERVER   # show entire record if match

ICOL
----

Q: What does ICOL do?

ICOL is a data collector. It collects the output of commands, files and
other system data that can be analysed remotely.

Q: Why not use the AIX snap utility instead?

ICOL provides:

  1. Data archiving.
  2. Customization on a per command/file level.
  3. Expiration of previously collected data versions.
  4. Collection timeouts. If multiple errors or timeouts occur, the
     collector will try to behave intelligently and non-intrusively.
  5. Run-levels. A run-level is used to group similar output - e.g. a
     run-level for disk, network, security or comprehensive system
     snapshots for offsite auditing.

Q: Can ICOL encrypt its output?

Not yet. At this stage it only compresses the output.

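The collection timeouts listed under "Why not use the AIX snap utility
instead?" can be pictured with the following Python sketch. It is not
ICOL's implementation: the function name collect, the timeout_s and
max_failures parameters, and the policy of skipping the remaining commands
after repeated failures are all assumptions made for illustration.

```python
import subprocess
import sys


def collect(commands, timeout_s=5, max_failures=2):
    """Run each command with a time limit; skip the rest after repeated failures.

    Hypothetical helper for illustration only, not part of ICOL.
    """
    results = {}
    failures = 0
    for name, argv in commands.items():
        if failures >= max_failures:
            results[name] = "skipped"    # back off instead of loading the host
            continue
        try:
            done = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout_s)
            results[name] = done.stdout
        except (subprocess.TimeoutExpired, OSError):
            failures += 1                # timed out or could not be started
            results[name] = "failed"
    return results


if __name__ == "__main__":
    # Stand-in commands; a real collector would run system commands instead.
    demo = {
        "quick": [sys.executable, "-c", "print('ok')"],
        "hung": [sys.executable, "-c", "import time; time.sleep(30)"],
        "later": [sys.executable, "-c", "print('never runs')"],
    }
    print(collect(demo, timeout_s=2, max_failures=1))
```

In the demo the command that sleeps is abandoned after two seconds, and the
command after it is skipped rather than run, which is one simple way a
collector could avoid adding load to a system that is already struggling.
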
Q: Can ICOL run continuously? Does it open any connections to the outside
   world?

No.

Q: Can ICOL be run in a quiet mode?

Yes. Use the -q flag.

Q: Is there a way to run a healthcheck in a single command on a target?

Yes. If you have purchased and installed the utilities, you can simply run
"idist health TARGETS", which will automatically run ICOL & IHEALTH.

Q: Why not just leave the collector output file on the target server?

Retrieving the output files with the IDIST utility is highly recommended.
The collector output stores critical recovery and system state information,
which can assist greatly during outages and DR situations. In addition, the
output of earlier collector files can be compared, so that trends can be
observed.

Q: Is it possible to recover from a hung or extremely unresponsive system?

If you have recently retrieved a collector output, it may be possible to
determine PIDs that can be killed one-by-one from the collected "ps" info.
This is one of the few ways to recover from a runaway process that consumes
most of the memory. If the system has enough memory to execute kill, you may
be able to recover by killing the least important applications or the
highest memory consumers first. When the system becomes responsive again,
you can quickly run "ps" to find and kill the real runaway process.

Q: Can I manually copy an AIX 5.3 icol.exe to another server running a
   different AIX version?

It is not recommended. In particular, AIX bug IY79272 may cause a system
crash on AIX 5.2 TL8 (and perhaps others).

PROBLEMS
--------

Q: I get a message "Error on line XXX"?

This could be caused by unexpected output which is not handled properly by
the program. We'd like to avoid this as far as possible and encourage you to
report it by submitting the relevant collector file to us.

Q: ICOL seems to run forever, why is this?

This has been observed when the program's cached work files are damaged or
corrupted. ICOL sets up a temporary work area under /tmp/par-root.
It is possible that a scheduled program or cron-job (such as skulker) may
remove some of the cached files in this directory. Remove the /tmp/par-USER
directory entirely if this happens. To follow ICOL's progress, watch
/tmp/icol.log.

It is also possible that ICOL is taking a long time to traverse /proc. Check
the load on the target server.

Q: I get the following error when I run any of the EYE tools on AIX:
   "Cannot open or remove a file containing a running program."

You are very likely running the IHEALTH program as a regular user. The
packager we use has a deficiency in that it does not preserve permissions
for shared libraries. You will need to find all "libgd.a" files in
/tmp/par-MYUSERNAME and change their permissions to "640" (no permissions
for others). Run "slibclean", and the error should no longer appear.

---------------------------------------------
Contact Details:

http://eye.glacierconsulting.net

Niel Lambrechts & Clifford Weinmann
Glacier Consulting
---------------------------------------------