This chapter describes some of the tools available for use in CallCoordinator problem determination.
These types of tools are available to assist you in problem determination:
The host computer records operational and error states during operation of the CallCoordinator system. This data resides on the system log maintained by CICS for the region in which the CallCoordinator system is running.
The CallCoordinator system makes an entry in the CICS system log as each status change or error condition occurs in the operation of the host computer. The CICS system log entries reflect a complete history of the current CallCoordinator session on the host computer.
Note: The CICS system log uses 24-hour clock time. If you are using a 12-hour clock format, be aware that the times recorded in the log will not match the times you see in CallCoordinator.
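When comparing log entries with times shown in 12-hour format, a small conversion helper may be useful. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that log times use the HH:MM:SS layout seen in the sample entries in this chapter.

```python
def to_12_hour(log_time):
    """Convert a 24-hour HH:MM:SS string to 12-hour form with AM/PM."""
    hours, minutes, seconds = (int(part) for part in log_time.split(":"))
    suffix = "AM" if hours < 12 else "PM"
    # 0 and 12 both display as 12 on a 12-hour clock.
    hours12 = hours % 12 or 12
    return f"{hours12:02d}:{minutes:02d}:{seconds:02d} {suffix}"


if __name__ == "__main__":
    print(to_12_hour("21:45:50"))
```

The conversion is done by hand rather than through locale-dependent formatting so that the AM/PM suffix is the same on every system.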
You can define your own log for CallCoordinator messages only. This log is defined on the COR General Settings panel, in the System log ID field. The product default is CSMT, which is the CICS system log, but you can override it with your own destination.
For more detailed information, see "Understanding CallCoordinator's Interaction With the CICS System Log".
CICS maintains internal trace tables to record activity in the CICS region. CICS makes an entry in a trace table each time the system initiates a CICS management program. Agent application programs can also make entries.
For more details, refer to the CICS Problem Determination Guide appropriate for the CICS release being used at your facility. This CICS guide describes other traces that are useful in analyzing specific conditions.
CICS makes a formatted printout (dump) each time a program check or abnormal end-of-task occurs. Part of the formatted printout includes a program check and abnormal-end trace table that shows information relating to the most recent program checks or abnormal ends.
For more details, refer to the CICS Problem Determination Guide appropriate for the CICS release being used at your facility.
The Advanced Communications Function/Virtual Telecommunications Access Method (ACF/VTAM) has a buffer trace. Use this buffer trace to investigate the data sent to ACF/VTAM before the data is actually passed to CICS by the communications link, through the ACF/NCP running in the communication controller.
Refer to the Advanced Communication Function for Virtual Telecommunication Access Method Diagnosis Guide for this and other tools in ACF/VTAM. Refer to the Advanced Communication Function for System Support Program (ACF/SSP) Diagnosis Guide for further information on diagnostic tools for the NCP.
This section gives brief descriptions of logs and traces on the host computer that can be used in problem determination.
The MIS log on the host computer records all call event messages, session messages, and status messages received by the host, plus some messages from the Call Tracking Manager (CTM), Load Balancing Manager (LBM), Screen Presentation Manager (SPM), and Application Program Interface (API) modules.
MIS log process data sets and backup files can be used for reports, so it is important to set up backup or archiving procedures that allow data to be extracted for reports. Refer to the CallPath CallCoordinator/CICS Application Programming Guide for information on how to use the Write MIS Record API for data collection and reports. Enter the name of your customized data collection program in the MIS Log Exit name field on the COR Telephony Settings panel (VA32).
CallCoordinator provides batch programs to print out the log. To learn more about using the MIS Log, refer to the Operations part of the CallPath CallCoordinator/CICS System Management Guide. To find out more about the record format of the log, refer to the CallPath CallCoordinator/CICS Application Programming Guide.
The CallCoordinator Trace Facility on the host computer generates trace records for use in analyzing the causes of software problems. The trace is written to a CICS journal file; the journal ID is set on the COR Telephony Settings panel. Under routine operating conditions, keep the trace turned off to avoid any impact on system performance. The trace data is intended for the IBM support group only, not for customer problem determination. For more information on using these traces, see Chapter 4, "Using the CallCoordinator Panels for Problem Determination".
CallCoordinator software records operational and error states during operation of the CallCoordinator system. It enters data into the CICS system log for the region in which the CallCoordinator system is running.
The CICS system log is re-initialized each time the CallCoordinator region is started. Therefore, CallCoordinator entries in the system log reflect a complete history of the current session.
A system log entry appears in the format:
000001 EZPACTLC 01/16/92 09:45:50 EZP0503I BEGIN CALLCOORDINATOR TABLE LOAD
The format meaning is:
The parts of the system log entry are defined as follows:
For detailed information about looking at or printing the CICS system log, refer to the CICS for OS/390 Problem Determination Guide appropriate for the CICS release being used at your facility.
See Chapter 6, "CallCoordinator Messages" for a discussion of the content and causes of CallCoordinator messages and suggested responses.
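If you need to filter or sort system log entries outside CICS, the sample entry shown earlier in this section can be split into fields with a short script. The Python sketch below is an assumption-laden illustration: the field names (sequence, module, date, time, message_id, text) are guesses based on the appearance of the sample line, not definitions taken from this guide, and the function name is hypothetical.

```python
from typing import NamedTuple


class SystemLogEntry(NamedTuple):
    # Field names are assumed from the look of the sample entry.
    sequence: str
    module: str
    date: str
    time: str
    message_id: str
    text: str


def parse_system_log_entry(line):
    """Split one log line into six whitespace-separated fields.

    The sixth field keeps the rest of the line, so message text
    containing blanks is preserved.
    """
    parts = line.split(maxsplit=5)
    if len(parts) != 6:
        raise ValueError(f"unexpected system log entry: {line!r}")
    return SystemLogEntry(*parts)


if __name__ == "__main__":
    sample = ("000001 EZPACTLC 01/16/92 09:45:50 "
              "EZP0503I BEGIN CALLCOORDINATOR TABLE LOAD")
    print(parse_system_log_entry(sample))
```

A parsed message_id can then be looked up in Chapter 6, "CallCoordinator Messages".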
The system administrator uses the COR Operations panel (VA10), option 1 from V800, to turn trace record generation on or off. In routine operation, keep the trace facility turned off to avoid logging unnecessary messages and slowing down the system. When a problem occurs that requires a trace, turn it on.
You will be directed by your system programmer or Level 1 or 2 program support to turn on certain traces during the problem determination process.
To verify that you have collected trace data, check that the trace record format includes the source module identifier, a unique record identifier, a date/time stamp, and a unique text description.
A trace entry appears in the format:
EZPSAGTC MAIN 01/16/92 09:33:50 I 000420 = TEXTTEXTTEXT
The format meaning is:
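To check collected trace records against the sample entry shown above, a pattern match can confirm that each record carries a source module identifier, a record identifier, a date/time stamp, and a text description. The Python sketch below is only an illustration: the group names label and flag are placeholders for the second and fifth fields of the sample, whose meaning is not stated here, and the choice of the numeric field as the record identifier is an assumption.

```python
import re

# Pattern modelled on the sample trace entry; group names are assumptions.
TRACE_PATTERN = re.compile(
    r"^(?P<module>\S+)\s+(?P<label>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<flag>\S+)\s+(?P<record_id>\d+)\s+=\s+(?P<text>.*)$"
)


def parse_trace_entry(line):
    """Return the fields of one trace line as a dict, or raise ValueError."""
    match = TRACE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognizable trace entry: {line!r}")
    return match.groupdict()


if __name__ == "__main__":
    sample = "EZPSAGTC MAIN 01/16/92 09:33:50 I 000420 = TEXTTEXTTEXT"
    print(parse_trace_entry(sample))
```

Records that raise ValueError do not follow the sample layout and are worth inspecting by hand.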
For the tools for this product, refer to the CallPath/CICS for OS/390 System Management Guide and the CallPath/CICS for OS/390 Application Programming Guide.
For CallPath SwitchServer/2® tools, refer to Using CallPath SwitchServer/2.
The system administrator can use this panel under normal operating conditions to check the status of the system and long running tasks, and to view table usage statistics.
You can see information on the status of the system and of the long running tasks in CallCoordinator. Long running tasks (LRTs) begin running when the system is started and run in the background until the system is shut down. They need no operator instruction unless an error condition occurs. You can refer to the CICS system log to determine whether an error condition exists.
When any long running task is not active, the system as a whole is not active. To make the system active, identify which LRT is inactive and correct its problem.
See the discussion in Chapter 4, "Using the CallCoordinator Panels for Problem Determination" for specific information on performing the problem determination tasks.
On panel V800, choose option 3, then 6 against the required COR, to display the COR Recovery Policy panel (VA31).
VA31 CallCoordinator V 2.1 System Configuration
COR Recovery Policy
For COR : CORD
Recovery Policy
Recovery type... 2              Recovery method..... A
  0 - No Recovery                 A - Automatic
  1 - Local Restart               M - Manual
  2 - Alt Restart
  3 - Resource Relocation
Recover Agents.. Y              Alternate SysID .... CORF
COR SysID: CORD TermID: P125
F1=Help F3=Exit F5=Refresh F12=
Use this panel to define or to update the recovery policy for the selected COR. You may specify Automatic or Manual Recovery, the number of attempts to recover and the Recovery Type.
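The recovery type may be specified as one of the following values, as listed on the panel: 0 (No Recovery), 1 (Local Restart), 2 (Alt Restart), or 3 (Resource Relocation).
The above actions take place if Recovery Method is Automatic; otherwise, a console message is written asking the operator for instructions.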
The Recover Agents field, when set to Y (Yes), tells the Recovery Manager to sign agents on to the recovered system if resources are recovered in an alternate COR. Agents are not signed off if recovery takes place locally.
See the Planning part of the CallPath CallCoordinator/CICS System Management Guide for more information about recovery policy, and for details of the Switch Detail panel (VA42). The Switch Detail panel specifies a Secondary COR for the switch, which the Recovery Manager uses if the switch fails.
If the recovery method is Automatic, the Recovery Manager will perform Automatic recovery and restart based on the recovery policy for the failing COR, as defined on the COR Recovery Policy panel (VA31).
If the recovery method is Manual, the Recovery Manager will prompt the operator, via WTORs to the system console, to direct the recovery and restart operation.
If Recovery Type is No Recovery, Recovery Method is ignored.
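The way Recovery Type and Recovery Method combine can be summarized in a short decision function. The Python sketch below is a model for illustration only, not the Recovery Manager's implementation: the class, function, and returned strings are hypothetical, while the codes and names are taken from the VA31 panel shown above.

```python
from dataclasses import dataclass

# Codes and names as displayed on the COR Recovery Policy panel (VA31).
RECOVERY_TYPES = {
    0: "No Recovery",
    1: "Local Restart",
    2: "Alt Restart",
    3: "Resource Relocation",
}
RECOVERY_METHODS = {"A": "Automatic", "M": "Manual"}


@dataclass
class RecoveryPolicy:
    recovery_type: int
    recovery_method: str
    recover_agents: bool = False
    alternate_sysid: str = ""


def next_action(policy):
    """Describe what happens for a failing COR under the given policy."""
    if policy.recovery_type not in RECOVERY_TYPES:
        raise ValueError(f"unknown recovery type: {policy.recovery_type}")
    if policy.recovery_type == 0:
        # With No Recovery, the recovery method is ignored.
        return "no recovery"
    if policy.recovery_method not in RECOVERY_METHODS:
        raise ValueError(f"unknown recovery method: {policy.recovery_method}")
    type_name = RECOVERY_TYPES[policy.recovery_type]
    if policy.recovery_method == "A":
        return f"perform {type_name} automatically"
    # Manual: the operator is prompted on the system console.
    return f"prompt operator (WTOR) before {type_name}"


if __name__ == "__main__":
    print(next_action(RecoveryPolicy(2, "A", True, "CORF")))
```

The check for type 0 comes first so that the method is never examined when no recovery is requested, which mirrors the rule that Recovery Method is ignored in that case.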
Transaction V884 monitors the CallCoordinator system and causes a shutdown (or purge) when it determines that one or more Long Running Tasks (LRTs) are not responding. V884 continually checks the status of each LRT using a CICS INQUIRE TASK command. If any of the LRTs does not appear in the returned Task list, Shutdown is invoked. When this happens, the system status is set to PURGED and recovery is attempted according to the Recovery Policy for that COR.
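The check that V884 performs can be pictured as a comparison between the expected long running tasks and the task list that CICS returns. The Python sketch below is a simplified model and does not call CICS: the task list is passed in as a plain list standing in for the result of INQUIRE TASK, and the function names, task names, and the ACTIVE status label are hypothetical. Only the PURGED status comes from the description above.

```python
def missing_lrts(expected_lrts, active_tasks):
    """Return the expected LRTs that are absent from the active task list."""
    active = set(active_tasks)
    return [task for task in expected_lrts if task not in active]


def check_system(expected_lrts, active_tasks):
    """Model one monitoring pass: any missing LRT leads to shutdown."""
    missing = missing_lrts(expected_lrts, active_tasks)
    if missing:
        return {
            "status": "PURGED",
            "missing": missing,
            "action": "shutdown, then recover per COR Recovery Policy",
        }
    return {"status": "ACTIVE", "missing": [], "action": "none"}


if __name__ == "__main__":
    expected = ["LRTA", "LRTB", "LRTC"]       # hypothetical task names
    returned_by_cics = ["LRTA", "LRTC", "XYZ1"]
    print(check_system(expected, returned_by_cics))
```

In this model a single missing task is enough to mark the whole system as purged, matching the rule that the system is not active unless every long running task is active.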
A transaction dump is taken from the original LRT failure, and should be provided to CallCoordinator support to assist diagnosis.
Before the recovery is attempted, transaction V900 is started to issue an abend with code CP99. It is recommended that abend CP99 be set to produce a full system dump, to help with diagnosis by CallCoordinator support. Issuing this unique abend code makes it unnecessary to take a system dump for more common abend codes, such as ASRA and AICA. However, if your installation already takes system dumps for the common abend codes, you do not need to take one for CP99.
CallCoordinator maintains an in-core Trace table with 1000 entries. If Tracing has been set on (through the VA10 panel), these entries are continually written to a CICS Journal file. If Tracing is not already set on, it is set on for 5 seconds in the event of an LRT failure in order to write the last 1000 in-core trace entries to the CICS Journal. The CallCoordinator support group may ask for a copy of this journal for diagnostic purposes.
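The behavior of a fixed-size in-core trace table can be modelled with a bounded queue. The Python sketch below is an illustration under stated assumptions, not the CallCoordinator implementation: it assumes that the oldest entry is discarded when the table is full, it uses an ordinary list as a stand-in for the CICS journal, and all class and method names are hypothetical.

```python
from collections import deque


class InCoreTrace:
    """Bounded trace table that keeps only the most recent entries."""

    def __init__(self, capacity=1000):
        # deque with maxlen drops the oldest entry when full (assumed behavior).
        self._entries = deque(maxlen=capacity)
        self.journal = []        # stand-in for the CICS journal file
        self.tracing_on = False  # corresponds to the setting on panel VA10

    def record(self, entry):
        """Add an entry; write it to the journal too when tracing is on."""
        self._entries.append(entry)
        if self.tracing_on:
            self.journal.append(entry)

    def flush_on_failure(self):
        """On an LRT failure with tracing off, write the retained entries."""
        if not self.tracing_on:
            self.journal.extend(self._entries)


if __name__ == "__main__":
    trace = InCoreTrace(capacity=3)
    for number in range(5):
        trace.record(f"entry {number}")
    trace.flush_on_failure()
    print(trace.journal)
```

With tracing off, only the entries still held in the table at the time of the failure reach the journal, which is why the journal copy is most useful when it is collected soon after the failure.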