Problem Determination




Chapter 2. Problem Determination Tools Overview

       

This chapter describes some of the tools available for use in CallCoordinator problem determination.

These types of tools are available to assist you in problem determination:

Host computer tools
CallCoordinator tools
Other CallPath tools


Understanding Host Computer Tools

This section briefly describes tools available on CICS to help in problem determination. Except for the use of the CICS system log, this guide does not describe these tools further, because the CICS manuals discuss them in more detail.

CICS System Log

     

The host computer records operational and error states during operation of the CallCoordinator system. This data resides on the system log maintained by CICS for the region in which the CallCoordinator system is running.    

The CallCoordinator system writes an entry to the CICS system log each time a status change or error condition occurs in the operation of the host computer. Together, the CICS system log entries form a complete history of the current CallCoordinator session on the host computer.

Note: The CICS system log uses 24-hour clock time. If you are using a 12-hour clock format, be aware that the times in the log will not match the times you see in CallCoordinator.
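When comparing log entries against a 12-hour display, a small conversion helper can reduce mistakes. The following is a minimal sketch only; the function name `to_12_hour` and the `HH:MM:SS` input layout (taken from the sample log entry later in this chapter) are assumptions, not part of CallCoordinator.

```python
def to_12_hour(hhmmss: str) -> str:
    """Convert a 24-hour HH:MM:SS string to a 12-hour string with AM/PM."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid 24-hour time: {hhmmss!r}")
    suffix = "AM" if hours < 12 else "PM"
    hour12 = hours % 12 or 12  # 00 and 12 both display as 12
    return f"{hour12:02d}:{minutes:02d}:{seconds:02d} {suffix}"
```

For example, a log time of 21:45:50 corresponds to 09:45:50 PM on a 12-hour display.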

You can define your own log for CallCoordinator messages only. This log is defined on the COR General Settings panel, in the System log ID field. The product default is CSMT, which is the CICS system log, but you can override it with your own destination.

For more detailed information, see "Understanding CallCoordinator's Interaction With the CICS System Log".

CICS Traces

     

CICS maintains internal trace tables to record activity in the CICS region. CICS makes an entry in a trace table each time the system initiates a CICS management program. Agent application programs can also make entries.

For more details, refer to the CICS Problem Determination Guide appropriate for the CICS release being used at your facility. This CICS guide describes other traces that are useful in analyzing specific conditions.

CICS Dumps

   

CICS makes a formatted printout (dump) each time a program check or abnormal end-of-task occurs. Part of the formatted printout includes a program check and abnormal-end trace table that shows information relating to the most recent program checks or abnormal ends.

For more details, refer to the CICS Problem Determination Guide appropriate for the CICS release being used at your facility.

ACF/VTAM Buffer Trace

             

The Advanced Communications Function/Virtual Telecommunications Access Method (ACF/VTAM) has a buffer trace. Use this buffer trace to investigate the data sent to ACF/VTAM before the data is passed to CICS by the communications link, through the ACF/NCP running in the communication controller.

Refer to the Advanced Communication Function for Virtual Telecommunication Access Method Diagnosis Guide for this and other tools in ACF/VTAM. Refer to the Advanced Communication Function for System Support Program (ACF/SSP) Diagnosis Guide for further information on diagnostic tools for the NCP.


Understanding CallCoordinator Tools

       

This section briefly describes the logs and traces on the host computer that can be used in problem determination.

CallCoordinator MIS Log (MIS)

   

The MIS log on the host computer records all call event messages, session messages, and status messages received by the host, plus some messages from the Call Tracking Manager (CTM), Load Balancing Manager (LBM), Screen Presentation Manager (SPM), and Application Program Interface (API) modules.

MIS log process data sets and backup files can be used for reports. It is therefore important to set up backup or archiving procedures so that data can be extracted for reports. Refer to the CallPath CallCoordinator/CICS Application Programming Guide for information on how to use the Write MIS Record API for data collection and reports. Then enter the name of your customized data collection program in the MIS Log Exit name field on the COR Telephony Settings panel (VA32).

CallCoordinator provides batch programs to print out the log. To learn more about using the MIS Log, refer to the Operations part of the CallPath CallCoordinator/CICS System Management Guide. To find out more about the record format of the log, refer to the CallPath CallCoordinator/CICS Application Programming Guide.

CallCoordinator Trace Log Generation

         

The CallCoordinator Trace Facility on the host computer generates trace records for use in analyzing the causes of software problems. The trace records are written to a CICS journal file; the journal ID is set on the COR Telephony Settings panel. Under routine operating conditions, keep the trace turned off to avoid any impact on system performance. The trace data is for use by the IBM support group only. It is not intended for customer problem determination. For more information on using these traces, see Chapter 4, "Using the CallCoordinator Panels for Problem Determination".

Understanding CallCoordinator's Interaction With the CICS System Log

   

CallCoordinator software records operational and error states during operation of the CallCoordinator system. It enters data into the CICS system log for the region in which the CallCoordinator system is running.

The CICS system log is re-initialized each time the CallCoordinator region is started. Therefore, CallCoordinator entries in the system log reflect a complete history of the existing session.

A system log entry appears in the format:

000001 EZPACTLC 01/16/92 09:45:50 EZP0503I BEGIN
CALLCOORDINATOR TABLE LOAD

The format meaning is:

<sequence number>
<module name>
<date>
<time>
<message ID>
<message text>

The parts of the system log entry are defined as follows:      

Element        Meaning

Module name    The portion of the system that issued the message. CICS prefixes are:

               Prefix        Meaning
               CAM or EZP    CallCoordinator
               DFH           CICS message

Date           Indicates the day on which the message was issued. The format is as defined on the COR General Settings panel (VA33).

Time           Indicates the time when the message was issued. The format is as defined on the COR General Settings panel (VA33).

Message ID     Identifies the message and its type. The suffix of the message ID defines its type. CallCoordinator uses these message types:

               Suffix    Meaning
               E         Error
               I         Informational
               W         Warning

               Chapter 6, "CallCoordinator Messages" explains the message types in more detail.

Message text   Gives the text of the message.
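If you copy system log entries into a file for analysis, the fields described above can be separated programmatically. The following Python sketch is illustrative only and is not part of CallCoordinator. It assumes that the fields are separated by blanks as in the sample entry, that a wrapped entry is passed in as one string, and that the prefix table applies to the first three characters of the module name; the names `LogEntry` and `parse_log_entry` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the sample entry:
# <sequence number> <module name> <date> <time> <message ID> <message text>
LOG_ENTRY = re.compile(
    r"^(?P<seq>\d+)\s+(?P<module>\S+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<msg_id>\S+)\s+(?P<text>.*)$"
)

MESSAGE_TYPES = {"E": "Error", "I": "Informational", "W": "Warning"}
PREFIXES = {"CAM": "CallCoordinator", "EZP": "CallCoordinator", "DFH": "CICS"}


class LogEntry(NamedTuple):
    sequence: int
    module: str
    date: str          # kept as text: the format is set on panel VA33
    time: str
    message_id: str
    message_type: str
    issued_by: str
    text: str


def parse_log_entry(line: str) -> Optional[LogEntry]:
    """Split one system log entry into its parts, or return None if it does not fit."""
    # Collapse runs of blanks and line breaks so a wrapped entry parses as one line.
    match = LOG_ENTRY.match(" ".join(line.split()))
    if match is None:
        return None
    msg_id = match["msg_id"]
    module = match["module"]
    return LogEntry(
        sequence=int(match["seq"]),
        module=module,
        date=match["date"],
        time=match["time"],
        message_id=msg_id,
        message_type=MESSAGE_TYPES.get(msg_id[-1], "Unknown"),
        issued_by=PREFIXES.get(module[:3], "Unknown"),
        text=match["text"],
    )
```

Applied to the sample entry, this sketch yields sequence 1, module EZPACTLC, message ID EZP0503I, and message type Informational. The date and time are left as text because their format depends on your panel settings.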

For detailed information about looking at or printing the CICS system log, refer to the CICS for OS/390 Problem Determination Guide appropriate for the CICS release being used at your facility.

See Chapter 6, "CallCoordinator Messages" for a discussion of the content and causes of CallCoordinator messages and suggested responses.

Collecting Trace Data

The system administrator uses the COR Operations panel (VA10), option 1 from V800, to turn trace record generation on or off. In routine operation, keep the trace facility turned off to avoid logging unnecessary messages and slowing down the system. Turn it on only when a problem occurs that requires a trace.

Your system programmer or Level 1 or 2 program support will direct you to turn on certain traces during the problem determination process.

To verify that you have collected trace data, check that each trace record includes the source module identifier, a unique record identifier, a date/time stamp, and a unique text description.

A trace entry appears in the format:

EZPSAGTC MAIN 01/16/92 09:33:50   I 000420 = TEXTTEXTTEXT

The format meaning is:

<source module>
<module label>
<date>
<time>
<type> (E=Error, I=Informational, W=Warning)
<length>
<text description>
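As a rough aid for checking that a collected record has the expected shape, the following Python sketch splits a trace entry into the fields listed above. It is an illustration only: it assumes that the fields are separated by blanks and that an equals sign precedes the text description, as in the sample entry. The names `TraceEntry` and `parse_trace_entry` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the sample entry:
# <source module> <module label> <date> <time> <type> <length> = <text description>
TRACE_ENTRY = re.compile(
    r"^(?P<source>\S+)\s+(?P<label>\S+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<type>[EIW])\s+(?P<length>\d+)\s*=\s*(?P<text>.*)$"
)


class TraceEntry(NamedTuple):
    source_module: str
    module_label: str
    date: str
    time: str
    type: str      # E, I, or W
    length: int
    text: str


def parse_trace_entry(line: str) -> Optional[TraceEntry]:
    """Return the parts of one trace entry, or None if the line does not fit the layout."""
    match = TRACE_ENTRY.match(line.strip())
    if match is None:
        return None
    return TraceEntry(
        source_module=match["source"],
        module_label=match["label"],
        date=match["date"],
        time=match["time"],
        type=match["type"],
        length=int(match["length"]),
        text=match["text"],
    )
```

A line for which `parse_trace_entry` returns None is missing one of the expected fields or uses a different layout, and is worth inspecting by hand.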


Understanding Other CallPath Tools

Figure 1 shows how CallCoordinator relates to the other components, including CallPath/CICS for OS/390 and CallPath SwitchServer/2. You may need to understand the tools available for these other CallPath products.

CallPath/CICS for OS/390

For the tools for this product, refer to the CallPath/CICS for OS/390 System Management Guide and the CallPath/CICS for OS/390 Application Programming Guide.

CallPath SwitchServer/2

For IBM CallPath SwitchServer/2® tools, refer to Using CallPath SwitchServer/2.


Understanding the COR Operations Panel (VA10)

   

The system administrator can use this panel under normal operating conditions to check the status of the system and long running tasks, and to view table usage statistics.

You can see information on the status of the system and of the long running tasks in CallCoordinator. Long running tasks (LRTs) are tasks that begin running when the system is started and continue to run in the background, without operator instruction, until the system is shut down or an error condition occurs. You can refer to the CICS system log to determine whether an error condition exists.

When any long running task is not active, the system as a whole is not active. To make the system active, identify which LRT is inactive and correct its problem.

See the discussion in Chapter 4, "Using the CallCoordinator Panels for Problem Determination" for specific information on performing the problem determination tasks.


Shutdown and Recovery

       

On panel V800, choose option 3, then enter 6 against the required COR to display the COR Recovery Policy panel (VA31).
VA31             CallCoordinator V 2.1   System Configuration
                              COR Recovery Policy
 
For COR : CORD
 
Recovery Policy
Recovery type... 2                              Recovery method..... A
    0 - No Recovery                                        A - Automatic
    1 - Local Restart                                      M - Manual
    2 - Alt Restart
    3 - Resource Relocation
 
 
Recover Agents.. Y                              Alternate SysID .... CORF
 
 
 
 
 
 
 
COR SysID:  CORD        TermID: P125
 
 
F1=Help      F3=Exit       F5=Refresh               F12=

Use this panel to define or to update the recovery policy for the selected COR. You may specify Automatic or Manual Recovery, the number of attempts to recover and the Recovery Type.

Recovery Type

     

The recovery type may be specified as one of the following:

Recovery Type 0 - No Recovery
No attempt is made to recover the system.

Recovery Type 1 - Local Restart
The Recovery Manager attempts to restart CallCoordinator in the local region if CallCoordinator was purged due to an LRT failure.

Recovery Type 2 - Alternate Restart
The Recovery Manager attempts to start CallCoordinator in the alternate COR defined in the Alternate SysID field. If CallCoordinator is currently active in the alternate COR, the Recovery Manager simply starts the switches that were active in the failed COR in the alternate COR.

Recovery Type 3 - Resource Relocation
The Recovery Manager attempts to relocate switches from the purged COR to the COR defined in the Alternate SysID field.

The above actions take place if Recovery Method is Automatic. Otherwise, a console message is written asking the operator for instructions.

The Recover Agents field, when set to Y (Yes), tells the Recovery Manager to sign agents on to the recovered system if resources are recovered in an alternate COR. Agents are not signed off if recovery takes place locally.

See the Planning part of the CallPath CallCoordinator/CICS System Management Guide for more information about recovery policy, and for details of the Switch Detail panel (VA42). The Switch Detail panel contains a Secondary COR for the switch which is to be used by the Recovery Manager if the switch fails.

Recovery Method

       

If the recovery method is Automatic, the Recovery Manager performs recovery and restart automatically, based on the recovery policy for the failing COR as defined on the COR Recovery Policy panel (VA31).

If the recovery method is Manual, the Recovery Manager prompts the operator, by means of write-to-operator-with-reply (WTOR) messages on the system console, to direct the recovery and restart operation.

If Recovery Type is No Recovery, Recovery Method is ignored.
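The way Recovery Type and Recovery Method combine can be summarized as a small decision function. The following Python sketch only restates the rules described in this section. It is not the Recovery Manager's actual logic, and the names `RecoveryType` and `plan_recovery` and the returned wording are hypothetical.

```python
from enum import Enum


class RecoveryType(Enum):
    NO_RECOVERY = 0
    LOCAL_RESTART = 1
    ALT_RESTART = 2
    RESOURCE_RELOCATION = 3


def plan_recovery(recovery_type: RecoveryType, method: str,
                  alternate_sysid: str, alternate_active: bool = False) -> str:
    """Describe the action implied by a recovery policy (illustrative only)."""
    if method not in ("A", "M"):
        raise ValueError("method must be 'A' (Automatic) or 'M' (Manual)")
    # With No Recovery, the Recovery Method is ignored.
    if recovery_type is RecoveryType.NO_RECOVERY:
        return "no recovery"
    # Manual: the operator is asked for instructions on the system console.
    if method == "M":
        return "prompt operator for instructions"
    if recovery_type is RecoveryType.LOCAL_RESTART:
        return "restart CallCoordinator in the local region"
    if recovery_type is RecoveryType.ALT_RESTART:
        if alternate_active:
            # CallCoordinator is already running there, so only the switches move.
            return f"start the failed COR's switches in {alternate_sysid}"
        return f"start CallCoordinator in {alternate_sysid}"
    return f"relocate switches to {alternate_sysid}"
```

With the values shown on the sample panel (Recovery type 2, Recovery method A, Alternate SysID CORF), `plan_recovery(RecoveryType.ALT_RESTART, "A", "CORF")` returns "start CallCoordinator in CORF".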

Transaction and System Dumps

     

Transaction V884 monitors the CallCoordinator system and causes a shutdown (or purge) when it determines that one or more long running tasks (LRTs) are not responding. V884 continually checks the status of each LRT using a CICS INQUIRE TASK command. If any LRT does not appear in the returned task list, shutdown is invoked. When this happens, the system status is set to PURGED and recovery is attempted according to the recovery policy for that COR.
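The check that V884 performs can be pictured as comparing the expected LRTs against the current task list. The following Python sketch models a single pass of that check under stated assumptions. It does not issue any CICS commands: the task list is supplied by a caller-provided function, the status value ACTIVE is an assumed name for the normal state, and all function and task names are hypothetical.

```python
from typing import Callable, Iterable, List


def missing_lrts(expected: Iterable[str], task_list: Iterable[str]) -> List[str]:
    """Return the expected LRTs that do not appear in the task list."""
    running = set(task_list)
    return sorted(task for task in set(expected) if task not in running)


def monitor_once(expected: Iterable[str],
                 list_tasks: Callable[[], Iterable[str]],
                 shutdown: Callable[[List[str]], None]) -> str:
    """Run one monitoring pass and return the resulting system status."""
    missing = missing_lrts(expected, list_tasks())
    if missing:
        # Any missing LRT causes a shutdown (purge) of the whole system.
        shutdown(missing)
        return "PURGED"
    return "ACTIVE"  # assumed name for the normal state
```

In this model, a single missing LRT is enough to purge the system, which matches the rule that the system as a whole is not active when any LRT is not active.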

A transaction dump is taken from the original LRT failure, and should be provided to CallCoordinator support to assist diagnosis.

Before recovery is attempted, transaction V900 is started to issue an abend with code CP99. Set abend CP99 to produce a full system dump, to help with diagnosis by CallCoordinator support. Issuing this unique abend code makes it unnecessary to take a system dump for more common abend codes, such as ASRA and AICA. However, if your installation already takes system dumps for the common abend codes, you do not need to take a system dump for CP99.

CallCoordinator maintains an in-core Trace table with 1000 entries. If Tracing has been set on (through the VA10 panel), these entries are continually written to a CICS Journal file. If Tracing is not already set on, it is set on for 5 seconds in the event of an LRT failure in order to write the last 1000 in-core trace entries to the CICS Journal. The CallCoordinator support group may ask for a copy of this journal for diagnostic purposes.
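The behavior of the in-core trace table resembles a fixed-size ring buffer: the newest entry displaces the oldest once 1000 entries are held. The following Python sketch models that behavior for illustration only. The class name `InCoreTrace` is hypothetical, a plain list stands in for the CICS journal file, and the 5-second tracing window is simplified to a single flush.

```python
from collections import deque
from typing import List, Optional


class InCoreTrace:
    """Model of a fixed-size in-core trace table (illustrative only)."""

    def __init__(self, capacity: int = 1000,
                 journal: Optional[List[object]] = None,
                 tracing_on: bool = False) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)
        self.journal: List[object] = journal if journal is not None else []
        self.tracing_on = tracing_on

    def record(self, entry: object) -> None:
        """Keep the entry in core, and journal it if tracing is on."""
        self._entries.append(entry)
        if self.tracing_on:
            self.journal.append(entry)

    def flush_on_failure(self) -> None:
        """On an LRT failure with tracing off, journal the retained entries."""
        if not self.tracing_on:
            self.journal.extend(self._entries)
```

With tracing off, recording 1500 entries and then calling `flush_on_failure` leaves only the most recent 1000 entries in the journal, which is why a trace captured after a failure covers only the activity immediately before it.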

