K-T Problem Analysis is a systematic method for analysing a problem and understanding its root cause instead of making assumptions and jumping to conclusions, which is still commonplace today. The method has become very popular in IT and technical fields and has been included in the IT Infrastructure Library (ITIL), but it can be applied to a wide range of problems.
The same structure of problem analysis is used in DMAIC and G8D and has been proven many times over in the field. Furthermore, specific aspects of the Problem Analysis method can be used to greatly improve and strengthen these other problem-solving methods.
There are 5 steps to Kepner-Tregoe Problem Analysis:
- Define the Problem
- Describe the Problem
- Identify possible causes
- Test the most probable cause
- Verify the true root cause
Please note that this post contains some tabular information in images rather than html tables. If you are using a screen reader please contact me at ian dot cosgrove at gmail dot com for the information and I will provide this in the most convenient format for you.
Define the problem
The first step is crucial: if you don’t know what the problem is, how can you expect to fix it? Many people skip this stage, presuming they know what the problem is. This leads to difficulty down the line and wastes time, often forcing the team to go back and re-examine this step.
When someone reports an issue such as “My screen is blank”, it’s easy to accept that as the problem, and many people immediately go to work on it. However, asking a few basic questions can reveal far more about the nature of the problem and helps to define possible causes. You can use any method that helps you, but here we will look at the 5 W’s:
- Who is experiencing the problem?
- Why is this important, why is this being done?
- What are the effects/symptoms – errors / defects or something you expected doesn’t happen?
- When does the problem occur or when did it start happening?
- Where does the problem occur?
Let’s ask the questions and expand the problem definition from “My screen is blank”
- Who, Mr. John Doe
- Why, he needs to see his screen so he can process part orders
- What, the screen goes blank while his computer is booting up; nothing appears on screen but he can hear the start up music for Windows
- When, the computer powers up but then the screen goes completely blank; everything was fine yesterday when he shut it down
- Where, Mr. Doe’s computer screen
Let’s revisit the Problem definition with this new information
“Mr. Doe is unable to process part orders because his screen goes blank during boot up ever since he shut it down yesterday”
This is a much better problem description. It allows a team to understand exactly what the problem is at first glance, it narrows the focus of the questions that will follow and it helps us understand the impact of the problem – unable to process part orders.
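As a rough sketch of how the 5 W’s answers can be combined into a single problem statement, the snippet below uses a small Python dataclass. The class name `FiveWs`, its fields and the sentence template are assumptions made for illustration; they are not part of the Kepner-Tregoe method itself.

```python
from dataclasses import dataclass


@dataclass
class FiveWs:
    """Answers to the 5 W's for one reported problem (hypothetical structure)."""
    who: str
    why: str    # the impact: what the person is unable to do
    what: str   # the observed symptom
    when: str
    where: str

    def statement(self) -> str:
        # Who + impact + where/what + when, mirroring the example sentence above.
        return (f"{self.who} is unable to {self.why} because "
                f"{self.where} {self.what} {self.when}")


report = FiveWs(
    who="Mr. Doe",
    why="process part orders",
    what="goes blank during boot up",
    when="ever since he shut it down yesterday",
    where="his screen",
)
print(report.statement())
```

Keeping the answers as separate fields makes it easy to spot a W that has not been asked yet before the statement is written.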
I would recommend a basic reproduction of the problem at this point to verify the description. This rules out the possibility of human error, validates the problem description and confirms the circumstances. Imagine someone reported that they are not receiving any email: is there a problem with their mail, or has nobody sent them any mail today?
A brief 5 Whys on the WHAT (effects/symptoms) can also be very useful to exclude some high-level symptoms and accelerate the problem analysis. For example, why would the screen be blank? There could be several reasons:
- The graphics card could have failed. No, the computer wouldn’t boot; it would give 3 loud beeps then stop
- The screen could be faulty. No, the manufacturer’s logo appears at first, then the screen goes blank
- The hard drive could be faulty and the computer doesn’t boot. No, we hear the Windows start up music
- The backlight might have failed and the screen is dark. No, the logo appears perfectly visible
- The display could be switching over to an external screen. No, we can see everything on an external display but when we switch to the internal one it’s still blank
Let’s see if we can improve our problem description now
“Mr. Doe is unable to process orders because his internal screen goes blank when Windows is starting since he shut it down yesterday”
Please forgive the IT-related example, but the same process can be applied by a Subject Matter Expert in any area to achieve the same benefits.
Describe the problem
In this step we describe 4 aspects of any problem: what the problem is, where it occurs, when it occurred and the extent to which it occurred. As a bonus, we already have the answers to several of these questions from building our Problem Definition in step 1, but the IS and IS NOT method allows us to explore them even further.
For each aspect of the problem we will describe what the problem IS, similar to step 1, and also what the problem COULD BE but IS NOT. The use of both columns allows us to describe what the problem is in detail but also rule out possible issues that did not cause the problem.
Let’s fill out the same table for our example problem
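The worksheet can also be sketched in code. The snippet below is a minimal illustration that uses only facts gathered in step 1; the `Row` class and the `unanswered` helper are hypothetical names, and the Extent row is deliberately left empty because nothing has been recorded for it yet.

```python
from dataclasses import dataclass


@dataclass
class Row:
    aspect: str   # What, Where, When or Extent
    is_: str      # what the problem IS
    is_not: str   # what the problem COULD BE but IS NOT


worksheet = [
    Row("What", "screen goes blank, Windows start up music plays",
        "computer fails to boot"),
    Row("Where", "internal screen", "external display"),
    Row("When", "while Windows is starting",
        "while the manufacturer's logo is shown"),
    Row("Extent", "", ""),  # not yet investigated
]


def unanswered(rows):
    """Return the aspects that still lack an IS or an IS NOT entry."""
    return [r.aspect for r in rows if not r.is_ or not r.is_not]


for r in worksheet:
    print(f"{r.aspect:<7}| {r.is_ or '?':<48}| {r.is_not or '?'}")

print("Still to describe:", unanswered(worksheet))
```

Listing the empty rows makes the gaps in the description visible before the team moves on to possible causes.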
Identify possible causes
This juxtaposition of what the problem IS and IS NOT helps us to rationally examine what changes could have affected the items in the 1st column but not the items in the 2nd. Experience tells us that the majority of problems are due to a recent change, particularly for existing systems that have been working for some time without issues.
To expand our Problem Description to Possible Causes we will add 2 more columns to the worksheet. The first is ‘Differences’, which lists the differences between the IS and the IS NOT. The second is ‘Changes’, which lists any changes to where the problem IS that could account for those differences.
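A small sketch of the two extra columns follows. The `AnalysisRow` class and the `possible_causes` helper are hypothetical, and the entries in the differences and changes columns are illustrative guesses for the example problem rather than the contents of the original worksheet.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisRow:
    aspect: str
    is_: str
    is_not: str
    differences: list[str] = field(default_factory=list)
    changes: list[str] = field(default_factory=list)


rows = [
    AnalysisRow(
        "Where", "internal screen", "external display",
        differences=["different display output"],        # illustrative
        changes=["display driver update"],               # illustrative
    ),
    AnalysisRow(
        "When", "since the restart", "before yesterday's shut down",
        differences=["computer was restarted"],          # illustrative
        changes=["display driver update"],               # illustrative
    ),
]


def possible_causes(rows):
    """Collect every listed change once, in the order it was first seen."""
    seen = []
    for row in rows:
        for change in row.changes:
            if change not in seen:
                seen.append(change)
    return seen


print(possible_causes(rows))  # ['display driver update']
```

The same change often appears against more than one aspect, so the helper removes duplicates before the list is handed to the next step.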
It is important to note that effects don’t always immediately follow actions. A recent change may only have exposed an underlying problem that was always there, so when considering the list of changes don’t limit yourself to recent ones.
Test most probable causes
The changes from the previous step become a list of possible causes that explain why the problem may have occurred. With the Subject Matter Expert, rank the possible causes by likelihood. For each possible cause ask “If THIS is the root cause of this problem, does it explain everything the problem IS and what the problem COULD BE but IS NOT?”
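One way to picture this test is to record, for each possible cause, whether it explains each IS and IS NOT fact, then rank the causes by how many facts they leave unexplained. The sketch below is only an illustration: the `unexplained` helper is a hypothetical name, and the True/False judgements are example values that a Subject Matter Expert would supply in practice.

```python
def unexplained(explains):
    """Return the facts that a cause does not account for."""
    return [fact for fact, ok in explains.items() if not ok]


# Each cause maps every IS / IS NOT fact to "does this cause explain it?"
candidates = {
    "graphics card failed": {
        "IS: internal screen blank once Windows starts": True,
        "IS NOT: computer fails to boot": False,
        "IS NOT: blank while manufacturer's logo shows": False,
    },
    "backlight failed": {
        "IS: internal screen blank once Windows starts": True,
        "IS NOT: computer fails to boot": True,
        "IS NOT: blank while manufacturer's logo shows": False,
    },
    "display driver update": {
        "IS: internal screen blank once Windows starts": True,
        "IS NOT: computer fails to boot": True,
        "IS NOT: blank while manufacturer's logo shows": True,
    },
}

# Fewest unexplained facts first: the most probable cause leads the list.
ranked = sorted(candidates, key=lambda c: len(unexplained(candidates[c])))

for cause in ranked:
    gaps = unexplained(candidates[cause])
    print(f"{cause}: {len(gaps)} unexplained")
```

A cause that leaves any fact unexplained is not necessarily wrong, but it needs an extra assumption to hold, which is why it drops down the ranking.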
The following table helps you list the possible causes by likelihood
Verify true cause
Compare the probable cause against the Problem Description. Does it satisfy all of the conditions of the problem?
When you’ve found a cause that explains all of these conditions, test it to confirm it is correct using the procedure in the ‘True if’ column, starting with the most probable cause. Reproduce the same conditions; if this results in the same symptoms, you have confirmed the cause.
Once you’ve done that, ask the 5 Whys to ensure this is the root cause of the problem. You can ask your team “Why do you believe this is the root cause?” This exercise helps them to reflect on the decisions that led them to this conclusion and reinforces it.
When you are confident you have identified the root cause of the problem, develop a solution and ask if you are satisfied that it would prevent any recurrence of the problem. If so, implement the solution, then test again under the same conditions: does the issue still occur?
To get back to our example, imagine you have determined that the problem with the display is due to a recent driver update which was installed but did not take effect until Mr. Doe restarted his computer.
We can correct the problem by attaching an external screen and uninstalling the driver update. After restarting the computer the issue has been resolved, but has the root cause been addressed? It is unreasonable to ask someone to never update their drivers and it is unlikely that the intention of the driver team was to release an update that would stop the screen from working.
As an immediate action we can ensure Mr. Doe does not install this driver again. As a preventative action we can notify anyone else in the company who has the same computer that they should not install this driver until further notice.
Many troubleshooters will stop there as they have reached the end of their sphere of influence. We don’t have the access or experience to fix the driver ourselves, but we should always notify the manufacturer of the issue and stay in contact with them until they have released a fix for the problem.
Characteristics of the Problem Analysis method
This method advocates a rational and systematic approach to analysing a problem without jumping to conclusions or making assumptions based on past experience. Compared to other methods, one of its biggest advantages is the IS and COULD BE but IS NOT procedure, which provides an intuitive approach to identifying possible causes for a problem.
Working within a team is expected with the Kepner-Tregoe method; however, it lacks the explicit steps of Six Sigma or the Global 8 Disciplines methodologies. This can be an advantage or a disadvantage depending on the problem you are addressing.
The process can generally be faster than other methods because it does not carry the statistical ball and chain of Six Sigma. As a result, however, it can be harder, if not impossible, to detect subtle variation in a process and therefore to implement the same level of controls.
The sole purpose of the method is in the name – Problem Analysis. Kepner-Tregoe have other methods to bring you from situational awareness through to solution/opportunity development which should be considered for a complete method to address problems.
Have feedback? Has this post helped you? Please let me know in the comments below!