What Is Troubleshooting Methodology?

Spryzen Security
6 min read · Aug 31, 2021


An effective troubleshooting methodology follows a set of steps to diagnose and fix a computer.

1. Identify the problem

• Question the user, identify any changes the user made to the computer, and perform backups before making changes.

• Inquire regarding environmental or infrastructure changes.

• Review system and application logs.

2. Establish a theory of probable cause (question the obvious)

• If necessary, conduct external or internal research based on symptoms.

3. Test the theory to determine cause

• Once the theory is confirmed, determine the next steps to resolve the problem.

• If the theory is not confirmed, establish a new theory or escalate.

4. Establish a plan of action to resolve the problem and implement the solution

5. Verify full system functionality and, if applicable, implement preventive measures

6. Document findings, actions, and outcomes

Identify the Problem

There’s a reason you’re standing in front of a computer to repair it: something happened that the user of the computer has identified as “not good” and that’s why you’re here. First, you need to identify the problem by talking to the user. Get the user to show you what’s not good. Is it an error code? Is something not accessible? Is a device not responding?

Then ask the user that classic tech question (remember your communication skills here!): “Has anything recently changed on the computer that might have made this problem appear?” What you’re really saying is: “Have you messed with the computer? Did you install some evil program? Did you shove a USB drive in so hard that you broke the connection?” Of course, you never say these things; simply ask nicely, without accusing, so the user can help you troubleshoot the problem.

Also ask whether anything has changed in the environment around the workstation, and check for any infrastructure changes that might cause problems. If you can access them, review system and application logs for clues about faulty software.

In most troubleshooting situations, it’s important to back up critical files before making changes to a system. To some extent, this is a matter of proper ongoing maintenance, but if some important bit of data disappears and you don’t have a backup, you know who the user will blame, don’t you? (We cover backup options in detail in even though that’s not how it works in the real world.
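As a rough illustration of the log review step, here is a minimal Python sketch that scans log lines for common severity keywords and counts them. The keyword list, the function name `summarize_log`, and the sample lines are all assumptions made up for this example; real system and application logs differ by operating system and vendor, so treat this as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# Hypothetical severity keywords; real logs vary by operating system and application.
SEVERITY_PATTERN = re.compile(
    r"\b(ERROR|CRITICAL|WARNING|FAIL(?:ED|URE)?)\b", re.IGNORECASE
)


def summarize_log(lines):
    """Return (counts per keyword, matching lines) for an iterable of log lines."""
    counts = Counter()
    hits = []
    for line in lines:
        match = SEVERITY_PATTERN.search(line)
        if match:
            # Only the first keyword found on each line is counted.
            counts[match.group(1).upper()] += 1
            hits.append(line.rstrip("\n"))
    return counts, hits


# Made-up sample lines for illustration only.
sample_log = [
    "2021-08-30 09:14:02 INFO Service started",
    "2021-08-30 09:15:40 WARNING Disk usage at 91%",
    "2021-08-30 09:16:05 ERROR Failed to mount volume",
]
log_counts, log_hits = summarize_log(sample_log)
for hit in log_hits:
    print(hit)
```

The same function accepts an open file object, so it could be pointed at an exported log file once you know where the system in front of you keeps its logs.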

Establish a Theory of Probable Cause (Question the Obvious)

Now it’s time to analyze the issue and come up with a theory as to what is wrong, a theory of probable cause. Personally, I prefer the word “guess” at this point because very few errors are so obvious that you’ll know what to do. Fall back on your knowledge of the computing process to localize the issue based on the symptoms.

Keep your guesses…err…theories…simple. One of the great problems for techs is the tendency to overlook the obvious in their desire to dig into the system. A fundamental understanding of the computing process is the core knowledge techs rely on for fixing things.

Research: In many situations, you’ll need to access other resources to root out the most probable cause of the problem. If necessary, therefore, you should conduct external or internal research based on the symptoms.

Outside the Case: Take a moment to look for clues before you open up the case. Most importantly, use all your senses in the process. What do you see? Is a connector mangled or a plastic part clearly damaged? Even if that connector or part works fine, the physical abuse could provide extra information. If the user can’t connect to a network, check the cable. Was something rolled over it that could have broken the thin internal wires? Is that a jelly smear near the jammed optical drive door? A visual examination of the computer’s exterior is important. When you put your hand on the system unit (that’s the case that houses all the computer parts), does it feel hot? Can you feel or hear the vibrations of the fans? If not, that would be a clue to an overheating or overheated computer. Modern computers can run when overly hot, but they generally run very sluggishly.

If you spend a moment listening to the computer, you might get some clues to problem sources. A properly running computer doesn’t make a lot of sound, just a regular hum from the spinning fans. If you hear clicking or grinding sounds, that’s a very bad sign and a very important clue! Data storage devices are the usual cause of clicking and grinding sounds.

Finally, don’t forget your nose. If you smell the unmistakable odor of ozone, you know that’s the smell electronic components give off when they cook or are simply running much too hot.

Test the Theory to Determine Cause

Okay, so you’ve decided on a theory that makes sense. It’s time to test the theory to see if it fixes the problem. A challenge in fixing a computer is that the theory and the fix pretty much prove themselves at the same time. In many cases, testing your theory does nothing more than verify that something is broken. If that’s the case, then replace the broken part.

If you don’t have the skills (or the permissions) to fix the issue, you need to escalate the problem. Escalation is the process your company (or sometimes just you) goes through when you, the person assigned to repair a problem, are not able to get the job done. It’s okay to escalate a problem because no one can fix every problem.

All companies should have some form of escalation policy. It might mean calling your boss. It might mean filling out and sending some in-house form to another department. Escalation is sometimes a more casual process. You might want to start researching the problem online; you might want to refer to in-house documentation to see if this problem has appeared in the past.
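The test-then-escalate flow described in this section can be sketched as a small loop: try each theory in order, stop at the first one that is confirmed, and escalate if none pans out. The function name `troubleshoot`, the result dictionary, and the example checks below are hypothetical and exist only to illustrate the idea; in practice each check would be a real test you perform on the machine.

```python
def troubleshoot(theories):
    """Try each (description, test) pair in order; escalate if none is confirmed.

    `theories` is a list of (str, callable) pairs. Each callable returns True
    when its theory is confirmed.
    """
    tried = []
    for description, test in theories:
        tried.append(description)
        if test():
            return {"status": "confirmed", "cause": description, "tried": tried}
    # No theory was confirmed, so hand the problem off.
    return {"status": "escalate", "cause": None, "tried": tried}


# Hypothetical checks, ordered from most obvious to least obvious.
example_theories = [
    ("Power cable unplugged", lambda: False),
    ("Network cable damaged", lambda: True),
    ("Faulty network adapter", lambda: False),
]
result = troubleshoot(example_theories)
print(result["status"], "-", result["cause"])
```

Keeping the list of theories that were already tried is deliberate: if you do have to escalate, the next person can see what has been ruled out.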

Establish a Plan of Action

At this point, you should have a good sense of the problem, including the scope and necessary permissions to do the job. You need to establish a plan of action to resolve the problem and implement the solution. Sometimes the plan requires a few steps before you can implement the solution. You might need additional resources such as known good replacement parts. A backup of user data should be part of the plan of action.
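For the backup part of the plan, a minimal sketch using only Python’s standard library might look like the following. The helper name `backup_file`, the timestamped naming scheme, and the demo file are assumptions chosen for illustration; a real backup of user data would normally rely on whatever backup tooling your organization already uses.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` under a timestamped name; return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also tries to preserve file metadata
    return target


# Demonstration in a throwaway directory so no real user data is touched.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "report.txt"
    original.write_text("quarterly numbers")
    copy_path = backup_file(original, Path(workdir) / "backups")
    backup_matches = copy_path.read_text() == original.read_text()
    print("Backup verified:", backup_matches)
```

Comparing the copy against the original before touching the system is a small habit worth keeping: a backup you have not checked is only a guess.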

Verify and Prevent

Fantastic! Through either your careful work or escalation, you’ve solved the problem, or so you think. Remember two items here. First, even though you think the problem is fixed, you need to verify with the customer/user that it’s fixed. Second, try to do something to prevent the problem from happening again in the future, if possible.

Verify Full System Functionality: You need to verify full system functionality to make sure the user is happy.

Document Findings, Actions, and Outcomes

Based on his famous quote, “Those who cannot remember the past are condemned to repeat it,” I think the philosopher George Santayana would have made a great technician. As a tech, your last step in every troubleshooting job should be to document your findings, actions, and outcomes. What was the problem? What did you do to fix it? What worked? What didn’t? The best guide to use for documentation is: “What would I have liked to have known about this problem before I walked up to it?” Good documentation is the strongest sign of a good tech. Documenting problems helps you track the troubleshooting history of a computing device over time.
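One possible way to keep that documentation consistent is a small structured record with fields for findings, actions, and outcome. The sketch below is only an illustration: the class name `TroubleshootingRecord`, its fields, and the example device name are made up here, and a ticketing system would usually provide its own format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TroubleshootingRecord:
    """A hypothetical record covering findings, actions, and outcome."""
    device: str
    problem: str
    findings: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    outcome: str = ""

    def to_json(self):
        """Serialize the record so it can be stored alongside earlier ones."""
        return json.dumps(asdict(self), indent=2)


# Made-up example entry.
record = TroubleshootingRecord(
    device="FRONT-DESK-PC", problem="Cannot reach the network"
)
record.findings.append("Network cable crushed under a chair wheel")
record.actions.append("Replaced the cable with a known good one")
record.outcome = "User confirmed network access restored"
record_json = record.to_json()
print(record_json)
```

Storing one such record per job, keyed by device, gives you the troubleshooting history of a computing device over time.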

THANK YOU
