Essential Guide to Network Troubleshooting Steps
Troubleshooting problems with a network can be a tedious and frustrating task. It’s often not clear what the root cause of network problems is. Some problems are intermittent, which makes them harder to test. Different operating systems, applications and networking hardware may respond to network problems in different ways, leading to inconsistency in symptoms.
In order to navigate challenges like these and troubleshoot networking issues effectively, you need to take a systematic approach. This article explains the essential network troubleshooting steps to walk through, as well as resolution strategies for common networking problems.
Assess the Problem
When a networking issue arises, your first step should be to assess the problem in order to determine what the issue actually is. A user complaint that “the Internet is slow” could mean many different things - an application is not responding quickly enough due to limited bandwidth, a wireless connection is flaky, network latency is too high, and so on. You must ascertain the specific nature of the issue before you can fix it.
To help assess the extent and source of the problem, consider these steps:
- Look for recent networking changes: If you installed new hardware or software, or changed a configuration, there is a good chance that the issue is related to that change.
- Survey users: Determine how many users have experienced a similar issue, and when it first occurred.
- Check hardware: Since hardware systems are generally easier to troubleshoot, start with those. Check for visible signs of hardware damage. Try restarting hardware systems that you believe to be related to the problem.
- Check software: If hardware doesn’t seem to be the problem, look at software. Are there any networking applications or utilities that need to be updated? Are any of them running at maximum capacity, meaning you should allocate more resources to them or install more instances of them in order to handle demand better?
- Identify whether the issue is caused by one problem or multiple problems: Sometimes, there is just one root cause of an issue. Other times, there are multiple, interrelated problems. The troubleshooting you’ve performed to this point should give you a sense of the type of issue you are dealing with.
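As a rough illustration of the first two steps, the sketch below lists the changes made shortly before the first user report. The `Change` record, the sample change log and the seven-day window are all hypothetical; in practice you would adapt them to however your team records changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """One entry in a hypothetical change log."""
    when: datetime
    description: str


def changes_before_incident(changes, first_report, window=timedelta(days=7)):
    """Return changes made within `window` before the first report, newest first."""
    earliest = first_report - window
    suspects = [c for c in changes if earliest <= c.when <= first_report]
    return sorted(suspects, key=lambda c: c.when, reverse=True)


# Sample data, invented for illustration only.
change_log = [
    Change(datetime(2024, 3, 1, 9, 0), "Replaced core switch firmware"),
    Change(datetime(2024, 3, 10, 14, 30), "Added VLAN for guest Wi-Fi"),
    Change(datetime(2024, 3, 12, 8, 15), "Updated DHCP lease settings"),
]
first_report = datetime(2024, 3, 12, 10, 0)

suspects = changes_before_incident(change_log, first_report)
for change in suspects:
    print(change.when.isoformat(), change.description)
```

The most recent change appears first because it is usually the first one worth ruling out.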
Classify the Incident
Once you’ve determined the nature and scope of the problem, you can assign it a priority level. The priority level should reflect how many users the problem impacts and how critically it disrupts their work.
If it’s a major incident, begin responding as soon as you can. Otherwise, establish a timeline for when you will resolve the problem.
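One way to make this classification repeatable is a small rule that combines the share of users affected with whether their work is blocked. The sketch below is only an example: the function name, the four labels and the 50% and 10% cut-offs are assumptions, not an established standard.

```python
def classify_priority(users_affected, total_users, work_blocked):
    """Map impact to a priority label.

    The 50% and 10% cut-offs are illustrative assumptions, not a standard.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    share = users_affected / total_users
    if work_blocked and share >= 0.5:
        return "critical"
    if work_blocked or share >= 0.5:
        return "high"
    if share >= 0.1:
        return "medium"
    return "low"


print(classify_priority(80, 100, work_blocked=True))
print(classify_priority(20, 100, work_blocked=False))
```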
During this process, think as well about the origins of the incident and what it will take to fix it. Will you need to travel to a remote site? Do you need to order new equipment? Will you need multiple staff members to resolve the problem? Factors like these will influence how quickly you can plan to resolve the incident, regardless of its priority level.
In cases where you are dealing with a critical problem but can’t solve it right away, you may consider introducing a stopgap measure in the meantime. For example, if you need to rebuild a local server but don’t have new server hardware on hand, you could spin up a replacement server in the cloud using a virtual machine temporarily, then take an image of that server and transfer it to the new local server hardware when it arrives.
Develop a Response Plan
Before you begin working to resolve a networking incident, you should have a plan in place for how you will handle it. The plan should reflect what you believe to be the root cause (or causes) of the problem, and it should specify the resources and procedures that you will need in order to resolve it.
For serious or complex incidents, it’s wise to test your response plan before implementing it. For example, imagine a situation where your response plan is to replace a network router that is actively handling connections, but is starting to fail. In order to ensure that you can decommission the router and replace it with a new one with as little disruption to users as possible, you could perform a test run by setting up a spare router first. That way, you’ll familiarize yourself with the steps you’ll need to go through when performing the actual incident response. This will minimize the risk that something unexpected will happen and bring the live network down for an extended period.
Document the Troubleshooting Process
When you’re in the midst of responding to a networking issue, writing documentation may not be at the forefront of your mind. But it should be, because it’s critical to document every stage of the network troubleshooting process.
Write down the steps you perform in order to assess and diagnose problems, as well as the response plan you intend to follow. Then, note the outcome of the plan after you execute it. Having this documentation on hand will help you respond more quickly in the event that the problem occurs again. It is also useful when you are troubleshooting other network issues and want to determine what has and hasn’t worked in the past.
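If you prefer structured notes over free-form text, a record like the one sketched below keeps the diagnostic steps, the plan and the outcome together. The `IncidentRecord` class, its field names and the sample incident are hypothetical; any format that captures the same three things would do.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class IncidentRecord:
    """Hypothetical record of one troubleshooting session."""
    summary: str
    diagnostic_steps: list = field(default_factory=list)
    response_plan: str = ""
    outcome: str = ""


# Sample incident, invented for illustration only.
record = IncidentRecord(summary="Intermittent Wi-Fi drops on 2nd floor")
record.diagnostic_steps.append("Ran wireless survey; two nearby networks share a channel")
record.response_plan = "Move the access point to a free channel and retest"
record.outcome = "Drops stopped after channel change"

# JSON output can be stored alongside tickets and searched later.
print(json.dumps(asdict(record), indent=2))
```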
Common Networking Issues and How to Troubleshoot Them
Now that we’ve discussed the basic steps for network troubleshooting, let’s look at two common types of network issues - wireless connectivity problems and bandwidth limitations - and the approaches you can take to resolving them.
Wireless Connectivity Problems
Compared to wired networks, wireless networks are inherently less reliable. Signal loss can occur in the places where it's least expected and for reasons that are sometimes unanticipated. Still, providing reliable wireless connectivity is critical for virtually every modern business, which probably could not operate without mobile devices.
The best way to deliver it is to have the appropriate tools for troubleshooting wireless problems and a strong knowledge of how wireless range and transmission rates work. Here is an overview of each of these points.
Assessing Wireless Networks
Here are a few types of resources that will help with assessing wireless networks:
- Wireless survey: The easiest way to perform a wireless survey is with a mobile app. The app lists each SSID it detects, along with its signal strength, channel number, and band.
- Speed test: Speed test apps let you test wireless speeds from any point. They report upload and download speeds, as well as ping results.
- Heat maps: Heat map software shows an overhead view of the building and the quality of wireless service available in each part of it.
Understand Wireless Range
To get a full picture of the wireless range, you need to assess several distinct aspects of the connection.
- Signal strength: The simplest measure of signal strength is the number of “wireless bars” shown on the testing device. A wireless survey app gives a more detailed reading.
- Speed: Wireless speed is best tested with speed test apps. Wireless speed is often related to signal strength, but not always. It’s important to evaluate both aspects separately.
- Stability: To measure consistency, perform strength and speed tests multiple times and compare the results. Large swings between runs, or runs that fail to complete, suggest an unstable connection.
- Interference: Wireless interference can be a real issue and is hard to detect without the proper tools. If a neighboring business runs wireless networks on the same channels, they can interfere with the network in the building you manage. Wireless survey tools can help detect interference.
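The stability check described above can be made concrete by summarizing repeated speed-test results. The following is a minimal sketch: the function name and the sample readings are made up, and the 15% coefficient-of-variation threshold is an assumption you would tune for your own environment.

```python
from statistics import mean, stdev


def stability_report(samples_mbps, max_cv=0.15):
    """Summarize repeated speed-test results.

    `max_cv` (coefficient of variation) is an assumed threshold for "stable".
    """
    if len(samples_mbps) < 2:
        raise ValueError("need at least two samples")
    avg = mean(samples_mbps)
    # Standard deviation relative to the mean; infinite if every run was 0.
    cv = stdev(samples_mbps) / avg if avg else float("inf")
    return {"mean": avg, "cv": cv, "stable": cv <= max_cv}


# Sample readings in Mbps, invented for illustration only.
steady = stability_report([94.0, 96.0, 95.0, 95.0])
flaky = stability_report([90.0, 20.0, 85.0, 5.0])
print(steady["stable"], flaky["stable"])
```

Using a relative measure rather than an absolute one means the same threshold can be applied to fast and slow links alike.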
Bandwidth Limitations
Bandwidth testing needs to be performed at all levels of a network. By doing this, it is easy to determine where points of failure lie. Here are the different levels to consider.
- Wireless clients: Bandwidth limitations on a wireless network may be caused by several factors: Distance from the access point, problems with wireless networking device drivers, the use of a legacy wireless protocol, and more. Test all of the variables from multiple locations and devices.
- Wired clients: These are the main clients on your network. As data flows through the network, each additional hop adds a small delay, but the impact should be minimal (a few milliseconds). If you are experiencing more serious delays, check to see whether your routers are overwhelmed with traffic (in which case you should add routing capacity to handle the demand), or whether certain applications or devices are “hogging” the network (which could be the result of a configuration issue, or a security problem - perhaps attackers have taken control of devices on your network and are flooding it with illegitimate requests).
- Direct connection to switches and routers: If you suspect an issue with a networking device, connect to it directly and test speeds. For best results, measure performance for traffic both entering and exiting the device.
- Modem: If you believe your Internet connection itself is the root of the problem, connect directly to the modem and run network tests against it.
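Once you have a throughput figure for each level, comparing neighboring levels shows where most of the bandwidth is lost. The sketch below assumes you have already collected the measurements by hand or with a speed-test tool; the function name and the sample numbers are hypothetical.

```python
def find_bottleneck(measurements):
    """Return (upstream, downstream, drop) for the largest throughput drop.

    `measurements` is an ordered list of (level, mbps) pairs, starting
    closest to the Internet connection and ending at the client.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two levels to compare")
    worst = None
    # Walk through each pair of neighboring levels.
    for (upstream, up_mbps), (downstream, down_mbps) in zip(measurements, measurements[1:]):
        drop = up_mbps - down_mbps
        if worst is None or drop > worst[2]:
            worst = (upstream, downstream, drop)
    return worst


# Sample measurements in Mbps, invented for illustration only.
results = [
    ("modem", 480.0),
    ("router", 470.0),
    ("switch", 465.0),
    ("wireless client", 120.0),
]
print(find_bottleneck(results))
```

In this made-up data set the largest drop sits between the switch and the wireless client, which would point toward the wireless segment rather than the Internet connection.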
Network Troubleshooting Tools
A variety of network troubleshooting tools are available for helping to assess performance and isolate problems.
On Windows, you can take advantage of various command-line tools, such as:
- ipconfig - this tool helps to determine whether your device has received a proper IP address. If the address starts with 169.254, the device assigned itself an address because it could not reach a DHCP server; try requesting a new one with the ipconfig /release and ipconfig /renew commands.
- ping - use this tool to test whether a specific network host responds to messages sent via the network.
- tracert - for tracing the path that traffic takes to a destination and finding the hop where problems begin, such as somewhere between your router and the Internet.
- nslookup - for checking DNS records and IP address mappings.
- netstat - this tool displays active connections, listening ports and routing table information.
- nmap - use it to scan the network and display information about the IP addresses in use and the ports open on each host.
Most of these tools or their equivalents are available for Linux and macOS as well (for example, ip or ifconfig in place of ipconfig, and traceroute in place of tracert), although they may not be installed by default. nmap is a third-party tool that usually must be installed separately. For more sophisticated testing, tools like IP calculators, speed test websites, port scanners and protocol analyzers can provide an overview of your network and its performance.
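Some of these checks can also be scripted. The sketch below uses only the Python standard library to approximate three of them: spotting a self-assigned address, testing whether a host answers on a TCP port, and resolving a hostname. The function names are made up for this example, and the TCP check is not the same as ping (which uses ICMP), so treat it as a rough analogue rather than a replacement.

```python
import ipaddress
import socket


def is_self_assigned(address):
    """True if `address` is link-local (169.254.0.0/16 for IPv4)."""
    return ipaddress.ip_address(address).is_link_local


def tcp_port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


print(is_self_assigned("169.254.10.20"))
print(is_self_assigned("192.168.1.20"))
```

You could then call, for example, `tcp_port_open("192.168.1.1", 443)` or `resolve("example.com")` against hosts on your own network; the results depend on your environment.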
Finally, specialized hardware devices, including cable testers, time-domain reflectometers (TDRs) and loopback adapters can be useful for testing issues related to networking hardware.
Network troubleshooting is a complex task. Every networking incident is unique and requires a tailored response. However, following a systematic process for determining the nature of a problem and developing a response plan will help you resolve networking incidents before they become critical disruptions for your users.