Hacker News

Wireshark is the proverbial hammer that makes all networking problems look like nails. Even if there's a more specific tool available there's a good chance I can swing Wireshark at the problem and figure it out.

It continues to blow my mind how many people there are in the world who consider themselves networking professionals but have never used or do not understand Wireshark. It is possibly the most important tool for actually understanding what's really happening in your network; without it you're effectively blind to so many things.

Just yesterday I used it to troubleshoot a weird behavior in a recently upgraded Asterisk/FreePBX system which would have probably taken me days to guess my way through without packet captures, but with them I was able to see clearly what was happening on the network and then track that back from there.

Congrats to everyone on the Wireshark team on 25 years of making network troubleshooting infinitely easier! I would 100% not be where I am today without it.



It's like a debugger for networking, and surprisingly many programmers don't know how to use debuggers either.


It's more like strace but yeah.


And a lot of programmers don’t know about strace either


I can confirm, I didn't know about strace until this very moment. Looking at it, it basically only intercepts system calls? How often is that useful? What do people use it for?


It answers, or at least gives the definitive first clue behind, a huge number of slowdowns or apparent hangs, given how many of those are actually blocking resource waits or retry loops gone mad.

It's probably most useful to sysadmins working with binaries, but even if you do have the source, it's usually a shorter path to the solution for any app/os interaction problem.

It's useful for certain classes of optimisation and tuning, because it will give timings and aggregate timings.

I'll use it for things as simple as "where is this program reading its config files" - often useful when doco is poor and/or there are multiple config locations selected by conditional logic.

There's an "ltrace" as well, for shared-library tracing, although I've personally found that less useful - the bugs it shows are more likely to be code/logic problems rather than os/infrastructure interaction - which is to say, usually outside my job scope.

On commercial unix, the equivalent to strace is truss, and it's been around forever.

Like many, Wireshark and strace/truss are my go-to tools for a huge amount of troubleshooting.
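For the timing/tuning use specifically, a quick sketch of the flags involved (./myprog is a placeholder for whatever you're investigating):

```shell
# per-call timings: -T appends the time spent inside each syscall,
# -r prefixes a relative timestamp between consecutive calls
strace -T -r ./myprog

# aggregate view: -c prints a summary table (time, calls, errors
# per syscall) instead of the call-by-call log
strace -c ./myprog
```

The -c summary is the one I reach for first when something is "mysteriously slow" - one glance tells you whether the time is going into reads, polls, or something else entirely.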


Program shits itself randomly during execution, crashes, and doesn't tell you why. strace it. Oh, it turns out it's trying to execve() a binary that doesn't exist on the system, which wasn't documented as a dependency, so I didn't install it. Fixed.

Lots of little things like that. Why is this program acting slow at startup when it should be fast? Oh, because it's opening and timing out on a socket connection with an unusually long timeout. Et cetera...
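A minimal sketch of that kind of hunt (the program name is a placeholder and the commented output lines are illustrative, not real output):

```shell
# follow forks (-f), trace only process creation and network
# connects, and time each call (-T); failed calls show their
# errno right in the output
strace -f -T -e trace=execve,connect ./flaky-prog
# the culprit tends to jump out, e.g. something like:
#   execve("/usr/bin/helper", ...) = -1 ENOENT (No such file or directory)
#   connect(3, ...) = -1 ETIMEDOUT (Connection timed out) <30.000123>
```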


The most fun I've had with strace was debugging a 3-process deadlock. An snmp daemon was blocked waiting for a cli child process to finish, the cli was waiting for a response to a message on a socket it had open with a routing protocol daemon, which was waiting for a response from the snmp daemon.
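For already-stuck processes like those three, attaching with -p is the usual move (1234 is a placeholder PID):

```shell
# attach to a running process and see what it's blocked on;
# -f also follows any children it has forked
strace -f -p 1234
# a deadlocked process typically shows one pending call, e.g.:
#   wait4(5678, ...     blocked waiting on a child
#   recvfrom(7, ...     blocked waiting on a socket peer
```

Doing that to each process in turn is how you piece together a cycle like the one above.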

It is also a great way to figure out why programs without useful debug output die. Ie. after a program opens and reads a config file it doesn't like, it starts cleaning up and exits.


I recently fired it up to quickly check which headers a crosscompiler used on a specific compilation unit. strace, grep, sort, done. I also use it as first check if something seems to hang. Sometimes you can see lock files trying to be acquired or access to wrong paths.
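Roughly what that pipeline looks like (the cross-compiler name and source file are placeholders; strace logs to stderr, hence the redirect):

```shell
# which headers does this compilation unit actually pull in?
strace -f -e trace=openat arm-linux-gnueabi-gcc -c foo.c 2>&1 \
  | grep -o '"[^"]*\.h"' \
  | sort -u
```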


Just recently I found something performing way better for that task: https://brendangregg.com/blog/2014-07-25/opensnoop-for-linux...


`strace -f -eopen,openat` to see which files a program opens. Very often useful, even if just to check which config file(s) a program reads.


I've used it on occasion to try and find why some app was erroring; typically the app would catch an exception (or ALL of them), and just die with something like "no."

For example if the open call fails with ENOENT, the file it's looking for doesn't exist, and strace will (also) tell you what file it's trying to open.


I use it constantly. I can't even imagine debugging some failures without strace. It's great for servers that don't log things but fail to load some config file (which can be debugged by inspecting the return codes of open() calls).
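If your strace is new enough (5.2+), -Z filters the output to failing calls only, which is a shortcut for exactly this case (the server name is a placeholder):

```shell
# show only syscalls that returned an error - failed config
# lookups jump straight out of the noise
strace -Z -e trace=open,openat ./quiet-server
# e.g. a line like:
#   openat(AT_FDCWD, "/etc/quiet-server.conf", O_RDONLY) = -1 ENOENT
```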

There is also ltrace, for library calls, although I find it less useful.


It lets you see what a program is doing, where doing means any “effects” of the program that touch the system. Want to see what's happening to files? You can strace for certain operations to certain files. Weird shitting the bed involving IO? strace will illuminate the problem.


As well as all the other uses, it can be great for a quick way to see what files a program is trying to access. E.g. some undocumented binary where it's not clear where its config should be, strace will quickly show what it tries to access.


it's useful when everything else you've tried failed and you have no clue what is going on. it's extremely helpful in figuring out why a program hangs or crashes. it's a good tool to have in the toolbox


Do you have any favorite sources for really understanding Wireshark? I’m not a networking professional per se, but I’m network-adjacent and I’ve dabbled in Wireshark from time to time. I can see the power, but it’s also one of those tools that’s totally overwhelming when I first approach it unless I have a very small, very specific problem. Or is it one of those tools that you learn as you need it?


There’s a million little features and tricks you can do, but you’ll never stumble into them unless you’re actively googling “how do I …”.

You might look for some pcap based CTFs with walkthroughs to get exposure to some of the more unique things you can do.

Just letting it run for a few min on your router and then powering a device up can also yield some interesting captures…
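A minimal tshark version of that experiment, if you'd rather capture headless and open the result in Wireshark afterwards (the interface name and output path are placeholders):

```shell
# capture everything a freshly powered-on device says for 5 minutes,
# then stop automatically and write a file Wireshark can open
tshark -i eth0 -a duration:300 -w /tmp/boot.pcapng
```

Watching what a "dumb" appliance broadcasts on boot is a surprisingly good first Wireshark exercise.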


Unfortunately I can't really help there, I'm a "learn by doing" type of person who just jumps in the deep end and hopes he figures out how to swim.

Most of my learning was just "capture the problem happening, capture what happens when it works right if possible, open up the relevant RFCs, then try to understand what's different and why."

I work in the VoIP industry so I'm dealing with a lot of NAT problems (insert rant here about lazy ISPs that still haven't enabled IPv6 on their networks) and my main protocol (SIP) is heavily inspired by HTTP and as a result is more or less human readable plaintext, so it was a relatively easy learning curve to just have Wireshark open on one side of the screen and the relevant RFCs on the other side.
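For the curious, pulling the SIP signalling out of a saved capture looks roughly like this (the filenames and Call-ID are placeholders):

```shell
# read a saved capture and show just the SIP messages
tshark -r calls.pcap -Y sip

# narrow to a single call by its Call-ID header
tshark -r calls.pcap -Y 'sip.Call-ID == "abc123@host"'
```

Because SIP is plaintext, the decoded view in Wireshark is nearly identical to what's on the wire, which is what makes the side-by-side-with-the-RFC workflow so effective.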

All I can really say is have a problem you want to solve and start from there.


Sounds reasonable to me, thanks. That’s how I always end up learning, but sometimes I wonder if there’s a better way.


> Just yesterday I used it to troubleshoot a weird behavior in a recently upgraded Asterisk/FreePBX system which would have probably taken me days to guess my way through without packet captures

Do you mind sharing with us what was the problem and how you solved it with packet captures, if you have time? A blog post would be very interesting too.


For something like this Firefox bug [1], getting down to pcaps helps determine where the problem is. A client spinning on a request that the server doesn't know about could be a server problem, a client problem, or a network-in-the-middle problem.

In this case, the problem was the client wasn't actually sending the request, and with a sizable request that's visible even without decoding the https; although to be totally clear on what was happening, decoding was needed.

I've also debugged issues in remote networks where, iirc, connections were being reset by some equipment local to the user. Seq/ack sequencing showed the resets were in response to a specific client-sent packet, and the timestamps showed it was impossible for that to have come from anywhere but equipment near the user.

For this bug [2], it took a lot of luck and patience to get a good capture, but once I did, the immediate problem became obvious: the machine I controlled was getting an icmp needs frag but DF set at the same mtu it was already using, and responding by sending the whole sendqueue at once, packetized to the new MTU that was the same as the old one. There's actually three problems here: a) there's no reason for the other side to send this packet (I found this is an already fixed linux bug with forwarding and large receive offload, but no way to contact the administrator of that router), b) our side shouldn't resend the whole sendqueue when the mtu changes, c) if the mtu didn't change, then there's no need to take any action. We only fixed c, but that solved the major problem: these resends would trigger more resends and we'd have periods of unavailability as the network was really busy.
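If anyone wants to hunt for the same symptom, those needs-frag messages are easy to isolate in a capture (the filename is a placeholder; icmp.mtu is the advertised next-hop MTU field):

```shell
# ICMP type 3 code 4 = "fragmentation needed and DF set";
# print who sent it and the MTU it advertised
tshark -r cap.pcap -Y 'icmp.type == 3 && icmp.code == 4' \
  -T fields -e ip.src -e icmp.mtu
```

If the advertised MTU matches the path MTU you're already using, you're likely looking at the same broken-middlebox pattern described above.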

This is pretty common when looking at wireshark; unless you work somewhere with full control of all clients and servers and a very network aware developer team, you're going to find lots of non-optimal or semi-broken stuff, and you've got to ignore it and focus on the majorly broken bit.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1740856

[2] https://reviews.freebsd.org/rS288412


I had to troubleshoot an issue where several network routers restarted as a group, without any apparent cause, but only when connected to the big WAN. The problem was a network discovery tool which, when poorly configured, would send SSH connection attempts to the management interface, and a bug in the specific firmware would crash the router.

A network capture was the only good clue.


I'm not much of a blogger, but here's the short version. If anyone happened to be on #freepbx yesterday morning they might already have seen this.

I had just upgraded and migrated one of my clients from an on premise FreePBX system that was a few years out of date and running on a repurposed desktop computer with a failing fan to a brand new instance running on a VPS. Everything was working fine with basic phone functionality, but their main ring group was taking a few seconds to stop ringing when answered. Calls would ring in to all phones effectively simultaneously as expected, but when someone answered the call certain phones kept ringing for almost four full seconds after that point.

In the past I had seen similar behaviors on AT&T DSL caused by their mandatory modem/router device having an anti-flood filter enabled by default which saw a bunch of nearly identical UDP packets hitting at once and dropped them after the first few. This site has cable internet through a dumb modem so I knew it wasn't that, but they had recently had their IT side taken over by a new company who put in a new firewall so that was a plausible answer.

Their IT however had been taken over from us so I wasn't about to go accusing them of getting it wrong without strong evidence. I'm also just that kind of person, I hate when someone blames me or my gear for problems we're not causing so I do my best to never be that guy either. I'll waste an extra few hours of mine any day of the week to be sure I'm not accusing someone else of getting it wrong without a reason.

I fired up sngrep on the server, waited for a call to come in, and saved all the SIP sessions that resulted. Download that file, load it up in Wireshark, and I see that while the INVITE messages to start ringing all went out more or less simultaneously (27 phones in ~5ms) the CANCEL messages that stop them from ringing once one answered were sent out sequentially, with the PBX waiting for the first one to respond and confirm it had stopped ringing before sending the next. Clearly this wasn't right, and it obviously wasn't a problem with the firewall either.
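For anyone wanting to reproduce that kind of timing analysis from a saved capture, a rough tshark one-liner (the filename is a placeholder):

```shell
# relative timestamp, method, and target for every INVITE/CANCEL,
# to see whether they go out in a burst or one at a time
tshark -r calls.pcap \
  -Y 'sip.Method == "INVITE" || sip.Method == "CANCEL"' \
  -T fields -e frame.time_relative -e sip.Method -e sip.r-uri
```

Sorted by timestamp, a burst of INVITEs followed by slowly spaced CANCELs is exactly the sequential-vs-parallel pattern described above.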

At that point I started looking at the Asterisk logs and saw that an AGI script was being run for each line that was ringing which wasn't there previously. That script was associated with a new FreePBX module for missed call notifications which was installed but unconfigured on the new server. It didn't indicate it was doing anything in the UI, but it sure seemed to be doing something in the logs.

I uninstalled that module and the next call all the CANCEL messages went out in ~5ms just like the INVITEs. I then filed a bug with FreePBX documenting what happened because I'm pretty sure it's not expected or desired for simply having that module installed to cause massive delays in ring groups.

---

In this case the packet captures demonstrated conclusively that the problem was on the server itself and not in the network. If the capture at the server had looked reasonable my next step would have been to have the IT vendor capture traffic on their firewall at the same time as I was capturing at the server so we could compare and see if it's getting messed with along the way, but here it was not necessary.

Like toast0 mentioned, captures help you narrow down where the problem is.


Forget something as specific and hardcore as networking. I had a week to build a nodejs poc of a legacy spring/java app/service in Amazon that was doing a bunch of service-to-service auth with some Tibco messaging. I couldn't find any open implementations of Tibco clients (around 2013) and the frugality leadership principle meant getting an official spec would be almost impossible. I just needed a few details of the packet structure on a couple of requests. You can guess which tool saved the day for me here! The Principal Eng at the time was surprised such a tool existed!


Fiddler is awesome too.


Fiddler classic is what I always go for.



