Traceroute doesn't see 90% of the machines your packet passes through.
When your packet leaves a router at some-pop-some-port-wherever, that fiber isn't usually the same piece of glass that plugs into the next hop. There's a whole chain of amplifiers and possibly multiplexers that handle it between here and there.
Some of those provide reliable transport service, giving you the illusion of a fiber that never breaks, despite backhoes doing what backhoes do. Some of those shift the wavelength of your signal, letting you use cheap optics without troubling yourself with the nuances of the DWDM that packs your signal alongside dozens of others onto the same long-haul fiber. Some of those just boost the signal, along with all the others on the same fiber.
But what all those machines have in common is that none of them speak IP. None of them touch the payload. None of them are capable of decrementing a hop count. They're "part of the wire" as far as the packet is concerned.
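That "part of the wire" behavior is the whole reason traceroute sees only routers. A toy sketch (device names invented, no real networking involved) of how increasing-TTL probes skip right past anything that doesn't decrement the hop count:

```python
# Toy model: a path mixing IP routers (which decrement TTL and can reply
# with ICMP "time exceeded") with transparent optical gear (which cannot).
# Device names are made up for illustration.

path = [
    ("router-a", "ip"),
    ("edfa-amp-1", "optical"),   # amplifier: invisible to IP
    ("dwdm-mux-1", "optical"),   # wavelength mux: invisible to IP
    ("router-b", "ip"),
    ("edfa-amp-2", "optical"),
    ("router-c", "ip"),
]

def probe(ttl):
    """Send one probe with the given TTL; return the device that answers."""
    remaining = ttl
    for name, kind in path:
        if kind != "ip":
            continue              # "part of the wire": TTL untouched
        remaining -= 1
        if remaining == 0:
            return name           # this router answers "time exceeded"
    return "destination"

def traceroute():
    hops, ttl = [], 1
    while True:
        hop = probe(ttl)
        hops.append(hop)
        if hop == "destination":
            return hops
        ttl += 1

print(traceroute())  # ['router-a', 'router-b', 'router-c', 'destination']
```

Six devices handle the packet, but only the three that decrement TTL ever appear in the output.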
In my experience, this leads to two types of network engineers, separated by their understanding of these underlying realities.
What's wrong with that? Certainly, someone with a complete picture is "better", but it is effectively two different types of problems. Do they need to be combined?
When everything's happy and packets are flowing as they should, there's absolutely nothing wrong with living at whatever layer of abstraction you wish.
But when something is broken, troubleshooting is a different animal. When people make assumptions based on their abstractions, they can waste a lot of time chasing a problem that can't exist, because the abstraction hides several more layers of complexity. Their mental model of the problem relies on a fantasy which they may not even realize is a fantasy.
It's unlikely for someone to accurately diagnose a fault that's several layers below where they're operating. Understanding that their abstractions are abstractions, and knowing when to hand things off to another layer of engineer, is critically important.
I mention it here because people aren't generally tracerouting things unless they suspect breakage somewhere. It's a troubleshooting tool. But people whose mental model of the network _only_ goes as low as the IP layer, are unlikely to do anything useful with traceroute unless the fault also happens to be at the IP layer.
Sounds like fun :)
You should write up a blog post where you had to debug some weird gnarly issues. A lot of us higher up the tech stack are pretty far removed from the signals level issues you're talking about but would love to hear about a low level debug session!
I would love to read such a blog post as well!
Thanks for the clarity in your response. I think we are pretty well aligned.
It is human stacking of the OSI model. The person in the ditch repairing the line probably doesn't care about traceroute, but the person with traceroute probably should care about what happens in the ditch.
I think you're reading the connotation into this. In my opinion, there is nothing wrong with being a higher level engineer. There's plenty of work "above the fold" so to speak.
For example, I think it behooves every software engineer to have a general grasp of how CPUs work, what speculative execution is, how CPU caching and invalidation works, etc, but the average webdev doesn't really need to know this, and might run into some abstraction breaking implications only a few times in their career, while debugging a tricky bug or performance regression.
I imagine something similar is true for network engineers. Likely many can work for years at a time without worrying about fiber signal repeaters, other than that one weird packet loss issue that ends up getting traced back to a marginal optic in a cable vault somewhere.
Of course, none of this applies to the compiler engineers or the people who build the physical network layer. They are in the "second type" of engineer that actually needs to understand this stuff in depth in order to do their jobs on a day to day basis.
I am reading the connotation into this and asking about it. 4 paragraphs of talking only about tech and then it diverged into a personal statement at the end.
What I meant by that statement is that whatever you're asking about is actually coming from you. I don't think the simple statement in the comment actually contains any connotation. Just like the old programmer joke - there are 10 kinds of people in the world, those who understand binary and those who do not. We just like dividing people into classes because it helps us understand the world.
I guess to engage with this a bit more, one reason you might find a negative connotation towards engineers who are ignorant of the underlying details is that it is the larger set. After all, most engineers do not need to understand the physical layer in depth, so there are fewer who do. People love to feel like they're part of a smaller "higher" class, for complex social reasons. This sometimes comes with a bit of a distaste towards those who are part of the "regular" class.
But overall, I think you're taking this further than it really needs to be taken. The GP was just saying that not all network engineers are wizards in the technical details of lower layers, just like not all software engineers can write a compiler. Does that make them "worse" in some way? Well, it makes them a worse fit for those few specific jobs that require that extra knowledge, but in general, I don't think so, and I doubt most people do either.
That's why we have abstractions.
The OSI model exists for a reason.
You don't think about the life of the electrons going through your processors when you code.
Traceroute is a view at a certain level of abstraction. It also doesn't tell you if your packet was delivered using ethernet, wifi or a token ring. It just doesn't matter.
The OSI model hasn't been an accurate representation of IP networking since pretty much day 1. It was made specifically for a different protocol suite, and in the stack we use today some layers are better split in two, while some protocols exist at multiple layers. It's a nice metaphor, but I think it's time to drop it!
https://computer.rip/2021-03-27-the-actual-osi-model.html
Something I've noticed is that the encapsulation aspect somehow ends up lost when people learn "the model".
I don't know if it's missing in people's course work or what, but I've had to use http://www.tcpipguide.com/free/diagrams/ipencap.png many a time to explain how stuff like VPNs work, correct statements like "firewalls don't have a routing table, firewalling is layer 4", explain things like MTU and payload size, or why certain traffic normally doesn't go beyond a broadcast segment.
Personally I think this is one of the better visualizations.
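The nesting that the diagram shows can also be sketched in a few lines of code. This is a hedged toy model (placeholder header strings, not real wire formats) showing that a VPN is just encapsulation applied one more time: the whole inner IP packet becomes the payload of an outer packet.

```python
# Toy encapsulation sketch: each layer wraps the one above as its payload.
# Header strings are placeholders, not real protocol formats.

def wrap(header, payload):
    return {"hdr": header, "payload": payload}

http = wrap("HTTP GET /", "hello")
tcp  = wrap("TCP sport=51000 dport=80", http)
ip   = wrap("IP 10.0.0.5 -> 93.184.216.34", tcp)
eth  = wrap("Ethernet aa:bb -> cc:dd", ip)

# A VPN tunnel just encapsulates again: the entire inner IP packet
# rides as the payload of an outer encrypted packet.
esp      = wrap("ESP (encrypted)", ip)
outer_ip = wrap("IP 198.51.100.7 -> 203.0.113.9", esp)

def layers(pkt):
    """List the header types from outermost to innermost."""
    out = []
    while isinstance(pkt, dict):
        out.append(pkt["hdr"].split()[0])
        pkt = pkt["payload"]
    return out

print(layers(eth))       # ['Ethernet', 'IP', 'TCP', 'HTTP']
print(layers(outer_ip))  # ['IP', 'ESP', 'IP', 'TCP', 'HTTP']
```

The second printout is the "IP inside IP" picture that explains both why VPN traffic traverses firewalls as ordinary packets and why tunnels eat into the MTU.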
OSI was supposed to be a competitor to IP and Ethernet. That's the reason it exists.
It just doesn't matter until it does. It's fine to work at a higher level of abstraction. But people who understand a lower level of abstraction can do things people will call "impossible" with fault injection exploits, rowhammer etc.
To clarify my previous post, asymmetric routing is strictly an L3 behavior, and ECMP routing can also be an L3 behavior where a router chooses one of many equal-cost next hops based purely on data in the IP headers. The exact behavior of course depends on the ECMP load-balancing algorithm in use, whether it's per packet, per destination, or using a hash. And furthermore whether it's strictly IP or if it looks deeper into the packet and uses L3+L4 headers in its decision making.
Both asymmetric routing and ECMP routing are visible from L3. In the latter case, the routing decision can utilize some L4 data, so in practice some L4 frobbing is necessary to get useful data points for real-world diagnosis.
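The flavor of that hash-based decision can be sketched as follows. This is a hedged illustration only: real routers use vendor-specific hardware hash functions, and the next-hop addresses here are invented.

```python
import hashlib

# Illustrative ECMP next-hop selection: a flow tuple deterministically
# maps to one of several equal-cost links. SHA-256 stands in for
# whatever vendor-specific hash a real router uses.

NEXT_HOPS = ["10.0.1.1", "10.0.2.1", "10.0.3.1", "10.0.4.1"]  # made up

def pick_next_hop(src_ip, dst_ip, proto, sport=0, dport=0):
    """Hash the 5-tuple (degrading to a 3-tuple when ports are absent)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return NEXT_HOPS[digest[0] % len(NEXT_HOPS)]

# The same flow always takes the same link (no per-flow reordering) ...
a = pick_next_hop("192.0.2.1", "198.51.100.9", "udp", 40000, 33434)
b = pick_next_hop("192.0.2.1", "198.51.100.9", "udp", 40000, 33434)
assert a == b

# ... but changing an L4 field may land on a different link entirely.
c = pick_next_hop("192.0.2.1", "198.51.100.9", "udp", 40001, 33434)
print(a, c)
```

Keeping the hash per-flow rather than per-packet is what preserves in-order delivery within a flow while still spreading load across links.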
I agree with others that the OSI model is a good metaphor and a framework for reasoning about networking, but it is far from perfect, and the reality for those designing and operating network protocols and devices is messy.
MPLS is admittedly invisible and there isn't a thing you can do about it in the same way that you can't expect traceroute to give you a view of the switch ports it went through on a LAN. Of course it is useful to understand and keep in mind the fact that there may be, sometimes huge, gaps in your traceroutes. A sudden huge jump in RTT from one hop to the next can be confusing when trying to understand and troubleshoot a network issue.
CCNA baby
The more you know about how something works the better equipped you are to handle things breaking. It's a safe bet that semiconductor physics and the gate-level construction of CPUs isn't necessary to be a good programmer, but not much further up that stack are things like understanding superscalar processor architecture, how caches work, how CPU protection levels work, etc. Knowing about those things, for sufficiently performance or security-intensive applications, can make a ton of difference.
There's an analogy to networking there, too. You don't necessarily need to know how wave-division multiplexing, BGP, or DNS work to communicate over the Internet. For some categories of problems, though, a little bit of knowledge allows you to punch just a bit above your level.
Not sure why this matters for anyone not working for the ISP that those packets go through, and people working there would know about it. And they're mostly completely passive devices like filters.
Traceroute is what we have, and it's mostly enough: you can say at which hop things went wrong, and then it's up to the ISP to figure out which devices they need to look at.
It definitely helps to understand it when you are trying to explain to the ISP that it’s their problem. Knowing what you’re talking about will get your issue resolved much more quickly.
That depends who's doing the talking. You're a large business with a dedicated connection and your own network engineering team? Yeah, totally matters.
You're a single consumer, or a small business? Your dialog is limited to "Yes, I switched the router on and off. Yes, my local network works. No, I can reach the website over my phone, this is your problem".
Repeat three times, then get the inevitable reply of "it'll be up again in the next 24 hours".
Unless you're a network engineer, this really doesn't matter.
How does it help the ISP if you're just guessing what they might have between two routers?
In traceroute’s defense, it is traceroute — for sure it doesn’t tell you anything about the devices that don’t operate at the IP level. Those devices either don’t affect the IP “route” abstraction (e.g., signal boosters) or do so in ways that end up plausibly visible in the next hop.
There’s a reason the network layer abstraction is so strong, and an analogy to CPU ISAs here that have a similar strength.
TCP, similarly, doesn’t tell you when packets are deduplicated/resent/reordered/etc. — that’s just not part of the presented abstraction. Want that? Use UDP.
"It doesn't show me the local digital loop carrier!"
"Is the digital loop carrier doing any IP-level routing?"
"No, but..."
The smart man knows all of these things exist. The wise man understands when these things matter.
DWDM protected waves, MPLS, etc., are all out of scope when someone does a traceroute, finds their traffic is going to Europe, and asks me about it, and then I see a local upstream has picked up Cogent as a transit provider, and Cogent is preferring their customer routes over routes from their Tier 1 settlement-free peers.
But this article is very much a primer, it's using the unix format of traceroute and talking about ICMP.
You don't know how many times I've seen people claim that they had two redundant connections, when those connections clearly rode the same conduit across a bridge not half a mile down the road. No, I don't care what your traceroute says, the backhoe knows otherwise.
as a predominantly wave-purchaser than builder, it matters when it matters and it doesn’t when it doesn’t.
we get by fairly well assuming l1 is working as designed when the light is on and not throwing errors.. at least on day-to-day ops. planning/sourcing is a bit of a different thing.
one note to your point though, back when people were still regularly picking up OC3s or doing frame relay or whatever, it was certainly more day-to-day to understand these things, but can't really fault anyone since most of that junk has gone away and we're left with the happy ethernet handoff. especially in small scale DC/enterprise. cloud, too, made us care less.
And on the active networking component side of things it doesn't touch on MPLS which also doesn't modify the IP headers. You can enter a network in New York and get MPLS switched across the country via active network devices all the way to California and have it show up as a single hop on traceroute.
The explanation is great for a toy network, but in today's Internet the vast majority of routes are going to be asymmetrical, and that requires running traceroutes from both ends and interpreting the results to find the faulty hop.
The author also doesn't cover equal-cost multipath (ECMP) routing, which is everywhere. With ECMP you have multiple ports that lead to the same next hop, and packets are hashed based on some part of the four-tuple, sometimes the five-tuple including the input port. In order to track down a faulty link, you need to probe each and every one of the ports, which requires that you use a higher-level protocol like UDP. Using ICMP in this case will not show you the issue some percentage of the time, producing false negatives that make it less useful.
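The false-negative problem can be made concrete with a toy sketch. This is hedged: the hash is a stand-in for whatever the router actually computes, and the addresses and link names are invented. The point is only that port-less ICMP probes all hash identically, while UDP probes that sweep the source port fan out across the ECMP members.

```python
import hashlib

# Toy ECMP member selection for a fixed src/dst pair; "LINKS" are the
# equal-cost members, one of which might be the faulty one.
LINKS = ["link-0", "link-1", "link-2", "link-3"]

def link_for(sport, dport):
    key = f"198.51.100.1|203.0.113.5|{sport}|{dport}".encode()
    return LINKS[hashlib.sha256(key).digest()[0] % len(LINKS)]

# ICMP probes carry no L4 ports: every probe hashes the same way,
# so only ONE member link is ever exercised, however many you send.
icmp_links = {link_for(0, 0) for _ in range(100)}
print(len(icmp_links))  # 1

# UDP probes can vary the source port and so spread across the members;
# enough distinct ports will (almost surely) touch every link.
udp_links = {link_for(sport, 33434) for sport in range(40000, 40064)}
print(sorted(udp_links))
```

This is essentially the rationale behind Paris-traceroute-style tools: hold the fields the hash uses constant when you want a stable path, and sweep them deliberately when you want to enumerate paths.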
Below the IP layer there is always the physical layer somewhere with all the complications.
Traceroute also doesn't show VPNs or other transports.
But it shows the IP hops and the timing, and thus gives me the tools to see who to call, or which routing table or wire to analyse or fix. When doing that, I get to look at that connection.
Right, and there is also the issue of BGP, the Border Gateway Protocol, in the core of the Internet.
Which command goes beyond the "traceroute" limitations you mentioned?
A lot of ISP internal networks is going to be running over MPLS or encapsulation. You're not going to see any of those hops. The packet will just look like it teleported from the CPE to the Internet.