Fixing Tailscale IPv6 Routes On Linux With VRFs

Hey guys, ever run into a head-scratcher where your Tailscale IPv6 connectivity just seems to vanish on your Linux box, especially when you’re rocking a VRF configuration? You’re not alone! This can be a particularly tricky issue to diagnose, but don't sweat it, we're going to dive deep into understanding why this happens and, more importantly, how to fix it. We're talking about a classic case of network routing rules getting a little too clever for their own good, specifically when ifupdown2 moves around some critical IPv6 route lookup priorities. This isn't just a minor glitch; it can make your local node's IPv6 address effectively unreachable, both within your local network and from the wider Tailnet. So, if you're experiencing mysterious IPv6 black holes on your Linux system with VRFs and Tailscale, grab a coffee, because we're about to untangle this beast together. Our goal here is to make sense of the intricate dance between network interfaces, routing tables, and VPNs like Tailscale, ensuring your IPv6 traffic flows smoothly. We'll break down the technical jargon into easy-to-digest chunks, explain the core problem, walk through how to reproduce it, and then, most importantly, provide you with actionable fixes and workarounds to get everything back online. So, let’s get started and bring that IPv6 connectivity back to life!

Understanding the Core Problem: IPv6 Route Lookup on Linux with VRFs and Tailscale

Alright, let's kick things off by getting to the root cause of this pesky Tailscale IPv6 route lookup issue on Linux systems that are rocking a VRF configuration. When you’re dealing with advanced networking setups, particularly with Virtual Routing and Forwarding (VRF) instances managed by tools like ifupdown2, the standard Linux IP routing rules can get a bit shuffled. Typically, Linux has a very sensible default: the local table, which contains routes for local IP addresses (like your server's own IP), is checked at the highest priority (priority 0). This means that any traffic destined for your server's own IP is immediately matched and handled locally. However, when ifupdown2 comes into play with VRFs, it often relocates this crucial local rule. Instead of sitting pretty at priority 0, it gets bumped way down to priority 32765. This change, while seemingly innocuous for some scenarios, creates a significant IPv6 routing conundrum when combined with Tailscale’s network magic. For IPv4, this isn't usually an issue because Tailscale typically inserts specific routes for individual remote IPv4 nodes into a dedicated table (like table 52). Your local IPv4 address isn't usually covered by these broader Tailscale routes, so traffic to your local IPv4 still correctly falls through to the local table at its new, lower priority and gets processed. It's a subtle but important distinction that keeps IPv4 working as expected.
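To make that contrast concrete, here is a sketch of what ip -6 rule show typically prints on a plain Linux system running Tailscale without any VRFs (the Tailscale-installed rule priorities are taken from the affected system shown later in this article and may vary slightly between versions, so treat the numbers as illustrative):

# Stock system, no VRFs: the local rule keeps its default priority 0
$ ip -6 rule show
0:	from all lookup local
5210:	from all fwmark 0x80000/0xff0000 lookup main
5230:	from all fwmark 0x80000/0xff0000 lookup default
5250:	from all fwmark 0x80000/0xff0000 unreachable
5270:	from all lookup 52
32766:	from all lookup main

Because the local rule still sits at priority 0, a packet addressed to the host's own IPv6 address is matched and delivered locally before rule 5270 ever gets a chance to send it into table 52.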

Now, here’s where the IPv6 problem rears its head. Tailscale, in its effort to simplify routing for the entire Tailnet, inserts a single, broader route for its entire Unique Local Address (ULA) prefix (e.g., fd7a:115c:a1e0::/48) into that same custom table, table 52. This is incredibly efficient for general Tailnet traffic. However, because your local node's own Tailscale IPv6 address (e.g., fd7a:115c:a1e0::53) falls squarely within this larger ULA prefix, the route for fd7a:115c:a1e0::/48 in table 52 becomes a match for traffic destined to your local IPv6 address. Since table 52 is consulted at a higher priority (e.g., rule 5270) than the relocated local table (rule 32765), any packet trying to reach your node's IPv6 address first matches the broader Tailscale route. This route instructs the system to send the packet through the tailscale0 interface, rather than recognizing it as a packet destined to the local host's own address. Essentially, the system thinks it needs to forward the packet out the interface to another peer, even though the destination is itself! This effectively makes your node's Tailscale IPv6 address unreachable, both for other devices on your Tailnet trying to connect to you and, bizarrely, even for applications running on the same machine trying to bind to or connect via that IPv6 address. Understanding this hierarchy of rules and how Tailscale's ULA prefix interacts with ifupdown2's VRF modifications is absolutely crucial to grasping why this specific IPv6 routing problem occurs and, subsequently, how we can effectively tackle it. It's a classic example of a routing rule intended for efficiency causing an unexpected side effect in a complex networking setup.
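If you want to see the offending entry for yourself, you can dump Tailscale's custom table directly. The output below is only a sketch using the example prefix from this article; your exact addresses, metrics, and any additional entries will differ:

# Tailscale's IPv6 routes live in table 52; note the single broad /48
$ ip -6 route show table 52
fd7a:115c:a1e0::/48 dev tailscale0 metric 1024 pref medium

There is no more-specific entry here for the node's own address, so the /48 swallows it, and because rule 5270 is consulted before the relocated local rule, this is the route the kernel picks.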

The Nitty-Gritty: Why IPv6 Breaks with Tailscale in VRF Environments

Let’s really dig into the specific mechanics of why this Tailscale IPv6 routing issue manifests when VRFs are configured and ifupdown2 is managing your network interfaces. The core of the problem, as we touched upon, lies in the altered priority of the local route lookup table. When ifupdown2 initializes VRF interfaces, it systematically shifts the default "from all lookup local" rule from its pristine priority of 0 to priority 32765, which the kernel evaluates much later. To illustrate, let's look at the output of ip -6 rule show on such a system:

1000:	from all lookup [l3mdev-table]
5210:	from all fwmark 0x80000/0xff0000 lookup main
5230:	from all fwmark 0x80000/0xff0000 lookup default
5250:	from all fwmark 0x80000/0xff0000 unreachable
5270:	from all lookup 52
32765:	from all lookup local
32766:	from all lookup main

Notice that rule 5270 directs all traffic to lookup 52, while our critical local table is only checked at 32765. This order is the culprit. Now, let's consider how Tailscale sets up its IPv6 routes. When Tailscale connects, it creates a tailscale0 interface and populates table 52 with routes that ensure connectivity to your Tailnet peers. Critically, it inserts a broad route for its entire ULA prefix, something like fd7a:115c:a1e0::/48 dev tailscale0 metric 1024 pref medium. This single entry covers all possible IPv6 addresses within that Tailscale subnet. The issue arises because your local machine's own Tailscale IPv6 address (e.g., fd7a:115c:a1e0::8601:b017) also falls directly within this /48 prefix. So, when a packet arrives destined for your machine’s own Tailscale IPv6 address, the routing decision process unfolds like this:

  1. The kernel starts evaluating rules from the lowest priority number up.
  2. It quickly reaches rule 5270: from all lookup 52. This rule is consulted long before rule 32765: from all lookup local.
  3. Inside table 52, the kernel finds the route fd7a:115c:a1e0::/48 dev tailscale0. Since your machine's own IPv6 address is part of this prefix, this route is a match.
  4. The system interprets this match as an instruction to forward the packet out of the tailscale0 interface toward another peer, rather than recognizing the destination as one of the host's own addresses and delivering it via the local table. You can confirm this with ip -6 route get, as sketched below.
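A quick way to confirm which path the kernel actually chooses is to run ip -6 route get against the node's own Tailscale address. The commands below are a sketch using the example address from this article, with the output trimmed and approximate (exact fields vary by iproute2 and kernel version):

# Affected VRF system: our own address is resolved via table 52 and tailscale0
$ ip -6 route get fd7a:115c:a1e0::8601:b017
fd7a:115c:a1e0::8601:b017 from :: dev tailscale0 table 52 src fd7a:115c:a1e0::8601:b017 metric 1024 pref medium

# Healthy system: the same lookup lands in the local table and is delivered on lo
$ ip -6 route get fd7a:115c:a1e0::8601:b017
local fd7a:115c:a1e0::8601:b017 dev lo table local src fd7a:115c:a1e0::8601:b017 metric 0 pref medium

If your output looks like the first example, you are hitting exactly the misrouting described above.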