Category Archives: Layer 2

Misconfigured Servers and Proxy ARP

I worked a migration this weekend from an older Catalyst 4500 to a Nexus 7004 and ran into what turned out to be a very simple issue to fix, but one that confused the hell out of me while I was troubleshooting it.

The config on the old switch was very basic (I’ve included only the related vlan info):

interface Vlan1
 ip address 10.68.1.1 255.255.255.0
 ip pim sparse-dense-mode
 ip policy route-map POLICY10
 ip igmp join-group 239.255.255.255

interface Vlan109
 ip address 10.68.9.1 255.255.255.0
 ip helper-address 10.68.1.4
 ip helper-address 10.68.1.22
 ip wccp web-cache redirect in
 ip pim sparse-dense-mode
 ip igmp join-group 239.255.255.255

Looks relatively simple, right? Right. The topology was not complicated…it should have been pretty cut and dried.

The symptoms we were having are described below:

  • A user in any vlan directly attached to the 7K could not access certain resources on Vlan 1, but some resources on that same vlan were fine. Keep in mind that users had no connectivity issues to these same devices prior to migration.
  • We could reach these resources in vlan 1 (that were not reachable from directly attached vlans) from a remote WAN site with no problem.
  • A ping from the N7K worked fine by default (sourced from the same vlan interface as the destination address), but if you sourced it from another vlan, it failed.

Hmph…weird. Any ideas…care to wager a guess?

So here are some of our findings after some additional digging:

  • Some servers had a subnet mask of 255.255.0.0
  • Some servers had a default gateway of a device that doesn’t exist!?! (but still within the same /16 space)
  • Some servers had the proper subnet mask

Do you detect a theme yet? Have a suspect in mind? (Well, two suspects)

If you guessed proxy-arp, you win a shrubbery. If you guessed proxy-arp and server guys that apparently didn’t understand subnetting, you win two shrubberies.

Proxy-arp, if you are not familiar with it, is a technique in which one host, usually a router, answers ARP requests intended for another machine. By “faking” its identity, the router accepts responsibility for routing packets to the “real” destination. Proxy ARP can help machines on a subnet reach remote subnets without the need to configure routing or a default gateway. Proxy ARP is defined in RFC 1027 (Definition from: here)

Proxy-arp is disabled by default on NX-OS. On the version of IOS that the 4500 was running, proxy-arp was enabled by default.

For example, one server we were investigating in Vlan 1 had an IP address of 10.68.1.21/16. A device in Vlan 9 had an IP address of 10.68.9.5/24. See the problem here?

The server in vlan 1 thinks that the host in vlan 9 is in its own subnet, so it ARPs for that host directly instead of sending the traffic to its default gateway. The 4500 answered those ARP requests on the remote host’s behalf; the N7K, with proxy-arp disabled, does not, so the server in Vlan 1 can’t communicate with the host in vlan 9. It CAN still reach devices outside of its own /16 subnet, since that traffic is sent to the default gateway. That’s why communication to another branch worked fine, but communication to any directly attached vlan failed (that is, any directly attached vlan that falls within the server’s /16).
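The forwarding decision the server makes can be sketched with Python’s standard ipaddress module. This is only an illustration of the on-link test, not anything that ran on the servers; the function name is_on_link and the WAN address 10.70.1.5 are made up for the example, while the other addresses come from the post.

```python
import ipaddress


def is_on_link(host_interface: str, dest_ip: str) -> bool:
    """Return True if the host considers dest_ip part of its own subnet.

    An on-link destination is ARPed for directly; anything else is
    handed to the default gateway.
    """
    network = ipaddress.ip_interface(host_interface).network
    return ipaddress.ip_address(dest_ip) in network


misconfigured_server = "10.68.1.21/16"  # mask found on some servers
corrected_server = "10.68.1.21/24"      # mask that matches interface Vlan1
vlan9_host = "10.68.9.5"
wan_host = "10.70.1.5"                  # hypothetical address outside 10.68.0.0/16

# With the /16 mask the server ARPs for the vlan 9 host itself,
# which only works if the router answers with proxy-arp.
print(is_on_link(misconfigured_server, vlan9_host))  # True

# With the /24 mask the same traffic goes to the default gateway.
print(is_on_link(corrected_server, vlan9_host))      # False

# Traffic leaving the /16 always goes to the gateway, so the WAN site works.
print(is_on_link(misconfigured_server, wan_host))    # False
```

Running the same check against each server’s configured address and mask is a quick way to list which hosts were quietly relying on proxy-arp before the migration.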

There’s a good discussion on the Cisco Support Forums about this (found here), so I won’t go into how the host builds the frame and the related source and destination MAC addresses, but if you want additional clarity, please let me know.

It was really a frustrating issue, and I wish I had thought of this sooner, but it’s one of those cases that you don’t run into often enough to keep it fresh in your mind.


My VLANs…Where Did They Go?

In my line of work, we tend to do network assessments on a fairly regular basis. Customers will request them in order to better understand what’s wrong with their network and how to improve it. This entails digging into the customer’s network, hopping from device to device and reviewing configurations.

One of the things that I cite quite often is a VTP misconfiguration or a failure to follow best practices. Since many customers aren’t intimately familiar with VTP and how it works, a lot of times I’ll end up in discussions on some of the finer points of VTP, and one of those finer points usually ends up being how a misconfiguration can blow away all of your VLANs. I’ll usually mention that even as a VTP client, a switch can still update the VLAN info on a VTP server, given the correct configuration. However, I can never seem to remember what the exact configuration is for this scenario to occur, so here it is.

  1. A trunk link must be present (either statically or dynamically created)
  2. The VTP client switch must have the same VTP domain name
  3. The VTP client switch must have the same VTP password
  4. The VTP client switch must have a higher revision number than the rest of the network

Given these circumstances, a VTP client switch can update your VTP servers with whatever VLANs the client has on it…and that may be all of the same VLANs plus a few new ones, or just the default VLAN!
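The conditions above can be sketched as a toy model in Python. This is a deliberately simplified illustration and not the real protocol: the class VtpState, the function receive_advertisement, and the domain, password and VLAN numbers are all invented for the example, and the trunk link from condition 1 is simply assumed to exist. Note that the switch’s client or server mode never enters the decision, which is the whole point.

```python
from dataclasses import dataclass, field


@dataclass
class VtpState:
    """Hypothetical, stripped-down view of a switch's VTP database."""
    domain: str
    password: str
    revision: int
    vlans: set = field(default_factory=set)


def receive_advertisement(local: VtpState, advert: VtpState) -> bool:
    """Apply an advertisement to the local switch if it is accepted.

    Returns True when the local VLAN database was overwritten.
    Assumes a working trunk between the two switches.
    """
    # Conditions 2 and 3: domain name and password must match.
    if advert.domain != local.domain or advert.password != local.password:
        return False
    # Condition 4: only a higher revision number wins.
    if advert.revision <= local.revision:
        return False
    # The whole VLAN list is replaced, not merged.
    local.vlans = set(advert.vlans)
    local.revision = advert.revision
    return True


# Example values only: a production server and a lab switch joining as a client.
server = VtpState("CORP", "s3cret", revision=12, vlans={1, 10, 20, 109})
lab_client = VtpState("CORP", "s3cret", revision=40, vlans={1})

overwritten = receive_advertisement(server, lab_client)
print(overwritten, sorted(server.vlans), server.revision)  # True [1] 40
```

In this sketch the server ends up with only the default VLAN because the incoming VLAN list replaces the existing one outright, which matches the worst case described above.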