Friday, June 13, 2008

ERS 8648GTR duplex mismatch

I came across a bulletin from Nortel recently that I thought was important enough to post here in case anyone reading this has 8648GTRs installed in their ERS 8600 chassis. I have about 10 of them installed at multiple locations, primarily in core switches feeding large server farms and other high-speed devices. I don't think I need to spell out the pain that auto-negotiation sometimes inflicts on network engineers. While modern network switches and NICs are definitely more compatible with respect to auto-negotiation, problems still arise from time to time. According to the bulletin, a duplex mismatch on one port of the 8648GTR can potentially impact communication on up to 24 ports. Here's the text from the bulletin:

Background:

An Ethernet port can operate either in Full or Half Duplex mode. A duplex mismatch is created when using inconsistent settings for duplex mode, i.e. full duplex on the port and half duplex on the connected device (or vice versa). This situation is most likely created when using inconsistent and inappropriate settings for auto-negotiation, i.e. auto-negotiation enabled on the port and disabled on the device connected to the port (or vice versa). The duplex mismatch problem can be corrected by setting consistent duplex mode on both the port and the connected device when hard setting the duplex mode or by enabling auto-negotiation on both the port and the connected device, when using auto-negotiation.

Ethernet ports of most devices today have auto-negotiation enabled as the default setting. When a device with auto-negotiation disabled is connected to a port that has auto-negotiation enabled, the port is not able to detect the duplex setting of the connected device and falls back to half duplex thus potentially causing a duplex mismatch. A duplex mismatch will cause physical layer errors and performance degradation of the connection. Any mixture of auto-negotiation enabled on one-side and auto-negotiation disabled on the other side is an "unsupported" configuration. The setting on both sides of any connection must match for proper operation. A problem has been identified when there is a duplex mismatch on one or more ports of an 8648 GTR module. For an 8648 GTR module, a duplex mismatch may cause complete communication issues on the port with the mis-matched duplex or occasionally on the entire lane (Port 1-24 or Port 25-48) that contains the port with mismatched duplex. The module can be recovered from the situation when physically reseated, but for complete recovery the mis-configuration must also be corrected. Correcting the duplex setting configuration alone will not recover the communication loss until the module is reseated as well.

Analysis:

A duplex mismatch may cause communication loss on a port or an entire lane of an 8648 GTR module. When there is such a communication loss, the debugging commands show that the ingress stats look normal with all traffic ingressing the impacted port(s) and the MAC addresses learned in the Forwarding Database Table for the devices connected to the port(s), but no traffic egressing the port(s).
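As an aside to the bulletin text, the symptom described above (ingress counters advancing, MAC addresses learned, nothing egressing) can be checked from two counter snapshots taken some time apart. The following is only a minimal sketch: the `PortSample` structure and its field names are hypothetical and do not correspond to any ERS 8600 CLI or SNMP output.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """Hypothetical snapshot of one port's counters (not an ERS 8600 structure)."""
    port: int
    in_frames: int
    out_frames: int
    macs_learned: int


def stalled_egress_ports(before, after):
    """Return ports that received traffic and learned MACs but sent nothing
    between the two snapshots."""
    earlier = {sample.port: sample for sample in before}
    stalled = []
    for sample in after:
        prev = earlier.get(sample.port)
        if prev is None:
            continue  # no baseline for this port, so nothing to compare
        ingress_moved = sample.in_frames > prev.in_frames
        egress_frozen = sample.out_frames == prev.out_frames
        if ingress_moved and egress_frozen and sample.macs_learned > 0:
            stalled.append(sample.port)
    return sorted(stalled)


# Illustrative numbers only: port 1 shows the symptom, port 2 is healthy.
before = [PortSample(1, 100, 100, 3), PortSample(2, 50, 40, 1)]
after = [PortSample(1, 250, 100, 3), PortSample(2, 90, 75, 1)]
print(stalled_egress_ports(before, after))
```

A port that is simply idle in both directions is not flagged, since its ingress counter does not move either.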

Recommendations:

Nortel recommends proper configuration of auto-negotiation whenever possible to prevent a duplex mismatch situation. To avoid a duplex mismatch, auto-negotiation must be enabled on the port as well as the device connected to that port.
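The fallback behaviour the bulletin describes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Nortel's implementation: the `LinkSide` type and function names are made up for illustration, and it assumes that two auto-negotiating sides both support full duplex.

```python
from dataclasses import dataclass
from enum import Enum


class Duplex(Enum):
    HALF = "half"
    FULL = "full"


@dataclass(frozen=True)
class LinkSide:
    """One end of a link (hypothetical model). `forced` is ignored when autoneg is True."""
    autoneg: bool
    forced: Duplex = Duplex.FULL


def resolve(local, peer):
    """Return the duplex each side ends up using as (local, peer)."""
    if local.autoneg and peer.autoneg:
        # Assumption: both ends advertise full duplex and agree on it.
        return Duplex.FULL, Duplex.FULL
    if local.autoneg:
        # The peer does not negotiate, so the local port cannot detect its
        # duplex setting and falls back to half duplex.
        return Duplex.HALF, peer.forced
    if peer.autoneg:
        return local.forced, Duplex.HALF
    # Both ends hard-set: each simply uses its configured value.
    return local.forced, peer.forced


def is_mismatch(local, peer):
    local_duplex, peer_duplex = resolve(local, peer)
    return local_duplex != peer_duplex


# Auto-negotiating switch port against a NIC hard-set to full duplex.
print(is_mismatch(LinkSide(autoneg=True), LinkSide(autoneg=False, forced=Duplex.FULL)))
```

Note that in this model an auto-negotiating port facing a device hard-set to half duplex does not produce a mismatch, although the bulletin still classes any mix of enabled and disabled auto-negotiation as an unsupported configuration.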

You can find a copy of the bulletin in PDF format right here. Interestingly, there are quite a few restrictions and issues with the 8648GTR that I should probably discuss here when time allows.

Nortel is also in the process of releasing v5.0 software for the Ethernet Routing Switch 8600 along with four new I/O modules (cards): the 8612XLRS, 8634XGRS, 8648GBRS and 8648GTRS. I hope to talk about those in the very near future.

Cheers!

Sunday, March 2, 2008

Ethernet Frames Maligned

I thought I would share this story with everyone. We had discovered an issue with Ethernet frames being maligned/corrupted between the Motorola Access Port 300 (AP300) and the Motorola Wireless LAN Switch (WS5100).

We had a ticket open with Motorola trying to understand why a significant number of our AP300s were rebooting themselves at odd hours during the early morning. Motorola had requested that we provide network traces at the Access Port and the Wireless Switch. Surprisingly, Motorola came back and pointed out that the payload in some of the Ethernet frames was getting modified between the Wireless Switch and the Access Port.

The equipment involved in this problem was as follows: Nortel Ethernet Switch 460 (ES 460), Ethernet Switch 470 (ES 470), Ethernet Routing Switch 5520 (ERS 5520) and Ethernet Routing Switch 8600 (ERS 8600); Motorola Wireless LAN Switch 5100 (WS5100) and Access Port 300 (AP300).

The Motorola WS5100s and AP300s are physically connected over the same Layer 2 Ethernet network. The “Ethernet 1” port on the WS5100 is connected to a Virtual Local Area Network (VLAN) which provides a single broadcast domain for all AP300s to connect to the WS5100. The “Ethernet 2” port on the WS5100 is used as a trunk interface to bridge between the WLAN (wireless) and VLAN (wired) segments. We essentially have core switches and edge switches (distribution is collapsed down into the core). The core switch can be a single ERS8600 or a pair of ERS8600s (Layer 3) connected via an IST (Inter-Switch Trunk). At the edge we generally deploy ES470s (Layer 2) or ERS5520s (Layer 2). We have deployed ES460s (PoE) into closets where ES470s are already present, specifically to support PoE and the wireless network.

Here is a quick topology of the network with respect to the WS5100s and AP300s.

We recently started deploying the ERS5520s (in place of the ES470s), which directly support PoE. That lets us deploy one less piece of equipment at the edge and also leaves one less bridge (hop) to switch through.

We have been plagued by a problem affecting the Motorola AP300s that causes them to randomly reset and re-adopt at different times of the day without warning or apparent cause. In searching for the cause of this problem we’ve documented numerous Ethernet frames being maligned as they travel from the AP300 to the WS5100.

With respect to the examples I’m going to draw on, the following topology applies:

It should be noted that we do use the ES460s and ERS5520s to remark the 802.1p bits in the Ethernet frame so we can provide some measure of QoS with respect to the Nortel (Spectralink) Wireless LAN phones that we currently have deployed. In essence we mark all Ethernet packets on the “APVLAN” with a QoS level of 4 (“Gold”, BoSS-65530).
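For readers unfamiliar with where the 802.1p bits live, here is a minimal sketch of remarking the priority field in an 802.1Q tag. The 3-bit priority sits in the top bits of the 16-bit Tag Control Information (TCI) field that follows the 0x8100 TPID. The VLAN ID of 100 is a placeholder (the real "APVLAN" ID is not given here), and the sketch assumes that QoS level 4 corresponds to an 802.1p priority value of 4. It illustrates the bit layout only, not how the ES460 or ERS5520 performs the remarking.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def build_vlan_tag(vlan_id, priority, dei=0):
    """Pack a 4-byte 802.1Q tag: TPID followed by the TCI (priority, DEI, VLAN ID)."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_priority(tag):
    """Extract the 802.1p priority from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13


def remark_priority(tag, priority):
    """Return a copy of the tag with only the priority bits replaced."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return build_vlan_tag(tci & 0x0FFF, priority, (tci >> 12) & 1)


# Placeholder VLAN ID 100; the frame arrives with priority 0 and is remarked to 4.
original = build_vlan_tag(vlan_id=100, priority=0)
remarked = remark_priority(original, 4)
print(remarked.hex(), parse_priority(remarked))
```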

Network Trace Analysis

I will refer to the following two trace files:
"ers460side1.pcap" closet ES460 trace
"ers8600side1.pcap" core ERS8600 trace

I tried to line up the two traces so that each is synchronized with the other. We'll focus on packet 3. You can see in the closet ES460 trace that bytes 15 and 16 are 0x20 and 0x12 respectively.



Looking at the other trace, you can see that bytes 15 and 16 are different from those in the first trace. You can see that the bits in byte 16 have been shifted to byte 26.



You can again see the same problem in packet 4:




You can see it again in packets 6, 7, 10, 39, 43, 45, etc.
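The comparison was done by eye, but the same check is easy to script once the matching frames have been exported as raw bytes. The sketch below is a minimal illustration, not the method used for these traces. The function name is hypothetical, positions are assumed to be counted from 1, and the frame contents are placeholders apart from the 0x20 and 0x12 values at bytes 15 and 16 quoted above; the "core" bytes are deliberately arbitrary and are not values from the real capture.

```python
def differing_byte_positions(frame_a, frame_b):
    """Return the 1-based byte positions at which two captured frames differ.

    If the frames have different lengths, every position past the end of the
    shorter frame is reported as well.
    """
    positions = [
        index
        for index, (a, b) in enumerate(zip(frame_a, frame_b), start=1)
        if a != b
    ]
    shorter, longer = sorted((len(frame_a), len(frame_b)))
    positions.extend(range(shorter + 1, longer + 1))
    return positions


# Placeholder frame: 14 zero bytes, then 0x20 0x12 at bytes 15 and 16, then padding.
closet_frame = bytes(14) + bytes([0x20, 0x12]) + bytes(10)

# Simulate corruption by flipping bytes 15 and 16 (arbitrary, illustrative values).
core_frame = bytearray(closet_frame)
core_frame[14] ^= 0xFF
core_frame[15] ^= 0xFF
core_frame = bytes(core_frame)

print(differing_byte_positions(closet_frame, core_frame))
```

Running this over every pair of matching frames from two captures would list the packets (and the byte positions within them) that changed in transit or, as in this case, in the mirroring process.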

In the end the problem turned out to be a software/hardware issue with the Nortel Ethernet Routing Switch 8600. If DiffServ was enabled on the Ethernet port that was being mirrored, the mirrored data was somehow getting corrupted in the process of copying the packets. Once we disabled DiffServ on the Ethernet port the problem disappeared. We opened a case with Nortel but were told that it would be handled as an enhancement request, not a correction request (go figure!).

I personally no longer trust either the port mirror or packet capture facilities of the Nortel ERS 8600 and rely on physical taps so there can be no doubt or questions about the validity of the capture data.

We still have issues with our Motorola AP300s rebooting from time to time, but they have been much better since Motorola released v2.1.3 software for the WS5000/WS5100s. We are currently working with Motorola to resolve issues in their v3.x software line that are causing our Nortel 2211 (Spectralink) wireless phones to occasionally reboot while idle and roaming.

Cheers!