Networking: June 2010

Tuesday, June 15, 2010

Provider Bridges

IEEE 802.1ad
An amendment to 802.1Q whose purpose is to define an architecture and bridge protocols that provide separate instances of the MAC service to multiple independent users of a Bridged Local Area Network, without requiring cooperation among the users and with only minimal cooperation between the users and the provider of the MAC service.

This lets customers run their own VLANs inside a VLAN supplied by the service provider. The service provider allocates one VLAN (the outer, or service, tag) to the customer, and the customer treats that VLAN as a trunk carrying its own inner VLAN tags.
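
The double tagging described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the standard TPID values (0x88A8 for the 802.1ad service tag, 0x8100 for the 802.1Q customer tag), and the frame contents, VLAN IDs, and function names are made up for the example.

```python
import struct

TPID_S_TAG = 0x88A8  # 802.1ad service (outer) tag
TPID_C_TAG = 0x8100  # 802.1Q customer (inner) tag


def vlan_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack a 4-byte VLAN tag: 16-bit TPID, then PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp & 0x7) << 13 | (dei & 0x1) << 12 | vid
    return struct.pack("!HH", tpid, tci)


def double_tag(frame: bytes, s_vid: int, c_vid: int) -> bytes:
    """Insert the S-tag and then the C-tag after the 12 bytes of dst/src MAC."""
    macs, rest = frame[:12], frame[12:]
    return macs + vlan_tag(TPID_S_TAG, s_vid) + vlan_tag(TPID_C_TAG, c_vid) + rest


# Hypothetical untagged Ethernet-II frame: dst MAC, src MAC, EtherType, payload.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = double_tag(untagged, s_vid=100, c_vid=10)
```

The provider only looks at the outer tag (VLAN 100 here), so the customer is free to use any inner VLAN IDs.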

Insertion of 802.1ad DoubleTag in Ethernet-II frame

Defined in this standards document:

Wednesday, June 9, 2010

CR-LDP and RSVP-TE

CR-LDP and RSVP-TE are both signaling mechanisms used to support Traffic Engineering across an MPLS backbone. RSVP is a QoS signaling protocol that is an IETF standard and has existed for quite some time. RSVP-TE extends RSVP to support label distribution and explicit routing, while CR-LDP proposed to extend LDP (designed for hop-by-hop label distribution) to support QoS signaling and explicit routing. MPLS Traffic Engineering tunnels are not limited to IP route selection procedures and can therefore spread network traffic more uniformly across the backbone, taking advantage of all available links. A signaling protocol is required to set up these explicit MPLS routes or tunnels.

There are many similarities between CR-LDP and RSVP-TE for constraint-based routing. The Explicit Route Objects that are used are extremely similar. Both protocols use ordered Label Switched Path (LSP) setup procedures. Both protocols include some QoS information in the signaling messages to enable resource allocation and LSP establishment to take place automatically.

At the present time CR-LDP development has ended and RSVP-TE has emerged as the "winner" for traffic engineering protocols.

IS-IS

Intermediate System to Intermediate System:
A protocol used to determine the best way to forward datagrams through a packet switched network.

An IGP (Interior Gateway Protocol)
Not intended for routing between ASes
Floods topology information throughout the network
Packets are forwarded based on best topological path through network
Uses Dijkstra's Algorithm
Preferable to service providers while OSPF is preferable to enterprise networks
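
Dijkstra's algorithm, which both IS-IS and OSPF run over the flooded topology, can be sketched as follows. This is a minimal illustration under assumed inputs: the topology, node names, and link costs are invented, and a real implementation would also record next hops rather than only total costs.

```python
import heapq


def shortest_paths(graph: dict, source: str) -> dict:
    """Return the lowest total cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical link-state database: node -> {neighbor: link cost}.
topology = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "C": 3, "D": 1},
    "C": {"A": 5, "B": 3, "D": 9},
    "D": {"B": 1, "C": 9},
}
costs_from_a = shortest_paths(topology, "A")
```

Every router runs the same calculation on the same database, with itself as the source, which is why all routers agree on loop-free paths.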

What's the difference from OSPF:
Both are link-state protocols, both support classless networks, both use multicast hello packets to discover neighboring routers, and both support authentication of routing updates, so the two protocols are very similar.

OSPF runs on top of IP (layer 3) and routes only IP. IS-IS is an OSI protocol (from the same family as CLNS); its packets are carried directly in layer 2 frames, so it does not use IP to carry routing information.
A topological map of the network is built, indicating which IP subnets each IS-IS router can reach; traffic is forwarded along the lowest-cost path to each IP subnet.
It differs from OSPF in how topology is flooded. It is less chatty and can support larger networks, which makes it attractive in ISP (Internet Service Provider) networks. It adapts easily to IPv6 because it is neutral regarding the network-layer addresses it carries.
The "Area" concept is different. IS-IS routers operate at levels: Level 1 (intra-area), Level 2 (inter-area), or Level 1-2 (both). Level 2 routers form relationships only with other Level 2 capable routers, and Level 1 routers only with Level 1 capable routers. Level 1-2 routers exchange information with both kinds and are used for communication between the two. IS-IS area borders fall on the links between routers rather than inside a router, so each IS-IS router is part of a single area and no area 0 is required. Where OSPF creates a web-like topology of areas, IS-IS creates a logical topology of a backbone of Level 2 routers with branches of Level 1-2 and Level 1 routers forming the individual areas.
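
The level rules above can be written down as a small function. This is a sketch only: the function name and the area strings are hypothetical, and it assumes the usual IS-IS rule that Level 1 adjacencies also require both routers to be in the same area.

```python
def adjacency(level_a: str, area_a: str, level_b: str, area_b: str) -> set:
    """Return the adjacency levels two neighbors form: {1}, {2}, {1, 2} or empty.

    level_* is "L1", "L2" or "L1L2"; area_* is the area ID as a string.
    """
    levels = {"L1": {1}, "L2": {2}, "L1L2": {1, 2}}
    common = levels[level_a] & levels[level_b]
    if area_a != area_b:
        common.discard(1)  # Level 1 adjacencies only form inside one area
    return common
```

For example, two Level 1-2 routers in different areas form only a Level 2 adjacency, which is what stitches the areas onto the backbone.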

Tuesday, June 8, 2010

OSPF

Open Shortest Path First:

A dynamic routing protocol for IP networks. It is a link-state routing protocol in the group of interior gateway protocols (IGPs) and operates within a single autonomous system (AS). IPv4 uses OSPFv2.
Gathers link-state information from routers and constructs a topology map of the network, from which the routing table is computed
Supports variable-length subnet masking (VLSM) or CIDR
Detects changes in topology and converges on a new routing structure within seconds using a shortest-path algorithm (Dijkstra's)
Link-state information is maintained on each router in a link-state database (LSDB), which holds a tree image of the network. Copies of the LSDB are kept up to date on all OSPF routers through flooding
Routing policies in OSPF are governed by link cost factors. A cost can reflect distance (round-trip time), network throughput of the link, or link availability and reliability, and is expressed as a unitless number. Traffic can be load-balanced across paths of equal cost.
OSPF networks can be divided into areas, which are identified by 32-bit numbers written either in decimal or in dotted-decimal (octet) notation.
Area 0 (0.0.0.0) is the core or backbone area. Each additional area must have a direct or virtual connection to the backbone; these connections are maintained by an area border router (ABR). An ABR maintains a separate link-state database for each area it serves and holds summarized routes for all areas in the network.
OSPF does not use a transport protocol (UDP or TCP); it is encapsulated directly in IP datagrams with protocol number 89.
OSPF uses multicast addressing for route flooding on broadcast network links; if a non-broadcast network exists, special provisions allow neighbor discovery. Multicast OSPF packets never travel more than 1 hop. OSPF reserves the multicast addresses 224.0.0.5 (all OSPF routers) and 224.0.0.6 (all designated routers).
When multicast IP traffic needs to be routed, OSPF supports Multicast Open Shortest Path First (MOSPF). It is rarely used, because PIM running alongside OSPF or another IGP is far more widely deployed.
OSPF can run securely between routers, using a variety of authentication methods so that only trusted routers take part.
Version 3 (for IPv6) does not carry its own authentication; instead it relies on IPsec. It runs per link rather than per subnet. IP prefixes have been removed from hello packets and link-state advertisements; area and router IDs are still 32-bit values.
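
Because area IDs (and router IDs) are plain 32-bit numbers, the decimal and dotted forms mentioned above are interchangeable. A small sketch using Python's standard `ipaddress` module shows the conversion; the helper names are made up for the example.

```python
import ipaddress


def area_to_dotted(area: int) -> str:
    """Render a 32-bit area number in dotted-decimal notation."""
    return str(ipaddress.IPv4Address(area))


def area_to_int(area: str) -> int:
    """Convert a dotted-decimal area ID back to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(area))
```

So "area 1" and "area 0.0.0.1" name the same area, and the backbone is 0 or 0.0.0.0.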

Neighbor Relationships:

Routers become neighbors when they are in the same broadcast domain or share a point-to-point link. Neighbors are discovered through OSPF hello packets, and once each router sees itself listed in the other's hello, the two-way state is formed (the most basic relationship). The routers on a multiaccess segment select a designated router (DR) and a backup designated router (BDR), which act as a hub to reduce traffic. OSPF uses both unicast and multicast to send hellos and link-state updates.
Neighbor tables are called adjacency databases. For two routers to form a neighbor relationship, the connecting interfaces must be in the same area; an interface can belong to only one area.

Areas:

The backbone is responsible for distributing information to non-backbone areas and must be contiguous. If physical limitations exist, virtual links can be configured. As an example, area 0.0.0.0 is physically connected to 0.0.0.1; 0.0.0.2 is not connected to 0.0.0.0 but is connected to 0.0.0.1, so 0.0.0.2 can use a virtual link through transit area 0.0.0.1 to reach the backbone.
Stub area - does not receive route advertisements external to the AS; routing from within the area to external destinations is based on a default route.
Not so stubby area (NSSA) - A type of stub area that can import AS external routes and send them to other areas, but still does not itself receive AS external routes from other areas. In other words, it allows a limited injection of external routes into the stub area.
Totally stubby area (TSA) - Like a stub area, but in addition to blocking external routes it does not allow summary routes: inter-area (IA) routes are not summarized into a totally stubby area. Only a default route is used to leave the area, and it is the only Type 3 LSA in the area.
NSSA TSA - Takes on the attributes of a TSA (Type 3 and Type 4 summary routes are not flooded into this area type). Traffic leaving the area follows only the default route 0.0.0.0, but the area also contains an ASBR that accepts external routing information and injects it into the local area. LSA Type 7 exists only in an NSSA: the ASBR generates the LSA, and an ABR translates it into a Type 5 LSA, which is then propagated into the OSPF domain.

An area is both NSSA and TSA when an ASBR sends externals into a TSA and they are available to OSPF speakers in that area. The external routes are summarized before being injected into the TSA; this would typically happen when a newly acquired subsidiary is on the edge of a TSA. Routers in a TSA-NSSA send all traffic to the ABR, except for routes advertised by the ASBR.
Path Preference:
Path cost is used as the basic routing metric and is determined by the speed of the interface. There are 4 types of paths, listed in order of preference:
1. Intra-area
2. Inter-area
3. External Type 1 - the sum of the external path cost and the internal path cost to the ASBR
4. External Type 2 - the external path cost only
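
The preference order above can be sketched as a comparison: path type is compared first, and cost only breaks ties within the same type. The `Route` class, field names, and cost values below are hypothetical and exist only to illustrate the ordering.

```python
from dataclasses import dataclass

# Lower rank is preferred, following the order listed above.
RANK = {"intra": 1, "inter": 2, "E1": 3, "E2": 4}


@dataclass
class Route:
    kind: str           # "intra", "inter", "E1" or "E2"
    external_cost: int  # 0 for routes that originate inside the AS
    internal_cost: int  # cost inside the AS (to the destination or to the ASBR)

    def metric(self) -> int:
        # E2 routes count only the external cost; all others add the internal cost.
        if self.kind == "E2":
            return self.external_cost
        return self.external_cost + self.internal_cost


def best(routes):
    """Pick the preferred route: path type first, then metric, then internal cost."""
    return min(routes, key=lambda r: (RANK[r.kind], r.metric(), r.internal_cost))
```

Note that an intra-area route wins over an inter-area route even when its cost is higher, because the type is compared before the metric.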
OSPF-TE:
Traffic engineering extensions use opaque LSAs carrying type-length-value (TLV) elements to distribute more information about the topology. OSPF-TE can run out of band (OOB) from the data plane network and can be used on non-IP networks, such as optical networks.
Router Types:
The router type is an attribute of an OSPF process. A physical router can have one or more OSPF processes. For example, a router that is connected to more than one area and also receives routes from a BGP process connected to another AS is both an ABR and an ASBR.

Area Border Router (ABR): A router that connects one or more areas to the main backbone. It is a member of all areas it is connected to and keeps a link-state database in memory for each of those areas.
Autonomous System Boundary Router (ASBR): A router that runs more than one routing protocol and exchanges routing information with routers in other protocols. ASBRs typically run an exterior routing protocol such as BGP. An ASBR is used to distribute routes from external ASes throughout its own AS.
Internal Router: A router whose OSPF neighbor relationships are all on interfaces in the same area.
Backbone router: A router connected to the OSPF backbone.
Designated router: A router interface elected on a broadcast multiaccess segment. Nonbroadcast multiaccess (NBMA) media require special techniques to support the DR function; this is usually handled with point-to-point links.
The DR role can be combined with other OSPF router types. A router can have some physical interfaces that are DR, others that are backup (BDR), and others that are non-designated. DRs are elected on the following criteria:
If the OSPF priority is 0, the router will never be DR or BDR
If the DR fails and the BDR takes over, another election chooses a new BDR
The router sending hello packets with the highest priority wins the election.
If two routers tie on priority, the highest Router ID (RID) wins (the highest loopback IP configured on the router). If no loopback exists, the highest interface IP is used.
The second-highest priority becomes BDR
Priority is between 0 and 255; higher numbers increase the chances of becoming DR
If a new router with a higher priority comes online after a DR has been elected, it will not become DR until the current DR fails.
When DR goes down and BDR takes over, a new BDR is elected. If the scenario repeats the new BDR stays unchanged.
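
The election criteria above can be sketched for the simple case where all routers come up at the same time. This is an illustration only: the `Router` class and addresses are hypothetical, and the sketch ignores the non-preemption rule (a running DR keeps its role when a better candidate appears later).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    router_id: str
    priority: int = 1  # 0-255; 0 means never eligible


def elect(routers):
    """Return (DR, BDR) for a fresh election among the given routers."""
    eligible = [r for r in routers if r.priority > 0]
    ranked = sorted(
        eligible,
        # Highest priority first; ties are broken by the numerically highest RID.
        key=lambda r: (r.priority, int(ipaddress.IPv4Address(r.router_id))),
        reverse=True,
    )
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr


segment = [
    Router("10.0.0.1", 1),
    Router("10.0.0.2", 1),
    Router("10.0.0.3", 0),
    Router("10.0.0.4", 100),
]
dr, bdr = elect(segment)
```

Here 10.0.0.4 becomes DR on priority, 10.0.0.2 becomes BDR on the RID tie-break, and 10.0.0.3 is never considered.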

DRs are the source for routing updates: they maintain a complete topology table of the network and send multicast updates. All routers form a master/slave relationship with the DR. When a router has an update, it sends it to the DR and BDR on 224.0.0.6, and the DR sends the update to all routers on 224.0.0.5. This reduces network load. Elections can also occur on NBMA networks (Frame Relay or ATM). Elections do not happen on point-to-point links, because the two routers must be adjacent anyway and the bandwidth cannot be optimized further.
There are 5 OSPF packet types, as follows:
Type	Description
1	Hello
2	Database Description
3	Link State Request
4	Link State Update
5	Link State Acknowledgement
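
The packet type is the second byte of the OSPF header, so it can be read without decoding the rest of the packet. The sketch below assumes the 24-byte OSPFv2 header layout from RFC 2328 (version, type, length, router ID, area ID, checksum, authentication type, 8 bytes of authentication data); the sample packet is fabricated for the example and its checksum is left at zero.

```python
import ipaddress
import struct

PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgement",
}


def parse_header(data: bytes) -> dict:
    """Decode the fixed fields of an OSPFv2 header."""
    if len(data) < 24:
        raise ValueError("OSPFv2 header is 24 bytes long")
    version, ptype, length, router_id, area_id, _checksum, _autype = struct.unpack(
        "!BBH4s4sHH", data[:16]
    )
    return {
        "version": version,
        "type": PACKET_TYPES.get(ptype, "Unknown"),
        "length": length,
        "router_id": str(ipaddress.IPv4Address(router_id)),
        "area_id": str(ipaddress.IPv4Address(area_id)),
    }


# Fabricated header: version 2, type 1 (Hello), router 1.1.1.1, area 0.0.0.0.
sample = struct.pack(
    "!BBH4s4sHH8s", 2, 1, 44, bytes([1, 1, 1, 1]), bytes(4), 0, 0, bytes(8)
)
header = parse_header(sample)
```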

Monday, June 7, 2010

MPLS

Multiprotocol Label Switching:

When MPLS is used in a network, incoming packets are assigned a "label" by a Label Edge Router (LER). The packets are forwarded along a Label Switch Path (LSP), where each Label Switch Router (LSR) makes its forwarding decision based on the label: the LSR strips off the incoming label and applies a new one that tells the next hop how to forward the packet.
LSPs are similar to circuit-switched paths, except that they are not dependent on a particular layer 2 technology.
LSPs can traverse any type of transport medium (ATM, Frame Relay or Ethernet)
Brings the speed of layer 2 switching to layer 3: routers perform forwarding decisions based on the label rather than a route lookup. This benefit is now largely irrelevant, because modern routers can process routing information at very high speeds.

MPLS Benefits:

Class of traffic and path engineering
Service providers create IP tunnels throughout their network (VPNs) without the need for encryption or end user appliances
L2 Transport (Ethernet, Frame relay and ATM over IP/MPLS core)
Elimination of multiple layers, all L1 (SONET), L2 (ATM) to L3, thereby simplifying the network management.

MPLS components:
Label: Short, fixed length, locally significant ID used to identify a FEC (Forwarding Equivalence Class) to which that packet is assigned.

The MPLS Label is formatted as follows:
|-20bits Label-|-3bits CoS-|-1bit Stack-|-8bits TTL-|

The 32-bit label entry (shim header) is located after the L2 header and before the IP header. It contains:

20-bit label field - value of the MPLS label
3-bit CoS field - used by queuing and discard algorithms
1-bit stack field - marks the bottom of a hierarchical label stack
8-bit Time to Live (TTL) field - provides IP TTL functionality
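
The four fields above pack into a single 32-bit word in network byte order. A minimal sketch of the encoding and decoding follows; the function names and the sample label value are made up for the example.

```python
import struct


def encode_label(label: int, cos: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack label(20) | CoS(3) | stack(1) | TTL(8) into a 4-byte shim entry."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = label << 12 | (cos & 0x7) << 9 | int(bottom) << 8 | (ttl & 0xFF)
    return struct.pack("!I", word)


def decode_label(data: bytes) -> dict:
    """Unpack the first 4 bytes of data as one label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "cos": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


entry = encode_label(16, cos=5, bottom=True, ttl=255)
```

When labels are stacked, several of these 4-byte entries follow one another, and only the last one has the stack bit set.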

Label Switch Path:

Provisioned using label distribution protocols such as RSVP-TE or CR-LDP, which establish a path through the network and reserve the resources needed to meet pre-defined service requirements.
Contrasted with traffic trunks - a traffic trunk is an aggregation of traffic flows of the same class placed inside an LSP, and the path a trunk traverses can be changed

Label Distribution Protocol:

Lets an LSR distribute labels to its LDP peers. When a label is assigned to a Forwarding Equivalence Class (FEC), LDP lets the relevant peers know of the label and its meaning. When a path is made, LDP helps establish an LSP by using a set of procedures to distribute these labels among LSRs.
LSRs must agree on which labels are used to forward traffic; label distribution is how this agreement is reached. LDP and labels are the foundation of label switching.

LDP has the following basic characteristics:

Provides LSR discovery mechanism which allows peer discovery and communication establishment
Four classes of messages exist: DISCOVERY, ADJACENCY (session), LABEL ADVERTISEMENT, and NOTIFICATION
Runs over TCP (discovery hellos are sent over UDP)

LDP modes:

Unsolicited downstream vs downstream on demand label assignment
Ordered vs independent LSP control
Liberal vs Conservative label retention

Forwarding Equivalence Class:

A set of packets which are forwarded in the same manner.
FEC packets follow the same path
FEC packets are identified by labels
Examples include unicast packets whose destinations match the same prefix, and multicast packets with the same source and destination address. Another example is unicast packets whose Type of Service (TOS) bits are the same.

Label Switch Path:
A set of LSRs that packets belonging to a particular FEC traverse to reach their destination. Because labels can be stacked hierarchically, a packet can follow different LSPs at different levels of the label stack on its way to the destination. An LSP at level x is the sequence of LSRs that a packet p traverses while forwarding decisions are made on its level-x label.

As an example: consider the following scenario

|------| 1                                  1 |------|
|  R1  |--\                                /--|  R5  |
|------|   \     2         2         2    /   |------|
            \|------|  |------|  |------|/
             |  R2  |--|  R3  |--|  R4  |
             |------|  |------|  |------|

Packet p must travel from R1 to R5; the numbers 1 and 2 are the label stack depths. R1 and R5 are edge routers and the rest are interior routers. R1 and R5 are peers at level 1, while R2, R3, and R4 are peers at level 2. R1 swaps p's label for the corresponding label used by R5, and also pushes a new label for the path through R2, R3, and R4, bringing the stack depth to 2. Two LSPs exist: a level 1 LSP between R1 and R5, and a level 2 LSP through R2, R3, and R4.
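
The stack operations in this scenario can be traced with a list, where the first element is the top of the stack. All label values below are invented, and the sketch assumes that R4, as the last hop of the level 2 LSP, pops the level 2 label before handing the packet to R5.

```python
def push(stack, label):
    """Add a new label on top of the stack."""
    return [label] + stack


def swap(stack, label):
    """Replace the top label, leaving lower levels untouched."""
    return [label] + stack[1:]


def pop(stack):
    """Remove the top label."""
    return stack[1:]


# Hypothetical label values throughout.
stack = [30]             # p arrives at R1 carrying a level 1 label
stack = swap(stack, 50)  # R1 swaps in the level 1 label that R5 expects
stack = push(stack, 21)  # R1 pushes a level 2 label for R2 (depth is now 2)
at_r2 = list(stack)
stack = swap(stack, 22)  # R2 swaps the level 2 label for R3
stack = swap(stack, 23)  # R3 swaps the level 2 label for R4
stack = pop(stack)       # R4 pops level 2; R5 sees only the level 1 label
```

The interior routers only ever touch the top label, so the level 1 label chosen by R1 reaches R5 unchanged.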

LSPs are built from the routes learned by the routing protocols running on the LSRs. RSVP may also be used
It is possible to support both ordered and independent control without interoperability problems
Label bindings can be created locally or remotely, by downstream or upstream LSRs. They are created in response to control traffic or data-driven traffic, and the bindings are distributed either by a standalone protocol or by piggybacking on an existing one such as Border Gateway Protocol (BGP).

IGPs (Interior Gateway Protocols) play a factor:
IGPs like OSPF and IS-IS are used to define reachability and binding/mapping between FEC and next hop addresses. No changes are required to run MPLS in corporations that use these IGPs due to the compatibility with these protocols.
Supported Protocols:
Network layer - IPV6, IPV4, IPX, and Appletalk
Link Layer - Ethernet, Token Ring, FDDI, ATM, Frame Relay, and Point to point.
MPLS not only works over any data link layer protocol, but can also carry a data link layer protocol across an IP/MPLS core, enabling Ethernet over MPLS
MPLS and ATM:
ATM can gain traffic engineering capabilities with MPLS. This is done by tagging IP packets with labels that specify routes and priorities. It combines the scalability and flexibility of routing with the performance and traffic management of layer 2 switching.
MPLS can map IP addresses and routing information into ATM switching tables. MPLS uses the same label-swapping method as ATM. On ATM-LSRs, packets are forwarded by the ATM forwarding component, with the label carried in the ATM header (VCI and VPI fields). MPLS provides the control component for IP on ATM switches and routers; MPLS IP services replace PNNI, the ATM ARP server, and the NHRP server.
Sometimes MPLS and ATM run on the same device, but they operate separately. MPLS path changes have no effect on ATM VCs (Virtual Circuits); this is called Ships in the Night. ATM control mechanisms can avoid resource conflicts by not allowing MPLS to reserve resources they own. This mode is being used to phase out ATM slowly while learning what resources MPLS will need in its place.
MPLS-TE:
MPLS Traffic Engineering is used to select the best paths on the network when multiple or parallel links exist. It also optimises network resources for traffic performance.
To support TE, source routing and the following components are needed:

All constraints are taken into account when a path is computed, so the source needs to know the restrictions of the other routers in the network

Ability to determine information about the topology and the attributes of links in the network. Once a path is established, forwarding support along it is necessary
Ability to reserve network resources and modify link attributes

MPLS TE leverages several foundation technologies:

Constrained Shortest Path First (CSPF) algorithm used in path calculation - a modified version of the SPF algorithm with support for constraints
Extensions to RSVP for setting up forwarding state along the path and reserving resources
Link-state IGPs with extensions (OSPF with opaque LSAs, IS-IS with link-state packet TLVs (type, length, value)) for distributing link attributes and keeping track of topology changes.

MPLS Traffic Merging:
FECs play an important role here. When traffic enters an MPLS domain, all packets that can be handled equivalently for forwarding are assigned to one FEC and can share a single label. Traffic bound to the same FEC is forwarded the same way, regardless of what the network layer headers contain.
MPLS and loops:
Loop handling can be split between:

Loop prevention (Path Vector)
Loop mitigation - minimize negative effect of loops (TTL). When TTL reaches 0 the packet is discarded
Dynamic routing protocols

For media such as ATM and Frame Relay, which have no TTL field, MPLS uses buffer allocation for loop mitigation. This is mainly used on ATM switches because they can limit the amount of buffer space used by a VC.
If TTL still cannot be used, a hop count can be used instead; like TTL, it decrements by 1 for every successful label binding. This information is carried within Label Distribution Protocol messages.
The path vector holds a list of the LSRs that a label distribution control message has traversed. Each LSR adds its own identifier to the path vector list; when an LSR receives a message whose path vector already contains its own identifier, a loop is detected (BGP uses the same idea with the AS path attribute).
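
The path vector check described above amounts to a membership test before appending. The sketch below uses a hypothetical `process` function and made-up LSR names; it is not the LDP message format, only the loop-detection logic.

```python
def process(path_vector, lsr_id):
    """Handle a control message at lsr_id. Return (loop_detected, new_vector)."""
    if lsr_id in path_vector:
        # The message has already passed through this LSR: a loop exists.
        return True, path_vector
    return False, path_vector + [lsr_id]


vector = []
for hop in ["R1", "R2", "R3"]:
    _, vector = process(vector, hop)

# The message comes back to R2, which finds its own identifier in the list.
looped, _ = process(vector, "R2")
```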
2 or more ASes within the same MPLS domain:
If two adjacent ASes exist and their ASBRs either summarize eBGP routes before distributing them into their IGP, or the IGP routes cover a set of FECs different from the eBGP routes, then the ASBRs cannot forward traffic based on the top-level label alone. This also applies to TE tunnels. Some traffic will be forwarded based on the IP header or on a label that is not at the top level.
With multiple ASes there are then 2-3 MPLS forwarding domains: one for each AS, and one for the link between the two ASBRs (where labeled packets are used instead of IP packets).
An ASBR probably would not be an ATM-LSR, because of the limited ability of ATM-LSRs to manipulate label stacks or forward unlabeled IP traffic.
The same applies to multi-provider BGP+MPLS VPNs: either no top-level LSPs are established, because the two ASes are separate administrative domains, or the two providers agree to allow lower-level LSPs to be established across the two ASes.
MPLS VPNs:
MPLS VPNs provide traffic isolation, like ATM or Frame Relay. MPLS does not encrypt traffic; if encryption is required, IPsec would have to be employed.
BGP MPLS VPNs can exist, in which BGP propagates VPN-IPv4 information using the BGP multiprotocol extensions (MP-BGP) to handle these extended addresses. Reachability information such as the VPN-IPv4 addresses on edge label switch routers is propagated, and reachability information for a given VPN is propagated only to other members of that VPN. The BGP multiprotocol extensions identify the valid recipients of VPN routing information. All VPN members learn routes to the other members.
Another idea is to use separate routing tables for VPNs that do not involve BGP.
Layer 2 frames can also be carried across MPLS using the layer 2 encapsulation methods and transport signaling mechanisms known as the "Martini Draft". This is an advantage to many service providers because of the multitude of services they provide.

Layer 2 VPN:

These are layer 2 services such as Frame Relay, ATM and Ethernet over an IP/MPLS backbone. This simplifies networks and reduces expenses.

VPLS:

These are Ethernet VLANs using MPLS. All edge devices maintain MAC address tables for reachable end nodes, much as LAN switches do. They allow Ethernet reachability across geographic distances served by MPLS services.

There is no encryption across MPLS VPNs; instead, traffic is separated by the use of labels. This makes MPLS about as secure as ATM and Frame Relay, since intercepting traffic on any of these networks would require access to the SP's network. If security is an issue, IPsec or SSL can be used to encrypt traffic before it goes on the wire.

QOS:

MPLS Quality of Service uses the same mechanisms as IP: IP precedence, Committed Access Rate (CAR), Random Early Detection (RED), Weighted RED, Weighted Fair Queuing (WFQ), Class-based WFQ, and priority queuing.
DiffServ has 64 classes while the MPLS shim has 8: the EXP field is 3 bits long and the DiffServ code point is 6. Label-LSP (L-LSP) and Exp-LSP (E-LSP) solve this. Since DiffServ defines the interpretation of the TOS bits, as long as the IP precedence bits map to the EXP bits, the same interpretation as the DiffServ model can be applied to these bits. When the extra bits are needed, an L-LSP lets the label itself identify the class, and the EXP bits carry the drop priority encoded in the remaining 3 bits.
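
The 6-bit to 3-bit mismatch can be seen by mapping DSCP values onto EXP the way IP precedence does, keeping only the top three bits. This is one possible mapping, shown as an assumption for illustration; operators can configure other mappings. The code point values used (EF = 46, AF11 = 10, AF12 = 12, AF13 = 14) are the standard DiffServ ones.

```python
def dscp_to_exp(dscp: int) -> int:
    """Map a 6-bit DSCP onto the 3-bit EXP field by keeping the precedence bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp >> 3


EF, AF11, AF12, AF13 = 46, 10, 12, 14
exp_values = {name: dscp_to_exp(value)
              for name, value in [("EF", EF), ("AF11", AF11), ("AF12", AF12), ("AF13", AF13)]}
```

AF11, AF12, and AF13 differ only in drop priority, and all three collapse onto the same EXP value here, which is the information an L-LSP preserves by moving the class into the label.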

The following classes may be more appropriate for the initial deployment of MPLS QoS:

High-priority, low-latency "Premium" class (Gold Service)
Guaranteed-delivery "Mission-Critical" class (Silver Service)
Low-priority "Best-Effort" class (Bronze Service)

MPLS has QoS, while ATM and Frame Relay have CoS. MPLS can also implement CoS using IP, which makes the network easier to provision and engineer.
GMPLS:

Encompasses time-division, wavelength, and spatial switching. Allows MPLS to be used as a control mechanism for configuring both packet-based and non-packet-based devices.

Introduces a new protocol called Link Management Protocol (LMP). It runs between adjacent nodes, establishes control channel connectivity, detects failures, and verifies channel connectivity.

GMPLS supports several features including:

Link Bundling - grouping of many physical links into a single logical link
Link Hierarchy - Issuing of a suite of labels for various requirements of physical and logical devices on a path.
Unnumbered links - ability to configure paths without IP information on every interface
Constraint Based Routing - automatically provision additional bandwidth, or change forwarding behavior due to network conditions like congestion or requirements of additional bandwidth

Two methods of operation are supported - Peer and Overlay. Peer is when all devices in a domain share the same control plane. Routers have visibility into optical topology and peer with optical switches.
Overlay is when optical and IP layers are separated with minimal interaction. An example of this today would be like ATM and IP, where there are no direct connections between the two routing layers.
Peer is simpler and more scalable, but overlay has fault isolation and separate control mechanisms for devices.

Friday, June 4, 2010

VRRP

Virtual Router Redundancy Protocol:

Consists of a master and backups, whereby the master does all the routing at a given point in time
Uses 00-00-5E-00-01-XX as its MAC address, with XX being the Virtual Router ID (VRID). The physical routers in a virtual router communicate among themselves using packets sent to the multicast IP address 224.0.0.18 with IP protocol number 112.
The routers use a priority to determine the master: the highest priority wins, with values allocated between 1 and 255. This can and should be set administratively when planning a master swap, in order to bypass the hold time expiration timer (3 x advertisement interval + skew time).
If a backup router fails to receive an advertisement from the master router three times in a row, the backup assumes that connectivity to the master is down, and an election process is started among the group of routers using multicast packets.
Backup routers should send multicast packets only during elections. If all backup routers have the same priority, the router with the highest IP address becomes the master.
An orderly election is achieved through the use of skew time (skew time = 1 - (priority / 256)), which makes higher-priority backups time out first. This reduces the chance of a thundering herd problem.
If load sharing is used, backup router utilization is improved
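
The virtual MAC address and the timers mentioned above are simple arithmetic. The sketch below assumes a 1-second advertisement interval (the usual VRRPv2 default) and uses function names invented for the example.

```python
def virtual_mac(vrid: int) -> str:
    """Build the virtual router MAC 00-00-5E-00-01-XX from the VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00-00-5E-00-01-{vrid:02X}"


def skew_time(priority: int) -> float:
    """Skew time in seconds: 1 - (priority / 256)."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advertisement_interval: float = 1.0) -> float:
    """Time a backup waits before declaring the master down."""
    return 3 * advertisement_interval + skew_time(priority)
```

A backup with priority 200 waits slightly less than one with priority 100, so the preferred backup takes over first and the others stay quiet once they hear its advertisements.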

R-PVST

Rapid Per-VLAN Spanning Tree:

Cisco
A combination of RSTP and PVST. It is much faster than PVST, which uses the old STP (802.1D) and therefore takes longer to fail over (up to 50 seconds).

Thursday, June 3, 2010

MSTP

Multiple Spanning Tree Protocol:

Configures a separate spanning tree for each VLAN group and is defined as an extension to RSTP. All other alternate paths are blocked within each spanning tree.
MSTP creates MST regions that run multiple MST instances (MSTIs). Regions and other STP bridges are interconnected with one common spanning tree (CST).
MSTP was inspired by Cisco's MISTP - Multiple Instances Spanning Tree Protocol.
MSTP includes all spanning tree information in a single BPDU, thereby reducing the number of BPDUs needed on a LAN to communicate STP information for each VLAN. It is also backwards compatible with RSTP and STP. This is achieved by appending additional region information after the standard RSTP BPDU, followed by a number of MSTI messages (0-64). Each MSTI message conveys the STP information for one instance. An instance is associated with a number of VLANs, and frames in those VLANs follow that spanning tree instance within the MST region. Bridges determine whether they belong to the same MST region by comparing the configuration carried in the MSTP BPDU, including an MD5 digest of the VLAN-to-instance table.
If an RSTP bridge sees an MSTP BPDU, it interprets it as an RSTP BPDU (backwards compatibility), so RSTP bridges see an MST region as a single RSTP bridge, no matter how many MSTP bridges are inside the region. To reinforce the appearance of a single bridge, MSTP uses a remaining-hops counter as a TTL inside the region instead of the message age used by RSTP. The message age increments only when STP information enters an MST region, so one region counts as one hop. Ports at the edge of a region are known as boundary ports, and ports connected to endpoints can be configured to change rapidly to forwarding.

PVST

Per-VLAN Spanning Tree:

Cisco proprietary
Where multiple VLANs exist, a separate Spanning Tree is run on each VLAN.
Only works with ISL (Cisco's protocol for VLAN encapsulation)
PVST+ was created because 802.1Q encapsulation became dominant over ISL, and to allow tunneling across an MSTP region

BPDU

Bridge Protocol Data Units:
Data frames that carry special information about bridge IDs and root path costs. The bridge sends a BPDU with the MAC address of the sending port as the source and the STP multicast address 01:80:C2:00:00:00 as the destination.
Three types of BPDUs exist:

Configuration BPDU (CBPDU) - used for Spanning Tree Computation
Topology Change Notification (TCN) BPDU - announce changes in network topology
Topology Change Notification Acknowledgment (TCA)

BPDUs are exchanged every 2 seconds by default. They notify switches of network changes and of ports starting or stopping forwarding.
Once a new device is attached to a switch port, the port processes BPDUs to determine the topology of the network. If a host is attached, the port goes into the forwarding state after it has listened and learned for 30 seconds in total (two forward delay periods at the 15-second default). If another switch is attached, the port remains in blocking mode if its presence would create a loop in the network. A TCN informs other switches of a port change; it is injected into the network by a non-root switch and propagated to the root. When the root receives the TCN, it sets the topology change flag in its BPDUs, which are sent to all other switches, instructing them to age out their forwarding table entries.
Fields:
Bridge ID (BID) is eight bytes in length: the first 2 bytes are the priority and the last 6 are the MAC address. If MAC address reduction is used, the first 4 bits of the priority field are the priority, and the remaining 12 bits are the VLAN ID or MSTP instance number.
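
The Bridge ID layout with MAC address reduction can be sketched as follows. The helper name, MAC addresses, and VLAN number are invented; the sketch also relies on the standard STP rule, not stated in the notes above, that the numerically lowest Bridge ID wins the root election.

```python
import struct


def bridge_id(priority: int, vlan: int, mac: str) -> bytes:
    """Build an 8-byte Bridge ID: 4-bit priority, 12-bit VLAN ID, 6-byte MAC."""
    if priority % 4096 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 up to 61440")
    if not 0 <= vlan <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # The priority occupies the top 4 bits, so it can simply be OR-ed with the VLAN ID.
    return struct.pack("!H", priority | vlan) + mac_bytes


default_bid = bridge_id(32768, 10, "00:11:22:33:44:55")
tuned_bid = bridge_id(4096, 10, "00:aa:bb:cc:dd:ee")
root = min(default_bid, tuned_bid)  # lowest Bridge ID becomes root
```

Because only 4 bits are left for the priority, it can be set only in steps of 4096, which is why switches reject other values.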

Once there is a stable switched network topology, we should see the following:

A MAC address for each switch
Path cost to root associated with each switch port
Port ID (MAC) with each switch port

Configuration BPDUs have:

A unique Identifier of the switch that the transmitting switch believes to be root
Cost of path to root from transmitting port
The ID of transmitting port

Configuration BPDUs are used to communicate and compute the spanning tree. The MAC frame carrying a BPDU has the switch group address in its destination address field. All connected switches receive the BPDU. The BPDU itself is not forwarded; instead, the receiving switch uses the information in its own calculation and takes the topology information into consideration.

When a BPDU exchange occurs:

One switch is elected as root
The shortest distance to root is calculated for each switch
A designated switch is selected - the switch closest to root switch
A port for each switch is selected (to the root)
Ports in STP are selected

New BPDU:

With the introduction of RSTP, BPDUs have changed in functionality. Only 2 flags were previously used (TC and TCA). RSTP uses the six remaining bits of the flag byte to:

Define the role and state of the port that originated the BPDU

Handle proposal and agreement mechanism

Legacy equipment using old BPDUs drop new BPDUs

How New BPDUs are Handled:

BPDUs are Sent Every Hello-Time

BPDUs are sent every hello-time (2 seconds by default) rather than simply being relayed: a bridge sends its own BPDU even if it does not receive any from the root.

Faster Aging of Information

If hellos are not received three consecutive times, the timeout expires and the neighbor is considered lost. In the previous version, the problem might have existed anywhere on the path to the root.

Note: Failures are detected much faster still in the case of physical link failures.

Accepts Inferior BPDUs

This is similar to BackboneFast technology. When a bridge receives inferior information from its designated or root bridge, it immediately accepts it and replaces the information previously stored.

Because C is aware of an existing path to the root, it sends a BPDU to B that contains information about the root bridge. B then stops sending its own BPDUs and accepts the port that leads to bridge C as its new root port.

STP BackboneFast

BackboneFast - Fast convergence of the network backbone after an STP topology change. An indirectly connected switch can detect a link failure by receiving inferior BPDUs from its designated bridge on its root port or on a blocked port. Normally these BPDUs would be ignored until the maximum aging time expires.
The switch first determines whether it has an alternate path to the root bridge. If the inferior BPDU arrives on a blocked port, the root port and the other blocked ports become alternate paths to the root bridge. If the inferior BPDU arrives on the root port, all blocked ports become alternate paths to the root bridge. If there are no blocked ports and the inferior BPDU arrives on the root port, the switch assumes it has lost connectivity to the root, bypasses the max aging time, and becomes the root switch.

If the switch has alternate paths to the root bridge, a Root Link Query PDU is transmitted on all of them. If the queries show that no alternate path still reaches the root bridge, the switch expires the max aging time on the ports that received the inferior BPDU. If paths to the root bridge still exist, all ports that received the inferior BPDU become designated ports and move through listening and learning, then forwarding.
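
The first decision BackboneFast makes (which ports count as alternate paths, and whether to query or become root) can be sketched as follows. This is a simplified model of the rules above, not a vendor implementation; the function name, the action strings, and the port names are hypothetical.

```python
from typing import List, Tuple


def on_inferior_bpdu(received_on: str, root_port: str,
                     blocked_ports: List[str]) -> Tuple[str, List[str]]:
    """Decide what to do when an inferior BPDU arrives on a port."""
    if received_on == root_port:
        # Inferior BPDU on the root port: only blocked ports are alternates.
        candidates = list(blocked_ports)
    else:
        # Inferior BPDU on a blocked port: the root port and the other
        # blocked ports are alternates.
        candidates = [root_port] + [p for p in blocked_ports if p != received_on]
    if not candidates:
        # No alternate path at all: assume the root is lost.
        return ("become_root", [])
    # Otherwise send Root Link Query PDUs on every alternate path.
    return ("send_rlq", candidates)


print(on_inferior_bpdu("p1", root_port="p1", blocked_ports=[]))
print(on_inferior_bpdu("p2", root_port="p1", blocked_ports=["p2", "p3"]))
```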

STP UplinkFast

UplinkFast - Uplink groups, which are sets of ports in a VLAN, provide fast convergence of the network access layer after an STP topology change. Only one port in an uplink group is forwarding at any given time, and the rest of the group provides an alternate path in case the forwarding link fails. Again, this bypasses the listening and learning states of STP.

STP PortFast

PortFast - A switch places a port in the forwarding state immediately when the link becomes physically active. No other STP devices should be connected to the port, so it is most appropriate for end-user devices. Without PortFast, a port must wait max age plus twice the forwarding delay (50 seconds by default). PortFast bypasses the listening and learning states.
Because PortFast ports should only be connected to end-station devices, PortFast BPDU guard can be used to react to BPDUs seen on a PortFast port by putting the port into the ErrDisable state. ErrDisable disables the port and prevents BPDUs from being received; the admin must manually put the port back into service.
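
The 50-second figure can be checked with a small calculation. The defaults assumed here (max age of 20 seconds, forwarding delay of 15 seconds) are the standard 802.1D values; the function name is made up for the example.

```python
MAX_AGE = 20        # seconds, assumed 802.1D default
FORWARD_DELAY = 15  # seconds, assumed 802.1D default


def time_to_forwarding(portfast: bool, max_age: int = MAX_AGE,
                       forward_delay: int = FORWARD_DELAY) -> int:
    """Worst-case wait before a port forwards, per the rule above."""
    if portfast:
        # PortFast skips listening and learning entirely.
        return 0
    # Max age, then listening and learning (one forwarding delay each).
    return max_age + 2 * forward_delay


print(time_to_forwarding(portfast=False))
print(time_to_forwarding(portfast=True))
```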

Wednesday, June 2, 2010

STP

Spanning Tree Protocol:

Main purpose is to prevent loops on a bridge
Creates Spanning Tree within a Mesh network, leaving a single path between two nodes
Creates spare links to provide full backup paths if active links fail
STP sends messages known as BPDUs. BPDUs are a very important aspect of STP and determine which state a bridge port will go into (as follows):

Blocking
Listening
Learning
Forwarding
Disabled

Blocking & Listening:

Discards frames from attached segment
Discards frames from another port
No learning
BPDUs are received and directed to the system module, but the port does not transmit BPDUs
Allowed to be managed

Learning:

Discards frames from attached segment
Discards frames from another port for forwarding
Station location incorporated into address database
Directs BPDUs to the system module
Uses BPDUs - receiving, processing and transmitting them from the system module
Allowed to be managed

Forwarding:

Forward frames from attached segment
Forwards frames from switched port
Incorporates station location into address database
Receives BPDUs and directs them to the system module
Processes BPDUs from the system module
Allowed to be managed

Disabled:

Discards frames from attached segment
Discards frames switched from another port for forwarding.
Does not incorporate station location into database
Does not direct BPDUs to or from the system module
Allowed to be managed

The states are traversed as follows:

Initialize to blocking
Blocking to listening or to disabled
Listening to learning or disabled
Learning to Forwarding or disabled
Forwarding to disabled
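
The transitions listed above form a small state machine. The sketch below encodes only the transitions named in these notes (nothing leaves the disabled state here, which is a simplification); the table and helper names are hypothetical.

```python
# Allowed next states for each port state, taken from the list above.
TRANSITIONS = {
    "initialize": {"blocking"},
    "blocking": {"listening", "disabled"},
    "listening": {"learning", "disabled"},
    "learning": {"forwarding", "disabled"},
    "forwarding": {"disabled"},
    "disabled": set(),  # assumption: no transition out is modelled
}


def can_transition(current: str, new: str) -> bool:
    """Check whether a single state change is allowed."""
    return new in TRANSITIONS.get(current, set())


def is_valid_path(states: list) -> bool:
    """Check a whole sequence of states, one step at a time."""
    return all(can_transition(a, b) for a, b in zip(states, states[1:]))


print(is_valid_path(["initialize", "blocking", "listening", "learning", "forwarding"]))
print(can_transition("blocking", "forwarding"))
```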

The path that bridges use is determined by the following rules:

Select Root Bridge: The root bridge is the one with the smallest bridge ID. Each bridge has a priority number and a MAC address, and the bridge ID contains both. When comparing, the priority is looked at first; if the priorities are equal, the MACs are compared and the lower MAC wins.

Determine the least cost paths to the root bridge: This can be configured by an administrator

Least path is traversed due to the following two rules:

Least cost path from each bridge: once root bridge is elected, every bridge calculates the cost from itself to root, the smallest cost is picked and known as the RP (root port) of the bridge.

Least cost path from each network segment: The bridges on a segment determine which of them has the least cost to root; the port connecting that bridge to the segment becomes the DP (Designated Port)

NOTE:

root ports send towards root bridge
designated ports send from root bridge

Disable all other root paths. All other ports are blocked and are known as Blocked Ports (BP).

Ties: Equal cost paths can exist from two ports on the same bridge, or from two or more bridges on a network segment. Such ties are broken as follows:

Breaking ties for root ports: The lower bridge ID is used

Breaking ties for designated ports: Again, the lower bridge ID is used to forward messages to root. If the priorities are equal, the lower MAC address is used

Final tie-breaker: If the same bridge is used (same MAC address) then the lower port priority is used.
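
Because every rule above is "lower wins", the whole selection can be sketched with ordinary tuple comparison. The classes, field names, and port names below are hypothetical, and the port ID is assumed to combine port priority and port number so that a lower priority sorts first.

```python
from typing import NamedTuple


class BridgeID(NamedTuple):
    priority: int  # compared first
    mac: int       # 48-bit MAC as an integer, compared second


class Offer(NamedTuple):
    """One candidate path to the root, as seen on a local port."""
    path_cost: int        # least cost wins
    sender: BridgeID      # then the lower sending bridge ID
    sender_port_id: int   # then the lower port ID (priority first)
    local_port: str


def elect_root(bridges):
    """The bridge with the smallest ID becomes root."""
    return min(bridges)


def pick_root_port(offers):
    """Pick the port with the best (smallest) offer as root port."""
    return min(offers).local_port


a = BridgeID(32768, 0x001122334455)
b = BridgeID(32768, 0x001122334400)
c = BridgeID(4096, 0xFFFFFFFFFFFF)
print(elect_root([a, b, c]))  # lowest priority wins despite the high MAC
print(pick_root_port([Offer(19, a, 0x8001, "port1"),
                      Offer(19, b, 0x8002, "port2")]))  # equal cost, lower MAC
```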

RSTP

Rapid Spanning Tree Protocol:

Faster Convergence than STP
Can respond to changes within 6 seconds. This is done by detecting root switch failure after 3 missed hello times (3 x 2 seconds)
Ports are considered edge ports if they connect to a LAN that has no other bridge attached. These go directly to the forwarding state. RSTP keeps monitoring the port for BPDUs in case a new bridge is connected; when one is connected, the former edge port is no longer an edge port.
RSTP bridges respond to BPDUs sent from the direction of the root bridge; STP does not do this. The bridge with the better information sends a proposal, and an inferior RSTP bridge that receives it sets its other ports to discarding. An agreement is sent back after this, and the proposing bridge can now rapidly transition the port to forwarding instead of going through listening/learning. This cascades away from the root bridge until the RSTP topology is created.
Keeps backup details about discarding ports, which avoids timeouts if the current forwarding ports fail
BPDUs are used by the STA (Spanning Tree Algorithm) in a calculation that determines the role of a port

Bridge port roles:

Root - The best port from a nonroot bridge to the root bridge; a forwarding port
Designated - Forwarding port
Alternate - Different path to root, not the root port
Backup - Redundant path to segment where another bridge port connects
Disabled - Not turned on, not necessarily a part of STP

Port states:

Discarding
Learning
Forwarding

IST

InterSwitch Trunk:

Avaya
Link aggregation that connects two switches together to create one logical unit, sharing addressing, forwarding, and state information.
Required prior to SMLT, DSMLT, and RSMLT

VLACP

Virtual Link Aggregation Control Protocol:

Extension to LACP, used to detect end-to-end failures
Sends point-to-point hello packets and provides failure detection across any L2 domain; when hello packets are not received, the link state is brought down
Timers can be reduced so that one-second failure detection and switchover can be achieved
Only works for point-to-point applications; will not work for point-to-multipoint (no guarantee of a point-to-point match)
Used with MLT/SMLT to provide quick failure detection capability
Does not provide link aggregation, independent of LACP
Configuration must have same EtherType, multicast MAC address, and timers
Not typically used with LACP

Tuesday, June 1, 2010

LACP

Link Aggregation Control Protocol:

Allows aggregation of 2 or more ports to form LAGs (Link Aggregation Groups) so a MAC Client can treat the LAG as if it were a single link
Dynamically detects when links can be aggregated and does so when links become available
Capable of connecting to SMLT aggregated pair - provides standardized external link aggregation interface to third party vendors
Able to detect link layer failure within SP Network
Packets are exchanged end-to-end

Rules and Guidelines:

All ports in group must be in full duplex/same data rate
Must be in the same VLAN
LAGs must be in the same STP group (if applicable)

Configuration:

Port Priority: Determines which ports are standby if more than the maximum number of ports is configured in a LAG
System Priority: Generates the Switch ID and determines master/slave between SMLT applications
Keys: Used to determine which ports are aggregated into a LAG; they do not need to match between peers
Timers: timeout x slow-periodic-time = 3 x 30s = 90s
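
The timer line above is a plain multiplication, shown here as a tiny sketch. The slow periodic time of 30 seconds and the multiplier of 3 come from these notes; the 1-second fast periodic time is the value defined in the LACP standard and is included as an assumption for comparison. The names are hypothetical.

```python
SLOW_PERIODIC_TIME = 30  # seconds, from the notes above
FAST_PERIODIC_TIME = 1   # seconds, assumed from the LACP standard
TIMEOUT_MULTIPLIER = 3


def lacp_timeout(periodic_time: int, multiplier: int = TIMEOUT_MULTIPLIER) -> int:
    """Seconds without a received LACPDU before the partner is timed out."""
    return periodic_time * multiplier


print(lacp_timeout(SLOW_PERIODIC_TIME))  # long timeout
print(lacp_timeout(FAST_PERIODIC_TIME))  # short timeout
```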

SLPP

Simple Loop Prevention Protocol:

Avaya
Hello packets to detect network loops, checks packets from originating switch and peer switch in SMLT config
Per-VLAN basis to detect loops in untagged and tagged link configs
SLPP Packets are sent using L2 Multicast and a switch will only look at its own or its peer SLPP packets
Configured with the following criteria:

SLPP Tx Process: Defines on which VLANs a switch should send Hello packets; the packets are then replicated out of all ports of each SLPP-enabled VLAN
SLPP Rx Process: Defines which ports on a switch should act when receiving an SLPP packet sent by the same switch or by its SMLT peer. Not recommended in Square or full mesh designs on the IST or core.
SLPP Action: The action is to disable the port; it can be tuned by setting how many SLPP packets must be received before the action is taken

If MAC addresses are learned through looping ports and STP cannot detect the configuration issue, SLPP will disable the port
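
The receive side can be pictured as a per-port counter with a threshold. This is only a toy model of the behaviour described above, not Avaya's implementation; the class, the default threshold of 5, and the switch identifiers are all made up for the example.

```python
class SlppPort:
    """Toy model of SLPP Rx handling on one port."""

    def __init__(self, name: str, rx_threshold: int = 5):
        self.name = name
        self.rx_threshold = rx_threshold  # hypothetical default
        self.rx_count = 0
        self.enabled = True

    def receive(self, origin: str, local_id: str, peer_id: str) -> None:
        """Count SLPP packets that came from this switch or its SMLT peer."""
        if not self.enabled:
            return
        if origin not in (local_id, peer_id):
            return  # a switch only looks at its own or its peer's packets
        self.rx_count += 1
        if self.rx_count >= self.rx_threshold:
            self.enabled = False  # SLPP action: disable the looping port


port = SlppPort("1/1", rx_threshold=3)
for _ in range(3):
    port.receive(origin="sw-a", local_id="sw-a", peer_id="sw-b")
print(port.enabled)
```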

R-SMLT

Routed - SMLT:

Avaya
Exchange L3 information between nodes in a switch cluster for resiliency and simplicity
Works with SMLT and DSMLT to provide less than a second recovery
Provides an active-active router concept to core SMLT networks
Supports SMLT Triangle, Square and Full Mesh
Takes care of packet forwarding, Static Unicast, RIP1, RIP2, OSPF, IPX RIP

DSMLT

Distributed Split Multi-Link Trunking:

Avaya
Enhanced SMLT Protocol
Allows ports in a trunk to span multiple units of a stack of switches or multiple cards in a chassis
Fault tolerance is the goal: when a link goes down, traffic is redistributed across the remaining active links in less than half a second
No outage is noticed by users

SMLT

Split Multi-Link Trunking:

Avaya
Multiple links are treated as a single link and traffic is load balanced across them
The link for each packet is chosen by a hashing algorithm involving source and destination MAC info
Switches involved become aggregation switches and appear as one logical switch
If both ends are split across 2 devices each, with no intermediate "mesh" connections, this is known as a "SMLT square"
If one end is split, this is known as a "SMLT Triangle"
SMLT Triangles do not need end devices to support SMLT
Heavily utilizes IST configuration
SMLT IDs are given to each SMLT device from end station
A single unit's ARP requests are given through the IST with its own connection information. This is not the case if redundant connections are present
Failure of one component results in half a second convergence time
When using SMLT it is not necessary to enable STP due to no logical bridging loops with IST
STP should be disabled on all SMLT ports
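
To illustrate the idea of hashing on source and destination MAC, here is a minimal sketch. The hash used (XOR of the last byte of each address, modulo the number of active links) is a stand-in chosen for simplicity and is not the actual Avaya algorithm; the function names and link names are hypothetical.

```python
def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def pick_link(src_mac: str, dst_mac: str, active_links: list) -> str:
    """Choose one member link for a frame based on its MAC pair."""
    if not active_links:
        raise ValueError("no active links in the trunk")
    src = mac_to_bytes(src_mac)
    dst = mac_to_bytes(dst_mac)
    # Illustrative hash only: XOR the low bytes, then wrap to the link count.
    index = (src[-1] ^ dst[-1]) % len(active_links)
    return active_links[index]


links = ["1/1", "2/1"]
print(pick_link("00:00:00:00:00:0a", "00:00:00:00:00:0d", links))
# After a link failure, the same flow is rehashed over what remains.
print(pick_link("00:00:00:00:00:0a", "00:00:00:00:00:0d", ["1/1"]))
```

Because the choice depends only on the MAC pair, frames of one conversation stay on one link, and a failed link simply drops out of the list that the hash indexes into.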

DMLT

Distributed Multi-Link Trunking:

Avaya
Load Balances traffic across connections, switches, or modules in a chassis
Allows the ports on a trunk to span multiple units or to span multiple cards in a chassis

MLT

Multi-Link Trunking:

Proprietary to Avaya
Groups several Ethernet links into one logical fault tolerant link between routers, switches and servers
Fault tolerant: If one link fails, the traffic will be distributed across remaining links
2 to 8 links
Used in conjunction with DMLT, SMLT, DSMLT, RSMLT
All physical ports must reside on the same switch (fixed with SMLT, DSMLT, RSMLT)
