Monday, November 29, 2010

Reliable transport vs. reliable transport

One of the most contradictory uses of terminology in communications concerns the word transport as used by the IETF and the ITU-T communities. To make matters worse, the term’s prevalent modifier reliable leads to even further divergence in meaning.

To the Internet community, transport refers to the fourth layer of the OSI layer stack (a layer stack known to the ITU-T as X.200, but largely assumed to have been superseded by the more flexible G.80x layering model). The transport layer sits above the network layer (IP), and is responsible (or not) for a range of end-to-end path functionalities. The IETF has defined four transport protocols – the two celebrated ones being UDP (unreliable) and TCP (reliable), and their less fêted brethren being SCTP (highly reliable and message-oriented) and DCCP (unreliable but TCP-friendly).

Since the IP stack does not extend below OSI layer 3, the basic tenet is that nothing can be done about defects in lower layers (congestion, lost or misordered packets) and the best strategy is to compensate for any such problems using the layer over IP (e.g., retransmission, packet reordering). Reliability in this context thus means employing such compensation.
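This compensation-over-IP idea can be sketched as a toy stop-and-wait ARQ. Everything below (the lossy channel, the packet format, the assumption that ACKs are reliable) is invented purely for illustration; it is not any particular protocol, only the principle that reliability lives in the layer above the lossy network:

```python
import random

def lossy_channel(loss_rate, rng):
    """Return a transmit function that loses packets with probability loss_rate."""
    def transmit(pkt):
        return None if rng.random() < loss_rate else pkt
    return transmit

def stop_and_wait(packets, transmit, max_tries=100):
    """Toy stop-and-wait ARQ: resend each packet until the (assumed reliable)
    ACK confirms delivery. All the "reliability" is compensation performed
    above the lossy channel, analogous to TCP compensating above IP."""
    delivered, total_sends = [], 0
    for seq, payload in enumerate(packets):
        for _ in range(max_tries):
            total_sends += 1
            rx = transmit((seq, payload))
            if rx is not None:          # receiver got it
                delivered.append(rx[1])
                break
        else:
            raise RuntimeError("channel appears to be down")
    return delivered, total_sends

rng = random.Random(42)                 # fixed seed: deterministic "loss"
data = [b"a", b"b", b"c", b"d"]
got, sends = stop_and_wait(data, lossy_channel(0.3, rng))
assert got == data                      # everything delivered, in order
assert sends >= len(data)               # at the cost of retransmissions
```

The cost of this compensation is visible in `sends` exceeding the number of packets: reliability over an unreliable layer is bought with extra transmissions and delay.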

To the communications infrastructure community a transport network is a communications network that serves no function other than transport of user information between the network’s ingress and its egress. In particular, no complex processing such as packet retransmission or reordering is in scope. Reliability in this context means monitoring the functionality and performance of the lowest accessible layer (OAM), and bypassing defective elements in as short a time as possible (protection switching).

For many years the Rubicon between L2 and L3 so effectively separated the two communities that the disparate usages could continue un-noticed. But then came MPLS.

MPLS was originally invented as a method of accelerating the processing of IP packets, by parsing the IP header only at network ingress, and attaching a short label to be looked up from then on. With advances in header parsing hardware and algorithms this acceleration became less and less significant. However, the possibility of treating a packet consistently throughout the network, and thus performing traffic engineering under layer three instead of compensation over it, kept MPLS from becoming marginalized. While IntServ at the IP level (implemented by RSVP) never caught on, traffic engineering at the MPLS layer (RSVP-TE) started gaining momentum.

Of course, for MPLS to truly make IP more reliable, it required more “transport” functionality (in the ITU sense), such as stronger OAM and protection switching. This led to the introduction of “Transport-MPLS” (T-MPLS), later renamed the “MPLS Transport Profile” (MPLS-TP).
Thus it became impossible to suppress the conflict between the two transports. Work on the MPLS Transport Profile in the IETF is obviously not performed in the “Transport Area” (having nothing to do with transport in the IETF sense), but in the “Routing Area” (although it has very little to do with routing).

So what does the future hold for the words “transport” and “reliable” ? In theory it would be possible to adopt synonyms (such as “carriage” and “resilient”), although I doubt that either community would be willing to abandon its traditions. At a high enough level of abstraction the meanings coalesce, so perhaps the best tactic today (whenever there is room for error) is to say sub-IP transport and super-IP transport. Reliability can be left for the super-IP case (where it is not really apt), since there are multiple alternatives for the sub-IP case.


Tuesday, November 23, 2010

IETF79 - Beijing !

I haven’t had much time to blog of late, having to catch up on work since returning from Beijing.

Beijing ? I hear you ask. Yes, the 79th IETF meeting was held 7-12 November in the Chinese capital.

This was my 26th IETF, and things have changed since my first meeting. Back in the “old days” the meetings were mostly in the US (e.g., Minneapolis in the winter) and occasionally in Europe. This later changed to a 3:2:1 rule: of the six meetings held every two years, three take place in North America (with Canada preferred to the US due to visa requirements), two in Europe, and one in Asia (Japan or S. Korea). Even then the default hotel chain with which the IETF had an arrangement was comfortably unvarying. The venue was so predictable that when I found out that the next European meeting was to be held in the French capital, I googled “Paris Hilton” and was surprised to retrieve photos of a scantily dressed heiress.

However, the proportion of Asian participants in SDOs has increased to such an extent that in the space of a single month, three of the SDOs that I follow held meetings in China – ITU-T SG15/Q13 (timing) met in Shenzhen 18 - 22 October, the MEF met 24-27 October in Beijing, followed by the IETF.

While the IETF general attendance figures were up (and for the first time the largest contingent was not from the US – but from China), several of the working groups that I attend suffered from a noticeable lack of major participants. In TICTOC, other than the two chairs and the Area Director, only two of the regulars were able to appear in person. This made it difficult to make any progress on the crucial issues.

However, the PWE3 session was lively, with the topics of making the control word mandatory and deprecating some of the VCCV modes drawing people to the mike. Unfortunately PWE3’s slot coincided with IPPM’s, but apparently IPPM was plagued by a situation similar to TICTOC’s. In the CODEC WG (whose chair couldn't make it to Beijing), the IPR-free audio codec being developed for Internet use was demoed. In the technical plenary there were interesting talks and exchanges on IPv6 operations and transition issues, with the local speakers painting a grim picture of IPv4 address availability.

All-in-all it was an interesting meeting in an interesting venue; a venue that I am certain to be visiting again.


Tuesday, November 2, 2010

OAM for flows

Continuing my coverage of the recent joint IESG/IAB design team on OAM, this time I want to discuss the issue of OAM for flows in Packet Switched Networks (PSNs).

From a pure topology standpoint any communications network is simply a set of source ports (i.e., interfaces into which we may input information), a set of destination ports (i.e., interfaces from which we may receive information), and a set of links connecting source ports to destination ports. Of course, the destination ports may be located very far from the source ports (and this is the reason we use the network in the first place), but this geometry is irrelevant from a topological point of view.

PSNs are communications networks that transfer information in units of packets. They can be classified as either Connection Oriented (CO) or Connectionless (CL). In a CO network the end-to-end path from source port to destination port needs to be set up (by physical connection, or by manual/management-system configuration, or by control-plane signaling) before information can be sent down the path. Once set up, it makes sense to call this end-to-end path a “connection”. A connection is essentially an association of a source port with a destination port that has been prepared to carry information.

In a CL PSN each packet of information is individually sent towards its destination. No set up is required, as each packet carries a unique destination address. When we send a data packet from a source port to the desired destination we can still think about an association of the source and destination ports, but as this association is ephemeral, it does not constitute a connection. If, however, many similar packets are sent from source to destination, it may be useful to speak of a “flow” of packets. Of course, there is no guarantee that all the packets travel from source to destination over precisely the same path through the network, but in practice they often do, for substantial periods of time, until a reroute event takes place. When load balancing is used the definition (and its consequences) becomes truly problematic.
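A sketch of why a flow tends to hold to a single path: routers that load-balance commonly hash an invariant flow identifier, typically the 5-tuple, to choose among equal-cost paths, so every packet of one flow picks the same next hop. The hash used below is illustrative only, not any router's actual algorithm:

```python
import hashlib

def five_tuple(pkt):
    """The usual CL flow identifier: addresses, protocol, and ports."""
    return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])

def ecmp_path(pkt, n_paths):
    """Pick one of n equal-cost paths by hashing the 5-tuple
    (an illustrative hash, not a real router's algorithm)."""
    digest = hashlib.sha256(repr(five_tuple(pkt)).encode()).digest()
    return digest[0] % n_paths

# Five packets of one flow: identical 5-tuple, hence one path for all of them.
flow = [{"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6,
         "sport": 4321, "dport": 80} for _ in range(5)]
paths = {ecmp_path(pkt, 4) for pkt in flow}
assert len(paths) == 1
```

This also shows where the definition breaks down: a load balancer hashing on something finer than the 5-tuple (or rehashing when paths change) scatters the "flow" over multiple paths, which is exactly the problematic case mentioned above.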

OAM mechanisms were originally designed for Circuit Switched (CS) or CO communications systems, such as PDH, SDH, and ATM networks. For such networks it makes perfect sense to ask about continuity of the connection, or its performance parameters (e.g., delay). Thus Continuity Check (CC) OAM functions became standard, and PM functions recommended for CO networks. The issue is more complex for CL networks. Continuity doesn’t mean very much if every packet is sent to a different destination! It means somewhat more when there is a prolonged flow; but even then packet loss and delay are statistical combinations, since consecutive packets may traverse different network elements on their route from source to destination.

When packets are delivered to an incorrect destination we say a misconnection has occurred, and Connectivity Verification (CV) monitors for such events. (There is often confusion between CC – a functionality needed for CO and CL networks, and CV – which is only for CL.)

For some types of prolonged flows it makes sense to introduce OAM mechanisms to monitor continuity and performance parameters. Pseudostreaming of video over the Internet may involve hundreds of packets per second for many minutes. IPTV flows are even higher in rate and can last for hours. Ethernet Virtual Connections (EVCs) between customer sites last indefinitely.

In a future entry I will discuss when it doesn’t make sense to talk about flows, and what OAM means for such cases.


Thursday, October 28, 2010


On October 12th and 13th the IESG (Internet Engineering Steering Group) and IAB (Internet Architecture Board), the two IETF management bodies, held a joint design session on OAM. I was a bit surprised that the IETF leadership would be interested in devoting a separate meeting (not coinciding with an IETF conference) to the subject of OAM; OAM has never been an area of IETF expertise. Indeed, when the meeting was first announced on the IETF main discussion email list several long-time IETF participants asked for the acronym OAM to be spelled out! Of course, ICMP (ping) was defined in RFC 792 circa 1981, and BFD that runs between routers has its own BFD Working Group (WG) in the IETF, but the overall concept of OAM has never been central to the IETF world view.

However, just as the interests of the ITU-T have been migrating up from synchronous networks to ones based on Ethernet, IP and MPLS, so have the interests of the IETF been migrating down from applications, end-to-end transport, and routing to pure transport functionality. And OAM is a crucial element of transport networks.

The physical meeting was held at George Mason University in Fairfax, Virginia, but was also WebEx’ed. Thus I managed to actively participate, and even present slides, without having to travel; but I did find myself jet-lagged due to shifting my work day by 6 hours. Unfortunately, the Internet connectivity at the conference site was not completely solid, and the remote attendees frequently found themselves discussing among themselves how to alert those on-site that the connection had failed. Some connectivity OAM would definitely have been useful …

So where does the IETF want to use OAM ? The main interest is now MPLS-TP, but there are still open issues regarding PW OAM.

The IETF PWE3 WG standardized an associated channel that shares fate with the PW traffic, and which is mostly employed for OAM. This OAM is misnamed VCCV, for Virtual Circuit (an old name for PW) Connectivity Verification (which should really be Continuity Check). VCCV presently allows IP ping, LSP ping, and BFD protocols to run inside the associated channel in order to provide FM. Back at IETF-67 (November 2006) I proposed using Y.1731 inside the associated channel. This idea was later developed into draft-mohan-pwe3-vccv-eth, backed by Nortel, RAD, France Telecom, KDDI, Huawei, NTT and Sprint, but was rejected by the larger community due to confusion as to its use of Ethernet (it was never intended to be limited to MPLS over Ethernet, or Ethernet PWs).

As I am sure all my readers know, MPLS-TP is a transport technology, being jointly developed by the ITU-T and IETF. The ITU-T views MPLS-TP as yet another transport network, which needs the same OAM functionality as all the other transport networks developed to date (SDH, OTN, carrier-grade Ethernet). In particular, the generic research on OAM for packet-based networks, and the protocol development (in cooperation with IEEE 802.1) of Y.1731, is seen by the ITU-T community as directly relevant to MPLS-TP. Work in the ITU, and Internet Draft draft-bhh-mpls-tp-oam-y1731 submitted to the IETF, proposed maximizing re-use of Y.1731 formats. This approach is strongly advocated by Alcatel-Lucent and Huawei, and is being backed by many operators, including China Mobile, China Telecom, Telecom Italia, France Telecom, Deutsche Telekom, Telstra, and KPN. The idea expands on my earlier proposal, solves both FM and PM with a single OAM protocol, and is expected to undergo major deployment in the near future.

IETF participants from Cisco, Juniper, and Ericsson produced an alternative OAM FM proposal, based on the IETF’s own BFD instead of the ITU-T’s Y.1731. The IETF MPLS WG could not reach consensus as to which mechanism to prefer (in an email poll the community was split about 50/50). The MPLS WG chairs decided to exercise their authority to break such ties, and elevated the BFD-based draft to WG status as draft-ietf-mpls-tp-cc-cv-rdi, thus effectively killing draft-bhh for FM. The issue of PM was open for a while, but a Cisco draft has recently been elevated to become draft-ietf-mpls-tp-loss-delay, thus blocking draft-bhh from the PM function as well. This draft is not fully fleshed out, but it uses the MPLS-TP G-ACh mechanism, and allows either NTP-style or 1588-style timestamps.

So we see that OAM has become a hot (and contentious) topic in the IETF.

After this long introduction, my next entry will delve into a few of the subjects discussed at the joint design session.


Wednesday, September 29, 2010

OAM for FM and PM

The Operations, Administration, and Maintenance (OAM) functionality provided in all modern communications systems supports two distinguishable functions, namely Fault Management (FM) and Performance Management (PM).
It is important to remember that despite the use of the word “management” here, OAM is a user-plane function. OAM may trigger control plane procedures (e.g., protection switching) or management plane actions (such as alarms), but the OAM itself is data that runs along with the user data.

FM deals with the detection and reporting of malfunctions. ITU-T Recommendation G.806 defines a scale of such malfunctions :
  • anomaly (n): smallest observable discrepancy between desired and actual characteristics
  • defect (d): sufficient density of anomalies that interrupts some required function
  • fault cause (c): root cause behind multiple defects
  • failure (f): persistent fault cause such that the ability to perform the function is terminated

The main FM functions include :

  • Continuity Check (CC): checking that data sent from A to B indeed arrives at B
  • Connectivity Verification (CV): checking that data sent from A to B does not incorrectly arrive at C
  • Loopback (LB): checking that data sent from A to B can be returned by B and received at A
  • Forward Defect Indication (FDI) also called Alarm Indication Signal (AIS): when data sent from A to B is destined for C, B reports to C that it did not receive data from A
  • Backward Defect Indication (BDI) also called Reverse Defect Indication (RDI): when data is sent from A to B, B reports to A that it did not receive the data.
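As an illustration of how the CC function works in practice: in Y.1731-style continuity checking, a maintenance endpoint declares Loss of Continuity (LOC) when no Continuity Check Message (CCM) arrives within 3.5 transmission intervals. A minimal sketch of that detection rule (the arrival times below are invented):

```python
def loc_events(arrivals, interval):
    """Return the times at which Loss of Continuity (LOC) would be declared:
    whenever the gap between consecutive CCMs exceeds 3.5 * interval
    (the Y.1731-style timeout rule)."""
    timeout = 3.5 * interval
    return [prev + timeout                      # LOC declared when the timer expires
            for prev, cur in zip(arrivals, arrivals[1:])
            if cur - prev > timeout]

# CCMs sent every 1 s, but the messages around t = 5..7 s are lost:
arrivals = [0, 1, 2, 3, 4, 8, 9]
assert loc_events(arrivals, interval=1.0) == [7.5]   # declared at 4 + 3.5
assert loc_events([0, 1, 2, 3], interval=1.0) == []  # healthy stream: no LOC
```

The 3.5-interval threshold is the usual engineering compromise: a single lost CCM (which a CL network may well drop) must not raise an alarm, yet a genuine break should be detected within a few intervals.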

PM deals with monitoring of parameters such as end-to-end delay, Bit Error Rate (BER), and Packet Loss Ratio (PLR). While there may not be loss of basic connectivity if performance parameters are not maintained within their desired realms, the ability to provide specific services may be compromised, even to the extent that there is a loss of service. For example, excessive round-trip delay makes it difficult to hold interactive audio conferences, and excessive PLR may lead to loss of an IPTV service. For this reason, Service Providers (SPs) commit to Service Level Agreements (SLAs) that specify the acceptable PM parameters.

A partial list of PM parameters that may appear in an SLA is :

  • BER or PLR (for packet oriented networks)
  • 1-way delay (1DM) also called latency: the amount of time it takes for data to go between two points of interest (this measurement requires clock synchronization between endpoints)
  • 2-way delay also called roundtrip delay (RTD): the amount of time it takes for data to go to a point of interest and return (does not require clock synchronization)
  • Packet Delay Variation (PDV): the variation of delay (may be 1-way or 2-way, but even 1-way does not require time synchronization, although frequency synchronization may be required for highly accurate measurements)
  • Availability: percentage of time that the service can be provided
  • Throughput or Bandwidth profile (for packet oriented networks): methods of quantifying the sustainable data rate (will generally be needed for each direction separately)
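A toy computation showing how several of these parameters fall out of matched per-packet timestamps. The numbers are invented for illustration, and computing the 1-way delay samples like this assumes the two endpoints share a synchronized clock (as noted above, 1DM requires exactly that):

```python
# Matched send/receive timestamps (seconds) for five probe packets;
# None marks a lost packet. All values are invented for illustration.
tx = [0.000, 0.020, 0.040, 0.060, 0.080]
rx = [0.013, 0.031, 0.058, 0.071, None]

one_way = [r - t for t, r in zip(tx, rx) if r is not None]  # 1DM samples
plr = sum(r is None for r in rx) / len(rx)                  # Packet Loss Ratio
pdv = max(one_way) - min(one_way)                           # a simple max-min PDV metric

assert round(plr, 2) == 0.2
assert round(pdv, 3) == 0.007     # 18 ms worst case vs 11 ms best case
```

Note that PDV, being a difference between delay samples, cancels any constant clock offset between the endpoints, which is why it needs at most frequency (not time-of-day) synchronization.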

While certain FM functions, in particular Continuity Check (CC), are usually run periodically, PM functions are frequently called on an ad-hoc basis. However, with an SLA in effect, the SP needs to periodically monitor the PM parameters, and the customer may want to do so as well. In fact, while customers typically trust legacy SPs to provide the promised service level (after all, a 2.048 Mbps leased line is never going to deliver only 1.9 Mbps!), they have much less trust for newer services (it is relatively easy for a SP to cheat and provide 8 Mbps Ethernet throughput instead of the promised 10 Mbps).

In future entries I will deal with questions such as what parameter levels are needed for particular applications, how PM impacts user experience, and how SPs and customers should monitor performance.


Wednesday, September 8, 2010

Deployment, R&D, and protocols

In my last entry I discussed why the last mile is a bandwidth bottleneck while the backhaul network is a utilization bottleneck. Since I was discussing the access network I did not delve into the core, but it is clear that the core is where the rates are highest, and where the traffic is the most diverse in nature.

Based on these facts, we can enumerate the critical issues for deployment and R&D investment in each of these segments. For the last mile the most important deployment issue is maximizing the data-rate over existing infrastructures, and the area for technology improvement is data-rate enhancement for these infrastructures.

For the backhaul network the deployment imperative is congestion control, while development focuses on OAM and control plane protocols to minimize congestion and manage performance and faults.

For the core network the most costly deployment issue is large-capacity, fast and redundant network forwarding elements, along with rich connectivity. Future developments involve a huge range of topics, from optimized packet formats (MPLS) through routing protocols, to management plane functionality.

A further consequence of these different critical issues is the preference of protocols used in each of these segments. In the last mile efficiency is critical, but there is little need for complex connectivity. So physical-layer framing protocols rule. As there may be a need for multiplexing or inverse multiplexing, one sometimes sees non-trivial use of higher-layer protocols. However, these are usually avoided. For example, Ethernet has long had an inefficient inverse multiplexing mechanism (LAG), but for DSL links this is being replaced by the more efficient sub-Ethernet PAF (EFM bonding) alongside physical-layer (m-pair) bonding.

In the backhaul network carrier-grade Ethernet has replaced ATM as the dominant protocol, although MPLS-TP advocates are proposing it for this segment. Carrier-grade Ethernet acquired all the required fault and performance mechanisms with the adoption of Y.1731, while the MEF has worked hard in developing the needed shaping, policing, and scheduling mechanisms.

In the core the IP suite is sovereign. MPLS was originally developed to accelerate IP forwarding, but advances in algorithms and hardware have made IPv4 forwarding remarkably fast. IP caters to a diverse set of traffic types, and the large number of RFCs attests to the richness of available functionality.

Of course it is sometimes useful to use different protocols. A service provider that requires out-of-footprint connectivity might prefer IP backhaul to Ethernet. An operator with regulatory constraints might prefer a pure Ethernet (PBBN) core to an IP one. Yet, understanding the nature and constraints of each of the segments helps us weigh the possibilities.


Thursday, August 26, 2010

Bandwidth and utilization bottlenecks

Let us consider an end-to-end data transport path that can be decomposed into the following segments
* end-to-end path = LAN + access network + core network + access network + LAN
There may be distinct service providers for each of these segments, so many different decompositions may make sense from the business perspective. Yet the access network, and its components
* access network = last mile + backhaul network
are useful constructs for more fundamental reasons.

These reasons emanate from the concepts of bandwidth and bandwidth utilization (the ratio of required to available bandwidth). In general :
1) LAN and core have high bandwidth, while the last mile has low bandwidth.
2) LAN and core enjoy low utilizations, while the backhaul network suffers from high utilization.
Let's see why.

LANs are the most geographically constrained of the segments, and thus physics enables them to effortlessly run at high bandwidth. On the other hand, LANs handle only their owner’s traffic, and thus the required bandwidth is low compared with that available. And if the bandwidth requirements increase, it is a relatively simple and inexpensive matter for the owner to upgrade switches or cabling. So utilization is low.

Core networks have the highest bandwidth requirements, and are geographically unconstrained. This is indeed challenging; however, the challenge is actually financial rather than physical. Physics allows transporting any quantity of digital data over any distance without error; it just exacts a monetary penalty when both bandwidth and distance are large. Since it is the core function of core network operators to provide this transport, the monetary penalty of high bandwidth is borne. Whenever trends show that bandwidth is becoming tight, network engineering comes into play – that is, either some of the traffic is rerouted or the network infrastructure is upgraded.

Shannon’s capacity law universally restricts the bandwidth of DSL, radio, cable or PON links used in the last mile. However, utilization is usually not a problem as customers purchase bandwidth that is commensurate with their needs, and understand that it is worthwhile to upgrade their service bandwidth as these needs increase.

On the other hand, the backhaul network is a true utilization bottleneck. Frequently the access provider does not own the infrastructure, and purchases bandwidth caps instead. Since the backhaul is shared infrastructure, overprovisioning these rings or trees would seriously impact OPEX overhead. Even when the infrastructure is owned by the provider, adding new segments involves purchasing right-of-way or paying license fees for microwave links.

So, the sole bandwidth bottleneck is the last mile, while the sole utilization bottleneck is the backhaul network. Understanding these facts is critical for proper network design.


Thursday, August 19, 2010

The access network equation

My last entry provoked several emails on the subject of the terms last/first mile vs. access networks. While answering these emails I found it useful to bring in an additional term – the backhaul network. Since these discussions took place elsewhere, I thought it would be best to summarize my explanation here.

Everyone knows what a LAN is and what a core network is. Simply put, the access network sits between the LAN or user and the core. For example, when a user connects a home or office LAN to the Internet via a DSL link, we have a LAN communicating over an access network with the Internet core. Similarly, when a smartphone user browses the Internet over the air interface to a neighboring cellsite, the phone connects over an access network to the Internet core.

However, the access network itself naturally divides into two segments, based on fundamental physical constraints. In the first example the DSL link can’t extend further than a few kilometers, due to the electrical properties of twisted copper pairs. In the second case when the user strays from the cell served by the base-station, the connection is reassigned to a neighboring cell, due to electromagnetic properties of radio waves. Such distance-limited media are the last mile (or first mile if you prefer).

DSLAMs and base-stations are examples of first aggregation points; they terminate last mile segments from multiple users and connect them to the core network. Since the physical constraints compel the first aggregation point to be physically close to its end-users, it will usually be physically remote from the core network. So an additional backhaul segment is needed to connect the first aggregation point to the core. Sometimes additional second aggregation points are used to aggregate multiple first aggregation points, and so on. In any case, we label the set of backhaul links and associated network elements the backhaul network.

We can sum this discussion up in a single equation:
* access network = last mile + backhaul network

I’ll discuss the consequences of this equation in future blog entries.


Sunday, August 8, 2010

Last mile or first mile ?

Physical-layer access technologies with limited range are usually called last mile technologies. More specifically, we usually use the expression last mile when considering xDSL, which enables several Mbps to be transported over several kilometers, or a fiber optic link or PON, which enables hundreds or even thousands of Mbps to be transported over tens of kilometers.

In the year 2000 the IEEE started talking about “Ethernet in the First Mile” (EFM). In 2001 the EFM task force (802.3ah) was created, which developed extensions to Ethernet that are now incorporated into 802.3 as clauses 56 through 67. These extensions include:

  • a VDSL physical medium called 10PASS-TS (clause 62)
  • an SHDSL physical medium called 2BASE-TL (clause 63)
  • a new inverse multiplexing method (different from LAG, sometimes referred to as EFM bonding) called PME aggregation (subclause 61.2.2)
  • a 100 Mbps 10 km point-to-point 2-fiber medium called 100BASE-LX10 and a single-fiber one called 100BASE-BX10 (clause 58)
  • a Gbps 10 km point-to-point 2-fiber medium called 1000BASE-LX10 and a single-fiber one called 1000BASE-BX10 (clause 59)
  • a Gbps point-multipoint single-fiber medium with 10 km range called 1000BASE-PX10, and one with a 20 km range called 1000BASE-PX20 (clause 60)
  • logic for the EPON Ethernet Passive Optical Network (clause 64)
  • OAM features (clause 57)

The EFM task force closed down in 2004, and thus it is no longer accurate to say “EFM bonding” or “EFM OAM”. Yet the expression “first mile” remains in use. Is there a difference between the “last mile” and the “first mile”?

I was not there when the IEEE came up with the nomenclature, but I feel that I understand the idea behind it. The term “last mile” was invented by core network engineers. For someone who lives in the WAN, the short-range link that reaches the end-user is justifiably called the “last mile”. On the other hand, the IEEE 802 standards committee takes a LAN-centric point of view. For someone who lives in the LAN, the technology that provides the first link to the outside world is understandably called the “first mile”.

For those of us who live in the access network it doesn’t matter whether you call it first or last mile, we call it home.


Monday, August 2, 2010

DNSSEC - Internet root signed

IP addresses (even 4-byte IPv4 ones) are generally not easy to remember, which is why humans prefer to type domain names into their browser address window, even if they are longer. It is the job of the Domain Name System (DNS) to translate the domain name into the correct IP address, which is placed in the IP header and enables proper forwarding.

The DNS works recursively in the following way. When my application (for instance, my browser) needs the IP address for some domain name, it queries the operating system’s DNS resolver. If the resolver already knows the IP address (for example, it is preconfigured, or that domain name has been recently looked up and is cached) it returns it to the application. If not, the resolver will query a DNS server, which has been configured or found using DHCP. If this server knows the IP address (i.e., it is cached there), it returns it in an “A record” (or an “AAAA record” for IPv6 addresses); otherwise it recursively queries until it finds a server that has the required “A record”. It may eventually get to the authoritative DNS server for the domain in question; that is, the name server that didn’t learn the IP address from another name server, but was configured with it.
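From the application’s side all of this recursion is hidden behind a single resolver call. A minimal sketch using Python’s standard library (looking up localhost, so the example works without network access; a real domain name would exercise the full recursive machinery):

```python
import socket

# getaddrinfo is the application's window onto the resolver: it consults the
# local cache / hosts file / configured DNS servers and returns address records.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)

# Each entry is (family, type, proto, canonname, sockaddr); the address is
# the first element of sockaddr for both IPv4 (A) and IPv6 (AAAA) results.
addrs = {info[4][0] for info in infos}
assert "127.0.0.1" in addrs or "::1" in addrs
```

The application never learns (or cares) whether the answer came from a cache, a nearby server, or the authoritative one, which is precisely why cache poisoning, discussed next, is so dangerous.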

This system is hierarchical and distributed and thus very scalable, but it is not very secure. The archetypical attack is DNS cache poisoning, which is carried out by impersonating a name server that knows the desired IP address, and causing a name server closer to the resolver to cache the incorrect result. When queried, the attacker’s IP address is returned to the user, who then browses to a malicious site where they are tricked into accepting fallacious content or infected with viruses to be exploited later.

DNSSEC (Domain Name System Security Extensions) adds source authentication and integrity checking to the DNS system in a backwards compatible way. In DNSSEC the DNS responses are cryptographically signed with public key signatures, and thus can’t be forged. This thwarts cache poisoning exploits. In addition, DNSSEC can also be used to protect non-DNS data, such as “CERT records” that can be used to authenticate emails.

DNSSEC is described in RFCs 4033, 4034, and 4035 from 2005, but the root zone of the Internet was only signed in July, 2010. This major milestone was celebrated last week at the Wednesday IETF-78 plenary with glasses of champagne and the handing out of stickers declaring IETF – DNSSEC – SIGNED.


Thursday, July 29, 2010

TICTOC update

This entry is an update for people who have been following the IETF TICTOC working group that I chair.

We had a very lively meeting yesterday, and the topic that evoked the most interest was that of how to transfer timing flows (especially 1588, but NTP as well) over MPLS networks.

The first question is why any special treatment is needed here at all - after all, anything that can be carried over IP or Ethernet can be carried over MPLS! The reason for special treatment is that network elements along the path may need to be able to identify timing packets. For example, for high accuracy timing it will be necessary to prioritize timing packets, say by placing them in a usually empty queue, in order to limit transit delay and delay variation. Furthermore, it may be desirable to identify timing packets and read specific fields in them in order to implement on-path support (e.g., 1588 transparent clock or boundary clock functionality).
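As a sketch of what “identifying timing packets” amounts to in the simplest (IP-visible) case: PTP (IEEE 1588) over UDP uses well-known destination ports, 319 for event messages and 320 for general messages, and NTP uses port 123, so a DPI classifier can in principle be little more than a port lookup once it has parsed down to the UDP header:

```python
# Well-known UDP destination ports for timing protocols:
# IEEE 1588 (PTP) uses 319 (event) and 320 (general); NTP uses 123.
TIMING_UDP_PORTS = {319: "PTP event", 320: "PTP general", 123: "NTP"}

def classify(udp_dport):
    """A DPI-style timing classifier reduced to its essence: a port lookup."""
    return TIMING_UDP_PORTS.get(udp_dport, "not timing")

assert classify(319) == "PTP event"
assert classify(123) == "NTP"
assert classify(443) == "not timing"
```

The catch, as noted above, is that inside an MPLS network this lookup requires parsing past the label stack into the IP and UDP headers of every packet, which is exactly the expensive and error-prone DPI that the other alternatives try to avoid.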

So what can be done? The simplest answer is nothing; well, almost nothing. We could just send the timing packets as IP packets over MPLS and expect DPI to identify them. This would be expensive, potentially error-prone, and would introduce additional delay and perhaps delay variation.

The next simplest answer is to place timing packets in an LSP of their own, and to signal or configure the LSRs to recognize these timing packets. A draft describing such a mechanism has recently been written.

Another idea would be to define a special MPLS label that would be universally recognized. Unfortunately, there are only 16 reserved labels, and 6 have already been allocated - it will be hard to convince the MPLS experts to give us a label for this purpose. Alternatively, a specific combination of TC bits (ex-EXP bits) could be used to indicate timing packets.
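To make the label/TC options concrete, here is a sketch of decoding a 32-bit MPLS label stack entry (the field layout - 20-bit label, 3-bit TC, S bit, 8-bit TTL - is from RFC 3032; the particular TC codepoint for timing is purely hypothetical):

```python
def decode_lse(word):
    """Split a 32-bit MPLS label stack entry into its fields (RFC 3032 layout)."""
    return {
        "label": (word >> 12) & 0xFFFFF,  # 20-bit label
        "tc":    (word >> 9) & 0x7,       # 3-bit traffic class (ex-EXP)
        "s":     (word >> 8) & 0x1,       # bottom-of-stack bit
        "ttl":   word & 0xFF,             # time to live
    }

TIMING_TC = 0b101  # hypothetical TC codepoint agreed to mean "timing packet"

def is_timing_packet(word):
    """An LSR could classify timing packets with one mask-and-compare."""
    return decode_lse(word)["tc"] == TIMING_TC

# label 17, timing TC, bottom of stack, TTL 64
lse = (17 << 12) | (TIMING_TC << 9) | (1 << 8) | 64
assert decode_lse(lse)["label"] == 17
assert is_timing_packet(lse)
```

The appeal of the TC approach is visible here: classification is a trivial bit test, with no need to parse anything beyond the top label stack entry.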

For MPLS-TP environments the relatively new generic associated channel (G-ACh) could be used. These packets usually carry OAM, and for timing delivered by the service provider this would seem a natural way to go.

Finally, one could define a new pseudowire type, or a new MPLS client (right now MPLS can carry MPLS, IP, or pseudowire clients).

In the discussions that ensued, the method of using a special LSP (carrying either IP or Ethernet timing packets), the use of the TC field, and the G-ACh method all had proponents.

It is too early to say which method will prevail.


Wednesday, July 28, 2010

Trains, planes, MPLS, and IP

I am in Maastricht at the 78th IETF meeting.

In the weeks before the meeting many IETF'ers complained about the fact that there were no direct flights to the venue. The closest airports are Amsterdam and Brussels, and from either of them three trains must be taken.

At first I had thought that the problem was related to carrying luggage between trains, but discovered that the concerns were more about the differences between connection-oriented and connectionless networking.

Plane flights are similar to MPLS or ATM.
You set up the entire end-to-end route before taking it.
You pay for the desired quality of service (business class / economy) and the appropriate resources are reserved for you.
Your connections should be seamless (e.g., they transfer your luggage for you), but if a connection fails you have to invoke the management system and wait a long time for a reroute (with gold passengers suffering less delay than non-prioritized ones).

Train rides are like IP.
You pay a flat rate to get to Maastricht.
There are many valid ways to get there (via Utrecht, via Amsterdam Centraal, or via Rotterdam), and which you take is nondeterministic, depending on when you start the journey and whether some lines are not running (due to weekends, work on the lines, etc).
The processing required at each hop may seem complex (arrive at platform 13, take elevator up, and then down to platform 5a), but your best strategy is just to take the first train in the right direction that comes along (after all Dijkstra was Dutch).
You can pay for a first-class DiffServ priority, but the difference is not very big and they don't reserve an assigned seat for you.
Even if you accidentally get on the wrong carriage and end up in Heerlen, you just take the next train back.
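Since Dijkstra came up anyway, here is his shortest-path algorithm run on a toy version of the rail graph (the stations are real, but the edge weights are invented journey times, not an actual timetable):

```python
import heapq

def dijkstra(graph, src, dst):
    """Classic Dijkstra shortest-path search; returns (cost, path)."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

rail = {  # invented weights, in minutes
    "Schiphol": {"Utrecht": 30, "Rotterdam": 45},
    "Utrecht": {"Eindhoven": 50},
    "Rotterdam": {"Eindhoven": 60},
    "Eindhoven": {"Maastricht": 60},
}
cost, path = dijkstra(rail, "Schiphol", "Maastricht")
assert path == ["Schiphol", "Utrecht", "Eindhoven", "Maastricht"]
assert cost == 140
```

This is, of course, exactly what link-state IP routing protocols such as OSPF compute at every hop, which is why "take the first train in the right direction" works.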

It seems surprising to me that so many IETF78 participants were worried that the IP way of doing things would not work.


Monday, July 26, 2010

Succulents and (tele)protection mechanisms

My wife and I cultivate succulents, plants that store water. Cacti are succulents, but we don’t collect them – only the cuddly, non-prickly types. What I find so interesting about succulents is that the various types are relatively unrelated – there was never a proto-succulent that evolved into many subspecies. Instead, many different kinds of plants developed similar mechanisms to cope with the same problem – that of conserving water. For example, the American Agave family (from which Tequila is made) looks so similar to the African Aloe family (from which Aloe Vera is derived) that it is hard to tell them apart.

The same thing happens in different families of technologies. For example, in-building wiring in the UK uses ring circuits, where the wiring goes from the fuse box to each of the receptacles and back to the fuse box. In this way one can get away with smaller-diameter conductors than are needed with a "radial circuit", since there are two parallel paths from the fuse box to each receptacle. Furthermore, if a wire is disconnected along one direction from the fuse box, there is still electricity at all of the receptacles. This is similar to what we call in communications engineering a 1+1 protection mechanism. In 1+1 protection the information is sent around a ring in both directions, and the copy that arrives (first or best) at the destination is selected.
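The 1+1 idea can be sketched in a few lines (hypothetical delays and path states, not any real ring protocol): the source permanently bridges the traffic onto both directions, and the selector at the sink simply keeps whichever copy arrives first.

```python
def send_1_plus_1(payload, east_delay, west_delay, east_up=True, west_up=True):
    """Permanent bridge at the source; the sink's selector picks the first copy."""
    arrivals = []
    if east_up:
        arrivals.append((east_delay, "east", payload))
    if west_up:
        arrivals.append((west_delay, "west", payload))
    if not arrivals:
        return None  # both directions have failed
    return min(arrivals)  # the selector takes the earliest arrival

# Normal operation: the faster (east) copy is selected
assert send_1_plus_1("data", east_delay=3, west_delay=7)[1] == "east"
# A cut on the east side: traffic survives with no switching action at all
assert send_1_plus_1("data", east_delay=3, west_delay=7, east_up=False)[1] == "west"
```

The attraction of 1+1 - like the ring circuit - is that a single failure requires no signaling or reaction at all; the price is permanently consuming bandwidth (or copper) on both paths.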

On the other hand, high-voltage electric distribution systems use a mechanism called "teleprotection" in order to bypass faults. When a fault is detected large relays switch in order to bypass the fault. This is similar to what we call fast reroute (FRR), in which detection of a communications network failure triggers rerouting of information around the failed link or element.
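FRR can be caricatured similarly (invented topology and node names): a local detour around each protected link is computed and installed in advance, and on failure detection the node merely splices it into the path.

```python
# Precomputed backup, as in MPLS FRR: installed before any failure occurs
primary = ["A", "B", "C", "D"]
backup_around = {("B", "C"): ["B", "E", "C"]}  # local detour bypassing link B-C

def reroute(path, failed_link, backups):
    """Splice the precomputed detour in place of the failed link."""
    u, v = failed_link
    return path[:path.index(u)] + backups[(u, v)] + path[path.index(v) + 1:]

# Link B-C fails: traffic is locally diverted through E, and the ends of the
# path are untouched - just as teleprotection relays bypass a faulted segment
assert reroute(primary, ("B", "C"), backup_around) == ["A", "B", "E", "C", "D"]
```

The key property shared with teleprotection is that the reaction is local and precomputed, so the switchover time is dominated by failure detection, not by path computation.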

So, two of the protection mechanisms used in communications were independently discovered and implemented in electric wiring as well. I find that as interesting as succulents.


Wednesday, July 21, 2010

Shaping / policing in 3G networks

This is my first blog entry on the world of traffic shaping. We have all been hearing about the need for shaping in 3G mobile networks. The subject’s popularity exploded with the publicity of network overload problems that some operators were experiencing with the introduction of the iPhone. Most of that turned out to be due to signaling overload. However, almost everyone is convinced that if flat-rate plans remain then true data congestion will become acute at some point, and some sort of bandwidth shaping and policing will then be needed.

The question is where to apply the shaping. Most of the many solutions that I have seen are positioned relatively high up in the network, at the Gn (between SGSN and GGSN) or Gi (between GGSN and Internet) interfaces. And it’s easy to see why. Lower down (at the Iu or Iub RAN interfaces) there are scalability concerns and the protocols are much more complex.

The problem is that the same data overload concerns are forcing mobile operators to cache and/or offload traffic as close to the end-user as possible. This traffic doesn’t traverse the shaping / policing functions positioned higher up, which are thus led to believe that all is well. Hence the RAN becomes even more overloaded.

Two fixes come to mind. The more complex one is to signal the true situation upwards. This one may be preferred by operators, as it could potentially allow non-flat-rate billing even for traffic that doesn’t traverse their core. The cleaner solution is to perform the shaping lower down (e.g., at the Iub), before the caching/offloading. This solution entails classifying user traffic that is usually encrypted, and even when in the clear is hidden by a plethora of complex protocols. The actual shaping is also complicated by the multiplexing.
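Wherever the function ends up, the policing itself is usually described as a token bucket. Here is a minimal sketch (the rate and burst parameters are invented); whether a non-conforming packet is dropped (policing) or delayed (shaping) is a separate decision layered on top.

```python
class TokenBucket:
    """Police to `rate` bytes/second with bursts of up to `burst` bytes."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0

    def conforms(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size   # conforming: forward (police) or send now (shape)
            return True
        return False              # non-conforming: drop, mark, or queue

tb = TokenBucket(rate=1000, burst=1500)   # 1 kB/s, one MTU of burst
assert tb.conforms(1500, now=0.0)         # an initial burst is allowed
assert not tb.conforms(1500, now=0.1)     # only 100 bytes' worth has accrued
assert tb.conforms(1500, now=1.6)         # enough tokens have refilled by now
```

Doing this low down at the Iub means first undoing the multiplexing and classifying flows per user, which is exactly the complexity referred to above.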

I would appreciate hearing your preferences, or other alternatives.