Universidad Publica de Valencia_T2_encaminamiento

download Universidad Publica de Valencia_T2_encaminamiento

of 114

Transcript of Universidad Publica de Valencia_T2_encaminamiento

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    1/114

    Transmisin de Datos Multimedia http://www.grc.upv.es/docencia/tdm Master IC 2007/2008

    Tema 2:

    Aspectos de encaminamiento

    Tema 2:

    Aspectos de encaminamiento

    Algoritmos bsicos de encaminamiento Link state

    Distance Vector

    Encaminamiento en Internet RIP

    OSPF BGP

    Multi-Protocol Label Switching (MPLS).

    IP multicast

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    2/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    2

    123

    0111

    value in arrivingpackets header

    routing algorithm

    local forwarding table

    header value output link

    0100010101111001

    3221

    Interplay between routing, forwarding

    Computer Networking: A TopDown Approach Featuring the

    Internet,3rd edition.

    Jim Kurose, Keith RossAddison-Wesley, July 2004.

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    3/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    3

    u

    yx

    wvz

    2

    2

    13

    1

    1

    2

    5

    3

    5

    Graph: G = (N,E)

    N = set of routers = { u, v, w, x, y, z }

    E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

    Graph abstraction

    Remark: Graph abstraction is useful in other network contexts

    Example: P2P, where N is set of peers and E is set of TCP connections

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    4/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    4

    Graph abstraction: costs

    u

    yx

    wv

    z2

    2

    13

    1

    1

    2

    53

    5 c(x,x) = cost of link (x,x)

    - e.g., c(w,z) = 5

    cost could always be 1, or

    inversely related to bandwidth,

    or inversely related tocongestion

    Cost of path (x1, x2, x3,, xp) = c(x1,x2) + c(x2,x3) + + c(xp-1,xp)

    Question: Whats the least-cost path between u and z ?

    Routing algorithm: algorithm that finds least-cost path

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    5/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    5

    Routing Algorithm classification

    Global or decentralized information?

    Global:

    all routers have complete topology, link costinfo

    link state algorithms

    Decentralized: router knows physically-connected

    neighbors, link costs to neighbors

    iterative process of computation, exchangeof info with neighbors

    distance vector algorithms

    Static or dynamic?

    Static:

    routes change slowly over time

    Dynamic:

    routes change more quickly periodic update

    in response to link costchanges

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    6/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    6

    A Link-State Routing Algorithm: Dijsktras Algorithm

    net topology, link costs known to allnodes

    accomplished via link statebroadcast

    all nodes have same info computes least cost paths from one

    node (source) to all other nodes

    gives forwarding table forthat node

    iterative: after k iterations, know leastcost path to k destinations

    Notation:

    c(x,y): link cost from node x to y; = if notdirect neighbors

    D(v): current value of cost of path from source todestination v

    p(v):predecessor node along path from source to

    v

    N': set of nodes whose least cost path definitivelyknown

    1 Initialization:

    2 N' = {u}3 for all nodes v4 if v adjacent to u5 then D(v) = c(u,v)6 else D(v) = 7

    8 Loop9 find w not in N' such that D(w) is a minimum10 add w to N'11 update D(v) for all v adjacent to w and not in N' :12 D(v) = min( D(v), D(w) + c(w,v) )13 /* new cost to v is either old cost to v or known

    14 shortest path cost to w plus cost from w to v */15 until all nodes in N'

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    7/114

    Tran

    smisindeDatosM

    ultimedia-MasterI

    C2007/2008

    7

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    8/114

    Tran

    smisindeDatosM

    ultimedia-MasterIC2007/2008

    8

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~1 AD 2, A 4, D 2, D ~

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    9/114

    Tran

    smisindeDatosM

    ultimedia-MasterIC2007/2008

    9

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~

    1 AD 2, A 4, D 2, D ~

    2 ADE 2, A 3, E 4, E

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    10/114

    Tran

    smisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    0

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~

    1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E

    3 ADEB 3, E 4, E

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    11/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    1

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~

    1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E

    3 ADEB 3, E 4, E

    4 ADEBC 4, E

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    12/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    2

    Dijkstras algorithm example

    A F

    B

    D E

    C2

    2

    2

    3

    1

    1

    1

    3

    5

    step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

    0 A 2, A 5, A 1, A ~ ~

    1 AD 2, A 4, D 2, D ~2 ADE 2, A 3, E 4, E

    3 ADEB 3, E 4, E

    4 ADEBC 4, E

    5

    B C D E F

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    13/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    3

    Dijkstras algorithm example

    A

    ED

    CB

    F

    Resulting shortest-path tree from A:

    B

    DE

    C

    F

    (A,B)

    (A,D)(A,D)

    (A,D)

    (A,D)

    destination link

    Resulting forwarding table in A:

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    14/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    4

    Dijkstras algorithm, discussion

    Algorithm complexity: n nodes

    each iteration: need to check all nodes, w, not in N

    n(n+1)/2 comparisons: O(n2)

    more efficient implementations possible: O(nlogn)Oscillations possible:

    e.g., link cost = amount of carried traffic

    A

    D

    C

    B1 1+e

    e0

    e

    1 1

    0 0

    A

    DC

    B

    2+e 0

    00 1+e 1

    A

    DC

    B

    0 2+e

    1+e1 0 0

    A

    DC

    B

    2+e 0

    e0 1+e 1

    initially recompute

    routing

    recompute recompute

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    15/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    5

    Distance Vector Algorithm

    Bellman-Ford Equation (dynamic programming)

    Define: dx(y) := cost of least-cost path from x to y

    Then

    where min is taken over all neighbors v of x

    vdx(y) = min {c(x,v) + dv(y) }

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    16/114

    TransmisindeDatosM

    ultimedia-MasterIC2007/2008

    1

    6

    Bellman-Ford example

    u

    yx

    wv

    z2

    2

    1 3

    1

    1

    2

    53

    5

    Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3

    du(z) = min { c(u,v) + dv(z),

    c(u,x) + dx(z),c(u,w) + dw(z) }= min {2 + 5,

    1 + 3,

    5 + 3} = 4

    Node that achieves minimum is nexthop in shortest path forwarding table

    B-F equation says:

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    17/114

    TransmisindeDatosM

    ultimedia-Master

    IC2007/2008

    1

    7

    Distance Vector Algorithm

    Node x maintains the Distance vector: Dx = [Dx(y): y N ]

    Where Dx(y) = estimate of least cost from x to y

    Node x also maintains its neighbors distance vectors

    For each neighbor v, x maintainsDv = [Dv(y): y N ]

    Node x knows the cost to each neighbor v: c(x,v)

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    18/114

    TransmisindeDatosM

    ultimedia-Master

    IC2007/2008

    1

    8

    Distance vector algorithm

    Basic idea:

    Each node periodically sends its own distance vector estimate toneighbors

    When a node x receives new DV estimate from neighbor, it updatesits own DV using B-F equation:

    Under minor, natural conditions, the estimate Dx(y) converge to theactual least cost dx(y)

    Dx(y) minv{c(x,v) + Dv(y)} for each node y N

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    19/114

    TransmisindeDatosM

    ultimedia-Master

    IC2007/2008

    1

    9

    Distance Vector Algorithm

    Iterative, asynchronous: each localiteration caused by:

    local link cost change

    DV update message fromneighbor

    Distributed:

    each node notifies neighborsonly when its DV changes

    neighbors then notify their

    neighbors if necessary

    waitfor (change in local linkcost of msg from neighbor)

    recomputeestimates

    if DV to any dest has

    changed, notifyneighbors

    Each node:

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    20/114

    Tra

    nsmisindeDatosM

    ultimedia-Master

    IC2007/2008

    2

    0

    x y z

    x

    yz

    0 2 7

    from

    cost to

    from

    from

    x y z

    xyz

    0 2 3

    from

    cost tox y z

    x

    yz

    0 2 3

    from

    cost to

    x y z

    xyz

    cost tox y z

    xyz

    0 2 7

    from

    cost to

    x y z

    xyz

    0 2 3

    from

    cost to

    x y z

    xy

    z

    0 2 3

    from

    cost tox y z

    xy

    z

    0 2 7

    from

    cost tox y z

    xy

    z

    7 1 0

    cost to

    2 0 1

    2 0 17 1 0

    2 0 17 1 0

    2 0 1

    3 1 0

    2 0 13 1 0

    2 0 1

    3 1 0

    2 0 1

    3 1 0

    time

    x z12

    7

    y

    node x table

    node y table

    node z table

    Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}= min{2+0 , 7+1} = 2

    Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)}

    = min{2+1 , 7+0} = 3

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    21/114

    Tra

    nsmisindeDatosM

    ultimedia-Master

    IC2007/2008

    2

    1

    Comparison of LS and DV algorithms

    Message complexity

    LS: with n nodes, E links, O(nE)msgs sent

    DV: exchange betweenneighbors only

    convergence time varies

    Speed of Convergence

    LS: O(n2) algorithm requiresO(nE) msgs

    may have oscillations DV: convergence time varies

    may be routing loops

    count-to-infinity problem

    Robustness: what happens if routermalfunctions?

    LS:

    node can advertise incorrect linkcost

    each node computes only itsown table

    DV:

    DV node can advertise incorrectpath cost

    each nodes table used byothers

    error propagate thru network

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    22/114

    Transmisin de Datos Multimedia http://www.grc.upv.es/docencia/tdm Master IC 2007/2008

    Tema 2:

    Aspectos de encaminamiento

    Tema 2:

    Aspectos de encaminamiento

    Algoritmos bsicos de encaminamiento Link state Distance Vector

    Encaminamiento en Internet RIP

    OSPF

    BGP

    Multi-Protocol Label Switching (MPLS).

    IP multicast

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    23/114

    Tra

    nsmisindeDatosM

    ultimedia-Master

    IC2007/2008

    2

    3

    Hierarchical Routing

    Our routing study thus far - idealization all routers identical

    network flat

    not true in practice

    scale: with 200 million destinations:

    cant store all dests in routing tables!

    routing table exchange would swamp links!

    administrative autonomy

    internet = network of networks

    each network admin may want to control routing in its own network

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    24/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/2008

    2

    4

    Hierarchical Routing

    Gateway router

    Direct link to router in another AS

    aggregate routers into regions,autonomous systems (AS) routers in same AS run same routing protocol

    intra-AS routing protocol

    routers in different AS can run different intra-AS routing protocol

    8

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    25/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/2008

    2

    5

    3b

    1d

    3a

    1c 2aAS3

    AS1

    AS21a

    2c

    2b

    1b

    Intra-ASRoutingalgorithm

    Inter-ASRouting

    algorithm

    Forwardingtable

    3c

    Interconnected ASes

    Forwarding table is configured byboth intra- and inter-AS routingalgorithm

    Intra-AS sets entries forinternal dests

    Inter-AS & Intra-As sets

    entries for external dests

    8

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    26/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/2008

    2

    6

    3b

    1d

    3a

    1c2aAS3

    AS1

    AS21a

    2c2b

    1b

    3c

    Inter-AS tasks

    Suppose router in AS1 receivesdatagram for which dest is outside of

    AS1

    Router should forward packettowards one of the gateway

    routers, but which one?

    AS1 needs:

    to learn which dests arereachable through AS2 andwhich through AS3

    to propagate this reachabilityinfo to all routers in AS1

    Job of inter-AS routing!

    8

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    27/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/2008

    2

    7

    Learn from inter-ASprotocol that subnetx is reachable viamultiple gateways

    Use routing infofrom intra-AS

    protocol to determinecosts of least-cost

    paths to eachof the gateways

    Hot potato routing:Choose the gateway

    that has thesmallest least cost

    Determine fromforwarding table theinterface I that leads

    to least-cost gateway.Enter (x,I) in

    forwarding table

    Example: Choosing among multiple ASes

    Now suppose AS1 learns from the inter-AS protocol that subnet x isreachable from AS3 and from AS2.

    To configure forwarding table, router 1d must determine towardswhich gateway it should forward packets for dest x.

    This is also the job on inter-AS routing protocol!

    Hot potato routing: send packet towards closest of two routers.

    8

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    28/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/2008

    2

    8

    Intra-AS Routing

    Also known as Interior Gateway Protocols (IGP) Most common Intra-AS routing protocols:

    RIP: Routing Information Protocol

    OSPF: Open Shortest Path First

    IGRP: Interior Gateway Routing Protocol (Cisco proprietary)

    08

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    29/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/200

    2

    9

    RIP ( Routing Information Protocol)

    Distance vector algorithm Included in BSD-UNIX Distribution in 1982

    Distance metric: # of hops (max = 15 hops)

    DC

    BA

    u v

    w

    x

    yz

    destination hopsu 1v 2w 2

    x 3y 3z 2

    From router A to subsets:

    08

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    30/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/200

    3

    0

    RIP advertisements

    Distance vectors: exchanged among neighbors every 30 sec viaResponse Message (also called advertisement)

    Each advertisement: list of up to 25 destination nets within AS

    Table processing:

    RIP routing tables managed by application-level process called route-d(daemon)

    advertisements sent in UDP packets, periodically repeated

    physical

    link

    network forwarding(IP) table

    Transprt(UDP)

    routed

    physical

    link

    network(IP)

    Transprt(UDP)

    routed

    forwardingtable

    08

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    31/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/200

    3

    1

    RIP: Link Failure and Recovery

    If no advertisement heard after 180 sec --> neighbor/link declareddead

    routes via neighbor invalidated

    new advertisements sent to neighbors

    neighbors in turn send out new advertisements (if tables changed) link failure info quickly propagates to entire net

    poison reverse used to prevent ping-pong loops (infinite distance = 16hops)

    08

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    32/114

    Tra

    nsmisindeDatosMultimedia-Master

    IC2007/200

    3

    2

    OSPF (Open Shortest Path First)

    open: publicly available Uses Link State algorithm

    LS packet dissemination

    Topology map at each node

    Route computation using Dijkstras algorithm

    OSPF advertisement carries one entry per neighbor router

    Advertisements disseminated to entire AS (via flooding)

    Carried in OSPF messages directly over IP (rather than TCP or UDP

    08

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    33/114

    Tra

    nsmisindeDatos

    Multimedia-Master

    IC2007/20

    3

    3

    OSPF advanced features (not in RIP)

    Security: all OSPF messages authenticated (to prevent maliciousintrusion)

    Multiple same-cost paths allowed (only one path in RIP)

    For each link, multiple cost metrics for different TOS (e.g., satellite

    link cost set low for best effort; high for real time) Integrated uni- and multicast support:

    Multicast OSPF (MOSPF) uses same topology data base as OSPF

    Hierarchical OSPF in large domains.

    008

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    34/114

    TransmisindeDatos

    Multimedia-Master

    IC2007/20

    3

    4

    Hierarchical OSPF

    Two-level hierarchy: local area,backbone.

    Link-state advertisements onlyin area

    each nodes has detailed area

    topology; only know direction(shortest path) to nets in otherareas.

    Area border routers: summarizedistances to nets in own area,

    advertise to other Area Border routers. Backbone routers: run OSPF routing

    limited to backbone.

    Boundary routers: connect to otherASs.

    008

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    35/114

    TransmisindeDatos

    Multimedia-Master

    IC2007/20

    3

    5

    Internet inter-AS routing: BGP

    BGP (Border Gateway Protocol): thede facto standard BGP provides each AS a means to:

    1. Obtain subnet reachability information from neighboring ASs.

    2. Propagate the reachability information to all routers internal to the AS.

    3. Determine good routes to subnets based on reachability informationand policy.

    Allows a subnet to advertise its existence to rest of the Internet: Iam here

    008

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    36/114

    TransmisindeDatos

    Multimedia-Master

    IC2007/20

    3

    6

    BGP basics

    Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions

    Note that BGP sessions do not correspond to physical links.

    When AS2 advertises a prefix to AS1, AS2 is promising it will

    forward any datagrams destined to that prefix towards the prefix. AS2 can aggregate prefixes in its advertisement

    3b

    1d

    3a

    1c

    2aAS3

    AS1

    AS21a

    2c

    2b

    1b

    3c

    eBGP session

    iBGP session

    008

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    37/114

    TransmisindeDatos

    Multimedia-Master

    IC2007/2

    3

    7

    BGP route selection

    Router may learn about more than 1 route to some prefix. Routermust select route.

    Elimination rules:

    Local preference value attribute: policy decision

    Shortest AS-PATH Closest NEXT-HOP router: hot potato routing

    Additional criteria

    2008

    ff

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    38/114

    TransmisindeDatos

    Multimedia-Master

    IC2007/2

    3

    8

    Why different Intra- and Inter-AS routing ?

    Policy: Inter-AS: admin wants control over how its traffic routed, who routes

    through its net.

    Intra-AS: single admin, so no policy decisions needed

    Scale: hierarchical routing saves table size, reduced update traffic

    Performance:

    Intra-AS: can focus on performance

    Inter-AS: policy may dominate over performance

    T 2

    T 2

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    39/114

    Transmisin de Datos Multimedia http://www.grc.upv.es/docencia/tdm Master IC 2007/2008

    Tema 2:

    Aspectos de encaminamiento

    Tema 2:

    Aspectos de encaminamiento

    Algoritmos bsicos de encaminamiento

    Link state Distance Vector

    Encaminamiento en Internet RIP

    OSPF

    BGP

    Multi-Protocol Label Switching (MPLS).

    IP multicast

    2008

    MPLS Th M ti ti

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    40/114

    TransmisindeDatos

    Multimedia-MasterIC2007/2

    4

    0

    MPLS - The Motivation

    IP Protocol Suite - the most predominant networking technology.Voice & Data convergence on a single network infrastructure.

    Continual increase in number of users.

    Demand for higher connection speeds.

    Increase in traffic volumes.

    Ever-increasing number of ISP networks.

    MPLS Working Groups and Standards Standardized by the IETF - currently in Draft stage.

    MPLS recommendations are done by IP players for IP services

    MPLS core components are generic

    MPLS doesnt use specific technology process

    2008

    MPLS d ISO d l

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    41/114

    TransmisindeDatos

    Multimedia-MasterIC2007/2

    4

    1

    MPLS and ISO model

    PPP

    Physical (Optical - Electrical) 1

    2

    IP 3

    4

    Applications7to

    5

    FrameRelay

    ATM (*)

    TCP UDP

    PPP Frame Relay ATM (*)

    MPLS

    (*) ATM overlay model(without addressing and

    P-NNI) is considered asan ISO layer 2 protocol.

    IETF main goal is that

    when a layer is added,no modification isneeded on the existinglayers.

    All new protocol mustbe backwardcompatible

    /2008

    Some MPLS Terms

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    42/114

    TransmisindeDatos

    Multimedia-MasterIC2007/

    4

    2

    Some MPLS Terms...

    LER - Label Edge Router LSR - Label Switch Router

    FEC - Forward Equivalence Class

    Label - Associates a packet to a FEC

    Label Stack - Multiple labels containing information on how a packetis forwarded.

    Shim - Header containing a Label Stack

    Label Switch Path - path that a packet follows for a specific FEC LDP - Label Distribution Protocol, used to distribute Label

    information between MPLS-aware network devices

    Label Swapping - manipulation of labels to forward packets towards

    the destination.

    /2008

    MPLS Architecture

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    43/114

    Tr

    ansmisindeDatos

    Multimedia-MasterIC2007/

    4

    3

    Routing protocol OSPF OSPF OSPF

    AttributesPrecedence

    Local tableLabel table Local table Local table

    LSP (Label-Switched Paths)Label swapping Label removal

    Classification

    Label assignment

    Ingress

    Node

    Core

    Node

    Egress

    Node

    Label SwitchLayer 2

    Layer 1

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    MPLS Architecture

    FEC table Local table Local table Local table

    Forward Equivalence Class

    /2008

    MPLS process

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    44/114

    Tr

    ansmisindeDatos

    Multimedia-MasterIC2007/

    4

    4

    Label swapping

    Label removal

    Classification

    Label assignment

    Label swapping

    Label removal

    Classification

    Label assignment

    OSPF / RIP / IS-IS

    Label Switch Path

    Label table

    Ingress

    Node

    Core

    Node

    Egress

    Node

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    Precedence

    Label table Label table

    Layer 2

    Layer 1

    Layer 2

    Layer 1

    FEC FEC FEC

    MPLS process

    7/2008

    MPLS Cloud

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    45/114

    Tr

    ansmisindeDatos

    Multimedia-MasterIC2007

    4

    5

    LSR

    LER

    LSR

    LER

    IP PacketIP Packet w/ Label

    L3 RoutingL3 Routing

    Label SwappingLabel Swapping

    LER: Label Edge Router

    LERLER

    L3 RoutingL3 Routing

    L3 Routing

    MPLS Cloud

    LSR: Label Switch(ing) Router

    7/2008

    FEC Classification

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    46/114

    Tr

    ansmisindeDatos

    Multimedia-MasterIC2007

    4

    6

    Ingress Label FEC Egress Label

    6 138.120.6/24 - xxxx 9

    Ingress Label AttributeFEC Egress LabelIngress Label FEC Egress Label

    6 138.120.6/24 - xxxx 9

    Attribute

    A

    6 138.120.6/24 - xxxx 12B

    FECs are manually initiated by the operator

    A FEC is associated to at least one Label

    FEC Classification

    A packet can be mapped to a particular FEC based on the following criteria: destination IP address,

    source IP address,

    TCP/UDP port,

    in case of inter AS-MPLS, Source-AS and Dest-AS, class of service,

    application used,

    any combination of the previous criteria.

    7/2008

    What is a Label?

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    47/114

    Tr

    ansmisindeDatos

    Multimedia-MasterIC2007

    4

    7

    L2 Type L2 TypePort PortIngress Label Egress LabelFEC

    ATM 1-1 12 (i.e. 4/65) F1 22 (i.e. 5/65)3-4ATM

    ATM 1-1 15 (i.e. 0/25) F4 9 (i.e. 101) 5-1FR

    Gig Eth 5-1 7 F1 22 (i.e. 4/65)3-4ATM

    What is a Label?

    A label is a short, fixed length, locally significant identifier used toidentify a FEC.

    The label can be identified by the L2 technology identifier (e.g.VPI/VCI for ATM, DLCI for FR or MPLS label for PPP/Ethernet).

    MPLS Label Assignment Schemes Topology Driven

    Label assignment in response to routing protocols (OSPF and BGP) updates

    Control Driven Label assignment in response to RSVP, CR-LDP requests

    Traffic Driven Label assignment in response to flow detection & triggering

    7/2008

    The MPLS Shim Header

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    48/114

    Tr

    ansmisindeDatos

    Multimedia-Maste

    rIC2007

    4

    8

    Label

    (20 bits)

    Exp

    (3 bits)

    S

    (1 bit)

    TTL

    (8bits)

    Label : Label value (0 to 15 are reserved for special use)

    Exp : Experimental Use

    S : Bottom of Stack (set to 1 for the last entry in the label)TTL : Time To Live

    The MPLS Shim Header

    The Label (Shim Header) is represented as a sequence of Label Stack Entry

    Each Label Stack Entry is coded by 4 bytes (32 bits) as described 20 Bits is reserved for the Label Identifier (also named Label)

    Based on the contents of the label a swap,push(impose) orpop(dispose) operation can be performed on the packet's label stack

    7/2008

    Label Switched Path

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    49/114

    Tr

    ansmisindeDatos

    Multimedia-Maste

    rIC200

    4

    9

    Label Switched Path

    5 12

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 312

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 x4

    53

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 x 138.120

    MPLS switch

    MPLS switch

    MPLS switch

    MPLS switch1

    2

    3

    1 2

    3

    1

    2

    3

    4

    1

    2

    3

    138.120

    192.168127.20

    07/2008

    Hop by Hop IP forwarding

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    50/114

    Tr

    ansmisindeDatos

    Multimedia-Maste

    rIC200

    5

    0

    MPLS switch

    MPLS switch

    MPLS switch

    MPLS switch1

    2

    3

    1 2

    3

    1

    2

    3

    4

    1

    2

    3

    138.120

    192.168127.20

    138.120.6.12

    138.12

    0.6.12

    138.120.6.12138.120.6.1

    2

    138.120.6.12

    138.12

    0.6.12

    ??

    138.120.6.12

    Default3

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 x None

    ??

    138.12

    0.6.12

    Default Default

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 None 3

    ??

    138.12

    0.6.12 ??

    138.120.6.12

    Default

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 None x4

    ??138.120.6.1

    2

    Hop by Hop IP forwarding

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    51/114

    07/2008

    MPLS Label Distribution Protocol

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    52/114

    TransmisindeDatos

    Multimedia-Maste

    rIC200

    5

    2

    LDP - a set of procedures by which one LSR informs the other ofthe FEC-to-Label binding it has made.

    Currently, several protocols used as Label Distribution Protocol(LDP) are available:

    RSVP-TE (MPLS extension) LDP and CR-LDP

    BGP-4 MPLS extensions

    Label Distribution schemes

    07/2008

    Downstream stream on demand

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    53/114

    TransmisindeDatos

    Multimedia-Maste

    rIC20

    5

    3

    Downstream stream on demand

    Mapping12

    Mapping5

    5 12

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 312

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 x4

    53

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 x 138.120

    Reques

    t138.1

    20

    Request138.120

    MPLS switch

    MPLS switch

    MPLS switch

    MPLS switch1

    2

    3

    1 2

    3

    1

    2

    3

    4

    1

    2

    3

    138.120

    192.168127.20

    The label is requested by the

    upstream node and thedownstream node defines the label

    used.

    007/2008

    Unsolicited Downstream

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    54/114

    TransmisindeDatos

    Multimedia-Maste

    rIC20

    5

    4

    Unsolicited Downstream

    MPLS switch

    MPLS switch

    MPLS switch

    MPLS switch1

    2

    3

    1 2

    3

    1

    2

    3

    4

    1

    2

    3

    138.120

    192.168127.20

    Mapping12

    Mapping5

    5 1212

    5

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 3

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 138.120 x4

    3

    IngressInterface

    IngressLabel

    FEC EgressInterface

    EgressLabel

    1 x 138.120

    The downstream node defines

    the label and advertises it to theupstream node.

    007/2008

    Edge LSR Features

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    55/114

    TransmisindeDatos

    Multimedia-Maste

    rIC20

    5

    5

    Routing protocols FEC Classification

    Initiates LSP setup for Downstream On Demand method

    Adaptation of non-MPLS data to MPLS data

    Layer 2 translation for MPLS data

    Terminated MPLS-VPN

    At least one LDP protocol

    Edge LSR is counted into the TTL count as a regular router

    007/2008

    Core LSR Features

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    56/114

    TransmisindeDatos

    Multimedia-Maste

    rIC20

    5

    6

    Routing protocols Propagates Downstream On Demand method (request and mapping)

    Layer 2 translation

    High speed label forwarding/switching

    At least one LDP protocol

    007/2008

    MPLS Advantages

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    57/114

    TransmisindeDatos

    Multimedia-Maste

    rIC20

    5

    7

    Simplified Forwarding Efficient Explicit Routing

    Traffic Engineering

    QoS Routing

    Mappings from IP Packet to Forwarding Equivalence Class (FEC)

    Partitioning of Functionality

    Common Operation over Packet and Cell media

    Tema 2:

    Tema 2:

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    58/114

    Transmisin de Datos Multimedia http://www.grc.upv.es/docencia/tdm Master IC 2007/2008

    Aspectos de encaminamientoAspectos de encaminamiento

    Algoritmos bsicos de encaminamiento Link state

    Distance Vector

    Encaminamiento en Internet RIP

    OSPF

    BGP

    Multi-Protocol Label Switching (MPLS).

    IP multicast

    2007/2008

    Multicast = Efficient Data Distribution

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    59/114

    TransmisindeDatosMultimedia-Maste

    rIC2

    5

    9

    Src Src

    2007/2008

    Why Multicast ?

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    60/114

    T

    ransmisindeDatosMultimedia-Maste

    rIC2

    6

    0

    Need for efficient one-to-many delivery of same dataApplications:

    News/sports/stock/weather updates

    Distance learning

    Configuration, routing updates, service location Pointcast-type push apps

    Teleconferencing (audio, video, shared whiteboard, text editor)

    Distributed interactive gaming or simulations

    Email distribution lists Content distribution; Software distribution

    Web-cache updates

    Database replication

    2007/2008

    Why Not Broadcast or Unicast?

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    61/114

    T

    ransmisindeDatosMultimedia-Maste

    rIC2

    6

    1

    Broadcast: Send a copy to every machine on the net

    Simple, but inefficient

    All nodes must process packet even if they dont care

    Wastes more CPU cycles of slower machines (broadcast radiation) Network loops lead to broadcast storms

    Replicated Unicast:

    Sender sends a copy to each receiver in turn

    Receivers need to register or sender must be pre-configured Sender is focal point of all control traffic

    Reliability => per-receiver state, separate sessions/processes at sender

    2007/2008

    Multicast Apps Characteristics

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    62/114

    T

    ransmisindeDatosMultimedia-Maste

    rIC

    6

    2

    Number of (simultaneous) senders to the group The size of the groups

    Number of members (receivers)

    Geographic extent or scope

    Diameter of the group measured in router hops The longevity of the group

    Number of aggregate packets/second

    The peak/average used by source

    Level of human interactivity

    Lecture mode vs interactive

    Data-only (eg database replication) vs multimedia

    2007/2008

    Reliable Multicast vs. Unreliable Multicast

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    63/114

    T

    ransmisindeDatosMultimedia-MasterIC

    6

    3

    When a multicast message is sent by a process, the runtimesupport of the multicast mechanism is responsible for delivering themessage to each process currently in the multicast group.

    As each participating process may be on a separate host, due to

    factors such as failures of network links and/or network hosts,routing delays, and differences in software and hardware, the timebetween when a message is sent and when it is received may varyamong the recipient processes.

    Moreover, a message may not be received by one or more of theprocesses at all.

    2007/2008

    Classification of multicasting mechanisms in terms of messagedelivery

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    64/114

    T

    ransmisindeDatosMultimedia-MasterIC

    6

    4

    Unreliable multicast: The arrival of the correct message at each process is not guaranteed.

    Reliable multicast:

    Guarantees that each message is eventually delivered in a non-

    corrupted form to each process in the group. The definition of reliable multicast requires that each participating

    process receives exactly one copy of each message sent. It doesnot put any restriction of the order the messages delivered.

    Reliable multicast can be further classified based on the order of thedelivery of the messages: unordered, FIFO, causal order, atomicorder.

    C2007/2008

    Classification of reliable multicast -- unordered

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    65/114

    T

    ransmisindeDatosMultimedia-MasterIC

    6

    5

    An unordered reliable multicast system guarantees the safe deliveryof each message, but it provides no guarantee on the delivery orderof the messages.

    Example: Processes P1, P2, and P3 have formed a multicast group.

    Three messages, m1, m2, m3 have been sent to the group. Anunordered reliable multicast system may deliver the messages toeach of the three processes in any of these:

    m1-m2-m3,

    m1-m3-m2,m2-m1-m3,

    m2-m3-m1,

    m3-m1-m2,

    m3-m2-m1

    C2007/2008

    Classification of reliable multicast - FIFO

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    66/114

    T

    ransmisindeDatosMultimedia-MasterIC

    6

    6

    If process P sent messages mi and mj, in that order, then eachprocess in the multicast group will be delivered the messages miand mj, in that order.

    Note that FIFO multicast places no restriction on the delivery order

    among messages sent by different processes. For example, P1sends messages m11 then m12, and P2 sends messages m21 thenm22. It is possible for different processes to receive any of thefollowing orders:

    m11-m12-m21-m22,m11-m21-m12-m22,

    m11-m21-m22-m12,

    m21-m11-m12-m22

    m21-m11-m22-m12m21-m22-m11-m12.

    C2007/2008

    Classification of reliable multicast Causal order

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    67/114

    T

    ransmisindeDatosMultimedia-MasterIC

    6

    7

    If message mi causes (results in) the occurrence of message mj,then mi will be delivered to each process prior to mj. Messages miand mj are said to have a causal or happen-before relationship.

    For example, P1 sends a message m1, to which P2 replies with a

    multicast message m2. Since m2 is triggered by m1, the twomessages share a causal relationship of m1-> m2. A causal-ordermulticast message system ensures that these two messages will bedelivered to each of the processes in the order of m1- m2.

    C2007/2008

    Classification of reliable multicast Atomic order

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    68/114

    T

    ransmisindeDato

    sMultimedia-MasterIC

    6

    8

    In an atomic-order multicast system, all messages are guaranteedto be delivered to each participant in the exact same order. Notethat the delivery order does not have to be FIFO or causal, butmust be identical for each process.

    Example: P1 sends m1, P2 sends m2, and P3 sends m3.

    An atomic system will guarantee that the messages will be deliveredto each process in only one of the six orders:

    m1-m2- m3, m1- m3- m2, m2- m1-m3,

    m2-m3-m1, m3-m1- m2, m3-m2-m1.

    C2007/2008

    IP Multicast Architecture

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    69/114

    T

    ransmisindeDato

    sMultimedia-MasterIC

    6

    9

    Hosts

    Routers

    Service modelService model

    Host-to-router protocol

    (IGMP)

    Multicast routing protocols

    (various)

    IC2007/2008

    IP Multicast model: RFC 1112

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    70/114

    T

    ransmisindeDato

    sMultimedia-MasterI

    7

    0

    Message sent to multicast group (of receivers) Senders need not be group members

    A group identified by a single group address

    Use group address instead of destination address in IP packet sent togroup

    Groups can have any size;

    Group members can be located anywhere on the Internet

    Group membership is not explicitly known

    Receivers can join/leave at will

    Packets are not duplicated or delivered to destinations outside thegroup

    Distribution tree constructed for delivery of packets

    No more than one copy of packet appears on any subnet

    Packets delivered only to interested receivers => multicast deliverytree changes dynamically

    Network has to actively discover paths between senders and receivers

    IC2007/2008

    IP Multicast Addresses

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    71/114

    T

    ransmisindeDato

    sMultimedia-MasterI

    7

    1

    Class D IP addresses 224.0.0.0 239.255.255.255

    Address allocation: Well-known (reserved) multicast addresses, assigned by IANA:

    224.0.0.x and 224.0.1.x Transient multicast addresses, assigned andreclaimed dynamically, e.g., by sdr program

    Each multicast address represents a group of arbitrary size, called ahost group

    There is no structure within class D address space like subnetting=> flat address space

    1 1 1 0 Group ID

    IC2007/2008

    IP Multicast Service

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    72/114

    T

    ransmisindeDato

    sMultimedia-Mast

    er

    7

    2

    Sending Uses normal IP-Send operation, with an IP multicast address specified

    as the destination

    Must provide sending application a way to: Specify outgoing network interface, if >1 available

    Specify IP time-to-live (TTL) on outgoing packet Enable/disable loop-back if the sending host is/isn't a member of the

    destination group on the outgoing interface

    Receiving Two new operations

    Join-IP-Multicast-Group(group-address, interface)

    Leave-IP-Multicast-Group(group-address, interface)

    Receive multicast packets for joined groups via normal IP-Receive

    operation

    IC2007/2008

    Link-Layer Transmission/Reception

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    73/114

    T

    ransmisindeDato

    sMultimedia-Mast

    er

    7

    3

    Transmission IP multicast packet is transmitted as a link-layer multicast, on those

    links that support multicast

    Link-layer destination address is determined by an algorithm specific tothe type of link

    Reception

    Necessary steps are taken to receive desired multicasts on a particularlink, such as modifying address reception filters on LAN interfaces

    Multicast routers must be able to receive all IP multicasts on a link,

    without knowing in advance which groups will be used

    IC2007/2008

    Using Link-Layer Multicast Addresses

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    74/114

    T

    ransmisindeDato

    sMultimedia-Mast

    er

    7

    4

    Ethernet and other LANs using 802 addresses: Direct mapping! Simpler than unicast! No ARP etc.

    32 class D addresses may map to one MAC address

    Special OUI for IETF: 0x01-00-5E.

    No mapping needed for point-to-point links

    LAN multicast address

    0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0

    1 1 1 0 28 bits

    23 bits

    IP multicast address

    Group bit

    rIC2007/2008

    Multicast over LANs & Scoping

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    75/114

    T

    ransmisindeDato

    sMultimedia-Mast

    er

    7

    5

    Multicasts are flooded across MAC-layer bridges along a spanningtree

    But flooding may steal sending opportunity for non-member stationswhich want to transmit

    Almost like broadcast!

    Scope: How far do transmissions propagate?

    Implicit scoping: Reserved Mcast addresses => dont leave subnet.

    Also called link-local addresses

    TTL-based scoping:

    Multicast routers have a configured TTL threshold

    Multicast datagram dropped if TTL

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    76/114

    T

    ransmisindeDato

    sMultimedia-Mast

    er

    7

    6

    TTL expanding-ring search to reach or find a nearby subset of agroup

    Rings can be nested, but not overlapping

    s

    1

    2

    3

    rIC2007/2008

    IP Multicast Architecture

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    77/114

    TransmisindeDato

    sMultimedia-Mast

    er

    7

    7

    Hosts

    Routers

    Service model

    HostHost--toto--router protocolrouter protocol(IGMP)(IGMP)

    Multicast routing protocols

    (various)

    erIC2007/2008

    Internet Group Management Protocol

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    78/114

    TransmisindeDato

    sMultimedia-Maste

    7

    8

    IGMP: signaling protocol to establish, maintain, remove groups ona subnet.

    Objective: keep router up-to-date with group membership of entireLAN

    Routers need not know who all the members are, only that membersexist

    Each host keeps track of which mcast groups are subscribed to

    Socket API informs IGMP process of all joins

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    79/114

    erIC2007/2008

    How IGMP Works (cont.)

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    80/114

    TransmisindeDato

    sMultimedia-Maste

    8

    0

    When a hosts timer for group G expires, it sends a MembershipReport to group G, with TTL = 1

    Other members of G hear the report and stop (suppress) theirtimers

    Routers hear all reports, and time out non-responding groups

    Q

    G G G G

    Routers:

    Hosts:

    erIC2007/2008

    How IGMP Works (cont.)

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    81/114

    TransmisindeDato

    sMultimedia-Maste

    8

    1

    Normal case: only one report message per group present is sent inresponse to a query

    Query interval is typically 60-90 seconds

    When a host first joins a group, it sends immediate reports, instead

    of waiting for a query

    IGMPv2: Hosts may send a Leave group message to all routers(224.0.0.2) address

    Querier responds with a Group-specific Query message: see if anygroup members are present

    Lower leave latency

    terIC2007/2008

    The Java Basic Multicast API

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    82/114

    TransmisindeDato

    sMultimedia-Mast

    82

    At the transport layer, the basic multicast supported by Java is anextension of UDP (the User Datagram Protocol)

    For the basic multicast, Java provides a set of classes which areclosely related to the datagram socket API classes

    terIC2007/2008

    Datagram - recap

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    83/114

    TransmisindeDato

    sMultimedia-Mast

    83

    a byte array

    a DatagramPacket object

    receiver's

    address

    a DatagramSocket

    object

    sender process

    a byte array

    a DatagramPacket object

    a DatagramSocket

    object

    receiver process

    send

    receive

    object reference

    data flow

    terIC2007/2008

    The Java Basic Multicast API - 2

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    84/114

    TransmisindeDatosMultimedia-Mast

    84

    There are four major classes in the API, the first three of which wehave already seen in the context of datagram sockets.

    InetAddress: In the datagram socket API, this class represents theIP address of the sender or receiver. In multicasting, this classcan be used to identify a multicast group.

    DatagramPacket: As with datagram sockets, an object of this classrepresents an actual datagram; in multicast, a DatagramPacketobject represents a packet of data sent to all participants orreceived by each participant in a multicast group.

    DatagramSocket: In the datagram socket API, this class representsa socket through which a process may send or receive data.

    MulticastSocket : A MulticastSocket is a DatagramSocket, withadditional capabilities for joining and leaving a multicast group. Anobject of the multicast datagram socket class can be used for

    sending and receiving multicast packets. In the Java API, aMulticastSocket object is bound to a port address, e.g. 3456, andmethods of the object allows for the joining and leaving of amulticast address, e.g. 239.1.2.3

    terIC2007/2008

    Joining a multicast group

    To join a multicast group at IP address mand UDP portp, a

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    85/114

    TransmisindeDatosMultimedia-Mas

    85

    MulticastSocket object must be instantiated withp, then the objectsjoinGroup method can be invoked specifying the address m:

    // join a Multicast group at IP address

    // 239.1.2.3 and port 3456

    InetAddress group =

    InetAddress.getByName("239.1.2.3"); MulticastSocket s =

    new MulticastSocket(3456); s.joinGroup(group);

    terIC2007/2008

    Sending to a multicast group

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    86/114

    TransmisindeDatosMultimedia-Mas

    86

    A multicast message can be sent using syntax similar with the datagramsocket API.

    String msg = "a multicast message.";

    InetAddress group =

    InetAddress.getByName("239.1.2.3");

    MulticastSocket s =

    new MulticastSocket(3456);

    s.joinGroup(group); // optional

    DatagramPacket hi =new DatagramPacket(msg.getBytes( ),

    msg.length( ),group, 3456);

    s.send(hi);

    terIC2007/2008

    Receiving messages sent to a multicast group

    h h d l h

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    87/114

    TransmisindeDatosMultimedia-Mas

    87

    A process that has joined a multicast group may receive messages sent to the groupusing syntax similar to receiving data using a datagram socket API.

    byte[] buf = new byte[1000];

    InetAddress group =

    InetAddress.getByName("239.1.2.3");

    MulticastSocket s =

    new MulticastSocket(3456);

    s.joinGroup(group);

    DatagramPacket recv =

    new DatagramPacket(buf,buf.length);

    s.receive(recv);

    terIC2007/2008

    Leaving a multicast group

    A l lti t b i ki th l G

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    88/114

    TransmisindeDatosMultimedia-Mas

    88

    A process may leave a multicast group by invoking the leaveGroupmethod of a MulticastSocket object, specifying the multicastaddress of the group.

    s.leaveGroup(group);

    terIC2007/2008

    Setting the time-to-live

    Th ti t d t t lti t f

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    89/114

    TransmisindeDatosMultimedia-Mas

    89

    The runtime support needs to propagate a multicast message froma host to a neighboring host in an algorithm which, when executedproperly, will eventually deliver the message to all the participants.

    Under some anomalous circumstances, however, it is possible thatthe algorithm which controls the propagation does not terminateproperly, resulting in a packet circulating in the network indefinitely.

    Indefinite message propagation causes unnecessary overhead onthe systems and the network.

    To avoid this occurrence, it is recommended that a time to liveparameter be set with each multicast datagram.

    The time-to-live (ttl) parameter, when set, limits the count ofnetwork links or hops that the packet will be forwarded on thenetwork.

    as

    terIC2007/2008

    Setting the time-to-live

    Th d d ttl tti

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    90/114

    TransmisindeDatosMultimedia-Mas

    90

    The recommended ttl settings are: 0 if the multicast is restricted to processes on the same host

    1 if the multicast is restricted to processes on the same subnet

    32 if the multicast is restricted to processes on the same site

    64 if the multicast is restricted to is processes on the same region 128 is if the multicast is restricted to processes on the same

    continent

    255 is the multicast is unrestricted

    as

    terIC2007/2008

    Setting the time-to-live

    String msg = "Hello everyone!";

    dd

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    91/114

    TransmisindeDatosMultimedia-Mas

    91

    InetAddress group =

    InetAddress.getByName("239.1.2.3");MulticastSocket s = new MulticastSocket(3456);s.setTimeToLive(1);

    // set time-to-live to 1 hop

    DatagramPacket hi =

    new DatagramPacket(msg.getBytes( ),msg.length( ),group, 3456);

    s.send(hi);

    The value specified for the ttlmust be in the range 0

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    92/114

    TransmisindeDatosMultimedia-Mas

    92

    To join a group, you use the setsockopt() kernel service call with anew parameter. The new parameter is the ip_mreq structure:

    The imr_multiaddr field specifies the multicast group you want tojoin. It is the same format as the sin_addr field in the sockaddr_instructure. The imr_interface field lets you choose a particular hostinterface. This is similar to a bind(), which lets you specify the host

    interface (or leave the host option wide open with an INADDR_ANYvalue).

    /************************************************************/

    /*** The ip_mreq structure for selecting a multicast addr ***/

    /************************************************************/

    struct ip_mreq{

    struct in_addr imr_multiaddr; /* known multicast group */

    struct in_addr imr_interface; /* network interface */

    } ;

    Mas

    terIC2007/2008

    The C version: Joining Multicast Groups

    The following code snippet shows you how to join a group using thei t t It t th i i t f fi ld t INADDR ANY

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    93/114

    TransmisindeDat

    osMultimedia-Mas

    93

    The following code snippet shows you how to join a group using theip_mreq structure. It sets the imr_interface field to INADDR_ANYmerely for demonstration. Do not use it unless you have only oneinterface on your host; the results can be unpredictable .

    /************************************************************/

    /*** Join a multicast group ***/

    /************************************************************/

    const char *GroupID = "224.0.0.10";

    struct ip_mreq mreq;

    if ( inet_aton(GroupID, &mreq.imr_multiaddr) == 0 )

    panic("address (%s) bad", GroupID);

    mreq.imr_interface.s_addr = INADDR_ANY;

    if ( setsockopt(sd, SOL_IP, IP_ADD_MEMBERSHIP,&mreq,sizeof(mreq))!= 0)

    panic("Join multicast failed");

    /************************************************************/

    /*** Drop a multicast group ***/

    /************************************************************/

    if ( setsockopt(sd, SOL_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)) != 0 )

    panic("Drop multicast failed");

    Mas

    terIC2007/2008

    IP Multicast Architecture

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    94/114

    TransmisindeDat

    osMultimedia-M

    94

    Hosts

    Routers

    Service model

    Host-to-router protocol

    (IGMP)

    Multicast routing

    protocols

    Mas

    terIC2007/2008

    Multicast Routing

    Basic objective build distribution tree for multicast packets

    Th l f th di t ib ti t th b t t i i t l t

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    95/114

    TransmisindeDat

    osMultimedia-M

    95

    The leaves of the distribution tree are the subnets containing at leastone group member (detected by IGMP)

    Multicast service model makes it hard

    Anonymity

    Dynamic join/leave

    Mas

    terIC2007/2008

    Simple Multicast Routing Techniques

    Flood and prune

    Begin by flooding traffic to entire network

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    96/114

    TransmisindeDat

    osMultimedia-M

    96

    Begin by flooding traffic to entire network

    Prune branches with no receivers

    Examples: DVMRP, PIM-DM

    Unwanted state where there are no receivers

    Link-state multicast protocols

    Routers advertise groups for which they have receivers to entirenetwork

    Compute trees on demand

    Example: MOSPF

    Unwanted state where there are no senders

    Mas

    terIC2007/2008

    How to Flood Efficiently ?

    A router forwards a packet from source (S) iffit arrives via theshortest path from the router back to S

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    97/114

    TransmisindeDat

    osMultimedia-M

    97

    shortest path from the router back to S

    Reverse path check!

    Packet is replicated out all but the incoming interface

    Reverse shortest paths easy to compute

    just use info in DVrouting tables

    DV gives shortest reverse paths

    Efficient if costs are symmetric

    xxyy

    tt

    SS

    a

    zz

    Forward packets that arriveon shortest path from t to S(assume symmetric routes)

    Mas

    terIC2007/2008

    Problem

    Flooding can cause a given packet to be sent multiple times over the same

    link: can filter better than this!

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    98/114

    TransmisindeDat

    osMultimedia-

    98

    ca te bette t a t s

    Solution: Reverse Path Broadcasting

    xx yy

    zz

    SS

    a

    b

    duplicate packet

    MasterIC2007/2008

    Reverse Path Broadcasting (RPB)

    Basic idea: forward a packet from S only on child links for S

    Child link of router x for source S: link that has x as parent on the

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    99/114

    TransmisindeDatosMultimedia-

    99

    Child link of router x for source S: link that has x as parent on theshortest path from the link to S

    xx yy

    zz

    SS

    a

    b

    5 6

    child link of xfor S

    forward onlyto child link

    MasterIC2007/2008

    How to Find Child Links?

    Routing updates !

    If z tells x that it can reach S through y

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    100/114

    TransmisindeDatosMultimedia-

    10

    If z tells x that it can reach S through y,

    and if the cost of this path is >= the cost of the path from z to Sthrough x,

    then x knows that the link to z is a child link

    In case of tie, lower address wins

    xx yy

    zz

    SS

    a

    b

    5 6

    child link of xfor S

    forward only

    to child link

    MasterIC2007/2008

    Truncated RPB

    This is still a broadcast algorithm the traffic goes everywhere lousy filtering!

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    101/114

    TransmisindeDatosMultimedia-

    10

    y g

    First order solution: Truncated RPB

    Don't forward traffic onto networks with no receivers

    Identify leaves

    Leaf links are the child links that no other router uses to reach source S

    Use periodic updates of form: this is my next-link to source S

    If child is not the next-link for anyone, it is a leaf

    Detect group membership in leaf (IGMP)

    -MasterIC2007/2008

    Reverse Path Multicast (RPM)

    Prune back transmission so that only absolutely necessary linkscarry traffic

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    102/114

    TransmisindeDatosMultimedia-

    10

    y

    Use on-demand pruning so that router group state scales withnumber of active groups

    xx yy

    tt

    SS

    vv bbaa

    aa

    bb

    data messageprune message

    -MasterIC2007/2008

    Basic RPM Idea

    Prune (Source,Group) at leaf if no members

    Send Non-Membership Report (NMR) up the tree

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    103/114

    TransmisindeDatosMultimedia-

    10

    p p ( ) p

    If all children of router R prune (S,G)

    Propagate prune for (S,G) to parent R

    On timeout: Prune dropped

    Flow is reinstated

    Down stream routers re-prune

    Note: this is a soft-state approach Grafting: Explicitly reinstate sub-tree when

    IGMP detects new members at leaf, or when a child asks for a graft.

    -MasterIC2007/2008

    Putting it together: Topology

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    104/114

    TransmisindeDatosMultimedia

    10

    G G

    S

    G

    -MasterIC2007/2008

    Flood with Truncated Broadcast

    G G

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    105/114

    TransmisindeDatosMultimedia

    10

    S

    G

    -MasterIC2007/2008

    Pruning

    G G

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    106/114

    TransmisindeDatosMultimedia

    10

    S

    Prune (s,g)

    Prune (s,g)

    G

    a-MasterIC2007/2008

    Grafting

    G G

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    107/114

    TransmisindeDatosMultimedia

    10

    Graft (s,g)

    Graft (s,g)S

    G

    G

    Report (g)

    a-Ma

    sterIC2007/2008

    After Grafting Complete

    G G

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    108/114

    TransmisindeDa

    tosMultimedia

    10

    S

    G

    G

    a-Ma

    sterIC2007/2008

    Reliable Multicast: The Goal

    Implement reliability on top of IP multicast

    Why is this hard ?

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    109/114

    TransmisindeDa

    tosMultimedia

    10

    Sender cannot keep state for unknown number of dynamic receivers

    Remember open & dynamic group semantic?

    Algorithms like TCP that estimate path properties such as RTT and

    congestion window dont generalize to trees. Remember: TCP is only for a unicast session!

    Has to address (N)ACK implosions

    a-Ma

    sterIC2007/2008

    Implosion

    Packet 1 is lost All 4 receivers request a resend

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    110/114

    TransmisindeDa

    tosMultimedia

    11

    R1

    S

    R3 R4

    R2

    21

    R1

    S

    R3 R4

    R2

    Resend request

    ia-Ma

    sterIC2007/2008

    Retransmission

    Re-transmitter

    Options: sender, other receivers

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    111/114

    TransmisindeDa

    tosMultimedi

    11

    How to retransmit

    Unicast, multicast, scoped multicast, retransmission group,

    Problem: retransmissions (aka repairs) may reach destinations thatdont require a retransmission

    A.k.a exposure problem

    Solution: subcast the re-transmission only to receivers that need it.

    ia-Ma

    sterIC2007/2008

    Why Subcast? Exposure problem

    Packet 1 does not reach R1;Receiver 1 requests a resend

    Packet 1 resent to all 4 receivers

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    112/114

    TransmisindeDa

    tosMultimed

    11

    R1

    S

    R3 R4

    R2

    21

    R1

    S

    R3 R4

    R2

    q

    1

    1

    Resend request Resent packet

    dia-Ma

    sterIC2007/2008

    Ideal Recovery Model

    Packet 1 reaches R1 but is lostbefore reaching other Receivers

    Only one receiver sends NACK tothe nearest S or R with packet

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    113/114

    TransmisindeDa

    tosMultimed

    11

    S

    R3 R4

    R2

    2

    1

    S

    R3 R4

    R2

    g p

    Resend request

    1 1

    Resent packet

    Repair sent

    only to

    those thatneed packet

    R1 R1

    dia-Ma

    sterIC2007/2008

    Reliable Multicast Transport: Issues

    Retransmission can make reliable multicast as inefficient as

    replicated unicast (N)ACK-implosion if all destinations ack at once

  • 8/10/2019 Universidad Publica de Valencia_T2_encaminamiento

    114/114

    TransmisindeDa

    tosMultimed

    1

    1

    (N)ACK-implosion if all destinations ack at once

    Crying baby: a bad link affects entire group

    Heterogeneity: receivers, links, group sizes

    Anonymous/Open/Dynamic Group Model:

    Source does not know # of destinations, and destinations may vanish

    Multicast applications do not need strong reliability of the typeprovided by TCP.

    Can tolerate some reordering, delay, etc