tc: 网络流量控制

Tags: manual 

目录

说明

当一台机器上运行多个需要网络IO的程序, 如果一个程序通行量很大,会影响到其它程序的网络通信。需要对程序的网络流量进行控制。

通过 setsockopt 设置Socket SO_PRIORITY 选项,是一种比较模糊的方法。

SO_PRIORITY   //man 7 socket
    
Set the protocol-defined priority for all packets to be sent on this socket.
Linux uses this value to order the networking queues: packets with a higher 
priority may be processed  first  depending  on  the  selected device queueing
discipline.  For ip(7), this also sets the IP type-of-service (TOS) field for 
outgoing packets.  Setting a priority outside the range 0 to 6 requires the 
CAP_NET_ADMIN capability.

更好的方式是用 TC—— traffic control。

TC 手册与文档

TC的用法很复杂,而它的文档分布的又很分散,以至于汇总一下TC的资料都是很有意义的。

TC程序中使用说明:

//一级命令
    tc help   

//二级级命令
    tc [qdisc|class|filter] help

//三级命令
    tc qdisc add [stab|pfifo|bfifo|tbf|prio|htb|cbq|red|...] help
    tc class add [htb|cbq|...] help
    tc filter add [rsvp|u32|fw|route|basic|...] help

TC的Manual(Man手册):

//入口
tc(8)

//子项
tc-bfifo(8)
tc-cbq(8)
tc-choke(8)
tc-codel(8)
tc-drr(8)
tc-ematch(8)
tc-fq_codel(8)
tc-hfsc(7)
tc-hfsc(8)
tc-htb(8)
tc-pfifo(8)
tc-pfifo_fast(8)
tc-red(8)
tc-sfb(8)
tc-sfq(8)
tc-stab(8)
tc-tbf(8)

其他资料:

tcng是一个生成tc指令的工具,用来简化tc的使用。

TC 使用

TC本身只是一个与Linux Kernel交互的工具,执行流量控制的是Kernel。TC可以对进流量做有限的控制,主要用途是控制出去的流量。

TC的整套配置策略由qdisc、class、filter三部分组成。

qdisc:   报文队列的处理策略,分为classful和classless
         classful的qdisc中可以包含class
         classless的qdisc是最终送出报文的队列的策略

class:   class必须是隶属于某个类型是classful的qdisc,class自身的类型也必须是classful
         它的majorID等同于所属的qdisc的MajorID。
         class中可以包含一个子qdisc(classful或者classless)或者一个class
         class中必须有且只有一个classless的qdisc, 一般默认为pfifo
         class可以限制接收的报文的类型(优先级低于filter),设置方式视具体class类型而定
         子qdisc是一个完备的qdisc,如果它是classful的,可以继续包含class

filter:  filter必须隶属某个classful qdisc, 用于告知满足过滤条件的报文的去处
         报文的去处可以是一个class,也可以是一个qdisc

qdisc的种类:

classless qdisc
    pfifo_fast      //默认策略,three bands with different priority, first in first out
    bfifo/pfifo     //one queue, first in first out, tail drop
    red             //Random Early Detection:
                    //    When the queue's length reaches the max value , packet is dropped/marked at the designated probability
    sfq             //Stochastic Fairness Queueing:  
                    //    packets will be dipatched into different buckets by the hash value,
                    //    buckets is queried in a round robin fashion when dequeuing
    sfb             //?? Stochastic Fair Blue: 
                    //    8 levels of 16 bins, each bin maintains a marking probability
                    //    BLUE: A New Class of Active Queue Management Algorithms
    tbf             //Token Bucket Filter
                    //    traffic is filtered based on the expenditure of tockens
                    //    strict shaper, non-work-conserving
    choke           //Choose and keep Scheduler
                    //    once the queue hits a certain average length, a random packet is drawn from the queue
                    //    if both the to-be-queued and the drawn packet belong to the same flow, both packets are dropped
    codel           //Controlled-Delay Active Queue Management algorithm
                    //    use the local mininum queue as a measure of the standing/persistent queue
                    //    use a single state-tracking variable of the mininum delay 
                    //    measured in packet-sojourn time in the queue
    fq_codel        //Fair Queuing with Controlled Delay
                    //    combine codel and Fair Queueing, Fair Queueing, and be  measured by the delay time

classful qdisc
    PRIO            //Priority qdisc
    DRR             //Deficit round robin scheduler
    CBQ             //Class Based Queueing
    HTB             //Hierarchy Token Bucket
    HFSC            //Hierarchical Fair Service Curve's control under linux

filter支持以下分类器classifer:

fw:  基于firewall对packet设置的mark
u32: 基于报文头内容
route: 基于报文要去往的路由
rsvp,rsvp6: 基于RSVP
tcindex: 用于DSMARK
basic:  扩展的match分类方法,man tc-ematch

TC 流控算法

red:

Floyd, S., and Jacobson, V., Random Early Detection gateways for Congestion Avoidance. http://www.aciri.org/floyd/papers/red/red.html
Some changes to the algorithm by Alexey N. Kuznetsov.
Adaptive RED  : http://icir.org/floyd/papers/adaptiveRed.pdf

sfq:

Paul E. McKenney "Stochastic Fairness Queuing", IEEE INFOCOMM'90 Proceedings, San Francisco, 1990.
Paul E. McKenney "Stochastic Fairness Queuing", "Interworking: Research and Experience", v.2, 1991, p.113-131.
See also: M. Shreedhar and George Varghese "Efficient Fair Queuing using Deficit Round Robin", Proc. SIGCOMM 95.

sfb:

W. Feng, D. Kandlur, D. Saha, K. Shin, BLUE: A New Class of Active Queue Management Algorithms, U. Michigan CSE-TR-387-99, April 1999.

choke:

R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation", IEEE INFOCOM, 2000.
A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial Characteristics", IEEE/ACM Transactions on Networking, 2004

codel:

Kathleen Nichols and Van Jacobson, "Controlling Queue Delay", ACM Queue, http://queue.acm.org/detail.cfm?id=2209336

DRR:

M. Shreedhar and George Varghese "Efficient Fair Queuing using Deficit Round Robin", Proc. SIGCOMM 95.

CBQ:

Sally Floyd and Van Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995
Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995
Sally Floyd, "Notes on Class-Based Queueing: Setting Parameters", 1996
Sally Floyd and Michael Speer, "Experimental Results for Class-Based Queueing", 1998, not published.

HTB:

HTB website: http://luxik.cdi.cz/~devik/qos/htb/

HFSC:

http://en.wikipedia.org/wiki/Hierarchical_fair-service_curve

与 cgroup 结合

net_prio、 net_cls

为整个group中的程序产生的报文设置优先级

modprobe netprio_cgroup    
mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap

//iscsi组中的进程发送到eth0的报文具有的优先级是5

net_prio没有编译到内核中,首先需要加载模块: netprio_cgroup

通过 net_cls 为 group 中的程序产生的报文设置 tag(classid)。然后使用 tc 针对 classid 进行流量控制, iptables 也可以识别报文的 classid。

shaper

shaper

2.0.36版本中包含的整流模块,现在已经被移除。它通过创建一个虚拟设备——shaper, 然后将物理网卡直接绑定到shaper, 使从该物理网卡出去的流量满足shaper的限制。

可以用来作为参考。


推荐阅读

Copyright @2011-2019 All rights reserved. 转载请添加原文连接,合作请加微信lijiaocn或者发送邮件: [email protected],备注网站合作

友情链接:  系统软件  程序语言  运营经验  水库文集  网络课程  微信网文  发现知识星球