当一台机器上运行多个需要网络IO的程序, 如果一个程序通行量很大,会影响到其它程序的网络通信。需要对程序的网络流量进行控制。
通过 setsockopt 设置Socket SO_PRIORITY 选项,是一种比较模糊的方法。
SO_PRIORITY //man 7 socket
Set the protocol-defined priority for all packets to be sent on this socket.
Linux uses this value to order the networking queues: packets with a higher
priority may be processed first depending on the selected device queueing
discipline. For ip(7), this also sets the IP type-of-service (TOS) field for
outgoing packets. Setting a priority outside the range 0 to 6 requires the
CAP_NET_ADMIN capability.
更好的方式是用 TC—— traffic control。
TC的用法很复杂,而它的文档分布的又很分散,以至于汇总一下TC的资料都是很有意义的。
TC程序中使用说明:
//一级命令
tc help
//二级级命令
tc [qdisc|class|filter] help
//三级命令
tc qdisc add [stab|pfifo|bfifo|tbf|prio|htb|cbq|red|...] help
tc class add [htb|cbq|...] help
tc filter add [rsvp|u32|fw|route|basic|...] help
TC的Manual(Man手册):
//入口
tc(8)
//子项
tc-bfifo(8)
tc-cbq(8)
tc-choke(8)
tc-codel(8)
tc-drr(8)
tc-ematch(8)
tc-fq_codel(8)
tc-hfsc(7)
tc-hfsc(8)
tc-htb(8)
tc-pfifo(8)
tc-pfifo_fast(8)
tc-red(8)
tc-sfb(8)
tc-sfq(8)
tc-stab(8)
tc-tbf(8)
其他资料:
tcng是一个生成tc指令的工具,用来简化tc的使用。
TC本身只是一个与Linux Kernel交互的工具,执行流量控制的是Kernel。TC可以对进流量做有限的控制,主要用途是控制出去的流量。
TC的整套配置策略由qdisc、class、filter三部分组成。
qdisc: 报文队列的处理策略,分为classful和classless
classful的qdisc中可以包含class
classless的qdisc是最终送出报文的队列的策略
class: class必须是隶属于某个类型是classful的qdisc,class自身的类型也必须是classful
它的majorID等同于所属的qdisc的MajorID。
class中可以包含一个子qdisc(classful或者classless)或者一个class
class中必须有且只有一个classless的qdisc, 一般默认为pfifo
class可以限制接收的报文的类型(优先级低于filter),设置方式视具体class类型而定
子qdisc是一个完备的qdisc,如果它是classful的,可以继续包含class
filter: filter必须隶属某个classful qdisc, 用于告知满足过滤条件的报文的去处
报文的去处可以是一个class,也可以是一个qdisc
qdisc的种类:
classless qdisc
pfifo_fast //默认策略,three bands with different priority, first in first out
bfifo/pfifo //one queue, first in first out, tail drop
red //Random Early Detection:
// When the queue's length reaches the max value , packet is dropped/marked at the designated probability
sfq //Stochastic Fairness Queueing:
// packets will be dipatched into different buckets by the hash value,
// buckets is queried in a round robin fashion when dequeuing
sfb //?? Stochastic Fair Blue:
// 8 levels of 16 bins, each bin maintains a marking probability
// BLUE: A New Class of Active Queue Management Algorithms
tbf //Token Bucket Filter
// traffic is filtered based on the expenditure of tockens
// strict shaper, non-work-conserving
choke //Choose and keep Scheduler
// once the queue hits a certain average length, a random packet is drawn from the queue
// if both the to-be-queued and the drawn packet belong to the same flow, both packets are dropped
codel //Controlled-Delay Active Queue Management algorithm
// use the local mininum queue as a measure of the standing/persistent queue
// use a single state-tracking variable of the mininum delay
// measured in packet-sojourn time in the queue
fq_codel //Fair Queuing with Controlled Delay
// combine codel and Fair Queueing, Fair Queueing, and be measured by the delay time
classful qdisc
PRIO //Priority qdisc
DRR //Deficit round robin scheduler
CBQ //Class Based Queueing
HTB //Hierarchy Token Bucket
HFSC //Hierarchical Fair Service Curve's control under linux
filter支持以下分类器classifer:
fw: 基于firewall对packet设置的mark
u32: 基于报文头内容
route: 基于报文要去往的路由
rsvp,rsvp6: 基于RSVP
tcindex: 用于DSMARK
basic: 扩展的match分类方法,man tc-ematch
red:
Floyd, S., and Jacobson, V., Random Early Detection gateways for Congestion Avoidance. http://www.aciri.org/floyd/papers/red/red.html
Some changes to the algorithm by Alexey N. Kuznetsov.
Adaptive RED : http://icir.org/floyd/papers/adaptiveRed.pdf
sfq:
Paul E. McKenney "Stochastic Fairness Queuing", IEEE INFOCOMM'90 Proceedings, San Francisco, 1990.
Paul E. McKenney "Stochastic Fairness Queuing", "Interworking: Research and Experience", v.2, 1991, p.113-131.
See also: M. Shreedhar and George Varghese "Efficient Fair Queuing using Deficit Round Robin", Proc. SIGCOMM 95.
sfb:
W. Feng, D. Kandlur, D. Saha, K. Shin, BLUE: A New Class of Active Queue Management Algorithms, U. Michigan CSE-TR-387-99, April 1999.
choke:
R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation", IEEE INFOCOM, 2000.
A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial Characteristics", IEEE/ACM Transactions on Networking, 2004
codel:
Kathleen Nichols and Van Jacobson, "Controlling Queue Delay", ACM Queue, http://queue.acm.org/detail.cfm?id=2209336
DRR:
M. Shreedhar and George Varghese "Efficient Fair Queuing using Deficit Round Robin", Proc. SIGCOMM 95.
CBQ:
Sally Floyd and Van Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995
Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995
Sally Floyd, "Notes on Class-Based Queueing: Setting Parameters", 1996
Sally Floyd and Michael Speer, "Experimental Results for Class-Based Queueing", 1998, not published.
HTB:
HTB website: http://luxik.cdi.cz/~devik/qos/htb/
HFSC:
http://en.wikipedia.org/wiki/Hierarchical_fair-service_curve
为整个group中的程序产生的报文设置优先级
modprobe netprio_cgroup
mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap
//iscsi组中的进程发送到eth0的报文具有的优先级是5
net_prio没有编译到内核中,首先需要加载模块: netprio_cgroup
通过 net_cls 为 group 中的程序产生的报文设置 tag(classid)。然后使用 tc 针对 classid 进行流量控制, iptables 也可以识别报文的 classid。
2.0.36版本中包含的整流模块,现在已经被移除。它通过创建一个虚拟设备——shaper, 然后将物理网卡直接绑定到shaper, 使从该物理网卡出去的流量满足shaper的限制。
可以用来作为参考。