《Linux_Performance_and_Tuning_Guidelines_IBM》对Linux系统的性能调优做了非常到位的介绍.
dump-a-linux-processs-memory-to-file
查看Linux进程内存情况:
查看进程69420的内存分布:
# pmap -X 69420
69420: /usr/bin/python /usr/bin/ceilometer-collector --logfile /export/log/ceilometer/collector.log
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous Swap Locked Mapping
00400000 r-xp 00000000 08:03 68901272 4 4 0 4 0 0 0 python2.7
00600000 r--p 00000000 08:03 68901272 4 4 0 4 4 0 0 python2.7
00601000 rw-p 00001000 08:03 68901272 4 4 0 0 4 0 0 python2.7
0137d000 rw-p 00000000 00:00 0 27372 27372 21440 26080 27372 0 0 [heap]
02e38000 rw-p 00000000 00:00 0 35865172 35865044 35865044 35863860 35865044 0 0 [heap]
7f60e2952000 rw-p 00000000 00:00 0 1048580 1048580 1048580 1048580 1048580 0 0
7f616215a000 r-xp 00000000 08:03 100665730 60 8 0 8 0 0 0 libbz2.so.1.0.6
7f6162169000 ---p 0000f000 08:03 100665730 2044 0 0 0 0 0 0 libbz2.so.1.0.6
7f6162368000 r--p 0000e000 08:03 100665730 4 4 4 4 4 0 0 libbz2.so.1.0.6
7f6162369000 rw-p 0000f000 08:03 100665730 4 4 4 4 4 0 0 libbz2.so.1.0.6
7f616236a000 r-xp 00000000 08:03 12508 28 20 1 20 0 0 0 bz2.so
...(省略)...
7fffb929c000 rw-p 00000000 00:00 0 136 64 51 52 64 0 0 [stack]
7fffb935a000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 [vdso]
ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 0 0 0 0 [vsyscall]
======== ======== ======== ========== ========= ==== ======
37254756 36949424 36939538 36946020 36946928 0 0 KB
输出说明:
Address: start address of map
Perm: permissions on map: read, write, execute, shared, private (copy on write)
Offset: offset into the file
Device: device name (major:minor)
Inode:
Size: Kbytes: size of map in kilobytes
Rss: resident set size in kilobytes
Pss: private set size in kilobytes
Referenced:
Anonymous:
Swap:
Locked:
Mapping: file backing the map, or '[ anon ]' for allocated memory, or '[ stack ]' for the program stack
在进行网络配置和状态查看时, 大多数资料中都是用ifconfig, 然后阅读ifconfig的man手册时会发现,”This program is obsolete!”
Now, should try to use “ip” command, 手册页: man 8 ip
网络性能测量工具: netperf
评测网络性能的时候, 需要根据实际需求,考虑下面三种情况
1 Bulk data transfer -- 网络状态以数据传输为主的, 每个连接进行大量数据的传送,测量指标是每秒传输的数据量。
2 Request/response type -- 网络状态以请求/响应为主, 测量指标是每秒种完成的交互次数。
3 Concurrent session -- 并发会话数。
netperf似乎已经是一个很古老的工具了, 官方直接下载链接还是2.6的版本(不支持并发), 而支持并发的netperf4版本目前只有一个候选版本4.0.0rc2. 需要留心有没有其它类似工具.
netperf http://www.netperf.org/
下面是netperf的支持测试类型中的几种,我们实际关注也主要是下面的几个。其它的测量工具应当也提供有类似的测试类型。
唯一遗憾的是netperf不支持并发(只能使用同时运行多个的netperf进程的方式模拟并发), 支持并发的netperf4还没有正式发布(2014/06/25)
TCP_RR: 测试TCP交互的速率,即1s内完成了多少次交互
不包含连接建立的时间, 一次请求和对应的回复为一次交互
TCP_CC: 测试连接建立关闭的速率,即1s内完成了多少连接
只建立、关闭连接,不传输任何数据
TCP_CRR: 测试TCP实际交互的速率,即1s内完成了多少次交互
包含连接建立的时间,每个连接中只进行一次交互。(模拟了Http无状态协议的行为)
将TCP_CRR的测试结果与TCP_RR的测试结果对比, 可以观察到连接的建立的性能消耗情况
UDP_RR: 测试UDP交互的速率,即1s内完成了多少次交互
TCP_STREAM: 测试TCP数据的接收速率, 即向目标发送数据的速率
不包括连接建立的时间
TCP_MAERTS: 测试TCP数据发送的速率, 即从目标接收数据的速率
不包括连接建立的时间
TCP_SENDFILE: 测试TCP发送文件的速率
UDP_STREAM: 测试UDP数据的接收和发送速率, 即两台设备之间的数据传输速率
ethtool eth0
[root@localhost net]# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Speed: 100Mb/s //链路状态
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
//使用lspci -vvv查看网卡型号,和使用的驱动
[root@localhost net]# lspci -vvv
...
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
Subsystem: IBM IBM System x3350 (Machine type 4192)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
...
...
Kernel driver in use: tg3
Kernel modules: tg3 //使用驱动名称
...
//使用modinfo命令查看
[root@localhost net]# modinfo tg3
filename: /lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/net/tg3.ko
firmware: tigon/tg3_tso5.bin
firmware: tigon/tg3_tso.bin
firmware: tigon/tg3.bin
version: 3.124
license: GPL
description: Broadcom Tigon3 ethernet driver
author: David S. Miller ([email protected]) and Jeff Garzik ([email protected])
srcversion: 5B4678D7BA2EF54F9D0DA07
...
//使用dmesg查看启动日志
[root@localhost net]# dmesg|grep tg3
tg3.c:v3.124 (March 21, 2012)
tg3 0000:08:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
tg3 0000:08:00.0: setting latency timer to 64
tg3 0000:08:00.0: eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:1a:64:bf:23:c5
tg3 0000:08:00.0: eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
tg3 0000:08:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:08:00.0: eth0: dma_rwctrl[76180000] dma_mask[64-bit]
tg3 0000:09:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
tg3 0000:09:00.0: setting latency timer to 64
tg3 0000:09:00.0: eth1: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:1a:64:bf:23:c4
tg3 0000:09:00.0: eth1: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
tg3 0000:09:00.0: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:09:00.0: eth1: dma_rwctrl[76180000] dma_mask[64-bit]
tg3 0000:08:00.0: irq 24 for MSI/MSI-X
tg3 0000:09:00.0: irq 25 for MSI/MSI-X
tg3 0000:08:00.0: eth0: Link is up at 100 Mbps, full duplex
tg3 0000:08:00.0: eth0: Flow control is off for TX and off for RX
tg3 0000:09:00.0: PME# enabled
tg3 0000:09:00.0: irq 25 for MSI/MSI-X
tg3 0000:09:00.0: irq 25 for MSI/MSI-X
//查看虚拟文件系统中对应驱动的目录
[root@localhost tg3]# ls /sys/module/tg3
drivers holders initstate notes refcnt sections srcversion version
内核文档linux-3.2.12\Documentation\networking\目录下有部分驱动的文档,例如e100、e1000等, 其中介绍了驱动的加载参数.
但是只有部分驱动有对应文档, 例如tg3的文档就没有在内核文档中找到.
网卡特性以及对应的驱动, 以后有机会需要进行专门的学习研究…
linux-3.2.12\Documentation\networking\e1000.txt中介绍了Intel的网卡驱动, 可以在加载时设置每秒中断数、中断延迟、receive/transmit buffer description num等重要参数
//查看方式, qlen即为发送队列长度
ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 2000
link/ether 00:1a:64:bf:23:c5 brd ff:ff:ff:ff:ff:ff
//设置方式
ip link set eth0 txqueuelen 2000 (或者: ifconfig eth0 txqueuelen 2000)
ip link set mtu 9216 (或者ifconfig eth0 mtu 9216)
链路两端的MTU值都需要调整
不同的网卡支持的最大MTU值可能不同, 我使用的Broadcom的网卡支持的最大MTU是91024, Intel的网卡可以支持到151024
//查看网卡功能特性的状态:
[root@localhost ipv4]# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
将网络功能下放到网卡,可以降低CPU的负担,但是特别注意,如果网卡的处理能力不足,反而因为承当不了大量工作导致网络的性能降低,所以一定要结合实际情况
echo 02 > /proc/irq/16/smp_affinity
//查看网卡中断号
[root@localhost pci:tg3]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:1A:64:BF:23:C5
inet addr:172.19.3.63 Bcast:172.19.255.255 Mask:255.255.0.0
inet6 addr: fe80::21a:64ff:febf:23c5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:93843329 errors:0 dropped:248 overruns:0 frame:0
TX packets:5973026 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:14578221737 (13.5 GiB) TX bytes:5939141324 (5.5 GiB)
Interrupt:16 //中断号16
通过调整系统默认的参数提升性能.
/proc/sys/net/ipv4/tcp_mem
/proc/sys/net/ipv4/rmem_default
/proc/sys/net/ipv4/rmem_max
/proc/sys/net/ipv4/wmem_default
/proc/sys/net/ipv4/wmem_max
...
更多的参数以及它们的含义、默认值,查看linux手册:
man 7 socket
man 7 udp
man 7 tcp
TCP的接收窗口如果过小, 会增加发送端的等待时间, 链路也处于不饱和的状态
注意这些参数的调整主要影响到大包的传输效果, 对小包的影响较小,详情可查阅<文献1>的4.7节中的图表。文献1>
net.core.netdev_max_backlog
根据实际的场景,调节协议的行为,可以带来性能的改善,同时需要衡量系统的安全性。
/proc/sys目录中的配置项的含义可以在解压后的内核源码中找到: linux-3.2.12\Documentation\sysctl* 网络相关的文档位于: linux-3.2.12\Documentation\networking
例如ip-sysctl.txt
net.ipv4.conf.all.accept_source_route = 0 //don't accept packets with SRR option
net.ipv4.conf.eth0.accept_source_route = 0 //don't accept packets with SRR option
//[L|S]SRR选项见RFC791
net.ipv4.conf.all.secure_redirects = 1 //Accept ICMP redirect messages only for gateways, listed in default gateway list
net.ipv4.conf.eth0.secure_redirects = 1
net.ipv4.conf.all.accept_redirects = 0 //don't accept redirects
net.ipv4.conf.eth0.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0 //don't send redirects
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.ipfrag_low_thresh = XX //ip分片重组占用的内存的大小
net.ipv4.ipfrag_high_thresh = XX
net.ipv4.icmp_echo_ignore_broadcasts = 1 //ignore all ICMP ECHO and TIMESTAMP requests via broadcast/multicast
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_tw_recycle = 1 //Enable fast recycling TIME-WAIT sockets
net.ipv4.tcp_tw_reuse = 1 //Allow to reuse TIME-WAIT sockets for new connections
net.ipv4.tcp_fin_timeout = XX //time to hold socket int state FIN-WAIT-2, default is 60s
net.ipv4.tcp_keepalive_time = XX //How often TCP sends out keepalive messages, default is 2h
net.ipv4.tcp_max_syn_backlog = XX //Maximal number of remenbered connection requests,
//Which have not received an acknowledgment from connecting client
net.ipv4.tcp_syncookies = 1 //Sent out syncookies when the syn backlog queue of a socket overflow
net.ipv4.tcp_sack = 0 //don't enable select acknowledgment
net.ipv4.tcp_dsack = 0 //don't allow tcp to send "duplicate" sacks
net.ipv4.tcp_timestamps = 0 //don't enable timestamps
net.ipv4.tcp_window_scaling = 0 //don't enable window scaling
系统的一些功能可能会严重影响到性能, 例如netfilter。根据实际情况调整策略。
wait..
wait..
wait..
1.1.4 Process priority and nice level
进程有两个优先级:
dynamic priority: 由内核动态调整, 数值越高优先级越高. 对应缩写为PR.
nice level: nice level影响静态优先级, 范围19~-20, 数值越小,优先级越高. 对应缩写是NI.
2.3.1 top
top字段含义: 参见 man top. 书中对字段的介绍过时.
top交互式命令可以在top启动后, 键入h或?查看。
2.3.2 vmstat
查看procs、memory、swap、io、system、cpu的状态
2.3.3 uptime
uptime给出1min、5min、15min内的系统平均负载, 没有因为等待而浪费CPU
在top显示的第一行也有load average值.
2.3.4 ps and pstree
ps axjf //显示父子进程间的关系
pstree //显示进程树
2.3.6 iostat
iostat -x
2.3.7 sar
2.3.8 mpstat
统计cpu数据
mpstat -P ALL
2.3.9 numastat
2.3.10 pmap
打印进程内存布局
2.3.11 netstat
netstat -natuw
2.3.15 strace
跟踪系统调用和信号
strace -p <pid>
信息收集:
dmesg //收集到硬件、驱动信息
ulimit -a //查看shell和该shell启动的程序使用的限制值
//文件/etc/security/limits.conf中可以配置设置值
chkconfig --list //查看开启的服务, 使用chkconfig修改后不是立即生效.
SELinux是否开启 //SELinux保证一个进程只会影响自己分配到资源, 增加了安全性,同时也降低了性能
//内核启动参数 selinux=0 关闭
// /etc/selinux/config配置文件
// 如果开启, /selinux/avc/hash_stats
调整内核参数:
1 直接修改/proc/sys下内容
2 使用sysctl 和/etc/sysctl.conf
https://www.freedesktop.org/wiki/Software/polkit/
polkit is an application-level toolkit for defining and handling the policy that allows unprivileged processes to speak to privileged processes: It is a framework for centralizing the decision making process with respect to granting access to privileged operations for unprivileged applications
systemd入门教程:
http://www.ruanyifeng.com/blog/2016/03/systemd-tutorial-commands.html
systemd unit template:
https://fedoramagazine.org/systemd-template-unit-files/
注意对不同的CentOS版本操作不同。
CentOS6:
//安装rpm工具
[root@host]# yum install rpm-build redhat-rpm-config asciidoc hmaccalc perl-ExtUtils-Embed xmlto
[root@host]# yum install audit-libs-devel binutils-devel elfutils-devel elfutils-libelf-devel
[root@host]# yum install newt-devel python-devel zlib-devel
[user@host]$ rpm -i http://vault.centos.org/6.6/updates/Source/SPackages/kernel-2.6.32-504.8.1.el6.src.rpm //选择内核版本
//在~/目录下,会出现一个rpmbuild目录
[root@localhost rpmbuild]# ls
SOURCES SPECS
//执行下面操作后在~/rpmbuild/BUILD/看到源代码
[user@host]$ cd ~/rpmbuild/SPECS
[user@host SPECS]$ rpmbuild -bp --target=$(uname -m) kernel.spec