Date: Tue, 5 May 2009 20:41:35 +0300
From: Vladimir Ivashchenko <hazard@...ncoudi.com>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: netdev@...r.kernel.org
Subject: Re: bond + tc regression ?
> > On both kernels, the system is running with at least 70% idle CPU.
> > The network interrupts are distributed accross the cores.
>
> You should not distribute interrupts, but bound a NIC to one CPU
Kernels 2.6.28 and 2.6.29 do this by default, so I thought it was correct.
Are the defaults wrong?
I have tried with IRQs bound to one CPU per NIC. Same result.
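
For reference, binding an IRQ to a single CPU is normally done by writing a hex CPU bitmask to /proc/irq/<n>/smp_affinity. The following is only a minimal sketch that prints the commands rather than running them. The IRQ numbers are the ones shown for eth0..eth3 in the /proc/interrupts output in this message; the IRQ-to-CPU assignment is a hypothetical example, and the mask format assumes fewer than 32 CPUs.

```python
def cpu_mask(cpu: int) -> str:
    """Return the hex bitmask that selects exactly one CPU.

    Assumes fewer than 32 CPUs, so no comma-separated mask groups are needed.
    """
    if cpu < 0:
        raise ValueError("CPU index must be non-negative")
    return format(1 << cpu, "x")


def affinity_commands(irq_to_cpu):
    """Build shell commands that pin each IRQ to its assigned CPU."""
    return [
        f"echo {cpu_mask(cpu)} > /proc/irq/{irq}/smp_affinity"
        for irq, cpu in sorted(irq_to_cpu.items())
    ]


# IRQs 57..60 belong to eth0..eth3 on this box; the CPU choice is hypothetical.
PINNING = {57: 0, 58: 1, 59: 2, 60: 3}

for command in affinity_commands(PINNING):
    print(command)
```

The printed commands have to be run as root, and a running irqbalance daemon (if any) may rewrite the masks afterwards.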
> > I thought it was a e1000e driver issue, but tweaking e1000e ring buffers
> > didn't help. I tried using e1000 on 2.6.28 by adding necessary PCI IDs,
> > I tried running on a different server with bnx cards, I tried disabling
> > NO_HZ and HRTICK, but still I have the same problem.
> >
> > However, if I don't utilize bond, but just apply rules on normal ethX
> > interfaces, there is no packet loss with 2.6.28/29.
> >
> > So, the problem appears only when I use 2.6.28/29 + bond + classful tc
> > combination.
> >
> > Any ideas ?
> >
>
> Yes, we need much more information :)
> Is it a forwarding setup only ?
Yes, the server is doing nothing else but forwarding, no iptables.
> cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:        130          0          0          0          0          0          0          0  IO-APIC-edge     timer
  1:          2          0          0          0          0          0          0          0  IO-APIC-edge     i8042
  3:          0          0          0          1          0          1          0          0  IO-APIC-edge
  4:          0          0          1          0          0          0          1          0  IO-APIC-edge
  9:          0          0          0          0          0          0          0          0  IO-APIC-fasteoi  acpi
 12:          4          0          0          0          0          0          0          0  IO-APIC-edge     i8042
 14:          0          0          0          0          0          0          0          0  IO-APIC-edge     ata_piix
 15:          0          0          0          0          0          0          0          0  IO-APIC-edge     ata_piix
 17:      30901      31910      31446      30655      31618      30550      31543      30958  IO-APIC-fasteoi  aacraid
 20:          0          0          0          0          0          0          0          0  IO-APIC-fasteoi  uhci_hcd:usb4
 21:          0          0          0          0          0          0          0          0  IO-APIC-fasteoi  uhci_hcd:usb5, ahci
 22:     298387     297642     295508     294368     295533     295430     295275     296036  IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2
 23:      10868      10926      10980      10738      10939      10615      10761      10909  IO-APIC-fasteoi  uhci_hcd:usb3
 57: 1486251823 1486835830 1486677250 1487105983 1488000303 1485941815 1487728317 1486624997  PCI-MSI-edge     eth0
 58: 1510676329 1509708161 1510347202 1509969755 1508599471 1511220118 1509094578 1509727616  PCI-MSI-edge     eth1
 59: 1482578890 1483618556 1482963700 1483164528 1484561615 1482130645 1484116749 1483557717  PCI-MSI-edge     eth2
 60: 1507341647 1506685822 1506862759 1506612818 1505689367 1507559672 1505911622 1506940613  PCI-MSI-edge     eth3
NMI:          0          0          0          0          0          0          0          0  Non-maskable interrupts
LOC: 1020533656 1020535165 1020533613 1020534967 1020535173 1020534409 1020534985 1020534220  Local timer interrupts
RES:      18605      21215      15957      18637      22429      19493      16649      15589  Rescheduling interrupts
CAL:        160        214        186        185        199        205        190        180  Function call interrupts
TLB:     259515     264126     309016     312222     263163     265601     306189     305430  TLB shootdowns
TRM:          0          0          0          0          0          0          0          0  Thermal event interrupts
SPU:          0          0          0          0          0          0          0          0  Spurious interrupts
ERR:          0
MIS:          0
> tc -s -d qdisc
For the sake of testing, I just used "tc qdisc add dev $IFACE root handle 1: prio" with no filters at all.
I get the same result with HTB: "tc qdisc add dev $IFACE root handle 1: htb default 99" and no subclasses.
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 13287736273644 bytes 1263672018 pkt (dropped 0, overlimits 0 requeues 2928480094)
rate 0bit 0pps backlog 0b 0p requeues 2928480094
qdisc pfifo_fast 0: dev eth1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 40064376195000 bytes 1747026586 pkt (dropped 0, overlimits 0 requeues 463621814)
rate 0bit 0pps backlog 0b 0p requeues 463621814
qdisc pfifo_fast 0: dev eth2 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 13350145517965 bytes 1350897201 pkt (dropped 0, overlimits 0 requeues 2930879507)
rate 0bit 0pps backlog 0b 0p requeues 2930879507
qdisc pfifo_fast 0: dev eth3 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 40193456126884 bytes 1950653764 pkt (dropped 0, overlimits 0 requeues 465511120)
rate 0bit 0pps backlog 0b 0p requeues 465511120
qdisc prio 1: dev bond0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 985164834 bytes 2720991 pkt (dropped 241834, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 2347118738 bytes 3089171 pkt (dropped 304601, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
** Drops on bond0/bond1 are increasing by approximately 5000 per second:
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 13287874353796 bytes 1264050808 pkt (dropped 0, overlimits 0 requeues 2928520779)
rate 0bit 0pps backlog 0b 0p requeues 2928520779
qdisc pfifo_fast 0: dev eth1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 40064706826018 bytes 1747459793 pkt (dropped 0, overlimits 0 requeues 463669610)
rate 0bit 0pps backlog 0b 0p requeues 463669610
qdisc pfifo_fast 0: dev eth2 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 13350283202695 bytes 1351277761 pkt (dropped 0, overlimits 0 requeues 2930918488)
rate 0bit 0pps backlog 0b 0p requeues 2930918488
qdisc pfifo_fast 0: dev eth3 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 40193784868074 bytes 1951084029 pkt (dropped 0, overlimits 0 requeues 465558015)
rate 0bit 0pps backlog 0b 0p requeues 465558015
qdisc prio 1: dev bond0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1260929539 bytes 3480340 pkt (dropped 311145, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 3006490946 bytes 3952643 pkt (dropped 396850, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
With the same setup on 2.6.23, drops increase by only about 50/sec.
As soon as I do "tc qdisc del dev $IFACE root", packet loss stops.
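
The difference between the two "tc -s -d qdisc" snapshots above can be worked out mechanically. The sketch below parses the bond0/bond1 lines copied from those snapshots and prints how many packets were sent and dropped in between. The function names are made up for illustration, and the loss fraction assumes that the "Sent ... pkt" counter does not include dropped packets. The time between the snapshots was not recorded, so no per-second rate is derived.

```python
import re

QDISC_RE = re.compile(r"qdisc \S+ \S+ dev (\S+)")
STATS_RE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt \(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)


def parse_tc_stats(text):
    """Map device name -> packet and drop counters from 'tc -s qdisc' output."""
    stats, dev = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = QDISC_RE.match(line)
        if header:
            dev = header.group(1)
            continue
        counters = STATS_RE.search(line)
        if counters and dev is not None:
            _bytes, pkts, dropped, _over, _requeues = map(int, counters.groups())
            stats[dev] = {"pkts": pkts, "dropped": dropped}
            dev = None
    return stats


def drop_deltas(before, after):
    """Per-device counter increase between two snapshots, plus loss fraction."""
    result = {}
    for dev in before.keys() & after.keys():
        sent = after[dev]["pkts"] - before[dev]["pkts"]
        dropped = after[dev]["dropped"] - before[dev]["dropped"]
        total = sent + dropped  # assumption: Sent excludes dropped packets
        result[dev] = {
            "sent": sent,
            "dropped": dropped,
            "loss": dropped / total if total else 0.0,
        }
    return result


BEFORE = """\
qdisc prio 1: dev bond0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 985164834 bytes 2720991 pkt (dropped 241834, overlimits 0 requeues 0)
qdisc prio 1: dev bond1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 2347118738 bytes 3089171 pkt (dropped 304601, overlimits 0 requeues 0)
"""

AFTER = """\
qdisc prio 1: dev bond0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1260929539 bytes 3480340 pkt (dropped 311145, overlimits 0 requeues 0)
qdisc prio 1: dev bond1 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 3006490946 bytes 3952643 pkt (dropped 396850, overlimits 0 requeues 0)
"""

deltas = drop_deltas(parse_tc_stats(BEFORE), parse_tc_stats(AFTER))
for dev in sorted(deltas):
    d = deltas[dev]
    print(f"{dev}: sent +{d['sent']} pkt, dropped +{d['dropped']}, loss {d['loss']:.1%}")
```

Feeding it the full output of both runs works the same way; the eth0..eth3 entries simply show a drop delta of zero.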
> cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 17
Partner Key: 4
Partner Mac Address: 00:19:e7:b2:07:80
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:cc
Aggregator ID: 1
Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:ce
Aggregator ID: 1
> cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 2
Actor Key: 17
Partner Key: 5
Partner Mac Address: 00:19:e7:b2:07:80
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:cd
Aggregator ID: 2
Slave Interface: eth3
MII Status: up
Link Failure Count: 2
Permanent HW addr: 00:1b:24:bd:e9:cf
Aggregator ID: 2
> mpstat -P ALL 10
08:04:36 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:04:46 PM  all    0.00    0.00    0.01    0.00    0.00    1.05    0.00   98.94  70525.73
08:04:46 PM    0    0.00    0.00    0.00    0.00    0.00    0.70    0.00   99.30   7814.41
08:04:46 PM    1    0.00    0.00    0.00    0.00    0.00    2.10    0.00   97.90   7814.41
08:04:46 PM    2    0.00    0.00    0.00    0.00    0.00    0.20    0.00   99.80   7814.41
08:04:46 PM    3    0.00    0.00    0.10    0.00    0.00    1.30    0.00   98.60   7814.51
08:04:46 PM    4    0.00    0.00    0.00    0.00    0.00    0.50    0.00   99.50   7814.41
08:04:46 PM    5    0.00    0.00    0.00    0.00    0.00    1.90    0.00   98.10   7814.41
08:04:46 PM    6    0.00    0.00    0.00    0.00    0.00    0.60    0.00   99.40   7814.41
08:04:46 PM    7    0.00    0.00    0.10    0.00    0.00    0.90    0.00   99.00   7814.51
08:04:46 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

08:04:46 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:04:56 PM  all    0.00    0.00    0.01    0.00    0.00    1.49    0.00   98.50  66429.30
08:04:56 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   7303.50
08:04:56 PM    1    0.00    0.00    0.00    0.00    0.00    1.60    0.00   98.40   7303.50
08:04:56 PM    2    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    3    0.00    0.00    0.00    0.00    0.00    3.20    0.00   96.80   7303.40
08:04:56 PM    4    0.00    0.00    0.00    0.00    0.00    1.90    0.00   98.10   7303.60
08:04:56 PM    5    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    6    0.00    0.00    0.10    0.00    0.00    1.80    0.00   98.10   7303.50
08:04:56 PM    7    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
> ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CC
inet addr:xxx.xxx.135.44 Bcast:xxx.xxx.135.47 Mask:255.255.255.248
inet6 addr: fe80::21b:24ff:febd:e9cc/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:436076190 errors:0 dropped:391250 overruns:0 frame:0
TX packets:2620156321 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4210046233 (3.9 GiB) TX bytes:2520272242 (2.3 GiB)
bond1 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CD
inet addr:xxx.xxx.70.156 Bcast:xxx.xxx.70.159 Mask:255.255.255.248
inet6 addr: fe80::21b:24ff:febd:e9cd/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:239471641 errors:0 dropped:344 overruns:0 frame:0
TX packets:3704083902 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2488754745 (2.3 GiB) TX bytes:2685275089 (2.5 GiB)
eth0 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CC
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2235085582 errors:0 dropped:353786 overruns:0 frame:0
TX packets:1266449269 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3768096439 (3.5 GiB) TX bytes:113363829 (108.1 MiB)
Memory:fc6e0000-fc700000
eth1 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CD
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:4228974804 errors:0 dropped:344 overruns:0 frame:0
TX packets:1750216649 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3350270261 (3.1 GiB) TX bytes:3358220645 (3.1 GiB)
Memory:fc6c0000-fc6e0000
eth2 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CC
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2495958020 errors:0 dropped:37464 overruns:0 frame:0
TX packets:1353707165 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:442055526 (421.5 MiB) TX bytes:2406943933 (2.2 GiB)
Memory:fcde0000-fce00000
eth3 Link encap:Ethernet HWaddr 00:1B:24:BD:E9:CD
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:305464222 errors:0 dropped:0 overruns:0 frame:0
TX packets:1953867360 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3433479245 (3.1 GiB) TX bytes:3622113909 (3.3 GiB)
Memory:fcd80000-fcda0000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:53537 errors:0 dropped:0 overruns:0 frame:0
TX packets:53537 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:431006433 (411.0 MiB) TX bytes:431006433 (411.0 MiB)
NOTE: the ifconfig drop counters on bond0/bond1 are *NOT* increasing. These drops were already there before the test.
--
Best Regards
Vladimir Ivashchenko
Chief Technology Officer
PrimeTel, Cyprus - www.prime-tel.com