Message-ID: <4A450955.1010806@itcare.pl>
Date: Fri, 26 Jun 2009 19:45:57 +0200
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Eric Dumazet <dada1@...mosbay.com>
CC: Jarek Poplawski <jarkao2@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Linux Network Development list <netdev@...r.kernel.org>
Subject: Re: weird problem
Eric Dumazet wrote:
> Jarek Poplawski wrote:
>
>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
>>
>>> On 25-06-2009 22:18, Eric Dumazet wrote:
>>>
>>>> Paweł Staszewski wrote:
>>>>
>>>>> Ok
>>>>>
>>>>> After a day of observation I am nearly 100% sure that this CPU load is
>>>>> caused by route cache flushes.
>>>>> When the route cache grows to its "net.ipv4.route.gc_thresh" size, or
>>>>> close to it, the system starts dropping routes from the cache and CPU
>>>>> load rises from 2% to nearly 80%.
>>>>> After the cache has been cleaned / flushed and is filling again, CPU
>>>>> load returns to the normal 2%.
>>>>>
>>>>> Does anyone know how to resolve this?
>>>>> On kernels < 2.6.29 I don't see this. It all started after upgrading from
>>>>> 2.6.28 to 2.6.29; I then tried 2.6.29.1, 2.6.29.3 and 2.6.30, and on all
>>>>> of these kernels >= 2.6.29 the CPU load problem is the same.
>>>>>
>>>>> I can reduce these CPU fluctuations by changing the route cache /proc
>>>>> parameters, but the best result for my router was
>>>>>
>>>>> 15 sec of 2% cpu
>>>>> and after
>>>>> 15sec of 80% cpu
>>>>>
>>>>>
>>>>> Regards
>>>>> Pawel Staszewski
>>>>>
>>>> I believe this is a known 2.6.29 regression.
>>>>
>>>> The following two commits should correct the problem you have
>>>>
>>>> Your best bet would be to try 2.6.31-rc1 and tell us whether this recent
>>>> kernel is OK on your machine.
>>>>
>>> Btw., the first of these commits is in 2.6.30, which according to
>>>
>> And the second as well.
>>
>>
>
> Thanks Jarek.
>
> Pawel made some reporting errors in the fib thread, so I am not sure he really
> tried 2.6.30 and got the same oprofile results.
>
> rt_worker_func() taking 13% of cpu0 is an alarm for me :)
> And 21% of cpu0 and 34% of cpu6 taken by oprofiled seems odd too...
>
> Pawel, could you give us :
>
> grep . /proc/sys/net/ipv4/route/*
> cat /proc/interrupts
>
> on your various kernels (previous to 2.6.29, 2.6.29, 2.6.30, ...)
>
> I suspect a change in hash table size, and/or change in interrupt affinities...
>
>
>
first machine:
Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64
Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600
dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:         43          0          0          1          1          2          0          0   IO-APIC-edge    timer
   1:          0          0          0          1          0          0          0          1   IO-APIC-edge    i8042
   9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi acpi
  14:          0          0          0          0          0          0          0          0   IO-APIC-edge    ide0
  15:          0          0          0          0          0          0          0          0   IO-APIC-edge    ide1
  29:    1139988   18351004      89662          3          0          1          0          3   PCI-MSI-edge    eth0
  30:          0          2   20221692          1          0          3          0          0   PCI-MSI-edge    eth1
  31:          0          1          1          0          0          0          0          0   PCI-MSI-edge
  32:          0          0          0          0          0          0          2          0   PCI-MSI-edge
  33:          1          1          0          0          0          0          0          0   PCI-MSI-edge
  34:          0          0          0          1          0          1          0          0   PCI-MSI-edge
  35:          0          0          0          1          0          0          0          1   PCI-MSI-edge
  36:          0          0          0          0          1          0          0          1   PCI-MSI-edge
  37:          1          0          0          0          0          1          0          0   PCI-MSI-edge
  38:          0          0          1          0          1          0          0          0   PCI-MSI-edge
  39:          0          0          2          0          0          0          0          0   PCI-MSI-edge
  40:          0          0          0          0          0          0          2          0   PCI-MSI-edge
  41:          0          2          0          0          0          0          0          0   PCI-MSI-edge
  42:          0          0          0          0          0          2          0          0   PCI-MSI-edge
  43:          0          0          0          2          0          0          0          0   PCI-MSI-edge
  44:          0          0          0          0          0          0          0          2   PCI-MSI-edge
  45:          2          0          0          0          0          0          0          0   PCI-MSI-edge
  46:          0          0          0          0          2          0          0          0   PCI-MSI-edge
  48:        233        200        185        257        256        260        269        257   PCI-MSI-edge    ahci
  49:          0          1          1          0          0          2          1          0   PCI-MSI-edge    ioat-msi
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC:    1191321   26059516   25803111      64841      32718      26651      54058      24166   Local timer interrupts
 RES:        921         59         58         20         14          8         10         13   Rescheduling interrupts
 CAL:         20         85         88         87         90         90         91         86   Function call interrupts
 TLB:        103        116        937        954         95        115       1006       1020   TLB shootdowns
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 ERR:          0
 MIS:          0
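Since the question is about interrupt affinities, here is a rough sketch of how the per-CPU split of an IRQ could be summarized from /proc/interrupts output. The function names parse_interrupts and busiest_cpu are made up for illustration, the parser assumes the usual unwrapped one-row-per-IRQ layout, and SAMPLE only holds the eth0 and eth1 rows of the first machine.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # row has fewer counters than CPUs
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def busiest_cpu(counts):
    """Return (cpu_index, share_of_total) for the CPU that took most interrupts."""
    total = sum(counts)
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return cpu, (counts[cpu] / total if total else 0.0)


# Excerpt of the first machine's output (eth0 and eth1 rows only).
SAMPLE = """\
 CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
 29: 1139988 18351004 89662 3 0 1 0 3 PCI-MSI-edge eth0
 30: 0 2 20221692 1 0 3 0 0 PCI-MSI-edge eth1
"""

if __name__ == "__main__":
    for irq, (counts, desc) in parse_interrupts(SAMPLE).items():
        cpu, share = busiest_cpu(counts)
        print(f"IRQ {irq} ({desc}): CPU{cpu} handled {share:.1%}")
```

For these two rows the sketch reports CPU1 as the main handler for eth0 and CPU2 for eth1, which matches what the counters show by eye.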
second machine:
Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686 Intel(R)
Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux
cat /proc/interrupts
CPU0 CPU1
0: 182 129 IO-APIC-edge timer
1: 1886 1672 IO-APIC-edge i8042
6: 1 1 IO-APIC-edge floppy
9: 0 0 IO-APIC-fasteoi acpi
12: 2 2 IO-APIC-edge i8042
14: 0 0 IO-APIC-edge ide0
15: 0 0 IO-APIC-edge ide1
27: 41793 26401 PCI-MSI-edge ahci
28: 13482 11260 PCI-MSI-edge eth2
29: 3 1326457765 PCI-MSI-edge eth1
30: 1240943198 137973134 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 1607938599 1514565603 Local timer interrupts
SPU: 0 0 Spurious interrupts
RES: 1098 1190 Rescheduling interrupts
CAL: 28 105 Function call interrupts
TLB: 2886 3055 TLB shootdowns
ERR: 0
MIS: 0
grep . /proc/sys/net/ipv4/route/*
/proc/sys/net/ipv4/route/error_burst:1250
/proc/sys/net/ipv4/route/error_cost:250
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_interval:1
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_min_interval_ms:0
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:1524288
/proc/sys/net/ipv4/route/min_adv_mss:256
/proc/sys/net/ipv4/route/min_pmtu:552
/proc/sys/net/ipv4/route/mtu_expires:600
/proc/sys/net/ipv4/route/redirect_load:5
/proc/sys/net/ipv4/route/redirect_number:9
/proc/sys/net/ipv4/route/redirect_silence:5120
/proc/sys/net/ipv4/route/secret_interval:3600
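To compare the route sysctls of the two machines without reading both dumps line by line, something like the following sketch could be used. The helper names parse_sysctl_dump and changed_keys are hypothetical, and FIRST and SECOND are only four-line excerpts of the two dumps in this message, not the full output.

```python
def parse_sysctl_dump(text):
    """Turn 'path:value' lines (as printed by `grep . dir/*`) into {name: value}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, value = line.rpartition(":")
        values[path.rsplit("/", 1)[-1]] = value  # keep only the file name
    return values


def changed_keys(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Excerpts of the two dumps in this message (four of the sixteen settings).
FIRST = """\
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:524288
"""
SECOND = """\
/proc/sys/net/ipv4/route/gc_elasticity:4
/proc/sys/net/ipv4/route/gc_thresh:190536
/proc/sys/net/ipv4/route/gc_timeout:15
/proc/sys/net/ipv4/route/max_size:1524288
"""

if __name__ == "__main__":
    print(changed_keys(parse_sysctl_dump(FIRST), parse_sysctl_dump(SECOND)))
```

On these excerpts the only reported difference is max_size (524288 on the first machine, 1524288 on the second). The same holds for the full dumps: every other value is identical.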
dmesg | grep route
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
rtstat -k entries -i 1 -c 10
rt_cache|
entries|
112754|
112446|
112277|
111451|
111042|
110314|
109153|
108370|
107730|
107478|
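To put the rtstat numbers next to gc_thresh and the hash table size from dmesg, here is a small arithmetic sketch. The function name summarize and the dictionary keys are made up for illustration; the constants are copied from the output above, and nothing here models how the kernel's garbage collector actually decides what to evict.

```python
GC_THRESH = 190536      # net.ipv4.route.gc_thresh
HASH_BUCKETS = 262144   # "IP route cache hash table entries" from dmesg
HASH_BYTES = 2097152    # table size reported on the same dmesg line

# rt_cache entries, one sample per second (rtstat -k entries -i 1 -c 10)
ENTRIES = [112754, 112446, 112277, 111451, 111042,
           110314, 109153, 108370, 107730, 107478]


def summarize(entries, gc_thresh, buckets, table_bytes):
    """Relate observed cache sizes to gc_thresh and the hash table geometry."""
    deltas = [later - earlier for earlier, later in zip(entries, entries[1:])]
    return {
        "peak_fill_of_gc_thresh": max(entries) / gc_thresh,
        "peak_entries_per_bucket": max(entries) / buckets,
        "bytes_per_bucket": table_bytes // buckets,
        "net_change": entries[-1] - entries[0],
        "always_shrinking": all(d < 0 for d in deltas),
    }


if __name__ == "__main__":
    for key, value in summarize(ENTRIES, GC_THRESH, HASH_BUCKETS, HASH_BYTES).items():
        print(f"{key}: {value}")
```

In this ten-second sample the cache stays at roughly 59% of gc_thresh at most and shrinks every second, by 5276 entries in total, so it does not capture a moment when the cache is close to the threshold.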
> Change in hash table size comes from commit c9503e0fe052020e0294cd07d0ecd982eb7c9177
>
> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe
> his hash table is smaller than 512k entries!
>
> Author: Anton Blanchard <anton@...ba.org>
> Date: Mon Apr 27 05:42:24 2009 -0700
>
> ipv4: Limit size of route cache hash table
>
> Right now we have no upper limit on the size of the route cache hash table.
> On a 128GB POWER6 box it ends up as 32MB:
>
> IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
>
> It would be nice to cap this for memory consumption reasons, but a massive
> hashtable also causes a significant spike when measuring OS jitter.
>
> With a 32MB hashtable and 4 million entries, rt_worker_func is taking
> 5 ms to complete. On another system with more memory it's taking 14 ms.
> Even though rt_worker_func does call cond_resched() to limit its impact,
> in an HPC environment we want to keep all sources of OS jitter to a minimum.
>
> With the patch applied we limit the number of entries to 512k which
> can still be overridden by using the rhash_entries boot option:
>
> IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
>
> With this patch rt_worker_func now takes 0.460 ms on the same system.
>
> Signed-off-by: Anton Blanchard <anton@...ba.org>
> Acked-by: Eric Dumazet <dada1@...mosbay.com>
> Signed-off-by: David S. Miller <davem@...emloft.net>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>