lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 26 Jun 2009 19:57:39 +0200 From: Paweł Staszewski <pstaszewski@...are.pl> To: Eric Dumazet <dada1@...mosbay.com> CC: Jarek Poplawski <jarkao2@...il.com>, Eric Dumazet <eric.dumazet@...il.com>, Linux Network Development list <netdev@...r.kernel.org> Subject: Re: weird problem Paweł Staszewski pisze: > Eric Dumazet pisze: >> Jarek Poplawski a écrit : >> >>> On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote: >>> >>>> On 25-06-2009 22:18, Eric Dumazet wrote: >>>> >>>>> Pawe? Staszewski a ?crit : >>>>> >>>>>> Ok >>>>>> >>>>>> After this day of observation im near 100% sure that this cpu >>>>>> load is >>>>>> made by route cahce flushes >>>>>> When route cache increase to its "net.ipv4.route.gc_thresh" size >>>>>> or is >>>>>> near that size >>>>>> system is starting to drop some routes from cache then cpu load is >>>>>> increase from 2% to near 80% >>>>>> after cleaning / flush cache when cache is filling cpu load is again >>>>>> normal 2% >>>>>> >>>>>> Someone know how to resolve this ? >>>>>> on kernels < 2.6.29 i don't see this, all start after upgrade from >>>>>> 2.6.28 to 2.6.29 - then i try 2.6.29.1 , 2.6.29.3 and 2.6.30 and >>>>>> on all >>>>>> this kernels >= 2.6.29 problem with cpu load is the same. >>>>>> >>>>>> I can minimize this cpu fluctuations by changing of route cache >>>>>> /proc >>>>>> parameters but the best result for my router was >>>>>> >>>>>> 15 sec of 2% cpu >>>>>> and after >>>>>> 15sec of 80% cpu >>>>>> >>>>>> >>>>>> Regards >>>>>> Pawel Staszewski >>>>>> >>>>> I believe this is known 2.6.29 regressions >>>>> >>>>> Following two commits should correct the problem you have >>>>> >>>>> Your best bet would be to try 2.6.31-rc1, and tell us if this >>>>> recent kernel >>>>> is ok on your machine ? >>>>> >>>> Btw., the first of these commits is in 2.6.30, which according to >>>> >>> And the second as well. >>> >>> >> >> Thanks Jarek. >> >> Pawel made some reports errors in fib thread, so I am not sure he really >> tried 2.6.30 and had same oprofile results. >> >> rt_worker_func() taking 13% of cpu0 is an alarm for me :) >> And 21% of cpu0 and 34% of cpu6 taken by oprofiled seems odd too... >> oprofile from: Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux CPU: Core 2, speed 3000.21 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 Samples on CPU 0 Samples on CPU 1 Samples on CPU 2 Samples on CPU 3 Samples on CPU 4 Samples on CPU 5 Samples on CPU 6 Samples on CPU 7 samples % samples % samples % samples % samples % samples % samples % samples % image name app name symbol name 9999 19.0926 0 0 0 0 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux rt_worker_func 9017 17.2175 219 0.0277 13 0.0017 29 0.0814 26 0.3410 12 0.1420 13 0.1718 115 1.7649 vmlinux vmlinux free_block 8867 16.9311 13588 1.7177 11888 1.5423 9000 25.2731 801 10.5063 461 5.4550 788 10.4136 476 7.3051 vmlinux vmlinux mwait_idle 5714 10.9106 197 0.0249 143 0.0186 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux dst_destroy 2917 5.5699 95 0.0120 76 0.0099 33 0.0927 3 0.0393 0 0 2 0.0264 0 0 vmlinux vmlinux __call_rcu 2293 4.3784 74 0.0094 49 0.0064 48 0.1348 24 0.3148 12 0.1420 22 0.2907 12 0.1842 vmlinux vmlinux __rcu_process_callbacks 1816 3.4676 24921 3.1504 22351 2.8997 1090 3.0609 443 5.8106 297 3.5144 414 5.4711 382 5.8625 vmlinux vmlinux _raw_spin_lock 1055 2.0145 24 0.0030 21 0.0027 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux dst_rcu_free 669 1.2774 752 0.0951 717 0.0930 3223 9.0506 679 8.9061 1527 18.0689 232 3.0659 517 7.9343 libc-2.8.so libc-2.8.so (no symbols) 590 1.1266 1745 0.2206 1208 0.1567 83 0.2331 4 0.0525 5 0.0592 6 0.0793 4 0.0614 vmlinux vmlinux kmem_cache_free 568 1.0846 36 0.0046 29 0.0038 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux ipv4_dst_destroy 534 1.0196 583 0.0737 641 0.0832 691 1.9404 1402 18.3893 542 6.4134 934 12.3431 736 11.2953 vmlinux vmlinux tg_shares_up 457 0.8726 20 0.0025 19 0.0025 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux call_rcu_bh 422 0.8058 0 0 429 0.0557 0 0 0 0 845 9.9988 0 0 0 0 bgpd bgpd bgp_best_selection 397 0.7581 0 0 585 0.0759 0 0 153 2.0068 917 10.8508 0 0 0 0 bgpd bgpd bgp_route_next 339 0.6473 3085 0.3900 3301 0.4283 169 0.4746 20 0.2623 19 0.2248 36 0.4757 31 0.4758 vmlinux vmlinux _raw_spin_unlock 319 0.6091 3645 0.4608 3122 0.4050 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux local_bh_enable_ip 290 0.5537 0 0 40 0.0052 0 0 0 0 0 0 0 0 0 0 vmlinux vmlinux e1000e_update_stats 271 0.5175 152 0.0192 146 0.0189 371 1.0418 626 8.2109 192 2.2719 334 4.4139 165 2.5322 vmlinux vmlinux find_next_bit 228 0.4354 19 0.0024 7 9.1e-04 5 0.0140 15 0.1967 12 0.1420 10 0.1322 20 0.3069 vmlinux vmlinux rcu_process_callbacks 217 0.4144 0 0 352 0.0457 0 0 84 1.1018 534 6.3188 0 0 0 0 bgpd bgpd bgp_scan_timer 203 0.3876 694 0.0877 521 0.0676 6 0.0168 2 0.0262 0 0 2 0.0264 0 0 vmlinux vmlinux __phys_addr 191 0.3647 182 0.0230 66 0.0086 208 0.5841 608 7.9748 259 3.0647 268 3.5417 134 2.0565 vmlinux vmlinux find_busiest_group 186 0.3552 5221 0.6600 4432 0.5750 116 0.3257 131 1.7183 90 1.0650 122 1.6123 106 1.6268 vmlinux >> Pawel, could you give us : >> >> grep . /proc/sys/net/ipv4/route/* >> cat /proc/interrupts >> >> on your various kernels (previous to 2.6.29, 2.6.29, 2.6.30, ...) >> >> I suspect a change in hash table size, and/or change in interrupt >> affinities... >> >> >> > first machine: > Linux TM_01_C1 2.6.29.5 #1 SMP Fri Jun 26 19:11:30 UTC 2009 x86_64 > Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux > > grep . /proc/sys/net/ipv4/route/* > /proc/sys/net/ipv4/route/error_burst:1250 > /proc/sys/net/ipv4/route/error_cost:250 > /proc/sys/net/ipv4/route/gc_elasticity:4 > /proc/sys/net/ipv4/route/gc_interval:1 > /proc/sys/net/ipv4/route/gc_min_interval:0 > /proc/sys/net/ipv4/route/gc_min_interval_ms:0 > /proc/sys/net/ipv4/route/gc_thresh:190536 > /proc/sys/net/ipv4/route/gc_timeout:15 > /proc/sys/net/ipv4/route/max_size:524288 > /proc/sys/net/ipv4/route/min_adv_mss:256 > /proc/sys/net/ipv4/route/min_pmtu:552 > /proc/sys/net/ipv4/route/mtu_expires:600 > /proc/sys/net/ipv4/route/redirect_load:5 > /proc/sys/net/ipv4/route/redirect_number:9 > /proc/sys/net/ipv4/route/redirect_silence:5120 > /proc/sys/net/ipv4/route/secret_interval:3600 > > dmesg | grep route > IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) > > > cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 > CPU5 CPU6 CPU7 > 0: 43 0 0 1 1 > 2 0 0 IO-APIC-edge timer > 1: 0 0 0 1 0 > 0 0 1 IO-APIC-edge i8042 > 9: 0 0 0 0 0 > 0 0 0 IO-APIC-fasteoi acpi > 14: 0 0 0 0 0 > 0 0 0 IO-APIC-edge ide0 > 15: 0 0 0 0 0 > 0 0 0 IO-APIC-edge ide1 > 29: 1139988 18351004 89662 3 0 > 1 0 3 PCI-MSI-edge eth0 > 30: 0 2 20221692 1 0 > 3 0 0 PCI-MSI-edge eth1 > 31: 0 1 1 0 0 > 0 0 0 PCI-MSI-edge > 32: 0 0 0 0 0 > 0 2 0 PCI-MSI-edge > 33: 1 1 0 0 0 > 0 0 0 PCI-MSI-edge > 34: 0 0 0 1 0 > 1 0 0 PCI-MSI-edge > 35: 0 0 0 1 0 > 0 0 1 PCI-MSI-edge > 36: 0 0 0 0 1 > 0 0 1 PCI-MSI-edge > 37: 1 0 0 0 0 > 1 0 0 PCI-MSI-edge > 38: 0 0 1 0 1 > 0 0 0 PCI-MSI-edge > 39: 0 0 2 0 0 > 0 0 0 PCI-MSI-edge > 40: 0 0 0 0 0 > 0 2 0 PCI-MSI-edge > 41: 0 2 0 0 0 > 0 0 0 PCI-MSI-edge > 42: 0 0 0 0 0 > 2 0 0 PCI-MSI-edge > 43: 0 0 0 2 0 > 0 0 0 PCI-MSI-edge > 44: 0 0 0 0 0 > 0 0 2 PCI-MSI-edge > 45: 2 0 0 0 0 > 0 0 0 PCI-MSI-edge > 46: 0 0 0 0 2 > 0 0 0 PCI-MSI-edge > 48: 233 200 185 257 256 > 260 269 257 PCI-MSI-edge ahci > 49: 0 1 1 0 0 > 2 1 0 PCI-MSI-edge ioat-msi > NMI: 0 0 0 0 0 > 0 0 0 Non-maskable interrupts > LOC: 1191321 26059516 25803111 64841 32718 > 26651 54058 24166 Local timer interrupts > RES: 921 59 58 20 14 > 8 10 13 Rescheduling interrupts > CAL: 20 85 88 87 90 > 90 91 86 Function call interrupts > TLB: 103 116 937 954 95 > 115 1006 1020 TLB shootdowns > SPU: 0 0 0 0 0 > 0 0 0 Spurious interrupts > ERR: 0 > MIS: 0 > > > second machine: > Linux TM_02_C1 2.6.30 #1 SMP Thu Jun 25 21:49:58 CEST 2009 i686 > Intel(R) Xeon(R) CPU 3075 @ 2.66GHz GenuineIntel GNU/Linux > > cat /proc/interrupts > CPU0 CPU1 > 0: 182 129 IO-APIC-edge timer > 1: 1886 1672 IO-APIC-edge i8042 > 6: 1 1 IO-APIC-edge floppy > 9: 0 0 IO-APIC-fasteoi acpi > 12: 2 2 IO-APIC-edge i8042 > 14: 0 0 IO-APIC-edge ide0 > 15: 0 0 IO-APIC-edge ide1 > 27: 41793 26401 PCI-MSI-edge ahci > 28: 13482 11260 PCI-MSI-edge eth2 > 29: 3 1326457765 PCI-MSI-edge eth1 > 30: 1240943198 137973134 PCI-MSI-edge eth0 > NMI: 0 0 Non-maskable interrupts > LOC: 1607938599 1514565603 Local timer interrupts > SPU: 0 0 Spurious interrupts > RES: 1098 1190 Rescheduling interrupts > CAL: 28 105 Function call interrupts > TLB: 2886 3055 TLB shootdowns > ERR: 0 > MIS: 0 > > grep . /proc/sys/net/ipv4/route/* > /proc/sys/net/ipv4/route/error_burst:1250 > /proc/sys/net/ipv4/route/error_cost:250 > /proc/sys/net/ipv4/route/gc_elasticity:4 > /proc/sys/net/ipv4/route/gc_interval:1 > /proc/sys/net/ipv4/route/gc_min_interval:0 > /proc/sys/net/ipv4/route/gc_min_interval_ms:0 > /proc/sys/net/ipv4/route/gc_thresh:190536 > /proc/sys/net/ipv4/route/gc_timeout:15 > /proc/sys/net/ipv4/route/max_size:1524288 > /proc/sys/net/ipv4/route/min_adv_mss:256 > /proc/sys/net/ipv4/route/min_pmtu:552 > /proc/sys/net/ipv4/route/mtu_expires:600 > /proc/sys/net/ipv4/route/redirect_load:5 > /proc/sys/net/ipv4/route/redirect_number:9 > /proc/sys/net/ipv4/route/redirect_silence:5120 > /proc/sys/net/ipv4/route/secret_interval:3600 > > > dmesg | grep route > IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) > > rtstat -k entries -i 1 -c 10 > rt_cache| > entries| > 112754| > 112446| > 112277| > 111451| > 111042| > 110314| > 109153| > 108370| > 107730| > 107478| > > > > >> Change in hash table size comes from commit >> c9503e0fe052020e0294cd07d0ecd982eb7c9177 >> >> But as Pawel mentioned "net.ipv4.route.gc_thresh = 190536", I believe >> his hash table is smaller than 512k entries! >> >> Author: Anton Blanchard <anton@...ba.org> >> Date: Mon Apr 27 05:42:24 2009 -0700 >> >> ipv4: Limit size of route cache hash table >> >> Right now we have no upper limit on the size of the route cache >> hash table. >> On a 128GB POWER6 box it ends up as 32MB: >> >> IP route cache hash table entries: 4194304 (order: 9, >> 33554432 bytes) >> >> It would be nice to cap this for memory consumption reasons, but >> a massive >> hashtable also causes a significant spike when measuring OS jitter. >> >> With a 32MB hashtable and 4 million entries, rt_worker_func is >> taking >> 5 ms to complete. On another system with more memory it's taking >> 14 ms. >> Even though rt_worker_func does call cond_sched() to limit its >> impact, >> in an HPC environment we want to keep all sources of OS jitter to >> a minimum. >> >> With the patch applied we limit the number of entries to 512k which >> can still be overriden by using the rt_entries boot option: >> >> IP route cache hash table entries: 524288 (order: 6, 4194304 >> bytes) >> >> With this patch rt_worker_func now takes 0.460 ms on the same >> system. >> >> Signed-off-by: Anton Blanchard <anton@...ba.org> >> Acked-by: Eric Dumazet <dada1@...mosbay.com> >> Signed-off-by: David S. Miller <davem@...emloft.net> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@...r.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@...r.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists