Date: Tue, 28 May 2013 14:43:05 -0300
From: Rafael Aquini <aquini@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Ben Greear <greearb@...delatech.com>, Francois Romieu <romieu@...zoreil.com>,
	atomlin@...hat.com, netdev@...r.kernel.org, davem@...emloft.net,
	edumazet@...gle.com, pshelar@...ira.com, mst@...hat.com,
	alexander.h.duyck@...el.com, riel@...hat.com,
	sergei.shtylyov@...entembedded.com, linux-kernel@...r.kernel.org
Subject: Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

On Tue, May 28, 2013 at 09:29:37AM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:15 -0300, Rafael Aquini wrote:
> >
> > The real problem seems to be that more and more the network stack (drivers, perhaps)
> > is relying on chunks of contiguous page-blocks without a fallback mechanism to
> > order-0 page allocations. When memory gets fragmented, these alloc failures
> > start to pop up more often and they scare ordinary sysadmins out of their pants.
> >
>
> Where do you see that?
>
> I see exactly the opposite trend.
>
> We have fewer and fewer buggy drivers, and we want to catch the last
> offenders.
>

Perhaps the explanation is that we're looking at the bad effects of old stuff, then.
But just to list a few for your appreciation:
--------------------------------------------------------
Apr 23 11:25:31 217-IDC kernel: httpd: page allocation failure. order:1, mode:0x20
Apr 23 11:25:31 217-IDC kernel: Pid: 19747, comm: httpd Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Apr 23 11:25:31 217-IDC kernel: Call Trace:
Apr 23 11:25:31 217-IDC kernel: <IRQ> [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0
Apr 23 11:25:31 217-IDC kernel: [<ffffffffa0337361>] ? bond_start_xmit+0x2f1/0x5d0 [bonding]
....
--------------------------------------------------------
Apr 4 18:51:32 exton kernel: swapper: page allocation failure. order:1, mode:0x20
Apr 4 18:51:32 exton kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.19.1.el6.x86_64 #1
Apr 4 18:51:32 exton kernel: Call Trace:
Apr 4 18:51:32 exton kernel: <IRQ> [<ffffffff811231ff>] ? __alloc_pages_nodemask+0x77f/0x940
Apr 4 18:51:32 exton kernel: [<ffffffff8115d1a2>] ? kmem_getpages+0x62/0x170
Apr 4 18:51:32 exton kernel: [<ffffffff8115ddba>] ? fallback_alloc+0x1ba/0x270
Apr 4 18:51:32 exton kernel: [<ffffffff8115d80f>] ? cache_grow+0x2cf/0x320
Apr 4 18:51:32 exton kernel: [<ffffffff8115db39>] ? ____cache_alloc_node+0x99/0x160
Apr 4 18:51:32 exton kernel: [<ffffffff8115ed00>] ? kmem_cache_alloc_node_trace+0x90/0x200
Apr 4 18:51:32 exton kernel: [<ffffffff8115ef1d>] ? __kmalloc_node+0x4d/0x60
Apr 4 18:51:32 exton kernel: [<ffffffff8141ea1d>] ? __alloc_skb+0x6d/0x190
Apr 4 18:51:32 exton kernel: [<ffffffff8141eb5d>] ? dev_alloc_skb+0x1d/0x40
Apr 4 18:51:32 exton kernel: [<ffffffffa04f5f50>] ? ipoib_cm_alloc_rx_skb+0x30/0x430 [ib_ipoib]
Apr 4 18:51:32 exton kernel: [<ffffffffa04f71ef>] ? ipoib_cm_handle_rx_wc+0x29f/0x770 [ib_ipoib]
Apr 4 18:51:32 exton kernel: [<ffffffffa03c6a46>] ? mlx4_ib_poll_cq+0x2c6/0x7f0 [mlx4_ib]
....
--------------------------------------------------------
May 14 09:00:34 ifil03 kernel: swapper: page allocation failure. order:1, mode:0x20
May 14 09:00:34 ifil03 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1
May 14 09:00:34 ifil03 kernel: Call Trace:
May 14 09:00:34 ifil03 kernel: <IRQ> [<ffffffff81123f0f>] ? __alloc_pages_nodemask+0x77f/0x940
May 14 09:00:34 ifil03 kernel: [<ffffffff8115ddc2>] ? kmem_getpages+0x62/0x170
May 14 09:00:34 ifil03 kernel: [<ffffffff8115e9da>] ? fallback_alloc+0x1ba/0x270
May 14 09:00:34 ifil03 kernel: [<ffffffff8115e42f>] ? cache_grow+0x2cf/0x320
May 14 09:00:34 ifil03 kernel: [<ffffffff8115e759>] ? ____cache_alloc_node+0x99/0x160
May 14 09:00:34 ifil03 kernel: [<ffffffff8115f53b>] ? kmem_cache_alloc+0x11b/0x190
May 14 09:00:34 ifil03 kernel: [<ffffffff8141f528>] ? sk_prot_alloc+0x48/0x1c0
May 14 09:00:34 ifil03 kernel: [<ffffffff8141f7b2>] ? sk_clone+0x22/0x2e0
May 14 09:00:34 ifil03 kernel: [<ffffffff8146ca26>] ? inet_csk_clone+0x16/0xd0
May 14 09:00:34 ifil03 kernel: [<ffffffff814858f3>] ? tcp_create_openreq_child+0x23/0x450
May 14 09:00:34 ifil03 kernel: [<ffffffff814832dd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
May 14 09:00:34 ifil03 kernel: [<ffffffff814856b1>] ? tcp_check_req+0x201/0x420
May 14 09:00:34 ifil03 kernel: [<ffffffff8147b166>] ? tcp_rcv_state_process+0x116/0xa30
May 14 09:00:34 ifil03 kernel: [<ffffffff81482cfb>] ? tcp_v4_do_rcv+0x35b/0x430
May 14 09:00:34 ifil03 kernel: [<ffffffff81484471>] ? tcp_v4_rcv+0x4e1/0x860
May 14 09:00:34 ifil03 kernel: [<ffffffff814621fd>] ? ip_local_deliver_finish+0xdd/0x2d0
May 14 09:00:34 ifil03 kernel: [<ffffffff81462488>] ? ip_local_deliver+0x98/0xa0
May 14 09:00:34 ifil03 kernel: [<ffffffff8146194d>] ? ip_rcv_finish+0x12d/0x440
May 14 09:00:34 ifil03 kernel: [<ffffffff8101bd86>] ? intel_pmu_enable_all+0xa6/0x150
May 14 09:00:34 ifil03 kernel: [<ffffffff81461ed5>] ? ip_rcv+0x275/0x350
May 14 09:00:34 ifil03 kernel: [<ffffffff8142bedb>] ? __netif_receive_skb+0x49b/0x6e0
May 14 09:00:34 ifil03 kernel: [<ffffffff8142df88>] ? netif_receive_skb+0x58/0x60
May 14 09:00:34 ifil03 kernel: [<ffffffffa00a0a9e>] ? vmxnet3_rq_rx_complete+0x36e/0x880 [vmxnet3]
....
--------------------------------------------------------

> > The big point of this change was to attempt to relieve some of these warnings,
> > which we believed to be useless, since the net stack would recover from them
> > by re-transmissions.
> >
> > We might have misjudged the scenario, though. Perhaps a better approach would be
> > making the warning less verbose for all page-alloc failures. We could, perhaps,
> > only print a stack-dump out if some debug flag is passed along, either as a
> > reference, or by some CONFIG_DEBUG_ preprocessor directive.
> warn_alloc_failed() uses the standard DEFAULT_RATELIMIT_INTERVAL which
> is very small (5 * HZ)
>
> I would bump nopage_rs to something more reasonable, like one hour or one
> day.
>

Neat! Worth trying, no doubt about that. Aaron?

Cheers!
-- Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/