Message-ID: <1567546948.5576.68.camel@lca.pw>
Date: Tue, 03 Sep 2019 17:42:28 -0400
From: Qian Cai <cai@....pw>
To: Michal Hocko <mhocko@...nel.org>
Cc: Eric Dumazet <eric.dumazet@...il.com>, davem@...emloft.net,
netdev@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net/skbuff: silence warnings under memory pressure
On Tue, 2019-09-03 at 20:53 +0200, Michal Hocko wrote:
> On Tue 03-09-19 11:42:22, Qian Cai wrote:
> > On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > > If there is a risk of flooding the syslog, we should fix this generically
> > > > in mm layer, not adding hundred of __GFP_NOWARN all over the places.
> > >
> > > We do already ratelimit in warn_alloc. If it isn't sufficient then we
> > > can think of a different parameters. Or maybe it is the ratelimiting
> > > which doesn't work here. Hard to tell and something to explore.
> >
> > The time-based ratelimit won't work for skb_build() as when a system under
> > memory pressure, and the CPU is fast and IO is so slow, it could take a long
> > time to swap and trigger OOM.
>
> I really do not understand what does OOM and swapping have to do with
> the ratelimiting here. The sole purpose of the ratelimit is to reduce
> the amount of warnings to be printed. Slow IO might have an effect on
> when the OOM killer is invoked but atomic allocations are not directly
> dependent on IO.
Under heavy memory pressure, the system tries hard to reclaim memory to get back
above the watermarks. However, IO is slow to page out while the pressure keeps
draining the atomic reserves, so some of those build_skb() allocations will
eventually fail.
Only with fast IO would swapping finish sooner, at which point the OOM killer is
invoked and ends the memory pressure.
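To illustrate why a purely time-based limit does not bound the total output, here
is a minimal userspace sketch. It is not the kernel's ratelimit code; the struct,
the function names and the interval/burst values are all hypothetical. It models
one failed allocation per tick and counts how many warnings get through.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an interval/burst limiter (not the kernel code). */
struct ratelimit {
	long interval;	/* window length, in ticks */
	int burst;	/* warnings allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* warnings emitted in the current window */
};

/* Return true if a warning may be emitted at tick 'now'. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one with a fresh budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/*
 * Count the warnings emitted when one allocation fails on every tick in
 * [0, duration). The total grows with 'duration', because every new
 * window refills the budget.
 */
static long warnings_emitted(long interval, int burst, long duration)
{
	struct ratelimit rs = { .interval = interval, .burst = burst };
	long emitted = 0;

	assert(interval > 0 && burst >= 0);
	for (long now = 0; now < duration; now++)
		if (ratelimit_ok(&rs, now))
			emitted++;
	return emitted;
}
```

With an assumed interval of 500 ticks and a burst of 10, warnings_emitted(500, 10,
6000) gives 120 and warnings_emitted(500, 10, 60000) gives 1200. The limiter caps
the rate, but the longer reclaim takes, the more warnings are printed in total.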
>
> > I suppose what happens is those skb_build() allocations are from softirq, and
> > once one of them failed, it calls printk() which generates more interrupts.
> > Hence, the infinite loop.
>
> Please elaborate more.
>
If you look at the original report, the dump_stack() output for the failed
allocation is:
<IRQ>
warn_alloc.cold.43+0x8a/0x148
__alloc_pages_nodemask+0x1a5c/0x1bb0
alloc_pages_current+0x9c/0x110
allocate_slab+0x34a/0x11f0
new_slab+0x46/0x70
___slab_alloc+0x604/0x950
__slab_alloc+0x12/0x20
kmem_cache_alloc+0x32a/0x400
__build_skb+0x23/0x60
build_skb+0x1a/0xb0
igb_clean_rx_irq+0xafc/0x1010 [igb]
igb_poll+0x4bb/0xe30 [igb]
net_rx_action+0x244/0x7a0
__do_softirq+0x1a0/0x60a
irq_exit+0xb5/0xd0
do_IRQ+0x81/0x170
common_interrupt+0xf/0xf
</IRQ>
Since the allocation has no __GFP_NOWARN to begin with, it will call:
printk
vprintk_default
vprintk_emit
wake_up_klogd
irq_work_queue
__irq_work_queue_local
arch_irq_work_raise
apic->send_IPI_self(IRQ_WORK_VECTOR)
smp_irq_work_interrupt
exiting_irq
irq_exit
and ends up processing pending net_rx_action softirqs again, of which there are
plenty because the machine is connected via ssh, etc.