linux-kernel - Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages


Message-ID: <1349386058.16011.118.camel@edumazet-glaptop>
Date:	Thu, 04 Oct 2012 23:27:38 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	mbizon@...ebox.fr
Cc:	David Madore <david+ml@...ore.org>,
	Francois Romieu <romieu@...zoreil.com>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>
Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109
 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 19:34 +0200, Maxime Bizon wrote:
> On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote:
> 
> > > yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> > > packet forwarded has its headroom expanded, it is then recycled and gets
> > > its original default headroom back, then it gets forwarded,
> > > expanded, ...
> > 
> > Hmm, this sounds bad (especially without recycle)
> > 
> > Might I assume NET_SKB_PAD is 32 on this arch ?
> 
> It is, I have a setup with 6to4 tunneling, so needed headroom on tx is
> quite big.
> 

If we change the NET_SKB_PAD minimum to 64 (as it is on x86), would that
be enough for the added tunnel encapsulation?


> I used to be careful about raising this value to avoid drivers using
> slab-4096 instead of slab-2048, but since our boards no longer have 16MB
> of RAM and with the recent changes in mainline it doesn't seem to be an
> issue anymore.

Yes. Given that we can now allocate order-3 pages to deliver skb->head
fragments, adding 32 bytes doesn't push us to slab-4096, since we don't
use it anymore.

> 
> It's not a that big issue in the non recycle case, just lower
> performance if the tunable is not set correctly. Though it would be nice
> to have a stat/counter so you know when you hit this kind of slow path.
> 

Yeah, we already mentioned this idea in the past. We are lazy now that we
have good performance tools (perf).

> But on the recycle case, skb->head is reallocated to twice the size each
> time the packet is recycled and takes the same path again. This stresses
> the VM and you eventually get packet loss (and scary printk)
> 

OK, so to fix this on stable trees, skb_recycle() should not recycle an
skb if skb->head is too big.

By the way, another problem with skb_recycle() is that skb->truesize can
be wrong as well. (One skb might have had one frag, with a truesize of
2048/4096 bytes, and this frag was pulled into skb->head, so skb->truesize
is slightly wrong.)

So we must also either check that skb->truesize is equal to
SKB_TRUESIZE(skb_end_offset(skb)), or reset it in skb_recycle().
I have no strong opinion.

I really think we should remove skb_recycle() when net-next opens again,
but for now, something like this (untested) patch:


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b33a3a1..13ca215 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2659,7 +2659,10 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
 	skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
 	if (skb_end_offset(skb) < skb_size)
 		return false;
-
+	if (skb_end_offset(skb) > 2 * skb_size)
+		return false;
+	if (skb->truesize != SKB_TRUESIZE(skb_end_offset(skb)))
+		return false;
 	if (skb_shared(skb) || skb_cloned(skb))
 		return false;
 

