Message-ID: <1504645381.3841.6.camel@redhat.com>
Date: Tue, 05 Sep 2017 23:03:01 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to
__ip_options_echo()
On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote:
> On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote:
> > __ip_options_echo() uses the current network namespace, and
> > currently retrieves it via skb->dst->dev.
> >
> > This commit adds an explicit 'net' argument to __ip_options_echo()
> > and updates all the call sites to provide it, usually via a simpler
> > sock_net().
> >
> > After this change, __ip_options_echo() no longer needs to access
> > skb->dst and we can drop a couple of hacks that preserved such
> > info in the rx path.
> >
> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > ---
>
> David, Paolo
>
> This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options:
> explicitly provide net ns to __ip_options_echo()")
>
> needs to be backported to the linux-4.13 stable version to avoid this
> kind of crash [1].
>
> This is caused by a MSG_PEEK operation hitting skb_consume_udp() while
> the skb is still in the receive queue.
>
> The next read() then finds the same skb again and can see a NULL skb->dst.
>
> Thanks !
>
> [1]
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: ffff8801cd0a4380 task.stack: ffff8801cc498000
> RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143
> RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
> RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28
> R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918
> R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018
> FS: 0000000000979880(0000) GS:ffff8801db200000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> ip_options_echo include/net/ip.h:574 [inline]
> ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline]
> ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207
> udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641
> inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
> sock_recvmsg_nosec net/socket.c:792 [inline]
> sock_recvmsg+0xc9/0x110 net/socket.c:799
> SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788
> SyS_recvfrom+0x40/0x50 net/socket.c:1760
> entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x444c89
> RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d
> RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89
> RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004
> RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390
> R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000
> Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8
> 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c
> 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc
> RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP:
> ffff8801cc49f628
> ---[ end trace b30d95b284222843 ]---
> Kernel panic - not syncing: Fatal exception
Thank you, Eric, for the report!
Darn me, I seriously messed up the stateless consume.
I think we can have similar issues with ipsec/secpath and MSG_PEEK,
even if with a less catastrophic outcome.
What about the following, which should cover both cases? (Only compile
tested; I'll test it tomorrow morning my time.)
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d67a8182e5eb..63df75ae70ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
void kfree_skb_list(struct sk_buff *segs);
void skb_tx_error(struct sk_buff *skb);
void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
void __kfree_skb(struct sk_buff *skb);
extern struct kmem_cache *skbuff_head_cache;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e07556606284..f2411a8744d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
* consume_stateless_skb - free an skbuff, assuming it is stateless
* @skb: buffer to free
*
- * Works like consume_skb(), but this variant assumes that all the head
- * states have been already dropped.
+ * Alike consume_skb(), but this variant assumes that all the head
+ * states have been already dropped and usage count is one
*/
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
{
- if (!skb_unref(skb))
- return;
-
trace_consume_skb(skb);
if (likely(skb->head))
skb_release_data(skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 62344804baae..979e4d8526ba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
unlock_sock_fast(sk, slow);
}
+ if (!skb_unref(skb))
+ return;
+
/* In the more common cases we cleared the head states previously,
* see __udp_queue_rcv_skb().
*/
if (unlikely(udp_skb_has_head_state(skb)))
skb_release_head_state(skb);
- consume_stateless_skb(skb);
+ __consume_stateless_skb(skb);
}
EXPORT_SYMBOL_GPL(skb_consume_udp);
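For anyone following the thread, the ordering change in skb_consume_udp() can be sketched in user space. This is only a toy model, not kernel code: struct toy_skb and every toy_* helper are hypothetical names made up for illustration, with 'users' standing in for skb->users and 'dst' for skb->dst. It assumes a MSG_PEEK reader holds a second reference while the skb stays in the receive queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: a usage count plus one
 * piece of head state. Nothing here is a real kernel structure.
 */
struct toy_skb {
	int users;        /* models skb->users */
	const char *dst;  /* models skb->dst (head state) */
	bool freed;       /* set once the buffer would have been freed */
};

/* Models skb_unref(): true only if the caller held the last reference. */
static bool toy_unref(struct toy_skb *skb)
{
	return --skb->users == 0;
}

/* Models skb_release_head_state(): the dst reference goes away. */
static void toy_release_head_state(struct toy_skb *skb)
{
	skb->dst = NULL;
}

/* Ordering before the patch: head state is dropped first and the
 * usage count is only checked afterwards.
 */
static void toy_consume_old(struct toy_skb *skb)
{
	if (skb->dst)
		toy_release_head_state(skb);
	if (!toy_unref(skb))
		return;   /* still queued, but dst is already gone */
	skb->freed = true;
}

/* Ordering after the patch: drop our reference first and touch the
 * head state only when we were the last user.
 */
static void toy_consume_new(struct toy_skb *skb)
{
	if (!toy_unref(skb))
		return;   /* still queued, dst left intact */
	if (skb->dst)
		toy_release_head_state(skb);
	skb->freed = true;
}
```

Starting from users == 2 (one reference for the queue, one for the peeking reader), toy_consume_old() leaves dst NULL on a buffer that is still queued, which is the state the next read() trips over. toy_consume_new() leaves dst alone until the last reference is dropped.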