Message-ID: <CANn89i++mOSPmMbHP5jjwG04rD2cx6HHCyD0-J3f5u7=TQ-U1w@mail.gmail.com>
Date: Wed, 11 Sep 2019 22:59:25 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Christoph Paasch <christoph.paasch@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Jason Baron <jbaron@...mai.com>,
Vladimir Rutsky <rutsky@...gle.com>
Subject: Re: [PATCH net] tcp: remove empty skb from write queue in error cases
On Wed, Sep 11, 2019 at 7:36 PM Christoph Paasch
<christoph.paasch@...il.com> wrote:
>
> Hello,
>
> On Mon, Aug 26, 2019 at 11:04 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > Vladimir Rutsky reported stuck TCP sessions after memory pressure
> > events. Edge-triggered epoll() users would never receive an EPOLLOUT
> > notification allowing them to retry a sendmsg().
> >
> > Jason tested the case of sk_stream_alloc_skb() returning NULL,
> > but there are other paths that could lead both sendmsg() and sendpage()
> > to return -1 (EAGAIN), with an empty skb queued on the write queue.
> >
> > This patch makes sure we remove this empty skb so that
> > Jason's code can detect that the queue is empty, and
> > call sk->sk_write_space(sk) accordingly.
> >
> > Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > Cc: Jason Baron <jbaron@...mai.com>
> > Reported-by: Vladimir Rutsky <rutsky@...gle.com>
> > Cc: Soheil Hassas Yeganeh <soheil@...gle.com>
> > Cc: Neal Cardwell <ncardwell@...gle.com>
> > ---
> > net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
> > 1 file changed, 20 insertions(+), 10 deletions(-)
>
> I got syzkaller complaining now on 4.14.143 with the following reproducer:
>
> # {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1
> Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false
> UseTmpDir:false EnableCgroups:false EnableNetdev:false ResetNet:false
> HandleSegv:false Repro:false Trace:false}
> r0 = socket$inet_tcp(0x2, 0x1, 0x0)
> setsockopt$inet_tcp_TCP_REPAIR(r0, 0x6, 0x13, &(0x7f0000000040)=0x1, 0x4)
> setsockopt$inet_tcp_TCP_REPAIR_QUEUE(r0, 0x6, 0x14, &(0x7f00000012c0)=0x2, 0x4)
> setsockopt$inet_tcp_int(r0, 0x6, 0x19, &(0x7f0000000000)=0x9, 0x4)
> setsockopt$inet_tcp_TCP_MD5SIG(r0, 0x6, 0xe,
> &(0x7f00000001c0)={@...{{0x2, 0x0, @empty}}, 0x0, 0x2, 0x0,
> "c157cf4809151e5e89cfd6d934fbe981ec8ff6afc252ccf486c325c7ff3d35f3a89412a5cb6430e169092617df2ba65bf0ab844572e4e7dd4ece8ec1de5ac1ccd870067b018cb3b1f05f2391d872b67d"},
> 0xd8)
> connect$inet(r0, &(0x7f0000000080)={0x2, 0x0, @dev={0xac, 0x14, 0x14,
> 0x1d}}, 0x10)
> sendto(r0, 0x0, 0x87, 0x0, 0x0, 0x391)
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 2529 Comm: syz-executor709 Not tainted 4.14.143 #5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
> task: ffff8880677fdc00 task.stack: ffff8880642b0000
> RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350
> RSP: 0018:ffff8880642bf718 EFLAGS: 00010206
> RAX: 0000000000000014 RBX: 0000000000000087 RCX: ffff88806a794f50
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000a0
> RBP: ffff8880642bfaa8 R08: 0000000000000006 R09: ffff8880677fe3a0
> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88806a794f50 R14: ffff88806a794d00 R15: 0000000000000087
> FS: 00007f644b697700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffcd37370b0 CR3: 00000000679f2006 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533
> inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784
> sock_sendmsg_nosec net/socket.c:646 [inline]
> sock_sendmsg+0xc3/0x100 net/socket.c:656
> SYSC_sendto+0x35d/0x5e0 net/socket.c:1766
> do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x7f644afc6469
> RSP: 002b:00007f644b696f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000602130 RCX: 00007f644afc6469
> RDX: 0000000000000087 RSI: 0000000000000000 RDI: 0000000000000003
> RBP: 0000000000602138 R08: 0000000000000000 R09: 0000000000000391
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000060213c
> R13: 00007ffcd373700f R14: 00007f644b677000 R15: 0000000000000003
> Code: 74 08 3c 03 0f 8e f1 32 00 00 8b 85 98 fd ff ff 89 85 60 fd ff
> ff 48 8b 85 70 fd ff ff 48 8d b8 a0 00 00 00 48 89 f8 48 c1 e8 03 <42>
> 0f b6 04 20 84 c0 74 06 0f 8e d2 32 00 00 4c 8b bd 70 fd ff
> RIP: tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: ffff8880642bf718
> ---[ end trace 70f07f242cd3b9d8 ]---
>
>
> It's because skb is NULL in tcp_sendmsg_locked at:
> skb = tcp_write_queue_tail(sk);
> if (tcp_send_head(sk)) {
> if (skb->ip_summed == CHECKSUM_NONE)
>
>
> I think we need this here on pre-rb-tree kernels:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5ce069ce2a97..efe767e20d01 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk,
> struct sk_buff *skb)
> {
> if (skb && !skb->len) {
> tcp_unlink_write_queue(skb, sk);
> - if (tcp_write_queue_empty(sk))
> - tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> + tcp_check_send_head(sk, skb);
> sk_wmem_free_skb(sk, skb);
> }
> }
>
> Does that look good?
>
Yes, the backport to 4.14.143 was not done properly.
Thanks