Message-ID: <CAAvCjhiqtTBYNfgVHtOashJZuArY3mz2=938ip=5i4u_7wd85A@mail.gmail.com>
Date: Wed, 11 Oct 2023 13:28:05 +0300
From: Dmitry Kravkov <dmitryk@...lt.com>
To: netdev@...r.kernel.org, edumazet@...gle.com, 
	"Slava (Ice) Sheremet" <slavas@...lt.com>
Subject: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer

Hi,

While upgrading from the 5.10 kernel to 6.1, we hit a reproducible
kernel crash, which we bisected to this commit:

commit 849b425cd091e1804af964b771761cfbefbafb43
Author: Eric Dumazet <edumazet@...gle.com>
Date:   Tue Jun 14 10:17:34 2022 -0700

    tcp: fix possible freeze in tx path under memory pressure

    Blamed commit only dealt with applications issuing small writes.

    Issue here is that we allow to force memory schedule for the sk_buff
    allocation, but we have no guarantee that sendmsg() is able to
    copy some payload in it.

    In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.

    For example, if we consider tcp_wmem[0] = 4096 (default on x86),
    and initial skb->truesize being 1280, tcp_sendmsg() is able to
    copy up to 2816 bytes under memory pressure.

    Before this patch a sendmsg() sending more than 2816 bytes
    would either block forever (if persistent memory pressure),
    or return -EAGAIN.

    For bigger MTU networks, it is advised to increase tcp_wmem[0]
    to avoid sending too small packets.

    v2: deal with zero copy paths.

    Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
    Signed-off-by: Eric Dumazet <edumazet@...gle.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>
    Reviewed-by: Wei Wang <weiwan@...gle.com>
    Reviewed-by: Shakeel Butt <shakeelb@...gle.com>
    Signed-off-by: David S. Miller <davem@...emloft.net>
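
The arithmetic in the commit message above can be checked with a short sketch. It only reproduces the numbers quoted in the example (tcp_wmem[0] = 4096, initial skb->truesize = 1280); the helper names are hypothetical, the second and third values of the sample tcp_wmem string are purely illustrative, and the real kernel memory accounting is more involved than this subtraction.

```python
def parse_tcp_wmem(text):
    """Parse a 'min default max' triple, the layout used by
    /proc/sys/net/ipv4/tcp_wmem. Hypothetical helper for illustration."""
    wmem_min, wmem_default, wmem_max = (int(field) for field in text.split())
    return wmem_min, wmem_default, wmem_max


def payload_budget(wmem_min, skb_truesize):
    """Payload bytes that still fit under tcp_wmem[0] once the skb's own
    truesize is counted. Simplified model, not the kernel's accounting."""
    return max(0, wmem_min - skb_truesize)


# Sample string: only the first field (4096) comes from the commit message.
wmem_min, _, _ = parse_tcp_wmem("4096 16384 4194304")
print(payload_budget(wmem_min, 1280))  # 4096 - 1280 = 2816
```

Under this simplified model, raising tcp_wmem[0] directly raises the payload that a single sendmsg() can copy under memory pressure, which is the reason the commit message suggests a larger value for bigger MTUs.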

This happens under heavy load: two 100Gb ports (E810 or ConnectX6)
together transmit above 150Gbps, and most of the data is read from
disks, so the system appears to be under constant memory pressure.
Reverting the patch on a 6.1.38 kernel eliminates the crash, and the
system appears stable while delivering 180Gbps.

[ 2445.532318] ------------[ cut here ]------------
[ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S         OE     5.19.0-rc2+ #21
[ 2445.560127] ------------[ cut here ]------------
[ 2445.560565] Hardware name: Cisco Systems Inc UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157 11/21/2021
[ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
[ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6 fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28 41 39
[ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
[ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
[ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
[ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
[ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
[ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
[ 2445.561839] FS:  00007f342530c840(0000) GS:ffff9ffa7f940000(0000) knlGS:0000000000000000
[ 2445.561842] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
[ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2445.561849] PKRU: 55555554
[ 2445.561853] Call Trace:
[ 2445.561858]  <TASK>
[ 2445.564202] ------------[ cut here ]------------
[ 2445.568007]  ? tcp_tasklet_func+0x120/0x120
[ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.569608]  tcp_tsq_handler+0x7c/0xa0
[ 2445.569627]  tcp_pace_kick+0x19/0x60
[ 2445.569632]  __run_hrtimer+0x5c/0x1d0
[ 2445.572264] ------------[ cut here ]------------
[ 2445.574287] ------------[ cut here ]------------
[ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.582581]  __hrtimer_run_queues+0x7d/0xe0

-- 
Dmitry Kravkov     Software  Engineer
Qwilt | Mobile: +972-54-4839923 | dmitryk@...lt.com
