Message-ID: <20221101192211.33498-1-kuniyu@amazon.com>
Date: Tue, 1 Nov 2022 12:22:11 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <william.xuanziyang@...wei.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>,
<joannelkoong@...il.com>, <kuba@...nel.org>, <kuni1840@...il.com>,
<kuniyu@...zon.com>, <martin.lau@...nel.org>,
<mathew.j.martineau@...ux.intel.com>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>
Subject: Re: [RFC] bhash2 and WARN_ON() for inconsistent sk saddr.
From: "Ziyang Xuan (William)" <william.xuanziyang@...wei.com>
Date: Tue, 1 Nov 2022 15:08:15 +0800
> Hello Kuniyuki Iwashima,
>
> > Hi,
> >
> > I want to discuss bhash2 and WARN_ON() being fired every day this month
> > on my syzkaller instance without repro.
> >
> > WARNING: CPU: 0 PID: 209 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
> > ...
> > inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
> > inet_listen (net/ipv4/af_inet.c:228)
> > __sys_listen (net/socket.c:1810)
> > __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
> > do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> >
> > For the very first implementation of bhash2, there was a similar report
> > hitting the same WARN_ON(). The fix was to update the bhash2 bucket when
> > the kernel changes sk->sk_rcv_saddr from INADDR_ANY. syzbot had a repro
> > for that report, so we can indeed confirm that it no longer triggers the
> > warning on the latest kernel.
> >
> > https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/
> >
> > However, Mat reported at that time that there were at least two variants,
> > the latter being the same as mine.
> >
> > https://lore.kernel.org/netdev/4bae9df4-42c1-85c3-d350-119a151d29@linux.intel.com/
> > https://lore.kernel.org/netdev/23d8e9f4-016-7de1-9737-12c3146872ca@linux.intel.com/
> >
> > This week I started looking into this issue and finally figured out
> > why we could not catch all cases with a single repro.
> >
>
> Provide another C repro for analysis. See the attachment.

Thanks for another variant.

In your repro, connect() also fails due to an RST, which resets saddr
without updating the bhash2 bucket, and then listen() hits the WARN_ON().

What I meant was that if the failure paths did not differ, a single
repro would have caught every place that needs a fix. Once we know
the root cause, it's not so hard to generate variants.

Anyway, I'll post a patch for consistent error handling first, and later
another patch to fix the root cause once I find a solid way.
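
For readers without the attachment, the sequence described above (connect() fails by RST, then listen() on the same socket) can be sketched from user space roughly as follows. This is not the attached repro. The initial bind() to the wildcard address with a fixed port is an assumption about how the socket gets into the affected state, and the helper names set_addr(), pick_unused_port() and connect_fail_then_listen() are hypothetical. On a kernel where the failure path keeps bhash2 consistent, listen() is expected to simply succeed.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill in an IPv4 address (host-order addr and port). */
static void set_addr(struct sockaddr_in *sa, unsigned int addr, int port)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin_family = AF_INET;
	sa->sin_addr.s_addr = htonl(addr);
	sa->sin_port = htons(port);
}

/* Hypothetical helper: return a loopback port that is currently unused. */
static int pick_unused_port(void)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);
	int port = -1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	set_addr(&sa, INADDR_LOOPBACK, 0);
	if (!bind(fd, (struct sockaddr *)&sa, sizeof(sa)) &&
	    !getsockname(fd, (struct sockaddr *)&sa, &len))
		port = ntohs(sa.sin_port);
	close(fd);
	return port;
}

/*
 * Returns the listen() result, or -1 if the setup itself failed.
 * *connect_errno receives the errno of the failed connect().
 */
int connect_fail_then_listen(int *connect_errno)
{
	struct sockaddr_in sa;
	int local_port = pick_unused_port();
	int closed_port = pick_unused_port();
	int ret = -1;
	int fd;

	*connect_errno = 0;
	if (local_port < 0 || closed_port < 0 || local_port == closed_port)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* 1. Bind to the wildcard address with a fixed port (assumed step). */
	set_addr(&sa, INADDR_ANY, local_port);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)))
		goto out;

	/* 2. Connect to a port with no listener; loopback answers with RST. */
	set_addr(&sa, INADDR_LOOPBACK, closed_port);
	if (!connect(fd, (struct sockaddr *)&sa, sizeof(sa)))
		goto out;	/* unexpectedly connected */
	*connect_errno = errno;

	/* 3. listen() on the same socket, which goes through inet_csk_get_port(). */
	ret = listen(fd, 1);
out:
	close(fd);
	return ret;
}
```

Calling connect_fail_then_listen() and checking dmesg afterwards is one way to see whether a given kernel still warns on this path. Other variants would differ only in how the connect() failure is provoked.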