Message-ID: <f96b33ab-56d5-4a43-a1ff-2e68e2c55ac2@kernel.org>
Date: Mon, 22 Jan 2024 19:22:42 +0100
From: Matthieu Baerts <matttbe@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Eric Dumazet <edumazet@...gle.com>, Netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Kernel panic in netif_rx_internal after v6 pings between netns
Hi Jakub,
On 22/01/2024 18:28, Jakub Kicinski wrote:
(...)
> Somewhat related. What do you do currently to ignore crashes?
I was wondering why you wanted to ignore crashes :) ... but then I saw
the new "Test ignored" and "Crashes ignored" sections on the status
page. Just to be sure: you don't want to report issues that have not
been introduced by the new patches, right?
We don't need to do that on MPTCP side:
- either it is a new crash caused by patches that are in review, and it
is not impacting others → we test each series individually, not a batch
of series.
- or there are issues with recent patches, not in netdev yet → we fix,
or revert.
- or there is an issue elsewhere, like the kernel panic we reported
here: usually I try to quickly apply a workaround, e.g. applying a fix,
or a revert. I don't think we ever had an issue really impacting us
where we couldn't find a quick solution in one or two days. With the
panic we reported here, ~15% of the tests had an issue: it is "OK" to
live with that for a few days/weeks.
With fewer tests and a smaller community, it is easier for us to just
say on the ML and weekly meetings: "this is a known issue, please ignore
for the moment". But if possible, I try to add a workaround/fix in our
repo used by the CI and devs (not upstreamed).
For NIPA CI, do you want to do the same as for the build, and compare
the results with a reference run? Or with multiple reference runs, to
take unstable tests into account? Or maintain a list of known issues (I
think you started to do that, probably safer/easier for the moment)?
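To make the three options above concrete, here is a minimal sketch of how a
run could be triaged against reference runs and a known-issues list. It is
only an illustration, not NIPA code: the function name `classify_results`,
the result format (a dict mapping test name to "pass"/"fail") and the test
names are all hypothetical.

```python
def classify_results(results, reference_runs, known_issues):
    """Split the tests of one run into passed / ignored / new failures.

    results:        dict test name -> "pass" or "fail" (assumed format)
    reference_runs: list of dicts with the same format, one per
                    reference run (several runs help with unstable tests)
    known_issues:   set of test names maintained by hand
    """
    report = {"passed": [], "ignored": [], "new_failures": []}
    for test, status in sorted(results.items()):
        if status == "pass":
            report["passed"].append(test)
        elif test in known_issues:
            # Explicitly listed as a known issue: do not report it.
            report["ignored"].append(test)
        elif any(ref.get(test) == "fail" for ref in reference_runs):
            # Already failing in at least one reference run, so it was
            # most likely not introduced by the patches under test.
            report["ignored"].append(test)
        else:
            report["new_failures"].append(test)
    return report


# Hypothetical data, for illustration only.
run = {"test_a": "pass", "test_b": "fail", "test_c": "fail", "test_d": "fail"}
refs = [{"test_a": "pass", "test_b": "fail", "test_c": "pass", "test_d": "pass"}]
print(classify_results(run, refs, known_issues={"test_d"}))
```

Only failures that are neither listed as known nor seen in a reference run
end up in "new_failures"; those are the ones worth reporting against the
new patches.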
> I was seeing a lot of:
> https://netdev-2.bots.linux.dev/vmksft-net-mp/results/431181/vm-crash-thr0-2
>
> So I hacked up this function to filter the crash from NIPA CI:
> https://github.com/kuba-moo/nipa/blob/master/contest/remote/lib/vm.py#L50
> It tries to get first 5 function names from the stack, to form
> a "fingerprint". But I seem to recall a discussion at LPC's testing
> track that there are existing solutions for generating fingerprints.
> Are you aware of any?
No, sorry. But I guess they are using that with syzkaller, no?
I have to admit that crashes (or warnings) are quite rare, so there was
no need for automation there. But if it is easy to have a fingerprint, I
will be interested as well. It can help with tracking, i.e. finding other
occurrences of crashes/warnings that are very hard to reproduce.
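For illustration, the "first 5 function names from the stack" idea quoted
above could look roughly like the sketch below. This is not the code from
NIPA's vm.py: the function name `crash_fingerprint`, the regular expression
and the sample log are assumptions, based on the usual "func+0xoff/0xsize"
format of kernel call traces, with unreliable frames (prefixed with "?")
skipped.

```python
import re

# Assumed frame format: optional "[timestamp]", optional "? " marker for
# unreliable frames, then "function+0xoffset/0xsize".
FRAME_RE = re.compile(
    r"^\s*(?:\[[^\]]*\]\s*)?(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def crash_fingerprint(log_lines, depth=5):
    """Join the first `depth` reliable function names found after the
    first "Call Trace:" line, e.g. to group identical crashes."""
    funcs = []
    in_trace = False
    for line in log_lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line)
        if not match or match.group(1):
            # Not a stack frame (e.g. "<IRQ>"), or an unreliable "? " frame.
            continue
        funcs.append(match.group(2))
        if len(funcs) == depth:
            break
    return ":".join(funcs)


# Made-up log excerpt, for illustration only.
sample = [
    "[   12.0] BUG: kernel NULL pointer dereference",
    "[   12.1] Call Trace:",
    "[   12.1]  <IRQ>",
    "[   12.1]  netif_rx_internal+0x42/0x130",
    "[   12.1]  ? __die+0x1f/0x70",
    "[   12.1]  netif_rx+0x2f/0xd0",
    "[   12.1]  </IRQ>",
]
print(crash_fingerprint(sample))
```

Two logs producing the same string can then be counted as occurrences of
the same crash, which is what would help with rare, hard-to-reproduce
issues.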
> (FWIW the crash from above seems to be gone on latest linux.git,
> this night's CIs run are crash-free.)
Good it was quickly fixed!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.