[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200522234409.GH11244@42.do-not-panic.com>
Date: Fri, 22 May 2020 23:44:09 +0000
From: Luis Chamberlain <mcgrof@...nel.org>
To: Steve deRosier <derosier@...il.com>
Cc: Johannes Berg <johannes@...solutions.net>,
Jakub Kicinski <kuba@...nel.org>,
Ben Greear <greearb@...delatech.com>, jeyu@...nel.org,
akpm@...ux-foundation.org, arnd@...db.de, rostedt@...dmis.org,
mingo@...hat.com, aquini@...hat.com, cai@....pw, dyoung@...hat.com,
bhe@...hat.com, peterz@...radead.org, tglx@...utronix.de,
gpiccoli@...onical.com, pmladek@...e.com,
Takashi Iwai <tiwai@...e.de>, schlad@...e.de,
andriy.shevchenko@...ux.intel.com,
Kees Cook <keescook@...omium.org>,
Daniel Vetter <daniel.vetter@...ll.ch>, will@...nel.org,
mchehab+samsung@...nel.org, Kalle Valo <kvalo@...eaurora.org>,
"David S. Miller" <davem@...emloft.net>,
Network Development <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
ath10k@...ts.infradead.org, jiri@...nulli.us,
briannorris@...omium.org
Subject: Re: [RFC 1/2] devlink: add simple fw crash helpers
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
> Specifically, I don't think we should set a taint flag when a driver
> easily handles a routine firmware crash and is confident that things
> have come up just fine again. In other words, triggering the taint in
> every driver module where it spits out a log comment that it had a
> firmware crash and had to recover seems too much. Sure, firmware
> shouldn't crash, sure it should be open source so we can fix it,
> whatever... those sort of wishful comments simply ignore reality and
> our ability to affect effective change. A lot of WiFi firmware crashes
> and for well-known cases the drivers handle them well. And in some
> cases, not so well and that should be a place the driver should detect
> and thus raise a red flag. If a WiFi firmware crash can bring down
> the kernel, there's either a major driver bug or some very funky
> hardware crap going on. That sort of thing we should be able to
> detect, mark with a taint (or something), and fix if within our sphere
> of influence. I guess what it comes down to me is how aggressive we
> are about setting the flag.
Exactly the crux of the issue.
I hope that by now we should all be in agreement that at least a
firmware crash requiring a reboot is something we should record and
inform the user of. A taint seems like a reasonable standard practice
for these sorts of things.
> I would like there to be a single solution, or a minimized set
> depending on what makes sense for the requirements. I haven't had time
> to look into the alternatives mentioned here so I don't have an
> informed opinion about the solution. I do think Luis is trying to
> solve a real problem though. Can we look at this from the point of
> view of what are the requirements? What is it we're trying to solve?
>
> I _think_ that the goal of Luis's original proposal is to report up to
> the user, at some future point when the user is interested (because
> something super drastic just occured, but long after the fw crash),
> that there was a firmware crash without the user having to grep
> through all logs on the machine. And then if the user sees that flag
> and suspects it, then they can bother to find it in the logs or do
> more drastic debugging steps like finding the fw crash in the log and
> pulling firmware crash dumps, etc.
Yes, that's exactly it. Not all users are clueful to inspect logs. I now
have a generic uevent mechanism drafted which sends a uevent for *any*
taint. So that is, it does not even depend on this series. But it
accomplishes the goal of informing the user of taints.
> I think the various alternate solutions are great but perhaps solving
> a superset of features (like adding in user-space notifications etc)?
> Perhaps different people on these related threads are trying to solve
> different problems?
The uevent mechanism I implemented (but not yet posted for review) at
least sends out a smoke signal. I think that if each subsystem wants
to expand on this with dumping facilities that is great too!
Luis
Powered by blists - more mailing lists