Message-ID: <20200508062440.GF11244@42.do-not-panic.com>
Date: Fri, 8 May 2020 06:24:40 +0000
From: Luis Chamberlain <mcgrof@...nel.org>
To: jeyu@...nel.org, akpm@...ux-foundation.org, arnd@...db.de,
rostedt@...dmis.org, mingo@...hat.com, aquini@...hat.com,
cai@....pw, dyoung@...hat.com, bhe@...hat.com,
peterz@...radead.org, tglx@...utronix.de, gpiccoli@...onical.com,
pmladek@...e.com, tiwai@...e.de, schlad@...e.de,
andriy.shevchenko@...ux.intel.com, keescook@...omium.org,
will@...nel.org, mchehab+samsung@...nel.org, kvalo@...eaurora.org,
davem@...emloft.net, linux-kernel@...r.kernel.org
Subject: Re: [RFC] taint: add module firmware crash taint support
On Fri, May 08, 2020 at 08:11:24AM +0200, Daniel Vetter wrote:
> On Fri, May 08, 2020 at 02:14:38AM +0000, Luis Chamberlain wrote:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead provide a helper which lets drivers
> > annotate this.
> >
> > Once this happens, scrapers can easily scrape modules taint flags.
> > This will taint both the kernel and respective calling module.
> >
> > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as
> > this fact should in no way shape or form affect lockdep. This taint
> > is device driver specific.
> >
> > Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
> > ---
> >
> > Below is the full diff stat of manual inspection throughout the kernel
> > when this happens. My methodology is to just scrape for "crash" and
> > then study the driver a bit to see whether the firmware does indeed
> > crash there. In *many* cases there is even infrastructure
> > for this, so this is sometimes clearly obvious. Some other times it
> > required a bit of deciphering.
> >
> > The diff stat below is what I have so far, however the patch below
> > only includes the drivers that start with Q, as they were a source of
> > inspiration for this, and to make this RFC easier to read.
> >
> > If this seems sensible, I can follow up with the kernel helper first,
> > and then tackle each subsystem independently.
> >
> > I purposely skipped review of remoteproc and virtualization. That should
> > require its own separate careful review and considerations.
> >
> > drivers/atm/nicstar.c | 1 +
> > drivers/bluetooth/hci_qca.c | 1 +
> > drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
> > drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 2 ++
> > drivers/gpu/drm/msm/msm_gpu.c | 1 +
>
> I'm not finding the drm changes in your diff below ...
That was on purpose, as this was an RFC and I didn't want to
clutter this with noise.
> Also what Kees
> said, I think best to split this up and properly cc per
> get_maintainer.pl.
Sounds good.
Luis
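
For readers who want to see what "inspecting /proc/sys/kernel/tainted"
could look like from userspace, here is a minimal sketch. It assumes the
file holds a single decimal bitmask, as it does today. The bit number for
the firmware crash taint is an assumption made for illustration only
(the macro ASSUMED_FW_CRASH_TAINT_BIT and all helper names below are
hypothetical and do not come from the patch).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * ASSUMPTION: placeholder bit number for the proposed firmware crash
 * taint. Check the kernel's taint documentation for the real value.
 */
#define ASSUMED_FW_CRASH_TAINT_BIT 18

/* Parse the decimal bitmask text found in /proc/sys/kernel/tainted. */
static int parse_taint_mask(const char *text, unsigned long *mask)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;	/* overflow or no digits at all */

	*mask = value;
	return 0;
}

/* Return 1 if the given taint bit is set in mask, 0 otherwise. */
static int taint_bit_set(unsigned long mask, unsigned int bit)
{
	return (int)((mask >> bit) & 1UL);
}

/* Read and parse the taint bitmask from path; 0 on success, -1 on error. */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_taint_mask(buf, mask);
}
```

A monitoring tool could call read_taint_mask("/proc/sys/kernel/tainted",
&mask) and then taint_bit_set(mask, ASSUMED_FW_CRASH_TAINT_BIT) rather
than grepping the kernel log for driver specific strings.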