netdev - Re: [PATCH 00/15] net: taint when the device driver firmware crashes

Message-ID: <20200511141113.GP11244@42.do-not-panic.com>
Date:   Mon, 11 May 2020 14:11:13 +0000
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Jiri Pirko <jiri@...nulli.us>, jeyu@...nel.org,
        akpm@...ux-foundation.org, arnd@...db.de, rostedt@...dmis.org,
        mingo@...hat.com, aquini@...hat.com, cai@....pw, dyoung@...hat.com,
        bhe@...hat.com, peterz@...radead.org, tglx@...utronix.de,
        gpiccoli@...onical.com, pmladek@...e.com, tiwai@...e.de,
        schlad@...e.de, andriy.shevchenko@...ux.intel.com,
        keescook@...omium.org, daniel.vetter@...ll.ch, will@...nel.org,
        mchehab+samsung@...nel.org, kvalo@...eaurora.org,
        davem@...emloft.net, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/15] net: taint when the device driver firmware crashes

On Sat, May 09, 2020 at 11:35:46AM -0700, Jakub Kicinski wrote:
> On Sat,  9 May 2020 04:35:37 +0000 Luis Chamberlain wrote:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead this series provides a helper which
> > lets drivers annotate this and shows how to use this on networking
> > drivers.
> > 
> > My methodology for finding where firmware crashes is to git grep for
> > "crash" and then study the code to see if this is indeed a place
> > where the firmware crashes. In some places this is quite obvious.
> > 
> > I'm starting off with networking first. If this gets merged, I can
> > later focus on the other drivers, but I already have some work done
> > on other subsystems.
> > 
> > Review, flames, etc. are greatly appreciated.
> 
> Tainting itself may be useful, but that's just the first step. I'd much
> rather see folks start using the devlink health infrastructure. Devlink
> is netlink based, but it's _not_ networking specific (many of its
> optional features obviously are, but don't let that mislead you).
> 
> With devlink health we get (a) a standard notification on the failure; 
> (b) information/state dump in a (somewhat) structured form, which can be
> collected & shared with vendors; (c) automatic remediation (usually
> device reset of some scope).

It indeed sounds very useful!

> Now regarding the tainting - as I said it may be useful, but don't we
> have to define what constitutes a "firmware crash"?

Yes indeed, I missed clarifying this in the documentation. I'll do so
in my next respin.

> There are many
> failure modes, some perfectly recoverable (e.g. processing queue hang), 
> some mere bugs (e.g. device fails to initialize some functions). All of
> them may impact the functioning of the system. How do we choose those
> that taint? 

It's up to the maintainers of the device driver. What I was aiming for
were those firmware crashes which indeed *can* have an impact on user
experience, and can *even* potentially require a driver removal / addition
to get things back in order again.
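
As a rough illustration of why a taint flag is easier to consume than
scraping driver-specific log messages, a user-space check could look
something like the sketch below. It is not part of this series. The
firmware-crash bit number (18 here) is an assumption for illustration only;
the real value is whatever the series ends up defining. The helper names are
likewise made up for this example.

```python
"""Sketch: check a taint bit in /proc/sys/kernel/tainted from user space."""

TAINTED_PATH = "/proc/sys/kernel/tainted"

# ASSUMPTION: hypothetical bit number for the proposed firmware-crash taint.
# Check the kernel's taint documentation for the actual assignment.
ASSUMED_FW_CRASH_BIT = 18


def taint_bit_set(tainted: int, bit: int) -> bool:
    """Return True if the given bit is set in the taint bitmask."""
    return bool((tainted >> bit) & 1)


def read_tainted(path: str = TAINTED_PATH) -> int:
    """Read the kernel taint bitmask, which is exposed as a decimal integer."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    mask = read_tainted()
    if taint_bit_set(mask, ASSUMED_FW_CRASH_BIT):
        print("firmware crash taint set (assumed bit %d)" % ASSUMED_FW_CRASH_BIT)
    else:
        print("no firmware crash taint (mask=%d)" % mask)
```

The same check works for any driver, since it only looks at one bit in one
file rather than matching per-driver log strings.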

  Luis