lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200509113546.7dcd1599@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Sat, 9 May 2020 11:35:46 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Luis Chamberlain <mcgrof@...nel.org>, Jiri Pirko <jiri@...nulli.us>
Cc:     jeyu@...nel.org, akpm@...ux-foundation.org, arnd@...db.de,
        rostedt@...dmis.org, mingo@...hat.com, aquini@...hat.com,
        cai@....pw, dyoung@...hat.com, bhe@...hat.com,
        peterz@...radead.org, tglx@...utronix.de, gpiccoli@...onical.com,
        pmladek@...e.com, tiwai@...e.de, schlad@...e.de,
        andriy.shevchenko@...ux.intel.com, keescook@...omium.org,
        daniel.vetter@...ll.ch, will@...nel.org,
        mchehab+samsung@...nel.org, kvalo@...eaurora.org,
        davem@...emloft.net, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/15] net: taint when the device driver firmware
 crashes

On Sat,  9 May 2020 04:35:37 +0000 Luis Chamberlain wrote:
> Device driver firmware can crash, and sometimes, this can leave your
> system in a state which makes the device or subsystem completely
> useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> of scraping some magical words from the kernel log, which is driver
> specific, is much easier. So instead this series provides a helper which
> lets drivers annotate this and shows how to use this on networking
> drivers.
> 
> My methodology for finding when firmware crashes is to git grep for
> "crash" and then doing some study of the code to see if this indeed
> a place where the firmware crashes. In some places this is quite
> obvious.
> 
> I'm starting off with networking first, if this gets merged later on I
> can focus on the other drivers, but I already have some work done on
> other subsytems.
> 
> Review, flames, etc are greatly appreciated.

Tainting itself may be useful, but that's just the first step. I'd much
rather see folks start using the devlink health infrastructure. Devlink
is netlink based, but it's _not_ networking specific (many of its
optional features obviously are, but don't let that mislead you).

With devlink health we get (a) a standard notification on the failure; 
(b) information/state dump in a (somewhat) structured form, which can be
collected & shared with vendors; (c) automatic remediation (usually
device reset of some scope).

Now regarding the tainting - as I said it may be useful, but don't we
have to define what constitutes a "firmware crash"? There are many
failure modes, some perfectly recoverable (e.g. processing queue hang), 
some mere bugs (e.g. device fails to initialize some functions). All of
them may impact the functioning of the system. How do we choose those
that taint? 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ