linux-kernel - Re: [EXT] [PATCH 09/15] qed: use new module_firmware

Message-ID: <2aaddb69-2292-ff3f-94c7-0ab9dbc8e53c@marvell.com>
Date:   Sat, 9 May 2020 09:32:51 +0300
From:   Igor Russkikh <irusskikh@...vell.com>
To:     Luis Chamberlain <mcgrof@...nel.org>, <jeyu@...nel.org>
CC:     <akpm@...ux-foundation.org>, <arnd@...db.de>,
        <rostedt@...dmis.org>, <mingo@...hat.com>, <aquini@...hat.com>,
        <cai@....pw>, <dyoung@...hat.com>, <bhe@...hat.com>,
        <peterz@...radead.org>, <tglx@...utronix.de>,
        <gpiccoli@...onical.com>, <pmladek@...e.com>, <tiwai@...e.de>,
        <schlad@...e.de>, <andriy.shevchenko@...ux.intel.com>,
        <keescook@...omium.org>, <daniel.vetter@...ll.ch>,
        <will@...nel.org>, <mchehab+samsung@...nel.org>,
        <kvalo@...eaurora.org>, <davem@...emloft.net>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        Ariel Elior <aelior@...vell.com>,
        GR-everest-linux-l2 <GR-everest-linux-l2@...vell.com>
Subject: Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()


> This makes use of the new module_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.
> 
> Using a taint flag allows us to annotate when this happens clearly.
> 
> Cc: Ariel Elior <aelior@...vell.com>
> Cc: GR-everest-linux-l2@...vell.com
> Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> index f4eebaabb6d0..9cc6287b889b 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
>  						 REGDUMP_HEADER_SIZE,
>  						 &feature_size);
>  		if (!rc) {
> +			module_firmware_crashed();
>  			*(u32 *)((u8 *)buffer + offset) =
>  			    qed_calc_regdump_header(cdev,
> PROTECTION_OVERRIDE,
>  						    cur_engine,
> @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
>  		rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset +
>  					REGDUMP_HEADER_SIZE,
> &feature_size);
>  		if (!rc) {
> +			module_firmware_crashed();
>  			*(u32 *)((u8 *)buffer + offset) =
>  			    qed_calc_regdump_header(cdev, FW_ASSERTS,
>  						    cur_engine,
> feature_size,
> @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> *buffer)
>  		rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
>  				 REGDUMP_HEADER_SIZE, &feature_size);
>  		if (!rc) {
> +			module_firmware_crashed();
>  			*(u32 *)((u8 *)buffer + offset) =
>  			    qed_calc_regdump_header(cdev, GRC_DUMP,
>  						    cur_engine,


Hi Luis,

qed_dbg_all_data() is used to gather a debug dump from the device. Failures
inside it may happen for various reasons, but they normally do not indicate
a FW failure.

So I think it's not a good place to insert this call.

It's hard to find an exact good place to insert it in qed.

One more thing is that, AFAIU, the taint flag stays set permanently in the
kernel, but our device, for example, can recover by itself from some FW
crashes, so such a crash would be transparent to the user.
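
To illustrate the distinction, here is a minimal userspace sketch, not qed
code. All names in it (fw_event, mark_firmware_crashed, handle_fw_event,
is_tainted) are hypothetical and only model a sticky taint bit that is set
on an unrecoverable FW crash, and left alone on a successful debug dump or a
crash the device recovered from by itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event types; none of these names exist in qed or the kernel. */
enum fw_event {
	FW_EVENT_DUMP_OK,		/* debug dump collected, says nothing about FW */
	FW_EVENT_CRASH_RECOVERED,	/* FW crashed, device recovered by itself */
	FW_EVENT_CRASH_FATAL,		/* FW crashed, device needs reload or reboot */
};

/* Models a taint bit: once set, nothing in this sketch clears it. */
static bool tainted;

/* Hypothetical stand-in for a call like module_firmware_crashed(). */
static void mark_firmware_crashed(void)
{
	tainted = true;
}

static bool is_tainted(void)
{
	return tainted;
}

/* Taint only when the crash is visible to the user and not recoverable. */
static void handle_fw_event(enum fw_event ev)
{
	assert(ev >= FW_EVENT_DUMP_OK && ev <= FW_EVENT_CRASH_FATAL);

	switch (ev) {
	case FW_EVENT_DUMP_OK:
	case FW_EVENT_CRASH_RECOVERED:
		/* Nothing the user needs to know about permanently. */
		break;
	case FW_EVENT_CRASH_FATAL:
		mark_firmware_crashed();
		break;
	}
}
```

With this policy, calling handle_fw_event() with FW_EVENT_DUMP_OK or
FW_EVENT_CRASH_RECOVERED leaves is_tainted() false, and only
FW_EVENT_CRASH_FATAL sets it. Whether that matches the intended meaning of
the flag is exactly the question below.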

What's the logical purpose of module_firmware_crashed()? Does it mean a fatal,
unrecoverable error on the device?

Thanks,
  Igor