netdev - Re: [RFC PATCH net-next 6/6] ice: implement devlink parameters to control flash update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200710132516.24994a33@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Fri, 10 Jul 2020 13:25:16 -0700
From:   Jakub Kicinski <kubakici@...pl>
To:     Jacob Keller <jacob.e.keller@...el.com>
Cc:     netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tom Herbert <tom@...bertland.com>
Subject: Re: [RFC PATCH net-next 6/6] ice: implement devlink parameters to
 control flash update

On Fri, 10 Jul 2020 10:32:24 -0700 Jacob Keller wrote:
> On 7/9/2020 5:19 PM, Jakub Kicinski wrote:
> > On Thu,  9 Jul 2020 14:26:52 -0700 Jacob Keller wrote:  
> >> The flash update for the ice hardware currently supports a single fixed
> >> configuration:
> >>
> >> * Firmware is always asked to preserve all changeable fields
> >> * The driver never allows downgrades
> >> * The driver will not allow canceling a previous update that never
> >>   completed (for example because an EMP reset never occurred)
> >> * The driver does not attempt to trigger an EMP reset immediately.
> >>
> >> This default mode of operation is reasonable. However, it is often
> >> useful to allow system administrators more control over the update
> >> process. To enable this, implement devlink parameters that allow the
> >> system administrator to specify the desired behaviors:
> >>
> >> * 'reset_after_flash_update'
> >>   If enabled, the driver will request that the firmware immediately
> >>   trigger an EMP reset when completing the device update. This will
> >>   result in the device switching active banks immediately and
> >>   re-initializing with the new firmware.  
> > 
> > This should probably be handled through a reset API like what
> > Vasundhara is already working on.
> 
> Sure. I hadn't seen that work but I'll go take a look.
> 
> >> * 'allow_downgrade_on_flash_update'
> >>   If enabled, the driver will attempt to update device flash even when
> >>   firmware indicates that such an update would be a downgrade.  
> 
> There is also some trickiness here, because what this parameter does is
> cause the driver to ignore the firmware version check. I suppose we
> could just change the default behavior to ignoring that and assume user
> space will check itself?

Seems only appropriate to me.

I assume this is a safety check because downgrades are sometimes
impossible without factory reset (new FW version makes incompatible
changes to the NVM params or such)? FWIW that's a terrible user
experience, best avoided and handled as a exceptional circumstance
which it should be.

The defaults should be any FW version can be installed after any FW
version. Including downgrades, skipping versions etc.

> >> * 'ignore_pending_flash_update'
> >>   If enabled, the device driver will cancel a previous pending update.
> >>   A pending update is one where the steps to write the update to the NVM
> >>   bank has finished, but the device never reset, as the system had not
> >>   yet been rebooted.  
> > 
> > These can be implemented in user space based on the values of running
> > and stored versions from devlink info.  
> 
> So, there's some trickiness here. We actually have to perform some steps
> to cancel an update. Perhaps we should introduce a new option to request
> that a previous update be cancelled? If we don't tell the firmware to
> cancel the update, then future update requests will simply fail with
> some errors.

Can't it be canceled automatically when user requests a new image to
be flashed?

Perhaps best to think about it from the user perspective rather than
how the internal works. User wants a new FW, they flash it. Next boot -
the last flashed version should be activated.

If user wants to "cancel" and upgrade they will most likely flash the
previous version of the FW.

Is the pending update/ability to cancel thing also part of the DTMF
spec?

> >> * 'flash_update_preservation_level'
> >>   The value determines the preservation mode to request from firmware,
> >>   among the following 4 choices:
> >>   * PRESERVE_ALL (0)
> >>     Preserve all settings and fields in the NVM configuration
> >>   * PRESERVE_LIMITED (1)
> >>     Preserve only a limited set of fields, including the VPD, PCI serial
> >>     ID, MAC address, etc. This results in permanent settings being
> >>     reset, including changes to the port configuration, such as the
> >>     number of physical functions created.
> >>   * PRESERVE_FACTORY_SETTINGS (2)
> >>     Reset all configuration fields to the factory default settings
> >>     stored within the NVM.
> >>   * PRESERVE_NONE (3)
> >>     Do not perform any preservation.  
> > 
> > Could this also be handled in a separate reset API? It seems useful to
> > be able to reset to factory defaults at any time, not just FW upgrade..
>
> I'm not sure. At least the way it's described in the datasheet here is
> that this must be done during an update. I'll have to look into this
> further.
> 
> For the other 3 (I kept preserve none for completeness), these are
> referring to how much of the settings we preserve when updating to the
> new image, so I think they only apply at update time.

Not sure what the difference is between 2 and 3.

Not sure differentiating between 0 and 1 matters in practice. Clearly
users will not do 0 in the field, cause they don't have new IDs assigned
per product, and don't want to loose the IDs they put in their HW DB.

0 is only something a OEM can use, right? OEMs presumably generate the
image per device to flash the IDs, meaning difference between 0 and 1
seems to be equivalent to flashing a special OEM FW package vs flashing
a normal customer FW update...