netdev - Re: [RFC PATCH net-next 6/6] ice: implement devlink parameters to control flash update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0a12dbf7-58be-b0ad-53d7-61748b081b38@intel.com>
Date:   Fri, 10 Jul 2020 13:32:43 -0700
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Jakub Kicinski <kubakici@...pl>
Cc:     netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tom Herbert <tom@...bertland.com>
Subject: Re: [RFC PATCH net-next 6/6] ice: implement devlink parameters to
 control flash update



On 7/10/2020 1:25 PM, Jakub Kicinski wrote:
> On Fri, 10 Jul 2020 10:32:24 -0700 Jacob Keller wrote:
>> On 7/9/2020 5:19 PM, Jakub Kicinski wrote:
>>> On Thu,  9 Jul 2020 14:26:52 -0700 Jacob Keller wrote:  
>>>> The flash update for the ice hardware currently supports a single fixed
>>>> configuration:
>>>>
>>>> * Firmware is always asked to preserve all changeable fields
>>>> * The driver never allows downgrades
>>>> * The driver will not allow canceling a previous update that never
>>>>   completed (for example because an EMP reset never occurred)
>>>> * The driver does not attempt to trigger an EMP reset immediately.
>>>>
>>>> This default mode of operation is reasonable. However, it is often
>>>> useful to allow system administrators more control over the update
>>>> process. To enable this, implement devlink parameters that allow the
>>>> system administrator to specify the desired behaviors:
>>>>
>>>> * 'reset_after_flash_update'
>>>>   If enabled, the driver will request that the firmware immediately
>>>>   trigger an EMP reset when completing the device update. This will
>>>>   result in the device switching active banks immediately and
>>>>   re-initializing with the new firmware.  
>>>
>>> This should probably be handled through a reset API like what
>>> Vasundhara is already working on.
>>
>> Sure. I hadn't seen that work but I'll go take a look.
>>
>>>> * 'allow_downgrade_on_flash_update'
>>>>   If enabled, the driver will attempt to update device flash even when
>>>>   firmware indicates that such an update would be a downgrade.  
>>
>> There is also some trickiness here, because what this parameter does is
>> cause the driver to ignore the firmware version check. I suppose we
>> could just change the default behavior to ignoring that and assume user
>> space will check itself?
> 
> Seems only appropriate to me.
> 
> I assume this is a safety check because downgrades are sometimes
> impossible without factory reset (new FW version makes incompatible
> changes to the NVM params or such)? FWIW that's a terrible user
> experience, best avoided and handled as a exceptional circumstance
> which it should be.
> 
> The defaults should be any FW version can be installed after any FW
> version. Including downgrades, skipping versions etc.
> 
>>>> * 'ignore_pending_flash_update'
>>>>   If enabled, the device driver will cancel a previous pending update.
>>>>   A pending update is one where the steps to write the update to the NVM
>>>>   bank has finished, but the device never reset, as the system had not
>>>>   yet been rebooted.  
>>>
>>> These can be implemented in user space based on the values of running
>>> and stored versions from devlink info.  
>>
>> So, there's some trickiness here. We actually have to perform some steps
>> to cancel an update. Perhaps we should introduce a new option to request
>> that a previous update be cancelled? If we don't tell the firmware to
>> cancel the update, then future update requests will simply fail with
>> some errors.
> 
> Can't it be canceled automatically when user requests a new image to
> be flashed?
> 
> Perhaps best to think about it from the user perspective rather than
> how the internal works. User wants a new FW, they flash it. Next boot -
> the last flashed version should be activated.
> 
> If user wants to "cancel" and upgrade they will most likely flash the
> previous version of the FW.
> 
> Is the pending update/ability to cancel thing also part of the DTMF
> spec?
> 

Sure, I suppose we could simply always cancel if we detect a previous
update.

I'm not sure if it's part of the spec. I mostly focused on the file format.

>>>> * 'flash_update_preservation_level'
>>>>   The value determines the preservation mode to request from firmware,
>>>>   among the following 4 choices:
>>>>   * PRESERVE_ALL (0)
>>>>     Preserve all settings and fields in the NVM configuration
>>>>   * PRESERVE_LIMITED (1)
>>>>     Preserve only a limited set of fields, including the VPD, PCI serial
>>>>     ID, MAC address, etc. This results in permanent settings being
>>>>     reset, including changes to the port configuration, such as the
>>>>     number of physical functions created.
>>>>   * PRESERVE_FACTORY_SETTINGS (2)
>>>>     Reset all configuration fields to the factory default settings
>>>>     stored within the NVM.
>>>>   * PRESERVE_NONE (3)
>>>>     Do not perform any preservation.  
>>>
>>> Could this also be handled in a separate reset API? It seems useful to
>>> be able to reset to factory defaults at any time, not just FW upgrade..
>>
>> I'm not sure. At least the way it's described in the datasheet here is
>> that this must be done during an update. I'll have to look into this
>> further.
>>
>> For the other 3 (I kept preserve none for completeness), these are
>> referring to how much of the settings we preserve when updating to the
>> new image, so I think they only apply at update time.
> 
> Not sure what the difference is between 2 and 3.
> 

I'll ask my colleagues. It is my understanding is currently the following:

0 (ALL) -> keep all of the settings/fields that can be configured within
the flash the same. This includes things like the port configuration
(number of physical functions). This is the default behavior.

1 (SELECTIVE) -> keep only a small subset that includes the static
fields that shouldn't change.

2 (FACTORY) -> as discussed earlier, restores fields from a factory
settings section. AFAIK this is a write-once thing where it is written
at the factory.

3 (NONE) -> just write what is in the flash image, don't preserve anything.

> Not sure differentiating between 0 and 1 matters in practice. Clearly
> users will not do 0 in the field, cause they don't have new IDs assigned
> per product, and don't want to loose the IDs they put in their HW DB.
> 
> 0 is only something a OEM can use, right? OEMs presumably generate the
> image per device to flash the IDs, meaning difference between 0 and 1
> seems to be equivalent to flashing a special OEM FW package vs flashing
> a normal customer FW update...
> 

So, I think 3) would be the case where you want to jsut use what's in
the image, while the diff between 0 and 1 is that 0 will preserve more
settings, while 1 will only preserve the smallest necessary set.

I can ask for further information. This list was given to me as part of
the request.