[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0a12dbf7-58be-b0ad-53d7-61748b081b38@intel.com>
Date: Fri, 10 Jul 2020 13:32:43 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kubakici@...pl>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Tom Herbert <tom@...bertland.com>
Subject: Re: [RFC PATCH net-next 6/6] ice: implement devlink parameters to
control flash update
On 7/10/2020 1:25 PM, Jakub Kicinski wrote:
> On Fri, 10 Jul 2020 10:32:24 -0700 Jacob Keller wrote:
>> On 7/9/2020 5:19 PM, Jakub Kicinski wrote:
>>> On Thu, 9 Jul 2020 14:26:52 -0700 Jacob Keller wrote:
>>>> The flash update for the ice hardware currently supports a single fixed
>>>> configuration:
>>>>
>>>> * Firmware is always asked to preserve all changeable fields
>>>> * The driver never allows downgrades
>>>> * The driver will not allow canceling a previous update that never
>>>> completed (for example because an EMP reset never occurred)
>>>> * The driver does not attempt to trigger an EMP reset immediately.
>>>>
>>>> This default mode of operation is reasonable. However, it is often
>>>> useful to allow system administrators more control over the update
>>>> process. To enable this, implement devlink parameters that allow the
>>>> system administrator to specify the desired behaviors:
>>>>
>>>> * 'reset_after_flash_update'
>>>> If enabled, the driver will request that the firmware immediately
>>>> trigger an EMP reset when completing the device update. This will
>>>> result in the device switching active banks immediately and
>>>> re-initializing with the new firmware.
>>>
>>> This should probably be handled through a reset API like what
>>> Vasundhara is already working on.
>>
>> Sure. I hadn't seen that work but I'll go take a look.
>>
>>>> * 'allow_downgrade_on_flash_update'
>>>> If enabled, the driver will attempt to update device flash even when
>>>> firmware indicates that such an update would be a downgrade.
>>
>> There is also some trickiness here, because what this parameter does is
>> cause the driver to ignore the firmware version check. I suppose we
>> could just change the default behavior to ignoring that and assume user
>> space will check itself?
>
> Seems only appropriate to me.
>
> I assume this is a safety check because downgrades are sometimes
> impossible without factory reset (new FW version makes incompatible
> changes to the NVM params or such)? FWIW that's a terrible user
> experience, best avoided and handled as a exceptional circumstance
> which it should be.
>
> The defaults should be any FW version can be installed after any FW
> version. Including downgrades, skipping versions etc.
>
>>>> * 'ignore_pending_flash_update'
>>>> If enabled, the device driver will cancel a previous pending update.
>>>> A pending update is one where the steps to write the update to the NVM
>>>> bank has finished, but the device never reset, as the system had not
>>>> yet been rebooted.
>>>
>>> These can be implemented in user space based on the values of running
>>> and stored versions from devlink info.
>>
>> So, there's some trickiness here. We actually have to perform some steps
>> to cancel an update. Perhaps we should introduce a new option to request
>> that a previous update be cancelled? If we don't tell the firmware to
>> cancel the update, then future update requests will simply fail with
>> some errors.
>
> Can't it be canceled automatically when user requests a new image to
> be flashed?
>
> Perhaps best to think about it from the user perspective rather than
> how the internal works. User wants a new FW, they flash it. Next boot -
> the last flashed version should be activated.
>
> If user wants to "cancel" and upgrade they will most likely flash the
> previous version of the FW.
>
> Is the pending update/ability to cancel thing also part of the DTMF
> spec?
>
Sure, I suppose we could simply always cancel if we detect a previous
update.
I'm not sure if it's part of the spec. I mostly focused on the file format.
>>>> * 'flash_update_preservation_level'
>>>> The value determines the preservation mode to request from firmware,
>>>> among the following 4 choices:
>>>> * PRESERVE_ALL (0)
>>>> Preserve all settings and fields in the NVM configuration
>>>> * PRESERVE_LIMITED (1)
>>>> Preserve only a limited set of fields, including the VPD, PCI serial
>>>> ID, MAC address, etc. This results in permanent settings being
>>>> reset, including changes to the port configuration, such as the
>>>> number of physical functions created.
>>>> * PRESERVE_FACTORY_SETTINGS (2)
>>>> Reset all configuration fields to the factory default settings
>>>> stored within the NVM.
>>>> * PRESERVE_NONE (3)
>>>> Do not perform any preservation.
>>>
>>> Could this also be handled in a separate reset API? It seems useful to
>>> be able to reset to factory defaults at any time, not just FW upgrade..
>>
>> I'm not sure. At least the way it's described in the datasheet here is
>> that this must be done during an update. I'll have to look into this
>> further.
>>
>> For the other 3 (I kept preserve none for completeness), these are
>> referring to how much of the settings we preserve when updating to the
>> new image, so I think they only apply at update time.
>
> Not sure what the difference is between 2 and 3.
>
I'll ask my colleagues. It is my understanding is currently the following:
0 (ALL) -> keep all of the settings/fields that can be configured within
the flash the same. This includes things like the port configuration
(number of physical functions). This is the default behavior.
1 (SELECTIVE) -> keep only a small subset that includes the static
fields that shouldn't change.
2 (FACTORY) -> as discussed earlier, restores fields from a factory
settings section. AFAIK this is a write-once thing where it is written
at the factory.
3 (NONE) -> just write what is in the flash image, don't preserve anything.
> Not sure differentiating between 0 and 1 matters in practice. Clearly
> users will not do 0 in the field, cause they don't have new IDs assigned
> per product, and don't want to loose the IDs they put in their HW DB.
>
> 0 is only something a OEM can use, right? OEMs presumably generate the
> image per device to flash the IDs, meaning difference between 0 and 1
> seems to be equivalent to flashing a special OEM FW package vs flashing
> a normal customer FW update...
>
So, I think 3) would be the case where you want to jsut use what's in
the image, while the diff between 0 and 1 is that 0 will preserve more
settings, while 1 will only preserve the smallest necessary set.
I can ask for further information. This list was given to me as part of
the request.
Powered by blists - more mailing lists