Message-ID: <6b97b131-65a2-e6d0-779e-d8ab31d5c0ae@intel.com>
Date: Mon, 3 Feb 2020 11:32:36 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: netdev@...r.kernel.org, valex@...lanox.com, linyunsheng@...wei.com,
lihong.yang@...el.com
Subject: Re: [PATCH 03/15] devlink: add operation to take an immediate
snapshot
On 2/3/2020 3:50 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:58PM CET, jacob.e.keller@...el.com wrote:
>> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
>> command is intended to enable userspace to request an immediate snapshot
>> of a region.
>>
>> Regions can enable support for requestable snapshots by implementing the
>> snapshot callback function in the region's devlink_region_ops structure.
>>
>> Implementations of this function callback should capture an immediate
>> copy of the data and return it and its destructor in the function
>> parameters. The core devlink code will generate a snapshot ID and create
>> the new snapshot while holding the devlink instance lock.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@...el.com>
>> ---
>> .../networking/devlink/devlink-region.rst | 9 +++-
>> include/net/devlink.h | 7 +++
>> include/uapi/linux/devlink.h | 2 +
>> net/core/devlink.c | 46 +++++++++++++++++++
>> 4 files changed, 62 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>> index 1a7683e7acb2..262249e6c3fc 100644
>> --- a/Documentation/networking/devlink/devlink-region.rst
>> +++ b/Documentation/networking/devlink/devlink-region.rst
>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>> Regions may also be used to provide an additional way to debug complex error
>> states, but see also :doc:`devlink-health`
>>
>> +Regions may optionally support capturing a snapshot on demand via the
>> +``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
>> +allow requested snapshots must implement the ``.snapshot`` callback for the
>> +region in its ``devlink_region_ops`` structure.
>> +
>> example usage
>> -------------
>>
>> @@ -40,8 +45,8 @@ example usage
>> # Delete a snapshot using:
>> $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>
>> - # Trigger (request) a snapshot be taken:
>> - $ devlink region trigger pci/0000:00:05.0/cr-space
>> + # Request an immediate snapshot, if supported by the region
>> + $ devlink region snapshot pci/0000:00:05.0/cr-space
>
>
> Hmm, the snapshot is now removed by user calling:
>
> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
> That is using DEVLINK_CMD_REGION_DEL netlink command calling
> devlink_nl_cmd_region_del()
>
> I think the creation should be symmetric. Something like:
> $ devlink region add DEV/REGION snapshot SNAPSHOT_ID
> SNAPSHOT_ID is either exact number or "any" if user does not care.
> The benefit of using user-passed ID value is that you can use this
> easily in scripts.
>
> The existing unused netlink command DEVLINK_CMD_REGION_NEW would be used
> for this.
>
So I have some concerns about allowing the user to pick the snapshot ID.
I agree it is useful, but I want to make sure we pick the best design for
handling it.

Currently, a single snapshot ID can be used to take snapshots across
multiple regions. This means that the snapshot ID counter is stored per
devlink instance instead of per region.

If users can pick IDs, the set of used IDs can (and probably will) become
sparse, so we need to handle that. If a user picks an ID, we must make
sure the global snapshot ID counter is advanced past it so that
automatically generated IDs skip the used values; otherwise they could
accidentally collide. The simplest solution is to force the global ID to
be at least one larger than any ID that userspace passes to us.

But what happens if a user requests a really large ID, such as
U32_MAX - 1? The global counter is pushed to the top of its range, and
automatic ID generation overflows almost immediately. Overflow used to be
a rare occurrence, but it could now become common.

We could require the user to pick IDs within a limited range, and have
automatically generated IDs come from a separate range.
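
The split-range idea can be sketched the same way, again as a userspace C model. The boundary USER_ID_MAX (0x7fffffff here) is an arbitrary assumption, and user_id_valid() and split_next_auto() are hypothetical helpers. User-chosen IDs never move the automatic counter, so a large user ID can no longer push automatic allocation toward overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of the 32-bit ID space (boundary chosen arbitrarily):
 *   user-chosen IDs:      [USER_ID_MIN, USER_ID_MAX]
 *   automatic IDs:        (USER_ID_MAX, UINT32_MAX]
 * ID 0 is left unused in this model. */
#define USER_ID_MIN 1u
#define USER_ID_MAX 0x7fffffffu

struct split_ids {
	uint32_t last_auto;	/* must start at USER_ID_MAX */
};

/* Reject user-requested IDs outside the user range. */
static bool user_id_valid(uint32_t id)
{
	return id >= USER_ID_MIN && id <= USER_ID_MAX;
}

/* Hand out the next automatic ID from the upper range only. */
static bool split_next_auto(struct split_ids *ids, uint32_t *out)
{
	assert(ids->last_auto >= USER_ID_MAX);
	if (ids->last_auto == UINT32_MAX)
		return false;
	*out = ++ids->last_auto;
	return true;
}
```

A user-chosen ID would still need a check against the region's existing snapshots, since two user requests can pick the same value.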

Alternatively, we could change ID selection to pick the lowest number not
used by any region. This would allow reusing ID numbers after their
snapshots have been deleted. I think this approach is the most robust,
but it does require a bit of extra computation.
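
The "lowest unused ID" selection can be sketched as a scan over the IDs of every existing snapshot in all regions of one devlink instance. This is a hypothetical userspace model: id_in_use() and lowest_unused_id() are illustrative names, and the IDs are assumed to have been collected into a flat array first. A kernel version would presumably walk each region's existing snapshots instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if @id appears in used[0..n). */
static bool id_in_use(const uint32_t *used, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (used[i] == id)
			return true;
	return false;
}

/* Return the lowest ID >= 1 not used by any snapshot in any region.
 * @used may contain duplicates, since regions can share an ID.
 * n IDs cannot cover all of 1..n+1, so a free ID is found within n+1
 * candidates. Returns 0 only if the whole 32-bit space is in use. */
static uint32_t lowest_unused_id(const uint32_t *used, size_t n)
{
	assert(used != NULL || n == 0);
	for (uint64_t id = 1; id <= UINT32_MAX; id++) {
		if (!id_in_use(used, n, (uint32_t)id))
			return (uint32_t)id;
	}
	return 0;
}
```

With used IDs {1, 3}, for example, the helper returns 2, so the ID of a deleted snapshot is reused. The scan is O(n^2) in the number of snapshots; sorting the IDs first or keeping a bitmap would make it cheaper.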
Thanks,
Jake