[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200301085756.GS26061@nanopsycho>
Date: Sun, 1 Mar 2020 09:57:56 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, davem@...emloft.net, saeedm@...lanox.com,
leon@...nel.org, michael.chan@...adcom.com, vishal@...lsio.com,
jeffrey.t.kirsher@...el.com, idosch@...lanox.com,
aelior@...vell.com, peppe.cavallaro@...com,
alexandre.torgue@...com, jhs@...atatu.com,
xiyou.wangcong@...il.com, pablo@...filter.org,
ecree@...arflare.com, mlxsw@...lanox.com
Subject: Re: [patch net-next v2 12/12] sched: act: allow user to specify type
of HW stats for a filter
Sat, Feb 29, 2020 at 09:14:52PM CET, kuba@...nel.org wrote:
>On Sat, 29 Feb 2020 08:52:09 +0100 Jiri Pirko wrote:
>> Fri, Feb 28, 2020 at 08:59:23PM CET, kuba@...nel.org wrote:
>> >On Fri, 28 Feb 2020 18:25:05 +0100 Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@...lanox.com>
>> >> +/* tca HW stats type */
>> >> +enum tca_act_hw_stats_type {
>> >> + TCA_ACT_HW_STATS_TYPE_ANY, /* User does not care, it's default
>> >> + * when user does not pass the attr.
>> >> + * Instructs the driver that user does not
>> >> + * care if the HW stats are "immediate"
>> >> + * or "delayed".
>> >> + */
>> >> + TCA_ACT_HW_STATS_TYPE_IMMEDIATE, /* Means that in dump, user gets
>> >> + * the current HW stats state from
>> >> + * the device queried at the dump time.
>> >> + */
>> >> + TCA_ACT_HW_STATS_TYPE_DELAYED, /* Means that in dump, user gets
>> >> + * HW stats that might be out of date
>> >> + * for some time, maybe couple of
>> >> + * seconds. This is the case when driver
>> >> + * polls stats updates periodically
>> >> + * or when it gets async stats update
>> >> + * from the device.
>> >> + */
>> >> + TCA_ACT_HW_STATS_TYPE_DISABLED, /* User is not interested in getting
>> >> + * any HW statistics.
>> >> + */
>> >> +};
>> >
>> >On the ABI I wonder if we can redefine it a little bit..
>> >
>> >Can we make the stat types into a bitfield?
>> >
>> >On request:
>> > - no attr -> any stats allowed but some stats must be provided *
>> > - 0 -> no stats requested / disabled
>> > - 0x1 -> must be stat type0
>> > - 0x6 -> stat type1 or stat type2 are both fine
>>
>> I was thinking about this of course. On the write side, this is ok
>> however, this is very tricky on read side. See below.
>>
>> >* no attr kinda doesn't work 'cause u32 offload has no stats and this
>> > is action-level now, not flower-level :S What about u32 and matchall?
>>
>> The fact that cls does not implement stats offloading is a lack of
>> feature of the particular cls.
>
>Yeah, I wonder how that squares with strict netlink parsing.
>
>> >We can add a separate attribute with "active" stat types:
>> > - no attr -> old kernel
>> > - 0 -> no stats are provided / stats disabled
>> > - 0x1 -> only stat type0 is used by drivers
>> > - 0x6 -> at least one driver is using type1 and one type2
>>
>> There are 2 problems:
>> 1) There is a mismatch between write and read. User might pass different
>> value than it eventually gets from kernel. I guess this might be fine.
>
>Separate attribute would work.
>
>> 2) Much bigger problem is, that since the same action may be offloaded
>> by multiple drivers, the read would have to provide an array of
>> bitfields, each array item would represent one offloaded driver. That is
>> why I decided for simple value instead of bitfield which is the same on
>> write and read.
>
>Why an array? The counter itself is added up from all the drivers.
>If the value is a bitfield all drivers can just OR-in their type.
Yeah, for uapi. Internally the array would be still needed. Also the
driver would need to somehow "write-back" the value to the offload
caller and someone (caller/tc) would have to use the array to track
these bitfields for individual callbacks (probably idr of some sort).
I don't know, is this excercise worth it?
Seems to me like we are overengineering this one a bit.
Also there would be no "any" it would be type0|type1|type2 the user
would have to pass. If new type appears, the userspace would have to be
updated to do "any" again :/ This is inconvenient.
>
>> >That assumes that we may one day add another stat type which would
>> >not be just based on the reporting time.
>> >
>> >If we only foresee time-based reporting would it make sense to turn
>> >the attribute into max acceptable delay in ms?
>> >
>> >0 -> only immediate / blocking stats
>> >(0, MAX) -> given reporting delay in ms is acceptable
>> >MAX -> don't care about stats at all
>>
>> Interesting, is this "delayed" granularity something that has a usecase?
>> It might turn into a guessing game between user and driver during action
>> insertion :/
>
>Yeah, I don't like the guessing part too, worst case refresh time may
>be system dependent.
>
>With just "DELAYED" I'm worried users will think the delay may be too
>long for OvS. Or simply poll the statistics more often than the HW
>reports them, which would be pointless.
>
>For the latter case I guess the best case refresh time is needed,
>while the former needs worst case. Hopefully the two are not too far
>apart.
>
>Maybe some day drivers may also tweak the refresh rate based on user
>requests to save PCIe bandwidth and CPU..
>
>Anyway.. maybe its not worth it today.
>
>> >> +tcf_flow_action_hw_stats_type(enum tca_act_hw_stats_type hw_stats_type)
>> >> +{
>> >> + switch (hw_stats_type) {
>> >> + default:
>> >> + WARN_ON(1);
>> >> + /* fall-through */
>> >
>> >without the policy change this seems user-triggerable
>>
>> Nope. tcf_action_hw_stats_type_get() takes care of setting
>> TCA_ACT_HW_STATS_TYPE_ANY when no attr is passed.
>
>I meant attribute is present but carries a large value.
Ah, will sanitize this.
>
>> >> + case TCA_ACT_HW_STATS_TYPE_ANY:
>> >> + return FLOW_ACTION_HW_STATS_TYPE_ANY;
>> >> + case TCA_ACT_HW_STATS_TYPE_IMMEDIATE:
>> >> + return FLOW_ACTION_HW_STATS_TYPE_IMMEDIATE;
>> >> + case TCA_ACT_HW_STATS_TYPE_DELAYED:
>> >> + return FLOW_ACTION_HW_STATS_TYPE_DELAYED;
>> >> + case TCA_ACT_HW_STATS_TYPE_DISABLED:
>> >> + return FLOW_ACTION_HW_STATS_TYPE_DISABLED;
>
Powered by blists - more mailing lists