netdev - Re: [PATCH net v1] net/sched: taprio: fix cycle time extension logic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230602071406.GA31501@linux.intel.com>
Date: Fri, 2 Jun 2023 15:14:06 +0800
From: Tan Tee Min <tee.min.tan@...ux.intel.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@...el.com>,
	netdev@...r.kernel.org, vinicius.gomes@...el.com, kuba@...nel.org,
	pabeni@...hat.com, linux-kernel@...r.kernel.org,
	edumazet@...gle.com
Subject: Re: [PATCH net v1] net/sched: taprio: fix cycle time extension logic

On Tue, May 30, 2023 at 10:47:08PM +0300, Vladimir Oltean wrote:
> On Tue, May 30, 2023 at 04:25:41PM +0800, Muhammad Husaini Zulkifli wrote:
> > From: Tan Tee Min <tee.min.tan@...ux.intel.com>
> > 
> I think that the commit message insufficiently describes the problems
> with the existing code. Also, the incorrect Fixes: tag suggests it may
> even have been incompletely characterized.
> 
> Here are some additional thoughts of mine (also insufficient, since they
> are based on static code analysis, done now) which may nuance things a
> bit more, to change the Fixes tag and the shape of the proposed solution.

Thanks a lot for the reviews!

> 
> Now, taprio_dequeue_from_txq() uses the following "entry" fields:
> gate_close_time[tc] and budget[tc]. They are both recalculated
> incorrectly by advance_sched(), which your change addresses. That is one
> issue which should be described properly, and a patch limited to that.

Do you mean a separate patch to fix the recalculation issue for
gate_close_time[tc] and budget[tc] in advance_sched()?

> 
> so this is why your choice is to also update the cycle_time. An unfortunate
> consequence of that choice is that this might also become transiently visible
> to taprio_dump(), which I'm not totally convinced is desirable - because
> effectively, the cycle time shouldn't have changed. Although, true, the
> old oper schedule is going away soon, we don't really know how soon. The
> cycle times can be arbitrarily large. It would be great if we could save
> the correction into a dedicated "s64 correction" variable (ranging from
> -cycle_time+1 to +cycle_time_extension), and leave the cycle_time alone.

That makes sense. I will save the correction into a dedicated "correction"
variable, and make it use with the cycle time at other places, and not update
the oper cycle_time.

So that the taprio_dump() still shows the original oper cycle_time during the
schedule change pending period.

> 
> But taprio_enqueue() is also a problem, and that isn't addressed. It can be,
> that during a corrected cycle, frames which were oversized due to the
> queueMaxSDU restriction are transiently not oversized anymore, and
> should be allowed to pass - or the other way around (and this is worse):
> a frame which would have passed through a full-size window will not pass
> through a truncated last cycle (negative correction), and taprio_enqueue()
> will not detect this and will not drop the skb.
> 
> taprio_update_queue_max_sdu() would need to be called, and that depends
> on an up-to-date sched->max_open_gate_duration[tc] which isn't currently
> recalculated.
> 
> So, one way or another, taprio_calculate_gate_durations() also needs to
> be called again after a change to the schedule's correction.

Yape, the queueMaxSDU and max_open_gate_durations() both should need to be
updated too. I will correct it in v2 patch by adding in the
taprio_update_queue_max_sdu() and taprio_calculate_gate_durations()
after the schedule change.

> 
> >  net/sched/sch_taprio.c | 32 +++++++++++++++++++++-----------
> >  1 file changed, 21 insertions(+), 11 deletions(-)
> > 
> > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > index 76db9a10ef504..ef487fef83fce 100644
> > --- a/net/sched/sch_taprio.c
> > +++ b/net/sched/sch_taprio.c
> > @@ -99,6 +99,7 @@ struct taprio_sched {
> >  	u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */
> >  	u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */
> >  	u32 txtime_delay;
> > +	bool sched_changed;
> 
> nitpick: sched_change_pending?

Noted. Will change it to sched_change_pending.

>
> The choice of operations seems unintuitive, when you could have simply
> done:
> 
> 	ktime_t correction = ktime_sub(sched_base_time(admin), end_time);
> 
> 	next->interval += correction;
> 	oper->cycle_time += correction;
> 
> 	(or possibly save the correction instead, see the feedback above)

Thanks. Feedback taken.

> I see an issue here: you need to set q->sched_changed = true whenever
> should_change_schedules() returned true and not just for the last entry,
> correct? Because there could be a schedule change which needs to happens
> during entry 2 out of 4 of the current one (truncation case - negative
> correction). In that case, I believe that should_change_schedules()
> keeps shouting at you "change! change! change!" but you only call
> switch_schedules() when you've reached entry 4 with the "next" pointer,
> which is not what the standard suggests should be done.
> 
> IIUC, the standard says that when an admin and an oper sched meet, the
> decision of what to do near the AdminBaseTime - whether to elongate the
> next-to-last cycle of the oper sched or to let the last cycle run but to
> cut it short - depends on the OperCycleTimeExtension. In a nutshell,
> that variable says what is the maximum positive correction applicable to
> the last sched entry and to the cycle time. If a positive correction
> larger than that would be necessary (relative to the next-to-last cycle),
> the decision is to just let the last cycle run for how long it can.

Based on my understanding, here are the two cases when the Oper and Admin
cycle boundaries don’t line up perfectly:-
1/ The final Oper cycle before first Admin cycle is smaller than
   OperCycleTimeExtension:
   - Then extend the final oper cycle rather than restart a very short final
     oper cycle.
2/ The final Oper cycle before first Admin cycle is greater than
   OperCycleTimeExtension:
   - Then it won't extend the final Oper cycle and restart the final Oper
     cycle instead, then it will be truncated at Admin base time.

I think you are saying the scenario 2/ above, right?
Let me rework the solution and come back with the proper fixes.

> I guess at some point we should also fix up this comment?
> 
> 	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
> 	 * how precisely the extension should be made. So after
> 	 * conformance testing, this logic may change.
> 	 */
> 	if (ktime_compare(next_base_time, extension_time) <= 0)
> 		return true;

Agree. Let me fix up this comment in next patch.

Thanks,
Tee Min