linux-kernel - Re: [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e37c0ba6-292d-4afe-a563-a82691b1a8f3@linux.intel.com>
Date:   Wed, 15 Nov 2023 19:57:00 +0800
From:   "Abdul Rahim, Faizal" <faizal.abdul.rahim@...ux.intel.com>
To:     Vladimir Oltean <vladimir.oltean@....com>
Cc:     Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 net 7/7] net/sched: taprio: enable cycle time
 adjustment for current entry



On 9/11/2023 9:18 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:23AM -0500, Faizal Rahim wrote:
>> Handles cycle time adjustments for the current active entry
> 
> Use the imperative mood for commit messages, i.e. "handle".
> 
>> when new admin base time occurs quickly, either within the
>> current entry or the next one.
>>
>> Changes covers:
>> 1. Negative cycle correction or truncation
>> Occurs when the new admin base time falls before the expiry of the
>> current running entry.
>>
>> 2. Positive cycle correction or extension
>> Occurs when the new admin base time falls within the next entry,
>> and the current entry is the cycle's last entry. In this case, the
>> changes in taprio_start_sched() extends the schedule, preventing
>> old oper schedule from resuming and getting truncated in the next
>> advance_sched() call.
>>
>> 3. A new API, update_gate_close_time(), has been created to update
>> the gate_close_time of the current entry in the event of cycle
>> correction.
>>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@...ux.intel.com>
>> ---
>>   net/sched/sch_taprio.c | 72 +++++++++++++++++++++++++++++++-----------
>>   1 file changed, 53 insertions(+), 19 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index c60e9e7ac193..56743754d42e 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -1379,41 +1379,75 @@ static void setup_first_end_time(struct taprio_sched *q,
>>   		rcu_assign_pointer(q->current_entry, NULL);
>>   }
>>   
>> +static void update_gate_close_time(struct sched_entry *current_entry,
>> +				   ktime_t new_end_time,
>> +				   int num_tc)
>> +{
>> +	int tc;
>> +
>> +	for (tc = 0; tc < num_tc; tc++) {
>> +		if (current_entry->gate_mask & BIT(tc))
>> +			current_entry->gate_close_time[tc] = new_end_time;
>> +	}
>> +}
>> +
>>   static void taprio_start_sched(struct Qdisc *sch,
>>   			       ktime_t new_base_time,
>> -			       struct sched_gate_list *new)
>> +			       struct sched_gate_list *admin)
>>   {
>>   	struct taprio_sched *q = qdisc_priv(sch);
>> +	ktime_t expires = hrtimer_get_expires(&q->advance_timer);
>> +	struct net_device *dev = qdisc_dev(q->root);
>> +	struct sched_entry *curr_entry = NULL;
>>   	struct sched_gate_list *oper = NULL;
>> -	ktime_t expires, start;
>>   
>>   	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>>   		return;
>>   
>>   	oper = rcu_dereference_protected(q->oper_sched,
>>   					 lockdep_is_held(&q->current_entry_lock));
>> +	curr_entry = rcu_dereference_protected(q->current_entry,
>> +					       lockdep_is_held(&q->current_entry_lock));
>>   
>> -	expires = hrtimer_get_expires(&q->advance_timer);
>> -	if (expires == 0)
>> -		expires = KTIME_MAX;
>> +	if (hrtimer_active(&q->advance_timer)) {
>> +		oper->cycle_time_correction =
>> +			get_cycle_time_correction(oper, new_base_time,
>> +						  curr_entry->end_time,
>> +						  curr_entry);
>>   
>> -	/* If the new schedule starts before the next expiration, we
>> -	 * reprogram it to the earliest one, so we change the admin
>> -	 * schedule to the operational one at the right time.
>> -	 */
>> -	start = min_t(ktime_t, new_base_time, expires);
>> -
>> -	if (expires != KTIME_MAX &&
>> -	    ktime_compare(start, new_base_time) == 0) {
>> -		/* Since timer was changed to align to the new admin schedule,
>> -		 * setting the variable below to a non-initialized value will
>> -		 * indicate to advance_sched() to call switch_schedules() after
>> -		 * this timer expires.
> 
> I would appreciate not changing things that you've established in
> earlier changes. Try to keep stuff introduced earlier in a form that is
> as close as possible to the final form.
> 

Sorry for that, it occurred because I aimed to split a patch that 
specifically addressed a problem I identified, versus my entire patch set.

I'll keep that feedback in mind going forward.

>> +		if (cycle_corr_active(oper->cycle_time_correction)) {
>> +			/* This is the last entry we are running from oper,
>> +			 * subsequent entry will take from the new admin.
>> +			 */
>> +			ktime_t	now = taprio_get_time(q);
>> +			u64 gate_duration_left = ktime_sub(new_base_time, now);
> 
> What is special about "now" as a moment in time? Gate durations are
> calculated relative to the moment when the sched_entry begins.
> 

My reasoning behind using "now" is that, by the time taprio_start_sched() 
is called, "maybe" the entry has already run—up to 80% of its total 
duration, for example. So, if an extension is needed for this last entry, 
wouldn't it be more accurate to calculate the remaining 20% of the entry 
duration plus the duration to the new base time? That's the idea behind the 
calculation above.

In contrast, in the case of extension in `advance_sched()`, gate durations 
during correction are calculated using the full interval duration. This is 
because the entry hasn't run yet and will run for the entire interval 
duration. Hence, the logic in the code below makes sense for advance_sched():
      u64 new_gate_duration = next->interval + oper->cycle_time_correction;


Let me know what you think, my interpretation of what value gate_duration 
should be, in this scenario, could be wrong.


>> +			struct qdisc_size_table *stab =
>> +				rtnl_dereference(q->root->stab);
> 
> "q->root" is "sch".
> 
>> +			int num_tc = netdev_get_num_tc(dev);
> 
> It would be nice if you could pay some attention to the preferred
> variable declaration style, i.e. longer lines come first. If you cannot
> easily respect that, you could split the variable declarations from
> their initialization.
> 

Hmm, I usually try to follow that that reverse Christmas tree style, but in 
this scenario, ensuring the proper initialization value for the local 
variable seemed more important to me than adhering to that style.

I wasn't aware it was a set-in-stone kind of rule.
I'll make the change accordingly then.

>> +
>> +			oper->cycle_end_time = new_base_time;
>> +			curr_entry->end_time = new_base_time;
>> +			curr_entry->correction_active = true;
>> +
>> +			update_open_gate_duration(curr_entry, oper, num_tc,
>> +						  gate_duration_left);
> 
> Recalculating open gate durations with a cycle time correction seems
> very complicated, at least from this code path. What depends on this?
> The data path only looks at the gate_close_time. Can we get away with
> updating only the gate_close_time?
> 

Actually, it follows a similar logic to the earlier patches:
"net/sched: taprio: fix cycle time adjustment for next entry" and 
"net/sched: taprio: update impacted fields during cycle time adjustment." 
Those patches also involved recalculating gate duration with cycle time 
correction.

Skipping the update for max_open_gate_duration means max_sdu value will be 
incorrect. Neglecting gate_duration means entry->budget[tc] won't get the 
correct update. Both of these affect taprio_dequeue_from_txq() and 
taprio_enqueue().

By the way, weren't all these consequences part of your original concern in 
the comment back in v1?

The only difference in this patch is that it's handling the "current entry" 
instead of the "next entry" - like in advance_sched().

>> +			update_gate_close_time(curr_entry, new_base_time, num_tc);
>> +			taprio_update_queue_max_sdu(q, oper, stab);
>> +			taprio_set_budgets(q, oper, curr_entry);
> 
> There's a lot of duplication between the correction management from
> advance_sched() and the one from taprio_start_sched(). I wonder if some
> of it can go into a common function.
> 

Considered that, but it's not as doable as it seems.

I can't toss all of these into a single function and call it for both 
because there are some differences to navigate.
The common functions being called are taprio_update_queue_max_sdu(), 
update_open_gate_duration(), and taprio_set_budgets().
However, advance_sched() doesn't call update_gate_close_time(), and there 
are some differences in the order of where it's called in advance_sched().

Nevertheless, I'm all ears for suggestion on the common function.

>> +		}
>> +	}
>> +
>> +	if (!hrtimer_active(&q->advance_timer) ||
>> +	    cycle_corr_active(oper->cycle_time_correction)) {
>> +		/* Use new admin base time if :
>> +		 * 1. there's no active oper
>> +		 * 2. there's active oper and we will change to the new admin
>> +		 * schedule after the current entry from oper ends
>>   		 */
>> -		oper->cycle_time_correction = 0;
>> +		expires = new_base_time;
>>   	}
>>   
>> -	hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
>> +	hrtimer_start(&q->advance_timer, expires, HRTIMER_MODE_ABS);
>>   }
>>   
>>   static void taprio_set_picos_per_byte(struct net_device *dev,
>> -- 
>> 2.25.1
>>
>