netdev - Re: [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 15 Nov 2023 19:55:34 +0800
From: "Abdul Rahim, Faizal" <faizal.abdul.rahim@...ux.intel.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: Vinicius Costa Gomes <vinicius.gomes@...el.com>,
 Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
 Jiri Pirko <jiri@...nulli.us>, "David S . Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 net 3/7] net/sched: taprio: update impacted fields
 during cycle time adjustment



On 9/11/2023 7:41 am, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:19AM -0500, Faizal Rahim wrote:
>> Update impacted fields in advance_sched() if cycle_corr_active()
>> is true, which indicates that the next entry is the last entry
>> from oper that it will run.
>>
>> Update impacted fields:
>>
>> 1. gate_duration[tc], max_open_gate_duration[tc]
>> Created a new API update_open_gate_duration().The API sets the
>> duration based on the last remaining entry, the original value
>> was based on consideration of multiple entries.
>>
>> 2. gate_close_time[tc]
>> Update next entry gate close time according to the new admin
>> base time
>>
>> 3. max_sdu[tc], budget[tc]
>> Restrict from setting to max value because there's only a single
>> entry left to run from oper before changing to the new admin
>> schedule, so we shouldn't set to max.
>>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@...ux.intel.com>
> 
> The commit message shouldn't be a text-to-speech output of the commit body.
> Say very shortly how the system should behave, what's wrong such that it
> doesn't behave as expected, what's the user-visible impact of the bug,
> and try to identify why the bug happened.
> 
> In this case, what happened is that commit a306a90c8ffe ("net/sched:
> taprio: calculate tc gate durations"), which introduced the impacted
> fields you are changing, never took dynamic schedule changes into
> consideration. So this commit should also appear in the Fixes: tag.

Got it.

>> ---
>>   net/sched/sch_taprio.c | 49 +++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 46 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index ed32654b46f5..119dec3bbe88 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -288,7 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q,
>>   		/* TC gate never closes => keep the queueMaxSDU
>>   		 * selected by the user
>>   		 */
>> -		if (sched->max_open_gate_duration[tc] == sched->cycle_time) {
>> +		if (sched->max_open_gate_duration[tc] == sched->cycle_time &&
>> +		    !cycle_corr_active(sched->cycle_time_correction)) {
>>   			max_sdu_dynamic = U32_MAX;
>>   		} else {
>>   			u32 max_frm_len;
>> @@ -684,7 +685,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
>>   
>>   	for (tc = 0; tc < num_tc; tc++) {
>>   		/* Traffic classes which never close have infinite budget */
>> -		if (entry->gate_duration[tc] == sched->cycle_time)
>> +		if (entry->gate_duration[tc] == sched->cycle_time &&
>> +		    !cycle_corr_active(sched->cycle_time_correction))
>>   			budget = INT_MAX;
>>   		else
>>   			budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
>> @@ -896,6 +898,32 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
>>   	return false;
>>   }
>>   
>> +/* Open gate duration were calculated at the beginning with consideration of
>> + * multiple entries. If cycle time correction is active, there's only a single
>> + * remaining entry left from oper to run.
>> + * Update open gate duration based on this last entry.
>> + */
>> +static void update_open_gate_duration(struct sched_entry *entry,
>> +				      struct sched_gate_list *oper,
>> +				      int num_tc,
>> +				      u64 open_gate_duration)
>> +{
>> +	int tc;
>> +
>> +	if (!entry || !oper)
>> +		return;
>> +
>> +	for (tc = 0; tc < num_tc; tc++) {
>> +		if (entry->gate_mask & BIT(tc)) {
>> +			entry->gate_duration[tc] = open_gate_duration;
>> +			oper->max_open_gate_duration[tc] = open_gate_duration;
>> +		} else {
>> +			entry->gate_duration[tc] = 0;
>> +			oper->max_open_gate_duration[tc] = 0;
>> +		}
>> +	}
>> +}
>> +
>>   static bool should_change_sched(struct sched_gate_list *oper)
>>   {
>>   	bool change_to_admin_sched = false;
>> @@ -1010,13 +1038,28 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>>   			/* The next entry is the last entry we will run from
>>   			 * oper, subsequent ones will take from the new admin
>>   			 */
>> +			u64 new_gate_duration =
>> +				next->interval + oper->cycle_time_correction;
>> +			struct qdisc_size_table *stab =
>> +				rtnl_dereference(q->root->stab);
>> +
> 
> The lockdep annotation for this RCU accessor is bogus.
> rtnl_dereference() is the same as rcu_dereference_protected(..., lockdep_rtnl_is_held()),
> which cannot be true in a hrtimer callback, as the rtnetlink lock is a
> sleepable mutex and hrtimers run in atomic context.
> 
> Running with lockdep enabled will tell you as much:
> 
> $ ./test_taprio_cycle_extension.sh
> Testing config change with a delay of 5250000000 ns between schedules
> [  100.734925]
> [  100.736703] =============================
> [  100.740780] WARNING: suspicious RCU usage
> [  100.744857] 6.6.0-10114-gca572939947f #1495 Not tainted
> [  100.750162] -----------------------------
> [  100.754236] net/sched/sch_taprio.c:1064 suspicious rcu_dereference_protected() usage!
> [  100.762155]
> [  100.762155] other info that might help us debug this:
> [  100.762155]
> [  100.770242]
> [  100.770242] rcu_scheduler_active = 2, debug_locks = 1
> [  100.776851] 1 lock held by swapper/0/0:
> [  100.780756]  #0: ffff3d9784b83b00 (&q->current_entry_lock){-...}-{3:3}, at: advance_sched+0x44/0x59c
> [  100.790099]
> [  100.790099] stack backtrace:
> [  100.794477] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-10114-gca572939947f #1495
> [  100.802346] Hardware name: LS1028A RDB Board (DT)
> [  100.807072] Call trace:
> [  100.809531]  dump_backtrace+0xf4/0x140
> [  100.813305]  show_stack+0x18/0x2c
> [  100.816638]  dump_stack_lvl+0x60/0x80
> [  100.820321]  dump_stack+0x18/0x24
> [  100.823654]  lockdep_rcu_suspicious+0x170/0x210
> [  100.828210]  advance_sched+0x384/0x59c
> [  100.831978]  __hrtimer_run_queues+0x200/0x430
> [  100.836360]  hrtimer_interrupt+0xdc/0x39c
> [  100.840392]  arch_timer_handler_phys+0x3c/0x4c
> [  100.844862]  handle_percpu_devid_irq+0xb8/0x28c
> [  100.849417]  generic_handle_domain_irq+0x2c/0x44
> [  100.854060]  gic_handle_irq+0x4c/0x110
> [  100.857830]  call_on_irq_stack+0x24/0x4c
> [  100.861775]  el1_interrupt+0x74/0xc0
> [  100.865370]  el1h_64_irq_handler+0x18/0x24
> [  100.869489]  el1h_64_irq+0x64/0x68
> [  100.872909]  arch_local_irq_enable+0x8/0xc
> [  100.877032]  cpuidle_enter+0x38/0x50
> [  100.880629]  do_idle+0x1ec/0x280
> [  100.883877]  cpu_startup_entry+0x34/0x38
> [  100.887822]  kernel_init+0x0/0x1a0
> [  100.891245]  start_kernel+0x0/0x3b0
> [  100.894756]  start_kernel+0x2f8/0x3b0
> [  100.898439]  __primary_switched+0xbc/0xc4
> 
> What I would do is:
> 
> 			struct qdisc_size_table *stab;
> 
> 			rcu_read_lock();
> 			stab = rcu_dereference(q->root->stab);
> 			taprio_update_queue_max_sdu(q, oper, stab);
> 			rcu_read_unlock();
> 
>>   			oper->cycle_end_time = new_base_time;
>>   			end_time = new_base_time;
>> +
>> +			update_open_gate_duration(next, oper, num_tc,
>> +						  new_gate_duration);
>> +			taprio_update_queue_max_sdu(q, oper, stab);
>>   		}
>>   	}
>>   
>>   	for (tc = 0; tc < num_tc; tc++) {
>> -		if (next->gate_duration[tc] == oper->cycle_time)
>> +		if (cycle_corr_active(oper->cycle_time_correction) &&
>> +		    (next->gate_mask & BIT(tc)))
>> +			/* Set to the new base time, ensuring a smooth transition
>> +			 * to the new schedule when the next entry finishes.
>> +			 */
>> +			next->gate_close_time[tc] = end_time;
>> +		else if (next->gate_duration[tc] == oper->cycle_time)
>>   			next->gate_close_time[tc] = KTIME_MAX;
>>   		else
>>   			next->gate_close_time[tc] = ktime_add_ns(entry->end_time,
>> -- 
>> 2.25.1
>>
> 

Will update, thanks.