netdev - Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle


Date: Wed, 15 Nov 2023 19:55:53 +0800
From: "Abdul Rahim, Faizal" <faizal.abdul.rahim@...ux.intel.com>
To: Vladimir Oltean <vladimir.oltean@....com>,
 Vinicius Costa Gomes <vinicius.gomes@...el.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
 <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
 "David S . Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of
 cycle_time and interval

On 9/11/2023 7:11 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to specific flow like find_entry_to_transmit() -> get_interval_end_time()
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
> 
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
> 
> static u32 get_interval(const struct sched_entry *entry,
> 			const struct sched_gate_list *oper)
> {
> 	if (entry->correction_active)
> 		return entry->interval + oper->cycle_time_correction;
> 
> 	return entry->interval;
> }
> 
> What if the schedule looks like this:
> 
> 	sched-entry S 0x01 125000000
> 	sched-entry S 0x02 125000000
> 	sched-entry S 0x04 125000000
> 	sched-entry S 0x08 125000000
> 	sched-entry S 0x10 125000000
> 	sched-entry S 0x20 125000000
> 	sched-entry S 0x40 125000000
> 	sched-entry S 0x80 125000000
> 
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is −75000000 ns (actually reported as a u32 positive
> integer, so it would be a completely bogus value).
> 
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
> 

Just an FYI: my cycle time extension test, which sends packets, fails
unless the interval and cycle_time are updated; the cycle duration
doesn't extend properly. I only observe the proper extension when this
patch is included.

In patch series v1, interval and cycle_time were updated directly. However, 
due to concerns in v1 comments about updating the fields directly, v2 
doesn't do that.

Regarding the concern about a negative correction exceeding the interval
value: I've checked the logic in get_cycle_time_correction() that sets
cycle_time_correction, and I don't see how this could happen. Still, if
it does, it points to an error much earlier than the get_interval()
call. So I propose adding a failure check in
get_cycle_time_correction(): if the correction is negative and consumes
the entire entry interval or more, we set cycle_time_correction to some
arbitrary negative value, maybe half of the interval, just to mitigate
the impact of the unknown error that occurred earlier.
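
Roughly, the check could look like the sketch below. This is only an
illustration: sanitize_cycle_time_correction() is a hypothetical name,
the s64 type of the correction is an assumption, and in the real code
the logic would sit inside get_cycle_time_correction() rather than in a
separate helper.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef int64_t s64;

/*
 * Hypothetical helper: if a negative correction would consume the whole
 * interval (or more), fall back to shortening the entry by half of its
 * interval; otherwise return the correction unchanged.
 */
static s64 sanitize_cycle_time_correction(u32 interval, s64 correction)
{
	if (correction <= -(s64)interval)
		return -(s64)(interval / 2);

	return correction;
}
```

For a 125000000 ns entry, a correction of -200000000 would be replaced
by -62500000, while a correction of -1000 or any positive correction
would pass through untouched.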

What do you think?

> I'm looking at where we need get_interval(), and it's from:
> 
> taprio_enqueue_one()
> -> is_valid_interval()
>     -> find_entry_to_transmit()
>        -> get_interval_end_time()
> -> get_packet_txtime()
>     -> find_entry_to_transmit()
> 
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily intended for advance_sched() and
> not for enqueue())?
> 
> Vinicius, do you know?