[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r1jbv6jc.ffs@nanos.tec.linutronix.de>
Date: Thu, 15 Apr 2021 18:47:35 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Lorenzo Colitti <lorenzo@...gle.com>,
Greg KH <gregkh@...uxfoundation.org>
Cc: Maciej Żenczykowski <zenczykowski@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
lkml <linux-kernel@...r.kernel.org>,
mikael.beckius@...driver.com,
Maciej Żenczykowski <maze@...gle.com>
Subject: Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote:
> On Wed, Apr 14, 2021 at 2:14 AM Greg KH <gregkh@...uxfoundation.org> wrote:
>> To give context, the commit is now 46eb1701c046 ("hrtimer: Update
>> softirq_expires_next correctly after __hrtimer_get_next_event()") and is
>> attached below.
>>
>> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode
>> HRTIMER_MODE_REL_SOFT for I think every packet it gets. If that should
>> not be happening, we can easily fix it but I guess no one has actually
>> had fast USB devices that noticed this until now :)
>
> AIUI the purpose of this timer is to support packet aggregation. USB
> transfers are relatively expensive/high latency. 6 Gbps is 500k
> 1500-byte packets per second, or one every 2us. So f_ncm buffers as
> many packets as will fit into 16k (usually, 10 1500-byte packets), and
> only initiates a USB transfer when those packets have arrived. That
> ends up doing only one transfer every 20us. It sets a 300us timer to
> ensure that if the 10 packets haven't arrived, it still sends out
> whatever it has when the timer fires. The timer is set/updated on
> every packet buffered by ncm.
>
> Is this somehow much more expensive in 5.10.24 than it was before?
> Even if this driver is somehow "holding it wrong", might there not be
> other workloads that have a similar problem? What about regressions on
> those workloads?
Let's put the question of whether this hrtimer usage is sensible or not
aside for now.
I stared at the change for a while and did some experiments to recreate
the problem, but that didn't get me anywhere.
Could you please do the following?
Enable tracing and enable the following tracepoints:
timers/hrtimer_cancel
timers/hrtimer_start
timers/hrtimer_expire_entry
irq/softirq_raise
irq/softirq_enter
irq/softirq_exit
and function tracing filtered on ncm_wrap_ntb() and
package_for_tx() only (to reduce the noise).
Run the test on a kernels with and without that commit and collect trace
data for both.
That should give me a pretty clear picture what's going on.
Thanks,
tglx
Powered by blists - more mailing lists