linux-kernel - Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next

Message-ID: <87r1jbv6jc.ffs@nanos.tec.linutronix.de>
Date:   Thu, 15 Apr 2021 18:47:35 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Lorenzo Colitti <lorenzo@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>
Cc:     Maciej Żenczykowski <zenczykowski@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Anna-Maria Behnsen <anna-maria@...utronix.de>,
        lkml <linux-kernel@...r.kernel.org>,
        mikael.beckius@...driver.com,
        Maciej Żenczykowski <maze@...gle.com>
Subject: Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote:
> On Wed, Apr 14, 2021 at 2:14 AM Greg KH <gregkh@...uxfoundation.org> wrote:
>> To give context, the commit is now 46eb1701c046 ("hrtimer: Update
>> softirq_expires_next correctly after __hrtimer_get_next_event()") and is
>> attached below.
>>
>> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode
>> HRTIMER_MODE_REL_SOFT for I think every packet it gets.  If that should
>> not be happening, we can easily fix it but I guess no one has actually
>> had fast USB devices that noticed this until now :)
>
> AIUI the purpose of this timer is to support packet aggregation. USB
> transfers are relatively expensive/high latency. 6 Gbps is 500k
> 1500-byte packets per second, or one every 2us. So f_ncm buffers as
> many packets as will fit into 16k (usually, 10 1500-byte packets), and
> only initiates a USB transfer when those packets have arrived. That
> ends up doing only one transfer every 20us. It sets a 300us timer to
> ensure that if the 10 packets haven't arrived, it still sends out
> whatever it has when the timer fires. The timer is set/updated on
> every packet buffered by ncm.
>
> Is this somehow much more expensive in 5.10.24 than it was before?
> Even if this driver is somehow "holding it wrong", might there not be
> other workloads that have a similar problem? What about regressions on
> those workloads?
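
The figures quoted above can be rechecked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, "16k" is assumed to mean 16384 bytes, and NCM framing overhead is ignored.

```python
# Back-of-the-envelope check of the numbers in the quoted paragraph.
# Assumptions (not from the mail): "16k" = 16384 bytes, no NCM header overhead.
LINK_BPS = 6_000_000_000      # 6 Gbps
PACKET_BYTES = 1500           # packet size used in the quoted estimate
NTB_BYTES = 16 * 1024         # aggregation buffer size ("16k")

packets_per_second = LINK_BPS // (PACKET_BYTES * 8)           # 500000
packet_interval_us = 1_000_000 / packets_per_second           # 2.0 us
packets_per_ntb = NTB_BYTES // PACKET_BYTES                   # 10
transfer_interval_us = packets_per_ntb * packet_interval_us   # 20.0 us

# The timer is set/updated on every buffered packet, so the number of
# hrtimer_start() calls per second matches the packet rate.
timer_starts_per_second = packets_per_second
```

At that rate the timer is restarted roughly 500k times per second, which is why any extra cost per hrtimer_start() call would show up in this workload.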

Let's put the question of whether this hrtimer usage is sensible or not
aside for now.

I stared at the change for a while and did some experiments to recreate
the problem, but that didn't get me anywhere.

Could you please do the following?

Enable tracing and enable the following tracepoints:

    timers/hrtimer_cancel
    timers/hrtimer_start
    timers/hrtimer_expire_entry
    irq/softirq_raise
    irq/softirq_enter
    irq/softirq_exit

and function tracing filtered on ncm_wrap_ntb() and
package_for_tx() only (to reduce the noise).

Run the test on kernels with and without that commit and collect trace
data for both.

That should give me a pretty clear picture of what's going on.

Thanks,

        tglx