Message-ID: <87sehh2gw8.ffs@tglx>
Date: Sun, 24 Aug 2025 11:44:23 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Jirka Hladky <jhladky@...hat.com>, linux-kernel
 <linux-kernel@...r.kernel.org>, john.stultz@...aro.org,
 anna-maria@...utronix.de
Cc: Philip Auld <pauld@...hat.com>, Prarit Bhargava <prarit@...hat.com>,
 Luis Goncalves <lgoncalv@...hat.com>, Miroslav Lichvar
 <mlichvar@...hat.com>, Luke Yang <luyang@...hat.com>, Jan Jurca
 <jjurca@...hat.com>, Joe Mario <jmario@...hat.com>
Subject: Re: [REGRESSION] 76% performance loss in timer workloads caused by
 513793bc6ab3 "posix-timers: Make signal delivery consistent"

On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> I'm reporting a performance regression in kernel 6.13 that causes a
> 76% performance loss in timer-heavy workloads.

Are you talking about real world workloads or about the stress-ng bogosity?

> Through kernel bisection, we have identified the root cause as commit
> 513793bc6ab331b947111e8efaf8fcef33fb83e5.
>
> Summary
>
> Regression: 76% performance drop in applications using nanosleep()/POSIX timers
>  * 4.3x increase in timer overruns and voluntary context switches
>  * Dramatic drop in timer completion rate (76% -> 20%)
>   * Over 99% of timers fail to expire when timer migration is disabled in 6.13
> Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
> Impact: Timer signal delivery mechanism broken
> Reproducer: stress-ng --timer workload on any system.

That does:

arm_timer()
{
     timer.it_value.tv_sec = ...;
     timer.it_value.tv_nsec = ...;

     timer.it_interval.tv_sec = timer.it_value.tv_sec;
     timer.it_interval.tv_nsec = timer.it_value.tv_nsec;

     timer_settime(....&timer);
}

and in the signal handler it does:

     ...
     timer_getoverrun();
     arm_timer();

So from the kernel POV this means:

user space starts timer
arm_timer()
    ....    
        hrtimer_start()
    ...
        hrtimer_expire()
          raise_signal()

   signal_delivery()
        if (interval > 0)
#1          hrtimer_start()

user space signal_handler()
     
arm_timer()

        hrtimer_cancel();

#2      clear pending and overrun

        hrtimer_start();

So it's exactly doing what user space asks for.

Older kernels accounted for overruns and pending signals which might
have accumulated between #1 and #2, which is undefined behaviour as user
space can no longer differentiate to which arming the expiry or the
overruns belong.

So clearing it when rearmed is obviously the correct thing to do because
it makes it consistent, no?

The same applies for the disarm scenario:

arm_timer()
     ...
     expires()
       raise_signal()

disarm_timer()
     ...
     discard signal

Older kernels did not discard it, but that makes zero sense because
after disarming the timer both the signal and the overrun become
immediately meaningless, no?
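
To make the difference between the two behaviours concrete, here is a
deliberately simplified toy model of the rearm and disarm scenarios
above. It is not kernel code. struct toy_timer, the POLICY_* names and
all toy_*() helpers are hypothetical, and the model assumes one queued
signal per timer plus an overrun counter. It only shows what user space
would observe if it looks at the signal after re-arming or disarming:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the accounting, not the kernel implementation. */
enum policy { POLICY_OLD, POLICY_NEW };

struct toy_timer {
    bool armed;
    bool sig_pending;   /* at most one queued signal per timer */
    int overrun;        /* expiries while the signal was already pending */
};

/* Timer expiry: queue the signal, or count an overrun if already queued. */
static void toy_expire(struct toy_timer *t)
{
    assert(t != NULL);
    if (!t->armed)
        return;
    if (t->sig_pending)
        t->overrun++;
    else
        t->sig_pending = true;
}

/* Models timer_settime() with a non-zero value. */
static void toy_rearm(struct toy_timer *t, enum policy p)
{
    if (p == POLICY_NEW) {
        /* New behaviour: state from the previous arming is dropped. */
        t->sig_pending = false;
        t->overrun = 0;
    }
    t->armed = true;
}

/* Models timer_settime() with a zero value. */
static void toy_disarm(struct toy_timer *t, enum policy p)
{
    t->armed = false;
    if (p == POLICY_NEW) {
        /* New behaviour: the pending signal is discarded. */
        t->sig_pending = false;
        t->overrun = 0;
    }
}

/* Returns the overrun count seen by the handler, or -1 if no signal. */
static int toy_deliver(struct toy_timer *t)
{
    int ov;

    if (!t->sig_pending)
        return -1;
    ov = t->overrun;
    t->sig_pending = false;
    t->overrun = 0;
    return ov;
}

/* Three expiries, then a re-arm before the signal is handled. */
int stale_after_rearm(enum policy p)
{
    struct toy_timer t = { false, false, 0 };

    toy_rearm(&t, p);
    toy_expire(&t);
    toy_expire(&t);
    toy_expire(&t);
    toy_rearm(&t, p);
    return toy_deliver(&t);
}

/* One expiry, then a disarm before the signal is handled. */
int stale_after_disarm(enum policy p)
{
    struct toy_timer t = { false, false, 0 };

    toy_rearm(&t, p);
    toy_expire(&t);
    toy_disarm(&t, p);
    return toy_deliver(&t);
}
```

In this model POLICY_OLD hands the handler a signal and an overrun count
which belong to the previous arming, while POLICY_NEW hands it nothing
until the new arming expires.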

And this has nothing to do with timer migration or whatever, that's just
a matter of correctness.

If you can point me to a real world workload, which uses timers
correctly and does not just do random stuff with them, I'm happy to look
into it.

But this stress-ng thing is just made up nonsense which has been
creating bogus statistics forever. So comparing bogus numbers is not an
indicator of a real regression.

Thanks,

        tglx
