Message-ID: <20200911234816.474ad4bd@oasis.local.home>
Date:   Fri, 11 Sep 2020 23:48:16 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Viresh Kumar <viresh.kumar@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Bo Gan <ganb@...are.com>, Sharath George <sharathg@...are.com>,
        Srivatsa Bhat <srivatsab@...are.com>,
        Him Kalyan Bordoloi <bordoloih@...are.com>
Subject: [REGRESSION] Needless shutting down of oneshot timer in nohz mode

Hi Thomas,

The VMware PhotonOS team is evaluating 4.19-rt compared to CentOS
3.10-rt (franken kernel from Red Hat). They found a regression between
the two kernels that was introduced by:

 d25408756accb ("clockevents: Stop unused clockevent devices")

The issue shows up when running in a guest, where it causes a noticeable
wake up latency in cyclictest. The 4.19-rt kernel has two extra apic
instructions, causing two extra VMEXITs to occur compared to the 3.10-rt
kernel. I found out the reason why, and this is true for vanilla 5.9-rc
as well.

When running isolcpus with NOHZ_FULL, I see the following.

  tick_nohz_idle_stop_tick() {
	hrtimer_start_range_ns() {
		remove_hrtimer(timer) {
			/* no more timers on the base */
			expires = KTIME_MAX;
			tick_program_event() {
				clock_switch_state(ONESHOT_STOPPED);
				/* call to apic to shutdown timer */
			}
		}
		[..]
		hrtimer_reprogram(timer) {
			tick_program_event() {
				clock_switch_state(ONESHOT);
				/* call to apic to enable timer again! */
			}
		}
	}
  }


Thus, we are needlessly shutting down and restarting the apic every
time we call tick_nohz_stop_tick() if there is a timer still on the
queue.

I'm not exactly sure how to fix this. Is there a way we can hold off
disabling the clock here until we know that it isn't going to be
immediately enabled again?

-- Steve
