Message-Id: <20070524.150640.31659488.davem@davemloft.net>
Date:	Thu, 24 May 2007 15:06:40 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	tglx@...utronix.de
CC:	linux-kernel@...r.kernel.org
Subject: [BUG]: hrtimer overflow bug on 64-bit systems


I've been tracing this problem on sparc64 now that it uses hi-res
timers, and I finally figured it out.

The symptom is that ksoftirqd on most of the cpus of an idle SMP
system chews up around 10% of cpu time.

The raise_softirq_irqoff() is constantly being invoked via
tick_nohz_stop_sched_tick(), but why?

Tracing revealed that delta_jiffies is enormous, which is correct
because no timers are pending on the cpu so we should schedule the
hrtimer as far into the future as possible.

tick_nohz_stop_sched_tick() then proceeds to start the hrtimer, and
then it checks hrtimer_active() but for some reason this is always
false.

Why?

The reason is that the expires calculation here:

		expires = ktime_add_ns(last_update, tick_period.tv64 *
				       delta_jiffies);

overflows on 64-bit systems.

On 32-bit systems, LONG_MAX is a 32-bit quantity so the largest
possible delta_jiffies doesn't cause an overflow in this multiply
(which is 64-bit).  (LONG_MAX determines how large a value
will be returned to indicate "infinity" from get_next_timer_interrupt();
specifically, it's the initialization of the local variable 'expires'
in __next_timer_interrupt().)

Because of the overflow on 64-bit, 'expires' is zero and of course
that causes the hrtimer to not get scheduled at all.
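
To make the arithmetic concrete, here is a small userspace sketch (not
kernel code).  The helper names are made up for illustration, and the
1000000 ns tick period assumes HZ=1000.  It checks whether the multiply
fits in a signed 64-bit value when delta_jiffies is as large as a 32-bit
or a 64-bit LONG_MAX, and shows what a wrapped product looks like:

```c
#include <stdint.h>

/* Assumed tick period in nanoseconds (HZ=1000); not taken from the report. */
#define TICK_PERIOD_NS INT64_C(1000000)

/*
 * Hypothetical helper: returns 1 if a * b does not fit in int64_t.
 * Both arguments are assumed to be non-negative.
 */
static int mul_overflows_s64(int64_t a, int64_t b)
{
	if (a == 0 || b == 0)
		return 0;
	return a > INT64_MAX / b;
}

/*
 * Hypothetical helper: the product as it comes out after wrapping
 * modulo 2^64.  Done in unsigned arithmetic so the sketch itself does
 * not rely on signed overflow.
 */
static int64_t mul_wrapped_s64(int64_t a, int64_t b)
{
	return (int64_t)((uint64_t)a * (uint64_t)b);
}

/* delta_jiffies as large as LONG_MAX on a 32-bit system: fits. */
static int overflows_with_32bit_long_max(void)
{
	return mul_overflows_s64(TICK_PERIOD_NS, (int64_t)INT32_MAX);
}

/* delta_jiffies as large as LONG_MAX on a 64-bit system: does not fit. */
static int overflows_with_64bit_long_max(void)
{
	return mul_overflows_s64(TICK_PERIOD_NS, INT64_MAX);
}
```

With these assumptions the 32-bit case fits comfortably, while the
64-bit case overflows and the wrapped product is nowhere near the
far-future value that was intended.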

I'm surprised this problem is not seen with the x86_64 hrtimer patches
applied :-)

I'm not exactly sure how best to fix this.  We could either make
__next_timer_interrupt() use INT_MAX, or we could make
tick_nohz_stop_sched_tick() detect and handle the overflow
properly.  Neither of those solutions is fully satisfactory in
my opinion :-)
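
For the second option, one possible shape is to saturate the expiry
instead of letting the multiply wrap.  This is only a userspace sketch
under my own assumptions, not a proposed patch: expires_saturating() and
KTIME_MAX_NS are hypothetical names, and times are plain int64_t
nanoseconds rather than ktime_t.

```c
#include <stdint.h>

/* Assumed stand-in for "as far into the future as possible". */
#define KTIME_MAX_NS INT64_MAX

/*
 * Hypothetical helper: compute
 *     last_update_ns + tick_period_ns * delta_jiffies
 * but clamp to KTIME_MAX_NS instead of overflowing.
 */
static int64_t expires_saturating(int64_t last_update_ns,
				  int64_t tick_period_ns,
				  uint64_t delta_jiffies)
{
	int64_t delta_ns;

	if (tick_period_ns <= 0)
		return last_update_ns;

	/* Would the multiply exceed the signed 64-bit range? */
	if (delta_jiffies > (uint64_t)(KTIME_MAX_NS / tick_period_ns))
		return KTIME_MAX_NS;

	delta_ns = tick_period_ns * (int64_t)delta_jiffies;

	/* Would the addition exceed the signed 64-bit range? */
	if (last_update_ns > KTIME_MAX_NS - delta_ns)
		return KTIME_MAX_NS;

	return last_update_ns + delta_ns;
}
```

Small deltas pass through unchanged, and an "infinite" delta_jiffies
yields the maximum expiry rather than a wrapped value.  Whether a
maximum expiry is what the hrtimer code wants to be handed is a separate
question.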

FWIW, I've verified that using INT_MAX in __next_timer_interrupt()
makes the problem go away.
