lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1f1b08da0911171415r12503874kdfc6b50aa376c9c0@mail.gmail.com>
Date:	Tue, 17 Nov 2009 14:15:37 -0800
From:	john stultz <johnstul@...ibm.com>
To:	Lee Merrill <lee_merrill@...oo.com>
Cc:	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Jiffies jumping with the x86 HPET

On Mon, Nov 16, 2009 at 3:59 PM, Lee Merrill <lee_merrill@...oo.com> wrote:
> We are seeing jiffies go forward occasionally, by 300 seconds, this it appears is due to the following code in the 2.6.16 kernel:
>
> mark_offset_tsc_hpet(void):
> ...
> 1       hpet_current = hpet_readl(HPET_COUNTER);
> 2       rdtsc(last_tsc_low, last_tsc_high);
> 3
> 4       /* lost tick compensation */
> 5       offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
> 6       if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))
> 7                                       && detect_lost_ticks) {
> 8               int lost_ticks = (offset - hpet_last) / hpet_tick;
> 9               jiffies_64 += lost_ticks;
> 10      }
> 11      hpet_last = hpet_current;
>
> where "offset - hpet_last" is an
> unsigned -9, thus the test passes, and jiffies is incremented by a
> large and invalid amount (by a bit less than 300 seconds). Now the
> HPET_T0_CMP register being the timer comparison register, when the HPET's counter
> reaches that value, the comparison register is incremented by
> hpet_tick, and an interrupt is generated.
>
> So let's say hpet_tick is 100, thus the
> timer interrupts at every 100 HPET ticks, and let's say that just
> before line 1, we get delayed, so that another timer interrupt becomes
> pending.Then we read the counter (say, 809) and HPET_T0_CMP (900), and
> store the counter value of 809 in "hpet_last". Then we get our pending
> timer interrupt, and HPET_T0_CMPis still 900, so "offset" is 900 - 100
> or 800, and "offset - hpet_last" would be unsigned -9, and jiffies gets
> a large increment.

[snip]

> The above scenario requires that the timer interrupt routine be either interrupted itself (can you tell what priority each interrupt is?) or that it get preempted, if such preemption of a timer interrupt is possible, or some such. The fix would be simple, to just make the comparison "((offset - hpet_last) > hpet_tick)" be a signed comparison.
>
> The timer system being rewritten in 2.6.18 means this problem is not present from this version on, but we see one failure a week or so in the lab, and in several systems in the field, so a patch or at least a note for kernels before 2.6.18 might be helpful.

So yea, its concerning your hardware is skipping ticks and doing it in
irq context as well. The timekeeping rework was done to avoid this
sort of issue from bad hardware. So I'd recommend upgrading, but I
understand that's not always an option.

No promises on if there's not other issues here, but it would seem the
issue is stemming from mixing the HPET_COUNTER and HPET_T0_CMP.
Instead, hpet_last should be the hpet_readl(HPET_T0_CMP), which avoids
issues if you get blocked (likely due to an SMI).

So the code would look something like:

/* lost tick compensation */
hpet_cmp = hpet_readl(HPET_T0_CMP)
offset = hpet_cmp - hpet_tick;
 if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))
                                      && detect_lost_ticks) {
                    int lost_ticks = (offset - hpet_last) / hpet_tick;
                    jiffies_64 += lost_ticks;
}
hpet_last = hpet_cmp;

Again, no promises that this doesn't have other side effects, but
seems like it would avoid the negative offsets properly and make sure
the hpet_last actually stores the time of the irq, rather then the
time it was serviced.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ