linux-kernel - Jiffies jumping with the x86 HPET

Message-ID: <768725.48321.qm@web53403.mail.re2.yahoo.com>
Date:	Mon, 16 Nov 2009 15:59:15 -0800 (PST)
From:	Lee Merrill <lee_merrill@...oo.com>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: Jiffies jumping with the x86 HPET

We are seeing jiffies occasionally jump forward by about 300 seconds. This appears to be due to the following code in the 2.6.16 kernel:

mark_offset_tsc_hpet(void):
...
1       hpet_current = hpet_readl(HPET_COUNTER);
2       rdtsc(last_tsc_low, last_tsc_high);
3
4       /* lost tick compensation */
5       offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
6       if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))
7                                       && detect_lost_ticks) {
8               int lost_ticks = (offset - hpet_last) / hpet_tick;
9               jiffies_64 += lost_ticks;
10      }
11      hpet_last = hpet_current;

where "offset - hpet_last" is -9 interpreted as an unsigned value,
so the test passes and jiffies is incremented by a large and invalid
amount (a bit less than 300 seconds). HPET_T0_CMP is the timer
comparison register: when the HPET's counter reaches that value, an
interrupt is generated and the comparison register is incremented by
hpet_tick.

So let's say hpet_tick is 100, so the timer interrupts every 100 HPET
ticks, and let's say that just before line 1 we get delayed, so that
another timer interrupt becomes pending. Then we read the counter
(say, 809) and HPET_T0_CMP (900), and store the counter value of 809
in "hpet_last". Then we take the pending timer interrupt, and
HPET_T0_CMP is still 900, so "offset" is 900 - 100 = 800, and
"offset - hpet_last" is 800 - 809 = -9. As an unsigned value that is
far larger than hpet_tick, so jiffies gets a large increment.
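
The arithmetic in this scenario can be checked outside the kernel. The following is only a sketch: the two helper functions are hypothetical names that mirror lines 6-8 of the listing, and uint32_t is an assumption standing in for the 32-bit unsigned values used on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the test on lines 6-7 of the listing
 * (without detect_lost_ticks). uint32_t is an assumption standing in
 * for the kernel's 32-bit unsigned values on x86. */
static int lost_tick_test_unsigned(uint32_t offset, uint32_t hpet_last,
                                   uint32_t hpet_tick)
{
    /* The subtraction wraps modulo 2^32, so a "negative" difference
     * becomes a huge positive number and the test passes. */
    return (offset - hpet_last) > hpet_tick && hpet_last != 0;
}

/* Hypothetical helper mirroring line 8: the tick count that would be
 * added to jiffies_64. */
static uint32_t lost_ticks_unsigned(uint32_t offset, uint32_t hpet_last,
                                    uint32_t hpet_tick)
{
    assert(hpet_tick != 0); /* guard the division */
    return (offset - hpet_last) / hpet_tick;
}
```

With the example numbers, lost_tick_test_unsigned(800, 809, 100) is true and lost_ticks_unsigned(800, 809, 100) is 42949672. With the register values from the failure reported further down in this message (offset 0x7a120471, hpet_last 0x7a12047a, hpet_tick 0xdfb9), the same division gives 0x124ef, which matches EAX in the dump.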

Here are the actual values from one failure, annotating the disassembled code (these hold whether or not the above scenario is correct).

0xfcd64600:     0x000124ef      0x0000dfb9      0x00000000      0xdff19ea8
                EAX             ECX             EDX             EBX

0xfcd64610:     0xdff19e3c      0x7a120471      0x00000000      0x00000000
                ESP             EBP             ESI             EDI

0xfcd64620:     0xc010d529      0x00000006      0x00000060      0x00000068
                EIP             PS              CS              SS

0xfcd64630:     0x0000007b      0x0000007b      0x00000000      0x00000000
                DS              ES              FS              GS

0xfcd64640:     0xc034cc00      0x00000000      0x00000000      0x00000000
0xfcd64650:     0xffff0ff1      0x000d0703      0xffff0ff1      0x000d0703

                // hpet_current = hpet_readl(HPET_COUNTER);
                // rdtsc(last_tsc_low, last_tsc_high);
c010d4e0:       0f 31                   rdtsc 
c010d4e2:       a3 4c db 38 c0          mov    %eax,0xc038db4c      // last_tsc_low: 0x55fe575c
                // offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
c010d4e7:       b8 08 01 00 00          mov    $0x108,%eax
c010d4ec:       89 15 50 db 38 c0       mov    %edx,0xc038db50      // last_tsc_high: 0x00099c39
c010d4f2:       e8 79 90 00 00          call   c0116570 <hpet_readl>
c010d4f7:       8b 15 20 90 39 c0       mov    0xc0399020,%edx      // hpet_tick: 0x0000dfb9
c010d4fd:       8b 0d 40 db 38 c0       mov    0xc038db40,%ecx      // hpet_last: 0x7a12047a
c010d503:       89 c5                   mov    %eax,%ebp
c010d505:       29 d5                   sub    %edx,%ebp            // %ebp: offset: 0x7a120471
                // (offset - hpet_last): -9
                // detect_lost_ticks: 1
                // if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))
                //              && detect_lost_ticks) {
c010d507:       89 e8                   mov    %ebp,%eax
c010d509:       29 c8                   sub    %ecx,%eax
c010d50b:       39 d0                   cmp    %edx,%eax
c010d50d:       76 20                   jbe    c010d52f <mark_offset_tsc_hpet+0x87>
c010d50f:       85 c9                   test   %ecx,%ecx
c010d511:       74 1c                   je     c010d52f <mark_offset_tsc_hpet+0x87>
c010d513:       83 3d 60 db 38 c0 00    cmpl   $0x0,0xc038db60
c010d51a:       74 13                   je     c010d52f <mark_offset_tsc_hpet+0x87>
                // int lost_ticks = (offset - hpet_last) / hpet_tick;
c010d51c:       89 d1                   mov    %edx,%ecx
c010d51e:       31 d2                   xor    %edx,%edx
c010d520:       f7 f1                   div    %ecx
c010d522:       99                      cltd  
                // jiffies_64 += lost_ticks;
c010d523:       01 05 00 cc 34 c0       add    %eax,0xc034cc00
c010d529:       11 15 04 cc 34 c0       adc    %edx,0xc034cc04

The above scenario requires that the timer interrupt routine either be interrupted itself (can you tell what priority each interrupt is?) or be preempted, if preemption of a timer interrupt is possible at all. The fix would be simple: make the comparison "((offset - hpet_last) > hpet_tick)" a signed comparison.
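
A minimal sketch of what the signed comparison could look like, written as a standalone function rather than as a patch against mark_offset_tsc_hpet(). The function name is hypothetical, and the cast from uint32_t to int32_t assumes the usual two's-complement behaviour (implementation-defined in C, but what gcc does on x86).

```c
#include <stdint.h>

/* Hypothetical signed variant of the lost-tick test. The difference is
 * still computed modulo 2^32, but it is then interpreted as signed, so
 * a slightly "negative" difference such as -9 no longer passes. */
static int lost_tick_test_signed(uint32_t offset, uint32_t hpet_last,
                                 uint32_t hpet_tick)
{
    /* Assumes two's-complement conversion, as gcc provides on x86. */
    int32_t delta = (int32_t)(offset - hpet_last);

    return delta > (int32_t)hpet_tick && hpet_last != 0;
}
```

With this variant the values from the failure (offset 0x7a120471, hpet_last 0x7a12047a, hpet_tick 0xdfb9) give a delta of -9 and the test fails, while a genuine lost tick (for example offset 1100, hpet_last 809, hpet_tick 100) still passes. Computing the difference first and casting afterwards also keeps the test correct when the HPET counter wraps around between two readings.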

The timer system was rewritten in 2.6.18, so this problem is not present from that version on. But we see about one failure a week in the lab, and failures on several systems in the field, so a patch, or at least a note, for kernels before 2.6.18 might be helpful.

Lee Merrill
Bus-Tech Inc.


 
|===============================================================
| Lee Merrill -------------------------- Home page: leenotes.org
| "Give thanks in all circumstances..."
|===============================================================
