Message-Id: <715998165@web.de>
Date:	Thu, 14 May 2009 22:25:01 +0200
From:	devzero@....de
To:	akataria@...are.com
Cc:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: Reduce the default HZ value

> On Tue, 2009-05-12 at 12:45 -0700, devzero@....de wrote:
> > >> > As a side note Red Hat ships runtime configurable tick behaviour in RHEL
> > >> > these days. HZ is fixed but the ticks can be bunched up. That was done as
> > >> > a quick fix to keep stuff portable but its a lot more sensible than
> > >> > randomly messing with the HZ value and its not much code either.
> > >> > 
> > >> Hi Alan, 
> > >> 
> > >> I guess you are talking about the tick_divider patch ? 
> > >> And that's still same as reducing the HZ value only that it can be done
> > >> dynamically (boot time), right ? 
> > >
> > >Yes - which has the advantage that you can select different behaviours
> > >rather than distributions having to build with HZ=1000 either for
> > >compatibility or responsiveness can still allow users to drop to a lower
> > >HZ value if doing stuff like HPC.
> > >
> > >Basically it removes the need to argue about it at build time and lets
> > >the user decide.
> > 
> > any reason why this did not reach mainline?
> 
> I think it is because during the time when this was implemented for RHEL
> 5, mainline was moving towards the tickless approach, which might have
> prompted people to think that it would no more be useful for mainline.
> 
> Since Alan was the one who implemented those patches, I guess he would
> have a better say on this. Alan, are there any plans for mainlining this
> now ?
> 
> Alok

anyway, just FYI and for some additional transparency, here are the four
tick-divider related patches from a "recent" RHEL5 kernel
(-> http://isoredirect.centos.org/centos/5/os/SRPMS/kernel-2.6.18-128.el5.src.rpm)

regards
roland


cat ./linux-2.6-docs-update-kernel-parameters-with-tick-divider.patch

From: Chris Lalancette <clalance@...hat.com>
Date: Wed, 17 Sep 2008 17:14:19 +0200
Subject: [docs] update kernel-parameters with tick-divider
Message-id: 48D11ECB.1060100@...hat.com
O-Subject: [RHEL5.3 PATCH v2]: Update kernel-parameters with tick-divider
Bugzilla: 454792
RH-Acked-by: Prarit Bhargava <prarit@...hat.com>
RH-Acked-by: Alan Cox <alan@...hat.com>
RH-Nacked-by: Alan Cox <alan@...hat.com>

We have a request to better document the tick divider patch that went into 5.1.
Towards this end, I came up with the following patch to
Documentation/kernel-parameters.txt.  Not sure if it needs ACKs or anything, but
I wanted to make sure dzickus saw it.  This will resolve BZ 454792.  This
version doesn't tell the user to divide by zero (thanks Alan).
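
To make the accepted range concrete, here is a minimal sketch of the check that
divider_setup() performs in the tick divider patch further down
(divider >= 1 && HZ/divider >= 25).  HZ = 1000 is an assumption taken from the
RHEL5 discussion in this thread, and divider_in_range() and max_divider() are
hypothetical helper names used only for illustration.

```c
#include <stdbool.h>

/* Assumption: HZ is 1000, as in the RHEL5 kernels discussed in this thread. */
#define HZ 1000u

/* Mirrors the test in divider_setup(): at least 1, and HZ/divider >= 25. */
static bool divider_in_range(unsigned int divider)
{
	/* The first clause short-circuits, so 0 never reaches the division. */
	return divider >= 1 && HZ / divider >= 25;
}

/* Largest divider the check accepts for this HZ (hypothetical helper). */
static unsigned int max_divider(void)
{
	unsigned int d = 1;

	while (divider_in_range(d + 1))
		d++;
	return d;
}
```

Under that assumption the check itself accepts values from 1 through 40, so the
"between 1 and 25" wording in the documentation hunk is narrower than what the
code enforces.  This sketch only mirrors the check as it appears in the patch.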

--
Chris Lalancette

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b5bbd11..20ab2a9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -470,6 +470,10 @@ running once the system is up.
                        See drivers/char/README.epca and
                        Documentation/digiepca.txt.

+       divider=        [IA-32,X86-64]
+                       divide kernel HZ rate by given value.
+                       Format: <num>, where <num> is between 1 and 25
+
        dmascc=         [HW,AX25,SERIAL] AX.25 Z80SCC driver with DMA
                        support available.
                        Format: <io_dev0>[,<io_dev1>[,..<io_dev32>]]





cat ./linux-2.6-x86_64-fix-casting-issue-in-tick-divider-patch.patch

From: Prarit Bhargava <prarit@...hat.com>
Subject: [RHEL 5.1 PATCH]: Fix casting issue in tick divider patch
Date: Wed, 20 Jun 2007 14:16:29 -0400
Bugzilla: 244861
Message-Id: <20070620181629.28881.27223.sendpatchset@...rit.boston.redhat.com>
Changelog: [x86_64] Fix casting issue in tick divider patch


Fix a casting bug in the tick divider patch.

Successfully tested by me on a variety of systems that were exhibiting slow
boot behaviour.

Resolves BZ 244861.
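
The description does not spell out the failure mode.  One plausible reading,
given that lost is an int and tick_divider is an unsigned int, is the usual
signed/unsigned comparison trap.  The sketch below is only an illustration of
that C behaviour; exceeds_unsigned() and exceeds_signed() are hypothetical
stand-ins for the comparison before and after the one-line change.

```c
/* Comparison as written before the fix. */
static int exceeds_unsigned(int lost, unsigned int tick_divider)
{
	/* lost is converted to unsigned int here, so -1 becomes UINT_MAX. */
	return lost > tick_divider - 1;
}

/* Comparison as written after the fix. */
static int exceeds_signed(int lost, unsigned int tick_divider)
{
	/* The cast keeps the whole comparison in signed arithmetic. */
	return lost > (int)tick_divider - 1;
}
```

With tick_divider = 1, a negative lost value makes the first form true and the
second form false.  For non-negative values of lost the two forms agree.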

--- linux-2.6.18.x86_64/arch/x86_64/kernel/time.c.orig  2007-06-20 04:21:58.000000000 -0400
+++ linux-2.6.18.x86_64/arch/x86_64/kernel/time.c       2007-06-20 04:28:58.000000000 -0400
@@ -433,7 +433,7 @@ void main_timer_handler(struct pt_regs *
                                (((long) offset << US_SCALE) / vxtime.tsc_quot) - 1;
        }
        /* SCALE: We expect tick_divider - 1 lost, ie 0 for normal behaviour */
-       if (lost > tick_divider - 1)  {
+       if (lost > (int)tick_divider - 1)  {
                handle_lost_ticks(lost, regs);
                jiffies += lost - (tick_divider - 1);
        }



cat ./linux-2.6-x86-fixes-for-the-tick-divider-patch.patch

From: Chris Lalancette <clalance@...hat.com>
Subject: Re: [RHEL 5.1.z PATCH]: Fixes for the tick divider patch
Date: Tue, 02 Oct 2007 16:53:22 -0400
Bugzilla: 315471
Message-Id: <4702AFC2.9020702@...hat.com>
Changelog: [x86] Fixes for the tick divider patch

All,
     While testing the tick divider patch under VMware, a number of issues were
found with it:

1)  On i386, when specifying "divider=10 apic=verbose", a bogus value was
printed for the CPU MHz and the host bus speed.  This is because during APIC
calibration, we were using "HZ/10" loops instead of "REAL_HZ/10", causing the
calculation to go out of bounds.

2)  On x86_64, when using the tick divider, it wasn't dividing the local APIC as
well as the external timer.  This causes problems under VMware since the
hypervisor (ESX server) has to deliver 1000 local APIC interrupts per second to
each logical processor, which can end up causing time drift.  By properly
dividing the local APIC as well as the external time source, it significantly
reduces the load on the HV, and the guests have less tendency to drift.

3)  On x86_64, we weren't looping during smp_local_timer_interrupt(), so we were
losing profiling ticks.

4)  On x86_64, when using the tick divider with PM-Timer, lost tick compensation
wasn't being calculated properly.  In particular, we would count ticks as lost
when they really weren't, because we were using HZ instead of REAL_HZ in the
lost calculation.

5)  On x86_64, TSC suffers from the same problem as PM-Timer.

The attached patch fixes all 5 of these problems.  Additionally, this patch also
adds a "hz=" command-line parameter for both i386 and x86_64.  This is a nicer way
to specify the divider from a user point-of-view; they don't have to know the
current value of HZ in order to specify the HZ value they want.
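
To illustrate the HZ versus REAL_HZ mix-up behind the PM-Timer and TSC items
above, here is a small sketch of the division used in the lost-tick
calculation.  The numbers (HZ = 1000, divider = 10, hence REAL_HZ = 100) are
assumptions for the example, and periods_elapsed() is a hypothetical helper,
not a function from the patch.

```c
#define USEC_PER_SEC 1000000L

/* Number of whole timer periods contained in delta_usec at the given rate. */
static long periods_elapsed(long delta_usec, long hz)
{
	return delta_usec / (USEC_PER_SEC / hz);
}
```

With divider=10 the hardware interrupts every 10000 usec.  Dividing that
interval by the HZ-based period, periods_elapsed(10000, 1000), gives 10, which
looks like many lost ticks.  Dividing by the REAL_HZ-based period,
periods_elapsed(10000, 100), gives 1, i.e. nothing was lost.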

These patches are not upstream, since upstream has since gone with the tickless
kernel.

Patches successfully tested by myself (just for verifying basic correctness),
and HP and VMware using ESX server.

This fixes BZ 305011.  Please review and ACK.

Chris Lalancette


>
> ACK less the hz= bits for 5.1.z, per Alan's concern about only certain
> values in the currently accepted range actually being valid. I'd say
> fully bake that part for 5.2 and just take the fixes for 5.1.z.
>

Same patch, with hz= bits removed for the z-stream.

Chris Lalancette

diff -urp linux-2.6.18.noarch.orig/arch/i386/kernel/apic.c linux-2.6.18.noarch/arch/i386/kernel/apic.c
--- linux-2.6.18.noarch.orig/arch/i386/kernel/apic.c    2007-10-02 16:42:24.000000000 -0400
+++ linux-2.6.18.noarch/arch/i386/kernel/apic.c 2007-10-02 16:47:00.000000000 -0400
@@ -1027,7 +1027,7 @@ static int __init calibrate_APIC_clock(v
        long tt1, tt2;
        long result;
        int i;
-       const int LOOPS = HZ/10;
+       const int LOOPS = REAL_HZ/10;

        apic_printk(APIC_VERBOSE, "calibrating APIC timer ...\n");

@@ -1076,13 +1076,13 @@ static int __init calibrate_APIC_clock(v
        if (cpu_has_tsc)
                apic_printk(APIC_VERBOSE, "..... CPU clock speed is "
                        "%ld.%04ld MHz.\n",
-                       ((long)(t2-t1)/LOOPS)/(1000000/HZ),
-                       ((long)(t2-t1)/LOOPS)%(1000000/HZ));
+                       ((long)(t2-t1)/LOOPS)/(1000000/REAL_HZ),
+                       ((long)(t2-t1)/LOOPS)%(1000000/REAL_HZ));

        apic_printk(APIC_VERBOSE, "..... host bus clock speed is "
                "%ld.%04ld MHz.\n",
-               result/(1000000/HZ),
-               result%(1000000/HZ));
+               result/(1000000/REAL_HZ),
+               result%(1000000/REAL_HZ));

        return result;
 }
diff -urp linux-2.6.18.noarch.orig/arch/x86_64/kernel/apic.c linux-2.6.18.noarch/arch/x86_64/kernel/apic.c
--- linux-2.6.18.noarch.orig/arch/x86_64/kernel/apic.c  2007-10-02 16:42:30.000000000 -0400
+++ linux-2.6.18.noarch/arch/x86_64/kernel/apic.c       2007-10-02 16:47:00.000000000 -0400
@@ -811,7 +811,7 @@ static int __init calibrate_APIC_clock(v
        printk(KERN_INFO "Detected %d.%03d MHz APIC timer.\n",
                result / 1000 / 1000, result / 1000 % 1000);

-       return result * APIC_DIVISOR / HZ;
+       return result * APIC_DIVISOR / REAL_HZ;
 }

 static unsigned int calibration_result;
@@ -941,10 +941,13 @@ void setup_APIC_extened_lvt(unsigned cha

 void smp_local_timer_interrupt(struct pt_regs *regs)
 {
-       profile_tick(CPU_PROFILING, regs);
+       int i;
+       for (i = 0; i < tick_divider; i++) {
+               profile_tick(CPU_PROFILING, regs);
 #ifdef CONFIG_SMP
-       update_process_times(user_mode(regs));
+               update_process_times(user_mode(regs));
 #endif
+       }
        if (apic_runs_main_timer > 1 && smp_processor_id() == boot_cpu_id)
                main_timer_handler(regs);
        /*
diff -urp linux-2.6.18.noarch.orig/arch/x86_64/kernel/pmtimer.c linux-2.6.18.noarch/arch/x86_64/kernel/pmtimer.c
--- linux-2.6.18.noarch.orig/arch/x86_64/kernel/pmtimer.c       2006-09-19 23:42:06.000000000 -0400
+++ linux-2.6.18.noarch/arch/x86_64/kernel/pmtimer.c    2007-10-02 16:47:00.000000000 -0400
@@ -64,8 +64,8 @@ int pmtimer_mark_offset(void)

        delta += offset_delay;

-       lost = delta / (USEC_PER_SEC / HZ);
-       offset_delay = delta % (USEC_PER_SEC / HZ);
+       lost = delta / (USEC_PER_SEC / REAL_HZ);
+       offset_delay = delta % (USEC_PER_SEC / REAL_HZ);

        rdtscll(tsc);
        vxtime.last_tsc = tsc - offset_delay * (u64)cpu_khz / 1000;
diff -urp linux-2.6.18.noarch.orig/arch/x86_64/kernel/time.c linux-2.6.18.noarch/arch/x86_64/kernel/time.c
--- linux-2.6.18.noarch.orig/arch/x86_64/kernel/time.c  2007-10-02 16:42:31.000000000 -0400
+++ linux-2.6.18.noarch/arch/x86_64/kernel/time.c       2007-10-02 16:47:43.000000000 -0400
@@ -65,6 +65,8 @@ static int notsc __initdata = 0;
 #define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
 #define FSEC_PER_TICK (FSEC_PER_SEC / HZ)

+#define USEC_PER_REAL_TICK (USEC_PER_SEC / REAL_HZ)
+
 #define NS_SCALE       10 /* 2^10, carefully chosen */
 #define US_SCALE       32 /* 2^32, arbitralrily chosen */

@@ -304,7 +306,7 @@ unsigned long long monotonic_clock(void)
                        this_offset = hpet_readl(HPET_COUNTER);
                } while (read_seqretry(&xtime_lock, seq));
                offset = (this_offset - last_offset);
-               offset *= NSEC_PER_TICK / hpet_tick;
+               offset *= NSEC_PER_TICK / hpet_tick_real;
        } else {
                do {
                        seq = read_seqbegin(&xtime_lock);
@@ -406,7 +408,7 @@ void main_timer_handler(struct pt_regs *
                }

                monotonic_base +=
-                       (offset - vxtime.last) * NSEC_PER_TICK / hpet_tick;
+                       (offset - vxtime.last) * NSEC_PER_TICK / hpet_tick_real;

                vxtime.last = offset;
 #ifdef CONFIG_X86_PM_TIMER
@@ -415,14 +417,14 @@ void main_timer_handler(struct pt_regs *
 #endif
        } else {
                offset = (((tsc - vxtime.last_tsc) *
-                          vxtime.tsc_quot) >> US_SCALE) - USEC_PER_TICK;
+                          vxtime.tsc_quot) >> US_SCALE) - USEC_PER_REAL_TICK;

                if (offset < 0)
                        offset = 0;

-               if (offset > USEC_PER_TICK) {
-                       lost = offset / USEC_PER_TICK;
-                       offset %= USEC_PER_TICK;
+               if (offset > USEC_PER_REAL_TICK) {
+                       lost = offset / USEC_PER_REAL_TICK;
+                       offset %= USEC_PER_REAL_TICK;
                }

                /* FIXME: 1000 or 1000000? */







cat ./linux-2.6-x86-tick-divider.patch

From: Alan Cox <alan@...hat.com>
Subject: [RHEL5]: Tick Divider (Bugzilla #215403)
Date: Wed, 18 Apr 2007 16:39:15 -0400
Bugzilla: 215403
Message-Id: <20070418203915.GA23344@...serv.devel.redhat.com>
Changelog: [x86] Tick Divider


The following patch implements a tick divider feature that allows you to
boot the kernel with HZ at 1000 but the real timer tick rate lower (thus
not breaking all the modules and kABI).
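
As a rough mental model of the idea (a simplified sketch, not the kernel code):
the hardware timer is programmed for HZ/tick_divider interrupts per second, and
each interrupt then performs tick_divider logical ticks, much like the
do_timer() loops in the patch below.  HZ = 1000 is taken from the paragraph
above; real_hz() and jiffies_after_one_second() are hypothetical names.

```c
#define HZ 1000u	/* logical tick rate seen by the rest of the kernel */

/* Hardware interrupt rate for a given divider, like REAL_HZ in the patch. */
static unsigned int real_hz(unsigned int tick_divider)
{
	return HZ / tick_divider;	/* caller must pass tick_divider >= 1 */
}

/*
 * Simplified model of one second: the timer fires real_hz() times and each
 * interrupt advances the logical tick counter tick_divider times.
 */
static unsigned long jiffies_after_one_second(unsigned int tick_divider)
{
	unsigned long jiffies = 0;
	unsigned int irq, i;

	for (irq = 0; irq < real_hz(tick_divider); irq++)
		for (i = 0; i < tick_divider; i++)
			jiffies++;
	return jiffies;
}
```

In this model divider=10 yields 100 interrupts per second while the logical
tick count still advances by 1000, so code that was built against HZ=1000
keeps seeing the rate it expects.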

The selection is done at boot to minimize risk and the patch has been reworked
so that you can do an informal attempt at a proof that it doesn't cause
regression for the non dividing case.

The patch interleaved with notes follows, and below that the actual patch
proper.

Xen kernels remain at 250HZ because
a) Xen guests have a 'tickless mode'
b) Xen itself has issues with multiple differing guest HZ rates

Not queued for upstream as the upstream path is Ingo's tickless kernel, which
is not viable as a RHEL5 tweak

Index: linux-2.6.18.noarch/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/apic.c
+++ linux-2.6.18.noarch/arch/i386/kernel/apic.c
@@ -1185,10 +1185,13 @@ EXPORT_SYMBOL(switch_ipi_to_APIC_timer);

 inline void smp_local_timer_interrupt(struct pt_regs * regs)
 {
-       profile_tick(CPU_PROFILING, regs);
+       int i;
+       for (i = 0; i < tick_divider; i++) {
+               profile_tick(CPU_PROFILING, regs);
 #ifdef CONFIG_SMP
-       update_process_times(user_mode_vm(regs));
+               update_process_times(user_mode_vm(regs));
 #endif
+       }

        /*
         * We take the 'long' return path, and there every subsystem
Index: linux-2.6.18.noarch/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/apm.c
+++ linux-2.6.18.noarch/arch/i386/kernel/apm.c
@@ -1189,7 +1189,7 @@ static void reinit_timer(void)
        unsigned long flags;

        spin_lock_irqsave(&i8253_lock, flags);
-       /* set the clock to 100 Hz */
+       /* set the clock to HZ */
        outb_p(0x34, PIT_MODE);         /* binary, mode 2, LSB/MSB, ch 0 */
        udelay(10);
        outb_p(LATCH & 0xff, PIT_CH0);  /* LSB */
Index: linux-2.6.18.noarch/arch/i386/kernel/i8253.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/i8253.c
+++ linux-2.6.18.noarch/arch/i386/kernel/i8253.c
@@ -26,6 +26,7 @@ void setup_pit_timer(void)
        spin_lock_irqsave(&i8253_lock, flags);
        outb_p(0x34,PIT_MODE);          /* binary, mode 2, LSB/MSB, ch 0 */
        udelay(10);
+       /* Physical HZ */
        outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
        udelay(10);
        outb(LATCH >> 8 , PIT_CH0);     /* MSB */
@@ -94,8 +95,11 @@ static cycle_t pit_read(void)
        spin_unlock_irqrestore(&i8253_lock, flags);

        count = (LATCH - 1) - count;
-
-       return (cycle_t)(jifs * LATCH) + count;
+       /* Adjust to logical ticks */
+       count *= tick_divider;
+
+       /* Keep the jiffies in terms of logical ticks not physical */
+       return (cycle_t)(jifs * LOGICAL_LATCH) + count;
 }

 static struct clocksource clocksource_pit = {
Index: linux-2.6.18.noarch/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/time.c
+++ linux-2.6.18.noarch/arch/i386/kernel/time.c
@@ -366,3 +367,22 @@ void __init time_init(void)

        time_init_hook();
 }
+
+#ifdef CONFIG_TICK_DIVIDER
+
+unsigned int tick_divider = 1;
+
+static int __init divider_setup(char *s)
+{
+       unsigned int divider = 1;
+       get_option(&s, &divider);
+       if (divider >= 1 && HZ/divider >= 25)
+               tick_divider = divider;
+       else
+               printk(KERN_ERR "tick_divider: %d is out of range.\n", divider);
+       return 1;
+}
+
+__setup("divider=", divider_setup);
+
+#endif
Index: linux-2.6.18.noarch/arch/i386/kernel/time_hpet.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6.18.noarch/arch/i386/kernel/time_hpet.c
@@ -24,6 +24,7 @@

 static unsigned long hpet_period;      /* fsecs / HPET clock */
 unsigned long hpet_tick;               /* hpet clks count per tick */
+unsigned long hpet_tick_real;          /* hpet clocks per interrupt */
 unsigned long hpet_address;            /* hpet memory map physical address */
 int hpet_use_timer;

@@ -156,7 +157,8 @@ int __init hpet_enable(void)

        hpet_use_timer = id & HPET_ID_LEGSUP;

-       if (hpet_timer_stop_set_go(hpet_tick))
+       hpet_tick_real = hpet_tick * tick_divider;
+       if (hpet_timer_stop_set_go(hpet_tick_real))
                return -1;

        use_hpet = 1;
Index: linux-2.6.18.noarch/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.18.noarch.orig/arch/x86_64/Kconfig
+++ linux-2.6.18.noarch/arch/x86_64/Kconfig
@@ -443,6 +443,13 @@ config HPET_EMULATE_RTC
        bool "Provide RTC interrupt"
        depends on HPET_TIMER && RTC=y

+config TICK_DIVIDER
+       bool "Support clock division"
+       default n
+       help
+         Supports the use of clock division allowing the real interrupt
+         rate to be lower than the HZ setting.
+
 # Mark as embedded because too many people got it wrong.
 # The code disables itself when not needed.
 config IOMMU
Index: linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/x86_64/kernel/i8259.c
+++ linux-2.6.18.noarch/arch/x86_64/kernel/i8259.c
@@ -498,6 +498,7 @@ static void setup_timer_hardware(void)
 {
        outb_p(0x34,0x43);              /* binary, mode 2, LSB/MSB, ch 0 */
        udelay(10);
+       /* LATCH is in physical clocks */
        outb_p(LATCH & 0xff , 0x40);    /* LSB */
        udelay(10);
        outb(LATCH >> 8 , 0x40);        /* MSB */
Index: linux-2.6.18.noarch/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.18.noarch/arch/x86_64/kernel/time.c
@@ -70,7 +70,8 @@ static int notsc __initdata = 0;
 unsigned int cpu_khz;                                  /* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
 static unsigned long hpet_period;                      /* fsecs / HPET clock */
-unsigned long hpet_tick;                               /* HPET clocks / interrupt */
+unsigned long hpet_tick;                               /* HPET clocks / HZ */
+unsigned long hpet_tick_real;                          /* HPET clocks / interrupt */
 int hpet_use_timer;                            /* Use counter of hpet for time keeping, otherwise PIT */
 unsigned long vxtime_hz = PIT_TICK_RATE;
 int report_lost_ticks;                         /* command line option */
@@ -108,7 +109,9 @@ static inline unsigned int do_gettimeoff
 {
        /* cap counter read to one tick to avoid inconsistencies */
        unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
-       return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
+       /* The hpet counter runs at a fixed rate so we don't care about HZ
+          scaling here. We do however care that the limit is in real ticks */
+       return (min(counter,hpet_tick_real) * vxtime.quot) >> US_SCALE;
 }

 unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;
@@ -332,7 +335,7 @@ static noinline void handle_lost_ticks(i
                        printk(KERN_WARNING "Falling back to HPET\n");
                        if (hpet_use_timer)
                                vxtime.last = hpet_readl(HPET_T0_CMP) -
-                                                       hpet_tick;
+                                                       hpet_tick_real;
                        else
                                vxtime.last = hpet_readl(HPET_COUNTER);
                        vxtime.mode = VXTIME_HPET;
@@ -355,7 +358,7 @@ void main_timer_handler(struct pt_regs *
 {
        static unsigned long rtc_update = 0;
        unsigned long tsc;
-       int delay = 0, offset = 0, lost = 0;
+       int delay = 0, offset = 0, lost = 0, i;

 /*
  * Here we are in the timer irq handler. We have irqs locally disabled (so we
@@ -373,8 +376,10 @@ void main_timer_handler(struct pt_regs *
                /* if we're using the hpet timer functionality,
                 * we can more accurately know the counter value
                 * when the timer interrupt occured.
+                *
+                * We are working in physical time here
                 */
-               offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
+               offset = hpet_readl(HPET_T0_CMP) - hpet_tick_real;
                delay = hpet_readl(HPET_COUNTER) - offset;
        } else if (!pmtmr_ioport) {
                spin_lock(&i8253_lock);
@@ -382,14 +387,19 @@ void main_timer_handler(struct pt_regs *
                delay = inb_p(0x40);
                delay |= inb(0x40) << 8;
                spin_unlock(&i8253_lock);
+               /* We are in physical not logical ticks */
                delay = LATCH - 1 - delay;
+               /* True ticks of delay elapsed */
+               delay *= tick_divider;
        }

        tsc = get_cycles_sync();

        if (vxtime.mode == VXTIME_HPET) {
-               if (offset - vxtime.last > hpet_tick) {
-                       lost = (offset - vxtime.last) / hpet_tick - 1;
+               if (offset - vxtime.last > hpet_tick_real) {
+                       lost = (offset - vxtime.last) / hpet_tick_real - 1;
+                       /* Lost is now in real ticks but we want logical */
+                       lost *= tick_divider;
                }

                monotonic_base +=
@@ -422,33 +432,35 @@ void main_timer_handler(struct pt_regs *
                        vxtime.last_tsc = tsc -
                                (((long) offset << US_SCALE) / vxtime.tsc_quot) - 1;
        }
-
-       if (lost > 0) {
+       /* SCALE: We expect tick_divider - 1 lost, ie 0 for normal behaviour */
+       if (lost > tick_divider - 1)  {
                handle_lost_ticks(lost, regs);
-               jiffies += lost;
+               jiffies += lost - (tick_divider - 1);
        }

 /*
  * Do the timer stuff.
  */

-       do_timer(regs);
+       for (i = 0; i < tick_divider; i++) {
+               do_timer(regs);
 #ifndef CONFIG_SMP
-       update_process_times(user_mode(regs));
+               update_process_times(user_mode(regs));
 #endif

-/*
- * In the SMP case we use the local APIC timer interrupt to do the profiling,
- * except when we simulate SMP mode on a uniprocessor system, in that case we
- * have to call the local interrupt handler.
- */
+       /*
+        * In the SMP case we use the local APIC timer interrupt to do the profiling,
+        * except when we simulate SMP mode on a uniprocessor system, in that case we
+        * have to call the local interrupt handler.
+        */

 #ifndef CONFIG_X86_LOCAL_APIC
-       profile_tick(CPU_PROFILING, regs);
+               profile_tick(CPU_PROFILING, regs);
 #else
-       if (!using_apic_timer)
-               smp_local_timer_interrupt(regs);
+               if (!using_apic_timer)
+                       smp_local_timer_interrupt(regs);
 #endif
+       }

 /*
  * If we have an externally synchronized Linux clock, then update CMOS clock
@@ -800,8 +812,8 @@ static int hpet_timer_stop_set_go(unsign
        if (hpet_use_timer) {
                hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                    HPET_TN_32BIT, HPET_T0_CFG);
-               hpet_writel(hpet_tick, HPET_T0_CMP); /* next interrupt */
-               hpet_writel(hpet_tick, HPET_T0_CMP); /* period */
+               hpet_writel(hpet_tick_real, HPET_T0_CMP); /* next interrupt */
+               hpet_writel(hpet_tick_real, HPET_T0_CMP); /* period */
                cfg |= HPET_CFG_LEGACY;
        }
 /*
@@ -836,16 +848,19 @@ static int hpet_init(void)
        if (hpet_period < 100000 || hpet_period > 100000000)
                return -1;

+       /* Logical ticks */
        hpet_tick = (FSEC_PER_TICK + hpet_period / 2) / hpet_period;
+       /* Ticks per real interrupt */
+       hpet_tick_real = hpet_tick * tick_divider;

        hpet_use_timer = (id & HPET_ID_LEGSUP);

-       return hpet_timer_stop_set_go(hpet_tick);
+       return hpet_timer_stop_set_go(hpet_tick_real);
 }

 static int hpet_reenable(void)
 {
-       return hpet_timer_stop_set_go(hpet_tick);
+       return hpet_timer_stop_set_go(hpet_tick_real);
 }

 #define PIT_MODE 0x43
@@ -864,6 +879,7 @@ static void __init __pit_init(int val, u

 void __init pit_init(void)
 {
+       /* LATCH is in actual interrupt ticks */
        __pit_init(LATCH, 0x34); /* binary, mode 2, LSB/MSB, ch 0 */
 }

@@ -1002,7 +1018,7 @@ void time_init_gtod(void)
        if (vxtime.hpet_address && notsc) {
                timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
                if (hpet_use_timer)
-                       vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
+                       vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick_real;
                else
                        vxtime.last = hpet_readl(HPET_COUNTER);
                vxtime.mode = VXTIME_HPET;
@@ -1073,7 +1089,7 @@ static int timer_resume(struct sys_devic
        xtime.tv_nsec = 0;
        if (vxtime.mode == VXTIME_HPET) {
                if (hpet_use_timer)
-                       vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
+                       vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick_real;
                else
                        vxtime.last = hpet_readl(HPET_COUNTER);
 #ifdef CONFIG_X86_PM_TIMER
@@ -1352,3 +1368,22 @@ int __init notsc_setup(char *s)
 }

 __setup("notsc", notsc_setup);
+
+#ifdef CONFIG_TICK_DIVIDER
+
+
+unsigned int tick_divider = 1;
+
+static int __init divider_setup(char *s)
+{
+       unsigned int divider = 1;
+       get_option(&s, &divider);
+       if (divider >= 1 && HZ/divider >= 25)
+               tick_divider = divider;
+       else
+               printk(KERN_ERR "tick_divider: %d is out of range.\n", divider);
+       return 1;
+}
+
+__setup("divider=", divider_setup);
+#endif
Index: linux-2.6.18.noarch/include/asm-i386/mach-default/do_timer.h
===================================================================
--- linux-2.6.18.noarch.orig/include/asm-i386/mach-default/do_timer.h
+++ linux-2.6.18.noarch/include/asm-i386/mach-default/do_timer.h
@@ -16,17 +16,21 @@

 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
-       do_timer(regs);
+       int i;
+       for (i = 0; i < tick_divider; i++) {
+               do_timer(regs);
 #ifndef CONFIG_SMP
-       update_process_times(user_mode_vm(regs));
+               update_process_times(user_mode_vm(regs));
 #endif
+       }
 /*
  * In the SMP case we use the local APIC timer interrupt to do the
  * profiling, except when we simulate SMP mode on a uniprocessor
  * system, in that case we have to call the local interrupt handler.
  */
 #ifndef CONFIG_X86_LOCAL_APIC
-       profile_tick(CPU_PROFILING, regs);
+       for (i = 0; i < tick_divider; i++)
+               profile_tick(CPU_PROFILING, regs);
 #else
        if (!using_apic_timer)
                smp_local_timer_interrupt(regs);
Index: linux-2.6.18.noarch/include/asm-i386/mach-visws/do_timer.h
===================================================================
--- linux-2.6.18.noarch.orig/include/asm-i386/mach-visws/do_timer.h
+++ linux-2.6.18.noarch/include/asm-i386/mach-visws/do_timer.h
@@ -6,20 +6,24 @@

 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
+       int i;
        /* Clear the interrupt */
        co_cpu_write(CO_CPU_STAT,co_cpu_read(CO_CPU_STAT) & ~CO_STAT_TIMEINTR);

-       do_timer(regs);
+       for (i = 0; i < tick_divider; i++) {
+               do_timer(regs);
 #ifndef CONFIG_SMP
-       update_process_times(user_mode_vm(regs));
+               update_process_times(user_mode_vm(regs));
 #endif
+       }
 /*
  * In the SMP case we use the local APIC timer interrupt to do the
  * profiling, except when we simulate SMP mode on a uniprocessor
  * system, in that case we have to call the local interrupt handler.
  */
 #ifndef CONFIG_X86_LOCAL_APIC
-       profile_tick(CPU_PROFILING, regs);
+       for (i = 0; i < tick_divider; i++)
+               profile_tick(CPU_PROFILING, regs);
 #else
        if (!using_apic_timer)
                smp_local_timer_interrupt(regs);
Index: linux-2.6.18.noarch/include/asm-i386/mach-voyager/do_timer.h
===================================================================
--- linux-2.6.18.noarch.orig/include/asm-i386/mach-voyager/do_timer.h
+++ linux-2.6.18.noarch/include/asm-i386/mach-voyager/do_timer.h
@@ -3,12 +3,14 @@

 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
-       do_timer(regs);
+       int i;
+       for (i = 0; i < tick_divider; i++) {
+               do_timer(regs);
 #ifndef CONFIG_SMP
-       update_process_times(user_mode_vm(regs));
+               update_process_times(user_mode_vm(regs));
 #endif
-
-       voyager_timer_interrupt(regs);
+               voyager_timer_interrupt(regs);
+       }
 }

 static inline int do_timer_overflow(int count)
Index: linux-2.6.18.noarch/include/linux/jiffies.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/jiffies.h
+++ linux-2.6.18.noarch/include/linux/jiffies.h
@@ -33,10 +33,21 @@
 # error You lose.
 #endif

+#ifndef CONFIG_TICK_DIVIDER
+#define tick_divider 1
+#else
+extern unsigned int tick_divider;
+#endif
+
+#define REAL_HZ (HZ/tick_divider)
 /* LATCH is used in the interval timer and ftape setup. */
-#define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */
+#define LATCH  ((CLOCK_TICK_RATE + REAL_HZ/2) / REAL_HZ)       /* For divider */
+
+#define LATCH_HPET ((HPET_TICK_RATE + REAL_HZ/2) / REAL_HZ)
+
+#define LOGICAL_LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */

-#define LATCH_HPET ((HPET_TICK_RATE + HZ/2) / HZ)
+#define LOGICAL_LATCH_HPET ((HPET_TICK_RATE + HZ/2) / HZ)

 /* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, the we can
  * improve accuracy by shifting LSH bits, hence calculating:
@@ -51,9 +62,9 @@
                              + ((((NOM) % (DEN)) << (LSH)) + (DEN) / 2) / (DEN))

 /* HZ is the requested value. ACTHZ is actual HZ ("<< 8" is for accuracy) */
-#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH, 8))
+#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LOGICAL_LATCH, 8))

-#define ACTHZ_HPET (SH_DIV (HPET_TICK_RATE, LATCH_HPET, 8))
+#define ACTHZ_HPET (SH_DIV (HPET_TICK_RATE, LOGICAL_LATCH_HPET, 8))

 /* TICK_NSEC is the time between ticks in nsec assuming real ACTHZ */
 #define TICK_NSEC (SH_DIV (1000000UL * 1000, ACTHZ, 8))
Index: linux-2.6.18.noarch/init/calibrate.c
===================================================================
--- linux-2.6.18.noarch.orig/init/calibrate.c
+++ linux-2.6.18.noarch/init/calibrate.c
@@ -26,7 +26,6 @@ __setup("lpj=", lpj_setup);
  * Also, this code tries to handle non-maskable asynchronous events
  * (like SMIs)
  */
-#define DELAY_CALIBRATION_TICKS                        ((HZ < 100) ? 1 : (HZ/100))
 #define MAX_DIRECT_CALIBRATION_RETRIES         5

 static unsigned long __devinit calibrate_delay_direct(void)
@@ -37,6 +36,7 @@ static unsigned long __devinit calibrate
        unsigned long tsc_rate_min, tsc_rate_max;
        unsigned long good_tsc_sum = 0;
        unsigned long good_tsc_count = 0;
+       unsigned long delay_calibration_ticks = ((REAL_HZ < 100) ? 1 : (REAL_HZ/100));
        int i;

        if (read_current_timer(&pre_start) < 0 )
@@ -65,7 +65,7 @@ static unsigned long __devinit calibrate
                pre_start = 0;
                read_current_timer(&start);
                start_jiffies = jiffies;
-               while (jiffies <= (start_jiffies + 1)) {
+               while (jiffies <= (start_jiffies + tick_divider)) {
                        pre_start = start;
                        read_current_timer(&start);
                }
@@ -74,15 +74,18 @@ static unsigned long __devinit calibrate
                pre_end = 0;
                end = post_start;
                while (jiffies <=
-                      (start_jiffies + 1 + DELAY_CALIBRATION_TICKS)) {
+                      (start_jiffies + tick_divider * (1 + delay_calibration_ticks))) {
                        pre_end = end;
                        read_current_timer(&end);
                }
                read_current_timer(&post_end);

-               tsc_rate_max = (post_end - pre_start) / DELAY_CALIBRATION_TICKS;
-               tsc_rate_min = (pre_end - post_start) / DELAY_CALIBRATION_TICKS;
-
+               tsc_rate_max = (post_end - pre_start) / delay_calibration_ticks;
+               tsc_rate_min = (pre_end - post_start) / delay_calibration_ticks;
+
+               tsc_rate_max /= tick_divider;
+               tsc_rate_min /= tick_divider;
+
                /*
                 * If the upper limit and lower limit of the tsc_rate is
                 * >= 12.5% apart, redo calibration.
Index: linux-2.6.18.noarch/arch/i386/Kconfig
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/Kconfig
+++ linux-2.6.18.noarch/arch/i386/Kconfig
@@ -238,6 +238,13 @@ config HPET_EMULATE_RTC
        depends on HPET_TIMER && RTC=y
        default y

+config TICK_DIVIDER
+       bool "Support clock division"
+       default n
+       help
+         Supports the use of clock division allowing the real interrupt
+         rate to be lower than the HZ setting.
+
 config NR_CPUS
        int "Maximum number of CPUs (2-255)"
        range 2 255
