[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3f1a065b0801171706o57ae5d6w8a986db508ddcee4@mail.gmail.com>
Date: Thu, 17 Jan 2008 17:06:33 -0800
From: "Russell Leidich" <rml@...gle.com>
To: "Andi Kleen" <andi@...stfloor.org>,
"Torsten Kaiser" <just.for.lkml@...glemail.com>,
"Andrew Morton" <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
"Thomas Gleixner" <tglx@...utronix.de>,
"Ingo Molnar" <mingo@...e.hu>, valdis.kletnieks@...edu,
"Tim Hockin" <thockin@...gle.com>
Subject: Re: [PATCH] AMD Thermal Interrupt Support
All,
Sorry, but the Google code review process has turned up a subtle issue
with my formerly submitted patch. It is fixed in this newer patch
(attached). Otherwise, the functional description is exactly as
explained below.
The issue is that there were 2 printk()s which printed the northbridge
PCI device number associated with throttling initialization and
detection. These printk()s assumed that the northbridges were
enumerated in-order. While this is highly likely to be the case, it
should not have been assumed. In the event that it is _not_ the case,
then the PCI device numbers printed out would have been wrong.
Since I had to respin the patch, I took the opportunity to fix a few
minor issues as well: (1) Changed the contents of the aforementioned
printk()s to reveal the PCI bus, function, and register which relate
to the throttling message -- not just the device. (2) Deleted all
mention of "AMD Quad Core", leaving only "Family 0x10". (3) Fixed one
comment with incorrect grammar. (4) Fixed punctuation in a printk().
I sincerely hope that this will be the last iteration. Thanks to all
who contributed to the review, here and at Google!
On Jan 10, 2008 6:21 PM, Russell Leidich <rml@...gle.com> wrote:
> All,
>
> Here's the hopefully-final version of the patch, which I have just
> tested on Intel and AMD.
>
> In my AMD test, I happened to discover that although MCEs were being
> logged, the MCE counter at
> /sys/devices/system/cpu/cpu(whatever)/thermal_throttle/count was not
> being updated. I fixed this by (1) moving smp_thermal_init from
> late_initcall to device_initcall in mce_amd_64.c (2) moving
> thermal_throttle_init_device from device_initcall to late_initcall in
> therm_throt.c.
>
> Thanks to Andi for his notes on how to hack up an indirect call in
> AT&T-style X86-64.
>
> To reiterate my earlier general description of the patch:
>
> This patch adds thermal machine check event (MCE) support to AMD
> Barcelona (and probably later, if new PCI IDs are added to
> k8_northbridges[]), styled after the same in the Intel code. The
> initialization consists of 3 parts: (1) northbridge, which enables the
> delivery of the interrupt to the local APIC, (2) APIC, which enables
> delivery to X86, and (3) hotplug support in threshold_cpu_callback(),
> which accomplishes #2 for CPUs which (re)enter the OS later.
>
> Whenever the temperature reaches the throttling threshold programmed
> into a northbridge register (by the BIOS -- my code doesn't change
> this value), a thermal interrupt is delivered. The interrupt is
> delivered to the vector specified in the thermal APIC register at
> (APIC_BASE + 0x330), which is identical to Intel. Because the vector
> register is the same between both architectures, and because I don't
> know which brand of CPU is present until runtime (if either),
> thermal_interrupt in entry_64.S will branch to smp_thermal_interrupt
> in mce_thermal.c. In turn, smp_thermal_interrupt will branch to the
> CPU-specific code for handling the interrupt. (Apart from the common
> vector location, AMD and Intel use radically different architectures
> for the purpose of reporting throttling events.) At that point, an
> MCE is logged if and only if the temperature has made a low-to-high
> transition. Rate limiting is employed so as not to spam the log.
>
> --
> Russell Leidich
>
--
Russell Leidich
View attachment "patch.txt" of type "text/plain" (12276 bytes)
Powered by blists - more mailing lists