linux-kernel - Re: [PATCH] AMD Thermal Interrupt Support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3f1a065b0801171706o57ae5d6w8a986db508ddcee4@mail.gmail.com>
Date:	Thu, 17 Jan 2008 17:06:33 -0800
From:	"Russell Leidich" <rml@...gle.com>
To:	"Andi Kleen" <andi@...stfloor.org>,
	"Torsten Kaiser" <just.for.lkml@...glemail.com>,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Ingo Molnar" <mingo@...e.hu>, valdis.kletnieks@...edu,
	"Tim Hockin" <thockin@...gle.com>
Subject: Re: [PATCH] AMD Thermal Interrupt Support

All,

Sorry, but the Google code review process has turned up a subtle issue
with my formerly submitted patch.  It is fixed in this newer patch
(attached).  Otherwise, the functional description is exactly as
explained below.

The issue is that there were 2 printk()s which printed the northbridge
PCI device number associated with throttling initialization and
detection.  These printk()s assumed that the northbridges were
enumerated in-order.  While this is highly likely to be the case, it
should not have been assumed.  In the event that it is _not_ the case,
then the PCI device numbers printed out would have been wrong.

Since I had to respin the patch, I took the opportunity to fix a few
minor issues as well: (1) Changed the contents of the aforementioned
printk()s to reveal the PCI bus, function, and register which relate
to the throttling message -- not just the device.  (2) Deleted all
mention of "AMD Quad Core", leaving only "Family 0x10".  (3) Fixed one
comment with incorrect grammar.  (4) Fixed punctuation in a printk().

I sincerely hope that this will be the last iteration.  Thanks to all
who contributed to the review, here and at Google!

On Jan 10, 2008 6:21 PM, Russell Leidich <rml@...gle.com> wrote:
> All,
>
> Here's the hopefully-final version of the patch, which I have just
> tested on Intel and AMD.
>
> In my AMD test, I happened to discover that although MCEs were being
> logged, the MCE counter at
> /sys/devices/system/cpu/cpu(whatever)/thermal_throttle/count was not
> being updated.  I fixed this by (1) moving smp_thermal_init from
> late_initcall to device_initcall in mce_amd_64.c (2) moving
> thermal_throttle_init_device from device_initcall to late_initcall in
> therm_throt.c.
>
> Thanks to Andi for his notes on how to hack up an indirect call in
> AT&T-style X86-64.
>
> To reiterate my earlier general description of the patch:
>
> This patch adds thermal machine check event (MCE) support to AMD
> Barcelona (and probably later, if new PCI IDs are added to
> k8_northbridges[]), styled after the same in the Intel code.  The
> initialization consists of 3 parts: (1) northbridge, which enables the
> delivery of the interrupt to the local APIC, (2) APIC, which enables
> delivery to X86, and (3) hotplug support in threshold_cpu_callback(),
> which accomplishes #2 for CPUs which (re)enter the OS later.
>
> Whenever the temperature reaches the throttling threshold programmed
> into a northbridge register (by the BIOS -- my code doesn't change
> this value), a thermal interrupt is delivered.  The interrupt is
> delivered to the vector specified in the thermal APIC register at
> (APIC_BASE + 0x330), which is identical to Intel.  Because the vector
> register is the same between both architectures, and because I don't
> know which brand of CPU is present until runtime (if either),
> thermal_interrupt in entry_64.S will branch to smp_thermal_interrupt
> in mce_thermal.c.  In turn, smp_thermal_interrupt will branch to the
> CPU-specific code for handling the interrupt.  (Apart from the common
> vector location, AMD and Intel use radically different architectures
> for the purpose of reporting throttling events.)  At that point, an
> MCE is logged if and only if the temperature has made a low-to-high
> transition.  Rate limiting is employed so as not to spam the log.
>
> --
> Russell Leidich
>



-- 
Russell Leidich

View attachment "patch.txt" of type "text/plain" (12276 bytes)