Message-ID: <987664A83D2D224EAE907B061CE93D5301EA9704CD@orsmsx505.amr.corp.intel.com>
Date: Wed, 7 Sep 2011 22:16:04 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Minskey Guo <chaohong_guo@...ux.intel.com>,
Chen Gong <gong.chen@...ux.intel.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>, Borislav Petkov <bp@...64.org>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Subject: RE: [PATCH 5/5] mce: recover from "action required" errors reported
in data path in usermode
> __memory_failure() handling calls some routines, such
> as is_free_buddy_page(), which needs to acquire the spin
> lock, zone->lock. How can we guarantee that other CPUs
> haven't acquired the lock when receiving #mc broadcast
> and entering #mc handlers ?
By the time I call __memory_failure(), the other CPUs have
been released from the MCE handler, so they are back executing
normal code.
But Chen Gong's earlier comments made me look again at the entry_64.S
code, and I realized that I had missed the code in the return
path from do_machine_check() that switches from the MCE stack to
the regular kernel stack before processing TIF_MCE_NOTIFY.
I may go back and revisit an approach I looked at earlier: change
do_machine_check() from a "void" return to "unsigned long" and have
it return the address for the "AR" case and "0" otherwise.
Then we could switch off the machine check stack to non-MCE
context before calling __memory_failure(). When I looked at this
before, the entry_64.S path looked plausible. The 32-bit
path looked painful (too many macros in entry_32.S).
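Roughly, the return-value idea could be modeled like this. This is
a user-space sketch only, not kernel code: struct mce_event_sketch,
do_machine_check_sketch(), mce_return_path_sketch() and
memory_failure_stub() are all made-up names for illustration, and the
stack switch that entry_64.S would perform is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the decoded machine check state (not a kernel type). */
struct mce_event_sketch {
	bool action_required;     /* "AR" error reported in the data path */
	unsigned long phys_addr;  /* address of the affected memory */
};

static unsigned long last_failed_addr;  /* last address handed to the stub */
static int failure_calls;               /* how many times the stub ran */

/* Stub standing in for __memory_failure(); the real routine may take zone->lock. */
static void memory_failure_stub(unsigned long addr)
{
	last_failed_addr = addr;
	failure_calls++;
}

/*
 * Models the part that would run on the MCE stack: classify the event and
 * return the address for the "AR" case, 0 otherwise. No recovery work here.
 * Assumption of this sketch: 0 is never a valid address to recover.
 */
static unsigned long do_machine_check_sketch(const struct mce_event_sketch *ev)
{
	assert(ev != NULL);
	return ev->action_required ? ev->phys_addr : 0;
}

/* Models the return path: recovery runs only after leaving the MCE stack. */
static void mce_return_path_sketch(const struct mce_event_sketch *ev)
{
	unsigned long addr = do_machine_check_sketch(ev);

	/* In the real entry code, the switch to the regular kernel stack goes here. */
	if (addr != 0)
		memory_failure_stub(addr);
}
```

The point of the split is that the recovery call is made from the
caller, after the handler has returned, and never from inside the
handler itself.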
-Tony