Message-ID: <20210426082316.GA181354@hori.linux.bs1.fc.nec.co.jp>
Date: Mon, 26 Apr 2021 08:23:17 +0000
From: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>
To: Borislav Petkov <bp@...en8.de>
CC: Naoya Horiguchi <nao.horiguchi@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Tony Luck <tony.luck@...el.com>,
Aili Yao <yaoaili@...gsoft.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Oscar Salvador <osalvador@...e.de>,
David Hildenbrand <david@...hat.com>,
Andy Lutomirski <luto@...nel.org>, Jue Wang <juew@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find
error virtual address
On Fri, Apr 23, 2021 at 01:57:25PM +0200, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 02:18:34AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > I don't know exactly. MCE subsystem seems to have code extracting linear
> > address, so I wonder that that could be used as a hint to memory_failure()
> > to find the proper virtual address.
>
> See "Table 15-3. Address Mode in IA32_MCi_MISC[8:6]" in the SDM -
> apparently it can report all kinds of address types, depending on the hw
> incarnation or MCA bank type or whatnot. Tony knows :)
"15.9.3.2 Architecturally Defined SRAR Errors" says that the register
is supposed to hold a physical address:

  For both the data load and instruction fetch errors, the ADDRV and MISCV
  flags in the IA32_MCi_STATUS register are set to indicate that the offending
  physical address information is available from the IA32_MCi_MISC and the
  IA32_MCi_ADDR registers.
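
To make the register check concrete, here is a minimal userspace sketch of
how that passage could be read. It assumes the SDM layout (MISCV is bit 59 and
ADDRV is bit 58 of IA32_MCi_STATUS, the address mode is in IA32_MCi_MISC[8:6]
with 010b meaning physical address, and the recoverable address LSB is in
IA32_MCi_MISC[5:0]). struct bank_regs and bank_phys_addr() are hypothetical
names made up for illustration; this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in IA32_MCi_STATUS (SDM layout). */
#define STATUS_MISCV (1ULL << 59)   /* IA32_MCi_MISC holds valid data */
#define STATUS_ADDRV (1ULL << 58)   /* IA32_MCi_ADDR holds valid data */

/* Fields in IA32_MCi_MISC. */
#define MISC_ADDR_LSB(m)    ((unsigned)((m) & 0x3f))        /* bits 5:0 */
#define MISC_ADDR_MODE(m)   ((unsigned)(((m) >> 6) & 0x7))  /* bits 8:6 */
#define MISC_ADDR_MODE_PHYS 2u      /* 010b: physical address */

/* Hypothetical snapshot of one MCA bank, for illustration only. */
struct bank_regs {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/*
 * Return true and store the physical address in *phys only if both
 * ADDRV and MISCV are set and the address mode says "physical".
 * Address bits below the reported LSB are not valid and are cleared.
 */
static bool bank_phys_addr(const struct bank_regs *b, uint64_t *phys)
{
	const uint64_t need = STATUS_ADDRV | STATUS_MISCV;
	unsigned lsb;

	if ((b->status & need) != need)
		return false;
	if (MISC_ADDR_MODE(b->misc) != MISC_ADDR_MODE_PHYS)
		return false;

	lsb = MISC_ADDR_LSB(b->misc);
	*phys = b->addr & ~((1ULL << lsb) - 1);
	return true;
}
```

With the address mode set to anything other than 010b (for example a linear
address), the helper refuses to treat IA32_MCi_ADDR as a physical address.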
> > The situation in question is caused by action required MCE, so
> > we know which process we should send SIGBUS to. So if we choose
> > to send SIGBUS to all, no innocent bystanders would be affected.
> > But when the process have multiple virtual addresses associated
> > with the error physical address, the process receives multiple
> > SIGBUSs and all but one have wrong value in si_addr in siginfo_t,
> > so that's confusing.
>
> Is that scenario real or hypothetical?
>
> Because I'd expect that if we send it a SIGBUS and we poison that page,
> then all the VAs mapping it will have to handle the situation that that
> page has been poisoned and pulled from under them.
IIUC, the above should already be done by the first MCE handling. In the
"already hwpoisoned" case, the page has already been poisoned and all mappings
of it should already be unmapped, so the only additional thing we need to do
is send SIGBUS to tell the application that it should take some action or
abort immediately.
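
For reference, this is roughly what the receiving side looks like. The sketch
below shows how a userspace SIGBUS handler might tell a memory-error SIGBUS
apart from other bus errors and pick up si_addr. BUS_MCEERR_AR and
BUS_MCEERR_AO are the si_code values Linux uses for action-required and
action-optional memory errors; enum mce_kind and classify_sigbus() are
hypothetical names for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Fallbacks in case the libc headers do not provide these Linux values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification used by this sketch only. */
enum mce_kind {
	MCE_NONE,             /* not a memory-error SIGBUS */
	MCE_ACTION_REQUIRED,  /* the process consumed poisoned data */
	MCE_ACTION_OPTIONAL   /* poison detected, not yet consumed */
};

/*
 * Inspect a siginfo_t as passed to an SA_SIGINFO handler. For memory
 * errors, *fault_addr receives si_addr, the virtual address the kernel
 * reports for the error; otherwise it is left untouched.
 */
static enum mce_kind classify_sigbus(const siginfo_t *si, void **fault_addr)
{
	if (si->si_signo != SIGBUS)
		return MCE_NONE;

	switch (si->si_code) {
	case BUS_MCEERR_AR:
		*fault_addr = si->si_addr;
		return MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		*fault_addr = si->si_addr;
		return MCE_ACTION_OPTIONAL;
	default:
		return MCE_NONE;
	}
}
```

A handler installed with sigaction() and SA_SIGINFO would call this on its
siginfo_t argument. An application that recovers by discarding or reloading
the affected region relies on si_addr pointing at the address it actually
accessed, which is why a wrong si_addr is confusing.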
Thanks,
Naoya Horiguchi