linux-kernel - Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned


Message-ID: <20210406090444.2a69b9e2@alex-virtual-machine>
Date:   Tue, 6 Apr 2021 09:04:44 +0800
From:   Aili Yao <yaoaili@...gsoft.com>
To:     "HORIGUCHI NAOYA(堀口　直也)" 
        <naoya.horiguchi@....com>, "Luck, Tony" <tony.luck@...el.com>
CC:     Oscar Salvador <osalvador@...e.de>,
        "david@...hat.com" <david@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "yangfeng1@...gsoft.com" <yangfeng1@...gsoft.com>,
        "sunhao2@...gsoft.com" <sunhao2@...gsoft.com>,
        <yaoaili@...gsoft.com>
Subject: Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already
 poisoned

On Mon, 5 Apr 2021 13:50:18 +0000
HORIGUCHI NAOYA(堀口　直也) <naoya.horiguchi@....com> wrote:

> On Fri, Apr 02, 2021 at 03:11:20PM +0000, Luck, Tony wrote:
> > >> Combined with my "mutex" patch (to get rid of races where 2nd process returns
> > >> early, but first process is still looking for mappings to unmap and tasks
> > >> to signal) this patch moves forward a bit. But I think it needs an
> > >> additional change here in kill_me_maybe() to just "return" if there is an
> > >> EHWPOISON return from memory_failure()
> > >> 
> > > Got this, Thanks for your reply!
> > > I will dig into this!
> > 
> > One problem with this approach is when the first task to find poison
> > fails to complete actions. Then the poison pages are not unmapped,
> > and just returning from kill_me_maybe() gets into a loop :-(
> 
> Yes, that's the pain point.  We need to send SIGBUS to the current process in
> the "already hardware poisoned" case of memory_failure().  SIGBUS should
> contain the error virtual address, but unfortunately walking the page table
> or using p->mce_vaddr is not always reliable now.
> 
> So as a second-best approach, we can extend the "walking page table"
> approach such that we walk over the whole virtual address space to make sure
> that the number of entries pointing to the error page is exactly 1.
> If that's the case, then we can confidently send SIGBUS with it.  If we find
> multiple entries pointing to the error page, then we give up guessing and
> send a normal SIGBUS to the current process.  That's not worse than now,
> and I think we need to wait in the hope that the virtual address will be
> available in the MCE handler.
> 
> Anyway I'll try to write a patch for this.

Yeah, the previous patch didn't address the multiple virtual address issue. If there is a way
to fix that, that would be great!
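
For reference, the "exactly one entry" rule quoted above could be sketched
roughly as follows. This is a userspace illustration only, not kernel code:
struct mapping, struct sigbus_decision and decide_sigbus() are hypothetical
names invented here, and a flat array stands in for the real page table walk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one page table entry: vaddr -> pfn. */
struct mapping {
	uintptr_t vaddr;
	unsigned long pfn;
};

enum sigbus_kind {
	SIGBUS_WITH_ADDR,	/* exactly one entry: report its address */
	SIGBUS_PLAIN		/* zero or several entries: do not guess */
};

struct sigbus_decision {
	enum sigbus_kind kind;
	uintptr_t vaddr;	/* meaningful only for SIGBUS_WITH_ADDR */
};

/*
 * Walk every entry (no early exit) and count those pointing to the
 * poisoned pfn.  Only a count of exactly 1 yields an address.
 */
static struct sigbus_decision decide_sigbus(const struct mapping *maps,
					    size_t n,
					    unsigned long poisoned_pfn)
{
	struct sigbus_decision d = { SIGBUS_PLAIN, 0 };
	size_t hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (maps[i].pfn != poisoned_pfn)
			continue;
		hits++;
		d.vaddr = maps[i].vaddr;
	}

	if (hits == 1)
		d.kind = SIGBUS_WITH_ADDR;
	else
		d.vaddr = 0;	/* ambiguous or absent: drop the guess */

	return d;
}
```

With two entries pointing to the same pfn the sketch falls back to
SIGBUS_PLAIN, which matches the "give up guessing" case in the quoted text.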

-- 
Thanks!
Aili Yao