linux-kernel - Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210304101653.546a9da1@alex-virtual-machine>
Date:   Thu, 4 Mar 2021 10:16:53 +0800
From:   Aili Yao <yaoaili@...gsoft.com>
To:     "Luck, Tony" <tony.luck@...el.com>
CC:     "HORIGUCHI NAOYA堀口　直也)" 
        <naoya.horiguchi@....com>, Oscar Salvador <osalvador@...e.de>,
        "david@...hat.com" <david@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "yangfeng1@...gsoft.com" <yangfeng1@...gsoft.com>,
        <yaoaili@...gsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

On Wed, 3 Mar 2021 15:41:35 +0000
"Luck, Tony" <tony.luck@...el.com> wrote:

> > For error address with sigbus, i think this is not an issue resulted by the patch i post, before my patch, the issue is already there.
> > I don't find a realizable way to get the correct address for same reason --- we don't know whether the page mapping is there or not when
> > we got to kill_me_maybe(), in some case, we may get it, but there are a lot of parallel issue need to consider, and if failed we have to fallback
> > to the error brach again, remaining current code may be an easy option;  
> 
> My RFC patch from yesterday removes the uncertainty about whether the page is there or not. After it walks the page
> tables we know that the poison page isn't mapped (note that patch is RFC for a reason ... I'm 90% sure that it should
> do a bit more that just clear the PRESENT bit).
> 
> So perhaps memory_failure() has queued a SIGBUS for this task, if so, we take it when we return from kill_me_maybe()
> 
> If not, we will return to user mode and re-execute the failing instruction ... but because the page is unmapped we will take a #PF

Got this, I have some error thoughts here.


> The x86 page fault handler will see that the page for this physical address is marked HWPOISON, and it will send the SIGBUS
> (just like it does if the page had been removed by an earlier UCNA/SRAO error).

if your methods works, should it be like this?

1582                         pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
1583                         if (PageHuge(page)) {
1584                                 hugetlb_count_sub(compound_nr(page), mm);
1585                                 set_huge_swap_pte_at(mm, address,
1586                                                      pvmw.pte, pteval,
1587                                                      vma_mmu_pagesize(vma));
1588                         } else {
1589                                 dec_mm_counter(mm, mm_counter(page));
1590                                 set_pte_at(mm, address, pvmw.pte, pteval);
1591                         }

the page fault check if it's a poison page using is_hwpoison_entry(),

-- 
Thanks!
Aili Yao