linux-kernel - Re: [Question]: major faults are still triggered after mlockall when numa balancing


Message-ID: <dadffb1c-4491-b242-5568-b661e32cca1f@huawei.com>
Date:   Fri, 10 Nov 2023 17:39:43 +0800
From:   "zhangpeng (AS)" <zhangpeng362@...wei.com>
To:     Matthew Wilcox <willy@...radead.org>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <akpm@...ux-foundation.org>, <lstoakes@...il.com>,
        <hughd@...gle.com>, <david@...hat.com>, <fengwei.yin@...el.com>,
        <vbabka@...e.cz>, <peterz@...radead.org>, <mgorman@...e.de>,
        <mingo@...hat.com>, <riel@...hat.com>, <ying.huang@...el.com>,
        <hannes@...xchg.org>, Nanyong Sun <sunnanyong@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [Question]: major faults are still triggered after mlockall when
 numa balancing

On 2023/11/10 1:27, Matthew Wilcox wrote:

> On Thu, Nov 09, 2023 at 09:47:24PM +0800, zhangpeng (AS) wrote:
>> There is a stage in numa fault which will set pte as 0 in do_numa_page() :
>> ptep_modify_prot_start() will clear the vmf->pte, until
>> ptep_modify_prot_commit() assign a value to the vmf->pte.
> [...]
>
>> Our problem scenario is as follows:
>>
>> task 1                      task 2
>> ------                      ------
>> /* scan global variables */
>> do_numa_page()
>>    spin_lock(vmf->ptl)
>>    ptep_modify_prot_start()
>>    /* set vmf->pte as null */
>>                              /* Access global variables */
>>                              handle_pte_fault()
>>                                /* no pte lock */
>>                                do_pte_missing()
>>                                  do_fault()
>>                                    do_read_fault()
>>    ptep_modify_prot_commit()
>>    /* ptep update done */
>>    pte_unmap_unlock(vmf->pte, vmf->ptl)
>>                                      do_fault_around()
>>                                      __do_fault()
>>                                        filemap_fault()
>>                                          /* page cache is not available
>>                                          and a major fault is triggered */
>>                                          do_sync_mmap_readahead()
>>                                          /* page_not_uptodate and goto
>>                                          out_retry. */
>>
>> Is there any way to avoid such a major fault?
> Yes, this looks like a bug.
>
> It seems to me that the easiest way to fix this is not to zero the pte
> but to make it protnone?  That would send task 2 into do_numa_page()
> where it would take the ptl, then check pte_same(), see that it's
> changed and goto out, which will end up retrying the fault.
>
> I'm not particularly expert at page table manipulation, so I'll let
> somebody who is propose an actual patch.  Or you could try to do it?
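
The difference between the two transient PTE values discussed above can be illustrated with a small userspace model. This is only a sketch: the `toy_` names and the bit values are hypothetical and do not match the kernel's real PTE layout or function signatures. It models just two ideas from the thread: an all-zero entry is dispatched to the "missing" path (which can reach filemap_fault()), while a protnone entry is dispatched to the NUMA hinting path, where a changed entry leads to a retry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE encoding: hypothetical bits, NOT the kernel's real layout. */
#define TOY_PTE_PRESENT  0x1u
#define TOY_PTE_PROTNONE 0x2u

typedef uint64_t toy_pte_t;

enum toy_handler { TOY_DO_PTE_MISSING, TOY_DO_NUMA_PAGE, TOY_NO_FAULT };

/*
 * Models the dispatch described in the thread: the faulting task reads
 * the entry without holding the PTE lock and picks a handler from that
 * snapshot. An empty entry looks like a never-populated mapping.
 */
static enum toy_handler toy_dispatch(toy_pte_t snapshot)
{
    if (snapshot == 0)
        return TOY_DO_PTE_MISSING;
    if (snapshot & TOY_PTE_PROTNONE)
        return TOY_DO_NUMA_PAGE;
    return TOY_NO_FAULT;
}

/*
 * Models the pte_same() check mentioned above: once the lock is held,
 * the handler compares its snapshot with the current entry and backs
 * out (so the fault is retried) if the entry changed in the meantime.
 */
static bool toy_numa_fault_must_retry(toy_pte_t snapshot, toy_pte_t current)
{
    return snapshot != current;
}
```

In this model, a task that snapshots the entry while it is zero gets `TOY_DO_PTE_MISSING`, which corresponds to the do_fault() path in the scenario. If the transient value were protnone instead, the same task would get `TOY_DO_NUMA_PAGE`, and after the other task commits the final entry, `toy_numa_fault_must_retry()` would report that the snapshot is stale.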

Thank you for your reply.
Sorry, I'm not particularly good at page table manipulation either.
It would be great if somebody who is better at this part could help
solve it.
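
For anyone who wants to observe the symptom from userspace, here is a minimal sketch of how major faults can be counted around a workload after mlockall(). It is not the test program used for this report; the helper names and the sample workload are made up for illustration. It relies only on mlockall() and on getrusage() with its ru_majflt field.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical sample data standing in for the global variables. */
static char sample_data[16 * 4096];

/* Returns the process-wide major fault count so far, or -1 on error. */
static long major_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/*
 * Locks current and future mappings into memory. Returns 0 on success
 * and -1 on failure, for example when RLIMIT_MEMLOCK is too low.
 */
static int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Sample workload: read one byte at every 4096-byte step of the array. */
static void touch_sample_data(void)
{
    volatile char sink = 0;

    for (size_t i = 0; i < sizeof(sample_data); i += 4096)
        sink += sample_data[i];
    (void)sink;
}

/* Returns the number of major faults incurred by workload(), or -1. */
static long major_faults_during(void (*workload)(void))
{
    long before = major_faults();

    workload();

    long after = major_faults();

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A caller would run `lock_all_memory()` once and then check `major_faults_during(touch_sample_data)` repeatedly. With everything locked, the expectation is a delta of zero, so a non-zero delta while NUMA balancing is enabled would be consistent with the race described above.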

-- 
Best Regards,
Peng