lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874jhugom8.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Fri, 10 Nov 2023 13:32:15 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     "zhangpeng (AS)" <zhangpeng362@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        lstoakes@...il.com, hughd@...gle.com, david@...hat.com,
        fengwei.yin@...el.com, vbabka@...e.cz, peterz@...radead.org,
        mgorman@...e.de, mingo@...hat.com, riel@...hat.com,
        hannes@...xchg.org, Nanyong Sun <sunnanyong@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [Question]: major faults are still triggered after mlockall
 when numa balancing

Matthew Wilcox <willy@...radead.org> writes:

> On Thu, Nov 09, 2023 at 09:47:24PM +0800, zhangpeng (AS) wrote:
>> There is a stage in numa fault which will set pte as 0 in do_numa_page() :
>> ptep_modify_prot_start() will clear the vmf->pte, until
>> ptep_modify_prot_commit() assign a value to the vmf->pte.
>
> [...]
>
>> Our problem scenario is as follows:
>> 
>> task 1                      task 2
>> ------                      ------
>> /* scan global variables */
>> do_numa_page()
>>   spin_lock(vmf->ptl)
>>   ptep_modify_prot_start()
>>   /* set vmf->pte as null */
>>                             /* Access global variables */
>>                             handle_pte_fault()
>>                               /* no pte lock */
>>                               do_pte_missing()
>>                                 do_fault()
>>                                   do_read_fault()
>>   ptep_modify_prot_commit()
>>   /* ptep update done */
>>   pte_unmap_unlock(vmf->pte, vmf->ptl)
>>                                     do_fault_around()
>>                                     __do_fault()
>>                                       filemap_fault()
>>                                         /* page cache is not available
>>                                         and a major fault is triggered */
>>                                         do_sync_mmap_readahead()
>>                                         /* page_not_uptodate and goto
>>                                         out_retry. */
>> 
>> Is there any way to avoid such a major fault?
>
> Yes, this looks like a bug.
>
> It seems to me that the easiest way to fix this is not to zero the pte
> but to make it protnone?  That would send task 2 into do_numa_page()
> where it would take the ptl, then check pte_same(), see that it's
> changed and goto out, which will end up retrying the fault.

There are other places in the kernel where the PTE is cleared, for
example, move_ptes() in mremap.c.  IIUC, we need to audit all them.

Another possible solution is to check PTE again with PTL held before
reading in file data.  This will increase the overhead of major fault
path.  Is it acceptable?

> I'm not particularly expert at page table manipulation, so I'll let
> somebody who is propose an actual patch.  Or you could try to do it?

--
Best Regards,
Huang, Ying

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ