Date:   Wed, 29 Jun 2022 11:38:26 -0600
From:   Khalid Aziz <khalid.aziz@...cle.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Arnd Bergmann <arnd@...db.de>,
        Jonathan Corbet <corbet@....net>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>, ebiederm@...ssion.com,
        hagen@...u.net, jack@...e.cz, Kees Cook <keescook@...omium.org>,
        kirill@...temov.name, kucharsk@...il.com, linkinjeon@...nel.org,
        linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, longpeng2@...wei.com,
        Andy Lutomirski <luto@...nel.org>, markhemm@...glemail.com,
        Peter Collingbourne <pcc@...gle.com>,
        Mike Rapoport <rppt@...nel.org>, sieberf@...zon.com,
        sjpark@...zon.de, Suren Baghdasaryan <surenb@...gle.com>,
        tst@...oebel-theuer.de, Iurii Zaikin <yzaikin@...gle.com>
Subject: Re: [PATCH v1 09/14] mm/mshare: Do not free PTEs for mshare'd PTEs

On 5/30/22 22:24, Barry Song wrote:
> On Tue, Apr 12, 2022 at 4:07 AM Khalid Aziz <khalid.aziz@...cle.com> wrote:
>>
>> mshare'd PTEs should not be removed when a task exits. These PTEs
>> are removed when the last task sharing the PTEs exits. Add a check
>> for shared PTEs and skip them.
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@...cle.com>
>> ---
>>   mm/memory.c | 22 +++++++++++++++++++---
>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index c77c0d643ea8..e7c5bc6f8836 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -419,16 +419,25 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>                  } else {
>>                          /*
>>                           * Optimization: gather nearby vmas into one call down
>> +                        * as long as they all belong to the same mm (that
>> +                        * may not be the case if a vma is part of an
>> +                        * mshare'd range)
>>                           */
>>                          while (next && next->vm_start <= vma->vm_end + PMD_SIZE
>> -                              && !is_vm_hugetlb_page(next)) {
>> +                              && !is_vm_hugetlb_page(next)
>> +                              && vma->vm_mm == tlb->mm) {
>>                                  vma = next;
>>                                  next = vma->vm_next;
>>                                  unlink_anon_vmas(vma);
>>                                  unlink_file_vma(vma);
>>                          }
>> -                       free_pgd_range(tlb, addr, vma->vm_end,
>> -                               floor, next ? next->vm_start : ceiling);
>> +                       /*
>> +                        * Free pgd only if pgd is not allocated for an
>> +                        * mshare'd range
>> +                        */
>> +                       if (vma->vm_mm == tlb->mm)
>> +                               free_pgd_range(tlb, addr, vma->vm_end,
>> +                                       floor, next ? next->vm_start : ceiling);
>>                  }
>>                  vma = next;
>>          }
>> @@ -1551,6 +1560,13 @@ void unmap_page_range(struct mmu_gather *tlb,
>>          pgd_t *pgd;
>>          unsigned long next;
>>
>> +       /*
>> +        * If this is an mshare'd page, do not unmap it since it might
>> +        * still be in use.
>> +        */
>> +       if (vma->vm_mm != tlb->mm)
>> +               return;
>> +
> 
> Apart from unmap, have you ever tested reverse mapping in vmscan,
> especially folio_referenced()? Are all the vmas in those processes
> sharing the page table still in the rmap of the shared page?
> Without shared PTEs, if 1000 processes share one page, we are reading
> 1000 PTEs; with them, are we reading just one, or are we reading the
> same PTE 1000 times? Have you tested it?
> 

We are treating an mshare'd region the same as threads sharing an address 
space. There is one PTE used by all processes, and the vma maintained in the 
separate mshare mm_struct (which also holds the shared PTE) is the one that 
gets added to rmap. The model does differ with mshare in that it introduces an 
mm_struct separate from the mm_structs of the processes that refer to the vma 
and PTE held in the mshare mm_struct. Do you see issues with rmap in this 
model?

Thanks,
Khalid
