Message-ID: <1e95a6e4-9993-40ae-b563-44b7024da25c@redhat.com>
Date: Tue, 27 Aug 2024 19:35:48 +0200
From: David Hildenbrand <david@...hat.com>
To: zhiguojiang <justinjiang@...o.com>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, oe-lkp@...ts.linux.dev, oliver.sang@...el.com
Cc: opensource.kernel@...o.com
Subject: Re: [PATCH v2] vma remove the unneeded avc bound with non-CoWed folio

On 27.08.24 03:50, zhiguojiang wrote:
> 
> 
> On 2024/8/27 1:24, David Hildenbrand wrote:
>> On 23.08.24 16:01, Zhiguo Jiang wrote:
>>> After a CoW in do_wp_page(), the vma is mapped to the CoWed folio
>>> instead of the non-CoWed folio. However, when vma->anon_vma and the
>>> non-CoWed folio's anon_vma are not the same, the avc binding between
>>> them is no longer needed, so it is a problem that this avc binding
>>> still exists.
>>>
>>> This patch removes the avc binding between the vma and the non-CoWed
>>> folio's anon_vma in the case where the vma and the folio each have
>>> their own independent anon_vma. This also reduces rmap overhead.
>>>
>>> Signed-off-by: Zhiguo Jiang <justinjiang@...o.com>
>>> ---
>>> -v2:
>>>    * Solve the kernel test robot noticed "WARNING"
>>>      Reported-by: kernel test robot <oliver.sang@...el.com>
>>>      Closes: https://lore.kernel.org/oe-lkp/202408230938.43f55b4-lkp@intel.com
>>>    * Update comments to more accurately describe this patch.
>>>
>>> -v1: https://lore.kernel.org/linux-mm/20240820143359.199-1-justinjiang@vivo.com/
>>>
>>>    include/linux/rmap.h |  1 +
>>>    mm/memory.c          |  8 +++++++
>>>    mm/rmap.c            | 53 ++++++++++++++++++++++++++++++++++++++++++++
>>>    3 files changed, 62 insertions(+)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 91b5935e8485..8607d28a3146
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -257,6 +257,7 @@ void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
>>>        folio_remove_rmap_ptes(folio, page, 1, vma)
>>>    void folio_remove_rmap_pmd(struct folio *, struct page *,
>>>            struct vm_area_struct *);
>>> +void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
>>>      void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
>>>            unsigned long address, rmap_t flags);
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 93c0c25433d0..4c89cb1cb73e
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3428,6 +3428,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>>>                 * old page will be flushed before it can be reused.
>>>                 */
>>>                folio_remove_rmap_pte(old_folio, vmf->page, vma);
>>> +
>>> +            /*
>>> +             * If the new_folio's anon_vma is different from the
>>> +             * old_folio's anon_vma, the avc binding relationship
>>> +             * between vma and the old_folio's anon_vma is removed,
>>> +             * avoiding rmap redundant overhead.
>>> +             */
>>> +            folio_remove_anon_avc(old_folio, vma);
>>
>> ... by increasing write fault latency, introducing an RMAP walk (!)? Hmm?
>>
>> On the reuse path, we do a folio_move_anon_rmap(), to optimize that.
>>
> Thanks for your comments. This may not be a good fixup patch. The
> reuse path's folio_move_anon_rmap() seems to apply only to exclusive
> or _refcount == 1 folios. The fork() path seems to clear the exclusive
> flag in copy_page_range() --> ... --> __folio_try_dup_anon_rmap().
> However, I observed lots of orphan avcs in the above debug trace logs
> in wp_page_copy(). But after discussing with Mika, it seems they may
> not be removable.

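For readers following along, the reuse-versus-copy decision discussed above can be sketched as a small userspace toy model. This is only an illustration under stated assumptions: the toy_* types and toy_wp_fault() are hypothetical stand-ins, not kernel code, and they only mirror the description in this thread (an exclusive or single-reference folio is reused and, if needed, moved to the faulting vma's anon_vma; anything else is copied and left alone).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real structures. */
struct toy_anon_vma { int id; };

struct toy_vma { struct toy_anon_vma *anon_vma; };

struct toy_folio {
	int refcount;                   /* references held on the folio */
	bool anon_exclusive;            /* cleared when fork() shares the folio */
	struct toy_anon_vma *anon_vma;  /* anon_vma the folio is tied to */
};

enum toy_wp_action { TOY_WP_REUSE, TOY_WP_COPY };

/* Toy write-fault decision, assuming the behaviour described in the thread. */
enum toy_wp_action toy_wp_fault(struct toy_folio *folio,
				const struct toy_vma *vma)
{
	assert(folio && vma && vma->anon_vma);

	/* Already exclusive to this process: reuse in place. */
	if (folio->anon_exclusive)
		return TOY_WP_REUSE;

	/* Last reference: retarget the folio at the faulting vma's anon_vma
	 * (this models folio_move_anon_rmap()) and reuse it. */
	if (folio->refcount == 1) {
		folio->anon_vma = vma->anon_vma;
		folio->anon_exclusive = true;
		return TOY_WP_REUSE;
	}

	/* Still shared: the caller copies; the old folio is not touched. */
	return TOY_WP_COPY;
}
```

In this model the copy path never touches the old folio's anon_vma, so no extra rmap work is added to the write fault. The real kernel checks are more involved than this sketch.
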
Was this patch ever tested? I cannot even boot a simple VM without an endless stream of warnings like this:

[    5.804598] ------------[ cut here ]------------
[    5.805494] WARNING: CPU: 11 PID: 595 at mm/rmap.c:443 unlink_anon_vmas+0x19b/0x1d0
[    5.806962] Modules linked in: qemu_fw_cfg
[    5.807762] CPU: 11 UID: 0 PID: 595 Comm: dracut-rootfs-g Tainted: G        W          6.11.0-rc4+ #72
[    5.809546] Tainted: [W]=WARN
[    5.810127] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[    5.811753] RIP: 0010:unlink_anon_vmas+0x19b/0x1d0
[    5.812680] Code: b0 00 00 00 00 75 1f f0 ff 8f a0 00 00 00 75 a2 e8 8a fd ff ff eb 9b 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d4 82 d0 00 0f 0b eb dd <0f> 0b eb cf 0f 0b 48 83 c7 08 e8 16 40 d7 ff e9 ea fe ff ff 48 8b
[    5.816247] RSP: 0018:ffffa19f43bb78d0 EFLAGS: 00010286
[    5.817258] RAX: ffff8a71c1bdd2d0 RBX: ffff8a71c1bdd2c0 RCX: ffff8a71c27a86c8
[    5.818624] RDX: 0000000000000001 RSI: ffff8a71c2771b28 RDI: ffff8a71c27a9e60
[    5.820011] RBP: dead000000000122 R08: 0000000000000000 R09: 0000000000000001
[    5.821380] R10: 0000000000000200 R11: 0000000000000001 R12: ffff8a71c2771b28
[    5.822748] R13: dead000000000100 R14: ffff8a71c2771b18 R15: ffff8a71c27a9e60
[    5.824122] FS:  0000000000000000(0000) GS:ffff8a7337980000(0000) knlGS:0000000000000000
[    5.825665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.826775] CR2: 00007fca7f70ac58 CR3: 00000001027b2004 CR4: 0000000000770ef0
[    5.828146] PKRU: 55555554
[    5.828686] Call Trace:
[    5.829169]  <TASK>
[    5.829594]  ? __warn.cold+0xb1/0x13e
[    5.830312]  ? unlink_anon_vmas+0x19b/0x1d0
[    5.831118]  ? report_bug+0xff/0x140
[    5.831840]  ? handle_bug+0x3c/0x80
[    5.832524]  ? exc_invalid_op+0x17/0x70
[    5.833262]  ? asm_exc_invalid_op+0x1a/0x20
[    5.834086]  ? unlink_anon_vmas+0x19b/0x1d0
[    5.834908]  free_pgtables+0x130/0x290
[    5.835661]  exit_mmap+0x19a/0x460
[    5.836351]  __mmput+0x4b/0x120
[    5.836965]  do_exit+0x2e1/0xac0
[    5.837601]  ? lock_release+0xd5/0x2c0
[    5.838343]  do_group_exit+0x36/0xa0
[    5.839035]  __x64_sys_exit_group+0x18/0x20
[    5.839866]  x64_sys_call+0x14b4/0x14c0


Andrew, please remove this from mm-unstable.

-- 
Cheers,

David / dhildenb

