[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZzM60CLkWKwFzWqa@localhost.localdomain>
Date: Tue, 12 Nov 2024 12:24:00 +0100
From: Oscar Salvador <osalvador@...e.de>
To: Peter Xu <peterx@...hat.com>
Cc: riel@...riel.com, linux-kernel@...r.kernel.org, kernel-team@...a.com,
linux-mm@...ck.org, akpm@...ux-foundation.org,
muchun.song@...ux.dev, mike.kravetz@...cle.com, leit@...a.com,
willy@...radead.org, stable@...nel.org,
Ackerley Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
On Thu, Nov 07, 2024 at 03:18:51PM -0500, Peter Xu wrote:
> +Ackerley +Oscar
>
> I'm reading the resv code recently and just stumbled upon this. So want to
> raise this question.
>
> IIUC __vma_private_lock() will return false for MAP_PRIVATE hugetlb vma if
> the vma is dup()ed from a fork(), with/without commit 187da0f8250a
> ("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write") which fixed a
> slightly different issue.
>
> The problem is the current vma lock for private mmap() is based on the resv
> map, and the resv map only belongs to the process that mmap()ed this
> private vma. E.g. dup_mmap() has:
>
> if (is_vm_hugetlb_page(tmp))
> hugetlb_dup_vma_private(tmp);
>
> Which does:
>
> if (vma->vm_flags & VM_MAYSHARE) {
> ...
> } else
> vma->vm_private_data = NULL; <---------------------
>
> So even if I don't know how many of us are even using hugetlb PRIVATE +
> fork(), assuming that's the most controversial use case that I'm aware of
> on hugetlb that people complains about.. with some tricky changes like
> 04f2cbe35699.. Just still want to raise this pure question, that after a
> fork() on private vma, and if I read it alright, lock/unlock operations may
> become noop..
I have been taking a look at this, and yes, __vma_private_lock will
return false for private hugetlb mappings that were forked .
I quickly checked what protects what and we currently have:
hugetlb_vma_lock_read - copy_hugetlb_page_range (only sharing)
hugetlb_vma_lock_read - hugetlb_wp (only for HPAGE_RESV_OWNER)
hugetlb_vma_lock_read - hugetlb_fault , protects huge_pmd_unshare?
hugetlb_vma_lock_read - pagewalks
hugetlb_vma_lock_write - hugetlb_change_protection
hugetlb_vma_lock_write - hugetlb_unshare_pmds
hugetlb_vma_lock_wirte - move_hugetlb_page_tables
hugetlb_vma_lock_wirte - _hugetlb_zap_begin (unmap_vmas)
the ones taking the hugetlb_vma_lock in write (so, the last four) also
take the i_mmap_lock_write (vma->vm_file->f_mapping), and AFAIK, hugetlb
mappings, private or not, should have vma->vm_file->f_mapping set.
Which means that technically we cannot race between hugetlb_change_protection
and move_hugetlb_page_tables etc.
But, checking
commit bf4916922c60f43efaa329744b3eef539aa6a2b2
Author: Rik van Riel <riel@...riel.com>
Date: Thu Oct 5 23:59:07 2023 -0400
hugetlbfs: extend hugetlb_vma_lock to private VMAs
which its motivation was to protect MADV_DONTNEED vs page_faults, I do
not see how it gets protected with private hugetlb mappings that were
dupped (forked).
madvise_dontneed_single_vma
zap_page_range_single
_hugetlb_zap_begin
hugetlb_vma_lock_write - noop for mappings that do not own the reservation
i_mmap_lock_write
But the hugetlb_fault path only takes hugetlb_vma_lock_*, so theorically
we still could race between page_fault vs madvise_dontneed_single_vma?
A quick way to prove would be map a hugetlb private mapping, fork it and
have two threads tryong to madvise(MADV_DONTNEED) and the other trying
to write to it?
I do not know, maybe we are protected in some other way I cannot see
right now.
I will have a look.
--
Oscar Salvador
SUSE Labs
Powered by blists - more mailing lists