Message-ID: <97ed86a0-9fac-3dbc-0f9e-d669484c9485@redhat.com>
Date: Mon, 16 Aug 2021 18:13:20 +0200
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Khalid Aziz <khalid.aziz@...cle.com>,
"Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
<longpeng2@...wei.com>, Steven Sistare <steven.sistare@...cle.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Gonglei (Arei)" <arei.gonglei@...wei.com>
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
On 16.08.21 17:59, Matthew Wilcox wrote:
> On Mon, Aug 16, 2021 at 05:01:44PM +0200, David Hildenbrand wrote:
>> On 16.08.21 16:40, Matthew Wilcox wrote:
>>> On Mon, Aug 16, 2021 at 04:33:09PM +0200, David Hildenbrand wrote:
>>>>>> I did not follow why we have to play games with MAP_PRIVATE, and having
>>>>>> private anonymous pages shared between processes that don't COW, introducing
>>>>>> new syscalls etc.
>>>>>
>>>>> It's not about SHMEM, it's about file-backed pages on regular
>>>>> filesystems. I don't want to have XFS, ext4 and btrfs all with their
>>>>> own implementations of ARCH_WANT_HUGE_PMD_SHARE.
>>>>
>>>> Let me ask this way: why do we have to play such games with MAP_PRIVATE?
>>>
>>> : Mappings within this address range behave as if they were shared
>>> : between threads, so a write to a MAP_PRIVATE mapping will create a
>>> : page which is shared between all the sharers.
>>>
>>> If so, that's a misunderstanding, because there are no games being played.
>>> What Khalid's saying there is that because the page tables are already
>>> shared for that range of address space, the COW of a MAP_PRIVATE will
>>> create a new page, but that page will be shared between all the sharers.
>>> The second write to a MAP_PRIVATE page (by any of the sharers) will not
>>> create a COW situation. Just like if all the sharers were threads of
>>> the same process.
>>>
>>
>> It actually seems to be just like I understood it. We'll have multiple
>> processes sharing anonymous pages writably, even though they are not
>> using shared memory.
>>
>> IMHO, sharing page tables to optimize for something kernel-internal
>> (page table consumption) should be completely transparent to user
>> space, just like ARCH_WANT_HUGE_PMD_SHARE currently is, unless I am
>> missing something important.
>>
>> The VM_MAYSHARE check in want_pmd_share()->vma_shareable() makes me assume
>> that we really only optimize for MAP_SHARED right now, never for
>> MAP_PRIVATE.
>
> It's definitely *not* about being transparent to userspace. It's about
> giving userspace new functionality where multiple processes can choose
> to share a portion of their address space with each other. What any
> process changes in that range, every sharing process sees.
> mmap(), munmap(), mprotect(), mremap(), everything.
Oh okay, so it's actually much more complicated than I thought. Thanks
for clarifying that! I recall virtiofsd had similar requirements for
sharing memory with the QEMU main process, though I might be wrong.
"existing shared memory area" and your initial page table example made
me assume that we are simply dealing with sharing page tables of MAP_SHARED.
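
For reference, the MAP_SHARED-only restriction I mean looks roughly
like this in current mm/hugetlb.c (paraphrased from memory, so treat
the details as approximate):

/*
 * Paraphrased from mm/hugetlb.c: PMD sharing is only attempted for
 * VM_MAYSHARE VMAs, i.e. MAP_SHARED mappings, that fully cover an
 * aligned PUD_SIZE range.
 */
static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
{
	unsigned long base = addr & PUD_MASK;
	unsigned long end = base + PUD_SIZE;

	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end))
		return true;
	return false;
}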
It's actually something like a VMA container that you share between
processes, and whatever VMAs are currently inside that container are
mirrored into the other processes. I assume sharing page tables could
actually be an implementation detail, especially when keeping
MAP_PRIVATE (confusing in that context!) and other features that will
give you surprises (uffd) out of the picture.
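
To make the MAP_PRIVATE confusion concrete, here is a minimal,
runnable contrast using today's fork()+COW semantics; the comments
mark where the proposed sharing would behave differently (the example
is mine, not from the patches):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	strcpy(p, "parent");

	if (fork() == 0) {
		/*
		 * Today: this write COWs a page that is private to the
		 * child; the parent never sees it.  Under the proposed
		 * sharing, the COWed page would instead be visible to
		 * every sharer, and further writes would not COW again
		 * -- as if parent and child were threads.
		 */
		strcpy(p, "child");
		_exit(0);
	}
	wait(NULL);
	printf("parent still sees: %s\n", p);	/* prints "parent" */
	return 0;
}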
--
Thanks,
David / dhildenb