Message-ID: <db2b7337-4c6b-4e4b-71d3-dc4940353498@redhat.com>
Date: Mon, 16 Aug 2021 10:02:22 +0200
From: David Hildenbrand <david@...hat.com>
To: Khalid Aziz <khalid.aziz@...cle.com>,
"Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
<longpeng2@...wei.com>, Matthew Wilcox <willy@...radead.org>
Cc: Steven Sistare <steven.sistare@...cle.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Gonglei (Arei)" <arei.gonglei@...wei.com>
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
On 13.08.21 21:49, Khalid Aziz wrote:
> On Tue, 2021-07-13 at 00:57 +0000, Longpeng (Mike, Cloud Infrastructure
> Service Product Dept.) wrote:
>> Hi Matthew,
>>
>>> -----Original Message-----
>>> From: Matthew Wilcox [mailto:willy@...radead.org]
>>> Sent: Monday, July 12, 2021 9:30 AM
>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>> <longpeng2@...wei.com>
>>> Cc: Steven Sistare <steven.sistare@...cle.com>; Anthony Yznaga
>>> <anthony.yznaga@...cle.com>; linux-kernel@...r.kernel.org;
>>> linux-mm@...ck.org; Gonglei (Arei) <arei.gonglei@...wei.com>
>>> Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
>>>
>>> On Mon, Jul 12, 2021 at 09:05:45AM +0800, Longpeng (Mike, Cloud
>>> Infrastructure Service Product Dept.) wrote:
>>>> Let me describe my use case more clearly (just ignore if you're not
>>>> interested in it):
>>>>
>>>> 1. Prog A mmap() 4GB memory (anon or file-mapping), suppose the
>>>> allocated VA range is [0x40000000,0x140000000)
>>>>
>>>> 2. Prog A specifies [0x48000000,0x50000000) and
>>>> [0x80000000,0x100000000) will be shared by its child.
>>>>
>>>> 3. Prog A fork() Prog B and then Prog B exec() a new ELF binary.
>>>>
>>>> 4. Prog B notices the shared ranges (e.g. by input parameters or
>>>> ...)
>>>> and remaps them to a contiguous VA range.
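The address arithmetic in steps 2 and 4 above can be sketched in C. This is only an illustration of how two disjoint source ranges would be laid out back to back in a destination range; the names va_range and pack_ranges and the destination base address are hypothetical, and nothing here touches real mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open virtual address range [start, end). Hypothetical helper type. */
struct va_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Lay out n source ranges back to back starting at base.
 * dst[i] receives the destination range for src[i].
 * Returns the total number of bytes covered.
 */
static uint64_t pack_ranges(const struct va_range *src, size_t n,
                            uint64_t base, struct va_range *dst)
{
    uint64_t cursor = base;

    for (size_t i = 0; i < n; i++) {
        assert(src[i].end > src[i].start);
        uint64_t len = src[i].end - src[i].start;

        dst[i].start = cursor;
        dst[i].end = cursor + len;
        cursor += len;
    }
    return cursor - base;
}
```

With the two ranges from step 2, the first covers 0x8000000 bytes (128MB) and the second 0x80000000 bytes (2GB), so the packed destination range would span 0x88000000 bytes.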
>>>
>>> This is dangerous. There must be an active step for Prog B to accept
>>> Prog A's
>>> ranges into its address space. Otherwise Prog A could almost
>>> completely fill
>>> Prog B's address space and so control where Prog B places its
>>> mappings. It
>>> could also provoke a latent bug in Prog B if it doesn't handle
>>> address space
>>> exhaustion gracefully.
>>>
>>> I had a proposal to handle this. Would it meet your requirements?
>>> https://lore.kernel.org/lkml/20200730152250.GG23808@casper.infradead.org/
>>
>> I noticed your proposal of project Sileby and I think it can meet
>> Steven's requirement, but I am not sure whether it's suitable for mine
>> because there's no sample code yet. Is it in progress?
>
> Hi Mike,
>
> I am working on refining the ideas from project Sileby. I am also
> working on designing the implementation. Since the original concept,
> the mshare API has evolved further. Here is what it looks like:
>
> The mshare API consists of two system calls - mshare() and
> mshare_unlink()
>
> mshare
> ======
>
> int mshare(char *name, void *addr, size_t length, int oflags,
>            mode_t mode)
>
> mshare() creates and opens a new, or opens an existing shared memory
> area that will be shared at PTE level. name refers to shared object
> name that exists under /dev/mshare (this is subject to change. There
> might be better ways to manage the names for mshare'd areas). addr is
> the starting address of this shared memory area and length is the size
> of this area. oflags can be one of:
>
> O_RDONLY opens shared memory area for read only access by everyone
> O_RDWR opens shared memory area for read and write access
> O_CREAT creates the named shared memory area if it does not exist
> O_EXCL If O_CREAT was also specified, and a shared memory area
> exists with that name, return an error.
>
> mode represents the creation mode for the shared object under
> /dev/mshare.
>
> Return Value
> ------------
>
> mshare() returns a file descriptor. A read from this file descriptor
> returns two long values - (1) starting address, and (2) size of the
> shared memory area.
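A minimal sketch of how a caller might consume the two long values described above, assuming they arrive in a single read() in the order (start, size). mshare() is only a proposal here, so the helper takes any readable file descriptor; the names mshare_info and read_mshare_info are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the two values read from the descriptor. */
struct mshare_info {
    long start; /* starting address of the shared area */
    long size;  /* size of the shared area in bytes */
};

/*
 * Read (start, size) from fd into *info.
 * Returns 0 on success, -1 on a read error or a short read.
 * Assumption: both values are delivered by one read() call.
 */
static int read_mshare_info(int fd, struct mshare_info *info)
{
    long vals[2];
    ssize_t n;

    assert(info != NULL);

    n = read(fd, vals, sizeof(vals));
    if (n != (ssize_t)sizeof(vals))
        return -1;

    info->start = vals[0];
    info->size = vals[1];
    return 0;
}
```

Treating a short read as an error keeps the sketch simple; a real caller might prefer to retry.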
>
> Notes
> -----
>
> PTEs are shared at the pgdir level, which imposes the following
> requirements on the address and size given to mshare():
>
> - Starting address must be aligned to pgdir size (512GB on x86_64)
> - Size must be a multiple of pgdir size
> - Any mappings created in this address range at any time become
> shared automatically
> - The shared address range can have unmapped addresses in it. Any
> access to an unmapped address will result in SIGBUS
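The first two requirements above amount to a simple alignment check, sketched here in C. The 512GB figure for x86_64 comes from the text (1 << 39 bytes); the names MSHARE_PGDIR_SIZE and mshare_range_ok are hypothetical, and rejecting a zero length is an assumption, not something the proposal states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pgdir size on x86_64 as given in the text: 512GB. Hypothetical name. */
#define MSHARE_PGDIR_SIZE (1ULL << 39)

static_assert((MSHARE_PGDIR_SIZE & (MSHARE_PGDIR_SIZE - 1)) == 0,
              "pgdir size must be a power of two");

/*
 * Return true if addr is pgdir-aligned and length is a non-zero
 * multiple of the pgdir size.
 * Assumption: a zero-length range is rejected.
 */
static bool mshare_range_ok(uint64_t addr, uint64_t length)
{
    if (length == 0)
        return false;

    return (addr % MSHARE_PGDIR_SIZE) == 0 &&
           (length % MSHARE_PGDIR_SIZE) == 0;
}
```

Under this check, a 4GB mapping at 0x40000000 would fail on both counts: neither the address nor the size is a multiple of 512GB.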
>
> Mappings within this address range behave as if they were shared
> between threads, so a write to a MAP_PRIVATE mapping will create a
> page which is shared between all the sharers. The first process that
> declares an address range mshare'd can continue to map objects in the
> shared area. All other processes that want mshare'd access to this
> memory area can do so by calling mshare(). After this call, the
> address range given by mshare becomes a shared range in its address
> space. Anonymous mappings will be shared and not COWed.
Did I understand correctly that you want to share actual page tables
between processes and consequently different MMs? That sounds like a
very bad idea.
--
Thanks,
David / dhildenb