linux-kernel - Re: [RFC PATCH 0/5] madvise MADV

Message-ID: <db2b7337-4c6b-4e4b-71d3-dc4940353498@redhat.com>
Date:   Mon, 16 Aug 2021 10:02:22 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Khalid Aziz <khalid.aziz@...cle.com>,
        "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@...wei.com>, Matthew Wilcox <willy@...radead.org>
Cc:     Steven Sistare <steven.sistare@...cle.com>,
        Anthony Yznaga <anthony.yznaga@...cle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Gonglei (Arei)" <arei.gonglei@...wei.com>
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC

On 13.08.21 21:49, Khalid Aziz wrote:
> On Tue, 2021-07-13 at 00:57 +0000, Longpeng (Mike, Cloud Infrastructure
> Service Product Dept.) wrote:
>> Hi Matthew,
>>
>>> -----Original Message-----
>>> From: Matthew Wilcox [mailto:willy@...radead.org]
>>> Sent: Monday, July 12, 2021 9:30 AM
>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>> <longpeng2@...wei.com>
>>> Cc: Steven Sistare <steven.sistare@...cle.com>; Anthony Yznaga
>>> <anthony.yznaga@...cle.com>; linux-kernel@...r.kernel.org;
>>> linux-mm@...ck.org; Gonglei (Arei) <arei.gonglei@...wei.com>
>>> Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
>>>
>>> On Mon, Jul 12, 2021 at 09:05:45AM +0800, Longpeng (Mike, Cloud
>>> Infrastructure Service Product Dept.) wrote:
>>>> Let me describe my use case more clearly (just ignore if you're not
>>>> interested in it):
>>>>
>>>> 1. Prog A mmap() 4GB memory (anon or file-mapping), suppose the
>>>> allocated VA range is [0x40000000,0x140000000)
>>>>
>>>> 2. Prog A specifies [0x48000000,0x50000000) and
>>>> [0x80000000,0x100000000) will be shared by its child.
>>>>
>>>> 3. Prog A fork() Prog B and then Prog B exec() a new ELF binary.
>>>>
>>>> 4. Prog B notices the shared ranges (e.g. by input parameters or
>>>> ...)
>>>> and remaps them to a contiguous VA range.
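
The arithmetic behind step 4 can be sketched as below. This is only an illustration under stated assumptions: struct va_range, pack_ranges() and the base address are hypothetical names and values, not part of any patch in this thread, and the sketch only computes where each shared range would land if placed back to back. It does not perform any remapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for one half-open shared range [start, end). */
struct va_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Place the n ranges back to back starting at base.
 * new_start[i] receives the address range i would get in the
 * contiguous layout; the return value is the total size in bytes.
 */
static uint64_t pack_ranges(const struct va_range *r, int n,
                            uint64_t base, uint64_t *new_start)
{
    uint64_t off = 0;

    for (int i = 0; i < n; i++) {
        assert(r[i].end >= r[i].start);  /* reject inverted ranges */
        new_start[i] = base + off;
        off += r[i].end - r[i].start;
    }
    return off;
}
```

With the two ranges from step 2, the first is 0x8000000 bytes (128MB) and the second is 0x80000000 bytes (2GB), so the contiguous target area needs 0x88000000 bytes in total.
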
>>>
>>> This is dangerous.  There must be an active step for Prog B to accept
>>> Prog A's
>>> ranges into its address space.  Otherwise Prog A could almost
>>> completely fill
>>> Prog B's address space and so control where Prog B places its
>>> mappings.  It
>>> could also provoke a latent bug in Prog B if it doesn't handle
>>> address space
>>> exhaustion gracefully.
>>>
>>> I had a proposal to handle this.  Would it meet your requirements?
>>> https://lore.kernel.org/lkml/20200730152250.GG23808@casper.infradead.org/
>>
>> I noticed your proposal of project Sileby and I think it can meet
>> Steven's requirement, but I'm not sure whether it's suitable for mine
>> because there's no sample code yet. Is it in progress?
> 
> Hi Mike,
> 
> I am working on refining the ideas from project Sileby. I am also
> working on designing the implementation. Since the original concept,
> the mshare API has evolved further. Here is what it looks like:
> 
> The mshare API consists of two system calls - mshare() and
> mshare_unlink()
> 
> mshare
> ======
> 
> int mshare(char *name, void *addr, size_t length, int oflags,
>            mode_t mode)
> 
> mshare() creates and opens a new, or opens an existing shared memory
> area that will be shared at PTE level. name refers to shared object
> name that exists under /dev/mshare (this is subject to change. There
> might be better ways to manage the names for mshare'd areas). addr is
> the starting address of this shared memory area and length is the size
> of this area. oflags can be one of:
> 
>      O_RDONLY opens shared memory area for read only access by everyone
>      O_RDWR opens shared memory area for read and write access
>      O_CREAT creates the named shared memory area if it does not exist
>      O_EXCL If O_CREAT was also specified, and a shared memory area
>          exists with that name, return an error.
> 
> mode represents the creation mode for the shared object under
> /dev/mshare.
> 
> Return Value
> ------------
> 
> mshare() returns a file descriptor. A read from this file descriptor
> returns two long values - (1) starting address, and (2) size of the
> shared memory area.
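
A minimal sketch of how a caller might consume that descriptor, assuming the two long values arrive back to back in a single record with the starting address first. struct mshare_info and read_mshare_info() are hypothetical names introduced here for illustration; the proposal itself only says that a read returns the two values.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Assumed record layout: start address first, then size. */
struct mshare_info {
    long start;
    long size;
};

/*
 * Read exactly one record from fd.
 * Returns 0 on success, -1 on error or if the stream ends early.
 */
static int read_mshare_info(int fd, struct mshare_info *info)
{
    char *p = (char *)info;
    size_t left = sizeof(*info);

    assert(info != NULL);
    while (left > 0) {
        ssize_t n = read(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* EOF before a full record */
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

The loop tolerates short reads, which keeps the helper usable with any file descriptor, for example a pipe in a test.
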
> 
> Notes
> -----
> 
> PTEs are shared at pgdir level, which imposes the following
> requirements on the address and size given to mshare():
> 
>      - Starting address must be aligned to pgdir size (512GB on x86_64)
>      - Size must be a multiple of pgdir size
>      - Any mappings created in this address range at any time become
>      shared automatically
>      - Shared address range can have unmapped addresses in it. Any
>      access to an unmapped address will result in SIGBUS
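
The first two requirements can be sketched as a small check, assuming x86_64 with 4-level paging, where one pgdir entry covers 2^39 bytes (512GB). The macro and function names are hypothetical, and rejecting a zero length or a range that wraps around the address space is an extra assumption of this sketch, not something the proposal states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one pgdir entry spans 2^39 bytes on x86_64 (4-level paging). */
#define PGDIR_SHIFT_ASSUMED 39
#define PGDIR_SIZE_ASSUMED  (UINT64_C(1) << PGDIR_SHIFT_ASSUMED)

static_assert(PGDIR_SIZE_ASSUMED == (UINT64_C(512) << 30),
              "assumed pgdir size is 512GB");

/* Return true if [addr, addr + length) satisfies the stated constraints. */
static bool mshare_range_ok(uint64_t addr, uint64_t length)
{
    const uint64_t mask = PGDIR_SIZE_ASSUMED - 1;

    if (length == 0)
        return false;           /* assumption: empty ranges are rejected */
    if (addr & mask)
        return false;           /* start not aligned to pgdir size */
    if (length & mask)
        return false;           /* size not a multiple of pgdir size */
    if (addr + length < addr)
        return false;           /* assumption: range must not wrap */
    return true;
}
```

Under these assumptions, a 1GB-aligned address such as 0x40000000 fails the check, while a 512GB range starting at 2^39 passes.
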
> 
> Mappings within this address range behave as if they were shared
> between threads, so a write to a MAP_PRIVATE mapping will create a
> page which is shared between all the sharers. The first process that
> declares an address range mshare'd can continue to map objects in the
> shared area. All other processes that want mshare'd access to this
> memory area can do so by calling mshare(). After this call, the
> address range given by mshare becomes a shared range in its address
> space. Anonymous mappings will be shared and not COWed.

Did I understand correctly that you want to share actual page tables 
between processes and consequently different MMs? That sounds like a 
very bad idea.


-- 
Thanks,

David / dhildenb