Message-ID: <3ff4d759-307e-31a2-4124-98de9e423d7e@efficios.com>
Date:   Wed, 2 Nov 2022 09:46:31 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Beau Belgrave <beaub@...ux.microsoft.com>
Cc:     rostedt@...dmis.org, mhiramat@...nel.org,
        dcook@...ux.microsoft.com, alanau@...ux.microsoft.com,
        linux-trace-devel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] tracing/user_events: Remote write ABI

On 2022-10-31 12:53, Beau Belgrave wrote:
> On Sat, Oct 29, 2022 at 09:58:26AM -0400, Mathieu Desnoyers wrote:
>> On 2022-10-28 18:17, Beau Belgrave wrote:
>>> On Fri, Oct 28, 2022 at 05:50:04PM -0400, Mathieu Desnoyers wrote:
>>>> On 2022-10-27 18:40, Beau Belgrave wrote:
>>
>> [...]
>>>
>>>>>
>>>>> NOTE:
>>>>> User programs that wish to have the enable bit shared across forks
>>>>> either need to use a MAP_SHARED allocated address or register a new
>>>>> address and file descriptor. If MAP_SHARED cannot be used or new
>>>>> registrations cannot be done, then it's allowable to use MAP_PRIVATE
>>>>> as long as the forked children never update the page themselves. Once
>>>>> the page has been updated, the page from the parent will be copied over
>>>>> to the child. This new copy-on-write page will not receive updates from
>>>>> the kernel until another registration has been performed with this new
>>>>> address.
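
The fork behavior this NOTE relies on can be checked from userspace without user_events at all. Below is a minimal sketch, assuming Linux and glibc. It maps one anonymous page with either MAP_SHARED or MAP_PRIVATE, forks, has the child write to the page, and reports what the parent sees afterwards. With MAP_PRIVATE, the child's store lands in its own copy-on-write page, which is the same reason a kernel update aimed at the original page would not reach the child. The function name `parent_sees_child_write` is illustrative only and is not part of any user_events API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map one anonymous page with share_flag (MAP_SHARED or MAP_PRIVATE),
 * fork, let the child store 1 into it, then return the value the parent
 * observes afterwards. Returns -1 on error.
 */
int parent_sees_child_write(int share_flag)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     share_flag | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    volatile uint32_t *enabled = mem; /* stands in for an "enabled" word */
    *enabled = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: with MAP_PRIVATE this store breaks COW and copies the page. */
        *enabled = 1;
        _exit(0);
    }

    int status;
    int seen = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        seen = (int)*enabled; /* 1 if shared, 0 if the child wrote a private copy */

    munmap(mem, (size_t)page);
    return seen;
}
```

Calling it with MAP_SHARED yields 1 (parent and child see one page), while MAP_PRIVATE yields 0 (the child's write went to a copy).
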
>>>>
>>>> This seems rather odd. I would expect that if a parent process registers
>>>> some instrumentation using private mappings for enabled state through the
>>>> user events ioctl, and then forks, the child process would seamlessly be
>>>> traced by the user events ABI while being able to also change the enabled
>>>> state from the userspace tracer libraries (which would trigger COW).
>>>> Requiring the child to re-register with user events is rather odd.
>>>>
>>>
>>> It's the COW that is the problem, see below.
>>>
>>>> What is preventing us from tracing the child without re-registration in this
>>>> scenario?
>>>>
>>>
>>> Largely, knowing when the COW occurs on a specific page. We don't create
>>> the mappings, so I'm unsure whether we can easily ask to be notified when
>>> that happens. If we could, that would solve this. I'm glad you are
>>> thinking about this. The note here was exactly to trigger this
>>> discussion :)
>>>
>>> I believe futexes face the same problem; I'll take another look at that
>>> code to see whether it handles this.
>>>
>>> Any ideas?
>>
>> Based on your description of the symptoms, AFAIU, upon registration of a
>> given user event associated with an mm_struct, the user events ioctl appears
>> to translate the virtual address into a page pointer immediately, and keeps
>> track of that page afterwards. This means it loses track of the page when
>> COW occurs.
>>
> 
> No, we keep the memory descriptor and virtual address so we can properly
> resolve to page per-process.
> 
>> Why not keep track of the registered virtual address and mm_struct
>> associated with the event rather than the page? Whenever a state change is
>> needed, the virtual-address-to-page translation would be performed again. If
>> it follows a COW, it will get the new copied page. If no COW has happened,
>> it should map to the original page. If the mapping is shared, the
>> kernel would update that shared page. If the mapping is private, then the
>> kernel would COW the page before updating it.
>>
>> Thoughts?
>>
> 
> I think you are forgetting about page table entries. My understanding is
> that the process has its VMAs copied on fork, but the page table
> entries are marked read-only. Then, when a write access occurs, the
> COW copy is made (since the PTE says read-only, but the VMA says writable).
> However, that COW page is then mapped only within the forked process's
> page table.
> 
> This requires tracking the child memory descriptors in addition to the
> parent's. The most straightforward way I see to do this is to require
> user space to mmap the user_event_data fd that is used for writes. That
> way, when fork occurs, dup_mm() / dup_mmap() will call open() / close()
> on the mmap'd user_event_data mapping per fork. I could then copy the
> enablers from the parent, but with the child's memory descriptor, to
> allow proper lookup.
> 
> This is like fork before COW; it's a bummer I cannot see a way to do
> this per page. Doing the above would work, but it requires copying all
> the enablers, not just the one that changed after the fork.
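
A toy model may make the fork problem concrete. This is not kernel code: `struct toy_mm`, `struct toy_enabler`, and the helper functions are hypothetical stand-ins for an mm_struct, its page table, and a user_events enabler. Each enabler remembers an (mm, virtual page) pair and re-resolves it on every update, as discussed above. Fork shares pages and write-protects both sides, and a write through a read-only entry on a shared page copies it. The model shows that an enabler holding only the parent's mm never reaches the child's copied page, while duplicating the enabler with the child's mm at fork time, as Beau suggests, does.

```c
#include <assert.h>
#include <string.h>

#define TOY_VPAGES 2
#define TOY_POOL 8

/* A "physical" page holding an enabled word, shared by refcount. */
struct toy_page { int refs; int word; };

/* Hypothetical stand-in for mm_struct: one PTE and writable bit per vpage. */
struct toy_mm {
    struct toy_page *pte[TOY_VPAGES];
    int writable[TOY_VPAGES];
};

/* Hypothetical enabler: remembers (mm, virtual page), never a page pointer. */
struct toy_enabler { struct toy_mm *mm; int vpage; };

static struct toy_page pool[TOY_POOL];
static int pool_used;

static struct toy_page *toy_alloc_page(int word)
{
    assert(pool_used < TOY_POOL);
    struct toy_page *pg = &pool[pool_used++];
    pg->refs = 1;
    pg->word = word;
    return pg;
}

/* fork(): the child shares every page and both PTEs become read-only. */
static void toy_fork(struct toy_mm *parent, struct toy_mm *child)
{
    memset(child, 0, sizeof(*child));
    for (int i = 0; i < TOY_VPAGES; i++) {
        if (!parent->pte[i])
            continue;
        child->pte[i] = parent->pte[i];
        parent->pte[i]->refs++;
        parent->writable[i] = 0;
    }
}

/* Write through one mm; a read-only PTE on a shared page triggers COW. */
static void toy_write(struct toy_mm *mm, int vpage, int value)
{
    struct toy_page *pg = mm->pte[vpage];
    if (!mm->writable[vpage] && pg->refs > 1) {
        pg->refs--;                       /* the other mm keeps the original */
        pg = toy_alloc_page(pg->word);
        mm->pte[vpage] = pg;
    }
    mm->writable[vpage] = 1;
    pg->word = value;
}

/*
 * Parent registers an enabler, forks, the child's tracer library writes
 * its own copy (COW), then "the kernel" sets enabled = 1 through every
 * known enabler. Returns the value the child sees afterwards.
 */
int toy_child_sees_enable(int copy_enablers_on_fork)
{
    struct toy_mm parent = {0}, child;
    struct toy_enabler enablers[2];
    int n = 0;

    pool_used = 0;
    parent.pte[0] = toy_alloc_page(0);
    parent.writable[0] = 1;
    enablers[n++] = (struct toy_enabler){ &parent, 0 };

    toy_fork(&parent, &child);
    if (copy_enablers_on_fork)
        enablers[n++] = (struct toy_enabler){ &child, 0 };

    toy_write(&child, 0, 0);              /* child-side update breaks COW */

    for (int i = 0; i < n; i++)
        toy_write(enablers[i].mm, enablers[i].vpage, 1);

    return child.pte[0]->word;
}
```

Without the fork-time copy the child still reads 0; with it, the child reads 1.
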

This brings up an overall design concern I have with user-events: AFAIU,
the lifetime of a user event registration appears to be tied to the
lifetime of a file descriptor.

What happens when that file descriptor is duplicated and sent to
another process over a Unix socket (as SCM_RIGHTS ancillary data)? Does
that mean the kernel would hold a handle on the wrong process when it
updates the "enabled" state?
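
For reference, this is how a descriptor is handed to another process over a Unix domain socket. A minimal sketch using the standard SCM_RIGHTS control message (SCM_CREDENTIALS, by contrast, carries only pid/uid/gid): the receiver gets a new descriptor number referring to the same open file description, so whatever the kernel tracks per file, including any mm recorded at registration time, becomes reachable from a different process. `send_fd` and `recv_fd` are illustrative names; a pipe can stand in for the user_events file when trying it out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over a connected AF_UNIX socket using an SCM_RIGHTS message. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                        /* at least one data byte is needed */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {0};

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {0};

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In real use the two ends of the socket belong to different processes; the receiver's new descriptor behaves exactly like the sender's.
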

Also, what happens on the execve system call if the file descriptor
representing the user event is not marked close-on-exec? Does that
mean the kernel could corrupt the user-space memory of the binary loaded
by execve when it attempts to update the "enabled" state?

If I understand this correctly, I suspect we might want to tie the
lifetime of the user event registration to the memory space (mm_struct).
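
A minimal sketch of what tying registrations to the address space could look like, modeled in plain C. `struct mm_model`, `struct reg_model`, and the `mm_model_*` functions are hypothetical; they are neither the kernel's mm_struct nor the user_events implementation. In this model registrations hang off the address space rather than a file, so fork() gives the child its own copies, execve() discards them along with the old image, and passing a descriptor to another process has no effect on which memory gets updated.

```c
#include <stdlib.h>

/* Hypothetical registration: user address of one "enabled" word. */
struct reg_model {
    unsigned long vaddr;
    struct reg_model *next;
};

/* Hypothetical stand-in for an mm_struct that owns its registrations. */
struct mm_model {
    struct reg_model *regs;
};

/* Register an enabled word in this address space; 0 on success. */
int mm_model_register(struct mm_model *mm, unsigned long vaddr)
{
    struct reg_model *r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->vaddr = vaddr;
    r->next = mm->regs;
    mm->regs = r;
    return 0;
}

/* Drop every registration; runs when the address space is torn down. */
void mm_model_release(struct mm_model *mm)
{
    while (mm->regs) {
        struct reg_model *r = mm->regs;
        mm->regs = r->next;
        free(r);
    }
}

/* fork(): the child address space inherits a copy of each registration. */
int mm_model_fork(const struct mm_model *parent, struct mm_model *child)
{
    child->regs = NULL;
    for (const struct reg_model *r = parent->regs; r; r = r->next) {
        if (mm_model_register(child, r->vaddr) != 0) {
            mm_model_release(child);
            return -1;
        }
    }
    return 0;
}

/* execve(): the old address space, and its registrations, go away. */
void mm_model_exec(struct mm_model *mm)
{
    mm_model_release(mm);
}

/* Number of registrations currently attached to this address space. */
int mm_model_count(const struct mm_model *mm)
{
    int n = 0;
    for (const struct reg_model *r = mm->regs; r; r = r->next)
        n++;
    return n;
}
```
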

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
