lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22fcde31-16c4-42d0-ad99-568173ec4dd0@redhat.com>
Date: Mon, 22 Apr 2024 20:59:18 +0200
From: David Hildenbrand <david@...hat.com>
To: Guillaume Morin <guillaume@...infr.org>
Cc: oleg@...hat.com, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, muchun.song@...ux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

On 22.04.24 20:11, Guillaume Morin wrote:
> (Dropping Mike Kravetz as CC since he has retired and his email is no
> longer valid, adding Muchun since he's the current hugetlb maintainer,
> as well as linux-trace-kernel)
> 
> On 22 Apr 11:39, David Hildenbrand wrote:
>>
>> On 19.04.24 20:37, Guillaume Morin wrote:
>>> libhugetlbfs, the Intel iodlr code both allow to remap .text onto a
>>> hugetlb private mapping. It's also pretty easy to do it manually.
>>> One drawback of using this functionality is the lack of support for
>>> uprobes (NOTE uprobe ignores shareable vmas)
>>>
>>> This patch adds support for private hugetlb mappings.  It does require exposing
>>> some hugetlbfs innards and relies on copy_user_large_folio which is only
>>> available when CONFIG_HUGETLBFS is used so I had to use an ugly #ifdef
>>>
>>> If there is some interest in applying this patch in some form or
>>> another, I am open to any refactoring suggestions (esp getting rid the
>>> #ifdef in uprobes.c) . I tried to limit the
>>> amount of branching.
>>
>> All that hugetlb special casing .... oh my. What's the benefit why we should
>> be interested in making that code less clean -- to phrase it in a nice way
>> ;) ?
> 
> I do appreciate the nice phrasing. Believe me, I did try to limit the
> special casing to a minimum :-).
> 
> Outside of __replace_page, I added only 3-ish branches so I do not think
> it's *too* bad. The uprobe code is using PAGE_{SHIFT,MASK} quite liberally so I
> had to add calls to retrieve these for the hugetlb vmas.
> 
> __replace_page has a lot of special casing. I certainly agree (and
> unfortunately for me it's at the beginning of the patch :)).  It's doing
> something pretty uncommon outside of the mm code so it has to make a
> bunch of specific hugetlb calls. I am not quite sure how to improve it
> but if you have suggestions, I'd be happy to refactor.

See below.

> 
> The benefit - to me - is very clear. People do use hugetlb mappings to
> run code in production environments. The perf benefits are there for some
> workloads. Intel has published a whitepaper about it etc.
> Uprobes are a very good tool to do live tracing. If you can restart the
> process and reproduce, you should be able to disable hugetlb remapping
> but if you need to look at a live process, there are not many options.
> Not being able to use uprobes is crippling.

Please add all that as motivation to the patch description or cover letter.

> 
>> Yes, libhugetlbfs exists. But why do we have to support uprobes with it?
>> Nobody cared until now, why care now?
> 
> I think you could ask the same question for every new feature patch :)

I have to, because it usually indicates a lack of motivation in the 
cover-letter/patch description :P

People will have to maintain that code, and maintaining hugetlb code in 
odd places is no fun ...

> 
> Since the removal a few releases ago of the __morecore() hook in glibc,
> the main feature of libhugetlbfs is ELF segments remapping. I think
> there are definitely a lot of users that simply deal with this
> unnecessary limitation.
> 
> I am certainly not shoving this patch through anyone's throat if there
> is no interest. But we definitely find it a very useful feature ...

Let me try to see if we can get this done cleaner.

One ugly part (in general here) is the custom page replacement in the 
registration part.

We are guaranteed to have a MAP_PRIVATE mapping. Instead of replacing 
pages ourselves (which we likely shouldn't do ...) ... maybe we could 
use FAULT_FLAG_UNSHARE faults such that we will get an anonymous folio 
populated. (like KSM does nowadays)

Punching FOLL_PIN|FOLL_LONGTERM into GUP would achieve the same thing, 
but using FOLL_WRITE would not work on many file systems. So maybe we 
have to trigger an unsharing fault ourselves.

That would do the page replacement for us and we "should" be able to 
lookup an anonymous folio that we can then just modify, like ptrace would.

But then, there is also unregistration part, with weird conditional page 
replacement. Zapping the anon page if the content matches the content of 
the original page is one thing. But why are we placing an existing 
anonymous page by a new anonymous page when the content from the 
original page differs (but matches the one from the just copied page?)?

I'll have to further think about that one. It's all a bit nasty.


One thing to note is that hugetlb folios don't grow on trees. Likely, 
Many setups *don't* reserve extra hugetlb folios and you might just 
easily be running out of free hugetlb folios that you can use to break 
COW here (replace a file hugetlb by a fresh anon hugetlb page). Likely 
it's easy to make register or unregister fail.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ