[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZiaoZlGc_8ZV3736@bender.morinfr.org>
Date: Mon, 22 Apr 2024 20:11:50 +0200
From: Guillaume Morin <guillaume@...infr.org>
To: David Hildenbrand <david@...hat.com>
Cc: Guillaume Morin <guillaume@...infr.org>, oleg@...hat.com,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
muchun.song@...ux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
(Dropping Mike Kravetz as CC since he has retired and his email is no
longer valid, adding Muchun since he's the current hugetlb maintainer,
as well as linux-trace-kernel)
On 22 Apr 11:39, David Hildenbrand wrote:
>
> On 19.04.24 20:37, Guillaume Morin wrote:
> > libhugetlbfs, the Intel iodlr code both allow to remap .text onto a
> > hugetlb private mapping. It's also pretty easy to do it manually.
> > One drawback of using this functionality is the lack of support for
> > uprobes (NOTE uprobe ignores shareable vmas)
> >
> > This patch adds support for private hugetlb mappings. It does require exposing
> > some hugetlbfs innards and relies on copy_user_large_folio which is only
> > available when CONFIG_HUGETLBFS is used so I had to use an ugly #ifdef
> >
> > If there is some interest in applying this patch in some form or
> > another, I am open to any refactoring suggestions (esp getting rid the
> > #ifdef in uprobes.c) . I tried to limit the
> > amount of branching.
>
> All that hugetlb special casing .... oh my. What's the benefit why we should
> be interested in making that code less clean -- to phrase it in a nice way
> ;) ?
I do appreciate the nice phrasing. Believe me, I did try to limit the
special casing to a minimum :-).
Outside of __replace_page, I added only 3-ish branches so I do not think
it's *too* bad. The uprobe code is using PAGE_{SHIFT,MASK} quite liberally so I
had to add calls to retrieve these for the hugetlb vmas.
__replace_page has a lot of special casing. I certainly agree (and
unfortunately for me it's at the beginning of the patch :)). It's doing
something pretty uncommon outside of the mm code so it has to make a
bunch of specific hugetlb calls. I am not quite sure how to improve it
but if you have suggestions, I'd be happy to refactor.
The benefit - to me - is very clear. People do use hugetlb mappings to
run code in production environments. The perf benefits are there for some
workloads. Intel has published a whitepaper about it etc.
Uprobes are a very good tool to do live tracing. If you can restart the
process and reproduce, you should be able to disable hugetlb remapping
but if you need to look at a live process, there are not many options.
Not being able to use uprobes is crippling.
> Yes, libhugetlbfs exists. But why do we have to support uprobes with it?
> Nobody cared until now, why care now?
I think you could ask the same question for every new feature patch :)
Since the removal a few releases ago of the __morecore() hook in glibc,
the main feature of libhugetlbfs is ELF segments remapping. I think
there are definitely a lot of users that simply deal with this
unnecessary limitation.
I am certainly not shoving this patch through anyone's throat if there
is no interest. But we definitely find it a very useful feature ...
Guillaume.
--
Guillaume Morin <guillaume@...infr.org>
Powered by blists - more mailing lists