[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0c205e8a-9c7e-4500-8ede-00fae2d97d2f@huaweicloud.com>
Date: Wed, 28 May 2025 00:29:52 +0800
From: Pu Lehui <pulehui@...weicloud.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: mhiramat@...nel.org, oleg@...hat.com, peterz@...radead.org,
akpm@...ux-foundation.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
jannh@...gle.com, pfalcato@...e.de, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, pulehui@...wei.com
Subject: Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten
when expanding vma during mremap
On 2025/5/27 21:34, Lorenzo Stoakes wrote:
> Hi Pu,
>
> Just as you were sending this I was about to send you my suggestion :) That
> is hilarious haha. Maybe some cosmic connection? :P
Hi Lorenzo,
haha, so coincidental.π
>
> Anyway I don't know whether your approach is _quite_ correct here, you are
> generally disabling the uprobe behaviour on all VMA expansion (that is -
> new VMA merge, relocate_vma_down() (unlikely to be relevant ;), it seems
> sensible to only do so in the circumstances we know to be problematic.
agree, we should limit that on copy_vma.
>
> I really do not want mremap-specific stuff in general merge code, it's the
> move page tables bit that makes this broken.
>
> So... :P let me attach my suggestion as a fix-patch below.
>
> TO BE CLEAR - I present this in this form purely to make it easy for you to
> grab, PLEASE take this patch and rework it (if it makes sense), I don't
> need any attribution other than a Suggested-by, maybe.
This is truly appreciate! In fact, this is my first patch related to
memory and uprobe. My original intention was to spark discussion and
attract experts to help handle this matter. I don't place much
importance on the attribution of this patch. Like you, my focus is on
contributing to the resolution of the issue.π
>
> They are rather close other than where this flag is set :)
>
> I am by the way assuming that uprobes work by installing a special PTE at
> page offset 0 and only in the case where there is something installed here
> do we need to worry.
As Oleg later mentioned, we do not need this restriction, and I just
verified this using the following test case:
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/mman.h>
#include <linux/perf_event.h>
#define FNAME "./foo"
int main(int argc, char *argv[])
{
int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
struct perf_event_attr attr = {
.type = 9,
.size = 72,
.config1 = (long) FNAME,
.config2 = 8192,
};
void *addr1, *addr2;
ftruncate(fd, 4 * 4096);
mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 4096);
syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
addr1 = mmap(NULL, 3 * 4096, PROT_NONE, MAP_PRIVATE, fd, 4096);
addr2 = mremap(addr1, 2 * 4096, 3 * 4096, MREMAP_MAYMOVE);
mremap(addr2 + 4096, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED,
addr1 + 4096);
return 0;
}
But one thing confuses meβif mmap is used with zero_offset, it will not
reproduce. Perhaps I need to verify this tomorrow.
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/mman.h>
#include <linux/perf_event.h>
#define FNAME "./foo"
int main(int argc, char *argv[])
{
int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
struct perf_event_attr attr = {
.type = 9,
.size = 72,
.config1 = (long) FNAME,
.config2 = 8192,
};
void *addr1, *addr2;
ftruncate(fd, 4 * 4096);
mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
addr1 = mmap(NULL, 4 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
addr2 = mremap(addr1, 3 * 4096, 4 * 4096, MREMAP_MAYMOVE);
mremap(addr2 + 2 * 4096, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED,
addr1 + 2 * 4096);
return 0;
}
>
> Please correct me if I'm misunderstanding here, I am not an expert on
> uprobes.
>
> Thanks! This is a great find overall.
>
> I'd also quite like us to make your repro a self test if possible? Not sure
> where we'd put it... could be in tools/testing/selftests/mm/merge.c
> actually, new one I created explicitly for VMA merge scenarios.
I will have a try!
>
> Cheers, Lorenzo
>
> On Tue, May 27, 2025 at 01:23:50PM +0000, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@...wei.com>
>>
>> We encountered a BUG alert triggered by Syzkaller as follows:
>> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>>
>> And we can reproduce it with the following steps:
>> 1. register uprobe on file at zero offset
>> 2. mmap the file at zero offset:
>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>> 3. mremap part of vma1 to new vma2:
>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>> 4. mremap back to orig addr1:
>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>> 4096, addr1 + 8192] still maps the file, it will take
>> vma_merge_new_range to merge these two addr ranges, and then do
>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>> offset, it will install uprobe anon page to the merged vma. However, the
>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>> uprobe anon page to the merged vma, will over map the old uprobe anon
>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>>
>> Since the uprobe anon page will be remapped to the merged vma, we can
>> remove the unnecessary uprobe_mmap on merged vma, that is, do not
>> perform uprobe_mmap on expanded vma.
>>
>> This problem was first find in linux-6.6.y and also exists in the
>> community syzkaller:
>> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>>
>> The complete Syzkaller C reproduction program is as follows:
>>
>> #define _GNU_SOURCE
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <syscall.h>
>> #include <sys/mman.h>
>> #include <linux/perf_event.h>
>>
>> int main(int argc, char *argv[])
>> {
>> int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
>> struct perf_event_attr attr = {
>> .type = 9,
>> .uprobe_path = (long) FNAME,
>> .probe_offset = 0x0,
>> };
>> void *addr1, *addr2;
>>
>> write(fd, "x", 1);
>> mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
>>
>> syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>> return 0;
>> }
>>
>> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
>
> Nit, but we'll want to cc: stable once we agree on a solution too.
add `Cc: stable@...r.kernel.org` above Fixes tag?
>
>> Signed-off-by: Pu Lehui <pulehui@...wei.com>
>> ---
>> mm/vma.c | 7 ++++++-
>> mm/vma.h | 7 +++++++
>> 2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vma.c b/mm/vma.c
>> index 1c6595f282e5..6445f515c7f2 100644
>> --- a/mm/vma.c
>> +++ b/mm/vma.c
>> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>>
>> if (vp->file) {
>> i_mmap_unlock_write(vp->mapping);
>> - uprobe_mmap(vp->vma);
>> + if (!vp->skip_vma_uprobe)
>> + uprobe_mmap(vp->vma);
>>
>> if (vp->adj_next)
>> uprobe_mmap(vp->adj_next);
>
> I think we need to cover off the adj_next case too, strictly. See the
> attached patch :P
>
> To be honest the adj_next case won't be relevant, as this is not set for a
> new VMA, but it'd be odd to have a 'skip' or 'no' uprobe option and to not
> obey it in both cases.
alright, I haven't thought of any possible situations yet. If there are
any problems later, we can add it back.
>
>> @@ -737,6 +738,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>> if (vma_iter_prealloc(vmg->vmi, vma))
>> return -ENOMEM;
>>
>> + vp.skip_vma_uprobe = vmg->skip_vma_uprobe;
>> vma_prepare(&vp);
>> /*
>> * THP pages may need to do additional splits if we increase
>> @@ -1151,6 +1153,9 @@ int vma_expand(struct vma_merge_struct *vmg)
>> if (remove_next)
>> vmg->__remove_next = true;
>>
>> + /* skip uprobe_mmap on expanded vma */
>> + vmg->skip_vma_uprobe = true;
>> +
>
> Yeah this doesn't feel right, this will make this happen for all
> vma_expand() cases (including new VMA merge) and that's surely not correct
> for non-mremap cases?
agree!
>
> Again see the attached patch :P
>
>> if (commit_merge(vmg))
>> goto nomem;
>>
>> diff --git a/mm/vma.h b/mm/vma.h
>> index 9a8af9be29a8..56cc0364d239 100644
>> --- a/mm/vma.h
>> +++ b/mm/vma.h
>> @@ -19,6 +19,8 @@ struct vma_prepare {
>> struct vm_area_struct *insert;
>> struct vm_area_struct *remove;
>> struct vm_area_struct *remove2;
>> + /* Whether to skip uprobe_mmap on vma */
>> + bool skip_vma_uprobe;
>> };
>>
>> struct unlink_vma_file_batch {
>> @@ -120,6 +122,11 @@ struct vma_merge_struct {
>> */
>> bool give_up_on_oom :1;
>>
>> + /*
>> + * Whether to skip uprobe_mmap on merged vma.
>> + */
>> + bool skip_vma_uprobe :1;
>> +
>> /* Internal flags set during merge process: */
>>
>> /*
>> --
>> 2.34.1
>>
>
> ----8<----
>>>From 30e39d0f223ae4ab5669584593071f5f7266beeb Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Date: Tue, 27 May 2025 14:11:26 +0100
> Subject: [PATCH] mm: avoid orphaned uprobe
>
> If, on mremap, me move a file-backed VMA mapped at offset 0 that possess a
> uprobe and it merges with an adjacent VMA, we will then invoke
> uprobe_mmap() once again to install a uprobe into this newly established
> VMA.
>
> This is problematic, as when we then move the page tables back into place,
> we overwrite the uprobe entry and thus orphan the merge-established uprobe.
>
> Avoid this by threading state to explicitly disallow the establishment of a
> new uprobe upon merge under these circumstances.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
> mm/vma.c | 21 ++++++++++++++++++---
> mm/vma.h | 5 +++++
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 1c6595f282e5..cc18d1dcdc57 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -169,6 +169,9 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
> vp->file = vma->vm_file;
> if (vp->file)
> vp->mapping = vma->vm_file->f_mapping;
> +
> + if (vmg && vmg->no_uprobe)
> + vp->no_uprobe = true;
> }
>
> /*
> @@ -358,10 +361,13 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
> if (vp->file) {
> i_mmap_unlock_write(vp->mapping);
> - uprobe_mmap(vp->vma);
>
> - if (vp->adj_next)
> - uprobe_mmap(vp->adj_next);
> + if (!vp->no_uprobe) {
> + uprobe_mmap(vp->vma);
> +
> + if (vp->adj_next)
> + uprobe_mmap(vp->adj_next);
> + }
> }
>
> if (vp->remove) {
> @@ -1830,6 +1836,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> vmg.middle = NULL; /* New VMA range. */
> vmg.pgoff = pgoff;
> vmg.next = vma_iter_next_rewind(&vmi, NULL);
> +
> + /*
> + * If the VMA we are copying might contain a uprobe PTE, ensure that we
> + * do not establish one upon merge. otherwise, when mremap() moves page
> + * tables into place, we'll orphan the one just created.
> + */
> + if (vma->vm_file && vma->vm_pgoff == 0)
> + vmg.no_uprobe = true;
> +
> new_vma = vma_merge_new_range(&vmg);
>
> if (new_vma) {
> diff --git a/mm/vma.h b/mm/vma.h
> index 9a8af9be29a8..4c35c5ab1aa2 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -19,6 +19,8 @@ struct vma_prepare {
> struct vm_area_struct *insert;
> struct vm_area_struct *remove;
> struct vm_area_struct *remove2;
> +
> + bool no_uprobe :1;
> };
>
> struct unlink_vma_file_batch {
> @@ -120,6 +122,9 @@ struct vma_merge_struct {
> */
> bool give_up_on_oom :1;
>
> + /* If set, do not install a uprobe upon merge. */
> + bool no_uprobe :1;
> +
> /* Internal flags set during merge process: */
>
> /*
> --
> 2.49.0
nice, thanks Lorenzo!
Powered by blists - more mailing lists