Message-Id: <20250902163514.f877d9c96e913f08c0c6b0b1@kernel.org>
Date: Tue, 2 Sep 2025 16:35:14 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Luo Gengkun <luogengkun@...weicloud.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, mathieu.desnoyers@...icios.com,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Catalin
 Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 linux-arm-kernel@...ts.infradead.org, Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH] tracing: Fix tracing_marker may trigger page fault
 during preempt_disable

On Tue, 2 Sep 2025 11:47:32 +0800
Luo Gengkun <luogengkun@...weicloud.com> wrote:

> 
> On 2025/9/1 23:56, Masami Hiramatsu (Google) wrote:
> > On Fri, 29 Aug 2025 08:26:04 -0400
> > Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> >> [ Adding arm64 maintainers ]
> >>
> >> On Fri, 29 Aug 2025 16:29:07 +0800
> >> Luo Gengkun <luogengkun@...weicloud.com> wrote:
> >>
> >>> On 2025/8/20 1:50, Steven Rostedt wrote:
> >>>> On Tue, 19 Aug 2025 10:51:52 +0000
> >>>> Luo Gengkun <luogengkun@...weicloud.com> wrote:
> >>>>   
> >>>>> Both tracing_mark_write and tracing_mark_raw_write call
> >>>>> __copy_from_user_inatomic during preempt_disable. But in some case,
> >>>>> __copy_from_user_inatomic may trigger page fault, and will call schedule()
> >>>>> subtly. And if a task is migrated to other cpu, the following warning will
> >>>> Wait! What?
> >>>>
> >>>> __copy_from_user_inatomic() is allowed to be called from in atomic context.
> >>>> Hence the name it has. How the hell can it sleep? If it does, it's totally
> >>>> broken!
> >>>>
> >>>> Now, I'm not against using nofault() as it is better named, but I want to
> >>>> know why you are suggesting this change. Did you actually trigger a bug here?
> >>> yes, I trigger this bug in arm64.
> >> And I still think this is an arm64 bug.
> > I think it could be.
> >
> >>>>   
> >>>>> be trigger:
> >>>>>           if (RB_WARN_ON(cpu_buffer,
> >>>>>                          !local_read(&cpu_buffer->committing)))
> >>>>>
> >>>>> An example can illustrate this issue:
> > You've missed an important part.
> >
> >>>>> process flow						CPU
> >>>>> ---------------------------------------------------------------------
> >>>>>
> >>>>> tracing_mark_raw_write():				cpu:0
> >>>>>      ...
> >>>>>      ring_buffer_lock_reserve():				cpu:0
> >>>>>         ...
> > 	preempt_disable_notrace(); --> this is unlocked by ring_buffer_unlock_commit()
> >
> >>>>>         cpu = raw_smp_processor_id()			cpu:0
> >>>>>         cpu_buffer = buffer->buffers[cpu]			cpu:0
> >>>>>         ...
> >>>>>      ...
> >>>>>      __copy_from_user_inatomic():				cpu:0
> > So this is called under preempt-disabled.
> >
> >>>>>         ...
> >>>>>         # page fault
> >>>>>         do_mem_abort():					cpu:0
> >>>> Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
> >>>>   
> >>>>>            ...
> >>>>>            # Call schedule
> >>>>>            schedule()					cpu:0
> > If this does not check the preempt flag, it is a problem.
> > Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?
> 
> My kernel was built without CONFIG_PREEMPT_COUNT, so the preempt_disable()
> does nothing more than act as a barrier. In this case, it can pass the
> check by schedule(). Perhaps this is another issue?

OK, I got it. Indeed, in that case, we have no way to check whether this
happens in a preempt-disabled critical section.
Anyway, as discussed here, __copy_from_user_inatomic() is meant to be an
internal function, so I'm also OK with this patch.
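
To illustrate why the check is impossible in that configuration, here is a
minimal userspace sketch. It is not kernel code: struct cpu_model and the
model_*() helpers are hypothetical names invented for this example, and the
boolean merely stands in for CONFIG_PREEMPT_COUNT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct cpu_model {
	bool preempt_count_enabled;	/* stands in for CONFIG_PREEMPT_COUNT */
	int preempt_count;
};

static void model_preempt_disable(struct cpu_model *c)
{
	/* Without the counter this is only a compiler barrier: no state is kept. */
	if (c->preempt_count_enabled)
		c->preempt_count++;
}

static void model_preempt_enable(struct cpu_model *c)
{
	if (c->preempt_count_enabled) {
		assert(c->preempt_count > 0);
		c->preempt_count--;
	}
}

/* True if a schedule()-time check could notice the preempt-disabled section. */
static bool model_schedule_detects_atomic(const struct cpu_model *c)
{
	return c->preempt_count_enabled && c->preempt_count > 0;
}
```

With the counter modelled, model_schedule_detects_atomic() reports the
section after model_preempt_disable(); without it, nothing is recorded and
the same call sequence goes unnoticed.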

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>

BTW, currently we just write a fault message if
__copy_from_user_*() hits a fault, but I think we can retry with a
normal __copy_from_user() to a kernel buffer and then copy that into the
ring buffer as a slow path.
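
A rough userspace sketch of that fast-path/slow-path idea follows. It is an
assumption-laden model, not a proposed patch: struct user_src, the
model_copy_*() helpers, model_mark_write() and the 64-byte bounce buffer are
all hypothetical and only mimic the "returns bytes not copied" convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a user buffer whose page may not be resident. */
struct user_src {
	const char *data;
	size_t len;
	int resident;
};

/* Models a non-faulting copy: returns bytes NOT copied, never sleeps. */
static size_t model_copy_nofault(char *dst, const struct user_src *src, size_t n)
{
	if (!src->resident)
		return n;
	memcpy(dst, src->data, n);
	return 0;
}

/* Models a sleepable copy: "faults in" the page, then copies. */
static size_t model_copy_may_sleep(char *dst, struct user_src *src, size_t n)
{
	src->resident = 1;
	memcpy(dst, src->data, n);
	return 0;
}

enum path { PATH_FAST, PATH_SLOW };

static enum path model_mark_write(char *ring, size_t ring_sz, struct user_src *src)
{
	char bounce[64];
	size_t n = src->len;

	assert(ring && src);
	if (n > ring_sz)
		n = ring_sz;
	if (n > sizeof(bounce))
		n = sizeof(bounce);

	/* Fast path: copy straight into the reserved ring buffer slot. */
	if (model_copy_nofault(ring, src, n) == 0)
		return PATH_FAST;

	/* Slow path: sleepable copy into a bounce buffer, outside the atomic section. */
	model_copy_may_sleep(bounce, src, n);
	/* Copying from the bounce buffer cannot fault, so it is safe in the atomic section. */
	memcpy(ring, bounce, n);
	return PATH_SLOW;
}
```

In this model the first write of a non-resident buffer takes the slow path
and still lands the data in the ring; a second write of the same buffer
takes the fast path.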

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>