[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <79d32963-de38-49cf-a03f-f6f5f4fbb462@I-love.SAKURA.ne.jp>
Date: Sat, 15 Jun 2024 19:59:53 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Nicolas Saenz Julienne <nsaenz@...zon.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
"Steven Rostedt (Google)" <rostedt@...dmis.org>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman
<eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bpf: don't call mmap_read_trylock() from IRQ context
Hello.
On 2024/06/15 1:40, Alexei Starovoitov wrote:
> On Fri, Jun 14, 2024 at 8:15 AM Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
>> "inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage." upon unlock from IRQ work
>> was reported at https://syzkaller.appspot.com/bug?extid=40905bca570ae6784745 .
>
> imo the issue is elsewhere. syzbot reports:
> local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
> __mmap_lock_do_trace_released+0x9c/0x620 mm/mmap_lock.c:243
> __mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
>
> it complains about:
> local_lock(&memcg_paths.lock);
> in TRACE_MMAP_LOCK_EVENT.
> which looks like a false positive.
Commit 2b5067a8143e ("mm: mmap_lock: add tracepoints around lock
acquisition") introduced TRACE_MMAP_LOCK_EVENT() macro as below.
#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
do { \
const char *memcg_path; \
preempt_disable(); \
memcg_path = get_mm_memcg_path(mm); \
trace_mmap_lock_##type(mm, \
memcg_path != NULL ? memcg_path : "", \
##__VA_ARGS__); \
if (likely(memcg_path != NULL)) \
put_memcg_path_buf(); \
preempt_enable(); \
} while (0)
Commit d01079f3d0c0 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING
configurations") moved the location of this macro.
Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling
preemption") replaced preempt_disable() with local_lock(&memcg_paths.lock)
based on an argument that preempt_disable() has to be avoided because
get_mm_memcg_path() might sleep if PREEMPT_RT=y.
The local_lock() macro is defined as
#define local_lock(lock) __local_lock(lock)
in include/linux/local_lock.h and __local_lock() macro is defined as
#define __local_lock(lock) \
do { \
preempt_disable(); \
local_lock_acquire(this_cpu_ptr(lock)); \
} while (0)
if PREEMPT_RT=n or
#define __local_lock(__lock) \
do { \
migrate_disable(); \
spin_lock(this_cpu_ptr((__lock))); \
} while (0)
if PREEMPT_RT=y in include/linux/local_lock_internal.h .
Note that the difference between preempt_disable() and migrate_disable().
Commit d01079f3d0c0 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING
configurations") by error replaced local_lock() with preempt_disable(),
and commit e904c2ccf9b5 ("mm: mmap_lock: fix disabling preemption
directly") replaced preempt_disable() with local_lock().
Now, syzbot started reporting
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
and
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
messages, for local_lock() does not disable IRQ.
I think we can replace local_lock() with local_lock_irqsave() like
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1854850b4b89..1901ad22ccbd 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -156,14 +156,15 @@ static inline void put_memcg_path_buf(void)
#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
do { \
const char *memcg_path; \
- local_lock(&memcg_paths.lock); \
+ unsigned long flags; \
+ local_lock_irqsave(&memcg_paths.lock, flags); \
memcg_path = get_mm_memcg_path(mm); \
trace_mmap_lock_##type(mm, \
memcg_path != NULL ? memcg_path : "", \
##__VA_ARGS__); \
if (likely(memcg_path != NULL)) \
put_memcg_path_buf(); \
- local_unlock(&memcg_paths.lock); \
+ local_unlock_irqrestore(&memcg_paths.lock, flags); \
} while (0)
#else /* !CONFIG_MEMCG */
because local_lock_irqsave() is defined as
#define local_lock_irqsave(lock, flags) __local_lock_irqsave(lock, flags)
in include/linux/local_lock.h and __local_lock_irqsave() macro is defined as
#define __local_lock_irqsave(lock, flags) \
do { \
local_irq_save(flags); \
local_lock_acquire(this_cpu_ptr(lock)); \
} while (0)
if PREEMPT_RT=n or
#define __local_lock_irqsave(lock, flags) \
do { \
typecheck(unsigned long, flags); \
flags = 0; \
__local_lock(lock); \
} while (0)
if PREEMPT_RT=y in include/linux/local_lock_internal.h .
But a fundamental question arises here; why do we need to hold
memcg_paths.lock here, for as of commit 44ef20baed8e ("Merge tag
's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux"),
"git grep -nF memcg_paths" indicates that memcg_paths.lock is used within
only TRACE_MMAP_LOCK_EVENT() macro.
mm/mmap_lock.c:48:static DEFINE_PER_CPU(struct memcg_path, memcg_paths) = {
mm/mmap_lock.c:63: memcg_path = per_cpu_ptr(&memcg_paths, cpu);
mm/mmap_lock.c:101: rcu_assign_pointer(per_cpu_ptr(&memcg_paths, cpu)->buf, new);
mm/mmap_lock.c:135: struct memcg_path *memcg_path = this_cpu_ptr(&memcg_paths);
mm/mmap_lock.c:152: local_sub(MEMCG_PATH_BUF_SIZE, &this_cpu_ptr(&memcg_paths)->buf_idx);
mm/mmap_lock.c:159: local_lock(&memcg_paths.lock); \
mm/mmap_lock.c:166: local_unlock(&memcg_paths.lock); \
That is, what is the reason we can't revert commit 832b50725373 and
make the TRACE_MMAP_LOCK_EVENT() macro to branch directly like below?
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1854850b4b89..1e440c7d48e6 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -153,19 +153,35 @@ static inline void put_memcg_path_buf(void)
rcu_read_unlock();
}
+#ifndef CONFIG_PREEMPT_RT
#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
do { \
const char *memcg_path; \
- local_lock(&memcg_paths.lock); \
+ preempt_disable(); \
memcg_path = get_mm_memcg_path(mm); \
trace_mmap_lock_##type(mm, \
memcg_path != NULL ? memcg_path : "", \
##__VA_ARGS__); \
if (likely(memcg_path != NULL)) \
put_memcg_path_buf(); \
- local_unlock(&memcg_paths.lock); \
+ preempt_enable(); \
} while (0)
+#else
+#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
+ do { \
+ const char *memcg_path; \
+ migrate_disable(); \
+ memcg_path = get_mm_memcg_path(mm); \
+ trace_mmap_lock_##type(mm, \
+ memcg_path != NULL ? memcg_path : "", \
+ ##__VA_ARGS__); \
+ if (likely(memcg_path != NULL)) \
+ put_memcg_path_buf(); \
+ migrate_enable(); \
+ } while (0)
+#endif
+
#else /* !CONFIG_MEMCG */
int trace_mmap_lock_reg(void)
Is the reason because &buf[idx] in get_memcg_path_buf() might become out of
bounds due to preemption in normal context if PREEMPT_RT=y ? If so, can't we
add "idx >=0 && idx < CONTEXT_COUNT" check into get_memcg_path_buf() and
return NULL if preemption (or interrupt or recursion if any) exhausted the per
cpu buffer?
/*
* How many contexts our trace events might be called in: normal, softirq, irq,
* and NMI.
*/
#define CONTEXT_COUNT 4
Powered by blists - more mailing lists