Message-ID: <CACT4Y+aXtpXOzesh=+52Vt4+hufixQ8HrHMJXAQ8MFeRR5D_Sg@mail.gmail.com>
Date: Fri, 10 Jan 2025 13:13:30 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Kun Hu <huk23@...udan.edu.cn>
Cc: andreyknvl@...il.com, akpm@...ux-foundation.org, elver@...gle.com,
arnd@...db.de, nogikh@...gle.com, kasan-dev@...glegroups.com,
linux-kernel@...r.kernel.org,
"jjtan24@...udan.edu.cn" <jjtan24@...udan.edu.cn>
Subject: Re: Bug: Potential KCOV Race Condition in __sanitizer_cov_trace_pc
Leading to Crash at kcov.c:217
On Fri, 10 Jan 2025 at 09:14, Kun Hu <huk23@...udan.edu.cn> wrote:
> >> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
> >> git tree: upstream
> >> Console output: https://drive.google.com/file/d/1rmVTkBzuTt0xMUS-KPzm9OafMLZVOAHU/view?usp=sharing
> >> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
> >> C reproducer: /
> >> Syzlang reproducer: /
> >>
> >> The crash in __sanitizer_cov_trace_pc at kernel/kcov.c:217 seems to be related to the handling of KCOV instrumentation when running in a preemption or IRQ-sensitive context. Specifically, the code might allow potential recursive invocations of __sanitizer_cov_trace_pc during early interrupt handling, which could lead to data races or inconsistent updates to the coverage area (kcov_area). It remains unclear whether this is a KCOV-specific issue or a rare edge case exposed by fuzzing.
> >
> > Hi Kun,
> >
> > How have you inferred this from the kernel oops?
> > I only see a stall that may have just happened to be caught inside of
> > __sanitizer_cov_trace_pc function since it's executed often in an
> > instrumented kernel.
> >
> > Note: on syzbot we don't report stalls on instances that have
> > perf_event_open enabled, since perf has known bugs that lead to stalls
> > all over the kernel.
>
> Hi Dmitry,
>
> Please allow me to ask for your advice:
>
> We obtained new C and Syzlang reproducers after multiple rounds of reproduction. Indeed, the location of this issue has varied (BUG: soft lockup in tmigr_handle_remote in ./kernel/time/timer_migration.c). The crash log and the C and Syzlang reproducers are provided below:
>
> Crash log: https://drive.google.com/file/d/16YDP6bU3Ga8OI1l7hsNFG4EdvjxuBz8d/view?usp=sharing
> C reproducer: https://drive.google.com/file/d/1BHDc6XdXsat07yb94h6VWJ-jIIKhwPfn/view?usp=sharing
> Syzlang reproducer: https://drive.google.com/file/d/1qo1qfr0KNbyIK909ddAo6uzKnrDPdGyV/view?usp=sharing
>
> Should I report the issue to the maintainer responsible for "timer_migration.c"?
If it shows stalls in 2 locations, I assume it can show stalls all
over the kernel.
The only thing the reproducer is doing is perf_event_open, so I would
assume the issue is related to perf.