linux-kernel - Re: [PATCH] ftrace: Fixup lockdep assert held of text

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <mhng-5de4c6ca-7e77-4b55-9097-59296872e6de@palmerdabbelt-glaptop1>
Date:   Mon, 24 Aug 2020 17:29:32 -0700 (PDT)
From:   Palmer Dabbelt <palmer@...belt.com>
To:     rostedt@...dmis.org
CC:     guoren@...nel.org, Paul Walmsley <paul.walmsley@...ive.com>,
        mingo@...hat.com, linux-kernel@...r.kernel.org,
        linux-csky@...r.kernel.org, guoren@...ux.alibaba.com,
        linux-riscv@...ts.infradead.org
Subject:     Re: [PATCH] ftrace: Fixup lockdep assert held of text_mutex

On Thu, 13 Aug 2020 08:37:43 PDT (-0700), rostedt@...dmis.org wrote:
> On Wed, 12 Aug 2020 22:13:19 -0700 (PDT)
> Palmer Dabbelt <palmer@...belt.com> wrote:
>
>> Sorry, I'm not really sure what's going on here.  I'm not really seeing code
>> that matches this in our port right now, so maybe this is aginst some other
>> tree?  If it's the RISC-V kprobes patch set then I was hoping to take a look at
>> that tomorrow (or I guess a bit earlier this week, but I had some surprise work
>> stuff to do).  IIRC there were a handful of races in the last patch set I saw,
>> but it's been a while so I don't remember for sure.
>>
>> That said, I certainly wouldn't be surprised if there's a locking bug in our
>> ftrace stuff.  It'd be way easier for me to figure out what's going on if you
>> have a concrete suggestion as to how to fix the issues -- even if it's just a
>> workaround.
>
> The issue is actually quite basic.
>
> ftrace_init_nop() is called quite early in boot up and never called
> again. It's called before SMP is set up, so it's on a single CPU, and
> no worries about synchronization with other CPUs is needed.
>
> On x86, it is called before text_poke() is initialized (which is used
> to synchronize code updates across CPUs), and thus can't be called.
> There's a "text_poke_early()" that is used instead, which is basically
> just a memcpy().
>
> Now, if ftrace_init_nop() is not defined by the architecture, it is a
> simple call to ftrace_make_nop(), which is also used to disable ftrace
> callbacks.
>
> The issue is that we have the following path on riscv:
>
>  ftrace_init_nop()
>    ftrace_make_nop()
>      __ftrace_modify_call()
>        patch_text_nosync()
>          patch_insn_write()
>            lockdep_assert_held(&text_mutex);
>
> Boom! text_mutex is not held, and lockdep complains.
>
>
> The difference between ftrace_make_nop() being called by
> ftrace_init_nop() and being called later to disable function tracing is
> that the latter will have:
>
>
> 	ftrace_arch_code_modify_prepare();
> 	[..]
> 	ftrace_make_nop();
> 	[..]
> 	ftrace_arch_code_modify_post_process();
>
> and the former will not have those called.
>
> On x86, we handle the two different cases with:
>
>
> static int ftrace_poke_late = 0;
>
> int ftrace_arch_code_modify_prepare(void)
> {
> 	mutex_lock(&text_mutex);
> 	ftrace_poke_late = 1;
> 	return 0;
> }
>
> int ftrace_arch_code_modify_post_process(void)
> {
> 	text_poke_finish();
> 	ftrace_poke_late = 0;
> 	mutex_unlock(&text_mutex);
> }
>
> Although, the post_process() probably doesn't even need to set
> ftrace_poke_late back to zero.
>
> Then in ftrace_make_nop(), we have:
>
>   ftrace_make_nop()
>     ftrace_modify_code_direct()
>       if (ftrace_poke_late)
>         text_poke_queue(...); // this checks if text_mutex is held
>       else
>         text_poke_early(...); // is basically just memcpy, no test on text_mutex.
>
> The two solutions for riscv, is either to implement the same thing as
> above, or you can create your own ftrace_init_nop() to take the
> text_mutex before calling ftrace_make_nop(), and that should work too.

Ya, thanks, that's a pretty straight-forward issue.  I've To'd you on a patch,
but it's essentially just exactly what you suggested so I doubt it's that
interesting.

I pointed out in the patch notes that it seems reasonable to have the generic
code handle this case, would you be opposed to doing it that way?