lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Aug 2020 11:37:43 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Palmer Dabbelt <palmer@...belt.com>
Cc:     guoren@...nel.org, Paul Walmsley <paul.walmsley@...ive.com>,
        mingo@...hat.com, linux-kernel@...r.kernel.org,
        linux-csky@...r.kernel.org, guoren@...ux.alibaba.com,
        linux-riscv@...ts.infradead.org
Subject: Re: [PATCH] ftrace: Fixup lockdep assert held of text_mutex

On Wed, 12 Aug 2020 22:13:19 -0700 (PDT)
Palmer Dabbelt <palmer@...belt.com> wrote:

> Sorry, I'm not really sure what's going on here.  I'm not really seeing code
> that matches this in our port right now, so maybe this is aginst some other
> tree?  If it's the RISC-V kprobes patch set then I was hoping to take a look at
> that tomorrow (or I guess a bit earlier this week, but I had some surprise work
> stuff to do).  IIRC there were a handful of races in the last patch set I saw,
> but it's been a while so I don't remember for sure.
> 
> That said, I certainly wouldn't be surprised if there's a locking bug in our
> ftrace stuff.  It'd be way easier for me to figure out what's going on if you
> have a concrete suggestion as to how to fix the issues -- even if it's just a
> workaround.

The issue is actually quite basic.

ftrace_init_nop() is called quite early in boot up and never called
again. It's called before SMP is set up, so it's on a single CPU, and
no worries about synchronization with other CPUs is needed.

On x86, it is called before text_poke() is initialized (which is used
to synchronize code updates across CPUs), and thus can't be called.
There's a "text_poke_early()" that is used instead, which is basically
just a memcpy().

Now, if ftrace_init_nop() is not defined by the architecture, it is a
simple call to ftrace_make_nop(), which is also used to disable ftrace
callbacks.

The issue is that we have the following path on riscv:

 ftrace_init_nop()
   ftrace_make_nop()
     __ftrace_modify_call()
       patch_text_nosync()
         patch_insn_write()
           lockdep_assert_held(&text_mutex);

Boom! text_mutex is not held, and lockdep complains.


The difference between ftrace_make_nop() being called by
ftrace_init_nop() and being called later to disable function tracing is
that the latter will have:


	ftrace_arch_code_modify_prepare();
	[..]
	ftrace_make_nop();
	[..]
	ftrace_arch_code_modify_post_process();

and the former will not have those called.

On x86, we handle the two different cases with:


static int ftrace_poke_late = 0;

int ftrace_arch_code_modify_prepare(void)
{
	mutex_lock(&text_mutex);
	ftrace_poke_late = 1;
	return 0;
}

int ftrace_arch_code_modify_post_process(void)
{
	text_poke_finish();
	ftrace_poke_late = 0;
	mutex_unlock(&text_mutex);
}

Although, the post_process() probably doesn't even need to set
ftrace_poke_late back to zero.

Then in ftrace_make_nop(), we have:

  ftrace_make_nop()
    ftrace_modify_code_direct()
      if (ftrace_poke_late)
        text_poke_queue(...); // this checks if text_mutex is held
      else
        text_poke_early(...); // is basically just memcpy, no test on text_mutex.

The two solutions for riscv, is either to implement the same thing as
above, or you can create your own ftrace_init_nop() to take the
text_mutex before calling ftrace_make_nop(), and that should work too.

-- Steve

Powered by blists - more mailing lists