Message-ID: <20221112061314.GE725751@paulmck-ThinkPad-P17-Gen-1>
Date:   Fri, 11 Nov 2022 22:13:14 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
Cc:     Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu: Dump memory object info if callback is invalid

On Sat, Nov 12, 2022 at 10:21:49AM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/11/12 2:35, Paul E. McKenney wrote:
> > On Fri, Nov 11, 2022 at 06:04:26PM +0800, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/11/11 15:45, Paul E. McKenney wrote:
> >>> On Sat, Nov 05, 2022 at 10:35:19AM +0800, Zhen Lei wrote:
> >>>> The callback rhp->func becoming NULL is usually caused by a use-after-free,
> >>>> so information about 'rhp' is very useful. Unfortunately, nothing is
> >>>> printed at the moment. Looking at the panic output below, if no vmcore is
> >>>> generated, there is almost no way to analyze it beyond knowing that the
> >>>> bug exists.
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at virtual address 0
> >>>> ... ...
> >>>> PC is at 0x0
> >>>> LR is at rcu_do_batch+0x1c0/0x3b8
> >>>> ... ...
> >>>>  (rcu_do_batch) from (rcu_core+0x1d4/0x284)
> >>>>  (rcu_core) from (__do_softirq+0x24c/0x344)
> >>>>  (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
> >>>>  (__irq_exit_rcu) from (irq_exit+0x8/0x10)
> >>>>  (irq_exit) from (__handle_domain_irq+0x74/0x9c)
> >>>>  (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
> >>>>  (gic_handle_irq) from (__irq_svc+0x5c/0x94)
> >>>>  (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
> >>>>  (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
> >>>>  (default_idle_call) from (do_idle+0xf8/0x150)
> >>>>  (do_idle) from (cpu_startup_entry+0x18/0x20)
> >>>>  (cpu_startup_entry) from (0xc01530)
> >>>>
> >>>> So add mem_dump_obj(rhp) to output some information, for example:
> >>>>   slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
> >>>>
> >>>> Now we know the size of the memory block and the offset of the rcu_head
> >>>> within it, so we can start checking the code. It's going to be slow and
> >>>> tiring, but it's better than having no way to start.
> >>>>
> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
> >>>
> >>> I have pulled this in with the usual wordsmithing (please check!)
> >>> for review and testing, thank you!
> >>
> >> Great! Thanks. This provides a lot of valuable information about debugging methods.
> >>
> >> In the following two lines, there are a few extra spaces after the period.
> >> I will delete them in v2.
> >>
> >> rhp->func to be set to NULL.  This defeats the debugging prints used by
> >> locate the problem.   If the problem is reproducible, additional slab
> > 
> > Please do adjust my wordsmithing as required.
> > 
> >>> Questions include "Is 0x3 correct for functions compiled with all
> >>> supported compiler options on all architectures on which the Linux
> >>
> >> Sorry, I found that it might not work on x86, although I had no problems
> >> booting on x86 before. I ran a script today and found that there are
> >> function addresses that are not 4-byte aligned.
> >>
> >> I'll send a v2 based on your version.
> > 
> > x86 can be like that sometimes...
> > 
> > I will revert your current patch, and I look forward to seeing your v2.
> > 
> >> cat System.map | grep -E ' t | T ' | awk '{print substr($1,length($1),length($1))}' | sort | uniq -c
> >>   52521 0
> >>     409 1
> >>     394 2
> >>     417 3
> >>     404 4
> >>     458 5
> >>     405 6
> >>     393 7
> >>    1205 8
> >>     457 9
> >>     442 a
> >>     435 b
> >>     421 c
> >>     418 d
> >>     421 e
> >>     426 f
> > 
> > Indeed, quite a few!  Maybe the address check can be arch-specific,
> > maybe using IS_ENABLED()?
> 
> There are more precise and more effective, but relatively time-consuming,
> functions, such as is_vmalloc_addr() and virt_addr_valid().

We cannot afford to slow down callback invocation very much.  Even the
two checks you are currently adding are a bit concerning.  The problem
happens when CPUs flood RCU with callbacks.  Callback invocation needs
to keep up with the flood of call_rcu() invocations.
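
To make the earlier IS_ENABLED() suggestion concrete, here is a rough,
untested sketch; CONFIG_ARM64 is used only as an example of an architecture
whose function addresses are known to be 4-byte aligned, and the real set of
architectures would need auditing:

static inline void debug_rcu_head_callback(struct rcu_head *rhp)
{
        unsigned long func = (unsigned long)rhp->func;

        /* A NULL ->func is bogus on every architecture. */
        if (unlikely(!func))
                mem_dump_obj(rhp);
        /* Check the low-order bits only where function alignment guarantees them. */
        else if (IS_ENABLED(CONFIG_ARM64) && unlikely(func & 0x3))
                mem_dump_obj(rhp);
}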

> >>> kernel runs?", "Is this added information useful often enough for
> >>> this to be pushed to mainline?", and so on.
> > 
> > And another question is "Should this be default?"  There may be concerns
> > with callback-invocation throughput during callback-flooding events.
> 
> The overhead of the if statement is small compared with that of freeing
> the memory. However, since there is only a low probability that the
> callback is invalid, it is also wise to make the check non-default.

Very good, works for me!
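
And just to illustrate "non-default", the check could simply be compiled out
unless a debug Kconfig option is selected. The symbol name
CONFIG_RCU_DEBUG_INVALID_CB below is made up for this sketch, and your v2 may
of course do this differently:

static inline void debug_rcu_head_callback(struct rcu_head *rhp)
{
        if (!IS_ENABLED(CONFIG_RCU_DEBUG_INVALID_CB))
                return;
        if (unlikely(!rhp->func || (unsigned long)rhp->func & 0x3))
                mem_dump_obj(rhp);
}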

> >> I originally wanted to add a member to struct rcu_head and back up
> >> 'func' into the previous node. This way, when the error is detected,
> >> the callback function could be printed out, which would help us quickly
> >> find the user of the invalid rhp. However, the size of struct page is
> >> limited and cannot be expanded.
> > 
> > Although that information could be clobbered just as easily as could
> > the ->func value, right?
> 
> There is a low probability that the previous node and the current node
> are in the same memory area, so it is relatively reliable to back up
> ->func into the previous node. However, this method could only be used
> temporarily to locate the problem; after all, the impact is too great.

Agreed.  I could imagine at least some people running something like your
current patches in production.  I have a much harder time imagining very
many people running an expanded rcu_head structure in production.
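
Just to make that idea concrete, a purely illustrative sketch of what such an
expanded structure might look like (the field and helper names are invented
here, and this is not something being proposed for mainline):

struct rcu_head_dbg {
        struct rcu_head rh;
        rcu_callback_t next_func;       /* Copy of the *next* callback's ->func. */
};

static inline void rcu_dbg_backup_func(struct rcu_head_dbg *prev,
                                       struct rcu_head_dbg *next)
{
        /* Called when 'next' is queued after 'prev' in the callback list. */
        prev->next_func = next->rh.func;
}

If the invoker later finds next->rh.func clobbered, prev->next_func still
records what it used to be, on the assumption that the two nodes rarely live
in the same corrupted memory area.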

							Thanx, Paul

> >> Furthermore, we can dump the contents of the memory object.
> >>
> >> I have a problem that has not been resolved and has not been reproduced
> >> since. The surrounding contents of 'rhp' were dumped, as below.
> >> If you highlight 00000024 and 00000030, you'll see that this is a
> >> fixed 80-byte structure. There is also a doubly linked list
> >> in the structure. If I had the mem_dump_obj(rhp) information, I could
> >> narrow it down considerably.
> >>
> >> [20220928044206]5390: 00000024 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >> [20220928044206]53b0: 00000000 00000000 00000000 00000000 cfa4d580 ffff4596 00000000 00000000
> >> [20220928044206]53d0: 7438f148 ffff4596 7438f148 ffff4596 00000024 00000000 0b828cfa 0f00aaf4
> >> [20220928044206]53f0: 00000000 00000000 00000000 00000000 496653c0 ffff4596 00000000 00000000
> >> [20220928044206]5410: 00000000 00000000 00000000 00000000 ae0769e0 ffff4596 ae0769e0 ffff4596
> >> [20220928044206]5430: 00000030 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >> [20220928044206]5450: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >> [20220928044206]5470: ae076988 ffff4596 ae076988 ffff4596 00000024 00000000 00000000 00000000
> > 
> > OK, I consider the "is this useful" question to be answered in the
> > affirmative.
> > 
> > 							Thanx, Paul
> > 
> >>>> ---
> >>>>  kernel/rcu/rcu.h      | 7 +++++++
> >>>>  kernel/rcu/srcutiny.c | 1 +
> >>>>  kernel/rcu/srcutree.c | 1 +
> >>>>  kernel/rcu/tasks.h    | 1 +
> >>>>  kernel/rcu/tiny.c     | 1 +
> >>>>  kernel/rcu/tree.c     | 1 +
> >>>>  6 files changed, 12 insertions(+)
> >>>>
> >>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> >>>> index 70c79adfdc7046c..4844dec36bddb48 100644
> >>>> --- a/kernel/rcu/rcu.h
> >>>> +++ b/kernel/rcu/rcu.h
> >>>> @@ -10,6 +10,7 @@
> >>>>  #ifndef __LINUX_RCU_H
> >>>>  #define __LINUX_RCU_H
> >>>>  
> >>>> +#include <linux/mm.h>
> >>>>  #include <trace/events/rcu.h>
> >>>>  
> >>>>  /*
> >>>> @@ -211,6 +212,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
> >>>>  }
> >>>>  #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
> >>>>  
> >>>> +static inline void debug_rcu_head_callback(struct rcu_head *rhp)
> >>>> +{
> >>>> +	if (unlikely(!rhp->func || (unsigned long)rhp->func & 0x3))
> >>>> +		mem_dump_obj(rhp);
> >>>> +}
> >>>> +
> >>>>  extern int rcu_cpu_stall_suppress_at_boot;
> >>>>  
> >>>>  static inline bool rcu_stall_is_suppressed_at_boot(void)
> >>>> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> >>>> index 33adafdad261389..5e7f336baa06ae0 100644
> >>>> --- a/kernel/rcu/srcutiny.c
> >>>> +++ b/kernel/rcu/srcutiny.c
> >>>> @@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
> >>>>  	while (lh) {
> >>>>  		rhp = lh;
> >>>>  		lh = lh->next;
> >>>> +		debug_rcu_head_callback(rhp);
> >>>>  		local_bh_disable();
> >>>>  		rhp->func(rhp);
> >>>>  		local_bh_enable();
> >>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> >>>> index ca4b5dcec675bac..294972e66b31863 100644
> >>>> --- a/kernel/rcu/srcutree.c
> >>>> +++ b/kernel/rcu/srcutree.c
> >>>> @@ -1631,6 +1631,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
> >>>>  	rhp = rcu_cblist_dequeue(&ready_cbs);
> >>>>  	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
> >>>>  		debug_rcu_head_unqueue(rhp);
> >>>> +		debug_rcu_head_callback(rhp);
> >>>>  		local_bh_disable();
> >>>>  		rhp->func(rhp);
> >>>>  		local_bh_enable();
> >>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> >>>> index b0b885e071fa8dc..b7f8c67c586cdc4 100644
> >>>> --- a/kernel/rcu/tasks.h
> >>>> +++ b/kernel/rcu/tasks.h
> >>>> @@ -478,6 +478,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
> >>>>  	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> >>>>  	len = rcl.len;
> >>>>  	for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >>>> +		debug_rcu_head_callback(rhp);
> >>>>  		local_bh_disable();
> >>>>  		rhp->func(rhp);
> >>>>  		local_bh_enable();
> >>>> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> >>>> index bb8f7d270f01747..56e9a5d91d97ec5 100644
> >>>> --- a/kernel/rcu/tiny.c
> >>>> +++ b/kernel/rcu/tiny.c
> >>>> @@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
> >>>>  
> >>>>  	trace_rcu_invoke_callback("", head);
> >>>>  	f = head->func;
> >>>> +	debug_rcu_head_callback(head);
> >>>>  	WRITE_ONCE(head->func, (rcu_callback_t)0L);
> >>>>  	f(head);
> >>>>  	rcu_lock_release(&rcu_callback_map);
> >>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >>>> index 93c286b98c8f03d..3b93b9df8042a84 100644
> >>>> --- a/kernel/rcu/tree.c
> >>>> +++ b/kernel/rcu/tree.c
> >>>> @@ -2256,6 +2256,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >>>>  		trace_rcu_invoke_callback(rcu_state.name, rhp);
> >>>>  
> >>>>  		f = rhp->func;
> >>>> +		debug_rcu_head_callback(rhp);
> >>>>  		WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> >>>>  		f(rhp);
> >>>>  
> >>>> -- 
> >>>> 2.25.1
> >>>>
> >>> .
> >>>
> >>
> >> -- 
> >> Regards,
> >>   Zhen Lei
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei
