Date:   Mon, 14 Nov 2022 08:06:29 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
Cc:     "Zhang, Qiang1" <qiang1.zhang@...el.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] rcu: Dump memory object info if callback function is
 invalid

On Mon, Nov 14, 2022 at 03:18:10PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/11/12 14:08, Paul E. McKenney wrote:
> > On Sat, Nov 12, 2022 at 10:32:32AM +0800, Leizhen (ThunderTown) wrote:
> >> On 2022/11/12 2:42, Paul E. McKenney wrote:
> >>> On Fri, Nov 11, 2022 at 01:05:56PM +0000, Zhang, Qiang1 wrote:
> >>>> On 2022/11/11 19:54, Zhang, Qiang1 wrote:
> >>>>>> When a structure containing an RCU callback rhp is (incorrectly) 
> >>>>>> freed and reallocated after rhp is passed to call_rcu(), it is not 
> >>>>>> unusual for
> >>>>>> rhp->func to be set to NULL. This defeats the debugging prints used by
> >>>>>> __call_rcu_common() in kernels built with 
> >>>>>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the 
> >>>>>> offending code using the identity of this function.
> >>>>>>
> >>>>>> And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things 
> >>>>>> are even worse, as can be seen from this splat:
> >>>>>>
> >>>>>> Unable to handle kernel NULL pointer dereference at virtual address 0 
> >>>>>> ... ...
> >>>>>> PC is at 0x0
> >>>>>> LR is at rcu_do_batch+0x1c0/0x3b8
> >>>>>> ... ...
> >>>>>> (rcu_do_batch) from (rcu_core+0x1d4/0x284)
> >>>>>> (rcu_core) from (__do_softirq+0x24c/0x344)
> >>>>>> (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
> >>>>>> (__irq_exit_rcu) from (irq_exit+0x8/0x10)
> >>>>>> (irq_exit) from (__handle_domain_irq+0x74/0x9c)
> >>>>>> (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
> >>>>>> (gic_handle_irq) from (__irq_svc+0x5c/0x94)
> >>>>>> (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
> >>>>>> (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
> >>>>>> (default_idle_call) from (do_idle+0xf8/0x150)
> >>>>>> (do_idle) from (cpu_startup_entry+0x18/0x20)
> >>>>>> (cpu_startup_entry) from (0xc01530)
> >>>>>>
> >>>>>> This commit therefore adds calls to mem_dump_obj(rhp) to output some 
> >>>>>> information, for example:
> >>>>>>
> >>>>>>  slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
> >>>>>>
> >>>>>> This provides the rough size of the memory block and the offset of 
> >>>>>> the rcu_head structure, which provides at least a few clues 
> >>>>>> to help locate the problem. If the problem is reproducible, 
> >>>>>> additional slab debugging can be enabled, for example, 
> >>>>>> CONFIG_DEBUG_SLAB=y, which can provide significantly more information.
> >>>>>>
> >>>>>> Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
> >>>>>> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> >>>>>> ---
> >>>>>> kernel/rcu/rcu.h      | 7 +++++++
> >>>>>> kernel/rcu/srcutiny.c | 1 +
> >>>>>> kernel/rcu/srcutree.c | 1 +
> >>>>>> kernel/rcu/tasks.h    | 1 +
> >>>>>> kernel/rcu/tiny.c     | 1 +
> >>>>>> kernel/rcu/tree.c     | 1 +
> >>>>>> 6 files changed, 12 insertions(+)
> >>>>>>
> >>>>>> v1 --> v2:
> >>>>>> 1. Remove condition "(unsigned long)rhp->func & 0x3", it has problems on x86.
> >>>>>> 2. Paul E. McKenney helped me update the commit message, thanks.
> >>>>>>
> >>>>>
> >>>>> Hi, Zhen Lei
> >>>>>
> >>>>> Maybe the following scenario should be considered:
> >>>>>
> >>>>>                 CPU 0
> >>>>> task context
> >>>>>   spin_lock(&vmap_area_lock)
> >>>>>     Interrupt
> >>>>>       RCU softirq
> >>>>>         rcu_do_batch
> >>>>>           mem_dump_obj
> >>>>>             vmalloc_dump_obj
> >>>>>               spin_lock(&vmap_area_lock)   <-- deadlock
> >>>>
> >>>>> Right, thanks. I just saw the robot's report. So this patch should be dropped.
> >>>>> I'll try to add a helper in mm, where I can check whether the lock is held, and dump the content of the memory object.
> >>>>
> >>>> This is a workaround. Or maybe try a modification like the following;
> >>>> of course, we need to ask Paul's opinion.
> >>>
> >>> Another approach is to schedule a workqueue handler to do the
> >>> mem_dump_obj().  This would allow mem_dump_obj() to run in a clean
> >>> environment.
> >>
> >> It's about to panic, so there's no chance to schedule.
> > 
> > It won't panic if you drop the callback on the floor.
> > 
> > Though to your point, the ->next pointer is likely also trashed.  So you
> > could just drop the remainder of the callback list on the floor.  That
> > might provide a good (though not perfect) chance of getting decent output.
> 
> OK, I think I understand what you mean.
> if (!f)
> 	schedule_work(&work);
> else
> 	f(rhp);

Yes, except that the "schedule_work()" also needs to be accompanied
by something that refuses to execute the rest of those callbacks.
This needs to break out of the loop (or return) and to adjust counts,
among other things.  This might be as easy as setting count to the
negative of the length of the "rcl" list, but does need some attention
to the code following the callback-invocation loop.

							Thanx, Paul

> >>> This would allow vmalloc_dump_obj() to be called unconditionally.
> >>>
> >>> Other thoughts?
> >>
> >> locked = spin_is_locked(&vmap_area_lock);
> >> if (!locked)
> >>     spin_lock(&vmap_area_lock);
> >>
> >> This requires careful analysis and may cause other problems.
> >>
> >> Or, in a new function:
> >> if (locked)
> >>     return;
> >> spin_lock(&vmap_area_lock);
> >>
> >> If there is a chance to dump the data, dump the data. If there is no
> >> chance to dump the data, do not dump the data. This is the fate of
> >> debugging information.
> > 
> > My concern is that there will be increasing numbers of special cases
> > over time.
> 
> OK, I got it.
> 
> > 
> > 							Thanx, Paul
> > 
> >>>> diff --git a/mm/util.c b/mm/util.c
> >>>> index 12984e76767e..86da0739fe5d 100644
> >>>> --- a/mm/util.c
> >>>> +++ b/mm/util.c
> >>>> @@ -1119,14 +1119,18 @@ void mem_dump_obj(void *object)
> >>>>  {
> >>>>         const char *type;
> >>>>
> >>>> +       if (is_vmalloc_addr(object)) {
> >>>> +               if (in_task() && vmalloc_dump_obj(object))
> >>>> +                       return;
> >>>> +               type = "vmalloc memory";
> >>>> +               goto end;
> >>>> +       }
> >>>> +
> >>>>         if (kmem_valid_obj(object)) {
> >>>>                 kmem_dump_obj(object);
> >>>>                 return;
> >>>>         }
> >>>>
> >>>> -       if (vmalloc_dump_obj(object))
> >>>> -               return;
> >>>> -
> >>>>         if (virt_addr_valid(object))
> >>>>                 type = "non-slab/vmalloc memory";
> >>>>         else if (object == NULL)
> >>>> @@ -1135,7 +1139,7 @@ void mem_dump_obj(void *object)
> >>>>                 type = "zero-size pointer";
> >>>>         else
> >>>>                 type = "non-paged memory";
> >>>> -
> >>>> +end:
> >>>>         pr_cont(" %s\n", type);
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(mem_dump_obj);
> >>>>
> >>>> Thanks
> >>>> Zqiang
> >>>>
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>> Zqiang
> >>>>>
> >>>>>
> >>>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> >>>>>> index 65704cbc9df7b3d..32ab45fabf8eebf 100644
> >>>>>> --- a/kernel/rcu/rcu.h
> >>>>>> +++ b/kernel/rcu/rcu.h
> >>>>>> @@ -10,6 +10,7 @@
> >>>>>> #ifndef __LINUX_RCU_H
> >>>>>> #define __LINUX_RCU_H
> >>>>>>
> >>>>>> +#include <linux/mm.h>
> >>>>>> #include <trace/events/rcu.h>
> >>>>>>
> >>>>>> /*
> >>>>>> @@ -211,6 +212,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
> >>>>>> }
> >>>>>> #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
> >>>>>>
> >>>>>> +static inline void debug_rcu_head_callback(struct rcu_head *rhp)
> >>>>>> +{
> >>>>>> +	if (unlikely(!rhp->func))
> >>>>>> +		mem_dump_obj(rhp);
> >>>>>> +}
> >>>>>> +
> >>>>>> extern int rcu_cpu_stall_suppress_at_boot;
> >>>>>>
> >>>>>> static inline bool rcu_stall_is_suppressed_at_boot(void)
> >>>>>> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> >>>>>> index 33adafdad261389..5e7f336baa06ae0 100644
> >>>>>> --- a/kernel/rcu/srcutiny.c
> >>>>>> +++ b/kernel/rcu/srcutiny.c
> >>>>>> @@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
> >>>>>> 	while (lh) {
> >>>>>> 		rhp = lh;
> >>>>>> 		lh = lh->next;
> >>>>>> +		debug_rcu_head_callback(rhp);
> >>>>>> 		local_bh_disable();
> >>>>>> 		rhp->func(rhp);
> >>>>>> 		local_bh_enable();
> >>>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> >>>>>> index ca4b5dcec675bac..294972e66b31863 100644
> >>>>>> --- a/kernel/rcu/srcutree.c
> >>>>>> +++ b/kernel/rcu/srcutree.c
> >>>>>> @@ -1631,6 +1631,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
> >>>>>> 	rhp = rcu_cblist_dequeue(&ready_cbs);
> >>>>>> 	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
> >>>>>> 		debug_rcu_head_unqueue(rhp);
> >>>>>> +		debug_rcu_head_callback(rhp);
> >>>>>> 		local_bh_disable();
> >>>>>> 		rhp->func(rhp);
> >>>>>> 		local_bh_enable();
> >>>>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> >>>>>> index b0b885e071fa8dc..b7f8c67c586cdc4 100644
> >>>>>> --- a/kernel/rcu/tasks.h
> >>>>>> +++ b/kernel/rcu/tasks.h
> >>>>>> @@ -478,6 +478,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
> >>>>>> 	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> >>>>>> 	len = rcl.len;
> >>>>>> 	for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >>>>>> +		debug_rcu_head_callback(rhp);
> >>>>>> 		local_bh_disable();
> >>>>>> 		rhp->func(rhp);
> >>>>>> 		local_bh_enable();
> >>>>>> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> >>>>>> index bb8f7d270f01747..56e9a5d91d97ec5 100644
> >>>>>> --- a/kernel/rcu/tiny.c
> >>>>>> +++ b/kernel/rcu/tiny.c
> >>>>>> @@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
> >>>>>>
> >>>>>> 	trace_rcu_invoke_callback("", head);
> >>>>>> 	f = head->func;
> >>>>>> +	debug_rcu_head_callback(head);
> >>>>>> 	WRITE_ONCE(head->func, (rcu_callback_t)0L);
> >>>>>> 	f(head);
> >>>>>> 	rcu_lock_release(&rcu_callback_map);
> >>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >>>>>> index 15aaff3203bf2d0..ed93ddb8203d42c 100644
> >>>>>> --- a/kernel/rcu/tree.c
> >>>>>> +++ b/kernel/rcu/tree.c
> >>>>>> @@ -2088,6 +2088,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >>>>>> 		trace_rcu_invoke_callback(rcu_state.name, rhp);
> >>>>>>
> >>>>>> 		f = rhp->func;
> >>>>>> +		debug_rcu_head_callback(rhp);
> >>>>>> 		WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> >>>>>> 		f(rhp);
> >>>>>>
> >>>>>> --
> >>>>>> 2.25.1
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>>   Zhen Lei
> >>> .
> >>>
> >>
> >> -- 
> >> Regards,
> >>   Zhen Lei
> > .
> > 
> 
> -- 
> Regards,
>   Zhen Lei
