linux-kernel - Re: workqueue panic in 3.4 kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALZhoSRJ+HV93tDJo-57M-SdXi8Z6qW_bRV9FzSfGWuz8j-t8Q@mail.gmail.com>
Date:	Thu, 7 Mar 2013 23:22:40 +0800
From:	Lei Wen <adrian.wenl@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	linux-kernel@...r.kernel.org, leiwen@...vell.com,
	wwang27@...vell.com
Subject: Re: workqueue panic in 3.4 kernel

Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen <adrian.wenl@...il.com> wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo <tj@...nel.org> wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We find a race condition as below:
>>> CPU0                                                                  CPU1
>>> timer interrupt happen
>>> __run_timers
>>>    __run_timers::spin_lock_irq(&base->lock)
>>>    __run_timers::spin_unlock_irq(&base->lock)
>>>
>>>    __cancel_work_timer
>>>
>>>    __cancel_work_timer::del_timer
>>>
>>>    __cancel_work_timer::wait_on_work
>>>
>>>    __cancel_work_timer::clear_work_data
>>>    __run_timers::call_timer_fn(timer, fn, data);
>>>          delayed_work_timer_fn::get_work_cwq
>>>    __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to be run over cpu1 __BEFORE__
>>> cpu0 is ready to
>>> run the timer callback, which is delayed_work_timer_fn in our case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return NULL and try_to_grab_pending() will be called which will
>> return >=0 iff if successfully steals the PENDING bit (ie. it's the
>> sole owner of the work item).  If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happen before __run_timers() is called, while timer irq
> already happen,
> would it return 1 for the timer is still not detached in __run_timers()?
> If it is possible, then we would call try_to_grab_pending(), so that
> work->data would
> be cleared in this way.
>
>>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also I am not very familiar with workqueue mechanism, how many place
> in kernel would
> clear the work->data beside the clear_work_data()?
>
> From the memory, I cannot find any hint for work structure being destroyed.
> So the only possible seems to me is the work->data be set by someone on purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
>     data = {
>       counter = 0x300
>     },
>     entry = {
>       next = 0xbf03d548,
>       prev = 0xbf03d548
>     },
>     func = 0xbf014b00
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0x200200
>     },
>     expires = 0x12b638b,
>     base = 0xc0844e01,
>     function = 0xc014c7a0 <delayed_work_timer_fn>,
>     data = 0xbf03d544,
>     slack = 0xffffffff,
>     start_pid = 0xffffffff,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }

I captured a trace log, which shows my previous suspicion is true:
__cancel_work_timer is called before delayed_work_timer_fn, so that
work->data is cleared.

And when __cancel_work_timer is called, the timer is still pending,
so del_timer would return 1, thus no try_to_grab_pending would be called.

But it is very strange that in __run_timers, it still get the same timer.
Then its callback, delayed_work_timer_fn, would be called, which cause
the issue.

The detach_timer in __cancel_work_timer should already move the timer
from all list, am I right?
Could it happen for the timer_list be queued twice, like queue over two cpu?
If not, how could it happen?

Thanks,
Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/