lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 31 Oct 2015 07:34:00 -0400
From:	Jeff Layton <jlayton@...chiereds.net>
To:	Tejun Heo <tj@...nel.org>
Cc:	linux-kernel@...r.kernel.org, bfields@...ldses.org,
	Michael Skralivetsky <michael.skralivetsky@...marydata.com>,
	Chris Worley <chris.worley@...marydata.com>,
	Trond Myklebust <trond.myklebust@...marydata.com>,
	Lai Jiangshan <laijs@...fujitsu.com>
Subject: Re: timer code oops when calling mod_delayed_work

On Sat, 31 Oct 2015 11:00:12 +0900
Tejun Heo <tj@...nel.org> wrote:

> (cc'ing Lai)
> 
> Hello, Jeff.
> 
> On Thu, Oct 29, 2015 at 01:58:36PM -0400, Jeff Layton wrote:
> > crash> p cache_cleaner
> > cache_cleaner = $12 = {
> >   work = {
> >     data = {
> >       counter = 0xfffffffe1
> 
> If I'm not mistaken, PENDING, flush color 14, OFFQ and POOL_NONE.
> 
> >     }, 
> >     entry = {
> >       next = 0xffffffffa03623c8 <cache_cleaner+8>, 
> >       prev = 0xffffffffa03623c8 <cache_cleaner+8>
> 
> Empty entry.
> 
> >     }, 
> >     func = 0xffffffffa03333c0 <cache_cleaner_func>
> >   }, 
> >   timer = {
> >     entry = {
> >       next = 0x0, 
> >       pprev = 0xffff88085fd0eaf8
> >     }, 
> >     expires = 0x100021e99, 
> >     function = 0xffffffff810b66a0 <delayed_work_timer_fn>, 
> >     data = 0xffffffffa03623c0, 
> >     flags = 0x200014, 
> >     slack = 0xffffffff, 
> >     start_pid = 0x0, 
> >     start_site = 0x0, 
> >     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> >   }, 
> >   wq = 0xffff88085f48fc00, 
> >   cpu = 0x14
> > }
> > 
> > So the PENDING bit is set (lowest bit in data.counter), and timer->entry.pprev
> > pprev pointer is not NULL (so timer_pending is true). I also see that
> > there are several nfsd threads running the shrinker at the same time.
> > 
> > There is one potential problem that I see, but I'd appreciate someone
> > sanity checking me on this. Here is mod_delayed_work_on:
> ...
> > ...and here is the beginning of try_to_grab_pending:
> > 
> > ------------------[snip]------------------------
> >         /* try to steal the timer if it exists */
> >         if (is_dwork) {
> >                 struct delayed_work *dwork = to_delayed_work(work);
> > 
> >                 /*
> >                  * dwork->timer is irqsafe.  If del_timer() fails, it's
> >                  * guaranteed that the timer is not queued anywhere and not
> >                  * running on the local CPU.
> >                  */
> >                 if (likely(del_timer(&dwork->timer)))
> >                         return 1;
> >         }
> > 
> >         /* try to claim PENDING the normal way */
> >         if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
> >                 return 0;
> > ------------------[snip]------------------------
> > 
> > 
> > ...so if del_timer returns true, we'll return 1 from
> > try_to_grab_pending without actually setting the
> > WORK_STRUCT_PENDING_BIT, and then will end up calling
> > __queue_delayed_work.
> > 
> > That seems wrong to me -- shouldn't we be ensuring that that bit is set
> > when returning 1 from try_to_grab_pending to guard against concurrent
> > queue_delayed_work_on calls?
> 
> But if try_to_grab_pending() succeeded at stealing dwork->timer, it's
> known that the PENDING bit must already be set.  IOW, the bit is
> stolen together with the timer.
> 
> Heh, this one is tricky.  Yeah, try_to_grab_pending() missing PENDING
> would explain the failure but I can't see how it'd leak at the moment.
> 

Thanks Tejun. Yeah, I realized that after sending the response above.

If you successfully delete the timer the timer then the PENDING bit
should already be set. Might be worth throwing in something like this,
just before the return 1:

    WARN_ON(!test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))

...but I doubt it would fire. I think it's likely that the bug is
elsewhere.

The other thing is that we've had this code in place for a couple of
years now, and this is the first time I've seen an oops like this. I
suspect that this may be a recent regression, but I don't know that for
sure.

I have asked Chris and Michael to see if they can bisect it down, but
it may be a bit before they can get that done. Any insight you might
have in the meantime would helpful.
-- 
Jeff Layton <jlayton@...chiereds.net>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ