Message-ID: <YK9xTzlNSj83mAne@alley>
Date:   Thu, 27 May 2021 12:15:43 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Oleg Nesterov <oleg@...hat.com>, liumartin@...gle.com,
        akpm@...ux-foundation.org, Tejun Heo <tj@...nel.org>
Cc:     bp@...e.de, davidchao@...gle.com, jenhaochen@...gle.com,
        jkosina@...e.cz, josh@...htriplett.org, mhocko@...e.cz,
        mingo@...hat.com, mm-commits@...r.kernel.org, nathan@...nel.org,
        ndesaulniers@...gle.com, paulmck@...ux.vnet.ibm.com,
        peterz@...radead.org, rostedt@...dmis.org, stable@...r.kernel.org,
        tglx@...utronix.de, tj@...nel.org, vbabka@...e.cz,
        linux-kernel@...r.kernel.org
Subject: Re: + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree

Added Tejun to Cc because of the workqueue API question
at the end of this mail.

On Wed 2021-05-26 19:06:06, Oleg Nesterov wrote:
> On 05/24, Petr Mladek wrote:
> >
> > Your patch changes the semantic. The current semantic is the same for
> > the workqueue's counter-part mod_delayed_work_on().
> 
> OK, I see, thanks. I was confused by the comment.
> 
> > We should actually keep the "ret" value as is to stay compatible with
> > workqueue API:
> >
> > 	/*
> > 	 * Canceling could run in parallel from kthread_cancel_delayed_work_sync
> > 	 * and change work's canceling count as the spinlock is released and regain
> > 	 * in __kthread_cancel_work so we need to check the count again. Otherwise,
> > 	 * we might incorrectly queue the dwork and further cause
> > 	 * cancel_delayed_work_sync thread waiting for flush dwork endlessly.
> > 	 *
> > 	 * Keep the ret code. The API primary informs the caller
> > 	 * whether some pending work has been canceled (not proceed).
> > 	 */
> > 	if (work->canceling)
> > 		goto out;
> 
> Agreed, we should keep the "ret" value.

Martin Liu, could you please resend the patch without the
"ret = false" line? See above.

Andrew, could you please remove this patch from the -mm tree for now?

> but unless I am confused again this doesn't match mod_delayed_work_on()
> which always returns true if it races with cancel(). Nevermind, I think
> this doesn't matter.

Good point. I think that this behavior of mod_delayed_work_on() is
actually a bug. Most callers ignore the return code, but there is the
following user:

static void addrconf_del_dad_work(struct inet6_ifaddr *ifp)
{
	if (cancel_delayed_work(&ifp->dad_work))
		__in6_ifa_put(ifp);
}

static void addrconf_mod_dad_work(struct inet6_ifaddr *ifp,
				   unsigned long delay)
{
	in6_ifa_hold(ifp);
	if (mod_delayed_work(addrconf_wq, &ifp->dad_work, delay))
		in6_ifa_put(ifp);
}

If mod_delayed_work() races with cancel_delayed_work() then both might
return true, and both callers would then drop a reference
(__in6_ifa_put(ifp) and in6_ifa_put(ifp), respectively).
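
To make the possible imbalance concrete, here is a minimal user-space
sketch in C. It is only a model under stated assumptions, not kernel
code: struct model_ifp, model_hold(), model_put() and model_race() are
hypothetical names, and the model assumes that mod_delayed_work() leaves
the work queued even when it reports "true". Whether this interleaving
is reachable in the real code is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inet6_ifaddr: refcount + queued work. */
struct model_ifp {
	int refcnt;	/* expected: 1 for the owner + 1 per pending work */
	int pending;	/* number of queued dad_work instances (0 or 1) */
};

static void model_hold(struct model_ifp *ifp)
{
	assert(ifp->refcnt > 0);	/* caller must already own a reference */
	ifp->refcnt++;
}

static void model_put(struct model_ifp *ifp)
{
	ifp->refcnt--;
}

/*
 * Replay one interleaving of addrconf_mod_dad_work() and
 * addrconf_del_dad_work() step by step, in a single thread.
 *
 * mod_reports_true == true models the suspected case where
 * mod_delayed_work() returns true although the racing cancel
 * already removed the pending work (ASSUMPTION, see lead-in).
 *
 * Returns refcnt - (1 + pending): 0 means balanced, a negative
 * value means that one reference too many was dropped.
 */
static int model_race(bool mod_reports_true)
{
	/* One reference for the owner, one for the pending work. */
	struct model_ifp ifp = { .refcnt = 2, .pending = 1 };
	bool cancel_ret;

	/* addrconf_mod_dad_work(): take a reference for the new work. */
	model_hold(&ifp);

	/* Racing addrconf_del_dad_work(): really cancels the pending work. */
	cancel_ret = ifp.pending > 0;
	ifp.pending = 0;
	if (cancel_ret)
		model_put(&ifp);

	/* mod_delayed_work(): queues the work (modelled assumption). */
	ifp.pending = 1;
	if (mod_reports_true)
		model_put(&ifp);

	return ifp.refcnt - (1 + ifp.pending);
}
```

In this model, model_race(false) ends balanced, while model_race(true)
ends one reference short: the work is queued without a reference
backing it.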

I thought that they were serialized by ifp->lock. But, for example,
addrconf_dad_start() calls addrconf_mod_dad_work() after releasing
this lock.

It is possible that they are serialized in another way. But I think
that, in principle, only the call that really cancelled pending work
should return "true".
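
The principle can be sketched in user space with a single atomic
"pending" flag. This is an illustration only, using C11 atomics instead
of the workqueue's real locking and state bits; struct model_work,
model_cancel(), model_mod() and model_cancel_then_mod() are hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work: only the "pending" state. */
struct model_work {
	atomic_bool pending;
};

/* Returns true only for the caller that actually cleared "pending". */
static bool model_cancel(struct model_work *w)
{
	return atomic_exchange(&w->pending, false);
}

/*
 * Queue the work, or requeue it with a new delay (the delay itself is
 * not modelled). Returns true only if this call replaced a work that
 * was still pending.
 */
static bool model_mod(struct model_work *w)
{
	return atomic_exchange(&w->pending, true);
}

/*
 * One pending work, then cancel followed by mod.
 * Returns how many of the two calls reported "true".
 */
static int model_cancel_then_mod(void)
{
	struct model_work w;
	int reported = 0;

	atomic_init(&w.pending, true);
	assert(atomic_load(&w.pending));	/* start with one pending work */

	if (model_cancel(&w))
		reported++;	/* cancel really removed the pending work */
	if (model_mod(&w))
		reported++;	/* not taken: nothing was pending any more */

	return reported;
}
```

Because each call observes and changes the flag in one atomic step,
exactly one of the two calls reports "true" for the single pending
work, so a caller that drops a reference on "true" does so only once.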

Tejun, any opinion?  Feel free to ask for more context.

Best Regards,
Petr
