lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Jul 2020 14:47:50 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...nel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page

On 07/22, Linus Torvalds wrote:
>
> Comments? Oleg, this should fix the race you talked about too.

Yes.

I still can't convince myself thatI fully understand this patch but I see
nothing really wrong after a quick glance...

> +	 * We can no longer use 'wait' after we've done the
> +	 * list_del_init(&wait->entry),

Yes, but see below,

> +	 * the target may decide it's all done with no
> +	 * other locking, and 'wait' has been allocated on
> +	 * the stack of the target.
>  	 */
> -	if (test_bit(key->bit_nr, &key->page->flags))
> -		return -1;
> +	target = wait->private;
> +	smp_mb();
>
> -	return autoremove_wake_function(wait, mode, sync, key);
> +	/*
> +	 * Ok, we have successfully done what we're waiting for.
> +	 *
> +	 * Now unconditionally remove the wait entry, so that the
> +	 * waiter can use that to see success or not.
> +	 *
> +	 * We _really_ should have a "list_del_init_careful()"
> +	 * to properly pair with an unlocked "list_empty_careful()".
> +	 */
> +	list_del_init(&wait->entry);
> +
> +	/*
> +	 * Theres's another memory barrier in the wakup path, that
> +	 * makes sure the wakup happens after the above is visible
> +	 * to the target.
> +	 */
> +	wake_up_state(target, mode);

We can no longer use 'target'. If it was already woken up it can notice
list_empty_careful(), return without taking q->lock, and exit.

Of course, this is purely theoretical... rcu_read_lock() should help
but perhaps we can avoid it somehow?

Say, can't we abuse WQ_FLAG_WOKEN?

	wake_page_function:
		wait->flags |= WQ_FLAG_WOKEN;
		wmb();
		autoremove_wake_function(...);

	wait_on_page_bit_common:

		for (;;) {
			set_current_state();
			if (wait.flags & WQ_FLAG_WOKEN)
				break;
			schedule();
		}

		finish_wait();

		rmb();
		return wait.flags & WQ_FLAG_WOKEN ? 0 : -EINTR;

Another (cosmetic) problem is that wake_up_state(mode) looks confusing.
It is correct but only because we know that mode == TASK_NORMAL and thus
wake_up_state() can'fail if the target is still blocked.

> +	spin_lock_irq(&q->lock);
> +	SetPageWaiters(page);
> +	if (!trylock_page_bit_common(page, bit_nr, behavior))
> +		__add_wait_queue_entry_tail(q, wait);

do we need SetPageWaiters() if trylock() succeeds ?

Oleg.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ