Message-ID: <20230605142034.GD32275@redhat.com>
Date: Mon, 5 Jun 2023 16:20:35 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Jason Wang <jasowang@...hat.com>,
Mike Christie <michael.christie@...cle.com>,
linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
ebiederm@...ssion.com, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, stefanha@...hat.com, brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
regression
On 06/02, Linus Torvalds wrote:
>
> On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > As I said from the very beginning, this code is fine on x86 because
> > atomic ops are fully serialised on x86.
>
> Yes. Other architectures require __smp_mb__{before,after}_atomic for
> the bit setting ops to actually be memory barriers.
>
> We *should* probably have acquire/release versions of the bit test/set
> helpers, but we don't, so they end up being full memory barriers with
> those things. Which isn't optimal, but I doubt it matters on most
> architectures.
>
> So maybe we'll some day have a "test_bit_acquire()" and a
> "set_bit_release()" etc.
In this particular case we need clear_bit_release(), and IIUC it already
exists, just under the name clear_bit_unlock().

So do you agree that vhost_worker() needs smp_mb__before_atomic() before
clear_bit(), or simply clear_bit_unlock(), to avoid the race with
vhost_work_queue()?
Let me provide a simplified example:
	struct item {
		struct llist_node llist;
		unsigned long flags;
	};

	struct llist_head HEAD = {}; // global

	void queue(struct item *item)
	{
		// ensure this item was already flushed
		if (!test_and_set_bit(0, &item->flags))
			llist_add(&item->llist, &HEAD);
	}

	void flush(void)
	{
		struct llist_node *head = llist_del_all(&HEAD);
		struct item *item, *next;

		llist_for_each_entry_safe(item, next, head, llist)
			clear_bit(0, &item->flags);
	}
I think this code is buggy in that flush() can race with queue(), the same
way as vhost_worker() and vhost_work_queue().
Once flush() clears bit 0, queue() can come on another CPU and re-queue
this item and change item->llist.next. We need a barrier before clear_bit()
to ensure that the load of item->llist.next (next = llist_entry(...)) in
llist_for_each_entry_safe() completes before the result of clear_bit() is
visible to queue().
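Concretely, either variant below would close the window in the simplified
flush() (a sketch, not compile-tested):

```c
	llist_for_each_entry_safe(item, next, head, llist) {
		/* order the ->llist.next load above against clearing the
		   bit; clear_bit() alone is not a barrier on most arches */
		smp_mb__before_atomic();
		clear_bit(0, &item->flags);
	}
```

or simply clear_bit_unlock(0, &item->flags) with no explicit barrier,
since the _unlock variant already has release semantics.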
And I do not think we can rely on a control dependency here, because I
fail to see a load-store control dependency in this code:
llist_for_each_entry_safe() loads item->llist.next but does not check the
result until the next iteration.
No?
Oleg.