Message-ID: <20230605142034.GD32275@redhat.com>
Date: Mon, 5 Jun 2023 16:20:35 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Jason Wang <jasowang@...hat.com>,
Mike Christie <michael.christie@...cle.com>,
linux@...mhuis.info, nicolas.dichtel@...nd.com, axboe@...nel.dk,
ebiederm@...ssion.com, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, mst@...hat.com,
sgarzare@...hat.com, stefanha@...hat.com, brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
regression
On 06/02, Linus Torvalds wrote:
>
> On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > As I said from the very beginning, this code is fine on x86 because
> > atomic ops are fully serialised on x86.
>
> Yes. Other architectures require __smp_mb__{before,after}_atomic for
> the bit setting ops to actually be memory barriers.
>
> We *should* probably have acquire/release versions of the bit test/set
> helpers, but we don't, so they end up being full memory barriers with
> those things. Which isn't optimal, but I doubt it matters on most
> architectures.
>
> So maybe we'll some day have a "test_bit_acquire()" and a
> "set_bit_release()" etc.
In this particular case we need clear_bit_release(), and IIUC it already
exists, just under the name clear_bit_unlock().

So do you agree that vhost_worker() needs smp_mb__before_atomic() before
clear_bit(), or simply clear_bit_unlock(), to avoid the race with
vhost_work_queue()?
Let me provide a simplified example:
	struct item {
		struct llist_node llist;
		unsigned long flags;
	};

	struct llist_head HEAD = {}; // global

	void queue(struct item *item)
	{
		// ensure this item was already flushed
		if (!test_and_set_bit(0, &item->flags))
			llist_add(&item->llist, &HEAD);
	}

	void flush(void)
	{
		struct llist_node *head = llist_del_all(&HEAD);
		struct item *item, *next;

		llist_for_each_entry_safe(item, next, head, llist)
			clear_bit(0, &item->flags);
	}
I think this code is buggy in that flush() can race with queue(), the same
way as vhost_worker() and vhost_work_queue().
Once flush() clears bit 0, queue() can come on another CPU and re-queue
this item and change item->llist.next. We need a barrier before clear_bit()
to ensure that the load of item->llist.next (next = llist_entry(...)) in
llist_for_each_entry_safe() completes before the result of clear_bit() is
visible to queue().
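Concretely, either variant below would close the window in the simplified
flush() (a sketch, not compile-tested):

```c
	llist_for_each_entry_safe(item, next, head, llist) {
		/* order the ->llist.next load above against clearing the
		   bit; clear_bit() alone is not a barrier on most arches */
		smp_mb__before_atomic();
		clear_bit(0, &item->flags);
	}
```

or simply clear_bit_unlock(0, &item->flags) with no explicit barrier,
since the _unlock variant already has release semantics.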
And I do not think we can rely on a control dependency here, because I
fail to see a load-store control dependency in this code:
llist_for_each_entry_safe() loads item->llist.next but does not check the
result until the next iteration.
No?
Oleg.