Date:   Fri, 2 Jun 2023 19:58:47 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     Mike Christie <michael.christie@...cle.com>, linux@...mhuis.info,
        nicolas.dichtel@...nd.com, axboe@...nel.dk, ebiederm@...ssion.com,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, stefanha@...hat.com, brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
 regression

On 06/02, Jason Wang wrote:
>
> On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > and the final rewrite:
> >
> >         if (work->node) {
> >                 work_next = work->node->next;
> >                 if (true)
> >                         clear_bit(&work->flags);
> >         }
> >
> > so again, I do not see the load-store control dependency.
>
> This kind of optimization is suspicious, especially considering it's
> the loop's controlling expression rather than an ordinary condition.

It is not about optimization; the rewrite just shows that the clear_bit()
store does not depend on the value loaded from work->node->next, so there
is no load-store control dependency to rely on.
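
To spell this out, a sketch in the style of memory-barriers.txt (not the
vhost code itself, just the pattern; same simplified clear_bit() call as
in the rewrite above):

        /*
         * A load-store control dependency: the WRITE_ONCE() is ordered
         * after the READ_ONCE() because the store only happens when the
         * loaded value is non-zero.
         */
        x = READ_ONCE(*p);
        if (x)
                WRITE_ONCE(*q, 1);

        /*
         * The rewritten loop body: clear_bit() is control-dependent on
         * the load of work->node, but NOT on the load of
         * work->node->next, and it is the latter load that would need
         * to be ordered before the store.
         */
        if (work->node) {
                work_next = work->node->next;
                clear_bit(&work->flags);
        }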

> Looking at the assembly (x86):
>
>    0xffffffff81d46c5b <+75>:    callq  0xffffffff81689ac0 <llist_reverse_order>
>    0xffffffff81d46c60 <+80>:    mov    %rax,%r15
>    0xffffffff81d46c63 <+83>:    test   %rax,%rax
>    0xffffffff81d46c66 <+86>:    je     0xffffffff81d46c3a <vhost_worker+42>
>    0xffffffff81d46c68 <+88>:    mov    %r15,%rdi
>    0xffffffff81d46c6b <+91>:    mov    (%r15),%r15
>    0xffffffff81d46c6e <+94>:    lock andb $0xfd,0x10(%rdi)
>    0xffffffff81d46c73 <+99>:    movl   $0x0,0x18(%rbx)
>    0xffffffff81d46c7a <+106>:   mov    0x8(%rdi),%rax
>    0xffffffff81d46c7e <+110>:   callq  0xffffffff821b39a0
> <__x86_indirect_thunk_array>
>    0xffffffff81d46c83 <+115>:   callq  0xffffffff821b4d10 <__SCT__cond_resched>
> ...
>
> I can see:
>
> 1) The code reads node->next (+91) before clear_bit (+94)

The code does, but what about the CPU?

> 2) And then it uses a lock prefix to guarantee the execution order

As I said from the very beginning, this code is fine on x86 because
atomic ops are fully serialised on x86.
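
To spell out the reordering I have in mind on a weakly ordered CPU, a
sketch of the pattern (not the actual code; VHOST_WORK_QUEUED and the
llist calls are as in drivers/vhost/, everything else simplified):

        /* vhost_worker() side, one loop iteration in essence */
        work_next = work->node.next;                    /* load A */
        clear_bit(VHOST_WORK_QUEUED, &work->flags);     /* store B */

        /* vhost_work_queue() side */
        if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags))
                llist_add(&work->node, &worker->work_list);
                /* llist_add() rewrites work->node.next */

        /*
         * If the CPU makes store B visible before load A completes, the
         * queueing side can observe the bit clear, re-add the work, and
         * llist_add() then overwrites work->node.next while the worker's
         * load of it is still in flight.  On x86 this cannot happen
         * because clear_bit() is a lock-prefixed RMW and thus fully
         * serialising; elsewhere something like smp_mb__before_atomic()
         * before the clear_bit() would be needed to forbid it.
         */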

OK, we can't convince each other. I'll try to write another email when
I have time.

If this code is correct, then my understanding of memory barriers is even
worse than I think. I wouldn't be surprised, but I'd like to understand
what I have missed.

Oleg.
