Date:   Fri, 2 Jun 2023 19:58:47 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     Mike Christie <michael.christie@...cle.com>, linux@...mhuis.info,
        nicolas.dichtel@...nd.com, axboe@...nel.dk, ebiederm@...ssion.com,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, mst@...hat.com,
        sgarzare@...hat.com, stefanha@...hat.com, brauner@...nel.org
Subject: Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps
 regression

On 06/02, Jason Wang wrote:
>
> On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > and the final rewrite:
> >
> >         if (work->node) {
> >                 work_next = work->node->next;
> >                 if (true)
> >                         clear_bit(&work->flags);
> >         }
> >
> > so again, I do not see the load-store control dependency.
>
> This kind of optimization is suspicious, especially considering it's
> the loop's controlling expression rather than an ordinary condition.

It is not about optimization; the rewrite just shows that the clear_bit()
store does not depend on the value loaded from work->node->next, so there
is no load-store control dependency to rely on.
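
To spell this out, a sketch in the style of memory-barriers.txt (not the
vhost code itself, just the pattern; same simplified clear_bit() call as
in the rewrite above):

        /*
         * A load-store control dependency: the WRITE_ONCE() is ordered
         * after the READ_ONCE() because the store only happens when the
         * loaded value is non-zero.
         */
        x = READ_ONCE(*p);
        if (x)
                WRITE_ONCE(*q, 1);

        /*
         * The rewritten loop body: clear_bit() is control-dependent on
         * the load of work->node, but NOT on the load of
         * work->node->next, and it is the latter load that would need
         * to be ordered before the store.
         */
        if (work->node) {
                work_next = work->node->next;
                clear_bit(&work->flags);
        }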

> Looking at the assembly (x86):
>
>    0xffffffff81d46c5b <+75>:    callq  0xffffffff81689ac0 <llist_reverse_order>
>    0xffffffff81d46c60 <+80>:    mov    %rax,%r15
>    0xffffffff81d46c63 <+83>:    test   %rax,%rax
>    0xffffffff81d46c66 <+86>:    je     0xffffffff81d46c3a <vhost_worker+42>
>    0xffffffff81d46c68 <+88>:    mov    %r15,%rdi
>    0xffffffff81d46c6b <+91>:    mov    (%r15),%r15
>    0xffffffff81d46c6e <+94>:    lock andb $0xfd,0x10(%rdi)
>    0xffffffff81d46c73 <+99>:    movl   $0x0,0x18(%rbx)
>    0xffffffff81d46c7a <+106>:   mov    0x8(%rdi),%rax
>    0xffffffff81d46c7e <+110>:   callq  0xffffffff821b39a0
> <__x86_indirect_thunk_array>
>    0xffffffff81d46c83 <+115>:   callq  0xffffffff821b4d10 <__SCT__cond_resched>
> ...
>
> I can see:
>
> 1) The code reads node->next (+91) before clear_bit (+94)

The code does, but what about the CPU?

> 2) And then it uses a lock prefix to guarantee the execution order

As I said from the very beginning, this code is fine on x86 because
atomic ops are fully serialised on x86.
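
To spell out the reordering I have in mind on a weakly ordered CPU, a
sketch of the pattern (not the actual code; VHOST_WORK_QUEUED and the
llist calls are as in drivers/vhost/, everything else simplified):

        /* vhost_worker() side, one loop iteration in essence */
        work_next = work->node.next;                    /* load A */
        clear_bit(VHOST_WORK_QUEUED, &work->flags);     /* store B */

        /* vhost_work_queue() side */
        if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags))
                llist_add(&work->node, &worker->work_list);
                /* llist_add() rewrites work->node.next */

        /*
         * If the CPU makes store B visible before load A completes, the
         * queueing side can observe the bit clear, re-add the work, and
         * llist_add() then overwrites work->node.next while the worker's
         * load of it is still in flight.  On x86 this cannot happen
         * because clear_bit() is a lock-prefixed RMW and thus fully
         * serialising; elsewhere something like smp_mb__before_atomic()
         * before the clear_bit() would be needed to forbid it.
         */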

OK, we can't convince each other. I'll try to write another email when
I have time.

If this code is correct, then my understanding of memory barriers is even
worse than I think. I wouldn't be surprised, but I'd like to understand
what I have missed.

Oleg.
