linux-kernel - Re: [PATCH] KVM: selftests: Double check on the current CPU in rseq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YtA3s0VRj3x7vO7B@google.com>
Date:   Thu, 14 Jul 2022 15:35:15 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Gavin Shan <gshan@...hat.com>, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
        shuah@...nel.org, maz@...nel.org, oliver.upton@...ux.dev,
        shan.gavin@...il.com
Subject: Re: [PATCH] KVM: selftests: Double check on the current CPU in
 rseq_test

On Thu, Jul 14, 2022, Paolo Bonzini wrote:
> On 7/14/22 10:06, Gavin Shan wrote:
> > In rseq_test, there are two threads created. Those two threads are
> > 'main' and 'migration_thread' separately. We also have the assumption
> > that non-migration status on 'migration-worker' thread guarantees the
> > same non-migration status on 'main' thread. Unfortunately, the assumption
> > isn't true. The 'main' thread can be migrated from one CPU to another
> > one between the calls to sched_getcpu() and READ_ONCE(__rseq.cpu_id).
> > The following assert is raised eventually because of the mismatched
> > CPU numbers.
> > 
> > The issue can be reproduced on arm64 system occasionally.
> 
> Hmm, this does not seem a correct patch - the threads are already
> synchronizing using seq_cnt, like this:
> 
> 	migration			main
> 	----------------------		--------------------------------
> 	seq_cnt = 1
> 	smp_wmb()
> 					snapshot = 0
> 					smp_rmb()
> 					cpu = sched_getcpu() reads 23
> 	sched_setaffinity()
> 					rseq_cpu = __rseq.cpuid reads 35
> 					smp_rmb()
> 					snapshot != seq_cnt -> retry
> 	smp_wmb()
> 	seq_cnt = 2
> 
> sched_setaffinity() is guaranteed to block until the task is enqueued on an
> allowed CPU.

Yes, and retrying could suppress detection of kernel bugs that this test is intended
to catch.

> Can you check that smp_rmb() and smp_wmb() generate correct instructions on
> arm64?

That seems like the most likely scenario (or a kernel bug), I distinctly remember
the barriers provided by tools/ being rather bizarre.