lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yt9d5ykkhrvv.fsf_-_@linux.ibm.com>
Date:   Wed, 29 Jun 2022 09:04:52 +0200
From:   Sven Schnelle <svens@...ux.ibm.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Janosch Frank <frankja@...ux.ibm.com>,
        Liam Howlett <liam.howlett@...cle.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Guenter Roeck <linux@...ck-us.net>,
        "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Yu Zhao <yuzhao@...gle.com>, Juergen Gross <jgross@...e.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Andreas Krebbel <krebbel@...ux.ibm.com>,
        Ilya Leoshkevich <iii@...ux.ibm.com>,
        Thomas Huth <thuth@...hat.com>, richard.henderson@...aro.org,
        qemu-devel@...gnu.org, qemu-s390x@...gnu.org
Subject: qemu-system-s390x hang in tcg (was: Re: [PATCH v8 23/70] mm/mmap:
 change do_brk_flags() to expand existing VMA and add do_brk_munmap())

Hi,

David Hildenbrand <david@...hat.com> writes:

> On 04.05.22 09:37, Janosch Frank wrote:
>> I had a short look yesterday and the boot usually hangs in the raid6 
>> code. Disabling vector instructions didn't make a difference but a few 
>> interruptions via GDB solve the problem for some reason.
>> 
>> CCing David and Thomas for TCG
>> 
>
> I somehow recall that KASAN was always disabled under TCG, I might be
> wrong (I thought we'd get a message early during boot that the HW
> doesn't support KASAN).
>
> I recall that raid code is a heavy user of vector instructions.
>
> How can I reproduce? Compile upstream (or -next?) with kasan support and
> run it under TCG?

I spent some time looking into this. It's usually hanging in
s390vx8_gen_syndrome(). My first thought was that it is a problem with
the VX instructions, but turned out that it hangs even if i remove all
the code from s390vx8_gen_syndrome().

Tracing the execution of TB's, i see that the generated code is always
jumping between a few TB's, but never exiting the TB's to check for
interrupts (i.e. return to cpu_tb_exec(). I only see calls to
helper_lookup_tb_ptr to lookup the tb pointer for the next TB.

The raid6 code is waiting for some time to expire by reading jiffies,
but interrupts are never processed and therefore jiffies doesn't change.
So the raid6 code hangs forever.

As a test, i made a quick change to test:

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c997c2e8e0..35819fd5a7 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);

     cflags = curr_cflags(cpu);
-    if (check_for_breakpoints(cpu, pc, &cflags)) {
+    if (check_for_breakpoints(cpu, pc, &cflags) ||
+        unlikely(qatomic_read(&cpu->interrupt_request))) {
         cpu_loop_exit(cpu);
     }

And that makes the problem go away. But i'm not familiar with the TCG
internals, so i can't say whether the generated code is incorrect or
something else is wrong. I have tcg log files of a failing + working run
if someone wants to take a look. They are rather large so i would have to
upload them somewhere.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ