Message-ID: <87pmirj3aq.fsf@linaro.org>
Date: Wed, 29 Jun 2022 09:10:57 +0100
From: Alex Bennée <alex.bennee@...aro.org>
To: Sven Schnelle <svens@...ux.ibm.com>
Cc: David Hildenbrand <david@...hat.com>,
Janosch Frank <frankja@...ux.ibm.com>,
Liam Howlett <liam.howlett@...cle.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Guenter Roeck <linux@...ck-us.net>,
"maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Yu Zhao <yuzhao@...gle.com>, Juergen Gross <jgross@...e.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Andreas Krebbel <krebbel@...ux.ibm.com>,
Ilya Leoshkevich <iii@...ux.ibm.com>,
Thomas Huth <thuth@...hat.com>, richard.henderson@...aro.org,
qemu-devel@...gnu.org, qemu-s390x@...gnu.org
Subject: Re: qemu-system-s390x hang in tcg (was: Re: [PATCH v8 23/70]
mm/mmap: change do_brk_flags() to expand existing VMA and add
do_brk_munmap())

Sven Schnelle <svens@...ux.ibm.com> writes:

> Hi,
>
> David Hildenbrand <david@...hat.com> writes:
>
>> On 04.05.22 09:37, Janosch Frank wrote:
>>> I had a short look yesterday and the boot usually hangs in the raid6
>>> code. Disabling vector instructions didn't make a difference but a few
>>> interruptions via GDB solve the problem for some reason.
>>>
>>> CCing David and Thomas for TCG
>>>
>>
>> I somehow recall that KASAN was always disabled under TCG, I might be
>> wrong (I thought we'd get a message early during boot that the HW
>> doesn't support KASAN).
>>
>> I recall that raid code is a heavy user of vector instructions.
>>
>> How can I reproduce? Compile upstream (or -next?) with kasan support and
>> run it under TCG?
>
> I spent some time looking into this. It's usually hanging in
> s390vx8_gen_syndrome(). My first thought was that it is a problem with
> the VX instructions, but it turned out that it hangs even if I remove
> all the code from s390vx8_gen_syndrome().
>
> Tracing the execution of TBs, I see that the generated code is always
> jumping between a few TBs, but never exiting the TBs to check for
> interrupts (i.e. returning to cpu_tb_exec()). I only see calls to
> helper_lookup_tb_ptr to look up the TB pointer for the next TB.
>
> The raid6 code is waiting for some time to expire by reading jiffies,
> but interrupts are never processed and therefore jiffies doesn't change.
> So the raid6 code hangs forever.
>
> As a test, I made a quick change:
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index c997c2e8e0..35819fd5a7 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>
>      cflags = curr_cflags(cpu);
> -    if (check_for_breakpoints(cpu, pc, &cflags)) {
> +    if (check_for_breakpoints(cpu, pc, &cflags) ||
> +        unlikely(qatomic_read(&cpu->interrupt_request))) {
>          cpu_loop_exit(cpu);
>      }
>
> And that makes the problem go away. But I'm not familiar with the TCG
> internals, so I can't say whether the generated code is incorrect or
> something else is wrong. I have TCG log files of a failing and a working
> run if someone wants to take a look. They are rather large, so I would
> have to upload them somewhere.

Whatever is setting cpu->interrupt_request should be calling
cpu_exit(cpu), which sets the exit flag that is checked at the start of
every TB execution (see gen_tb_start).
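
To make the mechanism concrete, here is a rough toy model of it. This is
not QEMU code: ToyCPU and every toy_* name are invented for the
illustration, and the model only assumes what is described above, namely
that chained TBs poll an exit flag at their start and that interrupts
(and hence jiffies) are only serviced once execution is back in the main
loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU state; NOT QEMU's CPUState. */
typedef struct ToyCPU {
    bool interrupt_request; /* an interrupt is pending */
    bool exit_request;      /* asks chained TBs to return to the main loop */
    int jiffies;            /* advances only when the main loop runs */
} ToyCPU;

/* Timer fires. With kick == false only the pending bit is set, which is
 * the suspected bug: nothing tells the running TBs to stop. */
static void toy_raise_irq(ToyCPU *cpu, bool kick)
{
    cpu->interrupt_request = true;
    if (kick) {
        cpu->exit_request = true; /* the role cpu_exit() plays in the text */
    }
}

/* Main loop: the only place where pending interrupts are serviced. */
static void toy_main_loop(ToyCPU *cpu)
{
    cpu->exit_request = false;
    if (cpu->interrupt_request) {
        cpu->interrupt_request = false;
        cpu->jiffies++; /* the timer interrupt handler ran */
    }
}

/* Guest busy-waits for jiffies to change, as the raid6 code does.
 * Returns true if a tick was seen within `budget` TB executions. */
bool toy_wait_for_tick(bool kick, int budget)
{
    ToyCPU cpu = { false, false, 0 };
    int start = cpu.jiffies;

    assert(budget >= 0);
    toy_raise_irq(&cpu, kick);

    while (budget > 0 && cpu.jiffies == start) {
        if (cpu.exit_request) {  /* check at the start of each TB */
            toy_main_loop(&cpu); /* leave the chain, service the IRQ */
            continue;
        }
        budget--; /* run one TB and jump straight to the next one */
    }
    return cpu.jiffies != start;
}
```

In this model toy_wait_for_tick(true, 1000) sees the tick almost
immediately, while toy_wait_for_tick(false, 1000) burns its whole budget
without jiffies ever changing, which matches the hang described above.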
--
Alex Bennée