Message-ID: <fbcf3c93-3868-2b0e-b831-43fa68c48d6c@gmail.com>
Date: Sat, 10 Aug 2019 07:24:14 -0400
From: Woody Suwalski <terraluna977@...il.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Subject: Kernel 5.3.x, 5.2.2+: VMware player suspend on 64/32 bit guests
Moving the thread to LKML, as suggested by Thomas...
>
>> ---------- Forwarded message ---------
>> From: Woody Suwalski <terraluna977@...il.com>
>> Date: Thu, Aug 1, 2019 at 3:45 PM
>> Subject: Intermittent suspend on 5.3 / 5.2
>> To: Rafael J. Wysocki <rjw@...ysocki.net>
>>
>>
>> Hi Rafał,
>> I know that you are investigating some issues between these two kernels;
>> however, I see a probably unrelated problem with suspend on 5.3 and
>> 5.2.4. I think it has crept into 5.1.21 as well, but I am not sure (it
>> is intermittent). So far 4.20.17 works OK, and I think 5.2.0 works OK.
>> The problem shows up on both 32- and 64-bit VMs in VMware Workstation
>> 15. The VM tries to suspend when there is no activity. Normally it leaves
>> a black box with the cursor in the top-left position. Upon wakeup from
>> VMware it goes to the VMware pre-BIOS screen, then expands the black box
>> to the running size and switches to X.
>> The problem with new kernels is that (I think) the suspend fails: the
>> black box with the cursor is there, but seems bigger, and of course it
>> cannot be woken up (I have to reset). In kern.log the suspend seems to
>> run OK, then the new boot's dmesg lines kick in, with no obvious culprit.
>> So I am looking for some free advice:
>> a. You already know what it is.
>> b. You may have suggestions as to which upstream patch could be to blame.
>> c. Should I boot with some debug params (console_off=0, or some other?)
>> and get some real info?
>>
>> BTW, for suspend to work I had to override mem_sleep to [shallow], or
>> maybe later to [s2idle] (the actual VMs are at work, so I am going from
>> memory...)
>>
>> If you have any ideas, all are welcome.
>> Thanks, Woody
On 8/6/2019 3:18 PM, Woody Suwalski wrote:
> Rafal, the patch (in 5.3-rc3)
>
> Fixes: f850a48a0799 ("ACPI: PM: Allow transitions to D0 to occur in
> special cases")
>
> does not fix the issue; it must be something else...
Sorry for the late response.
There are known issues in 5.3-rc related to power management which
should be fixed in -rc4. Please try that one when it is out.
Cheers!
Thomas Gleixner wrote:
> Woody,
>
> On Fri, 9 Aug 2019, Woody Suwalski wrote:
>
> For future things like this, please CC LKML. There is nothing secret here,
> and CC'ing the mailing list allows other people to find this and spare
> themselves the whole bisection pain. Aside from that, private mail does not
> scale. On the list other people can look at it and eventually give input.
>
>> After bisecting I have found the potential culprit:
>> dfe0cf8b x86/ioapic: Implement irq_get_irqchip_state() callback
>>
>> I am repeating the bisection from start to re-confirm.
>>
>> Reverting the patch on 5.3-rc3 (64-bit) fixes the problem for me.
>> What is unclear: just adding the patch to 5.2.1 does not seem to
>> break it, so there is some more magic involved.
> Of course it does not do anything, because 5.2.1 does not have
>
> f4999a2a3a48 ("genirq: Add optional hardware synchronization for shutdown")
>
>> Thomas, any suggestions?
> What that means is that there is an interrupt shutdown which hits the
> condition where an interrupt _IS_ marked in the IOAPIC as delivered to a
> CPU, but not serviced yet.
>
> Now the question is why it is not serviced. suspend_device_irqs() calls
> into synchronize_irq(), which is probably the place where it hangs. But
> that's called with CPUs online and interrupts enabled.
>
>> The reproduction method: use VMware Player 15, either a 32- or 64-bit
>> build. Reboot and run "systemctl suspend". The first suspend works OK.
>> The second usually locks up on kernels 5.2.2 and up. Try maybe 4 times
>> to confirm a kernel is good (it is intermittent).
> -ENOVMWAREPLAYER and I'm traveling so I don't have a machine handy to
> install it. So if you can't debug it deeper down, I'm not going to have a
> chance to look at it before the end of next week.
>
> That said, can we please move this to LKML?
>
> Thanks,
>
> tglx
>
>
I can add some printk()s into synchronize_irq(); however, I have no idea
whether they will survive in the kmsg log after the next power reset. I
can wait for a week :-)
Thanks, Woody