Message-ID: <355b58b1-6c51-4c42-b6ea-dcd6b1617a18@linux.ibm.com>
Date: Mon, 19 Aug 2024 09:45:55 +0530
From: Sourabh Jain <sourabhjain@...ux.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>, bhe@...hat.com
Cc: Hari Bathini <hbathini@...ux.ibm.com>, kexec@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
x86@...nel.org, Sachin P Bappalige <sachinpb@...ux.vnet.ibm.com>
Subject: Re: [PATCH] kexec/crash: no crash update when kexec in progress
Hello Michael and Baoquan,
On 01/08/24 12:21, Sourabh Jain wrote:
> Hello Michael,
>
> On 01/08/24 08:04, Michael Ellerman wrote:
>> Sourabh Jain <sourabhjain@...ux.ibm.com> writes:
>>> The following errors are observed when kexec is done with SMT=off on
>>> powerpc.
>>>
>>> [ 358.458385] Removing IBM Power 842 compression device
>>> [ 374.795734] kexec_core: Starting new kernel
>>> [ 374.795748] kexec: Waking offline cpu 1.
>>> [ 374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
>>> [ 374.935833] kexec: Waking offline cpu 2.
>>> [ 375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
>>> snip..
>>> [ 375.515823] kexec: Waking offline cpu 6.
>>> [ 375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
>>> [ 375.695836] kexec: Waking offline cpu 7.
>> Are they actually errors though? Do they block the actual kexec from
>> happening? Or are they just warnings in dmesg?
>
> The kexec kernel boots fine.
>
> This warning appears regardless of whether the kdump kernel is loaded.
>
> However, when the kdump kernel is loaded, we will not be able to
> update the kdump image (FDT).
> I think this should be fine given that kexec is in progress.
>
> Please let me know your opinion.
>
>> Because the fix looks like it could be racy.
>
> It does look racy, but kexec takes the lock first and only then brings
> the CPUs up. Each CPU online event triggers a kdump image update, which
> always fails because it cannot take the same lock.
>
> Note: the kexec lock is not released unless kexec boot fails.
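
To make the ordering above concrete, here is a rough userspace model of
it. This is only a sketch, not the patch itself: every model_* name is
made up for illustration, the lock is a plain C11 atomic standing in for
the kexec lock, and the in-progress flag is an assumed stand-in for
whatever state the kexec path sets before it wakes the offline CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kexec lock: 0 = free, 1 = held. */
static atomic_int model_kexec_lock;
/* Assumed flag, set by the kexec path before it wakes offline CPUs. */
static atomic_bool model_kexec_in_progress;

static bool model_kexec_trylock(void)
{
	int expected = 0;

	/* Succeeds only if the lock was free; never blocks. */
	return atomic_compare_exchange_strong(&model_kexec_lock, &expected, 1);
}

static void model_kexec_unlock(void)
{
	assert(atomic_load(&model_kexec_lock) == 1);
	atomic_store(&model_kexec_lock, 0);
}

/*
 * Hotplug handler model.
 * Returns 1 if the kdump image was updated, 0 if the update was skipped
 * quietly because kexec is in progress, -1 if it was skipped with a warning.
 */
static int model_crash_hotplug_event(unsigned int cpu)
{
	if (!model_kexec_trylock()) {
		if (atomic_load(&model_kexec_in_progress))
			return 0;
		printf("cpu %u: trylock failed, elfcorehdr may be inaccurate\n", cpu);
		return -1;
	}

	/* The kdump image (elfcorehdr/FDT) update would happen here. */

	model_kexec_unlock();
	return 1;
}

/*
 * kexec path model: take the lock, mark kexec in progress, then wake the
 * offline CPUs. The lock stays held, as it does unless the kexec boot fails.
 * Returns the number of warnings printed, or -1 if the lock was busy.
 */
static int model_kexec_reboot(unsigned int first_cpu, unsigned int last_cpu)
{
	int warnings = 0;

	if (!model_kexec_trylock())
		return -1;

	atomic_store(&model_kexec_in_progress, true);

	for (unsigned int cpu = first_cpu; cpu <= last_cpu; cpu++) {
		if (model_crash_hotplug_event(cpu) < 0)
			warnings++;
	}

	return warnings;
}
```

In this model the flag is only read after the trylock has already
failed, so the worst case of reading it too early is one extra warning,
not a missed update.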
Any comments or suggestions on this fix?
Thanks,
Sourabh Jain