Message-ID: <ZXFIsZ+0GmUZMFk3@MiWiFi-R3L-srv>
Date: Thu, 7 Dec 2023 12:23:13 +0800
From: Baoquan He <bhe@...hat.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Philipp Rudo <prudo@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Jiri Bohac <jbohac@...e.cz>, Pingfan Liu <piliu@...hat.com>,
Tao Liu <ltao@...hat.com>, Vivek Goyal <vgoyal@...hat.com>,
Dave Young <dyoung@...hat.com>, kexec@...ts.infradead.org,
linux-kernel@...r.kernel.org,
David Hildenbrand <dhildenb@...hat.com>
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA
On 12/06/23 at 04:19pm, Michal Hocko wrote:
> On Wed 06-12-23 14:49:51, Michal Hocko wrote:
> > On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
> [...]
> > > If I understand Documentation/core-api/pin_user_pages.rst correctly, you
> > > missed case 1, Direct IO. In that case "short term" DMA is allowed for
> > > pages without FOLL_LONGTERM. Meaning that there is a way you can
> > > corrupt the CMA area, and with that the crash kernel, after the
> > > production kernel has panicked.
> >
> > Could you expand on this? How exactly does a direct IO request survive
> > into the kdump kernel? I do understand the RDMA case because the IO is
> > async and out of the control of the receiving end.
>
> OK, I guess I get what you mean. You are worried that there is
> 	DIO request
> 	program DMA controller to read into CMA memory
> 	<panic>
> 	boot into crash kernel backed by CMA
> 	DMA transfer is done.
>
> DIO doesn't migrate the pinned memory because it is considered a very
> quick operation which doesn't block movability for too long. That is
> why I have considered it a non-problem. RDMA, on the other hand, might
> pin memory for transfer for much longer, but that case is handled by
> migrating the memory away.
>
> Now I agree that there is a chance of corruption from DIO. The
> question I am not entirely clear about right now is how big a real
> problem that is. DMA transfers should be very swift operations. Would
> it help to wait for a grace period before jumping into the kdump
> kernel?
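Regarding the short-term vs. FOLL_LONGTERM distinction quoted above, a
rough sketch of the GUP flags involved (simplified; the real callers in
the block layer and RDMA drivers are more involved):

	/* Short-term pin, as done for direct IO: pages sitting in CMA
	 * are pinned in place and are not migrated first. */
	pin_user_pages_fast(addr, nr_pages, FOLL_WRITE, pages);

	/* Long-term pin, as done for RDMA: FOLL_LONGTERM makes GUP
	 * migrate the pages out of CMA/ZONE_MOVABLE before pinning. */
	pin_user_pages_fast(addr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
			    pages);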
On x86_64 systems with a hardware IOMMU, this was finally fixed after a
very long history of attempts and arguments. It was not until 2014 that
an HPE engineer came up with a series [1] to copy the 1st kernel's
IOMMU page tables into the kdump kernel so that in-flight DMA from the
1st kernel could keep transferring to its original target. Later, those
attempts and discussions [2][3] were turned into code merged into the
mainline kernel. Before that, people had even tried to introduce a
reset_devices() step before jumping into the kdump kernel, but that was
rejected immediately, because any extra, unnecessary action can cause
the kdump kernel to fail in unpredictable ways, given that the 1st
kernel is already in an unknown, unstable state.

We can't guarantee how swift the DMA transfer will be in the CMA case;
relying on a grace period would be a gamble.
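To make the window concrete, here is a minimal userspace sketch of the
direct IO case described above (the device path, alignment and sizes
are illustrative only):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	int fd;

	/* O_DIRECT buffers must be suitably aligned. */
	if (posix_memalign(&buf, 4096, 1 << 20))
		return 1;

	fd = open("/dev/sdX", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	/*
	 * The kernel pins buf's pages short term (no FOLL_LONGTERM), so
	 * they may sit in CMA, and programs the controller to DMA into
	 * them. A panic at this point leaves the transfer in flight
	 * while the kdump kernel reuses the same CMA range.
	 */
	if (read(fd, buf, 1 << 20) < 0)
		return 1;

	close(fd);
	free(buf);
	return 0;
}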
[1]
[PATCH 0/8] iommu/vt-d: Fix crash dump failure caused by legacy DMA/IO
https://lkml.org/lkml/2014/4/24/836
[2]
[PATCH 00/19] Fix Intel IOMMU breakage in kdump kernel
https://lists.openwall.net/linux-kernel/2015/06/13/72
[3]
[PATCH v9 00/13] Fix the on-flight DMA issue on system with amd iommu
https://lists.openwall.net/linux-kernel/2017/08/01/399
>
> > Also, if direct IO is a problem, how come this is not a problem for
> > kexec in general? The new kernel usually shares all the memory with
> > the 1st kernel.
>
> This is also more clear now. Pure kexec shuts down all the devices,
> which should terminate the in-flight DMA transfers.
Exactly. That's the point I have been making in this thread; see the
rough sketch of the kexec shutdown path below.
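For reference, roughly how that shutdown happens on a regular kexec
reboot, based on kernel/kexec_core.c and kernel/reboot.c (details vary
across kernel versions):

	kernel_kexec()
	    kernel_restart_prepare("kexec reboot")
	        device_shutdown()	/* drivers' ->shutdown(), quiesces DMA */
	    machine_shutdown()
	    machine_kexec(kexec_image)

	/* The crash path, __crash_kexec(), skips device_shutdown(),
	 * which is why in-flight DMA matters for kdump but not for
	 * plain kexec. */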