Message-ID: <20200624002357.GA9955@ashkalra_ubuntu_server>
Date: Wed, 24 Jun 2020 00:23:57 +0000
From: Ashish Kalra <ashish.kalra@....com>
To: Konrad Rzeszutek Wilk <konrad@...nok.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>, hch@....de,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
x86@...nel.org, luto@...nel.org, peterz@...radead.org,
dave.hansen@...ux-intel.com, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, brijesh.singh@....com,
Thomas.Lendacky@....com
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV
guests.

Hello Konrad,

On Tue, Jun 23, 2020 at 09:38:43AM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 27, 2020 at 06:53:18PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> >
> > On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> > > Hello Konrad,
> > >
> > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > > > Hello Konrad,
> > > > >
> > > > > Looking fwd. to your feedback regarding support of other memory
> > > > > encryption architectures such as Power, S390, etc.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > > >
> > > > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > >
> > > > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > > > their memory ranges will make it more complicated with so
> > > > > > > > many other permutations and combinations to explore, it is
> > > > > > > > essential to keep this patch as simple as possible by
> > > > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > > > from the amount of provisioned guest memory.
> > > > > > >>
> > > > > > >> Please rework the patch to:
> > > > > > >>
> > > > > > >> - Use a log solution instead of the multiplication.
> > > > > > >> Feel free to cap it at a sensible value.
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > > > >>
> > > > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > > > >>
> > > > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > > >
> > > > > > >> That function's purpose is to report a value.
> > > > > > >>
> > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > > > >>
> > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > > > > >> modify the io_tlb_nslabs (and set swiotlb=1?).
> > > > > >
> > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > > > for other memory encryption architectures like Power, S390, etc.
> > > >
> > > > Oh dear. That I hadn't considered.
> > > > > >
> > > > > > >>
> > > > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > > > > >> perhaps add in this code ? Albeit it really should be in its own
> > > > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > > > >
> > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > > > forces swiotlb on, but again this is all x86 architecture specific.
> > > >
> > > > Then it looks like the best bet is to do it from within swiotlb_init?
> > > > We really can't do it from swiotlb_size_or_default - that function
> > > > should just return a value and nothing else.
> > > >
> > >
> > > Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> > > reserve low crashkernel memory. If we adjust swiotlb size later in
> > > swiotlb_init() which gets called later than reserve_crashkernel_low(),
> > > then any swiotlb size changes/expansion will conflict/overlap with the
> > > low memory reserved for crashkernel.
> > >
> > and will also potentially cause SWIOTLB buffer allocation failures.
> >
> > Do you have any feedback, comments on the above ?
>
>
> The init boot chain looks like this:
>
> initmem_init
>     pci_iommu_alloc
>         -> pci_swiotlb_detect_4gb
>         -> swiotlb_init
>
> reserve_crashkernel
>     reserve_crashkernel_low
>         -> swiotlb_size_or_default
>         ..
>
>
> (rootfs code):
>     pci_iommu_init
>         -> a bunch of the other IOMMU late_init code gets called..
>         -> pci_swiotlb_late_init
>
> I have to say I am lost as to how your patch fixes "If we adjust swiotlb
> size later .. then any swiotlb size .. will overlap with the low memory
> reserved for crashkernel"?
>

Actually, as per the boot flow:

setup_arch() calls reserve_crashkernel(), and pci_iommu_alloc() is
invoked through mm_init()/mem_init() and not via initmem_init().

start_kernel:
...
setup_arch()
    reserve_crashkernel
        reserve_crashkernel_low
            -> swiotlb_size_or_default
        ...
...
...
mm_init()
    mem_init()
        pci_iommu_alloc
            -> pci_swiotlb_detect_4gb
            -> swiotlb_init

So as per the above boot flow, reserve_crashkernel() can get called
before swiotlb detect/init, and hence, if we don't fix up or adjust
the SWIOTLB buffer size in swiotlb_size_or_default(), the crash kernel
will reserve memory which will conflict/overlap with any SWIOTLB bounce
buffer memory allocated later (adjusted or fixed up later).

Therefore, we need to adjust/fix up the SWIOTLB bounce buffer size in
the swiotlb_size_or_default() function itself, before the swiotlb
detect/init functions get invoked.

Thanks,
Ashish
> Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
> is the one changing the size? And hence it modifying the swiotlb size
> will fix this problem? Aka _before_ all the other IOMMU get their hand
> on it?
>
> If so why not create an
> IOMMU_INIT(crashkernel_adjust_swiotlb,pci_swiotlb_detect_override,
> NULL, NULL);
>
> And crashkernel_adjust_swiotlb would change the size of swiotlb buffer
> if conditions are found to require it.
>
> You also may want to put a #define DEBUG in arch/x86/kernel/pci-iommu_table.c
> to check out whether the tree structure of IOMMU entries is correct.
>
>
>
> But still I am lost - if say the AMD one does decide for unknown reason
> to expand the SWIOTLB you are still stuck with the 'overlap with
> the low memory reserved' or so.
>
> Perhaps add a late_init that gets called as the last one to validate
> this ? And maybe if the swiotlb gets turned off you also take proper
> steps?
>
> > As such i feel, this patch is complete otherwise and can be included as
> > it is.
> >
> > Thanks,
> > Ashish