[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9e6d67c6-94f8-5065-3264-02dd6a82888e@oracle.com>
Date: Thu, 1 Sep 2022 09:46:54 -0700
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Robin Murphy <robin.murphy@....com>, Yu Zhao <yuzhao@...gle.com>
Cc: Christoph Hellwig <hch@...radead.org>,
Marek Szyprowski <m.szyprowski@...sung.com>,
iommu@...ts.linux.dev, linux-mips@...r.kernel.org,
linux-kernel <linux-kernel@...r.kernel.org>,
kernel test robot <lkp@...el.com>,
Dan Carpenter <dan.carpenter@...cle.com>
Subject: Re: [PATCH v2] Revert "swiotlb: panic if nslabs is too small"
Hi Robin,
On 9/1/22 5:24 AM, Robin Murphy wrote:
> On 2022-09-01 03:18, Dongli Zhang wrote:
>> Hi Yu,
>>
>> On 8/31/22 5:24 PM, Yu Zhao wrote:
>>> On Wed, Aug 31, 2022 at 4:20 PM Dongli Zhang <dongli.zhang@...cle.com> wrote:
>>>>
>>>> Hi Yu,
>>>>
>>>> As we discussed in the past, the swiotlb panic on purpose
>>>
>>> We should not panic() at all, especially on a platform that has been
>>> working well since at least 4.14.
>>>
>>> Did you check out this link I previously shared with you?
>>> https://urldefense.com/v3/__https://lore.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld*pnerc4J2Ag990WwAA@mail.gmail.com/__;Kw!!ACWV5N9M2RV99hQ!PXzSLurBv7VqxI1451TV4zO3_-BYj4grk-HYBsXzSnA6nZcXaBzdsQ-rF2DAqlICSRPMt-efYv_Uu2A2CQ$
>>>
>>
>> Thanks for sharing! I used to read that in the past. To be honest I am still
>> confused on when to use BUG/panic and when to not, as I still see many usage in
>> some patches.
>>
>> Just about swiotlb, it may panic in many cases during boot, e.g.,:
>>
>> https://urldefense.com/v3/__https://bugs.launchpad.net/ubuntu/*source/linux/*bug/1955655__;Kys!!ACWV5N9M2RV99hQ!MO4S9r0PisrW6Z-kZqUitAMbuGNMX6aceQd5VApqXllP6f1jaPCyL9x50um7chsn2uGL_pNqdBzTjZK5GebeKrI$
>
>
> That's really a different thing, but frankly that panic is also bogus anyway -
> there is no good reason to have different behaviour for failing to allocate a
> buffer slot because the buffer is full vs. failing to allocate a buffer slot
> because there is no buffer. If we can fail gracefully some of the time we should
> fail gracefully all of the time. Yes, there's a slight difference in that one
Thank you very much for the feedback!
Currently the swiotlb remap logic is more gracefully to handle failure, to
re-try with smaller nslabs (line 352), but will still panic at the end if the
value is reduced to smaller than IO_TLB_MIN_SLABS.
337 retry:
338 bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
339 if (flags & SWIOTLB_ANY)
340 tlb = memblock_alloc(bytes, PAGE_SIZE);
341 else
342 tlb = memblock_alloc_low(bytes, PAGE_SIZE);
343 if (!tlb) {
344 pr_warn("%s: Failed to allocate %zu bytes tlb structure\n",
345 __func__, bytes);
346 return;
347 }
348
349 if (remap && remap(tlb, nslabs) < 0) {
350 memblock_free(tlb, PAGE_ALIGN(bytes));
351
352 nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
353 if (nslabs < IO_TLB_MIN_SLABS)
354 panic("%s: Failed to remap %zu bytes\n",
355 __func__, bytes);
356 goto retry;
357 }
> case has a chance of succeeding if retried in future while the other definitely
> doesn't, but it's not SWIOTLB's place to decide that the entire system is
> terminally unusable just because some device can't make a DMA mapping.
Currently the swiotlb panic at line 735 on purpose if swiotlb initialization is
failed, but when any DMA requires a swiotlb mapped slot.
724 phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
725 size_t mapping_size, size_t alloc_size,
726 unsigned int alloc_align_mask, enum dma_data_direction dir,
727 unsigned long attrs)
728 {
729 struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
730 unsigned int offset = swiotlb_align_offset(dev, orig_addr);
731 unsigned int i;
732 int index;
733 phys_addr_t tlb_addr;
734
735 if (!mem || !mem->nslabs)
736 panic("Can not allocate SWIOTLB buffer earlier and can't
now provide you with the DMA bounce buffer");
I agree it is a very interesting point whether it is a good idea to leave the
decision to swiotlb for a panic. One thing to take care is the driver should
gracefully handle the error returned by swiotlb. Otherwise, there may be IO hang
(e.g., hung_task) instead of an IO failure.
>
> Similarly for the other panics at init time, which seem to have originated from
> a mechanical refactoring of the memblock API with the expectation of preserving
> the existing behaviour at the time. Those have then just been moved around
> without anyone thinking to question why they're there, and if memblock *does*
> now return usable error information, why don't we start handling that error
> properly like we do in other init paths?
>
> Really there is no reason to panic anywhere in SWIOTLB. This is old code, things
> have moved on over the last 20 years, and we can and should do much better. I'll
> add this to my list of things to look at for cleanup once I find a bit of free
> time, unless anyone else fancies taking it on.
Thank you very much for the feedback. Looking forward to how this can be improved!
Dongli Zhang
Powered by blists - more mailing lists