Message-Id: <20230803115941.497-1-petrtesarik@huaweicloud.com>
Date: Thu, 3 Aug 2023 13:59:41 +0200
From: Petr Tesarik <petrtesarik@...weicloud.com>
To: Christoph Hellwig <hch@....de>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Robin Murphy <robin.murphy@....com>,
	iommu@...ts.linux.dev (open list:DMA MAPPING HELPERS),
	linux-kernel@...r.kernel.org (open list)
Cc: Roberto Sassu <roberto.sassu@...weicloud.com>, petr@...arici.cz
Subject: [PATCH v1] swiotlb: optimize get_max_slots()
From: Petr Tesarik <petr.tesarik.ext@...wei.com>

Use a simple logical shift and increment to calculate the number of slots
taken by the DMA segment boundary.

At least GCC 13 is not able to optimize the expression, producing this
horrible assembly code on x86:
	cmpq	$-1, %rcx
	je	.L364
	addq	$2048, %rcx
	shrq	$11, %rcx
	movq	%rcx, %r13
.L331:
	// rest of the function here...

	// after function epilogue and return:
.L364:
	movabsq	$9007199254740992, %r13
	jmp	.L331

After the optimization, the code looks more reasonable:

	shrq	$11, %r11
	leaq	1(%r11), %rbx
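
For the record, here is a quick userspace sketch (not part of this patch)
that checks that the old and new formulations agree for every possible
power-of-two boundary mask, including the ~0UL case that previously needed
special handling. It re-creates nr_slots() locally and assumes IO_TLB_SHIFT
is 11, matching the "shrq $11" seen in the generated code:

#include <stdio.h>

#define BITS_PER_LONG	((int)(8 * sizeof(unsigned long)))
#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)

/* As in the kernel: round up to a whole number of IO TLB slots. */
static unsigned long nr_slots(unsigned long val)
{
	return (val + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/* Old expression, with the explicit special case for ~0UL. */
static unsigned long old_get_max_slots(unsigned long boundary_mask)
{
	if (boundary_mask == ~0UL)
		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
	return nr_slots(boundary_mask + 1);
}

/* New expression: logical shift and increment, no special case. */
static unsigned long new_get_max_slots(unsigned long boundary_mask)
{
	return (boundary_mask >> IO_TLB_SHIFT) + 1;
}

int main(void)
{
	int n;

	/* Boundary masks are always 2^n - 1; try every possible width. */
	for (n = 0; n <= BITS_PER_LONG; n++) {
		unsigned long mask =
			n == BITS_PER_LONG ? ~0UL : (1UL << n) - 1;

		if (old_get_max_slots(mask) != new_get_max_slots(mask))
			printf("mismatch for mask %#lx\n", mask);
	}
	return 0;
}

The shift cannot overflow, so the ~0UL special case disappears: the new
expression yields (~0UL >> 11) + 1 == 1UL << (BITS_PER_LONG - IO_TLB_SHIFT)
without any branch.
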
Signed-off-by: Petr Tesarik <petr.tesarik.ext@...wei.com>
---
 kernel/dma/swiotlb.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2b83e3ad9dca..a95d2ea2ae18 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -577,9 +577,7 @@ static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
  */
 static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	if (boundary_mask == ~0UL)
-		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
-	return nr_slots(boundary_mask + 1);
+	return (boundary_mask >> IO_TLB_SHIFT) + 1;
 }
 
 static unsigned int wrap_area_index(struct io_tlb_mem *mem, unsigned int index)
--
2.25.1