Message-ID: <87ikvmv45i.fsf@tarshish>
Date: Tue, 27 Aug 2024 10:03:21 +0300
From: Baruch Siach <baruch@...s.co.il>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Christoph Hellwig <hch@....de>, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
 iommu@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
 linux-s390@...r.kernel.org, Petr Tesařík <petr@...arici.cz>,
 Ramon Fried <ramon@...reality.ai>, Elad Nachman <enachman@...vell.com>,
 linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit

Hi Marek,

On Tue, Aug 27 2024, Marek Szyprowski wrote:
> On 27.08.2024 06:52, Baruch Siach wrote:
>> Hi Marek,
>>
>> Thanks for your report.
>>
>> On Mon, Aug 26 2024, Marek Szyprowski wrote:
>>> On 11.08.2024 09:09, Baruch Siach wrote:
>>>> From: Catalin Marinas <catalin.marinas@....com>
>>>>
>>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>>> can not encode this limit.
>>>>
>>>> Use plain address for DMA zone limit.
>>>>
>>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>>
>>>> Signed-off-by: Catalin Marinas <catalin.marinas@....com>
>>>> Co-developed-by: Baruch Siach <baruch@...s.co.il>
>>>> Signed-off-by: Baruch Siach <baruch@...s.co.il>
>>>> ---
>>> This patch landed recently in linux-next as commit ba0fb44aed47
>>> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my
>>> tests I found that it introduces the following warning on ARM64/Rockchip
>>> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
>> Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?
>
> Yes, linux-next with the above-mentioned commits reverted works fine.
>
>
>> Upstream rockchip DTs have no dma-ranges property. Is that the case for
>> your platform as well?
>>
>> Can you share kernel report of DMA zones and swiotlb? On my platform I get:
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
>> ...
>> [    0.000000] software IO TLB: area num 8.
>> [    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)
>>
>> What do you get at your end?
>
> On ba0fb44aed47 I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000001f0000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 3MB
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 0x00000001fac00000-0x00000001fb000000] (4MB)
>
> On fa3c109a6d30 (the parent commit of $subject) I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000000ea000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 0x00000000e6000000-0x00000000ea000000] (64MB)
>
> It looks like, for some reason, the $subject patch changes the default zone
> and swiotlb configuration.
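
For reference, the overflow in the warning can be checked by hand. Below is
a minimal userspace sketch, loosely modelled on the range check that
dma_capable() performs. The helper names range_fits() and
min_not_zero_u64() are made up for illustration and are not kernel APIs,
and the sketch assumes that a limit of 0 means "no limit", which is my
reading of bus_dma_limit here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smaller of two limits, treating 0 as
 * "no limit" (assumption: this mirrors how a zero bus limit is handled).
 */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Hypothetical helper: true if the last byte of [addr, addr + size) is
 * still addressable under both the device mask and the bus limit.
 */
static bool range_fits(uint64_t addr, uint64_t size,
		       uint64_t mask, uint64_t bus_limit)
{
	uint64_t end = addr + size - 1;

	return end <= min_not_zero_u64(mask, bus_limit);
}
```

With the values from your log, range_fits(0x1faf00000, 4096, 0xffffffff, 0)
is false, since the buffer ends above 4GB. The swiotlb base seen on
fa3c109a6d30, 0xe6000000, fits under the same mask.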

Does this fix the issue?

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bfb10969cbf0..7fcd0aaa9bb6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -116,6 +116,9 @@ static void __init arch_reserve_crashkernel(void)
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
+	if (memblock_start_of_DRAM() < U32_MAX)
+		zone_limit = min(zone_limit, U32_MAX);
+
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 

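To illustrate what the change is meant to do, here is a small userspace
model of max_zone_phys() with the clamp applied. This is only a sketch:
dram_start and dram_end stand in for memblock_start_of_DRAM() and
memblock_end_of_DRAM() (end assumed to be exclusive), U32_MAX_ADDR stands
in for U32_MAX, and the zone_limit inputs are assumptions rather than
values read from a running kernel.

```c
#include <stdint.h>

/* Stand-in for the kernel's U32_MAX, widened to 64 bits. */
#define U32_MAX_ADDR 0xffffffffULL

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Userspace model of max_zone_phys() including the proposed clamp.
 * dram_start and dram_end are hypothetical parameters replacing the
 * memblock queries; dram_end is one past the last DRAM byte.
 */
static uint64_t model_max_zone_phys(uint64_t zone_limit,
				    uint64_t dram_start, uint64_t dram_end)
{
	/* When DRAM starts below 4GB, keep the zone limit within 32 bits. */
	if (dram_start < U32_MAX_ADDR)
		zone_limit = min_u64(zone_limit, U32_MAX_ADDR);

	/* Never report a zone boundary beyond the end of DRAM. */
	return min_u64(zone_limit, dram_end - 1) + 1;
}
```

Assuming an unrestricted zone_limit (all ones) on the Odroid M1 layout
(DRAM from 0x200000 up to 0x1ffffffff), the model returns 0x100000000,
which matches the DMA zone end seen on fa3c109a6d30. On my platform (DRAM
starting at 0x800000000, zone_limit assumed to be 0x83fffffff) the clamp
is skipped and the result is 0x840000000.
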
Thanks,
baruch

>>> ------------[ cut here ]------------
>>> dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).
>>> WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
>>> Modules linked in:
>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : swiotlb_map+0x2f0/0x308
>>> lr : swiotlb_map+0x2f0/0x308
>>> ...
>>> Call trace:
>>>    swiotlb_map+0x2f0/0x308
>>>    dma_direct_map_sg+0x9c/0x2e4
>>>    __dma_map_sg_attrs+0x28/0x94
>>>    dma_map_sg_attrs+0x10/0x24
>>>    dw_mci_pre_dma_transfer+0xb8/0xf4
>>>    dw_mci_pre_req+0x50/0x68
>>>    mmc_blk_mq_issue_rq+0x3e0/0x964
>>>    mmc_mq_queue_rq+0x118/0x2b4
>>>    blk_mq_dispatch_rq_list+0x21c/0x714
>>>    __blk_mq_sched_dispatch_requests+0x490/0x58c
>>>    blk_mq_sched_dispatch_requests+0x30/0x6c
>>>    blk_mq_run_hw_queue+0x284/0x40c
>>>    blk_mq_flush_plug_list.part.0+0x190/0x974
>>>    blk_mq_flush_plug_list+0x1c/0x2c
>>>    __blk_flush_plug+0xe4/0x140
>>>    blk_finish_plug+0x38/0x4c
>>>    __ext4_get_inode_loc+0x22c/0x654
>>>    __ext4_get_inode_loc_noinmem+0x40/0xa8
>>>    __ext4_iget+0x154/0xcc0
>>>    ext4_get_journal_inode+0x30/0x110
>>>    ext4_load_and_init_journal+0x9c/0xaf0
>>>    ext4_fill_super+0x1fec/0x2d90
>>>    get_tree_bdev+0x140/0x1d8
>>>    ext4_get_tree+0x18/0x24
>>>    vfs_get_tree+0x28/0xe8
>>>    path_mount+0x3e8/0xb7c
>>>    init_mount+0x68/0xac
>>>    do_mount_root+0x108/0x1dc
>>>    mount_root_generic+0x100/0x330
>>>    mount_root+0x160/0x2d0
>>>    initrd_load+0x1f0/0x2a0
>>>    prepare_namespace+0x4c/0x29c
>>>    kernel_init_freeable+0x4b4/0x50c
>>>    kernel_init+0x20/0x1d8
>>>    ret_from_fork+0x10/0x20
>>> irq event stamp: 1305682
>>> hardirqs last  enabled at (1305681): [<ffff8000800e332c>] console_unlock+0x124/0x130
>>> hardirqs last disabled at (1305682): [<ffff80008124e684>] el1_dbg+0x24/0x8c
>>> softirqs last  enabled at (1305678): [<ffff80008005be1c>] handle_softirqs+0x4cc/0x4e4
>>> softirqs last disabled at (1305665): [<ffff8000800105b0>] __do_softirq+0x14/0x20
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> This "bus limit 0" looks a bit suspicious to me, as does the fact
>>> that swiotlb is used for the MMC DMA. I will investigate this
>>> further tomorrow. The board boots fine though.
>> Looking at the code, I guess that bus_dma_limit set to 0 means no bus
>> limit. But dma_mask for your device indicates a 32-bit device limit. This
>> can't work with addresses above 4GB. For some reason the DMA code tries to
>> allocate from a higher address. This is most likely the reason
>> dma_capable() returns false.
>
> Indeed, this looks like the source of the problem:
>
> [    3.123618] Synopsys Designware Multimedia Card Interface Driver
> [    3.139653] dwmmc_rockchip fe2b0000.mmc: IDMAC supports 32-bit address mode.
> [    3.147739] dwmmc_rockchip fe2b0000.mmc: Using internal DMA controller.
> [    3.161659] dwmmc_rockchip fe2b0000.mmc: Version ID is 270a
> [    3.168455] dwmmc_rockchip fe2b0000.mmc: DW MMC controller at irq 56,32 bit host data width,256 deep fifo
> [    3.182651] dwmmc_rockchip fe2b0000.mmc: Got CD GPIO
>
> ...
>
> [   11.009258] ------------[ cut here ]------------
> [   11.014762] dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).
>
>
>> ...
>
> Best regards

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@...s.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
