Message-ID: <15a058ba-3b51-46f3-bb1c-23792d100b55@linux.intel.com>
Date: Tue, 9 Jan 2024 13:54:27 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Robin Murphy <robin.murphy@....com>, Ido Schimmel <idosch@...sch.org>
Cc: joro@...tes.org, will@...nel.org, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org, zhangzekun11@...wei.com,
 john.g.garry@...cle.com, dheerajkumar.srivastava@....com, jsnitsel@...hat.com
Subject: Re: [PATCH v3 0/2] iommu/iova: Make the rcache depot properly
 flexible


On 1/9/2024 1:35 AM, Robin Murphy wrote:
> On 2023-12-28 12:23 pm, Ido Schimmel wrote:
>> On Tue, Sep 12, 2023 at 05:28:04PM +0100, Robin Murphy wrote:
>>> v2: 
>>> https://lore.kernel.org/linux-iommu/cover.1692641204.git.robin.murphy@arm.com/
>>>
>>> Hi all,
>>>
>>> I hope this is good to go now, just fixed the locking (and threw
>>> lockdep at it to confirm, which of course I should have done to begin
>>> with...) and picked up tags.
>>
>> Hi,
>>
>> After pulling the v6.7 changes we started seeing the following memory
>> leaks [1] of 'struct iova_magazine'. I'm not sure how to reproduce it,
>> which is why I didn't perform bisection. However, looking at the
>> mentioned code paths, they seem to have been changed in v6.7 as part of
>> this patchset. I reverted both patches and didn't see any memory leaks
>> when running a full regression (~10 hours), but I will repeat it to be
>> sure.
>>
>> Any idea what could be the problem?
>
> Hmm, we've got what looks to be a set of magazines forming a plausible 
> depot list (or at least the tail end of one):
>
> ffff8881411f9000 -> ffff8881261c1000
> ffff8881261c1000 -> ffff88812be26400
> ffff88812be26400 -> ffff8881392ec000
> ffff8881392ec000 -> ffff8881a5301000
> ffff8881a5301000 -> NULL
>
> which I guess has somehow become detached from its rcache->depot 
> without being freed properly? However I'm struggling to see any 
> conceivable way that could happen which wouldn't already be more 
> severely broken in other ways as well (i.e. either general memory 
> corruption or someone somehow still trying to use the IOVA domain 
> while it's being torn down).
>
> Out of curiosity, does reverting just patch #2 alone make a 
> difference? And is your workload doing anything "interesting" in 
> relation to IOVA domain lifetimes, like creating and destroying SR-IOV 
> virtual functions, changing IOMMU domain types via sysfs, or using 
> that horrible vdpa thing, or are you seeing this purely from regular 
> driver DMA API usage?

No lock is held in free_iova_rcaches(); is it possible that free_iova_rcaches() races with the delayed iova_depot_work_func()?

I don't see why free_iova_rcaches() doesn't call cancel_delayed_work_sync(&rcache->work) first to avoid the possible race.


Thanks,

Ethan

>
> Thanks,
> Robin.
>
>>
>> Thanks
>>
>> [1]
>> unreferenced object 0xffff8881a5301000 (size 1024):
>>    comm "softirq", pid 0, jiffies 4306297099 (age 462.991s)
>>    hex dump (first 32 bytes):
>>      00 00 00 00 00 00 00 00 e7 7d 05 00 00 00 00 00 .........}......
>>      0f b4 05 00 00 00 00 00 b4 96 05 00 00 00 00 00 ................
>>    backtrace:
>>      [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
>>      [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
>>      [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
>>      [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
>>      [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
>>      [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
>>      [<ffffffff813ea16b>] __run_timers+0x78b/0xb80
>>      [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
>>      [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
>>
>> unreferenced object 0xffff8881392ec000 (size 1024):
>>    comm "softirq", pid 0, jiffies 4306326731 (age 433.359s)
>>    hex dump (first 32 bytes):
>>      00 10 30 a5 81 88 ff ff 50 ff 0f 00 00 00 00 00 ..0.....P.......
>>      f3 99 05 00 00 00 00 00 87 b7 05 00 00 00 00 00 ................
>>    backtrace:
>>      [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
>>      [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
>>      [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
>>      [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
>>      [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
>>      [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
>>      [<ffffffff813ea16b>] __run_timers+0x78b/0xb80
>>      [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
>>      [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
>>
>> unreferenced object 0xffff8881411f9000 (size 1024):
>>    comm "softirq", pid 0, jiffies 4306708887 (age 51.459s)
>>    hex dump (first 32 bytes):
>>      00 10 1c 26 81 88 ff ff 2c 96 05 00 00 00 00 00 ...&....,.......
>>      ac fe 0f 00 00 00 00 00 a6 fe 0f 00 00 00 00 00 ................
>>    backtrace:
>>      [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
>>      [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
>>      [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
>>      [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
>>      [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
>>      [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
>>      [<ffffffff813ea16b>] __run_timers+0x78b/0xb80
>>      [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
>>      [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
>>
>> unreferenced object 0xffff88812be26400 (size 1024):
>>    comm "softirq", pid 0, jiffies 4306710027 (age 50.319s)
>>    hex dump (first 32 bytes):
>>      00 c0 2e 39 81 88 ff ff 32 ab 05 00 00 00 00 00 ...9....2.......
>>      e3 ac 05 00 00 00 00 00 1f b6 05 00 00 00 00 00 ................
>>    backtrace:
>>      [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
>>      [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
>>      [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
>>      [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
>>      [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
>>      [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
>>      [<ffffffff813ea16b>] __run_timers+0x78b/0xb80
>>      [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
>>      [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
>>
>> unreferenced object 0xffff8881261c1000 (size 1024):
>>    comm "softirq", pid 0, jiffies 4306711547 (age 48.799s)
>>    hex dump (first 32 bytes):
>>      00 64 e2 2b 81 88 ff ff c0 7c 05 00 00 00 00 00 .d.+.....|......
>>      87 a5 05 00 00 00 00 00 0e 9a 05 00 00 00 00 00 ................
>>    backtrace:
>>      [<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
>>      [<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
>>      [<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
>>      [<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
>>      [<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
>>      [<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
>>      [<ffffffff813ea16b>] __run_timers+0x78b/0xb80
>>      [<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
>>      [<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
>
