Message-ID: <ZZ6jG5NyaUpeCpXq@shredder>
Date: Wed, 10 Jan 2024 16:00:59 +0200
From: Ido Schimmel <idosch@...sch.org>
To: Robin Murphy <robin.murphy@....com>
Cc: joro@...tes.org, will@...nel.org, iommu@...ts.linux.dev,
	linux-kernel@...r.kernel.org, zhangzekun11@...wei.com,
	john.g.garry@...cle.com, dheerajkumar.srivastava@....com,
	jsnitsel@...hat.com, Catalin Marinas <catalin.marinas@....com>
Subject: Re: [PATCH v3 0/2] iommu/iova: Make the rcache depot properly
 flexible

On Wed, Jan 10, 2024 at 12:48:06PM +0000, Robin Murphy wrote:
> On 2024-01-09 5:21 pm, Ido Schimmel wrote:
> > Hi Robin,
> > 
> > Thanks for the reply.
> > 
> > On Mon, Jan 08, 2024 at 05:35:26PM +0000, Robin Murphy wrote:
> > > Hmm, we've got what looks to be a set of magazines forming a plausible depot
> > > list (or at least the tail end of one):
> > > 
> > > ffff8881411f9000 -> ffff8881261c1000
> > > 
> > > ffff8881261c1000 -> ffff88812be26400
> > > 
> > > ffff88812be26400 -> ffff8188392ec000
> > > 
> > > ffff8188392ec000 -> ffff8881a5301000
> > > 
> > > ffff8881a5301000 -> NULL
> > > 
> > > which I guess has somehow become detached from its rcache->depot without
> > > being freed properly? However I'm struggling to see any conceivable way that
> > > could happen which wouldn't already be more severely broken in other ways as
> > > well (i.e. either general memory corruption or someone somehow still trying
> > > to use the IOVA domain while it's being torn down).
> > 
> > The machine is running a debug kernel that among other things has KASAN
> > enabled, but there are no traces in the kernel log so there is no memory
> > corruption that I'm aware of.
> > 
> > > Out of curiosity, does reverting just patch #2 alone make a difference?
> > 
> > Will try and let you know.

I can confirm that the issue reproduces when only patch #2 is reverted.
IOW, patch #1 seems to be the problem:

unreferenced object 0xffff8881a1ff3400 (size 1024):
  comm "softirq", pid 0, jiffies 4296362635 (age 3540.420s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 67 b7 05 00 00 00 00 00  ........g.......
    3f a6 05 00 00 00 00 00 93 99 05 00 00 00 00 00  ?...............
  backtrace:
    [<ffffffff819f7a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff818a3efa>] kmalloc_trace+0x2a/0x60
    [<ffffffff8231f8f3>] free_iova_fast+0x293/0x460
    [<ffffffff823132f0>] fq_ring_free_locked+0x1b0/0x310
    [<ffffffff82314ced>] fq_flush_timeout+0x19d/0x2e0
    [<ffffffff813e97da>] call_timer_fn+0x19a/0x5c0
    [<ffffffff813ea38b>] __run_timers+0x78b/0xb80
    [<ffffffff813ea7dd>] run_timer_softirq+0x5d/0xd0
    [<ffffffff82f21605>] __do_softirq+0x205/0x8b5
unreferenced object 0xffff888165b9a800 (size 1024):
  comm "softirq", pid 0, jiffies 4299383627 (age 519.460s)
  hex dump (first 32 bytes):
    00 34 ff a1 81 88 ff ff bd 9d 05 00 00 00 00 00  .4..............
    f3 ab 05 00 00 00 00 00 37 b5 05 00 00 00 00 00  ........7.......
  backtrace:
    [<ffffffff819f7a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff818a3efa>] kmalloc_trace+0x2a/0x60
    [<ffffffff8231f8f3>] free_iova_fast+0x293/0x460
    [<ffffffff823132f0>] fq_ring_free_locked+0x1b0/0x310
    [<ffffffff82314ced>] fq_flush_timeout+0x19d/0x2e0
    [<ffffffff813e97da>] call_timer_fn+0x19a/0x5c0
    [<ffffffff813ea38b>] __run_timers+0x78b/0xb80
    [<ffffffff813ea7dd>] run_timer_softirq+0x5d/0xd0
    [<ffffffff82f21605>] __do_softirq+0x205/0x8b5
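
For reference, these 1024-byte objects look like the IOVA magazines
themselves. Below is the layout I'm assuming from the patch series
under discussion (field names taken from this thread, not checked
against the exact tree); with IOVA_MAG_SIZE at 127 it comes to 1024
bytes and would explain the next-pointer chain you listed earlier:

struct iova_magazine {
	union {
		unsigned long size;		/* while the magazine is in use */
		struct iova_magazine *next;	/* while parked in the depot */
	};
	unsigned long pfns[IOVA_MAG_SIZE];	/* 127 PFNs -> 1024 bytes total */
};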

> > 
> > > And is your workload doing anything "interesting" in relation to IOVA
> > > domain lifetimes, like creating and destroying SR-IOV virtual
> > > functions, changing IOMMU domain types via sysfs, or using that
> > > horrible vdpa thing, or are you seeing this purely from regular driver
> > > DMA API usage?
> > 
> > The machine is running networking related tests, but it is not using
> > SR-IOV, VMs or VDPA so there shouldn't be anything "interesting" as far
> > as IOMMU is concerned.
> > 
> > The two networking drivers on the machine are "igb" for the management
> > port and "mlxsw" for the data ports (the machine is a physical switch).
> > I believe the DMA API usage in the latter is quite basic and I don't
> > recall any DMA related problems with this driver since it was first
> > accepted upstream in 2015.
> 
> Thanks for the clarifications, that seems to rule out all the most
> confusingly impossible scenarios, at least.
> 
> The best explanation I've managed to come up with is a false-positive race
> dependent on the order in which kmemleak scans the relevant objects. Say we
> have the list as depot -> A -> B -> C; the rcache object is scanned and sees
> the pointer to magazine A, but then A is popped *before* kmemleak scans it,
> such that when it is then scanned, its "next" pointer has already been
> wiped, thus kmemleak never observes any reference to B, so it appears that B
> and (transitively) C are "leaked". If that is the case, then I'd expect it
> should be reproducible with patch #1 alone (although patch #2 might make it
> slightly more likely if the work ever does result in additional pops
> happening), but I'd expect the leaked objects to be transient and not
> persist forever through repeated scans (what I don't know is whether
> kmemleak automatically un-leaks an object if it subsequently finds a new
> reference, or if it needs manually clearing in between scans). I'm not sure
> if there's a nice way to make that any better... unless maybe it might make
> sense to call kmemleak_not_leak(mag->next) in iova_depot_pop() before that
> reference disappears?

I'm not familiar with the code, so I can't comment on whether that's
the best solution, but I will say that we've been running kmemleak as
part of our regression testing for years and every time we got a
report it was an actual memory leak. Therefore, to keep the tool
reliable, I think it's better to annotate the code to suppress false
positives rather than ignore them.
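
For concreteness, here is roughly how I understood your suggestion:
annotate the successor magazine in iova_depot_pop() before the popped
magazine's next pointer is overwritten. The function body below is
only approximated from this thread (and would also need the
linux/kmemleak.h include), so please treat it as an illustration
rather than a tested patch:

static struct iova_magazine *iova_depot_pop(struct iova_rcache *rcache)
{
	struct iova_magazine *mag = rcache->depot;

	/*
	 * mag->next is about to be wiped by the mag->size assignment
	 * below; tell kmemleak the successor is still referenced so a
	 * scan landing in this window doesn't report it as leaked.
	 */
	kmemleak_not_leak(mag->next);

	rcache->depot = mag->next;
	mag->size = IOVA_MAG_SIZE;
	rcache->depot_size--;
	return mag;
}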

Please let me know if you want me to test a fix.

Thanks for looking into this!
