linux-kernel - RE: [PATCH v2 1/1] swiotlb: Track and report io_tlb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BYAPR21MB1688983A72B114BBC3AA1DE2D7889@BYAPR21MB1688.namprd21.prod.outlook.com>
Date:   Tue, 28 Mar 2023 14:29:03 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     Petr Tesařík <petr@...arici.cz>
CC:     Christoph Hellwig <hch@...radead.org>, "hch@....de" <hch@....de>,
        "m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
        "robin.murphy@....com" <robin.murphy@....com>,
        Dexuan Cui <decui@...rosoft.com>,
        Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 1/1] swiotlb: Track and report io_tlb_used high water
 mark in debugfs

From: Petr Tesařík <petr@...arici.cz> Sent: Tuesday, March 28, 2023 6:50 AM
> 
> On Tue, 28 Mar 2023 13:12:13 +0000
> "Michael Kelley (LINUX)" <mikelley@...rosoft.com> wrote:
> 
> > From: Christoph Hellwig <hch@...radead.org> Sent: Monday, March 27, 2023 6:34
> PM
> > >
> > > On Sat, Mar 25, 2023 at 10:53:10AM -0700, Michael Kelley wrote:
> > > > @@ -659,6 +663,14 @@ static int swiotlb_do_find_slots(struct device *dev, int
> > > area_index,
> > > >   area->index = wrap_area_index(mem, index + nslots);
> > > >   area->used += nslots;
> > > >   spin_unlock_irqrestore(&area->lock, flags);
> > > > +
> > > > + new_used = atomic_long_add_return(nslots, &total_used);
> > > > + old_hiwater = atomic_long_read(&used_hiwater);
> > > > + do {
> > > > +         if (new_used <= old_hiwater)
> > > > +                 break;
> > > > + } while (!atomic_long_try_cmpxchg(&used_hiwater, &old_hiwater, new_used));
> > > > +
> > > >   return slot_index;
> > >
> > > Hmm, so we're right in the swiotlb hot path here and add two new global
> > > atomics?
> >
> > It's only one global atomic, except when the high water mark needs to be
> > bumped.  That results in an initial transient of doing the second global
> > atomic, but then it won't be done unless there's a spike in usage or the
> > high water mark is manually reset to zero.  Of course, there's a similar
> > global atomic subtract when the slots are released.
> >
> > Perhaps this accounting should go under #ifdef CONFIG_DEBUGFS?  Or
> > even add a swiotlb-specific debugfs config option to cover all the swiotlb
> > debugfs code.  From Petr Tesarik's earlier comments, it sounds like there
> > is interest in additional accounting, such as for fragmentation.
> 
> For my purposes, it does not have to be 100% accurate. I don't really
> mind if it is off by a few slots because of a race window, so we could
> (for instance):
> 
> - update a local variable and set the atomic after the loop,
> - or make it a per-cpu to reduce CPU cache bouncing,
> - or just about anything that is less heavy-weight than an atomic
>   CMPXCHG in the inner loop of a slot search.
> 

Perhaps I'm missing your point, but there's no loop here.  The atomic
add is done once per successful slot allocation.  If swiotlb_do_find_slots()
doesn't find any slots for the current area, it exits at the "not_found" label
and the atomic add isn't done.

In the case where the high water mark is bumped, the try_cmpxchg()
is in a loop only to deal with a race condition where another CPU updates
the high water mark first.  The try_cmpxchg() should only rarely be
executed more than once, and again, only when the high water mark
changes.

I thought about tracking the high water mark on a per-CPU basis or
per-area basis, but I don't think the resulting data is useful.  Adding up
the individual high water marks likely significantly over-estimates the
true high water mark.   Is there a clever way to make this useful that I'm
not thinking about?

Tracking the global high water mark using non-atomic add and subtract
could be done, with reduced accuracy.  But I wanted to be tracking the
high water mark over an extended time period (days, or even weeks),
and I don't have a feel for how much accuracy would be lost from
non-atomic arithmetic on the global high water mark.  It would still
be a shared cache line.

Regarding your other email about non-default io_tlb_mem instances,
my patch just extends what is already reported in debugfs, which
is only for the default io_tlb_mem.   The non-default instances seemed
to me to be fairly niche cases that weren't worth the additional
complexity, but maybe I'm wrong about that.

Michael