Message-ID: <670dd20098d9d_3f142943d@dwillia2-xfh.jf.intel.com.notmuch>
Date: Mon, 14 Oct 2024 19:22:56 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Ming Lei <ming.lei@...hat.com>, Robin Murphy <robin.murphy@....com>
CC: Christoph Hellwig <hch@....de>, Hannes Reinecke <hare@...e.de>, "Hamza
Mahfooz" <someguy@...ective-light.com>, Dan Williams
<dan.j.williams@...el.com>, <linux-block@...r.kernel.org>,
<io-uring@...r.kernel.org>, <linux-raid@...r.kernel.org>,
<iommu@...ts.linux.dev>, <linux-kernel@...r.kernel.org>
Subject: Re: [Report] annoyed dma debug warning "cacheline tracking EEXIST,
overlapping mappings aren't supported"

Ming Lei wrote:
> On Mon, Oct 14, 2024 at 07:09:08PM +0100, Robin Murphy wrote:
> > On 14/10/2024 8:58 am, Ming Lei wrote:
> > > On Mon, Oct 14, 2024 at 09:41:51AM +0200, Christoph Hellwig wrote:
> > > > On Mon, Oct 14, 2024 at 09:23:14AM +0200, Hannes Reinecke wrote:
> > > > > > 3) some storage utilities
> > > > > > - dm thin provisioning utility of thin_check
> > > > > > - `dt`(https://github.com/RobinTMiller/dt)
> > > > > >
> > > > > > It looks like the same user buffer is used in more than 1 dio.
> > > > > >
> > > > > > 4) some self cooked test code which does same thing with 1)
> > > > > >
> > > > > > In storage stack, the buffer provider is far away from the actual DMA
> > > > > > controller operating code, which doesn't have the knowledge if
> > > > > > DMA_ATTR_SKIP_CPU_SYNC should be set.
> > > > > >
> > > > > > Any suggestions for avoiding this noise?
> > > > > >
> > > > > Can you check if this is the NULL page? Operations like 'discard' will
> > > > > create bios with several bvecs all pointing to the same NULL page.
> > > > > That would be the most obvious culprit.
> > > >
> > > > The only case I fully understand without looking into the details
> > > > is raid1, and that will obviously map the same data multiple times
> > >
> > > The other cases should be concurrent DIOs on same userspace buffer.
> >
> > active_cacheline_insert() does already bail out for DMA_TO_DEVICE, so it
> > returning -EEXIST to tickle the warning would seem to genuinely imply these
> > are DMA mappings requesting to *write* the same cacheline concurrently,
> > which is indeed broken in general.
>
> The two io_uring tests are READ, and the dm thin_check are READ too.

"READ from the device" == "WRITE to the page" (DMA_FROM_DEVICE).

> For the raid1 case, the warning is from raid1_sync_request() which may
> have both READ/WRITE IO.

I don't see an easy way out of this without instrumenting archs that
cannot support overlapping mappings to opt in to bounce buffering for
these cases.

Archs that can support this can skip the opt-in and quiet this test, but
part of the test's value is being able to catch these boundary conditions
on more widely available systems.
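
For reference, here is a minimal userspace sketch of the direction rule
discussed above: DMA_TO_DEVICE mappings are never tracked, while any
mapping the device may write through (DMA_FROM_DEVICE or
DMA_BIDIRECTIONAL) claims its cacheline and a second claim gets -EEXIST.
This is only a model, not the kernel code. Every model_* name is
hypothetical, the 64-byte cacheline size is an assumption, and a flat
array stands in for the kernel's locked tracking structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values mirror enum dma_data_direction. */
enum model_dir {
	MODEL_BIDIRECTIONAL = 0,
	MODEL_TO_DEVICE     = 1,
	MODEL_FROM_DEVICE   = 2,
};

#define MODEL_CACHELINE_SHIFT 6		/* assumption: 64-byte cachelines */
#define MODEL_MAX_ACTIVE      64	/* arbitrary cap for the sketch */

static uint64_t model_active[MODEL_MAX_ACTIVE];
static size_t model_nr_active;

/*
 * Returns 0 when the mapping is accepted, -EEXIST when another mapping
 * that the device may write through already covers the same cacheline.
 */
static int model_cacheline_insert(uint64_t phys, enum model_dir dir)
{
	uint64_t cln = phys >> MODEL_CACHELINE_SHIFT;
	size_t i;

	/* The device only reads memory here, so overlap is harmless. */
	if (dir == MODEL_TO_DEVICE)
		return 0;

	for (i = 0; i < model_nr_active; i++)
		if (model_active[i] == cln)
			return -EEXIST;

	assert(model_nr_active < MODEL_MAX_ACTIVE);
	model_active[model_nr_active++] = cln;
	return 0;
}
```

In this model, two block-layer READs into the same user buffer are two
MODEL_FROM_DEVICE inserts on one cacheline, so the second one returns
-EEXIST. Two WRITEs from that buffer are MODEL_TO_DEVICE and both
return 0.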