[<prev] [next>] [day] [month] [year] [list]
Message-ID: <c9d3a2fa523bef4da14c47bcc588cfd60b6fd6f6.camel@icenowy.me>
Date: Tue, 02 Jul 2024 17:36:55 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Christian König <christian.koenig@....com>, Jiaxun
Yang <jiaxun.yang@...goat.com>, Huang Rui <ray.huang@....com>, Maarten
Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
<mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, David
Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined
when snooping not available
在 2024-07-02星期二的 10:10 +0200,Christian König写道:
> Am 02.07.24 um 10:08 schrieb Christian König:
> > Am 02.07.24 um 03:46 schrieb Icenowy Zheng:
> > > 在 2024-07-01星期一的 13:40 +0200,Christian König写道:
> > > > Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
> > > > > 于 2024年6月30日 GMT+08:00 03:57:47,Jiaxun Yang
> > > > > <jiaxun.yang@...goat.com> 写道:
> > > > > > 在2024年6月29日六月 上午6:22,Icenowy Zheng写道:
> > > > > > [...]
> > > > > > > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
> > > > > > > ttm_buffer_object *bo,
> > > > > > > struct ttm_resource *res,
> > > > > > > caching = res->bus.caching;
> > > > > > > }
> > > > > > >
> > > > > > > + /* Downgrade cached mapping for non-snooping
> > > > > > > devices */
> > > > > > > + if (!bo->bdev->dma_coherent && caching ==
> > > > > > > ttm_cached)
> > > > > > > + caching = ttm_write_combined;
> > > > > > Hi Icenowy,
> > > > > >
> > > > > > Thanks for your patch! You saved many non-coh PCIe host
> > > > > > implementations a day!.
> > > > Ah, wait a second.
> > > >
> > > > Such a thing as non-coherent PCIe implementation doesn't exist.
> > > > The
> > > > PCIe
> > > > specification makes it mandatory for memory access to be cache
> > > > coherent.
> > > Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X
> > > addendum
> > > 1.0, none of this explicitly requires the PCIe controller and the
> > > CPU
> > > being fully coherent. The PCI-X spec even says "Note that PCI-X,
> > > like
> > > conventional PCI, does not require systems to support coherent
> > > caches
> > > for addresses accessed by PCI-X requesters".
> >
> > See the very first PCI specification, AGP 2.0 and the PCIe
> > extension
> > for non-snooped access.
> >
> > Originally it wasn't well defined what the PCI 1.0 spec meant with
> > coherency (e.g. snooping vs uncached).
> >
> > AGP was the first specification which explicitly defined that all
> > AGP
> > memory accesses must be non-snooped and all PCI accesses must snoop
> > the CPU caches.
> >
> > PCIe then had an extension which defined the "No Snooping
> > Attribute"
> > to allow emulating the AGP behavior.
> >
> > For the current PCIe 6.1 specification the non-snoop extension was
> > merged into the base specification.
> >
> > Here see section "2.2.6.5 No Snoop Attribute", e.g. "Hardware
> > enforced
> > cache coherency expected"
> >
> > As well as the notes under section 7.5.3.4 Device Control Register:
> >
> > Enable No Snoop - If this bit is Set, the Function is permitted to
> > Set
> > the No Snoop bit in the Requester
> > Attributes of transactions it initiates that do not require
> > hardware
> > enforced cache coherency (see Section 2.2.6.5 ).
> >
> > To summarize it: Not snooping caches is an extension, snooping
> > caches
> > is mandatory.
> >
> > > In addition, in the perspective of Linux, I think bypassing CPU
> > > cache
> > > of shared memory is considered as coherent access too, see
> > > dma_alloc_coherent() function's naming.
> >
> > Yes that's correct, but this is for platform devices. E.g. other
> > I/O
> > from drivers who doesn't need to work with malloced system memory
> > for
> > example.
> >
> > We have quite a bunch of V4L, sound and I also think network
> > devices
> > which work like that. But those are non-PCI devices.
> >
> > > > There are a bunch of non-compliant PCIe implementations which
> > > > have
> > > > broken cache coherency, but those explicitly violate the
> > > > specification
> > > > and because of that are not supported.
> > > Regardless of it violating the spec or not, these devices work
> > > with
> > > Linux subsystems that use dma_alloc_coherent to allocate DMA
> > > buffers
> > > (which is the most common case), and GPU drivers just give out
> > > cryptic
> > > error messages like "ring gfx test failed" without any mention of
> > > coherency issues at all, which makes the fact that Linux DRM/TTM
> > > subsystem currently requires PCIe snooping CPU cache more
> > > obscure.
> >
> > No, they don't even remotely work. You just got very basic tests
> > working.
> >
> > Both the Vulkan as well as the OpenGL specification require that
> > you
> > can import "normal" malloced() system memory into the GPU driver.
> >
> > This is not possible without a cache coherent platform
> > architecture.
> > So you can't fully support those APIs.
> >
> > We exercised this quite extensively already and even have a
> > confirmation from ARM engineers that the approach of attaching just
> > any PCIe root to an ARM IP core is not supported from their side.
> >
> > And if I'm not completely mistaken the RISC-V specification was
> > also
> > updated to disallow stuff like this.
>
> Googling helps:
> https://github.com/riscv/riscv-platform-specs/issues/83
riscv-platform-specs is a deprecated repository, and the specs are
never ratified by RVI. According to its content, it should be moved to
riscv-non-isa/, but the move didn't even happen. The successor of this
specific part, which is reduced to "server" context, is placed at
riscv-non-isa/server-soc, in a more specific form; however server-soc
is also not ratified yet either, and its CCS_040 item uses "SHOULD"
instead of "MUST" on "hardware enforced cache coherency".
>
> Regards,
> Christian.
>
> >
> > So yes you can have boards which implement non-snooped PCIe, but
> > you
> > get exactly zero support from hardware vendors to run software on
> > it.
> >
> > Regards,
> > Christian.
> >
> > > > Regards,
> > > > Christian.
> > > >
> > > > > > Unfortunately I don't think we can safely ttm_cached to
> > > > > > ttm_write_comnined, we've
> > > > > > had enough drama with write combine behaviour on all
> > > > > > different
> > > > > > platforms.
> > > > > >
> > > > > > See drm_arch_can_wc_memory in drm_cache.h.
> > > > > >
> > > > > Yes this really sounds like an issue.
> > > > >
> > > > > Maybe the behavior of ttm_write_combined should furtherly be
> > > > > decided
> > > > > by drm_arch_can_wc_memory() in case of quirks?
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > +
> > > > > > > return ttm_prot_from_caching(caching, tmp);
> > > > > > > }
> > > > > > > EXPORT_SYMBOL(ttm_io_prot);
> > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > index 7b00ddf0ce49f..3335df45fba5e 100644
> > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > @@ -152,6 +152,10 @@ static void
> > > > > > > ttm_tt_init_fields(struct
> > > > > > > ttm_tt *ttm,
> > > > > > > enum ttm_caching caching,
> > > > > > > unsigned long
> > > > > > > extra_pages)
> > > > > > > {
> > > > > > > + /* Downgrade cached mapping for non-snooping
> > > > > > > devices */
> > > > > > > + if (!bo->bdev->dma_coherent && caching ==
> > > > > > > ttm_cached)
> > > > > > > + caching = ttm_write_combined;
> > > > > > > +
> > > > > > > ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
> > > > > > > PAGE_SHIFT) + extra_pages;
> > > > > > > ttm->page_flags = page_flags;
> > > > > > > ttm->dma_address = NULL;
> > > > > > > diff --git a/include/drm/ttm/ttm_caching.h
> > > > > > > b/include/drm/ttm/ttm_caching.h
> > > > > > > index a18f43e93abab..f92d7911f50e4 100644
> > > > > > > --- a/include/drm/ttm/ttm_caching.h
> > > > > > > +++ b/include/drm/ttm/ttm_caching.h
> > > > > > > @@ -47,7 +47,8 @@ enum ttm_caching {
> > > > > > >
> > > > > > > /**
> > > > > > > * @ttm_cached: Fully cached like normal system
> > > > > > > memory,
> > > > > > > requires that
> > > > > > > - * devices snoop the CPU cache on accesses.
> > > > > > > + * devices snoop the CPU cache on accesses.
> > > > > > > Downgraded
> > > > > > > to
> > > > > > > + * ttm_write_combined when the snooping
> > > > > > > capaiblity is
> > > > > > > missing.
> > > > > > > */
> > > > > > > ttm_cached
> > > > > > > };
> > > > > > > --
> > > > > > > 2.45.2
> >
Powered by blists - more mailing lists