lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6303afecce2dff9e7d30f67e0a74205256e0a524.camel@icenowy.me>
Date: Tue, 02 Jul 2024 09:46:10 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Christian König <christian.koenig@....com>, Jiaxun
 Yang <jiaxun.yang@...goat.com>, Huang Rui <ray.huang@....com>, Maarten
 Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
 <mripard@...nel.org>,  Thomas Zimmermann <tzimmermann@...e.de>, David
 Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined
 when snooping not available

在 2024-07-01星期一的 13:40 +0200,Christian König写道:
> Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
> > 
> > 于 2024年6月30日 GMT+08:00 03:57:47,Jiaxun Yang
> > <jiaxun.yang@...goat.com> 写道:
> > > 
> > > 在2024年6月29日六月 上午6:22,Icenowy Zheng写道:
> > > [...]
> > > > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
> > > > ttm_buffer_object *bo,
> > > > struct ttm_resource *res,
> > > >                 caching = res->bus.caching;
> > > >         }
> > > > 
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > Hi Icenowy,
> > > 
> > > Thanks for your patch! You saved many non-coh PCIe host
> > > implementations a day!.
> 
> Ah, wait a second.
> 
> Such a thing as non-coherent PCIe implementation doesn't exist. The
> PCIe 
> specification makes it mandatory for memory access to be cache
> coherent.

Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X addendum
1.0, none of this explicitly requires the PCIe controller and the CPU
being fully coherent. The PCI-X spec even says "Note that PCI-X, like
conventional PCI, does not require systems to support coherent caches
for addresses accessed by PCI-X requesters".

In addition, in the perspective of Linux, I think bypassing CPU cache
of shared memory is considered as coherent access too, see
dma_alloc_coherent() function's naming.

> 
> There are a bunch of non-compliant PCIe implementations which have 
> broken cache coherency, but those explicitly violate the
> specification 
> and because of that are not supported.

Regardless of it violating the spec or not, these devices work with
Linux subsystems that use dma_alloc_coherent to allocate DMA buffers
(which is the most common case), and GPU drivers just give out cryptic
error messages like "ring gfx test failed" without any mention of
coherency issues at all, which makes the fact that Linux DRM/TTM
subsystem currently requires PCIe snooping CPU cache more obscure.

> 
> Regards,
> Christian.
> 
> > > 
> > > Unfortunately I don't think we can safely ttm_cached to
> > > ttm_write_comnined, we've
> > > had enough drama with write combine behaviour on all different
> > > platforms.
> > > 
> > > See drm_arch_can_wc_memory in drm_cache.h.
> > > 
> > Yes this really sounds like an issue.
> > 
> > Maybe the behavior of ttm_write_combined should furtherly be
> > decided
> > by drm_arch_can_wc_memory() in case of quirks?
> > 
> > > Thanks
> > > 
> > > > +
> > > >         return ttm_prot_from_caching(caching, tmp);
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_io_prot);
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > index 7b00ddf0ce49f..3335df45fba5e 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct
> > > > ttm_tt *ttm,
> > > >                                enum ttm_caching caching,
> > > >                                unsigned long extra_pages)
> > > >   {
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > > +
> > > >         ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
> > > > PAGE_SHIFT) + extra_pages;
> > > >         ttm->page_flags = page_flags;
> > > >         ttm->dma_address = NULL;
> > > > diff --git a/include/drm/ttm/ttm_caching.h
> > > > b/include/drm/ttm/ttm_caching.h
> > > > index a18f43e93abab..f92d7911f50e4 100644
> > > > --- a/include/drm/ttm/ttm_caching.h
> > > > +++ b/include/drm/ttm/ttm_caching.h
> > > > @@ -47,7 +47,8 @@ enum ttm_caching {
> > > > 
> > > >         /**
> > > >          * @ttm_cached: Fully cached like normal system memory,
> > > > requires that
> > > > -        * devices snoop the CPU cache on accesses.
> > > > +        * devices snoop the CPU cache on accesses. Downgraded
> > > > to
> > > > +        * ttm_write_combined when the snooping capaiblity is
> > > > missing.
> > > >          */
> > > >         ttm_cached
> > > >   };
> > > > -- 
> > > > 2.45.2
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ