lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a3d331e9-d17e-9135-68c7-8e3e10df184d@nvidia.com>
Date:   Thu, 2 Jul 2020 23:01:54 -0700
From:   James Jones <jajones@...dia.com>
To:     Daniel Stone <daniel@...ishbar.org>
CC:     "Kirill A. Shutemov" <kirill@...temov.name>,
        LKML <linux-kernel@...r.kernel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Ben Skeggs <bskeggs@...hat.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [git pull] drm for 5.8-rc1

On 7/2/20 2:14 PM, James Jones wrote:
> On 7/2/20 1:22 AM, Daniel Stone wrote:
>> Hi,
>>
>> On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@...dia.com> wrote:
>>> OK, I think I see what's going on.  In the Xorg modesetting driver, the
>>> logic is basically:
>>>
>>> if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
>>>     drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
>>> } else {
>>>     drmModeAddFB(...);
>>> }
>>
>> I read this thread expecting to explain the correct behaviour we
>> implement in Weston and how modesetting needs to be fixed, but ...
>> that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we
>> used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'.
> 
> Yes, the hazards of reporting findings before verifying.  I now see 
> modesetting does query the DRM-KMS modifiers and attempt to allocate 
> with them if it found any.  However, I still see a lot of ways things 
> can go wrong, but I'm not going to share my speculation again until I've 
> actually verified it, which is taking a frustratingly long time.  The 
> modesetting driver is not my friend right now.

OK, several hours of dumb build+config mistakes later, I was actually 
able to reproduce the failure and walk through things.  There is a 
trivial fix for the issues in the X modesetting driver, working off 
Daniel Stone's claim that gbm_bo_get_modifier() should only be called 
when the allocation was made with gbm_bo_create_with_modifiers(). 
modeset doesn't respect that requirement now in the case that the atomic 
modesetting path is disabled, which is always the case currently because 
that path is broken.  Respecting that requirement is a half-liner and 
allows X to start properly.

If I force modeset to use the atomic path, X still fails to start with 
the above fix, validating the second theory I'd had:

-Current Mesa nouveau code basically ignores the modifier list passed in 
unless it is a single modifier requesting linear layout, and goes about 
allocating whatever layout it sees fit, and succeeds the allocation 
despite being passed a list of modifiers it knows nothing about.  Not 
great, fixed in my pending patches, obviously doesn't help existing 
deployed userspace.

-Current Mesa nouveau code, when asked what modifier it used for the 
above allocation, returns one of the "legacy" modifiers nouveau DRM-KMS 
knows nothing about.

-When the modeset driver tries to create an FB for that BO with the 
returned modifier, the nouveau kernel driver of course refuses.

I think it's probably worth fixing the modesetting driver for the 
reasons Daniel Vetter mentioned.  Then if I get my Mesa patches in 
before a new modesetting driver with working Atomic support is released, 
there'll be no need for long-term workarounds in the kernel.

Down to the real question of what to do in the kernel to support current 
userspace code: I still think the best fix is to accept the old 
modifiers but not advertise them.  However, Daniel Stone and others, if 
you think this will actually break userspace in other ways (Could you 
describe in a bit more detail or point me to test cases if so?), I 
suppose the only option would be to advertise & accept the old modifiers 
for now, and I suppose at a config option at some point to phase the old 
ones out, eventually drop them entirely.  This would be unfortunate, 
because as I mentioned, it could sometimes result in situations where 
apps think they can share a buffer between two devices but will get 
garbled data in practice.

I've included an initial version of the kernel patch inline below. 
Needs more testing, but I wanted to share it in case anyone has feedback 
on the idea, wants to see the general workflow, or wants to help test.

>>> There's no attempt to verify the DRM-KMS device supports the modifier,
>>> but then, why would there be?  GBM presumably chose a supported modifier
>>> at buffer creation time, and we don't know which plane the FB is going
>>> to be used with yet.  GBM doesn't actually ask the kernel which
>>> modifiers it supports here either though.
>>
>> Right, it doesn't ask, because userspace tells it which modifiers to
>> use. The correct behaviour is to take the list from the KMS
>> `IN_FORMATS` property and then pass that to
>> `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from
>> that list and only that list. If that call does not succeed and Xorg
>> falls back to `gbm_surface_create`, then it must not call
>> `gbm_bo_get_modifier` - so that would be a modesetting bug. If that
>> call does succeed and `gbm_bo_get_modifier` subsequently reports a
>> modifier which was not in the list, that's a Mesa driver bug.
>>
>>> It just goes into Mesa via
>>> DRI and reports the modifier (unpatched) Mesa chose on its own.  Mesa
>>> just hard-codes the modifiers in its driver backends since its thinking
>>> in terms of a device's 3D engine, not display.  In theory, Mesa's DRI
>>> drivers could query KMS for supported modifiers if allocating from GBM
>>> using the non-modifiers path and the SCANOUT flag is set (perhaps some
>>> drivers do this or its equivalent?  Haven't checked.), but that seems
>>> pretty gnarly and doesn't fix the modifier-based GBM allocation path
>>> AFAIK.  Bit of a mess.
>>
>> Two options for GBM users:
>> * call gbm_*_create_with_modifiers, it succeeds, call
>> gbm_bo_get_modifier, pass modifier into AddFB
>> * call gbm_*_create (without modifiers), it succeeds, do not call
>> gbm_bo_get_modifier, do not pass a modifier into AddFB
>>
>> Anything else is a bug in the user. Note that falling back from 1 to 2
>> is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back
>> to the non-modifier path, provided you don't later try to get a
>> modifier back out.
>>
>>> For a quick userspace fix that could probably be pushed out everywhere
>>> (Only affects Xorg server 1.20+ AFAIK), just retrying
>>> drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on
>>> failure should be sufficient.
>>
>> This would break other drivers.
> 
> I think this could be done in a way that wouldn't, though it wouldn't be 
> quite as simple.  Let's see what the true root cause is first though.
> 
>>> Still need to verify as I'm having
>>> trouble wrangling my Xorg build at the moment and I'm pressed for time.
>>> A more complete fix would be quite involved, as modesetting isn't really
>>> properly plumbed to validate GBM's modifiers against KMS planes, and it
>>> doesn't seem like GBM/Mesa/DRI should be responsible for this as noted
>>> above given the general modifier workflow/design.
>>>
>>> Most importantly, options I've considered for fixing from the kernel 
>>> side:
>>>
>>> -Accept "legacy" modifiers in nouveau in addition to the new modifiers,
>>> though avoid reporting them to userspace as supported to avoid further
>>> proliferation.  This is pretty straightforward.  I'll need to modify
>>> both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set
>>> plane validation logic (nv50_plane_format_mod_supported), but it should
>>> end up just being a few lines of code.
>>
>> I do think that they should also be reported to userspace if they are
>> accepted. Other users can and do look at the modifier list to see if
>> the buffer is acceptable for a given plane, so the consistency is good
>> here. Of course, in Mesa you would want to prioritise the new
>> modifiers over the legacy ones, and not allocate or return the legacy
>> ones unless that was all you were asked for. This would involve
>> tracking the used modifier explicitly through Mesa, rather than
>> throwing it away at alloc time and then later divining it from the
>> tiling mode.
> 
> Reporting them as supported is equivalent to reporting support for a 
> memory layout the chips don't actually support (It corresponds to a 
> valid layout on Tegra chips, but not on discrete NV chips).  This is 
> what the new modifiers are trying to avoid in the first place: Implying 
> buffers can be shared between these Tegra chips and discrete NV GPUs.
> 
> Thanks,
> -James
> 
>> Cheers,
>> Daniel
>>

nouveau: Accept 'legacy' format modifiers

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +++++++++++++++++++++--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
  		   uint32_t *tile_mode,
  		   uint8_t *kind)
  {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
  	BUG_ON(!tile_mode || !kind);

+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier.  Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}
+
  	if (modifier == DRM_FORMAT_MOD_LINEAR) {
  		/* tile_mode will not be used in this case */
  		*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct 
drm_framebuffer *fb,
  	}
  }

+static const u64 legacy_modifiers[] = {
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+	DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
  			    uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
  	     (disp->format_modifiers[mod] != modifier);
  	     mod++);

-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-		return -EINVAL;
+	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+		for (mod = 0;
+		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+		     (legacy_modifiers[mod] != modifier);
+		     mod++);
+		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+			return -EINVAL;
+	}

  	nouveau_decode_mod(drm, modifier, tile_mode, kind);

-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ