lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YZJtsV15k6mrBt1f@phenom.ffwll.local>
Date:   Mon, 15 Nov 2021 15:24:49 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Sudip Mukherjee <sudipm.mukherjee@...il.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Gurchetan Singh <gurchetansingh@...omium.org>,
        Nicholas Verne <nverne@...omium.org>,
        Gerd Hoffmann <kraxel@...hat.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        "open list:VIRTIO GPU DRIVER" 
        <virtualization@...ts.linux-foundation.org>,
        Daniel Vetter <daniel@...ll.ch>
Subject: Re: regression with mainline kernel

On Sat, Nov 13, 2021 at 09:06:57AM -0800, Linus Torvalds wrote:
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
> 
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@...il.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
> 
> Hmm. Judging from your automated screenshots, the login never appears.

Greg also reported a regression, plus I looked at the commit and had
questions too.

> > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> > the problem I was seeing on my qemu test of x86_64. The qemu image is
> > based on Ubuntu.
> 
> Presumably either that commit is somehow buggy in itself - or it does
> exactly what it means to do, and the new poll() semantics just
> confuses the heck out of the X server (or wayland or whatever).
> 
> And honestly, if I read that thing correctly, the patch is entirely
> broken. The new poll function (virtio_gpu_poll()) will unconditionally
> remove the first event from the event list, and then report "Yeah, I
> had events".
> 
> This is completely bogus for a few reasons:
> 
>  - poll() really should be idempotent, because the poll function gets
> called multiple times
> 
>  - it doesn't even seem to check that the event that it removes is the
> new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
> unconditionally just remove random events.
> 
>  - it does seem to check the "vfpriv->ring_idx_mask" and do the old
> thing if that is zero, but I see absolutely no reason for that (and
> that check itself has caused problems, see below)
> 
> Honestly, my reaction to this all is that that commit is fundamentally
> broken and probably should be reverted regardless as "this commit does
> bad things".
> 
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
> 
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
> 
> In fact, I would really suggest that people look at that
> virtio_gpu_poll() function regardless. That odd "let's unconditionally
> just drop events in the poll function is really REALLY broken
> behavior.

Tbh I haven't looked at the poll implementation, but it's fishy on
principles as in drm drivers shouldn't reinvent them. The commit message
cites vmwgfx as example for a private driver drm_event, but that works
perfectly fine with standard drm_poll (and is meant to work perfectly fine
with standard drm_poll).

So if it's buggy on top of questionable I think revert might be the right
choice irrespective of whether there's some fixes in-flight.

So if you end up pushing that revert:

References: https://lore.kernel.org/dri-devel/YZJrutLaiwozLfSw@phenom.ffwll.local/
Acked-by: Daniel Vetter <daniel.vetter@...ll.ch>

Plus credit Greg too and all that ofc.

But lets wait a bit for virtio folks to chime in, it's only Monday :-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ