Message-ID: <5ccc62fe-3b44-487b-b807-412921395601@suse.de>
Date: Tue, 2 Jul 2024 10:05:08 +0200
From: Thomas Zimmermann <tzimmermann@...e.de>
To: Linux regressions mailing list <regressions@...ts.linux.dev>,
 "Kaplan, David" <David.Kaplan@....com>
Cc: "Petkov, Borislav" <Borislav.Petkov@....com>,
 "zack.rusin@...adcom.com" <zack.rusin@...adcom.com>,
 "dmitry.osipenko@...labora.com" <dmitry.osipenko@...labora.com>,
 "Koenig, Christian" <Christian.Koenig@....com>,
 Dave Airlie <airlied@...hat.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
 ML dri-devel <dri-devel@...ts.freedesktop.org>,
 "spice-devel@...ts.freedesktop.org" <spice-devel@...ts.freedesktop.org>,
 "virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>
Subject: Re: [REGRESSION] QXL display malfunction


Am 01.07.24 um 12:02 schrieb Linux regression tracking (Thorsten Leemhuis):
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Thomas, has there been any progress on fixing the regression below? I might
> have missed something, but from here it looks like this fell through the
> cracks.

Thanks for the reminder.


>
> Makes me wonder if we should revert this for now, to fix this for rc7
> and ensure things get at least one week of testing before the final release.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 14.06.24 15:45, Kaplan, David wrote:
>>> -----Original Message-----
>>> From: Thomas Zimmermann <tzimmermann@...e.de>
>>> Sent: Wednesday, June 12, 2024 9:26 AM
>>> To: Linux regressions mailing list <regressions@...ts.linux.dev>
>>> Cc: Petkov, Borislav <Borislav.Petkov@....com>;
>>> zack.rusin@...adcom.com; dmitry.osipenko@...labora.com; Kaplan, David
>>> <David.Kaplan@....com>; Koenig, Christian <Christian.Koenig@....com>;
>>> Dave Airlie <airlied@...hat.com>; Maarten Lankhorst
>>> <maarten.lankhorst@...ux.intel.com>; Maxime Ripard
>>> <mripard@...nel.org>; LKML <linux-kernel@...r.kernel.org>; ML dri-devel
>>> <dri-devel@...ts.freedesktop.org>; spice-devel@...ts.freedesktop.org;
>>> virtualization@...ts.linux.dev
>>> Subject: Re: [REGRESSION] QXL display malfunction
>>>
>>>
>>> Hi
>>>
>>> Am 12.06.24 um 14:41 schrieb Linux regression tracking (Thorsten Leemhuis):
>>>> [CCing a few more people and lists that get_maintainers pointed out
>>>> for qxl]
>>>>
>>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>>> for once, to make this easily accessible to everyone.
>>>>
>>>> Thomas, from here it looks like this report that apparently is caused
>>>> by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
>>>> ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
>>>> cracks. Or was progress made to resolve this and I just missed this?
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>>> hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> If I did something stupid, please tell me, as explained on that page.
>>>>
>>>> #regzbot poke
>>>>
>>>>
>>>> On 03.06.24 04:29, Kaplan, David wrote:
>>>>>> -----Original Message-----
>>>>>> From: Kaplan, David
>>>>>> Sent: Sunday, June 2, 2024 9:25 PM
>>>>>> To: tzimmermann@...e.de; dmitry.osipenko@...labora.com; Koenig,
>>>>>> Christian <Christian.Koenig@....com>; zach.rusin@...adcom.com
>>>>>> Cc: Petkov, Borislav <Borislav.Petkov@....com>;
>>>>>> regressions@...t.linux.dev
>>>>>> Subject: [REGRESSION] QXL display malfunction
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
>>>>>> and I've observed the VM graphics often malfunction after boot,
>>>>>> sometimes failing to load the Ubuntu desktop or even immediately
>>>>>> shutting the guest down.
>>>>>> When it does load, the guest dmesg log often contains errors like
>>>>>>
>>>>>> [    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>>> [    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>>> [    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
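
The "head 1 wrong" lines quoted above can be pulled out of a saved dmesg log mechanically when comparing a good and a bad boot. The following is only a minimal sketch: the function name `parse_head_errors` and the bound `MAX_PLAUSIBLE_DIM` are hypothetical helpers chosen for illustration, not part of the kernel or the qxl driver. It parses the `WxH+X+Y` geometry and flags values that cannot be real display dimensions. Printed in hex, the reported sizes are 0x3e59000, 0x3e4f000 and 0x1000000.

```python
import re

# Matches the "head N wrong: WxH+X+Y" part of the quoted dmesg lines.
HEAD_RE = re.compile(r"head\s+(\d+)\s+wrong:\s+(\d+)x(\d+)\+(\d+)\+(\d+)")

# Hypothetical sanity bound chosen for illustration; this is not a qxl limit.
MAX_PLAUSIBLE_DIM = 16384


def parse_head_errors(log_text):
    """Return one dict per 'head N wrong' message found in log_text."""
    results = []
    for match in HEAD_RE.finditer(log_text):
        head, width, height, x, y = (int(group) for group in match.groups())
        results.append({
            "head": head,
            "width": width,
            "height": height,
            "x": x,
            "y": y,
            # True only if both dimensions are below the illustrative bound.
            "plausible": width <= MAX_PLAUSIBLE_DIM and height <= MAX_PLAUSIBLE_DIM,
        })
    return results


# The three messages from the report, with the mail client's line wraps undone.
SAMPLE = """\
[    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
"""

if __name__ == "__main__":
    for entry in parse_head_errors(SAMPLE):
        print(
            f"head {entry['head']}: {entry['width']:#x} x {entry['height']:#x} "
            f"plausible={entry['plausible']}"
        )
```

Running the same function over a log from a kernel with the commit reverted should return an empty list if, as reported in this thread, the messages only appear with the commit applied.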
>>> I don't see how these messages are related. Did they already appear before
>>> the broken commit was there?
>> No, I did not observe them prior to the broken commit.
>>
>>>>>> [    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>>> Is there only one such message in the log, or are there multiple/frequent ones?
>> I would usually only see one.
>>
>>> Could you provide a stack trace of what happens before?
>> Here's the top of a backtrace when the error occurs:
>> #0  qxl_release_from_id_locked (qdev=qdev@...ry=0xffff88810126e000, id=id@...ry=262151)
>>      at drivers/gpu/drm/qxl/qxl_release.c:373
>> #1  0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
>>      at drivers/gpu/drm/qxl/qxl_cmd.c:222
>> #2  0xffffffff810e3aa8 in process_one_work (worker=worker@...ry=0xffff888101680300,
>>      work=0xffff88810126f340) at kernel/workqueue.c:3231
>> #3  0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
>>      at kernel/workqueue.c:3312
>> #4  worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393
>>
>>> We sometimes draw into the buffer object from the CPU. For accessing the
>>> buffer object's pages from the CPU, only a vmap operation should be
>>> necessary. It appears as if qxl also requires a pin. My guess is that the pin
>>> inserts the buffer-object's host-side pages and the code around
>>> qxl_release_from_id_locked() appears to be garbage-collecting them.
>>> Hence without the pin, the GC complains about inconsistent state.
>>>>>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>>>>>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>>> Thanks for bisecting. Does it work if you revert that commit?
>> Yes
>>
>> Thanks --David Kaplan

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

