Message-ID:
 <BN7PR02MB4148C95E563FE49E2652D1F4D4C02@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Mon, 24 Feb 2025 19:24:49 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Saurabh Singh Sengar <ssengar@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>, "deller@....de" <deller@....de>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-fbdev@...r.kernel.org" <linux-fbdev@...r.kernel.org>,
	"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"ssengar@...rosoft.com" <ssengar@...rosoft.com>
Subject: RE: [PATCH] fbdev: hyperv_fb: Allow graceful removal of framebuffer

From: Saurabh Singh Sengar <ssengar@...ux.microsoft.com> Sent: Monday, February 24, 2025 5:30 AM
> 
> On Mon, Feb 24, 2025 at 12:38:22AM +0000, Michael Kelley wrote:
> > From: Saurabh Singh Sengar <ssengar@...ux.microsoft.com> Sent: Sunday, February 23, 2025 6:10 AM
> > >
> > > On Sat, Feb 22, 2025 at 08:16:53PM +0000, Michael Kelley wrote:
> > > > From: Saurabh Singh Sengar <ssengar@...ux.microsoft.com> Sent: Saturday, February 22, 2025 9:27 AM
> > > > >
> >
> > [snip]
> >
> > > > >
> > > > > I had considered moving the entire `hvfb_putmem()` function to `destroy`,
> > > > > but I was hesitant for two reasons:
> > > > >
> > > > >   1. I wasn’t aware of any scenario where this would be useful. However,
> > > > >      your explanation has convinced me that it is necessary.
> > > > >   2. `hvfb_release_phymem()` relies on the `hdev` pointer, which requires
> > > > >      multiple `container_of` operations to derive it from the `info` pointer.
> > > > >      I was unsure if the complexity was justified, but it seems worthwhile now.
> > > > >
> > > > > I will move `hvfb_putmem()` to the `destroy` function in V2, and I hope this
> > > > > will address all the cases you mentioned.
> > > > >
> > > >
> > > > Yes, that's what I expect needs to happen, though I haven't looked at the
> > > > details of making sure all the needed data structures are still around. Like
> > > > you, I just had this sense that hvfb_putmem() might need to be moved as
> > > > well, so I tried to produce a failure scenario to prove it, which turned out
> > > > to be easy.
> > > >
> > > > Michael
> > >
> > > I will add this in V2 as well. But I have found another issue, which
> > > does not occur very frequently.
> > >
> > >
> > > [  176.562153] ------------[ cut here ]------------
> > > [  176.562159] fb0: fb_WARN_ON_ONCE(pageref->page != page)
> > > [  176.562176] WARNING: CPU: 50 PID: 1522 at drivers/video/fbdev/core/fb_defio.c:67 fb_deferred_io_mkwrite+0x215/0x280
> > >
> > > <snip>
> > >
> > > [  176.562258] Call Trace:
> > > [  176.562260]  <TASK>
> > > [  176.562263]  ? show_regs+0x6c/0x80
> > > [  176.562269]  ? __warn+0x8d/0x150
> > > [  176.562273]  ? fb_deferred_io_mkwrite+0x215/0x280
> > > [  176.562275]  ? report_bug+0x182/0x1b0
> > > [  176.562280]  ? handle_bug+0x133/0x1a0
> > > [  176.562283]  ? exc_invalid_op+0x18/0x80
> > > [  176.562284]  ? asm_exc_invalid_op+0x1b/0x20
> > > [  176.562289]  ? fb_deferred_io_mkwrite+0x215/0x280
> > > [  176.562291]  ? fb_deferred_io_mkwrite+0x215/0x280
> > > [  176.562293]  do_page_mkwrite+0x4d/0xb0
> > > [  176.562296]  do_wp_page+0xe8/0xd50
> > > [  176.562300]  ? ___pte_offset_map+0x1c/0x1b0
> > > [  176.562304]  __handle_mm_fault+0xbe1/0x10e0
> > > [  176.562307]  handle_mm_fault+0x17f/0x2e0
> > > [  176.562309]  do_user_addr_fault+0x2d1/0x8d0
> > > [  176.562314]  exc_page_fault+0x85/0x1e0
> > > [  176.562318]  asm_exc_page_fault+0x27/0x30
> > >
> > > It looks like this happens because the driver is unbound while Xorg is
> > > still trying to write to the memory, which causes page faults. I have
> > > confirmed that PID 1522 is Xorg. I think this is because we need to
> > > cancel the framebuffer deferred work after flushing it.
> >
> > Does this new issue occur even after moving hvfb_putmem()
> > into the destroy() function?
> 
> Unfortunately yes :(
> 
> >                             I'm hoping it doesn't. I've
> > looked at the fb_deferred_io code, and can't quite figure out
> > how that deferred I/O work is supposed to get cancelled. Or
> > maybe it's just not supposed to get started again after the flush.
> >
> 
> I want to understand why cancel_delayed_work_sync was introduced in
> hvfb_suspend rather than a flush. The following commit introduced it.
> 
> 382a462217572 ('video: hyperv_fb: Fix hibernation for the deferred IO feature')
> 
> But I agree this needs more analysis.
> 
> > If the new issue still happens, that seems like more of a flaw
> > in the fb deferred I/O mechanism not shutting itself down
> > properly.
> >
> 
> As the repro rate is quite low, it will take some effort to get this
> fixed. Shall we handle this in a separate patch later?
> 

Yes, I'm OK with doing a separate patch later. It might turn out
to not be a bug in hyperv_fb anyway.

Michael
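
For readers following the `container_of` point in the quoted discussion
(deriving `hdev` from the `info` pointer), here is a minimal userspace
sketch of one such step. It assumes the framebuffer info holds a pointer
to a generic device that is embedded in the bus device. All `demo_*`
names and the struct layouts are hypothetical, and the macro is a
simplified stand-in for the kernel's `container_of()`; this is not the
driver's actual code.

```c
#include <stddef.h>  /* offsetof */

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_device { int dummy; };

struct demo_hv_device {
	int channel_id;
	struct demo_device device;   /* generic device embedded in the bus device */
};

struct demo_fb_info {
	struct demo_device *device;  /* assumed to point at the embedded device */
};

/* Recover the bus device when only the framebuffer info pointer is at hand. */
static struct demo_hv_device *demo_info_to_hdev(struct demo_fb_info *info)
{
	return container_of(info->device, struct demo_hv_device, device);
}

/* Usage: returns 1 when the round trip yields the original object. */
static int demo_roundtrip_ok(void)
{
	struct demo_hv_device hv = { .channel_id = 7 };
	struct demo_fb_info info = { .device = &hv.device };

	return demo_info_to_hdev(&info) == &hv;
}
```

The subtraction only works because the member is embedded by value in
the outer struct; a plain pointer member would have to be followed
instead, which is why a real derivation may need more than one step.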
