lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB4157A86150A5F7055F30E43BD4FF2@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 13 Feb 2025 03:43:07 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Saurabh Singh Sengar <ssengar@...ux.microsoft.com>
CC: "haiyangz@...rosoft.com" <haiyangz@...rosoft.com>, "wei.liu@...nel.org"
	<wei.liu@...nel.org>, "decui@...rosoft.com" <decui@...rosoft.com>,
	"deller@....de" <deller@....de>, "weh@...rosoft.com" <weh@...rosoft.com>,
	"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
	"linux-fbdev@...r.kernel.org" <linux-fbdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
Subject: RE: [PATCH 1/1] fbdev: hyperv_fb: iounmap() the correct memory when
 removing a device

From: Saurabh Singh Sengar <ssengar@...ux.microsoft.com> Sent: Wednesday, February 12, 2025 7:07 PM
> 
> On Thu, Feb 13, 2025 at 01:35:22AM +0000, Michael Kelley wrote:
> > From: Saurabh Singh Sengar <ssengar@...ux.microsoft.com> Sent: Monday, February 10, 2025 8:52 AM
> > >
> > [snip]
> > > > > >
> > > > > > While we are at it, I want to mention that I also observed below WARN
> > > > > > while removing the hyperv_fb, but that needs a separate fix.
> > > > > >
> > > > > >
> > > > > > [   44.111220] WARNING: CPU: 35 PID: 1882 at drivers/video/fbdev/core/fb_info.c:70 framebuffer_release+0x2c/0x40
> > > > > > < snip >
> > > > > > [   44.111289] Call Trace:
> > > > > > [   44.111290]  <TASK>
> > > > > > [   44.111291]  ? show_regs+0x6c/0x80
> > > > > > [   44.111295]  ? __warn+0x8d/0x150
> > > > > > [   44.111298]  ? framebuffer_release+0x2c/0x40
> > > > > > [   44.111300]  ? report_bug+0x182/0x1b0
> > > > > > [   44.111303]  ? handle_bug+0x6e/0xb0
> > > > > > [   44.111306]  ? exc_invalid_op+0x18/0x80
> > > > > > [   44.111308]  ? asm_exc_invalid_op+0x1b/0x20
> > > > > > [   44.111311]  ? framebuffer_release+0x2c/0x40
> > > > > > [   44.111313]  ? hvfb_remove+0x86/0xa0 [hyperv_fb]
> > > > > > [   44.111315]  vmbus_remove+0x24/0x40 [hv_vmbus]
> > > > > > [   44.111323]  device_remove+0x40/0x80
> > > > > > [   44.111325]  device_release_driver_internal+0x20b/0x270
> > > > > > [   44.111327]  ? bus_find_device+0xb3/0xf0
> > > > > >
> > > > >
> > > > > Thanks for pointing this out. Interestingly, I'm not seeing this WARN
> > > > > in my experiments. What base kernel are you testing with? Are you
> > > > > testing on a local VM or in Azure? What exactly are you doing
> > > > > to create the problem? I've been doing unbind of the driver,
> > > > > but maybe you are doing something different.
> > > > >
> > > > > FWIW, there is yet another issue where after doing two unbind/bind
> > > > > cycles of the hyperv_fb driver, there's an error about freeing a
> > > > > non-existent resource. I know what that problem is, and it's in
> > > > > vmbus_drv.c. I'll be submitting a patch for that as soon as I figure out
> > > > > a clean fix.
> > > > >
> > > > > Michael
> > > >
> > > > This is on local Hyper-V. Kernel: 6.14.0-rc1-next-20250205+
> > > > I run below command to reproduce the above error:
> > > > echo "5620e0c7-8062-4dce-aeb7-520c7ef76171" >
> > > /sys/bus/vmbus/devices/5620e0c7-8062-4dce-aeb7-520c7ef76171/driver/unbind
> > > >
> > > > When hvfb_remove is called I can see the refcount for framebuffer is 2 when ,
> > > > I expect it to be 1. After unregistering this framebuffer there is still 1 refcount
> > > > remains, which is the reason for this WARN at the time of framebuffer_release.
> > > >
> > > > I wonder who is registering/using this extra framebuffer. Its not hyperv_drm or
> > > > hyperv_fb IIUC.
> > > >
> > > > - Saurabh
> > >
> > > Here are more details about this WARN:
> > >
> > > Xorg opens `/dev/fb0`, which increases the framebuffer's reference
> > > count, as mentioned above.  As a result, when unbinding the driver,
> > > this WARN is expected, indicating that the framebuffer is still in use.
> > >
> > > I am open to suggestion what could be the correct behavior in this case.
> > > There acan be two possible options:
> > >
> > >  1. Check the framebuffer reference count and prevent the driver from
> > >     unbinding/removal.
> > > OR
> > >
> > >  2. Allow the driver to unbind while issuing this WARN. (Current scenario)
> > >
> >
> > >From looking at things and doing an experiment, I think there's a 3rd
> > option, which gets rid of the of the WARN while still allowing the unbind.
> >
> > The experiment is to boot Linux in a Gen2 Hyper-V guest with both the
> > Hyper-V FB and Hyper-V DRM modules removed. In this case, the
> > generic EFI framebuffer driver (efifb) should get used. With this driver,
> > a program can open /dev/fb0, and while it is open, unbind the efifb
> > driver (which is in /sys/bus/platform/drivers/efi-framebuffer).
> > Interestingly, there's no WARN generated. But when the hyperv_fb
> > driver is loaded and used, the WARN *is* generated, as you observed.
> >
> > So I looked at the code for efifb.  It does the framebuffer_release()
> > call in a function that hyperv_fb doesn't have. Based on the comments
> > in efifb.c, we need a similar function to handle the call to
> > framebuffer_release().  And the efifb driver also does the iounmap()
> > in that same function, which makes we wonder if the hyperv_fb
> > driver should do similarly. It will need a little more analysis to
> > figure that out.
> >
> > You found the bug.  Do you want to work on fixing the hyperv_fb
> > driver? And maybe the Hyper-V DRM driver needs the same fix.
> > I haven't looked. Alternatively, if you are busy, I can work on the fix.
> > Let me know your preference.
> >
> > Michael
> 
> Thanks for your analysis, its a good to know about fbib driver is not having
> this issue. We can take it as a reference.
> 
> At the first look I see efib driver is having a fb_ops.fb_destroy function
> which gets called after put_fb_info (responsible for decrementing the
> ref count). 

Yes, that's exactly what I was thinking.  If some user space program has
/dev/fb0 open, the driver can be unbound and the unbind will succeed.
The user space program will get an error the next time it tries to reference
the open device file descriptor. Presumably the user space program will
close /dev/fb0 at that point, or just terminate with an error, in which case
Linux will close /dev/fb0 as the user space process terminates. In either
case, fb_info sticks around until that happens and causes the refcount to
be decremented to 1, and then the destroy function is called to do
the final cleanup and free the memory for the fb_info structure.

At least that's what I think happens based on the comments in the
efifb driver. :-) But I have not spent time checking all the details.

> Also it uses devm_register_framebuffer which handles the registration
> and unregister of framebuffer more gracefully.
> 
> I will work on this.
> 

Sounds good.  It's in your court.

Michael

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ