Message-ID:
 <BN7PR02MB4148157E5307E306FD9D093ED4D92@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Wed, 19 Mar 2025 20:29:12 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Helge Deller <deller@....de>, "linux-fbdev@...r.kernel.org"
	<linux-fbdev@...r.kernel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
Subject: RE: fbdev deferred I/O broken in some scenarios

From: Helge Deller <deller@....de> Sent: Tuesday, March 18, 2025 1:16 AM
> Hi Michael,
> 
> On 3/18/25 03:05, Michael Kelley wrote:
> > I've been trying to get mmap() working with the hyperv_fb.c fbdev driver, which
> > is for Linux guests running on Microsoft's Hyper-V hypervisor. The hyperv_fb driver
> > uses fbdev deferred I/O for performance reasons. But it looks to me like fbdev
> > deferred I/O is fundamentally broken when the underlying framebuffer memory
> > is allocated from kernel memory (alloc_pages or dma_alloc_coherent).
> >
> > The hyperv_fb.c driver may allocate the framebuffer memory in several ways,
> > depending on the size of the framebuffer specified by the Hyper-V host and the VM
> > "Generation".  For a Generation 2 VM, the framebuffer memory is allocated by the
> > Hyper-V host and is assigned to guest MMIO space. The hyperv_fb driver does a
> > vmalloc() allocation for deferred I/O to work against. This combination handles mmap()
> > of /dev/fb<n> correctly and the performance benefits of deferred I/O are substantial.
> >
> > But for a Generation 1 VM, the hyperv_fb driver allocates the framebuffer memory in
> > contiguous guest physical memory using alloc_pages() or dma_alloc_coherent(), and
> > informs the Hyper-V host of the location. In this case, mmap() with deferred I/O does
> > not work. The mmap() succeeds, and user space updates to the mmap'ed memory are
> > correctly reflected to the framebuffer. But when the user space program does munmap()
> > or terminates, the Linux kernel free lists become scrambled and the kernel eventually
> > panics. The problem is that when munmap() is done, the PTEs in the VMA are cleaned
> > up, and the corresponding struct page refcounts are decremented. If the refcount goes
> > to zero (which it typically will), the page is immediately freed. In this way, some or all
> > of the framebuffer memory gets erroneously freed. From what I see, the VMA should
> > be marked VM_PFNMAP when allocated kernel memory is being used as the
> > framebuffer with deferred I/O, but that's not happening. The handling of deferred I/O
> > page faults would also need updating to make this work.
> >
> > The fbdev deferred I/O support was originally added to the hyperv_fb driver in the
> > 5.6 kernel, and based on my recent experiments, it has never worked correctly when
> > the framebuffer is allocated from kernel memory. fbdev deferred I/O support for using
> > kernel memory as the framebuffer was originally added in commit 37b4837959cb9
> > back in 2008 in Linux 2.6.29. But I don't see how it ever worked properly, unless
> > changes in generic memory management somehow broke it in the intervening years.
> >
> > I think I know how to fix all this. But before working on a patch, I wanted to check
> > with the fbdev community to see if this might be a known issue and whether there
> > is any additional insight someone might offer. Thanks for any comments or help.
> 
> I haven't heard of any major deferred-i/o issues since I've jumped into fbdev
> maintenance. But you might be right, as I haven't looked much into it yet and
> there are just a few drivers using it.
> 

Thanks for the input. In the fbdev directory, there are 9 drivers using deferred I/O.
Of those, 6 use vmalloc() to allocate the framebuffer, and that path works just fine.
The other 3 use alloc_pages(), dma_alloc_coherent(), or __get_free_pages(), all of
which manifest the underlying problem when munmap()'ed.  Those 3 drivers are:

* hyperv_fb.c, which I'm working with
* sh_mobile_lcdcfb.c
* ssd1307fb.c

Do you have any ownership or status information about the last two?  Neither is
listed in MAINTAINERS, so maybe they are for old devices and now effectively
abandoned. Before I make code changes to fb_defio.c, I wanted to make sure
I have all the context I can get, such as why this problem hasn't surfaced outside
of the hyperv_fb.c driver. Part of the reason is that evidently the vmalloc() based
approach is more common, and it works fine. With only two other drivers using
contiguous kernel memory allocations, and with those two perhaps being for old
and now mostly unused devices, that might explain things. The hyperv_fb.c
config that uses contiguous kernel memory is also somewhat rare, and I only
stumbled across it when debugging other problems.

In any case, I'm pretty convinced this is an issue with fb_defio.c and not
something specific to hyperv_fb.c, so I'll get to work on a fix and see how it
goes from there.
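
To make the failure mode from the quoted report concrete, here is a toy
user-space model in C. It is only a sketch, not kernel code: struct toy_page,
the toy_* helpers, and the base_refs parameter are all hypothetical names
invented for illustration, and the refcount rules are simplified assumptions
(mapping takes one reference, unmapping drops one, a PFN-style mapping touches
neither). It shows why a page whose only reference comes from the mapping is
freed on munmap(), and why a VM_PFNMAP-like mapping avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct page: a refcount and a freed flag. */
struct toy_page {
    int  refcount;
    bool freed;
};

/* How the toy mapping treats the page (loosely: with or without VM_PFNMAP). */
enum toy_map_kind { TOY_MAP_REFCOUNTED, TOY_MAP_PFN };

/* Drop one reference; the page is "freed" when the count reaches zero. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true;
}

/* Assumption: a refcounted mapping takes a reference, a PFN mapping does not. */
static void toy_map(struct toy_page *p, enum toy_map_kind kind)
{
    if (kind == TOY_MAP_REFCOUNTED)
        p->refcount++;
}

/* Assumption: tearing down a refcounted mapping drops that reference. */
static void toy_unmap(struct toy_page *p, enum toy_map_kind kind)
{
    if (kind == TOY_MAP_REFCOUNTED)
        toy_put_page(p);
}

/*
 * Run one map/unmap cycle on a framebuffer page and report whether the page
 * is still allocated afterwards. base_refs is a hypothetical knob: the number
 * of references the page holds independently of the user mapping.
 */
static bool toy_fb_survives_munmap(enum toy_map_kind kind, int base_refs)
{
    struct toy_page fb = { .refcount = base_refs, .freed = false };

    toy_map(&fb, kind);
    toy_unmap(&fb, kind);
    return !fb.freed;
}
```

In this model, toy_fb_survives_munmap(TOY_MAP_REFCOUNTED, 0) reports that the
page was freed while the driver still treats it as framebuffer memory, whereas
toy_fb_survives_munmap(TOY_MAP_PFN, 0) leaves it alone. The real accounting in
the mm code is more involved than this.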

Michael
