lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7eaf4c10-292e-18d7-e8ce-3a6b72122381@redhat.com>
Date:   Fri, 28 May 2021 10:58:01 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Dan Williams <dan.j.williams@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>
Cc:     Greg KH <gregkh@...uxfoundation.org>,
        Arnd Bergmann <arnd@...db.de>, Ingo Molnar <mingo@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Matthew Wilcox <willy@...radead.org>,
        Russell King <linux@....linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Linux PCI <linux-pci@...r.kernel.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Krzysztof WilczyƄski <kw@...ux.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the
 region

On 27.05.21 23:30, Dan Williams wrote:
> On Thu, May 27, 2021 at 1:58 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
>>
>> [+cc Daniel, Krzysztof, Jason, Christoph, linux-pci]
>>
>> On Thu, May 21, 2020 at 02:06:17PM -0700, Dan Williams wrote:
>>> Close the hole of holding a mapping over kernel driver takeover event of
>>> a given address range.
>>>
>>> Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
>>> introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
>>> kernel against scenarios where a /dev/mem user tramples memory that a
>>> kernel driver owns. However, this protection only prevents *new* read(),
>>> write() and mmap() requests. Established mappings prior to the driver
>>> calling request_mem_region() are left alone.
>>>
>>> Especially with persistent memory, and the core kernel metadata that is
>>> stored there, there are plentiful scenarios for a /dev/mem user to
>>> violate the expectations of the driver and cause amplified damage.
>>>
>>> Teach request_mem_region() to find and shoot down active /dev/mem
>>> mappings that it believes it has successfully claimed for the exclusive
>>> use of the driver. Effectively a driver call to request_mem_region()
>>> becomes a hole-punch on the /dev/mem device.
>>
>> This idea of hole-punching /dev/mem has since been extended to PCI
>> BARs via [1].
>>
>> Correct me if I'm wrong: I think this means that if a user process has
>> mmapped a PCI BAR via sysfs, and a kernel driver subsequently requests
>> that region via pci_request_region() or similar, we punch holes in the
>> the user process mmap.  The driver might be happy, but my guess is the
>> user starts seeing segmentation violations for no obvious reason and
>> is not happy.
>>
>> Apart from the user process issue, the implementation of [1] is
>> problematic for PCI because the mmappable sysfs attributes now depend
>> on iomem_init_inode(), an fs_initcall, which means they can't be
>> static attributes, which ultimately leads to races in creating them.
> 
> See the comments in iomem_get_mapping(), and revoke_iomem():
> 
>          /*
>           * Check that the initialization has completed. Losing the race
>           * is ok because it means drivers are claiming resources before
>           * the fs_initcall level of init and prevent iomem_get_mapping users
>           * from establishing mappings.
>           */
> 
> ...the observation being that it is ok for the revocation inode to
> come on later in the boot process because userspace won't be able to
> use the fs yet. So any missed calls to revoke_iomem() would fall back
> to userspace just seeing the resource busy in the first instance. I.e.
> through the normal devmem_is_allowed() exclusion.
> 
>>
>> So I'm raising the question of whether this hole-punch is the right
>> strategy.
>>
>>    - Prior to revoke_iomem(), __request_region() was very
>>      self-contained and really only depended on the resource tree.  Now
>>      it depends on a lot of higher-level MM machinery to shoot down
>>      mappings of other tasks.  This adds quite a bit of complexity and
>>      some new ordering constraints.
>>
>>    - Punching holes in the address space of an existing process seems
>>      unfriendly.  Maybe the driver's __request_region() should fail
>>      instead, since the driver should be prepared to handle failure
>>      there anyway.
> 
> It's prepared to handle failure, but in this case it is dealing with a
> root user of 2 minds.
> 
>>
>>    - [2] suggests that the hole punch protects drivers from /dev/mem
>>      writers, especially with persistent memory.  I'm not really
>>      convinced.  The hole punch does nothing to prevent a user process
>>      from mmapping and corrupting something before the driver loads.
> 
> The motivation for this was a case that was swapping between /dev/mem
> access and /dev/pmem0 access and they forgot to stop using /dev/mem
> when they switched to /dev/pmem0. If root wants to use /dev/mem it can
> use it, if root wants to stop the driver from loading it can set
> mopdrobe policy or manually unbind, and if root asks the kernel to
> load the driver while it is actively using /dev/mem something has to
> give. Given root has other options to stop a driver the decision to
> revoke userspace access when root messes up and causes a collision
> seems prudent to me.
> 

Is there a real use case for mapping pmem via /dev/mem or could we just 
prohibit the access to these areas completely?

What's the use case for "swapping between /dev/mem access and /dev/pmem0 
access" ?

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ