linux-kernel - Re: [PATCH v2 0/6] mm/devm_memremap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPcyv4hujnGHtTwE78gvmEoY3Y6nLsd1AhJfeKMwHrxLvStf9w@mail.gmail.com>
Date:   Thu, 16 May 2019 14:51:40 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Jane Chu <jane.chu@...cle.com>
Cc:     linux-nvdimm <linux-nvdimm@...ts.01.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Jérôme Glisse <jglisse@...hat.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race

On Thu, May 16, 2019 at 9:45 AM Jane Chu <jane.chu@...cle.com> wrote:
>
> Hi,
>
> I'm able to reproduce the panic below by running two sets of ndctl
> commands that actually serve legitimate purpose in parallel (unlike
> the brute force experiment earlier), each set in a indefinite loop.
> This time it takes about an hour to panic.  But I gather the cause
> is probably the same: I've overlapped ndctl commands on the same
> region.
>
> Could we add a check in nd_ioctl(), such that if there is
> an ongoing ndctl command on a region, subsequent ndctl request
> will fail immediately with something to the effect of EAGAIN?
> The rationale being that kernel should protect itself against
> user mistakes.

We do already have locking in the driver to prevent configuration
collisions. The problem looks to be broken assumptions about running
the device unregistration path in a separate thread outside the lock.
I suspect it may be incorrect assumptions about the userspace
visibility of the device relative to teardown actions. To be clear
this isn't the nd_ioctl() path this is the sysfs path.


> Also, sensing the subject fix is for a different problem, and has been
> verified, I'm happy to see it in upstream, so we have a better
> code base to digger deeper in terms of how the destructive ndctl
> commands interacts to typical mission critical applications, include
> but not limited to rdma.

Right, the crash signature you are seeing looks unrelated to the issue
being address in these patches which is device-teardown racing active
page pins. I'll start the investigation on the crash signature, but
again I don't think it reads on this fix series.