Message-ID: <YD4I+VPr3UNt063H@dhcp22.suse.cz>
Date:   Tue, 2 Mar 2021 10:44:25 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Shakeel Butt <shakeelb@...gle.com>,
        syzbot <syzbot+506c8a2a115201881d45@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>
Subject: Re: possible deadlock in sk_clone_lock

On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@...e.com> wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude soft
> >>> irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >>         CPU0                    CPU1
> >>         ----                    ----
> >>    lock(hugetlb_lock);
> >>                                 local_irq_disable();
> >>                                 lock(slock-AF_INET);
> >>                                 lock(hugetlb_lock);
> >>    <Interrupt>
> >>      lock(slock-AF_INET);
> >>
> >> So, this deadlock scenario is very much possible.
> > 
> > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > side too much. We do not need to be freeing from there. All it takes is
> > to get a dependency chain over a common lock held here. Thanks for
> > bearing with me.
> > 
> > Let's see whether we can make hugetlb_lock irq safe.
> 
> I may be confused, but it seems like we have a general problem with
> calling free_huge_page (as a result of put_page) with interrupts
> disabled.
> 
> Consider the current free_huge_page code.  Today, we drop the lock
> when processing gigantic pages because we may need to block on a mutex
> in cma code.  If our caller has disabled interrupts, then it doesn't
> matter whether the hugetlb lock is irq safe: when we drop it, interrupts
> will still be disabled and we can not block.  Right?  If correct, then
> making hugetlb_lock irq safe would not help.
> 
> Again, I may be missing something.
> 
> Note that we also are considering doing more with the hugetlb lock
> dropped in this path in the 'free vmemmap of hugetlb pages' series.
> 
> Since we need to do some work that could block in this path, it seems
> like we really need to use a workqueue.  It is too bad that there is not
> an interface to identify all the cases where interrupts are disabled.

Wouldn't something like this help? It is quite ugly, but it is simple
enough and backportable while we come up with a more rigorous
solution. What do you think?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..c9a8b39f678d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
 void free_huge_page(struct page *page)
 {
 	/*
-	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+	 * Defer freeing if in non-task context or when put_page is called
+	 * with IRQs disabled (e.g. via the TCP slock dependency chain) to
+	 * avoid hugetlb_lock deadlock.
 	 */
-	if (!in_task()) {
+	if (!in_task() || irqs_disabled()) {
 		/*
 		 * Only call schedule_work() if hpage_freelist is previously
 		 * empty. Otherwise, schedule_work() had been called but the
-- 
Michal Hocko
SUSE Labs
