Message-ID: <CANsGZ6YBf+JCPYa_QOhO-uauebK5HVmacaGCQmvNSsws3-ca-g@mail.gmail.com>
Date:   Thu, 23 Jul 2020 20:45:34 -0700
From:   Hugh Dickins <hughd@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Oleg Nesterov <oleg@...hat.com>, Michal Hocko <mhocko@...nel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page

On Thu, Jul 23, 2020 at 5:47 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins <hughd@...gle.com> wrote:
> >
> > I say that for full disclosure, so you don't wrack your brains
> > too much, when it may still turn out to be a screwup on my part.
>
> Sounds unlikely.
>
> If that patch applied even reasonably closely, I don't see how you'd
> see a list corruption that wasn't due to the patch.
>
> You'd have had to use the wrong spinlock by mistake due to munging it,
> or something crazy like that.
>
> The main list-handling change is
>
>  (a) open-coding of that finish_wait()
>
>  (b) slightly different heuristics for removal in the wakeup function
>
> where (a) was because my original version of finishing the wait needed
> to do that return code checking.
>
> So a normal "finish_wait()" just does
>
>         list_del_init(&wait->entry);
>
> where-as my open-coded one replaced that with
>
>         if (!list_empty(&wait->entry)) {
>                 list_del(&wait->entry);
>                 ret = -EINTR;
>         }
>
> and apart from that "set return to -EINTR because nobody woke us up",
> it also uses just a regular "list_del()" rather than a
> "list_del_init()". That causes the next/prev field to be poisoned
> rather than re-initialized. But that second change was because the
> list entry is on the stack, and we're not touching it any more and are
> about to return, so I removed the "init" part.
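>
> Roughly, the difference between the two list helpers (a sketch of the
> include/linux/list.h behavior, not the full definitions):
>
>         static inline void list_del(struct list_head *entry)
>         {
>                 __list_del_entry(entry);
>                 /* poison: any later use of the entry will fault */
>                 entry->next = LIST_POISON1;
>                 entry->prev = LIST_POISON2;
>         }
>
>         static inline void list_del_init(struct list_head *entry)
>         {
>                 __list_del_entry(entry);
>                 /* re-initialize: the entry looks empty, not poisoned */
>                 INIT_LIST_HEAD(entry);
>         }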
>
> Anyway, (a) really looks harmless. Unless the poisoning now triggers
> some racy debug test that had been hidden by the "init". Hmm.
>
> In contrast, (b) means that the likely access patterns of irqs
> removing the wait entry from the list might be very different from
> before. The old "autoremove" semantics would only remove the entry
> from the list when try_to_wake_up() actually woke things up. Now, a
> successful bit state _always_ removes it, which was kind of the point.
> But it might cause very different list handling patterns.
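>
> For reference, the old "autoremove" wakeup helper is roughly this
> (sketch of kernel/sched/wait.c; it only dequeues on an actual wakeup):
>
>         int autoremove_wake_function(struct wait_queue_entry *wq_entry,
>                                      unsigned mode, int sync, void *key)
>         {
>                 int ret = default_wake_function(wq_entry, mode, sync, key);
>
>                 /* Only remove the waiter if it was really woken up. */
>                 if (ret)
>                         list_del_init(&wq_entry->entry);
>
>                 return ret;
>         }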
>
> All the actual list handling looks "obviously safe" because it's
> protected by the spinlock, though...
>
> If you do get oopses with the new patch too, try to send me a copy,
> and maybe I'll stare at exactly where it happens and the register contents and
> go "aah".

This new version is doing much better: many hours to go, but all
machines have got beyond the danger point where yesterday's version
was crashing - phew!

Hugh
