lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgptprCsM9Vv7wvBF6q23rR5WA94pBGD5kfS2sPwgNVyA@mail.gmail.com>
Date:   Thu, 23 Jul 2020 17:46:59 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Oleg Nesterov <oleg@...hat.com>, Michal Hocko <mhocko@...nel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page

On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins <hughd@...gle.com> wrote:
>
> I say that for full disclosure, so you don't wrack your brains
> too much, when it may still turn out to be a screwup on my part.

Sounds unlikely.

If that patch applied even reasonably closely, I don't see how you'd
see a list corruption that wasn't due to the patch.

You'd have had to use the wrong spinlock by mistake due to munging it,
or something crazy like that.

The main list-handling change is

 (a) open-coding of that finish_wait()

 (b) slightly different heuristics for removal in the wakeup function

where (a) was because my original version of finishing the wait needed
to do that return code checking.

So a normal "finish_wait()" just does

        list_del_init(&wait->entry);

where-as my open-coded one replaced that with

        if (!list_empty(&wait->entry)) {
                list_del(&wait->entry);
                ret = -EINTR;
        }

and apart from that "set return to -EINTR because nobody woke us up",
it also uses just a regular "list_del()" rather than a
"list_del_init()". That causes the next/prev field to be poisoned
rather than re-initialized. But that second change was because the
list entry is on the stack, and we're not touching it any more and are
about to return, so I removed the "init" part.

Anyway, (a) really looks harmless. Unless the poisoning now triggers
some racy debug test that had been hidden by the "init". Hmm.

In contrast, (b) means that the likely access patterns of irqs
removing the wait entry from the list might be very different from
before. The old "autoremove" semantics would only remove the entry
from the list when try_to_wake_up() actually woke things up. Now, a
successful bit state _always_ removes it, which was kind of the point.
But it might cause very different list handling patterns.

All the actual list handling looks "obviously safe" because it's
protected by the spinlock, though...

If you do get oopses with the new patch too, try to send me a copy,
and maybe I'll stare at exactly where it happens register contents and
go "aah".

        Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ