linux-kernel - Re: [PATCH v2] pstore: Don't use semaphores in always-atomic-context code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <202202231128.E7445769AD@keescook>
Date:   Wed, 23 Feb 2022 11:29:43 -0800
From:   Kees Cook <keescook@...omium.org>
To:     Jann Horn <jannh@...gle.com>
Cc:     Anton Vorontsov <anton@...msg.org>,
        Colin Cross <ccross@...roid.com>,
        Tony Luck <tony.luck@...el.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH v2] pstore: Don't use semaphores in always-atomic-context
 code

On Wed, Feb 23, 2022 at 06:50:52PM +0100, Jann Horn wrote:
> On Wed, Feb 23, 2022 at 8:50 AM Kees Cook <keescook@...omium.org> wrote:
> > On February 18, 2022 10:19:50 AM PST, Jann Horn <jannh@...gle.com> wrote:
> > >pstore_dump() is *always* invoked in atomic context (nowadays in an RCU
> > >read-side critical section, before that under a spinlock).
> > >It doesn't make sense to try to use semaphores here.
> >
> > Ah, very nice. Thanks for the analysis!
> >
> > >[...]
> > >-static bool pstore_cannot_wait(enum kmsg_dump_reason reason)
> > >+bool pstore_cannot_block_path(enum kmsg_dump_reason reason)
> >
> > Why the rename,
> 
> That's one of the parts of commit ea84b580b955 that I included in the
> revert. "wait" in the name is not accurate, since "wait" in the kernel
> normally refers to scheduling away until some condition is fulfilled.
> (Though I guess "block" also isn't the best name either... idk.) The
> place where we might want to have different behavior depending on
> whether we're handling a kernel crash are spinlocks; during a kernel
> crash, we shouldn't deadlock on them, but otherwise, AFAIK it's fine
> to block on them.

Gotcha. I'm find to avoid "wait"; I was just curious why it was
changing, but I see now.

> 
> > extern, and EXPORT? This appears to still only have the same single caller?
> 
> Also part of the revert. I figured it might make sense to also revert
> that part because:
> 
> With this commit applied, the EFI code will always take the "nonblock"
> path for now, but that's kinda suboptimal; on some platforms the
> "blocking" path uses a semaphore, so we really can't take that, but on
> x86 it uses a spinlock, which we could block on if we're not oopsing.
> We could avoid needlessly losing non-crash dmesg dumps there; I don't
> know whether we care about that though.
> 
> So I figured that we might want to start adding new callers to this
> later on. But if you want, I'll remove that part of the revert and
> resend?

Yeah, let's just keep this static -- there's no reason to export it.

> 
> > > [...]
> > >-                      pr_err("dump skipped in %s path: may corrupt error record\n",
> > >-                              in_nmi() ? "NMI" : why);
> > >-                      return;
> > >-              }
> > >-              if (down_interruptible(&psinfo->buf_lock)) {
> > >-                      pr_err("could not grab semaphore?!\n");
> > >+      if (pstore_cannot_block_path(reason)) {
> > >+              if (!spin_trylock_irqsave(&psinfo->buf_lock, flags)) {
> > >+                      pr_err("dump skipped in %s path because of concurrent dump\n"
> > >+                                     , in_nmi() ? "NMI" : why);
> >
> > The pr_err had the comma following the format string moved,
> 
> Ah, whoops, that was also part of the revert, but I guess I should
> have left that part out...
> 
> > and the note about corruption removed. Is that no longer accurate?
> 
> There should be no more corruption since commit 959217c84c27 ("pstore:
> Actually give up during locking failure") - if we're bailing out, we
> can't be causing corruption, I believe?

Yeah, agreed. String content change is fine, the weird leading comma I'd
like to do without. :)

Thanks!

-- 
Kees Cook