[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez0UJDBzoaB4=c0Uju6L-eZvhWMdnzAp8N3QfeERbzYv2w@mail.gmail.com>
Date: Wed, 23 Feb 2022 18:50:52 +0100
From: Jann Horn <jannh@...gle.com>
To: Kees Cook <keescook@...omium.org>
Cc: Anton Vorontsov <anton@...msg.org>,
Colin Cross <ccross@...roid.com>,
Tony Luck <tony.luck@...el.com>,
Ard Biesheuvel <ardb@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v2] pstore: Don't use semaphores in always-atomic-context code
On Wed, Feb 23, 2022 at 8:50 AM Kees Cook <keescook@...omium.org> wrote:
> On February 18, 2022 10:19:50 AM PST, Jann Horn <jannh@...gle.com> wrote:
> >pstore_dump() is *always* invoked in atomic context (nowadays in an RCU
> >read-side critical section, before that under a spinlock).
> >It doesn't make sense to try to use semaphores here.
>
> Ah, very nice. Thanks for the analysis!
>
> >[...]
> >-static bool pstore_cannot_wait(enum kmsg_dump_reason reason)
> >+bool pstore_cannot_block_path(enum kmsg_dump_reason reason)
>
> Why the rename,
That's one of the parts of commit ea84b580b955 that I included in the
revert. "wait" in the name is not accurate, since "wait" in the kernel
normally refers to scheduling away until some condition is fulfilled.
(Though I guess "block" also isn't the best name either... idk.) The
place where we might want to have different behavior depending on
whether we're handling a kernel crash are spinlocks; during a kernel
crash, we shouldn't deadlock on them, but otherwise, AFAIK it's fine
to block on them.
> extern, and EXPORT? This appears to still only have the same single caller?
Also part of the revert. I figured it might make sense to also revert
that part because:
With this commit applied, the EFI code will always take the "nonblock"
path for now, but that's kinda suboptimal; on some platforms the
"blocking" path uses a semaphore, so we really can't take that, but on
x86 it uses a spinlock, which we could block on if we're not oopsing.
We could avoid needlessly losing non-crash dmesg dumps there; I don't
know whether we care about that though.
So I figured that we might want to start adding new callers to this
later on. But if you want, I'll remove that part of the revert and
resend?
> > [...]
> >- pr_err("dump skipped in %s path: may corrupt error record\n",
> >- in_nmi() ? "NMI" : why);
> >- return;
> >- }
> >- if (down_interruptible(&psinfo->buf_lock)) {
> >- pr_err("could not grab semaphore?!\n");
> >+ if (pstore_cannot_block_path(reason)) {
> >+ if (!spin_trylock_irqsave(&psinfo->buf_lock, flags)) {
> >+ pr_err("dump skipped in %s path because of concurrent dump\n"
> >+ , in_nmi() ? "NMI" : why);
>
> The pr_err had the comma following the format string moved,
Ah, whoops, that was also part of the revert, but I guess I should
have left that part out...
> and the note about corruption removed. Is that no longer accurate?
There should be no more corruption since commit 959217c84c27 ("pstore:
Actually give up during locking failure") - if we're bailing out, we
can't be causing corruption, I believe?
Powered by blists - more mailing lists