linux-kernel - Re: [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAD=FV=Xgw+33pCycHyaMPsk64Qs+oh8e-RtJaM1yn0F27qZRVQ@mail.gmail.com>
Date:   Thu, 10 Dec 2020 17:30:17 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Stephen Boyd <swboyd@...omium.org>
Cc:     Roja Rani Yarubandi <rojay@...eaurora.org>,
        Mark Brown <broonie@...nel.org>,
        Andy Gross <agross@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        linux-spi <linux-spi@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Akash Asthana <akashast@...eaurora.org>,
        msavaliy@....qualcomm.com
Subject: Re: [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi_isr

Hi,

On Thu, Dec 10, 2020 at 5:21 PM Stephen Boyd <swboyd@...omium.org> wrote:
>
> > > I guess I'm not convinced that the hardware is so bad that it cancels
> > > and aborts the sequencer, raises an irq for that, and then raises an irq
> > > for the earlier rx/tx that the sequencer canceled out. Is that
> > > happening? It's called a sequencer because presumably it runs a sequence
> > > of operations like tx, rx, cs change, cancel, and abort. Hopefully it
> > > doesn't run them out of order. If they run at the same time it's fine,
> > > the irq handler will see all of them and throw away reads, etc.
> >
> > Maybe answered by me explaining that I'm worried about the case where
> > "abort" times out (and thus the "done" from the abort is still
> > pending).
> >
> > NOTE: I will also assert that if we send the "abort" then it seems
> > like it has a high likelihood of timing out.  Why do I say that?  In
> > order to even get to sending the "abort", it means:
> >
> > a) The original transfer timed out
> >
> > b) The "cancel" timed out.  As you can see, if the "cancel" doesn't
> > time out we don't even send the "abort"
> >
> > ...so two things timed out, one of which we _just_ sent.  The "abort"
> > feels like a last ditch effort.  Might as well try it, but things were
> > in pretty sorry shape to start with by the time we tried it.
> >
>
> Yeah and so if it comes way later because it timed out then what's the
> point of calling synchronize_irq() again? To make the completion
> variable set when it won't be tested again until it is reinitialized?

Presumably the idea is to try to recover to a somewhat usable state
again?  We're not rebooting the machine so, even though this transfer
failed, we will undoubtedly do another transfer later.  If that
"abort" interrupt comes way later while we're setting up the next
transfer we'll really confuse ourselves.

I guess you could go the route of adding a synchronize_irq() at the
start of the next transfer, but I'd rather add the overhead in the
exceptional case (the timeout) than the normal case.  In the normal
case we don't need to worry about random IRQs from the past transfer
suddenly showing up.

-Doug