linux-kernel - Re: [PATCH] soc: qcom: rpmh: Avoid accessing freed memory from batch API


Message-ID: <154697354993.15366.3623820425365137828@swboyd.mtv.corp.google.com>
Date:   Tue, 08 Jan 2019 10:52:29 -0800
From:   Stephen Boyd <swboyd@...omium.org>
To:     Evan Green <evgreen@...omium.org>, Lina Iyer <ilina@...eaurora.org>
Cc:     Andy Gross <andy.gross@...aro.org>, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        "Raju P.L.S.S.S.N" <rplsssn@...eaurora.org>,
        Matthias Kaehlcke <mka@...omium.org>
Subject: Re: [PATCH] soc: qcom: rpmh: Avoid accessing freed memory from batch API

Quoting Evan Green (2019-01-08 10:30:04)
> On Tue, Jan 8, 2019 at 9:49 AM Lina Iyer <ilina@...eaurora.org> wrote:
> >
> > diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> > index c7beb6841289..0303a2971d4a 100644
> > --- a/drivers/soc/qcom/rpmh.c
> > +++ b/drivers/soc/qcom/rpmh.c
> > @@ -80,6 +80,7 @@ void rpmh_tx_done(const struct tcs_request *msg, int r)
> >         struct rpmh_request *rpm_msg = container_of(msg, struct rpmh_request,
> >                                                     msg);
> >         struct completion *compl = rpm_msg->completion;
> > +       bool free = rpm_msg->needs_free;
> >
> >         rpm_msg->err = r;
> >
> > @@ -94,7 +95,7 @@ void rpmh_tx_done(const struct tcs_request *msg, int r)
> >         complete(compl);
> >
> >  exit:
> > -       if (rpm_msg->needs_free)
> > +       if (free)
> >                 kfree(rpm_msg);
> >  }
> >
> 
> Hi Lina,
> I think that's a worthy fix, too, and is needed to solve the issue you describe.

Looks like we need both fixes so I can combine them together.

> 
> But I think Stephen's fix is still needed. In the rpmh_write_batch
> scenario, we queue N things into rpmh, but set the same completion for
> all of them. If only the first one completes but not the others, then
> the loop in rpmh_write_batch will call wait_for_completion_timeout N
> times on the same completion, and then go on to free all N requests,
> even though only the first one is actually done and out of the system
> (well, almost out of the system, with the bug you noticed above).
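
To make the scenario above concrete, here is a rough userspace sketch of
a counting completion shared by several requests. Everything in it
(toy_completion, toy_complete, toy_wait_timeout, count_timeouts) is a
hypothetical stand-in, not the kernel's struct completion or the rpmh
code; a "timeout" is modeled simply as finding nothing to consume.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a counting completion; NOT the kernel's struct completion. */
struct toy_completion {
	unsigned int done;
};

/* Signal one finished request. */
static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Returns true if a completion was consumed, false to model a timeout. */
static bool toy_wait_timeout(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* Issue n waits on one shared completion; return how many timed out. */
static unsigned int count_timeouts(struct toy_completion *c, unsigned int n)
{
	unsigned int timeouts = 0;

	for (unsigned int i = 0; i < n; i++)
		if (!toy_wait_timeout(c))
			timeouts++;
	return timeouts;
}
```

With three requests queued and only one toy_complete() call, two of the
three waits come back empty, so a caller that frees all three requests
afterwards would be freeing two that are still in flight.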

This code looks an awful lot like kref_put() with a release function
that's a kfree() of the rpm message. Would it simplify the code
somewhat if we added a refcount (that probably only counts up to 2)
and then properly refcounted the messages? We would have to allocate a
bunch of messages for the batch writing API, but I'm not sure that's a
big deal either.
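
A minimal userspace sketch of that refcounting idea, assuming C11
atomics in place of struct kref. The names (struct msg, msg_alloc,
msg_get, msg_put, released) are made up for illustration and are not
part of rpmh; in the driver the count would presumably be a kref with a
release function that does the kfree().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted rpmh_request. */
struct msg {
	atomic_int refs;	/* plays the role of struct kref */
	int err;
};

static int released;	/* counts frees, for demonstration only */

/* Allocate a message holding one reference (the submitter's). */
static struct msg *msg_alloc(void)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refs, 1);
	return m;
}

/* Take an extra reference, e.g. on behalf of the tx-done path. */
static void msg_get(struct msg *m)
{
	atomic_fetch_add(&m->refs, 1);
}

/* Drop a reference; returns true if this call freed the message. */
static bool msg_put(struct msg *m)
{
	int old = atomic_fetch_sub(&m->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(m);
		released++;
		return true;
	}
	return false;
}
```

The waiter and the completion path would each hold one reference, so
whichever side finishes last frees the message and neither touches it
after its own put.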

> 
> We considered having just one completion on the last transfer, but
> then if there's an error part way through you have no way of waiting
> on the transfers that did get submitted. So I think N completions are
> needed.

For that, I'd like to make the batch API know more about the TCS it's
filling so it can know how to unwind the state if something fails. From
what I can tell this function is implemented at the wrong abstraction
level. It should be a lower level function so it can manage the queue
and push multiple messages through.