[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <51B72382.7060702@oracle.com>
Date: Tue, 11 Jun 2013 09:17:54 -0400
From: konrad wilk <konrad.wilk@...cle.com>
To: Jan Beulich <JBeulich@...e.com>
CC: david.vrabel@...rix.com, roger.pau@...rix.com,
xen-devel <xen-devel@...ts.xen.org>,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] xen/blkback: Check for insane amounts of request on the
ring.
On 6/11/2013 3:42 AM, Jan Beulich wrote:
>>>> On 10.06.13 at 18:43, Konrad Rzeszutek Wilk <konrad.wilk@...cle.com> wrote:
>> On Mon, Jun 10, 2013 at 04:52:35PM +0100, Jan Beulich wrote:
>>>>>> On 07.06.13 at 22:11, Konrad Rzeszutek Wilk <konrad.wilk@...cle.com> wrote:
>>>> On Tue, Jun 04, 2013 at 03:57:06PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>> + /* N.B. 'rp', not 'rc'. */
>>>>> + if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rp)) {
>>>>> + pr_warn(DRV_PFX "Frontend provided bogus ring requests (%d - %d = %d).
>>>> Halting ring processing on dev=%04x\n",
>>>>> + rp, rc, rp - rc, blkif->vbd.pdevice);
>>>> Hm, I seem to be able to get:
>>>>
>>>> [ 189.398095] xen-blkback:Frontend provided bogus ring requests (125 - 115 =
>>>> 10). Halting ring processing on dev=800011
>>>> or:
>>>> [ 478.558699] xen-blkback:Frontend provided bogus ring requests (95 - 94 =
>>>> 1). Halting ring processing on dev=800011
>>>>
>>>> Which is clearly wrong. Piggybacking on the rsp_prod_pvt does not seem to
>>>> cut it.
>>> We see that too, but not very frequently. One thing is that
>>> rsp_prod_pvt doesn't get printed along with rc and rp, thus
>>> making it not immediately obvious how this can be off in any way.
>>>
>>> Among the instance there are cases where the printed
>>> difference is 32, which makes me wonder whether part of the
>>> problem is the >= in the macro (we may want > here).
>>>
>>> And then we might have been living with some sort of issue in the
>>> past, because the existing use of the macro just causes the loop
>>> to be exited, with it getting re-entered subsequently (i.e. at worst
>>> causing performance issues).
>> My observation was that the rsp_prod_pvt was lagging behind b/c the
>> READ requests weren't completed. In other words, the processing
>> of the ring was stalled b/c 'make_response' hadn't been called yet.
>> Which meant that rsp_prod was not updated to rsp_prod_pvt (backend
>> does not care about that value, only frontend does).
> I don't buy this: rsp_prod is being updated by the backend just for
> the frontend's sake, so this value really doesn't need looking at (or
> else we'd become susceptible to the guest maliciously writing that
> field).
>
> rsp_prod_pvt, otoh, is never behind rsp_prod, and if the guest
> produces requests that don't have matching space for responses,
> the guest is doing something bogus (and perhaps malicious).
I believe this is what I saw with the rsp_prod_pvt added in the printk.
Unfortunately I did not save the logs.
>
>> Going back to the rc an rp check solves the immediate 'insane ring
>> check'.
> Consequently, while this check is better than none at all, I think it
> is still too lax, and we really want to check against the produced
> responses. Just that other than for the rc check using >=, we'd
> need > for the rp one.
>
> But first of all let me see if I can get the original broken check to
> trigger wrongly here (so far only our stage testing caught these),
> and look at by how much rsp_prod_pvt really lags.
OK.
>
> Jan
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists