linux-kernel - Re: [PATCH 2/5] usb: gadget: f_midi: added spinlock on transmit function

Message-ID: <56DEF267.5090803@felipetonello.com>
Date:	Tue, 8 Mar 2016 15:40:23 +0000
From:	Felipe Ferreri Tonello <eu@...ipetonello.com>
To:	Felipe Balbi <balbi@...nel.org>, linux-usb@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org,
	Michal Nazarewicz <mina86@...a86.com>,
	Clemens Ladisch <clemens@...isch.de>
Subject: Re: [PATCH 2/5] usb: gadget: f_midi: added spinlock on transmit
 function

Hi Balbi,

On 08/03/16 14:01, Felipe Balbi wrote:
> 
> Hi,
> 
> Felipe Ferreri Tonello <eu@...ipetonello.com> writes:
>>>>>>>> Since f_midi_transmit is called by both the ALSA and USB
>>>>>>>> frameworks, the two calls can race. This is bad because
>>>>>>>> f_midi_transmit, as implemented, can't handle concurrent calls:
>>>>>>>> the usb request fifo looks at the next element and enqueues the
>>>>>>>> request only if it has data to process, otherwise it re-uses it.
>>>>>>>> If both (ALSA and USB) frameworks call this function at the same
>>>>>>>> time, kfifo_seek() will return the same usb_request, which will
>>>>>>>> cause a race condition.
>>>>>>>>
>>>>>>>> To solve this problem a synchronization mechanism is necessary. A
>>>>>>>> spinlock is used in this case, since f_midi_transmit is also
>>>>>>>> called by the usb_request->complete callback in interrupt context.
>>>>>>>>
>>>>>>>> In my benchmarks, spinlocks were more efficient than scheduling
>>>>>>>> the f_midi_transmit tasklet in process context and using a mutex
>>>>>>>> to synchronize. It also performs better than the previous
>>>>>>>> implementation, which allocated a usb_request for every new
>>>>>>>> transmit.
>>>>>>>
>>>>>>> behaves better in what way ? Also, previous implementation would not
>>>>>>> suffer from this concurrency problem, right ?
>>>>>>
>>>>>> The spin lock is faster than allocating usb requests all the time,
>>>>>> even if the udc uses DMA for it.
>>>>>
>>>>> did you measure ? Is the extra speed really necessary ? How did you
>>>>> benchmark this ?
>>>>
>>>> Yes I did measure and it was not that significant. This is not about
>>>> speed. There was a bug in that approach that I already explained on
>>>
>>> you have very confusing statements. When I mentioned that previous code
>>> wouldn't have the need for the spinlock you replied that spinlock was
>>> faster.
>>>
>>> When I asked you about benchmarks you reply saying it's not about the
>>> speed.
>>>
>>> Make up your mind dude. What are you trying to achieve ?
>>>
>>>> that patch, which was approved and applied BTW.
>>>
>>> patches can be reverted if we realise we're better off without
>>> them. Don't get cocky, please.
>>
>> Yes, I am aware of that, but I honestly think that is the wrong way of
>> dealing with this.
>>
>> ?? I don't get why I am giving this impression.
> 
> re-read your emails. The gist goes like this:
> 
> . Send patch
> . Got comments
> . Well, whatever, you can just ignore if you don't agree

This is one of the problems with email. It can give the wrong impression
and feelings. :)

That was not what I meant at all. I mean that for real, not in a
childish manner. I'm sorry if I gave you that impression.

> 
>>>> Any way, this spinlock should've been there since that patch but I
>>>> couldn't really trigger this problem without a stress test.
>>>
>>> which tells me you sent me patches without properly testing. How much
>>> time did it take to trigger this ? How did you trigger this situation ?
>>
>> No, that is not true. The implementation I sent is working properly for
>> any real world usage.
>>
>> The stress test I made to break the current implementation is *not* a
>> real use-case. I made it to find out how much traffic the driver can
>> *reliably* handle while sending and reading data. Then I noticed the
>> bug.
>>
>> So, to answer your question: triggering this bug is not a matter of
>> time. The following needs to happen:
>>  1. Device sends a MIDI message that is *bigger* than the usb request
>> length. (just this by itself is really unlikely to happen in real world
>> usage)
> 
> I wouldn't say it's unlikely. You just cannot trust the other side of
> the wire. We've seen e.g. Xbox 360's SCSI layer sending messages of the
> wrong size and we worked around them in g_mass_storage.
> 
> Broken implementations are a real thing ;-)

Fair enough. And that's why I am pushing this fix. :)

> 
>>  2. Host sends a MIDI message back *exactly* at the same time as the
>> device is processing the second part of the usb request from the same
>> message.
> 
> also not that unlikely to happen ;-) You can't assume the host will only
> shift tokens on the wire at the time you're expecting it to.
> 
>> I couldn't trigger this in any of the tests we've made. I only
>> triggered it when I was sending huge messages back and forth
>> (device <-> host) as mentioned.
> 
> fair enough.
> 
>> In fact, we have thousands of devices out there without this patch (but
>> with my previous patch that introduced this bug).
> 
> that's thousands of devices waiting to have a problem, right ? :-)

:X
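
For anyone following the thread, here is a rough user-space model of the
race and of the locking fix. It is only a sketch and not the f_midi code:
struct req, transmit(), caller() and run_demo() are made-up names, a ring
of preallocated requests stands in for the usb request fifo, and a POSIX
spinlock stands in for the kernel one (in the driver the lock would
presumably be taken with spin_lock_irqsave(), since the completion
callback runs in interrupt context). Two threads play the roles of the
ALSA side and the USB completion callback.

```c
/* User-space model only; every name below is hypothetical. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for pthread_spin_* */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NREQ       4        /* preallocated requests, recycled in a ring */
#define REQ_LEN    8        /* capacity of one request, in bytes */
#define PER_THREAD 100000   /* bytes sent by each caller */

struct req { size_t len; };

static struct req ring[NREQ];
static size_t head;           /* request currently being filled */
static size_t queued_bytes;   /* bytes in requests handed to the "UDC" */
static pthread_spinlock_t lock;

/* Append one byte to the current request and queue it once it is full. */
static void transmit(void)
{
	pthread_spin_lock(&lock);
	struct req *r = &ring[head];   /* peek: both callers see this request */
	r->len++;
	assert(r->len <= REQ_LEN);     /* holds only because of the lock */
	if (r->len == REQ_LEN) {
		queued_bytes += r->len;    /* "enqueue" the full request */
		r->len = 0;                /* recycle it for later use */
		head = (head + 1) % NREQ;
	}
	pthread_spin_unlock(&lock);
}

static void *caller(void *arg)
{
	(void)arg;
	for (int i = 0; i < PER_THREAD; i++)
		transmit();
	return NULL;
}

/* Run two concurrent callers and return the number of bytes queued. */
size_t run_demo(void)
{
	pthread_t a, b;

	head = 0;
	queued_bytes = 0;
	for (size_t i = 0; i < NREQ; i++)
		ring[i].len = 0;

	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	pthread_create(&a, NULL, caller, NULL);
	pthread_create(&b, NULL, caller, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_spin_destroy(&lock);

	return queued_bytes;
}
```

Calling run_demo() from a main() and building with -pthread should report
every byte from both callers as queued. Without the lock/unlock pair, both
threads can peek the same request and update len and head at the same
time, which is the situation described above.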

> 
>> I am not trying to say it wasn't a mistake. That patch unfortunately
>> introduces this bug, but it has real improvements over the previous
>> implementation. AFAIR the improvements are:
>>  * Fixes a bug that was causing the DMA buffer to fill up, causing a
>> kernel panic.
> 
> this is a good point. Had forgotten about that detail. Thanks
> 
>>  * Pre-allocates IN usb requests so there is no allocation overhead
>> while sending data (same behavior already existed for the OUT endpoint).
>> This ensures that the DMA memory is not misused, affecting the rest of
>> the system.
> 
> also, arguably, a good idea. Recycling requests is a lot nicer and it's
> what most gadget drivers do.
> 
>>  * It doesn't crash if the host doesn't send an ACK after IN data
>> packets and we have reached the limit of available memory. Also, this is
>> useful because it causes the ALSA layer to timeout, which is the correct
>> userspace behavior.
> 
> right
> 
>>  * Continues to send data to the correct Jack (associated with each ALSA
>> substream) if that was interrupted somehow, for instance by the size
>> limit of a usb request.
> 
> ok.
> 
>>>> So, this patch fixes a bug in the current implementation.
>>>
>>> fixes a regression introduced by you, true. I'm trying to figure out if
>>> we're better off without the original patch; to make a good decision I
>>> need to know if the extra "speed" we get from not allocating requests on
>>> demand is really that important.
>>>
>>> So, how much faster did you get and is that extra "speed" really
>>> important ?
>>
>> The speed is not relevant at all in this case. It was not the goal of
>> the patch, but I mentioned it because it is obvious that with no memory
>> allocation the code will execute faster.
>>
>> I did measure the speed improvements at that time, it was real but not
>> relevant. I don't think we should be discussing this anyway.
> 
> fair enough. This was probably the first email from you which gave me
> some peace of mind that you know what you're doing with this fix. Keep
> in mind that we all receive hundreds of emails a day and it's difficult
> to track things over time.

True. I will try to keep this always in mind.

> 
> It's also a big PITA when someone sends fixes and cleanups on the same
> series and/or with dependencies between them. The correct way is to send
> *only* fixes first. They should be minimal patches that *only* fix the
> problem. If the code looks messy or doesn't follow the coding style,
> that's something you do on a completely separate fix and, usually, from
> a clean topic branch starting at a tag from Linus (exceptions may arise,
> of course).

Got it.

> 
> So anyway, to finally finish this up. Can you send JUST the bare minimum
> fix necessary to avoid the regression ? Also, add a proper Fixes: foobar
> line on commit log (see commit e18b7975c885bc3a938b9a76daf32957ea0235fa
> for an example).
> 
> Then we can get that merged. Keep in mind that you might have to Cc
> stable (see same commit listed above).

Ok.

I will send the state-machine refactor as another patch in another topic
then.

> 
> After this is sorted out, then let's see how we can help you move your
> product to libusbgx and check if there's anything missing in configfs
> to cope with your use-case.

That will be great, thanks! I will keep the list posted.

> 
> ps: can you point me to your devices shipping with f_midi ? Which
> architecture are they using ? Which USB Peripheral Controller ? This
> might be a good addition to my test farm depending on your answers above
> :-p

Seaboard GRAND[1]. Freescale's i.MX 6 running an ARM Cortex-A9. The
controller is ChipIdea.

[1] https://www.roli.com/products/seaboard-grand

-- 
Felipe
