Message-ID: <20180108125910.4fa6c4ff@redhat.com>
Date:   Mon, 8 Jan 2018 12:59:10 +0100
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Mauro Carvalho Chehab <mchehab@...pensource.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        Josef Griebichler <griebichler.josef@....at>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Alan Stern <stern@...land.harvard.edu>,
        USB list <linux-usb@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Rik van Riel <riel@...hat.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>,
        LMML <linux-media@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        David Miller <davem@...emloft.net>
Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018 08:02:00 -0200
Mauro Carvalho Chehab <mchehab@...pensource.com> wrote:

> Hi Linus,
> 
> On Sun, 7 Jan 2018 13:23:39 -0800
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> > <mchehab@...pensource.com> wrote:  
> > >
> > > On Sat, 6 Jan 2018 16:04:16 +0100
> > > "Josef Griebichler" <griebichler.josef@....at> wrote:
> > >>
> > >> the commit causing the problem has been identified.
> > >> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > >> it's working again.
> > >
> > > Just replying to me won't magically fix this. The people who were involved in
> > > this patch should also be cc'd, plus the USB people. Just added them.
> > 
> > Actually, you seem to have added an odd subset of the people involved.
> > 
> > For example, Ingo - who actually committed that patch - wasn't on the cc.  
> 
> Sorry, my fault. I forgot to add him to it.
> 
> > I do think we need to simply revert that patch. It's very simple: it
> > has been reported to lead to actual problems for people, and we don't
> > fix one problem and then say "well, it fixed something else" when
> > something breaks.
> > 
> > When something breaks, we either unbreak it, or we revert the change
> > that caused the breakage.
> > 
> > It's really that simple. That's what "no regressions" means.  We don't
> > accept changes that cause regressions. This one did.  
> 
> Yeah, we should either unbreak or revert it. In the specific case of
> media devices, Alan came up with a proposal to increase the number of
> buffers. This is a one-line change, and increasing the capture delay from
> 0.63 ms to 5 ms in this specific case (Digital TV) shouldn't do much
> harm. So, I guess it would be worth trying it before reverting the patch.

Let's find the root cause of this before reverting, as a revert will hurt
the networking use-case.

I want to see whether the increased buffer solves the issue (the current
buffer of 0.63 ms seems too small).

I would also like to see experiments with adjusting the sched
priority of the kthreads and/or the userspace prog (e.g. use a command
like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').
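
For anyone who prefers to script the priority experiment, here is a minimal
sketch of what the chrt command above does, using the scheduler calls in
Python's os module (Linux only). The helper name set_fifo_priority and the
default priority of 10 are illustrative assumptions, not something from the
patch or the forum thread.

```python
import os


def set_fifo_priority(pid: int, priority: int = 10) -> bool:
    """Try to move `pid` to SCHED_FIFO at the given real-time priority.

    Roughly equivalent to `chrt --fifo -p <priority> <pid>`.
    pid 0 means the calling process. Returns True on success and
    False if the caller lacks the privilege to change the policy.
    """
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        raise ValueError(
            f"priority must be in [{lowest}, {highest}], got {priority}"
        )
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        # Needs root or CAP_SYS_NICE, just like chrt without sudo.
        return False
    return True
```

Calling set_fifo_priority(pid) with the PID of the userspace program (or of a
kthread) as root should have the same effect as the chrt invocation; without
the needed privilege it simply returns False.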


Are we really sure that the regression is caused by 4cd13c21b207
("softirq: Let ksoftirqd do its job")?  The forum thread also reports
that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
overflow in get_next_timer_interrupt")
 https://git.kernel.org/torvalds/c/34f41c0316ed

It makes me suspicious that this fix changes things...
After this fix, I suspect that changing the sched priorities will fix
the remaining glitches.


> It is hard to foresee the consequences of the softirq changes for other
> devices, though.

Yes, it is hard to foresee; I can only cover networking.

For networking, if we revert this, we will (again) open the kernel to
an easy DDoS vector with UDP packets.  As mentioned in the commit desc,
before this change you could easily cause softirq to take all the CPU
time from the application, resulting in very low "good-put" in the
UDP-app. (That's why it was so easy to DDoS DNS servers before...)

With the softirq patch in place, ksoftirqd is scheduled fairly alongside
the other applications running on the same CPU.  But in some cases this
is not what you want, so as the commit also mentions, the admin can now
more easily tune process scheduling parameters if needed, to adjust for
such use-cases (it was not really an admin choice before).
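
To tune ksoftirqd, the admin first needs the PIDs of the per-CPU threads,
which show up in /proc with names of the form ksoftirqd/N (N being the CPU
number). A rough sketch of how to collect them, assuming a standard procfs
layout; the helper name find_threads_by_comm is only illustrative:

```python
import os


def find_threads_by_comm(prefix: str = "ksoftirqd/", proc_root: str = "/proc") -> dict:
    """Return {pid: comm} for every process whose comm starts with `prefix`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as comm_file:
                comm = comm_file.read().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm.startswith(prefix):
            found[int(entry)] = comm
    return found
```

The resulting PIDs can then be handed to chrt (or renice) to raise or lower
ksoftirqd relative to the application on that CPU.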


> For example, we didn't have any reports about this issue affecting cameras.
> Most cameras use ISOC nowadays, but some only provide bulk transfers.
> We usually try to use the minimum number of buffers possible, as
> increasing latency on cameras can be very annoying, especially in
> videoconference applications.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
