Message-ID: <560A4E3A.2080104@tomt.net>
Date: Tue, 29 Sep 2015 10:39:22 +0200
From: "Andre Tomt (LKML)" <lkml@...t.net>
To: Julian Anastasov <ja@....bg>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>, stable@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Stephen Hemminger <stephen@...workplumber.org>,
holger.hoffstaette@...glemail.com
Subject: Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog
I just had another hang with it reverted on two different guests.
However it took nearly 6 hours rather than the usual "few minutes" for
these two. So now I'm a little unsure about my initial conclusions.
On 29. sep. 2015 09:40, Julian Anastasov wrote:
> On Tue, 29 Sep 2015, Andre Tomt (LKML) wrote:
>
> No problem with 4.2+? Same setup/config?
I've not yet observed any problems like this on 4.2+.
The 4.1 and 4.2 config is more or less identical.
>
>> For now I think this patch should be reverted in 4.1.9.
>>
>> The hangs have occurred so far on Xen PV and KVM x86_64 virtual machines, they
>> will hang completely within minutes or hours depending on the type of
>> workload. The workloads are all fairly light, one running low traffic
>> email/antispam, another running monitoring and metrics of ~5 hosts and one
>> running a single terminal IRC client. All but the IRC one will hang within a
>> few minutes of booting.
>>
>> When they lock up they only respond to sysrq, with ttyS0/hvc0 not echoing
>> anything typed in back, and are completely dead on the network. One system
>> managed to report rcu stalls but no backtraces (I'll look over the debug
>> config, if there is any interest).
>>
>> My bare metal desktop has yet to be able to hit it, but it might be entirely
>> down to a different type of workload.
>>
>> Something missing in 4.1?
>
> They are 2 related patches, the first one is
> [PATCH 4.1 124/159] net: do not process device backlog during unregistration
Would reverting this change anything outside device unregistration at all?
> But the problematic patch calls rcu_read_lock while
> local IRQ is disabled (in process_backlog), this is something
> that should be noted for the patch. I'll try to see what Xen does.
> It would be useful to see .config and any kind of backtraces/stalls,
> it will help also to other developers to catch the problem...
4.1 and 4.2 configs attached
I'll see if I can get some more debugging options enabled and build a fully
mainline test kernel. I still have a few local patches for security/ and
runtime modify_ldt switching lurking in here.
So far the kernels have not produced any output, other than an RCU stall
detected message without any backtrace or other information, and that
was just one time out of a couple dozen hangs.
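For reference, a rough sketch of debugging options that might get more than a bare stall message out of a hung guest. This is an assumption about what would be useful here, not a tested recipe: which of these are already set in the attached configs has not been checked, and the timeout value is only illustrative.

```
# Lock and RCU usage checking (PROVE_RCU depends on PROVE_LOCKING)
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RCU=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

# Soft/hard lockup and hung task reporting
CONFIG_LOCKUP_DETECTOR=y
CONFIG_DETECT_HUNG_TASK=y

# Seconds before an RCU CPU stall is reported (illustrative value)
CONFIG_RCU_CPU_STALL_TIMEOUT=21

# Keep sysrq available, since the guests still respond to it when hung
CONFIG_MAGIC_SYSRQ=y
```

Since the hung guests still respond to sysrq, sysrq-l (backtrace of active CPUs) and sysrq-w (blocked tasks) sent from the console may also be worth trying on the next hang.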
View attachment "config-4.1.0-1" of type "text/plain" (127299 bytes)
View attachment "config-4.2.0-1" of type "text/plain" (129408 bytes)