linux-kernel - Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process

Message-ID: <560A4E3A.2080104@tomt.net>
Date:	Tue, 29 Sep 2015 10:39:22 +0200
From:	"Andre Tomt (LKML)" <lkml@...t.net>
To:	Julian Anastasov <ja@....bg>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-kernel@...r.kernel.org,
	"David S. Miller" <davem@...emloft.net>, stable@...r.kernel.org,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Stephen Hemminger <stephen@...workplumber.org>,
	holger.hoffstaette@...glemail.com
Subject: Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in
 process_backlog

I just had another hang with it reverted, on two different guests.
However, it took nearly 6 hours rather than the usual "few minutes" for
these two. So now I'm a little unsure about my initial conclusions.

On 29. sep. 2015 09:40, Julian Anastasov wrote:
> On Tue, 29 Sep 2015, Andre Tomt (LKML) wrote:
> 
> 	No problem with 4.2+? Same setup/config?

I've not yet observed any problems like this on 4.2+.

The 4.1 and 4.2 config is more or less identical.

> 
>> For now I think this patch should be reverted in 4.1.9.
>>
>> The hangs have occurred so far on Xen PV and KVM x86_64 virtual machines, they
>> will hang completely within minutes or hours depending on the type of
>> workload. The workloads are all fairly light, one running low traffic
>> email/antispam, another running monitoring and metrics of ~5 hosts and one
>> running a single terminal IRC client. All but the IRC one will hang within a
>> few minutes of booting.
>>
>> When they lock up they only respond to sysrq, with ttyS0/hvc0 not echoing
>> anything typed in back, and are completely dead on the network. One system
>> managed to report rcu stalls but no backtraces (I'll look over the debug
>> config, if there is any interest).
>>
>> My bare metal desktop has yet to be able to hit it, but it might be entirely
>> down to a different type of workload.
>>
>> Something missing in 4.1?
> 
> 	There are 2 related patches, the first one is
> [PATCH 4.1 124/159] net: do not process device backlog during unregistration

Would reverting this change anything outside device unregistration at all?

> 	But the problematic patch calls rcu_read_lock while
> local IRQ is disabled (in process_backlog), this is something
> that should be noted for the patch. I'll try to see what Xen does.
> It would be useful to see .config and any kind of backtraces/stalls,
> it will also help other developers to catch the problem...

4.1 and 4.2 configs attached

I'll see if I can get some more debugging options enabled and a fully
mainline test kernel; I still have a few local patches for security/ and
runtime modify_ldt switching lurking in here.

So far the kernels have not produced any output other than an RCU stall
detected message without any backtrace or other information, and that
was just one time out of a couple dozen hangs.

View attachment "config-4.1.0-1" of type "text/plain" (127299 bytes)

View attachment "config-4.2.0-1" of type "text/plain" (129408 bytes)