lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAB=NE6Wti1RpAFk5q_YeZn2F9Rd=wsiwhyPszu74nG9fXwH5vQ@mail.gmail.com>
Date:	Fri, 5 Sep 2014 00:47:16 -0700
From:	"Luis R. Rodriguez" <mcgrof@...not-panic.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Dmitry Torokhov <dmitry.torokhov@...il.com>,
	Wu Zhangjin <falcon@...zu.com>, Takashi Iwai <tiwai@...e.de>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Oleg Nesterov <oleg@...hat.com>, hare@...e.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Joseph Salisbury <joseph.salisbury@...onical.com>,
	Benjamin Poirier <bpoirier@...e.de>,
	Santosh Rastapur <santosh@...lsio.com>,
	Kay Sievers <kay@...y.org>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	Tim Gardner <tim.gardner@...onical.com>,
	Pierre Fersing <pierre-fersing@...rref.org>,
	Nagalakshmi Nandigama <nagalakshmi.nandigama@...gotech.com>,
	Praveen Krishnamoorthy <praveen.krishnamoorthy@...gotech.com>,
	Sreekanth Reddy <sreekanth.reddy@...gotech.com>,
	Abhijit Mahajan <abhijit.mahajan@...gotech.com>,
	Casey Leedom <leedom@...lsio.com>,
	Hariprasad S <hariprasad@...lsio.com>,
	MPT-FusionLinux.pdl@...gotech.com,
	Linux SCSI List <linux-scsi@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@...nel.org> wrote:
> On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
> ...
>> +             /*
>> +              * I got SIGKILL, but wait for 60 more seconds for completion
>> +              * unless chosen by the OOM killer. This delay is there as a
>> +              * workaround for boot failure caused by SIGKILL upon device
>> +              * driver initialization timeout.
>> +              *
>> +              * N.B. this will actually let the thread complete regularly,
>> +              * wait_for_completion() will be used eventually, the 60 second
>> +              * try here is just to check for the OOM over that time.
>> +              */
>> +             WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
>> +                       "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
>> +             for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
>> +                     if (wait_for_completion_timeout(&done, HZ))
>> +                             goto wait_done;
>> +
>
> Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> instead of 30?

Nope! I fell into the same trap and only with tons of patience by part
of Tetsuo with me was I able to grok that the 60 seconds here are not
for increasing the timeout, this is just time spent checking to ensure
that the OOM wasn't the one who triggered the SIGKILL. Even if the
drivers took eons it should be fine now, I tried it :D

>  Why do we even need this with the proposed async
> probing changes?

Ah -- well without it the way we "find" drivers that need this new
"async feature" is by a bug report and folks saying their system can't
boot, or they say their device doesn't come up. That's all. Tracing
this to systemd and a timeout was one of the most ugliest things ever.
There two insane bug reports you can go check:

mptsas was the first:

http://article.gmane.org/gmane.linux.kernel/1669550
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

Then cxgb4:

https://bugzilla.novell.com/show_bug.cgi?id=877622

I only had Cc'd you on the newest gem pata_marvell :

https://bugzilla.kernel.org/show_bug.cgi?id=59581

We can't seriously expect to be doing all this work for every driver.
a WARN_ONCE() would enable us to find the drivers that need this new
async probe "feature".

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ