Message-ID: <20200212230652.GA145444@mit.edu>
Date:   Wed, 12 Feb 2020 18:06:52 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Salman Qazi <sqazi@...gle.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Ming Lei <ming.lei@...hat.com>,
        Bart Van Assche <bvanassche@....org>,
        Christoph Hellwig <hch@....de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-block@...r.kernel.org, Gwendal Grignou <gwendal@...gle.com>,
        Jesse Barnes <jsbarnes@...gle.com>
Subject: Re: BLKSECDISCARD ioctl and hung tasks

This is a problem we've been struggling with in other contexts.  For
example, suppose the hung task timer is set to 2 minutes and the
system is configured to panic when a task exceeds it.  If an NFS
server that a client is writing to crashes, and it takes longer than
that for the server to come back, we might want to exempt that wait
from panicking the system.  On the
other hand, if the process is failing to schedule for other reasons,
maybe we would still want the hung task timeout to go off.

So I've been meditating over whether the right answer is to just
globally configure the hung task timer to something like 5 or 10
minutes (which would require no kernel changes, yay?), or have some
way of telling the hung task timeout logic that it shouldn't apply, or
should have a different timeout, when we're waiting for I/O to
complete.
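
For concreteness, here is a minimal sketch of the first option (raising
the timeout globally from userspace, with no kernel changes).  It
assumes the standard procfs path for the kernel.hung_task_timeout_secs
sysctl; the helper names read_sysctl_long() and write_sysctl_long() are
hypothetical and exist only for illustration.  Writing the real file
requires root.

```c
#include <stdio.h>

/* procfs path of the kernel.hung_task_timeout_secs sysctl. */
#define HUNG_TASK_TIMEOUT_PATH "/proc/sys/kernel/hung_task_timeout_secs"

/*
 * Hypothetical helper: read one integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static long read_sysctl_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long val;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Hypothetical helper: write one integer to a sysctl-style file.
 * Returns 0 on success, -1 on failure (for example, not running as root).
 */
static int write_sysctl_long(const char *path, long val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", val) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_sysctl_long(HUNG_TASK_TIMEOUT_PATH, 600) as root would
raise the timeout to 10 minutes for every task on the system, which is
exactly the drawback of this option: the larger value applies whether
or not the task is waiting on I/O.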

It seems to me that perhaps there's a different solution here for your
specific case: what if there were an asynchronous version of
BLKSECDISCARD, either using io_uring or some other interface?  That
bypasses the whole issue of how we modulate the hung task timeout
in situations where it may be OK for a userspace thread to
block for more than 120 seconds, without having to either completely
disable the hung task timeout or change it globally to some much
larger value.

					- Ted