Message-ID: <20190401131943.GA2979@mit.edu>
Date:   Mon, 1 Apr 2019 09:19:43 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     liu.song11@....com.cn
Cc:     adilger@...ger.ca, jack@...e.cz, fishland@...yun.com,
        jack@...e.com, linux-ext4@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] jbd2: do not start commit when t_updates does not
 back to zero

On Mon, Apr 01, 2019 at 10:35:04AM +0800, liu.song11@....com.cn wrote:
> 
> Our device is CF card(TS8GCF300), mount options are very general(rw,dirsync,
> relatime,data=ordered).
> The hung problem appears under ext4, but the reason is related to the way 
> of use. In our system, there are many RT tasks, which make normal priority 
> tasks survived in harsh environments, such as syslogd. The syslog record is 
> also under the same device, which is really a stumbling block.
> We moved the location of the syslog record to another device and the hungtask 
> problem was solved.

So the general advice, which is going to be true for all file systems,
is (a) don't try to do any file I/O from real-time tasks, (b) if
you must do file I/O from a real-time task, be prepared to accept
your real-time tasks blocking behind device I/O, thus destroying
your real-time guarantees, and (c) make sure any kernel threads
used by the file system (e.g., the jbd2 kernel thread for ext4)
are also given real-time priority.
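
As a rough illustration of (c), here is a minimal sketch of putting an
existing task under SCHED_FIFO from userspace with sched_setscheduler(2).
The helper names clamp_fifo_priority() and set_fifo_priority() are made
up for this example, and the caller has to find the PID of the kernel
thread itself.  Which priority is right relative to your own real-time
tasks is a system design decision this sketch does not answer.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: clamp a requested priority into the range the
 * kernel reports as valid for SCHED_FIFO. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Hypothetical helper: switch the task identified by pid to SCHED_FIFO.
 * Returns 0 on success, or -1 with errno set (for example EPERM when
 * the caller lacks CAP_SYS_NICE). */
int set_fifo_priority(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

The PID would come from something like ps, looking for a thread whose
name begins with "jbd2/".  The call needs sufficient privilege, and it
has to be repeated after every mount, since the thread is created when
the file system is mounted.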

Was syslogd being run with real-time priority?  If not, you're not
really going to have real-time performance unless you make sure
syslog(3) calls don't block waiting for syslogd to acknowledge the
write.  See syslog-async as referenced here[1].

[1] https://stackoverflow.com/questions/208098/can-syslog-performance-be-improved

What I suspect was happening was that you were using standard
syslog(3), which was blocking for syslogd to respond; syslogd was by
default trying to fsync every single log entry before returning
success (this can be changed by making the appropriate change to
syslog.conf; that's a different change suggested by [1] above), and
so your real-time task that was calling syslog was blocking.  Since it
was a real-time task, and the jbd2 kernel thread was not a real-time
thread, this caused a deadlock.
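
To make the "don't block in the real-time task" idea concrete, here is
a minimal sketch of one way to decouple the two sides: the real-time
task copies its message into a fixed-size ring and returns at once,
and a separate normal-priority thread drains the ring into syslog(3).
This is not how syslog-async is implemented; every name here (struct
rtlog, rtlog_push(), rtlog_pop(), rtlog_flush_to_syslog(), and the two
size constants) is invented for the example.  It assumes exactly one
producer thread and one consumer thread, and a zero-initialized
struct rtlog.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <syslog.h>

#define RTLOG_SLOTS   64   /* hypothetical capacity; keep it a power of two */
#define RTLOG_MSG_MAX 128  /* hypothetical per-message limit, including NUL */

struct rtlog {
    char msg[RTLOG_SLOTS][RTLOG_MSG_MAX];
    atomic_size_t head;    /* next slot to write; stored only by the producer */
    atomic_size_t tail;    /* next slot to read; stored only by the consumer */
    atomic_size_t dropped; /* messages discarded because the ring was full */
};

/* Real-time side: no system calls, no locks, no waiting.
 * Returns 0 if the message was queued, -1 if it was dropped. */
int rtlog_push(struct rtlog *q, const char *text)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    char *slot;

    if (head - tail == RTLOG_SLOTS) {
        /* Ring is full: lose the message rather than wait for the logger. */
        atomic_fetch_add_explicit(&q->dropped, 1, memory_order_relaxed);
        return -1;
    }
    slot = q->msg[head % RTLOG_SLOTS];
    strncpy(slot, text, RTLOG_MSG_MAX - 1);
    slot[RTLOG_MSG_MAX - 1] = '\0';
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 0;
}

/* Logger side: copy out the oldest message.
 * Returns 1 if a message was copied into out, 0 if the ring was empty. */
int rtlog_pop(struct rtlog *q, char *out, size_t outlen)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail == head || outlen == 0)
        return 0;
    strncpy(out, q->msg[tail % RTLOG_SLOTS], outlen - 1);
    out[outlen - 1] = '\0';
    /* Release the slot back to the producer. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* Logger side: this is the only place that may block on syslogd,
 * so it must run in a normal-priority thread, never in the RT task. */
void rtlog_flush_to_syslog(struct rtlog *q)
{
    char buf[RTLOG_MSG_MAX];

    while (rtlog_pop(q, buf, sizeof buf))
        syslog(LOG_INFO, "%s", buf);
}
```

The trade-off is deliberate: when the logger thread falls behind, the
real-time task loses log messages (counted in dropped) instead of
losing its deadline.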

There are multiple things you can try to optimize (and with real-time
systems, getting configuration right is really, REALLY, critical), but
it sounds like the real root cause is you have a real-time task using
syslog(3).  Don't do that.  It will probably cause you problems in
multiple dimensions.

						- Ted

P.S.  Especially don't try using syslog in real-time tasks if said
real-time system is going to be used in commercial aviation.  It might
cause scandals ala the 737 MAX.  :-)
