Message-ID: <20120123201410.GH28526@quack.suse.cz>
Date:	Mon, 23 Jan 2012 21:14:10 +0100
From:	Jan Kara <jack@...e.cz>
To:	Tomer Margalit <tomermargalit@...il.com>
Cc:	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
	Nezer Zaidenberg <nzaidenberg@....com>
Subject: Re: Failing a bio right

On Sun 22-01-12 11:29:44, Tomer Margalit wrote:
> Hi Jan,
> 
> Thanks for the reply.
> 
> On Fri, Jan 20, 2012 at 4:01 PM, Jan Kara <jack@...e.cz> wrote:
> >  Hello,
> >
> > On Thu 19-01-12 18:04:19, Tomer Margalit wrote:
> >> I have a make_request function that blocks writes (by using
> >> wait_event_interruptible on some event).
> >> I want the user to be able to stop the function if it takes too long
> >> (that's the reason for the interruptible version).
> >> So when the call is interrupted I call bio_endio with the EINTR error
> >> to signal the interruption.
> >> Usually this works fine, but after a lot of writes, the system says
> >> "lost page write due to I/O error on device".
> >  This is because end_buffer_write_sync() doesn't really distinguish
> > errors.  So when some error happens it complains about I/O error.
> >
> >> At this point the process hangs.
> >  That is strange - you should probably collect a stack trace of the failing
> > process (e.g. via 'echo w >/proc/sysrq-trigger'). That should tell us more.
> >
> 
> I cannot get a stack trace of the process since it hangs (probably in
> the write) - for instance doing 'gdb -p PID' or 'strace -p PID' causes
> those to hang as well. The process doesn't segfault either.
  That's why I told you to use 'echo w >/proc/sysrq-trigger' and then look
at dmesg.
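
  As a rough sketch of that procedure (the function name and its optional
path argument are only illustrative, not an existing tool, and on a real
system this has to run as root):

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to dump blocked (D-state) tasks.
# The path argument exists only so the function can be tried against a
# scratch file; on a real system leave it at the default and run as root.
dump_blocked_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (are you root?)" >&2
        return 1
    fi
    # 'w' = show tasks in uninterruptible (blocked) state
    echo w > "$trigger"
}

# Typical use: trigger the dump, then read the traces from the kernel log.
#   dump_blocked_tasks && dmesg | tail -n 200
```

The 'w' trigger makes the kernel itself log the stacks of blocked tasks, so
it should still work when gdb and strace hang on the process.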

> >> Is this the right way to do what I'm trying to do?
> >  I'm not sure how it is supposed to work. Writes usually happen in an
> 
> The bdev I am creating is a virtual disk that replicates writes to a
> remote location. My intention is that it will behave like a socket -
> i.e. block until writes can be done. Actually the bdev is additionally
> meant to be semi-synchronous, so that after a buffer is filled, all
> writes are blocked until some buffers are sent to the remote end.
> 
> This works in principle, but when I try to cancel a write which is
> taking too long (for instance 100MB), it doesn't do anything (since
> it's stuck in the kernel).
> 
> > async manner (through page cache and flusher thread) or are you using
> > direct IO? Also if a write is interrupted at this point, you just lost the
> 
> All of this behavior happens when I do the final fsync(2) after all
> the data has been written.
> 
> > content of the buffer (as it is marked clean and !uptodate). Users usually
> > don't like that.
> >
> 
> I don't mind the lost contents since the user doesn't want to wait
> until the end of the write (if done without flush it may take as long
> as it requires, but flushing means wait until writes are done).
> 
> As a side note, I use the fsync since I have also implemented a
> marking mechanism for the bdev - and before creating a mark I need to
> make sure all previous writes have been flushed.
  OK, I see. Let's see what the stack traces of the hung process are.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR