Message-ID: <74b8fcf6-b5e0-7cae-d860-0ed894bfe938@suse.de>
Date: Tue, 3 Dec 2019 22:09:32 +0800
From: Coly Li <colyli@...e.de>
To: kungf <wings.wyang@...il.com>
Cc: kent.overstreet@...il.com, linux-bcache@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode
On 2019/12/3 3:16 PM, kungf wrote:
>
>
> On Mon, 2 Dec 2019 at 19:09, Coly Li <colyli@...e.de> wrote:
>>
>> On 2019/12/2 6:24 PM, kungf wrote:
>> > Data may be lost in the following writeback-mode scenario:
>> > 1. The client writes data1 to bcache.
>> > 2. The client calls fdatasync.
>> > 3. bcache flushes the cache set and the backing device.
>> >    If data1 has not been written back to the backing device at this
>> >    point, it is only guaranteed to be safe in the cache.
>> > 4. The cache then writes data1 back to the backing device with only
>> >    REQ_OP_WRITE.
>> > So data1 is not guaranteed to be in non-volatile storage, and it may be
>> > lost on a power interruption.
>> >
>>
>> Hi,
>>
>> Do you encounter such problem in real work load ? With bcache journal, I
>> don't see the possibility of data lost with your description.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
>>
> Hi Coly,
>
> Sorry for the confusion. As far as I know now, the write_dirty function
> writes dirty data to the backing device without FUA, and write_dirty_finish
> marks the dirty key clean, which means the data indexed by that key will
> not be written back again. Am I wrong?
You are right. This behavior is as designed. We don't guarantee that the
data is always on the platter; this is what most storage systems do.
> I only found that the backing device is flushed when bcache gets a
> PREFLUSH bio. Is there any other place, such as the journal, where it is
> flushed?
>
A storage system flushes its buffers when the upper layer requires it. That
means if an application wants the data it writes to be flushed to the
platter, it should explicitly issue a flush request.
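
As a minimal sketch of what an explicit flush request looks like from user
space: GNU dd's conv=fdatasync makes dd call fdatasync() on the output file
before it exits, which oflag=direct alone does not do. The temporary file
is only a stand-in for the bcache device, an assumption made so that the
snippet is harmless to run.

```shell
# Stand-in target (assumption); on a real system this would be the
# bcache device, e.g. /dev/bcache0.
target=$(mktemp)

# conv=fdatasync asks GNU dd to fdatasync() the output before exiting,
# so dd only reports success after the flush request has completed.
dd if=/dev/zero of="$target" bs=16k count=1 conv=fdatasync 2>/dev/null
dd_status=$?

# Arithmetic expansion strips any padding that wc may print.
written=$(( $(wc -c < "$target") ))
echo "dd exit status: $dd_status, bytes written: $written"
rm -f "$target"
```

Whether the flush then reaches the physical media still depends on every
layer below honoring it.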
What you observed and tested is all as designed, IMHO. The I/O stack does
not guarantee that any data is persistent on the storage media unless an
explicit flush request has been received from the upper layer and completed
back to it.
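
To see which requests in a trace actually carry a flush or FUA flag, a
rough filter over blkparse text output could look like the sketch below.
The helper name flush_requests is hypothetical, and the field positions
assume the default blkparse output format, where the sixth field is the
action and the seventh is the RWBS string. The two sample lines are copied
from the backing-device trace quoted below.

```shell
# Print the timestamp and RWBS flags of every queued (Q) request whose
# RWBS field contains 'F', i.e. a preflush and/or FUA request.
flush_requests() {
    awk '$6 == "Q" && $7 ~ /F/ { print $4, $7 }'
}

# Two queue events taken from the quoted backing-device trace.
trace='8,144 34 1 0.000186127 215616 Q FWS [kworker/34:2]
8,144 23 1 19.415130395 215884 Q W 16 + 32 [kworker/23:2]'

result=$(printf '%s\n' "$trace" | flush_requests)
echo "$result"
```

Only the FWS request is reported; the later plain W writeback is not,
which matches the observation that no flush follows it.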
Coly Li
> I ran a test that writes to bcache with dd and then detaches it, running
> blktrace on the cache and backing devices at the same time.
> 1. stop writeback
> # echo 0 > /sys/block/bcache0/bcache/writeback_running
> 2. write data with fdatasync
> # dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
> 3. detach and trigger writeback
> # echo b1f40ca5-37a3-4852-9abf-6abed96d71db > /sys/block/bcache0/bcache/detach
>
> The text below is the blkparse result.
> From the cache blktrace below, we can see 16k of data written to the cache
> set, followed by a flush with op FWFSM (PREFLUSH|WRITE|FUA|SYNC|META).
> ```
> 8,160 33 1 0.000000000 222844 A W 630609920 + 32 <- (8,167) 1464320
> 8,167 33 2 0.000000478 222844 Q W 630609920 + 32 [dd]
> 8,167 33 3 0.000006167 222844 G W 630609920 + 32 [dd]
> 8,167 33 5 0.000011385 222844 I W 630609920 + 32 [dd]
> 8,167 33 6 0.000023890 948 D W 630609920 + 32 [kworker/33:1H]
> 8,167 33 7 0.000111203 0 C W 630609920 + 32 [0]
> 8,160 34 1 0.000167029 215616 A FWFSM 629153808 + 8 <- (8,167) 8208
> 8,167 34 2 0.000167490 215616 Q FWFSM 629153808 + 8 [kworker/34:2]
> 8,167 34 3 0.000169061 215616 G FWFSM 629153808 + 8 [kworker/34:2]
> 8,167 34 4 0.000301308 949 D WFSM 629153808 + 8 [kworker/34:1H]
> 8,167 34 5 0.000348832 0 C WFSM 629153808 + 8 [0]
> 8,167 34 6 0.000349612 0 C WFSM 629153808 [0]
> ```
>
> From the backing blktrace below, the backing device first gets a flush op
> FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback, and then gets a W op
> after the detach. The 16k of data was written back to the backing device,
> and after that the backing device never gets a flush op. This means the 16k
> of data we wrote is not safe on the backing device, even though we wrote it
> with dd and fdatasync.
> ```
> 8,144 33 1 0.000000000 222844 Q WSM 8 + 8 [dd]
> 8,144 33 2 0.000016609 222844 G WSM 8 + 8 [dd]
> 8,144 33 5 0.000020710 222844 I WSM 8 + 8 [dd]
> 8,144 33 6 0.000031967 948 D WSM 8 + 8 [kworker/33:1H]
> 8,144 33 7 0.000152945 88631 C WS 16 + 32 [0]
> 8,144 34 1 0.000186127 215616 Q FWS [kworker/34:2]
> 8,144 34 2 0.000187006 215616 G FWS [kworker/34:2]
> 8,144 33 8 0.000326761 0 C WSM 8 + 8 [0]
> 8,144 34 3 0.020195027 0 C WS 16 [0]
> 8,144 34 4 0.020195904 0 C FWS 16 [0]
> 8,144 23 1 19.415130395 215884 Q W 16 + 32 [kworker/23:2]
> 8,144 23 2 19.415132072 215884 G W 16 + 32 [kworker/23:2]
> 8,144 23 3 19.415133134 215884 I W 16 + 32 [kworker/23:2]
> 8,144 23 4 19.415137776 1215 D W 16 + 32 [kworker/23:1H]
> 8,144 23 5 19.416607260 0 C W 16 + 32 [0]
> 8,144 24 1 19.416640754 222593 Q WSM 8 + 8 [bcache_writebac]
> 8,144 24 2 19.416642698 222593 G WSM 8 + 8 [bcache_writebac]
> 8,144 24 3 19.416643505 222593 I WSM 8 + 8 [bcache_writebac]
> 8,144 24 4 19.416650589 1107 D WSM 8 + 8 [kworker/24:1H]
> 8,144 24 5 19.416865258 0 C WSM 8 + 8 [0]
> 8,144 24 6 19.416871350 221889 Q WSM 8 + 8 [kworker/24:1]
> 8,144 24 7 19.416872201 221889 G WSM 8 + 8 [kworker/24:1]
> 8,144 24 8 19.416872542 221889 I WSM 8 + 8 [kworker/24:1]
> 8,144 24 9 19.416875458 1107 D WSM 8 + 8 [kworker/24:1H]
> 8,144 24 10 19.417076935 0 C WSM 8 + 8 [0]
> ```
>
>
>
> On Mon, 2 Dec 2019 at 19:09, Coly Li <colyli@...e.de> wrote:
>
> On 2019/12/2 6:24 PM, kungf wrote:
> > Data may be lost in the following writeback-mode scenario:
> > 1. The client writes data1 to bcache.
> > 2. The client calls fdatasync.
> > 3. bcache flushes the cache set and the backing device.
> >    If data1 has not been written back to the backing device at this
> >    point, it is only guaranteed to be safe in the cache.
> > 4. The cache then writes data1 back to the backing device with only
> >    REQ_OP_WRITE.
> > So data1 is not guaranteed to be in non-volatile storage, and it may be
> > lost on a power interruption.
> >
> > Signed-off-by: kungf <wings.wyang@...il.com>
> > ---
> > drivers/md/bcache/writeback.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> > */
> > if (KEY_DIRTY(&w->key)) {
> > dirty_init(w);
> > - bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > + bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> > io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> > bio_set_dev(&io->bio, io->dc->bdev);
> > io->bio.bi_end_io = dirty_endio;
> >
>