linux-kernel - Re: [PATCH] dm cache: fix race affecting dirty block count

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 03 Aug 2014 06:33:51 +0300
From:	Anssi Hannula <anssi.hannula@....fi>
To:	Pranith Kumar <bobby.prani@...il.com>
CC:	ejt@...hat.com, snitzer@...hat.com,
	LKML <linux-kernel@...r.kernel.org>,
	torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] dm cache: fix race affecting dirty block count

03.08.2014, 05:10, Pranith Kumar kirjoitti:
> Corrently adding Anssi this time.
> 
> On Sat, Aug 2, 2014 at 10:00 PM, Pranith Kumar <bobby.prani@...il.com> wrote:
>> Hello Anssi, Joe, Mike,

Hi,

>> I just found your patch in the latest rc and wanted to ask a few
>> questions. I am not sure how this patch helps at all other than luck in
>> that dm_cblock_t type is of type int32_t, which should guarantee that it
>> is atomic on most platforms. Which begs the question, what platform did
>> you encounter this problem?

On x86_64. Regular integer increment/decrement/addition/subtraction are
not atomic on x86(_64).

Assembly for increment without patch is
	addl   $0x1,0x108(%rdi)
and for decrement
	subl   $0x1,0x108(%rbx)

while with patch:
	lock incl 0x108(%rdi)
and
	mov    $0xffffffff,%eax
	lock xadd %eax,0x108(%rbx)
.


>> The patch purports to solve a race condition by making use of atomic_t.
>> I am not sure that is enough. If indeed there is a race you need to use
>> smp_mb__{before/after}_atomic() for both your uses of atomic_inc() and
>> atomic_set().

The issue was only about concurrent increments/decrements getting lost
completely from time to time, which atomic_inc/dec will prevent without
any explicit barriers.

>> Also I have a concern about why this mail was not CC'ed on LKML. I had
>> to go to some difficulty in finding this patch. So please CC LKML for
>> such patches.

I don't usually CC LKML if there is a subsystem-specific mailing list
unless there is a specific reason to CC LKML as well. I thought this was
standard practice, but I'm happy to spam LKML as well from now on if
that is preferred by others as well.


>> Thanks,
>> --
>> Pranith
>>
>> -- Begin forwarded Message --
>>
>>
>> nr_dirty is updated without locking, causing it to drift so that it is
>> non-zero (either a small positive integer, or a very large one when an
>> underflow occurs) even when there are no actual dirty blocks.
>>
>> Fix that by using an atomic type for nr_dirty.
>>
>> Signed-off-by: Anssi Hannula <anssi hannula iki fi>
>> Cc: Joe Thornber <ejt redhat com>
>> Cc: stable vger kernel org
>>
>> ---
>>
>> So here's a patch for the issue I reported a few days ago.
>>
>> I'm not sure what the actual effects of having wrong nr_dirty are
>> beyond the wrong value reported to userspace. Having wrong nr_dirty
>> causes dm_table_event(cache->ti->table) to be called at incorrect times,
>> but I'm not sure what the implications of that are.
>>
>> So someone more knowledged should really amend my commit message
>> accordingly so that it explains the user-visible issues :)
>>
>> I did notice that after a reboot (after having incorrect nr_dirty) the
>> entire cache was considered to be dirty and a full writeback occurred,
>> but I don't know if that was related at all.
>>
>>
>>  drivers/md/dm-cache-target.c | 11 +++++------
>>  1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
>> index 0df3ec085ebb..83a7eb5ef113 100644
>> --- a/drivers/md/dm-cache-target.c
>> +++ b/drivers/md/dm-cache-target.c
>> @@ -154,7 +154,7 @@ struct cache {
>>         /*
>>          * cache_size entries, dirty if set
>>          */
>> -       dm_cblock_t nr_dirty;
>> +       atomic_t nr_dirty;
>>         unsigned long *dirty_bitset;
>>
>>         /*
>> @@ -403,7 +403,7 @@ static bool is_dirty(struct cache *cache, dm_cblock_t b)
>>  static void set_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cblock)
>>  {
>>         if (!test_and_set_bit(from_cblock(cblock), cache->dirty_bitset)) {
>> -               cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) + 1);
>> +               atomic_inc(&cache->nr_dirty);
>>                 policy_set_dirty(cache->policy, oblock);
>>         }
>>  }
>> @@ -412,8 +412,7 @@ static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cbl
>>  {
>>         if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) {
>>                 policy_clear_dirty(cache->policy, oblock);
>> -               cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) - 1);
>> -               if (!from_cblock(cache->nr_dirty))
>> +               if (atomic_dec_return(&cache->nr_dirty) == 0)
>>                         dm_table_event(cache->ti->table);
>>         }
>>  }
>> @@ -2003,7 +2002,7 @@ static int cache_create(struct cache_args *ca, struct cache **result)
>>         init_waitqueue_head(&cache->migration_wait);
>>
>>         r = -ENOMEM;
>> -       cache->nr_dirty = 0;
>> +       atomic_set(&cache->nr_dirty, 0);
>>         cache->dirty_bitset = alloc_bitset(from_cblock(cache->cache_size));
>>         if (!cache->dirty_bitset) {
>>                 *error = "could not allocate dirty bitset";
>> @@ -2513,7 +2512,7 @@ static void cache_status(struct dm_target *ti, status_type_t type,
>>                        (unsigned) atomic_read(&cache->stats.demotion),
>>                        (unsigned) atomic_read(&cache->stats.promotion),
>>                        (unsigned long long) from_cblock(residency),
>> -                      cache->nr_dirty);
>> +                      (unsigned) atomic_read(&cache->nr_dirty));
>>
>>                 if (cache->features.write_through)
>>                         DMEMIT("1 writethrough ");
>> --
>> 1.8.4.5
>>
>> -- End forwarded Message --
>>
> 
> 
> 


-- 
Anssi Hannula
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/