TensorFlow内存管理bfc算法实例


Posted in Python onFebruary 03, 2020

1. 基本介绍

tensorflow设备内存管理模块实现了一个best-fit with coalescing算法(后文简称bfc算法)。

bfc算法是Doung Lea's malloc(dlmalloc)的一个非常简单的版本。

它具有内存分配、释放、碎片管理等基本功能。

2. bfc基本算法思想

1. 数据结构

整个内存空间由一个按基址升序排列的Chunk双向链表来表示,它们的直接前趋和后继必须在地址连续的内存空间。Chunk结构体里含有实际大小、请求大小、是否被占用、基址、直接前趋、直接后继、Bin索引等信息。

TensorFlow内存管理bfc算法实例

2. 申请

用户申请一个内存块(malloc)。根据chunk双链表找到一个合适的内存块,如果该内存块的大小是用户申请的大小的二倍以上,那么就将该内存块切分成两块,这就是split操作。

返回其中一块给用户,并将该内存块标识为占用

Spilt操作会新增一个chunk,所以需要修改chunk双链表以维持前驱和后继关系

如果用户申请512的空间,正好有一块1024的chunk2是空闲的,由于1024/512 =2,所以chunk2 被split为2块:chunk2_1和chunk2_2。返回chunk2_1给用户并将其标志位占用状态。

3. 释放

用户释放一个内存块(free)。先将该块标记为空闲。然后根据chunk数据结构中的信息找到其前驱和后继内存块。如果前驱和后继块中有空闲的块,那么将刚释放的块和空闲的块合并成一个更大的chunk(这就是merge操作,合并当前块和其前后的空闲块)。再修改双链表结构以维持前驱后继关系。这就做到了内存碎片的回收。

如果用户要free chunk3,由于chunk3的前驱chunk2也是空闲的,所以将chunk2和chunk3合并得到一个新的chunk2',大小为chunk2和chunk3之和。

3. bins

1. bins数据结构

bfc算法采取的是被动分块的策略。最开始整个内存是一个chunk,随着用户申请空间的次数增加,最开始的大chunk会被不断的split开来,从而产生越来越多的小chunk。当chunk数量很大时,为了寻找一个合适的内存块而遍历双链表无疑是一笔巨大的开销。为了实现对空闲块的高效管理,bfc算法设计了bin这个抽象数据结构。

每个bin都有一个size属性,一个bin是一个拥有chunk size >= binsize的空闲chunk的集合。集合中的chunk按照chunk size的升序组织成单链表。bfc算法维护了一个bin的集合:bins。它由多个bin以及从属于每个bin的chunks组成。内存中所有的空闲chunk都由bins管理。

TensorFlow内存管理bfc算法实例

图中每一列表示一个bin,列首方格中的数字表示bin的size。bin size的大小都是256的2^n的倍。每个bin下面挂载了一系列的空闲chunk,每个chunk的chunk size都大于等于所属的bin的bin size,按照chunk size的升序挂载成单链表。

2. bins操作

bfc算法针对bins这个集合设计了三个操作:search、insert、delete。

search

给定一个chunk size,从bins中找到大于等于该chunksize的最小的那个空闲chunk。Search操作具体流程如下。如果bin以数组的形式组织,那么可以从index = chunk size /256 >>2的那个bin开始查找。最好的情况是开始查找的那个bin的chunk链表非空,那么直接返回链表头即可。这种情况时间复杂度是常数级的。最坏的情况是遍历bins数组中所有的bin。对于一般大小的内存来说,bins数组元素非常少,比如4G空间只需要23个bin就足够了(256 * 2 ^ 23 > 4G),因此也很快能返回结果。总体来说search操作是非常高效的。对于固定大小内存来说,查找时间是常数量级的。

insert

将一个空闲的chunk插入到一个bin所挂载的chunk链表中,同时需要维持chunk链表的升序关系。具体流程是直接将chunk插入到index = chunk size /256 >>2的那个bin中即可。

delete

将一个空闲的chunk从bins中移除。

4. 总结

将内存分块管理,按块进行空间分配和释放。

通过split操作将大内存块分解成用户需要的小内存块。

通过merge操作合并小的内存块,做到内存碎片回收

通过bin这个抽象数据结构实现对空闲块高效管理。

5. 代码分析

1. 代码地址

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime

2. 数据结构

Chunk

static const int kInvalidChunkHandle = -1;
...
struct Chunk {
  size_t size = 0; // Full size of buffer.

  // We sometimes give chunks that are larger than needed to reduce
  // fragmentation. requested_size keeps track of what the client
  // actually wanted so we can understand whether our splitting
  // strategy is efficient.
  size_t requested_size = 0;

  // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  // value greater than zero before the chunk is returned from
  // AllocateRaw, and this value is unique among values assigned by
  // the parent allocator.
  int64 allocation_id = -1;
  void* ptr = nullptr; // pointer to granted subbuffer.

  // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  // preceding the memory used by this chunk. E.g., It should start
  // at 'ptr - prev->size'
  ChunkHandle prev = kInvalidChunkHandle;

  // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  // following the memory used by this chunk. E.g., It should be at
  // 'ptr + size'
  ChunkHandle next = kInvalidChunkHandle;

  // What bin are we in?
  BinNum bin_num = kInvalidBinNum;

  bool in_use() const { return allocation_id != -1; }
};

Bin

// A Bin is a collection of similar-sized free chunks.
struct Bin {
  // All chunks in this bin have >= bin_size memory.
  size_t bin_size = 0;

  struct ChunkComparator {
    explicit ChunkComparator(BFCAllocator* allocator)
      : allocator_(allocator) {}
    // Sort first by size and then use pointer address as a tie breaker.
    bool operator()(const ChunkHandle ha,
            const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
      const Chunk* a = allocator_->ChunkFromHandle(ha);
      const Chunk* b = allocator_->ChunkFromHandle(hb);
      if (a->size != b->size) {
        return a->size < b->size;
      }
      return a->ptr < b->ptr;
    }

    private:
      BFCAllocator* allocator_; // The parent allocator
  };

  typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  // List of free chunks within the bin, sorted by chunk size.
  // Chunk * not owned.
  FreeChunkSet free_chunks;
  Bin(BFCAllocator* allocator, size_t bs)
    : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
};

AllocationRegion

AllocationRegion给一个连续的内存区域做指针到ChunkHandle的映射。

RegionManager

RegionManager聚集了一个或多个AllocationRegion,并提供一个从指针到基础ChunkHandle的间接层,这个间接层可在多个不连续的内存区域进行分配。

3. 分配大小

将每次分配的内存大小调整为kMinAllocationSize的N倍,这样所有内存地址都是很好地按字节对齐了。

// kMinAllocationSize = 256
static const size_t kMinAllocationBits = 8;
static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
...
size_t BFCAllocator::RoundedBytes(size_t bytes) {
  size_t rounded_bytes =
    (kMinAllocationSize *
    ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  return rounded_bytes;
}

4. 初始化bin

typedef int BinNum;
static const int kInvalidBinNum = -1;
static const int kNumBins = 21;
...
// 二进制2^8往左移0,1,2位
// (static_cast<size_t>(256) << 0) = 256
// (static_cast<size_t>(256) << 1) = 512
// (static_cast<size_t>(256) << 2) = 1024
size_t BinNumToSize(BinNum index) {
  return static_cast<size_t>(256) << index;
}
...
char bins_space_[sizeof(Bin) * kNumBins];
// Map from bin size to Bin
Bin* BinFromIndex(BinNum index) {
  return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
}
...
// We create bins to fit all possible ranges that cover the
// memory_limit_ starting from allocations up to 256 bytes to
// allocations up to (and including) the memory limit.
for (BinNum b = 0; b < kNumBins; b++) {
  size_t bin_size = BinNumToSize(b);
  VLOG(1) << "Creating bin of max chunk size "
      << strings::HumanReadableNumBytes(bin_size);
  new (BinFromIndex(b)) Bin(this, bin_size);
  CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  if (b + 1 < kNumBins) {
    CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  }
}

5. 查找bin

// 求属于第几个bin
BinNum BinNumForSize(size_t bytes) {
  uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  return b;
}
// 最高位非零的二进制位数,eg: 0001 0101B 为5
inline int Log2FloorNonZero(uint64 n) {
#if defined(__GNUC__)
  return 63 ^ __builtin_clzll(n);
#elif defined(PLATFORM_WINDOWS)
  unsigned long index;
  _BitScanReverse64(&index, n);
  return index;
#else
  int r = 0;
  while (n > 0) {
    r++;
    n >>= 1;
  }
  return r;
#endif
}

6. 查找Chunk

// 先加锁
mutex_lock l(lock_);
void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
  return ptr;
}
// FindChunkPtr函数内部
void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
                 size_t num_bytes) {
  // First identify the first bin that could satisfy rounded_bytes.
  for (; bin_num < kNumBins; bin_num++) {
    // Start searching from the first bin for the smallest chunk that fits
    // rounded_bytes.
    Bin* b = BinFromIndex(bin_num);
    for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
        ++citer) {
      // 从之前得到的Bin索引开始,查找合适的空闲Chunk:
      const BFCAllocator::ChunkHandle h = (*citer);
      BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
      DCHECK(!chunk->in_use());
      if (chunk->size >= rounded_bytes) {
        // We found an existing chunk that fits us that wasn't in use, so remove
        // it from the free bin structure prior to using.
        RemoveFreeChunkIterFromBin(&b->free_chunks, citer);

        // If we can break the size of the chunk into two reasonably
        // large pieces, do so.
        //
        // TODO(vrv): What should be the criteria when deciding when
        // to split? 
        // 具体实现后面会分析
        if (chunk->size >= rounded_bytes * 2) {
          SplitChunk(h, rounded_bytes);
          chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
        }

        // The requested size of the returned chunk is what the user
        // has allocated.
        chunk->requested_size = num_bytes;
        // Assign a unique id and increment the id counter, marking the
        // chunk as being in use.
        chunk->allocation_id = next_allocation_id_++;

        // Update stats.
        ++stats_.num_allocs;
        stats_.bytes_in_use += chunk->size;
        stats_.max_bytes_in_use =
          std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
        stats_.max_alloc_size =
          std::max<std::size_t>(stats_.max_alloc_size, chunk->size);

        VLOG(4) << "Returning: " << chunk->ptr;
        if (VLOG_IS_ON(4)) {
          LOG(INFO) << "A: " << RenderOccupancy();
        }
        return chunk->ptr;
      }
    }
  }
  return nullptr;
}

7. 拆分Chunk

如果Chunk的大小大于等于申请内存大小的2倍,那么将该Chunk拆分成2个:第一个Chunk的大小等于申请内存大小,第二个Chunk作为它的直接后继。

if (chunk->size >= rounded_bytes * 2) {
  SplitChunk(h, rounded_bytes);
  chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved
}

void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  // Allocate the new chunk before we do any ChunkFromHandle
  ChunkHandle h_new_chunk = AllocateChunk();

  Chunk* c = ChunkFromHandle(h);
  CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));

  // Create a new chunk starting num_bytes after c
  BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  region_manager_.set_handle(new_chunk->ptr, h_new_chunk);

  // Set the new sizes of the chunks.
  new_chunk->size = c->size - num_bytes;
  c->size = num_bytes;

  // The new chunk is not in use.
  new_chunk->allocation_id = -1;

  // Maintain the pointers.
  // c <-> c_neighbor becomes
  // c <-> new_chunk <-> c_neighbor
  BFCAllocator::ChunkHandle h_neighbor = c->next;
  new_chunk->prev = h;
  new_chunk->next = h_neighbor;
  c->next = h_new_chunk;
  if (h_neighbor != kInvalidChunkHandle) {
    Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
    c_neighbor->prev = h_new_chunk;
  }

  // Add the newly free chunk to the free bin.
  InsertFreeChunkIntoBin(h_new_chunk);
}

8. 回收chunk

加锁,获得ChunkHandle

mutex_lock l(lock_);
BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
FreeAndMaybeCoalesce(h);

FreeAndMaybeCoalesce

void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  Chunk* c = ChunkFromHandle(h);
  CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));

  // Mark the chunk as no longer in use
  c->allocation_id = -1;

  // Updates the stats.
  stats_.bytes_in_use -= c->size;

  // This chunk is no longer in-use, consider coalescing the chunk
  // with adjacent chunks.
  ChunkHandle chunk_to_reassign = h;

  // If the next chunk is free, coalesce the two
  if (c->next != kInvalidChunkHandle) {
    Chunk* cnext = ChunkFromHandle(c->next);
    if (!cnext->in_use()) {
    //   VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
    //   c->ptr;

    chunk_to_reassign = h;

    // Deletes c->next
    RemoveFreeChunkFromBin(c->next);
    Merge(h, ChunkFromHandle(h)->next);
    }
  }

  // If the previous chunk is free, coalesce the two
  c = ChunkFromHandle(h);
  if (c->prev != kInvalidChunkHandle) {
    Chunk* cprev = ChunkFromHandle(c->prev);
    if (!cprev->in_use()) {
    //   VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
    //    << cprev->ptr;

    chunk_to_reassign = c->prev;

    // Deletes c
    RemoveFreeChunkFromBin(c->prev);
    Merge(ChunkFromHandle(h)->prev, h);
    c = ChunkFromHandle(h);
    }
  }

  InsertFreeChunkIntoBin(chunk_to_reassign);
}

Merge

// Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.
// We merge Chunk(h2) into Chunk(h1).
void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1,
             BFCAllocator::ChunkHandle h2) {
  Chunk* c1 = ChunkFromHandle(h1);
  Chunk* c2 = ChunkFromHandle(h2);
  // We can only merge chunks that are not in use.
  CHECK(!c1->in_use() && !c2->in_use());

  // c1's prev doesn't change, still points to the same ptr, and is
  // still not in use.

  // Fix up neighbor pointers
  //
  // c1 <-> c2 <-> c3 should become
  // c1 <-> c3

  BFCAllocator::ChunkHandle h3 = c2->next;
  c1->next = h3;
  CHECK(c2->prev == h1);
  if (h3 != kInvalidChunkHandle) {
    BFCAllocator::Chunk* c3 = ChunkFromHandle(h3);
    c3->prev = h1;
  }

  // Set the new size
  c1->size += c2->size;

  DeleteChunk(h2);
}

以上这篇TensorFlow内存管理bfc算法实例就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python之import机制详解
Jul 03 Python
Python Web框架Flask信号机制(signals)介绍
Jan 01 Python
Python 如何访问外围作用域中的变量
Sep 11 Python
spark: RDD与DataFrame之间的相互转换方法
Jun 07 Python
pandas筛选某列出现编码错误的解决方法
Nov 07 Python
用python做游戏的细节详解
Jun 25 Python
python下PyGame的下载与安装过程及遇到问题
Aug 04 Python
基于Python实现船舶的MMSI的获取(推荐)
Oct 21 Python
你可能不知道的Python 技巧小结
Jan 29 Python
Django 如何实现文件上传下载
Apr 08 Python
PyQt5 显示超清高分辨率图片的方法
Apr 11 Python
据Python爬虫不靠谱预测可知今年双十一销售额将超过6000亿元
Nov 11 Python
python numpy数组中的复制知识解析
Feb 03 #Python
opencv python Canny边缘提取实现过程解析
Feb 03 #Python
Pycharm debug调试时带参数过程解析
Feb 03 #Python
Python使用enumerate获取迭代元素下标
Feb 03 #Python
TensorFlow 显存使用机制详解
Feb 03 #Python
opencv python如何实现图像二值化
Feb 03 #Python
python实现人机猜拳小游戏
Feb 03 #Python
You might like
利用PHP制作简单的内容采集器的代码
2007/11/28 PHP
require(),include(),require_once()和include_once()区别
2008/03/27 PHP
php入门学习知识点四 PHP正则表达式基本应用
2011/07/14 PHP
php笔记之:php数组相关函数的使用
2013/04/26 PHP
百度工程师讲PHP函数的实现原理及性能分析(三)
2015/05/13 PHP
Javascript 表单之间的数据传递代码
2008/12/04 Javascript
JS代码优化技巧之通俗版(减少js体积)
2011/12/23 Javascript
jQuery点击弹出下拉菜单的小例子
2013/08/01 Javascript
JavaScript通过Date-Mask将日期转换成字符串的方法
2015/06/04 Javascript
z-blog SyntaxHighlighter 长代码无法换行解决办法(基于jquery)
2015/11/18 Javascript
js类式继承与原型式继承详解
2016/04/07 Javascript
基于JS实现9种不同的面包屑和分布式多步骤导航效果
2017/02/21 Javascript
react.js 获取真实的DOM节点实例(必看)
2017/04/17 Javascript
vue自定v-model实现表单数据双向绑定问题
2018/09/03 Javascript
Vuejs2 + Webpack框架里,模拟下载的实例讲解
2018/09/05 Javascript
亲自动手实现vue日历控件
2019/06/26 Javascript
详解Python中的多线程编程
2015/04/09 Python
Python实现大文件排序的方法
2015/07/10 Python
python导入时小括号大作用
2017/01/10 Python
用Python实现筛选文件脚本的方法
2018/10/27 Python
Python中安装easy_install的方法
2018/11/18 Python
PyTorch 1.0 正式版已经发布了
2018/12/13 Python
Django框架模型简单介绍与使用分析
2019/07/18 Python
将matplotlib绘图嵌入pyqt的方法示例
2020/01/08 Python
获取CSDN文章内容并转换为markdown文本的python
2020/09/06 Python
Sneaker Studio罗马尼亚网站:购买运动鞋
2018/11/04 全球购物
如何将一个描述日期或日期/时间的字符串转换为一个Date对象
2015/10/13 面试题
化工专业大学生职业生涯规划书
2014/01/14 职场文书
篮球社团活动总结
2014/06/27 职场文书
会计电算化实训报告
2014/11/04 职场文书
数学教师个人总结
2015/02/06 职场文书
归元寺导游词
2015/02/06 职场文书
2015年秘书个人工作总结
2015/04/25 职场文书
《最后一头战象》读后感:动物也有感情
2020/01/02 职场文书
在Python中如何使用yield
2021/06/07 Python
python垃圾回收机制原理分析
2022/04/13 Python