Solving Out-of-Memory Errors on Large Datasets in TensorFlow and Keras

Posted in Python on July 03, 2020

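Running out of memory is the first roadblock you hit when entering Kaggle competitions or running experiments on large datasets.

Small practice projects teach beginners a habit: read every training image into memory up front, then train in batches.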

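This is a problem, and it easily leads to OOM (out of memory). A typical machine today has 16 GB of RAM, while a training set usually holds tens of thousands of RGB images, often large ones. Even at the 224x224x3 input size that VGG16 uses, tens of thousands of decoded images will not fit in 16 GB. The next thought is usually to set a batch size, but the batch argument operates on images that have already been passed in: it only splits them into batches for the GPU. My OOM happens at exactly that "passed in" step, so what now?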
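A rough back-of-the-envelope check of the memory claim, as a sketch only. The dataset sizes are assumed for illustration, and `dataset_bytes` is a hypothetical helper, not part of any library. It counts just the decoded pixel array and ignores copies made during preprocessing, the model itself, and everything else in the process.

```python
def dataset_bytes(num_images, height, width, channels, bytes_per_value):
    """Bytes needed to hold the whole decoded dataset in one array."""
    return num_images * height * width * channels * bytes_per_value


GIB = 1024 ** 3

# Hypothetical dataset sizes; 224x224x3 is the VGG16 input size from the text.
for n in (10_000, 50_000):
    as_uint8 = dataset_bytes(n, 224, 224, 3, 1) / GIB      # raw 8-bit pixels
    as_float32 = dataset_bytes(n, 224, 224, 3, 4) / GIB    # after float conversion
    print(f"{n} images: {as_uint8:.1f} GiB as uint8, {as_float32:.1f} GiB as float32")
```

With these assumed sizes, 10,000 float32 images need about 5.6 GiB and 50,000 need about 28 GiB, so a few tens of thousands of images are enough to exceed 16 GB.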

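The fix is simple once you drop the habit: instead of reading all the images into memory, read only all the image paths into memory.

The approach, roughly:

Read the paths of all the images into memory at once, then write your own batch-reading function. In that function, choose how many images to read based on the RAM you have, load only that batch of images into memory, and hand it to the model, which in turn trains on it in smaller batches. Because host RAM is usually at least as large as GPU memory, the RAM batch size and the GPU batch size usually differ.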
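The two-level batching described here can be sketched in plain Python. This is a minimal illustration, not the author's code: `iter_chunks`, the file names, and the labels are all hypothetical, and the image decoding and training steps are left as a comment.

```python
def iter_chunks(paths, labels, chunk_size):
    """Yield (paths, labels) slices of at most chunk_size items each."""
    if len(paths) != len(labels):
        raise ValueError("paths and labels must have the same length")
    for start in range(0, len(paths), chunk_size):
        yield paths[start:start + chunk_size], labels[start:start + chunk_size]


# Hypothetical file names and labels, for illustration only.
paths = [f"img_{i}.jpg" for i in range(10)]
labels = [i % 2 for i in range(10)]

for chunk_paths, chunk_labels in iter_chunks(paths, labels, chunk_size=4):
    # Real code would decode chunk_paths into an image array here (the
    # RAM-sized chunk) and then train on it with a smaller GPU batch size.
    print(len(chunk_paths), chunk_paths[0])
```

Only the path list lives in memory for the whole run; each decoded chunk can be released before the next one is loaded.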

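The code below shows the key functions for reading data into memory in batches, first with TensorFlow and then with Keras. TensorFlow is not very beginner-friendly, so for now I prefer its high-level API, Keras, for this kind of project. The TF implementation below is based on material I consulted before I knew how to do batched reading in Keras. Model training still uses Keras; only the batched reading uses the TF API.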

TensorFlow

Write the get_batch function in input.py.

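import tensorflow as tf  # TensorFlow 1.x API (queue-based input pipeline)


def get_batch(X_train, y_train, img_w, img_h, color_type, batch_size, capacity):
  '''
  Args:
    X_train: list of training image paths
    y_train: list of training labels
    img_w: image width
    img_h: image height
    color_type: number of color channels (1 = grayscale, 3 = RGB)
    batch_size: batch size
    capacity: maximum number of elements in the queue
  Returns:
    X_train_batch: 4D tensor [batch_size, height, width, channel],
            dtype=tf.float32
    y_train_batch: 2D one-hot tensor [batch_size, 10], dtype=tf.float32
  '''
  X_train = tf.cast(X_train, tf.string)
  y_train = tf.cast(y_train, tf.int32)

  # Build an input queue that yields one (path, label) pair at a time.
  input_queue = tf.train.slice_input_producer([X_train, y_train])

  y_train = input_queue[1]
  # Only the image for the dequeued path is read and decoded.
  X_train_contents = tf.read_file(input_queue[0])
  X_train = tf.image.decode_jpeg(X_train_contents, channels=color_type)

  # Nearest-neighbor resizing keeps the uint8 dtype, so cast afterwards
  # to match the float32 output documented above.
  X_train = tf.image.resize_images(X_train, [img_h, img_w],
                   tf.image.ResizeMethod.NEAREST_NEIGHBOR)
  X_train = tf.cast(X_train, tf.float32)

  X_train_batch, y_train_batch = tf.train.batch([X_train, y_train],
                         batch_size=batch_size,
                         num_threads=64,
                         capacity=capacity)
  # One-hot encode the labels; 10 is the number of classes.
  y_train_batch = tf.one_hot(y_train_batch, 10)

  return X_train_batch, y_train_batch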

Train in train.py. (This is not pure TF code: model.fit is the Keras fitting call. Swap in the pure-TF equivalent if you need one.)

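import tensorflow as tf  # TensorFlow 1.x API

import input as inp  # the input.py module above (import name assumed)

# X_train/y_train and X_valid/y_valid are path and label lists, and model is
# a compiled Keras model; all are assumed to be defined earlier in train.py.
X_train_batch, y_train_batch = inp.get_batch(X_train, y_train,
                       img_w, img_h, color_type,
                       train_batch_size, capacity)
X_valid_batch, y_valid_batch = inp.get_batch(X_valid, y_valid,
                       img_w, img_h, color_type,
                       valid_batch_size, capacity)

# Create the checkpoint callback once, so the best val_loss is tracked
# across all chunks instead of being reset on every step.
ckpt_path = 'log/weights-{val_loss:.4f}.hdf5'
ckpt = tf.keras.callbacks.ModelCheckpoint(ckpt_path,
                     monitor='val_loss',
                     verbose=1,
                     save_best_only=True,
                     mode='min')

with tf.Session() as sess:
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)
  try:
    for step in range(max_step):
      if coord.should_stop():
        break
      # Pull one RAM-sized chunk of decoded images from each queue.
      X_chunk, y_chunk = sess.run([X_train_batch, y_train_batch])
      X_val_chunk, y_val_chunk = sess.run([X_valid_batch, y_valid_batch])

      # Keras splits the chunk again into GPU-sized batches of 64.
      model.fit(X_chunk, y_chunk, batch_size=64,
             epochs=50, verbose=1,
             validation_data=(X_val_chunk, y_val_chunk),
             callbacks=[ckpt])

      # Release the chunk before loading the next one.
      del X_chunk, y_chunk, X_val_chunk, y_val_chunk

  except tf.errors.OutOfRangeError:
    print('done!')
  finally:
    coord.request_stop()
  coord.join(threads)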

Keras

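Keras provides generator-based variants of fit, predict, and evaluate (fit_generator, predict_generator, evaluate_generator). The generator is what solves the batching problem.

Key function: fit_generator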

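import cv2
import numpy as np


# Image-loading function
def get_im_cv2(paths, img_rows, img_cols, color_type=1, normalize=True):
  '''
  Args:
    paths: list of image paths to read
    img_rows: image height in pixels (rows)
    img_cols: image width in pixels (columns)
    color_type: number of color channels (1 = grayscale, 3 = color)
    normalize: if True, scale pixel values from [0, 255] to [-1, 1]
  Returns:
    imgs: array of shape (len(paths), img_rows, img_cols, color_type)
  '''
  imgs = []
  for path in paths:
    if color_type == 1:
      # Load as grayscale
      img = cv2.imread(path, 0)
    elif color_type == 3:
      # Load as 3-channel BGR (OpenCV's default channel order)
      img = cv2.imread(path)
    else:
      raise ValueError('color_type must be 1 or 3')
    if img is None:
      # cv2.imread returns None instead of raising on an unreadable file
      raise IOError('could not read image: ' + path)
    # Reduce size; cv2.resize takes (width, height)
    resized = cv2.resize(img, (img_cols, img_rows))
    if normalize:
      resized = resized.astype('float32')
      resized /= 127.5
      resized -= 1.

    imgs.append(resized)

  return np.array(imgs).reshape(len(paths), img_rows, img_cols, color_type)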

The batch function, which is really just a generator:

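def get_train_batch(X_train, y_train, batch_size, img_w, img_h, color_type, is_augmentation):
  '''
  Args:
    X_train: list of all image paths
    y_train: list of the labels for all images
    batch_size: batch size
    img_w: image width
    img_h: image height
    color_type: number of color channels
    is_augmentation: whether to apply data augmentation
  Returns:
    a generator yielding x: a batch of images, y: the matching labels
  '''
  while 1:
    for i in range(0, len(X_train), batch_size):
      # get_im_cv2 expects (rows, cols), i.e. (height, width)
      x = get_im_cv2(X_train[i:i+batch_size], img_h, img_w, color_type)
      y = y_train[i:i+batch_size]
      if is_augmentation:
        # Data augmentation; img_augmentation is a user-defined helper not shown here
        x, y = img_augmentation(x, y)
      # The yield is the key part: it hands one batch back to the caller, and the
      # loop resumes from here on the next request. It is like a machine that keeps
      # a running sum but reports every intermediate result until all numbers are added.
      # 'input' and 'output' must match the names of the model's input and output layers.
      yield({'input': x}, {'output': y})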
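The generator pattern used above can be tried without any images. The following is a minimal sketch in which `batch_forever` is a hypothetical stand-in for get_train_batch and a list of integers stands in for the image paths. It shows that each request produces one batch and that the generator wraps around to the start of the data instead of ending.

```python
import itertools


def batch_forever(items, batch_size):
    """Cycle over items forever, yielding one batch (a list slice) at a time."""
    while True:
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]


gen = batch_forever(list(range(5)), batch_size=2)
# Take the first four batches; the generator wraps around after [4].
first_four = list(itertools.islice(gen, 4))
print(first_four)  # [[0, 1], [2, 3], [4], [0, 1]]
```

Because the generator never ends on its own, the caller has to say how many batches make up one epoch, which is the role of steps_per_epoch in the training call.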

The training call:

# ckpt and early_stop are Keras callbacks created elsewhere in the script.
result = model.fit_generator(
     generator=get_train_batch(X_train, y_train, train_batch_size, img_w, img_h, color_type, True),
     steps_per_epoch=1351,   # number of training batches per epoch
     epochs=50, verbose=1,
     validation_data=get_train_batch(X_valid, y_valid, valid_batch_size, img_w, img_h, color_type, False),
     validation_steps=52,    # number of validation batches per evaluation
     callbacks=[ckpt, early_stop],
     max_queue_size=capacity,
     workers=1)
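The values 1351 and 52 are specific to the author's dataset and batch sizes, which the article does not state. A common way to derive such values, sketched here with a hypothetical helper `steps_for` and made-up sizes, is to divide the number of samples by the batch size and round up, so that the last partial batch is still counted.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of generator batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Hypothetical sizes, for illustration only.
print(steps_for(10000, 32))  # 313
```

Rounding up matches the generator above, which yields a shorter final batch when the data size is not a multiple of the batch size.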

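It really is that simple. But getting from zero to one was grueling: no progress day after day, no leads, and impatience crowding out clear thinking. Once you get through that stage, things start to go smoothly. That is not luck; it is the payoff of every step taken on the way from zero to one. The more of those steps you take, the smoother the road ahead.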

- Author -

刘开心_8a6c
