Two solutions for TensorFlow model training that keeps getting slower

Posted in Python on February 07, 2020

1 Solutions

【Solution 1】

Create the Saver at global scope, that is, outside the TensorFlow session.

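import tensorflow as tf  # TensorFlow 1.x API

# Create the Saver once, outside the session: this is the key step.
# train_op_1, train_loss, train_acc, summary_op, feed_dict and STEPS are assumed to be defined earlier.
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    # Initialize the variables (needed when training from scratch)
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op_1, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save a checkpoint; the step number has to be passed as global_step
        saver.save(sess, save_path="./model/path", global_step=i)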

【Solution 2】

Building on Solution 1, also define the model structure outside the session.

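import tensorflow as tf  # TensorFlow 1.x API

# network_model, inputs, labels, keep_prob, learning_rate, x_batch, y_batch,
# summary_op and STEPS are assumed to be defined earlier.
# Predictions
train_logits = network_model.inference(inputs, keep_prob)
# Loss
train_loss = network_model.losses(train_logits)
# Optimizer step
train_op = network_model.train(train_loss, learning_rate)
# Accuracy
train_acc = network_model.evaluation(train_logits, labels)
# Model inputs
feed_dict = {inputs: x_batch, labels: y_batch, keep_prob: 0.5}
# Create the Saver once, outside the session
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    # Initialize the variables (needed when training from scratch)
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save a checkpoint; the step number has to be passed as global_step
        saver.save(sess, save_path="./model/path", global_step=i)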

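2 Timing tests

Timing the training program under the different approaches gives different results. If the graph structure is loaded again on every training step, the time per step rises step by step. The more steps there are, the slower the later ones become, and eventually the graph can blow up and training terminates.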
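The article does not show how the per-step times were measured. The following is a minimal sketch of one way to do it, using only the standard library. The names time_steps and report are hypothetical helpers introduced here, and step_fn stands for whatever wraps your sess.run call.

```python
import time


def time_steps(step_fn, steps):
    """Call step_fn(i) for each step and return the per-step durations in seconds."""
    durations = []
    for i in range(steps):
        start = time.perf_counter()
        step_fn(i)  # in a real program this would wrap sess.run(...)
        durations.append(time.perf_counter() - start)
    return durations


def report(durations):
    """Print the durations in the same layout as the logs in this article."""
    for i, cost in enumerate(durations):
        print("step: {}, time cost: {}".format(i, cost))


if __name__ == "__main__":
    # Stand-in step that does nothing, just to show the call pattern.
    report(time_steps(lambda i: None, 3))
```

If the durations climb steadily across steps, as in the first log, that is a hint that something in the loop is adding to the graph.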

【Step time accumulating】

2019-05-15 10:55:29.009205: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
step: 10, time cost: 1.6320762634277344
step: 11, time cost: 1.6317598819732666
step: 12, time cost: 1.6570467948913574
step: 13, time cost: 1.6584930419921875
step: 14, time cost: 1.6765813827514648
step: 15, time cost: 1.6751370429992676
step: 16, time cost: 1.7304580211639404
step: 17, time cost: 1.7583982944488525

【Step time stable】

2019-05-15 13:03:49.394354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7048 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:00:0d.0, compute capability: 6.1)
step: 0, time cost: 1.9781079292297363
loss1:6.78, loss2:5.47, loss3:5.27, loss4:7.31, loss5:5.44, loss6:6.87, loss7: 6.84
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
step: 6, time cost: 0.09669041633605957
step: 7, time cost: 0.09666872024536133
step: 8, time cost: 0.09651994705200195
step: 9, time cost: 0.09705543518066406
step: 10, time cost: 0.09690332412719727

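3 Cause analysis

(1) TensorFlow builds computations as a graph made up of nodes (operations) and edges (the tensors that flow between them). Every call that builds part of the graph, such as creating an op or a tf.train.Saver, adds new nodes and edges; sess.run then executes the graph that exists at that moment;

(2) The graph structure is a double-edged sword: it can speed up computation and make design more efficient, but a poorly designed program turns it into a liability and training gets slower and slower;

(3) Training slows down when each iteration around sess.run adds nodes to the graph or loads the graph structure again. The graph keeps gaining nodes and edges, so every later step has to work with a larger graph than the one before;

(4) Constructing tf.train.Saver() adds save and restore ops for the model's variables to the graph. If the training program constructs it on every update, each iteration adds another set of ops, which inevitably slows training down;

(5) Therefore, construct the Saver at global scope so that the graph structure is built only once, and every later step only trains the parameters in that graph. This keeps the training speed at its original level;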
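The graph growth described in the points above can be observed directly by counting the operations in the graph after each step. This is a small sketch, not the author's code: it assumes TensorFlow is installed and uses the tensorflow.compat.v1 module so that it also runs under TensorFlow 2.x. The function count_ops_per_step and the toy variable are hypothetical stand-ins for a real model.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions


def count_ops_per_step(steps, build_inside_loop):
    """Return the number of graph operations observed after each step."""
    graph = tf.Graph()
    counts = []
    with graph.as_default():
        weight = tf.Variable(1.0, name="weight")  # toy stand-in for a model
        update = tf.assign_add(weight, 1.0)       # toy stand-in for a train op
        tf.train.Saver()                          # built once, before the loop
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            for _ in range(steps):
                if build_inside_loop:
                    # Anti-pattern: every new Saver adds save/restore ops again.
                    tf.train.Saver()
                sess.run(update)
                counts.append(len(graph.get_operations()))
    return counts


if __name__ == "__main__":
    print("Saver outside loop:", count_ops_per_step(3, False))
    print("Saver inside loop: ", count_ops_per_step(3, True))
```

With the Saver built once, the count stays the same at every step; with a Saver built inside the loop, it rises at every step.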

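4 Summary

(1) When designing the training program, build the graph structure only once;

(2) Constructing tf.train.Saver() adds ops to the graph, so instantiate it at global scope, outside the session and therefore outside the training loop, to stop training from getting slower and slower.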
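One way to enforce the "build the graph only once" rule, which the original article does not mention, is to finalize the graph before the training loop. After Graph.finalize(), any attempt to add an op raises an error instead of silently slowing training down. The sketch below assumes TensorFlow is installed and uses tensorflow.compat.v1; the function name and the toy variable are hypothetical.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and sessions


def finalized_graph_rejects_new_ops():
    """Return True if adding a Saver after finalize() raises RuntimeError."""
    graph = tf.Graph()
    with graph.as_default():
        weight = tf.Variable(1.0, name="weight")  # toy stand-in for a model
        update = tf.assign_add(weight, 1.0)       # toy stand-in for a train op
        init = tf.global_variables_initializer()
        tf.train.Saver()                          # built once, before finalize
        graph.finalize()                          # the graph is now read-only
        with tf.Session(graph=graph) as sess:
            sess.run(init)
            sess.run(update)                      # running existing ops still works
            try:
                tf.train.Saver()                  # would add ops, so it is rejected
            except RuntimeError:
                return True
    return False


if __name__ == "__main__":
    print(finalized_graph_rejects_new_ops())
```

Everything the loop needs, including the initializer and the Saver, has to be created before the call to finalize().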

- Author -

xdq101
