
Generating Video Subtitles with Python and Baidu Speech Recognition

Posted in Python on April 09, 2020

Extracting the audio from the video

Install moviepy:

pip install moviepy

The relevant code:

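import os
from moviepy.editor import VideoFileClip  # moviepy 1.x import path

# Write the audio track as a WAV file, resampled to 16 kHz ('-ar') and mono ('-ac')
audio_file = os.path.join(work_path, 'out.wav')
video = VideoFileClip(video_file)
video.audio.write_audiofile(audio_file, ffmpeg_params=['-ar', '16000', '-ac', '1'])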

Splitting the audio at silences

Use the audio library pydub. Install it with:

pip install pydub

First method:

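from pydub import AudioSegment
from pydub.silence import split_on_silence

# Assumed here: sound is loaded from the WAV file extracted above
sound = AudioSegment.from_wav(audio_file)

# Anything quieter than sound.dBFS * 1.3 counts as silence (dBFS is negative, so this
# threshold sits below the average level). Split wherever the silence lasts at least 500 ms.
sounds = split_on_silence(sound, min_silence_len=500, silence_thresh=sound.dBFS * 1.3)

# Total duration of the resulting chunks, in milliseconds
total_ms = sum(len(chunk) for chunk in sounds)
print('split duration is ', total_ms)
print('dBFS: {0}, max_dBFS: {1}, duration: {2}, split: {3}'.format(round(sound.dBFS, 2), round(sound.max_dBFS, 2), sound.duration_seconds, len(sounds)))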


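The durations of the split chunks do not add up correctly, and the chunks carry no position in the original audio, so they are hard to place on a timeline. Let's switch to another method: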

# Segment the audio by searching for the non-silent sections
# Reference: https://wqian.net/blog/2018/1128-python-pydub-split-mp3-index.html
from pydub.silence import detect_nonsilent

# Returns a list of [start_ms, end_ms] pairs
timestamp_list = detect_nonsilent(sound, min_silence_len=500, silence_thresh=sound.dBFS * 1.3, seek_step=1)

for start, end in timestamp_list:
    print("Section is :", [start, end], "duration is:", end - start)
print('dBFS: {0}, max_dBFS: {1}, duration: {2}, split: {3}'.format(round(sound.dBFS, 2), round(sound.max_dBFS, 2), sound.duration_seconds, len(timestamp_list)))

The output lists each section's start and end time in milliseconds, followed by its duration.

This is easier to work with, since every section comes with its position in the original audio.
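To make the two parameters concrete, here is a simplified, pure-Python illustration of threshold-based detection. It is not pydub's implementation: it assumes the loudness has already been measured once per fixed-size window, and the function name `nonsilent_ranges` and its arguments are made up for this sketch. A quiet stretch only ends a section when it lasts at least `min_silence_ms`.

```python
def nonsilent_ranges(levels_db, window_ms, min_silence_ms, thresh_db):
    """Return [start_ms, end_ms] ranges whose level stays above thresh_db.

    levels_db holds one loudness value (in dBFS) per window of window_ms.
    Quiet runs shorter than min_silence_ms do not break a range.
    """
    ranges = []
    start = None     # start (ms) of the current non-silent range, if any
    quiet_run = 0    # length (ms) of the current run of quiet windows
    for i, level in enumerate(levels_db):
        t = i * window_ms
        if level > thresh_db:
            if start is None:
                start = t
            quiet_run = 0
        elif start is not None:
            quiet_run += window_ms
            if quiet_run >= min_silence_ms:
                # The range ended where this quiet run began
                ranges.append([start, t + window_ms - quiet_run])
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended inside a range; drop any trailing quiet windows
        ranges.append([start, len(levels_db) * window_ms - quiet_run])
    return ranges


levels = [-20, -20, -60, -20, -60, -60, -60, -60, -60, -20]
print(nonsilent_ranges(levels, window_ms=100, min_silence_ms=500, thresh_db=-40))
# [[0, 400], [900, 1000]]
```

The single quiet window at 200 ms is too short to split the first section, while the 500 ms gap that follows does split it.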

Using Baidu speech recognition

First, create an application on the Baidu AI Cloud (百度智能云) platform to obtain an API Key and a Secret Key.

Obtaining an Access Token

Baidu's AI products require authorization. A certain quota is free, which is enough for generating subtitles.

import json
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# API_KEY, SECRET_KEY, TOKEN_URL and SCOPE are configuration values defined elsewhere in the script.


class DemoError(Exception):
    pass


def fetch_token():
    """Fetch an Access Token from Baidu AI Cloud."""
    params = {'grant_type': 'client_credentials',
              'client_id': API_KEY,
              'client_secret': SECRET_KEY}
    post_data = urlencode(params).encode('utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        with urlopen(req) as f:
            result_str = f.read()
    except HTTPError as err:
        # The server answered with an error status; its body usually explains why
        print('token http response http code : ' + str(err.code))
        result_str = err.read()
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...)
        raise DemoError('token request failed: ' + str(err.reason))

    result = json.loads(result_str.decode('utf-8'))
    if 'access_token' in result and 'scope' in result:
        if SCOPE and SCOPE not in result['scope'].split(' '):  # SCOPE = False skips the check
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')

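Recognizing speech from raw data

Here we use the fast edition (极速版) of Baidu's speech recognition API to turn the audio into text. According to the official description, it runs on a dedicated GPU service cluster, its response speed improves by a factor of 2 over the standard API, and its recognition accuracy improves by 15%. It is aimed at near-field, short-utterance interaction such as voice search on phones and chat input. It accepts a complete recording file of at most 60 seconds and returns the recognition result in real time.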
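Because of the 60-second cap, any section longer than that has to be cut down before it is uploaded. The following is a minimal sketch of one way to do it, assuming the sections are `[start_ms, end_ms]` pairs like the ones in `timestamp_list`. The helper name `cap_ranges` is hypothetical, and the hard cut at exactly 60 seconds is a simplification that may fall in the middle of a word.

```python
MAX_MS = 60 * 1000  # assumed from the 60-second limit quoted above


def cap_ranges(ranges, max_ms=MAX_MS):
    """Split every [start_ms, end_ms] range longer than max_ms into consecutive pieces."""
    capped = []
    for start, end in ranges:
        while end - start > max_ms:
            capped.append([start, start + max_ms])
            start += max_ms
        if end > start:
            capped.append([start, end])
    return capped


print(cap_ranges([[0, 5000], [10000, 140000]]))
# [[0, 5000], [10000, 70000], [70000, 130000], [130000, 140000]]
```

Sections that are already short enough pass through unchanged, so the helper can be applied to the whole list.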

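from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# CUID, DEV_PID, FORMAT, RATE and ASR_URL are configuration values defined elsewhere in the script,
# as is the DemoError exception class. The subtitle loop passes pydub raw_data (headerless PCM),
# so FORMAT should be 'pcm' and RATE should match the 16000 Hz used when extracting the audio.


def asr_raw(speech_data, token):
    """Send raw audio bytes to the recognition API and return the JSON response as a string."""
    length = len(speech_data)
    if length == 0:
        raise DemoError('file length read 0 bytes')

    params = {'cuid': CUID, 'token': token, 'dev_pid': DEV_PID}
    # Use the following line instead when testing a self-trained model
    # params = {'cuid': CUID, 'token': token, 'dev_pid': DEV_PID, 'lm_id': LM_ID}
    params_query = urlencode(params)

    headers = {
        'Content-Type': 'audio/' + FORMAT + '; rate=' + str(RATE),
        'Content-Length': str(length)
    }

    url = ASR_URL + "?" + params_query
    req = Request(url, speech_data, headers)
    try:
        with urlopen(req) as f:
            result_str = f.read()
    except HTTPError as err:
        # The server answered with an error status; return its body for the caller to inspect
        result_str = err.read()
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...)
        raise DemoError('asr request failed: ' + str(err.reason))

    return result_str.decode('utf-8')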

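Generating the subtitles

Subtitle format: https://www.cnblogs.com/tocy/p/subtitle-format-srt.html

Generating subtitles is really just an application of speech recognition: assemble the recognized text according to the srt subtitle format and you are done. See the article above for the details of the format. The code is as follows: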

token = fetch_token()
text = []
idx = 1  # SRT entries are conventionally numbered from 1
for start, end in timestamp_list:
    # Raw PCM bytes for this section
    data = sound[start:end].raw_data
    str_rst = asr_raw(data, token)
    result = json.loads(str_rst)

    if result['err_no'] == 0:
        # format_time is a helper that turns seconds into an SRT timestamp (HH:MM:SS,mmm)
        text.append('{0}\n{1} --> {2}\n'.format(idx, format_time(start / 1000), format_time(end / 1000)))
        text.append(result['result'][0])
        text.append('\n\n')  # a blank line separates SRT entries
        idx = idx + 1
        print(format_time(start / 1000), "txt is ", result['result'][0])

# "w" creates the file if it does not exist and discards any previous content
with open(srt_file, "w", encoding="utf-8") as f:
    f.writelines(text)
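The `format_time` helper is called in the code above but its definition is not shown. Here is one possible version, a sketch that assumes the argument is a time in seconds and that the desired output is the `HH:MM:SS,mmm` form used by srt files. The original author's helper may differ.

```python
def format_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3600 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    secs, millis = divmod(rem, 1000)
    return '{:02d}:{:02d}:{:02d},{:03d}'.format(hours, minutes, secs, millis)


print(format_time(75.25))   # 00:01:15,250
print(format_time(3661.5))  # 01:01:01,500
```

Working in whole milliseconds before splitting into fields avoids rounding a value such as 59.9996 seconds up to an invalid `60,000` in the seconds field.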

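Summary

I downloaded a video from a video site to test with. The fast edition gave the best results in both speed and recognition rate, and it felt easier to use than the NetEase Jianwai (网易见外) platform.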


- Author -

孙??
