Python统计纯文本文件中英文单词出现个数的方法总结【测试可用】


Posted in Python onJuly 25, 2018

本文实例讲述了Python统计纯文本文件中英文单词出现个数的方法。分享给大家供大家参考,具体如下:

第一版: 效率低

# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path,encoding='utf-8',newline='') as f:
  word = []
  words_dict= {}
  for letter in f.read():
    if letter.isalnum():
      word.append(letter)
    elif letter.isspace(): #空白字符 空格 \t \n
      if word:
        word = ''.join(word).lower() #转小写
        if word not in words_dict:
          words_dict[word] = 1
        else:
          words_dict[word] += 1
        word = []
#处理最后一个单词
if word:
  word = ''.join(word).lower() # 转小写
  if word not in words_dict:
    words_dict[word] = 1
  else:
    words_dict[word] += 1
  word = []
for k,v in words_dict.items():
  print(k,v)

运行结果:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

第二版:

缺点:遇到大文件要一次读入内存,性能不好

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path,'r',encoding='utf-8') as f:
  data = f.read()
  word_reg = re.compile(r'\w+')
  #word_reg = re.compile(r'\w+\b')
  word_list = word_reg.findall(data)
  word_list = [word.lower() for word in word_list] #转小写
  word_set = set(word_list) #避免重复查询
  # words_dict = {}
  # for word in word_set:
  #   words_dict[word] = word_list.count(word)
  # 简洁写法
  words_dict = {word: word_list.count(word) for word in word_set}
  for k,v in words_dict.items():
    print(k,v)

运行结果:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1

第三版:

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    #line_words = word_reg.findall(line)
    #比上面的正则更加简单
    line_words = line.split()
    word_list.extend(line_words)
  word_set = set(word_list) # 避免重复查询
  words_dict = {word: word_list.count(word) for word in word_set}
  for k, v in words_dict.items():
    print(k, v)

运行结果:

childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1

第四版:使用Counter统计

# -*- coding:utf-8 -*-
#!python3
import collections
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    line_words = line.split()
    word_list.extend(line_words)
  words_dict = dict(collections.Counter(word_list)) #使用Counter统计
  for k, v in words_dict.items():
    print(k, v)

运行结果:

We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1

注:这里使用的测试文本test.txt如下:

We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.

Python 相关文章推荐
《Python之禅》中对于Python编程过程中的一些建议
Apr 03 Python
Python实现批量修改文件名实例
Jul 08 Python
详解python3百度指数抓取实例
Dec 12 Python
Python中多个数组行合并及列合并的方法总结
Apr 12 Python
python实现zabbix发送短信脚本
Sep 17 Python
python如果快速判断数字奇数偶数
Nov 13 Python
后端开发使用pycharm的技巧(推荐)
Mar 27 Python
TensorFLow 数学运算的示例代码
Apr 21 Python
python3 使用openpyxl将mysql数据写入xlsx的操作
May 15 Python
python中random.randint和random.randrange的区别详解
Sep 20 Python
python如何写个俄罗斯方块
Nov 06 Python
pytorch model.cuda()花费时间很长的解决
Jun 01 Python
基于DataFrame改变列类型的方法
Jul 25 #Python
对pandas中Series的map函数详解
Jul 25 #Python
基于pandas将类别属性转化为数值属性的方法
Jul 25 #Python
Django实现支付宝付款和微信支付的示例代码
Jul 25 #Python
Python走楼梯问题解决方法示例
Jul 25 #Python
python 批量修改/替换数据的实例
Jul 25 #Python
django 实现电子支付功能的示例代码
Jul 25 #Python
You might like
让你同时上传 1000 个文件 (一)
2006/10/09 PHP
解读PHP的Yii框架中请求与响应的处理流程
2016/03/17 PHP
用js模拟JQuery的show与hide动画函数代码
2010/09/20 Javascript
javascript重复绑定事件造成的后果说明
2013/03/02 Javascript
jquery实现简单的拖拽效果实例兼容所有主流浏览器
2013/06/21 Javascript
js遍历子节点子元素附属性及方法
2014/08/19 Javascript
javascript匿名函数实例分析
2014/11/18 Javascript
JavaScript实现Base64编码转换
2016/04/23 Javascript
JS实现环形进度条(从0到100%)效果
2016/07/05 Javascript
js实现StringBuffer的简单实例
2016/09/02 Javascript
基于JavaScript实现购物网站商品放大镜效果
2016/09/06 Javascript
Ajax验证用户名或昵称是否已被注册
2017/04/05 Javascript
Bootstrap Table使用整理(一)
2017/06/09 Javascript
vuex state及mapState的基础用法详解
2018/04/19 Javascript
浅谈super-vuex使用体验
2018/06/25 Javascript
vue使用pdfjs显示PDF可复制的实现方法
2018/12/14 Javascript
详解Vue2.5+迁移至Typescript指南
2019/08/01 Javascript
vue中使用rem布局代码详解
2019/10/30 Javascript
基于Angular 8和Bootstrap 4实现动态主题切换的示例代码
2020/02/11 Javascript
vue抽出组件并传值实例
2020/07/31 Javascript
antd的select下拉框因为数据量太大造成卡顿的解决方式
2020/10/31 Javascript
Python Web框架Flask中使用新浪SAE云存储实例
2015/02/08 Python
python使用tornado实现简单爬虫
2018/07/28 Python
python openpyxl使用方法详解
2019/07/18 Python
解决Django layui {{}}冲突的问题
2019/08/29 Python
Python+OpenCV图像处理——实现轮廓发现
2020/10/23 Python
python中if嵌套命令实例讲解
2021/02/25 Python
HTML5 video 事件应用示例
2014/09/11 HTML / CSS
香港个人化生活购物网站:Ballyhoo Limited
2016/09/10 全球购物
英国家具、照明、家居用品网上商店:Wayfair.co.uk
2020/02/13 全球购物
2014全国两会大学生学习心得体会
2014/03/10 职场文书
企业金融服务方案
2014/06/03 职场文书
教师党员自我剖析材料
2014/09/29 职场文书
2016七一建党节慰问信
2015/11/30 职场文书
mysql知识点整理
2021/04/05 MySQL
利用前端HTML+CSS+JS开发简单的TODOLIST功能(记事本)
2021/04/13 Javascript