Python统计纯文本文件中英文单词出现个数的方法总结【测试可用】


Posted in Python onJuly 25, 2018

本文实例讲述了Python统计纯文本文件中英文单词出现个数的方法。分享给大家供大家参考,具体如下:

第一版: 效率低

# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path,encoding='utf-8',newline='') as f:
  word = []
  words_dict= {}
  for letter in f.read():
    if letter.isalnum():
      word.append(letter)
    elif letter.isspace(): #空白字符 空格 \t \n
      if word:
        word = ''.join(word).lower() #转小写
        if word not in words_dict:
          words_dict[word] = 1
        else:
          words_dict[word] += 1
        word = []
#处理最后一个单词
if word:
  word = ''.join(word).lower() # 转小写
  if word not in words_dict:
    words_dict[word] = 1
  else:
    words_dict[word] += 1
  word = []
for k,v in words_dict.items():
  print(k,v)

运行结果:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

第二版:

缺点:遇到大文件要一次读入内存,性能不好

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path,'r',encoding='utf-8') as f:
  data = f.read()
  word_reg = re.compile(r'\w+')
  #word_reg = re.compile(r'\w+\b')
  word_list = word_reg.findall(data)
  word_list = [word.lower() for word in word_list] #转小写
  word_set = set(word_list) #避免重复查询
  # words_dict = {}
  # for word in word_set:
  #   words_dict[word] = word_list.count(word)
  # 简洁写法
  words_dict = {word: word_list.count(word) for word in word_set}
  for k,v in words_dict.items():
    print(k,v)

运行结果:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1

第三版:

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    #line_words = word_reg.findall(line)
    #比上面的正则更加简单
    line_words = line.split()
    word_list.extend(line_words)
  word_set = set(word_list) # 避免重复查询
  words_dict = {word: word_list.count(word) for word in word_set}
  for k, v in words_dict.items():
    print(k, v)

运行结果:

childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1

第四版:使用Counter统计

# -*- coding:utf-8 -*-
#!python3
import collections
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    line_words = line.split()
    word_list.extend(line_words)
  words_dict = dict(collections.Counter(word_list)) #使用Counter统计
  for k, v in words_dict.items():
    print(k, v)

运行结果:

We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1

注:这里使用的测试文本test.txt如下:

We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.

Python 相关文章推荐
python基础教程之数字处理(math)模块详解
Mar 25 Python
python创建线程示例
May 06 Python
python实现2048小游戏
Mar 30 Python
Python实现判断给定列表是否有重复元素的方法
Apr 11 Python
python实时监控cpu小工具
Jun 21 Python
python pexpect ssh 远程登录服务器的方法
Feb 14 Python
Pycharm最新激活码2019(推荐)
Dec 31 Python
Python3连接Mysql8.0遇到的问题及处理步骤
Feb 17 Python
python小程序之4名牌手洗牌发牌问题解析
May 15 Python
python代码中怎么换行
Jun 17 Python
Python常用外部指令执行代码实例
Nov 05 Python
关于django python manage.py startapp 应用名出错异常原因解析
Dec 15 Python
基于DataFrame改变列类型的方法
Jul 25 #Python
对pandas中Series的map函数详解
Jul 25 #Python
基于pandas将类别属性转化为数值属性的方法
Jul 25 #Python
Django实现支付宝付款和微信支付的示例代码
Jul 25 #Python
Python走楼梯问题解决方法示例
Jul 25 #Python
python 批量修改/替换数据的实例
Jul 25 #Python
django 实现电子支付功能的示例代码
Jul 25 #Python
You might like
一个更简单的无限级分类菜单代码
2007/01/16 PHP
深入解析php中的foreach函数
2013/08/31 PHP
php实现简单的语法高亮函数实例分析
2015/04/27 PHP
使用PHP如何实现高效安全的ftp服务器(二)
2015/12/30 PHP
PHP+MariaDB数据库操作基本技巧备忘总结
2018/05/21 PHP
php过滤htmlspecialchars() 函数实现把预定义的字符转换为 HTML 实体用法分析
2019/06/25 PHP
JS option location 页面跳转实现代码
2008/12/27 Javascript
jQuery 第二课 操作包装集元素代码
2010/03/14 Javascript
JS中confirm,alert,prompt函数区别分析
2011/01/17 Javascript
JS实现div内部的文字或图片自动循环滚动代码
2013/04/19 Javascript
jQuery实现企业网站横幅焦点图切换功能实例
2015/04/30 Javascript
详解JavaScript中的客户端消息框架设计原理
2015/06/24 Javascript
如何用angularjs制作一个完整的表格
2016/01/21 Javascript
zTree实现节点修改的实时刷新功能
2017/03/20 Javascript
js实现分页功能
2017/05/24 Javascript
fetch 使用及如何接收JS传值
2017/11/11 Javascript
nodejs基于WS模块实现WebSocket聊天功能的方法
2018/01/12 NodeJs
详解如何用webpack4从零开始构建react开发环境
2019/01/27 Javascript
Vue Cli3 打包配置并自动忽略console.log语句的方法
2020/04/23 Javascript
Vue+ElementUI 中级联选择器Bug问题的解决
2020/07/31 Javascript
vue使用过滤器格式化日期
2021/01/20 Vue.js
跟老齐学Python之赋值,简单也不简单
2014/09/24 Python
使用Python的内建模块collections的教程
2015/04/28 Python
python数据类型判断type与isinstance的区别实例解析
2017/10/31 Python
Python字典常见操作实例小结【定义、添加、删除、遍历】
2019/10/25 Python
PyCharm2020.1.2社区版安装,配置及使用教程详解(Windows)
2020/08/07 Python
Python爬虫抓取论坛关键字过程解析
2020/10/19 Python
Python识别处理照片中的条形码
2020/11/16 Python
python 爬虫网页登陆的简单实现
2020/11/30 Python
7款设计巧妙的css3飘带状3D立体效果的导航菜单和表单窗口
2013/02/04 HTML / CSS
HTML5的Geolocation地理位置定位API使用教程
2016/05/12 HTML / CSS
英国门销售网站:Green Tree Doors
2020/01/07 全球购物
怎样建立和理解非常复杂的声明?例如定义一个包含N 个指向返回 指向字符的指针的函数的指针的数组?
2013/03/19 面试题
预备党员政审材料
2014/02/04 职场文书
民主评议党员个人总结
2015/02/13 职场文书
叶问观后感
2015/06/15 职场文书