Examples of forward and reverse maximum matching word segmentation in Python


Posted in Python on November 14, 2018

Forward maximum matching

# -*- coding:utf-8 -*-
# Written for Python 3.

CODEC = 'utf-8'

def u(s, encoding):
  'convert a byte string in the given encoding to a unicode string'
  if isinstance(s, str):
    return s
  else:
    return str(s, encoding)

def fwd_mm_seg(wordDict, maxLen, text):
  'forward max match segment'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # debug output: list every dictionary entry
  for word in wordDict:
    print('word: ', word)
  print("\n")
  while segStrLen > 0:
    # the first candidate is at most maxLen characters long
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[0:wordLen]
    print("subStr: ", subStr)
    # drop characters from the right until the candidate is in the dictionary;
    # a single remaining character is always accepted as a word
    while wordLen > 1:
      if subStr in wordDict:
        print("subStr1: %r" % subStr)
        break
      else:
        print("subStr2: %r" % subStr)
        wordLen = wordLen - 1
        subStr = subStr[0:wordLen]
    wordList.append(subStr)
    # continue with the text after the matched word
    segStr = segStr[wordLen:]
    segStrLen = segStrLen - wordLen
  for wordstr in wordList:
    print("wordstr: ", wordstr)
  return wordList


def main():
  # words.dic holds one word per line, encoded as UTF-8
  wordDict = {}
  with open('words.dic', 'rb') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = '你好世界hello world'
  print(segStr)
  wordList = fwd_mm_seg(wordDict, 10, segStr)
  print("==".join(wordList))


if __name__ == '__main__':
  main()
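
The script above needs a words.dic file that is not included in this post. As a minimal sketch of the same forward matching idea, the version below uses a small in-memory dictionary instead. The names forward_max_match and sample_dict, and the two dictionary entries, are assumptions made for illustration and are not part of the original code.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy forward maximum matching without debug output (illustrative sketch)."""
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it from the right.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character is accepted even if it is not in the dictionary.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words


# Hypothetical dictionary standing in for words.dic.
sample_dict = {"你好", "世界"}
result = forward_max_match("你好世界hello world", sample_dict, 10)
print("==".join(result))
```

With this dictionary the two Chinese words are matched, while "hello world" falls back to one character per token because none of its substrings are dictionary entries.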

Reverse maximum matching

# -*- coding:utf-8 -*-
# Written for Python 3.


def u(s, encoding):
  'convert a byte string in the given encoding to a unicode string'
  if isinstance(s, str):
    return s
  else:
    return str(s, encoding)

CODEC = 'utf-8'

def bwd_mm_seg(wordDict, maxLen, text):
  'backward max match segment'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # debug output: list every dictionary entry
  for word in wordDict:
    print('word: ', word)
  print("\n")
  while segStrLen > 0:
    # the first candidate is at most maxLen characters long
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    # take the candidate from the end of the remaining text
    subStr = segStr[-wordLen:]
    print("subStr: ", subStr)
    # drop characters from the left until the candidate is in the dictionary;
    # a single remaining character is always accepted as a word
    while wordLen > 1:
      if subStr in wordDict:
        print("subStr1: %r" % subStr)
        break
      else:
        print("subStr2: %r" % subStr)
        wordLen = wordLen - 1
        subStr = subStr[-wordLen:]
    wordList.append(subStr)
    # continue with the text before the matched word
    segStr = segStr[0:-wordLen]
    segStrLen = segStrLen - wordLen
  # words were collected from right to left, so restore reading order
  wordList.reverse()
  for wordstr in wordList:
    print("wordstr: ", wordstr)
  return wordList


def main():
  # words.dic holds one word per line, encoded as UTF-8
  wordDict = {}
  with open('words.dic', 'rb') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = '你好世界hello world'
  print(segStr)
  wordList = bwd_mm_seg(wordDict, 10, segStr)
  print("==".join(wordList))

if __name__ == '__main__':
  main()
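
The two directions can disagree on ambiguous input, which the sample string in this post does not show. The sketch below runs both on the same sentence with a small dictionary. The function names, the sentence, and the dictionary entries are assumptions chosen for illustration, and the functions are compact rewrites without debug output rather than the code above.

```python
def forward_max_match(text, dictionary, max_len):
    """Match the longest dictionary word starting at the left edge."""
    words = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character is accepted even if it is not in the dictionary.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Match the longest dictionary word ending at the right edge."""
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    # Words were collected right to left, so restore reading order.
    words.reverse()
    return words


# Hypothetical dictionary and sentence, chosen to expose the ambiguity.
sample_dict = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"

forward_result = forward_max_match(sentence, sample_dict, 4)
backward_result = backward_max_match(sentence, sample_dict, 4)
print("forward: ", "==".join(forward_result))
print("backward:", "==".join(backward_result))
```

Here the forward pass greedily takes 研究生 and leaves 命 as a stray single character, while the backward pass produces 研究, 生命, 起源. Which result is better depends on the dictionary and the text, so this is one example and not a general ranking of the two methods.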

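That is everything for this example of forward and reverse maximum matching word segmentation in Python. I hope it serves as a useful reference, and thank you for supporting 三水点靠木.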
