Python examples of forward maximum matching and backward maximum matching word segmentation


Posted in Python on November 14, 2018

Forward maximum matching

# -*- coding:utf-8 -*-
# Forward maximum matching word segmentation (Python 3).
 
CODEC='utf-8'
 
def u(s, encoding):
  'Decode a byte string in the given encoding to str; return str unchanged.'
  if isinstance(s, str):
    return s
  else:
    return s.decode(encoding)
 
def fwd_mm_seg(wordDict, maxLen, text):
  'Forward maximum matching segmentation.'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: list every dictionary entry.
  for word in wordDict:
    print('word: ', word)
  print("\n")
  while segStrLen > 0:
    # Candidate window: at most maxLen characters taken from the left end.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[0:wordLen]
    print("subStr: ", subStr)
    # Shrink the candidate from the right until it is a dictionary word
    # or only a single character is left.
    while wordLen > 1:
      if subStr in wordDict:
        print("subStr1: %r" % subStr)
        break
      else:
        print("subStr2: %r" % subStr)
        wordLen = wordLen - 1
        subStr = subStr[0:wordLen]
    wordList.append(subStr)
    segStr = segStr[wordLen:]
    segStrLen = segStrLen - wordLen
  for wordstr in wordList:
    print("wordstr: ", wordstr)
  return wordList
    
      
def main():
  # words.dic is read as one UTF-8 encoded dictionary word per line.
  wordDict = {}
  with open('words.dic', 'rb') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = '你好世界hello world'
  print(segStr)
  wordList = fwd_mm_seg(wordDict, 10, segStr)
  print("==".join(wordList))
  
 
if __name__ == '__main__':
  main()
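
The listing above depends on a words.dic file that the post does not include. As a quick way to try the idea, here is a minimal Python 3 sketch that uses an in-memory set instead. The name forward_max_match and the four dictionary entries are assumptions made for illustration; they are not taken from the original code or from any real dictionary file.

```python
def forward_max_match(words, max_len, text):
    """Greedy left-to-right segmentation; `words` is any container supporting `in`."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it from the right.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in words:
                result.append(candidate)
                pos += size
                break
    return result


# Hypothetical dictionary standing in for words.dic.
sample_words = {"你好", "世界", "hello", "world"}
segments = forward_max_match(sample_words, 10, "你好世界hello world")
print("==".join(segments))  # 你好==世界==hello== ==world
```

The space between "hello" and "world" is not a dictionary entry, so it comes out as its own single-character segment. That is the same fallback the original loop applies once the candidate has shrunk to length 1.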

Backward maximum matching

# -*- coding:utf-8 -*-
# Backward maximum matching word segmentation (Python 3).
 
 
def u(s, encoding):
  'Decode a byte string in the given encoding to str; return str unchanged.'
  if isinstance(s, str):
    return s
  else:
    return s.decode(encoding)
 
CODEC='utf-8'
 
def bwd_mm_seg(wordDict, maxLen, text):
  'Backward maximum matching segmentation.'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: list every dictionary entry.
  for word in wordDict:
    print('word: ', word)
  print("\n")
  while segStrLen > 0:
    # Candidate window: at most maxLen characters taken from the right end.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[-wordLen:]
    print("subStr: ", subStr)
    # Shrink the candidate from the left until it is a dictionary word
    # or only a single character is left.
    while wordLen > 1:
      if subStr in wordDict:
        print("subStr1: %r" % subStr)
        break
      else:
        print("subStr2: %r" % subStr)
        wordLen = wordLen - 1
        subStr = subStr[-wordLen:]
    wordList.append(subStr)
    segStr = segStr[0: -wordLen]
    segStrLen = segStrLen - wordLen
  # Words were collected from right to left, so restore reading order.
  wordList.reverse()
  for wordstr in wordList:
    print("wordstr: ", wordstr)
  return wordList
    
      
def main():
  # words.dic is read as one UTF-8 encoded dictionary word per line.
  wordDict = {}
  with open('words.dic', 'rb') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = '你好世界hello world'
  print(segStr)
  wordList = bwd_mm_seg(wordDict, 10, segStr)
  print("==".join(wordList))
 
if __name__ == '__main__':
  main()
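
The direction of the scan matters on ambiguous input. The following Python 3 sketch runs backward matching on one such sentence. The function name backward_max_match, the sample sentence, and the four dictionary entries are all assumptions chosen for illustration and do not come from the original words.dic.

```python
def backward_max_match(words, max_len, text):
    """Greedy right-to-left segmentation; `words` is any container supporting `in`."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    result = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end`, then shrink it from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in words:
                result.append(candidate)
                end -= size
                break
    # Segments were collected from right to left; restore reading order.
    result.reverse()
    return result


# Hypothetical dictionary; "研究生" and "生命" overlap in the sample sentence.
sample_words = {"研究", "研究生", "生命", "起源"}
segments = backward_max_match(sample_words, 3, "研究生命的起源")
print("/".join(segments))  # 研究/生命/的/起源
```

With the same dictionary and window size, a forward scan would take the longest match "研究生" first and leave "命" stranded as a single character. Scanning from the right avoids that for this sentence, though neither direction is guaranteed to be correct in general.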

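That is all for these examples of forward and backward maximum matching word segmentation in Python. I hope they serve as a useful reference, and thank you for supporting 三水点靠木.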
