libreoffice python 操作word及excel文档的方法


Posted in Python onJuly 04, 2019

1、开始、关闭libreoffice服务;

开始之前同步字体文件时间,是因为创建soffice服务时,服务会检查所需加载的文件的时间,如果其认为时间不符,则其可能会重新加载,耗时较长,因此需事先统一时间。

使用时如果需要多次调用,最后每次调用均开启后关闭,否则libreoffice会创建一个缓存文档并越用越大,处理时间会增加。

class OfficeProcess(object):
  def __init__(self):
    self.p = 0
    subprocess.Popen('find /usr/share/fonts | xargs touch -m -t 201801010000.00', shell=True)

  def start_office(self):
    self.p = subprocess.Popen('soffice --pidfile=sof.pid --invisible --accept="socket,host=localhost,port=2002;urp;"', shell=True)
    while True:
      try:
        local_context = uno.getComponentContext()
        resolver = local_context.getServiceManager().createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
        resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
        return
      except:
        print(ts(), "wait for connecting soffice...")
        time.sleep(1)
        continue

  def stop_office(self):
    with open("sof.pid", "rb") as f:
      try:
        os.kill(int(f.read()), signal.SIGTERM)
        self.p.wait()
      except:
        pass

2、init service manager

local_context = uno.getComponentContext()
    service_manager = local_context.getServiceManager()
    resolver = service_manager.createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
    self.ctx = resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
    self.smgr = self.ctx.ServiceManager
    self.desktop = self.smgr.createInstanceWithContext('com.sun.star.frame.Desktop', self.ctx)

3、从二进制数据中读取doc文档

def ImportFromMemory(self, data):
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(data), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream
    self.doc = {'doc': []}
    try:
      self.document = self.desktop.loadComponentFromURL('private:stream/swriter', '_blank', 0, (pv, ))
      self.text = self.document.getText()
    except:
      self.text = None

4、读取doc文档中的数据

def ExportToJson(self):
    try:
      l = self.__ParseText(self.text, self.__Callback(self.doc['doc']))
      self.doc['length'] = l
    except:
      self.doc = {'doc': [], 'length': 0}
    return json.dumps(self.doc)

@staticmethod
  def __Callback(alist):
    def Append(sth):
      alist.append(sth)
    return Append
def __ParseText(self, text, func):
    l = 0
    text_it = text.createEnumeration()
    while text_it.hasMoreElements():
      element = text_it.nextElement()
      if element.supportsService('com.sun.star.text.Paragraph'):
        l += self.__ParseParagraph(element, func)
      elif element.supportsService('com.sun.star.text.TextTable'):
        l += self.__ParseTable(element, func)
      else:
        pass
    return l
def __ParseParagraph(self, paragraph, func):
    p = {'paragraph': []}
    l = 0
    paragraph_it = paragraph.createEnumeration()
    while paragraph_it.hasMoreElements():
      portion = paragraph_it.nextElement()
      if portion.TextPortionType == 'Text':
        l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
      elif portion.TextPortionType == 'SoftPageBreak':
        pass
      elif portion.TextPortionType == 'TextField':
        l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
      else:
        l += self.__ParseTextContent(portion, self.__Callback(p['paragraph']))
    if hasattr(paragraph, 'createContentEnumeration'):
      l += self.__ParseTextContent(paragraph, self.__Callback(p['paragraph']))
    p['length'] = l
    func(p)
    return l

  def __ParseTextContent(self, textcontent, func):
    l = 0
    content_it = textcontent.createContentEnumeration('com.sun.star.text.TextContent')
    while content_it.hasMoreElements():
      element = content_it.nextElement()
      if element.supportsService('com.sun.star.text.TextGraphicObject'):
        l += self.__ParsePortionGraphic(element, func)
      elif element.supportsService('com.sun.star.text.TextEmbeddedObject'):
        pass
      elif element.supportsService('com.sun.star.text.TextFrame'):
        l += self.__ParseFrame(element, func)
      elif element.supportsService('com.sun.star.drawing.GroupShape'):
        l += self.__ParseGroup(element, func)
      else:
        pass
    return l

  def __ParseFrame(self, frame, func):
    f = {'frame': []}
    l = self.__ParseText(frame.getText(), self.__Callback(f['frame']))
    f['length'] = l
    func(f)
    return l

  def __ParseGroup(self, group, func):
    l = 0
    for i in range(group.getCount()):
      it = group.getByIndex(i)
      if it.supportsService('com.sun.star.drawing.Text'):
        l += self.__ParseFrame(it, func)
      else:
        pass
    return l

  def __ParsePortionText(self, portion_text, func):
    func({'portion': portion_text.String, 'length': len(portion_text.String)})
    return len(portion_text.String)

  def __ParsePortionGraphic(self, portion_graphic, func):
    gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
    stream = self.smgr.createInstanceWithContext('com.sun.star.io.TempFile', self.ctx)
    pv1 = PropertyValue()
    pv1.Name = 'OutputStream'
    pv1.Value = stream
    pv2 = PropertyValue()
    pv2.Name = 'MimeType'
    pv2.Value = 'image/png'
    gp.storeGraphic(portion_graphic.Graphic, (pv1, pv2))
    stream.getOutputStream().flush()
    stream.seek(0)
    l = stream.getInputStream().available()
    b = uno.ByteSequence(b'')
    stream.seek(0)
    l, b = stream.getInputStream().readBytes(b, l)
    img = {'image': base64.b64encode(b.value).decode('ascii')}
    img['height'] = portion_graphic.Height
    img['width'] = portion_graphic.Width
    img['actualheight'] = portion_graphic.ActualSize.Height
    img['actualwidth'] = portion_graphic.ActualSize.Width
    img['croptop'] = portion_graphic.GraphicCrop.Top
    img['cropbottom'] = portion_graphic.GraphicCrop.Bottom
    img['cropleft'] = portion_graphic.GraphicCrop.Left
    img['cropright'] = portion_graphic.GraphicCrop.Right
    img['length'] = 0
    func(img)
    return 0

  def __ParseTable(self, table, func):
    l = 0
    try:
      matrix = self.__GetTableMatrix(table)
      seps = self.__GetTableSeparators(table)
      t = {}
      count = 0
      for ri in matrix.keys():
        t[ri] = {}
        for ci in matrix[ri].keys():
          t[ri][ci] = dict(matrix[ri][ci])
          del t[ri][ci]['cell']
          t[ri][ci]['content'] = []
          l += self.__ParseText(matrix[ri][ci]['cell'], self.__Callback(t[ri][ci]['content']))
          count += t[ri][ci]['rowspan'] * t[ri][ci]['colspan']
      if count != len(t) * len(seps):
        raise ValueError('count of cells error')
      func({'table': t, 'row': len(t), 'column': len(seps), 'length': l, 'tableid': self.table_id})
      self.table_id += 1
    except:
      l = 0
      print('discard wrong table')
    return l

  @staticmethod
  def __GetTableSeparators(table):
    result = [table.TableColumnRelativeSum]
    for ri in range(table.getRows().getCount()):
      result += [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators]
    result = sorted(set(result))
    for i in range(len(result) - 1):
      result[i] += 1 if result[i] + 1 == result[i + 1] else 0
    return sorted(set(result))

  @staticmethod
  def __NameToRC(name):
    r = int(re.sub('[A-Za-z]', '', name)) - 1
    cstr = re.sub('[0-9]', '', name)
    c = 0
    for i in range(len(cstr)):
      if cstr[i] >= 'A' and cstr[i] <= 'Z':
        c = c * 52 + ord(cstr[i]) - ord('A')
      else:
        c = c * 52 + 26 + ord(cstr[i]) - ord('a')
    return r, c

  @staticmethod
  def __GetTableMatrix(table):
    result = {}
    for name in table.getCellNames():
      ri, ci = WordToJson.__NameToRC(name)
      cell = table.getCellByName(name)
      if ri not in result:
        result[ri] = {}
      result[ri][ci] = {'cell': cell, 'rowspan': cell.RowSpan, 'name': name}

    seps = WordToJson.__GetTableSeparators(table)
    for ri in result.keys():
      sep = [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators] + [table.TableColumnRelativeSum]
      sep = sorted(set(sep))
      for ci in result[ri].keys():
        right = seps.index(sep[ci]) if sep[ci] in seps else seps.index(sep[ci] + 1)
        left = -1 if ci == 0 else seps.index(sep[ci - 1]) if sep[ci - 1] in seps else seps.index(sep[ci - 1] + 1)
        result[ri][ci]['colspan'] = right - left
    return result

5、写doc文档

self.doco = self.desktop.loadComponentFromURL('private:factory/swriter', '_blank', 0, ())
    self.texto = self.doco.getText()
    self.cursoro = self.texto.createTextCursor()
    self.cursoro.ParaBottomMargin = 500
def __WriteText(self, text, texto, cursoro):
    for it in text:
      if 'paragraph' in it:
        self.__WriteParagraph(it, texto, cursoro)
      elif 'image' in it:
        self.__WritePortionGraphic(it, texto, cursoro)
      elif 'table' in it:
        self.__WriteTable(it, texto, cursoro)

  def __WriteParagraph(self, paragraph, texto, cursoro):
    if paragraph['length'] > 0:
      if 'result' in paragraph:
        for it in paragraph['result']:
          texto.insertString(cursoro, it['trans_sen'], False)
      else:
        texto.insertString(cursoro, paragraph['paragraph'], False)
      texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

  def __WritePortionGraphic(self, portion_graphic, texto, cursoro):
    png_base64 = portion_graphic['image']
    png = base64.b64decode(png_base64)
    gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(png), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream

    actualsize = uno.createUnoStruct('com.sun.star.awt.Size')
    actualsize.Height = portion_graphic['actualheight'] if 'actualheight' in portion_graphic else portion_graphic['height']
    actualsize.Width = portion_graphic['actualwidth'] if 'actualwidth' in portion_graphic else portion_graphic['width']
    graphiccrop = uno.createUnoStruct('com.sun.star.text.GraphicCrop')
    graphiccrop.Top = portion_graphic['croptop'] if 'croptop' in portion_graphic else 0
    graphiccrop.Bottom = portion_graphic['cropbottom'] if 'cropbottom' in portion_graphic else 0
    graphiccrop.Left = portion_graphic['cropleft'] if 'cropleft' in portion_graphic else 0
    graphiccrop.Right = portion_graphic['cropright'] if 'cropright' in portion_graphic else 0

    image = self.doco.createInstance('com.sun.star.text.TextGraphicObject')
    image.Surround = NONE
    image.Graphic = gp.queryGraphic((pv, ))
    image.Height = portion_graphic['height']
    image.Width = portion_graphic['width']
    image.setPropertyValue('ActualSize', actualsize)
    image.setPropertyValue('GraphicCrop', graphiccrop)
    texto.insertTextContent(cursoro, image, False)
    texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

  def __WriteTable(self, table, texto, cursoro):
    tableo = self.doco.createInstance('com.sun.star.text.TextTable')
    tableo.initialize(table['row'], table['column'])
    texto.insertTextContent(cursoro, tableo, False)
#    texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)
    tcursoro = tableo.createCursorByCellName("A1")

    hitbug = False
    if table['row'] > 1:
      tcursoro.goDown(1, True)
      hitbug = tcursoro.getRangeName() == 'A1'

    for ri in sorted([int(r) for r in table['table'].keys()]):
      rs = table['table'][str(ri)]
      for ci in sorted([int(c) for c in rs.keys()]):
        cell = rs[str(ci)]
        if hitbug == False and (cell['rowspan'] > 1 or cell['colspan'] > 1):
          tcursoro.gotoCellByName(cell['name'], False)
          if cell['rowspan'] > 1:
            tcursoro.goDown(cell['rowspan'] - 1, True)
          if cell['colspan'] > 1:
            tcursoro.goRight(cell['colspan'] - 1, True)
          tcursoro.mergeRange()
        ctexto = tableo.getCellByName(cell['name'])
        if ctexto == None:
          continue
        ccursoro = ctexto.createTextCursor()
        ccursoro.CharWeight = FontWeight.NORMAL
        ccursoro.CharWeightAsian = FontWeight.NORMAL
        ccursoro.ParaAdjust = LEFT
        self.__WriteText(cell['content'], ctexto, ccursoro)

6、生成二进制的doc文档数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'MS Word 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

7、从doc文档数据生成pdf的二进制数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'writer_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())

8、读取excel二进制数据

def ImportFromMemory(self, data):
    istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
    istream.initialize((uno.ByteSequence(data), ))
    pv = PropertyValue()
    pv.Name = 'InputStream'
    pv.Value = istream
    self.doc = {'doc': []}
    try:
      print("before loadComponentFromURL")
      self.document = self.desktop.loadComponentFromURL('private:stream/scalc', '_blank', 0, (pv, ))
      self.sheets = self.document.getSheets()
      print("ImportFromMemory done")
    except:
      print("ImportFromMemory failed")
      self.sheets = None

9、读取excel的文本数据

def ExportToJson(self):
    try:
      l = self.__ParseText(self.sheets, self.__Callback(self.doc['doc']))
      self.doc['length'] = l
    except:
      self.doc = {'doc': [], 'length': 0}
    return json.dumps(self.doc)
def __ParseText(self, sheets, func):
    l = 0
    sheets_it = sheets.createEnumeration()
    while sheets_it.hasMoreElements():
      element = sheets_it.nextElement()
      if element.supportsService('com.sun.star.sheet.Spreadsheet'):
        l += self.__ParseSpreadsheet(element, func)
    return l

  def __ParseSpreadsheet(self, spreadsheet, func):
    l = 0
    p = {'spreadsheet': []}
    visible_cells_it = spreadsheet.queryVisibleCells().getCells().createEnumeration()
    while visible_cells_it.hasMoreElements():
      cell = visible_cells_it.nextElement()
      type = cell.getType()
      if type == self.EMPTY:
        print("cell.type==empty")
      elif type == self.VALUE:
        print("cell.type==VALUE", "value=", cell.getValue(), cell.getCellAddress ())
      elif type == self.TEXT:
        print("cell.type==TEXT","content=", cell.getString().encode("UTF-8"), cell.getCellAddress ())
        l += self.__ParseCellText(spreadsheet, cell, self.__Callback(p['spreadsheet']))
        print("__ParseCellText=", p)
      elif type == self.FORMULA:
        print("cell.type==FORMULA", "formula=", cell.getValue())
    p['length'] = l
    func(p)
    return l

  def __ParseCellText(self, sheet, cell, func):
    try:
      x = cell.getCellAddress().Column
      y = cell.getCellAddress().Row
      sheetname = sheet.getName()
    except:
      x = -1
      y = -1
      sheetname = None
    func({'celltext': cell.getString(), 'x': x, 'y': y, 'sheetname': sheetname, 'length': len(cell.getString())})
    return len(cell.getString())
 self.EMPTY = uno.Enum("com.sun.star.table.CellContentType", "EMPTY")
    self.TEXT = uno.Enum("com.sun.star.table.CellContentType", "TEXT")
    self.FORMULA = uno.Enum("com.sun.star.table.CellContentType", "FORMULA")
    self.VALUE = uno.Enum("com.sun.star.table.CellContentType", "VALUE")

10、替换excel的文本信息

def ImportFromJson(self, data):
    doc = json.loads(data)
    try:
      self.__WriteText(doc['doc'])
    except:
      pass
def __WriteText(self, text):
    print("__WriteText begin:", text)
    sheet = None
    for it in text:
      if 'paragraph' in it and 'sheetname' in it:
        if sheet == None or sheet.getName() != it['sheetname']:
          try:
            sheet = self.sheets.getByName(it['sheetname'])
            print("getsheet:", it['sheetname'], "=", sheet.getName())
          except:
            sheet = None
            continue
        self.__WriteParagraph(it, sheet)

  def __WriteParagraph(self, paragraph, sheet):
    print("__WriteParagraph")
    if paragraph['length'] > 0:
      try:
        x = paragraph['x']
        y = paragraph['y']
        print("getcell:", x, y)
        cell = sheet.getCellByPosition(x, y)
        print("getcell done")
      except:
        return
      if 'result' in paragraph:
        for it in paragraph['result']:
          print("cell=", cell.getString())
          cell.setString(it['trans_sen'])
          print("cell,", cell.getString(), ",done")

11、生成excel文档二进制数据

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'Calc MS Excel 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

12、生成excel的pdf文档

streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'calc_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
决策树的python实现方法
Nov 18 Python
Python合并两个字典的常用方法与效率比较
Jun 17 Python
Python使用logging结合decorator模式实现优化日志输出的方法
Apr 16 Python
Python处理Excel文件实例代码
Jun 20 Python
pyspark 读取csv文件创建DataFrame的两种方法
Jun 07 Python
Python3正则匹配re.split,re.finditer及re.findall函数用法详解
Jun 11 Python
Python使用pydub库对mp3与wav格式进行互转的方法
Jan 10 Python
Python中一般处理中文的几种方法
Mar 06 Python
解决pyecharts在jupyter notebook中使用报错问题
Apr 23 Python
python调用matplotlib模块绘制柱状图
Oct 18 Python
Anaconda详细安装步骤图文教程
Nov 12 Python
python opencv肤色检测的实现示例
Dec 21 Python
Python实现12306火车票抢票系统
Jul 04 #Python
如何利用Pyecharts可视化微信好友
Jul 04 #Python
python 获取等间隔的数组实例
Jul 04 #Python
python 中pyqt5 树节点点击实现多窗口切换问题
Jul 04 #Python
Python机器学习算法库scikit-learn学习之决策树实现方法详解
Jul 04 #Python
Python 中PyQt5 点击主窗口弹出另一个窗口的实现方法
Jul 04 #Python
Python+opencv 实现图片文字的分割的方法示例
Jul 04 #Python
You might like
PHP动态图像的创建
2006/10/09 PHP
如何用C语言编写PHP扩展的详解
2013/06/13 PHP
Zend Framework页面缓存实例
2014/06/25 PHP
CSS中简写属性要注意TRouBLe的顺序问题(避免踩坑)
2021/03/09 HTML / CSS
javascript 屏蔽鼠标键盘的几段代码
2008/01/02 Javascript
用js做一个小游戏平台 (一)
2009/12/29 Javascript
js原生appendChild的bug解决心得分享
2013/07/01 Javascript
Bootstrap富文本组件wysiwyg数据保存到mysql的方法
2016/05/09 Javascript
用JS写的一个Ajax库(实例代码)
2016/08/06 Javascript
jQuery学习笔记——jqGrid的使用记录(实现分页、搜索功能)
2016/11/09 Javascript
JavaScript、C# URL编码、解码总结
2017/01/21 Javascript
jQuery树控件zTree使用方法详解(一)
2017/02/28 Javascript
NodeJS仿WebApi路由示例
2017/02/28 NodeJs
Vuex 在Vue 组件中获得Vuex 状态state的方法
2018/08/27 Javascript
详解基于electron制作一个node压缩图片的桌面应用
2019/01/29 Javascript
使用jquery-easyui的布局layout写后台管理页面的代码详解
2019/06/19 jQuery
[01:43]深扒TI7聊天轮盘语音出处4
2017/05/11 DOTA
[02:13] 完美世界DOTA2联赛PWL DAY5集锦
2020/11/03 DOTA
讲解Python中for循环下的索引变量的作用域
2015/04/15 Python
python实现遍历文件夹修改文件后缀
2018/08/28 Python
[机器视觉]使用python自动识别验证码详解
2019/05/16 Python
python os.fork() 循环输出方法
2019/08/08 Python
linux 下selenium chrome使用详解
2020/04/02 Python
Python插件机制实现详解
2020/05/04 Python
基于python爬取链家二手房信息代码示例
2020/10/21 Python
HTML5如何实现元素拖拽
2016/03/11 HTML / CSS
男女时尚与复古风格在线购物:RoseGal(全球免费送货)
2017/07/19 全球购物
函授毕业自我鉴定
2013/12/19 职场文书
幸福家庭标语
2014/06/27 职场文书
环境保护建议书
2014/08/26 职场文书
2014党员四风对照检查材料思想汇报
2014/09/17 职场文书
关于随地扔垃圾的检讨书
2014/09/30 职场文书
好好学习保证书
2015/02/26 职场文书
2015年大学宣传部工作总结
2015/05/26 职场文书
Python机器学习之基于Pytorch实现猫狗分类
2021/06/08 Python
一次项目中Thinkphp绕过禁用函数的实战记录
2021/11/17 PHP