使用python解析xml成对应的html示例分享


Posted in Python onApril 02, 2014

SAX将dd.xml解析成html。当然啦,如果得到了xml对应的xsl文件可以直接用libxml2将其转换成html。

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
#---------------------------------------
#   程序:XML解析器
#   版本:01.0
#   作者:mupeng
#   日期:2013-12-18
#   语言:Python 2.7
#   功能:将xml解析成对应的html
#   注解:该程序用xml.sax模块的parse函数解析XML,并生成事件
#   继承ContentHandler并重写其事件处理函数
#   Dispatcher主要用于相应标签的起始、结束事件的派发
#---------------------------------------
from xml.sax.handler import ContentHandler
from xml.sax import parse
class Dispatcher:
    def dispatch(self, prefix, name, attrs=None):
        mname = prefix + name.capitalize()
        dname = 'default' + prefix.capitalize()
        method = getattr(self, mname, None)
        if callable(method): args = ()
        else:
            method = getattr(self, dname, None)
            #args = name
        #if prefix == 'start': args += attrs
        if callable(method): method()
    def startElement(self, name, attrs):
        self.dispatch('start', name, attrs)
    def endElement(self, name):
        self.dispatch('end', name)
class Website(Dispatcher, ContentHandler):
    def __init__(self):
        self.fout = open('ddt_SAX.html', 'w')
        self.imagein = False
        self.desflag = False
        self.item = False
        self.title = ''
        self.link = ''
        self.guid = ''
        self.url = ''
        self.pubdate = ''
        self.description = ''
        self.temp = ''
        self.prx = ''
    def startChannel(self):
        self.fout.write('''<html>\n<head>\n<title> RSS-''')
    def endChannel(self):
       self.fout.write('''
                    <tr><td height="20"></td></tr>
                    </table>
                    </center>
                    <script>
    function  GetTimeDiff(str)
    {
     if(str == '')
     {
      return '';
     }
     var pubDate = new Date(str);
     var nowDate = new Date();
     var diffMilSeconds = nowDate.valueOf()-pubDate.valueOf();
     var days = diffMilSeconds/86400000;
     days = parseInt(days);
     diffMilSeconds = diffMilSeconds-(days*86400000);
     var hours = diffMilSeconds/3600000;
     hours = parseInt(hours);
     diffMilSeconds = diffMilSeconds-(hours*3600000);
     var minutes = diffMilSeconds/60000;
     minutes = parseInt(minutes);
     diffMilSeconds = diffMilSeconds-(minutes*60000);
     var seconds = diffMilSeconds/1000;
     seconds = parseInt(seconds);
     var returnStr = "±±¾©·¢²¼Ê±¼ä£º" + pubDate.toLocaleString();
     if(days > 0)
     {
      returnStr = returnStr + " £¨¾àÀëÏÖÔÚ" + days + "Ìì" + hours + "Сʱ" + minutes + "·ÖÖÓ£©";
     }
     else if (hours > 0)
     {
      returnStr = returnStr + " £¨¾àÀëÏÖÔÚ" + hours + "Сʱ" + minutes + "·ÖÖÓ£©";
     }
     else if (minutes > 0)
     {
      returnStr = returnStr + " £¨¾àÀëÏÖÔÚ" + minutes + "·ÖÖÓ£©";
     }
     return returnStr;
    }
    function GetSpanText()
    {
     var pubDate;
     var pubDateArray;
     var spanArray = document.getElementsByTagName("span");
     for(var i = 0; i < spanArray.length; i++)
     {
      pubDate = spanArray[i].innerHTML;
      document.getElementsByTagName("span")[i].innerHTML = GetTimeDiff(pubDate);   
     }
    }
    GetSpanText();
   </script>
                </body>
                </html>
                ''')
       self.fout.close()
    def characters(self, chars):
        if chars.strip():
            #chars = chars.strip()
            self.temp += chars
            #print self.temp
       
    def startTitle(self):
        if self.item:
            self.fout.write('''
                        <tr bgcolor="#eeeeee">\n<td style="padding-top:5px;padding-left:5px;" height="30">\n<B>
                    ''')
    def endTitle(self):
        if not self.imagein and not self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            #self.title = self.temp
            self.fout.write('''
                </title>\n</head>\n<body>\n<center>\n
                <script>\n
                        function copyLink()
                        {
                                clipboardData.setData("Text",window.location.href);
                                alert("RSSÁ´½ÓÒѾ­¸´ÖƵ½¼ôÌù°å");
                        }
                        function subscibeLink()
                        {
                                var str = window.location.pathname;
                                while(str.match(/^\//))
                                {
                                        str = str.replace(/^\//,"");
                                }
                                window.open("http://rss.sina.com.cn/my_sina_web_rss_news.html?url=" + str,"_self");
                        }
                        </script>\n
                <table width="750" cellpadding="0" cellspacing="0">\n
                <tr>\n
                <td align="right" style="padding-right:15px;" valign="bottom">\n
            ''')
        if self.item:
            self.title = self.temp
            self.temp = ''
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write('''
                        </B>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-left:5px;">
                        ''')
    def startImage(self):
        self.imagein = True
    def endImage(self):
        self.imagein = False
    def startLink(self):
        if self.imagein:
            self.fout.write('''<A href=" ''')
            
    def endLink(self):
        self.link = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.link.encode('gb2312'))
            self.fout.write('''" target="_blank">\n ''')
        elif self.item:
            #self.link = self.temp
            pass
        else:
            self.fout.write(self.link)
            self.fout.write(''' " target="
      _blank
     "> ''')
            self.fout.write(self.title.encode('gb2312'))
            self.fout.write(''' </A></B></td>
                            </tr>
                            <tr><td colspan="2" align="center">
                            ''')
            self.fout.write(self.description.encode('gb2312'))
            self.fout.write('''
                        </td></tr>
                        <tr style="font-size:12px;" bgcolor="#eeeeff"><td colspan="2" style="font-size:14px;padding-top:5px;padding-bottom:5px;"><b><a href="javascript:copyLink();">¸´ÖÆ´ËÒ³Á´½Ó</a>                <a href="javascript:subscibeLink();">ÎÒҪǶÈë¸ÃÐÂÎÅÁÐ±íµ½ÎÒµÄÒ³Ãæ£¨¼òµ¥¡¢¿ìËÙ¡¢ÊµÊ±¡¢Ãâ·Ñ£©</a></b></td></tr>
                        </table>
                        <table width="750" cellpadding="0" cellspacing="0">
                            ''')
    def startUrl(self):
        if self.imagein:
            self.fout.write('''<IMG src=" ''')
    def endUrl(self):
        self.url = self.temp
        self.temp = ''
        if self.imagein:
            self.fout.write(self.url.encode('gb2312'))
            self.fout.write('''" border="0">\n
                            </A>
                            </td>
                            <td align="left" valign="bottom" style="padding-bottom:8px;"><B><A href="
                            ''')
        if self.item:
            #self.url = self.temp
            pass
    def defaultStart(self):
        pass
    def defaultEnd(self):
        self.temp = ''
    def startDescription(self):
        pass
    def endDescription(self):
        self.description = self.temp
        self.temp = ''
        if self.item:
            #self.fout.write('¡¡¡¡')
            self.fout.write(self.description.encode('gb2312'))
    def endGuid(self):
        self.guid = self.temp
    def endPubdate(self):
        if not self.temp.startswith('http'):
         self.pubdate = self.temp
         self.temp = ''
        else:
            self.pubdate = ''
    def startItem(self):
        self.item = True
    def endItem(self):
        self.item = False
        self.fout.write('''
                            </td>
                            </tr>
                            <tr bgcolor="#eeeeee">
                            <td style="padding-top:5px;padding-left:5px;">
                            <A href="''')
        self.fout.write(self.link)
        self.fout.write(''' " target="_blank"> ''')
        self.fout.write(self.guid)
        self.fout.write('''
                        </A>
                        </td>
                        </tr>
                        <tr bgcolor="#eeeeee">
                        <td style="padding-top:5px;padding-left:5px;padding-bottom:5px;"><span>''')
        self.fout.write(self.pubdate)
        self.fout.write('''</span></td>
                        </tr>
                        <tr height="10"><td></td></tr>''')
#程序入口
if __name__ == '__main__':
    parse('ddt.xml', Website())
Python 相关文章推荐
Python实现Windows上气泡提醒效果的方法
Jun 03 Python
Python tkinter事件高级用法实例
Jan 31 Python
python微信公众号之关注公众号自动回复
Oct 25 Python
解决Python内层for循环如何break出外层的循环的问题
Jun 24 Python
python调用并链接MATLAB脚本详解
Jul 05 Python
Python在Matplotlib图中显示中文字体的操作方法
Jul 29 Python
在django中实现页面倒数几秒后自动跳转的例子
Aug 16 Python
Python 中判断列表是否为空的方法
Nov 24 Python
win10安装python3.6的常见问题
Jul 01 Python
Python多线程的退出控制实现
Aug 10 Python
Python中的None与 NULL(即空字符)的区别详解
Sep 24 Python
pycharm配置安装autopep8自动规范代码的实现
Mar 02 Python
Python爬虫框架Scrapy安装使用步骤
Apr 01 #Python
使用python绘制人人网好友关系图示例
Apr 01 #Python
python异步任务队列示例
Apr 01 #Python
用Python编程实现语音控制电脑
Apr 01 #Python
35个Python编程小技巧
Apr 01 #Python
ptyhon实现sitemap生成示例
Mar 30 #Python
python实现百度关键词排名查询
Mar 30 #Python
You might like
使用php来实现网络服务
2009/09/15 PHP
Zend Framework教程之分发器Zend_Controller_Dispatcher用法详解
2016/03/07 PHP
Laravel实现定时任务的示例代码
2017/08/10 PHP
深入认识JavaScript中的函数
2007/01/22 Javascript
firefox火狐浏览器与与ie兼容的2个问题总结
2010/07/20 Javascript
鼠标滑上去后图片放大浮出效果的js代码
2011/05/28 Javascript
javascript开发随笔二 动态加载js和文件
2011/11/25 Javascript
通过Javascript将数据导出到外部Excel文档的函数代码
2012/06/15 Javascript
parentElement,srcElement的使用小结
2014/01/13 Javascript
javascript实现动态表头及表列的展现方法
2015/07/14 Javascript
JavaScript动态绑定详解
2017/09/14 Javascript
详解Webstorm 新建.vue文件支持高亮vue语法和es6语法
2017/10/26 Javascript
详解vue.js数据传递以及数据分发slot
2018/01/20 Javascript
vue.js将时间戳转化为日期格式的实现代码
2018/06/05 Javascript
nodejs实现百度舆情接口应用示例
2020/02/07 NodeJs
[01:56]无止竞 再出发——中国军团出征2017年DOTA2国际邀请赛
2017/07/05 DOTA
Python实现的txt文件去重功能示例
2018/07/07 Python
使用Python获取网段IP个数以及地址清单的方法
2018/11/01 Python
pandas读取CSV文件时查看修改各列的数据类型格式
2019/07/07 Python
Python定时任务APScheduler原理及实例解析
2020/05/30 Python
解决tensorflow读取本地MNITS_data失败的原因
2020/06/22 Python
python中执行smtplib失败的处理方法
2020/07/01 Python
印尼综合在线预订网站:Tiket.com(机票、酒店、火车、租车和娱乐)
2018/10/11 全球购物
Easy Spirit官网:美国休闲鞋履中的代表品牌
2019/04/12 全球购物
Sunglass Hut巴西网上商店:男女太阳镜
2020/10/04 全球购物
安全检查与奖惩制度
2014/01/23 职场文书
户籍证明模板
2014/09/28 职场文书
教师自我剖析材料范文
2014/09/30 职场文书
2014年党建工作汇报材料
2014/10/27 职场文书
党员承诺书范文2015
2015/04/27 职场文书
爱心捐款活动总结
2015/05/09 职场文书
小学校长开学致辞
2015/07/29 职场文书
使用Nginx搭载rtmp直播服务器的方法
2021/10/16 Servers
python playwright 自动等待和断言详解
2021/11/27 Python
golang实现浏览器导出excel文件功能
2022/03/25 Golang
Java详细解析==和equals的区别
2022/04/07 Java/Android