Python 匹配文本并在其上一行追加文本


Posted in Python onMay 11, 2022

匹配文本并在其上一行追加文本

问题描述

Python匹配文本并在其上一行追加文本

test.txt

a
b
c
d
e

1.读进列表后覆盖原文件 

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
match_then_insert('test.txt', match='c', content='123')

效果

a
b
123
c
d
e

2.FileInput类

from fileinput import FileInput
def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    for line in FileInput(filename, inplace=True):  # 原地过滤
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # 输出重定向到原文件
match_then_insert('test.txt', match='c', content='123')

3.seek

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()  # 逐行读取
            except IndexError:  # 超出范围则退出
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # 光标移动到上一行
                rest = f.read()  # 读取余下内容
                f.seek(-len(rest), 1)  # 光标移动回原位置
                f.truncate()  # 删除余下内容
                content = content + '\n'
                f.write(content.encode())  # 插入指定内容
                f.write(rest)  # 还原余下内容
                break
match_then_insert('test.txt', match='c', content='123')

对比

方案 耗时/s
读进列表后覆盖原文件 54.42
FileInput类 121.59
seek 3.53
from timeit import timeit
from fileinput import FileInput
def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))
def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')
def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()
            except IndexError:
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break
init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))

遇到的坑

报错可试试在文件头部添加

# -*- coding: utf-8 -*-

或指定 encoding='utf-8'

用正则表达式匹配文本(Python经典编程案例)

ceshi.txt文本如下:第一行为空行

爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:

代码如下:

import os
import re
import json
from collections import namedtuple
alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')
path = r'D:\data\ceshi.txt'
g_path = r'D:\data\\'
file_name = r'result.txt'
file_path = g_path + file_name
alerts_list = list()
with open(path, encoding="utf-8") as file:
    lines = file.readlines()  # 读取每一行
    count = 0
    time = None
    hostname = None
    project = None
    for line in lines:
        if re.search(r'^\d{2}:\d{2}:\d{2}\s*$', line):
            time = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line).group(1)
        if re.search(r'^hostname:\s*(.+)', line):
            hostname = re.search(r'^hostname:\s*(.+)', line).group(1)
        if re.search(r'project:\s*([^,]+),', line):
            project = re.search(r'project:\s*([^,]+),', line).group(1)
        if re.search(r'spider:\s*([^,]+),', line):
            spider = re.search(r'spider:\s*([^,]+),', line).group(1)
        if re.search(r'^error_data', line):
            spider_alert = None
            spider_alert = alert(alert_time=time, alert_hostname=hostname, alert_project=project, alert_spider=spider)
            alerts_list.append(spider_alert)
for element in alerts_list:
    print(element[0], element[1], element[3])
    with open(file_path, 'a', encoding="utf-8") as file:
        file.write(element[0] + "\t" + element[1] + "\t" + element[3])
        file.write(' \n')

执行结果如下图:

Python 匹配文本并在其上一行追加文本


Tags in this post...

Python 相关文章推荐
python的描述符(descriptor)、装饰器(property)造成的一个无限递归问题分享
Jul 09 Python
跟老齐学Python之使用Python查询更新数据库
Nov 25 Python
wxPython定时器wx.Timer简单应用实例
Jun 03 Python
详解python 发送邮件实例代码
Dec 22 Python
完美解决Python2操作中文名文件乱码的问题
Jan 04 Python
Python调用C# Com dll组件实战教程
Oct 12 Python
解决使用PyCharm时无法启动控制台的问题
Jan 19 Python
Python实现制度转换(货币,温度,长度)
Jul 14 Python
利用keras加载训练好的.H5文件,并实现预测图片
Jan 24 Python
python操作docx写入内容,并控制文本的字体颜色
Feb 13 Python
Python3实现打印任意宽度的菱形代码
Apr 12 Python
用python读取xlsx文件
Dec 17 Python
Python 一键获取电脑浏览器的账号密码
May 11 #Python
图神经网络GNN算法
May 11 #Python
python神经网络ResNet50模型
May 06 #Python
python和anaconda的区别
May 06 #Python
python神经网络Xception模型
May 06 #Python
Python使用永中文档转换服务
May 06 #Python
Python tensorflow卷积神经Inception V3网络结构
May 06 #Python
You might like
php+AJAX传送中文会导致乱码的问题的解决方法
2008/09/08 PHP
仿dedecms下拉分页样式修改的thinkphp分页类实例
2014/10/30 PHP
php绘图之加载外部图片的方法
2015/01/24 PHP
php实现的农历算法实例
2015/08/11 PHP
Smarty分页实现方法完整实例
2016/05/11 PHP
phpcms中的评论样式修改方法
2016/10/21 PHP
js form 验证函数 当前比较流行的错误提示
2009/06/23 Javascript
Document 对象的常用方法
2009/07/31 Javascript
js post方式传递提交的实现代码
2010/05/31 Javascript
使用javascipt---实现二分查找法
2013/04/10 Javascript
详解jQuery插件开发中的extend方法
2013/11/19 Javascript
jquery仅用6行代码实现滑动门效果
2015/09/07 Javascript
JS实现网页标题栏显示当前时间和日期的完整代码
2015/11/02 Javascript
三种AngularJS中获取数据源的方式
2016/02/02 Javascript
jQuery获取table行数并输出单元格内容的实现方法
2016/06/30 Javascript
AngularJS中比较两个数组是否相同
2016/08/24 Javascript
JavaScript之DOM_动力节点Java学院整理
2017/07/03 Javascript
js实现本地时间同步功能
2017/08/26 Javascript
微信web端后退强制刷新功能的实现代码
2018/03/04 Javascript
基于JS实现html中placeholder属性提示文字效果示例
2018/04/19 Javascript
Python提示[Errno 32]Broken pipe导致线程crash错误解决方法
2014/11/19 Python
深入学习python的yield和generator
2016/03/10 Python
python写一个md5解密器示例
2018/02/23 Python
Python设计模式之代理模式实例详解
2019/01/19 Python
Python多线程模块Threading用法示例小结
2019/11/09 Python
tensorflow实现对张量数据的切片操作方式
2020/01/19 Python
python中xlrd模块的使用详解
2021/02/01 Python
HTML5 Canvas draw方法制作动画效果示例
2013/07/11 HTML / CSS
Perfumetrader荷兰:香水、化妆品和护肤品在线商店
2017/09/15 全球购物
从当地商店送来的杂货:Instacart
2018/08/19 全球购物
日本最大的购物网站:日本乐天市场(Rakuten Ichiba)
2020/11/04 全球购物
科学育儿宣传标语
2014/10/08 职场文书
2014年企业党支部工作总结
2014/12/04 职场文书
Nginx stream 配置代理(Nginx TCP/UDP 负载均衡)
2021/11/17 Servers
docker-compose部署Yapi的方法
2022/04/08 Servers
SQL Server数据库的三种创建方法汇总
2023/05/08 MySQL