Batch-modifying labelImg-generated XML files with Python

Posted in Python on September 09, 2019

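Overview

After labeling my images with labelImg, I wanted to train on only a few of the classes rather than all of them, and I did not want to re-label the images and regenerate the .xml files. The approach I came up with is to delete the unwanted label classes, together with their attributes, directly from the existing .xml files.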

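The label names also ended up with inconsistent capitalization during labeling (an easy slip on a large job), so the program rewrites the label values to lowercase as well, because in my py-faster-rcnn training all labels must be lowercase.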
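Before deleting anything, it can help to see which label names actually occur and which ones differ only by case. The following is a minimal sketch using only the standard library; the helper names `count_label_names` and `mixed_case_groups` are illustrative and are not part of the script in this article.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_label_names(label_dir):
  """Count every distinct <object>/<name> value under label_dir (case-sensitive)."""
  counts = Counter()
  for xml_path in Path(label_dir).rglob("*.xml"):
    root = ET.parse(xml_path).getroot()
    for obj in root.findall("object"):
      name = (obj.findtext("name") or "").strip()
      if name:
        counts[name] += 1
  return counts


def mixed_case_groups(counts):
  """Return names that differ only by case, keyed by their lowercase form."""
  groups = {}
  for name in counts:
    groups.setdefault(name.lower(), []).append(name)
  return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Calling `mixed_case_groups(count_label_names(some_dir))` on a label directory returns an empty dict when every class is spelled consistently.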

Take the following .xml file as an example. I deliberately added uppercase letters to some of the labels:

<annotation verified="yes">
 <filename>test.jpg</filename>
 <path>C:\Users\yasin\Desktop\test</path>
 <source>
 <database>Unknown</database>
 </source>
 <size>
 <width>400</width>
 <height>300</height>
 <depth>3</depth>
 </size>
 <segmented>0</segmented>
 <object>
 <name>People</name>
 <pose>Unspecified</pose>
 <truncated>0</truncated>
 <difficult>0</difficult>
 <bndbox>
  <xmin>80</xmin>
  <ymin>69</ymin>
  <xmax>144</xmax>
  <ymax>89</ymax>
 </bndbox>
 </object>
 <object>
 <name>CAT</name>
 <pose>Unspecified</pose>
 <truncated>0</truncated>
 <difficult>0</difficult>
 <bndbox>
  <xmin>40</xmin>
  <ymin>69</ymin>
  <xmax>143</xmax>
  <ymax>16</ymax>
 </bndbox>
 </object>
 <object>
 <name>dog</name>
 <pose>Unspecified</pose>
 <truncated>0</truncated>
 <difficult>0</difficult>
 <bndbox>
  <xmin>96</xmin>
  <ymin>82</ymin>
  <xmax>176</xmax>
  <ymax>87</ymax>
 </bndbox>
 </object> 
</annotation>
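To make the structure of this file concrete, here is a minimal sketch that reads each object's name and bounding box with the standard-library `xml.etree.ElementTree`. The `list_objects` helper and the shortened `SAMPLE` string are illustrative assumptions, not part of the original script, and the sketch assumes every object has a `<name>` and a `<bndbox>` with integer coordinates, as labelImg writes them.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample annotation: only the fields read below are kept.
SAMPLE = """<annotation verified="yes">
 <filename>test.jpg</filename>
 <object>
  <name>People</name>
  <bndbox><xmin>80</xmin><ymin>69</ymin><xmax>144</xmax><ymax>89</ymax></bndbox>
 </object>
 <object>
  <name>CAT</name>
  <bndbox><xmin>40</xmin><ymin>69</ymin><xmax>143</xmax><ymax>16</ymax></bndbox>
 </object>
</annotation>"""


def list_objects(root):
  """Return [(name, (xmin, ymin, xmax, ymax)), ...] for one <annotation> root."""
  objects = []
  for obj in root.findall("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    coords = tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
    objects.append((name, coords))
  return objects


if __name__ == "__main__":
  for name, coords in list_objects(ET.fromstring(SAMPLE)):
    print(name, coords)
```

Each `<object>` is a direct child of `<annotation>`, which is why removing an object from the root element is enough to delete a label along with all of its attributes.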

Implementation

Suppose we want to keep only the people and cat classes in the image and delete everything else. The code is as follows:

from os import makedirs, path, walk
from xml.etree.ElementTree import ElementTree

def read_xml(in_path):
  # parse one annotation file into an ElementTree
  tree = ElementTree()
  tree.parse(in_path)
  return tree

def write_xml(tree, out_path):
  tree.write(out_path, encoding="utf-8", xml_declaration=True)

def find_nodes(tree, xpath):
  return tree.findall(xpath)

def del_node_by_target_classes(nodelist, target_classes_lower, tree_root):
  # nodelist is a list (from findall), so removing from tree_root while looping is safe
  for parent_node in nodelist:
    if parent_node.tag != "object":
      continue
    # look <name> up by tag; Element.getchildren() was removed in Python 3.9
    name_node = parent_node.find("name")
    label = (name_node.text or "").strip().lower() if name_node is not None else ""
    if label in target_classes_lower:
      name_node.text = label # keep the object and store its label in lowercase
    else:
      tree_root.remove(parent_node) # drop objects of every other class

def get_fileNames(rootdir):
  # collect the path and the extension-less name of every .xml file under rootdir
  data_path = []
  prefixs = []
  for root, dirs, files in walk(rootdir, topdown=True):
    for name in files:
      pre, ending = path.splitext(name)
      if ending != ".xml":
        continue
      data_path.append(path.join(root, name))
      prefixs.append(pre)

  return data_path, prefixs

if __name__ == "__main__":
  # get all the xml paths; the prefixes are not used here
  paths_xml, prefixs = get_fileNames("/home/yasin/old_labels/")

  target_classes = ["PEOPLE", "CAT"] # classes you want to keep

  # compare in lowercase so the capitalization used above does not matter
  target_classes_lower = [name.lower() for name in target_classes]

  out_dir = "/home/yasin/new_labels/"
  makedirs(out_dir, exist_ok=True) # make sure the output directory exists

  for i, xml_path in enumerate(paths_xml):
    tree = read_xml(xml_path)

    # get the root node, <annotation>
    tree_root = tree.getroot()

    # get the direct children of the root
    del_parent_nodes = find_nodes(tree, "./*")

    # delete non-target classes and lowercase the kept ones
    del_node_by_target_classes(del_parent_nodes, target_classes_lower, tree_root)

    # save under a sequential six-digit name: 000000.xml, 000001.xml, ...
    write_xml(tree, path.join(out_dir, "%06d.xml" % i))

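With the code above, the sample .xml becomes the .xml below. Everything other than the people and cat classes (that is, the dog class) has been deleted, and the labels of the kept classes have been changed to lowercase: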

<?xml version='1.0' encoding='utf-8'?>
<annotation verified="yes">
 <filename>test.jpg</filename>
 <path>C:\Users\yasin\Desktop\test</path>
 <source>
 <database>Unknown</database>
 </source>
 <size>
 <width>400</width>
 <height>300</height>
 <depth>3</depth>
 </size>
 <segmented>0</segmented>
 <object>
 <name>people</name>
 <pose>Unspecified</pose>
 <truncated>0</truncated>
 <difficult>0</difficult>
 <bndbox>
  <xmin>80</xmin>
  <ymin>69</ymin>
  <xmax>144</xmax>
  <ymax>89</ymax>
 </bndbox>
 </object>
 <object>
 <name>cat</name>
 <pose>Unspecified</pose>
 <truncated>0</truncated>
 <difficult>0</difficult>
 <bndbox>
  <xmin>40</xmin>
  <ymin>69</ymin>
  <xmax>143</xmax>
  <ymax>16</ymax>
 </bndbox>
 </object>
</annotation>
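After a batch run it may be worth checking the written files before training. The sketch below is one possible check, not part of the original script: the `check_annotation` helper is a hypothetical name, and it assumes every `<bndbox>` carries integer `xmin`, `ymin`, `xmax` and `ymax` values. It reports any class outside the allowed set and any box whose minimum is not smaller than its maximum.

```python
import xml.etree.ElementTree as ET


def check_annotation(root, allowed):
  """Return a list of human-readable problems found in one <annotation> root."""
  problems = []
  for obj in root.findall("object"):
    name = (obj.findtext("name") or "").strip()
    if name not in allowed:
      problems.append("unexpected class: %r" % name)
    box = obj.find("bndbox")
    if box is None:
      problems.append("%s: missing bndbox" % name)
      continue
    # Assumption: all four coordinates are present and are integers.
    xmin, ymin, xmax, ymax = (
      int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
    )
    if xmin >= xmax or ymin >= ymax:
      problems.append(
        "%s: degenerate box (%d, %d, %d, %d)" % (name, xmin, ymin, xmax, ymax)
      )
  return problems
```

In the sample above, the cat box has ymin 69 and ymax 16, so this check would flag it. Whether such a box causes trouble depends on the training code, but it is the kind of labeling slip that is cheaper to catch here than during training.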

That is all for this article. I hope it helps with your studies, and I hope you will continue to support 三水点靠木.

- Author -

Miscellaneous0712
