Collecting all labels in a VOC dataset (class_name_list)

jupiter
2021-01-20 (last updated 2021-11-27)

1. PASCAL VOC data format

<?xml version='1.0' encoding='utf-8'?>
<annotation verified="no">
  <folder>JPEGImages</folder>
  <filename>2018_06_05_09_06_55_065</filename>
  <path>F:\receive\VOC2007\JPEGImages\2018_06_05_09_06_55_065.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>2048</width>
    <height>1536</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>1</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>530</xmin>
      <ymin>752</ymin>
      <xmax>1498</xmax>
      <ymax>1326</ymax>
    </bndbox>
  </object>
</annotation>
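
To make the structure above concrete, here is a minimal sketch that reads the label and bounding box of each object from such an annotation. It uses the standard-library `xml.etree.ElementTree` rather than `xmltodict`; the names `SAMPLE_XML` and `read_objects` are illustrative assumptions, and the XML is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation shown above (illustrative only)
SAMPLE_XML = """<annotation verified="no">
  <filename>2018_06_05_09_06_55_065</filename>
  <size>
    <width>2048</width>
    <height>1536</height>
    <depth>3</depth>
  </size>
  <object>
    <name>1</name>
    <bndbox>
      <xmin>530</xmin>
      <ymin>752</ymin>
      <xmax>1498</xmax>
      <ymax>1326</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_objects(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    # An annotation may hold zero, one, or many <object> elements
    for obj in root.findall("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        results.append((label, coords))
    return results


parsed_objects = read_objects(SAMPLE_XML)
print(parsed_objects)  # [('1', (530, 752, 1498, 1326))]
```

Note that the label is returned as the string '1', not an integer: the content of `<name>` is always text, whatever the class naming scheme.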

2. Collecting all class labels in a VOC dataset

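The following script reads every XML file under the annotation folder and collects all distinct class names: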
import os

import xmltodict

# Folder containing the VOC XML annotation files
annotation_dir = "./labels_voc/"

label_list = list()
# Process the XML files one by one
for file in os.listdir(annotation_dir):
    annotation_path = os.path.join(annotation_dir, file)

    # Read the XML file
    with open(annotation_path, 'r', encoding='utf-8') as f:
        xml_str = f.read()

    # Convert it to a dictionary
    xml_dic = xmltodict.parse(xml_str)

    # Collect the labels, skipping duplicates
    # A file with no <object> element contributes no labels
    objects = xml_dic["annotation"].get("object", [])
    if not isinstance(objects, list):
        # A single <object> is parsed as a dict rather than a list
        objects = [objects]
    for obj in objects:
        label = obj['name']
        if label not in label_list:
            label_list.append(label)

print(label_list)
['aeroplane', 'cat', 'car', 'dog', 'chair', 'person', 'horse', 'bird', 'tvmonitor', 'bus', 'boat', 'diningtable', 'bicycle', 'bottle', 'sofa', 'pottedplant', 'motorbike', 'cow', 'train', 'sheep']
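
The list above records which classes occur but not how often. As a possible extension, the sketch below counts the objects per class with `collections.Counter`. It is not part of the original script: the function `count_labels` and the two demo annotations are hypothetical, and it parses in-memory XML strings with the standard-library `xml.etree.ElementTree` so that it runs without any files on disk.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_labels(xml_texts):
    """Count how many <object> entries carry each label."""
    counts = Counter()
    for xml_text in xml_texts:
        root = ET.fromstring(xml_text)
        # findall returns an empty list when there is no <object>
        for obj in root.findall("object"):
            counts[obj.findtext("name")] += 1
    return counts


# Two tiny hypothetical annotations, for illustration only
demo_annotations = [
    "<annotation><object><name>cat</name></object>"
    "<object><name>dog</name></object></annotation>",
    "<annotation><object><name>cat</name></object></annotation>",
]

label_counts = count_labels(demo_annotations)
print(sorted(label_counts))  # ['cat', 'dog']
print(label_counts["cat"])   # 2
```

To apply this to a real dataset, one option is to read each file's text from the annotation folder and pass the strings to `count_labels`. The per-class counts make it easy to spot rare classes or misspelled labels.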