分类计算机视觉下的文章 - jupiter's blog

登录 / 注册

标签搜索

jupiter

累计撰写 359 篇文章
累计收到 118 条评论

搜索到 32 篇与的结果

2023-10-29
离线安装pytorch 离线安装pytorch针对网络不稳定在线安装总是中途失败的情况下采用离线安装的方式1.安装前准备1.1 安装显卡驱动略1.2 创建虚拟环境conda create -n torch1.9 python=3.8 conda activate torch1.92.离线安装2.1 版本对应关系2.2 离线下载安装包下载地址：download.pytorch.org/whl/torch_stable.html我们在这里可以找到我们需要的torch-1.9.0+cu111-cp38-cp38-linux_x86_64.whl以及torchvision-0.10.0+cu111-cp38-cp38-linux_x86_64.whl两个文件即可。注意，cu111代表CUDA11.1，cp38表示python3.8的编译环境，linux_x86_64表示x86的平台64位操作系统。下载完成后，我们将这两个文件传入你的离线主机（服务器）中。接着进入刚刚用conda创建好的虚拟环境后依次安装whl包：2.3 离线安装# CPU版 pip install torch-1.9.0+cpu-cp38-cp38-linux_x86_64.whl pip install torchvision-0.10.0+cpu-cp38-cp38-linux_x86_64.whl # GPU版 pip install torch-1.9.0+cu111-cp38-cp38-linux_x86_64.whl pip install torchvision-0.10.0+cu111-cp38-cp38-linux_x86_64.whl参考资料Pytorch1.9 CPU/GPU（CUDA11.1）安装_torch==1.9.0+cu111_太阳花的小绿豆的博客-CSDN博客
- 2023年10月29日
- 415 阅读
- 0 评论
- 0 点赞
2023-10-28
YOLOv5 四检测头配置一、网络结构说明Yolov5原网络结构如下：增加一层检测层后，网络结构如下：（其中虚线表示删除的部分，细线表示增加的数据流动方向）二、网络配置第一步，在models文件夹下面创建yolov5s-add-one-layer.yaml文件。第二步，将下面的内容粘贴到新创建的文件中。# YOLOv5 by Ultralytics, GPL-3.0 license # Parameters nc: 2 # number of classes depth_multiple: 0.33 # model depth multiple width_multiple: 0.50 # layer channel multiple anchors: - [4,5, 8,10, 22,18] # P2/4 - [10,13, 16,30, 33,23] # P3/8 - [30,61, 62,45, 59,119] # P4/16 - [116,90, 156,198, 373,326] # P5/32 # YOLOv5 v6.0 backbone backbone: # [from, number, module, args] [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2 [-1, 1, Conv, [128, 3, 2]], # 1-P2/4 [-1, 3, C3, [128]], [-1, 1, Conv, [256, 3, 2]], # 3-P3/8 [-1, 6, C3, [256]], [-1, 1, Conv, [512, 3, 2]], # 5-P4/16 [-1, 9, C3, [512]], [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32 [-1, 3, C3, [1024]], [-1, 1, SPPF, [1024, 5]], # 9 ] # YOLOv5 v6.0 head head: [[-1, 1, Conv, [512, 1, 1]], [-1, 1, nn.Upsample, [None, 2, 'nearest']], [[-1, 6], 1, Concat, [1]], # cat backbone P4 [-1, 3, C3, [512, False]], # 13 [-1, 1, Conv, [256, 1, 1]], [-1, 1, nn.Upsample, [None, 2, 'nearest']], [[-1, 4], 1, Concat, [1]], # cat backbone P3 # add feature extration layer [-1, 3, C3, [256, False]], # 17 [-1, 1, Conv, [128, 1, 1]], [-1, 1, nn.Upsample, [None, 2, 'nearest']], [[-1, 2], 1, Concat, [1]], # cat backbone P3 # add detect layer [-1, 3, C3, [128, False]], # 21 (P4/4-minium) [-1, 1, Conv, [128, 3, 2]], [[-1, 18], 1, Concat, [1]], # cat head P3 # end [-1, 3, C3, [256, False]], # 24 (P3/8-small) [-1, 1, Conv, [256, 3, 2]], [[-1, 14], 1, Concat, [1]], # cat head P4 [-1, 3, C3, [512, False]], # 27 (P4/16-medium) [-1, 1, Conv, [512, 3, 2]], [[-1, 10], 1, Concat, [1]], # cat head P5 [-1, 3, C3, [1024, False]], # 30 (P5/32-large) [[21, 24, 27, 30], 1, Detect, [nc, anchors]], # Detect(P2, P3, P4, P5) ]第三步，正常执行新模型的训练流程，参考：快速使用YOLOv5进行训练VOC格式的数据集 - jupiter's blog (inat.top)。参考资料【Yolov5】Yolov5添加检测层，四层结构对小目标、密集场景更友好_yolov5增加小目标检测层-CSDN博客
- 2023年10月28日
- 266 阅读
- 0 评论
- 0 点赞
2023-10-28
[目标检测模型辅助数据标注]YOLOv5检测图片并将结果转为voc的xml格式 1.调用YOLOv5检测模型对图片文件夹执行检测并保存txt和confpython detect.py --weight runs\train\exp3\weights\best.pt --source [待检测的图片文件夹]--save-txt --save-conf2.将检测结果转为voc的xml格式import os import copy from xml.dom import minidom from tqdm import tqdm import cv2 import xmltodict # 图片\YOLO txt\xml对应的文件夹地址 base_dir = "[上面的待检测的图片文件夹]" img_dir = os.path.join(base_dir,"img") txt_dir = os.path.join(base_dir,"labels") xml_dir = os.path.join(base_dir,"xml") class_name_list = "YOLO项目中data.yaml的class_names" obj_base = { 'name':'name', 'bndbox':{ 'xmin':1, 'ymin':1, 'xmax':1, 'ymax':1 } } xml_base = { 'annotation': { 'folder':'img', 'filename':'', 'size':{ 'width':1, 'height':1, 'depth':3 }, 'object':[] } } img_name_list = os.listdir(img_dir) pbar = tqdm(total=len(img_name_list)) pbar.set_description("YOLOTXT2VOC:") for i in range(len(img_name_list)): # 拼接对应文件的地址 img_name = img_name_list[i] img_path = os.path.join(img_dir,img_name) txt_path = os.path.join(txt_dir,img_name.split(".")[0]+".txt") xml_path = os.path.join(xml_dir,img_name.split(".")[0]+".xml") # 初始化xml文件对象 xml_tmp = copy.deepcopy(xml_base) xml_tmp['annotation']['filename'] = img_name # 读取图片对应的宽高信息 img = cv2.imread(img_path) img_height,img_width = img.shape[:2] # 读取txt文件内容 with open(txt_path,'r') as f: content = f.read() # 逐行解析txt内容 for line in content.split("\n"): if not line:continue data_item_list = line.split(" ") # 跳过类别置信度小于0.5的 conf = float(data_item_list[5]) if(conf<0.5):continue # 单行txt转为xml中对应的obj obj_tmp = copy.deepcopy(obj_base) obj_tmp["name"] = class_name_list[int(data_item_list[0])] x_center = int(float(data_item_list[1])*img_width) y_center = int(float(data_item_list[2])*img_height) width = int(float(data_item_list[3])*img_width) height = int(float(data_item_list[4])*img_height) obj_tmp["bndbox"]["xmin"] = int(x_center-width/2.0) obj_tmp["bndbox"]["ymin"] = int(y_center-height/2.0) obj_tmp["bndbox"]["xmax"] = width + obj_tmp["bndbox"]["xmin"] obj_tmp["bndbox"]["ymax"] = height + obj_tmp["bndbox"]["ymin"] xml_tmp['annotation']['object'].append(obj_tmp) # 转为xml并写入对应文件 xmlstr = xmltodict.unparse(xml_tmp) xml = minidom.parseString(xmlstr) xml_pretty_str = xml.toprettyxml() with open(xml_path,"w") as f: f.write(xml_pretty_str) pbar.update(1) pbar.close()
- 2023年10月28日
- 43 阅读
- 0 评论
- 0 点赞
2022-06-16
NCNN部署yolov5s 1.NCNN编译安装参考：Linux下如何安装ncnn2.模型转换(pt->onnx->ncnn)$\color{red}{此路不通，转出来的param文件中的Reshape的参数是错的}$2.1 pt模型转换onnx# pt-->onnx python export.py --weights yolov5s.pt --img 640 --batch 1#安装onnx-simplifier pip install onnx-simplifier # onnxsim 精简模型 python -m onnxsim yolov5s.onnx yolov5s-sim.onnx Simplifying... Finish! Here is the difference: ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓ ┃ ┃ Original Model ┃ Simplified Model ┃ ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩ │ Add │ 10 │ 10 │ │ Concat │ 17 │ 17 │ │ Constant │ 20 │ 0 │ │ Conv │ 60 │ 60 │ │ MaxPool │ 3 │ 3 │ │ Mul │ 69 │ 69 │ │ Pow │ 3 │ 3 │ │ Reshape │ 6 │ 6 │ │ Resize │ 2 │ 2 │ │ Sigmoid │ 60 │ 60 │ │ Split │ 3 │ 3 │ │ Transpose │ 3 │ 3 │ │ Model Size │ 28.0MiB │ 28.0MiB │ └────────────┴────────────────┴──────────────────┘2.2 使用onnx2ncnn.exe 转换模型把你的ncnn/build/tools/onnx加入到环境变量onnx2ncnn yolov5s-sim.onnx yolov5s_6.0.param yolov5s_6.0.bin2.3 调用测试将yolov5s_6.0.param 、yolov5s_6.0.bin模型copy到ncnn/build/examples/位置，运行下面命令./yolov5 image-path就会出现Segmentation fault (core dumped)的报错3.模型转换(pt->torchscript->ncnn)3.1 pt模型转换torchscript# pt-->torchscript python export.py --weights yolov5s.pt --include torchscript --train3.2 下载编译好的 pnnx 工具包执行转换pnnx下载地址：https://github.com/pnnx/pnnx执行转换，获得 yolov5s.ncnn.param 和 yolov5s.ncnn.bin 模型文件，指定 inputshape 并且额外指定 inputshape2 转换成支持动态 shape 输入的模型 ./pnnx yolov5s.torchscript inputshape=[1,3,640,640] inputshape2=[1,3,320,320]3.3 调用测试直接测试的相关文件下载:yolov5_pnnx.zip将 yolov5s.ncnn.param 和 yolov5s.ncnn.bin 模型copy到ncnn/build/examples/位置，运行下面命令./yolov5_pnnx image-path参考资料yolov5 模型部署NCNN（详细过程）Linux&Jetson Nano下编译安装ncnnYOLOv5转NCNN过程Jetson Nano 移植ncnn详细记录u版YOLOv5目标检测ncnn实现（第二版）
- 2022年06月16日
- 955 阅读
- 0 评论
- 0 点赞
2022-04-24
YOLOV5 加入注意力机制(以CA注意力机制为例) 0.CA注意力机制网络结构图1.在common.py中先添加你想添加的注意力模块### 常用注意力机制模块实现 class h_sigmoid(nn.Module): def __init__(self, inplace=True): super(h_sigmoid, self).__init__() self.relu = nn.ReLU6(inplace=inplace) def forward(self, x): return self.relu(x + 3) / 6 class h_swish(nn.Module): def __init__(self, inplace=True): super(h_swish, self).__init__() self.sigmoid = h_sigmoid(inplace=inplace) def forward(self, x): return x * self.sigmoid(x) class CoordAtt(nn.Module): def __init__(self, inp, oup, reduction=32): super(CoordAtt, self).__init__() self.pool_h = nn.AdaptiveAvgPool2d((None, 1)) self.pool_w = nn.AdaptiveAvgPool2d((1, None)) mip = max(8, inp // reduction) self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0) self.bn1 = nn.BatchNorm2d(mip) self.act = h_swish() self.conv_h = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0) self.conv_w = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0) def forward(self, x): identity = x n, c, h, w = x.size() x_h = self.pool_h(x) x_w = self.pool_w(x).permute(0, 1, 3, 2) y = torch.cat([x_h, x_w], dim=2) y = self.conv1(y) y = self.bn1(y) y = self.act(y) x_h, x_w = torch.split(y, [h, w], dim=2) x_w = x_w.permute(0, 1, 3, 2) a_h = self.conv_h(x_h).sigmoid() a_w = self.conv_w(x_w).sigmoid() out = identity * a_w * a_h return out class SELayer(nn.Module): def __init__(self, c1, r=16): super(SELayer, self).__init__() self.avgpool = nn.AdaptiveAvgPool2d(1) self.l1 = nn.Linear(c1, c1 // r, bias=False) self.relu = nn.ReLU(inplace=True) self.l2 = nn.Linear(c1 // r, c1, bias=False) self.sig = nn.Sigmoid() def forward(self, x): b, c, _, _ = x.size() y = self.avgpool(x).view(b, c) y = self.l1(y) y = self.relu(y) y = self.l2(y) y = self.sig(y) y = y.view(b, c, 1, 1) return x * y.expand_as(x) class eca_layer(nn.Module): """Constructs a ECA module. Args: channel: Number of channels of the input feature map k_size: Adaptive selection of kernel size """ def __init__(self, channel, k_size=3): super(eca_layer, self).__init__() self.avg_pool = nn.AdaptiveAvgPool2d(1) self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias=False) self.sigmoid = nn.Sigmoid() def forward(self, x): # feature descriptor on the global spatial information y = self.avg_pool(x) # Two different branches of ECA module y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1) # Multi-scale information fusion y = self.sigmoid(y) x = x * y.expand_as(x) return x * y.expand_as(x) class ChannelAttention(nn.Module): def __init__(self, in_planes, ratio=16): super(ChannelAttention, self).__init__() self.avg_pool = nn.AdaptiveAvgPool2d(1) self.max_pool = nn.AdaptiveMaxPool2d(1) self.f1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False) self.relu = nn.ReLU() self.f2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False) # 写法二,亦可使用顺序容器 # self.sharedMLP = nn.Sequential( # nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(), # nn.Conv2d(in_planes // rotio, in_planes, 1, bias=False)) self.sigmoid = nn.Sigmoid() def forward(self, x): avg_out = self.f2(self.relu(self.f1(self.avg_pool(x)))) max_out = self.f2(self.relu(self.f1(self.max_pool(x)))) out = self.sigmoid(avg_out + max_out) return out class SpatialAttention(nn.Module): def __init__(self, kernel_size=7): super(SpatialAttention, self).__init__() assert kernel_size in (3, 7), 'kernel size must be 3 or 7' padding = 3 if kernel_size == 7 else 1 self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False) self.sigmoid = nn.Sigmoid() def forward(self, x): avg_out = torch.mean(x, dim=1, keepdim=True) max_out, _ = torch.max(x, dim=1, keepdim=True) x = torch.cat([avg_out, max_out], dim=1) x = self.conv(x) return self.sigmoid(x) class CBAMC3(nn.Module): # CSP Bottleneck with 3 convolutions def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion super(CBAMC3, self).__init__() c_ = int(c2 * e) # hidden channels self.cv1 = Conv(c1, c_, 1, 1) self.cv2 = Conv(c1, c_, 1, 1) self.cv3 = Conv(2 * c_, c2, 1) self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)]) self.channel_attention = ChannelAttention(c2, 16) self.spatial_attention = SpatialAttention(7) # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)]) def forward(self, x): out = self.channel_attention(x) * x print('outchannels:{}'.format(out.shape)) out = self.spatial_attention(out) * out return out 2.修改yolo.py在def parse_model(d, ch):函数的代码中增加你想添加的注意力名称添加前 if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP, C3, C3TR, C3SPP, C3Ghost]: c1, c2 = ch[f], args[0] if c2 != no: # if not output c2 = make_divisible(c2 * gw, 8)添加后 if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, CoordAtt]: c1, c2 = ch[f], args[0] if c2 != no: # if not output c2 = make_divisible(c2 * gw, 8)3.修改yaml文件/创建自定义的yaml文件示例-yolov5s-CA.yaml
- 2022年04月24日
- 547 阅读
- 0 评论
- 1 点赞
2022-04-13
YOLOv1论文复现一.整理结构概览1.数据格式转换部分2.DataSet,DateLoader部分2.1 主模块2.2 DataSet辅助模块--boxs2yololabel将单个图片中yolo格式的所有box转为yolov1网络模型所需要的label3.YOLOv1网络部分4.YOLOv1损失函数部分二.对每部分逐一进行实现1.数据格式转换部分--VOC2YOLO.py模块封装import xmltodict import os from progressbar import * """ 将单个xml文件中的单个object转换为yolov1格式 """ def get_yolo_data(obj,img_width,img_height): # 获取voc格式的数据信息 name = obj['name'] xmin = float(obj['bndbox']['xmin']) xmax = float(obj['bndbox']['xmax']) ymin = float(obj['bndbox']['ymin']) ymax = float(obj['bndbox']['ymax']) # 计算对应的yolo格式的数据信息，并进行归一化处理 class_idx = class_names.index(name) x_lefttop = xmin / img_width y_lefttop = ymin / img_height box_width = (xmax - xmin) / img_width box_height = (ymax - ymin) / img_height # 组转YOLO格式的数据 yolo_data = "{} {} {} {} {}\n".format(class_idx,x_lefttop,y_lefttop,box_width,box_height) return yolo_data """ 逐一处理xml文件，转换为YOLO所需的格式 + input + voc_xml_dir:VOC数据集的所有xml文件存储的文件夹 + yolo_txt_dir：转化完成后的YOLOv1格式数据的存储文件夹 + class_names：涉及的所有的类别 + output + yolo_txt_dir文件夹下的文件中的对应每张图片的YOLO格式的数据 """ def VOC2YOLOv1(voc_xml_dir,yolo_txt_dir,class_names): #进度条支持 count = 0 #计数器 widgets = ['VOC2YOLO: ',Percentage(), ' ', Bar('#'),' ', Timer(),' ', ETA()] pbar = ProgressBar(widgets=widgets, maxval=len(os.listdir(xml_dir))).start() # 对xml文件进行逐一处理 for xml_file in os.listdir(xml_dir): # 路径组装 xml_file_path = os.path.join(xml_dir,xml_file) txt_file_path = os.path.join(txt_dir,xml_file[:-4]+".txt") yolo_data = "" # 读取xml文件 with open(xml_file_path) as f: xml_str = f.read() # 转为字典 xml_dic = xmltodict.parse(xml_str) # 获取图片的width、height img_width = float(xml_dic["annotation"]["size"]["width"]) img_height = float(xml_dic["annotation"]["size"]["height"]) # 获取xml文件中的所有object objects = xml_dic["annotation"]["object"] # 对所有的object进行逐一处理 if isinstance(objects,list): # xml文件中包含多个object for obj in objects: yolo_data += get_yolo_data(obj,img_width,img_height) else: # xml文件中包含1个object obj = objects yolo_data += get_yolo_data(obj,img_width,img_height) # 将图片对应的yolo格式的数据写入到对应的文件 with open(txt_file_path,'w') as f: f.write(yolo_data) #更新进度 count += 1 pbar.update(count) pbar.finish() # 释放进度条调用测试voc_xml_dir='../VOC2007/Annotations/' #原xml路径 yolo_txt_dir='../VOC2007/labels/' #转换后txt文件存放路径 # 所有待检测的labels class_names = ['aeroplane', 'cat', 'car', 'dog', 'chair', 'person', 'horse', 'bird', 'tvmonitor', 'bus', 'boat', 'diningtable', 'bicycle', 'bottle', 'sofa', 'pottedplant', 'motorbike', 'cow', 'train', 'sheep'] VOC2YOLOv1(voc_xml_dir,yolo_txt_dir,class_names)VOC2YOLO: 100% |##########################| Elapsed Time: 0:01:18 Time: 0:01:182.DataSet,DateLoader部分模块封装from torch.utils.data import Dataset,DataLoader from PIL import Image """ 构建YOLOv1的dataset，用于加载VOC数据集(已对其进行了YOLO格式转换) + input + mode:调用模式，train/val + DATASET_PATH：VOC数据集的根目录，已对其进行了YOLO格式转换 + yolo_input_size：训练和测试所用的图片大小，通常为448 """ class Dataset_VOC(Dataset): def __init__(self,mode = "train",DATASET_PATH = "../VOC2007/",yolo_input_size = 448): self.filenames = [] # 储存数据集的文件名称 # 获取数据集的文件夹列表 if mode == "train": with open(DATASET_PATH + "ImageSets/Main/train.txt", 'r') as f: # 调用包含训练集图像名称的txt文件 self.filenames = [x.strip() for x in f] elif mode =='val': with open(DATASET_PATH + "ImageSets/Main/val.txt", 'r') as f: # 调用包含训练集图像名称的txt文件 self.filenames = [x.strip() for x in f] # 图像文件所在的文件夹 self.img_dir = os.path.join(DATASET_PATH,"JPEGImages") # 图像对应的label文件(.txt文件)的文件夹 self.label_dir = os.path.join(DATASET_PATH,"labels") def boxs2yololabel(self,boxs): """ 将boxs数据转换为训练时方便计算Loss的数据形式(7,7,5*B+cls_num) 单个box数据格式：(cls,x_rela_width,y_rela_height,w_rela_width,h_rela_height) x_rela_width：相对width=1的x的取值 """ gridsize = 1.0/7 # 网格大小 # 初始化result，此处需要根据不同数据集的类别个数进行修改 label = np.zeros((7,7,30)) # 根据box的数据填充label for i in range(len(boxs)//5): # 计算当前box会位于哪个网格 gridx = int(boxs[i*5+1] // gridsize) # 当前bbox中心落在第gridx个网格,列 gridy = int(boxs[i*5+2] // gridsize) # 当前bbox中心落在第gridy个网格,行 # 计算box相对于网格的左上角的点的相对位置 # box中心坐标 - 网格左上角点的坐标)/网格大小 ==> box中心点的相对位置 x_offset = boxs[i*5+1] / gridsize - gridx y_offset = boxs[i*5+2] / gridsize - gridy # 将第gridy行，gridx列的网格设置为负责当前ground truth的预测，置信度和对应类别概率均置为1 label[gridy, gridx, 0:5] = np.array([x_offset, y_offset, boxs[i*5+3], boxs[i*5+4], 1]) label[gridy, gridx, 5:10] = np.array([x_offset, y_offset, boxs[i*5+3], boxs[i*5+4], 1]) label[gridy, gridx, 10+int(boxs[i*5])] = 1 return label def __len__(self): return len(self.filenames) def __getitem__(self, index): # 构建image部分 # 读取图片 img_path = os.path.join(self.img_dir,self.filenames[index]+".jpg") img = cv2.imread(img_path) # 计算padding值将图像padding为正方形 h,w = img.shape[0:2] padw,padh = 0,0 if h>w: padw = (h - w) // 2 img = np.pad(img,((0,0),(padw,padw),(0,0)),'constant',constant_values=0) elif w>h: padh = (w - h) // 2 img = np.pad(img,((padh,padh),(0,0),(0,0)), 'constant', constant_values=0) # 然后resize为yolo网络所需要的尺寸448x448 yolo_input_size = 448 # 输入YOLOv1网络的图像尺寸为448x448 img = cv2.resize(img,(yolo_input_size,yolo_input_size)) # 构建label部分 # 读取图像对应的box信息，按1维的方式储存，每5个元素表示一个bbox的(cls_id,x_lefttop,y_lefttop,w,h) label_path = os.path.join(self.label_dir,self.filenames[index]+".txt") with open(label_path) as f: boxs = f.read().split('\n') boxs = [x.split() for x in boxs] boxs = [float(x) for y in boxs for x in y] # 根据padding、图像增广等操作，将原始的box数据转换为修改后图像的box数据 for i in range(len(boxs)//5): if padw != 0: boxs[i*5+1] = (boxs[i*5+1] * w + padw) / h boxs[i*5+3] = (boxs[i*5+3] * w) / h elif padh != 0: boxs[i*5+2] = (boxs[i*5+2] * h + padh) / w boxs[i*5+4] = (boxs[i*5+4] * h) / w # boxs转为yololabel label = self.boxs2yololabel(boxs) # img,label转为tensor img = transforms.ToTensor()(img) label = transforms.ToTensor()(label) return img,label调用测试train_dataset = Dataset_VOC(mode="train") val_dataset = Dataset_VOC(mode="val") train_dataloader = DataLoader(train_dataset,batch_size=2,shuffle=True) val_dataloader = DataLoader(val_dataset,batch_size=2,shuffle=True) for i,(inputs,labels) in enumerate(train_dataloader): print(inputs.shape,labels.shape) break for i,(inputs,labels) in enumerate(val_dataloader): print(inputs.shape,labels.shape) breaktorch.Size([2, 3, 448, 448]) torch.Size([2, 30, 7, 7]) torch.Size([2, 3, 448, 448]) torch.Size([2, 30, 7, 7])3.YOLOv1网络部分网络结构模块封装import torch import torch.nn as nn class YOLOv1(nn.Module): def __init__(self): super(YOLOv1,self).__init__() self.feature = nn.Sequential( nn.Conv2d(in_channels=3,out_channels=64,kernel_size=7,stride=2,padding=3), nn.LeakyReLU(), nn.MaxPool2d(kernel_size=2,stride=2), nn.Conv2d(in_channels=64,out_channels=192,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.MaxPool2d(kernel_size=2,stride=2), nn.Conv2d(in_channels=192,out_channels=128,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=128,out_channels=256,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=256,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.MaxPool2d(kernel_size=2,stride=2), nn.Conv2d(in_channels=512,out_channels=256,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=256,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=256,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=256,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,stride=1,padding=1), nn.Conv2d(in_channels=512,out_channels=512,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.MaxPool2d(kernel_size=2,stride=2), nn.Conv2d(in_channels=1024,out_channels=512,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=1024,out_channels=512,kernel_size=1,stride=1,padding=0), nn.LeakyReLU(), nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=1024,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=1024,out_channels=1024,kernel_size=3,stride=2,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=1024,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), nn.Conv2d(in_channels=1024,out_channels=1024,kernel_size=3,stride=1,padding=1), nn.LeakyReLU(), ) self.classify = nn.Sequential( nn.Flatten(), nn.Linear(1024 * 7 * 7, 4096), nn.Dropout(0.5), nn.Linear(4096, 1470) #1470=7*7*30 ) def forward(self,x): x = self.feature(x) x = self.classify(x) return x调用测试yolov1 = YOLOv1() fake_input = torch.zeros((1,3,448,448)) print(fake_input.shape) output = yolov1(fake_input) print(output.shape)yolov1 = YOLOv1() fake_input = torch.zeros((1,3,448,448)) print(fake_input.shape) output = yolov1(fake_input) print(output.shape)4.YOLOv1损失函数部分损失函数详解模块封装""" + input + pred: (batch_size,30,7,7)的网络输出数据 + labels: (batch_size,30,7,7)的样本标签数据 + output + 当前批次样本的平均损失 """ """ + YOLOv1 的损失分为3部分 + 坐标预测损失 + 置信度预测损失 + 含object的box的confidence预测损失 + 不含object的box的confidence预测损失 + 类别预测损失 """ class YOLOv1_Loss(nn.Module): def __init__(self): super(YOLOv1_Loss,self).__init__() def convert_box_type(self,src_box): """ box格式转换 + input + src_box : [box_x_lefttop,box_y_lefttop,box_w,box_h] + output + dst_box : [box_x1,box_y1,box_x2,box_y2] """ x,y,w,h = tuple(src_box) x1,y1 = x,y x2,y2 = x+w,y+w return [x1,y1,x2,y2] def cal_iou(self,box1,box2): """ iou计算 """ # 求相交区域左上角的坐标和右下角的坐标 box_intersect_x1 = max(box1[0], box2[0]) box_intersect_y1 = max(box1[1], box2[1]) box_intersect_x2 = min(box1[2], box2[2]) box_intersect_y2 = min(box1[3], box2[3]) # 求二者相交的面积 area_intersect = (box_intersect_y2 - box_intersect_y1) * (box_intersect_x2 - box_intersect_x1) # 求box1,box2的面积 area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1]) area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1]) # 求二者相并的面积 area_union = area_box1 + area_box2 - area_intersect # 计算iou（交并比） iou = area_intersect / area_union return iou def forward(self,pred,target): batch_size = pred.shape[0] lambda_noobj = 0.5 # lambda_noobj参数 lambda_coord = 5 # lambda_coord参数 site_pred_loss = 0 # 坐标预测损失 obj_confidence_pred_loss = 0 # 含object的box的confidence预测损失 noobj_confidence_pred_loss = 0 #不含object的box的confidence预测损失 class_pred_loss = 0 # 类别预测损失 for batch_size_index in range(batch_size): # batchsize循环 for x_index in range(7): # x方向网格循环 for y_index in range(7): # y方向网格循环 # 获取单个网格的预测数据和真实数据 pred_data = pred[batch_size_index,:,x_index,y_index] # [x,y,w,h,confidence,x,y,w,h,confidence,cls*20] true_data = target[batch_size_index,:,x_index,y_index] #[x,y,w,h,confidence,x,y,w,h,confidence,cls*20] if true_data[4]==1:# 如果包含物体 # 解析预测数据和真实数据 pred_box_confidence_1 = pred_data[0:5] # [x,y,w,h,confidence1] pred_box_confidence_2 = pred_data[5:10] # [x,y,w,h,confidence2] true_box_confidence = true_data[0:5] # [x,y,w,h,confidence] # 获取两个预测box并计算与真实box的iou iou1 = self.cal_iou(self.convert_box_type(pred_box_confidence_1[0:4]),self.convert_box_type(true_box_confidence[0:4])) iou2 = self.cal_iou(self.convert_box_type(pred_box_confidence_2[0:4]),self.convert_box_type(true_box_confidence[0:4])) # 在两个box中选择iou大的box负责预测物体 if iou1 >= iou2: better_box_confidence,bad_box_confidence = pred_box_confidence_1,pred_box_confidence_2 better_iou,bad_iou = iou1,iou2 else: better_box_confidence,bad_box_confidence = pred_box_confidence_2,pred_box_confidence_1 better_iou,bad_iou = iou2,iou1 # 计算坐标预测损失 site_pred_loss += lambda_coord * torch.sum((better_box_confidence[0:2]- true_box_confidence[0:2])**2) # x,y的预测损失 site_pred_loss += lambda_coord * torch.sum((better_box_confidence[2:4].sqrt()-true_box_confidence[2:4].sqrt())**2) # w,h的预测损失 # 计算含object的box的confidence预测损失 obj_confidence_pred_loss += (better_box_confidence[4] - better_iou)**2 # iou比较小的bbox不负责预测物体，因此confidence loss算在noobj中 # 因此还需计算不含object的box的confidence预测损失 noobj_confidence_pred_loss += lambda_noobj * (bad_box_confidence[4] - bad_iou)**2 # 计算类别损失 class_pred_loss += torch.sum((pred_data[10:] - true_data[10:])**2) else: # 如果不包含物体，则只有置信度损失--noobj_confidence_pred_loss # [4,9]代表取两个预测框的confidence noobj_confidence_pred_loss += lambda_noobj * torch.sum(pred[batch_size_index,(4,9),x_index,y_index]**2) loss = site_pred_loss + obj_confidence_pred_loss + noobj_confidence_pred_loss + class_pred_loss return loss/batch_size调用测试loss = YOLOv1_Loss() label1 = torch.zeros([1,30,7,7]) label2 = torch.zeros([1,30,7,7]) print(label1.shape,label2.shape) print(loss(label1,label2)) loss = YOLOv1_Loss() label1 = torch.randn([8,30,7,7]) label2 = torch.randn([8,30,7,7]) print(label1.shape,label2.shape) print(loss(label1,label2))torch.Size([1, 30, 7, 7]) torch.Size([1, 30, 7, 7]) tensor(0.) torch.Size([8, 30, 7, 7]) torch.Size([8, 30, 7, 7]) tensor(46.7713)三.整体封装测试#TODO参考资料【YOLOv1论文翻译】：YOLO: 一体化的，实时的物体检测YOLOv1学习：（一）网络结构推导与实现YOLOv1学习：（二）损失函数理解和实现
- 2022年04月13日
- 811 阅读
- 0 评论
- 0 点赞
2022-03-21
目标检测结果可视化 1.核心函数# 目标检测效果可视化 import numpy as np import cv2 # class_list class_list = ['Plane', 'BridgeVehicle', 'Person', 'LuggageVehicle', 'RefuelVehicle', 'FoodVehicle', 'LuggageVehicleHead', 'TractorVehicle', 'RubbishVehicle', 'FollowMe'] """ 生成color_list """ # 生成number个color def random_color(color_num): color_list = [] for j in range(color_num): color_single = (int(np.random.randint(0,255)),int(np.random.randint(0,255)),int(np.random.randint(0,255))) color_list.append(tuple(color_single)) return color_list color_list = random_color(len(class_list)) """ 目标检测预测结果可视化函数 + img:进行目标检测的图片 + bbox_list:处理过的预测结果 + class_name_list:用于将cls_is转为cls_name + color_list:绘制不同的类别使用不同的颜色 + thresh:阈值 """ def vis_detections(img, bbox_list,class_name_list=class_list,color_list=color_list,thresh=0.5): for bbox in bbox_list: # 参数解析 x1,y1,x2,y2,score,cls_id = bbox[0],bbox[1],bbox[2], bbox[3],bbox[4],int(bbox[5]) cls_name = class_name_list[cls_id] color = color_list[cls_id] # 跳过低于阈值的框 if score<thresh:continue # 画框 cv2.rectangle(img, (int(x1),int(y1)), (int(x2),int(y2)),color_list[cls_id],2) # 画label label_text = '{:s} {:.3f}'.format(cls_name, score) cv2.putText(img, label_text, (x1-5, y1-5),cv2.FONT_HERSHEY_SIMPLEX, 0.8, color_list[cls_id], 2) return img2.调用测试img = cv2.imread("./data_handle/img/00001.jpg.") bbox_list = [ [882,549,1365,631,1,1] ] img = vis_detections(img,bbox_list) img_show = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) import matplotlib.pyplot as plt plt.figure(dpi=200) plt.xticks([]) plt.yticks([]) plt.imshow(img_show) plt.show()
- 2022年03月21日
- 594 阅读
- 0 评论
- 0 点赞
2022-01-08
pytorch 逐层加载模型参数 pytorch 逐层加载模型参数1.现象描述当对模型结构的某些层做了微小调整之后，导致该层参数的shape发生了微小变化导致无法对整个模型进行加载，这时可以考虑通过逐层加载的方式来跳过某些层完成对保存好的模型的加载。2.代码实现# 模型参数逐层加载 ckpt_state_dict = torch.load("ckpt/best_att_wer_0.015267175572519083.pth")["model"] model_state_dict = model.state_dict() for key in model_state_dict.keys(): if model_state_dict[key].shape == ckpt_state_dict[key].shape: model_state_dict[key] = ckpt_state_dict[key] model.load_state_dict(model_state_dict)
- 2022年01月08日
- 746 阅读
- 0 评论
- 0 点赞
2022-01-07
Yolov5s:Yolov5sv6.0网络结构分析与实现 1.参考网络结构图(v5.0的)2. 配置文件解析原始配置文件yolov5s.yaml# YOLOv5 by Ultralytics, GPL-3.0 license # Parameters nc: 80 # number of classes depth_multiple: 0.33 # model depth multiple width_multiple: 0.50 # layer channel multiple anchors: - [10,13, 16,30, 33,23] # P3/8 - [30,61, 62,45, 59,119] # P4/16 - [116,90, 156,198, 373,326] # P5/32 # YOLOv5 v6.0 backbone backbone: # [from, number, module, args] [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2 [-1, 1, Conv, [128, 3, 2]], # 1-P2/4 [-1, 3, C3, [128]], [-1, 1, Conv, [256, 3, 2]], # 3-P3/8 [-1, 6, C3, [256]], [-1, 1, Conv, [512, 3, 2]], # 5-P4/16 [-1, 9, C3, [512]], [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32 [-1, 3, C3, [1024]], [-1, 1, SPPF, [1024, 5]], # 9 ] # YOLOv5 v6.0 head head: [[-1, 1, Conv, [512, 1, 1]], [-1, 1, nn.Upsample, [None, 2, 'nearest']], [[-1, 6], 1, Concat, [1]], # cat backbone P4 [-1, 3, C3, [512, False]], # 13 [-1, 1, Conv, [256, 1, 1]], [-1, 1, nn.Upsample, [None, 2, 'nearest']], [[-1, 4], 1, Concat, [1]], # cat backbone P3 [-1, 3, C3, [256, False]], # 17 (P3/8-small) [-1, 1, Conv, [256, 3, 2]], [[-1, 14], 1, Concat, [1]], # cat head P4 [-1, 3, C3, [512, False]], # 20 (P4/16-medium) [-1, 1, Conv, [512, 3, 2]], [[-1, 10], 1, Concat, [1]], # cat head P5 [-1, 3, C3, [1024, False]], # 23 (P5/32-large) [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) ]配置文件解析Model( (model): Sequential( (0): Conv(3,32,6,2,2) # 3x640x640-->32x320x320 (1): Conv(32,64,3,2) # 32x320x320-->64x160x160 (2): C3(64,64) # 64x160x160-->64x160x160 (3): Conv(64,128,3,2) # 64x160x160-->128x80x80 #P3 (4): C3(128,128) # 128x80x80-->128x80x80 (5): Conv(128,256,3,2) # 128x80x80-->256x40x40 #P4 (6): C3(256,256) # 256x40x40-->256x40x40 (7): Conv(256,512,3,2) # 256x40x40-->512x20x20 #P5 (8): SPP(512,512,[5, 9, 13]) # 512x20x20-->512x20x20 (9): C3(512,512) # 512x20x20-->512x20x20 #P6 (10): Conv(512,256,1,1) # 512x20x20-->256x20x20 (11): nn.Upsample(None, 2, 'nearest') # 256x20x20-->256x40x40 (12): Concat() # [x,p4]==>512x40x40 (13): C3(512,256) # 512x40x40-->256x40x40 (14): Conv(256,128) # 256x40x40-->128x40x40 (15): nn.Upsample(None, 2, 'nearest') # 128x40x40-->128x80x80 (16): Concat() # [x,p3]==>256x80x80 (17): C3(256,128) # 256x80x80-->128x80x80 #out1 (18): Conv(128,128,3,2) # 128x80x80-->128x40x40 (19): Concat() # [x,p4]==>384x40x40 (20): C3(384,256) # 384x40x40-->256x40x40 #out2 (21): Conv(256,256,3,2) # 256x40x40-->256x20x20 (22): Concat() # [x,p5]==>768x20x20 (23): C3(768,512) # 768x20x20 -->512x20x20 #out3 (24): Detect( (0): Conv2d(128, 255) # 128x80x80-->((cls_num+4+1)*anchor_num)x80x80 #out1_detect==>[3, 80, 80, 85] (1): Conv2d(256, 255) # 256x40x40-->((cls_num+4+1)*anchor_num)x40x40 #out2_detect==>[3, 40, 40, 85] (2): Conv2d(512, 255) # 512x20x20-->((cls_num+4+1)*anchor_num)x20x20 #out3_detect==>[3, 20, 20, 85] ) ) )3.代码实现3.1 公共基本块import torch import torch.nn as nn import warnings class Conv(nn.Module): # 标准卷积 def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True): # ch_in, ch_out, kernel, stride, padding, groups super().__init__() self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False) self.bn = nn.BatchNorm2d(c2) self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) def forward(self, x): return self.act(self.bn(self.conv(x))) def forward_fuse(self, x): return self.act(self.conv(x)) def forward_fuse(self, x): return self.act(self.conv(x)) class Bottleneck(nn.Module): # 标准bottleneck def __init__(self, c1, c2, shortcut=True, g=1, e=0.5): # ch_in, ch_out, shortcut, groups, expansion super().__init__() c_ = int(c2 * e) # hidden channels self.cv1 = Conv(c1, c_, 1, 1) self.cv2 = Conv(c_, c2, 3, 1, g=g) self.add = shortcut and c1 == c2 def forward(self, x): return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x)) class C3(nn.Module): # CSP Bottleneck with 3 convolutions def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion super().__init__() c_ = int(c2 * e) # hidden channels self.cv1 = Conv(c1, c_, 1, 1) self.cv2 = Conv(c1, c_, 1, 1) self.cv3 = Conv(2 * c_, c2, 1) # act=FReLU(c2) self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)]) # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)]) def forward(self, x): return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1)) class SPPF(nn.Module): # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher def __init__(self, c1, c2, k=5): # equivalent to SPP(k=(5, 9, 13)) super().__init__() c_ = c1 // 2 # hidden channels self.cv1 = Conv(c1, c_, 1, 1) self.cv2 = Conv(c_ * 4, c2, 1, 1) self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) def forward(self, x): x = self.cv1(x) with warnings.catch_warnings(): warnings.simplefilter('ignore') # suppress torch 1.9.0 max_pool2d() warning y1 = self.m(x) y2 = self.m(y1) return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1)) class Concat(nn.Module): # 沿维度连接张量列表 def __init__(self, dimension=1): super().__init__() self.d = dimension def forward(self, x): return torch.cat(x, self.d) def autopad(k, p=None): # kernel, padding # 计算然卷积结果与输入具有相同大小的padding if p is None: p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad return p3.2 backboneclass Yolov5sV6Backbone(nn.Module): def __init__(self): super(Yolov5sV6Backbone,self).__init__() self.backbone_part1 = nn.Sequential( Conv(3,32,6,2,2), # 0 Conv(32,64,3,2), # 1 C3(64,64), #2 Conv(64,128,3,2) #3 ) self.backbone_part2 = nn.Sequential( C3(128,128), # 4_1 Conv(128,256,3,2) # 5 ) self.backbone_part3 = nn.Sequential( C3(256,256), # 6 Conv(256,512,3,2) # 7 ) self.backbone_part4 = nn.Sequential( C3(512,512), # 8 SPPF(512,512,5), # 9 ) def forward(self,x): p3 = self.backbone_part1(x) p4 = self.backbone_part2(p3) p5 = self.backbone_part3(p4) p6 = self.backbone_part4(p5) return p3,p4,p5,p6调用测试backbone = Yolov5sV6Backbone() fake_input = torch.rand(1,3,640,640) p3,p4,p5,p6 = backbone(fake_input) print(p3.shape,p4.shape,p5.shape,p6.shape)torch.Size([1, 128, 80, 80]) torch.Size([1, 256, 40, 40]) torch.Size([1, 512, 20, 20]) torch.Size([1, 512, 20, 20])3.3 headclass Yolov5sV6Head(nn.Module): def __init__(self): super(Yolov5sV6Head,self).__init__() self.head_part1 = nn.Sequential( Conv(512,256,1,1), # 10 nn.Upsample(None, 2, 'nearest') # 11 ) self.head_concat1 =Concat() # 12 self.head_part2 = nn.Sequential( C3(512,256), # 13 Conv(256,128), # 14 nn.Upsample(None, 2, 'nearest') # 15 ) self.head_concat2 = Concat() # 16 self.head_out1 = C3(256,128) # 17 # 128x80x80 self.head_part3 = Conv(128,128,3,2) # 18 self.head_concat3 = Concat() # 19 self.head_out2 = C3(384,256) # 20 self.head_part4 = Conv(256,256,3,2) # 21 self.head_concat4 = Concat() # 22 self.head_out3 = C3(768,512) # 23 # 512x40x40 def forward(self,p3,p4,p5,x): x = self.head_part1(x) x = self.head_concat1([x,p4]) x = self.head_part2(x) x = self.head_concat2([x,p3]) out1 = self.head_out1(x) x = self.head_part3(out1) x = self.head_concat3([x,p4]) out2 = self.head_out2(x) x = self.head_part4(out2) x = self.head_concat4([x,p5]) out3 = self.head_out3(x) return out1,out2,out3调用测试backbone = Yolov5sV6Backbone() head = Yolov5sV6Head() fake_input = torch.rand(1,3,640,640) p3,p4,p5,p6 = backbone(fake_input) out1,out2,out3 = head(p3,p4,p5,p6) print(out1.shape,out2.shape,out3.shape)3.4 detect 部分class Yolov5sV6Detect(nn.Module): stride = None # strides computed during build def __init__(self, nc=80, anchors=(), ch=[128,256,512], inplace=True): # detection layer super(Yolov5sV6Detect,self).__init__() self.nc = nc # number of classes self.no = nc + 5 # number of outputs per anchor self.nl = len(anchors) # number of detection layers self.na = len(anchors[0]) // 2 # number of anchors self.grid = [torch.zeros(1)] * self.nl # init grid self.anchor_grid = [torch.zeros(1)] * self.nl # init anchor grid self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2)) # shape(nl,na,2) self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv self.inplace = inplace # use in-place ops (e.g. slice assignment) def forward(self, x): z = [] # inference output for i in range(self.nl): x[i] = self.m[i](x[i]) # conv bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85) x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous() if not self.training: # inference if self.grid[i].shape[2:4] != x[i].shape[2:4]: self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i) y = x[i].sigmoid() if self.inplace: y[..., 0:2] = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i] # xy y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh else: # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953 xy = (y[..., 0:2] * 2 - 0.5 + self.grid[i]) * self.stride[i] # xy wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh y = torch.cat((xy, wh, y[..., 4:]), -1) z.append(y.view(bs, -1, self.no)) return x if self.training else (torch.cat(z, 1), x) def _make_grid(self, nx=20, ny=20, i=0): d = self.anchors[i].device if check_version(torch.__version__, '1.10.0'): # torch>=1.10.0 meshgrid workaround for torch>=0.7 compatibility yv, xv = torch.meshgrid([torch.arange(ny, device=d), torch.arange(nx, device=d)], indexing='ij') else: yv, xv = torch.meshgrid([torch.arange(ny, device=d), torch.arange(nx, device=d)]) grid = torch.stack((xv, yv), 2).expand((1, self.na, ny, nx, 2)).float() anchor_grid = (self.anchors[i].clone() * self.stride[i]) \ .view((1, self.na, 1, 1, 2)).expand((1, self.na, ny, nx, 2)).float() return grid, anchor_grid调用测试anchors = [ [10,13, 16,30, 33,23], [30,61, 62,45, 59,119], [116,90, 156,198, 373,326] ] backbone = Yolov5sV6Backbone() head = Yolov5sV6Head() detect = Yolov5sV6Detect(nc=80,anchors=anchors) fake_input = torch.rand(1,3,640,640) p3,p4,p5,p6 = backbone(fake_input) out1,out2,out3 = head(p3,p4,p5,p6) out1,out2,out3 = detect([out1,out2,out3]) print(out1.shape,out2.shape,out3.shape)torch.Size([1, 3, 80, 80, 85]) torch.Size([1, 3, 40, 40, 85]) torch.Size([1, 3, 20, 20, 85])3.5 整体组装class Yolov5sV6(nn.Module): def __init__(self,nc=80,anchors=()): super(Yolov5sV6,self).__init__() self.backbone = Yolov5sV6Backbone() self.head = Yolov5sV6Head() self.detect = Yolov5sV6Detect(nc,anchors) def forward(self,x): p3,p4,p5,p6 = self.backbone(x) out1,out2,out3 = self.head(p3,p4,p5,p6) out1,out2,out3 = self.detect([out1,out2,out3]) return out1,out2,out3调用测试anchors = [ [10,13, 16,30, 33,23], [30,61, 62,45, 59,119], [116,90, 156,198, 373,326] ] yolov5s = Yolov5sV6(nc=80,anchors=anchors) fake_input = torch.rand(1,3,640,640) out1,out2,out3 = yolov5s(fake_input) print(out1.shape,out2.shape,out3.shape)torch.Size([1, 3, 80, 80, 85]) torch.Size([1, 3, 40, 40, 85]) torch.Size([1, 3, 20, 20, 85])4.模型复杂度分析模型名Input模型大小全精度模型大小半精度参数量FLOPSbackbone640x64026.0MB13.1MB3.80M4.42GFLOPShead640x64011.5MB5.78MB3.00M2.79GFLOPSdetect640x640897KB450KB0.23M0.37GFLOPSYolov5s640x64026.9M13.5M7.03M7.58GFLOPS参考资料https://github.com/ultralytics/yolov5YOLOv5代码详解（common.py部分）Yolov5从入门到放弃（一）---yolov5网络架构YOLOV5网络结构
- 2022年01月07日
- 2,318 阅读
- 0 评论
- 0 点赞
2021-12-07
ViBe(Visual Background Extractor)背景模型介绍与实现 1.原理介绍1.1 模型工作原理背景物体就是指静止的或是非常缓慢的移动的物体，而前景物体就对应移动的物体。所以我们可以把物体检测看出一个分类问题，也就是来确定一个像素点是否属于背景点。在ViBe模型中，背景模型为每个像素点存储了一个样本集，然后将每一个新的像素值和样本集进行比较来判断是否属于背景点。可以知道如果一个新的观察值属于背景点那么它应该和样本集中的采样值比较接近。该模型主要包括三个方面：（1）算法模型初始化；（2）像素的分类过程；（3）模型的更新策略。1.2关于样本集的大小假定我们要处理的每一帧图像是$M \times N$ 个像素的，$x$表示某帧图像的一个像素点。模型要为$M \times N$中每个像素建立一个样本集，$x$像素的样本集可以表示为$$ M(x）=\{p_1 , p_2 , p_3 … p_n\} $$每个样本集的大小为n，n这个值如何确定的，暂时不用管，一般是实验得出的，论文中取$n=20$。所以样本集的总大小为$M \times N \times n$。1.3模型的初始化初始化就是建立背景模型的过程。通用的检测算法的初始化需要一定长度的视频序列来完成，通常要耗费数秒的时间，这极大的影戏的检测的实时性，对于手持相机实时拍照来讲并不合适。ViBe的初始化仅仅通过一帧图像即可完成。ViBe初始化就是填充像素的样本集的过程。由于在一帧图像中不可能包含像素点的时空分布信息，我们利用了相近像素点拥有相近的时空分布特性，具体来讲就是：对于一个像素点$x$，随机的选择它的邻居点$NG(x)$的像素值作为它的模型样本值$M_0(x)$。$$ M_0(x) = {p_0(y | y ∈NG(x))} $$这种初始化方法的优缺点:优点对于噪声的反应比较灵敏，计算量小速度快，不仅减少了背景模型建立的过程，还可以处理背景突然变化的情况，当检测到背景突然变化明显时，只需要舍弃原始的模型，重新利用变化后的首帧图像建立背景模型。缺点用于作平均的几帧初始图像中可能采用了运动物体的像素，这种条件下初始化样本集，容易引入拖影（Ghost）区域；1.4像素的分类过程(前景检测)如下图，假定当前帧为第$ t$ 帧，$p_t(x)$表示第$ t$ 帧图像 $x$ 像素的像素值，图中的$p1$到$p6$都是$x$像素的样本集中的值。那图中的横坐标C1和纵坐标C2是什么呢？我们假定我们处理的图像每个像素是RGB格式的，即一个像素值由R,G,B三个值表示，那么图中的坐标其实还隐藏了C3，即C1，C2，C3表示的正是三个通道值，如果是灰度图的话下面的图就要画成一维一条直线了。接下来我们根据预先设定的半径R统计以当前像素点为中心的圆形区域$S_R(p_t{x})$(实际对应到RGB空间中为球形区域)中包含的该像素点的样本集中的样本的数量，这里记为#。在距$p_t(x)$值半径R距离范围内的样本值有$p_2,p_4$，在半径R范围内的样本值总数计为#，那么下图#=2。让后将该值与预先设定的阈值#min(论文中给出的值是2)进行对比，当#<#min的值时，x这个像素就被标记为前景像素，否则就将其标记为背景像素，依次处理所有像素，就能得出前景图像了1.5模型的更新策略即使已经建立起了背景模型，也应该对背景模型进行不断的更新，这样才能使得背景模型能够适应背景的不断变化（如光照变化，背景物体变更等）。A. 普通更新策略对于其他的背景提取算法，背景模型有两种不同的更新策略：保守更新策略：前景点永远不会用来填充模型这样会引起死锁，产生Ghost区域。比如初始化的时候如果一块静止的区域被错误的检测为运动的，那么在这种策略下它永远会被当做运动的物体来对待；Blind策略：前景和背景都可以用来更新背景模型；对死锁不敏感，但这样的缺点在于，缓慢移动的物体会融入到背景中，无法检测出来；B. ViBe算法更新策略ViBe算法中，使用的更新策略是：保守更新策略 + 前景点计数法 + 随机子采样。保守更新策略：前景点永远不会用来填充模型前景点计数法：对像素点进行统计，如果某个像素点连续N次被检测为前景，则将其更新为背景点；随机子采样：在每一个新的视频帧中都去更新背景模型中的每一个像素点的样本值是没有必要的，当一个像素点被分类为背景点时，它有1/φ的概率去更新背景模型。这就决定了ViBe算法的更新策略的其他属性：无记忆更新策略：每次确定需要更新像素点的背景模型时，以新的像素值随机取代该像素点样本集的一个样本值；时间取样更新策略：并非每处理一帧数据，都需要更新处理，而是按一定的更新率更新背景模型；当一个像素点被判定为背景时，它有1/φ的概率更新背景模型；φ是时间采样因子，一般取值为16；空间邻域更新策略：针对需要更新像素点，在该像素点的邻域中随机选择一个像素点，以新选择的像素点更新被选中的背景模型；C. ViBe算法具体更新的方法：每个背景点都有1/φ的概率更新该像素点的模型样本值；有1/φ的概率去更新该像素点邻居点的模型样本值；前景点计数达到临界值时，将其变为背景，并有1/ φ的概率去更新自己的模型样本值。2.算法实现import numpy as np import cv2 class ViBe: ''' ViBe运动检测，分割背景和前景运动图像 ''' def __init__(self,num_sam=20,min_match=2,radiu=20,rand_sam=16): self.defaultNbSamples = num_sam #每个像素的样本集数量，默认20个 self.defaultReqMatches = min_match #前景像素匹配数量，如果超过此值，则认为是背景像素 self.defaultRadius = radiu #匹配半径，即在该半径内则认为是匹配像素 self.defaultSubsamplingFactor = rand_sam #随机数因子，如果检测为背景，每个像素有1/defaultSubsamplingFactor几率更新样本集和领域样本集 self.background = 0 self.foreground = 255 def __buildNeighborArray(self,img): ''' 构建一副图像中每个像素的邻域数组参数：输入灰度图像返回值：每个像素9邻域数组，保存到self.samples中 ''' height,width=img.shape self.samples=np.zeros((self.defaultNbSamples,height,width),dtype=np.uint8) #生成随机偏移数组，用于计算随机选择的邻域坐标 ramoff_xy=np.random.randint(-1,2,size=(2,self.defaultNbSamples,height,width)) #ramoff_x=np.random.randint(-1,2,size=(self.defaultNbSamples,2,height,width)) #xr_=np.zeros((height,width)) xr_=np.tile(np.arange(width),(height,1)) #yr_=np.zeros((height,width)) yr_=np.tile(np.arange(height),(width,1)).T xyr_=np.zeros((2,self.defaultNbSamples,height,width)) for i in range(self.defaultNbSamples): xyr_[1,i]=xr_ xyr_[0,i]=yr_ xyr_=xyr_+ramoff_xy xyr_[xyr_<0]=0 tpr_=xyr_[1,:,:,-1] tpr_[tpr_>=width]=width-1 tpb_=xyr_[0,:,-1,:] tpb_[tpb_>=height]=height-1 xyr_[0,:,-1,:]=tpb_ xyr_[1,:,:,-1]=tpr_ #xyr=np.transpose(xyr_,(2,3,1,0)) xyr=xyr_.astype(int) self.samples=img[xyr[0,:,:,:],xyr[1,:,:,:]] def ProcessFirstFrame(self,img): ''' 处理视频的第一帧 1、初始化每个像素的样本集矩阵 2、初始化前景矩阵的mask 3、初始化前景像素的检测次数矩阵参数： img: 传入的numpy图像素组，要求灰度图像返回值：每个像素的样本集numpy数组 ''' self.__buildNeighborArray(img) self.fgCount=np.zeros(img.shape) #每个像素被检测为前景的次数 self.fgMask=np.zeros(img.shape) #保存前景像素 def Update(self,img): ''' 处理每帧视频，更新运动前景，并更新样本集。该函数是本类的主函数输入：灰度图像 ''' height,width=img.shape #计算当前像素值与样本库中值之差小于阀值范围RADIUS的个数，采用numpy的广播方法 dist=np.abs((self.samples.astype(float)-img.astype(float)).astype(int)) dist[dist<self.defaultRadius]=1 dist[dist>=self.defaultRadius]=0 matches=np.sum(dist,axis=0) #如果大于匹配数量阀值，则是背景，matches值False,否则为前景，值True matches=matches<self.defaultReqMatches self.fgMask[matches]=self.foreground self.fgMask[~matches]=self.background #前景像素计数+1,背景像素的计数设置为0 self.fgCount[matches]=self.fgCount[matches]+1 self.fgCount[~matches]=0 #如果某个像素连续50次被检测为前景，则认为一块静止区域被误判为运动，将其更新为背景点 fakeFG=self.fgCount>50 matches[fakeFG]=False #此处是该更新函数的关键 #更新背景像素的样本集，分两个步骤 #1、每个背景像素有1/self.defaultSubsamplingFactor几率更新自己的样本集 ##更新样本集方式为随机选取该像素样本集中的一个元素，更新为当前像素的值 #2、每个背景像素有1/self.defaultSubsamplingFactor几率更新邻域的样本集 ##更新邻域样本集方式为随机选取一个邻域点，并在该邻域点的样本集中随机选择一个更新为当前像素值 #更新自己样本集 upfactor=np.random.randint(self.defaultSubsamplingFactor,size=img.shape) #生成每个像素的更新几率 upfactor[matches]=100 #前景像素设置为100,其实可以是任何非零值，表示前景像素不需要更新样本集 upSelfSamplesInd=np.where(upfactor==0) #满足更新自己样本集像素的索引 upSelfSamplesPosition=np.random.randint(self.defaultNbSamples,size=upSelfSamplesInd[0].shape) #生成随机更新自己样本集的的索引 samInd=(upSelfSamplesPosition,upSelfSamplesInd[0],upSelfSamplesInd[1]) self.samples[samInd]=img[upSelfSamplesInd] #更新自己样本集中的一个样本为本次图像中对应像素值 #更新邻域样本集 upfactor=np.random.randint(self.defaultSubsamplingFactor,size=img.shape) #生成每个像素的更新几率 upfactor[matches]=100 #前景像素设置为100,其实可以是任何非零值，表示前景像素不需要更新样本集 upNbSamplesInd=np.where(upfactor==0) #满足更新邻域样本集背景像素的索引 nbnums=upNbSamplesInd[0].shape[0] ramNbOffset=np.random.randint(-1,2,size=(2,nbnums)) #分别是X和Y坐标的偏移 nbXY=np.stack(upNbSamplesInd) nbXY+=ramNbOffset nbXY[nbXY<0]=0 nbXY[0,nbXY[0,:]>=height]=height-1 nbXY[1,nbXY[1,:]>=width]=width-1 nbSPos=np.random.randint(self.defaultNbSamples,size=nbnums) nbSamInd=(nbSPos,nbXY[0],nbXY[1]) self.samples[nbSamInd]=img[upNbSamplesInd] def getFGMask(self): ''' 返回前景mask ''' return self.fgMask调用测试import matplotlib.pyplot as plt # 使用opencv加载目标视频 video_path = "./test.mp4" capture = cv2.VideoCapture(video_path) # 实例化ViBe模型 vibe=ViBe() # 第一帧标志 flag_first_frame = True while True: # 逐一读取视频帧 ret, frame = capture.read() if not ret: break # 将视频帧转为灰度图 gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # 如果是第一帧，则用于初始化背景模型 if flag_first_frame: vibe.ProcessFirstFrame(gray) flag_first_frame = False continue # 否则更新背景模型 vibe.Update(gray) # 获取前景并转为uint8类型 segMat=vibe.getFGMask() segMat = segMat.astype(np.uint8) # 拼接出显示图片 img_show = np.hstack((frame,cv2.cvtColor(segMat,cv2.COLOR_GRAY2BGR))) # 缩放到原来的二分之一并显示 x, y = img_show.shape[0:2] img_show = cv2.resize(img_show, (int(y / 2), int(x / 2))) cv2.imshow('vibe',img_show) if cv2.waitKey(50)&0xFF==ord("q"): break # 释放资源 capture.release() cv2.destroyAllWindows()参考资料《O. Barnich and M. Van Droogenbroeck. ViBe: a powerful random technique to estimate the background in video sequences.》ViBe背景建模算法ViBe：基于Python实现的加速版（2019.10）背景提取算法——帧间差分法、背景差分法、ViBe算法、ViBe+算法
- 2021年12月07日
- 1,033 阅读
- 0 评论
- 0 点赞
2021-12-06
暗通道先验去雾原理及实现 1.原理介绍1.1 大气散射模型为了表示雾对图像成像的影响，研究者提出了一个物理模型--大气散射模型，用其来表示雾霾等恶劣天气条件对图像造成的影响。该模型由McCartney首先提出，目前已经成为计算机视觉领域中朦胧图像生成过程的经典描述，该模型的数学表达式如下：$$ I(x)=J(x)t(x)+A(1-t(x)) $$其中:$I(x)$、$J(x)$分别表示有雾图像和对应的的无雾清晰图像，$A$表示全球大气光，$t(x)$是透射矩阵，描述的是光通过传输介质后没有被散射的部分，$t(x)$的数学表达式为：(其中，β指的是光的散射系数,d(x)指的是目标与摄象机之间的距离。)$$ t(x)=e^{-\beta d(x)} $$$J(x)t(x)$叫做直接衰减项，描述的是正常的清晰图像在透射媒介后发生衰减后保留下来的部分。可以看出，清晰图像会随着成像设备与物体的距离即$d(x)$的增加发生指数性衰减。$A(1－t(x))$部分称为大气遮罩层，表示的是全局大气光对成像的影响。因此，根据大气散射模型可知，要想恢复出清晰图像，需要解决两个问题：（1）准确的对全局大气光A进行估计求解，（2）准确的对透射矩阵t(x)进行求解。由上面的大气散射模型我们可以很容易得出除雾公式如下：$$ J(x)=\frac{I(x)-A(1-t(x))}{t(x)}=\frac{I(x)+t(x)}{t(x)}+A $$1.2 暗通道先验理论介绍暗通道先验理论为何凯明博士2009年在CVPR(IEEE Conference On Computer Version and Pattern Recogintion/IEEE计算视觉和模式识别会议)上所提出。该理论基于对大量无雾的户外图像的观察和统计，得出了大部分的户外无雾图像的每个局部区域都存在至少一个颜色通道的强度值很低，换言之，这个区域的各个像素点的颜色通道强度的最小值是个很小的数，其值约等于0，即在其暗通道图中表现为黑色。何凯明博士在其发表的暗通道先验除雾的论文中提出，对于任意的输入图像J，其暗通道Jdark可以用数学公式表达如下所示：$$ J^{dark}(x)=\min_{c\in{r,g,b}}[\min_{y\in{\Omega(x)}}(J^c(y))] $$式中$Ω(x)$则是一个正方形区域，该区域的中心是像素$x$，$J^c(x)$代表图像$J$的在某像素点的颜色强度。由公式可知，暗通道实际上图像的最小灰度图经过最小值滤波得到的。分别对无雾图像和有雾图像进行暗通道图的求解效果如图所示:1.3通过暗通道先验对大气光A和透射率t(x)进行估计由大气散射模型可知，若要对图像进行除雾，等价于现在的已经知道的$I(x)$，要求解出$J(x)$，显然这个方程有无数个解。因此就需要用到暗通道先验理论对透射率$t(x)$和大气光$A$进行估计了。首先完成的是对大气光$A$的估计，根据对何凯明博士发表的论文的理解，本文对大气光求解的具体步骤如下：将暗通道图和原图转化为[图片像素数量*通道数]的向量。求出暗通道图对于的向量中亮度前1/1000的向量索引。根据索引在原图转化得到的向量中寻找具有最高亮度的点的值，这就是A的估计值了。在完成大气$A$的求解后，便可以对$t(x)$的求解进行推导了。参考相关资料对$t(x)$的求解过程进行推导如下:首先将大气散射模型稍作变形得到如下式子：$$ \frac{I^c(x)}{A^c}=t(x)\frac{J^c(x)}{A^c}+1-t(x) $$为了对$t(x)$进行求解，先假设在每一个窗口内透射率$t(x)$为常数$\hat t(x)$，因为上面已经得到了$A$的估计值，所以在这里只需将$A$当做一个常量即可，对上式两边进行两次最小值运算，可以得到下式(i)：$$ \min_{y \in \Omega(x)}(\min_{c \in \{r,g,b\}}(\frac{I^c(x)}{A^c})) =\hat t(x)\min_{y \in \Omega(x)}(\min_{c \in \{r,g,b\}}(\frac{J^c(x)}{A^c})) +1-\hat t(x) $$根据暗通道先验理论有：$$ J^{dark}(x)=min_{c\in{r,g,b}}[min_{y\in{\Omega(x)}}(J^c(y))] =0 $$因此，可以得出：$$ \min_{y \in \Omega(x)}(\min_{c \in \{r,g,b\}}(\frac{J^c(x)}{A^c})) =0 $$代入式(i)可得式(ii)：$$ \hat t(x) = 1 - \min_{y \in \Omega(x)}(\min_{c \in \{r,g,b\}}(\frac{J^c(x)}{A^c})) $$到这就得到了透射图$t(x)$的估计结果。但是如果直接使用这一结果进行除雾效果有时产生的结果并不理想。因为空中总是会存在某些漂浮的小颗粒的，这使得我们看远处的东西会或多或少的受到雾的影响，此外，雾的存在还能让人更好的感受到物体的远近，因此，在去雾时需要考虑对一部分的雾进行保留。对雾的保留可以通过在式(ii)中引入一个在[0,1]之间的因子来实现，变化后得到的新的公式如下所示：$$ \hat t(x) = 1 - \omega\min_{y \in \Omega(x)}(min_{c \in \{r,g,b\}}(\frac{J^c(x)}{A^c})) $$根据何凯明博士的论文及进行了几次测试，最终本文选定的$\omega$的值为0.95。本文对复现实验中对$t(x)$进行求解过程及效果图如下图所示：1.4 根据大气光A和透射率t(x)的估计结果进行图像去雾到这里，我们就可以根据雾天图片退化模型进行清晰图像的恢复了。又因为当像素点x对应的透射图的$t(x)$很小时，会导致根据公式$$ J(x)=\frac{I(x)+t(x)}{t(x)}+A $$求解出的$J(x)$的值偏大，从而导致最终的得到的图片某些地方过曝，所以需要对其进行限制，本文选择限制的最大取值为$t_0=0.1$。从而的到最终使用的图像去雾公式为：$$ J(x)=\frac{I(x)+t(x)}{max(t(x),t_0)}+A $$2.代码实现2.0绘图辅助函数import matplotlib.pyplot as plt #显示单个图 def show_img_by_plt(img,title): plt.figure(figsize=(20,10)) #初始化画布 plt.axis('off') # 关掉坐标轴 plt.imshow(img) #显示图片 plt.title(title,y=-0.1) # 设置图像title plt.show() #show #通过subplot同时显示2个图 def show_double_img_by_subplot(img1,title1,img2,title2): plt.figure(figsize=(20,10)) #初始化画布 #第1个位置创建一个小图 plt.subplot(1,2,1)#表示将整个图像窗口分为1行2列, 当前位置为1 plt.axis('off') # 关掉坐标轴 plt.title(title1,y=-0.1) plt.imshow(img1) #第2个位置创建一个小图 plt.subplot(1,2,2)#表示将整个图像窗口分为1行2列, 当前位置为2 plt.axis('off') # 关掉坐标轴 plt.title(title2,y=-0.1) plt.imshow(img2) plt.show() #show #通过subplot同时显示2个图 #同时显示三个图 def show_three_img_by_subplot(img1,title1,img2,title2,img3,title3): plt.figure(figsize=(16,8)) #初始化画布 #第1个位置创建一个小图 plt.subplot(1,3,1)#表示将整个图像窗口分为1行2列, 当前位置为1 plt.axis('off') # 关掉坐标轴 plt.title(title1,y=-0.1) plt.imshow(img1) #第2个位置创建一个小图 plt.subplot(1,3,2)#表示将整个图像窗口分为1行2列, 当前位置为2 plt.axis('off') # 关掉坐标轴 plt.title(title2,y=-0.1) plt.imshow(img2) #第3个位置创建一个小图 plt.subplot(1,3,3)#表示将整个图像窗口分为1行2列, 当前位置为2 plt.axis('off') # 关掉坐标轴 plt.title(title3,y=-0.1) plt.imshow(img3) plt.show() #show2.1 获取图片暗通道图# 核心函数 #暗通道实际上是在rgb三个通道中取最小值组成灰度图，然后再进行一个最小值滤波得到的。 def get_dark_channel_img(original_img, r=15): #原图rgb三个通道中取最小值组成灰度图 temp_img = np.min(original_img,2) #再进行一个最小值滤波 #最小值滤波用腐蚀来替代了，其实腐蚀就是最小值滤波，最大值滤波是膨胀 kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (r,r)) """ cv2.getStructuringElement( ) 返回指定形状和尺寸的结构元素。这个函数的第一个参数表示内核的形状，有三种形状可以选择。矩形：MORPH_RECT; 交叉形：MORPH_CROSS; 椭圆形：MORPH_ELLIPSE; (r,r)表示kernel的size """ dst_img = cv2.erode(temp_img, kernel) """ dst = cv.dilate(temp_img, kerne) 对图片进行腐蚀参数说明： dst_img 输出与输入相同大小和类型的图像. kernel 用于侵蚀的结构元素可以使用getStructuringElement来创建结构元素。 """ return dst_img #返回最终暗通道图#调用测试 import imageio import numpy as np import cv2 img_src="./image/get_dark_channel_img_clear.jpg" original_img = imageio.imread(img_src) dark_channel_img=get_dark_channel_img(original_img) show_double_img_by_subplot(original_img,"original_img",dark_channel_img,"dark_channel_img")2.2根据暗通道图和原图估计大气光#核心函数 """ 1.从暗通道图按照亮度的大小取前0.1%的像素。 2.在这些位置中，在原始有雾图像I中寻找对应的具有最高亮度的点的值，作为A值。 """ def estimate_A(original_img,dark_channel_img): #计算图片的像素总数 img_h,img_w,img_c_num = original_img.shape img_size = img_h*img_w #计算暗通道图的像素值较大的0.1%的像素的数量 num_px = int(max(math.floor(img_size/1000),1)) #将暗通道图变为列向量 dark_vec = dark_channel_img.reshape(img_size,1); #将图片变为列向量 img_vec = original_img.reshape(img_size,3); #将暗通道图对应的列向量进行排序并返回从小到大的数组索引--argsort函数返回的是数组值从小到大的索引值 indices = np.argsort(dark_vec, axis=0) #取暗通道图像素较大的0.1%的像素的索引 indices = indices[img_size-num_px::] #将原图对应的暗通道图的像素值排前0.1% 的像素点的像素值进行求平均 A = np.max(img_vec[indices]) return A#调用测试 import imageio import numpy as np import cv2 import math # img_src=str(input("请输入原图的地址：")) img_src="./image/haze_image.jpg" original_img = imageio.imread(img_src)/255 dark_channel_img=get_dark_channel_img(original_img) A = estimate_A(original_img,dark_channel_img)2.3根据原图和大气光A进行投射图t(x)的估计#核心函数 #根据论文中透射图的估计公式进行书写的 def estimate_transmission(original_img,A, omega = 0.95): min_min_Ic_div_Ac = np.zeros(original_img.shape,"float64"); for index in range(0,3): min_min_Ic_div_Ac[:,:,index] = original_img[:,:,index]/A transmission = 1 - omega*get_dark_channel_img(min_min_Ic_div_Ac); return transmission#调用测试 import imageio import numpy as np import cv2 # img_src=str(input("请输入原图的地址：")) img_src="./image/haze_image.jpg" original_img = imageio.imread(img_src) dark_channel_img=get_dark_channel_img(original_img) A = estimate_A(original_img,dark_channel_img) transmission = estimate_transmission(original_img,A) show_double_img_by_subplot(original_img,"original_img",transmission,"transmission")2.4根据估计好的大气光和透射图进行图片恢复#核心函数 def recover_haze_by_dark_channel_prior(haze_img,t_max = 0.1): dark_channel_img=get_dark_channel_img(haze_img) A = estimate_A(haze_img,dark_channel_img) transmission = estimate_transmission(haze_img,A) clear_img = np.empty(haze_img.shape,haze_img.dtype); for index in range(0,3): clear_img[:,:,index] = (haze_img[:,:,index]-A[index])/cv2.max(transmission,t_max) + A[index] return clear_img#调用测试 import imageio import numpy as np # haze_img_src=str(input("请输入有雾图片的地址：")) haze_img_src="./image/haze_image1.jpg" haze_img = imageio.imread(haze_img_src) clear_img = recover_haze_by_dark_channel_prior(haze_img) show_double_img_by_subplot(haze_img,"haze_img",clear_img,"clear_img") transmission = estimate_transmission(haze_img,A) show_three_img_by_subplot(haze_img,"haze_img",clear_img,"clear_img",transmission,"transmission")3.暗通道先验图片去雾汇总import imageio import numpy as np import cv2 import math def get_dark_channel_img(original_img, r=6): #原图rgb三个通道中取最小值组成灰度图 temp_img = np.min(original_img,2) #再进行一个最小值滤波 #最小值滤波用腐蚀来替代了，其实腐蚀就是最小值滤波，最大值滤波是膨胀 kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (r,r)) dst_img = cv2.erode(temp_img, kernel) return dst_img #返回最终暗通道图 def estimate_A(original_img,dark_channel_img): #计算图片的像素总数 img_h,img_w,img_c_num = original_img.shape img_size = img_h*img_w #计算暗通道图的像素值较大的0.1%的像素的数量 num_px = int(max(math.floor(img_size/1000),1)) #将暗通道图变为列向量 dark_vec = dark_channel_img.reshape(img_size,1); #将图片变为列向量 img_vec = original_img.reshape(img_size,3); #将暗通道图对应的列向量进行排序并返回从小到大的数组索引--argsort函数返回的是数组值从小到大的索引值 indices = np.argsort(dark_vec, axis=0) #取暗通道图像素较大的0.1%的像素的索引 indices = indices[img_size-num_px::] #将原图对应的暗通道图的像素值排前0.1% 的像素点的像素值进行求平均 atm_sum = np.zeros([1,3]) for index in range(1,num_px): atm_sum = atm_sum + img_vec[indices[index]] A = atm_sum / num_px; return A[0] def estimate_transmission(original_img,A, omega = 0.95): min_min_Ic_div_Ac = np.zeros(original_img.shape,"float64"); for index in range(0,3): min_min_Ic_div_Ac[:,:,index] = original_img[:,:,index]/A[index] transmission = 1 - omega*get_dark_channel_img(min_min_Ic_div_Ac); return transmission def recover_haze_by_dark_channel_prior(haze_img,t_max = 0.1): dark_channel_img=get_dark_channel_img(haze_img) A = estimate_A(haze_img,dark_channel_img) transmission = estimate_transmission(haze_img,A) clear_img = np.empty(haze_img.shape,haze_img.dtype); for index in range(0,3): clear_img[:,:,index] = (haze_img[:,:,index]-A[index])/cv2.max(transmission,t_max) + A[index] return clear_img haze_img_src="./image/haze_image.jpg" haze_img = imageio.imread(haze_img_src) clear_img = recover_haze_by_dark_channel_prior(haze_img) show_double_img_by_subplot(haze_img,"haze_img",clear_img,"clear_img")应用与视频from moviepy.editor import VideoFileClip def process_image(image): return deHaze(image/255.0)*255 output_file = './vedio/clear_dark_channel_short.mp4' test_clip = VideoFileClip("./vedio/haze_short.mp4") new_clip = test_clip.fl_image(process_image) new_clip.write_videofile(output_file, audio=False)参考资料暗通道先验原理——DCP去雾算法He K , Sun J , Fellow, et al. Single Image Haze Removal Using Dark Channel Prior[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 33(12):2341-2353.
- 2021年12月06日
- 1,630 阅读
- 1 评论
- 0 点赞
2021-11-25
Pytorch 实战：使用Resnet18实现对是否戴口罩进行图片分类 1.实验环境torch = 1.6.0torchvision = 0.7.0matplotlib = 3.3.3 # 绘图用progressbar = 2.5 # 绘制进度条用easydict # 超参数字典功能增强2.数据集数据集介绍包含2582张图片，3个类别(yes/unknow/no)下载地址：口罩检测数据集3.导入相关的包# 导包 import torch import torch.nn as nn import torch.optim as optim from torch.utils.data import DataLoader import torchvision from torchvision import datasets,transforms import matplotlib.pyplot as plt import random from progressbar import *4.设置超参数# 定义超参数 from easydict import EasyDict super_param={ 'train_data_root' : './data/train', 'val_data_root' : './data/train', 'device': torch.device('cuda:0' if torch.cuda.is_available() else cpu), 'lr': 0.001, 'epochs': 3, 'batch_size': 1, 'begain_epoch':0, 'model_load_flag':False, #是否加载以前的模型 'model_load_path':'./model/resnet18/epoch_1_0.8861347792408986.pkl', 'model_save_dir':'./model/resnet18', } super_param = EasyDict(super_param) if not os.path.exists(super_param.model_save_dir): os.mkdir(super_param.model_save_dir)5.模型搭建# 模型搭建，调用预训练模型Resnet18 class Modified_Resnet18(nn.Module): """docstring for ClassName""" def __init__(self, num_classs=3): super(Modified_Resnet18, self).__init__() model = torchvision.models.resnet18(pretrained=True) model.fc = nn.Linear(model.fc.in_features,num_classs) self.model = model def forward(self, x): x = self.model(x) return x model = Modified_Resnet18() print(model)Modified_Resnet18( (model): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (1): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer2): Sequential( (0): BasicBlock( (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer3): Sequential( (0): BasicBlock( (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer4): Sequential( (0): BasicBlock( (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (avgpool): AdaptiveAvgPool2d(output_size=(1, 1)) (fc): Linear(in_features=512, out_features=3, bias=True) ) )6.加载数据集# 训练数据封装成dataloader # 定义数据处理的transform transform = transforms.Compose([ transforms.Resize((224,224)), transforms.ToTensor(), # 0-255 to 0-1 # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), ]) train_dataset =torchvision.datasets.ImageFolder(root=super_param.train_data_root,transform=transform) train_loader =DataLoader(train_dataset,batch_size=super_param.batch_size, shuffle=True) val_dataset =torchvision.datasets.ImageFolder(root=super_param.val_data_root,transform=transform) val_loader =DataLoader(val_dataset,batch_size=super_param.batch_size, shuffle=True) # 保存类别与类别索引的对应关系 class_to_idx = train_dataset.class_to_idx idx_to_class = dict([val,key] for key,val in class_to_idx.items()) print(len(train_dataset),len(val_dataset))查看一个数据样例#查看一个数据 import matplotlib.pyplot as plt index = random.randint(0,len(val_dataset)) img,idx = val_dataset[index] img = img.permute(1,2,0) label = idx_to_class[idx] print("label=",label) plt.figure(dpi=100) plt.xticks([]) plt.yticks([]) plt.imshow(img) plt.show()7.定义损失函数和优化器# 定义损失函数和优化器 criterion = nn.CrossEntropyLoss() optimizer = optim.Adam(model.parameters(),lr=super_param.lr)8.定义单个epoch的训练函数# 定义单个epoch的训练函数 def train_epoch(model,train_loader,super_param,criterion,optimizer,epoch): model.train()#训练声明 for batch_index,(imgs,labels) in enumerate(train_loader): #data上device imgs,labels = imgs.to(super_param.device),labels.to(super_param.device) #梯度清零 optimizer.zero_grad() #前向传播 output = model(imgs) #损失计算 loss = criterion(output,labels) #梯度计算，反向传播 loss.backward() #参数优化 optimizer.step() #打印参考信息 if(batch_index % 10 == 0): print("\rEpoch:{} Batch Index(batch_size={}):{}/{} Loss:{}".format(epoch,super_param.batch_size,batch_index,len(train_loader),loss.item()),end="")9.定义验证函数# 定义验证函数 def val(model,val_loader,super_param,criterion): model.eval()#测试声明 correct = 0.0 #正确数量 val_loss = 0.0 #测试损失 #定义进度条 widgets = ['Val: ',Percentage(), ' ', Bar('#'),' ', Timer(),' ', ETA()] pbar = ProgressBar(widgets=widgets, maxval=100).start() with torch.no_grad(): # 不会计算梯度，也不会进行反向传播 for batch_index,(imgs,labels) in enumerate(val_loader): imgs,labels = imgs.to(super_param.device),labels.to(super_param.device) output = model(imgs)#模型预测 val_loss += criterion(output,labels).item() # 计算测试损失 #argmax返回值，索引 dim=1表示要索引 pred = output.argmax(dim=1) # 找到概率最大的下标 correct += pred.eq(labels.view_as(pred)).sum().item()# 统计预测正确数量 # print("pred===========",pred) pbar.update(batch_index/len(val_loader)*100)#更新进度条进度 #释放进度条 pbar.finish() val_loss /= len(val_loader.dataset) val_accuracy = correct / len(val_loader.dataset) time.sleep(0.01) print("Val --- Avg Loss:{},Accuracy:{}".format(val_loss,val_accuracy)) return val_loss,val_accuracy10.模型训练model = model.to(super_param.device) if super_param.model_load_flag: #加载训练过的模型 model.load_state_dict(torch.load(super_param.model_load_path)) # 数据统计-用于绘图和模型保存 epoch_list = [] loss_list = [] accuracy_list =[] best_accuracy = 0.0 for epoch in range(super_param.begain_epoch,super_param.begain_epoch+super_param.epochs): train_epoch(model,train_loader,super_param,criterion,optimizer,epoch) val_loss,val_accuracy = val(model,val_loader,super_param,criterion) #数据统计 epoch_list.append(epoch) loss_list.append(val_loss) accuracy_list.append(accuracy_list) #保存准确率更高的模型 if(val_accuracy>best_accuracy): best_accuracy = val_accuracy torch.save(model.state_dict(),os.path.join(super_param.model_save_dir, 'epoch_' + str(epoch)+ '_' + str(best_accuracy) + '.pkl')) print('epoch_' + str(epoch) + '_' + str(best_accuracy) + '.pkl'+"保存成功")11.查看数据统计结果# 查看数据统计结果 fig = plt.figure(figsize=(12,12),dpi=70) #子图1 ax1 = plt.subplot(2,1,1) title = "bach_size={},lr={}".format(super_param.batch_size,super_param.lr) plt.title(title,fontsize=15) plt.xlabel('Epochs',fontsize=15) plt.ylabel('Loss',fontsize=15) plt.xticks(fontsize=13) plt.yticks(fontsize=13) plt.plot(epoch_list,loss_list) #子图2 ax2 = plt.subplot(2,1,2) plt.xlabel('Epochs',fontsize=15) plt.ylabel('Accuracy',fontsize=15) plt.xticks(fontsize=13) plt.yticks(fontsize=13) plt.plot(epoch_list,accuracy_list,'r') plt.show()
- 2021年11月25日
- 802 阅读
- 0 评论
- 0 点赞
2021-11-19
python结合opencv调用谷歌tesseract-ocr实现印刷体的表格识别识别环境搭建安装依赖-leptonica库下载源码git clone https://github.com/DanBloomberg/leptonica.gitconfiguresudo apt install automake bash autogen.sh ./configure编译安装make sudo make install这样就安装好了leptonica库谷歌tesseract-ocr编译安装下载源码git clone https://github.com/tesseract-ocr/tesseract.git tesseract-ocr安装依赖sudo apt-get install g++ autoconf automake libtool autoconf-archive pkg-config libpng12-dev libjpeg8-dev libtiff5-dev zlib1g-dev libleptonica-dev -y安装训练所需要的库(只是调用可以不用安装)sudo apt-get install libicu-dev libpango1.0-dev libcairo2-devconfigurebash autogen.sh ./configure编译安装make sudo make install # 可选项，不训练可以选择不执行下面两条 make training sudo make training-install sudo ldconfig安装对应的字体库并添加对应的环境变量下载好的语言包放在/usr/local/share/tessdata目录里面。语言包地址：https://github.com/tesseract-ocr/tessdata_best。里面有各种语言包，都是训练好的语言包。简体中文下载：chi_sim.traineddata ， chi_sim_vert.traineddata英文包：eng.traineddata。设置环境变量vim ~/.bashrc # 在.bashrc的文件末尾加入以下内容 export TESSDATA_PREFIX=/usr/local/share/tessdata source ~/.bashrc查看字体库tesseract --list-langs使用tesseract-ocr测试# 识别/home/app/1.png这张图片，内容输出到output.txt 里面，用chi_sim 中文来识别（不用加.traineddata，会默认加） tesseract /home/app/1.png output -l chi_sim # 查看识别结果 cat output.txt安装python调用依赖包pip install pytesseract配合opencv调用进行印刷体表格识别代码import cv2 import numpy as np import pytesseract import matplotlib.pyplot as plt import re #原图 raw = cv2.imread("table3.2.PNG") # 灰度图片 gray = cv2.cvtColor(raw, cv2.COLOR_BGR2GRAY) # 二值化 binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 35, -5) rows, cols = binary.shape # 识别横线 scale = 50 kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (cols // scale, 1)) eroded = cv2.erode(binary, kernel, iterations=1) dilated_col = cv2.dilate(eroded, kernel, iterations=1) # 识别竖线 scale = 10 kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, rows // scale)) eroded = cv2.erode(binary, kernel, iterations=1) dilated_row = cv2.dilate(eroded, kernel, iterations=1) # 标识交点 bitwise_and = cv2.bitwise_and(dilated_col, dilated_row) plt.figure(dpi=300) # plt.imshow(bitwise_and,cmap="gray") # 标识表格 merge = cv2.add(dilated_col, dilated_row) plt.figure(dpi=300) # plt.imshow(merge,cmap="gray") # 两张图片进行减法运算，去掉表格框线 merge2 = cv2.subtract(binary, merge) plt.figure(dpi=300) # plt.imshow(merge2,cmap="gray") # 识别黑白图中的白色交叉点，将横纵坐标取出 ys, xs = np.where(bitwise_and > 0) # 提取单元格切分点 # 横坐标 x_point_arr = [] # 通过排序，获取跳变的x和y的值，说明是交点，否则交点会有好多像素值值相近，我只取相近值的最后一点 # 这个10的跳变不是固定的，根据不同的图片会有微调，基本上为单元格表格的高度（y坐标跳变）和长度（x坐标跳变） sort_x_point = np.sort(xs) for i in range(len(sort_x_point) - 1): if sort_x_point[i + 1] - sort_x_point[i] > 10: x_point_arr.append(sort_x_point[i]) i = i + 1 x_point_arr.append(sort_x_point[i]) # 要将最后一个点加入 # 纵坐标 y_point_arr = [] sort_y_point = np.sort(ys) for i in range(len(sort_y_point) - 1): if (sort_y_point[i + 1] - sort_y_point[i] > 10): y_point_arr.append(sort_y_point[i]) i = i + 1 # 要将最后一个点加入 y_point_arr.append(sort_y_point[i]) # 循环y坐标，x坐标分割表格 plt.figure(dpi=300) data = [[] for i in range(len(y_point_arr))] for i in range(len(y_point_arr) - 1): for j in range(len(x_point_arr) - 1): # 在分割时，第一个参数为y坐标，第二个参数为x坐标 cell = raw[y_point_arr[i]:y_point_arr[i + 1], x_point_arr[j]:x_point_arr[j + 1]] #读取文字，此为默认英文 text1 = pytesseract.image_to_string(cell, lang="chi_sim+eng") #去除特殊字符 text1 = re.findall(r'[^\*"/:?\\|<>″′‖ 〈\n]', text1, re.S) text1 = "".join(text1) print('单元格图片信息：' + text1) data[i].append(text1) plt.subplot(len(y_point_arr),len(x_point_arr),len(x_point_arr)*i+j+1) plt.imshow(cell) plt.axis('off') plt.tight_layout() plt.show()实现效果表格图片识别结果单元格图片信息：停机位Stands 单元格图片信息：空器避展限制Wingspanlimitsforaircraft(m) 单元格图片信息：Nr61.62.414-416.886.887.898,899.ZZ14 单元格图片信息：80 单元格图片信息：Nr101,102.105,212.219-222,228.417-419,874-876.895-897.910-912 单元格图片信息：65 单元格图片信息：NrZZ11 单元格图片信息：61 单元格图片信息：Nr103,104.106.107.109,117,890-894.ZL1,ZL2(deicingstands) 单元格图片信息：52 单元格图片信息：Nr878.879 单元格图片信息：51 单元格图片信息：Nr229.230 单元格图片信息：48 单元格图片信息：Nr877 单元格图片信息：42 单元格图片信息：Nr61R.62L 单元格图片信息：38.1 单元格图片信息：Nr61L.62R.108.110-116,118.201-211.213-218,223-227.409-413,414L.414R.415L.415R.416L,416R.417R.418R.419R.501-504.601-610,886L,886R.887L.887R.888.889.898L.898R.8991L.899R_901-909,910R.911R.912R.913-916.921-925.ZZ1IL.ZZ12.ZZ13 单元格图片信息：36 单元格图片信息：NrZZ11IR 单元格图片信息：32 单元格图片信息：Nr884.885 单元格图片信息：29(fuselagelength32.8) 单元格图片信息：Nr417L.418L.419L 单元格图片信息：24.9 单元格图片信息：Nr910L.911L.912L.931-946 单元格图片信息：24
- 2021年11月19日
- 1,099 阅读
- 0 评论
- 0 点赞
2021-09-15
快速使用YOLOv5进行训练VOC格式的数据集训练步骤STEP1:下载官方YOLOv5的代码并配置环境git clone https://github.com/ultralytics/yolov5 cd yolov5 pip install -r requirements.txtSTEP2:准备VOC格式的数据集数据放置格式├──train_data_VOC ├── Annotations ├──放置xml文件 ├── JPEGImages ├──防止img文件STEP3:将数据集转为YOLOv5所需要的COCO格式mkdir train_data_COCO vim VOC2COCO.pyimport os import shutil import random import xmltodict from progressbar import * #================================================================================================================ # 函数定义区 # 函数-将voc xml中的object转化为对应的一条yolo数据 def get_yolo_data(obj,img_width,img_height): # 获取voc格式的数据信息 name = obj['name'] xmin = float(obj['bndbox']['xmin']) xmax = float(obj['bndbox']['xmax']) ymin = float(obj['bndbox']['ymin']) ymax = float(obj['bndbox']['ymax']) # 计算yolo格式的数据信息 class_idx = class_names.index(name) x_center,y_center = (xmin+xmax)/2,(ymin+ymax)/2 box_width = xmax - xmin box_height = ymax - ymin yolo_data = "{} {} {} {} {}\n".format(class_idx,x_center/img_width,y_center/img_height,box_width/img_width,box_height/img_height) return yolo_data # 函数-将xml文件转为txt文件 def convert_annotations(image_name): in_file = xml_file_path + image_name + '.xml' out_file = txt_file_path + image_name + '.txt' yolo_data = "" with open(in_file) as f: xml_str = f.read() # 转为字典 xml_dic = xmltodict.parse(xml_str) # 获取图片的width、height img_width = float(xml_dic["annotation"]["size"]["width"]) img_height = float(xml_dic["annotation"]["size"]["height"]) # 获取xml文件中的object objects = xml_dic["annotation"]["object"] if isinstance(objects,list): # xml文件中包含多个object for obj in objects: yolo_data += get_yolo_data(obj,img_width,img_height) else: # xml文件中包含1个object obj = objects yolo_data += get_yolo_data(obj,img_width,img_height) with open(out_file,'w') as f: f.write(yolo_data) # 函数-创建最终用于训练的COCO格式数据集的文件夹 def create_dir(): if not os.path.exists('train_data_COCO/images/'): os.makedirs('train_data_COCO/images/') if not os.path.exists('train_data_COCO/labels/'): os.makedirs('train_data_COCO/labels/') if not os.path.exists('train_data_COCO/images/train/'): os.makedirs('train_data_COCO/images/train') if not os.path.exists('train_data_COCO/images/val/'): os.makedirs('train_data_COCO/images/val/') if not os.path.exists('train_data_COCO/images/test/'): os.makedirs('train_data_COCO/images/test/') if not os.path.exists('train_data_COCO/labels/train/'): os.makedirs('train_data_COCO/labels/train/') if not os.path.exists('train_data_COCO/labels/val/'): os.makedirs('train_data_COCO/labels/val/') if not os.path.exists('train_data_COCO/labels/test/'): os.makedirs('train_data_COCO/labels/test/') return #================================================================================================================ # 功能实现区 """ STEP1:准备工作：数据准备+创建各种所需的文件夹 """ # 对应的VOC数据集的路径参数+类别参数 xml_file_path = './train_data_VOC/Annotations/' # 检查和自己的xml文件夹名称是否一致 images_file_path = './train_data_VOC/JPEGImages/' # 检查和自己的图像文件夹名称是否一致 class_names = ['Person', 'BridgeVehicle', 'LuggageVehicle', 'Plane', 'RefuelVehicle', 'FoodVehicle', 'RubbishVehicle', 'WaterVehicle', 'PlatformVehicle', 'TractorVehicle'] # 创一个临时文件夹用来存放xml文件转换出来的对应的txt文件 if not os.path.exists('train_data_COCO/temp_labels/'): os.makedirs('train_data_COCO/temp_labels/') txt_file_path = 'train_data_COCO/temp_labels/' # 执行xml到txt的转换，存储到一个临时文件夹 total_xml = os.listdir(xml_file_path) num_xml = len(total_xml) # XML文件总数 for i in range(num_xml): name = total_xml[i][:-4] convert_annotations(name) # 创建COCO格式的数据所需要的各种文件夹 create_dir() # 读取所有的txt文件 total_txt = os.listdir(txt_file_path) print("数据准备工作完成，开始进行数据分配") """ STEP2:数据分配：按比例对数据集进行划分 """ # 设置数据集划分比例，训练集75%，验证集15%，测试集15% train_percent = 0.8 val_percent = 0.15 test_percent = 0.05 # 计算train,val,test每一类的数据数量 num_txt = len(total_txt) num_train = int(num_txt * train_percent) num_val = int(num_txt * val_percent) num_test = num_txt - num_train - num_val # 根据计算出的每类的数据数量计算出进行数据分配的索引 list_all_txt = range(num_txt) # 范围 range(0, num) train = random.sample(list_all_txt, num_train)# train从list_all_txt取出num_train个元素 val_test = [i for i in list_all_txt if not i in train]# 所以list_all_txt列表只剩下了这些元素：val_test val = random.sample(val_test, num_val)# 再从val_test取出num_val个元素，val_test剩下的元素就是test # 根据采样的索引结果进行文件分配工作 print("训练集数目：{}, 验证集数目：{},测试集数目：{}".format(len(train), len(val), len(val_test) - len(val))) #进度条功能 widgets = ['VOC2COCO: ',Percentage(), ' ', Bar('#'),' ', Timer(),' ', ETA()] pbar = ProgressBar(widgets=widgets, maxval=num_txt).start() count = 0 for i in list_all_txt: name = total_txt[i][:-4] srcImage = images_file_path + name + '.jpg' srcLabel = txt_file_path + name + '.txt' if i in train: dst_train_Image = 'train_data_COCO/images/train/' + name + '.jpg' dst_train_Label = 'train_data_COCO/labels/train/' + name + '.txt' shutil.copyfile(srcImage, dst_train_Image) shutil.copyfile(srcLabel, dst_train_Label) elif i in val: dst_val_Image = 'train_data_COCO/images/val/' + name + '.jpg' dst_val_Label = 'train_data_COCO/labels/val/' + name + '.txt' shutil.copyfile(srcImage, dst_val_Image) shutil.copyfile(srcLabel, dst_val_Label) else: dst_test_Image = 'train_data_COCO/images/test/' + name + '.jpg' dst_test_Label = 'train_data_COCO/labels/test/' + name + '.txt' shutil.copyfile(srcImage, dst_test_Image) shutil.copyfile(srcLabel, dst_test_Label) #更新进度条 count += 1 pbar.update(count) #释放进度条 pbar.finish() print("数据分配工作完成，开始释放临时文") """ STEP3:释放临时文件 """ shutil.rmtree(txt_file_path) print("临时文件释放完成，VOC2COCO执行结束")python VOC2COCO.pySTEP4:在data下创建与数据对应的data.yaml文件文件内容按照数据的数据情况填写path: train_data_COCO # root train: # train images (relative to 'path') - images/train val: # val images (relative to 'path') - images/val test: # test images (optional) - images/test # Classes nc: 10 # number of classes names: ['Person', 'BridgeVehicle', 'LuggageVehicle', 'Plane', 'RefuelVehicle', 'FoodVehicle', 'RubbishVehicle', 'WaterVehicle', 'PlatformVehicle', 'TractorVehicle'] # class namesSTEP5:下载预训练模型mkdir weights cd weights wget https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.ptSTEP6:开始训练python train.py --data data/data.yaml --cfg models/yolov5s.yaml --weights weights/yolov5s.pt --batch-size 64 --epochs 60 参考资料https://github.com/ultralytics/yolov5
- 2021年09月15日
- 848 阅读
- 0 评论
- 0 点赞
2021-09-13
光流法简介及实现 1.光流法简介1.1光流光流（optical flow）是空间运动物体在观察成像平面上的像素运动的瞬时速度。通常将二维图像平面特定坐标点上的灰度瞬时变化率定义为光流矢量。一言以概之：所谓光流就是瞬时速率，在时间间隔很小（比如视频的连续前后两帧之间）时，也等同于目标点的位移三言以概之：所谓光流场就是很多光流的集合。当我们计算出了一幅图片中每个图像的光流，就能形成光流场。构建光流场是试图重现现实世界中的运动场，用以运动分析。 1.2光流法光流法是利用图像序列中像素在时间域上的变化以及相邻帧之间的相关性来找到上一帧跟当前帧之间存在的对应关系，从而计算出相邻帧之间物体的运动信息的一种方法。1.3光流法的基本假设条件亮度恒定不变。即同一目标在不同帧间运动时，其亮度不会发生改变。这是基本光流法的假定（所有光流法变种都必须满足），用于得到光流法基本方程；时间连续或运动是“小运动”。即时间的变化不会引起目标位置的剧烈变化，相邻帧之间位移要比较小。同样也是光流法不可或缺的假定。1.4光流场在空间中，运动可以用运动场描述，而在一个图像平面上，物体的运动往往是通过图像序列中不同图像灰度分布的不同体现的，从而，空间中的运动场转移到图像上就表示为光流场（optical flow field）。光流场是一个二维矢量场，它反映了图像上每一点灰度的变化趋势，可看成是带有灰度的像素点在图像平面上运动而产生的瞬时速度场。它包含的信息即是各像点的瞬时运动速度矢量信息。研究光流场的目的就是为了从序列图像中近似计算不能直接得到的运动场。光流场在理想情况下，光流场对应于运动场。1.5稠密光流与稀疏光流稠密光流稠密光流是一种针对图像或指定的某一片区域进行逐点匹配的图像配准方法，它计算图像上所有的点的偏移量，从而形成一个稠密的光流场。通过这个稠密的光流场，可以进行像素级别的图像配准。稀疏光流与稠密光流相反，稀疏光流并不对图像的每个像素点进行逐点计算。它通常需要指定一组点进行跟踪，这组点最好具有某种明显的特性，例如Harris角点等，那么跟踪就会相对稳定和可靠。稀疏跟踪的计算开销比稠密跟踪小得多。1.6光流法的优缺点优点光流法的优点在于它无须了解场景的信息,就可以准确地检测识别运动日标位置,且在摄像机处于运动的情况下仍然适用。而且光流不仅携带了运动物体的运动信息，而且还携带了有关景物三维结构的丰富信息，它能够在不知道场景的任何信息的情况下，检测出运动对象。缺点光流法的适用条件，即两个基本假设，在现实情况下均不容易满足。假设一：亮度恒定不变。但是实际情况是光流场并不一定反映了目标的实际运动情况,如图,所示。图中,光源不动,而物体表面均一,且产生了自传运动,却并没有产生光流图中,物体并没有运动,但是光源与物体发生相对运动,却有光流产生。因此可以说光流法法对光线敏感, 光线变化极易影响识别效果。假设二：小运动。现实情况下较大距离的运动也是普遍存在的。因此当需要检测的目标运动速度过快是，传统光流法也不适用。对稀疏光流算法而言存在着孔径问题，对稠密光流算法而言存在着计算量大的问题。观察上图(a)我们可以看到目标是在向右移动，但是由于“观察窗口”过小我们无法观测到边缘也在下降。LK算法中选区的小邻域就如同上图的观察窗口，邻域大小的选取会影响到最终的效果。当然，这是针对于一部分稀疏光流算法而言，属于稠密光流范畴的算法一般不存在这个问题。但是稠密光流法的显著缺点主要体现在,计算量大,耗时长,在对实时性要求苛刻的情况下并不适用。2.算法实现2.1稀疏光流-只跟踪某些角点(角点检测使用Shi-Tomasi检测算法）代码""" 稀疏光流，只跟踪某些角点 """ import numpy as np import cv2 import matplotlib.pyplot as plt video_path = "./test.mp4" cap = cv2.VideoCapture(video_path) # 打开视频 # ShiTomasi 角点检测参数 feature_params = dict( maxCorners = 100, qualityLevel = 0.3, minDistance = 7, blockSize = 7 ) # lucas kanade光流法参数 lk_params = dict( winSize = (15,15), maxLevel = 2, criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03)) # 创建随机颜色 color = np.random.randint(0,255,(100,3)) # 获取第一帧的灰度图像及其角点 ret, old_frame = cap.read() #获取第一帧 old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY) #找到原始灰度图 #获取第一帧的灰度图中的角点p0 p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params) #创建一个蒙版用来画轨迹,i.e.和每帧图像大小相同的全0张量 mask = np.zeros_like(old_frame) # 对每帧图像计算光流并绘制光流轨迹 while(True): ret,frame = cap.read() if not ret: print("This video has been processed.") break frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # 计算每帧的光流 p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params) # 选取好的跟踪点 good_new = p1[st==1] good_old = p0[st==1] # 画出轨迹 for i,(new,old) in enumerate(zip(good_new,good_old)): a,b = new.ravel() c,d = old.ravel() mask = cv2.line(mask, (int(a),int(b)),(int(c),int(d)), color[i].tolist(), 2) #添加了该帧光流的轨迹图 frame = cv2.circle(frame,(int(a),int(b)),5,color[i].tolist(),-1) # 效果可视化 img = cv2.add(frame,mask) #将该图和轨迹图合并 img_show = np.hstack((img,mask)) cv2.imshow('frame',img_show) if cv2.waitKey(50)&0xFF==ord("q"): break # 更新"上一帧图像"和追踪点 old_gray = frame_gray.copy() p0 = good_new.reshape(-1,1,2) cv2.destroyAllWindows() cap.release() 实现效果2.2稠密光流-跟踪所有的像素点代码# 稠密光流 import numpy as np import cv2 video_path = "./test.mp4" cap = cv2.VideoCapture(video_path) # 打开视频 # 获取第一帧 ret, frame1 = cap.read() prvs = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY) # 创建光流矢量绘制蒙版 hsv = np.zeros_like(frame1) # 遍历每一行的第1列 hsv[...,1] = 255 while(1): ret, frame2 = cap.read() next = cv2.cvtColor(frame2,cv2.COLOR_BGR2GRAY) # 返回一个两通道的光流向量，实际上是每个点的像素位移值 flow = cv2.calcOpticalFlowFarneback(prvs,next, None, 0.5, 3, 15, 3, 5, 1.2, 0) # 笛卡尔坐标转换为极坐标，获得极轴和极角 mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1]) hsv[...,0] = ang*180/np.pi/2 hsv[...,2] = cv2.normalize(mag,None,0,255,cv2.NORM_MINMAX) rgb = cv2.cvtColor(hsv,cv2.COLOR_HSV2BGR) # 光流法结果可视化 img_show = np.hstack((frame2,rgb)) cv2.imshow('frame',img_show) if cv2.waitKey(50)&0xFF==ord("q"): break prvs = next cap.release() cv2.destroyAllWindows()实现效果参考资料计算机视觉--光流法(optical flow)简介：https://blog.csdn.net/qq_41368247/article/details/82562165光流法简述(重庆邮电大学.孔令上):https://wenku.baidu.com/view/7a2cb968ff00bed5b8f31d6c.htmlOPENCV对光流法的实现(PYTHON3):https://www.freesion.com/article/32791433918/角点检测：Harris 与 Shi-Tomasi：https://zhuanlan.zhihu.com/p/83064609python opencv入门光流法（41）:https://blog.csdn.net/tengfei461807914/article/details/80978947
- 2021年09月13日
- 1,163 阅读
- 0 评论
- 0 点赞

1
2
3