Scene Text Detection Resources(场景文字识别资源汇总) [转载] [翻译]

jupiter
2021-03-30 / 0 评论 / 684 阅读 / 正在检测是否收录...
温馨提示:
本文最后更新于2021年03月30日,已超过1332天没有更新,若内容或图片失效,请留言反馈。

1. 数据集

1.1 水平文字数据集

  • ICDAR 2003(IC03):

    • Introduction: 它总共包含509张图像,258张用于训练和251张用于测试。 具体来说,它在训练集中包含1110个文本实例,而在测试集中包含1156个文本实例。 它具有单词级注释。 IC03仅考虑英文文本实例。
    • Link: IC03-download
  • ICDAR 2011(IC11):

    • Introduction: IC11是用于文本检测的英语数据集。 它包含484张图像,229张用于训练和255张用于测试。 该数据集中有1564个文本实例。 它提供单词级和字符级注释。
    • Link:11-download
  • ICDAR 2013(IC13):

    • Introduction: IC13与IC11几乎相同。 它总共包含462张图像,用于训练的229张图像和用于测试的233张图像。 具体来说,它在训练集中包含849个文本实例,而在测试集中包含1095个文本实例。
    • Link: IC13-download

1.2 任意四边形文本数据集

  • USTB-SV1K:

    • Introduction:USTB-SV1K是英语数据集。 它包含来自Google街景视图的1000张街道图像,总共2955个文本实例。 它仅提供单词级注释。
    • Link: USTB-SV1K-download
  • SVT:

    • Introduction:它包含350张图像,总共725个英文文本实例。 SVT具有字符级别和单词级别的注释。 DVT的图像是从Google街景视图中获取的,分辨率较低。
    • Link: SVT-download
  • SVT-P:

    • Introduction: 它包含639个裁剪的单词图像以进行测试。 从Google街景视图的侧面快照中选择了图像。 因此,大多数图像会因非正面视角而严重失真。 它是SVT的改进数据集。
    • Link: SVT-P-download (Password : vnis)
  • ICDAR 2015(IC15):

    • Introduction: 它总共包含1500张图像,1000张用于训练和500张用于测试。 具体来说,它包含17548个文本实例。 它提供单词级别的注释。 IC15是第一个附带场景文本数据集,并且仅考虑英语单词。
    • Link: IC15-download
  • COCO-Text:

    • Introduction: 它总共包含63686张图像,用于训练的43686张图像,用于验证的10000张图像和用于测试的10000张图像。 具体来说,它包含145859个裁剪的单词图像以进行测试,包括手写和打印,清晰和模糊,英语和非英语。
    • Link: COCO-Text-download
  • MSRA-TD500:

    • Introduction: 它总共包含500张图像。 它提供文本行级别的注释而不是单词,并提供多边形框而不是轴对齐的矩形来进行文本区域注释。 它包含英文和中文文本实例。
    • Link: MSRA-TD500-download
  • MLT 2017:

    • Introduction:它总共包含10000个自然图像。 它提供单词级别的注释。 MLT有9种语言。 它是用于场景文本检测和识别的更真实和复杂的数据集。
    • Link: MLT-download
  • MLT 2019:

    • Introduction: 它总共包含18000张图像。 它提供单词级别的注释。 与MLT相比,此数据集有10种语言。 它是用于场景文本检测和识别的更真实和复杂的数据集。
    • Link: MLT-2019-download
  • CTW:

    • Introduction:它包含32285个中文文本的高分辨率街景图像,总共包含1018402个字符实例。 所有图像都在字符级别进行注释,包括其基础字符类型,绑定框和其他6个属性。 这些属性指示其背景是否复杂,是否凸起,是否为手写或印刷,是否被遮挡,是否扭曲,是否使用艺术字。
    • Link: CTW-download
  • RCTW-17:

    • Introduction:它总共包含12514张图像,用于训练的11514张图像和用于测试的1000张图像。 RCTW-17中的图像大部分是通过照相机或手机收集的,其他则是生成的图像。 文本实例用平行四边形注释。 它是第一个大规模的中文数据集,也是当时发布的最大的数据集。
    • Link: RCTW-17-download
  • ReCTS:

    • Introduction:该数据集是大规模的中国街景商标数据集。 它基于中文单词和中文文本行级标签。 标记方法是任意四边形标记。 它总共包含20000张图像。
    • Link: ReCTS-download

1.3 不规则文本数据集

  • CUTE80:

    • Introduction: 它包含在自然场景中拍摄的80张高分辨率图像。 具体来说,它包含288个裁剪的单词图像以进行测试。 数据集集中在弯曲的文本上。 没有提供词典。
    • Link: CUTE80-download
  • Total-Text:

    • Introduction: 它总共包含1,555张图像。 具体来说,它包含11459个经裁剪的单词图像,这些图像具有三种以上不同的文本方向:水平,多方向和弯曲。
    • Link: Total-Text-download
  • SCUT-CTW1500:

    • Introduction: 它总共包含1500张图像,1000张用于训练和500张用于测试。 具体来说,它包含10751个裁剪的单词图像以进行测试。 CTW-1500中的注释是具有14个顶点的多边形。 数据集主要由中文和英文组成。
    • Link: CTW-1500-download
  • LSVT:

    • Introduction: LSVT由20,000个测试数据,30,000个完整注释的训练数据和400,000个弱注释的训练数据组成,这些数据称为部分标签。 带标签的文本区域展示了文本的多样性:水平,多向和弯曲。
    • Link: LSVT-download
  • ArTs:

    • Introduction: ArT包含10,166张图像,5,603张用于训练和4,563张用于测试。 收集它们时会考虑到文本形状的多样性,并且所有文本形状在ArT中都有大量存在。
    • Link: ArT-download

1.4 合成数据集

  • Synth80k :

    • Introduction:它包含80万幅图像,其中包含约800万个合成词实例。 每个文本实例都用其文本字符串,单词级和字符级的边界框进行注释。
    • Link: Synth80k-download
  • SynthText :

    • Introduction:它包含600万个裁剪的单词图像。 生成过程与Synth90k相似。 它也以水平样式进行注释。
    • Link: SynthText-download

1.5 数据集对比

Comparison of Datasets
Datasets Language Image Text instance Text Shape Annotation level
Total Train Test Total Train Test Horizontal Arbitrary-Quadrilateral Multi-oriented Char Word Text-Line
IC03 English 509 258 251 2266 1110 1156
IC11 English 484 229 255 1564
IC13 English 462 229 233 1944 849 1095
USTB-SV1K English 1000 500 500 2955
SVT English 350 100 250 725 211 514
SVT-P English 238 639
IC15 English 1500 1000 500 17548 122318 5230
COCO-Text English 63686 43686 20000 145859 118309 27550
MSRA-TD500 English/Chinese 500 300 200
MLT 2017 Multi-lingual 18000 7200 10800
MLT 2019 Multi-lingual 20000 10000 10000
CTW Chinese 32285 25887 6398 1018402 812872 205530
RCTW-17 English/Chinese 12514 15114 1000
ReCTS Chinese 20000
CUTE80 English 80
Total-Text English 1525 1225 300 9330
CTW-1500 English/Chinese 1500 1000 500 10751
LSVT English/Chinese 450000 430000 20000
ArT English/Chinese 10166 5603 4563
Synth80k English 80k 8m
SynthText English 800k 6m

2. 场景文本检测资源总结

2.1 方法对比

场景文本检测方法可以分为四个部分:

  • (a) 传统方法;
  • (b) 基于分割的方法;
  • (c) 基于回归的方法;
  • (d) 混合方法.

注意

(1)“ Hori”代表水平场景文本数据集。

(2)“ Quad”代表任意四边形文本数据集。

(3)“ Irreg”代表不规则场景文本数据集。

(4)“传统方法”代表不依赖深度学习的方法。

2.1.1 传统方法

      Method            Model      Code Hori Quad Irreg Source Time                                                         Highlight                                                        
Yao et al. [1] TD-Mixture CVPR 2012 1) A new dataset MSRA-TD500 and protocol for evaluation. 2) Equipped a two-level classification scheme and two sets of features extractor.
Yin et al. [2]
TPAMI 2013 Extract Maximally Stable Extremal Regions (MSERs) as character candidates and group them together.
Le et al. [5] HOCC CVPR 2014 HOCC + MSERs
Yin et al. [7]
TPAMI 2015 Presenting a unified distance metric learning framework for adaptive hierarchical clustering.
Wu et al. [9]
TMM 2015 Exploring gradient directional symmetry at component level for smoothing edge components before text detection.
Tian et al. [17]
IJCAI 2016 Scene text is first detected locally in individual frames and finally linked by an optimal tracking trajectory.
Yang et al. [33]
TIP 2017 A text detector will locate character candidates and extract text regions. Then they will linked by an optimal tracking trajectory.
Liang et al. [8]
TIP 2015 Exploring maxima stable extreme regions along with stroke width transform for detecting candidate text regions.
Michal et al.[12] FASText ICCV 2015 Stroke keypoints are efficiently detected and then exploited to obtain stroke segmentations.

2.1.2基于分割的方法

       Method           Model      Code Hori Quad Irreg Source Time                                                                  Highlight                                                             
Li et al. [3]
TIP 2014 (1)develop three novel cues that are tailored for character detection and a Bayesian method for their integration; (2)design a Markov random field model to exploit the inherent dependencies between characters.
Zhang et al. [14]
CVPR 2016 Utilizing FCN for salient map detection and centroid of each character prediction.
Zhu et al. [16]
CVPR 2016 Performs a graph-based segmentation of connected components into words (Word-Graph).
He et al. [18] Text-CNN TIP 2016 Developing a new learning mechanism to train the Text-CNN with multi-level and rich supervised information.
Yao et al. [21]
arXiv 2016 Proposing to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem.
Hu et al. [27] WordSup ICCV 2017 Proposing a weakly supervised framework that can utilize word annotations. Then the detected characters are fed to a text structure analysis module.
Wu et al. [28]
ICCV 2017 Introducing the border class to the text detection problem for the first time, and validate that the decoding process is largely simplified with the help of text border.
Tang et al.[32]
TIP 2017 A text-aware candidate text region(CTR) extraction model + CTR refinement model.
Dai et al. [35] FTSN arXiv 2017 Detecting and segmenting the text instance jointly and simultaneously, leveraging merits from both semantic segmentation task and region proposal based object detection task.
Wang et al. [38]
ICDAR 2017 This paper proposes a novel character candidate extraction method based on super-pixel segmentation and hierarchical clustering.
Deng et al. [40] PixelLink AAAI 2018 Text instances are first segmented out by linking pixels wthin the same instance together.
Liu et al. [42] MCN CVPR 2018 Stochastic Flow Graph (SFG) + Markov Clustering.
Lyu et al. [43]
CVPR 2018 Detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.
Chu et al. [45] Border ECCV 2018 The paper presents a novel scene text detection technique that makes use of semantics-aware text borders and bootstrapping based text segment augmentation.
Long et al. [46] TextSnake ECCV 2018 The paper proposes TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms based on symmetry axis.
Yang et al. [47] IncepText IJCAI 2018 Designing a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection.
Yue et al. [48]
BMVC 2018 Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously.
Zhong et al. [53] AF-RPN arXiv 2018 Presenting AF-RPN(anchor-free) as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework.
Wang et al. [54] PSENet CVPR 2019 Proposing a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.
Xu et al.[57] TextField arXiv 2018 Presenting a novel direction field which can represent scene texts of arbitrary shapes.
Tian et al. [58] FTDN ICIP 2018 FTDN is able to segment text region and simultaneously regress text box at pixel-level.
Tian et al. [83]
CVPR 2019 Constraining embedding feature of pixels inside the same text region to share similar properties.
Huang et al. [4] MSERs-CNN ECCV 2014 Combining MSERs with CNN
Sun et al. [6]
PR 2015 Presenting a robust text detection approach based on color-enhanced CER and neural networks.
Baek et al. [62] CRAFT CVPR 2019 Proposing CRAFT effectively detect text area by exploring each character and affinity between characters.
Richardson et al. [87]
WACV 2019 Presenting an additional scale predictor the estimate the better scale of text regions for testing.
Wang et al. [88] SAST ACMM 2019 Presenting a context attended multi-task learning framework for scene text detection.
Wang et al. [90] PAN ICCV 2019 Proposing an efficient and accurate arbitrary-shaped text detector called Pixel Aggregation Network(PAN),

2.1.3 基于回归的方法

      Method            Model      Code Hori Quad Irreg Source Time                                                       Highlight                                                                        
Gupta et al. [15] FCRN CVPR 2016 (a) Proposing a fast and scalable engine to generate synthetic images of text in clutter; (b) FCRN.
Zhong et al. [20] DeepText arXiv 2016 (a) Inception-RPN; (b) Utilize ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP).
Liao et al. [22] TextBoxes AAAI 2017 Mainly basing SSD object detection framework.
Liu et al. [25] DMPNet CVPR 2017 Quadrilateral sliding windows + shared Monte-Carlo method for fast and accurate computing of the polygonal areas + a sequential protocol for relative regression.
He et al. [26] DDR ICCV 2017 Proposing an FCN that has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries.
Jiang et al. [36] R2CNN arXiv 2017 Using the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations.
Xing et al. [37] ArbiText arXiv 2017 Adopting the circle anchors and incorporating a pyramid pooling module into the Single Shot MultiBox Detector framework.
Zhang et al. [39] FEN AAAI 2018 Proposing a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement.
Wang et al. [41] ITN CVPR 2018 ITN is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding.
Liao et al. [44] RRD CVPR 2018 The regression branch extracts rotation-sensitive features, while the classification branch extracts rotation-invariant features by pooling the rotation sensitive features.
Liao et al. [49] TextBoxes++ TIP 2018 Mainly basing SSD object detection framework and it replaces the rectangular box representation in conventional object detector by a quadrilateral or oriented rectangle representation.
He et al. [50]
TIP 2018 Proposing a scene text detection framework based on fully convolutional network with a bi-task prediction module.
Ma et al. [51] RRPN TMM 2018 RRPN + RRoI Pooling.
Zhu et al. [55] SLPR arXiv 2018 SLPR regresses multiple points on the edge of text line and then utilizes these points to sketch the outlines of the text.
Deng et al. [56]
arXiv 2018 CRPN employs corners to estimate the possible locations of text instances. And it also designs a embedded data augmentation module inside region-wise subnetwork.
Cai et al. [59] FFN ICIP 2018 Proposing a Feature Fusion Network to deal with text regions differing in enormous sizes.
Sabyasachi et al. [60] RGC ICIP 2018 Proposing a novel recurrent architecture to improve the learnings of a feature map at a given time.
Liu et al. [63] CTD PR 2019 CTD + TLOC + PNMS
Xie et al. [79] DeRPN AAAI 2019 DeRPN utilizes anchor string mechanism instead of anchor box in RPN.
Wang et al. [82]
CVPR 2019 Text-RPN + RNN
Liu et al. [84]
CVPR 2019 CSE mechanism
He et al. [29] SSTD ICCV 2017 Proposing an attention mechanism. Then developing a hierarchical inception module which efficiently aggregates multi-scale inception features.
Tian et al. [11]
ICCV 2015 Cascade boosting detects character candidates, and the min-cost flow network model get the final result.
Tian et al. [13] CTPN ECCV 2016 1) RPN + LSTM. 2) RPN incorporate a new vertical anchor mechanism and LSTM connects the region to get the final result.
He et al. [19]
ACCV 2016 ER detetctor detects regions to get coarse prediction of text regions. Then the local context is aggregated to classify the remaining regions to obtain a final prediction.
Shi et al. [23] SegLink CVPR 2017 Decomposing text into segments and links. A link connects two adjacent segments.
Tian et al. [30] WeText ICCV 2017 Proposing a weakly supervised scene text detection method (WeText).
Zhu et al. [31] RTN ICDAR 2017 Mainly basing CTPN vertical vertical proposal mechanism.
Ren et al. [34]
TMM 2017 Proposing a CNN-based detector. It contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN).
Zhang et al. [10]
CVPR 2015 The proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images.
Wang et al. [86] DSRN IJCAI 2019 Presenting a scale-transfer module and scale relationship module to handle the problem of scale variation.
Tang et al.[89] Seglink++ PR 2019 Presenting instance aware component grouping (ICG) for arbitrary-shape text detection.
Wang et al.[92] ContourNet CVPR 2020 1.A scale-insensitive Adaptive Region Proposal Network (AdaptiveRPN); 2. Local Orthogonal Texture-aware Module (LOTM).

2.1.4 混合方法

       Method           Model      Code Hori Quad Irreg Source Time                                                              Highlight                                                                 
Tang et al. [52] SSFT TMM 2018 Proposing a novel scene text detection method that involves superpixel-based stroke feature transform (SSFT) and deep learning based region classification (DLRC).
Xie et al.[61] SPCNet AAAI 2019 Text Context module + Re-Score mechanism.
Liu et al. [64] PMTD arXiv 2019 Perform “soft” semantic segmentation. It assigns a soft pyramid label (i.e., a real value between 0 and 1) for each pixel within text instance.
Liu et al. [80] BDN IJCAI 2019 Discretizing bouding boxes into key edges to address label confusion for text detection.
Zhang et al. [81] LOMO CVPR 2019 DR + IRM + SEM
Zhou et al. [24] EAST CVPR 2017 The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with instance segmentation.
Yue et al. [48]
BMVC 2018 Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously.
Zhong et al. [53] AF-RPN arXiv 2018 Presenting AF-RPN(anchor-free) as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework.
Xue et al.[85] MSR IJCAI 2019 Presenting a noval multi-scale regression network.
Liao et al. [91] DB AAAI 2020 Presenting differentiable binarization module to adaptively set the thresholds for binarization, which simplifies the post-processing.
Xiao et al. [93] SDM ECCV 2020 1. A novel sequential deformation method; 2. auxiliary character counting supervision.

2.2 检测结果

2.2.1 水平文本数据集的检测结果

Method                Model Source Time Method Category IC11[68] IC13 [69] IC05[67]
P R F P R F P R F
Yao et al. [1] TD-Mixture CVPR 2012 Traditional ~ ~ ~ 0.69 0.66 0.67 ~ ~ ~
Yin et al. [2]
TPAMI 2013 0.86 0.68 0.76 ~ ~ ~ ~ ~ ~
Yin et al. [7]
TPAMI 2015 0.838 0.66 0.738 ~ ~ ~ ~ ~ ~
Wu et al. [9]
TMM 2015 ~ ~ ~ 0.76 0.70 0.73 ~ ~ ~
Liang et al. [8]
TIP 2015 0.77 0.68 0.71 0.76 0.68 0.72 ~ ~ ~
Michal et al.[12] FASText ICCV 2015 ~ ~ ~ 0.84 0.69 0.77 ~ ~ ~
Li et al. [3]
TIP 2014 Segmentation 0.80 0.62 0.70 ~ ~ ~ ~ ~ ~
Zhang et al. [14]
CVPR 2016 ~ ~ ~ 0.88 0.78 0.83 ~ ~ ~
He et al. [18] Text-CNN TIP 2016 0.91 0.74 0.82 0.93 0.73 0.82 0.87 0.73 0.79
Yao et al. [21]
arXiv 2016 ~ ~ ~ 0.889 0.802 0.843 ~ ~ ~
Hu et al. [27] WordSup ICCV 2017 ~ ~ ~ 0.933 0.875 0.903 ~ ~ ~
Tang et al.[32]
TIP 2017 0.90 0.86 0.88 0.92 0.87 0.89 ~ ~ ~
Wang et al. [38]
ICDAR 2017 0.87 0.78 0.82 0.87 0.82 0.84 ~ ~ ~
Deng et al. [40] PixelLink AAAI 2018 ~ ~ ~ 0.886 0.875 0.881 ~ ~ ~
Liu et al. [42] MCN CVPR 2018 ~ ~ ~ 0.88 0.87 0.88 ~ ~ ~
Lyu et al. [43]
CVPR 2018 ~ ~ ~ 0.92 0.844 0.880 ~ ~ ~
Chu et al. [45] Border ECCV 2018 ~ ~ ~ 0.915 0.871 0.892 ~ ~ ~
Wang et al. [54] PSENet CVPR 2019 ~ ~ ~ 0.94 0.90 0.92 ~ ~ ~
Huang et al. [4] MSERs-CNN ECCV 2014 0.88 0.71 0.78 ~ ~ ~ 0.84 0.67 0.75
Sun et al. [6]
PR 2015 0.92 0.91 0.91 0.94 0.92 0.93 ~ ~ ~
Gupta et al. [15] FCRN CVPR 2016 Regression 0.94 0.77 0.85 0.938 0.764 0.842 ~ ~ ~
Zhong et al. [20] DeepText arXiv 2016 0.87 0.83 0.85 0.85 0.81 0.83 ~ ~ ~
Liao et al. [22] TextBoxes AAAI 2017 0.89 0.82 0.86 0.89 0.83 0.86 ~ ~ ~
Liu et al. [25] DMPNet CVPR 2017 ~ ~ ~ 0.93 0.83 0.870 ~ ~ ~
Jiang et al. [36] R2CNN arXiv 2017 ~ ~ ~ 0.92 0.81 0.86 ~ ~ ~
Xing et al. [37] ArbiText arXiv 2017 ~ ~ ~ 0.826 0.936 0.877 ~ ~ ~
Wang et al. [41] ITN CVPR 2018 0.896 0.889 0.892 0.941 0.893 0.916 ~ ~ ~
Liao et al. [49] TextBoxes++ TIP 2018 ~ ~ ~ 0.92 0.86 0.89 ~ ~ ~
He et al. [50]
TIP 2018 ~ ~ ~ 0.91 0.84 0.88 ~ ~ ~
Ma et al. [51] RRPN TMM 2018 ~ ~ ~ 0.95 0.89 0.91 ~ ~ ~
Zhu et al. [55] SLPR arXiv 2018 ~ ~ ~ 0.90 0.72 0.80 ~ ~ ~
Cai et al. [59] FFN ICIP 2018 ~ ~ ~ 0.92 0.84 0.876 ~ ~ ~
Sabyasachi et al. [60] RGC ICIP 2018 ~ ~ ~ 0.89 0.77 0.83 ~ ~ ~
Wang et al. [82]
CVPR 2019 ~ ~ ~ 0.937 0.878 0.907 ~ ~ ~
Liu et al. [84]
CVPR 2019 ~ ~ ~ 0.937 0.897 0.917 ~ ~ ~
He et al. [29] SSTD ICCV 2017 ~ ~ ~ 0.89 0.86 0.88 ~ ~ ~
Tian et al. [11]
ICCV 2015 0.86 0.76 0.81 0.852 0.759 0.802 ~ ~ ~
Tian et al. [13] CTPN ECCV 2016 ~ ~ ~ 0.93 0.83 0.88 ~ ~ ~
He et al. [19]
ACCV 2016 ~ ~ ~ 0.90 0.75 0.81 ~ ~ ~
Shi et al. [23] SegLink CVPR 2017 ~ ~ ~ 0.877 0.83 0.853 ~ ~ ~
Tian et al. [30] WeText ICCV 2017 ~ ~ ~ 0.911 0.831 0.869 ~ ~ ~
Zhu et al. [31] RTN ICDAR 2017 ~ ~ ~ 0.94 0.89 0.91 ~ ~ ~
Ren et al. [34]
TMM 2017 0.78 0.67 0.72 0.81 0.67 0.73 ~ ~ ~
Zhang et al. [10]
CVPR 2015 0.84 0.76 0.80 0.88 0.74 0.80 ~ ~ ~
Tang et al. [52] SSFT TMM 2018 Hybrid 0.906 0.847 0.876 0.911 0.861 0.885 ~ ~ ~
Xie et al.[61] SPCNet AAAI 2019 ~ ~ ~ 0.94 0.91 0.92 ~ ~ ~
Liu et al. [80] BDN IJCAI 2019 ~ ~ ~ 0.887 0.894 0.89 ~ ~ ~
Zhou et al. [24] EAST CVPR 2017 ~ ~ ~ 0.93 0.83 0.870 ~ ~ ~
Yue et al. [48]
BMVC 2018 ~ ~ ~ 0.885 0.846 0.870 ~ ~ ~
Zhong et al. [53] AF-RPN arXiv 2018 ~ ~ ~ 0.94 0.90 0.92 ~ ~ ~
Xue et al.[85] MSR IJCAI 2019 ~ ~ ~ 0.918 0.885 0.901 ~ ~ ~

2.2.2 任意四边形文本数据集的检测结果

Method                Model Source Time Method Category IC15 [70] MSRA-TD500 [71] USTB-SV1K [65] SVT [66]
P R F P R F P R F P R F
Le et al. [5] HOCC CVPR 2014 Traditional ~ ~ ~ 0.71 0.62 0.66 ~ ~ ~ ~ ~ ~
Yin et al. [7]
TPAMI 2015 ~ ~ ~ 0.81 0.63 0.71 0.499 0.454 0.475 ~ ~ ~
Wu et al. [9]
TMM 2015 ~ ~ ~ 0.63 0.70 0.66 ~ ~ ~ ~ ~ ~
Tian et al. [17]
IJCAI 2016 ~ ~ ~ 0.95 0.58 0.721 0.537 0.488 0.51 ~ ~ ~
Yang et al. [33]
TIP 2017 ~ ~ ~ 0.95 0.58 0.72 0.54 0.49 0.51 ~ ~ ~
Liang et al. [8]
TIP 2015 ~ ~ ~ 0.74 0.66 0.70 ~ ~ ~ ~ ~ ~
Zhang et al. [14]
CVPR 2016 Segmentation 0.71 0.43 0.54 0.83 0.67 0.74 ~ ~ ~ ~ ~ ~
Zhu et al. [16]
CVPR 2016 0.81 0.91 0.85 ~ ~ ~ ~ ~ ~ ~ ~ ~
He et al. [18] Text-CNN TIP 2016 ~ ~ ~ 0.76 0.61 0.69 ~ ~ ~ ~ ~ ~
Yao et al. [21]
arXiv 2016 0.723 0.587 0.648 0.765 0.753 0.759 ~ ~ ~ ~ ~ ~
Hu et al. [27] WordSup ICCV 2017 0.793 0.77 0.782 ~ ~ ~ ~ ~ ~ ~ ~ ~
Wu et al. [28]
ICCV 2017 0.91 0.78 0.84 0.77 0.78 0.77 ~ ~ ~ ~ ~ ~
Dai et al. [35] FTSN arXiv 2017 0.886 0.80 0.841 0.876 0.771 0.82 ~ ~ ~ ~ ~ ~
Deng et al. [40] PixelLink AAAI 2018 0.855 0.820 0.837 0.830 0.732 0.778 ~ ~ ~ ~ ~ ~
Liu et al. [42] MCN CVPR 2018 0.72 0.80 0.76 0.88 0.79 0.83 ~ ~ ~ ~ ~ ~
Lyu et al. [43]
CVPR 2018 0.895 0.797 0.843 0.876 0.762 0.815 ~ ~ ~ ~ ~ ~
Chu et al. [45] Border ECCV 2018 ~ ~ ~ 0.830 0.774 0.801 ~ ~ ~ ~ ~ ~
Long et al. [46] TextSnake ECCV 2018 0.849 0.804 0.826 0.832 0.739 0.783 ~ ~ ~ ~ ~ ~
Yang et al. [47] IncepText IJCAI 2018 0.938 0.873 0.905 0.875 0.790 0.830 ~ ~ ~ ~ ~ ~
Wang et al. [54] PSENet CVPR 2019 0.8692 0.845 0.8569 ~ ~ ~ ~ ~ ~ ~ ~ ~
Xu et al.[57] TextField arXiv 2018 0.843 0.805 0.824 0.874 0.759 0.813 ~ ~ ~ ~ ~ ~
Tian et al. [58] FTDN ICIP 2018 0.847 0.773 0.809 ~ ~ ~ ~ ~ ~ ~ ~ ~
Tian et al. [83]
CVPR 2019 0.883 0.850 0.866 0.842 0.817 0.829 ~ ~ ~ ~ ~ ~
Baek et al. [62] CRAFT CVPR 2019 0.898 0.843 0.869 0.882 0.782 0.829 ~ ~ ~ ~ ~ ~
Richardson et al. [87]
IJCAI 2019 0.853 0.83 0.827 ~ ~ ~ ~ ~ ~ ~ ~ ~
Wang et al. [88] SAST ACMM 2019 0.8755 0.8734 0.8744 ~ ~ ~ ~ ~ ~ ~ ~ ~
Wang et al. [90] PAN ICCV 2019 0.84 0.819 0.829 0.844 0.838 0.821 ~ ~ ~ ~ ~ ~
Gupta et al. [15] FCRN CVPR 2016 Regression ~ ~ ~ ~ ~ ~ ~ ~ ~ 0.651 0.599 0.624
Liu et al. [25] DMPNet CVPR 2017 0.732 0.682 0.706 ~ ~ ~ ~ ~ ~ ~ ~ ~
He et al. [26] DDR ICCV 2017 0.82 0.80 0.81 0.77 0.70 0.74 ~ ~ ~ ~ ~ ~
Jiang et al. [36] R2CNN arXiv 2017 0.856 0.797 0.825 ~ ~ ~ ~ ~ ~ ~ ~ ~
Xing et al. [37] ArbiText arXiv 2017 0.792 0.735 0.759 0.78 0.72 0.75 ~ ~ ~ ~ ~ ~
Wang et al. [41] ITN CVPR 2018 0.857 0.741 0.795 0.903 0.723 0.803 ~ ~ ~ ~ ~ ~
Liao et al. [44] RRD CVPR 2018 0.88 0.8 0.838 0.876 0.73 0.79 ~ ~ ~ ~ ~ ~
Liao et al. [49] TextBoxes++ TIP 2018 0.878 0.785 0.829 ~ ~ ~ ~ ~ ~ ~ ~ ~
He et al. [50]
TIP 2018 0.85 0.80 0.82 0.91 0.81 0.86 ~ ~ ~ ~ ~ ~
Ma et al. [51] RRPN TMM 2018 0.822 0.732 0.774 0.821 0.677 0.742 ~ ~ ~ ~ ~ ~
Zhu et al. [55] SLPR arXiv 2018 0.855 0.836 0.845 ~ ~ ~ ~ ~ ~ ~ ~ ~
Deng et al. [56]
arXiv 2018 0.89 0.81 0.845 ~ ~ ~ ~ ~ ~ ~ ~ ~
Sabyasachi et al. [60] RGC ICIP 2018 0.83 0.81 0.82 0.85 0.76 0.80 ~ ~ ~ ~ ~ ~
Wang et al. [82]
CVPR 2019 0.892 0.86 0.876 0.852 0.821 0.836 ~ ~ ~ ~ ~ ~
He et al. [29] SSTD ICCV 2017 0.80 0.73 0.77 ~ ~ ~ ~ ~ ~ ~ ~ ~
Tian et al. [13] CTPN ECCV 2016 0.74 0.52 0.61 ~ ~ ~ ~ ~ ~ ~ ~ ~
He et al. [19]
ACCV 2016 ~ ~ ~ ~ ~ ~ ~ ~ ~ 0.87 0.73 0.79
Shi et al. [23] SegLink CVPR 2017 0.731 0.768 0.75 0.86 0.70 0.77 ~ ~ ~ ~ ~ ~
Wang et al. [86] DSRN IJCAI 2019 0.832 0.796 0.814 0.876 0.712 0.785 ~ ~ ~ ~ ~ ~
Tang et al.[89] Seglink++ PR 2019 0.837 0.803 0.820 ~ ~ ~ ~ ~ ~ ~ ~ ~
Wang et al. [92] ContourNet CVPR 2020 0.876 0.861 0.869 ~ ~ ~ ~ ~ ~ ~ ~ ~
Tang et al. [52] SSFT TMM 2018 Hybrid ~ ~ ~ ~ ~ ~ ~ ~ ~ 0.541 0.758 0.631
Xie et al.[61] SPCNet AAAI 2019 0.89 0.86 0.87 ~ ~ ~ ~ ~ ~ ~ ~ ~
Liu et al. [64] PMTD arXiv 2019 0.913 0.874 0.893 ~ ~ ~ ~ ~ ~ ~ ~ ~
Liu et al. [80] BDN IJCAI 2019 0.881 0.846 0.863 0.87 0.815 0.842 ~ ~ ~ ~ ~ ~
Zhang et al. [81] LOMO CVPR 2019 0.878 0.876 0.877 ~ ~ ~ ~ ~ ~ ~ ~ ~
Zhou et al. [24] EAST CVPR 2017 0.833 0.783 0.807 0.873 0.674 0.761 ~ ~ ~ ~ ~ ~
Yue et al. [48]
BMVC 2018 0.866 0.789 0.823 ~ ~ ~ ~ ~ ~ 0.691 0.660 0.675
Zhong et al. [53] AF-RPN arXiv 2018 0.89 0.83 0.86 ~ ~ ~ ~ ~ ~ ~ ~ ~
Xue et al.[85] MSR IJCAI 2019 ~ ~ ~ 0.874 0.767 0.817 ~ ~ ~ ~ ~ ~
Liao et al. [91] DB AAAI 2020 0.918 0.832 0.873 0.915 0.792 0.849 ~ ~ ~ ~ ~ ~
Xiao et al. [93] SDM ECCV 2020 0.9196 0.8922 0.9057 ~ ~ ~ ~ ~ ~ ~ ~ ~
Method                Model Source Time Method Category IC15 [70] MSRA-TD500 [71] USTB-SV1K [65] SVT [66]
P R F P R F P R F P R F
Le et al. [5] HOCC CVPR 2014 Traditional ~ ~ ~ ~ ~ ~ ~ ~ ~ 0.80 0.73 0.76
Yao et al. [21]
arXiv 2016 Segmentation 0.432 0.27 0.333 ~ ~ ~ ~ ~ ~ ~ ~ ~
Hu et al. [27] WordSup ICCV 2017 0.452 0.309 0.368 ~ ~ ~ ~ ~ ~ ~ ~ ~
Lyu et al. [43]
CVPR 2018 0.351 0.348 0.349 ~ ~ ~ 0.743 0.706 0.724 ~ ~ ~
Chu et al. [45] Border ECCV 2018 ~ ~ ~ 0.782 0.588 0.671 0.777 0.621 0.690 ~ ~ ~
Yang et al. [47] IncepText IJCAI 2018 ~ ~ ~ 0.785 0.569 0.660 ~ ~ ~ ~ ~ ~
Wang et al. [54] PSENet CVPR 2019 ~ ~ ~ ~ ~ ~ 0.7535 0.6918 0.7213 ~ ~ ~
Baek et al. [62] CRAFT CVPR 2019 ~ ~ ~ ~ ~ ~ 0.806 0.682 0.739 ~ ~ ~
He et al. [29] SSTD ICCV 2017 Regression 0.46 0.31 0.37 ~ ~ ~ ~ ~ ~ ~ ~ ~
Gupta et al. [15] FCRN CVPR 2016 ~ ~ ~ ~ ~ ~ 0.844 0.763 0.801 ~ ~ ~
Liao et al. [49] TextBoxes++ TIP 2018 0.61 0.57 0.59 ~ ~ ~ ~ ~ ~ ~ ~ ~
Ma et al. [51] RRPN TMM 2018 ~ ~ ~ ~ ~ ~ 0.7669 0.5794 0.6601 ~ ~ ~
Deng et al. [56]
arXiv 2018 0.555 0.633 0.591 ~ ~ ~ ~ ~ ~ ~ ~ ~
Cai et al. [59] FFN ICIP 2018 0.43 0.35 0.39 ~ ~ ~ ~ ~ ~ ~ ~ ~
Xie et al. [79] DeRPN AAAI 2019 0.586 0.557 0.571 ~ ~ ~ ~ ~ ~ ~ ~ ~
He et al. [29] SSTD ICCV 2017 0.46 0.31 0.37 ~ ~ ~ ~ ~ ~ ~ ~ ~
Liao et al. [44] RRD CVPR 2018 ~ ~ ~ 0.591 0.775 0.670 ~ ~ ~ ~ ~ ~
Richardson et al. [87]
IJCAI 2019 ~ ~ ~ ~ ~ ~ 0.729 0.618 0.669 ~ ~ ~
Wang et al. [88] SAST ACMM 2019 ~ ~ ~ ~ ~ ~ 0.7935 0.6653 0.7237 ~ ~ ~
Xie et al.[61] SPCNet AAAI 2019 Hybrid ~ ~ ~ ~ ~ ~ 0.806 0.686 0.741 ~ ~ ~
Liu et al. [64] PMTD arXiv 2019 ~ ~ ~ ~ ~ ~ 0.844 0.763 0.801 ~ ~ ~
Liu et al. [80] BDN IJCAI 2019 ~ ~ ~ ~ ~ ~ 0.791 0.698 0.742 ~ ~ ~
Zhang et al. [81] LOMO CVPR 2019 ~ ~ ~ 0.791 0.602 0.684 0.802 0.672 0.731 ~ ~ ~
Zhou et al. [24] EAST CVPR 2017 0.504 0.324 0.395 ~ ~ ~ ~ ~ ~ ~ ~ ~
Zhong et al. [53] AF-RPN arXiv 2018 ~ ~ ~ ~ ~ ~ 0.75 0.66 0.70 ~ ~ ~
Liao et al. [91] DB AAAI 2020 ~ ~ ~ ~ ~ ~ 0.831 0.679 0.747 ~ ~ ~
Xiao et al. [93] SDM ECCV 2020 ~ ~ ~ ~ ~ ~ 0.8679 0.7526 0.8061 ~ ~ ~

2.2.3 不规则文本数据集的检测结果

在本节中,我们仅选择适用于不规则文本检测的那些方法。

Method                Model Source Time Method Category Total-text [74] SCUT-CTW1500 [75]
P R F P R F
Baek et al. [62] CRAFT CVPR 2019 Segmentation 0.876 0.799 0.836 0.860 0.811 0.835
Long et al. [46] TextSnake ECCV 2018 0.827 0.745 0.784 0.679 0.853 0.756
Tian et al. [83]
CVPR 2019 ~ ~ ~ 81.7 84.2 80.1
Wang et al. [54] PSENet CVPR 2019 0.840 0.779 0.809 0.848 0.797 0.822
Wang et al. [88] SAST ACMM 2019 0.8557 0.7549 0.802 0.8119 0.8171 0.8145
Wang et al. [90] PAN ICCV 2019 0.893 0.81 0.85 0.864 0.812 0.837
Zhu et al. [55] SLPR arXiv 2018 Regression ~ ~ ~ 0.801 0.701 0.748
Liu et al. [63] CTD+TLOC PR 2019 ~ ~ ~ 0.774 0.698 0.734
Wang et al. [82]
CVPR 2019 ~ ~ ~ 80.1 80.2 80.1
Liu et al. [84]
CVPR 2019 0.814 0.791 0.802 0.787 0.761 0.774
Tang et al.[89] Seglink++ PR 2019 0.829 0.809 0.815 0.828 0.798 0.813
Wang et al. [92] ContourNet CVPR 2020 0.869 0.839 0.854 0.837 0.841 0.839
Zhang et al. [81] LOMO CVPR 2019 Hybrid 0.876 0.793 0.833 0.857 0.765 0.808
Xie et al.[61] SPCNet AAAI 2019 0.83 0.83 0.83 ~ ~ ~
Xue et al.[85] MSR IJCAI 2019 0.852 0.73 0.768 0.838 0.778 0.807
Liao et al. [91] DB AAAI 2020 0.871 0.825 0.847 0.869 0.802 0.834
Xiao et al.[93] SDM ECCV 2020 0.9085 0.8603 0.8837 0.884 0.8442 0.8636

3. 综述

[A] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[B] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[C] [arXiv-2018] Long S, He X, Ya C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

4. Evaluation

如果您有兴趣开发更好的场景文本检测指标,那么这里推荐的一些参考可能会有用:

[A] Wolf, Christian, and Jean-Michel Jolion. "Object count/area graphs for the evaluation of object detection and segmentation algorithms." International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. paper

[B] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D.Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper

[C] Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions." Image and Vision Computing 46 (2016): 1-17. paper

[D] Shi, Baoguang, et al. "ICDAR2017 competition on reading chinese text in the wild (RCTW-17)." 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. paper

[E] Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE.
paper

[F] Dangla, Aliona, et al. "A first step toward a fair comparison of evaluation protocols for text detection algorithms." 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. paper

[G] He,Mengchao and Liu, Yuliang, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web images. ICPR 2018. paper

[H] Liu, Yuliang and Jin, Lianwen, et al. "Tightness-aware Evaluation Protocol for Scene Text Detection" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. paper code

5. OCR Service

OCRAPIFree
Tesseract OCR Engine×
Azure
ABBYY
OCR Space
SODA PDF OCR
Free Online OCR
Online OCR
Super Tools
Online Chinese Recognition
Calamari OCR×
Tencent OCR×

6. References and Code

[1] Yao C, Bai X, Liu W, et al. Detecting texts of arbitrary orientations in natural images. 2012 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2012: 1083-1090. Paper
[2] Yin X C, Yin X, Huang K, et al. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 36(5): 970-83. Paper
[3] Li Y, Jia W, Shen C, et al. Characterness: An indicator of text in the wild. IEEE transactions on image processing, 2014, 23(4): 1666-1677. Paper
[4] Huang W, Qiao Y, Tang X. Robust scene text detection with convolution neural network induced mser trees. European Conference on Computer Vision(ECCV), 2014: 497-511. Paper
[5] Kang L, Li Y, Doermann D. Orientation robust text line detection in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041. Paper
[6] Sun L, Huo Q, Jia W, et al. A robust approach for text detection from natural scene images. Pattern Recognition, 2015, 48(9): 2906-2920. Paper
[7] Yin X C, Pei W Y, Zhang J, et al. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015 (9): 1930-1937. Paper
[8] Liang G, Shivakumara P, Lu T, et al. Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images. IEEE Transactions on Image Processing, 2015, 24(11): 4488-4501. Paper
[9] Wu L, Shivakumara P, Lu T, et al. A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video. IEEE Trans. Multimedia, 2015, 17(8): 1137-1152. Paper
[10] Zheng Z, Wei S, et al. Symmetry-based text line detection in natural scenes. IEEE Conference on Computer Vision & Pattern Recognition(CVPR), 2015. Paper
[11] Tian S, Pan Y, Huang C, et al. Text flow: A unified text detection system in natural scene images. Proceedings of the IEEE international conference on computer vision(ICCV). 2015: 4651-4659. Paper
[12] Buta M, et al. FASText: Efficient unconstrained scene text detector. 2015 IEEE International Conference on Computer Vision (ICCV). 2015: 1206-1214. Paper
[13] Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network. European conference on computer vision(ECCV), 2016: 56-72. Paper Code
[14] Zhang Z, Zhang C, Shen W, et al. Multi-oriented text detection with fully convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 4159-4167. Paper
[15] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 2315-2324. Paper Code
[16] S. Zhu and R. Zanibbi, A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 625-632. Paper
[17] Tian S, Pei W Y, Zuo Z Y, et al. Scene Text Detection in Video by Learning Locally and Globally. IJCAI. 2016: 2647-2653. Paper
[18] He T, Huang W, Qiao Y, et al. Text-attentional convolutional neural network for scene text detection. IEEE transactions on image processing, 2016, 25(6): 2529-2541. Paper
[19] He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. Aggregating local context for accurate scene text detection. ACCV, 2016. Paper
[20] Zhong Z, Jin L, Zhang S, et al. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016. Paper
[21] Yao C, Bai X, Sang N, et al. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002, 2016. Paper
[22] Liao M, Shi B, Bai X, et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI. 2017: 4161-4167. Paper Code
[23] Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490. Paper Code
[24] Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector. CVPR, 2017: 2642-2651. Paper Code
[25] Liu Y, Jin L. Deep matching prior network: Toward tighter multi-oriented text detection. CVPR, 2017: 3454-3461. Paper
[26] He W, Zhang X Y, Yin F, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 745-753. Paper
[27] Hu H, Zhang C, Luo Y, et al. Wordsup: Exploiting word annotations for character based text detection. ICCV, 2017. Paper
[28] Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. ICCV, 2017. Paper
[29] He P, Huang W, He T, et al. Single shot text detector with regional attention. The IEEE International Conference on Computer Vision (ICCV). 2017, 6(7). Paper Code
[30] Tian S, Lu S, Li C. Wetext: Scene text detection under weak supervision. ICCV, 2017. Paper
[31] Zhu, Xiangyu and Jiang, Yingying et al. Deep Residual Text Detection Network for Scene Text. ICDAR, 2017. Paper
[32] Tang Y , Wu X. Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks. IEEE Transactions on Image Processing, 2017, 26(3):1509-1520. Paper
[33] Yang C, Yin X C, Pei W Y, et al. Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework with Dynamic Programming. IEEE Transactions on Image Processing, 2017. Paper
[34] X. Ren, Y. Zhou, J. He, K. Chen, X. Yang and J. Sun, A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling. in IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 506-518, March 2017. Paper
[35] Dai Y, Huang Z, Gao Y, et al. Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:1709.03272, 2017. Paper
[36] Jiang Y, Zhu X, Wang X, et al. R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:1706.09579, 2017. Paper
[37] Xing D, Li Z, Chen X, et al. ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene. arXiv preprint arXiv:1711.11249, 2017. Paper
[38] C. Wang, F. Yin and C. Liu, Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 929-934. Paper
[39] Sheng Zhang, Yuliang Liu, Lianwen Jin et al. Feature Enhancement Network: A Refined Scene Text Detector. In AAAI 2018. Paper
[40] Dan Deng et al. PixelLink: Detecting Scene Text via Instance Segmentation. In AAAI 2018. Paper Code
[41] Fangfang Wang, Liming Zhao, Xi L et al. Geometry-Aware Scene Text Detection with Instance Transformation Network. In CVPR 2018. Paper
[42] Zichuan Liu, Guosheng Lin, Sheng Yang et al. Learning Markov Clustering Networks for Scene Text Detection. In CVPR 2018. Paper
[43] Pengyuan Lyu, Cong Yao, Wenhao Wu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. In CVPR 2018. Paper
[44] Minghui L, Zhen Z, Baoguang S. Rotation-Sensitive Regression for Oriented Scene Text Detection. In CVPR 2018. Paper
[45] Chuhui Xue et al. Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping. In ECCV 2018. Paper
[46] Long, Shangbang and Ruan, Jiaqiang, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In ECCV, 2018. Paper
[47] Qiangpeng Yang, Mengli Cheng et al. IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection. In IJCAI 2018. Paper
[48] Xiaoyu Yue et al. Boosting up Scene Text Detectors with Guided CNN. In BMVC 2018. Paper
[49] Liao M, Shi B , Bai X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, 2018, 27(8):3676-3690. Paper Code
[50] W. He, X. Zhang, F. Yin and C. Liu, Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, in IEEE Transactions on Image Processing, vol. 27, no. 11, pp.5406-5419, 2018. Paper
[51] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals.in IEEE Transactions on Multimedia, 2018. Paper Code
[52] Youbao Tang and Xiangqian Wu. Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification. In TMM, 2018. Paper
[53] Zhuoyao Zhong, Lei Sun and Qiang Huo. An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches. arXiv preprint arXiv:1804.09003. 2018. Paper
[54] Wenhai W, Enze X, et al. Shape Robust Text Detection with Progressive Scale Expansion Network. In CVPR 2019. Paper Code
[55] Zhu Y, Du J. Sliding Line Point Regression for Shape Robust Scene Text Detection. arXiv preprint arXiv:1801.09969, 2018. Paper
[56] Linjie D, Yanxiang Gong, et al. Detecting Multi-Oriented Text with Corner-based Region Proposals. arXiv preprint arXiv: 1804.02690, 2018. Paper Code
[57] Yongchao Xu, Yukang Wang, Wei Zhou, et al. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. arXiv preprint arXiv: 1812.01393, 2018. Paper
[58] Xiaowei Tian, Dao Wu, Rui Wang, Xiaochun Cao. Focal Text: an Accurate Text Detection with Focal Loss. In ICIP 2018. Paper
[59] Chenqin C, Pin L, Bing S. Feature Fusion Network for Scene Text Detection. In ICIP, 2018. Paper
[60] Sabyasachi Mohanty et al. Recurrent Global Convolutional Network for Scene Text Detection. In ICIP 2018. Paper
[61] Enze Xie, et al. Scene Text Detection with Supervised Pyramid Context Network. In AAAI 2019. Paper
[62] Youngmin Baek, Bado Lee, et al. Character Region Awareness for Text Detection. In CVPR 2019. Paper
[63] Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019. Paper Code
[64] Jingchao Liu, Xuebo Liu, et al, Pyramid Mask Text Detector. arXiv preprint arXiv:1903.11800, 2019. Paper Code
[79] Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection. In AAAI, 2019. Paper Code
[80] Yuliang Liu, Lianwen Jin, et al, Omnidirectional Scene Text Detction with Sequential-free Box Discretization. In IJCAI, 2019.Paper Code
[81] Chengquan Zhang, Borong Liang, et al, Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes. In CVPR, 2019.Paper
[82] Xiaobing Wang, Yingying Jiang, et al, Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. In CVPR, 2019. Paper
[83] Zhuotao Tian, Michelle Shu, et al, Learning Shape-Aware Embedding for Scene Text Detection. In CVPR, 2019. Paper
[84] Zichuan Liu, Guosheng Lin, et al, Towards Robust Curve Text Detection with Conditional Spatial Expansion. In CVPR, 2019. Paper
[85] Xue C, Lu S, Zhang W. MSR: multi-scale shape regression for scene text detection. In IJCAI, 2019. Paper
[86] Wang Y, Xie H, Fu Z, et al. DSRN: a deep scale relationship network for scene text detection. In IJCAI, 2019: 947-953. Paper
[87] Elad Richardson, et al, It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling. In WACV, 2020. Paper
[88] Pengfei Wang, et al, A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning. In ACMM, 2019. Paper
[89] Jun Tang, et al, SegLink ++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping. In PR, 2019. Paper
[90] Wenhai Wang, et al, Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network. In ICCV, 2019. Paper
[91] Minghui Liao, et al, Real-time Scene Text Detection with Differentiable Binarization. In AAAI, 2020. PaperCode
[92] Wang, Yuxin, et al. ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection. CVPR. 2020. PaperCode
[93] Xiao, et al, Sequential Deformation for Accurate Scene Text Detection. In ECCV, 2020. Paper
                                                                                   Datasets
USTB-SV1K[65]:Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, Robust text detection in natural scene images, IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), priprint, 2013. Paper
SVT[66]: Wang,Kai, and S. Belongie. Word Spotting in the Wild. European Conference on Computer Vision(ECCV), 2010: 591-604. Paper
ICDAR2005[67]: Lucas, S: ICDAR 2005 text locating competition results. In: ICDAR ,2005. Paper
ICDAR2011[68]: Shahab, A, Shafait, F, Dengel, A: ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR, 2011. Paper
ICDAR2013[69]:D. Karatzas, F. Shafait, S. Uchida, et al. ICDAR 2013 robust reading competition. In ICDAR, 2013. Paper
ICDAR2015[70]:D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D.Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. Paper
MSRA-TD500[71]:C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, Detecting texts of arbitrary orientations in natural images. in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp.1083–1090.Paper
COCO-Text[72]:Veit A, Matera T, Neumann L, et al. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016. Paper
RCTW-17[73]:Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17). Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. Paper
Total-Text[74]:Chee C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition.Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942.Paper
SCUT-CTW1500[75]:Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019.Paper
MLT 2017[76]: Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. Paper
OSTD[77]: Chucai Yi and YingLi Tian, Text string detection from natural scenes by structure-based partition and grouping, In IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594–2605, 2011. Paper
CTW[78]: Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild. arXiv preprint arXiv:1803.00085, 2018. Paper

如果您发现我们的资源中有任何问题,或者我们错过了任何好的论文/代码,请通过liuchongyu1996@gmail.com通知我们。 感谢您的贡献。

Copyright

Copyright © 2019 SCUT-DLVC. All Rights Reserved.

0

评论 (0)

打卡
取消