Towards Flexible Multi-Modal Document Models
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
Unifying Layout Generation with a Decoupled Diffusion Model
Conditional Text Image Generation with Diffusion Models
Turning a CLIP Model into a Scene Text Detector
Unifying Vision, Text, and Layout for Universal Document Processing
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction
Handwritten Text Generation from Visual Archetypes
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
Disentangling Writer and Character Styles for Handwriting Generation