TX Text Control Document Processing Tutorial Series: Image-Based Document Comparison in .NET C#


Translation | Tutorial | Editor: 吉炜炜 | 2025-01-16 13:29:38.140

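Overview: Text-based comparison methods are common, but in some situations an image-based, pixel-by-pixel approach offers distinct advantages. This article looks at when and why that approach is appropriate, and walks through a sample application that shows its practicality and speed.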

Image-Based Document Comparison

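Image-based document comparison renders the pages of each document as images and compares them pixel by pixel. Instead of programmatically analyzing text content, formatting, or positioning, it identifies visual differences directly. A traditional text-based comparison parses the document structure, extracts the text, analyzes the formatting, and detects positional differences. That process can be computationally expensive, especially for documents with complex layouts or heavy formatting. Image-based comparison skips these steps and compares the rendered images directly, which can significantly reduce processing time.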

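Text-based methods can miss certain visual differences, such as slight font changes, alignment shifts, or color changes. A pixel-by-pixel comparison captures these differences exactly, which makes it a good fit for applications where visual fidelity is critical.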

Comparing Documents

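For demonstration purposes, we use the demo document that ships with the TX Text Control installation. The document has six pages and covers most of the features of TX Text Control.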

In the first pass, we take two exact copies of the document and compare them with the following code.

								
    using System;
    using System.Collections.Generic;

    string document1 = "demo1.tx";
    string document2 = "demo2.tx";

    // Get the comparison results (one entry per page)
    List<PageComparisonResult> comparisonResults =
        DocumentComparer.CompareDocuments(document1, document2);

    // Print one line per result
    foreach (var result in comparisonResults)
    {
        if (result.PageIndex == -1)
        {
            // Special case: the documents have different page counts
            Console.WriteLine(result.Message);
        }
        else
        {
            // PageIndex is zero-based, so add 1 for display
            string message = result.AreEqual
                ? $"The document images of page {result.PageIndex + 1} are equal."
                : $"The document images of page {result.PageIndex + 1} are different.";
            Console.WriteLine(message);
        }
    }

Running this code produces the following output, which means the documents are identical:

    The document images of page 1 are equal.
    The document images of page 2 are equal.
    The document images of page 3 are equal.
    The document images of page 4 are equal.
    The document images of page 5 are equal.
    The document images of page 6 are equal.

Now let's change the font of the first paragraph on page 1 and reduce the size of the image on page 4.


Running the same code again produces the following output:

    The document images of page 1 are different.
    The document images of page 2 are equal.
    The document images of page 3 are equal.
    The document images of page 4 are different.
    The document images of page 5 are equal.
    The document images of page 6 are equal.

Implementation

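The DocumentComparer class is a static utility that compares two documents page by page. It tells you whether the documents are visually identical or differ. The CompareDocuments method is the entry point for comparing two documents. It uses a ServerTextControl instance to load both documents and converts each one into a list of Bitmap objects.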

								
    // Requires: using System.Collections.Generic; using System.Drawing; using TXTextControl;
    public static List<PageComparisonResult> CompareDocuments(string documentPath1, string documentPath2)
    {
        var comparisonResults = new List<PageComparisonResult>();

        using (var serverTextControl = new ServerTextControl())
        {
            serverTextControl.Create();

            // Load and render the first document
            serverTextControl.Load(documentPath1, StreamType.InternalUnicodeFormat);
            var bitmapsDocument1 = GetDocumentImages(serverTextControl);

            // Load and render the second document
            serverTextControl.Load(documentPath2, StreamType.InternalUnicodeFormat);
            var bitmapsDocument2 = GetDocumentImages(serverTextControl);

            // Compare pages
            if (bitmapsDocument1.Count != bitmapsDocument2.Count)
            {
                // Release the rendered pages before returning early
                foreach (var bitmap in bitmapsDocument1) bitmap.Dispose();
                foreach (var bitmap in bitmapsDocument2) bitmap.Dispose();

                comparisonResults.Add(new PageComparisonResult
                {
                    PageIndex = -1,
                    AreEqual = false,
                    Message = "The documents have different page counts."
                });

                return comparisonResults; // Return early if page counts differ
            }

            for (int i = 0; i < bitmapsDocument1.Count; i++)
            {
                // Dispose each pair of page bitmaps as soon as it has been compared
                using (var bitmap1 = bitmapsDocument1[i])
                using (var bitmap2 = bitmapsDocument2[i])
                {
                    comparisonResults.Add(new PageComparisonResult
                    {
                        PageIndex = i,
                        AreEqual = !DocumentComparer.IsDifferent(bitmap1, bitmap2),
                        Message = null
                    });
                }
            }
        }

        return comparisonResults;
    }

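Each bitmap represents one rendered page. The method first checks whether the documents have the same number of pages. If the page counts differ, it immediately returns a single result that reports this difference. For documents with matching page counts, the method uses the IsDifferent function to compare the rendered bitmaps of each page and identify any visual differences.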
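The listings use a PageComparisonResult type without showing its definition. The following is a minimal sketch of what it might look like: only the three property names (PageIndex, AreEqual, Message) come from the listings, while the class shape and the ComparisonSummary helper are assumptions added here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample results resembling the second run: pages 1 and 4 differ.
var results = new List<PageComparisonResult>
{
    new PageComparisonResult { PageIndex = 0, AreEqual = false },
    new PageComparisonResult { PageIndex = 1, AreEqual = true },
    new PageComparisonResult { PageIndex = 3, AreEqual = false },
};
Console.WriteLine(ComparisonSummary.Describe(results)); // Pages that differ: 1, 4

// Hypothetical shape of the result type; only the property names are from the article.
public class PageComparisonResult
{
    // Zero-based page index, or -1 when the page counts differ.
    public int PageIndex { get; set; }

    // True when the rendered page images are byte-for-byte identical.
    public bool AreEqual { get; set; }

    // Set only for the page-count mismatch case; null otherwise.
    public string Message { get; set; }
}

// Hypothetical helper (not part of the article) that condenses the results into one line.
public static class ComparisonSummary
{
    public static string Describe(IEnumerable<PageComparisonResult> results)
    {
        var list = results.ToList();

        // A result with PageIndex -1 signals that the page counts did not match.
        var mismatch = list.FirstOrDefault(r => r.PageIndex == -1);
        if (mismatch != null) return mismatch.Message;

        // Collect one-based page numbers of all pages that differ.
        var differing = list.Where(r => !r.AreEqual).Select(r => r.PageIndex + 1).ToList();
        return differing.Count == 0
            ? "All pages are visually identical."
            : "Pages that differ: " + string.Join(", ", differing);
    }
}
```

Using -1 as a sentinel keeps the return type a plain list, at the cost of callers having to check for it before treating PageIndex as a real page.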

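The GetDocumentImages method extracts high-resolution images of all pages from the document loaded into the ServerTextControl. Each page is rendered at 300 DPI to maintain high fidelity and ensure an accurate pixel-based comparison.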

								
    private static List<Bitmap> GetDocumentImages(ServerTextControl serverTextControl)
    {
        var bitmaps = new List<Bitmap>();
        var pages = serverTextControl.GetPages();

        // The page collection is indexed from 1 to Count here
        for (int i = 1; i <= pages.Count; i++)
        {
            // Render each page, including all of its content, at 300 DPI
            bitmaps.Add(pages[i].GetImage(300, Page.PageContent.All));
        }

        return bitmaps;
    }
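To get a feel for what rendering at 300 DPI means for the comparison, the following sketch estimates the pixel dimensions and raw buffer size of one page. The page size is an assumption (US Letter, 8.5 x 11 inches; the article does not state the page size of the demo document), the PageRaster helper is hypothetical, and the bitmap that GetImage actually returns may differ slightly in size.

```csharp
using System;

// Assumed page size: US Letter. The demo document's real page size may differ.
var size = PageRaster.Estimate(widthInches: 8.5, heightInches: 11.0, dpi: 300);
Console.WriteLine($"{size.WidthPixels} x {size.HeightPixels} pixels, {size.Bytes} bytes");

// Hypothetical helper, not part of TX Text Control or the article.
public static class PageRaster
{
    // Estimates the raster size of a page for a 32-bit (4 bytes per pixel) bitmap.
    public static (int WidthPixels, int HeightPixels, long Bytes) Estimate(
        double widthInches, double heightInches, int dpi)
    {
        int width = (int)Math.Round(widthInches * dpi);
        int height = (int)Math.Round(heightInches * dpi);

        // Use long arithmetic so large pages or resolutions do not overflow int.
        long bytes = (long)width * height * 4;
        return (width, height, bytes);
    }
}
```

Under these assumptions a single page is 2550 x 3300 pixels, or 33,660,000 bytes of raw pixel data, so two six-page documents held in memory at once come to roughly 400 MB. That is one reason to dispose each page bitmap as soon as it has been compared.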

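The IsDifferent method determines whether two Bitmap objects differ by comparing their pixel data byte by byte. If the images have different dimensions, they are immediately flagged as different. Otherwise, the method locks the pixel data for efficient access, compares the raw bytes to find mismatches, and unlocks the data when the comparison is finished. Because every byte is checked, even subtle visual differences are detected.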

								
    // Requires: using System; using System.Drawing; using System.Drawing.Imaging;
    public static bool IsDifferent(Bitmap bitmap1, Bitmap bitmap2)
    {
        if (bitmap1 == null || bitmap2 == null)
        {
            // Pass the parameter name first; the single-string constructor expects a name, not a message
            throw new ArgumentNullException(
                bitmap1 == null ? nameof(bitmap1) : nameof(bitmap2),
                "Bitmaps cannot be null.");
        }

        if (bitmap1.Width != bitmap2.Width || bitmap1.Height != bitmap2.Height)
        {
            // Consider images different if dimensions are not the same.
            return true;
        }

        // Lock the bits for both images for efficient pixel access.
        var rect = new Rectangle(0, 0, bitmap1.Width, bitmap1.Height);
        BitmapData data1 = bitmap1.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        BitmapData data2 = bitmap2.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

        try
        {
            // Copy the raw pixel data of both images into managed buffers.
            int bytes = data1.Stride * data1.Height;
            byte[] buffer1 = new byte[bytes];
            byte[] buffer2 = new byte[bytes];

            System.Runtime.InteropServices.Marshal.Copy(data1.Scan0, buffer1, 0, bytes);
            System.Runtime.InteropServices.Marshal.Copy(data2.Scan0, buffer2, 0, bytes);

            // Compare pixel data byte by byte and stop at the first mismatch.
            for (int i = 0; i < bytes; i++)
            {
                if (buffer1[i] != buffer2[i])
                {
                    return true;
                }
            }
        }
        finally
        {
            // Unlock the bits.
            bitmap1.UnlockBits(data1);
            bitmap2.UnlockBits(data2);
        }

        return false;
    }
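Once the pixel data has been copied into byte arrays, the comparison itself no longer depends on System.Drawing. As a sketch of an alternative to the hand-written loop, the buffers can be compared with the span-based SequenceEqual method from the .NET base class library. The PixelBuffers class and the CountDifferingPixels helper are hypothetical additions, not part of the article's sample; the helper assumes four bytes per pixel, as with Format32bppArgb.

```csharp
using System;

// Two tiny "pages" of two 4-byte pixels each; the second pixel differs in one channel.
byte[] pageA = { 255, 255, 255, 255, 0, 0, 0, 255 };
byte[] pageB = { 255, 255, 255, 255, 0, 0, 1, 255 };

Console.WriteLine(PixelBuffers.AreIdentical(pageA, pageA));         // True
Console.WriteLine(PixelBuffers.AreIdentical(pageA, pageB));         // False
Console.WriteLine(PixelBuffers.CountDifferingPixels(pageA, pageB)); // 1

// Hypothetical helper class, shown only to illustrate the comparison step.
public static class PixelBuffers
{
    private const int BytesPerPixel = 4; // assumption: 32-bit pixels

    // True when both buffers have the same length and the same bytes.
    public static bool AreIdentical(ReadOnlySpan<byte> first, ReadOnlySpan<byte> second)
        => first.SequenceEqual(second);

    // Counts the pixels whose four bytes are not all equal in both buffers.
    public static int CountDifferingPixels(ReadOnlySpan<byte> first, ReadOnlySpan<byte> second)
    {
        if (first.Length != second.Length || first.Length % BytesPerPixel != 0)
            throw new ArgumentException("Buffers must have the same length, a multiple of 4 bytes.");

        int differing = 0;
        for (int offset = 0; offset < first.Length; offset += BytesPerPixel)
        {
            var pixel1 = first.Slice(offset, BytesPerPixel);
            var pixel2 = second.Slice(offset, BytesPerPixel);
            if (!pixel1.SequenceEqual(pixel2))
                differing++;
        }
        return differing;
    }
}
```

AreIdentical answers the same yes/no question as IsDifferent, inverted. A pixel count goes one step further than the article and could be used to report how much of a page changed.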

Conclusion

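Image-based document comparison is a fast and effective way to identify visual differences between documents. By rendering documents as images and comparing them pixel by pixel, it detects changes quickly and accurately. The approach is particularly useful for visually critical applications, where text-based methods can miss subtle differences. The DocumentComparer utility shows how to implement image-based document comparison with TX Text Control, providing a practical solution for comparing documents.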


Source: 慧都网
