Import Ghostscript 9.54 (ghostscript-9.54)

Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>
author: Thomas Deutschmann <whissi@gentoo.org> 2021-03-30 10:59:39 +0200
committer: Thomas Deutschmann <whissi@gentoo.org> 2021-04-01 00:04:14 +0200
commit: 5ff1d6955496b3cf9a35042c9ac35db43bc336b1 (patch)
tree: 6d470f7eb448f59f53e8df1010aec9dad8ce1f72 /doc/Devices.htm
parent: Import Ghostscript 9.53.1 (diff)
download: ghostscript-gpl-patches-5ff1d6955496b3cf9a35042c9ac35db43bc336b1.tar.gz
ghostscript-gpl-patches-5ff1d6955496b3cf9a35042c9ac35db43bc336b1.tar.bz2
ghostscript-gpl-patches-5ff1d6955496b3cf9a35042c9ac35db43bc336b1.zip
1 files changed, 68 insertions, 16 deletions
diff --git a/doc/Devices.htm b/doc/Devices.htm
index b8be9431..c1048e59 100644
--- a/doc/Devices.htm
+++ b/doc/Devices.htm
@@ -38,7 +38,6 @@
             <li><a href="https://www.ghostscript.com/">Home</a></li>
             <li><a href="https://www.ghostscript.com/license.html">Licensing</a></li>
             <li><a href="https://www.ghostscript.com/releases.html">Releases</a></li>
-            <li><a href="https://www.ghostscript.com/release_history.html">Release History</a></li>
             <li><a href="https://www.ghostscript.com/documentation.html" title="Documentation">Documentation</a></li>
             <li><a href="https://www.ghostscript.com/download.html" title="Download">Download</a></li>
             <li><a href="https://www.ghostscript.com/performance.html" title="Performance">Performance</a></li>
@@ -71,13 +70,18 @@
 <li><a href="#BMP">BMP file format</a></li>
 <li><a href="#PCX">PCX file format</a></li>
 <li><a href="#PSD">PSD file format (DeviceN color model)</a></li>
+<li><a href="#PDFimage">Bitmap PDF output, PCLm output</a></li>
+</ul>
+<li><a href="#OCR-Devices">OCR Devices</a></li>
+<ul>
+<li><a href="#OCR">OCR text output</a></li>
+<li><a href="#PDFocr">Bitmap PDF output (with OCR text)</a></li>
+<li><a href="#PDFwriteocr">Vector PDF output (with OCR Unicode CMaps)</a></li>
 </ul>
 <li><a href="#High-level">High level formats</a></li>
 <ul>
 <li><a href="#PDF">PDF file output</a></li>
-<li><a href="#PDFimage">Bitmap PDF output, PCLm output</a></li>
 <li><a href="#OCR">OCR devices</a></li>
-<li><a href="#PDFocr">Bitmap PDF output (with OCR text)</a></li>
 <li><a href="#PS">PostScript file output</a></li>
 <li><a href="#EPS">EPS file output</a></li>
 <li><a href="#PXL">PCL-XL file output</a></li>
@@ -955,7 +959,11 @@ of 'high-level' formats. These allow Ghostscript to preserve (as much as
 possible) the drawing elements of the input file maintaining flexibility,
 resolution independence, and editability.</p>
 
-<h3><a name="OCR"></a>Optical Character Recognition (OCR) output</h3>
+<hr>
+
+<h2><a name="OCR-Devices"></a>Optical Character Recognition (OCR) devices</h2>
+
+<h3><a name="OCR"></a>OCR text output</h3>
 
 <p>
   These devices render internally in 8 bit greyscale, and then
@@ -967,18 +975,29 @@ resolution independence, and editability.</p>
 <p>
   The Tesseract engine relies on files to encapsulate each
   language and/or script. These &quot;traineddata&quot; files
-  are available in different forms, including <a href="github.com/tesseract-ocr/tessdata_fast">fast</a>
-  and <a href="tesseract-ocr/tessdata_best">best</a> variants.
+  are available in different forms, including <a href="http://github.com/tesseract-ocr/tessdata_fast">fast</a>
+  and <a href="http://github.com/tesseract-ocr/tessdata_best">best</a> variants.
   Alternatively, people can train their own data using the
   standard Tesseract tools.
 </p>
 <p>
-  These files are looked for from a variety of places. Firstly,
-  any files placed in &quot;Resource/Tesseract/&quot; will be
-  included in the binary for any standard (COMPILE_INITS=1) build.
-  Secondly, files will be searched for in the current directory.
-  Thirdly, files will be searched for in the directory given by
-  the environment variable TESSDATA_PREFIX.
+  These files are looked for from a variety of places.
+</p>
+<ul>
+  <li>Firstly, files will be searched for in the directory given by the
+    environment variable TESSDATA_PREFIX.
+  <li>Next, they will be searched for within the ROM filing system. Any
+    files placed in &quot;tessdata&quot; will be included within the ROM
+    filing system in the binary for any standard (COMPILE_INITS=1) build.
+  <li>Next, files will be searched for in the configured 'tessdata' path. On
+    Unix, this can be specified at the configure stage using
+    '--with-tessdata=&lt;path&gt;' (where &lt;path&gt; is a list of
+    directories to search, separated by ':' (on Unix) or ';' (on Windows)).
+  <li>Finally, we resort to searching the current directory.
+</ul>
+<p>
+  Please note, this pattern of directory searching differs from the original
+  release of the OCR devices.
 </p>
 <p>
   By default, the OCR process defaults to looking for English text,
@@ -993,7 +1012,7 @@ resolution independence, and editability.</p>
   Arabic:</dd></dl>
 <blockquote>
 <pre>
- <kbd>gs -sDEVICE=ocr -r200 -sOCRLanguage="eng,ara" -o out.txt\
+ <kbd>gs -sDEVICE=ocr -r200 -sOCRLanguage="eng+ara" -o out.txt\
       zlib/zlib.3.pdf</kbd>
 </pre>
 </blockquote>
@@ -1041,6 +1060,39 @@ resolution independence, and editability.</p>
 </p>
 <p>
 
+<h3><a name="PDFwriteocr"></a>Vector PDF output (with OCR Unicode CMaps)</h3>
+<p>
+The pdfwrite device has been augmented to use the OCR engine to analyse text
+(not images!) in the input stream, and derive Unicode code points for it.
+That information can then be used to create ToUnicode CMaps which are attached
+to the Font (or CIDFont) objects embedded in the PDF file.
+</p>
+<p>
+Fonts which have ToUnicode CMaps can be reliably (limited by the accuracy of
+the CMap) used in search and copy/paste functions, as well as text extraction
+from PDF files. Note that OCR is not a 100% perfect process; it is possible
+that some text might be misidentified.
+</p>
+<p>
+OCR is a slow operation! In addition it can (for Latin text at least) sometimes
+be preferable not to add ToUnicode information which may be incorrect, but instead
+to use the existing font Encoding. For English text this may give better results.
+</p>
+<p>For these reasons the OCR functionality of pdfwrite can be controlled by using a new
+parameter <code>-sUseOCR</code>. This has three possible values;
+</p>
+<dt><code>-sUseOCR=</code><b><em>string</em></b></dt>
+<dd>
+  <dl>
+    <dt>Never<dd>Default - don't use OCR at all even if support is built-in.
+    <dt>AsNeeded<dd>If there is no existing ToUnicode information, use OCR.
+    <dt>Always<dd>Ignore any existing information and always use OCR.
+  </dl>
+</dd>
+</p>
+
+<hr>
+
 <h2><a name="High-level"></a>High-level devices</h2>
 
 <h3><a name="PDF"></a>PDF writer</h3>
@@ -2081,7 +2133,7 @@ spot colors.</p>
 <hr>
 
 <p>
-<small>Copyright &copy; 2000-2020 Artifex Software, Inc.  All rights reserved.</small>
+<small>Copyright &copy; 2000-2021 Artifex Software, Inc.  All rights reserved.</small>
 
 <p>
 This software is provided AS-IS with no warranty, either express or
@@ -2094,7 +2146,7 @@ or contact Artifex Software, Inc.,  1305 Grant Avenue - Suite 200,
 Novato, CA 94945, U.S.A., +1(415)492-9861, for further information.
 
 <p>
-<small>Ghostscript version 9.53.1, 14 September 2020
+<small>Ghostscript version 9.54.0, 30 March 2021
 
 <!-- [3.0 end visible trailer] ============================================= -->
 
@@ -2122,7 +2174,7 @@ Novato, CA 94945, U.S.A., +1(415)492-9861, for further information.
            </ul>
         </div>
         <div class="col-ft-3 footright"><img src="images/Artifex_logo.png" width="194" height="40" alt=""/> <br>
-              © Copyright 2019 Artifex Software, Inc. <br>
+              © Copyright 2019-2021 Artifex Software, Inc. <br>
             All rights reserved.
         </div>
           </div>
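
Note on the traineddata lookup order introduced in the Devices.htm hunk above: the four-step search (TESSDATA_PREFIX, then the ROM filing system, then the configured 'tessdata' path, then the current directory) can be sketched roughly as follows. This is only an illustration of the documented order, not Ghostscript's actual code. The function name `find_traineddata`, its parameters, the modelling of the ROM filing system as a plain set of file names, and the assumption that each directory is probed directly for `<lang>.traineddata` are all hypothetical.

```python
import os
from pathlib import Path


def find_traineddata(lang, rom_files=frozenset(), configured_path="",
                     cwd=".", env=None):
    """Return (source, location) for the first match, or None.

    Hypothetical sketch of the documented search order; every parameter
    name here is an assumption made for illustration only.
    """
    env = os.environ if env is None else env
    name = f"{lang}.traineddata"  # Tesseract's usual file naming

    # 1. Directory given by the TESSDATA_PREFIX environment variable.
    prefix = env.get("TESSDATA_PREFIX")
    if prefix and (Path(prefix) / name).is_file():
        return ("env", str(Path(prefix) / name))

    # 2. ROM filing system (COMPILE_INITS=1 builds), modelled here as a
    #    set of file names because it does not live on disk.
    if name in rom_files:
        return ("rom", f"tessdata/{name}")

    # 3. Configured 'tessdata' path: a list of directories separated by
    #    ':' on Unix or ';' on Windows, which is what os.pathsep gives.
    for directory in filter(None, configured_path.split(os.pathsep)):
        candidate = Path(directory) / name
        if candidate.is_file():
            return ("configured", str(candidate))

    # 4. Last resort: the current directory.
    candidate = Path(cwd) / name
    if candidate.is_file():
        return ("cwd", str(candidate))

    return None
```

A request such as `-sOCRLanguage="eng+ara"` would, under this sketch, amount to one lookup per language, for example `[find_traineddata(l) for l in "eng+ara".split("+")]`.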