Intermediate Yens
6. Installing Software on the Yens
Throughout this course we will run and build upon a Python example. We will start with a serial script that we can run from the command line, from a Jupyter notebook, and via the scheduler. We will then parallelize the script and run it both from the command line and via Slurm.
The Python example depends on the pytesseract package, which in turn uses Tesseract, free and open-source software whose development was sponsored by Google for many years. Tesseract is an optical character recognition (OCR) tool, and pytesseract is a Python wrapper around it. Tesseract recognizes the text embedded in images.
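To make the wrapper relationship concrete, here is a minimal sketch of how a Python script might hand an image to Tesseract through pytesseract. It assumes the third-party packages pytesseract and Pillow are installed in your Python environment. The function name ocr_image and its tesseract_cmd parameter are illustrative choices, not part of the course script.

```python
from pathlib import Path


def ocr_image(image_path, tesseract_cmd=None):
    """Return the text that Tesseract recognizes in the image at image_path.

    tesseract_cmd is a hypothetical convenience parameter: when given, it
    tells pytesseract which tesseract binary to call instead of relying on
    whatever is found first on PATH.
    """
    # Imported lazily so this module can be loaded even on a machine where
    # the third-party packages are not installed yet.
    import pytesseract
    from PIL import Image

    if tesseract_cmd is not None:
        pytesseract.pytesseract.tesseract_cmd = str(tesseract_cmd)

    # pytesseract runs the tesseract executable on the image and returns
    # the recognized text as a string.
    with Image.open(Path(image_path)) as img:
        return pytesseract.image_to_string(img)
```

A call such as ocr_image("scan.png"), where scan.png stands in for any image containing text, would return the recognized text as a string.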
The yens have a default version of Tesseract installed, but we can install a newer one.
See this guide for details on how to install software in your home directory or any location where you have permissions (such as a shared project space).
Check the default Tesseract version already installed on the yens:
$ tesseract --version
You should see:
tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found SSE
Following the guide above, a newer version of Tesseract (5.2.0) was installed in a shared project space, /zfs/gsb/intermediate-yens/software/tesseract-5.2.0. So all we need to do is make sure the path to the new Tesseract binary can be found.
Add the path to the new bin directory, and the location of Tesseract's language data (TESSDATA_PREFIX), to your bash profile:
$ echo 'export PATH=/zfs/gsb/intermediate-yens/software/tesseract-5.2.0/bin:$PATH' >> ~/.bash_profile
$ echo 'export TESSDATA_PREFIX=/zfs/gsb/intermediate-yens/software/tesseract-5.2.0/tessdata' >> ~/.bash_profile
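One thing to keep in mind is that each echo ... >> command appends a new line every time it is run, so repeating it leaves duplicate export lines in the profile. The sketch below shows one possible way to make the append idempotent. The helper name append_line_once is hypothetical and is not something provided on the yens.

```python
from pathlib import Path


def append_line_once(profile_path, line):
    """Append line to the file at profile_path unless it is already there.

    Returns True if the file was modified and False if the exact line was
    already present.
    """
    profile = Path(profile_path)
    text = profile.read_text() if profile.exists() else ""

    # Compare against whole lines so a partial match does not count.
    if line in text.splitlines():
        return False

    # Start on a fresh line if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with profile.open("a") as handle:
        handle.write(separator + line + "\n")
    return True
```

For example, calling append_line_once(Path.home() / ".bash_profile", "export PATH=/zfs/gsb/intermediate-yens/software/tesseract-5.2.0/bin:$PATH") twice would add the line only on the first call.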
Once the profile has been re-read, the new tesseract executable can be called from any directory on the yens.
Source the bash profile to run the added export commands in your current shell.
$ source ~/.bash_profile
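The reason prepending works is that the shell searches the directories in PATH from left to right and uses the first matching executable. The following sketch illustrates that lookup order with Python's shutil.which and two temporary directories that each hold a stand-in file named tesseract. It assumes a POSIX system, and the fake executables and helper names are purely illustrative.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_executable(directory, name="tesseract"):
    """Create a tiny executable shell script called name in directory."""
    path = Path(directory) / name
    path.write_text("#!/bin/sh\necho placeholder\n")
    path.chmod(path.stat().st_mode | stat.S_IXUSR)
    return path


def resolve_with_prefix(name, prefix_dir, base_path):
    """Look up name as if prefix_dir had been prepended to base_path."""
    search_path = os.pathsep.join([str(prefix_dir), base_path])
    return shutil.which(name, path=search_path)


with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    old_bin = make_fake_executable(old_dir)  # stands in for the default install
    new_bin = make_fake_executable(new_dir)  # stands in for the newer install

    # Only the old directory is on the search path.
    before = shutil.which("tesseract", path=old_dir)
    # The new directory is prepended, so it is searched first.
    after = resolve_with_prefix("tesseract", new_dir, old_dir)

    print("before:", before)
    print("after: ", after)
```

The same logic explains why the export line puts the new bin directory before $PATH rather than after it: appending it would leave the default tesseract as the first match.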
Check the new tesseract version:
$ tesseract --version
You should see the updated version:
tesseract 5.2.0
leptonica-1.82.0
libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
Found AVX
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.8
Found libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.43.0 librtmp/2.3 OpenLDAP/2.5.13
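If you want a script to confirm that it is picking up the newer install before doing any OCR work, one option is to parse the first line of the tesseract --version output. This is a minimal sketch. The function names are hypothetical, and the fallback to stderr is a precaution based on the assumption that some older releases print the version there.

```python
import re
import subprocess


def parse_tesseract_version(version_output):
    """Extract (major, minor, patch) from `tesseract --version` output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    # Matches e.g. "tesseract 5.2.0" or "tesseract 4.0.0-beta.1"; any
    # suffix after the patch number is ignored.
    match = re.match(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError(f"unrecognized version line: {lines[0]!r}")
    return tuple(int(part) for part in match.groups())


def installed_tesseract_version(executable="tesseract"):
    """Run the tesseract found on PATH and return its parsed version."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: fall back to stderr in case the version is printed there.
    return parse_tesseract_version(result.stdout or result.stderr)
```

Because the versions are returned as tuples of integers, they compare naturally, so a script could check installed_tesseract_version() >= (5, 2, 0) and stop with a clear message if the default version is still being found.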