Deploy a Streamlit App on Heroku with OCR

NamyaLG
Aug 10, 2021

In this blog post, I will walk you through deploying a Streamlit app that uses OCR (Optical Character Recognition) on Heroku.

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes, you can build and deploy powerful data apps. Find out more about Streamlit at https://docs.streamlit.io/en/stable/

Image source: Wikipedia

Heroku, on the other hand, is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.

Image source: Wikipedia

Context

To give some context on why I needed these services: I was building a project that involved translating non-editable PDFs written in Dutch into English and performing sentiment analysis on the text.

Non-editable PDFs are essentially images that have been turned into PDFs, so features such as copying and pasting text from them do not work. Passing these documents through Google Translate does not help either; the only way to extract their text is to perform Optical Character Recognition (OCR).
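To make the OCR step more concrete, here is a minimal sketch of how a scanned PDF could be turned into text in Python. It assumes the pdf2image and pytesseract packages (plus the poppler and Tesseract binaries they wrap), none of which are named in this post, and it defaults to `nld`, Tesseract's language code for Dutch. Treat it as an illustration under those assumptions, not as the project's actual code.

```python
def ocr_pdf(pdf_path: str, lang: str = "nld", dpi: int = 300) -> str:
    """Rasterise every page of a PDF and run Tesseract OCR on each page image."""
    # Third-party imports are deferred so this module can be imported
    # even on a machine where they are not installed yet.
    from pdf2image import convert_from_path  # requires poppler on the system
    import pytesseract  # requires the tesseract binary and the language data

    # Each page becomes a PIL image; a higher dpi usually helps OCR accuracy.
    pages = convert_from_path(pdf_path, dpi=dpi)
    texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]
    # Separate pages with a blank line so page boundaries stay visible.
    return "\n\n".join(text.strip() for text in texts)
```

Calling `ocr_pdf("scanned.pdf")` (a hypothetical file name) would return the recognised Dutch text, ready to be passed on for translation.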

Here is an example of a non-editable PDF in Dutch, taken from a Dutch newspaper: https://github.com/Namyalg/Dutch-To-English/blob/main/newpaper.pdf

The application was developed using Streamlit and deployed on Heroku.

Hosting and Deployment

Here is a link to the repository for the hosted application.

Tesseract's data files for all supported languages can be found here: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html

The languages supported by the google-trans-new API are listed at: https://github.com/lushan88a/google_trans_new/blob/main/constant.py
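A full newspaper page can produce several thousand characters of OCR text, and free translation endpoints generally cap how much text a single request may carry. The limit for google-trans-new is not stated in this post, so the 4,500-character default below is an assumption. This helper (my own addition, with a hypothetical name) splits the text on line breaks, wrapping any overly long line on whitespace, so each piece can be translated separately and the results joined back together.

```python
import textwrap


def chunk_text(text: str, max_chars: int = 4500) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    max_chars is an assumed per-request limit, not a documented one.
    """
    chunks: list[str] = []
    current = ""
    for line in text.split("\n"):
        # textwrap.wrap splits an over-long line on whitespace (and breaks
        # words longer than max_chars); note it also normalises whitespace.
        pieces = textwrap.wrap(line, max_chars) or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Adding this piece would exceed the limit: close the chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the translator on its own; keeping breaks on line boundaries avoids cutting sentences in half wherever possible.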

Using the Google Translation API

The most important step in getting OCR to work is making sure the right buildpacks are included in the Heroku setup.

The ones I used (after a lot of research) are:

heroku/python

In the Settings tab of the Heroku application, there is an option to add buildpacks. It looks like this:

Choosing buildpacks

The Python buildpack can be selected directly; for the others, the URLs must be typed in and saved.
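Once the buildpacks are saved and the app has been redeployed, it can help to confirm from Python that the Tesseract binary is actually on the dyno's PATH and that the Dutch language data (code `nld`) was installed. The sketch below is my own addition rather than part of the original setup; it only assumes a `tesseract` executable that supports the `--list-langs` flag.

```python
import shutil
import subprocess


def tesseract_languages(binary: str = "tesseract") -> list[str]:
    """Return the language codes Tesseract reports, or [] if it is not installed."""
    path = shutil.which(binary)
    if path is None:
        # Not on PATH: for example, the buildpacks were not applied.
        return []
    result = subprocess.run(
        [path, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some Tesseract versions print the list to stderr instead of stdout.
    output = result.stdout or result.stderr
    # Skip the header line ("List of available languages ...") and blanks.
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("List of available languages")
    ]
```

Inside the Streamlit app, a check such as `"nld" in tesseract_languages()` could be used to show a clear error message instead of failing halfway through OCR.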

