Automating Transcription with ChatGPT

I recently discovered a trove of online newspaper archives for a region where a branch of my family research is focused. Advantage Archives partners with mostly small-town libraries, especially in the midwestern USA, to digitize old newspaper archives and put them online.

The discovery of all these newspaper archives has led me to want to transcribe and write source citations for hundreds of articles. You know what’s tedious? Transcribing and citing hundreds of articles! This is a job for automation!

I have found ChatGPT to be excellent at transcribing old newspaper articles. On my Mac I press ⌘⇧4 and select a rectangular region of the article/text I want to transcribe – this generates a screenshot, which I then drag into a ChatGPT chat where I had said “Transcribe these newspaper clippings.” This outputs text that I can easily copy and then paste to my destination. You can get a ton of mileage out of this without doing anything more.

But even this gets tedious when you have hundreds of articles to transcribe!

Here I turn to my old friend Hazel. Hazel is a hall-of-fame file-based automation tool on Mac.

For a long time I haven’t liked how macOS clutters the desktop with screenshots, and I don’t like the naming convention it uses for these files. So I have a Hazel rule that watches my Desktop folder for screenshots, then renames them “Screen Shot YYYY-MM-DDTHH.MM.ss.png” and moves them to a dedicated screenshots folder.

But the screenshot → copy/paste to ChatGPT → copy/paste out of ChatGPT flow is also tedious after a few dozen times. So I set out to automate this. I installed Simon Willison’s excellent command-line tool llm, and I set it up with an OpenAI API key. Now to put the automation all together…

Here I have another Hazel rule that monitors the screenshots directory for images <500KB that were modified in the last 2 minutes. If it sees matching files, it calls a script, transcribe.sh, with the filename.
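For anyone who wants to check which files a rule like this would pick up, the same conditions can be approximated with find. This is only a sketch, not how Hazel works internally: the function name find_recent_screenshots is made up for illustration, and it assumes the screenshots are PNG files sitting directly in one folder.

```shell
# Hypothetical helper (not part of Hazel): list PNG files directly inside
# the given folder that are under 500 KB and were modified in the last
# 2 minutes, mirroring the conditions of the Hazel rule.
find_recent_screenshots() {
  # -maxdepth 1 : do not descend into subfolders
  # -size -500k : smaller than 500 KiB (find rounds sizes up to whole KiB)
  # -mmin -2    : modified less than 2 minutes ago
  find "$1" -maxdepth 1 -type f -name '*.png' -size -500k -mmin -2
}
```

Calling find_recent_screenshots with the path to the screenshots folder prints one matching path per line.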

What is transcribe.sh?

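#!/usr/bin/env zsh

# $1 is the path of the screenshot, passed in by Hazel.
# Send the image to the model, drop a trailing blank line, and copy the text to the clipboard.
llm "Transcribe this image to plain text. Output only the text of the image and no other response text." -a "$1" | sed '${/^$/d;}' | pbcopy
# Audible cue that the clipboard is ready.
say transcribed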
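One thing the short version does not do is tell you when the llm call fails (no network, a bad API key), so it says “transcribed” either way. A slightly more defensive variant could look like the sketch below. The function name transcribe and the spoken phrase “transcription failed” are my own inventions for illustration; the llm, pbcopy, and say calls are the same ones used above.

```shell
#!/usr/bin/env zsh

# Sketch of a more defensive transcribe.sh (hypothetical variant).
transcribe() {
  # Capture the model output. Command substitution strips trailing
  # newlines, so the sed cleanup step is not needed here.
  text=$(llm "Transcribe this image to plain text. Output only the text of the image and no other response text." -a "$1") || {
    say "transcription failed"
    return 1
  }
  # An empty reply is treated as a failure too, so the clipboard
  # keeps its previous contents.
  if [ -z "$text" ]; then
    say "transcription failed"
    return 1
  fi
  printf '%s' "$text" | pbcopy
  say transcribed
}

# Run only when a filename was passed in (as Hazel does).
if [ "$#" -gt 0 ]; then
  transcribe "$1"
fi
```

The trade-off is that the text lands on the clipboard without a final newline, which is a small difference from the one-line pipeline.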

And the full flow is now in place! When I take a screenshot of a rectangular region of an article:

  1. Hazel rule moves the screenshot to a screenshots folder and renames it

  2. Hazel rule detects new image matching target size and runs transcribe.sh <filename>

  3. transcribe.sh uses llm to pass the file to ChatGPT and copies the transcribed text to the clipboard, and then the computer says “transcribed” so I know it’s complete.

The whole process typically takes less than 5 seconds – I snap a screenshot, then I hear “transcribed” at which point I can paste the transcription text where I want.

Note: This is adapted from an earlier post on Mastodon.