Very informative article!
I’m wondering how the HTML in the HTTP response gets transformed into plain text. Presumably, there’s a preprocessing step that extracts the content from the page. I’d like to understand the limitations of that preprocessing.