AI, Regex, Twitter, Blockquotes, Readwise, Ghost

This may sound like a laundry list of random buzzwords, but in reality these are all stakeholders in one seemingly simple task: converting blockquotes from Readwise format into Ghost Twitter Card format.

First, I'd like to go on record stating that I hate writing regex. It feels like writing cursive during middle school, with weird symbols and arbitrary rules everywhere.

you cant just have the b like that, bro.
why not?
it'll get confused with the f.
…what?
https://bram-adams.ghost.io/content/images/2023/02/cursive.png
cursive.png

Instead of sticking to the regular game of conditionals like everyone else, Regex decides to be "unique" and force the programmer to create the whole damn matcher in one go: if even a single character fails to match, you're shit out of luck.

Even with the power of AI on my side, I struggled for hours to make this simple regex. The rules I wanted:

  1. Match lines that start with '>'
  2. Reject any blockquotes that don't end with a Twitter link
  3. Swallow the ENTIRE blockquote and replace it with the correct Ghost HTML.
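For comparison, here is a rough sketch of those three rules written as plain conditionals instead of one regex. It is not the approach this post ends up using. The trailing link shape `([View Tweet](https://twitter.com/<user>/status/<id>))` is an assumption inferred from the regex further down, and `TWEET_LINK`, `embedHtml`, and `convertByLines` are hypothetical names, with the Ghost markup trimmed for brevity.

```typescript
// Assumed shape of the trailing link, inferred from the regex later in the post.
const TWEET_LINK =
  /\(\[View Tweet\]\(https:\/\/twitter\.com\/([^\/\s]+)\/status\/(\d+)\)\)\s*$/;

// Hypothetical stand-in for the Ghost embed markup (attributes trimmed).
function embedHtml(user: string, id: string): string {
  const url = `https://twitter.com/${user}/status/${id}?ref_src=twsrc%5Etfw`;
  return `<figure class="kg-card kg-embed-card"><iframe src="${url}"></iframe></figure>`;
}

function convertByLines(markdown: string): string {
  const lines = markdown.split("\n");
  const out: string[] = [];
  let i = 0;
  while (i < lines.length) {
    // Rule 1: a blockquote is a run of consecutive lines starting with '>'.
    if (lines[i].charAt(0) !== ">") {
      out.push(lines[i]);
      i++;
      continue;
    }
    let end = i;
    while (end < lines.length && lines[end].charAt(0) === ">") {
      end++;
    }
    const block = lines.slice(i, end);
    // Rule 2: leave the blockquote alone unless its last line ends with a tweet link.
    const link = TWEET_LINK.exec(block[block.length - 1]);
    if (link) {
      // Rule 3: swallow the whole run and emit a single embed in its place.
      out.push(embedHtml(link[1], link[2]));
    } else {
      out.push(...block);
    }
    i = end;
  }
  return out.join("\n");
}
```

The trade-off is more lines of code in exchange for being able to debug each rule separately.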

It was #3 that really fucked me. Tweets came in a lot of different styles like:


// multiline blockquote with multiple URLs

<figure class="kg-card kg-embed-card"><div class="twitter-tweet twitter-tweet-rendered"><iframe src="https://twitter.com/tayroga/status/1621335910808948736?ref_src=twsrc%5Etfw" width="550" height="550" frameborder="0" scrolling="no" allowfullscreen="true" style="border: none; max-width: 100%; min-width: 100%;"></iframe></div></figure>

// multiline blockquote -- no URLs

<figure class="kg-card kg-embed-card"><div class="twitter-tweet twitter-tweet-rendered"><iframe src="https://twitter.com/BrianNorgard/status/1621348809627541505?ref_src=twsrc%5Etfw" width="550" height="550" frameborder="0" scrolling="no" allowfullscreen="true" style="border: none; max-width: 100%; min-width: 100%;"></iframe></div></figure>


// single line with Twitter video embedded

<figure class="kg-card kg-embed-card"><div class="twitter-tweet twitter-tweet-rendered"><iframe src="https://twitter.com/Money__Doug/status/1620546196560551936?ref_src=twsrc%5Etfw" width="550" height="550" frameborder="0" scrolling="no" allowfullscreen="true" style="border: none; max-width: 100%; min-width: 100%;"></iframe></div></figure>

ChatGPT + Codex to the Rescue (kinda)

I started humbly in JS with this comment to Copilot:

// take the url from view tweet format and replace the entire blockquote with a tweet embed iframe

That created something that successfully extracted the tweet URL, but it failed miserably at eating the rest of the blockquote.

ChatGPT didn't fare much better. It got pretty obsessed with isolating the status ID from the string despite my pleading:

https://bram-adams.ghost.io/content/images/2023/02/chatgpt-regex-1.png
chatgpt regex 1.png
https://bram-adams.ghost.io/content/images/2023/02/chatgpt-regex-2.png
chatgpt regex 2.png
https://bram-adams.ghost.io/content/images/2023/02/chatgpt-regex-3.png
chatgpt regex 3.png
https://bram-adams.ghost.io/content/images/2023/02/chatgpt-regex-4.png
chatgpt regex 4.png

also i did not know that ChatGPT did images, whaddya know?

Eventually ChatGPT gave me something usable:

/^>.*\n.*\(https:\/\/twitter.com\/.*\/status\/(\d+)\)/s

which worked…kinda.
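To see what "kinda" might mean, here is a small sketch that runs that exact regex against made-up input. The sample strings and handles are assumptions, not real highlights. Two properties of the pattern stand out: without the `m` flag, `^` only anchors at the very start of the string, and with the `s` flag, `.*` runs straight across newlines.

```typescript
// The regex ChatGPT produced, copied verbatim from above.
const chatGptRegex = /^>.*\n.*\(https:\/\/twitter.com\/.*\/status\/(\d+)\)/s;

// Hypothetical blockquote in the assumed Readwise shape.
const quote =
  "> first line\n> second line ([View Tweet](https://twitter.com/someuser/status/123))";

// Matches when the blockquote is the very first thing in the string...
const matchesAtStart = chatGptRegex.test(quote);

// ...but not when anything precedes it, because ^ is not multiline here.
const matchesMidDocument = chatGptRegex.test("Intro paragraph\n\n" + quote);

// With two quotes, the dot-all .* also swallows everything between them.
const twoQuotes =
  "> one\n> two ([View Tweet](https://twitter.com/usera/status/1))\n\nKeep me\n\n" +
  "> three\n> four ([View Tweet](https://twitter.com/userb/status/2))";
const greedy = chatGptRegex.exec(twoQuotes);
const swallowedParagraph = greedy !== null && greedy[0].indexOf("Keep me") !== -1;

console.log({ matchesAtStart, matchesMidDocument, swallowedParagraph });
```

On this input the three values come out as `true`, `false`, and `true`: fine for a note that is nothing but one tweet, not so fine for a real document.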

Regexr is the 🐐

At the end of the day, I had to use my boring, soggy, water-logged ape brain to solve the Regex debacle. I played with the placement of * and + for what felt like an eternity in Regexr, and finally, blindly, stumbled upon the right answer.

It turns out I still had to hack it by artificially inserting a newline in front of View Tweet, which defeats the purpose, but whatever, fuck it.
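A minimal sketch of why the newline seems to matter, using the final pattern from the full code and a made-up single-line highlight (the sample text and the `<embed>` placeholder are assumptions). The `(^>.*\n.*)` group needs a newline somewhere inside the blockquote. On a one-line quote the group repeats zero times, so the match only starts at `(https` and the quoted text is left behind.

```typescript
// The final pattern, copied from the post's full code.
const tweetBlock = /(^>.*\n.*)*\(https:\/\/twitter.com\/(.*)\/status\/(\d+)\)\)/gm;

// Hypothetical single-line highlight in the assumed Readwise shape.
const singleLine =
  "> short tweet ([View Tweet](https://twitter.com/someuser/status/123))";

// Without the hack: only the URL part is replaced, the quote text survives.
const withoutHack = singleLine.replace(tweetBlock, "<embed>");

// With the hack: a newline before "([View Tweet]" lets the group match the quote line.
const withHack = singleLine
  .replace(/\(\[View Tweet\]/gm, "\n([View Tweet]")
  .replace(tweetBlock, "<embed>");

console.log(JSON.stringify(withoutHack)); // "> short tweet ([View Tweet]<embed>"
console.log(JSON.stringify(withHack));    // "<embed>"
```

Because the group is allowed to repeat zero times, the same pattern would also convert a bare `(https://twitter.com/.../status/...))` outside any blockquote, which may or may not be what you want.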

Full code below:

// take the url from view tweet format and replace the entire blockquote with a tweet embed iframe

// add a new line before every ([View Tweet]
data.content = data.content.replace(/\(\[View Tweet\]/gm, "\n([View Tweet]");

data.content = data.content.replace(
  /(^>.*\n.*)*\(https:\/\/twitter.com\/(.*)\/status\/(\d+)\)\)/gm,
  (match: any, p1: string, p2: string, p3: string) => {
    console.log("p1", p1);
    console.log("p2", p2);

    const url = `https://twitter.com/${p2}/status/${p3}?ref_src=twsrc%5Etfw`;

    return `<figure class="kg-card kg-embed-card"><div class="twitter-tweet twitter-tweet-rendered"><iframe src="${url}" width="550" height="550" frameborder="0" scrolling="no" allowfullscreen="true" style="border: none; max-width: 100%; min-width: 100%;"></iframe></div></figure>`;
  }
);
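A quick way to sanity-check the code above is to wrap it in a function and feed it a made-up note. This is only a sketch: `readwiseToGhost` is a hypothetical name (the original mutates `data.content` in place), the debug logging is dropped, and the sample highlight is an assumption about what Readwise exports.

```typescript
// Hypothetical wrapper around the two replace calls from the post.
function readwiseToGhost(content: string): string {
  // Step 1: put "([View Tweet]" on its own line so the pattern below can anchor on it.
  const withBreaks = content.replace(/\(\[View Tweet\]/gm, "\n([View Tweet]");

  // Step 2: swallow the blockquote lines plus the link and emit the Ghost embed.
  return withBreaks.replace(
    /(^>.*\n.*)*\(https:\/\/twitter.com\/(.*)\/status\/(\d+)\)\)/gm,
    (_match: string, _lastLine: string, user: string, id: string) => {
      const url = `https://twitter.com/${user}/status/${id}?ref_src=twsrc%5Etfw`;
      return `<figure class="kg-card kg-embed-card"><div class="twitter-tweet twitter-tweet-rendered"><iframe src="${url}" width="550" height="550" frameborder="0" scrolling="no" allowfullscreen="true" style="border: none; max-width: 100%; min-width: 100%;"></iframe></div></figure>`;
    }
  );
}

// Made-up note: a two-line tweet highlight surrounded by ordinary text.
const sampleNote =
  "Intro\n\n" +
  "> first line\n" +
  "> second line ([View Tweet](https://twitter.com/someuser/status/123))\n\n" +
  "Outro";

const converted = readwiseToGhost(sampleNote);
console.log(converted);
```

On this sample the blockquote lines disappear and a single `<figure>` sits between "Intro" and "Outro".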

All I can truly say is the creators of Regexr deserve a Nobel Peace Prize. They've single-handedly prevented so many wars caused by sheer frustration from dealing with the eccentricities of Regex.

To celebrate, here's a tweet, straight from Readwise in Obsidian, delivered hand-picked into Ghost.

And to you, Regex?

If there's a hell, I'll see you there.
