
Automation to parse all AI news on the web and summarize

May 4, 2026 · 7 min read · By C.J. Mosure

When I started building an automated news digest for AI content, I assumed the hard part would be the AI. The prompt work: getting the model to pick signal over noise, write summaries that didn't embellish what the source actually said, stay consistent across categories. That took ten minutes. The rest of the afternoon was plumbing.

The system runs every morning at 8am: an n8n workflow that pulls from 130+ sources, merges two parallel branches, deduplicates against a rolling 7-day window, and hands a batch to Claude Haiku. Sources live in a Notion database with an active flag, cadence, and category. Adding a source means adding a row, not touching workflow code. YouTube channels go through a self-hosted RSSHub instance to get an RSS URL; newsletters land in a Gmail label and take a separate branch. The dedup window hashes each item's URL and stores it in n8n workflow static data, dropping anything seen in the last seven days. The model returns a JSON object with the top 15 items and a top 5 picks list, which gets parsed and written as a structured Notion page.

```text
RSS feeds ─────────────────────┐
YouTube (via RSSHub) ──────────┤
Reddit / HuggingFace ──────────┼──► 7-day dedup ──► LLM (top 15) ──► Notion
Gmail newsletters (ai-inbox) ──┘
```
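
The dedup window is the piece that makes "every morning" safe to run. A minimal sketch of that Code node, assuming each item exposes a `url` field and built-in modules are permitted via NODE_FUNCTION_ALLOW_BUILTIN; the real node differs in the details:

```js
// Minimal sketch of the 7-day dedup. Assumes items expose `url` and that
// the crypto builtin is allowed (NODE_FUNCTION_ALLOW_BUILTIN).
const { createHash } = require('crypto');

const staticData = $getWorkflowStaticData('global'); // persists across runs
staticData.seen = staticData.seen || {};             // url hash -> last-seen ms

const now = Date.now();
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Expire hashes that have fallen out of the rolling window.
for (const [hash, ts] of Object.entries(staticData.seen)) {
  if (now - ts > WEEK_MS) delete staticData.seen[hash];
}

// Keep only items not seen in the last seven days.
return $input.all().filter((item) => {
  const hash = createHash('sha256').update(item.json.url).digest('hex');
  if (staticData.seen[hash]) return false;
  staticData.seen[hash] = now;
  return true;
});
```

One n8n wrinkle worth knowing: workflow static data only persists for production executions, not manual test runs, which is fine for a scheduled workflow but confusing while debugging.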

Four bugs before it ran clean. None were LLM bugs.

The first was invisible. n8n's task runner sandbox doesn't expose the fetch() global. Every request silently failed, the catch blocks swallowed the errors, the results array stayed empty, and the node reported success in about 4ms. No warning, no output. Just a very fast empty result. The fix was switching to axios via NODE_FUNCTION_ALLOW_EXTERNAL.
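
Enabling the module is an environment change rather than a workflow change. Roughly, on a self-hosted instance:

```js
// n8n Code nodes can only require external modules listed in the
// NODE_FUNCTION_ALLOW_EXTERNAL environment variable, e.g.:
//   NODE_FUNCTION_ALLOW_EXTERNAL=axios
const axios = require('axios'); // resolves once the env var is set
```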

The second was stranger. Once axios was working, the node started crashing the task runner process with no useful message. Root cause: Node.js v24 made Error.name read-only. When any feed returned a non-200 response, axios constructed an AxiosError; the constructor's assignment of this.name = 'AxiosError' hit the now-read-only property, threw a TypeError, and took the process down before any catch block could run. The fix is validateStatus: () => true, which tells axios to treat every HTTP status code as success so the error constructor is never reached:

```js
// validateStatus prevents AxiosError construction on non-200 responses;
// check the status manually instead
const res = await axios.get(url, { validateStatus: () => true });
if (res.status !== 200) { continue; } // inside the per-feed fetch loop
```

The third was the LLM's JSON. Claude Haiku occasionally puts unescaped double quotes inside JSON string values, which JSON.parse rejects. I added a prompt instruction to avoid it, then built a state-machine repairer as a fallback: walk the string character by character, track whether you're inside a quoted value, and replace any " that isn't followed by a structural character (skipping whitespace) with a single quote.

```js
let inString = false, out = '';
for (let i = 0; i < raw.length; i++) {
  const ch = raw[i];
  if (inString && ch === '\\') {
    out += ch + (raw[i + 1] ?? ''); // pass escape sequences through untouched
    i++;
    continue;
  }
  if (ch === '"') {
    if (inString) {
      // structural close: next non-whitespace char is comma, brace, bracket, or colon
      let j = i + 1;
      while (j < raw.length && /\s/.test(raw[j])) j++;
      if (j >= raw.length || ',}]:'.includes(raw[j])) { inString = false; out += ch; }
      else { out += "'"; } // interior quote: replace with a single quote
    } else {
      inString = true; out += ch;
    }
  } else { out += ch; }
}
```

Two layers of defense for one edge case is about right when the alternative is silently empty Notion pages.
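
Wired together, it's parse-then-repair; repairJson is a hypothetical name for the state machine above wrapped in a function:

```js
// Layer 1: trust the prompt instruction. Layer 2: repair and re-parse.
// repairJson is a hypothetical wrapper around the loop shown earlier.
let digest;
try {
  digest = JSON.parse(raw);
} catch {
  digest = JSON.parse(repairJson(raw)); // still throws if repair wasn't enough
}
```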

The fourth was timing. An earlier version had the Gmail branch mark emails as processed immediately after normalizing them, running in parallel with the RSS fetch. When the workflow failed mid-run, emails were gone from the inbox with nothing written to Notion. The fix: move Gmail marking to the end, after the page is created, reading pending message IDs from workflow static data.
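
A sketch of the staging half, assuming the normalize node stashes Gmail message IDs in static data instead of marking anything:

```js
// In the normalize Code node: stage message IDs, don't mark them yet.
// Assumes each Gmail item exposes its message id at json.id.
const staticData = $getWorkflowStaticData('global');
staticData.pendingGmailIds = $input.all().map((item) => item.json.id);
return $input.all();
```

The last node in the workflow reads pendingGmailIds only after the Notion page exists, hands them to the Gmail node to mark processed, and clears the list. A mid-run failure now leaves the emails untouched for the next morning's run.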

```text
Schedule (8am CT)
├── Notion sources DB
│   └── Filter active + cadence
│       └── Fetch + parse + 7-day dedup ───┐
│                                          │
└── Gmail (ai-inbox)                       │
    └── Normalize ─────────────────────────┤
                                           │
                                         Merge
                                           │
                                  Filter has items
                                           │
                              Build prompt (~150 items)
                                           │
                             OpenRouter / Claude Haiku
                                           │
                              Build Notion page blocks
                                           │
                                 Write Notion page
                                           │
                               Mark Gmail processed
```

The "Filter has items" step exists because an empty batch still calls the LLM and creates a blank Notion page. If all sources are down or everything was already deduped, the workflow aborts there instead.

The model does selection and brief summarization, not analysis. It reads 150 items and picks 15, writes two-sentence summaries that reflect what the source said without editorializing. Haiku is the right model for this: fast, cheap, and picking the best headlines from a batch doesn't need reasoning depth. The output is a page with a top picks section followed by the full digest grouped by category. The prompt took a few iterations to tune the selection criteria but has been stable since.
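
For concreteness, the response looks something like this; the field names are my illustration, not the prompt's exact schema:

```js
// Illustrative shape of the model's JSON (field names are assumptions).
const exampleDigest = {
  top_picks: [
    { title: '...', url: '...', summary: 'Two sentences, faithful to the source.' },
    // ...five entries
  ],
  items: [
    { title: '...', url: '...', category: 'Research', summary: '...' },
    // ...fifteen entries, grouped by category when the page is built
  ],
};
```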

What I didn't think about going in was the output surface. An earlier version sent email. Email has a finality: you read it once, it archives, the thread is gone. Notion accumulates. Each page sits in a database alongside every previous digest, searchable and filterable across time. The digest becomes a record, not just a feed.

The AI part runs in one HTTP call. What makes it reliable every morning is deduplication, error handling, and getting the Gmail timing right. The model reads 150 headlines and hands me 15. The hard part was getting all 150 there.

