<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Chandra Kumar Reddy</title>
  <subtitle>Engineering notes on distributed systems, event-driven services, and scaling production software on AWS.</subtitle>
  <link href="https://chandrakumarreddy.github.io/feed.xml" rel="self" />
  <link href="https://chandrakumarreddy.github.io/" />
  <updated>2026-06-21T00:00:00Z</updated>
  <id>https://chandrakumarreddy.github.io/</id>
  <author>
    <name>OrbitShift Engineering</name>
  </author>
  <entry>
    <title>From Documents to Answers: Building a Full RAG Pipeline with Chunking and an LLM</title>
    <link href="https://chandrakumarreddy.github.io/rag-chunking-llm/" />
    <updated>2026-06-20T00:00:00Z</updated>
    <id>https://chandrakumarreddy.github.io/rag-chunking-llm/</id>
    <content type="html">&lt;p&gt;&lt;img src=&quot;https://chandrakumarreddy.github.io/assets/rag-chunking-llm/hero.png&quot; alt=&quot;Three source documents being split into overlapping chunks, indexed, retrieved, and answered by an LLM agent&quot;&gt;&lt;/p&gt;
&lt;p&gt;Retrieval, chunking, and generation each get their own tutorials. Wiring them together into something you can actually run against a real repository is where the interesting decisions happen. This post walks through how &lt;code&gt;chunk_documents()&lt;/code&gt;, a minsearch index, and a pydantic-ai agent connect into one pipeline that answers questions about any GitHub repo&#39;s documentation.&lt;/p&gt;
&lt;h2&gt;The problem: whole documents make poor context&lt;/h2&gt;
&lt;p&gt;The project could already search documents — &lt;code&gt;GithubSource&lt;/code&gt; fetched markdown files from a GitHub repo and &lt;code&gt;minsearch&lt;/code&gt; scored them against a query. But the unit of retrieval was the entire document. A 4,000-word setup guide came back as a single blob.&lt;/p&gt;
&lt;p&gt;That fails in two directions. Fed to an LLM, a long document buries the relevant paragraph in noise and consumes context window budget fast. Shown to a user, it&#39;s a wall of text to wade through. Neither is an answer.&lt;/p&gt;
&lt;p&gt;The second gap was generation itself. Retrieval existed; synthesis didn&#39;t. You could find documents, but nothing composed a reply from them.&lt;/p&gt;
&lt;h2&gt;The approach: overlap at the seams, an agent for answers&lt;/h2&gt;
&lt;p&gt;Two changes: &lt;strong&gt;chunk before indexing&lt;/strong&gt;, and add a &lt;strong&gt;pydantic-ai agent for generation&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Chunking lives on &lt;code&gt;GithubSource&lt;/code&gt; as a method rather than a separate preprocessor, which keeps the call site clean:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;documents = github.fetch()
chunks    = github.chunk_documents(documents)
index     = build_index(chunks)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The alternative — a standalone &lt;code&gt;Chunker&lt;/code&gt; class — would have added a layer of indirection without adding clarity at this stage. The document-loading abstraction already owns everything about how docs are fetched and shaped; chunking is a natural extension of that responsibility.&lt;/p&gt;
&lt;p&gt;For generation, pydantic-ai connects to an LLM through OpenRouter. The streaming interface (&lt;code&gt;agent.run_stream&lt;/code&gt;) shows answers token by token — the right feel when a response takes several seconds. Using the free tier keeps the pipeline zero-cost during iteration.&lt;/p&gt;
&lt;p&gt;A vector store with embeddings would catch more semantic similarity, but it adds an embedding model, a vector database, and network latency. minsearch&#39;s TF-IDF approach is the correct trade-off here: in-process, zero-infrastructure, and fast enough for interactive use.&lt;/p&gt;
&lt;h2&gt;Implementation highlights&lt;/h2&gt;
&lt;h3&gt;Chunking with overlap&lt;/h3&gt;
&lt;p&gt;A sliding window with &lt;code&gt;chunk_size=1000&lt;/code&gt; and &lt;code&gt;overlap=200&lt;/code&gt; gives an 800-character stride. Without overlap, a sentence that crosses a chunk boundary gets split and scores poorly against any query targeting that sentence. With overlap, any span of up to 200 characters that crosses a boundary lands fully inside at least one chunk. Longer sentences can still be split.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def chunk_documents(self, documents, chunk_size=1000, overlap=200):
    chunks = []
    for doc in documents:
        content = doc[&amp;quot;content&amp;quot;]
        # Windows start every (chunk_size - overlap) characters, so each
        # chunk shares its last `overlap` characters with the next one.
        for i in range(0, len(content), chunk_size - overlap):
            chunk = content[i : i + chunk_size]
            chunks.append({
                &amp;quot;content&amp;quot;: chunk,
                # Parent metadata travels with every chunk for citation.
                &amp;quot;title&amp;quot;: doc[&amp;quot;title&amp;quot;],
                &amp;quot;description&amp;quot;: doc[&amp;quot;description&amp;quot;],
                &amp;quot;file_name&amp;quot;: doc[&amp;quot;file_name&amp;quot;],
            })
    return chunks
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each chunk inherits the parent document&#39;s &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;file_name&lt;/code&gt;. That metadata travels through the index and comes back with search results. This is how &lt;code&gt;format_context()&lt;/code&gt; can cite source files in the LLM prompt without a second lookup.&lt;/p&gt;
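&lt;p&gt;Here is a small standalone sketch that makes the stride concrete. The helpers &lt;code&gt;windows&lt;/code&gt; and &lt;code&gt;containing_window&lt;/code&gt; are illustrative names, not part of the project. The first lists the window boundaries the loop above produces for a 2,000-character string. The second finds the window that holds a sentence straddling the 1,000-character mark.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def windows(length, chunk_size=1000, overlap=200):
    # Same stride as chunk_documents(): each window starts
    # chunk_size - overlap characters after the previous one.
    stride = chunk_size - overlap
    return [(i, min(i + chunk_size, length)) for i in range(0, length, stride)]


def containing_window(span_start, span_end, spans):
    # First window that holds [span_start, span_end) in full, or None.
    for start, end in spans:
        if span_start >= start and end >= span_end:
            return (start, end)
    return None


spans = windows(2000)
print(spans)                                # [(0, 1000), (800, 1800), (1600, 2000)]
print(containing_window(950, 1100, spans))  # (800, 1800)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first window&#39;s edge cuts the 150-character span from 950 to 1100, but the span sits whole inside the second window. That is exactly what the overlap is for.&lt;/p&gt;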
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Known bug:&lt;/strong&gt; The current diff iterates over &lt;code&gt;range(0, len(doc), ...)&lt;/code&gt;, which uses the dict length (always 4) rather than the length of the content string. It also slices to &lt;code&gt;1+chunk_size&lt;/code&gt; instead of &lt;code&gt;i+chunk_size&lt;/code&gt;. With the intended 800-character stride, &lt;code&gt;range(0, 4, 800)&lt;/code&gt; yields only &lt;code&gt;i = 0&lt;/code&gt;. Every document therefore produces a single chunk holding its first 1,001 characters, and the rest of the document never reaches the index. The fix is &lt;code&gt;range(0, len(content), chunk_size - overlap)&lt;/code&gt; and &lt;code&gt;content[i : i+chunk_size]&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Text fields vs. keyword fields&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;index = Index(
    text_fields=[&amp;quot;title&amp;quot;, &amp;quot;description&amp;quot;, &amp;quot;content&amp;quot;],
    keyword_fields=[&amp;quot;file_name&amp;quot;],
)
index.fit(chunks)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping &lt;code&gt;file_name&lt;/code&gt; as a keyword field keeps it out of TF-IDF relevance scoring. It&#39;s for exact-match filtering (restricting a search to the chunks of one specific file), not relevance ranking. At query time, &lt;code&gt;boost_dict={&amp;quot;content&amp;quot;: 1.5}&lt;/code&gt; nudges the scorer to weight the passage text more than the title and description, which sharpens precision for prose questions.&lt;/p&gt;
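&lt;p&gt;The split between the two kinds of field is easier to see in a toy model. This sketch is not minsearch&#39;s code. &lt;code&gt;toy_search&lt;/code&gt; is a hypothetical helper that swaps minsearch&#39;s TF-IDF scoring for simple word overlap, so it runs with no dependencies. What it keeps is the division of labor: text fields add boosted scores, while keyword fields never touch the score and only filter by exact match.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def toy_search(chunks, query, text_fields, filter_dict=None, boost_dict=None, num_results=10):
    # Toy scorer: for each text field, the share of query words found in
    # that field, multiplied by the field's boost (default 1.0).
    filter_dict = filter_dict or {}
    boost_dict = boost_dict or {}
    words = set(query.lower().split())
    if not words:
        return []
    scored = []
    for chunk in chunks:
        # Keyword fields: exact-match filter, no contribution to the score.
        if any(chunk.get(field) != value for field, value in filter_dict.items()):
            continue
        score = 0.0
        for field in text_fields:
            hits = words.intersection(chunk[field].lower().split())
            score += boost_dict.get(field, 1.0) * len(hits) / len(words)
        scored.append((score, chunk))
    # Stable sort: equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:num_results] if score > 0]


chunks = [
    {'title': 'Install', 'description': '', 'content': 'Configure logging levels.',
     'file_name': 'docs/install.md'},
    {'title': 'Logging', 'description': '', 'content': 'Install the package with pip.',
     'file_name': 'docs/logging.md'},
]
fields = ['title', 'description', 'content']


def files(results):
    return [r['file_name'] for r in results]


print(files(toy_search(chunks, 'install', fields)))
# ['docs/install.md', 'docs/logging.md']  (tied at 1.0)
print(files(toy_search(chunks, 'install', fields, boost_dict={'content': 1.5})))
# ['docs/logging.md', 'docs/install.md']  (the content match now scores 1.5)
print(files(toy_search(chunks, 'install', fields, filter_dict={'file_name': 'docs/install.md'})))
# ['docs/install.md']
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the real index the numbers come from TF-IDF rather than word overlap, but &lt;code&gt;boost_dict&lt;/code&gt; and &lt;code&gt;filter_dict&lt;/code&gt; play the same two roles.&lt;/p&gt;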
&lt;h3&gt;A domain-agnostic system prompt&lt;/h3&gt;
&lt;p&gt;Instead of hardcoding &amp;quot;you are a documentation assistant for Evidently,&amp;quot; the &lt;code&gt;SYSTEM_PROMPT&lt;/code&gt; tells the agent to answer about &amp;quot;whatever documentation is supplied at runtime.&amp;quot; Swap &lt;code&gt;GITHUB_URL&lt;/code&gt; to a different repo and the agent adapts with no prompt change.&lt;/p&gt;
&lt;p&gt;Three behavioral contracts are spelled out explicitly: &lt;strong&gt;grounding&lt;/strong&gt; (no claim without supporting context), &lt;strong&gt;self-correction&lt;/strong&gt; (infer intent → select relevant passages → reconcile conflicts → re-check draft), and &lt;strong&gt;session learning&lt;/strong&gt; (treat corrections as durable signal across turns).&lt;/p&gt;
&lt;h3&gt;Attribution for free&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def format_context(results):
    blocks = []
    for doc in results:
        blocks.append(
            f&amp;quot;Source: {doc[&#39;file_name&#39;]}&#92;n&amp;quot;
            f&amp;quot;Title:  {doc[&#39;title&#39;]}&#92;n&amp;quot;
            f&amp;quot;{doc[&#39;content&#39;]}&amp;quot;
        )
    return &amp;quot;&#92;n&#92;n---&#92;n&#92;n&amp;quot;.join(blocks)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Labeling each block with its source file lets the model cite files in its answer — no post-processing. The &lt;code&gt;---&lt;/code&gt; separators read as section boundaries to most models.&lt;/p&gt;
&lt;h2&gt;The pipeline&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://chandrakumarreddy.github.io/assets/rag-chunking-llm/pipeline.png&quot; alt=&quot;Two-phase RAG pipeline: indexing runs once, query loop runs per question&quot;&gt;&lt;/p&gt;
&lt;p&gt;The index is built once at startup and reused for every query in the session. Building it per question would re-download and re-chunk the whole repo each time — expensive and unnecessary. The query loop is: user input → &lt;code&gt;index.search()&lt;/code&gt; → &lt;code&gt;format_context()&lt;/code&gt; → &lt;code&gt;build_prompt()&lt;/code&gt; → streaming LLM response.&lt;/p&gt;
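&lt;p&gt;The per-question half of that loop, up to the point where the LLM takes over, can be sketched as follows. The post doesn&#39;t show &lt;code&gt;build_prompt()&lt;/code&gt;, so the template here is a hypothetical stand-in. &lt;code&gt;prepare_turn&lt;/code&gt; and &lt;code&gt;StubIndex&lt;/code&gt; are also hypothetical; the stub only replaces the fitted minsearch index so the sketch runs on its own. The index is constructed once and passed in, so each question costs a search, not a fetch.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def format_context(results):
    # Same shape as format_context() above: one labeled block per result.
    blocks = []
    for doc in results:
        blocks.append(
            f'Source: {doc["file_name"]}\n'
            f'Title:  {doc["title"]}\n'
            f'{doc["content"]}'
        )
    return '\n\n---\n\n'.join(blocks)


def build_prompt(question, context):
    # Hypothetical template; the real one is not shown in the post.
    return (
        'Answer using only the documentation below.\n\n'
        f'QUESTION: {question}\n\n'
        f'DOCUMENTATION:\n{context}'
    )


def prepare_turn(index, question, num_results=5):
    # Per question: search the already-fitted index, label the hits,
    # and wrap them in a prompt. Nothing is fetched or re-chunked here.
    results = index.search(question, boost_dict={'content': 1.5},
                           num_results=num_results)
    return build_prompt(question, format_context(results))


class StubIndex:
    # Stand-in for the fitted minsearch index, built once at startup.
    def __init__(self, docs):
        self.docs = docs

    def search(self, query, boost_dict=None, num_results=10):
        return self.docs[:num_results]


index = StubIndex([{'file_name': 'docs/install.md', 'title': 'Install',
                    'content': 'Run pip install.'}])
for question in ['How do I install it?', 'Which file covers setup?']:
    print(prepare_turn(index, question))
    print('=' * 40)
&lt;/code&gt;&lt;/pre&gt;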
&lt;p&gt;The streaming loop is worth a look:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;async with agent.run_stream(prompt) as result:
    async for delta in result.stream_text(delta=True):
        print(delta, end=&amp;quot;&amp;quot;, flush=True)
&lt;/code&gt;&lt;/pre&gt;
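&lt;p&gt;&lt;code&gt;flush=True&lt;/code&gt; is what keeps the terminal responsive. Without it, each &lt;code&gt;print(..., end=&amp;quot;&amp;quot;)&lt;/code&gt; only lands in stdout&#39;s buffer, which a terminal empties at each newline (or when the buffer fills). The answer then arrives in bursts after a delay instead of token by token, which feels slow and broken for a long generation.&lt;/p&gt;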
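&lt;p&gt;To see the difference without an API key, this sketch swaps &lt;code&gt;result.stream_text(delta=True)&lt;/code&gt; for &lt;code&gt;fake_stream_text&lt;/code&gt;, a hypothetical async generator that yields a few text deltas with short pauses. Run it in a terminal and each word appears as it arrives. Remove &lt;code&gt;flush=True&lt;/code&gt; and the whole line shows up only when the final newline is printed.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import asyncio


async def fake_stream_text():
    # Stand-in for result.stream_text(delta=True): yields text deltas
    # with a pause between them, like tokens arriving from a model.
    for delta in ['Chunks ', 'are ', 'indexed ', 'once.']:
        await asyncio.sleep(0.05)
        yield delta


async def render(stream):
    parts = []
    async for delta in stream:
        # end='' keeps deltas on one line; flush=True pushes each one to
        # the terminal now instead of waiting for a newline.
        print(delta, end='', flush=True)
        parts.append(delta)
    print()
    return ''.join(parts)


answer = asyncio.run(render(fake_stream_text()))
&lt;/code&gt;&lt;/pre&gt;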
&lt;h2&gt;What shipped&lt;/h2&gt;
&lt;p&gt;Three new capabilities from three files: &lt;code&gt;githubsource.py&lt;/code&gt; gets chunked document splitting, &lt;code&gt;rag_chunk_docs.py&lt;/code&gt; provides a retrieval-only harness for debugging the index without spending an LLM call, and &lt;code&gt;rag_search_chunk_llm.py&lt;/code&gt; is the end-to-end demo you can point at any public GitHub repo by changing &lt;code&gt;GITHUB_URL&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;What&#39;s next&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fix the chunking loop first.&lt;/strong&gt; Because of the loop-bound and slice bugs, the pipeline runs but only ever indexes the opening of each document. Nothing else improves until these two lines are corrected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread message history.&lt;/strong&gt; The query loop discards context between turns. Passing &lt;code&gt;message_history&lt;/code&gt; into &lt;code&gt;run_stream()&lt;/code&gt; — already done in &lt;code&gt;llm.py&lt;/code&gt; — makes the self-learning system prompt behaviorally meaningful for multi-turn sessions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sentence-aware splitting.&lt;/strong&gt; Character boundaries cut mid-sentence. Splitting on paragraph breaks (&lt;code&gt;&#92;n&#92;n&lt;/code&gt;) or sentence endings would produce semantically cleaner chunks and better retrieval precision on prose documentation.&lt;/p&gt;
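&lt;p&gt;Here is a minimal sketch of the paragraph-based variant, assuming paragraphs are separated by blank lines. Whole paragraphs are packed into chunks of up to &lt;code&gt;chunk_size&lt;/code&gt; characters. &lt;code&gt;split_paragraphs&lt;/code&gt; is a hypothetical name, and this version drops the overlap. A single paragraph longer than &lt;code&gt;chunk_size&lt;/code&gt; is kept whole rather than split further.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def split_paragraphs(content, chunk_size=1000):
    # Pack whole paragraphs (separated by blank lines) into chunks of up to
    # chunk_size characters. An oversized paragraph becomes its own chunk.
    chunks, current = [], ''
    for para in (p.strip() for p in content.split('\n\n')):
        if not para:
            continue
        candidate = f'{current}\n\n{para}' if current else para
        if len(candidate) > chunk_size and current:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = 'a' * 400 + '\n\n' + 'b' * 400 + '\n\n' + 'c' * 400
print([len(c) for c in split_paragraphs(text)])  # [802, 400]
&lt;/code&gt;&lt;/pre&gt;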
&lt;p&gt;&lt;strong&gt;Persist the index.&lt;/strong&gt; Re-fetching and re-chunking the repo on every startup is slow. Serializing the fitted index to disk makes every startup after the first near-instant, which matters for interactive use.&lt;/p&gt;
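&lt;p&gt;One way to do that is to pickle the fitted index. This sketch rests on two assumptions: that the fitted minsearch index pickles cleanly (worth confirming for the version you use), and that a stale cache is acceptable until you delete the file. &lt;code&gt;load_or_build&lt;/code&gt; is a hypothetical helper. Because unpickling can run arbitrary code, it should only load files this program wrote.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    # Reuse a previously fitted index if one was saved; otherwise call
    # build() (fetch, chunk, fit) and save the result for next time.
    path = Path(cache_path)
    if path.exists():
        with path.open('rb') as f:
            return pickle.load(f)
    index = build()
    with path.open('wb') as f:
        pickle.dump(index, f)
    return index
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Called as &lt;code&gt;load_or_build(&amp;quot;index.pkl&amp;quot;, lambda: build_index(github.chunk_documents(github.fetch())))&lt;/code&gt;, the first startup pays for the fetch, chunking, and fit. Later startups just read the file.&lt;/p&gt;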
&lt;p&gt;&lt;strong&gt;Retrieval visibility.&lt;/strong&gt; There is no way to know whether the top-10 chunks are the right ones. Logging which files surface per query is the minimum needed to tune &lt;code&gt;boost_dict&lt;/code&gt; and &lt;code&gt;num_results&lt;/code&gt; with actual evidence.&lt;/p&gt;
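&lt;p&gt;Here is a minimal version of that logging, with hypothetical names (&lt;code&gt;log_retrieval&lt;/code&gt;, the &lt;code&gt;rag.retrieval&lt;/code&gt; logger). It writes one log line per query listing the files behind the retrieved chunks in rank order. It also keeps a running tally of which files surface most often across a session.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import logging
from collections import Counter

logger = logging.getLogger('rag.retrieval')
file_hits = Counter()  # how often each file appears in results this session


def log_retrieval(query, results):
    # results: the list of chunk dicts returned by index.search().
    files = [r['file_name'] for r in results]
    file_hits.update(files)
    logger.info('query=%r files=%s', query, files)
    return files


logging.basicConfig(level=logging.INFO, format='%(name)s %(message)s')
log_retrieval('how do I install it', [{'file_name': 'docs/install.md'},
                                      {'file_name': 'docs/install.md'},
                                      {'file_name': 'docs/config.md'}])
print(file_hits.most_common(2))  # [('docs/install.md', 2), ('docs/config.md', 1)]
&lt;/code&gt;&lt;/pre&gt;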
</content>
  </entry>
  <entry>
    <title>An RSS-Driven Publishing Pipeline: Git to a Static Site, Medium, and Substack</title>
    <link href="https://chandrakumarreddy.github.io/rss-publishing-pipeline/" />
    <updated>2026-06-21T00:00:00Z</updated>
    <id>https://chandrakumarreddy.github.io/rss-publishing-pipeline/</id>
    <content type="html">&lt;p&gt;&lt;img src=&quot;https://chandrakumarreddy.github.io/assets/rss-publishing-pipeline/hero.png&quot; alt=&quot;Markdown in a Git repo building into a static site and a full-content feed.xml that fans out to feed readers, Substack, and Medium&quot;&gt;&lt;/p&gt;
&lt;p&gt;Most &amp;quot;publish everywhere&amp;quot; setups are a pile of brittle integrations that break every time a platform changes its UI. I wanted the opposite: one canonical source, one well-formed feed, and syndication that degrades gracefully instead of silently. This is how that pipeline works, end to end, with the code that matters and the failure modes I hit getting there.&lt;/p&gt;
&lt;p&gt;The shape is simple. Markdown lives in a Git repo. A push triggers a build that produces a static site and a full-content &lt;code&gt;feed.xml&lt;/code&gt;. That feed is the contract — every downstream reader, including Medium and Substack, consumes it. Nothing pushes content out; everything pulls it in. That single inversion is what makes the whole thing robust.&lt;/p&gt;
&lt;p&gt;The rest of this post is the &lt;em&gt;how&lt;/em&gt;: why the feed sits at the center, how the generator is wired, the two image-pipeline bugs that cost me an evening each, the deploy, and the honest state of syndication in 2026.&lt;/p&gt;
&lt;h2&gt;Why the feed is the architecture&lt;/h2&gt;
&lt;p&gt;It&#39;s tempting to treat the RSS feed as an afterthought — a file the generator emits and nobody reads. Here it&#39;s the load-bearing wall. The site is one consumer of the feed; feed readers are another; Substack&#39;s importer is another. If the feed is correct — full content, absolute image URLs, stable canonical links — every consumer works without special-casing. If it&#39;s wrong, they all fail in the same way, which makes debugging tractable.&lt;/p&gt;
&lt;p&gt;So the design rule is: get the feed right, and treat the rendered HTML pages as just another view of the same data.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://chandrakumarreddy.github.io/assets/rss-publishing-pipeline/architecture.png&quot; alt=&quot;The feed as the hub: one Git source builds the feed, and the site, feed readers, Substack, and Medium all read from it&quot;&gt;&lt;/p&gt;
&lt;p&gt;That inversion has three concrete consequences worth naming, because they&#39;re the reason the rest of the design falls out the way it does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;There is exactly one place a post can be wrong.&lt;/strong&gt; If an image is broken in the feed, it&#39;s broken everywhere — the site, the reader, the import. You never chase a bug that reproduces on Substack but not on your own page, because they read the same bytes. A single artifact to validate is the whole win.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Syndication has no write credentials.&lt;/strong&gt; Nothing in this pipeline holds an API token for Medium or Substack. There&#39;s no secret to rotate, no OAuth scope to renew, no integration to get deprecated out from under you. The platforms reach &lt;em&gt;in&lt;/em&gt; and read a public URL. That&#39;s a smaller attack surface and a smaller maintenance surface at the same time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&amp;quot;Full content&amp;quot; is a hard requirement, not a nicety.&lt;/strong&gt; A summary-only feed forces every consumer to crawl back to your site for the body, and importers that don&#39;t crawl simply get a truncated post. The feed must carry the entire rendered article, images included, or the contract leaks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hold onto that last point — it&#39;s the constraint that drives the image pipeline two sections down.&lt;/p&gt;
&lt;h2&gt;The generator&lt;/h2&gt;
&lt;p&gt;I use &lt;a href=&quot;https://www.11ty.dev/&quot;&gt;Eleventy&lt;/a&gt; because it&#39;s Node-based — no separate toolchain to maintain alongside a JS/TS stack, and the RSS plugin is first-party. Posts are markdown in &lt;code&gt;blog/&lt;/code&gt; with frontmatter that the feed depends on:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;---
layout: post.njk
title: &amp;quot;An RSS-Driven Publishing Pipeline&amp;quot;
description: &amp;quot;One-sentence summary for the feed and cards.&amp;quot;
date: 2026-06-21
tags: posts
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each field earns its place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;title&lt;/code&gt; becomes the feed entry&#39;s &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt;, and both &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; render on the homepage cards. Entries in the feed this pipeline produces carry the full post in &lt;code&gt;&amp;lt;content&amp;gt;&lt;/code&gt; and no &lt;code&gt;&amp;lt;summary&amp;gt;&lt;/code&gt;, so the description mainly serves the site. Write it as a real one-sentence abstract anyway, not SEO filler, because it&#39;s the first thing a visitor reads on a card.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;date&lt;/code&gt; drives sort order and the feed&#39;s &lt;code&gt;&amp;lt;updated&amp;gt;&lt;/code&gt; timestamp. Eleventy parses it as a date; if you ever see posts in the wrong order, it&#39;s almost always a string date that didn&#39;t parse.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tags: posts&lt;/code&gt; is the load-bearing one. It&#39;s how a post enters the feed collection. &lt;strong&gt;A post missing this line builds a page but never syndicates&lt;/strong&gt; — it exists at its URL, but it&#39;s invisible to the feed, the homepage list, and therefore every importer. This is the single most common &amp;quot;why didn&#39;t my post show up&amp;quot; cause, and because the page itself builds fine, nothing errors. Knowing it up front saves you the confused debugging session.&lt;/li&gt;
&lt;/ul&gt;
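&lt;p&gt;Because a missing tag fails silently, a pre-build check is cheap insurance. The sketch below is Python tooling that sits outside the Eleventy build, and its names (&lt;code&gt;in_posts_collection&lt;/code&gt;, &lt;code&gt;posts_missing_tag&lt;/code&gt;) are hypothetical. Its parsing is deliberately naive. It accepts &lt;code&gt;tags: posts&lt;/code&gt; and inline lists such as &lt;code&gt;tags: [posts, aws]&lt;/code&gt;, but not multi-line YAML lists or Windows line endings.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re
from pathlib import Path

# Frontmatter: the block between the opening and closing --- lines.
FRONTMATTER = re.compile(r'\A---\n(.*?)\n---\n', re.DOTALL)


def in_posts_collection(text):
    # Naive: accepts "tags: posts" and inline lists such as
    # "tags: [posts, aws]"; multi-line YAML lists are not handled.
    match = FRONTMATTER.match(text)
    if not match:
        return False
    for line in match.group(1).splitlines():
        if line.startswith('tags:') and re.search(r'\bposts\b', line):
            return True
    return False


def posts_missing_tag(blog_dir='blog'):
    # Every blog/*.md that would build a page but never reach the feed.
    return [str(p) for p in sorted(Path(blog_dir).glob('*.md'))
            if not in_posts_collection(p.read_text(encoding='utf-8'))]


for path in posts_missing_tag():
    print('not in the feed:', path)
&lt;/code&gt;&lt;/pre&gt;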
&lt;h3&gt;Wiring the feed&lt;/h3&gt;
&lt;p&gt;The feed itself comes from the official plugin, configured for full content and a correct absolute base:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;const { feedPlugin } = require(&amp;quot;@11ty/eleventy-plugin-rss&amp;quot;);

const SITE_URL = (process.env.SITE_URL || &amp;quot;https://example.github.io&amp;quot;).replace(/&#92;/$/, &amp;quot;&amp;quot;);

module.exports = function (eleventyConfig) {
  eleventyConfig.addPlugin(feedPlugin, {
    type: &amp;quot;atom&amp;quot;,
    outputPath: &amp;quot;/feed.xml&amp;quot;,
    collection: { name: &amp;quot;posts&amp;quot;, limit: 0 }, // 0 = all posts, not just recent N
    metadata: {
      language: &amp;quot;en&amp;quot;,
      title: &amp;quot;Engineering Notes&amp;quot;,
      base: SITE_URL + &amp;quot;/&amp;quot;,            // MUST be absolute — see the URL bug below
      author: { name: &amp;quot;Your Name&amp;quot; },
    },
  });

  // Gate on the `tags: posts` frontmatter, newest first.
  eleventyConfig.addCollection(&amp;quot;posts&amp;quot;, (api) =&amp;gt;
    api.getFilteredByTag(&amp;quot;posts&amp;quot;).sort((a, b) =&amp;gt; b.date - a.date)
  );

  return { dir: { input: &amp;quot;.&amp;quot;, includes: &amp;quot;_includes&amp;quot;, output: &amp;quot;_site&amp;quot; } };
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A few choices here are deliberate:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;type: &amp;quot;atom&amp;quot;&lt;/code&gt; over RSS 2.0.&lt;/strong&gt; Atom is stricter about required fields. Every entry needs a stable &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;title&lt;/code&gt;, and an &lt;code&gt;updated&lt;/code&gt; timestamp, plus an author, which it can inherit from the feed-level author. That strictness is a feature when the feed is your contract, because a feed validator will flag a feed that&#39;s missing the fields importers rely on. Both Substack and feed readers accept Atom fine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;base&lt;/code&gt; is read from an environment variable, normalized once.&lt;/strong&gt; The &lt;code&gt;.replace(/&#92;/$/, &amp;quot;&amp;quot;)&lt;/code&gt; strips a trailing slash so that &lt;code&gt;SITE_URL + &amp;quot;/&amp;quot;&lt;/code&gt; never produces a double slash. That single line is the difference between &lt;code&gt;https://site.com//assets/x.png&lt;/code&gt; (which some CDNs 404) and a clean URL. CI sets &lt;code&gt;SITE_URL&lt;/code&gt;; local builds fall back to the default. Same code path, same output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;limit: 0&lt;/code&gt; means the full archive.&lt;/strong&gt; Some importers — Substack&#39;s included — do a one-pass read of the feed and import whatever items are present at that moment. If you cap the feed at the most recent 10 items, a fresh import silently backfills only those 10 and drops everything older, with no warning. Emitting the entire archive means one import captures the whole history. The cost is a larger &lt;code&gt;feed.xml&lt;/code&gt;, which is negligible for a text feed even at hundreds of posts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The collection is sorted newest-first&lt;/strong&gt; so both the feed and the homepage share one ordering. Defining order in exactly one place is the same principle as the feed itself: don&#39;t let two consumers disagree about the same data.&lt;/p&gt;
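&lt;p&gt;Before pointing any importer at the feed, a quick structural check catches most contract breaks. Here is a minimal sketch using Python&#39;s standard-library ElementTree; the function name &lt;code&gt;check_feed&lt;/code&gt; is hypothetical. It counts entries, which should equal your number of posts when &lt;code&gt;limit: 0&lt;/code&gt; is set. It also flags any entry missing &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, or &lt;code&gt;updated&lt;/code&gt;, which RFC 4287 requires on every Atom entry. Finally, it flags any entry missing &lt;code&gt;content&lt;/code&gt;; Atom doesn&#39;t require that element, but this pipeline&#39;s full-content contract does.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'


def check_feed(xml_data):
    # xml_data: the feed as bytes or str. Returns (entry_count, problems).
    root = ET.fromstring(xml_data)
    entries = root.findall(ATOM + 'entry')
    problems = []
    for n, entry in enumerate(entries, start=1):
        # id, title and updated are required by RFC 4287; content is
        # required by this pipeline's full-content contract.
        for tag in ('id', 'title', 'updated', 'content'):
            element = entry.find(ATOM + tag)
            if element is None or not (element.text or '').strip():
                problems.append(f'entry {n}: missing {tag}')
    return len(entries), problems
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run it on the bytes of &lt;code&gt;_site/feed.xml&lt;/code&gt; after a build. An empty problem list and a count that matches your post count are a good sign the contract holds.&lt;/p&gt;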
&lt;h2&gt;The illustration problem: SVG in, PNG out&lt;/h2&gt;
&lt;p&gt;Posts are illustrated with hand-authored SVG — scriptable, crisp at any zoom, diff-able in version control, and tiny on disk. For a site I&#39;d happily ship SVG and never think about it again. But the feed changes the calculus, because here&#39;s a constraint that bites silently:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Medium and Substack importers fetch and embed raster images. They do not render SVG.&lt;/strong&gt; If the feed references an &lt;code&gt;.svg&lt;/code&gt;, the importer either drops the image entirely or embeds a broken reference. Every illustration vanishes on import — with no error, no warning, no log line. You find out when you look at the published copy and the article is all text.&lt;/p&gt;
&lt;p&gt;So the requirement is contradictory only on the surface: author in SVG (best for editing and for the crisp web experience), but ship raster to anything that imports. The resolution is to generate the PNG at build time and keep the SVG as the source of truth.&lt;/p&gt;
&lt;h3&gt;Rasterizing at build time&lt;/h3&gt;
&lt;p&gt;A build stage walks the assets tree and renders every SVG to a same-named PNG using &lt;a href=&quot;https://sharp.pixelplumbing.com/&quot;&gt;sharp&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// scripts/build-assets.js
const fs = require(&amp;quot;fs&amp;quot;);
const path = require(&amp;quot;path&amp;quot;);
const sharp = require(&amp;quot;sharp&amp;quot;);

function* walk(dir) {
  for (const e of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, e.name);
    if (e.isDirectory()) yield* walk(full);
    else if (e.name.toLowerCase().endsWith(&amp;quot;.svg&amp;quot;)) yield full;
  }
}

(async () =&amp;gt; {
  for (const svg of walk(path.join(__dirname, &amp;quot;..&amp;quot;, &amp;quot;blog&amp;quot;, &amp;quot;assets&amp;quot;))) {
    await sharp(svg, { density: 200 })            // render SVG at 200 DPI, not the default 72
      .resize({ width: 2400, withoutEnlargement: true }) // crisp at 2x on retina, never upscale
      .png()
      .toFile(svg.replace(/&#92;.svg$/i, &amp;quot;.png&amp;quot;));
  }
})();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two parameters here matter more than they look:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;density: 200&lt;/code&gt;.&lt;/strong&gt; sharp rasterizes SVG through libvips, which defaults to 72 DPI, and at 72 DPI the vector renders at its nominal pixel size. A 1200-px-wide SVG comes out 1200 px wide. That looks soft on a 2x display, and because the &lt;code&gt;resize&lt;/code&gt; never enlarges, the image stays below the 2400-px target. Bumping the density tells the rasterizer to sample the vector at a higher resolution before it ever hits the &lt;code&gt;resize&lt;/code&gt;, so text edges and thin strokes stay sharp. If your PNGs look fuzzy, density is the first knob.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;withoutEnlargement: true&lt;/code&gt;.&lt;/strong&gt; The &lt;code&gt;resize&lt;/code&gt; targets 2400px for retina crispness, but this flag means a smaller source SVG is never upscaled into a blurry mess — it just renders at its natural size. You get &amp;quot;at least crisp,&amp;quot; never &amp;quot;stretched.&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
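&lt;p&gt;The interaction between the two knobs is easiest to see as arithmetic. The Python helper &lt;code&gt;output_width&lt;/code&gt; below is a hypothetical back-of-envelope model, not part of sharp. It assumes the SVG&#39;s width is given in px and that libvips scales the render by &lt;code&gt;density / 72&lt;/code&gt;. libvips&#39;s own rounding may differ by a pixel.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def output_width(svg_width_px, density=200, target=2400):
    # Rasterize at density/72 times the nominal size, then apply
    # resize({ width: target, withoutEnlargement: true }): it can only shrink.
    rendered = svg_width_px * density / 72
    return min(round(rendered), target)


print(output_width(1200, density=72))  # 1200: default density never reaches 2400
print(output_width(1200))              # 2400: rendered at ~3333 px, scaled down
print(output_width(720))               # 2000: rendered at 2000 px, left alone
&lt;/code&gt;&lt;/pre&gt;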
&lt;p&gt;The script is idempotent and cheap. It overwrites the PNGs every build, so the committed SVG is always the source and the PNG is always derived. I commit both: the PNG so a fresh checkout has raster images even before the asset script has run, and the SVG because it&#39;s the editable original. The PNG is regenerated on every CI run, though, so the deployed copy can never drift from the SVG.&lt;/p&gt;
&lt;h3&gt;Rewriting references before render&lt;/h3&gt;
&lt;p&gt;Generating PNGs isn&#39;t enough — the markdown still says &lt;code&gt;.svg&lt;/code&gt;, and it uses repo-relative paths. A preprocessor fixes both &lt;em&gt;before&lt;/em&gt; Eleventy renders, so the change flows into the HTML page and the feed body identically:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;eleventyConfig.addPreprocessor(&amp;quot;imgRefs&amp;quot;, &amp;quot;md&amp;quot;, (data, content) =&amp;gt;
  content
    .replace(/(&#92;]&#92;()(?:&#92;.&#92;/)?assets&#92;//gi, &amp;quot;$1/assets/&amp;quot;)  // 1. root-absolute path
    .replace(/(&#92;]&#92;([^)]+?)&#92;.svg(&#92;))/gi, &amp;quot;$1.png$2&amp;quot;)        // 2. swap extension svg -&amp;gt; png
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The preprocessor applies two independent rewrites to the markdown image syntax &lt;code&gt;![alt](path)&lt;/code&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;assets/…&lt;/code&gt; → &lt;code&gt;/assets/…&lt;/code&gt;&lt;/strong&gt; turns a repo-relative path into a root-absolute one. In the repo, images live next to posts under &lt;code&gt;blog/assets/&amp;lt;slug&amp;gt;/&lt;/code&gt;; on the built site they&#39;re copied to &lt;code&gt;/assets/&amp;lt;slug&amp;gt;/&lt;/code&gt;. Without this rewrite the relative path would resolve against the post&#39;s URL and double the directory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;.svg&lt;/code&gt; → &lt;code&gt;.png&lt;/code&gt;&lt;/strong&gt; points every reference at the rasterized output.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Because this runs as a preprocessor on the raw markdown, the rewritten reference is what gets rendered into &lt;em&gt;both&lt;/em&gt; outputs. The feed plugin then takes the root-absolute &lt;code&gt;/assets/…&lt;/code&gt; path and prefixes it with &lt;code&gt;base&lt;/code&gt; to produce a fully-qualified &lt;code&gt;https://…/assets/…/x.png&lt;/code&gt; in the feed XML. One set of authored files; two consumers — browser and importer — each served what it can actually use.&lt;/p&gt;
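&lt;p&gt;A subtle benefit: the regexes only match text shaped like a markdown link target (&lt;code&gt;](…)&lt;/code&gt;), so a bare &lt;code&gt;.svg&lt;/code&gt; path inside a fenced code block is left untouched. You can write &lt;em&gt;about&lt;/em&gt; SVG, showing &lt;code&gt;.svg&lt;/code&gt; paths in code samples, without the preprocessor mangling them. The flip side is that the preprocessor runs on the raw markdown before code blocks are parsed. A code sample that itself contains markdown image syntax pointing at an &lt;code&gt;assets/&lt;/code&gt; SVG will be rewritten too.&lt;/p&gt;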
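&lt;p&gt;Because the rewrites are plain regular expressions, they are easy to check outside Eleventy. The sketch below is a Python port of the two patterns; Python writes the captured groups as &lt;code&gt;\1&lt;/code&gt; and &lt;code&gt;\2&lt;/code&gt; where JavaScript uses &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$2&lt;/code&gt;. &lt;code&gt;rewrite&lt;/code&gt; is a hypothetical name. The last two cases are worth knowing. A plain link to an SVG is rewritten too. An image with a title, such as &lt;code&gt;![Hero](assets/x.svg &amp;quot;Diagram&amp;quot;)&lt;/code&gt;, keeps its &lt;code&gt;.svg&lt;/code&gt;, because the second pattern expects the closing parenthesis directly after the extension.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re

# Same two patterns as the Eleventy preprocessor, ported to Python.
ROOT_ABS = re.compile(r'(\]\()(?:\./)?assets/', re.IGNORECASE)
SVG_EXT = re.compile(r'(\]\([^)]+?)\.svg(\))', re.IGNORECASE)


def rewrite(markdown):
    markdown = ROOT_ABS.sub(r'\1/assets/', markdown)  # 1. root-absolute path
    return SVG_EXT.sub(r'\1.png\2', markdown)         # 2. svg -> png


cases = [
    '![Hero](assets/my-post/hero.svg)',            # becomes /assets/my-post/hero.png
    '![Hero](./assets/my-post/hero.SVG)',          # same result, any case
    'Run svgo on logo.svg first.',                 # untouched: no ]( target
    '[source file](assets/my-post/hero.svg)',      # plain link: rewritten too
    '![Hero](assets/my-post/hero.svg "Diagram")',  # titled image: keeps .svg
]
for case in cases:
    print(rewrite(case))
&lt;/code&gt;&lt;/pre&gt;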
&lt;h2&gt;The bug worth dwelling on: absolute URLs and subpaths&lt;/h2&gt;
&lt;p&gt;When I first tested the feed, the image URLs came out mangled in two different ways, and untangling them is the most transferable lesson in this whole setup.&lt;/p&gt;
&lt;p&gt;The first symptom was a &lt;strong&gt;doubled path&lt;/strong&gt;, with URLs like &lt;code&gt;/&amp;lt;slug&amp;gt;/assets/&amp;lt;slug&amp;gt;/hero.png&lt;/code&gt;. The cause was the missing relative-path rewrite. A repo-relative &lt;code&gt;assets/…&lt;/code&gt; reference resolved against the post&#39;s own URL (&lt;code&gt;/&amp;lt;slug&amp;gt;/&lt;/code&gt;), so the slug appeared twice. The &lt;code&gt;assets/ → /assets/&lt;/code&gt; rewrite above fixes it by anchoring to the site root.&lt;/p&gt;
&lt;p&gt;The second symptom was nastier because the feed &lt;em&gt;looked&lt;/em&gt; correct. On a GitHub Pages &lt;strong&gt;project&lt;/strong&gt; site — served at &lt;code&gt;https://&amp;lt;user&amp;gt;.github.io/&amp;lt;repo&amp;gt;/&lt;/code&gt; — root-absolute paths like &lt;code&gt;/assets/x.png&lt;/code&gt; resolve against the &lt;strong&gt;domain origin&lt;/strong&gt;, not the project subpath. So &lt;code&gt;/assets/x.png&lt;/code&gt; becomes &lt;code&gt;https://&amp;lt;user&amp;gt;.github.io/assets/x.png&lt;/code&gt;, the &lt;code&gt;/&amp;lt;repo&amp;gt;&lt;/code&gt; segment silently disappears, and the image 404s. The feed XML was well-formed, the URLs were absolute, everything &lt;em&gt;read&lt;/em&gt; fine — but every image was quietly broken, and an import would produce text with no pictures and no explanation.&lt;/p&gt;
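&lt;p&gt;Browsers and importers resolve these references with the standard RFC 3986 rules, so Python&#39;s &lt;code&gt;urllib.parse.urljoin&lt;/code&gt; reproduces the failure. The URLs below are hypothetical: a project site served under &lt;code&gt;/repo/&lt;/code&gt; and a post at &lt;code&gt;/repo/my-post/&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from urllib.parse import urljoin

ref = '/assets/my-post/hero.png'  # root-absolute, as the preprocessor emits

# Project site: the post lives under /repo/, and so do the assets.
project_post = 'https://user.github.io/repo/my-post/'
served_at = 'https://user.github.io/repo/assets/my-post/hero.png'
resolved = urljoin(project_post, ref)
print(resolved)               # https://user.github.io/assets/my-post/hero.png
print(resolved == served_at)  # False: the /repo segment is gone, so it 404s

# Root site: the same reference lands exactly where the file is served.
root_post = 'https://user.github.io/my-post/'
print(urljoin(root_post, ref))  # https://user.github.io/assets/my-post/hero.png
&lt;/code&gt;&lt;/pre&gt;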
&lt;p&gt;There are two ways out, and only one of them is worth taking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fight it with &lt;code&gt;pathPrefix&lt;/code&gt;.&lt;/strong&gt; Eleventy has a &lt;code&gt;pathPrefix&lt;/code&gt; option, and the RSS plugin can incorporate it, so you &lt;em&gt;can&lt;/em&gt; make a project-subpath site emit correct absolute URLs. But now every link, every asset, every &lt;code&gt;base&lt;/code&gt; has to thread the prefix correctly, and any helper that builds a URL by hand becomes a place the prefix can be forgotten. You&#39;re adding a coordinate to every URL in the system to work around a hosting choice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Serve at a domain root instead.&lt;/strong&gt; A &lt;code&gt;&amp;lt;user&amp;gt;.github.io&lt;/code&gt; repo (the one named exactly after your username) is served at the bare root, and so is any custom domain you attach. At a root, &lt;code&gt;/assets/x.png&lt;/code&gt; resolves correctly with zero special handling, no &lt;code&gt;pathPrefix&lt;/code&gt;, no prefix threading. The class of bug simply cannot occur.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the single highest-leverage setup decision in the pipeline. If you take one thing from this post: &lt;strong&gt;don&#39;t deploy a feed-driven site to a project subpath.&lt;/strong&gt; Use a user/org Pages repo or a custom domain, and the entire category of subpath URL bugs evaporates.&lt;/p&gt;
&lt;p&gt;A cheap verification step earns a permanent place in your release routine, because it catches this and several other failures at once:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# After deploy, fetch the feed and pull out every image URL, then check each one.
curl -s https://example.github.io/feed.xml &#92;
  | grep -oE &#39;https://[^&amp;quot;]+&#92;.png&#39; &#92;
  | sort -u &#92;
  | while read -r url; do
      code=$(curl -s -o /dev/null -w &#39;%{http_code}&#39; &amp;quot;$url&amp;quot;)
      echo &amp;quot;$code  $url&amp;quot;
    done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every line should start with &lt;code&gt;200&lt;/code&gt;. A &lt;code&gt;404&lt;/code&gt; means an image is broken in the feed — fix it &lt;em&gt;before&lt;/em&gt; you import anywhere, because once a platform has imported a broken post, you&#39;re cleaning it up by hand on their side.&lt;/p&gt;
&lt;h2&gt;Deploying with GitHub Actions&lt;/h2&gt;
&lt;p&gt;The deploy is unremarkable, which is exactly the goal — a boring deploy is one you don&#39;t think about. On a push that touches content, build and ship to Pages:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;name: Publish blog
on:
  push:
    branches: [main]
    paths: [&amp;quot;blog/**&amp;quot;, &amp;quot;.eleventy.js&amp;quot;, &amp;quot;scripts/**&amp;quot;, &amp;quot;_includes/**&amp;quot;]
  workflow_dispatch: {}

permissions: { contents: read, pages: write, id-token: write }
concurrency: { group: pages, cancel-in-progress: false }

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 22, cache: npm }
      - run: npm ci
      - run: npm run build           # rasterize SVGs, then run Eleventy
        env: { SITE_URL: https://example.github.io }
      - uses: actions/upload-pages-artifact@v3
        with: { path: _site }
  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: { name: github-pages, url: &amp;quot;${{ steps.deployment.outputs.page_url }}&amp;quot; }
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The details that make it reliable rather than just functional:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;paths:&lt;/code&gt; filters the trigger.&lt;/strong&gt; A commit that only touches the README or CI config won&#39;t rebuild and redeploy the site. Builds happen when the content or the machinery that shapes it changes, and not otherwise — which keeps the deploy history meaningful and your Actions minutes low.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;workflow_dispatch: {}&lt;/code&gt;&lt;/strong&gt; adds a manual &amp;quot;Run workflow&amp;quot; button. Useful when you want to force a rebuild without a commit — say, after changing a repository secret or re-checking a deploy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;permissions&lt;/code&gt; are the minimum for Pages.&lt;/strong&gt; &lt;code&gt;contents: read&lt;/code&gt; to check out, &lt;code&gt;pages: write&lt;/code&gt; to publish, and &lt;code&gt;id-token: write&lt;/code&gt; for the OIDC token that &lt;code&gt;deploy-pages&lt;/code&gt; uses to authenticate to the Pages service. Nothing broader is granted, so a compromised action could read the repository and publish the site, but couldn&#39;t push commits, open pull requests, or touch releases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;concurrency&lt;/code&gt; with &lt;code&gt;cancel-in-progress: false&lt;/code&gt;&lt;/strong&gt; serializes runs in a single &lt;code&gt;pages&lt;/code&gt; group. Two quick pushes won&#39;t race to publish; the second waits for the first. If a third arrives while one is already waiting, GitHub cancels the waiting run and queues the newest in its place. That is harmless here, because the newest run deploys the latest commit anyway. Using &lt;code&gt;false&lt;/code&gt; rather than cancelling the in-flight run means a deploy that is already uploading isn&#39;t aborted halfway.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;SITE_URL&lt;/code&gt; is set as an env var on the build&lt;/strong&gt;, which is what feeds the &lt;code&gt;base&lt;/code&gt; in the Eleventy config. CI and your machine run the &lt;em&gt;same&lt;/em&gt; &lt;code&gt;npm run build&lt;/code&gt; — the only difference is this one variable — so a build that works locally works in CI.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Two jobs, not one.&lt;/strong&gt; &lt;code&gt;build&lt;/code&gt; produces an artifact; &lt;code&gt;deploy&lt;/code&gt; consumes it. Splitting them is the GitHub-recommended Pages pattern: the build can&#39;t accidentally publish a half-finished &lt;code&gt;_site&lt;/code&gt;, because publishing is a separate, explicit step gated on the build succeeding.&lt;/li&gt;
&lt;/ul&gt;
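&lt;p&gt;To make the &lt;code&gt;paths:&lt;/code&gt; filter concrete, here is a minimal sketch of how the workflow&#39;s four patterns decide whether a push triggers a build. It is a simplified model, not GitHub&#39;s matcher. It assumes only &lt;code&gt;*&lt;/code&gt; (within one path segment) and &lt;code&gt;**&lt;/code&gt; (across segments) matter, and the helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// Simplified model of the workflow's paths filter, for illustration only.
// Handles just `*` (stays within one path segment) and `**` (crosses
// segments); GitHub's real filter syntax also supports `?`, `[...]`, and `!`.
const WATCHED = ["blog/**", ".eleventy.js", "scripts/**", "_includes/**"];

function globToRegExp(glob) {
  // Escape regex metacharacters other than `*`, then expand the wildcards.
  const escaped = glob.replace(/[.+?^$(){}|[\]\\]/g, (ch) => "\\" + ch);
  const source = escaped
    .replace(/\*\*/g, "\u0000") // temporary marker so `**` survives the next step
    .replace(/\*/g, "[^/]*")     // `*` stops at a slash
    .replace(/\u0000/g, ".*");   // `**` matches across slashes
  return new RegExp("^" + source + "$");
}

// True if any changed file matches any watched pattern, i.e. the push triggers.
function triggersBuild(changedFiles) {
  const patterns = WATCHED.map(globToRegExp);
  return changedFiles.some((file) => patterns.some((re) => re.test(file)));
}

console.log(triggersBuild(["README.md"]));                     // false
console.log(triggersBuild(["blog/assets/my-post/hero.svg"]));  // true
console.log(triggersBuild([".github/workflows/publish.yml"])); // false
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The third case is the one to remember. The workflow file isn&#39;t in the list, so a change to the deploy itself won&#39;t run until the next content change or a manual &lt;code&gt;workflow_dispatch&lt;/code&gt;.&lt;/p&gt;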
&lt;p&gt;The &lt;code&gt;npm run build&lt;/code&gt; script chains &lt;code&gt;node scripts/build-assets.js &amp;amp;&amp;amp; eleventy&lt;/code&gt;. Because rasterization is the first link in that chain, there&#39;s no way to build the site without first regenerating the PNGs. Because of the &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;, a failed rasterization also stops the build before Eleventy runs. CI and your machine go through exactly the same steps, so you can&#39;t ship a stale image.&lt;/p&gt;
&lt;h2&gt;Syndication: the honest version&lt;/h2&gt;
&lt;p&gt;Here&#39;s where expectations need calibrating, because most &amp;quot;auto-post to Medium and Substack&amp;quot; tutorials are describing a world that no longer exists.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You cannot truly auto-publish to Medium or Substack in 2026.&lt;/strong&gt; Medium stopped issuing API integration tokens on 2025-01-01 and archived its API as unsupported; unless you&#39;re holding an old token from before that date, there is no programmatic publish path. Substack never had a public write API at all — its only inbound route is an RSS importer, and it&#39;s pull-based by design.&lt;/p&gt;
&lt;p&gt;So syndication is &lt;em&gt;import&lt;/em&gt;, not push: Substack pulls the feed, and Medium pulls the post page. That sounds like a downgrade, but it&#39;s what makes the pipeline durable. As long as the feed and the pages it links to are correct, both platforms import them cleanly. There&#39;s also no credential or integration that a platform can deprecate and break you with.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://chandrakumarreddy.github.io/assets/rss-publishing-pipeline/syndication.png&quot; alt=&quot;Syndication flow: the feed is imported by Substack via RSS and the post URL is imported by Medium, each setting a canonical link back to the source&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Substack — RSS import&lt;/h3&gt;
&lt;p&gt;Settings → Import/Export → Import posts → paste the feed URL (&lt;code&gt;https://example.github.io/feed.xml&lt;/code&gt;). Substack pulls the posts and sets canonical links back to your site, so search engines still credit your copy as the original. Two caveats to plan around:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Imported posts arrive as &lt;em&gt;Published&lt;/em&gt;, not drafts.&lt;/strong&gt; There&#39;s no &amp;quot;import as draft&amp;quot; option. So import during a quiet window and review immediately — or better, import once, then rely on incremental imports for new posts and check each one promptly. Treat the import as a publish action, because that&#39;s what it is.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code blocks lose syntax highlighting.&lt;/strong&gt; Substack&#39;s editor has no code-highlighting concept, so fenced code comes through as monospaced but uncolored. For a code-heavy post this is cosmetic, not broken — the code is intact and copyable — but it&#39;s worth knowing before you&#39;re surprised by how a tutorial looks on Substack.&lt;/li&gt;
&lt;/ul&gt;
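&lt;p&gt;Incremental import depends on the importer telling new entries apart from ones it has already seen. Substack doesn&#39;t document how it does that, so what follows is a hypothetical model rather than its actual behavior. It assumes entries are keyed on their Atom &lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt;, which in this feed is the post URL. The function and sample URLs are illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical model of an incremental, pull-based feed importer. This is NOT
// Substack's documented behavior; it only assumes entries are keyed on their
// Atom id, which in this feed is the post URL.
function selectNewEntries(feedEntries, importedIds) {
  const seen = new Set(importedIds);
  return feedEntries
    .filter((entry) => !seen.has(entry.id))
    // Oldest first, so new posts are imported in publication order.
    .sort((a, b) => Date.parse(a.updated) - Date.parse(b.updated));
}

const imported = ["https://example.github.io/rss-publishing-pipeline/"];
const feed = [
  { id: "https://example.github.io/rag-chunking-llm/", updated: "2026-06-20T00:00:00Z" },
  { id: "https://example.github.io/rss-publishing-pipeline/", updated: "2026-06-01T00:00:00Z" },
];
console.log(selectNewEntries(feed, imported).map((e) => e.id));
// [ 'https://example.github.io/rag-chunking-llm/' ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If that assumption holds, treat a post&#39;s URL as permanent once it has been imported. Renaming the slug changes the &lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt;, so the renamed post would look new and be imported a second time.&lt;/p&gt;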
&lt;p&gt;Because the feed carries full content with absolute PNG URLs, the body and images both come through. This is the payoff for the entire image pipeline: the SVG-to-PNG work two sections up is &lt;em&gt;why&lt;/em&gt; Substack shows your diagrams instead of dropping them.&lt;/p&gt;
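&lt;p&gt;Everything downstream depends on each image being an absolute PNG URL, so it&#39;s worth linting &lt;code&gt;feed.xml&lt;/code&gt; for that before running the HTTP check. Here is a minimal Node sketch. It assumes the post HTML is entity-escaped inside each entry, as it is in this feed. The script name &lt;code&gt;check-feed.js&lt;/code&gt; and the &lt;code&gt;_site/feed.xml&lt;/code&gt; path are assumptions about your layout.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// Static lint for feed.xml: every image the syndicated copies will load must
// be an absolute http(s) URL and must not still point at an SVG.
// Assumes the post HTML is entity-escaped inside each entry; \x26 is the
// ampersand that starts the escaped quote entity.
const IMG_SRC = /src=(?:"|\x26quot;)(.*?)(?:"|\x26quot;)/g;

function findImageProblems(feedXml) {
  const problems = [];
  for (const [, src] of feedXml.matchAll(IMG_SRC)) {
    if (!/^https?:\/\//i.test(src)) {
      problems.push(`not absolute: ${src}`);
    } else if (/\.svg(?:[?#]|$)/i.test(src)) {
      problems.push(`still an SVG: ${src}`);
    }
  }
  return problems;
}

// Usage (hypothetical script name): node check-feed.js _site/feed.xml
const feedPath = process.argv[2];
if (feedPath) {
  const fs = require("node:fs");
  const problems = findImageProblems(fs.readFileSync(feedPath, "utf8"));
  problems.forEach((p) => console.error(p));
  process.exitCode = problems.length ? 1 : 0; // non-zero fails a CI step
}
&lt;/code&gt;&lt;/pre&gt;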
&lt;h3&gt;Medium — import by URL&lt;/h3&gt;
&lt;p&gt;Medium has no feed import, but &amp;quot;Import a story&amp;quot; takes a single URL. Paste the post&#39;s canonical URL; Medium fetches the rendered page, converts it to a &lt;strong&gt;draft&lt;/strong&gt; (note: draft, unlike Substack — you get a review step for free), and sets &lt;code&gt;rel=canonical&lt;/code&gt; back to your site so search engines treat your version as the original. Because the page serves absolute PNG URLs, the images come through here too — Medium re-hosts them on its own CDN as part of the import.&lt;/p&gt;
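&lt;p&gt;The canonical link is the quiet hero of this whole arrangement. Both platforms point back to your URL, so you can publish the same article in three places without the copies competing with yours in search. &lt;code&gt;rel=canonical&lt;/code&gt; tells Google which version is authoritative. Google treats it as a strong hint rather than a binding rule, so it isn&#39;t a guarantee, but it&#39;s the standard way to mark syndicated copies. Your site stays the source of record even when most of your readers are on someone else&#39;s platform.&lt;/p&gt;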
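&lt;p&gt;Since the canonical link carries so much weight, it&#39;s worth confirming it on each syndicated copy rather than trusting the import. Here is a minimal sketch, assuming Node 18+ for the global &lt;code&gt;fetch&lt;/code&gt;. The function names are hypothetical. The regex-based parsing only handles ordinary &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; tags, not arbitrary HTML.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical post-import check: does a syndicated copy declare your post as
// its canonical URL? Regex-based for brevity, so it only handles ordinary
// link tags; \x3c is the character that opens an HTML tag.
function canonicalOf(html) {
  for (const tag of html.match(/\x3clink\b[^>]*>/gi) ?? []) {
    if (!/\brel=["']?canonical\b/i.test(tag)) continue;
    const href = tag.match(/\bhref=["']?([^"'\s>]+)/i);
    if (href) return href[1];
  }
  return null;
}

// Fetches the copy (global fetch, Node 18+) and compares its canonical URL.
async function checkCanonical(copyUrl, expectedUrl) {
  const res = await fetch(copyUrl);
  if (!res.ok) return { copyUrl, status: res.status, found: null, ok: false };
  const found = canonicalOf(await res.text());
  return { copyUrl, status: res.status, found, ok: found === expectedUrl };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The comparison is exact, so pass the URL exactly as you publish it, including the trailing slash. Some platforms may refuse non-browser requests. If &lt;code&gt;fetch&lt;/code&gt; comes back with an error page, viewing the copy&#39;s page source in a browser answers the same question.&lt;/p&gt;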
&lt;h3&gt;The closest thing to automation&lt;/h3&gt;
&lt;p&gt;If you want hands-off syndication, an IFTTT applet (&amp;quot;new RSS item → Medium&amp;quot;) uses Medium&#39;s &lt;em&gt;internal&lt;/em&gt; posting pathway rather than the dead public API. It works, and it&#39;ll publish new posts without you touching Medium. But it&#39;s fragile — it depends on an internal integration Medium could change without notice — so treat it as a convenience, not a guarantee. Whatever you automate, verify each post actually landed and rendered correctly. The whole point of this architecture is that verification is cheap: open the published copy, confirm the images loaded, done.&lt;/p&gt;
&lt;h2&gt;The full loop&lt;/h2&gt;
&lt;p&gt;End to end, publishing a post looks like this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The hands-off half:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Write the markdown in &lt;code&gt;blog/&amp;lt;slug&amp;gt;.md&lt;/code&gt; with the frontmatter (don&#39;t forget &lt;code&gt;tags: posts&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Drop the SVG illustrations in &lt;code&gt;blog/assets/&amp;lt;slug&amp;gt;/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Commit and push to &lt;code&gt;main&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;GitHub Actions rasterizes the SVGs to PNG, runs Eleventy, and deploys to Pages. The homepage updates and &lt;code&gt;feed.xml&lt;/code&gt; picks up the new post automatically.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;The manual last mile:&lt;/strong&gt;&lt;/p&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Confirm the post is live and run the feed-image check (&lt;code&gt;curl … | grep … | check 200&lt;/code&gt;) so you &lt;em&gt;know&lt;/em&gt; the images resolve before any platform reads them.&lt;/li&gt;
&lt;li&gt;Import the post on Substack, or confirm that an incremental import picked it up.&lt;/li&gt;
&lt;li&gt;Import the post URL into Medium, review the draft, publish.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Steps 1–4 are fully automated. Steps 5–7 are a few minutes of clicking and a sanity check. That split is intentional: the parts that &lt;em&gt;can&lt;/em&gt; be made reliable and unattended are, and the parts that depend on third-party UIs stay manual and verified rather than automated and silently broken.&lt;/p&gt;
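&lt;p&gt;Of the automated steps, step 1 is the one that fails quietly: leave out &lt;code&gt;tags: posts&lt;/code&gt; and the build still succeeds. In Eleventy, that tag is what puts a page in &lt;code&gt;collections.posts&lt;/code&gt;. Assuming the homepage and feed templates list that collection, an untagged post gets its own page but never shows up on the homepage or in &lt;code&gt;feed.xml&lt;/code&gt;. Below is a minimal sketch of a check you might run before pushing. The function name is hypothetical, and it reads only simple YAML, not the full spec.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// Hypothetical pre-push check for step 1: returns null if the front matter
// tags include "posts", otherwise a short description of the problem.
// Reads simple YAML only: `tags: posts`, `tags: [posts, rss]`, or a block
// list of `- item` lines. A real check should use a YAML parser.
function postsTagProblem(markdown) {
  const fm = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!fm) return "no front matter block";
  const lines = fm[1].split(/\r?\n/);
  const i = lines.findIndex((line) => /^tags:/.test(line));
  if (i === -1) return "no tags field";

  const inline = lines[i].slice("tags:".length).trim();
  let tags = [];
  if (inline) {
    // Inline form: strip brackets and quotes, then split on commas.
    tags = inline.replace(/[[\]"']/g, "").split(",").map((t) => t.trim());
  } else {
    // Block form: collect the "- item" lines right after "tags:".
    for (const line of lines.slice(i + 1)) {
      const item = line.match(/^\s*-\s*(\S.*)$/);
      if (!item) break;
      tags.push(item[1].replace(/["']/g, "").trim());
    }
  }
  return tags.includes("posts")
    ? null
    : `tags are [${tags.join(", ")}], missing "posts"`;
}

console.log(postsTagProblem("---\ntitle: Hello\ntags: posts\n---\nBody")); // null
console.log(postsTagProblem("---\ntitle: Hello\n---\nBody"));              // no tags field
&lt;/code&gt;&lt;/pre&gt;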
&lt;h2&gt;What this buys you&lt;/h2&gt;
&lt;p&gt;It&#39;s less magical than &amp;quot;push and it&#39;s everywhere,&amp;quot; but everything is inspectable and version-controlled, and nothing depends on a credential a platform can revoke. The feed is a real artifact you can open in a browser, validate with a feed checker, and reason about. The canonical source is yours, on infrastructure you control, and the syndicated copies all point back to it with &lt;code&gt;rel=canonical&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The durability is the entire point. When a platform changes its import UI — and they will — nothing in your pipeline breaks; you just click a slightly different button next time. There&#39;s no token to expire, no webhook to re-register, no integration to migrate. You moved the complexity from N fragile push-integrations to one correct artifact that everything pulls from, and a correct artifact is a thing you can actually keep correct.&lt;/p&gt;
&lt;p&gt;That&#39;s the trade: a few minutes of manual import per post, in exchange for a publishing system that doesn&#39;t rot. For a blog you intend to keep for years, that&#39;s the right side of the bargain.&lt;/p&gt;
</content>
  </entry>
</feed>