Practical AI · 10 min read · February 5, 2026

How I Built an Automated Research Pipeline That Keeps Me Current Without the Tab Hoarding

A step-by-step walkthrough of using Claude Code skills to automatically search, summarize, and organize research digests on AI in education, leadership, and systems thinking.

Last month, I realized I had 47 tabs open across three browser windows. Half were AI in education studies I'd bookmarked and never read. The other half were leadership articles someone had shared in a group chat that I'd "get to later." This was exactly the problem I'd faced when I was a head of school: never staying on top of the research that should have informed my decisions. I was still hoarding links.

So I built something to fix it. Not a product or a subscription. It's a Claude Code skill that searches for new research on my topics, downloads what it can, summarizes everything, and generates a structured digest I can scan in under a minute. The whole thing runs on a schedule and costs nothing beyond what I already pay for Claude.

Here's exactly how it works and how to set it up yourself.

What the Pipeline Actually Does

The system runs five stages:

  1. Search — Queries arXiv and RSS feeds for new content matching my keywords (web search via the Brave API is also supported).
  2. Download — Fetches everything freely accessible and flags paywalled content so I know it exists.
  3. Summarize — Claude generates a structured summary of each source: key findings, methodology, notable data, and why it matters.
  4. Digest — Compiles everything into a single document, with key statistics pulled out and emerging themes identified.
  5. Deliver — Saves everything locally, organized by topic and date; can optionally email the digest or sync to Google Drive for NotebookLM.

I track four research pillars: AI in education, leadership, systems thinking, and practical AI. Each one runs on its own schedule. AI in education runs weekly (the field moves fast but studies take time), while practical AI runs every three days (tools and workflows change constantly).

The Setup: Step by Step

Step 1: Install the Skill

The pipeline is a Claude Code skill with Python scripts that handle the searching, summarizing, and delivery. Clone it and install dependencies:

git clone https://github.com/kaiak-io/claude-code-skills.git
cd claude-code-skills/skills/research/automated-research/

# Install Python dependencies
pip install -r requirements.txt

# Copy the example config
cp config.yaml.example config.yaml

The skill includes:

  • scripts/search.py — queries arXiv and RSS feeds (optionally Brave web search); a rough sketch follows this list
  • scripts/summarize.py — generates summaries using Claude CLI
  • scripts/deliver.py — optionally sends email and syncs to Google Drive
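To give a sense of what the search stage involves, here's a minimal sketch of an arXiv query in Python. It's illustrative rather than the repo's actual code: it assumes the public arXiv Atom API and the feedparser library, and the function name is mine.

import urllib.parse
import feedparser  # pip install feedparser

def search_arxiv(keyword, max_results=10):
    # Illustrative sketch, not the skill's actual implementation.
    # URL-encode the keyword phrase for the arXiv Atom API
    phrase = urllib.parse.quote(f'"{keyword}"')
    url = (
        "http://export.arxiv.org/api/query"
        f"?search_query=all:{phrase}&start=0&max_results={max_results}"
        "&sortBy=submittedDate&sortOrder=descending"
    )
    feed = feedparser.parse(url)
    # Each Atom entry carries the paper's title, abstract, link, and date
    return [
        {"title": e.title, "summary": e.summary,
         "link": e.link, "published": e.published}
        for e in feed.entries
    ]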

Step 2: Configure Your Topics

Create a config.yaml that defines what you're tracking. Here's a simplified version of mine:

settings:
  summary_model: claude-sonnet
  digest_format: markdown
  digest_location: ./digests

  delivery:
    email:
      enabled: true
      provider: resend           # or gmail
      recipient: "you@example.com"
      from_email: "research@yourdomain.com"
    drive:
      enabled: true
      method: rclone
      remote_path: "Research"

topics:
  ai-in-education:
    keywords:
      - "large language model education"
      - "LLM tutoring"
      - "generative AI learning"
      - "AI literacy"
    sources: [arxiv, rss]
    frequency: weekly
    rss_feeds:
      - "https://blog.google/technology/ai/rss/"
      - "https://openai.com/blog/rss.xml"
    summary_focus: "methodology, effect sizes, and policy implications"

  leadership:
    keywords:
      - "organizational decision making"
      - "leadership cognition"
      - "distributed leadership"
    sources: [arxiv, rss]
    frequency: biweekly
    rss_feeds:
      - "https://sloanreview.mit.edu/feed/"
    summary_focus: "frameworks, decision-making models, school leadership"

The summary_focus field matters. It tells Claude what to pay attention to when summarizing. For education research, I want methodology and effect sizes, not just conclusions. For leadership content, I want frameworks I can actually use.
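To make that concrete, here's a rough sketch of how a summary_focus string could be folded into the prompt passed to the Claude CLI. The repo's actual prompt will differ; the only interface this leans on is the CLI's non-interactive -p (print) flag.

import subprocess

def summarize(source_text, summary_focus):
    # Hypothetical prompt shape; the skill's real prompt is more detailed
    prompt = (
        "Summarize this source: key findings, methodology, notable data, "
        f"and why it matters. Pay particular attention to: {summary_focus}\n\n"
        f"{source_text}"
    )
    # `claude -p` runs Claude Code in non-interactive print mode
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout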

The RSS feeds are key. In my full config I pull from AI lab blogs (Google, OpenAI), business publications (MIT Sloan Review), and technical blogs (Simon Willison, Latent Space). Combined with arXiv for academic papers, this covers both research and industry developments.
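The RSS side is simple in principle: pull each feed, keep entries that mention a topic keyword. A hedged sketch with feedparser (the real script may rank, dedupe, or date-filter more carefully):

import feedparser  # pip install feedparser

def search_rss(feed_urls, keywords):
    # Illustrative keyword match over each entry's title and summary
    hits = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            text = f"{entry.get('title', '')} {entry.get('summary', '')}".lower()
            if any(kw.lower() in text for kw in keywords):
                hits.append({"title": entry.get("title"), "link": entry.get("link")})
    return hits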

Step 3: Run It

Run the pipeline scripts directly:

# Search for new content
python scripts/search.py

# Generate summaries (requires Claude CLI installed)
python scripts/summarize.py

# Send emails and sync to Drive (optional)
python scripts/deliver.py

On the first run, everything is "due" so all topics search simultaneously. After that, each topic runs on its own schedule. The pipeline checks the last run date and only processes topics when their interval has passed.
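The due-check is plain date arithmetic. A minimal sketch, with illustrative names rather than the repo's actual state format:

from datetime import date, timedelta

INTERVAL_DAYS = {"daily": 1, "every-3-days": 3, "weekly": 7, "biweekly": 14}

def is_due(last_run, frequency):
    if last_run is None:  # first run: every topic is due
        return True
    return date.today() - last_run >= timedelta(days=INTERVAL_DAYS[frequency])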

To automate it, set up scheduled tasks (Windows Task Scheduler or cron) to run the scripts daily: search in the morning, summarize in the evening.

Step 4: Set Up Email Delivery (Optional)

You have two options for email delivery:

Option A: Resend (Recommended)

  1. Sign up at resend.com (free tier: 100 emails/day)
  2. Verify your domain (add their DNS records)
  3. Get your API key and add it to .env: RESEND_API_KEY=re_xxxxx (used in the sketch below)
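With the key in place, sending is a single HTTP call. A sketch against Resend's emails endpoint; the helper name and arguments are mine, not the skill's:

import os
import requests  # pip install requests

def send_digest(subject, html_body, sender, recipient):
    # Resend's HTTP API; the key comes from the environment (.env)
    resp = requests.post(
        "https://api.resend.com/emails",
        headers={"Authorization": f"Bearer {os.environ['RESEND_API_KEY']}"},
        json={"from": sender, "to": [recipient],
              "subject": subject, "html": html_body},
        timeout=30,
    )
    resp.raise_for_status()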

Option B: Gmail SMTP

  1. Enable 2FA on your Google account
  2. Generate an App Password: Google Account → Security → App Passwords
  3. Add credentials to .env: GMAIL_ADDRESS and GMAIL_APP_PASSWORD (used in the sketch below)
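The Gmail path needs nothing beyond Python's standard library. A minimal sketch, assuming those two environment variables are set:

import os
import smtplib
from email.message import EmailMessage

def send_digest(subject, body, recipient):
    msg = EmailMessage()
    msg["From"] = os.environ["GMAIL_ADDRESS"]
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # Gmail's SMTP-over-SSL endpoint; the app password stands in for
    # your normal account password
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
        smtp.send_message(msg)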

The email digest includes: topic and date, count of sources found, top findings with links, key statistics pulled out, and emerging themes. Scannable in 30 seconds.

Or skip email entirely. The pipeline saves everything locally. You can upload the notes/ folders directly to NotebookLM without any email setup.

Step 5: Connect to NotebookLM

The NotebookLM layer is what took this from a news feed to an actual learning system.

The simple approach (what I do): After the pipeline runs, I drag the notes/ folders directly into NotebookLM as sources. Each topic gets its own notebook. It takes 30 seconds and I do it weekly.

The automated approach (optional): You can use rclone to sync the research folder to Google Drive automatically, then point NotebookLM at the Drive folder. This requires installing rclone and configuring a Google Drive remote. More setup, but fully hands-off once it's working.
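The sync itself comes down to one rclone invocation. A sketch of how a script might shell out to it, assuming you've configured a Drive remote named gdrive and kept the Research remote path from config.yaml:

import subprocess

def sync_to_drive(local_dir="./digests", remote="gdrive:Research"):
    # `rclone sync` makes the remote folder mirror the local one
    subprocess.run(["rclone", "sync", local_dir, remote], check=True)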

Either way, once the sources are in NotebookLM, click "Sync" in the source settings to ensure the latest summaries are indexed.

This means I can ask NotebookLM questions like:

  • "What do this month's sources say about teacher AI training programs?"
  • "Where do the studies disagree on AI's impact on student outcomes?"
  • "What gaps exist in my research on school leadership?"

NotebookLM grounds its answers in the actual sources, so I'm not getting hallucinated citations. And it can generate audio overviews: I've started listening to weekly research summaries as background audio while I work, in place of podcasts.

What I've Learned So Far

💡 The filter changes the behavior

Having a structured digest waiting for me each morning changed how I engage with research. Before this, I'd set aside time to "catch up on reading" and not do it. Now the reading is already done: summarized, with key statistics pulled out. I still read full sources when something matters, but the triage is automated.

The numbers after the first week:

  • 4 topics tracked across education, leadership, and AI
  • 75 sources downloaded, 40 summarized so far
  • 6 digests generated, organized by topic and date
  • Multiple paywalled sources flagged that I wouldn't have found manually
  • 1 study that would have changed how I approached AI policy when I led a school (a RAND study showing 60% of schools have no AI guidance for teachers)

What works:

The per-topic scheduling is critical. AI in education research doesn't need daily checking. Major studies come out weekly or monthly. But practical AI tools change every few days. Having different frequencies per pillar means I'm not drowning in noise on slow topics or missing things on fast ones.

The structured summaries are better than my own notes would be. The pipeline extracts methodology, effect sizes, and specific statistics: the things you need when making a case to a board or writing a policy recommendation. I used to skim articles and highlight vaguely. Now I have a searchable archive of structured notes.

What doesn't work yet:

Email delivery and Google Drive sync are built but not configured yet. The scripts support Resend, Gmail SMTP, and rclone, but I haven't verified my sending domain or set up Drive sync. For now I'm reading the digests locally and uploading to NotebookLM manually. It takes 30 seconds and honestly I'm not sure the email step adds much. The value is in the summaries, not the delivery mechanism.

Paywalled content is still manual. About 20% of academic sources are behind journal paywalls, and the pipeline can only flag them. I still have to go retrieve them through library access or by requesting them directly from authors.

The summaries occasionally miss nuance. They're good for "what did this study find?" but not great for "how does this challenge existing assumptions?" I still need to read the important ones myself. The pipeline is a filter, not a replacement for thinking.

Where to Start

This week: Clone the skill from the repo, set up config.yaml with one or two topics, and run the pipeline manually once to see how it works.

This month: Add email delivery so digests come to you. Set up the remaining topics you care about. Adjust frequencies based on how much output feels right.

This quarter: Add Google Drive sync and NotebookLM. Asking questions across a month of accumulated research is a completely different experience from reading individual articles.


Benedict Rinne, M.Ed.

Founder of KAIAK. Helping international school leaders simplify operations with AI. Connect on LinkedIn

Want help building systems like this?

I help school leaders automate the chaos and get their time back.