1home.md 2blog.md[+]3journey.md 4cv.md 5me.md 6status.md

NORMAL

LANG·

UTF-8

1,1 All

blog.md

NORMAL

UTF-8

Loading ...

Peifeng Wang — Full-Stack Engineer, Tokyo

>cd ../posts/

2026.05.10

Sky Finance Series #1: From POC to a Systematic RAG Platform — Architecture Overview

This is the first post in the Sky Finance series. I'll use this series to document the project's evolution — architecture decisions, lessons learned, and interesting technical details along the way.

Why I Built This

A month ago I published a POC post: SkyFinance POC: I Built a Personal Stock Alert System with Local LLMs and Cloud AI.

That version validated a core idea: use a local 7B model for per-ticker news filtering and classification (free), and reserve frontier cloud models only for cross-holding portfolio synthesis (cheap). The whole system cost around $0.50/month in API fees while running twice every trading day across all the tickers I follow.

Once the POC held up, I wanted to take it further:

Replace a collection of scripts with a proper async task queue
Replace ad-hoc vector queries with a persistent RAG architecture
Add an evaluation layer to actually measure whether the RAG strategies work
Support multiple LLM providers with clean abstractions

That's the starting point for Sky Finance.

What the Project Does

One sentence: a local-first financial intelligence platform that monitors US and Japanese equities, cleans market data with a local LLM, and generates RAG-powered strategy analysis with multi-provider model support.

Each trading day, the system automatically:

Pulls price data from yfinance and news from Google RSS
Runs each record through a local LLM (Ollama) for cleaning, summarization, and sentiment classification
Embeds the processed content and stores it in PostgreSQL + pgvector
Executes RAG retrieval per strategy and calls an LLM to generate analysis reports
Pushes a morning digest and price alerts to Slack

Everything runs on a Celery + Redis task queue. A web dashboard gives real-time visibility into pipeline status, strategy results, and evaluation scores.

Overall Architecture

Four Core Stages

INGESTION → PIPELINE → STRATEGY ENGINE → NOTIFICATIONS

Ingestion — the data entry point. yfinance handles price data synchronously, one task per ticker, to stay within Yahoo Finance rate limits. News is fetched concurrently via httpx.AsyncClient + asyncio.gather across multiple Google RSS feeds (L1-EN, L2-EN, L1-JA, L2-JA). A single feed failure returns an empty list and doesn't block the others.

Pipeline — data processing. Three sequential steps per record: cleaner.py → llm_summariser.py → embedder.py. Each record is an independent Celery task with its own retry policy.

Strategy Engine — RAG + LLM analysis. The core of the system; covered in detail below.

Notifications — Slack delivery. A daily morning digest plus per-ticker price move alerts.

Task Scheduling

Every stage runs as Celery tasks. Beat enqueues task names into Redis on a cron schedule; workers consume and execute them.

Beat Job	Schedule (UTC)	What It Does
`ingest-us-stocks`	23:00 Mon–Fri	Pull US equities after NYSE close
`ingest-japan-stocks`	07:30 Mon–Fri	Pull Japan equities after TSE close
`ingest-news`	:00 every hour	Fetch news for all tickers
`run-pipeline`	:30 every hour	Clean + LLM summarize + embed
`run-strategies`	09:00 Mon–Fri	RAG retrieval + AI analysis
`send-digest`	09:05 Mon–Fri	Send Slack morning digest

Beat never executes business logic — it only drops task names into Redis. Workers are the only processes that run actual Python code. task_acks_late = True ensures in-flight tasks requeue automatically if a worker goes down.

Pipeline: The Cost Boundary

The pipeline is the cost control layer of the whole architecture.

Each raw record (news article or price data) passes through:

cleaner.py — strip HTML, normalize whitespace, truncate to 2,000 characters
llm_summariser.py — call local Ollama (qwen2.5:3b-instruct) for structured output:
- summary: short summary of the content
- sentiment: + / = / -
- key_facts: list of key facts
- topics: topic tags
- relevance_score: how relevant the content is to the ticker
embedder.py — generate a 768-dim vector with nomic-embed-text, store in pgvector

This entire stage runs locally — zero API cost. Cloud models are only called during the Strategy Engine step when generating the final analysis reports.

RAG Design: Sentiment-Bucketed Retrieval

The naive RAG approach: embed a query, retrieve top-k most similar chunks, stuff them into a prompt.

For stock analysis, this creates a real problem: during a bull run, a ticker's news corpus might be 90% positive. A plain top-k retrieval will fill the context window with bullish articles and bury the few negative risk signals that often matter most.

Sky Finance solves this with sentiment-bucketed retrieval:

Retrieve top-k separately from each sentiment bucket (positive / neutral / negative)
Merge all three buckets and feed them to the model together
Each bucket's top_k is independently configurable per strategy

This guarantees that negative signals always make it into the context, regardless of how many positive articles exist.

Hybrid Retrieval: BM25 + Vector + RRF

Within each sentiment bucket, two retrieval legs run in parallel:

Vector leg — cosine similarity over the HNSW index (semantic recall)
BM25 leg — ts_rank_cd over a GIN-indexed tsvector column (keyword precision)

The two ranked lists are fused with Reciprocal Rank Fusion:

RRF(d) = 1/(60 + rank_vector(d)) + 1/(60 + rank_bm25(d))

A document that appears in both lists scores higher than one that dominates only one. No score normalization required. Each strategy can fall back to pure vector retrieval by setting retrieval_mode = "vector".

Multi-Provider LLM Routing

Strategies reference a tier name, not a specific model ID. Swapping the underlying model is a one-line edit in config/settings.toml — no code changes needed.

Tier	Provider	Default Model	Structured Output	Use Case
`local`	Ollama	qwen2.5:14b-instruct	`format: <schema>`	Free, no API key
`nano`	OpenAI	gpt-5.4-nano	`response_format: json_schema`	Low cost
`advanced`	OpenAI	gpt-5	`response_format: json_schema`	High quality, deep analysis
`claude`	Anthropic	claude-sonnet-4-6	`tool_use` + prompt caching	Cost-efficient for long contexts

Each provider exposes structured output differently. The abstraction layer handles the differences so strategy code stays provider-agnostic.

Prompt caching: the claude tier marks its system prompt with cache_control: ephemeral, cutting repeated-call token costs by ~90%. OpenAI automatically caches prompts of 1,024+ tokens at 50% reduction. Every call records input_tokens, output_tokens, cached_tokens, and cost_usd into strategy_results.metadata, visible in the dashboard.

LLM-as-a-Judge Evaluation

Intuition isn't enough to know whether a RAG strategy actually works. Sky Finance includes a built-in evaluation module.

The approach: for the same strategy, generate two reports — one using sentiment-bucketed retrieval, one using plain top-k retrieval. A judge LLM scores each report on three dimensions (0–10 each):

Faithfulness — is the report grounded in the retrieved evidence?
Coverage — does it address the breadth of relevant signals?
Actionability — is it useful for making investment decisions?

The output is a per-ticker score comparison, a Δ value, and an overall win rate (% of tickers where bucketed retrieval beats plain retrieval).

Run via the sky-eval CLI:

# Evaluate all tickers in strategy 1
uv run sky-eval --strategy-id 1

# Run entirely local — no API keys needed
uv run sky-eval --strategy-id 1 --model-tier local --judge-model qwen2.5:14b-instruct

# Use a stronger judge for higher-stakes evaluation
uv run sky-eval --strategy-id 1 --judge-model claude-opus-4-7

Tech Stack

Layer	Technology
Task queue	Celery + Redis
Database	PostgreSQL + pgvector (Docker)
Local LLM	Ollama (qwen2.5:3b / 14b, nomic-embed-text)
Cloud LLM	OpenAI API / Anthropic Claude API
Web dashboard	FastAPI + Jinja2 + HTMX
Data sources	yfinance + Google RSS
Schema management	Alembic
Notifications	Slack Webhook
Process management	honcho (dev) / Docker Compose (prod)
Python tooling	mise + uv

What's Coming Next in This Series

Sky Finance is actively evolving. Planned posts:

#2 Pipeline Deep Dive — how structured output works with a local LLM, why qwen2.5:3b over a larger model, and how the 0.55 similarity threshold was chosen
#3 RAG Design in Detail — sentiment bucketing edge cases, hybrid retrieval tuning, and prompt placeholder design
#4 Multi-Provider LLM Abstraction — unifying Ollama / OpenAI / Claude structured output under one interface, and prompt caching implementation details
#5 Evaluation Framework — how to write an LLM-as-a-judge prompt, how to interpret the results, and what's currently broken

Code is here: github.com/peifengstudio/sky-finance

Sky Finance Series #1 | May 2026

root@wang's server:~#