Shot Caller.
A.I.-driven scoring of financial influencers by realized ROI.
Shot Caller answers a simple question: who actually has good market advice, and who is selling noise? We ingest daily social commentary on assets from financial influencers, classify each recommendation with LLMs, then back-test it against real price movement. The result is a leaderboard of influencers ranked by realized ROI across multiple horizons — and a set of secondary signals for spotting contrarians who are right and crowd-followers who are wrong.
From scrape to score.
- 01
Daily ingestion
Cron · daily
A cron-driven pipeline scrapes Twitter, Reddit, and YouTube transcripts through a rotating proxy pool. Node and Python services run side by side: Node handles API-bound work, Python handles transcript extraction and NLP.
- Twitter API for influencer feeds and replies
- Reddit API for finance-adjacent subreddit threads
- YouTube transcript extraction routed through proxies to evade rate limits and IP bans
- One full sweep per day, written through a queue
- 02
Cheap matcher
Filter
Before any LLM touches the data, a fast string/regex matcher culls the corpus to posts that mention a known asset. We resolve against a curated table of asset IDs and aliases, so Bitcoin and BTC collapse to the same canonical row.
- Curated table of asset IDs with name / ticker / alias variants
- Cuts 90%+ of irrelevant chatter before it can hit an LLM bill
- Collapses surface forms into a single canonical asset id for downstream joins
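A minimal sketch of how such a matcher could work, assuming a small in-memory alias table. The table contents, the `build_matcher` and `match_assets` names, and the optional `$` cashtag prefix are illustrative assumptions, not the production schema.

```python
import re

# Hypothetical alias table: canonical asset id -> known surface forms.
ASSET_ALIASES = {
    "bitcoin": ["bitcoin", "btc"],
    "ethereum": ["ethereum", "eth", "ether"],
    "tesla": ["tesla", "tsla"],
}

def build_matcher(aliases):
    """Compile one case-insensitive regex plus an alias -> canonical id lookup."""
    lookup = {}
    for asset_id, forms in aliases.items():
        for form in forms:
            lookup[form.lower()] = asset_id
    # Longest first so longer aliases are tried before their prefixes.
    ordered = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(a) for a in ordered)
    # Lookarounds stop "eth" from matching inside words like "together".
    pattern = re.compile(
        r"(?<![A-Za-z0-9])\$?(" + alternation + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
    return pattern, lookup

def match_assets(text, pattern, lookup):
    """Return the set of canonical asset ids mentioned in text."""
    return {lookup[m.group(1).lower()] for m in pattern.finditer(text)}

PATTERN, LOOKUP = build_matcher(ASSET_ALIASES)
```

A post whose result set is empty would be dropped before the LLM stage; everything else carries its canonical ids forward for the downstream joins.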
- 03
LLM enrichment
LLM
Surviving posts go through more expensive LLM calls that classify each recommendation into a small set of structured fields: action, sentiment, conviction, and time horizon. The calls run on a BullMQ worker pool backed by Redis, which parallelizes the slow LLM step for throughput. The same queue is already in place to absorb backpressure if ingestion ever outpaces enrichment.
- Action — buy / sell / hold
- Sentiment — bullish / bearish / neutral
- Conviction — scale
- Time horizon — intraday, swing, weeks, months, long-term
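One way to keep these fields trustworthy is to validate every LLM response against a fixed schema before it is stored. The sketch below assumes the model returns a JSON object with these four keys. The 1 to 5 conviction range and the `Recommendation` and `parse_recommendation` names are assumptions for illustration, since the scale itself is not specified here.

```python
import json
from dataclasses import dataclass

ACTIONS = {"buy", "sell", "hold"}
SENTIMENTS = {"bullish", "bearish", "neutral"}
HORIZONS = {"intraday", "swing", "weeks", "months", "long-term"}
CONVICTION_RANGE = (1, 5)  # assumption: the actual scale is not specified

@dataclass(frozen=True)
class Recommendation:
    action: str
    sentiment: str
    conviction: int
    horizon: str

def parse_recommendation(raw: str) -> Recommendation:
    """Parse one LLM response and reject anything outside the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    rec = Recommendation(
        action=str(data["action"]).strip().lower(),
        sentiment=str(data["sentiment"]).strip().lower(),
        conviction=int(data["conviction"]),
        horizon=str(data["horizon"]).strip().lower(),
    )
    if rec.action not in ACTIONS:
        raise ValueError(f"unknown action: {rec.action!r}")
    if rec.sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {rec.sentiment!r}")
    if rec.horizon not in HORIZONS:
        raise ValueError(f"unknown horizon: {rec.horizon!r}")
    low, high = CONVICTION_RANGE
    if not low <= rec.conviction <= high:
        raise ValueError(f"conviction out of range: {rec.conviction}")
    return rec
```

Rejected responses could be retried or routed to review rather than silently coerced into a valid value.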
- 04
Hourly price tracking
Parallel · hourly
Running in parallel with the ingestion pipeline, a separate process records hourly price data for every tracked asset. This is the ground truth that every recommendation is back-tested against.
- Per-asset, per-hour OHLC data
- Continuous — independent of the daily scrape cadence
- The substrate for every ROI calculation downstream
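As an illustration of the per-hour OHLC shape, the sketch below buckets raw price observations into hourly bars. It assumes prices arrive as `(timestamp, price)` pairs. A real feed may already deliver OHLC bars, in which case this step would be unnecessary. The `hourly_ohlc` name is hypothetical.

```python
from datetime import datetime

def hourly_ohlc(ticks):
    """Bucket (datetime, price) pairs into hourly open/high/low/close bars."""
    bars = {}
    for ts, price in sorted(ticks):  # sort so open and close follow time order
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bar = bars.get(hour)
        if bar is None:
            bars[hour] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # latest tick seen so far in this hour
    return bars
```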
- 05
ROI scoring
Serial · after 03
A second process runs serially after enrichment. It joins each structured recommendation against the price series and computes realized ROI at multiple horizons.
- Trailing ROI (since the call was made)
- 7-day ROI
- 30-day ROI
- Action and time-horizon from stage 03 drive how each ROI is interpreted
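The join against the price series can be sketched as follows, assuming hourly closes stored as a time-sorted list of `(datetime, close)` pairs. Treating a sell call as a short position (the sign is flipped) and scoring hold like buy are assumptions made for this example. `close_at` and `realized_roi` are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def close_at(series, ts):
    """Latest close at or before ts, or None if ts precedes the series."""
    times = [t for t, _ in series]
    i = bisect_right(times, ts)
    return series[i - 1][1] if i else None

def realized_roi(series, call_time, action, horizon=None):
    """ROI from the call to call_time + horizon; horizon=None means trailing."""
    if not series:
        return None
    entry = close_at(series, call_time)
    if horizon is None:
        exit_price = series[-1][1]  # trailing: most recent bar
    else:
        exit_ts = call_time + horizon
        if series[-1][0] < exit_ts:
            return None  # horizon has not elapsed yet
        exit_price = close_at(series, exit_ts)
    if entry is None or exit_price is None:
        return None
    roi = (exit_price - entry) / entry
    # Assumption: a sell call profits when the price falls.
    return -roi if action == "sell" else roi
```

For a call at `t`, the three horizons would then be `realized_roi(series, t, action)`, `realized_roi(series, t, action, timedelta(days=7))`, and `realized_roi(series, t, action, timedelta(days=30))`.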
- 06
Automated content generation: newsletter, social posts
Distribution
Once influencers are scored, the data flows into the editorial side: auto-generated leaderboards, a daily newsletter, and shareable social infographics, all published from the same underlying data without manual editorial intervention.
- Influencer ROI Ratings: Continuously updated leaderboard surfacing who is consistently positive and at which horizon
- Newsletter Generator: Daily email digest auto-generated from the day's mined data — top movers, who called it correctly, who missed badly — drafted into a publish-ready brief
- Social Infographics: Instagram and social-channel infographics auto-generated from the same data — leaderboards, contrarian wins, weekly callouts — formatted for each channel's aspect ratio and voice
- 07
Evals & self-tuning
Meta · continuous
Every step that touches an LLM is wrapped in evals so the pipeline holds up over time. Outputs are graded by judge models, disagreements get flagged for review, and the prompts themselves adapt based on what's actually working in production.
- LLM-as-Judge Ensemble: Multiple judge models grade each enrichment independently. Disagreements between judges flag the underlying call for review — consensus is the green light to advance
- Self-Optimizing Prompts: Prompts update themselves based on which variants outperform on a held-out set. New candidates compete against the incumbent and only get promoted when they win on the eval suite
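Both gates reduce to small decision rules. The sketch below assumes each judge returns a categorical verdict and each prompt variant has per-example scores on the held-out set. The unanimity default, the mean-score comparison, and the function names are assumptions rather than the production logic.

```python
from collections import Counter
from statistics import mean

def judge_consensus(verdicts, min_agreement=1.0):
    """Return (majority_label, advance) for a list of judge verdicts.

    advance is True only when the share of judges agreeing with the
    majority label reaches min_agreement; otherwise flag for review.
    """
    if not verdicts:
        return None, False
    label, count = Counter(verdicts).most_common(1)[0]
    return label, count / len(verdicts) >= min_agreement

def should_promote(incumbent_scores, candidate_scores, margin=0.0):
    """Promote a candidate prompt only if it beats the incumbent's mean score."""
    return mean(candidate_scores) > mean(incumbent_scores) + margin
```

A nonzero `margin` would guard against promoting a candidate on noise alone when the held-out set is small.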
Who is predictably right?
Once every call has a back-tested ROI, we roll up to the influencer level. Averaging each influencer's ROIs at every horizon separates one-off lucky calls from people who are predictably positive over time. A second tier of signals catches the things naive leaderboards miss.
Average ROI by horizon
Trailing, 7-day, and 30-day ROIs averaged per influencer — surfacing who is consistently positive at which timescale.
Hit rate
Correct calls divided by total calls. A coarse but honest measure of how often someone is actually right.
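The two roll-ups above can be sketched together, assuming each scored call is a dict with an `influencer` and a `roi` mapping from horizon label to ROI (or `None` when the horizon has not elapsed). Counting a call as correct when its ROI at one chosen horizon is positive is an assumption. The `rollup` name and the horizon labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def rollup(calls, hit_horizon="7d"):
    """Per-influencer average ROI by horizon plus hit rate at hit_horizon."""
    by_influencer = defaultdict(list)
    for call in calls:
        by_influencer[call["influencer"]].append(call["roi"])

    result = {}
    for name, rois in by_influencer.items():
        horizons = sorted({h for r in rois for h in r})
        avg = {}
        for h in horizons:
            values = [r[h] for r in rois if r.get(h) is not None]
            avg[h] = mean(values) if values else None
        scored = [r[hit_horizon] for r in rois if r.get(hit_horizon) is not None]
        # Assumption: a call is "correct" when its ROI is strictly positive.
        hit_rate = sum(v > 0 for v in scored) / len(scored) if scored else None
        result[name] = {"avg_roi": avg, "hit_rate": hit_rate}
    return result
```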
Contrarian correctness
Calls that ran against the crowd consensus at the time and still resolved positive. A premium signal — these are the people worth listening to when everyone else is wrong.
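A minimal sketch of this signal, assuming crowd consensus is the strict-majority action among other influencers' calls on the same asset around the same time, and that each call's `roi` is already sign-adjusted for its action. The field names and the majority rule are assumptions.

```python
from collections import Counter

def crowd_action(peer_actions):
    """Strict-majority action among peer calls, or None if there is none."""
    if not peer_actions:
        return None
    action, count = Counter(peer_actions).most_common(1)[0]
    return action if count > len(peer_actions) / 2 else None

def contrarian_correctness(calls):
    """Share of against-the-crowd calls that resolved with positive ROI."""
    wins = total = 0
    for call in calls:
        crowd = crowd_action(call["peer_actions"])
        if crowd is None or call["action"] == crowd or call["roi"] is None:
            continue  # no clear crowd, not contrarian, or not yet scored
        total += 1
        wins += call["roi"] > 0
    return wins / total if total else None
```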
Language sentiment
Affect of the language tracked separately from the structured recommendation. Sentiment that diverges from the call is itself a signal.
Restraint during crashes
Absence of buy calls during sharp downturns. Not buying into the crash is a form of being right — and a feature most leaderboards miss entirely.
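One possible way to quantify restraint, assuming a crash hour is any hourly bar whose close sits at least 10% below the close 24 bars earlier. Both thresholds, the score definition, and the function names are illustrative assumptions. Calls are identified here by the index of the hourly bar in which they were made.

```python
def crash_hours(closes, lookback=24, drop=0.10):
    """Indices of bars whose close is at least `drop` below the close `lookback` bars earlier."""
    return {
        i
        for i in range(lookback, len(closes))
        if closes[i] <= closes[i - lookback] * (1 - drop)
    }

def restraint_score(closes, buy_call_hours, lookback=24, drop=0.10):
    """Share of crash hours with no buy call; None when no crash occurred."""
    crashes = crash_hours(closes, lookback, drop)
    if not crashes:
        return None
    return 1 - len(crashes & set(buy_call_hours)) / len(crashes)
```

A score of 1.0 means the influencer made no buy calls in any crash hour, and lower values mean more buying into the downturn.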
Want the rest of the story?
The non-technical write-up lives on the work page. If you're building something in this space or want to dig deeper into any of the stages above, get in touch.