Drug intelligence for people who need the answer — not a subscription to a bloated terminal.
// full-stack US pharma data, one REST API, one sign-up
Every FDA dataset, every international regulator we can legally redistribute, every clinical trial registry, every major open scientific database — normalized into one instance with a documented Probability-of-Approval engine and a Claude-powered AI analyst that runs real SQL on top. Every data point commercially-free and traceable to its source row.
The terminals are priced for investment banks and mega-pharma. The onboarding takes twelve months and a committee. Every feature is a separate SKU. Meanwhile, the people who actually need this data every day — clinical researchers, biotech analysts, medical-affairs teams, diligence consultants, IR — either get priced out or end up stuck with ten browser tabs and an FDA.gov bookmark.
pharmasearch.tools is the opposite bet. One self-serve sign-up. Full-stack US pharma intelligence. An API that covers every public data point we can legally redistribute, an engine that predicts approval probability with documented backtest accuracy, and an AI agent that does the analyst-intern work in seconds. Built entirely on commercially-free data — no paid subscriptions to scrape, no licensing landmines to inherit, every row attributable to its source.
Each one alone would be a product. Together they’re the difference between “another database” and “the answer to the question you actually have.”
Every FDA dataset, every international regulator we can legally redistribute, every clinical trial registry, every major open scientific database — normalized into one Postgres instance with a unified REST API in front. Write one query, get patent expiration dates. Write another, get every Phase 1/2/3/4 trial ever registered for a drug. A safety analyst traces one FAERS mention back through the label, the Drugs@FDA review, the CRL history, and the sponsor’s SEC filings — in one query.
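What "write one query" could look like from the client side, as a minimal sketch. The base URL, the `patents` and `trials` resource names, and the `drug` and `phase` parameters are all assumptions made for illustration; the real paths live in the API docs. The sketch only builds request URLs and sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical base URL and resource names, for illustration only.
# The ".invalid" host is deliberately unroutable.
BASE_URL = "https://api.example.invalid/v1"


def build_query(resource: str, **filters: str) -> str:
    """Return a GET URL for one resource, with filters sorted for stable output."""
    query = urlencode(sorted(filters.items()))
    return f"{BASE_URL}/{resource}?{query}" if query else f"{BASE_URL}/{resource}"


# One query for patent expirations, another for registered trials.
patents_url = build_query("patents", drug="atorvastatin")
trials_url = build_query("trials", drug="atorvastatin", phase="3")
```

The point of the shape: the same drug identifier keys every resource, so moving from patents to trials is a change of path, not a change of vendor.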
The flagship. Give it a drug name, indication, and phase. It returns a probability that the drug will win FDA approval, backed by a documented, testable methodology. Backtested on historical cases, its accuracy, calibration, and ranking performance are comparable to or better than hand-curated expert consensus, while remaining fully reproducible and fully traceable to source rows. Every score emits a provenance record. Click through to the row in one tap.
FDA’s REMS (Risk Evaluation and Mitigation Strategies) data is scattered across PDFs, change-log CSVs, and separate ETASU requirements documents. We ingested the whole thing and structured it — live program registry, Elements To Assure Safe Use (patient agreements, provider certifications, pharmacy registrations, lab monitoring), modification history, enforcement actions, patient-burden estimates, and ML-generated REMS predictions for pipeline drugs. Encoded into PoA scoring on both sides of the ledger.
The AI endpoints expose a Claude-powered analyst with full read access to the database. Not a RAG system over documents — an agent that runs real SQL, joins tables across sources, reads FDA Review PDFs, summarizes FAERS adverse-event patterns, compares trial protocols, and cites its sources with row IDs. Every call goes against fresh DB state. When it says “the FDA rejected this class in 2019 per CRL,” there’s a row ID.
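To make "runs real SQL and cites row IDs" concrete, here is a small sketch of the pattern. It uses an in-memory SQLite database as a stand-in for the Postgres instance, and the `crl` and `faers` tables, their columns, and every row in them are synthetic placeholders, not the platform's schema or data.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; table names, columns, and rows
# are hypothetical and synthetic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crl (id INTEGER PRIMARY KEY, drug TEXT, drug_class TEXT, year INTEGER);
    CREATE TABLE faers (id INTEGER PRIMARY KEY, drug TEXT, reaction TEXT);
    INSERT INTO crl VALUES (101, 'drug_a', 'class_x', 2019);
    INSERT INTO faers VALUES (7001, 'drug_a', 'hepatotoxicity'),
                             (7002, 'drug_b', 'nausea');
""")


def grounded_claims(drug_class: str) -> list[dict]:
    """Join CRLs to adverse-event rows and keep both row IDs as citations."""
    rows = conn.execute(
        """
        SELECT c.id, f.id, c.drug, c.year, f.reaction
        FROM crl AS c JOIN faers AS f ON f.drug = c.drug
        WHERE c.drug_class = ?
        ORDER BY c.id, f.id
        """,
        (drug_class,),
    ).fetchall()
    return [
        {
            "claim": f"{drug} received a CRL in {year}; FAERS reports {reaction}",
            "sources": [f"crl:{crl_id}", f"faers:{faers_id}"],
        }
        for crl_id, faers_id, drug, year, reaction in rows
    ]


claims = grounded_claims("class_x")
```

Each claim carries the primary keys of the rows it was built from, so a reader can check the statement against the table instead of trusting the summary.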
Real SQL against fresh DB state. Tool calls visible. Every claim grounded in a primary source. This is a live sample — what happens when you ask the agent a drug-safety question.
Every source normalized into the same instance, queryable through one API, traceable back to its primary URL with a row ID. No paid subscriptions anywhere in this list.
Every PoA score is the output of three documented scoring rules run in sequence, blended against historical class-failure multipliers, and emitted with a full provenance record. Here’s what runs under the hood — and what a live forecast looks like.
Walks the OpenTargets knowledge graph (drug → target → indication → approved-analog cohort) and classifies evidence into five tiers. Each tier carries a tier-implied probability.
Computes historical success rates for the matched cohort from our ClinicalTrials.gov terminal-trial universe, blended with published Phase Transition Success Rate base rates using a documented cascade discount.
Rewards drugs whose analog cohort has a track record. Punishes drugs whose peers die in Phase 3.
Scores four dimensions — severity of condition, unmet medical need, benefit, risk — across every relevant source: FAERS adverse events, boxed-warning history, EMA EPAR benefit-risk sections, FDA Review docs, Health Canada recalls, IQWiG HTA ratings.
Real sources. Real weights. No vibes.
Negative-adjustment multipliers fire when historical evidence warrants: known class failures (Alzheimer’s amyloid graveyard, CETP inhibitors, ion-channel antiarrhythmics), CRLs for the same mechanism/indication, the subject drug’s own Phase 3 failure history, post-market boxed-warning additions in the analog cohort.
The engine emits a structured provenance record showing exactly which rule fired, which sources contributed, and which drugs in the analog cohort drove the probability. You can click through to the source row in one tap. No black box. No “trust us.”
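The flow described above (rule outputs, a blend, negative-adjustment multipliers, a provenance record) can be sketched roughly as follows. This is not the engine's actual formula: the weighted average, the rule labels `mechanism_tier`, `cohort_history`, and `benefit_risk`, the weights, and the multiplier value are all assumptions chosen to show the shape of the computation.

```python
from dataclasses import dataclass, field


@dataclass
class RuleResult:
    rule: str            # which scoring rule produced this value
    probability: float   # rule-implied probability in [0, 1]
    sources: list[str]   # row identifiers that contributed


@dataclass
class Forecast:
    probability: float
    provenance: dict = field(default_factory=dict)


def score(results: list[RuleResult], weights: dict[str, float],
          penalties: dict[str, float]) -> Forecast:
    """Blend rule outputs by weight, then apply negative-adjustment multipliers."""
    total = sum(weights[r.rule] for r in results)
    blended = sum(weights[r.rule] * r.probability for r in results) / total
    adjusted = blended
    for multiplier in penalties.values():
        adjusted *= multiplier  # assumed to lie in (0, 1], so it can only lower the score
    return Forecast(
        probability=round(adjusted, 4),
        provenance={
            "rules": {r.rule: {"probability": r.probability, "sources": r.sources}
                      for r in results},
            "blended": round(blended, 4),
            "penalties": penalties,
        },
    )


# Hypothetical inputs: three rule outputs and one class-failure multiplier.
forecast = score(
    [
        RuleResult("mechanism_tier", 0.60, ["opentargets:1"]),
        RuleResult("cohort_history", 0.40, ["ctgov:2", "ctgov:3"]),
        RuleResult("benefit_risk", 0.50, ["faers:4"]),
    ],
    weights={"mechanism_tier": 0.4, "cohort_history": 0.4, "benefit_risk": 0.2},
    penalties={"class_failure": 0.8},
)
```

With these made-up inputs the blend comes to 0.50 and the class-failure multiplier pulls it down to 0.40. The provenance dictionary keeps the per-rule values, the contributing row IDs, and the penalties that fired, which is what makes a score auditable after the fact.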
Backtested on historical cases, accuracy, calibration, and ranking performance are comparable to or better than hand-curated expert consensus — while being fully reproducible and fully traceable to source rows.
We don’t ingest DrugBank’s commercial subscription content. We don’t scrape Citeline, BiomedTracker, DealForma, Evaluate Pharma, or any paid database. Every source is listed with its current license, commercial status, share-alike flags, and attribution requirements — published at /docs/DATA_LICENSING.md.
Every endpoint has a local answer. Live upstream calls only happen when someone explicitly asks for fresh data. The architecture is built to stay available when half the public data providers are not.
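A local-first read path of this kind might look like the sketch below. The class name, the `fresh` flag, the `served_from` field, and the fallback to the local copy when upstream fails are assumptions for illustration, not the platform's actual implementation.

```python
from typing import Callable, Optional


class LocalFirstStore:
    """Serve from a local copy; hit upstream only on an explicit fresh request."""

    def __init__(self, local: dict[str, dict],
                 fetch_upstream: Callable[[str], dict]):
        self.local = local
        self.fetch_upstream = fetch_upstream

    def get(self, key: str, fresh: bool = False) -> Optional[dict]:
        if fresh:
            try:
                record = self.fetch_upstream(key)
                self.local[key] = record  # refresh the local copy
                return {**record, "served_from": "upstream"}
            except OSError:
                pass  # upstream is down: fall through to the local answer
        record = self.local.get(key)
        return {**record, "served_from": "local"} if record is not None else None


def offline_upstream(key: str) -> dict:
    """Simulate a public data provider that is currently unreachable."""
    raise ConnectionError("upstream unavailable")


store = LocalFirstStore({"label:1": {"title": "example label"}}, offline_upstream)
default_read = store.get("label:1")
fresh_read = store.get("label:1", fresh=True)
```

Both reads return the local record here: the default read never touches upstream, and the explicit fresh read degrades to the local answer because the simulated provider is down.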
We’re not trying to displace an enterprise RFP cycle. We’re giving solo consultants, small biotechs, research labs, and teams that can’t justify a six-figure seat the same answers, with receipts.
The platform is already powering real queries. These are the next items landing; users see each improvement as it goes live.
Scraping more Drugs@FDA Summary Review PDFs with on-demand Benefit-Risk Framework parsing.
To extract actual vote tallies — the FDA’s strongest public predictor of approval.
Rate-limited by EMA’s servers but climbing steadily.
Wiring PubMed research velocity into PoA as a positive signal for active investigation.
Expanding beyond the modeled class graveyards (Alzheimer’s amyloid, CETP, ion-channel antiarrhythmics) to cover more historical class failures.
The current archive is ingested; the retired pre-2015 setids are the next target.
Polling today; push-based delivery is on the roadmap.
Compare your trial design against every historical analog in seconds. Pull every terminal trial in your indication in one query.
Validate a deal thesis with real FDA, CMS, EMA, and clinical data. Trace every asset back to its primary row.
Prepare scientific responses with citable primary sources. Every response grounded in a row ID you can reference.
Cover sponsor pipelines with PoA scores you can defend to the committee — not proprietary analyst opinions.
Can’t justify a six-figure enterprise seat but still needs the answers. One sign-up, full corpus, one plan.
Track FDA action patterns by committee, reviewer, or drug class. Query the whole CRL transparency dataset in one line.
A commercially-licensed source of the same data you’d otherwise pull piecemeal from a dozen APIs and a scraper.
Want to know which of a sponsor’s portfolio carries ETASU burden? One query. When was a REMS modified and what changed? In modification history.
If you spend any meaningful amount of time looking up drug labels, chasing down trial registrations, trying to reconstruct an FDA Review, or estimating the odds that a pipeline asset crosses the approval line — this is for you.