DeepInfra

DeepInfra, founded in 2022 and based in Palo Alto, runs a dedicated inference cloud that lets developers deploy open-source and agent-driven AI models in production without managing GPUs directly. The company offers a serverless inference API spanning text generation, image generation, speech recognition, and embeddings, alongside dedicated GPU instances, and reports processing nearly five trillion tokens per week. It was founded by CEO Nikola Borisov, Georgios Papoutsis, and Yessenzhar Kanapin, a team that previously built and operated distributed systems at global scale, including the imo messenger. In May 2026 DeepInfra closed a $107 million Series B co-led by 500 Global and Georges Harik, with NVIDIA, Supermicro, Samsung Next, Felicis, and others participating, bringing total funding to roughly $133 million.

DeepInfra is a fast-growing, NVIDIA-backed inference specialist that undercuts hyperscaler pricing for open-model serving, sitting alongside Together AI and Fireworks AI in the open-model inference cloud tier.

deepinfra.com · AI / GPU Cloud

United States · Private

At a glance

Total raised: $133M (last round: Series B)

Headcount: 11-50 (approximate · public range)

Headquarters: Palo Alto, United States

Founded: 2022 (4 years active)

Status: Private

Subsector: AI / GPU Cloud

AI bull / bear

Bull case

DeepInfra is the capital-efficiency outlier in a category defined by capital incineration. On roughly $133M of lifetime funding — capped by a $107M Series B in May 2026 co-led by 500 Global and Georges Harik, with NVIDIA, Supermicro, and Samsung Next participating — it processes nearly five trillion tokens per week, 25x growth in about a year, with revenue reportedly tripling since the start of 2026. The structural difference from rivals is that DeepInfra owns its hardware: eight US data centers running its own fleet, which is why it can be the price leader in open-model inference and still keep real gross margins while venture-subsidized competitors rent capacity at cloud markups. The founding team built and operated imo messenger at hundreds-of-millions-of-users scale with a skeleton crew, and it shows — the company runs planet-scale serving with a headcount in the dozens. Over 30% of platform tokens now come from autonomous agents, the steepest demand curve in software, and agents are relentlessly price-sensitive buyers who route to the cheapest reliable endpoint: exactly DeepInfra's design point. In a market converging on commodity economics, the lowest-cost producer with the leanest cost structure is the one that survives the price war everyone else is dreading.

Bear case

DeepInfra's strategy is the price floor, and living at the floor is a treadmill, not a moat. Winning share by being cheapest in a commoditizing market invites a war of attrition against rivals holding 10-100x its capital: Baseten just closed $1.5B at up to $13B, Fireworks is discussing $15B, and Together is capitalized in the billions. Any of them can price below cost longer than a company with $133M of lifetime funding can price above it. Owning hardware cuts both ways: it grants margin today and saddles the balance sheet with depreciation risk tomorrow, when Blackwell-era fleets meet Rubin-era price-performance and eight owned data centers become an anchor rather than an edge. Five trillion tokens a week is throughput, not economics: at floor prices, token volume converts to revenue at the worst rate in the industry, and the agent traffic driving growth is disloyal by construction, one OpenRouter re-rank away from leaving. The dozens-strong team that makes the cost structure lean also caps the enterprise motion: there is no army of solutions engineers, and none of the compliance certifications or dedicated support that CIO-budget deals demand. The absence of a disclosed valuation suggests a modest one. Efficient followers in commodity markets get acquired or squeezed; they rarely get big.

Generated by editorial-sourced · July 1, 2026 · Refreshes quarterly · see disclosures