Kernel Inference Platform Engineer | Together AI Global Negotiation Guide
Negotiation DNA: Pre-IPO Equity + Base + Bonus | AI Inference & Optimization | 400% YoY Growth | $3.3B Valuation | SIGNATURE ROLE | +25-35% Kernel Premium
| Region | Base Salary | Equity (Options/RSU est.) | Bonus | Total Comp |
|---|---|---|---|---|
| San Francisco | $215K–$285K | $75K–$125K | 15–20% | $275K–$388K |
| Seattle | $204K–$271K | $71K–$119K | 15–20% | $261K–$369K |
| Remote US | $193K–$256K | $68K–$112K | 15–20% | $248K–$349K |
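The Total Comp column can be sanity-checked against the other three columns. The sketch below assumes, because the table does not say, that the equity figure is an annualized value and that the bonus is a percentage of base salary; `face_value_total` is a hypothetical helper name. Under those assumptions the simple sum for each region comes out above the quoted Total Comp range, which suggests that column discounts the pre-IPO equity or follows some other convention the table does not state.

```python
# Face-value total comp per region, in thousands of USD.
# Assumptions (not stated in the table): equity is an annualized figure,
# and the bonus percentage applies to base salary.

REGIONS = {
    # region: (base_low, base_high, equity_low, equity_high)
    "San Francisco": (215, 285, 75, 125),
    "Seattle": (204, 271, 71, 119),
    "Remote US": (193, 256, 68, 112),
}
BONUS_LOW, BONUS_HIGH = 0.15, 0.20


def face_value_total(base, equity, bonus_rate):
    """Base + annual equity + bonus, with bonus as a fraction of base."""
    return base + equity + base * bonus_rate


for region, (b_lo, b_hi, e_lo, e_hi) in REGIONS.items():
    low = face_value_total(b_lo, e_lo, BONUS_LOW)
    high = face_value_total(b_hi, e_hi, BONUS_HIGH)
    print(f"{region}: ${low:.0f}K to ${high:.0f}K")
```

When a recruiter quotes a single total comp number, asking which of these conventions it uses (face-value equity, discounted equity, target or actual bonus) makes it comparable with other offers.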
Negotiation DNA
The Kernel Inference Platform Engineer is Together AI's SIGNATURE ROLE: the engineer who writes, optimizes, and ships the kernel code behind Together Kernel, the proprietary inference optimization framework built on FlashAttention. This is not a supporting role; it is the role the company's technical moat depends on. You write CUDA kernels, optimize GPU memory hierarchies, implement custom attention mechanisms, and push inference performance across major GPU architectures. Together AI's 400% YoY revenue growth and $3.3B valuation both trace back to work done in this role. FlashAttention itself was introduced in academic work by Tri Dao and co-authors (Dao et al., 2022); Dao is now Together AI's Chief Scientist, so the labs that rely on FlashAttention are using code whose original author works at the company you would join. (Source: Together AI founding team profiles; FlashAttention paper and citation history, Dao et al. 2022; Series B coverage, The Information 2025.)
Level Mapping: Together AI Kernel Inference Platform Engineer ≈ Google L5–L7 Systems Engineer · Meta E5–E7 Infrastructure Engineer · NVIDIA Senior/Staff CUDA Engineer · OpenAI IC3–IC5 Infrastructure · Anthropic Senior/Staff Infrastructure Engineer · AMD Senior Kernel Developer
Together Kernel — The FlashAttention Premium
FlashAttention, created by Tri Dao (now Together AI's Chief Scientist) and co-authors, is one of the most influential algorithmic optimizations in modern AI and is widely used across major AI labs. Together Kernel is the proprietary inference optimization framework that builds on it, and as a Kernel Inference Platform Engineer, you ARE Together Kernel. You write the code. You optimize the memory access patterns. You implement the custom CUDA kernels behind the 2–5x transformer inference speedups claimed over the competition. With 400% YoY revenue growth and a $3.3B valuation, Together AI's business model is built on the code you produce.
This is the ultimate Kernel Premium role, justifying a 25–35% Kernel Innovation premium over comparable systems engineering positions. There are perhaps 200–500 engineers on Earth who can do this work at this level — you are in one of the most exclusive talent pools in all of technology.
Frame your negotiation with maximum conviction: "I build the optimization layer that makes AI inference fast and cheap everywhere — the foundational infrastructure of AI deployment. FlashAttention is used by every major AI lab. Together Kernel is the productized, next-generation evolution. I am the engineer who writes this code. My compensation should reflect that I'm building the most consequential infrastructure in AI."
The 25–35% Kernel Premium is not aspirational; it reflects the market. NVIDIA pays $400K+ in total compensation for senior CUDA engineers, and Google pays $450K+ for L6 systems engineers. Together AI's pre-IPO equity upside must bridge that gap, and the Kernel Premium keeps base and total comp competitive with these liquid alternatives.
Global Levers
- Extreme Talent Scarcity: "There are fewer than 500 engineers globally who can write production CUDA kernels for transformer inference optimization. I'm one of them. The supply of this skill isn't just tight; it's nearly nonexistent. There is no hiring pipeline; there is only poaching."
- Pre-IPO Equity Maximum: "This is the SIGNATURE role at Together AI. My equity grant should be in the top 1% of the company's IC grants. I want $120K–$125K per year, valued at the current 409A price, with double-trigger acceleration and no more than a one-year cliff."
- FlashAttention Successor: "I'd be joining the company whose Chief Scientist created FlashAttention, one of the most cited and most widely adopted algorithmic optimizations in modern AI. My work on Together Kernel will define the next generation. That pedigree and impact justify the maximum Kernel Premium of 35%."
- Revenue Engine: "Together AI's inference platform revenue is tied closely to kernel performance. Every microsecond I shave off inference latency lowers the cost of serving each request, and that margin improvement compounds across millions of API calls. I am the revenue engine, not a cost center."
Negotiate Up Strategy: "This is the role I was made for, and Together AI is the only place to do it. I am holding competing offers: $380K TC from NVIDIA for a Senior CUDA Engineer role, and $400K TC from [Google/Meta] for an L6 systems engineering position. To accept Together AI, I need a base of $280K, equity valued at $120K/year at current 409A with double-trigger acceleration, and a 20% bonus target — bringing my total comp to approximately $385K. My accept-at floor is $350K TC. Below that, the delta against my liquid NVIDIA/FAANG offers is too large, even accounting for pre-IPO upside at $3.3B valuation. I also require: a $50K sign-on bonus to bridge year-one vesting, written confirmation of my scope as a core contributor to Together Kernel, publication rights for non-proprietary kernel optimization research, and a dedicated GPU compute allocation for kernel development experimentation. I want to build the future of inference here — make the offer match the ambition."
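The $385K total in the script is lower than the face-value sum of its own components ($280K base, $120K/year equity, 20% bonus). One way to read the gap, offered here as an assumption and not as the guide's stated method, is that the illiquid pre-IPO equity is being counted at a discount. The sketch below solves for the equity weight that reconciles the numbers; `implied_equity_weight` and `risk_adjusted_total` are hypothetical helper names, and the bonus is assumed to be a fraction of base.

```python
# Implied weight on pre-IPO equity in the counter-offer, in thousands of USD.
# Assumption: bonus is a fraction of base. The "equity weight" reading is a
# hypothetical interpretation, not something the guide states.


def implied_equity_weight(base, equity, bonus_rate, stated_total):
    """Fraction of face-value equity needed for the parts to sum to stated_total."""
    cash = base + base * bonus_rate
    return (stated_total - cash) / equity


def risk_adjusted_total(base, equity, bonus_rate, equity_weight):
    """Total comp with equity counted at equity_weight of its face value."""
    return base + base * bonus_rate + equity * equity_weight


ask_base, ask_equity, ask_bonus = 280, 120, 0.20
weight = implied_equity_weight(ask_base, ask_equity, ask_bonus, stated_total=385)
face_value = risk_adjusted_total(ask_base, ask_equity, ask_bonus, equity_weight=1.0)

print(f"cash (base + bonus): ${ask_base * (1 + ask_bonus):.0f}K")
print(f"face-value total:    ${face_value:.0f}K")
print(f"implied equity weight for $385K: {weight:.2f}")
```

If you discount equity this way, say so explicitly when comparing against the NVIDIA or Google/Meta offers, since their liquid RSUs would normally be counted at full value.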
Evidence & Sources
- Together AI $3.3B valuation, 400% YoY revenue growth, SIGNATURE kernel engineering role (The Information, Forbes, TechCrunch 2025)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al., 2022; 5,000+ citations on Google Scholar)
- FlashAttention-2 and FlashAttention-3 successive performance improvements (arXiv, 2023–2024)
- NVIDIA CUDA engineer and Google/Meta systems engineer compensation benchmarks (Levels.fyi, 2025–2026)
- Together Kernel inference benchmarks vs. competitors (Together AI technical blog, MLPerf results, 2025)
- Kernel and systems engineer talent scarcity analysis (Rethink AI talent report, 2025)