Teaching AI to Optimize AI Models for Edge Deployment
How our agent, Möbius, automated a Core ML port in ~12 h (vs. 2 weeks), hit 0.99998 parity, and made it 3.5× faster, all while staying on the CPU.
Sep 27 • Brandon

August 2025

Near-Real-Time Speaker Diarization on CoreML
Lessons learned converting PyTorch models to run locally at zero marginal cost on Apple's Neural Engine
Aug 1 • Alex Weng, Bharat Jain, and Brandon

July 2025

Bringing State-of-the-Art AI Models to Intel® NPUs
Partnering with Intel to optimize whisper-large-v3-turbo, qwen-3, and phi-4-mini for NPU acceleration on Intel® AI PCs
Jul 29 • Alex Weng and Bharat Jain
Where are the local AI apps?
Millions build AI apps with natural language, but local AI deployment remains complex. Why?
Jul 5 • Brandon

June 2025

How are we going to get Intelligence everywhere?
Software took off the moment code became pure logic: portable bytes that ignored the underlying silicon. How will we take the next step for AI?
Jun 19 • Brandon
© 2025 Fluid Inference