Ishiki Labs

Building the Future of Multimodal AI

Current multimodal models can see and hear, but they talk when they shouldn't. They can't tell whether you're speaking to them or to someone else. We are building an AI that knows when to stay silent while still understanding what's going on in your conversation, so it can assist in real time the moment you actually need it. Our first version, fern-0.1, provides real-time expert opinions on demand, instant task delegation, and zero interruptions, all as fast as ChatGPT Voice and Gemini Live.
Active Founders
Robert Xu
Founder
Co-founder & CTO of Ishiki Labs (W26). Previously worked on multimodal AI and Orion AR glasses at Meta, and on research infrastructure at Citadel Securities.
Amit Yadav
Founder
Co-founder and CEO of Ishiki Labs (W26). Previously a Research Scientist at Meta, first on the Llama team training multimodal LLMs, then in Reality Labs training a video assistant for smart glasses. PhD from Purdue University with 20+ publications at top conferences such as CVPR, NeurIPS, and ICASSP.
Hear from the founders

How did your company get started? (i.e., How did the founders meet? How did you come up with the idea? How did you decide to be a founder?)

Making multimodal AI truly human-like is hard. We are a highly technical team with deep research and systems experience building multimodal assistants. Amit has a PhD in AI, worked as an AI research scientist at Meta, and trained a multimodal assistant for smart glasses. He also did research with the Meta Superintelligence Lab training multimodal LLMs, and has 20+ publications at top AI conferences including NeurIPS, CVPR, and ICASSP. For the last 4 years, Robert has been building advanced orchestration systems for running multimodal assistants on smart glasses, optimizing for latency and compute. Before Meta, Robert developed research infrastructure at Citadel Securities.

What is your long-term vision? If you truly succeed, what will be different about the world?

The world will have a human-like multimodal AI assistant. The simplest way to put it: if an expert human watching a video stream can help, we're building the AI that can too.

Ishiki Labs
Founded: 2025
Batch: Winter 2026
Team Size: 2
Status: Active
Primary Partner: Gustaf Alstromer