William and Inigo here - Co-Founders at Luel (YC W26).
Luel is a rights-cleared data marketplace + collection engine. We differentiate on speed and edge-case coverage: AI enterprises request datasets to spec, we mobilize a global contributor network, and we deliver licensed, audit-ready data within days.
Frontier labs have hit a wall: public web data is tapped out, synthetic-only pipelines risk degeneration, and the next generation of models needs rights-cleared multimodal data that doesn’t exist at scale.
Companies are spending huge budgets on generic, low-signal datasets from legacy vendors
The internet’s “easy data” is largely exhausted; what’s left is low-signal, repetitive, or messy
Most datasets fail production requirements: unclear rights, weak provenance, missing consent, inconsistent metadata
The Solution
Luel delivers to-spec multimodal datasets with clean provenance:
Custom collections: you specify exactly what you need; we scope, recruit, QA, and deliver
Off-the-shelf licensing: completed collections become ready-to-license catalogue datasets (ranging from patient-doctor conversations in South Asia to gemstone manufacturing footage for robotics)
Rights trail included: built for procurement + compliance from day one (consent evidence, chain-of-title, QA logs)
How it works
AI teams submit a dataset spec (modality, scenario, instructions, devices, QA rules)
We post a listing and instantly match vetted contributors
Submissions run through multi-stage QA and are delivered within days
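To make the first step concrete, here is a minimal sketch of what a dataset spec could look like as a structured record. This is purely illustrative and not Luel's actual submission format: the `DatasetSpec` class, its field names, the `missing_fields` helper, and the example values are all assumptions, derived only from the five spec elements listed above (modality, scenario, instructions, devices, QA rules).

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSpec:
    """Hypothetical dataset spec; field names are assumptions, not Luel's schema."""
    modality: str                      # e.g. "video", "audio", "image"
    scenario: str                      # what should be captured
    instructions: str                  # guidance shown to contributors
    devices: list[str] = field(default_factory=list)   # accepted capture devices
    qa_rules: list[str] = field(default_factory=list)  # checks applied during QA

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that were left empty."""
        required = {
            "modality": self.modality,
            "scenario": self.scenario,
            "instructions": self.instructions,
        }
        return [name for name, value in required.items() if not value.strip()]


# Example values are made up for illustration.
spec = DatasetSpec(
    modality="video",
    scenario="gemstone manufacturing footage for robotics",
    instructions="Record each cutting step from a fixed overhead angle.",
    devices=["smartphone"],
    qa_rules=["minimum 1080p resolution", "consent form attached"],
)
```

Writing the spec as a typed record means an incomplete request can be caught before a listing is posted; `spec.missing_fields()` returns an empty list when the three text fields are filled in.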
The Team
William Namgyal - CEO
2x Founding Engineer, Berkeley M.E.T. Dropout
Our ask
If you’re training multimodal / robotics / speech models and need data, we’d love to talk. Intros to heads of data, applied research, or model training teams = 🙏