Engineering Blog
Deep dives into AI infrastructure, GPU performance benchmarks, customer stories, and the latest from the NeuralVane platform.

Infrastructure

How We Built a 400Gbps InfiniBand Fabric Across 12 Regions

A deep dive into the network architecture behind NeuralVane's multi-region GPU clusters. We cover topology design, congestion control, and how we achieve near-linear scaling for distributed training jobs.

January 15, 2025 · 12 min read · By Dr. Sarah Lin

Benchmark

H100 vs. H200 vs. B200: Real-World Training Benchmarks on NeuralVane

We ran identical LLM training workloads across three GPU generations on our platform. The results reveal surprising insights about memory bandwidth, interconnect utilization, and cost-per-token economics.

January 8, 2025 · 18 min read · By Marcus Rodriguez

Customer Story

How Meridian AI Reduced Training Costs by 60% After Migrating to NeuralVane

Meridian AI was spending $2.4M/month on GPU compute with a major cloud provider. After migrating to NeuralVane, they cut costs by 60% while improving training throughput by 3.2x. Here's their story.

December 20, 2024 · 8 min read · By Elena Petrov

Product Launch

Introducing NeuralVane Inference Engine: Sub-10ms Latency at Scale

Today we're launching NeuralVane Inference Engine — a fully managed serving platform optimized for LLMs and diffusion models. Automatic batching, speculative decoding, and global edge deployment built in.

December 12, 2024 · 6 min read · By Arjun Krishnamurthy

Infrastructure

Designing for Failure: Our Approach to GPU Cluster Resilience

GPUs fail. Nodes go down. Networks partition. In this post, we explain how NeuralVane's checkpoint-and-resume architecture ensures your training jobs survive hardware failures without losing progress.

November 28, 2024 · 14 min read · By James Whitfield

Benchmark

NeuralVane Storage: 120 GB/s Throughput for Data-Intensive Training Pipelines

Training large models requires feeding data at extraordinary rates. We benchmarked our distributed storage layer against S3, GCS, and local NVMe to show how NeuralVane eliminates I/O bottlenecks.

November 15, 2024 · 10 min read · By Dr. Riya Nakamura