Stanford's LLM API Tools
If you need to use LLMs via an API, Stanford provides an official AI API Gateway. This service provisions API keys and ties billing to your Stanford account, keeping usage and payment simple.
When large language models (LLMs) first appeared, they felt almost magical — you could ask them anything and they’d reply with surprisingly fluent text. But once you start applying them in research or production, the limitations show up quickly. The base model can sort of do your task, but not reliably enough. That’s where fine-tuning comes in.
Imagine running a notebook cell like this — and getting a full response from a large language model hosted on your own cluster:
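As a sketch of what that cell could look like, here is a minimal Python snippet that sends a prompt to Ollama's REST API on the node where the model is served. The model name, prompt text, and localhost address are illustrative assumptions; the actual request (commented out) assumes `ollama serve` is already running.

```python
import json
import urllib.request

# Build a request against Ollama's generate endpoint.
# Model name and prompt are placeholders -- swap in whatever you pulled.
payload = {
    "model": "llama3",
    "prompt": "Summarize the key risks mentioned in this earnings report.",
    "stream": False,  # return one complete response instead of a token stream
}
request = urllib.request.Request(
    "http://localhost:11434/api/generate",  # assumes ollama serve on this node
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment on a node where Ollama is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["response"])
```

Because the call goes to `localhost`, nothing leaves the cluster: no API keys, no external services.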

With Ollama, you can host models like Llama 3 or DeepSeek on Stanford’s GPU clusters — no API keys, no external calls — and interact with them through your own code or notebooks.
This guide walks you through setting up Ollama across Stanford's GPU computing clusters — Yen, Sherlock, and Marlowe — to efficiently run large language models (LLMs).
In this example, we train a transformer model for sentiment analysis on financial news using Stanford GSB's Yens GPU partition. This task can provide insights into market movements based on news sentiment.
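A job like this is typically submitted through Slurm. The batch script below is a minimal sketch; the partition name, resource requests, environment path, and script/file names are assumptions to adapt to your own setup, not fixed values for the Yens.

```shell
#!/bin/bash
# Hypothetical Slurm job script for fine-tuning a sentiment model on a GPU node.
# Partition, GPU count, and file names are placeholders -- adjust for your cluster.
#SBATCH --job-name=news-sentiment
#SBATCH --partition=gpu          # assumed name of the GPU partition
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --time=02:00:00
#SBATCH --mem=32G

# Activate your Python environment, then launch training (names are illustrative).
source venv/bin/activate
python train_sentiment.py --data financial_news.csv --epochs 3
```

Submit it with `sbatch train_sentiment.sbatch` and monitor it with `squeue -u $USER`.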