Stanford's LLM API Tools
If you need to use LLMs via an API, Stanford provides an official AI API Gateway. This service provisions API keys and ties billing to your Stanford account, keeping usage and payment simple.
When large language models (LLMs) first appeared, they felt almost magical — you could ask them anything and they’d reply with surprisingly fluent text. But once you start applying them in research or production, the limitations show up quickly. The base model can sort of do your task, but not reliably enough. That’s where fine-tuning comes in.
Imagine running a notebook cell like this — and getting a full response from a large language model hosted on your own cluster:
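As a sketch of what that cell could look like, here is a minimal Python snippet that sends a prompt to Ollama's REST API on the node where the model is served. The model name, prompt text, and localhost address are illustrative assumptions; the actual request (commented out) assumes `ollama serve` is already running.

```python
import json
import urllib.request

# Build a request against Ollama's generate endpoint.
# Model name and prompt are placeholders -- swap in whatever you pulled.
payload = {
    "model": "llama3",
    "prompt": "Summarize the key risks mentioned in this earnings report.",
    "stream": False,  # return one complete response instead of a token stream
}
request = urllib.request.Request(
    "http://localhost:11434/api/generate",  # assumes ollama serve on this node
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment on a node where Ollama is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["response"])
```

Because the call goes to `localhost`, nothing leaves the cluster: no API keys, no external services.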

With Ollama, you can host models like Llama 3 or DeepSeek on Stanford’s GPU clusters — no API keys, no external calls — and interact with them through your own code or notebooks.
This guide walks you through setting up Ollama across Stanford's GPU computing clusters — Yen, Sherlock, and Marlowe — to efficiently run large language models (LLMs).
In this example, we train a transformer model for sentiment analysis on financial news using Stanford GSB's Yens GPU partition. This task can provide insights into market movements based on news sentiment.
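A job like this is typically submitted through Slurm. The batch script below is a minimal sketch; the partition name, resource requests, environment path, and script/file names are assumptions to adapt to your own setup, not fixed values for the Yens.

```shell
#!/bin/bash
# Hypothetical Slurm job script for fine-tuning a sentiment model on a GPU node.
# Partition, GPU count, and file names are placeholders -- adjust for your cluster.
#SBATCH --job-name=news-sentiment
#SBATCH --partition=gpu          # assumed name of the GPU partition
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --time=02:00:00
#SBATCH --mem=32G

# Activate your Python environment, then launch training (names are illustrative).
source venv/bin/activate
python train_sentiment.py --data financial_news.csv --epochs 3
```

Submit it with `sbatch train_sentiment.sbatch` and monitor it with `squeue -u $USER`.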