DARC Blog

This blog is a place to learn how to solve Research Computing tasks at Stanford GSB.

Fine-Tuning Open Source Models

When large language models (LLMs) first appeared, they felt almost magical — you could ask them anything and they’d reply with surprisingly fluent text. But once you start applying them in research or production, the limitations show up quickly. The base model can sort of do your task, but not reliably enough. That’s where fine-tuning comes in.

Running Ollama on Stanford Computing Clusters

Imagine running a notebook cell like this — and getting a full response from a large language model hosted on your own cluster:

LLM running on Yen Jupyter

With Ollama, you can host models like Llama 3 or DeepSeek on Stanford’s GPU clusters — no API keys, no external calls — and interact with them through your own code or notebooks.

This guide walks you through setting up Ollama across Stanford's GPU computing clusters — Yen, Sherlock, and Marlowe — to efficiently run large language models (LLMs).
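To make the idea concrete, here is a minimal sketch of querying a locally hosted Ollama server from Python. It assumes Ollama is running on its default port (11434) and that a model such as `llama3` has already been pulled; the function names are illustrative, not part of Ollama itself.

```python
# Minimal sketch: query a local Ollama server via its /api/generate endpoint.
# Assumes `ollama serve` is running on the default port with a model pulled,
# e.g. `ollama pull llama3`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local server and return the model's reply."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (requires a running Ollama server on the cluster):
# print(ask("llama3", "Summarize the capital asset pricing model."))
```

Because the request never leaves the machine, no API keys or external calls are involved.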

Introduction to Using Redivis

Redivis is a powerful data querying and analysis platform built specifically with researchers in mind. It is currently the GSB's solution for hosting Big Data (data on the scale of TBs) for researchers at the school. At the time of this post, the StanfordGSBLibrary Redivis organization hosts more than 50 datasets totaling over 100 TB of data, serving 300 organization members.

If you are a researcher who is just starting to use Redivis, or considering using the platform, this blog post will help you get started by covering some common use cases and helpful tips.

Train Machine Learning Models on Colab GPU

Google Colab enables you to run Jupyter notebooks in the cloud with the option to use a CPU or accelerate computations by adding GPU or TPU support. We will use the free Colab tier, but for longer training jobs or access to better GPUs (e.g., T4, P100, or V100), the paid Colab Pro or Colab Pro+ option may be a better choice. Navigate to the Colab website and check out an example Jupyter notebook that uses a GPU for machine learning training.
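One quick sanity check, run at the top of a Colab notebook, is confirming which device your training will use. A minimal sketch, assuming PyTorch (which comes preinstalled on Colab); the `pick_device` helper is illustrative:

```python
# Check whether a GPU runtime is attached before starting training.
def pick_device() -> str:
    """Return 'cuda' when a GPU runtime is attached, else 'cpu'."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"  # PyTorch not installed in this environment

device = pick_device()
print(f"Training on: {device}")
```

If this prints `cpu` on Colab, switch the runtime type to GPU under Runtime → Change runtime type before training.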

Reproducible Research Essentials

This guide provides the foundational components needed to ensure reproducibility in your research. It focuses on:

  • Documenting fixed inputs and expected outputs
  • Making a README file
  • Managing computational environments
  • Summary with additional resources
  • Advanced topics
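As a taste of the first item, here is a minimal sketch of documenting fixed inputs: record the Python version, the platform, and a hash of each input file so a reader can verify they are running against the same inputs. The file name and helper functions are hypothetical examples, not part of any particular tool:

```python
# Record the runtime environment and a fingerprint of each input file.
import hashlib
import platform
import sys
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hex digest of a file's contents; changes whenever the input changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def environment_summary(input_files: list[Path]) -> dict:
    """Snapshot of interpreter, platform, and input-file hashes."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "inputs": {str(p): file_sha256(p) for p in input_files},
    }

# Example usage with a hypothetical input file:
demo = Path("data.csv")
demo.write_text("id,value\n1,3.14\n")
summary = environment_summary([demo])
print(summary)
```

Saving such a summary alongside your results makes it easy to spot when an input file has silently changed between runs.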

Editing Files on the Command Line

When working within JupyterHub, you can use the built-in Text Editor to edit scripts on the Yens directly. However, it is sometimes more convenient and faster to edit files from the command line, for instance when you are logged into a terminal and need to make small changes to your Slurm script prior to submission.

In this post, we will illustrate how you can do this using the Vim text editor, which ships with most Linux distributions and can be used on any HPC system or server. As a specific example, we will make small changes to a Python script from the command line.