DARC Blog

This blog is a place to learn how to solve Research Computing tasks at Stanford GSB.

Fine-Tuning Open Source Models

When large language models (LLMs) first appeared, they felt almost magical — you could ask them anything and they’d reply with surprisingly fluent text. But once you start applying them in research or production, the limitations show up quickly. The base model can sort of do your task, but not reliably enough. That’s where fine-tuning comes in.

Running Ollama on Stanford Computing Clusters

Imagine running a notebook cell like this — and getting a full response from a large language model hosted on your own cluster:

LLM running on Yen Jupyter

With Ollama, you can host models like Llama 3 or DeepSeek on Stanford’s GPU clusters — no API keys, no external calls — and interact with them through your own code or notebooks.

This guide walks you through setting up Ollama across Stanford's GPU computing clusters — Yen, Sherlock, and Marlowe — to efficiently run large language models (LLMs).
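To make the idea concrete, here is a minimal sketch of querying a locally hosted Ollama server from Python. It assumes Ollama is running on its default port (11434) and that a model such as `llama3` has already been pulled; the function names are illustrative, not part of Ollama itself.

```python
# Minimal sketch: query a local Ollama server via its /api/generate endpoint.
# Assumes `ollama serve` is running on the default port with a model pulled,
# e.g. `ollama pull llama3`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local server and return the model's reply."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (requires a running Ollama server on the cluster):
# print(ask("llama3", "Summarize the capital asset pricing model."))
```

Because the request never leaves the machine, no API keys or external calls are involved.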

Introduction to Using Redivis

Redivis is a powerful data querying and analysis platform built specifically with researchers in mind. It is currently the GSB's solution for hosting Big Data (data on the scale of TBs) for researchers at the school. At the time of this post, the StanfordGSBLibrary Redivis organization hosts more than 50 datasets totaling over 100 TB of data, serving 300 organization members.

If you are a researcher who is just starting to use Redivis, or considering using the platform, this blog post will help you get started by covering some common use cases and helpful tips.

Train Machine Learning Models on Colab GPU

Google Colab enables you to run Jupyter notebooks in the cloud with the option to use a CPU or accelerate computations by adding GPU or TPU support. We will use the free Colab tier, but for longer training jobs or access to better GPUs (e.g., T4, P100, or V100), the paid Colab Pro or Colab Pro+ option may be a better choice. Navigate to the Colab website and check out an example Jupyter notebook that uses a GPU for machine learning training.
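One quick sanity check, run at the top of a Colab notebook, is confirming which device your training will use. A minimal sketch, assuming PyTorch (which comes preinstalled on Colab); the `pick_device` helper is illustrative:

```python
# Check whether a GPU runtime is attached before starting training.
def pick_device() -> str:
    """Return 'cuda' when a GPU runtime is attached, else 'cpu'."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"  # PyTorch not installed in this environment

device = pick_device()
print(f"Training on: {device}")
```

If this prints `cpu` on Colab, switch the runtime type to GPU under Runtime → Change runtime type before training.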

Reproducible Research Essentials

This guide provides the foundational components needed to ensure reproducibility in your research. It focuses on:

  • Documenting fixed inputs and expected outputs
  • Making a README file
  • Managing computational environments
  • Summary with additional resources
  • Advanced topics
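As a taste of the first item, here is a minimal sketch of documenting fixed inputs: record the Python version, the platform, and a hash of each input file so a reader can verify they are running against the same inputs. The file name and helper functions are hypothetical examples, not part of any particular tool:

```python
# Record the runtime environment and a fingerprint of each input file.
import hashlib
import platform
import sys
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hex digest of a file's contents; changes whenever the input changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def environment_summary(input_files: list[Path]) -> dict:
    """Snapshot of interpreter, platform, and input-file hashes."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "inputs": {str(p): file_sha256(p) for p in input_files},
    }

# Example usage with a hypothetical input file:
demo = Path("data.csv")
demo.write_text("id,value\n1,3.14\n")
summary = environment_summary([demo])
print(summary)
```

Saving such a summary alongside your results makes it easy to spot when an input file has silently changed between runs.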

Editing Files on the Command Line

When working within JupyterHub, you can use the built-in Text Editor to edit scripts on the Yens directly. However, it is sometimes more convenient and faster to edit files from the command line, for instance when you are logged into a terminal and need to make small changes to your Slurm script prior to submission.

In this post, we will illustrate how you can do this using the Vim text editor, which ships with most Linux distributions and can be used on any HPC system or server. As a specific example, we will make small changes to a Python script from the command line.