Train BERT model on Yen’s GPU node

In this example we train a transformer model for sentiment analysis on financial news. This task can provide insights into market movements based on news sentiment. Note: This is a basic example. In a real-world scenario, you’d use a large dataset, fine-tune the model more thoroughly, and handle additional considerations.


We’ll use the transformers library, which provides pre-trained models and tokenizers, in conjunction with PyTorch Lightning. PyTorch Lightning is a lightweight PyTorch wrapper that helps researchers streamline their model training and evaluation.


We’ll use Kaggle dataset of financial phrase bank (Malo et al., 2014). The collection of financial news headlines was annotated by 16 people with background knowledge on financial markets. We will use 2,264 sentences that had 100% agreement in annotations, each sentence labeled by 5-8 people. The data file, Sentences_AllAgree.txt, has the following format:


where sentiment is either positive, neutral or negative. Positive sentiment means that the news headline may positively influence the stock price; negative sentiment means that the headline may negatively influence the stock price; and neutral sentiment means that the headline is not relevant to the stock price.

For example,

For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .@positive

Training Script

The training script:

  • Defines a custom dataset to handle financial news texts and their labels.
  • Uses BERT from the transformers library for sentiment analysis.
  • Trains the model using PyTorch Lightning’s Trainer on Yen’s GPU.

Login to Yens

Enter your SUNet ID and Duo authenticate to login.

$ ssh <$USER>

The data, Python training script and example Slurm script are in GitHub repo that you will need to copy to the Yens.

On the Yens, navigate where you want to copy the repo to and run:

$ git clone  

After the repo is downloaded, cd into the yens-gpu-demo directory:

$ cd yens-gpu-demo

Before submitting the Slurm script, check how many GPU’s, CPU and CPU RAM are available with squeue -p gpu and sinfo -p gpu commands to see how many jobs are running on the yen-gpu1 node and how many resources are idle.

Let’s look at the Slurm script, finBERT-train.slurm.

$ cat finBERT-train.slurm

# Example slurm script to train a PyTorch deep learning model on Yen GPU

#SBATCH -J finBERT-train               # Job name
#SBATCH -p gpu                         # Partition (queue) to submit to
#SBATCH -c 10                          # Number of CPU cores
#SBATCH -N 1                           # Request one node
#SBATCH -t 1:00:00                          # Max job time: 1 day
#SBATCH -G 1                           # Max GPUs per user: 2
#SBATCH -o finBERT-train.out           # Output file
#SBATCH --mail-type=ALL                # Email notifications

# Load PyTorch module
ml pytorch

# Execute the Python script with multiple workers (for data loading)
srun python ${SLURM_CPUS_PER_TASK}

This script is asking for one GPU on the gpu partition and 10 CPU cores for 1 day.

Edit the Slurm script to include your email address.

When ready to submit, run with:

$ sbatch finBERT-train.slurm

Once your job starts running, check the Slurm queue for your username:

$ squeue -u $USER
            225707       gpu finBERT-    user1  R       0:06      1 yen-gpu1

We can monitor the log file while the job is running:

$ tail -f *out

Also, check that GPU is utilized while script is running by ssh‘ing into the yen-gpu1 node and running nvidia-smi.

$ ssh yen-gpu1

After logging into the GPU node:

$ watch nvidia-smi

Monitor the GPU utilization under GPU-Util column and make sure you see your python process running. You can also check GPU RAM that is being used and adjust your batch size in future training runs:

| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A30                      On | 00000000:17:00.0 Off |                    0 |
| N/A   45C    P0              157W / 165W|   4515MiB / 24576MiB |     45%      Default |
|                                         |                      |             Disabled |
|   1  NVIDIA A30                      On | 00000000:65:00.0 Off |                    0 |
| N/A   34C    P0               32W / 165W|      0MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
|   2  NVIDIA A30                      On | 00000000:CA:00.0 Off |                    0 |
| N/A   31C    P0               29W / 165W|      0MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
|   3  NVIDIA A30                      On | 00000000:E3:00.0 Off |                    0 |
| N/A   33C    P0               29W / 165W|      0MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |

| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A    170912      C   ...tware/free/pytorch/2.0.0/bin/python     4512MiB |

After the script is done running, check email for CPU/RAM utilization and adjust for future submissions.

Once the job is done, look at the output file:

$ cat finBERT-train.out 

The out file should tell you that the GPU was detected and used.

GPU available: True (cuda), used: True

We print the accuracy on the test set before training:

Untrainted model accuracy:
┃        Test metric        ┃       DataLoader 0        ┃
│         test_acc          │    0.18421052631578946    │

Then we train for 3 epochs which takes about 100 seconds and we print the test set accuracy after training:

┃        Test metric        ┃       DataLoader 0        ┃
│         test_acc          │    0.9736842105263158     │