How to calculate the memory requirement of BERT?

Asked on Stack Overflow by Lei Hao on December 30, 2021

I am curious about the memory usage of transformers.BertModel. I would like to use the pretrained model to transform text and save the output of the [CLS] token. No training, only inference.

My input to BERT is 511 tokens. With a batch size of 16, my code runs out of memory. The GPU has 32 GB of memory. My question is how to estimate the memory usage of BERT.
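
For reference, here is my rough attempt at an estimate for bert-base-uncased in fp32 (a back-of-the-envelope sketch, assuming the standard BERT-base sizes and that, under torch.no_grad(), roughly one layer's activations are live at a time; the real peak also depends on CUDA caching and fragmentation):

hidden, heads, interm = 768, 12, 3072    # standard BERT-base sizes
n_params = 110e6                         # ~110M parameters in bert-base-uncased
bytes_per_float = 4                      # fp32

def estimate_mb(batch, seq_len):
    weights = n_params * bytes_per_float
    # Working activations for roughly one layer during no_grad inference:
    # hidden states, FFN intermediate states, and the attention score matrices.
    hidden_act = batch * seq_len * hidden
    ffn_act = batch * seq_len * interm
    attn_scores = batch * heads * seq_len * seq_len
    working = (hidden_act + ffn_act + attn_scores) * bytes_per_float
    return (weights + working) / 1024**2

print(f"batch 16, seq 511: ~{estimate_mb(16, 511):.0f} MB")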

Strangely, another job with batch size 32 finished successfully with the same setup. My code is listed below.

import torch
from torch.utils.data import ConcatDataset, DataLoader, RandomSampler
from transformers import BertModel

# Create dataloader
bs = 16
train_comb = ConcatDataset([train_data, valid_data])
train_dl = DataLoader(train_comb, sampler=RandomSampler(train_comb), batch_size=bs)  # sample over the concatenated dataset

model = BertModel.from_pretrained('/my_dir/bert_base_uncased/',
                                  output_attentions=False,
                                  output_hidden_states=False)
model.cuda()
out_list = []
model.eval()
with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d]       # d = [input_ids, attention_mask, token_type_ids, labels]
        inputs, labels = d[:3], d[3]    # input_ids has shape 16 x 511
        output = model(*inputs)[0][:, 0, :]
        out_list.append(output)

outputs = torch.cat(out_list)

Later I changed the for loop to the following:

with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d[:3]]          # don't care about the labels
        out_list.append(model(*d)[0][:, 0, :]) # remove the intermediary variables
    del d

To summarize, my questions are:

  1. How can I estimate the memory usage of BERT? I want to use it to choose the batch size.
  2. My second job, with batch size 32, finished successfully. Is it because it has more padding?
  3. Are there any suggestions for improving the memory efficiency of my code?

One Answer

After some searching, it turns out the error was caused by appending the outputs, still on the GPU, to the list. With the following code, the error is gone.

with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d[:3]]          
        out_list.append(model(*d)[0][:, 0, :].cpu())  # copy the [CLS] output to CPU so the GPU tensor can be freed
    del d

Without .cpu(), the memory keeps increasing:

Tensor size: torch.Size([4, 511]), Memory allocated: 418.7685546875MB
Tensor size: torch.Size([4, 768]), Memory allocated: 424.7568359375MB


Tensor size: torch.Size([4, 511]), Memory allocated: 424.7568359375MB
Tensor size: torch.Size([4, 768]), Memory allocated: 430.7451171875MB


Tensor size: torch.Size([4, 511]), Memory allocated: 430.7451171875MB
Tensor size: torch.Size([4, 768]), Memory allocated: 436.7333984375MB

With .cpu(), the memory doesn't change:

Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB


Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB


Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB

Answered by Lei Hao on December 30, 2021
