TransWikia.com

How to show top 10 largest items from git history

Ask Ubuntu Asked on October 31, 2021

I found this command to get the top 10 largest files from my git history (in this closed issue https://github.com/18F/C2/issues/439)

git verify-pack -v .git/objects/pack/pack-7b03cc896f31b2441f3a791ef760bd28495697e6.idx \
| sort -k 3 -n \
| tail -10

It now shows something like this:

32f0dac6ee67325ca12b9c03279ee2dbc7790567 blob   12732444 11425432 1091676437
c63c2851049c51eabbcd54cb46cad367d4e0d593 blob   14368670 12189261 246241495

What do these numbers represent? Which of them is the file size? I would also be very grateful if anyone could break down and explain the command above. I don’t understand it.

2 Answers

Here is another neat solution to this problem using git's ls-tree sub-command:

$ git ls-tree  -rl HEAD | sort -k4 -n | tail | awk '{print $4, $5}' |
numfmt --to=iec-i

4.0Ki  .bashrc
4.0Ki  .config/conky/conky.conf
4.5Ki  .config/rofi/config.rasi
5.4Ki  .vim/notes
7.2Ki  .config/tint2/tint2rc
7.5Ki  .bash_functions
7.5Ki  .vimrc
19Ki   .vim/colors/clrs.vim
38Ki   .config/openbox/rc.xml
63Ki   .config/ipfilter.dat
  • -r lists the files recursively.
  • -l shows the object size of blob (file) entries.
  • sort -k4 -n sorts numerically by the 4th column (the size).
  • tail keeps the last 10 lines, which are the largest files.
  • awk prints only the 4th and 5th columns (size and path) of the output.
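One limitation of the awk step is that it splits on any whitespace, so a path containing spaces is cut off at its first space. A minimal sketch of a variant that splits on the tab which git ls-tree prints before the path. The function name largest_tracked_files is made up for illustration, and sizes are left in bytes:

```shell
# Hypothetical helper: reads `git ls-tree -r -l` output on stdin and prints
# "<size in bytes><TAB><path>" for the ten largest entries, smallest first.
largest_tracked_files() {
    # Everything before the tab is "<mode> <type> <hash> <size>";
    # the size is the last whitespace-separated word of that part.
    awk -F '\t' '{ n = split($1, meta, " "); print meta[n] "\t" $2 }' |
        sort -k1,1 -n |
        tail -n 10
}

# Usage inside a repository (not run here):
#   git ls-tree -r -l HEAD | largest_tracked_files
```

Like the original pipeline, this only looks at files present in HEAD, not at older revisions.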

Answered by Ravexina on October 31, 2021

To reduce the space used by files, Git packs the objects stored in the repository into a .pack file. The pack file contains the actual Git objects, and the accompanying .idx file contains the index used to quickly locate objects within the pack file.

$ git verify-pack -v .git/objects/pack/pack-7b03cc896f31b2441f3a791ef760bd28495697e6.idx 

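The above command reads the given .idx file and verifies it against the corresponding pack file. With -v the output is verbose and lists every object in the pack.

Each object line has five columns: the object hash, the object type, the object size in bytes, the size the object takes up inside the pack file (after compression), and the object's offset in the pack file. In your first line, 12732444 is therefore the file size, 11425432 is the packed size and 1091676437 is the offset. Using sort -k 3 -n we sort the output numerically by the third column (the size), and with tail -10 we keep the last 10 lines, which are the largest objects.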
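The verbose output also contains commits, trees and a few summary lines, so it can help to keep only the blob lines before sorting. A minimal sketch of that idea, wrapped in a shell function; the name largest_blobs is hypothetical and not part of git:

```shell
# Hypothetical helper: reads `git verify-pack -v` output on stdin and prints
# "<size> <size in pack> <hash>" for the ten largest blobs, smallest first.
largest_blobs() {
    # Keep only lines whose second column is "blob"; this also drops
    # commits, trees and the summary lines at the end of the output.
    awk '$2 == "blob" { print $3, $4, $1 }' |
        sort -k1,1 -n |
        tail -n 10
}

# Usage inside a repository (not run here; pack names differ per repository):
#   git verify-pack -v .git/objects/pack/pack-*.idx | largest_blobs
```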

To get the name of a file from its hash (this only finds files that exist in HEAD):

$ git ls-tree -r HEAD | grep HASH

To get the names for all ten of the largest objects at once:

$ git verify-pack -v .git/objects/pack/pack-1daab5282d01ab18db98e21a985eb2d288f7faa0.idx | sort -k 3 -n | tail | cut -f1 -d' ' | while read i; do git ls-tree -r HEAD  | grep "$i"; done

100644 blob 6209b3840fa470a534e670cff93bce698ba60819    .bashrc
100644 blob 1131e7127cb2cf6c1f854f728a1794262cdf85f6    .vimrc
100644 blob a249a5ae9b33553f4484da42a019ed14e5f44e21    .vim/colors/clrs.vim
100644 blob f329f223953827e59954f67ad4d76568b6dd894e    .config/openbox/rc.xml
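Because git ls-tree -r HEAD only lists files in the current commit, a large file that was deleted later produces no line here. One possible way to cover the whole history is git rev-list --objects --all, which prints every reachable object together with the path it was found under. A sketch under that assumption; the function name names_for_hashes and the file name all-objects.txt are made up for illustration:

```shell
# Hypothetical helper: names_for_hashes OBJECT_LIST
# Reads object hashes on stdin (one per line) and prints the matching
# "<hash> <path>" lines from OBJECT_LIST, skipping entries without a path.
names_for_hashes() {
    awk 'NR == FNR { want[$1] = 1; next } ($1 in want) && NF > 1' - "$1"
}

# Usage inside a repository (not run here):
#   git rev-list --objects --all > all-objects.txt
#   git verify-pack -v .git/objects/pack/pack-*.idx |
#       awk '$2 == "blob"' | sort -k 3 -n | tail -n 10 | cut -d ' ' -f 1 |
#       names_for_hashes all-objects.txt
```

The result follows the order of the object list rather than the size order, so the sizes have to be matched back up by hash if the ranking matters.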

Read more:

$ git verify-pack --help

Unpacking Git packfiles

Git Internals - Packfiles

Git - finding a filename from a SHA1

Answered by Ravexina on October 31, 2021
