[one-liner]: Calculating Disk Space Usage for a List of Files Using du under Linux

Background

Being the most techie of my group of friends, I often get other people’s problems dumped into my lap when something goes wrong with their computer. I generally don’t mind, since it gives me the opportunity to help out a friend and work on some obscure problems, such as:

  • issue #1: laptop no longer works, ah can I pack it up and ship it to you?
  • issue #2: ah … yeah we never backed up any of the pictures on our computer like you told us to!
  • issue #3: we think we got some sort of virus and it’s no longer working, can you fix it?

Today I’ll be discussing what I did for a recently received computer with issues #1 & #2 8-).

Getting Started

For starters, I followed a previous post in which I used a Rosewill SATA or IDE 3-in-1 (5.25″, 3.5″, 2.5″) to USB2.0 Cable Converter Adapter to mount an internal 2.5″ laptop SATA drive as an external USB device. First I wanted to determine how much disk space was being used by a list of certain types of files:

# file extensions that we're looking for
% fileTypes="\.mp3$|\.doc$|\.xls$|\.ppt$|\.jpg$"
 
# results that we want to ignore (continuation lines start at column 0 so that
# no leading whitespace ends up inside the pattern)
% ignoreList="\.lnk|/Windows/winsxs|/Temporary Internet Files/|/Program Files/AWC/|\
/Program Files/Common Files/microsoft shared/|RECYCLE\.BIN/|Toshiba|Picasa|\
/Program Files/HP/|/Program Files/Coupons/|TOSHIBA Games|MovieFactory|\
/ProgramData/Skype/Plugins/Local Cache/"
 
# find the files and print their total size
% find . -type f -print | \
	egrep -i "$fileTypes" | \
	egrep -v "$ignoreList" | tr '\n' '\0' | du -ch --files0-from=-
 
# egrep -i "$fileTypes"     -   files we're looking for (.mp3, .doc, etc.)
# egrep -v "$ignoreList"    -   strings within filenames which we want to ignore
# tr '\n' '\0'              -   convert the newlines to null characters
# du -ch --files0-from=-    -   passing the list of files, now separated by \0, via
#                               STDIN (--files0-from=-), calculate the size of each
#                               file along with the total amount for all the files
#                               (-c). The -h prints the sizes in human readable
#                               format.
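
The pipeline above builds a newline-separated list and only converts it to null characters at the end, so a filename containing a newline would be split in two. As a minimal sketch, assuming GNU find, grep and du, the list can be kept NUL-delimited from start to finish. The helper name total_size is hypothetical and not part of the original one-liner, and grep -E stands in for egrep.

```shell
#!/usr/bin/env bash
# Sketch only: total_size is a hypothetical helper, not from the original post.
# Assumes GNU find, grep and du (for -print0, -z and --files0-from).

# total_size DIR INCLUDE_REGEX EXCLUDE_REGEX
total_size() {
    local dir=$1 include=$2 exclude=$3
    (
        cd "$dir" || exit 1
        # Every stage reads and writes NUL-terminated names, so a filename
        # containing a newline is never split into two entries.
        find . -type f -print0 |
            grep -zEi -- "$include" |
            grep -zEv -- "$exclude" |
            du -ch --files0-from=-
    )
}
```

With the variables defined earlier, it could be called as: total_size . "$fileTypes" "$ignoreList"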

The above find command will return output like this:

...
...
24K	./Users/Scott/AppData/Local/Temp/Temp1_agenda[1].zip/November Board Meeting Kelly.doc
1.9M	./Users/Scott/AppData/Local/Temp/Ppt0000000.ppt
12K	./Users/Scott/AppData/Local/Temp/~WRD0001.doc
2.1M	./Users/Scott/Pictures/Nikon Transfer/December 2010/DSCN1173.JPG
2.0M	./Users/Scott/Pictures/Nikon Transfer/Moms Club Easter 4-3-10/DSCN0007.JPG
568K	./Windows/Web/Wallpaper/img26.jpg
300K	./Windows/Web/Wallpaper/img28.jpg
484K	./Windows/Web/Wallpaper/img29.jpg
1.6M	./Windows/Web/Wallpaper/img7.jpg
1.3M	./Windows/Web/Wallpaper/img8.jpg
1.5M	./Windows/Web/Wallpaper/img9.jpg
672K	./Windows/Web/Wallpaper/toshiba_1920x1200-1.jpg
4.2G	total
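
One detail worth checking before burning: du -h reports sizes in powers of 1024, while DVD capacities are quoted in powers of 1000. The sketch below converts a 4.2G total to bytes and compares it against a nominal single-layer capacity. The function name fits_on_dvd and the 4,700,000,000-byte figure are assumptions for illustration, and filesystem overhead on the disc is ignored.

```shell
#!/usr/bin/env bash
# Sketch only: fits_on_dvd and the 4,700,000,000-byte capacity are assumptions
# for illustration (nominal single-layer DVD size, ignoring filesystem overhead).

# fits_on_dvd BYTES [CAPACITY_BYTES]: succeeds when BYTES <= CAPACITY_BYTES
fits_on_dvd() {
    local bytes=$1 capacity=${2:-4700000000}
    [ "$bytes" -le "$capacity" ]
}

# du -h uses powers of 1024, so "4.2G" is about 4.2 * 1024^3 bytes.
approx_bytes=$(( 42 * 1024 * 1024 * 1024 / 10 ))
echo "$approx_bytes"    # prints 4509715660

if fits_on_dvd "$approx_bytes"; then
    echo "fits on a single-layer DVD"
else
    echo "too large for a single-layer DVD"
fi
```

For an exact byte count rather than a rounded figure, GNU du can report apparent sizes in bytes with -cb in place of -ch; the first field of the final "total" line is then the number to pass to the check.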

So it looks like I would be able to fit all these files on a single 4.7GB DVD. The next step was to extract these files to a temporary location. To accomplish this I used the same technique that I had used previously, making use of the little-known command cpio:

% find . -type f -print | \
	egrep -i "$fileTypes" | \
	egrep -v "$ignoreList" | cpio -pavd /tmp/scottdrive_local
 
# NOTES on cpio options:
# ‘-p’ Run in copy-pass mode
# ‘-a’ Reset each file's access time after reading, so it looks like it wasn't read
# ‘-v’ List the files processed
# ‘-d’ Create leading directories where needed

This find command differs slightly from the one used above to calculate the disk space: it drops the tr command and passes the newline-separated list of files directly to cpio. cpio’s copy-pass mode (-p) reads one filename per line from standard input, which allows for this! From the temporary directory I’ll be using k3b to burn the contents directly to a DVD.

Works pretty flawlessly!
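
Since the two pipelines above share the same find and egrep stages and differ only in the final consumer, the selection step can be factored out. This is a minimal sketch rather than the method used in this post: select_files is a hypothetical helper name, and grep -E is used in place of egrep, which newer GNU grep releases flag as obsolescent.

```shell
#!/usr/bin/env bash
# Sketch only: select_files is a hypothetical helper, not from the original post.

# select_files INCLUDE_REGEX EXCLUDE_REGEX
# Prints one matching path per line, relative to the current directory.
select_files() {
    find . -type f -print |
        grep -Ei -- "$1" |
        grep -Ev -- "$2"
}

# The same list can then feed either consumer:
#   select_files "$fileTypes" "$ignoreList" | tr '\n' '\0' | du -ch --files0-from=-
#   select_files "$fileTypes" "$ignoreList" | cpio -pavd /tmp/scottdrive_local
```

Keeping the filter in one place means the size estimate and the copy are guaranteed to operate on the same list of files.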

NOTE: For further details regarding my one-liner blog posts, check out my one-liner style guide primer.
