
Pushing & Pulling Files Around Using tar, ssh, scp, & rsync

Occasionally I have to copy whole directory trees from one location to another. This copying typically falls into one of two categories:

  • An entire directory from one location to another on the same computer
  • An entire directory from one computer to another computer

There are essentially 2 techniques, PUSH & PULL, which can be used to copy whole trees from one location to another. Below I’m going to cover several methods that make use of these 2 techniques.

Copying directories on the same host

1. tar

# copy SOURCEDIR into DESTDIR
localhost% tar zcvf - SOURCEDIR | (cd DESTDIR; tar zxvf -)

This approach uses tar to archive the directory SOURCEDIR, writing the archive to STDOUT instead of to a file (that is what the - after the f switch means). The pipe then connects that STDOUT to the STDIN of everything inside the parentheses. The commands inside the parentheses run in a subshell: they first change directory to DESTDIR and then un-tar the stream of data arriving on STDIN.
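
To see the pipeline at work, here is a minimal sketch that can be run in a scratch directory. The directory and file names are made up for the demo. It differs from the command above in two small ways: it drops the z switch, since compression buys little when both ends are on the same host, and it chains cd and tar with && instead of ; so that nothing is extracted if DESTDIR is missing.

```shell
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR/sub" "$work/DESTDIR"
echo "hello" > "$work/SOURCEDIR/sub/file.txt"

# Archive SOURCEDIR to STDOUT, then unpack the stream inside DESTDIR.
# Each subshell keeps its cd from affecting the calling shell.
(cd "$work" && tar cf - SOURCEDIR) | (cd "$work/DESTDIR" && tar xf -)

# diff -r prints nothing and exits 0 when the two trees match.
diff -r "$work/SOURCEDIR" "$work/DESTDIR/SOURCEDIR" && echo "trees match"
```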

2. cp

# copy SOURCEDIR into DESTDIR
# example 1
localhost% cp -a SOURCEDIR DESTDIR/.
# example 2
localhost% cp -cdpR SOURCEDIR DESTDIR/.
# example 3
localhost% cp --preserve=context,mode,ownership,timestamps,links --no-dereference --recursive SOURCEDIR DESTDIR/.

All three examples above do essentially the same thing; each is just progressively more verbose. (With GNU cp, -a is shorthand for -dR --preserve=all, so it additionally preserves extended attributes.) Looking at example 3, the switches should be self-explanatory except perhaps for --no-dereference. This switch tells cp not to follow symbolic links, and instead to create an equivalent link in the copy under DESTDIR.

BTW, I mention cp here because newer versions of cp can in fact be used to duplicate directories from one location to another on the same host. Older versions, particularly on some older releases of Solaris that I maintain, lack the more feature-rich cp, so there the tar method mentioned above is your only option.
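
A quick way to check what not following links means in practice is to copy a tree that contains a symlink and inspect the result. This is a sketch with made-up file names; it relies only on cp -a and cp -RL, which GNU cp and most current BSD cp implementations accept.

```shell
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR" "$work/kept" "$work/followed"
echo "data" > "$work/SOURCEDIR/real.txt"
ln -s real.txt "$work/SOURCEDIR/link.txt"

# -a does not dereference: the copy contains a symlink, like the source.
cp -a "$work/SOURCEDIR" "$work/kept/."

# -L dereferences: the link is replaced by a regular file with the same data.
cp -RL "$work/SOURCEDIR" "$work/followed/."

ls -l "$work/kept/SOURCEDIR" "$work/followed/SOURCEDIR"
```

In the listing, link.txt shows up as a link under kept/ and as an ordinary file under followed/.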

Pushing a directory from localhost —> remotehost

Why call this Push? Conceptually we are “pushing” a duplicate copy of a directory from one location to another, i.e. we are “pushing” this directory FROM the localhost TO a remotehost. If you can’t get your head around the term pushing, think of it as the tar command, pushing the copied directory out from localhost to some destination.

1. tar & ssh

# copy SOURCEDIR from localhost to remotehost over ssh. Untarring begins in /home/user1
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost tar zxvf -
 
# copy SOURCEDIR from localhost to remotehost over ssh. Untarring in DESTDIR
# example 1
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost 'cd DESTDIR; tar zxvf - '
# example 2
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "(cd DESTDIR; tar zxvf -)"
# example 3
localhost% tar zcvf - SOURCEDIR | ssh -l user1 remotehost 'cd DESTDIR ; tar zxvf -'
# example 4
localhost% tar zcvf - SOURCEDIR | ssh user1@remotehost "cat > /DESTDIR/DESTFILE.tar.gz"

NOTE: If the OS you’re on doesn’t have a tar command that supports the z switch, such as with older versions of Solaris, then drop the z switches from both sides of the commands above.
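
If you drop the z switch but still want compression on the wire, one option is to put gzip into the pipeline yourself. The sketch below is a demo under stated assumptions: the remote function is a hypothetical local stand-in for ssh user1@remotehost, so the pipeline can be tried without a second machine, and all paths are made up.

```shell
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR" "$work/DESTDIR"
echo "payload" > "$work/SOURCEDIR/file.txt"

# Hypothetical stand-in for "ssh user1@remotehost": runs the command string
# in a local shell. For real use, replace the body with: ssh user1@remotehost "$1"
remote() { sh -c "$1"; }

# Plain tar (no z) on both ends, with gzip doing the compression explicitly.
(cd "$work" && tar cf - SOURCEDIR) \
  | gzip -c \
  | remote "cd '$work/DESTDIR' && gzip -dc | tar xf -"

ls -R "$work/DESTDIR"
```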

2. scp

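# if DESTDIR does not exist yet on remotehost, it is created as a copy of SOURCEDIR
# (only what's in SOURCEDIR, not the directory itself); if DESTDIR already exists,
# SOURCEDIR is copied under it
localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR
 
# SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
localhost% scp -rp SOURCEDIR user1@remotehost:/home/user1/DESTDIR/.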

3. rsync & ssh

# copy directory SOURCEDIR to DESTDIR
# example 1
localhost% rsync -avzH -e ssh --progress /SOURCEDIR user1@remotehost:/DESTDIR
# example 2
localhost% rsync -avzH -e'ssh' /SOURCEDIR user1@remotehost:/DESTDIR
 
# copy contents of directory SOURCEDIR to DESTDIR
localhost% rsync -avzH -e ssh --progress /SOURCEDIR/ user1@remotehost:/DESTDIR

Pulling a directory to localhost <— remotehost

Why call this Pull? Conceptually we are “pulling” a directory from one location to another, to create a duplicate. Usually we are “pulling” this directory TO our localhost back FROM a remotehost. If you can’t get your head around the term pulling, think of it as a command being run on a remotehost which streams a directory’s content to your localhost, and then the localhost pulls this stream of data in.

1. tar & ssh

# copy SOURCEDIR to DESTDIR
# example 1
localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
# example 2
localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | tar zxvf -
# example 3
localhost% ssh remotehost "( cd SOURCEDIR ; tar zcvf - SOURCEFILES ) " | tar zxvf -

# copy SOURCEDIR to a tar file
# example 1
localhost% ssh remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
# example 2
localhost% ssh -n remotehost 'tar zcvf - SOURCEDIR' | cat > DESTFILE.tar.gz
# example 3
localhost% ssh -n remotehost "tar jcvf - SOURCEDIR" > DESTFILE.tar.bz2
 
# NOTE: Example 3 just demonstrates that the "| cat" is actually redundant,
#       so it can be dropped if you like.

Example 1, from the second code block above, might result in what appears to be a corrupted DESTFILE.tar.gz file. For example, after creating DESTFILE.tar.gz on one of my hosts, it showed up like this:

# corrupted archive while using example 1
localhost% tar ztvf DESTFILE.tar.gz 
 
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
 
# file details
localhost% file DESTFILE.tar.gz 
DESTFILE.tar.gz: data
 
localhost% ls -l  | grep DESTFILE
-rw-r--r-- 1 user1  users  102934 2009-07-19 23:10 DESTFILE.tar.gz

Fear not! First, you can try fixing it by running dos2unix DESTFILE.tar.gz to clean the tar file up (work on a copy, since dos2unix rewrites the file in place). This appears to happen when ssh’ing to a user account whose ~/.bashrc generates certain output.

Other times input from STDIN will inadvertently get redirected into the tar command being run via the ssh. To completely disable STDIN input to the ssh, use the -n switch, as in example 2.
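
The -n switch makes ssh read its STDIN from /dev/null. A common place where this matters is a loop that reads a list of hosts from STDIN: without it, the first ssh can swallow the rest of the list. The sketch below imitates that with a hypothetical fake_ssh function that consumes STDIN the way ssh would, so it runs without any remote host; the host names are made up.

```shell
# Hypothetical stand-in for ssh: like ssh, it consumes whatever is on STDIN.
fake_ssh() { cat > /dev/null; }

# Without redirection the first call eats the remaining lines of the list.
greedy=$(printf 'host1\nhost2\nhost3\n' | while read -r h; do
  fake_ssh "$h"
  echo "$h"
done)

# Redirecting STDIN from /dev/null (what ssh -n does) leaves the list intact.
safe=$(printf 'host1\nhost2\nhost3\n' | while read -r h; do
  fake_ssh "$h" < /dev/null
  echo "$h"
done)

echo "without redirect: $(echo $greedy)"
echo "with redirect:    $(echo $safe)"
```

Only the first host is processed in the unguarded loop, while the guarded loop visits all three.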

Finally, there are still yet other times where neither of these will fix your problem. For example, I use the program mailstat in my ~/.bashrc to display how much new email I have since the last time I logged in. The output from mailstat shows up inside of the DESTFILE.tar.gz file.

# contents of DESTFILE.tar.gz corrupted by ~/.bashrc commands
% more DESTFILE.tar.gz 
 
  Total  Number Folder
  -----  ------ ------
  14467       4 folder1
   3891       1 /dev/null
   4424       1  formail +1 -eds >> lists/ls/$MLIST
 336579      13  formail +1 -eds >> lists/rh/$MLIST
   4849       1  formail +1 -eds >> lists/trilug
 228074      13 /home/user1/Mail/INBOX
 108279      16 /home/user1/Mail/main_boxes/razor-caught
  24320       1 lists/sans/newsbites
  57863      11 lists/sunsource/gridengine
  13710       1 main_boxes/spamassassin_caught
  87947      16 main_boxes/Trash
  -----  ------
 884403      78
...
...
*** contents of tar ***
...
...

It turns out that this type of problem occurs because ~/.bashrc shouldn’t ever include any commands that echo output to STDOUT or STDERR. These commands should really be relocated to either ~/.bash_login or ~/.bash_profile. Relocating anything that echoes output to STDOUT or STDERR results in a correctly transferred DESTFILE.tar.gz.
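
The mechanism is easy to reproduce locally: anything printed ahead of the tar stream ends up at the front of the file, and gzip no longer recognizes it. The sketch below fakes a chatty ~/.bashrc with a plain echo (the message and paths are made up). It then shows a commonly used guard, offered here as an alternative to relocating the commands rather than as the fix described above: print only when the shell is interactive.

```shell
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/SOURCEDIR"
echo "payload" > "$work/SOURCEDIR/file.txt"

# Clean stream: just the compressed archive.
(cd "$work" && tar cf - SOURCEDIR | gzip -c) > "$work/good.tar.gz"

# Polluted stream: a line of text arrives before the archive, as it would
# if ~/.bashrc printed something during "ssh remotehost 'tar ...'".
{ echo "You have new mail"; cat "$work/good.tar.gz"; } > "$work/bad.tar.gz"

gzip -t "$work/good.tar.gz" && echo "good.tar.gz: ok"
gzip -t "$work/bad.tar.gz" 2>/dev/null || echo "bad.tar.gz: rejected by gzip"

# Guard for ~/.bashrc: $- contains "i" only in interactive shells, so the
# message is skipped when ssh runs a single command.
is_interactive() { case $- in *i*) return 0 ;; *) return 1 ;; esac; }
if is_interactive; then echo "You have new mail"; fi
```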

# correct DESTFILE.tar.gz
localhost% file DESTFILE.tar.gz 
DESTFILE.tar.gz: gzip compressed data, from Unix, last modified: Tue Jul 21 01:13:48 2009
 
localhost% ls -l | grep DEST
-rw-r--r--  1 user1   users    102400 2009-07-21 00:49 DESTFILE.tar.gz

2. scp

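# if DESTDIR does not exist yet on localhost, it is created as a copy of SOURCEDIR
# (only what's in SOURCEDIR, not the directory itself); if DESTDIR already exists,
# SOURCEDIR is copied under it
localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR
 
# SOURCEDIR copied under DESTDIR (DESTDIR must already exist)
localhost% scp -rp user1@remotehost:/home/user1/SOURCEDIR DESTDIR/.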

3. rsync & ssh

# copy directory SOURCEDIR to DESTDIR
# example 1
localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR /DESTDIR
# example 2
localhost% rsync -avzH -e'ssh' user1@remotehost:/SOURCEDIR /DESTDIR
 
# copy contents of directory SOURCEDIR to DESTDIR
localhost% rsync -avzH -e ssh --progress user1@remotehost:/SOURCEDIR/ /DESTDIR


4 comments to Pushing & Pulling Files Around Using tar, ssh, scp, & rsync

  • Troy Curtis

    Just read this in my reader, so it is a bit old. I’m surprised someone has not already pointed out that tar uses standard input and standard output by default, and you have to explicitly tell it to use a file using the ‘-f’ option. So all the ‘f -’ options in your tar commands can be taken out! That’s a savings of 2 characters! Ha!

    Also, I’m often in a situation that I need to archive and push whole directory trees as root, but due to security restrictions I cannot login over the network directly as root (we always have to use ‘su’ for audit purposes). That means the pipe through ssh, scp, and rsync methods are not an option. For a while I had to create the archive file, move it, then extract it, until I learned about netcat. Now I do something like this:

    1. Set up netcat to listen on an unused port (I tend to use 20000) and pipe the output to tar
    2. I like the feedback on the receive side so I know it is doing something (hence the ‘v’ option)
    3. Depending on your version of netcat you might need ‘-p’ to specify the port
      remotehost% nc -l 20000 | tar xv
    4. Now pipe the output of tar through netcat
      localhost% tar c SOURCEDIR | nc remotehost 20000

    Of course this data is in the clear, which is fine for my LAN case, for a WAN case you’d probably want all that to go through an ssh tunnel. Also I don’t use compression because in my use it is Gigabit which means it just wastes CPU :)

  • Dustin

    I’m having a hell of a time taking this to the next step.

    I have several low-resource embedded machines that I’d like to stream .tar.gz out of, but have it finalize on the local machine with a heavier compression. I cannot for the life of me get tar to accept stdin and then funnel it back out stdout.

    I’ve tried several variations, and it just plain doesn’t work… I have no idea.

    I can get this to work:

    ssh user@host "tar --gzip --create --file - /" > /path/backup.tar.gz

    And this, too (restores file tree as original):

    ssh user@host 'tar --gzip --create --file - /' | tar --gzip --extract --file -

    But this won’t (I’ve tried several variations):

    ssh user@host "tar --gzip --create --file - /" | tar --gzip --extract - - | tar --xz --create --file backup.tar.xz -

    It seems to take issue with the second ‘stage.’ I’ve tried it with one and two “-” and it makes no difference. Also, with and without “--file” and “--force”, no effect. It talks about can’t find archive, can’t create empty archive, etc… Any help?

    • I have to admit that your question stumped me for a loooong time. I would come back to it every once in a while as if it were a puzzle. Well I think I’ve figured out a solution to your question. Here it goes:

      ssh user@host "tar --gzip --create --file - /" | tar --gzip --extract --verbose --file - | xargs tar --xz --create --file backup.tar.xz

      You can compact this down like so:

      ssh user@host "tar zcf - /" | tar zxvf - | xargs tar Jcf backup.tar.xz

      The 2 key features that I was finally able to figure out are the addition of the verbose switch (--verbose|-v) in the middle tar “stage” and the use of xargs in the last “stage”. The verbose switch is necessary to get the middle “stage” to emit the list of files that are being unpacked, and xargs acts as a sink, picking up all these file names and feeding them to the next tar command.

      Thanks for the question and hope this helps. Sorry it took 2 years!

  • Steve Foris

    This is an excellent write up! Thanks :-)

    A few things I’d like to add:
    1. The use of tar “-cf -” helps a lot with Unix compatibility; I just had to use a similar technique to move a bunch of data between varying *nix systems.
    2. If you need to do things as root, sudo can work well as long as requiretty has been turned off in the sudoers file.
