Compress and Decompress
tar
Reference: https://man.linuxde.net/tar
- Compress:
tar -zcvf File.tar.gz File
- Decompress:
tar -zxvf File.tar.gz
Format | Decompress | Compress |
---|---|---|
.tar | xvf | cvf |
.tar.gz | zxvf | zcvf |
.tar.bz2 | jxvf | jcvf |
.tar.bz | jxvf | |
.tar.Z | Zxvf | Zcvf |
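As a minimal sketch of the flag combinations in the table, the following round trip creates, lists, and extracts a gzip-compressed archive. The scratch directory and the file name a.txt are made up for illustration; only File and File.tar.gz come from the commands above.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (hypothetical layout, for illustration only).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir File
printf 'hello\n' > File/a.txt

# c = create, z = filter through gzip, v = verbose, f = archive file name
tar -zcvf File.tar.gz File

# t = list the archive contents without extracting anything
tar -ztvf File.tar.gz

# x = extract; -C switches to the target directory before extracting
mkdir out
tar -zxvf File.tar.gz -C out

# The extracted copy should match the original
cmp File/a.txt out/File/a.txt
```

Swapping z for j (bzip2) or Z (compress) follows the same pattern, provided the matching compression program is installed.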
Tar with ssh as a substitute for scp
This pipe can upload large files much faster than scp, which makes it a very good substitute for scp.
I tried to back up 1.4 TB of files from a portable hard drive with scp, and it took half an hour for 24 KB files, with most of the time spent reading files.
When I switched to this pipeline, tens of gigabytes were uploaded within a few minutes. It is crazy fast!!!
Cite: roaima; 2015
cp files with tar
Small files:
time cp -r Github /media/Side/ken/Github
cp -r Github /media/Side/ken/Github 0.00s user 0.23s system 27% cpu 0.835 total
time tar cf - Github | gzip | ssh ken@0.0.0.0 'cd /media/Side/ken && gzip -d | tar xvf -'
tar cf - Githu* 0.06s user 0.32s system 2% cpu 13.564 total
gzip 10.70s user 0.04s system 79% cpu 13.566 total
ssh ken@0.0.0.0 'cd /media/Side/ken && gzip -d | tar xvf -' 0.59s user 0.26s system 6% cpu 13.567 total
For the Github directory, cp takes less than 1 s, but the tar pipe takes 13.5 s. So, if you have lots of small files, cp is still your first choice.
Large file test
Check the size of the files: du -sh Mutation/Raw_VCF
23G Mutation/Raw_VCF
time cp -r Mutation/Raw_VCF /media/Side/ken/
cp -r Mutation/Raw_VCF /media/Side/ken/ 0.53s user 59.35s system 7% cpu 12:31.78 total
time tar cf - Mutation/Raw_VCF | gzip | ssh ken@0.0.0.0 'cd /media/Side/ken && gzip -d | tar xvf -'
tar cf - Mutation/Raw_VCF 3.21s user 27.98s system 4% cpu 10:36.64 total
gzip 532.32s user 2.73s system 84% cpu 10:36.65 total
ssh ken@0.0.0.0 'cd /media/Side/ken && gzip -d | tar xvf -' 18.95s user 7.16s system 4% cpu 10:36.65 total
So, in this test, cp takes about 12.5 minutes, while the tar pipe takes about 10.5 minutes.
A better way
Although the pipeline works, the ssh part wastes a lot of resources. The best way for this situation is to avoid it, which only takes roughly 2 minutes.
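The timings above show gzip using most of the CPU time (532 s user), and a copy onto a locally mounted disk needs neither ssh nor compression. One common alternative, offered here as an assumption rather than as the command behind the 2-minute figure, is a plain tar-to-tar pipe. The directories src_root and dest_root below are hypothetical stand-ins created in a scratch location.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

src_root="$workdir/src"     # hypothetical stand-in for the directory holding Mutation/
dest_root="$workdir/dest"   # hypothetical stand-in for /media/Side/ken
mkdir -p "$src_root/Mutation/Raw_VCF" "$dest_root"
printf 'sample\n' > "$src_root/Mutation/Raw_VCF/a.vcf"

# Left side packs the tree to stdout; right side unpacks it at the destination.
# -C changes directory before tar reads or writes, so no separate cd is needed.
tar -C "$src_root" -cf - Mutation/Raw_VCF | tar -C "$dest_root" -xf -

# Confirm the copy is byte-identical
cmp "$src_root/Mutation/Raw_VCF/a.vcf" "$dest_root/Mutation/Raw_VCF/a.vcf"
```

Whether this beats cp on a given disk depends on the file sizes and the hardware, so it is worth timing both on real data.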
gzip
- Compress:
gzip -cr 220725_KEGG > KEGG.gz
- Decompress:
gzip -d KEGG.gz
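Note that gzip compresses file contents and does not store a directory tree: gzip -cr on a directory writes the compressed data of every file into one stream, so gzip -d gives back the contents concatenated into a single file. A minimal sketch of a single-file round trip, and of bundling a directory with tar first, could look like this. The file KEGG.txt and its contents are made up for illustration.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data
mkdir 220725_KEGG
printf 'K00001\nK00002\n' > 220725_KEGG/KEGG.txt

# Single file: -c writes to stdout and leaves the input in place
gzip -c 220725_KEGG/KEGG.txt > KEGG.txt.gz

# -d decompresses; combined with -c the result goes to stdout
gzip -dc KEGG.txt.gz > restored.txt
cmp 220725_KEGG/KEGG.txt restored.txt

# Whole directory: tar bundles the tree and z compresses it with gzip
tar -zcf KEGG.tar.gz 220725_KEGG
mkdir out
tar -zxf KEGG.tar.gz -C out
cmp 220725_KEGG/KEGG.txt out/220725_KEGG/KEGG.txt
```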
Tar with multiple threads
Resource: Artem S. Tashkinov, 2020
tar -c -I 'xz -9 -T0' -f archive.tar.xz [list of files and folders]
- tar: tar stands for “tape archive” and is a command-line utility used to create, extract, or manipulate tarball archives, which are collections of files and directories bundled together into a single file.
- -c: This flag tells tar to create a new archive.
- -I 'xz -9 -T0': The -I option allows you to specify a compression program to use. In this case, 'xz -9 -T0' is being used.
  - xz: This is the program used for compression. xz is a lossless data compression tool that is highly efficient.
  - -9: This option tells xz to use the highest standard compression preset (9). This generally gives the smallest archive among the standard presets, though it takes more time and memory to compress.
  - -T0: This option tells xz to use all available CPU threads to perform the compression. 0 dynamically chooses the number of threads based on the number of available CPU cores. This makes the compression faster on multi-core systems.
- -f archive.tar.xz: The -f option allows you to specify the name of the archive file you’re creating. In this case, the archive is named archive.tar.xz.
  - archive.tar.xz: This is the name of the output file. The .tar.xz extension indicates that it’s a tarball compressed with xz.
- [list of files and folders]: This is a placeholder for the actual files and directories you want to include in the archive. You would replace this with a list of the files and directories you wish to compress into the archive.
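The options described above can be tried end to end with a small sketch like the following. It assumes GNU tar (for -I) and xz 5.2 or newer (for -T), and skips itself if either is missing. The data directory and its files are hypothetical, and the preset is lowered from -9 to -6 (xz's default) only to keep memory use modest for a demo.

```shell
#!/bin/sh
set -eu

# Assumption: GNU tar and xz are installed; skip the demo otherwise.
if ! command -v xz >/dev/null 2>&1 || ! tar --version 2>/dev/null | grep -q 'GNU tar'; then
    echo "GNU tar or xz not found, skipping demo"
    exit 0
fi

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input files
mkdir data
printf 'alpha\n' > data/a.txt
printf 'beta\n' > data/b.txt

# -c create, -I run the given compressor, -T0 let xz pick the thread count
tar -c -I 'xz -6 -T0' -f archive.tar.xz data

# On extraction GNU tar recognises the xz format on its own
mkdir out
tar -x -f archive.tar.xz -C out

cmp data/a.txt out/data/a.txt
cmp data/b.txt out/data/b.txt
```

Multi-threading only pays off on inputs large enough for xz to split into several blocks, so tiny archives like this one will not show a speedup.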