April 2, 2017

Hadoop Commands

hadoop command [genericOptions] [commandOptions]

hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]

hadoop fs -ls / -- Get a directory listing of the HDFS root directory
hadoop fs -ls /user/hive
hadoop fs -ls s3a://satya-hive/trip_data_dec17
hadoop fs -cat /test1/foo.txt -- Display contents of the file residing in HDFS
hadoop fs -cat /test1/foo.txt | wc -l -- Count the number of lines of file in  HDFS
hadoop fs -rm /test1/foo.txt -- Remove the file in HDFS
hadoop fs -rm -r hadoop-test2
hadoop fs -mkdir /test6/ -- Create a directory under the HDFS root directory
hadoop fs -put foo.txt /test6/ -- Copy file foo.txt from local disk to the directory
hadoop fs -put /etc/note.txt /test2/note_fs.txt
hadoop fs -get /user/satya/passwd ./ -- Copy the file back to local disk
hadoop fs -cp hadoop-test1/dwp-payments-april10.csv hadoop-test2
hadoop fs -cp /user/satya/reviews_Home_and_Kitchen_5.json s3a://satya-sparks/reviews_HomeKitchen
hadoop fs -Ddfs.replication=2 -cp hadoop-test2/dwp-payments-april10.csv hadoop-test2/test_with_rep2.csv
hadoop fs -mv hadoop-test1/dwp-payments-april10.csv hadoop-test3
hadoop fs -setrep 5 -R /user/satya/tmp/
hadoop fs -chmod 1777 /tmp
hadoop fs -touch /user/satya/test/foo
hadoop fs -rmr /user/satya/test/foo
hadoop fs -touchz /user/satya/test/bar
hadoop fs -count -q /user/satya
hadoop fs -copyFromLocal  /hirw-starterkit/hdfs/commands/dwp-payments-april10.csv hadoop-test1
hadoop fs -copyToLocal hadoop-test1/dwp-payments-april10.csv .
hadoop -execute start-all.sh

hadoop job -list
hadoop job -kill jobID
hadoop job -list-attempt-ids jobID taskType taskState
hadoop job -kill-task taskAttemptId

hadoop namenode -format

hadoop jar <jar_file> wordcount <output_file>
hadoop jar /opt/hadoop/hadoop-examples-1.0.4.jar wordcount /out/wc_output

hadoop dfsadmin -report
hadoop dfsadmin -setSpaceQuota 10737418240 /user/esammer
hadoop dfsadmin -refreshNodes
hadoop dfsadmin -upgradeProgress status
hadoop dfsadmin -finalizeUpgrade

hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
<path> -- start checking from this path
-move -- move corrupted files to /lost+found
-delete -- delete corrupted files
-files -- print out files being checked
-openforwrite -- print out files opened for write
-blocks -- print out block report
-locations -- print out locations for every block
-racks -- print out network topology for data-node locations
By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status.
hadoop fsck / -files -blocks -locations
hadoop fsck /user/satya -files -blocks -locations

hadoop distcp -- Distributed Copy (distcp)
distcp [OPTIONS] <srcurl>* <desturl>
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
-p alone is equivalent to -prbugp
-i Ignore failures
-log <logdir> Write logs to <logdir>
-m <num_maps> Maximum number of simultaneous copies
-overwrite Overwrite destination
-update Overwrite if src size different from dst size
-skipcrccheck Do not use CRC check to determine if src is different from dest. Relevant only if -update is specified
-f <urilist_uri> Use list at <urilist_uri> as src list
-filelimit <n> Limit the total number of files to be <= n
-sizelimit <n> Limit the total size to be <= n bytes
-delete Delete the files existing in the dst but not in src
-mapredSslConf <f> Filename of SSL configuration for mapper task

NOTE 1: if -overwrite or -update are set, each source URI is interpreted as an isomorphic update to an existing directory.
For example:
hadoop distcp -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"
would update all descendants of 'baz' also in 'bar'; it would *not* update /user/foo/baz/bar

NOTE 2: The parameter <n> in -filelimit and -sizelimit can be specified with symbolic representation. For examples,
1230k = 1230 * 1024 = 1259520
891g = 891 * 1024^3 = 956703965184

hadoop distcp hdfs://A:8020/path/one hdfs://B:8020/path/two
hadoop distcp /path/one /path/two

Related Articles: Apache Hadoop    Hadoop Training in India (Bangalore/Hyderabad)


  1. Worthful Hadoop tutorial. Appreciate a lot for taking up the pain to write such a quality content on Hadoop tutorial. Just now I watched this similar Hadoop tutorial and I think this will enhance the knowledge of other visitors for sureHadoop Online Training

  2. hanks For Sharing The Information The Information Shared Is Very Valuable Please Keep Updating

    Us Time Just Went On Reading The article Hadoop Online Course