Hadoop 2.7.3: running the official wordcount in local mode, based on HDFS
This follows the previous post, "Hadoop 2.7.3: running the official wordcount in local mode". We continue testing in local mode, this time going through hadoop fs / HDFS.
2 Counting words with hadoop fs in local mode
The previous post used the Linux filesystem directly. This time we use hadoop fs. In local mode, hadoop fs is in fact also backed by the Linux filesystem. The examples below demonstrate this:
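One way to confirm that local mode is in effect (a sketch, assuming the Hadoop binaries are on the PATH) is to inspect the fs.defaultFS configuration key, which controls what filesystem hadoop fs talks to:

```shell
# Print the effective default filesystem URI.
# In local (standalone) mode this resolves to file:///, which is why
# "hadoop fs" commands operate directly on the Linux filesystem.
hdfs getconf -confKey fs.defaultFS

# On a real HDFS deployment this would instead print something like
# hdfs://namenode:9000 and the commands below would hit the NameNode.
```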
2.1 Verifying the FS
<code>cd /home/jungle/hadoop/hadoop-local<br/>
ls -l<br/>
total 116<br/>
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 15:06 bin<br/>
drwxrwxr-x. 4 jungle jungle 31 Jan 6 16:53 dataLocal<br/>
drwxr-xr-x. 3 jungle jungle 19 Jan 6 14:56 etc<br/>
drwxr-xr-x. 2 jungle jungle 101 Jan 6 14:56 include<br/>
drwxr-xr-x. 3 jungle jungle 19 Jan 6 14:56 lib<br/>
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 14:56 libexec<br/>
-rw-r--r--. 1 jungle jungle 84854 Jan 6 14:56 LICENSE.txt<br/>
-rw-r--r--. 1 jungle jungle 14978 Jan 6 14:56 NOTICE.txt<br/>
-rw-r--r--. 1 jungle jungle 1366 Jan 6 14:56 README.txt<br/>
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 14:56 sbin<br/>
drwxr-xr-x. 4 jungle jungle 29 Jan 6 14:56 share<br/>
<br/>
hadoop fs -ls /<br/>
Found 20 items<br/>
-rw-r--r-- 1 root root 0 2016-12-30 12:26 /1<br/>
dr-xr-xr-x - root root 45056 2016-12-30 13:06 /bin<br/>
dr-xr-xr-x - root root 4096 2016-12-29 20:09 /boot<br/>
drwxr-xr-x - root root 3120 2017-01-06 18:31 /dev<br/>
drwxr-xr-x - root root 8192 2017-01-06 18:32 /etc<br/>
drwxr-xr-x - root root 19 2016-11-05 23:38 /home<br/>
dr-xr-xr-x - root root 4096 2016-12-30 12:29 /lib<br/>
dr-xr-xr-x - root root 81920 2016-12-30 13:04 /lib64<br/>
drwxr-xr-x - root root 6 2016-11-05 23:38 /media<br/>
# ...<br/>
<br/>
# equivalent to: ls -l /home/jungle/hadoop/hadoop-local<br/>
hadoop fs -ls /home/jungle/hadoop/hadoop-local<br/>
Found 11 items<br/>
-rw-r--r-- 1 jungle jungle 84854 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/LICENSE.txt<br/>
-rw-r--r-- 1 jungle jungle 14978 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/NOTICE.txt<br/>
-rw-r--r-- 1 jungle jungle 1366 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/README.txt<br/>
drwxr-xr-x - jungle jungle 4096 2017-01-06 15:06 /home/jungle/hadoop/hadoop-local/bin<br/>
drwxrwxr-x - jungle jungle 31 2017-01-06 16:53 /home/jungle/hadoop/hadoop-local/dataLocal<br/>
drwxr-xr-x - jungle jungle 19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/etc<br/>
drwxr-xr-x - jungle jungle 101 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/include<br/>
drwxr-xr-x - jungle jungle 19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/lib<br/>
drwxr-xr-x - jungle jungle 4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/libexec<br/>
drwxr-xr-x - jungle jungle 4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/sbin<br/>
drwxr-xr-x - jungle jungle 29 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/share<br/>
</code>
As the output above shows, hadoop fs -ls /home/jungle/hadoop/hadoop-local and the Linux command ls -l /home/jungle/hadoop/hadoop-local are equivalent.
2.2 Preparing the data
Next, take the raw input data from the previous example and copy it onto HDFS.
<code>hadoop fs -mkdir -p ./dataHdfs/input<br/>
hadoop fs -ls .<br/>
Found 12 items<br/>
drwxrwxr-x - jungle jungle 18 2017-01-06 18:44 dataHdfs<br/>
drwxrwxr-x - jungle jungle 31 2017-01-06 16:53 dataLocal<br/>
# ...<br/>
<br/>
hadoop fs -ls ./dataHdfs/<br/>
Found 1 items<br/>
drwxrwxr-x - jungle jungle 6 2017-01-06 18:44 dataHdfs/input<br/>
<br/>
hadoop fs -put<br/>
-put: Not enough arguments: expected 1 but got 0<br/>
Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst><br/>
<br/>
# put the local files onto HDFS; in local mode this amounts to a Linux copy<br/>
hadoop fs -put dataLocal/input/ ./dataHdfs/<br/>
ls -l dataHdfs/<br/>
total 0<br/>
drwxrwxr-x. 2 jungle jungle 80 Jan 6 18:51 input<br/>
<br/>
ls -l dataHdfs/input/<br/>
total 8<br/>
-rw-r--r--. 1 jungle jungle 37 Jan 6 18:51 file1.txt<br/>
-rw-r--r--. 1 jungle jungle 70 Jan 6 18:51 file2.txt<br/>
<br/>
hadoop fs -ls ./dataHdfs/<br/>
Found 1 items<br/>
drwxrwxr-x - jungle jungle 80 2017-01-06 18:51 dataHdfs/input<br/>
<br/>
hadoop fs -ls ./dataHdfs/input/<br/>
Found 2 items<br/>
-rw-r--r-- 1 jungle jungle 37 2017-01-06 18:51 dataHdfs/input/file1.txt<br/>
-rw-r--r-- 1 jungle jungle 70 2017-01-06 18:51 dataHdfs/input/file2.txt<br/>
</code>
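Because local mode backs hadoop fs with the Linux filesystem, -put here really does reduce to a plain file copy. A minimal pure-shell illustration of the equivalent effect (the /tmp/put_demo paths are hypothetical and only for demonstration; no Hadoop is needed):

```shell
# Create a scratch directory and a sample source file (made-up data).
mkdir -p /tmp/put_demo/input
printf 'hello hadoop\n' > /tmp/put_demo/file1.txt

# In local mode, "hadoop fs -put src dst" ends up doing the moral
# equivalent of this local copy:
cp /tmp/put_demo/file1.txt /tmp/put_demo/input/

# cmp exits 0 when the two files are byte-identical.
cmp /tmp/put_demo/file1.txt /tmp/put_demo/input/file1.txt && echo "identical"
```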
2.3 Running wordcount
<code>hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataHdfs/input/ dataHdfs/output<br/>
# the input and output directories here can be read as directories in HDFS or, equally, as directories on the Linux filesystem<br/>
<br/>
cat dataHdfs/output/part-r-00000<br/>
I 1<br/>
am 1<br/>
bye 2<br/>
great 1<br/>
hadoop. 3<br/>
hello 3<br/>
is 1<br/>
jungle. 2<br/>
software 1<br/>
the 1<br/>
world. 2<br/>
<br/>
md5sum dataLocal/outout/part-r-00000 dataHdfs/output/part-r-00000<br/>
68956fd01404e5fc79e8f84e148f19e8 dataLocal/outout/part-r-00000<br/>
68956fd01404e5fc79e8f84e148f19e8 dataHdfs/output/part-r-00000<br/>
</code>
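As a sanity check that needs no Hadoop at all, the same word/count pairs can be reproduced with standard Unix tools. This is a minimal sketch on a made-up sample file (/tmp/wc_demo.txt, not the article's file1.txt/file2.txt), splitting on whitespace exactly as the example wordcount's tokenizer does:

```shell
# Build a tiny sample input (hypothetical data, for illustration only).
printf 'hello world. hello hadoop.\nbye hadoop.\n' > /tmp/wc_demo.txt

# Split on whitespace, sort so duplicates are adjacent, count them,
# then print in wordcount's "word<TAB>count" layout.
tr -s '[:space:]' '\n' < /tmp/wc_demo.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# prints (tab-separated):
#   bye     1
#   hadoop. 2
#   hello   2
#   world.  1
```

Running the same pipeline over dataHdfs/input/*.txt should agree with part-r-00000, which gives a third, Hadoop-free confirmation on top of the md5sum comparison above.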