This is a follow-up to my last post, in which I showed you how to write a word count MapReduce job. Have a look at that post before reading on; it will put this one into perspective.
As you saw, we implemented our map and reduce methods in their own classes by extending the Mapper and Reducer classes from the Hadoop framework. It turns out that word count can be written without doing that, because Hadoop already ships with two classes that can serve as our mapper and reducer: TokenCounterMapper from the org.apache.hadoop.mapreduce.lib.map package, which emits every whitespace-separated token with a count of 1, and IntSumReducer from the org.apache.hadoop.mapreduce.lib.reduce package, which adds up the counts for each key.
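To make concrete what these two classes compute, here is a minimal plain-Java sketch of their logic. It is an illustration, not Hadoop's source: String and Integer stand in for Hadoop's Text and IntWritable so it runs without Hadoop on the classpath, the names BuiltInClassesSketch, tokenCounterMap and intSumReduce are hypothetical, and it assumes TokenCounterMapper splits each line with java.util.StringTokenizer's default whitespace delimiters.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Sketch only: mirrors the logic of TokenCounterMapper and IntSumReducer
// using String/Integer instead of Hadoop's Text/IntWritable.
public class BuiltInClassesSketch {

    // Like TokenCounterMapper.map: split on whitespace, emit (token, 1) per token.
    public static List<Map.Entry<String, Integer>> tokenCounterMap(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line); // default delimiters: " \t\n\r\f"
        while (itr.hasMoreTokens()) {
            emitted.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return emitted;
    }

    // Like IntSumReducer.reduce: add up every value received for one key.
    public static int intSumReduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(tokenCounterMap("to be or not to be"));
        // The values the reducer would receive for the key "to" from that line.
        System.out.println(intSumReduce(Arrays.asList(1, 1)));
    }
}
```

Because the split is on whitespace only, tokens keep their punctuation and case: "Hadoop," and "hadoop" are counted as different words. If you need that kind of normalization, you are back to writing your own Mapper.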
Here is a revised word count job that uses the built-in TokenCounterMapper and IntSumReducer classes:
```java
package com.thereforesystems.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.GenericOptionsParser;

public class HadoopWordCountRevised {

    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Consume generic Hadoop options (-D, -files, ...) and keep the rest.
        String[] otherArgs = new GenericOptionsParser(config, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: HadoopWordCountRevised <input path> <output path>");
            System.exit(2);
        }

        // new Job(...) is deprecated in Hadoop 2.x and later,
        // where Job.getInstance(config, "Word Count Tutorial") replaces it.
        Job job = new Job(config, "Word Count Tutorial");
        job.setJarByClass(HadoopWordCountRevised.class);

        // Built-in mapper: emits (token, 1) for every whitespace-separated token.
        job.setMapperClass(TokenCounterMapper.class);
        // Built-in reducer: sums the IntWritable values for each key.
        job.setReducerClass(IntSumReducer.class);

        // Types of the final (reducer) output key/value pairs.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
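If you want to predict what the job writes without a cluster, the hypothetical LocalWordCountSimulation class below replays it in memory. It is a rough sketch, not Hadoop code: it tokenizes each line the way TokenCounterMapper does, uses a TreeMap to stand in for the shuffle (which groups values by key and hands keys to the reducer in sorted order), sums each group the way IntSumReducer does, and formats each record as word, tab, count, the default layout written by TextOutputFormat. It assumes the default single reduce task, so all results land in one sorted file, and ASCII input, since Java's String ordering can differ from Hadoop's byte-wise Text ordering for some non-ASCII characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory replay of the word count job: map, shuffle/sort, reduce.
public class LocalWordCountSimulation {

    public static List<String> run(List<String> inputLines) {
        // Shuffle/sort stand-in: TreeMap keeps keys sorted and collects
        // every value emitted for the same key.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            // Map phase: one (token, 1) pair per whitespace-separated token.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: one call per key, summing its values,
        // then "key<TAB>value" as TextOutputFormat writes by default.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            output.add(e.getKey() + "\t" + sum);
        }
        return output;
    }

    public static void main(String[] args) {
        for (String record : run(Arrays.asList("the quick fox", "the lazy dog"))) {
            System.out.println(record);
        }
    }
}
```

For the two sample lines in main, the records come out as dog, fox, lazy and quick with a count of 1 each, followed by the with a count of 2, which is what the job would write to part-r-00000 for the same input.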





