Free Dumps For Cloudera CCD-410
Question ID 12529 | You want to perform analysis on a large collection of images. You want to store this data in
HDFS and process it with MapReduce but you also want to give your data analysts and
data scientists the ability to process the data directly from HDFS with an interpreted high-
level programming language like Python. Which format should you use to store this data in
HDFS?
|
Option A | SequenceFiles
|
Option B | Avro
|
Option C | JSON
|
Option D | HTML
|
Option E | XML
|
Option F | CSV
|
Correct Answer | B |
Explanation | Avro is a language-neutral, self-describing binary serialization format with official bindings for Python (among other languages), so analysts can read the stored records directly from HDFS without writing JVM code. SequenceFiles, by contrast, are tied to Java's Writable serialization. Reference: Hadoop binary files processing introduced by image duplicates finder
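One reason Avro is readable from non-JVM languages is that every Avro container file embeds its schema as plain JSON in the file header. As a minimal standard-library sketch (the ImageRecord schema below is hypothetical, invented for illustration), the schema itself can be parsed with nothing more than Python's json module:

```python
import json

# A hypothetical Avro schema for image records. Avro schemas are plain JSON,
# and real Avro container files embed a schema like this in the file header,
# which is part of why the data is interpretable from Python.
schema_json = """
{
  "type": "record",
  "name": "ImageRecord",
  "fields": [
    {"name": "filename", "type": "string"},
    {"name": "width",    "type": "int"},
    {"name": "height",   "type": "int"},
    {"name": "data",     "type": "bytes"}
  ]
}
"""

schema = json.loads(schema_json)
field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # ['filename', 'width', 'height', 'data']
```

In practice you would read such files with Avro's Python bindings (for example, the `avro` package's DataFileReader), which resolve the embedded schema automatically; SequenceFiles have no such language-neutral schema and require JVM code to deserialize.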
Question ID 12530 | What is the disadvantage of using multiple reducers with the default HashPartitioner and
distributing your workload across your cluster?
|
Option A | You will not be able to compress the intermediate data.
|
Option B | You will no longer be able to take advantage of a Combiner.
|
Option C | By using multiple reducers with the default HashPartitioner, output files may not be in globally sorted order.
|
Option D | There are no concerns with this approach. It is always advisable to use multiple reducers.
|
Correct Answer | C |
Explanation | Multiple reducers and total ordering: if your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you've used the -r option to specify the number of reducers on the command line), then by default Hadoop will use the HashPartitioner to distribute records across the reducers. With the HashPartitioner, each reducer's output file is sorted on its own, but you can't simply concatenate the output files to obtain a single globally sorted file. To achieve that you need total ordering, e.g. via a TotalOrderPartitioner. Reference: Sorting text files with MapReduce
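The effect can be illustrated with a small Python sketch of the default partitioning rule. Hadoop's HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; here a simple deterministic sum-of-character-codes function stands in for Java's hashCode() so the example is reproducible:

```python
def hash_partition(key, num_reducers):
    # Stand-in for Hadoop's (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    # sum of character codes replaces Java's hashCode() for determinism.
    h = sum(ord(c) for c in key)
    return (h & 0x7FFFFFFF) % num_reducers

keys = ["apple", "banana", "cherry", "date", "fig", "grape"]
num_reducers = 3

# Route each key to a reducer, then sort within each reducer --
# this mirrors how every part-r-NNNNN file is locally sorted.
outputs = {r: [] for r in range(num_reducers)}
for k in keys:
    outputs[hash_partition(k, num_reducers)].append(k)
for r in outputs:
    outputs[r].sort()

# Concatenating the per-reducer files does NOT yield a globally sorted file.
concatenated = [k for r in range(num_reducers) for k in outputs[r]]
print(concatenated)
print(concatenated == sorted(keys))  # False
```

Each reducer's list is sorted, but the hash function scatters adjacent keys across reducers, so the concatenation interleaves the key ranges; a TotalOrderPartitioner instead assigns contiguous key ranges to reducers, which is what makes concatenation produce one sorted file.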