Free Dumps For Cloudera CCD-410
Question ID 12513 | Which project gives you a distributed, scalable data store that allows random, realtime
read/write access to hundreds of terabytes of data?
|
Option A | HBase
|
Option B | Hue
|
Option C | Pig
|
Option D | Hive
|
Option E | Oozie
|
Option F | Flume
|
Correct Answer | A |
Explanation | Use Apache HBase when you need random, realtime read/write access to your Big Data. The project's goal is the hosting of very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable ("Bigtable: A Distributed Storage System for Structured Data" by Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Features:
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables.
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy-to-use Java API for client access.
- Block cache and Bloom filters for real-time queries.
- Query predicate push-down via server-side Filters.
- Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options.
- Extensible JRuby-based (JIRB) shell.
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
Reference: http://hbase.apache.org/ ("When Would I Use Apache HBase?", first sentence)
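To make the versioned, column-oriented data model described above concrete, here is a toy in-memory sketch in plain Python. This is not the HBase client API; the class and method names are hypothetical, and it models only the cell-addressing scheme (row key, column family:qualifier, timestamped versions).

```python
import time

class ToyTable:
    """Toy model of HBase's data model: each cell is addressed by
    (row key, "family:qualifier") and keeps multiple timestamped
    versions, stored newest first. Hypothetical names, not the HBase API."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)

    def get(self, row, column):
        """Return the newest version of a cell (a default HBase Get
        likewise returns the most recent version)."""
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None

t = ToyTable()
t.put("user1", "info:name", "Ada", ts=1)
t.put("user1", "info:name", "Ada L.", ts=2)
# t.get("user1", "info:name") returns the newest version, "Ada L."
```

The point of the model is that writes never overwrite in place: each put adds a new timestamped version, which is what "versioned" means in the description above.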
Question ID 12514 | You need to create a job that does frequency analysis on input data. You will do this by
writing a Mapper that uses TextInputFormat and splits each value (a line of text from an
input file) into individual characters. For each one of these characters, you will emit the
character as a key and an IntWritable as the value. As this will produce proportionally
more intermediate data than input data, which two resources should you expect to be
bottlenecks?
|
Option A | Processor and network I/O
|
Option B | Disk I/O and network I/O
|
Option C | Processor and RAM
|
Option D | Processor and disk I/O
|
Correct Answer | B |
Explanation | Each input character becomes a full intermediate key/value record, so the map output is substantially larger than the input. Intermediate map output is spilled and sorted on the mappers' local disks, then copied across the network to the reducers during the shuffle, so disk I/O and network I/O are the resources under the most pressure.
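The Mapper described in the question can be sketched in the style of a Hadoop Streaming script (plain Python; the function name and the use of Streaming rather than the Java Mapper API are illustrative assumptions). It also shows why the intermediate data outgrows the input:

```python
import io

def map_chars(lines, out):
    """Streaming-style mapper: for every character of every input line,
    emit one tab-separated (character, 1) record per output line."""
    for line in lines:
        for ch in line.rstrip("\n"):
            out.write(f"{ch}\t1\n")

# Under Hadoop Streaming this would read stdin and write stdout:
#   map_chars(sys.stdin, sys.stdout)
buf = io.StringIO()
map_chars(["ab\n"], buf)
# buf now holds "a\t1\nb\t1\n": each 1-byte input character became a
# 4-byte "c<TAB>1<NEWLINE>" record, so the intermediate data is roughly
# four times the input data -- hence the disk and network bottlenecks.
```

Because every emitted record must be spilled to local disk and shuffled over the network, inflating each input byte into a multi-byte record stresses exactly the two resources named in answer B.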