What should hadoop.tmp.dir be?

It’s confusing, but hadoop.tmp.dir is used as the base for temporary directories both locally and in HDFS. The documentation isn’t great, but mapred.system.dir defaults to “${hadoop.tmp.dir}/mapred/system”, and it defines the path in HDFS where the Map/Reduce framework stores system files. If you don’t want these tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir …
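As a sketch, decoupling the two might look like this in mapred-site.xml (the HDFS path shown is an assumption; use whatever location suits your cluster):

```xml
<!-- mapred-site.xml: pin mapred.system.dir to an explicit HDFS path
     instead of inheriting ${hadoop.tmp.dir}/mapred/system -->
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value> <!-- hypothetical HDFS location -->
</property>
```

With this in place, changing hadoop.tmp.dir no longer moves the Map/Reduce system directory.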

Getting “ERROR: Can’t get master address from ZooKeeper; znode data == null” when using the HBase shell

If you just want to run HBase standalone without getting into ZooKeeper management, remove all the property blocks from hbase-site.xml except the one for hbase.rootdir, then run /bin/start-hbase.sh. HBase ships with its own ZooKeeper, which is started by /bin/start-hbase.sh and will suffice if you are trying to get around …
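A minimal standalone hbase-site.xml along these lines might look like the following (the local data directory is an assumption, not a required value):

```xml
<!-- hbase-site.xml: standalone mode, keeping only hbase.rootdir -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hbase/data</value> <!-- hypothetical local path -->
  </property>
</configuration>
```

With only this property set, HBase runs in standalone mode and manages its embedded ZooKeeper for you.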

Why does a Fetch task in Hive run faster than a Map-only task?

A FetchTask fetches the data directly, whereas MapReduce has to launch a full map-reduce job. There is also a related parameter, hive.fetch.task.conversion.threshold: by default it is -1 in Hive 0.10–0.13 and 1 GB (1073741824 bytes) in 0.14 and later. This means that if the table size is greater than the threshold, Hive uses MapReduce instead of a fetch task.
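As a sketch, these knobs can be set per session in the Hive shell; the values below simply restate the 0.14+ defaults described above:

```sql
-- Hive session settings (illustrative; values mirror the 0.14+ defaults)
SET hive.fetch.task.conversion=more;                  -- allow fetch-task conversion for simple queries
SET hive.fetch.task.conversion.threshold=1073741824;  -- 1 GB cutoff before falling back to MapReduce
```

Setting the threshold to -1 disables the size check, as in the 0.10–0.13 default behavior.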

What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?

From the answer here, spark.sql.shuffle.partitions configures the number of partitions used when shuffling data for joins or aggregations. spark.default.parallelism is the default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set explicitly by the user. Note that spark.default.parallelism only seems to apply to raw RDDs and is ignored when working with DataFrames. If the task you are performing …
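Side by side, the two settings might be configured like this in spark-defaults.conf (the parallelism value is illustrative; 200 is the documented default for the SQL setting):

```properties
# spark-defaults.conf: the two settings compared
spark.sql.shuffle.partitions   200   # partitions after a DataFrame/SQL shuffle (join, aggregation)
spark.default.parallelism      8     # default partitions for RDD ops like reduceByKey, parallelize
```

So tuning spark.sql.shuffle.partitions affects DataFrame/SQL shuffles, while spark.default.parallelism only matters for the RDD API.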

Hadoop “Unable to load native-hadoop library for your platform” warning

I assume you’re running Hadoop on 64-bit CentOS. The reason you see that warning is that the native Hadoop library $HADOOP_HOME/lib/native/libhadoop.so.1.0.0 was compiled for 32-bit. In any case, it’s just a warning and won’t impact Hadoop’s functionality. If you do want to eliminate the warning, download the Hadoop source code and recompile libhadoop.so.1.0.0 on 64-bit …
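As a quick sketch, you can confirm the library’s bitness and, as a workaround, make sure the JVM looks in the native directory at all (paths assume a standard $HADOOP_HOME layout):

```shell
# Check whether the bundled native library matches your platform
file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0

# In hadoop-env.sh: point the JVM at the native library directory
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
```

If `file` reports a 32-bit ELF on a 64-bit host, recompiling from source is the real fix; the export only helps when a correct 64-bit library is present but not being found.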