fs.hdfs.impl.disable.cache made SparkSQL very slow

This question is related to another one: Hive/Hadoop intermittent failure: Unable to move source to destination. We found that we could avoid the “Unable to move source … Filesystem closed” error by setting fs.hdfs.impl.disable.cache to true. However, we also observed that the SparkSQL queries became very slow: queries that used to finish …

What should hadoop.tmp.dir be?

It’s confusing, but hadoop.tmp.dir is used as the base for temporary directories both locally and in HDFS. The documentation isn’t great, but mapred.system.dir is set by default to “${hadoop.tmp.dir}/mapred/system”, and this defines the path on HDFS where the Map/Reduce framework stores system files. If you want these not to be tied together, you can edit your mapred-site.xml such that the definition of mapred.system.dir …
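
The coupling comes from ${...} variable expansion in Hadoop configuration values: a property whose value references ${hadoop.tmp.dir} follows wherever hadoop.tmp.dir points, until you give it an explicit value of its own. The Python sketch below models that expansion under simplifying assumptions. The resolve function is hypothetical, not Hadoop's Configuration class; the user name "alice" and the paths /data/hadoop-tmp and /hadoop/mapred/system are made-up examples; and ${user.name} is modeled as an ordinary entry even though Hadoop takes it from a Java system property. The default /tmp/hadoop-${user.name} for hadoop.tmp.dir is the stock core-default.xml value.

```python
import re
from typing import Dict

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(conf: Dict[str, str], name: str, depth: int = 0) -> str:
    """Hypothetical model: expand ${other.property} references in conf[name]."""
    if depth > 20:  # guard against reference cycles (limit chosen for this sketch)
        raise ValueError(f"substitution too deep while resolving {name!r}")

    def substitute(match: "re.Match[str]") -> str:
        ref = match.group(1)
        if ref in conf:
            return resolve(conf, ref, depth + 1)
        return match.group(0)  # leave unknown references untouched

    return _VAR.sub(substitute, conf[name])


defaults = {
    "user.name": "alice",  # assumption: stands in for the Java system property
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.system.dir": "${hadoop.tmp.dir}/mapred/system",
}

# Moving hadoop.tmp.dir also moves mapred.system.dir, because of the reference.
moved = {**defaults, "hadoop.tmp.dir": "/data/hadoop-tmp"}

# Giving mapred.system.dir its own value in mapred-site.xml breaks the link.
decoupled = {**moved, "mapred.system.dir": "/hadoop/mapred/system"}

for conf in (defaults, moved, decoupled):
    print(resolve(conf, "hadoop.tmp.dir"), "->", resolve(conf, "mapred.system.dir"))
```

The last case is the point of the answer: once mapred.system.dir has an explicit value, changing hadoop.tmp.dir no longer affects where the Map/Reduce framework keeps its system files.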