How to write a subquery and use the “IN” clause in Hive

According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select: “Hive does not support IN, EXISTS or subqueries in the WHERE clause.” You might want to look at https://issues.apache.org/jira/browse/HIVE-801 and https://issues.apache.org/jira/browse/HIVE-1799.
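On Hive versions with this limitation, a common workaround is to rewrite the IN subquery as a LEFT SEMI JOIN. The sketch below assumes two hypothetical tables, orders and customers, which are not part of the original question; only the join shape matters. To my knowledge, Hive 0.13 and later accept some IN and EXISTS subqueries in the WHERE clause directly, so the rewrite is mainly relevant to older releases.

```sql
-- Hypothetical tables (assumed for illustration only):
--   orders(order_id, customer_id)
--   customers(customer_id, country)
--
-- Intended query, written with IN:
--   SELECT o.order_id, o.customer_id
--   FROM orders o
--   WHERE o.customer_id IN (SELECT customer_id FROM customers WHERE country = 'DE');

-- Equivalent LEFT SEMI JOIN: the right-hand side is filtered in a FROM-clause
-- subquery and is referenced only in the ON condition.
SELECT o.order_id, o.customer_id
FROM orders o
LEFT SEMI JOIN (
  SELECT customer_id
  FROM customers
  WHERE country = 'DE'
) c ON (o.customer_id = c.customer_id);
```

The filter sits inside the subquery because a LEFT SEMI JOIN allows the right-hand table to appear only in the join condition, not in the SELECT list or WHERE clause.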

Difference between hive.tez.container.size and tez.task.resource.memory.mb

hive.tez.container.size: This property specifies the Tez container size. Usually its value should be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, and it should not exceed the value of yarn.scheduler.maximum-allocation-mb. As a general rule, don’t set it higher than the memory per processor, since you want 1 processor per container and …

Hive dynamic partitioning

You need to modify your SELECT. I am not sure which column of your demo staging table you want to partition on, or which column in demo corresponds to land. Whichever column it is, it should appear as the last column in the SELECT. Say your demo table’s column name is id, so your …

Setting fs.hdfs.impl.disable.cache makes SparkSQL very slow

This question is related to another one: “Hive/Hadoop intermittent failure: Unable to move source to destination”. We found that we could avoid the problem of “Unable to move source … Filesystem closed” by setting fs.hdfs.impl.disable.cache to true. However, we also observed that the SparkSQL queries became very slow: queries that used to finish …

Hive: how to show all partitions of a table?

I have a table with 1000+ partitions. The “show partitions” command only lists a small number of them. How can I show all partitions? Update: I found that “show partitions” lists exactly 500 partitions, and “select … where …” only processes those 500 partitions!

Difference between INNER JOIN and LEFT SEMI JOIN

An INNER JOIN can return columns from both tables, and can duplicate records on either side when they have more than one match. A LEFT SEMI JOIN can only return columns from the left-hand table, and yields one of each record from the left-hand table where there are one or more matches in the right-hand table …
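To illustrate the difference described above, here is a minimal sketch using two hypothetical single-column tables, a and b, whose names and contents are assumed purely for the example.

```sql
-- Hypothetical data (assumed for illustration only):
--   a(id): 1, 2
--   b(id): 1, 1, 3

-- INNER JOIN: columns from both tables can be selected, and the row of a
-- with id = 1 is returned twice because b contains two matching rows.
SELECT a.id, b.id
FROM a
JOIN b ON (a.id = b.id);

-- LEFT SEMI JOIN: only columns of a can be selected, and the row of a
-- with id = 1 is returned once, no matter how many rows of b match.
SELECT a.id
FROM a
LEFT SEMI JOIN b ON (a.id = b.id);
```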

Hive’s unix_timestamp and from_unixtime functions

From the language manual: “Convert time string with given pattern to Unix time stamp (in seconds).” The result of this function is in seconds. Your result changes with the milliseconds portion of the date, but the unix functions only support seconds. For example: SELECT unix_timestamp('10-Jun-15 10.00.00 AM', 'dd-MMM-yy hh.mm.ss a'); returns 1433930400. SELECT from_unixtime(1433930400, 'dd-MMM-yy hh.mm.ss …
