hive – Read For Learn

How to write a subquery and use the “IN” clause in Hive

February 7, 2022 by admin

According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select: “Hive does not support IN, EXISTS or subqueries in the WHERE clause.” That statement describes older releases; as of Hive 0.13, some IN and EXISTS subqueries are supported in the WHERE clause. You might want to look at: https://issues.apache.org/jira/browse/HIVE-801 and https://issues.apache.org/jira/browse/HIVE-1799

Difference between `load data inpath` and `location` in Hive?

February 7, 2022 by admin

At my firm, I see these two commands used frequently, and I’d like to be aware of the differences, because their functionality seems the same to me. They both copy the data from the directory on HDFS into the directory for the table in Hive. Are there differences that one should be aware … Read more

Hive Map-Join configuration mystery

January 31, 2022 by admin

These parameters decide when Hive uses a map join instead of a common join, which ultimately affects query performance. A map join is used when one of the joined tables is small enough to fit in memory, so it is very fast. Here’s the explanation of all parameters: hive.auto.convert.join When this parameter set … Read more
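To make the idea behind a map join concrete, here is a minimal Python sketch of a hash join in which the small table is held entirely in memory while the large table is streamed past it. This is only a model of the technique, not Hive's implementation; the table contents and the `map_join` helper are hypothetical.

```python
from collections import defaultdict


def map_join(big_rows, small_rows, key):
    """Inner-join two lists of dicts using an in-memory hash table on the small side."""
    # Build phase: the small table is loaded fully into memory, indexed by join key.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: the big table is scanned once; no sort or shuffle is needed.
    joined = []
    for row in big_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical data: a large "orders" table and a small "customers" table.
orders = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 5},
    {"cust_id": 1, "amount": 7},
]
customers = [{"cust_id": 1, "name": "ann"}]

result = map_join(orders, customers, "cust_id")
```

The sketch only works when `small_rows` fits in memory, which is the same condition the excerpt gives for choosing a map join over a common join.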

Difference between hive.tez.container.size and tez.task.resource.memory.mb

January 27, 2022 by admin

hive.tez.container.size: This property specifies the Tez container size. Its value should usually be the same as, or a small multiple (1 or 2 times) of, the YARN minimum container size yarn.scheduler.minimum-allocation-mb, and should not exceed the value of yarn.scheduler.maximum-allocation-mb. As a general rule, don’t set a value higher than the memory per processor, as you want 1 processor per container and … Read more
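The sizing rule of thumb above can be written as a small sanity check. This is a hedged Python sketch: the function name and the example numbers are hypothetical, and the 1x/2x rule is only the guideline quoted in the excerpt, not a hard limit enforced by Hive or YARN.

```python
def check_container_size(container_mb, yarn_min_mb, yarn_max_mb):
    """Return warnings for a proposed hive.tez.container.size value (all sizes in MB)."""
    warnings = []
    # The container must fit within the largest allocation YARN will grant.
    if container_mb > yarn_max_mb:
        warnings.append("exceeds yarn.scheduler.maximum-allocation-mb")
    # Guideline from the text: 1x or 2x the YARN minimum allocation.
    if container_mb not in (yarn_min_mb, 2 * yarn_min_mb):
        warnings.append("not 1x or 2x yarn.scheduler.minimum-allocation-mb")
    return warnings


# Hypothetical cluster: 2048 MB minimum and 8192 MB maximum allocation.
ok = check_container_size(4096, 2048, 8192)
too_big = check_container_size(10240, 2048, 8192)
```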

Hive dynamic partitioning

January 24, 2022 by admin

You need to modify your select. I am not sure which column of your demo staging table you want to partition on, or which column in demo corresponds to land. But whichever column it is, it should be the last column in the select. Say your demo table column name is id so your … Read more
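The rule that the partition column must come last in the select can be illustrated with a short Python helper that assembles such a statement. This is a sketch under assumptions: the helper, the staging table name `demo_stg`, and the column list are hypothetical, and the original post's actual tables may differ.

```python
def dynamic_partition_insert(target, source, columns, partition_col):
    """Build a HiveQL INSERT whose SELECT list ends with the partition column."""
    # Keep the non-partition columns in their given order.
    data_cols = [c for c in columns if c != partition_col]
    # The dynamic partition column goes last in the SELECT list.
    select_list = ", ".join(data_cols + [partition_col])
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col}) "
        f"SELECT {select_list} FROM {source}"
    )


# Hypothetical tables and columns, with "land" as the partition column.
sql = dynamic_partition_insert("demo", "demo_stg", ["land", "id", "name"], "land")
```

Dynamic partitioning usually also has to be enabled for the session through hive.exec.dynamic.partition and hive.exec.dynamic.partition.mode before a statement like this will run.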

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

January 23, 2022 by admin

I made the modifications below and was able to start the Hive shell without any errors: 1. ~/.bashrc Inside the bashrc file, add the below environment variables at the end of the file: sudo gedit ~/.bashrc 2. hive-site.xml You have to create this file (hive-site.xml) in the conf directory of Hive and add the below details 3. You also … Read more

fs.hdfs.impl.disable.cache made SparkSQL very slow

January 23, 2022 by admin

This question is related to another one: Hive/Hadoop intermittent failure: Unable to move source to destination. We found that we could avoid the problem of “Unable to move source … Filesystem closed” by setting fs.hdfs.impl.disable.cache to true. However, we also observed that the SparkSQL queries became very slow: queries that used to finish … Read more

Hive: how to show all partitions of a table?

January 22, 2022 by admin

I have a table with 1000+ partitions. The “show partitions” command only lists a small number of partitions. How can I show all partitions? Update: I found that the “show partitions” command lists exactly 500 partitions. “select … where …” only processes the 500 partitions!

Difference between INNER JOIN and LEFT SEMI JOIN

January 19, 2022 by admin

An INNER JOIN can return data from the columns of both tables, and can duplicate values if records on either side have more than one match. A LEFT SEMI JOIN can only return columns from the left-hand table, and yields one of each record from the left-hand table where there is one or more matches in the right-hand table … Read more
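The difference is easiest to see on a tiny example. The following Python sketch models the two join types over lists of dicts; it is an illustration of the semantics described above, not of how Hive executes them, and the sample rows are hypothetical.

```python
def inner_join(left, right, key):
    """Return one merged row per matching (left, right) pair."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[key] == rrow[key]
    ]


def left_semi_join(left, right, key):
    """Return each left row at most once if any right row matches it."""
    right_keys = {rrow[key] for rrow in right}
    return [lrow for lrow in left if lrow[key] in right_keys]


# Hypothetical data: id 1 matches twice on the right, id 2 never matches.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 1, "tag": "x"}, {"id": 1, "tag": "y"}]

inner = inner_join(left, right, "id")
semi = left_semi_join(left, right, "id")
```

Here `inner` holds two rows (one per match, including the right-hand `tag` column), while `semi` holds the single left-hand row for id 1 with left-hand columns only.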

Hive’s unix_timestamp and from_unixtime functions

January 18, 2022 by admin

From the language manual: “Convert time string with given pattern to Unix time stamp (in seconds).” The result of this function is in seconds. Your result changes with the milliseconds portion of the date, but the unix functions only support seconds. For example: SELECT unix_timestamp('10-Jun-15 10.00.00 AM', 'dd-MMM-yy hh.mm.ss a'); returns 1433930400. SELECT from_unixtime(1433930400, 'dd-MMM-yy hh.mm.ss … Read more
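The same round trip can be reproduced outside Hive to check the arithmetic. This Python sketch assumes the timestamp is interpreted in UTC (Hive uses its own configured time zone, so results can differ on other setups) and an English locale for the month and AM/PM names; the helper functions are hypothetical stand-ins, not Hive's code.

```python
import calendar
from datetime import datetime, timezone

# Python strptime equivalent of the Java pattern 'dd-MMM-yy hh.mm.ss a'.
FMT = "%d-%b-%y %I.%M.%S %p"


def unix_timestamp(text, fmt=FMT):
    """Parse a time string and return whole seconds since the epoch (UTC assumed)."""
    parsed = datetime.strptime(text, fmt)
    return calendar.timegm(parsed.timetuple())


def from_unixtime(seconds, fmt=FMT):
    """Format epoch seconds back into a time string (UTC assumed)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


ts = unix_timestamp("10-Jun-15 10.00.00 AM")
text = from_unixtime(ts)
```

Because both helpers work in whole seconds, any millisecond portion of an input would be lost, which mirrors the limitation described above.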