When should we use Radix sort?
Radix sort is harder to generalize than most other sorting algorithms. It requires fixed-size keys and some standard way of breaking the keys into pieces, so it rarely finds its way into general-purpose libraries.
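To make "fixed-size keys broken into pieces" concrete, here is a minimal sketch of LSD radix sort over 32-bit unsigned integer keys, splitting each key into bytes; the name radix_sort and the key_bytes parameter are just choices for this illustration.

```python
def radix_sort(keys, key_bytes=4):
    """LSD radix sort for fixed-width unsigned integer keys (default 32-bit)."""
    for shift in range(0, key_bytes * 8, 8):
        # One stable bucketing pass per byte, least significant byte first.
        buckets = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

The fixed key width is what keeps the number of passes constant; variable-length or arbitrarily comparable keys have no obvious standard decomposition, which is exactly the generalization problem the answer points at.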
hash_set is an extension that is not part of the C++ standard. Lookups should be O(1), rather than the O(log n) of set, so it will be faster in most circumstances. Another difference shows up when you iterate through the containers: set delivers its contents in sorted order, while hash_set's order is essentially random (Thanks Lou Franco). Edit: The C++11 …
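A small C++ sketch of the difference in iteration order and lookup cost; it uses std::unordered_set, the C++11 standard counterpart of the non-standard hash_set extension, as the hash-based container.

```cpp
#include <iostream>
#include <set>
#include <unordered_set>

int main() {
    std::set<int> ordered{5, 1, 3};           // O(log n) lookups, iterates in sorted order
    std::unordered_set<int> hashed{5, 1, 3};  // expected O(1) lookups, unspecified order

    for (int x : ordered) std::cout << x << ' ';  // prints: 1 3 5
    std::cout << '\n';
    for (int x : hashed) std::cout << x << ' ';   // some implementation-defined order
    std::cout << '\n';

    std::cout << std::boolalpha
              << (ordered.count(3) > 0) << ' '    // tree lookup
              << (hashed.count(3) > 0) << '\n';   // hash lookup
}
```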
From the answer here, spark.sql.shuffle.partitions configures the number of partitions used when shuffling data for joins or aggregations. spark.default.parallelism is the default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set explicitly by the user. Note that spark.default.parallelism seems to apply only to raw RDDs and is ignored when working with DataFrames. If the task you are performing …
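A hedged PySpark sketch of where each setting takes effect; the app name and the values 200 and 100 are arbitrary choices for illustration.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-demo")
         .config("spark.sql.shuffle.partitions", "200")  # DataFrame/SQL joins and aggregations
         .config("spark.default.parallelism", "100")     # RDD ops: join, reduceByKey, parallelize
         .getOrCreate())

rdd = spark.sparkContext.parallelize(range(10_000))
print(rdd.getNumPartitions())          # driven by spark.default.parallelism

df = spark.range(10_000)
counts = df.groupBy((df["id"] % 10).alias("bucket")).count()
print(counts.rdd.getNumPartitions())   # driven by spark.sql.shuffle.partitions
                                       # (adaptive query execution may coalesce these)
```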
guppy3 is quite simple to use. At some point in your code, you add a short snippet like the one sketched below, and it gives you a summary of what is on the heap. You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse. There is a graphical browser as …
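A minimal sketch of typical guppy3 usage along these lines; the list comprehension is just a stand-in workload.

```python
from guppy import hpy  # PyPI package guppy3; the import name is still guppy

hp = hpy()
hp.setrelheap()        # optional: only count objects allocated after this point

data = [str(i) * 10 for i in range(100_000)]  # stand-in allocations to inspect

heap = hp.heap()
print(heap)            # table of object types with counts and sizes
print(heap.byrcs)      # the same objects grouped by what references them
```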
One simplistic approach to measuring the “elapsed time” between events is to just grab the current date and time in SQL Server Management Studio. To calculate elapsed times, you could grab those date values into variables and use the DATEDIFF function, as sketched below. That’s just one approach. You can also get elapsed times for queries using SQL …
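A sketch of that approach in T-SQL, assuming SYSDATETIME() for the timestamps and a WAITFOR as a placeholder for the work being timed.

```sql
DECLARE @StartTime datetime2 = SYSDATETIME();

-- ... the query or batch being timed; WAITFOR is just a placeholder ...
WAITFOR DELAY '00:00:02';

DECLARE @EndTime datetime2 = SYSDATETIME();

SELECT DATEDIFF(millisecond, @StartTime, @EndTime) AS ElapsedMs;
```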
I need to read a large text file of around 5-6 GB line by line using Java. How can I do this quickly?
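One common approach is to stream the file through a buffered reader so that only one line is in memory at a time; a minimal sketch, where the file path is hypothetical and Path.of requires Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargeFileReader {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("huge-file.txt");  // hypothetical file

        // Reads line by line through a buffer; memory use stays constant
        // regardless of the total file size.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            long count = 0;
            while ((line = reader.readLine()) != null) {
                count++;  // process each line here
            }
            System.out.println("Lines read: " + count);
        }
    }
}
```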
The comment was referring to Big-O notation. Briefly: O(1) means constant time, independent of the number of items. O(N) means time proportional to the number of items. O(log N) means time proportional to log(N). Basically, any ‘O’ notation means an operation will take time up to a maximum of k*f(N), where: k is a …
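A small Python illustration of the three growth rates, using membership tests as the operation; the helper names and data sizes are just for this sketch.

```python
import bisect

data = list(range(1_000_000))  # sorted list of items
lookup = set(data)             # hash set over the same items

def contains_o1(x):
    # O(1): hash lookup, independent of the number of items
    return x in lookup

def contains_on(x):
    # O(N): linear scan, proportional to the number of items
    return any(item == x for item in data)

def contains_olog_n(x):
    # O(log N): binary search on the sorted list
    i = bisect.bisect_left(data, x)
    return i < len(data) and data[i] == x
```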