Convert pandas Series to DataFrame
Rather than creating two temporary DataFrames, you can pass both Series as values in a dict to the DataFrame constructor. There are many ways to construct a DataFrame; see the docs.
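A minimal sketch of that approach, assuming two hypothetical Series named email and count that share the same index (the names and data are made up for illustration):

```python
import pandas as pd

# Hypothetical example data: two Series with the same default index.
email = pd.Series(["a@example.com", "b@example.com"])
count = pd.Series([3, 5])

# Pass both Series in a dict; the dict keys become the column names.
df = pd.DataFrame({"email": email, "count": count})

# For a single Series, to_frame() does the conversion directly.
single = email.to_frame(name="email")
```

The dict route aligns the Series on their index, so Series with different indexes would produce NaN where labels do not match.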
Depending on the pandas version, the pd.DataFrame constructor may not accept a dictionary view (such as the result of d.items()) as data. Converting the view to a list first works either way. The docs describe the accepted input as: data : numpy ndarray (structured or homogeneous), dict, or DataFrame. Equivalently, you can use pd.DataFrame.from_dict, which accepts a dictionary directly.
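As a rough illustration, assuming a small hypothetical dict d with scalar values (the column names key and value are also made up):

```python
import pandas as pd

# Hypothetical input: a plain dict mapping labels to scalar values.
d = {"a": 1, "b": 2}

# Convert the dictionary view to a list of (key, value) tuples first.
df_items = pd.DataFrame(list(d.items()), columns=["key", "value"])

# Alternatively, let from_dict use the keys as the row index.
df_index = pd.DataFrame.from_dict(d, orient="index")
```

The first form gives one row per item with two columns; the second keeps the keys as the index and puts the values in a single column.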
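You need nunique to count the distinct values in each group. If the values carry stray ' characters, strip them first; Jon Clements suggested an alternative in the comments. You can retain the column name by aggregating with agg instead. The difference is that nunique() returns a Series and agg() returns a DataFrame.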
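A small sketch of both variants, assuming a hypothetical frame with a quoted domain column and an id column (neither name comes from the original question):

```python
import pandas as pd

# Hypothetical data: domains wrapped in stray single quotes.
df = pd.DataFrame(
    {"domain": ["'a.com'", "'a.com'", "'b.com'"], "id": [1, 2, 2]}
)

# Strip the surrounding ' characters before grouping.
df["domain"] = df["domain"].str.strip("'")

# nunique() on the grouped column returns a Series indexed by domain.
per_domain = df.groupby("domain")["id"].nunique()

# agg() with as_index=False returns a DataFrame and keeps the column name.
as_frame = df.groupby("domain", as_index=False).agg({"id": "nunique"})
```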
df.to_numpy() is better than df.values, and here's why. It's time to stop using values and as_matrix(). pandas v0.24.0 introduced two new ways of obtaining arrays from pandas objects: the to_numpy() method, which is defined on Index, Series, and DataFrame objects, and the array attribute, which is defined on Index and Series objects only. If you visit the v0.24 docs for .values, you will see a big red warning that says: Warning: We recommend …
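For illustration, a minimal sketch on a hypothetical two-column frame (requires pandas 0.24 or later):

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing an integer and a float column.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# to_numpy() returns a NumPy ndarray with a common dtype for all columns.
arr = df.to_numpy()

# On a Series, to_numpy() gives an ndarray, while .array gives the
# pandas-level array object that backs the Series.
col_np = df["a"].to_numpy()
col_arr = df["a"].array
```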
You can create a subset of the data with your condition and then use shape or len. Performance is interesting: the fastest solution is to compare the NumPy array with the value and sum the result.
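The three counting approaches could look like this, assuming a hypothetical column a and the condition a == 1 (no timings are claimed here):

```python
import pandas as pd

# Hypothetical data: count the rows where column "a" equals 1.
df = pd.DataFrame({"a": [1, 2, 1, 3]})

# Subset with the condition, then take the row count from shape or len.
n_shape = df[df["a"] == 1].shape[0]
n_len = len(df[df["a"] == 1])

# Compare the underlying NumPy array and sum the boolean result.
n_sum = int((df["a"].to_numpy() == 1).sum())
```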
Edited for pandas 0.22+, considering the deprecation of the use of dictionaries in a groupby aggregation. We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns. Option 1: use agg ← link to docs. Option 2 (more for less): use describe ← link to …
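A sketch of the two options under assumed data; the frame, the column names, and the rename_map dictionary are hypothetical and only mirror the idea described above:

```python
import pandas as pd

# Hypothetical data to aggregate per category.
df = pd.DataFrame({"category": ["a", "a", "b"], "value": [1, 3, 5]})

# Keys name the aggregation functions; the dict itself renames the columns.
rename_map = {"sum": "Sum", "mean": "Mean"}

# Option 1: pass the function names to agg, then rename the result columns.
summary = (
    df.groupby("category")["value"]
    .agg(list(rename_map))
    .rename(columns=rename_map)
)

# Option 2: describe() yields count, mean, std, min, quartiles and max.
described = df.groupby("category")["value"].describe()
```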
For a DataFrame df, one can use any of the following: len(df.index), df.shape[0], or df[df.columns[0]].count() (which equals the number of non-NaN values in the first column).
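This is only an illustration of the three expressions on a hypothetical frame, not the code behind any timing plot. A NaN is placed in the first column to show where count() differs:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in the first column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [1, 2, 3]})

rows_by_index = len(df.index)               # total number of rows
rows_by_shape = df.shape[0]                 # total number of rows
non_nan_first = df[df.columns[0]].count()   # non-NaN values in column "a"
```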
Use GroupBy.sum:
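A minimal sketch, assuming a hypothetical frame with a key column and a numeric amount column:

```python
import pandas as pd

# Hypothetical data: sum "amount" within each "key".
df = pd.DataFrame({"key": ["x", "y", "x"], "amount": [10, 20, 30]})

# GroupBy.sum returns one row per group; as_index=False keeps "key" a column.
totals = df.groupby("key", as_index=False)["amount"].sum()
```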
You are passing a string representation of a dict to the DataFrame constructor, not a dict itself, which is why you get that error. If you want to keep your code, you could parse the string back into a dict first. But it would be better not to create the string in the first place, and instead to put the data directly in a …
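One way to recover a dict from such a string is ast.literal_eval from the standard library. This is an assumption about a reasonable approach, not necessarily what the answer used, and the string below is hypothetical:

```python
import ast

import pandas as pd

# Hypothetical input: a dict that has been serialized to a string.
text = "{'col1': [1, 2], 'col2': [3, 4]}"

# Safely evaluate the literal back into a real dict (assumed approach).
data = ast.literal_eval(text)

# A real dict is accepted by the DataFrame constructor.
df = pd.DataFrame(data)
```

literal_eval only evaluates Python literals, which makes it a safer choice than eval for untrusted input.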
To select rows whose column value equals a scalar, some_value, use ==. To select rows whose column value is in an iterable, some_values, use isin. Combine multiple conditions with &. Note the parentheses: due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses, the expression is parsed differently, which results in a Truth …
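The three selection patterns might look like this on a hypothetical frame with columns A and B (the data and thresholds are made up):

```python
import pandas as pd

# Hypothetical data for the selection examples.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["x", "y", "x", "z"]})

# Rows whose column value equals a scalar.
equal_rows = df[df["A"] == 2]

# Rows whose column value is in an iterable.
member_rows = df[df["B"].isin(["x", "z"])]

# Multiple conditions: each comparison needs its own parentheses,
# because & binds more tightly than >= and <=.
range_rows = df[(df["A"] >= 2) & (df["A"] <= 3)]
```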