Converting Pandas dataframe into Spark dataframe error

You need to make sure the columns of your pandas dataframe hold values that match the type Spark infers for them. If the dataframe's info output lists something like:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5062 entries, 0 to 5061
Data columns (total 51 columns):
SomeCol                    5062 non-null object
Col2                       5062 non-null object

and you are getting that error, try:

df[['SomeCol', 'Col2']] = df[['SomeCol', 'Col2']].astype(str)

Make sure that str is actually the type you want those columns to have. When Spark infers a column's type from Python objects, it looks at some of the values and makes a guess. If that guess doesn't hold for all the data in the column(s) being converted from pandas to Spark, the conversion fails.
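To find which object columns are likely to trip the inference, you can check each one for mixed Python types before converting. The following is only a rough sketch: the helper name mixed_type_columns and the sample data are made up for illustration, and it assumes that casting the offending columns to str is acceptable for your data.

```python
import pandas as pd


def mixed_type_columns(df):
    """Map each object-dtype column holding more than one Python type
    to the set of type names found in it (hypothetical helper)."""
    mixed = {}
    for col in df.columns:
        if df[col].dtype != object:
            continue  # only object columns can hold arbitrary Python values
        kinds = {type(value).__name__ for value in df[col]}
        if len(kinds) > 1:
            mixed[col] = kinds
    return mixed


# Made-up sample data: SomeCol mixes str, int and float values.
df = pd.DataFrame({
    "SomeCol": ["a", 1, 2.5],
    "Col2": ["x", "y", "z"],
    "Num": [1, 2, 3],
})

bad = mixed_type_columns(df)
if bad:
    cols = list(bad)
    # Cast only the offending columns, and only if str is really the type you want.
    df[cols] = df[cols].astype(str)

# With an active SparkSession named `spark` (assumed, not created here),
# the conversion would then be: spark_df = spark.createDataFrame(df)
```

Casting only the columns the helper reports leaves columns that already have a clean numeric dtype untouched.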
