You need to make sure the dtypes of your pandas DataFrame columns match the types Spark is inferring. If df.info() on your pandas DataFrame shows something like:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5062 entries, 0 to 5061
Data columns (total 51 columns):
SomeCol    5062 non-null object
Col2       5062 non-null object
and you're getting that error, try:
df[['SomeCol', 'Col2']] = df[['SomeCol', 'Col2']].astype(str)
Now, make sure str (via .astype(str)) is actually the type you want those columns to be. Basically, when the underlying Java code tries to infer the type from an object in Python, it looks at some of the values and makes a guess. If that guess doesn't hold for all the data in the column(s) it's trying to convert from pandas to Spark, the conversion fails.
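As a rough sketch of how you might find the columns that need this treatment: the snippet below looks for object columns that hold more than one Python type and casts only those to str. The sample data, the helper name mixed_type_columns, and the column Col3 are made up for illustration, and the final Spark call assumes you already have a SparkSession named spark.

```python
import pandas as pd

# Hypothetical sample data: SomeCol mixes int, str and float values,
# so pandas stores it with the generic object dtype.
df = pd.DataFrame({
    "SomeCol": [1, "two", 3.0],
    "Col2": ["a", "b", "c"],
    "Col3": [1.5, 2.5, 3.5],
})


def mixed_type_columns(frame):
    """Return names of object columns whose non-null values span several Python types."""
    mixed = []
    for name in frame.columns:
        if frame[name].dtype != object:
            continue  # numeric, datetime, etc. columns already have a single dtype
        kinds = {type(value) for value in frame[name].dropna()}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


mixed = mixed_type_columns(df)

# Cast only the offending columns, so every value in them is a string.
df[mixed] = df[mixed].astype(str)

# Assumption: `spark` is an existing SparkSession.
# spark_df = spark.createDataFrame(df)
```

Casting to str is only one option. If a column is meant to be numeric, cleaning the stray values and casting to a numeric dtype is probably the better fix.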