How can I fix wp_insert_comment failure when ‘comment_content” includes slanted apostrophe in Excel csv source data

If processed as saved, comments with slanted apostrophes/curled single
quotes – – are simply not inserted – they do not appear either in
backend comments.php or in the post output.

That’s because the wpdb class checks if the comment content contains any invalid (UTF-8) text, and if yes, then wpdb rejects the content and therefore does not run the insert query (which originates from wp_insert_comment()).

If I use esc_html() on the comment content, the comments will be
processed – are will be listed in the backend and in post output – but
the comment content will be emptied.

Similar to the first case above, where esc_html() uses wp_check_invalid_utf8() to check if the comment content contains any invalid UTF-8 text, and if so, then an empty string is returned (by default).

I’ve tried some other means to change the comment content
programmatically, like str_replace-ing with ', but I haven’t had
any luck with that: The affected comments are still rejected.

Yes, because the slanted quote you passed to str_replace() isn’t the same slanted quote in the comment content, i.e. their encoding does not match, only their appearance that looks alike.

Therefore… You may simply need to convert the file encoding to UTF-8.

And I mean, convert it before uploading the file. 🙂 Have you already tried doing so?

Or if you’re using Windows, then you should know that MS Excel by default saves the CSV file using the ANSI ( i.e. Windows-1252 or Western European (Windows) ) encoding, so you should instead save it as UTF-8 — in the “Save As” window, click the “Tools” next to the “Save” button, select “Web Options”, then go to the “Encoding” tab and choose “Unicode (UTF-8)”.

Regardless of the method by which I turn the Sheet into a CSV file
(direct download as CSV, download as xlsx then save as csv, copy-paste
into Excel and save), I get some version of the same problem

  • If by the “direct download as CSV”, you mean the File → Download → “Comma-separated values (.csv, current sheet)” option in the Google Sheets (web) app, then perhaps after you downloaded the file, you edited it and saved it using MS Excel (with the encoding set to the default)?

  • Because Google Sheets actually encodes the sheet in UTF-8 (when exporting the sheet for download), so if you uploaded the file as-is (without editing it) to your website, then you wouldn’t have the encoding issues.

And if you want to be sure the encoding is Windows-1252, then try one of these (with the already-uploaded data), where if the encoding is indeed Windows-1252, then the 2nd and 3rd below would give you the slanted quote as-is (i.e. just as how it appears on-screen — ), whereas the first one would give you ’ ( and not the �.. ).

  • $comment_content = mb_convert_encoding( $insert_array[4], 'HTML-ENTITIES', 'Windows-1252' );

  • $comment_content = mb_convert_encoding( $insert_array[4], 'UTF-8', 'Windows-1252' );

  • $comment_content = iconv( 'Windows-1252', 'UTF-8', $insert_array[4] );

Or actually, if you use Notepad++, then you could easily check the encoding by looking at the bottom-right corner of the editor. 🙂

UPDATE

  • Actually, when I said save it as UTF-8, I mean, in Google Sheets, export the file as an Excel file (.xlsx) and then in Excel, export it as CSV with the UTF-8 encoding. Have you tried doing so or is that what you actually did?

  • And if wp_insert_category() works, but not wp_create_category(), then it’s likely that category_exists() fails and it could be due to many reasons…

But anyway, now that you know what the file encoding is (which is Windows-1252), then if you just can’t get the CSV data to work without giving you the encoding (or invalid characters) issues, then you can try to manually encode the data (i.e. each item in $insert_array) to UTF-8 like so:

function my_fix_invalid_utf8( $text ) {
    if ( function_exists( 'iconv' ) ) {
        return iconv( 'Windows-1252', 'UTF-8', $text );
    } elseif ( function_exists( 'mb_convert_encoding' ) ) {
        return mb_convert_encoding( $text, 'UTF-8', 'Windows-1252' );
    } elseif ( function_exists( 'utf8_encode' ) ) {
        // This would not fix the APPEARANCE of the text (i.e. you'd see something like '??' on
        // the page), but this would at least let you insert the comment or text to the database..
        return utf8_encode( $text );
    }

    return $text; // if all else fails, return the text as-is
}

$insert_array = array_map( 'my_fix_invalid_utf8', $insert_array );

And I have no affiliations with the author/devs of Notepad++, but you could actually easily convert the encoding using Notepad++.. so give it a try? 🙂 (the below file was BTW, exported from an Excel file)

Easily convert file encoding using Notepad++ — Preview in a Windowws 10 computer