PHP library that can merge stylesheet with inline style [closed]

I am trying to replicate the output you provided in your example above, but I am only able to get output along these lines:

<p class=MsoNormal>
    <span class=MsoIntenseReference>
        <span style="color:red;text-transform:none;letter-spacing:0pt;font-weight:normal;text-decoration:none">
            Red Example text
        </span>
    </span>
</p>

As you can see, Microsoft Word (2010) is inserting predefined class names for the paragraph and span tags, and it is also adding an extra span around the span that contains the text.

How were you able to assign a class name to the span that wraps your text?

For reference, I am saving my HTML file as “Web Page, Filtered”, with Filtered being the key to removing the “dirty” formatting Word would otherwise apply to the document.

If I can replicate the output you are getting in your example above, then we may be able to work towards an easier solution.

PS. I do apologize that this response to your question is posted as an answer; I am not yet able to post a comment. I intend to follow up with additional commentary that works towards a complete answer, as I have some suggestions to make once I get further insight into my initial question above.

UPDATE

NOTE: This is intended as a guide to set you off on the right path. The code below is example code only and is missing some functionality that you will need to write yourself.

Ideally you want your XML-RPC script to process the content you feed it in two steps.

1) Search for inline styles and replace them with ones that are compatible with WordPress, using regular expressions (RegEx).

2) Post your newly sanitized content to your blog as a post.

Since you won’t know the exact inline-style format that your MS Word document will output, you can use RegEx to search for and replace the text between known characters whenever it meets certain criteria.

Take this for example:

<span style="color:green">Integer</span>

Through RegEx you might search for the word “green” between <span and >, and wherever you find a match you replace the whole opening tag with your desired class and inline style:

<span class="green" style="color:green;font-weight:bold;font-size:10pt">
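As a minimal sketch of that search and replace, something like the following could work. The function name replace_span_style() and its parameters are made up for illustration, and the pattern assumes Word writes the style attribute with double quotes, as in the sample above; real Word output may need a looser pattern.

```php
<?php
// Hypothetical helper (not part of WordPress or PHP): replaces the opening
// tag of every <span> whose inline style sets "color" to the given value.
function replace_span_style($html, $colour, $replacementTag) {
    // (?<![\w-]) stops "background-color" from being treated as "color".
    $pattern = '/<span\b[^>]*\bstyle\s*=\s*"[^"]*(?<![\w-])color\s*:\s*'
             . preg_quote($colour, '/')
             . '\b[^"]*"[^>]*>/i';

    return preg_replace($pattern, $replacementTag, $html);
}

$input  = '<span style="color:green">Integer</span>';
$output = replace_span_style(
    $input,
    'green',
    '<span class="green" style="color:green;font-weight:bold;font-size:10pt">'
);

echo $output;
```

Only the opening tag is rewritten, so the text and the closing </span> are left alone. The replacement tag is passed in as a plain string here; if it ever contains a $ or a backslash you would need to escape it, since preg_replace() treats those as back-references.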

To make this inline style available in the post editor screen of the WordPress dashboard, you will need to add some extra options to the TinyMCE editor “styles” dropdown, which would look something like:

    array(
        'title' => 'Bold Green Text',
        'classes' => 'green',
        'inline' => 'span',
        'styles' => array(
            'color' => 'green',
            'fontWeight' => 'bold',
            'fontSize' => '10pt'
        )
    )

You can read more about that at,

1) HERE

2) AND HERE

Essentially, the custom styles you add to the editor should match the ones your RegEx function produces.
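For what it's worth, here is a rough sketch of how a style definition like the one above is usually registered, for example from your theme's functions.php. The two callback names are hypothetical; mce_buttons_2 and tiny_mce_before_init are the WordPress filter hooks I am assuming you would use for this.

```php
<?php
// Hypothetical callback: adds the styles dropdown to the second toolbar row.
function my_mce_add_styleselect($buttons) {
    array_unshift($buttons, 'styleselect');
    return $buttons;
}

// Hypothetical callback: registers the custom "Bold Green Text" format.
function my_mce_style_formats($init) {
    $style_formats = array(
        array(
            'title'   => 'Bold Green Text',
            'inline'  => 'span',
            'classes' => 'green',
            'styles'  => array(
                'color'      => 'green',
                'fontWeight' => 'bold',
                'fontSize'   => '10pt',
            ),
        ),
    );

    // TinyMCE receives style_formats as a JSON-encoded string.
    $init['style_formats'] = json_encode($style_formats);
    return $init;
}

// Only hook in when WordPress is loaded, so the file can also be run alone.
if (function_exists('add_filter')) {
    add_filter('mce_buttons_2', 'my_mce_add_styleselect');
    add_filter('tiny_mce_before_init', 'my_mce_style_formats');
}
```

The class name and styles here deliberately mirror the replacement tag used in the RegEx step, so content posted through XML-RPC and content styled by hand in the editor end up with the same markup.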

Your XML-RPC script (for example, post-via-xmlrpc.php) would then look something along the lines of:

<?php

// Your RegEx function for processing your source file

function sanitize_content($content) {

    // do your regular expression stuff here

    return $content;

}

// Your XML-RPC function (requires the PHP xmlrpc and curl extensions)

function wpPostXMLRPC($title,$content,$rpcurl,$username,$password,$categories=array(1)){

    $categories = implode(",", $categories);
    // Sanitize the content before it is added to the post body
    $XML = "<title>$title</title>"."<category>$categories</category>".sanitize_content($content);

    // blogger.newPost parameters: appkey, blogid, username, password, content, publish
    $params = array('','',$username,$password,$XML,1);
    $request = xmlrpc_encode_request('blogger.newPost',$params);

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
    curl_setopt($ch, CURLOPT_URL, $rpcurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // The original 1 second timeout is easy to hit; 10 is an arbitrary, safer value
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $response = curl_exec($ch);
    curl_close($ch);

    return $response;
}

// Example content; single quotes avoid clashing with the HTML attribute quotes
$content = '<span class="important">example content is here</span>';

// Do stuff here to initiate your post function

?>

For this example I’ve included the $content string within the script, but of course you would want to pass the contents of your MS Word HTML file to this variable instead. You can do this via a form, by file path, and so on.

Assuming your post-via-xmlrpc.php is accessible via your localhost, you would run this process by visiting:

http://localhost/post-via-xmlrpc.php

The most difficult part of this entire process is the regular expression (RegEx) search and replace function. It needs to find <body> and remove everything before it, find </body> and remove everything after it, remove the <body> and </body> tags themselves, and then parse through the remaining content, replacing inline styles as required.
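The body-stripping part of that can be sketched roughly as follows. extract_body() is a name I have made up for illustration, and the sample HTML is only a guess at what a filtered Word file looks like; the inline-style replacement would then run on whatever this returns.

```php
<?php
// Hypothetical helper: keeps only what sits between <body ...> and </body>.
function extract_body($html) {
    // "s" lets . match newlines, "i" ignores case; the body tag may carry attributes.
    if (preg_match('/<body\b[^>]*>(.*?)<\/body>/is', $html, $matches)) {
        return trim($matches[1]);
    }

    // No <body> element found: hand the input back unchanged.
    return $html;
}

$html = "<html><head><title>Doc</title></head>\n"
      . "<body lang=EN-US><p class=MsoNormal>Red Example text</p></body></html>";

echo extract_body($html);
```

A single capture group does all three removals at once: everything before <body>, everything after </body>, and the two tags themselves.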

There really isn’t any need to mess around with another PHP library when it can all be done from a self-contained XML-RPC script designed to sanitize your input.