How to get all files inserted into (but not attached to) a post

Yes, you can do this! Is it simple/easy? Sort of… Is it fast and scalable? Oh dear god no 😱.

The Super Expensive Solution

The solution here is this function:

$post_id = url_to_postid( $url );

The problem is that this function is slow and expensive to call. If you call it for every URL in every post you display, your DB server will be under very high strain and may fall over unless it’s configured correctly and running on a dedicated machine.

Note: you can’t call this function early. It has to run on the setup_theme hook or later, otherwise it will cause a fatal error.

With this function, we can pull out every URL in the post’s content and test whether it begins with our site’s URL. If it does, we check whether it matches any of the post’s attachments; if it doesn’t, we know it’s an unattached file and can use this function to look up its post ID.
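The steps above can be sketched in plain PHP. This is a minimal sketch, not the answer’s own code: the function name find_unattached_ids() is hypothetical, URLs are pulled out with a deliberately naive regex, and the expensive lookup is passed in as a callable so the logic can be exercised without WordPress. On a real site you would pass home_url() as the site URL, the URLs of the post’s attachments (for example built from get_attached_media() and wp_get_attachment_url()), and 'url_to_postid' as the lookup.

```php
<?php
/**
 * Hypothetical helper: return post IDs for URLs in $content that point at
 * this site but are not among the post's attached media URLs.
 *
 * @param string   $content       Post content (HTML).
 * @param string   $site_url      Site URL, e.g. home_url() in WordPress.
 * @param string[] $attached_urls URLs of media already attached to the post.
 * @param callable $lookup        URL => post ID (0 if none), e.g. 'url_to_postid'.
 * @return int[] Unique, non-zero post IDs.
 */
function find_unattached_ids( string $content, string $site_url, array $attached_urls, callable $lookup ): array {
    // Require a trailing slash so "https://example.com" doesn't also
    // match "https://example.com.evil.org".
    $prefix = rtrim( $site_url, '/' ) . '/';

    // Naive extraction: anything that looks like an http(s) URL up to a
    // quote, whitespace or angle bracket. Fine for a sketch, not all HTML.
    preg_match_all( '#https?://[^\s"\'<>]+#i', $content, $matches );

    $ids = [];
    foreach ( array_unique( $matches[0] ) as $url ) {
        // Only URLs on our own site can be local media.
        if ( strpos( $url, $prefix ) !== 0 ) {
            continue;
        }
        // Already attached? Then it's not what we're looking for.
        if ( in_array( $url, $attached_urls, true ) ) {
            continue;
        }
        // The expensive part: one lookup per remaining URL.
        $id = (int) $lookup( $url );
        if ( $id > 0 ) {
            $ids[ $id ] = $id; // keyed by ID to de-duplicate
        }
    }
    return array_values( $ids );
}
```

Keep in mind that inserted images often point at resized copies (a filename ending in something like -300x200.jpg), which won’t equal the attachment’s main URL, so an exact string match like this can misclassify some attached files.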

How to retrieve all the URLs in a post’s content is a topic for another question, however. Testing whether a URL belongs to an attached attachment is a simple comparison in a loop (for each attached item, check whether its URL matches the URL we’re testing).
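For illustration, here is one way to do both parts. It is a sketch that assumes PHP’s DOM extension is available; the names extract_content_urls() and is_attached_url() are hypothetical. It only collects src and href attributes, so URLs that appear as plain text or inside shortcodes are not picked up. WordPress also ships wp_extract_urls(), which may be good enough for your content.

```php
<?php
/**
 * Hypothetical helper: collect URLs from src and href attributes in post HTML.
 *
 * @return string[] Unique URLs, in no guaranteed order.
 */
function extract_content_urls( string $html ): array {
    // DOMDocument::loadHTML() rejects an empty string.
    if ( trim( $html ) === '' ) {
        return [];
    }
    $doc = new DOMDocument();
    // Post content is an HTML fragment; silence libxml warnings about
    // HTML5 tags and the like, then restore the previous setting.
    $previous = libxml_use_internal_errors( true );
    // The XML declaration is a common trick to make libxml read UTF-8.
    $doc->loadHTML( '<?xml encoding="utf-8"?>' . $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $urls  = [];
    $xpath = new DOMXPath( $doc );
    foreach ( $xpath->query( '//@src | //@href' ) as $attr ) {
        $urls[] = $attr->value;
    }
    return array_values( array_unique( $urls ) );
}

/**
 * The "foreach attached thing" test: does $url match any attached URL?
 */
function is_attached_url( string $url, array $attached_urls ): bool {
    foreach ( $attached_urls as $attached ) {
        if ( $attached === $url ) {
            return true;
        }
    }
    return false;
}
```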

Speeding Things Up

There are a few things that might mitigate the cost of this function:

  • Run the process on the save_post hook (which fires when a post is created or updated) and store the result in post meta
  • Wrap the function in a caching layer to speed things up (only effective if there’s an object caching solution in place, such as memcached, and not as effective as the previous mechanism)
  • Use a less precise method, written by Pippin’s Plugins, that uses a direct DB query. This bypasses caches and is still expensive, but not as expensive as url_to_postid. It also only works with GUIDs, hence the accuracy trade-off
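The caching option could look something like this. It is a sketch under assumptions: the function name cached_url_to_postid() and the cache group 'unattached_media' are hypothetical, while wp_cache_get() and wp_cache_set() are WordPress’s object cache API. Without a persistent object cache (memcached or similar), the cached values only live for the current request.

```php
<?php
/**
 * Hypothetical wrapper: memoise url_to_postid() in the WordPress object cache.
 */
function cached_url_to_postid( string $url ): int {
    $key   = 'url_to_postid_' . md5( $url );
    $group = 'unattached_media'; // hypothetical cache group name

    // $found distinguishes a cached 0 ("no post") from a cache miss.
    $found = false;
    $id    = wp_cache_get( $key, $group, false, $found );
    if ( $found ) {
        return (int) $id;
    }

    $id = (int) url_to_postid( $url );
    // Cache misses (0) too, so unknown URLs aren't looked up again.
    wp_cache_set( $key, $id, $group, 3600 ); // expire after one hour
    return $id;
}
```

To get the first bullet’s benefit as well, you could run the lookups from a save_post callback and store the resulting IDs with update_post_meta(), so page views only read post meta.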

An Even More Expensive Solution

Query all the attachments via WP_Query, load all of them into memory, then check each attachment one by one to see whether it appears in your post.
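For comparison, the "check each attachment one by one" step boils down to a loop like the following. This is a hypothetical sketch (attachments_in_content() is not a WordPress function), shown only to make the cost visible, not as a recommendation. It assumes you already have every attachment’s URL keyed by ID, which in WordPress means an unbounded WP_Query for post_type 'attachment' with posts_per_page set to -1.

```php
<?php
/**
 * Hypothetical brute-force check: which of the site's attachments appear in $content?
 *
 * @param string            $content         Post content (HTML).
 * @param array<int,string> $attachment_urls Attachment ID => file URL, for the whole library.
 * @return int[] IDs of attachments whose URL occurs in the content.
 */
function attachments_in_content( string $content, array $attachment_urls ): array {
    $found = [];
    // One substring search over the whole post per attachment, so the
    // work grows with (number of attachments) x (length of the content).
    foreach ( $attachment_urls as $id => $url ) {
        if ( strpos( $content, $url ) !== false ) {
            $found[] = $id;
        }
    }
    return $found;
}
```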

This is by far the slowest and most expensive way to do this:

  • As you upload more media, the query has to load more of it; on any site with more than 30 or 40 attachments you’re going to get out-of-memory fatal errors
  • It’s a heavy database operation: your DB has to send all of the attachments, and if several people are browsing at once, this gets very problematic
  • It will never scale past 5-10 concurrent users, and that’s if you’re lucky
  • It’s slow: checking every attachment takes time
