How to get JSON from an external API on every page load if it doesn't exist in a transient?

What is the best option to fetch the different online statuses automatically “on-the-fly” in real time?

What is the most efficient way of fetching (external) data like these? Optimally everything happens “on-the-fly” as soon as the source differs from the site. At least it should run a check on every page update. What about caching? I have read a few lines about transients. Would that work in this context?

Not on the fly.

Right now you’ve run head first into a fundamental trade-off. The more “realtime” you make this, the slower it will be.

  • At an absolute minimum your page will always be at least as slow as the API.
  • If I refresh the page half a second later it’ll pay that cost all over again even though it’s just fetched the data.
  • If that service goes down, so does that page.

Your trade-off is freshness vs speed. You can make the page blazing fast and cheap to generate at the cost of immediacy, or you can make it fully realtime, but that costs resources and time.

Opting to always go for 100% realtime doesn’t work, not just because it slows your page down, but because by the time the page has loaded and the viewer has looked through the data, it’s already out of date by your criteria.

So instead:

  • Don’t request on every page load for every person. You never asked for bespoke couture responses tailored to each viewer, you asked for up-to-date responses. Why make the same request multiple times at the same time for multiple viewers?
  • Satisfice: pull in the data in the background in a cron job and update your posts that way, say every 5 minutes, or more often than that; the interval is up to you. Put that figure in the page so users know the data is never older than 5 minutes. This also reduces your API calls to the remote server, which might save cost and bandwidth.
    • You want to put the fetching of the data and the display of the data as far away from each other as possible, ideally in separate requests so that they can happen at the same time. You don’t want to involve the back office every time a customer arrives, just send regular updates to front of house with the “new information”. Displaying the data you have stored is cheap, fast, and scales. Updating that data is not.
  • Always cache remote data. Even a tiny cache expiration of 10 seconds will have enormous benefits if you have a lot of visitors, and makes almost no difference if only one or two people visit. Just make sure this cache lasts longer than it takes to make the requests.
  • You may be able to conceal the almost-realtime/pseudo-realtime nature of things with smart UX, e.g. most services don’t tell you that XYZ is online, rather they tell you they were recently online, or consider any presence in the last 5 minutes to be online. If you can say “seen 10 minutes ago” then that’s enough for most people.
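As a rough illustration of the caching point, here is a minimal sketch of a transient-backed fetch. The function name, transient key, URL and 30 second lifetime are all placeholders of mine, not something from your setup, and it assumes the API returns a JSON object or array:

```php
<?php
/**
 * Return the agent statuses, hitting the remote API at most once per cache window.
 * Hypothetical names: myprefix_get_agent_statuses, the transient key, and the URL.
 */
function myprefix_get_agent_statuses() {
    $cache_key = 'myprefix_agent_statuses';

    // get_transient() returns false when the value is missing or has expired.
    $cached = get_transient( $cache_key );
    if ( false !== $cached ) {
        return $cached;
    }

    $response = wp_remote_get(
        'https://api.example.com/agents/status', // placeholder URL
        array( 'timeout' => 5 )
    );

    // If the remote service is down or misbehaving, return an empty list
    // instead of taking the page down with it.
    if ( is_wp_error( $response ) || 200 !== wp_remote_retrieve_response_code( $response ) ) {
        return array();
    }

    $statuses = json_decode( wp_remote_retrieve_body( $response ), true );
    if ( ! is_array( $statuses ) ) {
        return array();
    }

    // Cache for 30 seconds, which is longer than the 5 second request timeout above.
    set_transient( $cache_key, $statuses, 30 );

    return $statuses;
}
```

With something like this, only the first visitor after the cache expires pays for the remote request; everyone else in that window gets the stored copy.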

If you need realtime status on whether they’re online or not, then you may never be able to get that. For true realtime status you need your frontend browser connected via a socket to the remote API so that the 3rd party can push that data to your viewers. Otherwise you’ll need to do what almost all sites do but pretend they don’t: “Ask really really often and pretend it’s realtime”, aka polling via AJAX.

And store your data in your agent CPT.
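A sketch of how the background update could look, under several assumptions of mine: the CPT is registered as `agent`, each agent post has a `remote_agent_id` meta value, and the API returns a JSON object mapping remote IDs to a status string. The hook name, schedule name, meta keys and URL are placeholders too:

```php
<?php
// Register a 5 minute interval, since WordPress has no built-in schedule that short.
add_filter( 'cron_schedules', function ( $schedules ) {
    $schedules['myprefix_five_minutes'] = array(
        'interval' => 5 * MINUTE_IN_SECONDS,
        'display'  => 'Every 5 minutes',
    );
    return $schedules;
} );

// Schedule the recurring event once; wp_next_scheduled() prevents duplicates.
add_action( 'init', function () {
    if ( ! wp_next_scheduled( 'myprefix_refresh_agent_statuses' ) ) {
        wp_schedule_event( time(), 'myprefix_five_minutes', 'myprefix_refresh_agent_statuses' );
    }
} );

// The background job: fetch once, then store the result on each agent post.
add_action( 'myprefix_refresh_agent_statuses', function () {
    $response = wp_remote_get(
        'https://api.example.com/agents/status', // placeholder URL
        array( 'timeout' => 15 )
    );
    if ( is_wp_error( $response ) || 200 !== wp_remote_retrieve_response_code( $response ) ) {
        return; // Keep the last known data if the API is unavailable.
    }

    // Assumed response shape: { "<remote agent id>": "online", ... }
    $statuses = json_decode( wp_remote_retrieve_body( $response ), true );
    if ( ! is_array( $statuses ) ) {
        return;
    }

    $agent_ids = get_posts( array(
        'post_type'      => 'agent', // assumed CPT slug
        'posts_per_page' => -1,
        'fields'         => 'ids',
    ) );

    foreach ( $agent_ids as $post_id ) {
        $remote_id = get_post_meta( $post_id, 'remote_agent_id', true );
        if ( '' !== $remote_id && isset( $statuses[ $remote_id ] ) ) {
            update_post_meta( $post_id, 'agent_status', sanitize_text_field( $statuses[ $remote_id ] ) );
            update_post_meta( $post_id, 'agent_status_checked', time() );
        }
    }
} );
```

Your templates then only read `get_post_meta( get_the_ID(), 'agent_status', true )`, which never touches the remote API. Keep in mind that WP Cron is triggered by site visits by default, so on a quiet site the job may run late unless a real system cron calls `wp-cron.php`.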

I’d recommend reading up on WP Cron for scheduling actions: