The code that runs the photo galleries on cbc.ca is powered by PHP.
This is the first time we’ve really used “dynamic delivery” to serve news content on our website since ~2004. Our stance has always been to serve static content to our users (we use the term “baked out”).
In addition to moving to a dynamic backend to deliver our Photo Galleries, we’ve also changed the way internal clients include these galleries in their pages. We’ve exposed an API for internal users, which means that every photo gallery on cbc.ca is now powered by our “Photo Gallery API”. Previously, users had to rely on server-side includes (SSIs) to pull in static content.
On average, the API gets accessed ~1.8 million times per day, and more during peak times (for example, during Hurricane Sandy the API was getting hit as much as 2.6M times per day). Each access resulted in an average payload of 27KB being returned to the user. So on any day, we would be moving ~460GB of XML data.
On the initial rollout of the code, the application had no notion of how to handle If-Modified-Since (IMS) requests from browsers. This meant that even if the client already had a copy of the XML data that the API returned, it would always fetch a fresh copy from the server when the user revisited the photo gallery page (or other pages that included the gallery).
This posed a few problems:
- Users are downloading data they already have.
- The back end application is spending time returning data that the client already has.
- Page rendering time is a little longer, as the browser has to wait to finish downloading and processing the XML.
By default, PHP doesn’t handle these IMS requests. It is up to the individual developer to write the logic that detects them and responds appropriately.
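The logic involved is small. As a minimal, framework-neutral sketch (written in Python for brevity, and not our production PHP code), the handler compares the request’s If-Modified-Since value against the resource’s last-modified time and answers 304 with an empty body when nothing has changed. The function name `respond` and its parameters are hypothetical; in PHP the header value would typically be read from `$_SERVER['HTTP_IF_MODIFIED_SINCE']` and the status and headers sent with `header()`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET of one resource.

    if_modified_since: raw request header value, or None if absent.
    last_modified: timezone-aware UTC datetime of the resource's last change.
    (Both names are illustrative assumptions, not part of any real API.)
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if since.tzinfo is None:
                # Treat a zone-less date as UTC so the comparison is valid.
                since = since.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and last_modified <= since:
            # The client's copy is current: send headers only, no body.
            return 304, headers, b""

    return 200, headers, body
```

A browser that replays the Last-Modified value it was given receives a 304 and reuses its cached XML; a missing or unparseable header falls back to a full 200 response.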
Once this logic was put in place, we saw a drastic change in the volume of data being sent to our users. We now deliver only 12KB per hit on average, which works out to ~205GB of traffic per day. That is about 44% of the previous volume, a reduction of roughly 55%.
Take a look at the above graph.
The green line represents the number of hits. On November 6, you see it split into a blue line and a red line. This is when the new code went live. Before, 100% of hits were 200 OK; now about 45% of hits are 304 Not Modified and the rest are 200 OK. As a result, the average size per hit has dropped, because a 304 response carries no body and is much lighter than a 200.
We’ve eliminated all of the problems identified earlier. Users now load XML data from their local browser cache (if they have it). The backend application does not need to select ALL the data from the DB on each hit. Page rendering time is reduced, as the browser does not need to wait to download content it already has.