Contact Menu

Integrating Content from External Sources into OU Campus Using RSS, PHP, and JavaScript

The web team of the University of Tennessee at Chattanooga uses PHP and RSS to syndicate blog content, news releases, and calendar events into their main website.

PHP SimpleXML is used to parse the XML of the RSS feeds. We import a variety of feeds, from WordPress, from Master Calendar, and from other sites such as an external athletics CMS. WordPress provides some RSS features, including RSS from categories, tags and search strings, but we have added media attachments and a customized template to output a more complicated RSS feed on the University home page.

News releases, events and content from WordPress via RSS

Example web pages:

WordPress plugin & PHP to add media attachments to RSS feed

Example RSS feeds (view XML/XSL styled page in FireFox; View Source to see XML structure, namespaces, node names and structure):

Custom WordPress RSS feed template, followed by the functions call to load and create it:

<?php
/*
Template Name: Custom Current Headlines Feed

*/

$numposts = 10;
$category_id = get_cat_ID('Current Headlines');

function custom_rss_date( $timestamp = null ) {
  $timestamp = ($timestamp==null) ? time() : $timestamp;
  echo date(DATE_RSS, $timestamp);
}

function custom_rss_text_limit($string, $length, $replacer = '&hellip;') { 
  $string = strip_tags($string);
  if(strlen($string) > $length) 
    return (preg_match('/^(.*)\W.*$/', substr($string, 0, $length+1), $matches) ? $matches[1] : substr($string, 0, $length)) . $replacer;   
  return $string; 
}

$posts = query_posts('cat='.$category_id.'&showposts='.$numposts);

$lastpost = $numposts - 1;



header('Content-Type: ' . feed_content_type('rss-http') . '; charset=' . get_option('blog_charset'), true);
$more = 1;

echo '<?xml version="1.0" encoding="'.get_option('blog_charset').'"?'.'>'; ?>

<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	<?php do_action('rss2_ns'); ?>
>

<channel>
	<title><?php bloginfo_rss('name'); wp_title_rss(); ?></title>
	<atom:link href="<?php self_link(); ?>" rel="self" type="application/rss+xml" />
	<link><?php bloginfo_rss('url') ?></link>
	<description><?php bloginfo_rss("description") ?></description>
	<lastBuildDate><?php echo mysql2date('D, d M Y H:i:s +0000', get_lastpostmodified('GMT'), false); ?></lastBuildDate>
	<language><?php bloginfo_rss( 'language' ); ?></language>
	<sy:updatePeriod><?php echo apply_filters( 'rss_update_period', 'hourly' ); ?></sy:updatePeriod>
	<sy:updateFrequency><?php echo apply_filters( 'rss_update_frequency', '1' ); ?></sy:updateFrequency>
	<?php do_action('rss2_head'); ?>
	<?php while( have_posts()) : the_post(); ?>
	<item>
		<title><?php the_title_rss() ?></title>
		<link><?php the_permalink_rss() ?></link>
		<comments><?php comments_link_feed(); ?></comments>
		<pubDate><?php echo mysql2date('D, d M Y H:i:s +0000', get_post_time('Y-m-d H:i:s', true), false); ?></pubDate>
		<dc:creator><?php the_author() ?></dc:creator>
<?php the_category_rss('rss2') ?>
		<guid isPermaLink="false"><?php the_guid(); ?></guid>
<?php if (get_option('rss_use_excerpt')) : ?>
		<description><![CDATA[<?php get_the_excerpt(); ?>]]></description>
<?php else : ?>
		<description><?php echo '<![CDATA['.custom_rss_text_limit($post->post_content, 256).']]>';  ?></description>
<?php $content = get_the_content_feed('rss2'); ?>
	<?php if ( strlen( $content ) > 0 ) : ?>
		<content:encoded><![CDATA[<?php echo $content; ?>]]></content:encoded>
	<?php else : ?>
		<content:encoded><![CDATA[<?php the_excerpt(); ?>]]></content:encoded>
	<?php endif; ?>
<?php endif; ?>
		<wfw:commentRss><?php echo esc_url( get_post_comments_feed_link(null, 'rss2') ); ?></wfw:commentRss>
		<slash:comments><?php echo get_comments_number(); ?></slash:comments>
<?php rss_enclosure(); ?>
<?php do_action('rss2_item'); ?>
		</item>
	<?php endwhile; ?>
</channel>
</rss>
/* Custom RSS Feed for Home Page headlines */

function create_my_customfeed() {
load_template( get_stylesheet_directory() . '/headlines-customfeed.php');
}
add_action('do_feed_headlines', 'create_my_customfeed', 10, 1);

OU Campus PHP helper file and code asset

To parse an existing external RSS feed, such as a standard or custom WordPress feed, we use a helper file that employs PHP’s simplexml_load_string to load the targeted RSS feed, traverse the XML document, select node content, then echo out a block of styled html for each set of targeted XML nodes. The helper file accepts the $feed variable to specify the location of the targeted feed, and the $maxitems variable to specify the number of items to select from the top of the that feed.

On its own, the PHP helper file can be hit in a browser, to verify the targeted feed type exists and is parsed correctly. The browser will return an un-styled html page from the helper file URL.

The OU Campus Code Asset includes the helper file, over-rides the helper’s default variables, is and is very simple to copy and modify, according to the intended target feed and desired number of items to be presented on the final page. Many different Assets can include a single helper file.

(Vinit from OmniUpdate outlined a much better way to handle the feed URL and number of items in his Server Side Scripting class: via PCF Page Parameters.)

PHP file

  • parses feed and displays HTML
    • variables set to defaults for testing
      • may need to verify feed access and XML structure
  • various types of display html
    • title, excerpt, thumbnail from WordPress
    • title only from WordPress
    • RSS feed with formatting or namespaces different from WordPress
<?php

$input = $_SERVER['QUERY_STRING'];
parse_str($input);

if (!isset($feed))//script or page property will choose which feed to display
	$feed = "http://blog.utc.edu/news/headlines.xml/";

if(!isset($maxitems))
	$maxitems = 3;

$file = file_get_contents($feed);
$file = str_ireplace('src="http://', 'src="//', $file);
$file = str_ireplace('media:content url="http://', 'media:content url="//', $file);
$file = str_ireplace('media:thumbnail url="http://', 'media:thumbnail url="//', $file);

$sxml = simplexml_load_string($file);

$i = 0;
	foreach ($sxml->channel->item as $item) {
		if (++$i > $maxitems) {
				break;
			}
		$namespaces      = $item->getNameSpaces( true );
		$content         = isset($namespaces['content']) ? $item->children( $namespaces['content'] ) : '';
		$content_encoded = isset($content->encoded)      ? $content->encoded                         : '';
		$media           = isset($namespaces['media'])   ? $item->children( $namespaces['media'] )   : '';
		$html       = "<div class=\"row-fluid\">"
					.     "<h3><a href=\"{$item->link}\">{$item->title}</a></h3>"
					.     "{$content_encoded}"
					.  "</div>";
	echo($html);
}
?>
<script>
	$(document).ready(function(){
		$('.sidebar img').removeClass().addClass('thumbnail pull-right span5');
		$('aside.well img').removeClass().addClass('thumbnail pull-right span5');
	});
</script>

OU Campus Assets

  • simple structure
    • specify RSS source
    • number of posts to display
<?php
$feed = "//blog.utc.edu/hr/category/benefits/feed/";
include($_SERVER['DOCUMENT_ROOT']. '/_resources/php/get-headlines-sidebar.php');
$maxitems = 5;
?>

(Vinit from OmniUpdate outlined a much better way to handle the feed URL and number of items in his Server Side Scripting class: via PCF Page Parameters.)

More Speed

After you verify the final production page, you should set up cron jobs for any external feeds that are created via database calls on the external servers. A WordPress feed will require a bit of PHP and MySQL work to generate the RSS feed. This requires some processing, and will introduce a delay in service of the OU Campus PHP page. To avoid this latency, we create cron jobs on the production server to periodically fetch the feeds and cache them locally on the production server. The speed increase for the final page product is noticeable.

If your OU Campus Sites include development, test, training, or mobile Sites, you will want to duplicate the local static feeds on all of the OU Sites.

If you control an external WordPress site, you can employ caching mechanisms or plugins to more efficiently serve RSS feeds.

  • cache RSS feeds via WordPress plugin, e.g. W3  Total Cache
  • cron job to fetch feeds to local production servers
  • minimize external requests for faster page load

This cron job can be placed in /etc/cron.hourly to fetch WordPress feeds, check lastBuildDate, and compare that to the existing build date, and create a cached RSS xml file.

#!/bin/bash

# Preload WordPress Feeds for Website

# UTC News
wget http://blog.utc.edu/news/headlines.xml/ -O /data/web/prod/www/_resources/rss/wp-news.tmp >/dev/null 2>&1
TMP_LASTBUILD="$(xml_grep '/rss/channel/lastBuildDate' --text_only /data/web/prod/www/_resources/rss/wp-news.tmp)"
XML_LASTBUILD="$(xml_grep '/rss/channel/lastBuildDate' --text_only /data/web/prod/www/_resources/rss/wp-news.xml)"
MIMETYPE=`file -b --mime-type /data/web/prod/www/_resources/rss/wp-news.tmp`
if [ "$MIMETYPE" == "application/xml" -a  "$TMP_LASTBUILD" != "$XML_LASTBUILD" ] ; then
    mv /data/web/prod/www/_resources/rss/wp-news.tmp /data/web/prod/www/_resources/rss/wp-news.xml
fi

# Duplicate for development, test & training environments
cp -p /data/web/prod/www/_resources/rss/wp*.xml /data/web/test/www/_resources/rss/
cp -p /data/web/prod/www/_resources/rss/wp*.xml /data/web/test/train/_resources/rss/
cp -p /data/web/prod/www/_resources/rss/wp*.xml /data/web/dev/www/_resources/rss/

For other RSS sources, the structure may be different; maybe we can’t do a lastBuildDate check. And… some servers may not respond to a default wget request from a shell script. No problem: just specify --header and --user-agent, don’t do the lastBuildDate check, and just write the RSS XML file.

#!/bin/bash

# Preload GoMocs Feeds for Website


# Gomocs.com News
wget  --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" http://www.gomocs.com/rss.aspx -O /data/web/prod/www/_resources/gomocs-news.xml

# Duplicate for development, test & training environments
cp -p /data/web/prod/www/_resources/rss/gomocs*.xml /data/web/test/www/_resources/rss/
cp -p /data/web/prod/www/_resources/rss/gomocs*.xml /data/web/test/train/_resources/rss/
cp -p /data/web/prod/www/_resources/rss/gomocs*.xml /data/web/dev/www/_resources/rss/

Event Calendar RSS feeds from Master Calendar via RSS

Example pages that include feeds from Master Calendar. MC has a very rudimentary RSS output of title, date/time and description. MC caches its RSS feeds by default, and expects a lot of traffic to the feed locations, so it is not so necessary to create cron jobs to fetch them. However, if you have an OU Campus page that fetches and displays a large number of MC feeds, you will see a performance increase by using cron jobs. UTC.edu’s home page fetches a number of feeds, and we saw a noticeably faster page load after moving these calls to cron jobs.

Social Media Streams

UTC uses a jQuery plugin and PHP to pull in social media posts for the university account, as well as for individual departments and colleges. This is an easy way to create a social stream sidebar or social wall page.

Facebook, Twitter and Instagram require PHP  API script for connection to signed apps. jquery.imagesloaded is helpful for the wall display

Code Samples

(A compilation zip including all code mentioned in the presentation is available. Comment and subscribe to this post to be notified of updates.)

  • WordPress plugin to add media attachments to RSS feed
  • WordPress functions.php changes
  • PHP files to parse feeds and display HTML
  • OU Campus Assets to specify RSS source and set number of posts
  • cron job to fetch RSS feeds to local server
No comments yet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.