From start to launch: http://yakimaherald.com
Posted by ezmobius Thu, 03 Nov 2005 11:55:00 GMT
Beginning in early 2005, we thought about how best to redesign the newspaper’s website. We needed to start completely from scratch in order to improve usability and reduce the site’s maintenance overhead. The website brings together four separate data sources into one cohesive structure. The old site required far too much human intervention to keep things up to date, and it was costing the paper more in staff time than the site brought in as revenue. To accomplish the redesign, I switched to developing full time with Ruby on Rails (referred to here as RoR) after four years of heavy PHP and a few years of Python development.
I hope this summary of the project provides perspective on how someone coming from a PHP background replaced 5,500 lines of PHP code with 1,800 lines of RoR code. I was the only one coding on this project, and one other person created some of the views.
We received approval and started rebuilding the yakimaherald.com site on May 1, 2005. Building the app took almost four months from start to finish. When I say “we”, I mean myself, the only developer, and my designer, who made the views.
During those four months I still had to do the daily maintenance and upkeep of the paper’s website and advertising, and we also built two or three smaller sites with RoR. If I had worked on nothing but the new site, I estimate it would have taken between two and three months with just myself and one designer.
The final app weighed in at:
1,839 LOC in models and controllers
1,067 LOC in unit and functional tests
8 controllers
12 models
9 layouts
69 view templates
The system is very heavy on content. There are four main data sources for the application:
1. A local PostgreSQL 8.x database for CMS (Content Management System) functionality and static page contents. This database holds the info that reporters and photographers input through the admin interface. And it also holds the new banner management system. Config is pretty much vanilla PostgreSQL and it performs great for my situation. I used the C PostgreSQL/Ruby bindings.
2. The newspaper uses a Baseview database, a proprietary system that many of the world’s newspapers run as their newsroom database. It holds all the content that gets printed in the paper. This database is not SQL; it has a proprietary scripting/templating language called LiveIQ. The Rails model that handles this database is a custom Ruby library that I wrote. It provides a little DSL for querying the Baseview database, translating custom Ruby commands into the LiveIQ scripting language on the fly. This makes things much nicer to work with: there is no mental context switching between scripting languages, so I can think about and query the database in Ruby. All the local Yakima and Central Washington content comes from this database. Because of its complexity, this model accounts for 367 LOC of my total application. I may be able to release this component as open source, because it could benefit other newspapers that use Baseview and are considering Ruby and Rails, or that just want a better way to get their news online.
3. Custom XML feeds from the AP newswire. This content comes from the AP newswire subscription our paper has for the print version. It contains thousands of news items from around the world that are updated constantly throughout the day.
These feeds are a little rough and require a fair bit of text processing before they are ready to go live on the web. The feeds come across the wire as a Base64-encoded XML file. After unpacking it, I have to scan for the feeds we actually use out of the two thousand or more that are available. My app processes the feeds and regenerates a cache of the online content every 1.5 hours, unless we manually flush the cache sooner.
4. The Seattle Times owns the Yakima Herald. So we get some of our content from them. We don’t have a whole lot of content from this source yet but we will be using more soon as we receive approval to use their RSS feeds.
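To make the AP feed handling from source 3 concrete, here is a minimal Ruby sketch of decoding a Base64 payload and keeping only the wanted items. The element names (`feed`, `item`, `headline`), the `category` attribute, and the category list are all assumptions made up for illustration; the real AP feed layout is not described here.

```ruby
require "rexml/document"

# Hypothetical category names; the real feed identifiers are not known.
WANTED_CATEGORIES = %w[national sports].freeze

# Decode a Base64 payload and return the headlines of the wanted categories.
def wanted_headlines(encoded_payload, wanted = WANTED_CATEGORIES)
  xml = encoded_payload.unpack1("m")   # "m" is Ruby's Base64 pack directive
  doc = REXML::Document.new(xml)
  headlines = []
  doc.elements.each("feed/item") do |item|
    next unless wanted.include?(item.attributes["category"])
    headlines << item.elements["headline"].text.strip
  end
  headlines
end

sample_xml = <<~XML
  <feed>
    <item category="national"><headline> Storm hits coast </headline></item>
    <item category="business"><headline>Markets close flat</headline></item>
    <item category="sports"><headline>Home team wins</headline></item>
  </feed>
XML

payload = [sample_xml].pack("m")       # stand-in for the payload off the wire
puts wanted_headlines(payload)
```

Filtering while walking the document keeps the unwanted majority of items out of any later text processing.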
This application is very data and content heavy. When the index page is regenerated after a cache flush, it pulls local PostgreSQL data, Baseview data from a server on the local LAN, custom XML feeds from the AP wire, and a few headline feeds from the Seattle Times. This is still relatively fast: it takes about 200 milliseconds, which is very good considering everything involved in creating the page, including network latency. But this only happens on one hit every 1.5 hours; the rest of the time the page is cached as HTML files in the public directory. These are served fast by lighttpd, a lightweight, fast web server. Lighttpd/fastcgi can serve up to 180 dynamic requests/second using 6 fastcgi processes. (Fastcgi is a webserver module that keeps your Rails project loaded in memory so it doesn’t have to be reloaded from disk on every web request.)
After completing this project in record time (for me personally), I have no doubt that Ruby on Rails can scale to large web applications with a large number of users. I think that Rails is a perfect platform for sites like the Yakima Herald-Republic’s website, and for much larger projects as well. Rails uses the ‘shared nothing’ approach to scaling your applications as demand rises. This means that there are no complicated servlets to set up or mountains of XML configuration files to write. As your Rails application grows, you can just add more backend servers running fastcgi processes, and the front-end lighttpd server delegates requests to them. So you can keep scaling up by throwing more hardware at the problem, which is relatively cheap these days compared to developers’ time.
As of the time I wrote this article, http://yakimaherald.com gets 60,000+ page views per day. And there is still plenty of room to grow on the one Apple Xserve that is dedicated to this application.
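For a rough sense of that headroom, the quoted figures can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes each page view costs one request and that traffic is spread evenly across the day, which real traffic never is, so read the result as a lower bound on load.

```ruby
# Figures quoted in the article; the even-traffic assumption is mine.
PAGE_VIEWS_PER_DAY   = 60_000
SECONDS_PER_DAY      = 24 * 60 * 60
DYNAMIC_CAPACITY_RPS = 180   # lighttpd/fastcgi dynamic requests per second

average_rps = PAGE_VIEWS_PER_DAY.to_f / SECONDS_PER_DAY
headroom    = DYNAMIC_CAPACITY_RPS / average_rps

puts format("average load: %.2f requests/second", average_rps)
puts format("dynamic capacity is about %.0fx the average load", headroom)
```

Even if every request were dynamic, the average load would be under one request per second, and most hits are served from static cache files anyway.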
The new application runs on a brand new dual 2.3 GHz G5 Xserve running Tiger Server with 1 gigabyte of RAM and 480 GB of SCSI RAID disk storage. We got it just 10 days before the site launched, and I configured it myself. We are running Lighttpd 1.4.6/fcgi as the webserver (available for download at http://www.lighttpd.net) and it is running flawlessly. I initially tried running the application on apache2/fcgi, but in testing I got too many random 500 internal server errors. Apache2 is also a resource hog compared to lighttpd.
Lighttpd has proven itself to me over the last few months in production on some smaller sites, and I think it is ready for prime time. We are getting around 60,000+ hits a day, and thanks to Rails page and fragment caching the server handles them with ease. Caching allows content to be regenerated once after it is updated and then written to a static .html page that the webserver is able to serve at 1500+ requests/second. The next time an update is made to the content of a cached page, the cached file is regenerated as .html and served as a static page until the next update. This is a breeze to do with Rails’ caching abilities. You tell it which pages and fragments to cache and then define observers that watch the content for updates. Once an observer notices an update to information in a cached page, it springs into action and rebuilds the cache file with the new content.
I have six dispatch.fcgi processes running (the Ruby fastcgi processes), and they fluctuate from below 1% to around 13% of one CPU when they are working on a complex page rebuild after a cache is swept. For the most part, however, they just hover around 1% to 3%. And Lighttpd is awesome: it has never gone above 9% CPU and mainly stays around 3%! (These percentages go to 200% since there are dual processors.) For the most part I am using about 16% out of 200% of the processing power on this box for my RoR app at any given time.
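The cache-then-sweep cycle can be illustrated outside Rails with a small plain-Ruby sketch. `PageCache` and `Article` are hypothetical names invented for this example and are not Rails classes; in the real app, Rails’ own page caching and observers do this work.

```ruby
require "fileutils"
require "tmpdir"

# Plain-Ruby illustration only; not the Rails caching API.
class PageCache
  def initialize(root)
    @root = root
  end

  def path_for(page)
    File.join(@root, "#{page}.html")
  end

  # Return the cached HTML, rendering and storing it on a miss.
  def fetch(page)
    path = path_for(page)
    return File.read(path) if File.exist?(path)
    html = yield
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, html)
    html
  end

  # Delete the static file so the next request regenerates it.
  def expire(page)
    FileUtils.rm_f(path_for(page))
  end
end

class Article
  attr_reader :title

  def initialize(title, cache)
    @title = title
    @cache = cache
  end

  # Plays the role of the observer: a content change sweeps the cached page.
  def update(title)
    @title = title
    @cache.expire("index")
  end
end

Dir.mktmpdir do |public_dir|
  cache   = PageCache.new(public_dir)
  article = Article.new("Old headline", cache)
  render  = -> { "<h1>#{article.title}</h1>" }

  cache.fetch("index", &render)        # miss: renders and writes index.html
  article.update("New headline")       # sweeps index.html
  puts cache.fetch("index", &render)   # miss again: renders the new content
end
```

Between updates every request is a plain file read, which is why the web server can answer without touching the application at all.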
I have a few launchd scripts (launchd is OS X Tiger’s new XML-based version of cron) running for maintenance tasks. I have launchd start an instance of the awesome Ruby daemon daedalus at boot time. This daemon checks every 3 minutes that lighttpd is running and, if it is not, launches a new instance of lighttpd/fcgi. It also wipes out the Ruby_sess files in /tmp every six hours. I end up with around 8,000-9,000 of these session files in six hours, and my app runs much better when they are not allowed to build up.
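The watchdog idea can be sketched in a few lines of Ruby. This is not how daedalus is implemented; it is a minimal illustration that assumes the server writes its pid to a pidfile, and the method names are hypothetical.

```ruby
require "tempfile"

# Return true if a process with this pid exists. Signal 0 performs the
# existence and permission check without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true   # the process exists but belongs to another user
end

# Read the pid from a pidfile and call the block to relaunch if it is gone.
# Returns :running or :relaunched.
def ensure_running(pidfile)
  pid = File.exist?(pidfile) ? File.read(pidfile).to_i : 0
  return :running if pid > 0 && process_alive?(pid)
  yield
  :relaunched
end

# Demo: use this script's own pid as a stand-in for the web server.
Tempfile.create("demo.pid") do |file|
  file.write(Process.pid)
  file.flush
  puts ensure_running(file.path) { puts "relaunching" }
end
```

A real watchdog would call `ensure_running` in a loop with a 180-second sleep and pass a block that starts the server.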
Daedalus also bashes my cached pages every 1.5 hours. I have many data sources that won’t work with cache_sweeper because they come from remote computers. So this script erases the pertinent files in public so the cache can rebuild with the new content from all remote locations. We also have an intranet page where people from the newsroom can go and run a script to clean the cache whenever they add new content they want to be picked up.
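A timed flush of this kind comes down to deleting a known list of files. The sketch below is an assumption-laden illustration, not the actual script: the page names are hypothetical, and a temporary directory stands in for the Rails public directory.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical list of cached pages built from remote data sources.
REMOTE_BACKED_PAGES = %w[index.html sports.html world.html].freeze

# Remove the listed cached pages so the next request rebuilds them.
# Returns the paths that were actually deleted.
def flush_cached_pages(public_dir, pages)
  pages.map { |page| File.join(public_dir, page) }
       .select { |path| File.file?(path) }
       .each { |path| FileUtils.rm_f(path) }
end

Dir.mktmpdir do |public_dir|
  %w[index.html sports.html about.html].each do |name|
    File.write(File.join(public_dir, name), "<html></html>")
  end

  flushed = flush_cached_pages(public_dir, REMOTE_BACKED_PAGES)
  puts "flushed: #{flushed.map { |path| File.basename(path) }.join(', ')}"
  puts "kept: #{Dir.children(public_dir).sort.join(', ')}"
end
```

Pages that are not on the list stay cached, and listed pages that were never generated are simply skipped.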
I also have a lot of “glue” code written in ruby to do various text processing and ftp’ing and other things. The classified ads are processed to format them for online display. I have a bunch of admin tools written in Ruby as well.
I can say that I am very happy with how Ruby works, how Rails scales, and Ruby’s ‘speed’. Yes, Ruby is slower than some other scripting languages for now, but that will be changing soon. Ruby 1.9 already has speed improvements over the current branch, and YARV betas (YARV is Yet Another Ruby VM, a faster virtual machine for Ruby code to run on) can give up to a 20x performance boost to certain parts of Ruby.
All in all I am very happy with my experience with RoR as well as Ruby. Rails is a very productive environment for me to develop web apps in. But I have really fallen in love with Ruby itself. Ruby is so elegant, and the syntax lets me open up code from seven or eight months ago and see at a glance exactly what it does. So it is much more maintainable than the Perl and shell scripts that I have replaced. I think that anyone considering RoR and Ruby for a decent size project should not be too concerned with how well RoR scales. It scales great. The shared nothing architecture works great. If I need more power eventually, I can just fire up another Linux box and run fcgi processes on there. Rinse, repeat!
I want to thank the excellent communities surrounding Ruby and Rails. These folks have been so helpful to me while learning the ropes. If anyone would like to see some configuration files or has any questions about how to deploy an app, please don’t hesitate to ask. It’s the least I can do to help out folks who want to learn Ruby and Rails, after all the great help I have received from the community.