Object storage and retrieval in PHP part 1 – JSON files

I mentioned in my post about eatStatic that I was using JSON files for storage of objects and arrays, but hoped to make it switchable to use mongoDB. This is part one of a two-part post, demonstrating the use of JSON files with json_encode() and json_decode().

Take the following simple class:-


class case_study {
    public $id;
    public $title;
    public $body_text;
    public $skills = array();
}


If we create an instance of this and add some data:-


$case_study = new case_study;
$case_study->id = 'my/case_study';
$case_study->title = 'My case study';
$case_study->body_text = 'Some text for the case study';
$case_study->skills['css'] = 'CSS';
$case_study->skills['sitebuild'] = 'Site Build';


and pass it to print_r(), we get this:-


case_study Object
(
    [id] => my/case_study
    [title] => My case study
    [body_text] => Some text for the case study
    [skills] => Array
        (
            [css] => CSS
            [sitebuild] => Site Build
        )

)


If we now encode it as JSON:-


$json_str = json_encode($case_study);


At this point, we can save the file to the filesystem – I tend to create a unique ID based on the current date/time and a random string. I won't detail it all here, but you can see some of the helper functions I use in eatStaticStorage.class.php and eatStatic.class.php. One thing worth noting is that sometimes when reading a .json file back in from the filesystem, I was experiencing a bug where the last three characters were omitted – I'm not sure what was causing this, but it was fixed by changing my read_file() method to use file_get_contents() instead of fread().
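To illustrate, here's a minimal sketch of the save/load round trip – the helper names and ID scheme are just for illustration, rather than the exact eatStatic code:-


// save an object as JSON under a generated unique ID
function save_object($obj, $data_dir) {
    $id = date('Y-m-d-His') . '-' . substr(md5(uniqid('', true)), 0, 8);
    file_put_contents($data_dir . '/' . $id . '.json', json_encode($obj));
    return $id;
}

// read it back – file_get_contents() avoids the truncation I saw with fread()
function load_object($id, $data_dir) {
    return json_decode(file_get_contents($data_dir . '/' . $id . '.json'));
}
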

Once you have retrieved your JSON string you can decode it again:-


$case_study = json_decode($json_str);


and we end up with this:-


stdClass Object
(
    [id] => my/case_study
    [title] => My case study
    [body_text] => Some text for the case study
    [skills] => stdClass Object
        (
            [css] => CSS
            [sitebuild] => Site Build
        )

)


Notice that the array “skills” is now an object. We can set it back to an array using get_object_vars():-


$case_study->skills = get_object_vars($case_study->skills);


nb: this only happens for key => value (associative) arrays; if it was just a simple indexed array, e.g. array('css', 'sitebuild'), we wouldn't need to pass it through get_object_vars(), as it would be maintained as an array.
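As an aside, json_decode() also takes a second boolean argument – passing true decodes everything straight to nested associative arrays instead of stdClass objects, which sidesteps the conversion above, at the cost of not getting an object back at all:-


// decode straight to nested associative arrays
$data = json_decode($json_str, true);
echo $data['skills']['css']; // prints CSS
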

Sticking with the get_object_vars() approach, we end up back where we started:-


stdClass Object
(
    [id] => my/case_study
    [title] => My case study
    [body_text] => Some text for the case study
    [skills] => Array
        (
            [css] => CSS
            [sitebuild] => Site Build
        )

)


Sort of – we now have an object instance with all the attributes of the original, but it doesn't know it is a case_study object. In fact it isn't a case_study instance at all – we would have to create a new instance of case_study and copy the attributes across if we needed the real thing, but if you just want the data, it can be used as-is in most cases.
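If you do need the real thing, here's a minimal sketch of that rehydration step (assuming the case_study class from the top of this post):-


// copy the decoded attributes back onto a genuine case_study instance
function rehydrate_case_study($decoded) {
    $case_study = new case_study();
    foreach (get_object_vars($decoded) as $key => $value) {
        $case_study->$key = $value;
    }
    return $case_study;
}
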

The above example is very simple, but it can get quite complex when your object contains arrays of objects, which in turn may contain arrays (and arrays of objects). The initial cheap and convenient trick of encoding an object instance and saving it, then retrieving, decoding and using it, can then get quite hairy – but it is still less effort than splitting it out into different objects and maintaining it across several relational database tables.
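One way to tame those deeper structures is a recursive conversion helper – a rough sketch:-


// recursively convert decoded stdClass objects back to associative arrays
function to_array($value) {
    if (is_object($value)) {
        $value = get_object_vars($value);
    }
    if (is_array($value)) {
        return array_map('to_array', $value);
    }
    return $value;
}

// e.g. $data = to_array(json_decode($json_str));
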

In part two I'll talk about how to use mongoDB to save and retrieve object instances in PHP.

Site building workflow challenges – keeping HTML in a database

I was reminded today of one of my pet hates – coordinating a site build, or a site rebuild, when the CMS you are using keeps content, often containing HTML markup from an editor such as TinyMCE, in the database.

Consider the following scenario:-

  • You have a staging site where the client has been using the CMS to input content
  • Meanwhile, you make some changes to the database on your local version and want to push them to staging
  • You can use a migration script to push your changes to the staging database, but you find yourself also wanting to copy the new content back to your local database, so you can work on CSS against real content. You would then probably drop your local database and restore from a staging backup, losing any test content you put in locally

It’s basically a bit of a kerfuffle.

This is one of the scenarios that I hope could be avoided with a CMS based on eatStatic (if I ever develop it beyond a blog engine) – any content-types that contain bodies of text, whether they are marked up with HTML or not, would be stored on the filesystem. This could be put under version control, so you can selectively synchronise your content with another instance of the site.

I can also see a case for some add-on for any existing CMS – an export function that routinely pushes text content from the db into text files to be kept under version control, and also allows import, allowing instances to selectively sync content.

Introducing eatStatic blog engine

[image: creating a new blog post in textmate]

Recently I ported this blog from an ancient version of wordpress to my own simple blog engine, which uses my PHP5 micro-framework, "eatStatic". I use the phrase "blog engine" rather than blog software, as it isn't really packaged up yet as something I would describe as software – it's more just a collection of classes and templates that can be used to keep a blog.

The bulk of the code was written last year in the space of a couple of hours, while sitting in a garage waiting for my car to be fixed – I was about to go on a long road trip and wanted a blogging solution that let me create blog posts and organise photos offline, then conveniently sync everything to the live site when I had an internet connection. The result was my "on the road" blog about mobile working.

The thing that sets this apart from other blog engines (and the origin of the name "eatStatic", along with a nod to a 90s techno act), is that instead of using a relational database to store content, it uses simple text files for blog posts, and cached json files to store collections of data (e.g. post lists, tag references etc.). I have it set up to run with dropbox so that I compose my posts in textmate and they are synced to a dropbox folder on the webserver. You don't have to use dropbox though – you can use any technique you like to upload the data files to the server – for "on the road" I use subversion, which means I also have versioning of blog content. Draft posts are composed in a drafts folder and moved into the main posts folder to push them live. There is currently no admin area on the site, though I might add one later.

The published date and URI for each post are taken from the text file name – I've adapted it for this blog to use the same url scheme as wordpress, to avoid link rot on legacy content. Some people asked me why I don't just use the title and created/ modified date of the text file to make it even simpler, and the answer is that I wanted finer control, and the option to specify the publish date – using created/ modified would have been a disaster for the content I imported from wordpress. Also, by naming each file starting with YYYY-MM-DD, the post files are easier to sort/ find in the post folder, both visually/ manually and in code. You can use HTML in the blog post, and additionally line breaks are converted to br tags, other than immediately after a closing tag. You can add tags and metadata at the end of the text file.
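As a rough sketch of that idea – assuming a post file named along the lines of 2011-03-16-my-post-title.txt (the real eatStatic parsing may differ):-


// derive publish date and URL slug from a post filename
function parse_post_filename($filename) {
    $name = basename($filename, '.txt');
    if (preg_match('/^(\d{4}-\d{2}-\d{2})-(.+)$/', $name, $matches)) {
        return array(
            'date' => $matches[1], // publish date, YYYY-MM-DD
            'slug' => $matches[2], // used to build the post URI
        );
    }
    return false; // not a valid post filename
}
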

I've also got a simple thumbnail gallery which can be included in a post (see below) by uploading a folder full of full-size images with the same name as the post. The idea behind this is that a set of jpeg/ png images can be imported from a camera, and automatically pushed to the server by dropbox. A caching script creates the thumbnails and web-size versions on demand, which are saved to the filesystem for efficiency during subsequent requests. I considered setting it up so that each post had its own folder, which could then contain images, but the blog engine was mostly written with the idea of quickly creating posts by opening textmate/ emacs, writing and saving, rather than faffing around with creating folders.
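The caching part boils down to something like this GD-based sketch (jpeg-only here for brevity, and the function name is just for illustration):-


// serve a cached thumbnail, generating it on first request
function thumbnail($src, $cache_dir, $width = 150) {
    $thumb_path = $cache_dir . '/' . $width . '-' . basename($src);
    if (!file_exists($thumb_path)) {
        list($w, $h) = getimagesize($src);
        $height = (int) round($h * ($width / $w));
        $image = imagecreatefromjpeg($src);
        $thumb = imagecreatetruecolor($width, $height);
        imagecopyresampled($thumb, $image, 0, 0, 0, 0, $width, $height, $w, $h);
        imagejpeg($thumb, $thumb_path, 85);
        imagedestroy($image);
        imagedestroy($thumb);
    }
    return $thumb_path;
}
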

I made the decision not to build in any commenting functionality – the anti-spam / moderation features needed are too much of a pain to deal with, so I've archived the old wordpress comments into the post body and integrated disqus instead.

As I mentioned before, I've been using a previous version of eatStatic successfully for my "on the road" blog, but I wanted to see how it coped with hundreds of posts rather than just a handful – it seems to be doing fine, coping with over 600 posts, but I'm sure there is room for improvement. I've also been investigating making the json read/write switchable to use mongodb so that it could potentially be very scalable – I've encountered a few inconsistencies in the way that PHP json_decode() and mongodb object retrieval work, but nothing that can't be worked around – expect a blog post on that later!

I don't expect eatStatic blog to be a wordpress killer, but it may appeal to techie types who want a lightweight PHP5 blog engine, maybe to plug into an existing site, and to people who want to compose posts in textmate/ emacs (or any other code editor) rather than in a web form. If you are interested in trying it, keep an eye on the github repo, as I'll commit an example of how this blog is formed once I've ironed out the more embarrassing bugs! I may add a simple admin area at a later date, to allow publishing entirely via the web, and I think it would also benefit from a "post by email" feature, for convenient moblogging, but don't hold your breath!

When I was importing content (I actually wrote a python script to parse a wordpress xml export file and create the text files), I found it quite fitting that the first ever post on this blog nearly ten years ago was made on a home-brewed ASP blog engine which used XML for data storage. I think before then I kept a static HTML blog of sorts, on a freeserve site, but unfortunately haven’t got a copy of that for completeness.

Lastly, whether or not you want to set up an eatStatic-based blog, if you aren't already using dropbox, it really is excellent, so why not sign up for a free 2GB account using my referral URL, so I can get some more free space? Even though I have a paid dropbox account, I use a second free account to mount on my server for automated site/ database backups and for this blog, and it keeps filling up!

Watershed 2011 rebuild

[image: screen grab of watershed.co.uk]

Last night the new version of the watershed website was pushed live. I had the pleasure of being one of many people involved in this project, which combined several different sites representing different projects within the watershed brand. I did the "first cut" of the HTML/CSS, working from a PSD provided by the design agency Document, and also helped with some of the Drupal theme integration, working alongside some talented watershed staff and other freelancers (I'd name them all here, but would inevitably miss someone).

Keeping my developer mojo


I think it's fair to say I lost my developer mojo for a while earlier this year. I don't think it was any one particular thing – maybe a combination of feeling slightly burned out and working on a few projects where I felt restricted and frustrated by the technology, which led me to feel dread instead of excitement at the prospect of a new project. I started to question whether I should be a developer at all, and whether I should start to look into a career change.

Well, luckily my mojo came back a month or so later – around the time that some changes in the scope of a project led me to use my own "micro" PHP framework instead of a large cumbersome one, to speed up progress. The initial feeling was one of failure – that I had been beaten by the mysteries and learning curve I had been trying to overcome – but this was soon overridden by the excitement of being able to create something.

It's fairly simple – to get development satisfaction I have to feel creative in what I'm doing, and feel like I'm learning something. Churning out website after website using a monolithic framework or CMS does not give me that satisfaction. I can't always choose my projects, so when I find myself working on stuff that doesn't excite me, I have to make time to work on other stuff that does, no matter how busy I am.

It's a principle google applies with their "20% time", where employees can work on their pet projects. It's not just to indulge people – the creative energy generated by letting geeks do their stuff spills over into the day-to-day work. I don't have an employer to allow me this privilege, so I have to make time to do it myself. I'm hoping some good stuff comes out of this – but as a by-product rather than the goal.

Social media withdrawal tools part 1

I have occasional periods where for one reason or another I want to distance myself from social media for a bit. One of the things that usually stops me is the idea of missing communication aimed specifically at me. Facebook has the option to control what types of communication (e.g. a message) result in an email, and twitter has the option to send you an email when someone direct messages you, but (as far as I know) there is no direct way to be informed when you are “mentioned” on twitter.

Here is one technique to get round this – presuming you use an RSS reader of some sort (I use google reader), you can go to search.twitter.com, search for your username and then subscribe to the RSS feed for the results.
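If you'd rather script it, the same search can be consumed programmatically – a rough sketch (I'm quoting the feed URL from memory, so grab the actual feed link from the results page):-


// fetch recent mentions via twitter's public search feed (no API key needed)
$username = 'your_username';
$feed = simplexml_load_file('http://search.twitter.com/search.atom?q=%40' . $username);
foreach ($feed->entry as $entry) {
    echo $entry->title . "\n";
}
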

Now simply uninstall your twitter client and step away from the firehose, and if someone mentions you, you’ll know next time you read your RSS feeds. Less immediate, but that’s part of the point!

My Idea of March – a decentralised microblogging/ chat system

Chris Shiflett has suggested a blog revival, and I'm having one of those days – trapped between sorting out domestic chores and procrastination, and not being entirely productive – so what better time to blog than right now!

One of the current topics being discussed is how twitter have asked developers to stop creating new twitter clients (for the non-technical, a "twitter client" is a program or "app", such as tweetdeck, that lets you use twitter on your computer or phone without visiting the twitter website). Apart from being an annoyance to people who want to create new clients, this has many developers, myself included, reading between the lines: it looks like a sign of the corporate machinery clanking into action to control how twitter is used, so they can more efficiently monetise it.

I think this is kind of inevitable – I've never seen a service adopted by the masses so quickly, and most long term users will remember when twitter struggled to scale as it suddenly became a network of multi-millions rather than a few hundred or a few thousand. Any social network system like this needs a critical mass of users to make it useful – but looking at my own needs, that critical mass consists of a couple of hundred people who are mostly friends and colleagues in the web/ digital media industries. I don't use the "trending" stuff and I'm not interested in following celebrities, so a critical mass for me would be for all or most of those people to adopt another system. Yesterday there were mumblings about identi.ca, and sure enough you'll see that I've reserved my username (after some signup confusion and accidentally signing up to a mailing list instead).

Identi.ca is an open source solution and I look forward to seeing what develops with it, but I can’t help feeling that the real solution (especially for the technically inclined) is a decentralised system, just like blogs have always been – you host your blog either on your own site, or using a service such as blogger, and people consume them directly or via RSS readers. This is another example of geek-led innovation – the geeks were doing it first, the mainstream followed later.

I’ve had a quick look around for a decentralised micro-blog system, and it seems there are a few already out there, but before I look closer to see if any of them would fit my needs, here’s how I envisage it working:-

  • You self-host a micro-blog on your site, it’s just like a blog, but each post has a 140 character limit
  • The micro-blog has an RSS feed, a variant that can optionally include extra info such as “in reply to” and “location”
  • You use a self-hosted micro-blog aggregator to follow other people's micro-blogs (this could of course include a twitter stream, using RSS rather than their restrictive API) – see the sketch after this list
  • The aggregator could be on your site, or even running locally on your machine, and of course you could build ANY DAMN CLIENT YOU LIKE to view, interact with, and post to your micro-blog
  • You could hook into most of the existing services for things like link shortening, image hosting etc.
  • Private messaging would need some thought, but it’s essentially like having a contact form on your website (and the same spam considerations).
  • Popularity contest “follower” stats would be optional – it would actually be pretty difficult to work out how many people are following you, other than analysing RSS stats, which can be misleading.
  • Global search would be tricky without a central database, but I rarely use that.
  • No central point of failure means the platform would be very resilient, and there would be no massive server-farm or staff to fund.
  • No central owner means the users run it – people may develop services around it to make money, but nobody would own the platform
  • The core tech should be really, really simple – the basic service should need no integration with other services or APIs, and no software installation requirements which might scare people off.
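As a taste of how simple the aggregator end could be, here's a toy sketch that merges a few feeds into a single timeline (the URLs are placeholders):-


// toy aggregator: merge several micro-blog RSS feeds into one timeline
$feeds = array(
    'http://example.com/alice/microblog/rss',
    'http://example.org/bob/microblog/rss',
);

$items = array();
foreach ($feeds as $url) {
    $rss = simplexml_load_file($url);
    if ($rss === false) {
        continue; // skip feeds that are down
    }
    foreach ($rss->channel->item as $item) {
        $items[] = array(
            'author' => (string) $rss->channel->title,
            'text'   => (string) $item->title,
            'time'   => strtotime((string) $item->pubDate),
        );
    }
}

// newest first, like a twitter-style timeline
usort($items, function ($a, $b) { return $b['time'] - $a['time']; });
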

So, once the geeks have invented the platform, there would be a similar barrier to entry for participating as there is to starting a blog – you would either need to install it, roll your own, or sign up with one of the hosted services I can imagine popping up a few months later. Therefore mainstream adoption would be much, much slower, and because of the way I want to use a service like this, that isn't a bad thing.

Looking at my twitter profile, I currently follow 372 people, a number that I'd like to get down, but social etiquette dictates that I only unfollow people if they get ridiculously noisy or off-topic. I bet if I analysed how many people I regularly interact with or find unmissably interesting, I would get that number down to less than 100. If I ever get round to building it, I wonder if I could persuade 100 geeks to try it? Of course, as mentioned earlier, I could build an aggregator to follow people's twitter streams via RSS, so I don't necessarily have to have other people adopt the platform immediately – maybe that's where I will start. I wonder if scraping RSS feeds counts as use of the twitter API?

Archived comments

Some great ideas here Rick. After another fail whale this evening decentralized seems like a good idea.

Joe Leech 2011-03-16 22:08:33

The de-centralized nature of XMPP would work really well as a solid back end for a new service such as this. It already supports (via the pubsub mech) the idea of posts as well as private messages and has the advantage of support for private as well as public multi user chats. All running on privately or publicly owned servers communicating with each other using an open source protocol. Sounds ideal!

There’s also the additional advantage that XMPP is real time so no polling.

You just need to design the experience to feel less like “IM” and more like twitter which would be achievable.

Stefan 2011-05-25 11:17:10

A few gotchas when your Drupal site is being deployed behind caching servers and proxy_pass

I recently launched a Drupal-based site that forms part of the website for a global company (being a freelancer working through a design agency with an NDA, I can't talk about it any more than that, or stick it on my portfolio, unfortunately!). I thought I'd make a few notes about some of the issues I had to overcome.

The site itself is hosted on a dedicated VPS, but served to the world through akamai caching servers, which means that everything is cached by default. Therefore, in this set-up the CMS is only available at a different URL, where caching is bypassed. Gotcha no.1 is that cookies set and read via PHP do not work in this scenario*. Fortunately, JavaScript cookies can still be used.

In addition to the caching, the site is served up as part of a much bigger site /deep/down/in/the/url/structure, so proxy_pass is used (before caching) to rewrite the paths. Gotcha no.2 is that base_path() in drupal picks up the path of the origin server, so I had to add a condition like this to my settings.php:-


if (!empty($_SERVER['HTTP_X_FORWARDED_HOST'])) {
    // behind the proxy, build the base URL from the forwarded host
    $base_url = 'http://' . $_SERVER['HTTP_X_FORWARDED_HOST'] . '/path/to/my/proxied/site';
}


The clue to gotcha number 3 is in that last example. $_SERVER['HTTP_HOST'] reports the host of the origin server, rather than the public host, so if it is used anywhere in your code it may cause issues. I ended up adding a function getHost() that I use to return the appropriate host, depending on where the site is being viewed:-


function getHost() {
    // prefer the public host passed through by the proxy, if present
    $host = $_SERVER['HTTP_HOST'];
    if (isset($_SERVER['HTTP_X_FORWARDED_HOST'])) {
        $host = $_SERVER['HTTP_X_FORWARDED_HOST'];
    }
    return $host;
}


Gotcha no.4 was image paths in optimised CSS – both the Drupal-provided CSS caching and some external stylesheets I was running through minify rewrote the image paths to use those from the origin server. This meant that CSS-applied images worked in the non-cached editing environment but not on live. I haven't come up with a solution for that one yet, other than to leave some of the stylesheets non-minified.

* maybe there is a solution to this, but assume in this case there is little or no scope to request server config changes.

Signal and Noise

[image: Rick Hurst signal and noise music blog]

One of my goals for this year was to get back into playing bass and making music, and I have been doing just that – it’s like the music-making part of my brain that fell asleep sometime around 2003 has woken up again. I’ve started another blog to document it all:-

http://noise.rickhurst.co.uk