OLGM & Split Batcher

I’ve just wrapped up a very large project which, unfortunately, I’ve been unable to document on this blog due to non-disclosure. For this reason, blog entries have been sparse over the last six weeks. I’m hopeful I’ll find some time to work on my own personal projects, particularly Payload, over the festive period.

One of the exciting features of this particular project is the ability to switch between mapping engines using the OLGM library. I believe OLGM was developed to facilitate using Google’s tile imagery in the OpenLayers engine, via a special layer which, from what I gather, maps interactions from OpenLayers onto an instance of a Google Map, giving the appearance that the OpenLayers engine is handling everything.

This can be used to huge advantage. It is against Google’s TOS to load tile imagery directly from their servers; OLGM gets around that and allows you to have Google tile imagery in OpenLayers. It also means that OpenLayers tools, for example the drawing tools and the modify interaction, can be used in the context of a Google map. These tools massively outperform Google’s equivalents, so being able to use the best of both engines in a single application is extremely useful.

Split Batcher

I’m about to embark on another freelance project which involves importing an extremely large Excel spreadsheet containing tens of thousands of addresses.

This data will need updating automatically and periodically; sometimes there may be only small changes, sometimes very large ones. One potential stumbling block is that the data has no primary key or unique ID which can be used to identify existing records.

Another challenge this project will present is that none of these records have coordinate data associated with them.

This means it will be necessary to write a system which can accept a very large dataset, identify records which already exist in the database, and geocode any new records.

Identifying duplicates in a very large dataset (with no IDs) and geocoding can both be very time-consuming processes indeed.
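To give an idea of the approach I have in mind (a sketch only – the function and field names are hypothetical, not from any existing library), each record could be reduced to a deterministic fingerprint built from its normalised address fields, so incoming rows can be matched against existing records without a shared ID:

```php
<?php
// Sketch: derive a stable fingerprint for an address record that has
// no unique ID, by normalising its fields and hashing them. The field
// names are placeholders for the spreadsheet's real columns.
function address_fingerprint(array $record): string
{
    $fields = ['line1', 'line2', 'city', 'postcode'];
    $parts  = [];

    foreach ($fields as $field) {
        // Normalise case and collapse whitespace so cosmetic changes
        // in the spreadsheet don't produce a different fingerprint.
        $value   = strtolower(trim($record[$field] ?? ''));
        $parts[] = preg_replace('/\s+/', ' ', $value);
    }

    return sha1(implode('|', $parts));
}
```

Indexing the existing records by fingerprint would then turn each duplicate check into a single lookup rather than a field-by-field comparison.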

The final challenge is that this needs to work in any WordPress environment – in other words, I can’t rely on the client having the ability, or the access, to set up a cron job.

I’ve been pondering writing such a library for many months now. Whilst I could batch geocode the initial data and then hope for the best (in terms of large deltas) in future, I’d like to take this opportunity to realise a library I’ve wanted to develop for some time. I’m sure it has its uses elsewhere, both in plugin development and in a broader context, so I’ve decided to create this library for this project and make it available for re-use elsewhere.

Creating the split batch library

This library will allow developers to create a job, supplying a unique handle to identify the job, a subject (the data to be processed), and the processing logic, which acts as a kind of iterator over the subject.

The subject and the state of the iterator will be stored in the database, and the developer can pass in whatever processing functions they need. I will probably opt to implement this as an abstract class which requires the developer to implement the processing logic in a subclass, an instance of which can then be passed to the split batch daemon.

The library will also enforce limits on the maximum number of iterations per request and on the maximum duration of processing per request. It will track the date and time of the last run, and allow the developer to specify a frequency.
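To make the shape of this more concrete, here is a rough sketch of how the abstract class might look. Nothing here is final; the class, property and method names are placeholders for a design I haven’t written yet:

```php
<?php
// Sketch of the planned abstract job class. A subclass implements the
// per-record processing; the daemon handles persistence and limits.
abstract class SplitBatchJob
{
    /** @var string Unique handle identifying this job in the database. */
    protected $handle;

    /** @var array The subject: the data to be processed. */
    protected $subject;

    /** @var int Maximum iterations to perform in a single request. */
    protected $maxIterationsPerRequest = 50;

    /** @var int Maximum processing duration per request, in seconds. */
    protected $maxDurationPerRequest = 10;

    /** @var int Minimum number of seconds between runs. */
    protected $frequency = 60;

    public function __construct(string $handle, array $subject)
    {
        $this->handle  = $handle;
        $this->subject = $subject;
    }

    /**
     * Process a single item from the subject. The developer implements
     * this in a subclass, for example to geocode one address record.
     */
    abstract protected function process($item): void;

    /**
     * Run as many iterations as the limits allow, resuming from the
     * position stored in the database, and return the new position.
     */
    public function run(int $position): int
    {
        $start = microtime(true);
        $count = 0;

        while (isset($this->subject[$position])
            && $count < $this->maxIterationsPerRequest
            && (microtime(true) - $start) < $this->maxDurationPerRequest
        ) {
            $this->process($this->subject[$position]);
            $position++;
            $count++;
        }

        return $position;
    }
}
```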

The end result will be a library which can process a specified number of records per request, up to a set time limit, with a minimum frequency between runs so that the server isn’t crippled when there are a lot of incoming HTTP requests. This gives the illusion of a cron job, driven instead by incoming requests, and is transparent to site administrators.
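In a WordPress context the trigger could be as simple as piggybacking on ordinary page requests, in the same spirit as WP-Cron. A minimal sketch, assuming the last-run timestamp lives in an option (the option name and the surrounding logic are made up for illustration):

```php
<?php
// Sketch: run pending batch work at the tail end of ordinary page
// requests. The 'split_batch_last_run' option name is hypothetical.
add_action('shutdown', function () {
    $frequency = 60; // Minimum seconds between runs.
    $last_run  = (int) get_option('split_batch_last_run', 0);

    if (time() - $last_run < $frequency) {
        return; // Ran recently; heavy traffic won't trigger extra runs.
    }

    update_option('split_batch_last_run', time());

    // Resume the stored job from its saved iterator state here,
    // respecting the iteration and duration limits described above.
});
```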

If time permits this month, I may release a small plugin to monitor and administer these jobs.

More on this system will follow!
