Quite a large proportion of my day-to-day job involves the design, implementation and ongoing improvement of an in-house API.
One bottleneck I regularly run into arises when dealing with large, ‘expensive’ data. It typically appears when an application posts a large volume of well-structured data to the API, some process must be carried out on that data, and a structured receipt is then returned as the response.
Profiling such a request tends to show high PHP CPU usage but comparatively low database load. In other words, the database has spare capacity: if we could increase our data processing capacity, we could put that idle database resource to work.
The structured nature of data exchanged via an API means that we can, relatively simply and reliably, divide the submitted data and process it simultaneously with the help of a great tool called Gearman.
Let’s first agree on a contrived, theoretical scenario against which to design our solution:
We have an API endpoint which receives a large number of entity ids and associated status updates. Each entity must first be loaded from the database (to validate its id) before its status is updated to the posted value.
Our current solution iterates over the posted data (whatever its size), loading and updating as it goes, until every posted id has been dealt with. We then return a response listing the ids of all updated entities. The time taken to process the request therefore grows linearly with the number of posted entity ids.
Our solution will take the data passed by the user, divide it evenly into a configurable number of chunks and process each chunk in parallel, before finally combining the output of the individual processes and returning the combined result to the user.
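To make the baseline concrete, here is a minimal sketch of the sequential approach. It assumes the posted data is a map of entity id to new status and uses an in-memory array as a stand-in for the database; the helpers load_entity(), update_status() and process_sequentially() are hypothetical, not part of the original code.

```php
<?php
// Hypothetical stand-in for the entity table: id => row.
$db = [
    1 => ['id' => 1, 'status' => 'new'],
    2 => ['id' => 2, 'status' => 'new'],
    3 => ['id' => 3, 'status' => 'new'],
];

// Hypothetical: load an entity by id, returning null when the id is invalid.
function load_entity(array $db, int $id): ?array
{
    return $db[$id] ?? null;
}

// Hypothetical: persist the new status for an entity.
function update_status(array &$db, int $id, string $status): void
{
    $db[$id]['status'] = $status;
}

// Process every posted id one after another, so the total time grows
// linearly with the number of ids.
function process_sequentially(array &$db, array $updates): array
{
    $updated = [];
    foreach ($updates as $id => $status) {
        if (load_entity($db, (int) $id) === null) {
            continue; // unknown id: fails validation, skip it
        }
        update_status($db, (int) $id, $status);
        $updated[] = (int) $id;
    }
    return $updated; // the "receipt": ids that were actually updated
}

$receipt = process_sequentially($db, [1 => 'done', 3 => 'done', 99 => 'done']);
```

Every id costs one load and one update, performed strictly in sequence, which is why the request time scales with the size of the payload.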
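The split and merge steps on their own can be sketched in plain PHP, assuming the posted items arrive as an array and each chunk produces a list of updated ids. The names split_evenly() and merge_receipts() are hypothetical.

```php
<?php
// Split $items into at most $chunk_count roughly equal chunks,
// preserving the original keys (e.g. entity ids).
function split_evenly(array $items, int $chunk_count): array
{
    if ($chunk_count < 1) {
        throw new InvalidArgumentException('chunk_count must be >= 1');
    }
    if ($items === []) {
        return [];
    }
    // ceil() guarantees no more than $chunk_count chunks are produced.
    $size = (int) ceil(count($items) / $chunk_count);
    return array_chunk($items, $size, true);
}

// Combine the per-chunk receipts (lists of updated ids) into one list.
function merge_receipts(array $receipts): array
{
    return $receipts === [] ? [] : array_merge(...$receipts);
}
```

Because array_chunk() fills chunks from the front, ten items over three chunks yields chunks of 4, 4 and 2 items, so "evenly" here means as evenly as fixed-size chunks allow.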
Kohana’s HMVC methodology lends itself perfectly to this solution. Firstly, we’ll need a parent controller that will transparently handle the splitting of the POSTed data. We’ll call this the ‘Farmable Controller’. Secondly, we will need a Gearman worker (we’ll call this our ‘ant’) which will handle the incoming job from Gearman, process the request via Kohana and return the output to our Farmable Controller.
You can grab the raw code over at GitHub. The Farmable Controller is essentially an example controller that automatically detects requests suitable for parallelisation, then splits the request data, assigns each chunk to a worker and merges the results.
The key step is passing the request, along with the chunk of data we want it to handle, to an ant (the Gearman worker).
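What follows is only a sketch of how that fan-out could look, not the original controller code. It assumes the pecl gearman extension (GearmanClient), a gearmand server on the default port 4730, a worker registered under the hypothetical function name "ant", and a JSON workload layout chosen for illustration. build_workloads() and farm_out() are hypothetical names.

```php
<?php
// Build one JSON workload per chunk: same URI and POST data, except that
// $post[$data_key] holds only that chunk's items. (Workload layout is an
// assumption made for this sketch.)
function build_workloads(string $uri, array $post, string $data_key, array $chunks): array
{
    $workloads = [];
    foreach ($chunks as $chunk) {
        $chunk_post = $post;
        $chunk_post[$data_key] = $chunk;
        $workloads[] = json_encode(['uri' => $uri, 'post' => $chunk_post]);
    }
    return $workloads;
}

// Queue every workload as a task and run them concurrently, collecting one
// raw response per completed task. Requires the gearman extension, a running
// gearmand and registered "ant" workers (hypothetical function name).
function farm_out(array $workloads, string $host = '127.0.0.1', int $port = 4730): array
{
    $client = new GearmanClient();
    $client->addServer($host, $port);

    $responses = [];
    $client->setCompleteCallback(function (GearmanTask $task) use (&$responses) {
        $responses[] = $task->data(); // responses arrive in completion order
    });

    foreach ($workloads as $workload) {
        $client->addTask('ant', $workload);
    }
    // Submits all queued tasks and blocks until they have finished.
    if (!$client->runTasks()) {
        throw new RuntimeException((string) $client->error());
    }
    return $responses;
}
```

Queuing with addTask() and then calling runTasks() is what lets the ants work on their chunks at the same time. Responses are collected in completion order, which is fine for a receipt of updated ids but would need keying if the order mattered.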
This is our Gearman worker. Its sole job is to listen for incoming jobs, execute the passed request as an internal request and return the response to the parent process for merging. You can view the full code on GitHub. The important parts are how it makes sense of the data received as part of the Gearman call, and how it executes the internal request.
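A sketch of an ant under the same assumptions (pecl gearman extension, hypothetical function name "ant", JSON workload with "uri" and "post" keys). The internal request is hidden behind a callable so the decoding logic can run without Kohana; handle_workload() and run_ant() are hypothetical names.

```php
<?php
// Decode a workload and run it through $execute($uri, $post), which must
// return the response body as a string.
function handle_workload(string $workload, callable $execute): string
{
    $data = json_decode($workload, true);
    if (!is_array($data) || !isset($data['uri'])) {
        throw new InvalidArgumentException('Malformed workload');
    }
    return $execute($data['uri'], $data['post'] ?? []);
}

// Register with gearmand and serve jobs until work() reports a failure.
// Assumption: inside Kohana 3.1+, $execute could wrap an internal request,
// e.g. Request::factory($uri)->method(Request::POST)->post($post)
//          ->execute()->body();
// check this against the Kohana version you are using.
function run_ant(callable $execute, string $host = '127.0.0.1', int $port = 4730): void
{
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('ant', function (GearmanJob $job) use ($execute) {
        echo "ant: serving a request\n"; // debug line to show the ant is used
        // The returned string is sent back to the client as the task result.
        return handle_workload($job->workload(), $execute);
    });
    while ($worker->work()) {
        // keep serving jobs
    }
}
```

Keeping the decode step separate from the Gearman wiring makes it easy to test the worker's behaviour with a stub in place of the real HMVC request.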
This is an example controller which simply takes the data posted to it and returns that data prefixed with the word “affected” (just to prove it has executed).
Take note of the class level variables defined within this controller:
The key within $_POST which holds the items we are interested in
The number of items $_POST[$data_key] must contain before the request is parallelised (_workable_limit)
The number of workers (ants) available to share the request
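One way these three settings could drive the automatic detection step, sketched as a plain class so it runs without Kohana. Only _workable_limit is named in this post; the class name FarmSettings, the $_data_key and $_ant_count properties, and the choice of a >= threshold are assumptions.

```php
<?php
// Hypothetical holder for the three class-level settings described above.
class FarmSettings
{
    protected $_data_key;       // key within $_POST holding the items
    protected $_workable_limit; // items needed before we parallelise
    protected $_ant_count;      // workers (ants) available

    public function __construct(string $data_key = 'ids', int $workable_limit = 100, int $ant_count = 3)
    {
        $this->_data_key = $data_key;
        $this->_workable_limit = $workable_limit;
        $this->_ant_count = $ant_count;
    }

    // Farm out only when there is more than one ant and the item list is
    // large enough to be worth the extra Gearman round trips.
    public function should_farm(array $post): bool
    {
        $items = $post[$this->_data_key] ?? null;
        return is_array($items)
            && $this->_ant_count > 1
            && count($items) >= $this->_workable_limit;
    }
}
```

Setting the limit to an unrealistically high value makes should_farm() return false for any realistic payload, which is exactly the switch used when comparing farmed and unfarmed runs.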
Running the Code
Once you have the prerequisites in place, you can start up the required Gearman workers (I recommend three instances). Each run of the worker script starts one worker instance (ant).
You can then make a request to the dummy update controller.
On executing that request you should see each worker print a debug line reporting that it has been asked to serve a request. Your main request should still return a single, combined response. You can then edit the updater.php controller, set _workable_limit to an unrealistically high value and re-run the test (you will need to restart your workers). You should now notice that the workers are no longer called and that the main request takes noticeably longer to execute.
Hopefully this has shown some of the powerful capabilities of Kohana’s HMVC architecture, which makes this technique so easy to achieve. If you haven’t come across Gearman before, this is a very basic introduction to an incredibly powerful tool; definitely check it out!
Furthermore, we have shown how a slight change in our system architecture has transformed the fundamental behaviour, and thus the scalability pattern, of our application: brains before brawn!
I’ve also provided a working demonstration using Gearman and Kohana. Whilst it certainly isn’t production-ready, it should serve as a solid foundation for anyone wanting to play with the technique!