Kohana and Gearman - Practical Multitasking

Published on:
Tags: Personal

Practical Multitasking with Kohana and Gearman

Quite a large proportion of my day-to-day job involves the design, implementation and ongoing improvement of an in-house API.

A bottleneck we commonly run into arises when dealing with large, ‘expensive’ data. It typically shows up when an application posts a large volume of well-structured data to the API, some processing must be carried out on that data, and a structured receipt is then returned as the response.

Analysing such a request tends to show high PHP CPU usage but comparatively low database load. In other words, the database has spare capacity that we could put to use if we increased our data processing capacity.

The structured nature of data exchanged via an API means that we can, relatively simply and reliably, divide the submitted data and process it simultaneously with the help of a great tool called Gearman.

Proposed Solution

Let’s first agree on a contrived, theoretical scenario against which to design our solution:

We have an API endpoint which receives a large number of entity ids and associated status updates. The corresponding entities must first be loaded from the database (to validate the ids) before their status is updated according to the posted value.

Our current solution iterates over the posted data (no matter the size), loading and updating as it goes, until the entire posted array of ids is dealt with. We then return a response indicating the ids of all entities updated. The time taken to process the request therefore grows linearly with the number of posted entity ids.

Our solution will take the data passed by the user, divide it evenly into a configurable number of chunks and process each chunk in parallel, before finally combining the output from the individual processes and returning the combined result to the user.
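To make the divide-and-combine idea concrete, here is a minimal sketch in plain PHP. The helper name split_into_chunks() and the sample ids are hypothetical and not taken from the example project. It assumes the posted ids arrive as a simple list and that "evenly" means chunks of near-equal size.

```php
<?php
// Hypothetical helper: split $items into at most $worker_count chunks
// of near-equal size (the last chunk may be smaller).
function split_into_chunks(array $items, int $worker_count): array
{
    if ($worker_count < 1) {
        throw new InvalidArgumentException('worker_count must be at least 1');
    }
    if ($items === []) {
        return [];
    }

    // Rounding the chunk size up guarantees we never produce
    // more chunks than there are workers.
    $chunk_size = (int) ceil(count($items) / $worker_count);

    return array_chunk($items, $chunk_size);
}

// Seven illustrative ids spread over three workers.
$ids    = [101, 102, 103, 104, 105, 106, 107];
$chunks = split_into_chunks($ids, 3);
// $chunks is [[101, 102, 103], [104, 105, 106], [107]]

// Combining the per-chunk results is then a simple merge.
$merged = array_merge(...$chunks);
```

In the real flow each chunk would be processed by a separate worker before the merge; here the merge simply shows that nothing is lost or reordered by the split.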

Implementation

Kohana’s HMVC methodology lends itself perfectly to this solution. Firstly, we’ll need a parent controller that will transparently handle the splitting of the POSTed data. We’ll call this the ‘Farmable Controller’. Secondly, we will need a Gearman worker (we’ll call this our ‘ant’) which will handle the incoming request from Gearman, process the request via Kohana and return the output to our Farmable Controller.

Controller_Farmable

You can grab the raw code over at GitHub. Essentially this is an example controller which deals with the automatic detection of a request suitable for parallelisation, and the subsequent splitting, assignment and merging of the request data.

Dispatching each sub-request via Gearman

<?php

foreach ($arr_chunks as $chunk) {

  // Start from the original POST data and swap in this chunk only
  $arr_d = $_POST;
  $arr_d[$this->_key_name] = $chunk;

  // Format the string to be passed to the worker: "<route>#<query string>"
  $str_data = $str_route . "#" . http_build_query($arr_d);

  // Queue one Gearman task per chunk
  $obj_gearman->addTask('make_request', $str_data);
}

?>

The above snippet demonstrates how we pass the request – along with the chunk of data that we want it to handle – to the ant (the Gearman worker).
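For context, here is a rough sketch of what the whole dispatch-and-collect step might look like. It assumes the Gearman PHP extension's GearmanClient and uses its setCompleteCallback(), addTask() and runTasks() methods, plus GearmanTask::data() to read each worker's return value. The function name dispatch_chunks() and its parameters are hypothetical, and the real Controller_Farmable may well collect its results differently.

```php
<?php
// Hypothetical helper: queue one task per chunk and gather the responses.
// $client is expected to be a GearmanClient that is already connected to a
// job server; it is left untyped so the sketch does not hard-depend on the
// extension being loaded.
function dispatch_chunks($client, string $route, array $post, string $key, array $chunks): array
{
    $responses = [];

    // Invoked once for every completed task.
    $client->setCompleteCallback(function ($task) use (&$responses) {
        $responses[] = $task->data();
    });

    foreach ($chunks as $chunk) {
        // Same payload format as the loop above: "<route>#<query string>"
        $payload       = $post;
        $payload[$key] = $chunk;
        $client->addTask('make_request', $route . '#' . http_build_query($payload));
    }

    // Runs all queued tasks and returns once they have finished.
    $client->runTasks();

    return $responses;
}
```

With the extension installed, the client would typically be created with new GearmanClient() followed by addServer(). Bear in mind that tasks may not complete in the order they were queued, so the merge step should not rely on response order.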

Controller_Worker (ant)

This is our Gearman worker. Its sole job is to listen for incoming jobs, execute the passed request as an internal Kohana request and return the response to the parent process for later merging. You can view the full code here. The snippet below demonstrates how we make sense of the data received as part of the Gearman call, and how we execute the internal request:

Gearman worker example

<?php
  // The workload arrives as "<route>#<query string>"
  $arr_pieces = explode('#', $job->workload());

  // Assign the data
  $str_uri = $arr_pieces[0];
  parse_str($arr_pieces[1], $arr_post);

  // Create and execute the request..
  $str_data = Request::factory($str_uri)
    ->post($arr_post)
    ->execute()
    ->body();

  return $str_data;
?>
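The encoding used between the controller and the ant can be tried out on its own, without Gearman or Kohana. The sketch below is a standalone round trip; encode_workload() and decode_workload() are hypothetical names introduced purely for illustration.

```php
<?php
// Hypothetical helper: build the "<route>#<query string>" workload.
function encode_workload(string $route, array $post): string
{
    return $route . '#' . http_build_query($post);
}

// Hypothetical helper: turn a workload back into [uri, post array].
function decode_workload(string $workload): array
{
    // Split on the first '#' only, and pad so a workload without
    // a query string still yields two pieces.
    list($uri, $query) = array_pad(explode('#', $workload, 2), 2, '');

    parse_str($query, $post);

    return [$uri, $post];
}

$workload = encode_workload('updater', [
    'header' => 'some_data',
    'data'   => ['penguin', 'giraffe'],
]);

list($uri, $post) = decode_workload($workload);
// $uri is 'updater'
// $post is ['header' => 'some_data', 'data' => ['penguin', 'giraffe']]
```

One thing to watch with this format is that every scalar value comes back from parse_str() as a string, so integer ids would need casting on the worker side.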

Controller_update

This is an example controller which simply takes the data posted to it and returns the data prefixed with the word “affected” (just to prove it has executed).

Take note of the class level variables defined within this controller:

  • $_data_key
    The key within POST which holds the items we are interested in

  • $_workable_limit
    The number of items within $_POST[$_data_key] at which the request is parallelised

  • $_worker_count
    The number of workers (ants) available over which to distribute the request
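To illustrate how these three settings might interact, here is a small sketch of the decision the parent controller has to make. The function should_parallelise() is hypothetical. I am also assuming that the limit is inclusive (a request with exactly $_workable_limit items is farmed out), which the real controller may handle differently.

```php
<?php
// Hypothetical helper: decide whether a request is worth farming out.
function should_parallelise(array $post, string $data_key, int $workable_limit): bool
{
    return isset($post[$data_key])
        && is_array($post[$data_key])
        // Assumption: the limit is inclusive.
        && count($post[$data_key]) >= $workable_limit;
}

$post = [
    'header' => 'some_data',
    'data'   => ['penguin', 'giraffe', 'badger'],
];

$farm_small_limit = should_parallelise($post, 'data', 2);   // true: 3 items >= 2
$farm_large_limit = should_parallelise($post, 'data', 100); // false: 3 items < 100
$farm_missing_key = should_parallelise($post, 'ids', 2);    // false: key not posted
```

When the check passes, the items would then be split across $_worker_count ants; otherwise the request is simply processed in the normal, serial fashion.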

Running the Code

To run the code, you will need to ensure that you have Gearman installed locally as well as the Gearman PHP extension (see here). Then check out my example project from GitHub.

Once you have the prerequisites in place you can start up the required Gearman workers (I recommend three instances). Running the code below will start one worker instance (ant).

cd your_project_dir/public
php index.php --uri=worker &

You can then make a request to the dummy update controller with the command below:

php index.php --uri=updater --post='header=some_data&data[]=penguin&data[]=giraffe&data[]=badger'

On executing the above request you should see each worker output a debug line informing you that it has been asked to serve a response. You should also see your main request return a single, combined response. You can then edit the updater.php controller, set $_workable_limit to an artificially high value and re-run the test (you will need to restart your workers). You should now notice that the workers are no longer called and that the main request takes noticeably longer to execute.

Summary

Hopefully this has shown some of the powerful capabilities offered by the HMVC nature of Kohana, which makes this technique so easy to achieve. If you haven’t come across Gearman before, treat this as a very basic introduction to an incredibly powerful tool. Definitely check it out!

Furthermore, we have shown how a slight change in our system architecture has transformed the fundamental behaviour, and thus the scalability pattern, of our application. Brains before brawn!

I’ve also provided a working demonstration using Gearman and Kohana. Whilst it certainly isn’t production-ready, it should serve as a solid foundation for anyone wanting to play with the technique!