HOWTO: Setup shard-query framework environment

HOWTO: Setup shard-query framework environment

1 How to setup shard query environment for Scale out Partition Based parallel Database Systems
1.1 INSTALLING SHARD QUERY
1.2 CHECKING ENVIRONMENT
1.3 PERFORMING A TEST

How to setup shard query environment for Scale out Partition Based parallel Database Systems

ShardQuery is a PHP class which is intended to make working with a partitioned dataset easier. It offers an easy to use PHP class interface for querying the dataset.
CODE HERE http://code.google.com/p/shard-query/
Shardquery definition from the autor:
What is it? ShardQuery is a PHP class which is intended to make working with a partitioned dataset easier. It offers an easy to use PHP class interface for querying the dataset. Also included is run_query.php, which is an example application for the ShardQuery class. Shard-Query can be used transparently, and includes an experimental LUA script for MySQL proxy.
Key Features
ParallelPipelining – MPP distributed query engines runs fragments of queries in parallel, combining the results at the end. Like map/reduce except it speaks SQL directly.
Access many shards in parallel
QueryRouting – Sends queries only to the shard containing the requested data.
ConditionPushdown – Aggregation, joins and filtering are always performed at the shard level which fully distributes the work
Gearman Workers – PHP is not threaded. Gearman (Net_Gearman) is leveraged instead.
Parallel loader and pre-splitter can load or preprocess delimited files with ease. Overhead of looking up/creating new shard keys is spread over coprocesses

1. INSTALLING shard query

1) Check your php configuration

**PHP 5.3.3
[[email protected] php-5.3.3]# cat config.nice|grep -v \#
‘./configure’ \
‘–enable-json’ \
‘–with-pdo-mysql=mysqlnd’ \
‘–with-mysqli=mysqlnd’ \
‘–with-mysql=mysqlnd’ \
‘–enable-sockets’ \
‘–enable-pcntl’ \
[email protected]

2) Install Net_Gearman libs

Net_Gearman : Net_Gearman based PHP worker script, as well as a function which requests a resultset from each shard.
yum install php-pear
pear install channel://pear.php.net/Net_Gearman-0.2.3
pecl install json

3) Download/install SHARD QUERY

http://code.google.com/p/shard-query/source/checkout , by the way you should copy the include/ and Net/ directories into your PHP include path. Net_Gearman will attempt to autoload the files in the Net/Gearman/Job folder. So In your include path run this.
svn co http://shard-query.googlecode.com/svn/trunk/include/ <php include path>/include/

4) Install GEARMAN SERVER:

To install, first set up the Yum repository on your system, Then install the gearmand daemon and pecl extension:
rpm -Uvh http://repo.webtatic.com/yum/centos/5/latest.rpm
yum install gearmand php-pecl-gearman –enablerepo=webtatic
Remember to add to php include path the net-gearmandlib and the folder Include from shardquery, If you cant edit your php.ini add this to the script set_include_path(‘.:/usr/share/pear’);

2) CHECKING ENVIRONMENT

1) Start services

[[email protected] ~]# /etc/init.d/mysqld start
[[email protected] ~]# /etc/init.d/gearmand start
DEFAULT PORTS 0.0.0.0:4730/gearmand 0.0.0.0:3306/mysqld

2) Start workers

About the workers: Gearman::Worker is a worker class for the Gearman distributed job system, providing a framework for receiving and serving jobs from a Gearman server. TO DEBUG ERRORS: check cd workers/worker.log COMMON PROBLEMS: Gearman library is not in the worker php script. Gearman must be installed as a pear lib, and ALSO the shard-query gearman scripts in workers directory.Or more simple add net_gearman and Include folder of shardquery to php include folder.
./start_workers 1(number of workers)
note:check workers.log if any problem

3) Execute a query test

_example1]$ ./run_query.php
if you see a Unknown database error, its okey because you have any db created.

3) PERFORMING A TEST

FILTER: day 2000-01-01
PLease remember to configure your shard ini, with your own database settings.
database=localhost shard1=ontime shard2=ontime2
1. run the script populate_db.PHP to populate db with random data.populate_db will add for each year(2000,2012) all months,for each month(12) all days(31)..4320rows. also the script creates schema and tables and will output db shards data for shards ini config.
2. execute manually the following queries to create duplicate data.
USE ontime;
insert into ontime values(‘ontime.james’,’2000-01-01′,0);
insert into ontime values(‘ontime.james’,’2000-01-01′,0);
insert into ontime values(‘ontime.james’,’2000-01-01′,0);
insert into ontime values(‘ontime.james’,’2000-01-01′,0);
insert into ontime values(‘ontime.uniquevalues.jamesjara’,’2000-01-01′,0);
insert into ontime values(‘ontime2.james’,’2000-01-01′,0); #inthis case ontime2.james is NOT in ontime2 ,its just for test duplicates in diferente shards
DATA IN THE TABLE ONTIME FROM SHARD1
use ontime;
select scheme,FlightDate,number from ontime where FlightDate BETWEEN ’2000-01-01′ and ’2000-01-01′;
+———————+————+——–+
| scheme | FlightDate | number |
+———————+————+——–+
| ontime.james | 2000-01-01 | 0 | DUPLICATE IN THIS SHARD
| ontime.james | 2000-01-01 | 0 | DUPLICATE IN THIS SHARD , 2 times
| ontime.james | 2000-01-01 | 0 | DUPLICATE IN THIS SHARD , 3 times
| ontime.james | 2000-01-01 | 0 | DUPLICATE IN THIS SHARD , 4 times
| ontime.uniquevalues | 2000-01-01 | 0 | UNIQUE ROW
| ontime2.james | 2000-01-01 | 0 | DUPLICATED IN SHARD2
+———————+————+——–+
DATA IN THE TABLE ONTIME FROM SHARD2
use ontime2;
select scheme,FlightDate,number from ontime where FlightDate BETWEEN ’2000-01-01′ and ’2000-01-01′;
+—————+————+——–+
| scheme | FlightDate | number |
+—————+————+——–+
| ontime2.james | 2000-01-01 | 0 | DUPLICATED IN SHARD1
+—————+————+——–+
As you can see we have unique data across all the diferent shards.. so now lets run queries..

1. RUN

[[email protected] _example3]$ ./run_query.php

2. SQ INPUT QUERY:

select scheme,FlightDate,number from ontime where FlightDate BETWEEN ’2000-01-01′ and ’2000-01-01′;
the sq input query is in the file test1.sql

3. OUTPUT/RESULTS:

To see the output go to the file output.html, remember run runquery to update the file
$ firefox `pwd`/output.html
or for remote host
lynx -print output.html
RESULT:
+———————–+————+——–+
| scheme | FlightDate | number |
+———————–+————+——–+
| ontime2.james | 2000-01-01 | 0 | DUPLICATED 2 times, 1 TIME IN SHARD1(ontime) and 1TIME IN SHARD2(ontime2)
| ontime.james | 2000-01-01 | 0 | DUPLICATED 4 TIMES ONLY IN SHARD1(ontime)
| ontime.uniquevalues | 2000-01-01 | 0 | UNIQUE ROW IN SHARD1(ontime)
+———————–+————+——–+
Thanks for your reading, If you want more update follow my at @jamesjara

0 pensamientos:

Post a Comment

feedback!