
Sunday, March 1, 2015

Alexa Web Discovery Machine Stats Resources Fun!

  • Link to Alexa
  • Alexa Blog

WHAT?
"Founded in April 1996, Alexa Internet grew out of a vision of Web navigation that is intelligent and constantly improving with the participation of its users. Along the way Alexa has developed an installed based of millions of toolbars, one of the largest Web crawls and an infrastructure to process and serve massive amounts of data. For users of Alexas Toolbar and web site this has resulted in products that have revolutionized Web navigation and intelligence. For developers this has resulted in a set of tools unprecedented in scope allowing whole new services to be created on the Alexa data and platform" ...more

HOW?
"Alexa is continually crawling all publicly available web sites to create a series of snapshots of the Web. They use the data they collect to create features and services:

  • Site Information: Traffic rankings, pictures of sites, links pointing to sites and more
  • Related Links: Sites that are similar to the one you are currently viewing

Currently, Alexa gathers approximately 1.6 Terabytes (1600 gigabytes) of Web content per day. After each snapshot of the Web, which takes approximately two months to complete, Alexa has gathered 4.5 Billion pages from over 16 million sites" ...more

JUICE?
Whether the results are 100% accurate or not (methodology!), Alexa has a wonderful set of learning tools (Search, Traffic Rankings, Directory, Alexa Toolbar and Developers Corner) to search, discover, rank and compare different sites around the world. For example, the most visited sites in all Education categories are:
  1. W3C - The World Wide Web Consortium
  2. How Stuff Works
  3. Classmates
  4. Massachusetts Institute of Technology
  5. National University of Singapore (NUS)

Surprisingly, NUS is ranked ahead of (6) University of California, Berkeley, (7) Stanford University, and (8) Harvard University. How is that possible? Well, this should give us some incentive to research why this is the case. Let the students figure this one out! Also, perhaps Wikipedia should be placed in the education category. Yes, it would probably be ranked No. 1.

What about Malaysia? Currently, Yahoo is ranked No. 1, but interestingly Friendster is No. 2, and Facebook is way down in 11th. I suppose in two months' time, Facebook will probably overtake Friendster.

The beauty of Alexa is that you can actually use this tool to compare different sites of your own liking. In this example I have compared 5 Mambo Jumbo sites (Don't need to clarify!):


If you haven't tried or used Alexa, it might be time to have some fun learning with your friends, colleagues or students, exploring different rankings and comparing your favourite sites :)


Tuesday, February 3, 2015

How to use the Support Vector Machine classifier in OpenCV for Linearly Separable Data Sets

In this tutorial I'm going to illustrate a very basic and simple coding example, targeting beginners, that uses the Support Vector Machine (SVM) implementation in OpenCV on linearly separable data sets. I'm not going to explain the complex mathematical background of finding the optimal hyperplane. However, in the first section of the post I'm going to give a simple introduction to support vector machines.

Among classification and machine learning methods, SVM is one of the simplest and easiest to use. In image processing there are a number of uses for SVM: image classification and handwritten character recognition are two examples. SVM can easily be used to classify feature vectors extracted from images.

SVM is a supervised learning method. It defines separating hyperplanes between labeled data sets. These hyperplanes can then be used to categorize new data points whose class labels we don't know.

To make the problem easier to understand, instead of hyperplanes and vectors in a high-dimensional space I use lines and points in the Cartesian plane (see the figure below).
Here the problem is to select, from all possible lines, one that can be used to separate the two classes. After defining such a hyperplane we can assign new sample data to a class using it, but it is difficult to define an optimal hyperplane. Based on a criterion we can estimate the worth of each candidate line: the SVM algorithm finds the hyperplane that gives the optimal separation, i.e. the largest minimum distance (margin) to the training examples.

Support vectors are the elements of the training set that would change the position of the dividing hyperplane if removed; they are the critical elements of the training set.
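For readers who want the criterion stated precisely, this is the standard textbook formulation (see reference 3; it is not specific to OpenCV). With training points x_i, labels y_i in {-1, +1} and separating hyperplane w·x + b = 0, maximizing the margin 2/||w|| is equivalent to:

\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, N

The support vectors are exactly the training points for which this constraint holds with equality.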


In this section I'm going to illustrate how we can use the SVM implementation in OpenCV to classify a very simple data set. The SVM implementation in OpenCV is based on LibSVM.

In this example, to train the SVM I used 10 points (x and y coordinates lying on the Cartesian plane). I separated these points into two classes, labeled 0 and 1 (see the following table).


 
  Point (x, y)    Class label
  (100, 10)       0
  (150, 10)       0
  (600, 200)      1
  (600, 10)       1
  (10, 100)       0
  (455, 10)       1
  (345, 255)      1
  (10, 501)       1
  (401, 255)      1
  (30, 150)       0

Table 1

To train the SVM we have to pass an N × M Mat of features (N samples as rows, M features as columns) and an N × 1 Mat of class labels.
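In schematic form (featureMat and labelMat are placeholder names I use here for illustration, not identifiers from the code below):

const int N = 10, M = 2;            // 10 samples, 2 features each (x and y)
cv::Mat featureMat(N, M, CV_32FC1); // one training sample per row
cv::Mat labelMat(N, 1, CV_32FC1);   // one class label per row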

float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10}, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };

Here I create a 10 × 2 array as the feature space.

float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };

Here I create a 10 × 1 array of class labels. These class labels are mapped to the corresponding feature vectors as shown in Table 1.

The SVM training function only accepts data as Mat objects, so we need to create Mat objects from the arrays defined above. After completing the training process we can use the trained SVM to classify a given coordinate pair into a class.

The following code can be used to train the SVM and predict with it. I have added comments to make the code easy to understand.

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include "opencv2/ml/ml.hpp"
#include <iostream>
#include <cstdlib>

int main(){
    // Class labels of the 10 training samples (see Table 1)
    float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };
    cv::Mat labelsMat(10, 1, CV_32FC1, labels);

    // 10 training samples, each with 2 features (the x and y coordinates)
    float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10 }, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };
    cv::Mat trainDataMat(10, 2, CV_32FC1, trainingData);

    //Define parameters for the SVM
    CvSVMParams params;
    //SVM type is n-class classification (n >= 2), allowing imperfect separation of classes
    params.svm_type = CvSVM::C_SVC;
    //No mapping is done; linear discrimination (or regression) is done in the original feature space
    params.kernel_type = CvSVM::LINEAR;
    //Define the termination criterion for the SVM algorithm:
    //stop once the achieved algorithm-dependent accuracy drops below epsilon,
    //or after a maximum of 100 iterations
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

    CvSVM svm;
    //Call the train function
    svm.train(trainDataMat, labelsMat, cv::Mat(), cv::Mat(), params);

    //Create the test features (a 1 x 2 row sample, matching the training layout)
    float testData[2] = { 150, 15 };
    cv::Mat testDataMat(1, 2, CV_32FC1, testData);

    //Predict the class label for the test data sample
    float predictedLabel = svm.predict(testDataMat);

    std::cout << "Predicted label: " << predictedLabel << std::endl;

    system("PAUSE");
    return 0;
}

In the line that defines testData, I pass the point whose class label should be predicted.

First I pass {150, 15}. The class label was correctly predicted as 0. The following figure shows the output.



Then I changed the testData line to pass {400, 200} as the test data. Here is the output.


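If you want to classify several points in one run instead of editing testData and recompiling each time, a minimal sketch looks like this (the three test points are arbitrary examples I chose; svm is the trained classifier from the code above):

float testPoints[3][2] = { { 150, 15 }, { 400, 200 }, { 600, 50 } };
for (int k = 0; k < 3; ++k) {
    // Each sample must be a 1 x 2 row Mat, matching the training feature layout
    cv::Mat sample = (cv::Mat_<float>(1, 2) << testPoints[k][0], testPoints[k][1]);
    float label = svm.predict(sample);
    std::cout << "(" << testPoints[k][0] << ", " << testPoints[k][1]
              << ") -> class " << label << std::endl;
}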
By making some modifications to the above code, as shown in the following example, we can graphically represent the decision regions found by the SVM.

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include "opencv2/ml/ml.hpp"

int main(){
    int width = 650, height = 650;
    //Create a black image on which the decision regions will be drawn
    cv::Mat image = cv::Mat::zeros(height, width, CV_8UC3);

    // Set up the training data (same as in the previous example)
    float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };
    cv::Mat labelsMat(10, 1, CV_32FC1, labels);
    float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10 }, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };
    cv::Mat trainingDataMat(10, 2, CV_32FC1, trainingData);

    // Set up the SVM's parameters
    CvSVMParams params;
    params.svm_type = CvSVM::C_SVC;
    params.kernel_type = CvSVM::LINEAR;
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

    // Train the SVM
    CvSVM SVM;
    SVM.train(trainingDataMat, labelsMat, cv::Mat(), cv::Mat(), params);

    cv::Vec3b green(0, 255, 0), blue(255, 0, 0);
    // Show the decision regions given by the SVM:
    // classify every pixel coordinate and color it by the predicted class
    for (int i = 0; i < image.rows; ++i)
        for (int j = 0; j < image.cols; ++j)
        {
            cv::Mat sampleMat = (cv::Mat_<float>(1, 2) << j, i);
            float response = SVM.predict(sampleMat);

            if (response == 1)
                image.at<cv::Vec3b>(i, j) = green;
            else if (response == 0)
                image.at<cv::Vec3b>(i, j) = blue;
        }

    // Show the training data as white dots
    int thickness = -1; // filled circles
    int lineType = 8;
    for (int k = 0; k < 10; ++k)
        cv::circle(image, cv::Point((int)trainingData[k][0], (int)trainingData[k][1]),
                   5, cv::Scalar(255, 255, 255), thickness, lineType);

    // Show the test data point {400, 200} as a red dot
    cv::circle(image, cv::Point(400, 200), 5, cv::Scalar(0, 0, 255), thickness, lineType);

    cv::imwrite("result.png", image); // save the image
    cv::imshow("SVM Simple Example", image); // show it to the user
    cv::waitKey(0);

    return 0;
}

The following image shows the output of the code.


The white dots show the points of the training data set. The red dot shows the test data point, which is {400, 200}.

You can download the Visual Studio project from here. I have used OpenCV 2.4.9.

(To create the above code samples I used the example code provided in reference 1.)
References:
1. http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
2. http://docs.opencv.org/modules/ml/doc/support_vector_machines.html#cvsvmparams
3. http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
