Sunday, March 1, 2015
Alexa: Web Discovery Machine - Stats, Resources, Fun!
- Link to Alexa
- Alexa Blog
WHAT?
"Founded in April 1996, Alexa Internet grew out of a vision of Web navigation that is intelligent and constantly improving with the participation of its users. Along the way Alexa has developed an installed base of millions of toolbars, one of the largest Web crawls and an infrastructure to process and serve massive amounts of data. For users of Alexa's Toolbar and web site this has resulted in products that have revolutionized Web navigation and intelligence. For developers this has resulted in a set of tools unprecedented in scope, allowing whole new services to be created on the Alexa data and platform." ...more
HOW?
"Alexa is continually crawling all publicly available web sites to create a series of snapshots of the Web. They use the data they collect to create features and services:
- Site Information: Traffic rankings, pictures of sites, links pointing to sites and more
- Related Links: Sites that are similar to the one you are currently viewing
Currently, Alexa gathers approximately 1.6 Terabytes (1600 gigabytes) of Web content per day. After each snapshot of the Web, which takes approximately two months to complete, Alexa has gathered 4.5 Billion pages from over 16 million sites"...more
JUICE? Whether the results are 100% accurate or not (methodology!), Alexa has a wonderful set of learning tools (Search, Traffic Rankings, Directory, Alexa Toolbar and Developers Corner) to search, discover, rank and compare different sites around the world. For example, the most visited sites across all Education categories are:
- W3C - The World Wide Web Consortium
- How Stuff Works
- Classmates
- Massachusetts Institute of Technology
- National University of Singapore (NUS)
Surprisingly, NUS is ranked ahead of (6) University of California, Berkeley, (7) Stanford University, and (8) Harvard University. How is that possible? Well, this should give us some incentive to research why this is the case. Let the students figure this one out! Also, perhaps Wikipedia should be placed in the Education category. Yes, it would probably be ranked No. 1.
What about Malaysia? Currently, Yahoo is ranked No. 1, but interestingly Friendster is No. 2, and Facebook is way down in 11th place. I suppose in two months' time, Facebook will probably overtake Friendster.
The beauty of Alexa is that you can actually use this tool to compare different sites of your own choosing. In this example I have compared 5 Mambo Jumbo sites (no need to clarify!):
If you haven't tried Alexa yet, it might be time to have some fun learning with your friends, colleagues or students, exploring different rankings and comparing your favourite sites :)
Tuesday, February 3, 2015
How to Use a Support Vector Machine Classifier in OpenCV for Linearly Separable Data Sets


In this example, to train the SVM I use 10 points (x and y coordinates lying on a Cartesian plane). I separate these points into two classes, labelled 0 and 1 (see the following table).

Table 1: training points and their class labels

    x      y      class
    100    10     0
    150    10     0
    600    200    1
    600    10     1
    10     100    0
    455    10     1
    345    255    1
    10     501    1
    401    255    1
    30     150    0
To train the SVM we have to pass an N×M Mat of features (N rows, M columns) and an N×1 Mat of class labels.
float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10}, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };
Here I create a 10×2 array as the feature space.
float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };
Here I create a 10×1 array of class labels. These class labels map to the feature rows as shown in Table 1.
The SVM training function only accepts data as Mat objects, so we need to create Mat objects from the arrays defined above. After the training process completes, we can use the trained SVM to classify a given coordinate pair into one of the two classes.
The following code can be used to train the SVM and predict with it. I have added comments to make the code easy to understand.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "opencv2/ml/ml.hpp"
#include <iostream>
#include <cstdlib>

int main(){
    // Class labels for the 10 training points (see Table 1)
    float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };
    cv::Mat labelsMat(10, 1, CV_32FC1, labels);
    // Feature space: 10 points, 2 features (x and y coordinates)
    float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10 }, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };
    cv::Mat trainDataMat(10, 2, CV_32FC1, trainingData);
    // Define parameters for the SVM
    CvSVMParams params;
    // SVM type: n-class classification (n >= 2), allows imperfect separation of classes
    params.svm_type = CvSVM::C_SVC;
    // No mapping is done; linear discrimination (or regression) is done in the original feature space
    params.kernel_type = CvSVM::LINEAR;
    // Define the termination criterion for the SVM training algorithm.
    // With CV_TERMCRIT_ITER the algorithm stops after at most 100 iterations
    // (the epsilon value would only apply if CV_TERMCRIT_EPS were also set)
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);
    CvSVM svm;
    // Call the train function
    svm.train(trainDataMat, labelsMat, cv::Mat(), cv::Mat(), params);
    // Create a test sample: one row, two feature columns
    float testData[2] = { 150, 15 };
    cv::Mat testDataMat(1, 2, CV_32FC1, testData);
    // Predict the class label for the test sample
    float predictedLabel = svm.predict(testDataMat);
    std::cout << "Predicted label: " << predictedLabel << std::endl;
    system("PAUSE");
    return 0;
}
In the test section of the code I pass the sample {150, 15} to predict its class label. The class label is correctly predicted as 0. The following figure shows the output.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "opencv2/ml/ml.hpp"

int main(){
    int width = 650, height = 650;
    // Create an image to visualize the decision regions
    cv::Mat image = cv::Mat::zeros(height, width, CV_8UC3);
    // Set up the training data
    float labels[10] = { 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0 };
    cv::Mat labelsMat(10, 1, CV_32FC1, labels);
    float trainingData[10][2] = { { 100, 10 }, { 150, 10 }, { 600, 200 }, { 600, 10 }, { 10, 100 }, { 455, 10 }, { 345, 255 }, { 10, 501 }, { 401, 255 }, { 30, 150 } };
    cv::Mat trainingDataMat(10, 2, CV_32FC1, trainingData);
    // Set up the SVM's parameters
    CvSVMParams params;
    params.svm_type = CvSVM::C_SVC;
    params.kernel_type = CvSVM::LINEAR;
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);
    // Train the SVM
    CvSVM SVM;
    SVM.train(trainingDataMat, labelsMat, cv::Mat(), cv::Mat(), params);
    cv::Vec3b green(0, 255, 0), blue(255, 0, 0);
    // Show the decision regions given by the SVM:
    // classify every pixel (column j = x, row i = y) and colour it by class
    for (int i = 0; i < image.rows; ++i)
        for (int j = 0; j < image.cols; ++j)
        {
            cv::Mat sampleMat = (cv::Mat_<float>(1, 2) << j, i);
            float response = SVM.predict(sampleMat);
            if (response == 1)
                image.at<cv::Vec3b>(i, j) = green;
            else if (response == 0)
                image.at<cv::Vec3b>(i, j) = blue;
        }
    // Show the training data as white dots
    int thickness = -1; // filled circles
    int lineType = 8;
    for (int k = 0; k < 10; ++k)
        cv::circle(image, cv::Point((int)trainingData[k][0], (int)trainingData[k][1]),
                   5, cv::Scalar(255, 255, 255), thickness, lineType);
    // Show the test data point {400, 200} as a red dot
    cv::circle(image, cv::Point(400, 200), 5, cv::Scalar(0, 0, 255), thickness, lineType);
    cv::imwrite("result.png", image);        // save the image
    cv::imshow("SVM Simple Example", image); // show it to the user
    cv::waitKey(0);
    return 0;
}
The following image shows the output of the code.
The white dots show the training data points; the red dot shows the test point, {400, 200}.
You can download the Visual Studio project from here. I have used OpenCV 2.4.9.
(To create the above code samples I used the example code provided in reference 1.)
References:
1. http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
2. http://docs.opencv.org/modules/ml/doc/support_vector_machines.html#cvsvmparams
3. http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf