Determining Features of Web Documents and Building a Web Classifier using Support Vector Machine

Taufik Fuadi Abidin (1,2), Alim Misbullah (2), Muhammad Subianto (1,2)
(1) Department of Informatics, College of Science, Syiah Kuala University
(2) Data Mining and Information Retrieval Research Group

Published in:

AISS (Advances in Information Sciences and Service Sciences)
Volume 3 Issue 10, November, 2011
Pages 401-408
ISSN 1976-3700 (Print) 2233-9345 (Online)
GlobalCIS (Convergence Information Society, Republic of Korea)

Determining the category of web documents is an important task in web mining. Web categories help users organize web pages into groups and help improve the quality of web search. The process of determining a web page's category, also known as web classification, involves predicting the category of newly encountered web pages using a pre-classified training set and classifier models. Many studies have addressed web page classification, but none has focused on classifying Indonesian web pages. In this paper, we extracted several potential feature attributes from Indonesian news web pages and used them to build an Indonesian web page classifier with a Support Vector Machine (SVM) for seven topics: economics, sports, health, automotive, music, politics, and science. The features were extracted from the title and body (content) of the pages. The results show that the selected features are effective and that the classifier achieves high accuracy, as measured by the F-measure and the Receiver Operating Characteristic (ROC) curve.
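The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say which F-measure variant was used or how the ROC curve was computed, so the code below assumes the balanced F1 score and a one-vs-rest treatment of a single topic. The function names (`f_measure`, `roc_points`, `auc`) and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Hashable, List, Sequence, Tuple


def f_measure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
              positive: Hashable) -> float:
    """Balanced F-measure (F1) for one topic, treated as one-vs-rest (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_points(y_true: Sequence[Hashable], scores: Sequence[float],
               positive: Hashable) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs from a threshold sweep."""
    pos = sum(1 for t in y_true if t == positive)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all examples that share this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (made up): four pages, two of which are truly "sports".
labels = ["sports", "sports", "music", "music"]
predicted = ["sports", "music", "sports", "music"]
sports_scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical classifier scores for "sports"

print(f_measure(labels, predicted, positive="sports"))
print(auc(roc_points(labels, sports_scores, positive="sports")))
```

For a seven-topic classifier such as the one described, these functions would be called once per topic, with that topic as `positive` and the remaining six grouped as negatives.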

Keywords: Web Features and Classifier, Support Vector Machine
