
Abstract

Web spam is a technique by which irrelevant pages obtain higher positions than relevant pages in a search engine's results. Spam pages are generally insufficient and inappropriate results for the user. Many researchers are working in this area to detect spam pages. However, no universally efficient technique has been developed so far that can detect all spam pages. This paper is an effort in that direction: we propose a combined approach of content-based and link-based techniques to identify spam pages. The content-based approach uses term density and a Part of Speech (POS) ratio test, and the link-based approach explores collaborative detection using personalized page ranking to classify a Web page as spam or non-spam. A dataset was used for the experiments. The results have been compared with some of the existing approaches. A good and promising F-measure of 75.2% demonstrates the applicability and efficiency of our approach.
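The abstract does not define term density or say how the page ranking is personalized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes term density is the fraction of a page's tokens that equal a given term, and that personalization means restricting the teleport step of PageRank to a set of trusted seed pages. The function names, the damping factor of 0.85, and the iteration count are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def term_density(text, term):
    """Fraction of tokens in `text` equal to `term` (assumed definition)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


def personalized_pagerank(links, seeds, damping=0.85, iterations=50):
    """Power iteration where teleport mass goes only to `seeds`.

    `links` maps a page to the list of pages it links to.
    `seeds` is a non-empty set of trusted pages (an assumption of this sketch).
    """
    nodes = set(links) | {v for targets in links.values() for v in targets} | set(seeds)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = links.get(n, [])
            if out:
                # Split this page's damped rank evenly over its out-links.
                share = damping * rank[n] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling page: return its damped rank to the seed set.
                for s in seeds:
                    new[s] += damping * rank[n] / len(seeds)
        rank = new
    return rank
```

Under these assumptions, a page stuffed with one keyword gets a high term density, and a page that cannot be reached from any trusted seed ends up with a personalized rank of zero. Either signal could then be fed into a spam or non-spam classifier.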
