A Graph based Methodology for Web Structure Mining: with a Case Study on the Webs of UK Universities

Tahani Alqurashi, Wenjia Wang

Research output: Contribution to conference > Paper > peer-review

Abstract

Web structure mining extracts knowledge from the hyperlink
structure of websites in order to improve web design, giving
clearer content presentation and easier navigation. This
paper presents a graph-based methodology for web structure
mining. The structure of a website is first mapped onto a
graph whose nodes represent web pages and whose edges
represent hyperlinks between pages and to other websites. The
characteristics of the web graph, such as the degree of each
node, density, connectivity, closeness centralisation, and
node clusters, can then be analysed quantitatively. The
methodology is tested on web structural data collected from the
websites of 110 UK universities. After cleansing and
pre-processing the data, the graphs were constructed and
analysed to obtain the aforementioned properties for each
website, along with other useful information such as page size
and the length of the optimal path, both of which affect
navigability. Based on an evaluation of these properties,
guidelines and criteria are devised for grading the structural
quality of the websites into five categories, from very poor to
very good. The average degree and the percentage of pages in
the strongly connected component (SCC), together with the
average distance, were found to be the most important
properties in determining the structural quality of a website.
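
The graph properties named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy link list, and the exact definitions used here (average degree as links per page, average distance over reachable ordered pairs, SCC percentage as the share of pages in the largest SCC) are assumptions made for illustration, and the paper may define them differently.

```python
from collections import deque


def build_graph(links):
    """Map (source, target) hyperlink pairs onto a directed adjacency dict."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure link targets are nodes too
    return graph


def edge_count(graph):
    return sum(len(targets) for targets in graph.values())


def density(graph):
    """Links present divided by links possible in a directed graph."""
    n = len(graph)
    return edge_count(graph) / (n * (n - 1)) if n > 1 else 0.0


def average_degree(graph):
    """Average number of outgoing links per page (assumed definition)."""
    return edge_count(graph) / len(graph) if graph else 0.0


def bfs_distances(graph, start):
    """Shortest hop counts from start to every page reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reverse(graph):
    """Same nodes, every link flipped."""
    rev = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev[dst].add(src)
    return rev


def largest_scc_fraction(graph):
    """Share of pages in the largest strongly connected component.

    A page's SCC is the set of pages it can reach that can also reach it,
    found here by intersecting forward and backward reachability.
    """
    if not graph:
        return 0.0
    rev = reverse(graph)
    seen, best = set(), 0
    for node in graph:
        if node in seen:
            continue
        component = set(bfs_distances(graph, node)) & set(bfs_distances(rev, node))
        seen |= component
        best = max(best, len(component))
    return best / len(graph)


def average_distance(graph):
    """Mean shortest-path length over ordered pairs that are connected."""
    total = pairs = 0
    for node in graph:
        for other, d in bfs_distances(graph, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy site, not data from the case study.
links = [
    ("home", "courses"), ("courses", "home"),
    ("home", "research"), ("research", "home"),
    ("courses", "apply"), ("home", "external"),
]
site = build_graph(links)
print(density(site), average_degree(site))
print(largest_scc_fraction(site), average_distance(site))
```

In this toy site the pages "apply" and "external" have no outgoing links, so they fall outside the strongly connected component. Dead-end pages of this kind lower the SCC percentage, which the abstract identifies as one of the most important indicators of structural quality.
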
Original language: English
Publication status: Published - Jun 2014
Event: International Conference on Web Intelligence, Mining and Semantics - Thessaloniki, Greece
Duration: 2 Jun 2014 to 5 Jun 2014

Conference

Conference: International Conference on Web Intelligence, Mining and Semantics
Country: Greece
City: Thessaloniki
Period: 2/06/14 to 5/06/14

Keywords

  • Web Mining
  • Graph theory
  • web structure
