LODAP: A LOg DAta Preprocessor for mining Web browsing patterns

Castellano, Giovanna; Fanelli, Anna Maria
2007-01-01

Abstract

In this paper, we present LODAP, a log data preprocessor that extracts user sessions from the requests stored in the log file of a Web site. LODAP is composed of several modules. The data cleaning module removes irrelevant records from the log file so that only the requests encoding the user's navigational behaviour are retained. The data structuration module groups the remaining requests into user sessions using a time-based method. The data filtering module then considerably reduces the size of the extracted session data by deleting the least visited pages and the uninteresting sessions. In addition, a data summarization module creates reports that summarize the information mined from the analyzed log file and contain the results provided by each module of LODAP. The tool has a wizard-based interface that guides the analyst through the preprocessing of the log data in a sequence of "panels". Each panel is a graphical window offering one basic function of the preprocessor. Tests on the log files of a specific Web site show that LODAP can effectively reduce the size of the log dataset and identify significant user sessions.
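The three processing stages named in the abstract (cleaning, time-based session building, filtering) can be illustrated with a minimal Python sketch. It is not LODAP's implementation: the record layout (dicts with `ip`, `time`, `method`, `url`, `status`), the ignored file suffixes, the 30-minute timeout, the use of the IP address as user identifier, and both filtering thresholds are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical settings; the abstract does not state LODAP's actual values.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)
MIN_PAGE_VISITS = 2      # pages requested fewer times than this are dropped
MIN_SESSION_LENGTH = 2   # sessions shorter than this are dropped


def clean(records):
    """Keep successful GET requests for pages; drop embedded resources and errors."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group each user's requests into sessions, splitting at gaps longer than timeout.

    Users are identified by IP address here, which is an assumption.
    """
    by_user = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        by_user[r["ip"]].append(r)

    sessions = []
    for requests in by_user.values():
        current = [requests[0]]
        for prev, cur in zip(requests, requests[1:]):
            if cur["time"] - prev["time"] > timeout:
                sessions.append([r["url"] for r in current])
                current = []
            current.append(cur)
        sessions.append([r["url"] for r in current])
    return sessions


def filter_sessions(sessions, min_visits=MIN_PAGE_VISITS,
                    min_length=MIN_SESSION_LENGTH):
    """Drop rarely visited pages, then drop sessions that became too short."""
    visits = Counter(url for s in sessions for url in s)
    kept = []
    for s in sessions:
        pages = [u for u in s if visits[u] >= min_visits]
        if len(pages) >= min_length:
            kept.append(pages)
    return kept
```

Calling `filter_sessions(build_sessions(clean(records)))` on a list of parsed log records chains the three stages in the order the abstract describes.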
2007
978-960-8457-59-1


Use this identifier to cite or link to this document: https://hdl.handle.net/11586/55011