Status : Verified
Personal Name Vesagas, Terence John C.
Resource Title Building filipiniana indexes towards the development of a standard filipiniana subject heading list
Date Issued May 2018
Abstract This research explored the creation of Filipiniana indexes as part of an effort toward a standardized Filipiniana Subject Heading List. Because no subject heading system or index based on Filipiniana currently exists, this research aimed to advance the formation of a localized system by creating indexes for the Filipiniana collection. These indexes provide the first step towards subject analysis by supplying key terms and contexts gained through the processing of the collection. With the growth of technology, methods such as automatic indexing have become a point of interest, because automatic indexing uses computational power to perform large-scale tasks with ease. Thus, big data analysis tools were used as the primary instrument to process the raw data and produce the final outputs. The UP Filipiniana collection was chosen for this research as it has the largest collection of Filipiniana materials in the country.

The raw dataset consists of 71,891 book titles from the Filipiniana collection. The raw dataset was run through an automatic analysis composed of three parts: term frequency analysis, clustering analysis, and collocation analysis. The results provide a description of the subjects covered by the UP Filipiniana collection. These descriptions include a list of the most frequent terms within the collection, clusters of related terms, and the most frequent collocated terms in the collection. These outputs were then processed into a title index and a permuted index, based on the most prominent terms and clusters. The final outputs, after being compared and analyzed, showed that creating indexes with big data analysis methods is a viable step toward a standardized Filipiniana Subject Heading List. Packages in R were utilized as they were designed and optimized for the methods of analysis required in automatic indexing. This research also highlights some of the problems encountered in applying these big data analysis methods within the local context. The raw and cleaned datasets, the results of each analysis, and the resulting outputs are available at https://github.com/tjcvz/UP_Filipiniana_Big_Data_Analysis. Future work can apply this approach to other Filipiniana collections.
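
The pipeline described in the abstract (term frequency, collocation, and a permuted index built from titles) can be illustrated with a minimal sketch. This is not the author's implementation: the thesis used R packages, while the sketch below uses only the Python standard library. The sample titles, the stopword list, and all function names are hypothetical, and the clustering step is omitted.

```python
from collections import Counter

# Hypothetical sample titles; NOT taken from the thesis dataset.
TITLES = [
    "History of the Filipino People",
    "Philippine History and Government",
    "Filipino People and Culture",
]

# Tiny illustrative stopword list (an assumption, not the thesis's list).
STOPWORDS = {"of", "the", "and"}


def tokenize(title):
    """Lowercase a title, split on whitespace, and drop stopwords."""
    return [w for w in title.lower().split() if w not in STOPWORDS]


def term_frequencies(titles):
    """Count how often each non-stopword term occurs across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


def collocations(titles):
    """Count pairs of terms that are neighbours once stopwords are removed."""
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def permuted_index(titles):
    """Rotate each title so every non-stopword word leads one entry.

    Returns sorted (keyword, rotated title, original title) tuples.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            rotated = " ".join(words[i:])
            if i:
                rotated += " / " + " ".join(words[:i])
            entries.append((word.lower(), rotated, title))
    return sorted(entries)


if __name__ == "__main__":
    print(term_frequencies(TITLES).most_common(3))
    print(collocations(TITLES).most_common(1))
    for keyword, rotated, _ in permuted_index(TITLES):
        print(f"{keyword:12} {rotated}")
```

In this sketch the most frequent terms and collocations suggest candidate headings, while the permuted index lists every title under each of its significant words.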
Degree Course Bachelor of Library and Information Science
Language English
Keyword Big data analysis; Clustering; Collocation; Filipiniana; Index; Indexes; Subject analysis; Text mining; Thesis; UP Filipiniana
Material Type Thesis/Dissertation
Preliminary Pages
127.84 Kb