How does dimensionality influence outlier detection effectiveness in multivariate geochemical data? Insights from LOF and IF methods

Journal Publication ResearchOnline@JCU
Shahrestani, Shahed; Sanislav, Ioan
Abstract

This paper examines the impact of the curse of dimensionality on the performance of isolation forest (IF) and local outlier factor (LOF) in detecting mineralization-related geochemical anomalies in a high-dimensional geochemical dataset. Using variable subsets of varying dimensionality, selected by random and supervised methods, IF and LOF were tested against known mineral deposit locations to assess their effectiveness. Performance is measured by the percentage of mineral occurrences classified as anomalies and by the area under the ROC curve at each dimensionality. The influence of the dimension reduction techniques PCA and ISOMAP on IF and LOF performance is also explored. IF performs consistently across the tested dimensions, which makes it robust and particularly suited to high-dimensional datasets. In contrast, LOF is sensitive to dimensionality: it performs best in lower dimensions (5 to 10 variables), and its effectiveness diminishes beyond this range. This sensitivity highlights the importance of judicious input variable selection for LOF to achieve effective anomaly detection in geochemical datasets. The study also shows that the performance of IF remains stable with both PCA and ISOMAP, whereas LOF benefits more from PCA, whose variance-maximizing projection may retain sufficient structural integrity for effective anomaly detection. Conversely, the performance of LOF declines with ISOMAP because ISOMAP alters local densities more strongly. This variation underscores the need for careful selection of the dimension reduction method and of the number of components used as input for outlier detection methods.
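
The evaluation described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic dataset in place of the geochemical data. This is not the authors' code: the helper names (`make_synthetic`, `score_detectors`, `auc_by_dimension`, `auc_after_reduction`), the parameter values, and the way anomalies are simulated are all assumptions made for illustration. The sketch scores IF and LOF by ROC AUC on subsets of increasing dimensionality and after PCA or ISOMAP reduction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.manifold import Isomap
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def make_synthetic(n_background=500, n_anomalies=25, n_features=30,
                   n_informative=5, seed=0):
    """Hypothetical stand-in for a geochemical dataset with known occurrences."""
    rng = np.random.default_rng(seed)
    n = n_background + n_anomalies
    X = rng.normal(size=(n, n_features))
    y = np.zeros(n, dtype=int)
    y[n_background:] = 1  # 1 marks a simulated "known mineral occurrence"
    # Assumption: anomalies deviate only in the first n_informative variables.
    X[n_background:, :n_informative] += rng.choice(
        [-4.0, 4.0], size=(n_anomalies, n_informative))
    return X, y


def score_detectors(X, y, seed=0):
    """Return ROC AUC for IF and LOF; higher scores mean more anomalous."""
    iso = IsolationForest(n_estimators=100, random_state=seed).fit(X)
    if_scores = -iso.score_samples(X)  # score_samples is lower for outliers
    lof = LocalOutlierFactor(n_neighbors=20).fit(X)
    lof_scores = -lof.negative_outlier_factor_  # more negative means outlier
    return {"IF": roc_auc_score(y, if_scores),
            "LOF": roc_auc_score(y, lof_scores)}


def auc_by_dimension(X, y, dims, seed=0):
    """Score both detectors on the first d variables for each d in dims."""
    return {d: score_detectors(StandardScaler().fit_transform(X[:, :d]), y, seed)
            for d in dims}


def auc_after_reduction(X, y, n_components=5, seed=0):
    """Score both detectors after PCA and after ISOMAP reduction."""
    Xs = StandardScaler().fit_transform(X)
    reducers = {
        "PCA": PCA(n_components=n_components, random_state=seed),
        "ISOMAP": Isomap(n_neighbors=10, n_components=n_components),
    }
    return {name: score_detectors(r.fit_transform(Xs), y, seed)
            for name, r in reducers.items()}


if __name__ == "__main__":
    X, y = make_synthetic()
    for d, aucs in auc_by_dimension(X, y, dims=[5, 10, 20, 30]).items():
        print(d, {k: round(v, 3) for k, v in aucs.items()})
    for name, aucs in auc_after_reduction(X, y).items():
        print(name, {k: round(v, 3) for k, v in aucs.items()})
```

On real data, the label vector would come from known mineral deposit locations and the subsets from the random or supervised selection the abstract mentions. The numbers produced here depend entirely on the synthetic setup and should not be read as reproducing the paper's results.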

Journal

Earth Science Informatics

Volume

18

ISBN/ISSN

1865-0473

Pages Count

15

Publisher

Springer

DOI

10.1007/s12145-024-01611-0