

Title: Data mining and applied linear algebra
Speaker: Moody Chu
Location: PDC seminar room, Teknikringen 14, level 3
Time and Date: 14.30-15.30, 2010-05-05
Abstract
In the era of information and digital technologies, massive amounts of data are being generated at almost every level of application in almost every discipline. Quite often the data collected from complex phenomena represent the integrated result of several interrelated variables, while the variables themselves are less precisely defined. Ambiguities in the variables, noise in the measurements, variations introduced during parsing and indexing, and even the insufficiency of some information all render considerable uncertainty in the data. Without parallel gains in techniques for effectively organizing or sorting such data, the gains in the amount of information would simply be an overwhelming deluge. Extracting interesting information from raw data, generally known as data mining, therefore becomes an indispensable task.
The principal objective in data mining is to determine which variables are related to which, and how they are related. In many situations the digitized information is gathered and stored as a data matrix. It is often the case, or so assumed, that the exogenous variables depend on the endogenous variables in a linear relationship. Retrieving useful information can therefore often be characterized as finding a suitable matrix factorization.
In this talk, we offer a synoptic view of how linear algebra techniques can help carry out the task of data mining. Examples from factor analysis, cluster analysis, and latent semantic indexing are used to demonstrate how matrix factorization helps to uncover hidden connections and to do so quickly. Low rank matrix approximation plays a fundamental role in cleaning and compressing the data. Other types of constraints, such as nonnegativity, will also be briefly discussed. Finally, link analysis, which can be viewed as a dynamical system, is a necessary tool for classifying and ranking the trust and significance of retrieved data.
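
As an illustration of the low rank approximation mentioned above, the following is a minimal sketch and not taken from the talk itself. It assumes a synthetic data matrix Y generated from a linear model Y = A F plus small noise, and uses the truncated singular value decomposition from NumPy, which gives the best rank-k approximation in the Frobenius norm (Eckart-Young theorem). The names low_rank_approx, A_true, F_true, and the sizes m, n, k are hypothetical choices made for this example only.

```python
import numpy as np

def low_rank_approx(Y, k):
    """Return factors (A, F) with A @ F the best rank-k approximation of Y
    in the Frobenius norm, computed from a truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :k] * s[:k]   # scale the k leading left singular vectors, shape (m, k)
    F = Vt[:k, :]          # k leading right singular vectors, shape (k, n)
    return A, F

# Synthetic data (an assumption for illustration): a rank-3 linear model plus noise.
rng = np.random.default_rng(0)
m, n, k = 20, 30, 3
A_true = rng.standard_normal((m, k))
F_true = rng.standard_normal((k, n))
Y_clean = A_true @ F_true
Y = Y_clean + 0.01 * rng.standard_normal((m, n))

A, F = low_rank_approx(Y, k)
Y_k = A @ F

# "Cleaning": distance of the noisy and the truncated data from the clean data.
err_noisy = np.linalg.norm(Y - Y_clean)
err_denoised = np.linalg.norm(Y_k - Y_clean)

# "Compressing": numbers stored for the factors versus the full matrix.
stored_factors = A.size + F.size
stored_full = Y.size

print(err_noisy, err_denoised, stored_factors, stored_full)
```

In this sketch the two factors hold m*k + k*n numbers instead of m*n, and truncation discards the part of the noise lying outside the leading k-dimensional subspace. A nonnegativity constraint on the factors would call for a different algorithm, which this example does not attempt.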



Page maintainer: Webmaster, KCSE
qiang@mech.kth.se

Updated: 2007-01-06