
[Computational-biology] Connectionists: Learning from Multiple Sources Workshop, 13 Dec '08 Whistler Canada

David R. Hardoon via comp-bio%40net.bio.net (by D.Hardoon from cs.ucl.ac.uk)
Thu Oct 2 00:51:46 EST 2008

Apologies if multiple copies are received.

Call for Papers:





24 Oct 08 Submission deadline for extended abstracts
28 Oct 08 Notification of acceptance
13 Dec 08 Workshop at NIPS 08, Whistler, Canada


While the machine learning community has primarily focused on
analysing the output of a single data source, there have been relatively
few attempts to develop a general framework, or heuristics, for
analysing several data sources in terms of a shared dependency
structure. Learning from multiple data sources (or alternatively, the
data fusion problem) is a timely research area. Due to the increasing
availability and sophistication of data recording techniques and
advances in data analysis algorithms, there exist many scenarios in
which it is necessary to model multiple, related data sources, in
fields such as bioinformatics, multimodal signal processing, and
information retrieval. The relevance of this research area is
inspired by the human brain's ability to integrate five different
sensory input streams into a coherent representation of its environment.

The open question is how to analyse data consisting
of more than one set of observations (or view) of the same phenomenon.
In general, existing methods use a discriminative approach, where a
set of features for each data set is found in order to explicitly
optimise some dependency criterion. Existing approaches include
canonical correlation analysis (Hotelling, 1936), a standard
statistical technique for modeling two data sources, and its multiset
variation (Kettenring, 1971), both of which find linearly correlated
features between data sets; kernel variants (Lai and Fyfe, 2000; Bach
and Jordan, 2002; Hardoon et al., 2004); and approaches that optimise the
mutual information between extracted features (Becker, 1996; Chechik
et al., 2003). However, discriminative approaches may be ad hoc and
require regularisation to ensure that erroneous shared features are not
discovered, and they make it difficult to incorporate prior knowledge
about the shared information. Generative probabilistic approaches address
these problems by jointly modeling each data stream as a sum of a shared
component and a 'private' component that models the within-set
variation (Bach and Jordan, 2005; Leen and Fyfe, 2006; Klami and
Kaski, 2006).
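
As a rough illustration of the ideas above, the following sketch draws two
views from a toy 'shared plus private' model and then applies linear
canonical correlation analysis to recover the shared direction. It is not
taken from any of the cited papers: the function name cca, the small ridge
term reg, the loadings and the noise level are all assumptions chosen for
the example.

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Linear CCA by whitening each view and taking an SVD.

    Returns the canonical correlations and the projection matrices A, B.
    `reg` is a small ridge term (an assumption of this sketch) that keeps
    the covariance matrices well conditioned.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)

    # Whitened cross-covariance M = Lx^{-1} Cxy Ly^{-T}
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T

    # Singular values of M are the canonical correlations
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to the original coordinates
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B

# Toy generative model: each view = shared component + private noise.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal((n, 1))                 # shared latent source
wx = np.array([[1.0, -1.0, 0.5, 2.0]])          # hypothetical loadings, view 1
wy = np.array([[1.0, 0.5, -1.0]])               # hypothetical loadings, view 2
X = z @ wx + 0.5 * rng.standard_normal((n, 4))  # view 1
Y = z @ wy + 0.5 * rng.standard_normal((n, 3))  # view 2

corrs, A, B = cca(X, Y)
print("canonical correlations:", np.round(corrs, 3))
```

With a single shared source, only the first canonical correlation should be
large; the remaining ones reflect chance correlation between the private
noise terms, which is the kind of erroneous shared feature that
regularisation is meant to suppress.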

These approaches assume a simple relationship between (two) data
sources, i.e. a so-called 'flat' data structure where the data
consist of N independent pairs of related data variables, whereas in
practice, related data sources may exhibit extremely complex
co-variation (for instance, audio and visual streams related to the same
video). A potential solution to this problem could be a fully
probabilistic approach, which could be used to impose structured
variation within and between data sources. Additional methodological
challenges include determining what 'useful' information we are
trying to learn from the multiple sources and building models for
predicting one data source given the others. As well as the
unsupervised learning of multiple data sources detailed above, there
is the closely related problem of multitask learning (Bickel et al.,
2008), or transfer learning, where a task is learned from other
related tasks.


The aim of the workshop is to promote discussion amongst leading  
machine learning and applied researchers about learning from multiple,  
related sources of data, with a focus on both methodological issues  
and applied research problems.

Topics of the workshop include (but are not limited to):
- unsupervised learning (generative / discriminative modeling) of  
multiple related data sources
- canonical correlation analysis-type methods
- data fusion for real world applications, such as bioinformatics,  
sensor networks, multimodal signal processing, information retrieval
- multitask / transfer learning
- multiview learning


Prof. Michael Jordan
University of California, Berkeley

Dr. Francis Bach
École normale supérieure

Dr. Tobias Scheffer
Max-Planck-Institut für Informatik


David R. Hardoon 		(University College London)
Gayle Leen 				(Helsinki University of Technology)
Samuel Kaski 			(Helsinki University of Technology)
John Shawe-Taylor 		(University College London)


Andreas Argyriou		(University College London)
Tom Diethe 			(University College London)
Colin Fyfe			(University of the West of Scotland)
Jaakko Peltonen		(Helsinki University of Technology)


We invite the submission of high quality extended abstracts (2 to 4  
pages) in the NIPS style http://nips.cc/PaperInformation/StyleFiles.  
Abstracts should be sent (in .pdf/.ps) to D.Hardoon from cs.ucl.ac.uk and gleen from cis.hut.fi.

A selection of the submitted abstracts will be accepted as either an  
oral presentation or poster presentation. The best abstracts will be  
considered for extended versions in the workshop proceedings, and  
possibly as a special issue of a journal.
