Andreas Christmann (Free University of Brussels)
Nello Cristianini (University of Bristol)
Ingrid Daubechies (Princeton University)
Kristiaan Pelckmans (Katholieke Universiteit Leuven)
Alain Rakotomamonjy (INSA de Rouen)
Bernhard Schoelkopf (Max Planck Institute for Biological Cybernetics)
John Shawe-Taylor (University College London)
Johan Suykens (Katholieke Universiteit Leuven)
Sandor Szedmak (University of Southampton)
Ioannis Tsochantaridis (Google)
Jean-Philippe Vert (Ecole des Mines de Paris)
Kernel Based Quantile Regression: Consistency and Robustness
Andreas Christmann (Free University of Brussels)
(Based on joint work with Ingo Steinwart, Los Alamos National Laboratory)
Empirical risk minimization plays an important role in statistical machine learning. Support vector machines (SVMs) proposed by V. Vapnik and general SVMs based on arbitrary convex loss functions are leading examples.
Takeuchi et al. (2006) proposed nonparametric quantile regression based on kernels. This approach can be seen as a generalization of classical quantile regression proposed by Koenker and Bassett (1978) and described in detail by Koenker (2005).
In this talk results about learnability in the sense of L−risk consistency and robustness properties of such kernel based quantile regression estimators will be given.
References:
Christmann & Steinwart (2006). Consistency of kernel based quantile regression. Submitted.
Christmann & Steinwart (2005). Consistency and robustness of kernel based regression. Submitted.
Koenker (2005). Quantile regression. Cambridge University Press.
Koenker & Bassett (1978). Regression quantiles. Econometrica, 46, 35-50.
Schoelkopf & Smola (2002). Learning with Kernels. MIT Press.
Takeuchi, Le, Sears & Smola (2006). Nonparametric quantile estimation. JMLR, 7, 1231-1264.
Vapnik (1998). Statistical Learning Theory. Wiley, New York.
The Analysis of Patterns
Nello Cristianini (University of Bristol , Dept. Engineering Mathematics and Dept. of Computer Science)
Modern society relies on our capability to automatically detect very subtle patterns in very large and complex sets of data. This affects the way we do science, business and technology. This also poses important ethical questions. It is interesting to look at how science handles the problem of pattern analysis, and how many different scientific communities have ended up converging onto a single stable set of principles and methods for computational approaches to the same problem. Now computers can be trusted to do this job for us. How do they do it? We give a look at various approaches, as we focus on the field of kernel methods as a leading example.
Convergence of AdaBoost and related algorithms to maximum margin solutions
Ingrid Daubechies (Princeton University)
(Based on work joint with Cynthia Rudin and R. Shapire)
The algorithm AdaBoost, invented by Freund and Shapire, has shown to be useful for a variety of applications; it constructs, out of an ensemble of weak classifiers, an appropriate combination that functions as a strong classifier. It often generalizes surprisingly well. Until a few years ago, it wasn't known whether AdaBoost converges to the solution with maximum margin when this is is strictly positive, i.e. for those classification problems where an effective strong classifier can be constructed from the weak classifiers. In her Ph. D. thesis, Rudin showed (by counterexample) that AdaBoost does not necessarily converge to the maximum margin solution; together with Shapire and the speaker (her co-advisers), she also introduced "smoothened" versions of AdaBoost, and studied their mathematical properties. In particular, they do converge to the maximum margin solution, and both their asymptotic rate of convergence and an upper bound on the number of iterations necessary to achieve a pre-assigned maximum error can be given explicitly.
Convex Optimization for the Design of Learning Machines
Kristiaan Pelckmans (K.U. Leuven, ESAT-SCD)
This talk surveys convex optimization techniques for the design of task specific learning machines. Recently, techniques of Convex Optimization (CO) play a prominent role in learning approaches, as pioneered by the work on Support Vector Machines (SVMs) and other regularization based learning schemes. Duality theory has played an important role in the development of so-called kernel machines, while the fact of uniqueness of the optimal solution has permitted theoretical as well as practical breakthroughs. A third main advantage of using CO tools in research on learning problems is that the conceptual level of the design of a learning scheme becomes nicely separated from the actual algorithm implementing this scheme (the CO solver).
From a conceptual point of view, statistical learning theory established a theoretical sound foundation for prediction and generalization. This lead to necessary and sufficient requirements for guaranteeing a machine to learn effectively from observations. This talk examplifies how to use those generic principles for designing a learning machine for the task at hand, while fitting nicely into a context of convex optimization. Specifically, we will review a.o. results on primal-dual kernel machines for additive models, kernel machines in the context of missing and censored observations, and machines incorporating structural constraints. The classical supervised case, extensions towards the unsupervised case as well as formulations in the context of nonlinear dynamical systems will be discussed. We proceed by reviewing the role convex optimization can play in problems of model selection tasks. Finally, the talk will outline a generic setting for designing learning machines in the context of graph labeling tasks.
Regularization path algorithms
Alain Rakotomamonjy (INSA de Rouen)
In most machine learning algorithms, a trade-off between an empirical error and a regularizer (measuring how ''simple'' the decision function is) has to be determined in order to obtain a predictor with good generalization performance. The objective is then to minimize both the error and the regularizer, and the interest is to find the set of classifiers denoted as the accuracy-regularization frontier that cannot reduce both terms. The typical approach for assessing the trade-off parameter is to learn several functions belonging to the ''frontier'' and to evaluate them with validation data or cross-validation. Recently, approaches for efficiently computing the full frontier have been proposed. The aim of this talk is to review these recent contributions.
We first present different characterizations of this frontier and then review recent methods to obtain the so-called regularization path, which is the set of classifiers belonging to the frontier achieved by an algorithm when varying the trade-off parameter. We discuss the computation of the regularization path for several specific methods for which the regularization path can be computed very efficiently, such as SVM and Least Angle Regression.
Some thoughts on kernels
Bernhard Schoelkopf (Max Planck Institute for Biological Cybernetics)
My plan is to present my thoughts on what made kernel machines popular and what may or may not keep them going. I will also discuss applications in different domains, including computer graphics.
Kernel Methods: the Emergence of a Well-founded Machine Learning
John Shawe-Taylor (University College London)
During the last ten years kernel methods have established themselves as the method of choice in many types of machine learning problems. The talk will review the development of kernel methods emphasising a surprising convergence of theory and practice that we contend underscored the process. The talk will argue in favour of a broad definition of theory as a touchstone for the well-foundedness and scientific justification of an approach, but equally emphasise a key lesson that algorithmic efficiency and scalability is vital for widespread adoption.
Engineering Kernel Machines
Johan Suykens (K.U. Leuven, ESAT-SCD)
Support vector machines and kernel based learning is a highly multi-disciplinary research area at the intersection of different fields as machine learning and neural networks, mathematics and statistics, pattern recognition and signal processing, systems and control, optimization and others. In this respect, an important challenge is the integrative understanding and systematic design of kernel machines. In this talk we explain how least squares support vector machines may facilitate this process as 'core problem' formulations and elementary building blocks for the design of kernel machines. It enables to:
We illustrate the principles and methods on applications in nonlinear system identification and bioinformatics.
How to teach the Support Vector Machine to learn vectors and structured outputs at one-class complexity
Sandor Szedmak (Southampton University, ECS, ISIS Group, and, University of Helsinki, Department of Computer Science)
The Support Vector Machine (SVM) has proved it is a very useful device of machine learning, but it was restricted to directly solve binary classification problems only. There is a strong demand to extend the underlying ideas towards multiclass classification, multivariate regression problems and learning when the outputs have complex structures, e.g. graphs. The known approaches are tackling with the exploding computational complexity and the range of potential applications is generally very limited. We show there is a straightforward algebraic generalization of the SVM which can handle arbitrary vector outputs and preserves the same computational complexity of its binary ancestor. The multiclass classification, regression and structural learning problems can then be solved via an embedding into a properly chosen vector space.
Structure Prediction
Ioannis Tsochantaridis (Google)

Learning general functional dependencies over arbitrary inputs and outputs is one of the key challenges in machine learning. Recent progress in kernel methods has focused on designing flexible and powerful input representations, greatly increasing the range of applicability of learning methods to input representations based on strings, graphs, automata, etc. However, the output variables in supervised learning have typically been restricted to be either binary or real-valued. There are many real-world applications, nevertheless, where the outputs to be predicted have some meaningful internal structure (e.g. sequences, trees, graphs), or where multiple correlated outputs should be jointly predicted. Such problems, commonly called structure prediction problems, have recently drawn particular attention. Several approaches have been proposed and applied to problems in areas such asinformation retrieval, natural language processing, bioinformatics etc. In this talk I will give an overview of structured prediction, focusing on a specific generalization of support vector machines, and discuss challenges posed by practical applications.
Classification of biological sequences with kernel methods
Jean-Philippe Vert (Ecole des Mines de Paris)
The increasing amount of gene and protein sequences generated by sequencing projects has boosted the development of statistical and machine learning algorithms to compare and classify biological sequences. Kernel methods in general, and support vector machines in particular, lends themselves particularly well to this problem thanks to their capacity to manipulate any type of data as long as a positive definite kernel is defined. In this talk I will review different ways to make kernels for variable-length sequences and illustrate their applications in the context of biological sequence classification.
As this is the first talk in the workshop, I will start with a short introduction to kernel methods.