What Is Model-Based Machine Learning?

About Tom: Tom Diethe is a research fellow on the SPHERE project at the University of Bristol. His research interests include probabilistic machine learning, computational statistics, learning theory, and data fusion. He has a PhD in machine learning applied to multivariate signal processing from University College London. Contact him at tom.diethe@bristol.ac.uk.

Introduction

If you haven’t had your head in the sand, you’ll have realized that machine learning is no longer a niche research area; it is now a mainstream technology applied by engineers right across the spectrum.

“I think it’s the dawn of an exciting new era of info and computer science … It’s a new world in which the ability to understand the world and people and draw conclusions will be really quite remarkable… It’s a fundamentally different way of doing computer science.” -Steve Ballmer

However, in practice there are pitfalls and dangers, especially when employing machine learning for the first time. It’s easy to become swamped by the sheer number of methods, there is a whole new vocabulary to learn, and it is often difficult to choose an algorithm for a given problem. Often, especially when data is unstructured (which is increasingly the case), it’s hard to work out which off-the-shelf method fits the problem, and one has to resort instead to coercing the data to fit the method. It’s also often not clear how to deal with noisy, missing, or corrupted data.

This post is about a different viewpoint called “model-based machine learning” [1], which tackles these difficulties, can solve problems in a wide range of application domains, and covers most existing machine learning algorithms as special cases. It also allows us to deal, in a principled manner, with the uncertainty we encounter in real-world applications.

Since the invention of the Perceptron algorithm [2], huge numbers of algorithms have been created to solve various specialist tasks. A common approach to solving a problem is to try a few different algorithms, chosen in practice through familiarity or because a particular toolbox happens to exist in the language being used, rather than because they are the most appropriate for the problem. Each algorithm also has parameters that often require careful tuning, and these rarely map to intuitive concepts. As a result, practitioners often fall back on the defaults set by the authors of a given toolbox, or on exhaustive searches of the parameter space.

The Mark 1 Perceptron Machine

If you can’t find an algorithm that fits your problem, you are left with two options: modify your problem until it fits some standard framework, or invent a new algorithm. Whilst the latter may get you a NIPS or ICML paper, this is not a viable option for most. Model-based machine learning instead offers tailored solutions.

The central idea underpinning the model-based approach is to encode any problem-specific assumptions, together with any available prior knowledge, directly in the form of a (mathematical) model. These assumptions include the number and types of variables in the problem domain, and the factors that determine how they interact. A model-specific inference algorithm is then (automatically) generated. This approach can be used for any standard machine learning task, such as classification, regression, or clustering, while improving understanding of, and control over, how these tasks are accomplished.
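
To make the idea concrete, here is a minimal sketch of classification framed as inference in an explicitly stated model. The two classes, their prior probabilities, and the Gaussian distribution of a single feature (imagined here as a heart-rate reading) are hypothetical assumptions chosen purely for illustration. The classifier is not designed separately; it falls out of Bayes’ rule applied to those assumptions.

```python
import math

# Hypothetical model assumptions (illustrative numbers only): two activity
# classes, their prior probabilities, and a Gaussian distribution of a single
# feature x (imagined here as heart rate in beats per minute) within each class.
MODEL = {
    "resting": {"prior": 0.7, "mean": 60.0, "std": 8.0},
    "active": {"prior": 0.3, "mean": 110.0, "std": 15.0},
}


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify(x, model=MODEL):
    """Return p(class | x) for every class via Bayes' rule.

    Nothing here is specific to classification: it is inference in the stated
    model, with p(class | x) proportional to p(class) * p(x | class).
    """
    joint = {
        label: params["prior"] * gaussian_pdf(x, params["mean"], params["std"])
        for label, params in model.items()
    }
    evidence = sum(joint.values())  # p(x), the normalising constant
    return {label: value / evidence for label, value in joint.items()}


if __name__ == "__main__":
    for reading in (55.0, 85.0, 120.0):
        print(reading, classify(reading))
```

Changing an assumption, for example giving one class a different spread, changes the classifier automatically; there is no separate algorithm to re-derive.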

If we take the example of a network of different kinds of sensors in a real-world environment, each kind of sensor will introduce its own sources of uncertainty. We might have sensors that are simply not working, or that are giving incorrect readings. More generally, a given sensor will at any given time have a particular signal-to-noise ratio, and the types of noise corrupting the signal might also vary.

As a result, we need a principled framework for quantifying uncertainty and computing with it. In the model-based approach we build a model of how the data was generated, and this model can directly incorporate a noise model for each of the sensors. This makes probabilistic graphical models, used within a Bayesian framework, a natural fit for the model-based approach.
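
A minimal sketch of such a generative model, with every number an assumption made for illustration: several sensors measure the same room temperature, each working sensor adds Gaussian noise with its own standard deviation, and each sensor may be faulty with a small assumed probability, in which case its reading is treated as uniform over the sensor’s range. The posterior over the true temperature is computed by brute-force enumeration over a discretised grid, which is only feasible here because there is a single unknown quantity.

```python
import math

# Hypothetical assumptions (illustrative numbers only): every sensor measures
# the same room temperature, a working sensor adds Gaussian noise with its own
# standard deviation, and a faulty sensor returns a reading that is uniform
# over the sensor's range.
SENSOR_RANGE = (-20.0, 60.0)  # assumed range of readings, in degrees C
P_FAULTY = 0.05               # assumed prior probability that a sensor is faulty


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def reading_likelihood(reading, true_temp, noise_std):
    """p(reading | true temperature): a mixture of 'working' and 'faulty' noise models."""
    lo, hi = SENSOR_RANGE
    working = normal_pdf(reading, true_temp, noise_std)
    faulty = 1.0 / (hi - lo)
    return (1.0 - P_FAULTY) * working + P_FAULTY * faulty


def posterior_over_temperature(readings, prior_mean=20.0, prior_std=10.0, step=0.1):
    """Posterior over the true temperature on a grid of candidate values.

    p(temp | readings) is proportional to p(temp) times the product of p(reading | temp).
    """
    lo, hi = SENSOR_RANGE
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    weights = []
    for t in grid:
        w = normal_pdf(t, prior_mean, prior_std)
        for reading, noise_std in readings:
            w *= reading_likelihood(reading, t, noise_std)
        weights.append(w)
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, probs):
    return sum(t * p for t, p in zip(grid, probs))


if __name__ == "__main__":
    # (reading, noise standard deviation) pairs; the last sensor looks broken.
    readings = [(21.2, 0.5), (20.6, 1.0), (22.0, 2.0), (55.0, 0.5)]
    grid, probs = posterior_over_temperature(readings)
    print(f"posterior mean: {posterior_mean(grid, probs):.2f}")
```

With these readings the posterior mean is close to 21.1, because the model explains the 55 degree reading as a probable fault instead of averaging it in; a naive average of the four readings would be 29.7.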

“The subjectivist states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science.” -I.J. Good

The key insight that is often overlooked is that any off-the-shelf algorithm has underlying assumptions of its own (intentionally or not), although these are often implicit or ill-defined. As a result, the algorithm behaves like a “black box”, so choosing between algorithms requires empirical comparison, for example by nested cross-validation. This is laborious and inefficient, and the approach has many pitfalls [3]. If no algorithm gives adequate results, the way forward is even less clear.
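
As an illustration of such hidden assumptions (a sketch of a standard textbook equivalence, not an example from the original post): one-dimensional ridge regression without an intercept returns exactly the posterior mean of a Bayesian linear model that assumes Gaussian noise with precision `beta` and a zero-mean Gaussian prior on the weight with precision `alpha`, when the ridge penalty is `lam = alpha / beta`. Setting `lam = 0` recovers ordinary least squares, which is the maximum-likelihood estimate under the same Gaussian noise assumption. The data below are synthetic.

```python
import random


def ridge_weight(xs, ys, lam):
    """Off-the-shelf ridge regression (no intercept): argmin_w sum (y - w*x)^2 + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def bayesian_posterior(xs, ys, alpha, beta):
    """Explicit model: y = w*x + noise, noise ~ N(0, 1/beta), prior w ~ N(0, 1/alpha).

    The posterior over w is Gaussian; return its mean and precision.
    """
    precision = alpha + beta * sum(x * x for x in xs)
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, precision


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]  # synthetic data, true w = 2
    alpha, beta = 1.0, 1.0 / 0.3 ** 2
    w_ridge = ridge_weight(xs, ys, lam=alpha / beta)
    w_bayes, precision = bayesian_posterior(xs, ys, alpha, beta)
    print(f"ridge: {w_ridge:.4f}  Bayesian posterior mean: {w_bayes:.4f}  "
          f"posterior std: {precision ** -0.5:.4f}")
```

The model-based version also returns a posterior precision, a measure of how uncertain the weight is, which the off-the-shelf estimator does not provide.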

From a model-based viewpoint, making predictions means plugging the observed data into the model and computing the probabilities of the possible values that each unobserved variable can take once the relevant evidence has been taken into account. This process is known as “inference”.
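
The following minimal sketch shows exact inference by enumeration in a tiny, hypothetical three-variable model: whether a room is occupied, whether a motion sensor is working, and whether it reports motion. All probabilities are illustrative assumptions. Observing a variable means discarding every joint assignment inconsistent with the observation; the posterior of the query variable is then its share of the remaining probability mass.

```python
from itertools import product

# Hypothetical model (all probabilities are illustrative assumptions):
#   occupied ~ Bernoulli(0.3)      is someone in the room?
#   working  ~ Bernoulli(0.9)      is the motion sensor working?
#   motion given occupied, working: what the sensor reports
P_OCCUPIED = 0.3
P_WORKING = 0.9


def p_motion(occupied, working):
    """p(motion = True | occupied, working)."""
    if working:
        return 0.95 if occupied else 0.02
    return 0.5  # assumption: a broken sensor fires at random


def joint(occupied, working, motion):
    """Probability of one complete assignment of all three variables."""
    p = P_OCCUPIED if occupied else 1.0 - P_OCCUPIED
    p *= P_WORKING if working else 1.0 - P_WORKING
    pm = p_motion(occupied, working)
    return p * (pm if motion else 1.0 - pm)


def posterior(query, evidence):
    """p(query = True | evidence), summing the joint over all other variables."""
    names = ("occupied", "working", "motion")
    num = den = 0.0
    for values in product((False, True), repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with what we observed
        p = joint(**assignment)
        den += p
        if assignment[query]:
            num += p
    return num / den


if __name__ == "__main__":
    print(posterior("occupied", {"motion": True}))
    print(posterior("occupied", {"motion": False}))
```

For instance, `posterior("occupied", {"motion": True})` is about 0.85 and `posterior("occupied", {"motion": False})` about 0.04. The cost of enumeration grows exponentially with the number of variables, which is one reason approximate inference is needed for larger models.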

Exact and Approximate Inference

By separating model and inference in this way, the same inference method can be applied to a wide variety of models, and different inference methods can be applied to the same model. Perhaps the simplest approaches to inference are simulation-based methods such as Markov chain Monte Carlo (MCMC), which can be shown to be correct in the long run but can be slow to converge. For any “realistic” model with more than a trivial quantity of data, we may hit the limits of computational tractability. Deterministic approximate inference methods, such as Expectation Propagation and Variational Bayes, make it possible to learn such models by trading some accuracy for much lower computation time.
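
A minimal sketch comparing exact and simulation-based inference on the same toy model: a coin with unknown bias, a uniform prior, and 7 heads out of 10 flips (hypothetical data). Because the prior is conjugate, the exact posterior is Beta(8, 4) with mean 8/12. A basic random-walk Metropolis sampler, one of the simplest MCMC methods, should approach that value as the number of samples grows; the step size, burn-in, and sample count are arbitrary choices.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior for a uniform prior; -inf outside (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.1, burn_in=1000, seed=0):
    """Random-walk Metropolis sampling of theta; returns n_samples post-burn-in draws."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, p(proposal) / p(theta)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    heads, tails = 7, 3
    exact_mean = (1 + heads) / (2 + heads + tails)  # mean of Beta(1 + heads, 1 + tails)
    samples = metropolis(heads, tails)
    mcmc_mean = sum(samples) / len(samples)
    print(f"exact posterior mean {exact_mean:.3f}, MCMC estimate {mcmc_mean:.3f}")
```

Increasing `n_samples` tightens the MCMC estimate at the cost of run time, which is the trade-off described in the paragraph above.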

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” -John Tukey

Comparing Models

Because we have clearly laid out the assumptions when creating models, it becomes easier to compare them both qualitatively and quantitatively. This is especially important if there are parts of the modelling process that are difficult to pin down.

“Statisticians, like artists, have the bad habit of falling in love with their models.” -George Box

If we can compute the probability of the data under a model, p(D | M), averaging over all possible values of the model’s parameters rather than assuming particular ones, we obtain the [model evidence](https://en.wikipedia.org/wiki/Marginal_likelihood) (or marginal likelihood). Taking the ratio of the evidence for two models gives the so-called “Bayes factor”, which can be thought of as a Bayesian alternative to classical hypothesis testing [4, 5]. For models where the evidence is too costly to evaluate numerically, approximate Bayesian computation can be used instead.
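
For a concrete case, consider two hypothetical models of a sequence of coin flips: M0 says the coin is fair, and M1 says the bias theta is unknown with a uniform Beta(1, 1) prior. The evidence of M1 integrates theta out, p(D | M1) = B(1 + heads, 1 + tails) / B(1, 1), where B is the Beta function, while M0 has no free parameters, so p(D | M0) = 0.5^n. The minimal sketch below computes both in log space (for numerical stability) and returns their ratio, the Bayes factor.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_fair(heads, n):
    """log p(D | M0) for one specific sequence: every flip has probability 0.5."""
    return n * math.log(0.5)


def log_evidence_beta(heads, n, a=1.0, b=1.0):
    """log p(D | M1), with theta integrated out against a Beta(a, b) prior.

    integral of theta^heads (1 - theta)^(n - heads) Beta(theta; a, b) dtheta
    equals B(a + heads, b + n - heads) / B(a, b).
    """
    return log_beta(a + heads, b + n - heads) - log_beta(a, b)


def bayes_factor(heads, n):
    """K = p(D | M1) / p(D | M0); K > 1 favours the unknown-bias model."""
    return math.exp(log_evidence_beta(heads, n) - log_evidence_fair(heads, n))


if __name__ == "__main__":
    for heads, n in [(5, 10), (9, 10), (60, 100), (80, 100)]:
        print(heads, n, round(bayes_factor(heads, n), 3))
```

With 5 heads in 10 flips the Bayes factor is about 0.37, favouring the fair coin, whereas 9 heads in 10 gives about 9.3 in favour of the unknown-bias model. The first result shows the automatic “Occam’s razor” of the evidence: the more flexible model spreads its probability over many possible data sets, so it is penalised when the simpler model explains the data equally well.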

Case Study: The SPHERE Project

Many countries are experiencing the effects of an ageing population which, coupled with a rise in chronic health conditions, is encouraging a shift towards managing health-related issues in the home. The SPHERE (a Sensor Platform for HEalthcare in a Residential Environment) project [6] has designed a multimodal sensor system and analytics platform for this purpose. Naturally, the SPHERE setting presents many sources of uncertainty. Firstly, we are dealing with multiple sensor modalities (environmental, body-worn, video), each of which has different noise profiles and failure modes. Secondly, annotated (labelled) data is expensive and intrusive to acquire, and the resulting labels are potentially noisy and inaccurate. Lastly, patterns of human behaviour are subject to many factors that may or may not be attributable to the particular health context of a given individual. In this project we use model-based methods throughout.

The SPHERE House, in Bristol, England, has been retrofitted with a number of smart-home technologies designed to gauge inhabitants’ physical and mental wellbeing. Photo: Sion Hannuna

Tools for Model-Based Machine Learning

Because the model is separated from the method of inference, it also becomes possible (though by no means trivial) to create software that takes a model specified in some modelling language or API and automatically generates inference routines for it (possibly even by automatically generating source code!), so that a wide variety of models can be solved. This allows a new breed of engineer: effectively a “modeller”, who does not need to know the specifics of the inference method being used. Some examples of software packages that seek to achieve this are:

Infer.NET. A software framework developed at Microsoft Research Cambridge for running Bayesian inference in graphical models. It can also be used for probabilistic programming.
BUGS. A Bayesian modelling framework using MCMC methods.
Church. A probabilistic programming language designed for the expressive description of generative models.
Stan. A probabilistic programming language implementing full Bayesian statistical inference with MCMC sampling, approximate Bayesian inference with variational inference, and penalized maximum likelihood estimation.
GPy. A Gaussian process framework in Python, from the Sheffield machine learning group.
PyMC. A Python module that implements Bayesian statistical models and fitting algorithms, including MCMC.
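
The separation of model and inference described above can be illustrated with a minimal sketch. Here a “model” is simply a prior sampler plus a log-likelihood function (a hypothetical interface, not the API of any package listed above), and a single generic inference routine, importance sampling with the prior as the proposal distribution, is reused unchanged for two different models. Real packages use far more efficient algorithms; this one is chosen only because it fits in a few lines.

```python
import math
import random


def infer_posterior_mean(sample_prior, log_likelihood, data, n_samples=50000, seed=0):
    """Generic inference: estimate E[param | data] by weighting prior samples by their likelihood."""
    rng = random.Random(seed)
    params = [sample_prior(rng) for _ in range(n_samples)]
    log_w = [log_likelihood(p, data) for p in params]
    m = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    return sum(w * p for w, p in zip(weights, params)) / sum(weights)


# Hypothetical model 1: coin bias with a uniform prior and Bernoulli observations.
def coin_prior(rng):
    return rng.random()


def coin_log_lik(theta, flips):
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero likelihood at the boundary
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


# Hypothetical model 2: unknown mean with a N(0, 10^2) prior and unit-variance Gaussian noise.
def mean_prior(rng):
    return rng.gauss(0.0, 10.0)


def mean_log_lik(mu, xs):
    return -0.5 * sum((x - mu) ** 2 for x in xs)


if __name__ == "__main__":
    flips = [1, 1, 0, 1, 1, 1, 0, 1]
    xs = [2.1, 1.7, 2.4, 1.9, 2.2]
    # The same inference routine is reused, unchanged, for both models.
    print(infer_posterior_mean(coin_prior, coin_log_lik, flips))
    print(infer_posterior_mean(mean_prior, mean_log_lik, xs))
```

For the coin data (6 heads, 2 tails) the exact posterior mean is 7/10, and for the Gaussian model it is 10.3 / 5.01, roughly 2.06; the two estimates should land close to these values.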

See the ‘early access’ model-based machine learning book at http://www.mbmlbook.com.

Summary

Machine learning is being successfully applied to a large number of real-world problems as you read this post. In some cases, all that is needed is a “best guess” answer, and there also happens to be an off-the-shelf tool for the job. However, there is also a large class of other scenarios: where no such tool exists, where quantifying the uncertainty is of great importance, or where you want to be able to introspect on the answers given by the system. Model-based machine learning provides a compelling path to tackling all of these scenarios.

References

[1]. Winn, J., Bishop, C. M., & Diethe, T. (2015). Model-Based Machine Learning. Microsoft Research Cambridge. http://www.mbmlbook.com.

[2]. Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton. Report 85-460-1. Cornell Aeronautical Laboratory.

[3]. Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics, 6(1), 1-15.

[4]. Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130(12), 995-1004.
