Literature Translation for a Computer Science Graduation Project
Chapter 6: Large Scale Problems

In this chapter we discuss a number of methods designed for solving classification and regression problems with LS-SVMs in the large data set setting. We explain the Nyström method as proposed in the context of Gaussian processes, and incomplete Cholesky factorization for low rank approximation. Then a new technique of fixed size LS-SVM is presented. In this fixed size LS-SVM method one solves the primal problem instead of the dual, after estimating the map $\varphi$ to the feature space based upon the eigenfunctions obtained from kernel PCA, which is explained in more detail in the next chapter. This method gives explicit links between function estimation and density estimation, exploits the primal-dual formulations, and addresses the problem of how to actively select suitable support vectors instead of taking random points as in the Nyström method. Next we explain methods that aim at constructing a suitable basis in the feature space. Furthermore, approaches for combining submodels are discussed, such as committee networks and nonlinear and multilayer extensions of this approach.

6.1 Low rank approximation methods

6.1.1 Nyström method

Suppose one takes a linear kernel. We already mentioned that one can then in fact equally well solve the primal problem as the dual problem.
In fact, solving the primal problem is more advantageous for larger data sets, while solving the dual problem is more suitable for large dimensional input spaces, because the unknowns are $w \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}^N$, respectively, where $n$ denotes the dimension of the input space and $N$ the number of given training data points.

[Fig. 6.1: For linear support vector machines the dual problem is suitable for problems with large dimensional input spaces, while the primal problem is convenient for large data sets. However, for nonlinear SVMs one has no expression for $\varphi(x)$; as a result, one can only solve the dual problem in terms of the related kernel function.]

[Figure caption: In the method of fixed size LS-SVM the Nyström method is used to estimate eigenfunctions. After obtaining estimates for $\varphi(x)$ and linking primal-dual formulations, the computation of $w, b$ is done in the primal space.]
For example, in the linear function estimation case one has, by elimination of the error variables $e_k$,

$$\min_{w,b}\ J(w,b) = \tfrac{1}{2} w^T w + \gamma\,\tfrac{1}{2} \sum_{k=1}^{N} \left( y_k - w^T x_k - b \right)^2 \qquad (6.1)$$

which one can immediately solve. In this case the mapping becomes $\varphi(x_k) = x_k$ and there is no need to solve the dual problem in the support values $\alpha$, certainly not for large data sets.
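To make this concrete, here is a minimal sketch (not part of the original text) that solves (6.1) directly from its normal equations; the function name linear_lssvm_primal and the convention of leaving the bias unregularized are assumptions of this illustration.

    import numpy as np

    def linear_lssvm_primal(X, y, gamma):
        """Solve the linear LS-SVM primal (6.1) via its normal equations.

        The unknowns are (w, b) in R^(n+1), independent of the number of
        training points N, which is why the primal scales to large data sets.
        """
        N, n = X.shape
        Xb = np.hstack([X, np.ones((N, 1))])   # append a column of ones for the bias b
        R = np.eye(n + 1)
        R[-1, -1] = 0.0                        # the bias term is not regularized
        # Setting the gradient of (6.1) to zero gives (Xb^T Xb + R/gamma) z = Xb^T y
        z = np.linalg.solve(Xb.T @ Xb + R / gamma, Xb.T @ y)
        return z[:-1], z[-1]                   # w and b

The system has only $n+1$ unknowns, so the cost grows as $O(N n^2)$, linearly in the number of data points, whereas the dual problem in the support values grows as an $N \times N$ system.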
For the nonlinear case, on the other hand, the situation is much more complicated. For many choices of the kernel, $\varphi(\cdot)$ may become infinite dimensional and hence also the $w$ vector. However, one may still try in this case to find meaningful estimates for $\varphi(x_k)$. A procedure to find such estimates is implicitly done by the Nyström method, which is well known in the area of integral equations [14; 63] and has been successfully applied in the context of Gaussian processes by Williams & Seeger in [294]. The method is related to finding a low rank approximation to the given kernel matrix by randomly choosing $M$ rows/columns of the kernel matrix.
Let us denote the big kernel matrix by $\Omega^{(N,N)}$ and the small kernel matrix based on the random subsample by $\Omega^{(M,M)}$, with $M < N$ (in practice often $M \ll N$). Consider the eigenvalue decomposition of the small kernel matrix

$$\Omega^{(M,M)} U = U \Lambda \qquad (6.2)$$

where $\Lambda$ contains the eigenvalues and $U$ the corresponding eigenvectors. This is related to the eigenfunctions $\phi_i$ and eigenvalues $\lambda_i$ of the integral equation

$$\int K(x, x')\,\phi_i(x)\,p(x)\,dx = \lambda_i\,\phi_i(x') \qquad (6.3)$$

where $p(x)$ denotes the density of the input data, as follows:

$$\hat{\lambda}_i = \frac{1}{M}\,\lambda_i^{(M)}, \qquad \hat{\phi}_i(x_k) = \sqrt{M}\,u_{ki} \qquad (6.4)$$

where $\hat{\lambda}_i$ and $\hat{\phi}_i$ are estimates to $\lambda_i$ and $\phi_i$, respectively, for the integral equation, and $u_{ki}$ denotes the $ki$-th entry of the matrix $U$. This can be understood from sampling the integral by the $M$ points $x_1, x_2, \ldots, x_M$.
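As an illustration of (6.2)-(6.4), the following sketch (my own, assuming an RBF kernel and hypothetical function names) computes the eigenpair estimates from an $M$-point subsample; the closure phi_hat implements the standard Nyström extension to new points. In practice, eigenvalues close to zero should be truncated before dividing by them.

    import numpy as np

    def rbf(A, B, sigma2=1.0):
        """Illustrative RBF kernel K(a, b) = exp(-||a - b||^2 / sigma2)."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / sigma2)

    def nystrom_eigenfunctions(Xm):
        """Eigenpair estimates (6.4) from the small matrix decomposition (6.2)."""
        M = Xm.shape[0]
        lam, U = np.linalg.eigh(rbf(Xm, Xm))     # small M x M eigendecomposition (6.2)
        lam, U = lam[::-1], U[:, ::-1]           # sort eigenvalues in decreasing order
        lam_hat = lam / M                        # lambda_hat_i = lambda_i^(M) / M  (6.4)
        phi_hat_sample = np.sqrt(M) * U          # phi_hat_i(x_k) = sqrt(M) u_ki    (6.4)

        def phi_hat(x):
            """Nystrom extension of the eigenfunction estimates to a 1-D point x."""
            k = rbf(x[None, :], Xm).ravel()      # kernel evaluations against the subsample
            return np.sqrt(M) * (k @ U) / lam    # divide column i by lambda_i^(M)

        return lam_hat, phi_hat_sample, phi_hat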
For the big kernel matrix one has the eigenvalue decomposition

$$\Omega^{(N,N)} \tilde{U} = \tilde{U} \tilde{\Lambda} \qquad (6.5)$$

Furthermore, as explained in [294], one has

$$\tilde{\lambda}_i \simeq \frac{N}{M}\,\lambda_i, \qquad \tilde{u}_i \simeq \sqrt{\frac{M}{N}}\,\frac{1}{\lambda_i}\,\Omega^{(N,M)} u_i \qquad (6.6)$$

One can then show that

$$\Omega^{(N,N)} \simeq \Omega^{(N,M)}\,\Omega^{(M,M)\,-1}\,\Omega^{(M,N)} \qquad (6.7)$$

where $\Omega^{(N,M)}$ is the $N \times M$ block matrix taken from $\Omega^{(N,N)}$.
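The low rank approximation (6.7) is easy to check numerically. The toy sketch below is my own illustration; the sizes, seed, and RBF kernel are arbitrary choices, and the jitter added to $\Omega^{(M,M)}$ is a standard numerical safeguard rather than part of the method.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, sigma2 = 500, 50, 1.0
    X = rng.standard_normal((N, 3))

    def rbf(A, B):
        return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / sigma2)

    idx = rng.choice(N, size=M, replace=False)    # M randomly chosen rows/columns
    Omega_nn = rbf(X, X)                          # big N x N kernel matrix
    Omega_nm = Omega_nn[:, idx]                   # N x M block taken from Omega(N,N)
    Omega_mm = Omega_nm[idx, :]                   # small M x M kernel matrix

    # Low rank approximation (6.7); a tiny jitter keeps the inverse well posed.
    approx = Omega_nm @ np.linalg.solve(Omega_mm + 1e-10 * np.eye(M), Omega_nm.T)
    print("relative error:", np.linalg.norm(Omega_nn - approx) / np.linalg.norm(Omega_nn))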
These insights are then used for solving, in an approximate sense, the linear system

$$\left( \Omega^{(N,N)} + \frac{1}{\gamma} I \right) \alpha = y \qquad (6.8)$$

without bias term in the model, as considered in Gaussian process regression problems. By applying the Sherman-Morrison-Woodbury formula [98] one obtains [294]:

$$\hat{\alpha} = \gamma \left( y - \tilde{U} \left( \frac{1}{\gamma} I + \tilde{\Lambda} \tilde{U}^T \tilde{U} \right)^{-1} \tilde{\Lambda} \tilde{U}^T y \right) \qquad (6.9)$$

where $\tilde{U}$ and $\tilde{\Lambda}$ are calculated from (6.6) based upon the eigenvalue decomposition of the small matrix. In LS-SVM classification and regression one usually considers a bias term.
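Combining (6.6) with the Woodbury identity gives a solver for (6.8) whose cost is $O(N M^2)$ rather than $O(N^3)$. The sketch below is one possible reading of (6.9) under the assumptions used above; the function name and the eigenvalue cutoff are mine.

    import numpy as np

    def nystrom_woodbury_solve(Omega_nm, Omega_mm, y, gamma):
        """Approximately solve (Omega(N,N) + I/gamma) alpha = y as in (6.8)-(6.9)."""
        N, M = Omega_nm.shape
        lam, U = np.linalg.eigh(Omega_mm)               # small eigendecomposition (6.2)
        keep = lam > 1e-12                              # drop numerically zero eigenvalues
        lam, U = lam[keep], U[:, keep]
        lam_t = (N / M) * lam                           # eigenvalue estimates (6.6)
        U_t = np.sqrt(M / N) * (Omega_nm @ U) / lam     # eigenvector estimates (6.6)
        # Sherman-Morrison-Woodbury, cf. (6.9):
        # alpha = gamma * (y - U_t ((1/gamma) I + Lam U_t^T U_t)^{-1} Lam U_t^T y)
        m = lam_t.size
        inner = np.eye(m) / gamma + lam_t[:, None] * (U_t.T @ U_t)
        return gamma * (y - U_t @ np.linalg.solve(inner, lam_t * (U_t.T @ y)))

The inner solve involves only an $m \times m$ system with $m \le M$, so the overall cost is dominated by forming $\tilde{U}$ and $\tilde{U}^T \tilde{U}$, i.e. $O(N M^2)$.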
