Fiter: Greedy Bloggers for a Picky Audience Avner May, Augustin

Transcription

Compact Kernel Models for Acoustic Modeling via Random Feature Selection
Avner
*
May ,
*Columbia
Main Idea
• Kernel approximation methods often
require very many random features for
good performance.
• We propose a simple feature selection
method for reducing the number of
required features, and show it is
effective in acoustic modeling.
• We perform comparisons to DNNs,
and achieve comparable WER results
on Broadcast News (50 hour) dataset.
Michael
*
Collins ,
Daniel
University Computer Science Department,
Background and Motivation
*
Hsu ,
†IBM
Brian
†
Kingsbury
TJ Watson Research Center
Datasets Used
Feature Selection Algorithm
Kernel Approximation
𝑇
•
𝐾 𝑥𝑖 , 𝑥𝑗 ≈ 𝑧 𝑥𝑖 𝑧 𝑥𝑗
•
Random Fourier Features [1]:
𝑧𝑖 𝑥 =
2/𝐷 cos(𝑤𝑖𝑇 𝑥 + 𝑏𝑖 )
RBF: 𝑤𝑖 ~ 𝑁𝑜𝑟𝑚𝑎𝑙, 𝑏𝑖 ~ 𝑈𝑛𝑖𝑓[0,2𝜋]
Kernels Used
Acoustic Modeling
• 𝑝 𝑦 𝑥 ; 𝜃 = exp
𝑇
𝜃𝑦 𝑧
𝑥
/
𝑦 exp
𝑇
𝜃𝑦 𝑧
𝑥
• Maximizing regularized log-likelihood is convex.
• Similar to single hidden-layer neural net, with cosine
activation function, and random first layer weights.
[1] A. Rahimi and B. Recht, “Random Features for Large-Scale Kernel Machines,” in Proceedings of NIPS, 2007
Cantonese and Bengali Results
Broadcast News Results
Effects of Feature Selection
Do features selected in iteration 𝑡 survive future rounds?
Future Work
• Why are DNNs typically better than kernels models
in terms of WER, when they have comparable crossentropy?
• Can we reduce the number of parameters needed
by the kernel models, and maintain strong
performance? Could we then train models with
even more random features?
Comparison to DNNs
Are some input features more important than others?
• How does feature selection compare with
backpropagation?
• Can we leverage sparse random projections for
faster training?
• Theoretical guarantees for algorithm?
[2] ICASSP 2016: USC, Columbia, IBM. “A Comparison Between Deep Neural Networks
and Kernel Acoustic Models for Speech Recognition.” Come See Poster Tomorrow!!

Fiter: Greedy Bloggers for a Picky Audience Avner May, Augustin

Transcription

Similar documents

Dynamic Intelligent Kernel Assignment in Heterogeneous MultiGPU Systems

STAT474/STAT574, Practice Test 2, all problems equally weighted

M. J. Ferrarotti, S. Decherchi and W. Rocchia CONCEPT Lab, Istituto

Bootchart 2

Lustre + Linux

p - Blogs@FAU

WA1a-12

to open - Musicoz

MandrakeLinux

In Search of more economic IT solutions

Lab 2 – Operating Systems CSCI 6303 – Principles of I.T.

Visual Aid - UC Food Safety

TÉCNICAS Y DISTRIBUCIONES TIEMPO

Nutri-Rich Oil Information Sheet-v2

press release - Artnovion Acoustics

Intrusion Detection Systems What To Observe? What To Observe?

1. Basic Concepts

演習３ (PRACTICE FOR 3 GRADES)

PDF

Blower Package Pressure and Vacuum

Resodyn Corporation - Dave Desch, Larry Farrar Partners in Progress

Engineering Data in the MBSE life-cycle EGS