2021-01-03

Paper notes

Data Augmentation for End-to-end Code-switching Speech Recognition
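Data augmentation for code-switching speech. 1. Audio splicing: use a GMM-HMM to split code-switching speech into multiple per-language segments, then concatenate them with segments from other utterances by the same speaker to create new code-switching speech. 2. Code-switching text with word translation: select nouns or verbs from monolingual text and translate them into the other language, then synthesize with TTS. 3. Code-switching text with word insertion: insert words from the other language, then synthesize with TTS. Each method improves performance, both on its own and in combination.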
Deep Convolutional Neural Network with Mixup for Environmental Sound Classification
Applies mixup to the environmental sound classification task and improves performance.
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
Lightweight speech recognition using CNN+CTC. 1. Depthwise separable convolution, 2. group shuffle, following ShuffleNet.
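To see why separable convolutions make the model lightweight, the sketch below compares weight counts for a standard 1D convolution and a depthwise + pointwise pair. The kernel size and channel counts are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    # Each output channel has one kernel per input channel.
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    # Depthwise: one kernel per input channel, applied along time only.
    depthwise = kernel_size * in_channels
    # Pointwise: 1x1 convolution that mixes channels at each time step.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Illustrative sizes (assumed, not from the paper).
k, c_in, c_out = 33, 256, 256
standard = standard_conv1d_params(k, c_in, c_out)
separable = separable_conv1d_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 1))
```

With these assumed sizes, the separable version needs roughly 29 times fewer weights, and the gap grows with the kernel size.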
Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion
An extension of QuartzNet. Adds multi-stream processing, channel-wise attention using squeeze-and-excitation, and a weighting for each stream.
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
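Applies CNNs with various dilation rates to the input and keeps each one independent (multi-stream). Convolution and self-attention are applied per stream, and the results are fused. SVD is applied to the 1D convolutions and self-attention for dimensionality reduction.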
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
Acoustic scene classification task. Uses ACGAN for data augmentation.
Manifold Mixup: Better Representations by Interpolating Hidden States
Mixup on hidden representations. Features and labels are linearly interpolated with a coefficient sampled from a Beta distribution. What about sequential data?
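The interpolation step can be sketched as follows. This is a minimal illustration, assuming the hidden states are plain lists of floats and the labels are one-hot vectors; the function name `mixup` and the `alpha` values are illustrative choices, not taken from the paper.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Interpolate two (feature, one-hot label) pairs with one shared lambda."""
    lam = rng.betavariate(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(0)  # fixed seed so the sketch is repeatable
h_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]  # hidden state and label of sample A
h_b, y_b = [0.0, 4.0, 2.0], [0.0, 1.0]  # hidden state and label of sample B
h_mix, y_mix, lam = mixup(h_a, y_a, h_b, y_b, alpha=2.0, rng=rng)
print(lam, h_mix, y_mix)
```

The same lambda is used for the features and the labels, so the mixed label remains a valid probability vector.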
Contextual RNN-T for Open Domain ASR
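An attempt to improve recognition of named entities by adding auxiliary meta-text data during speech recognition. Attention is applied over the text data to compute a context vector, which is then concatenated. The attention is adjusted by checking whether the prefix of the generated hypothesis matches a prefix of the meta-text.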
BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example
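Feeds sequences converted to posterior probabilities into a BLSTM so that long context is taken into account. Multiple input features are computed by varying the STFT parameters, then processed as multiple streams and fused.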
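Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
Trains a GAN for speech enhancement and a GAN for speech generation with a shared discriminator. The speech-enhancement GAN does not use 0/1 classification; instead, it uses an L2 loss between the discriminator outputs for clean speech and for enhanced speech.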
Unsupervised regularization of the embedding extractor for robust language identification
Unsupervised adaptation of a language identifier using a Maximum Mean Discrepancy loss.
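For reference, a biased estimate of squared MMD between two sets of embeddings can be sketched as below. The RBF kernel and the `gamma` value are assumptions made for illustration; the kernel actually used in the paper may differ.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two sets of vectors."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


source = [[0.0, 0.0], [1.0, 0.0]]      # toy embeddings from one domain
target_far = [[5.0, 5.0], [6.0, 5.0]]  # toy embeddings from a shifted domain
print(mmd2(source, source), mmd2(source, target_far))
```

The value is close to zero when the two sets follow the same distribution and grows as they drift apart, which is what makes it usable as an adaptation loss without labels.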
Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
Trains a language identifier with cross-entropy or additive angular margin softmax loss, and uses MMD (maximum mean discrepancy) to evaluate the factors that cause mismatch (channel mismatch, gender mismatch).
Kaldi-web: An installation-free, on-device speech recognition system
A web GUI for Kaldi (https://gitlab.inria.fr/kaldi.web/kaldi-wasm).

2020-10-18

Paper notes

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
An overall review of the 2019 DCASE challenge.
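Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging
Audio tagging task. Extends similarity-preserving KD, which measures the similarity of hidden vectors at the sample level, and proposes intra-utterance similarity-preserving KD, which computes the similarity between frames within a sample. The similarity is computed for both the teacher model and the student model, and the MSE loss between the two is minimized.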
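The idea of matching frame-to-frame similarities can be sketched as follows. Cosine similarity is used here as an assumed normalization; the paper's exact similarity measure and normalization may differ, and the function names are illustrative.

```python
import math


def frame_similarity(frames):
    """Cosine similarity between every pair of frames in one utterance."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in frames]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
         for fj, nj in zip(frames, norms)]
        for fi, ni in zip(frames, norms)
    ]


def intra_utterance_sp_loss(teacher_frames, student_frames):
    """MSE between the teacher and student T x T similarity matrices."""
    g_t = frame_similarity(teacher_frames)
    g_s = frame_similarity(student_frames)
    n = len(g_t)
    return sum((g_t[i][j] - g_s[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Three frames each; the hidden sizes differ (4 vs 2) but both matrices are 3 x 3.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student_good = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_bad = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
loss_good = intra_utterance_sp_loss(teacher, student_good)
loss_bad = intra_utterance_sp_loss(teacher, student_bad)
print(loss_good, loss_bad)
```

Because only the T x T similarity matrices are compared, the teacher and student hidden dimensions do not need to match.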
Feature space Augmentation for Long-Tailed Data
Feature-space data augmentation for imbalanced data in the image domain. Uses CAM to separate the parts that support the decision from the parts that do not, and modifies the latter.
StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation
Generates reverberant (impulse-response-convolved) speech for data augmentation. Open source. https://github.com/SRPOL-AUI/storir
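Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias
Builds a speaker verification model on long-utterance data, then trains a model for short utterances by teacher-student learning. A regularization term is applied to the weights during updates.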
Sound Event Localization and Detection based on CRNN using Dense Rectangular Filters and Channel Rotation Data Augmentation
Data augmentation specialized for sound event localization. Channel swapping and rotation of the azimuth angle.
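Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification
Data augmentation specialized for the LID (language identification) task on code-switching speech. Following SpecAugment, the spectrum of a specific-language portion is masked, and the masked spectrum together with language labels masked at the corresponding portion is used as a new sample.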
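A rough sketch of masking one language in both the spectrum and the labels is shown below. It assumes frame-level language labels and represents the masked label as `None`; both are assumptions for illustration, and the paper's label format and masking policy may differ.

```python
def mask_language(spectrogram, frame_labels, target_lang,
                  mask_value=0.0, mask_label=None):
    """Return a new (spectrogram, labels) pair with one language masked out.

    spectrogram: list of frames, each a list of frequency-bin values.
    frame_labels: one language label per frame.
    """
    new_spec, new_labels = [], []
    for frame, label in zip(spectrogram, frame_labels):
        if label == target_lang:
            # Mask every frequency bin of this frame and hide its label.
            new_spec.append([mask_value] * len(frame))
            new_labels.append(mask_label)
        else:
            new_spec.append(list(frame))
            new_labels.append(label)
    return new_spec, new_labels


spec = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
labels = ["en", "en", "hi", "en"]  # toy frame-level labels
aug_spec, aug_labels = mask_language(spec, labels, target_lang="hi")
print(aug_spec, aug_labels)
```

The original sample is left untouched, so the masked copy can simply be added to the training set.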
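Dice Loss for Data-imbalanced NLP Tasks
When a task whose final evaluation metric is the F-score is trained with CE, accuracy and F-score diverge. The paper therefore proposes a loss function based on the Dice coefficient (= F1 score). (Effective for tasks that use the F-score; when accuracy is the final metric, use CE instead.)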
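A plain soft Dice loss for binary labels can be sketched as below. This is the generic formulation, not necessarily the exact variant proposed in the paper; the `smooth` constant is an assumed value.

```python
def soft_dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient for binary labels.

    probs: predicted probabilities of the positive class.
    targets: gold labels (0 or 1).
    smooth: assumed smoothing constant that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


targets = [0, 0, 0, 1]              # imbalanced: one positive out of four
all_negative = [0.0, 0.0, 0.0, 0.0]  # 75% accuracy, but the positive is missed
perfect = [0.0, 0.0, 0.0, 1.0]
print(soft_dice_loss(all_negative, targets), soft_dice_loss(perfect, targets))
```

Predicting everything as negative is 75% accurate on this toy example, yet the loss stays at 0.5 because the single positive is missed, which is the accuracy/F-score gap the note describes.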
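Knowing What to Listen to: Early Attention for Deep Speech Representation Learning
Applies attention over the frequency bins of the STFT (rather than along the time axis). The authors claim that being able to apply attention at an early stage of the model is what sets it apart from other methods.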

2020-02-23

aclocal command not found

Programming

Error message

aclocal-1.14: command not found

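This is caused by a difference between the original development environment and the current one: the generated build files refer to aclocal-1.14, but that automake version is not installed here.

Regenerate the build files by running:

autoreconf -f -i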

2020-02-11

About devtoolset

Programming

1. Install a package with repository for your system:
On CentOS, install package centos-release-scl available in CentOS repository:
$ sudo yum install centos-release-scl

On RHEL, enable RHSCL repository for your system:
$ sudo yum-config-manager --enable rhel-server-rhscl-7-rpms

2. Install the collection:
$ sudo yum install devtoolset-6

3. Start using software collections:
$ scl enable devtoolset-6 bash

Running the steps above installs various files under /opt/rh/devtoolset-6. Running source /opt/rh/devtoolset-6/enable sets the environment variables.

Then again, this could just be done with Docker..

[1] https://www.softwarecollections.org/en/scls/rhscl/devtoolset-6/

2020-01-11

Accelerationism

Words

Accelerationism

2020-01-11

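Third-person effect

Words

Third-person effect

The tendency to regard the influence of mass media as someone else's problem. That is, people worry that many others are influenced by persuasive messages conveyed through television and newspapers, while believing that this happens only to people other than themselves (third persons) and that they alone are not manipulated by the media [1].
The public that is fooled by the media, and the self that is not.
Pride in knowing things that other (poorly informed) people do not know.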

[1] http://www.jumonji-u.ac.jp/sscs/ikeda/cognitive_bias/cate_s/s_12.html

2020-01-06

error: aggregate ‘EVP_CIPHER_CTX ctx’ has incomplete type and cannot be defined

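Since OpenSSL 1.1.0, EVP_CIPHER_CTX is an opaque type, so it can no longer be declared as a local variable and the way it is used has changed.

- Initialization

Before (1.0.0)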
EVP_CIPHER_CTX ctx;
EVP_CIPHER_CTX_init(&ctx);

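After (1.1.0)
EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();  /* returns NULL on failure */
EVP_CIPHER_CTX_init(ctx);  /* optional: the new context is already initialized; in 1.1.0 this is a macro for EVP_CIPHER_CTX_reset */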

- Cleanup

Before (1.0.0)
EVP_CIPHER_CTX_cleanup(&ctx);

After (1.1.0)
EVP_CIPHER_CTX_free(ctx);

(Reference) - https://stackoverflow.com/questions/26345175/correct-way-to-free-allocate-the-context-in-the-openssl