The ContextFst object

The ContextFst object (C) is a dynamically created FST object that represents a transducer from context-dependent phones to context-independent phones. The purpose of this object is to avoid having to enumerate all possible phones in context, which would be difficult when the context width (N) or the number of phones is larger than usual. The constructor ContextFst::ContextFst requires the context width (N) and the central position (P), which are explained further in "Phonetic context windows" above; for a typical triphone system these are 3 and 1 respectively. It also requires the integer id of the "subsequential symbol" (the $ mentioned above), which is output N-P-1 times after all phones have been seen (this is needed to ensure that the output of the context FST is determinizable). In addition, it requires lists of the integer ids of the phones and of the disambiguation symbols. The vocabulary on the output side of the ContextFst consists of the phones and disambiguation symbols, plus the subsequential symbol. The vocabulary on the input side is generated dynamically and (apart from epsilon, which is not used) corresponds to context-dependent phones, disambiguation symbols, and a special symbol, written #-1, that takes the place of epsilon in the standard recipe but which we treat like any other disambiguation symbol (it is needed to ensure determinizability in our sense, i.e. with epsilons treated as normal symbols). The subsequential symbol $ has no counterpart on the input side (this is also true of the standard recipe). The integer ids of the disambiguation symbols on the input side are not necessarily the same as the ids of the corresponding symbols on the output side. The ContextFst class has a member function ILabelInfo() that returns a reference to an object of type std::vector<std::vector<int32> >, which enables the user to work out the "meaning" of each symbol on the input side. This correspondence is explained further in "The ilabel_info object".
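To make the ilabel_info idea concrete, here is a small self-contained sketch (illustrative only, not Kaldi code; the table contents and symbol ids are invented) of how each input-side symbol of C can be decoded into a phonetic context window, with N = 3 and P = 1 as in a triphone system:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical ilabel_info table: entry i describes input symbol i of C.
  // Each full entry is a phone-context window of length N = 3
  // (left context, central phone, right context).
  std::vector<std::vector<int32_t> > ilabel_info = {
    {},            // symbol 0: epsilon (not used by C)
    {10, 11, 12},  // symbol 1: phone 11 with left context 10, right context 12
    {11, 12, 13},  // symbol 2: phone 12 with left context 11, right context 13
  };
  const int N = 3, P = 1;  // context width and central position
  for (size_t i = 1; i < ilabel_info.size(); i++) {
    const std::vector<int32_t> &window = ilabel_info[i];
    if (window.size() == static_cast<size_t>(N))
      std::cout << "input symbol " << i << " -> central phone "
                << window[P] << '\n';
  }
  // After the last real phone, C outputs the subsequential symbol $
  // N - P - 1 = 1 time, so that its output is determinizable.
  std::cout << "subsequential symbols emitted: " << N - P - 1 << '\n';
  return 0;
}
```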

There is a special "Matcher" object of type ContextMatcher, which is for use in composition algorithms (a Matcher is something the composition algorithm uses for arc lookup; see the OpenFst documentation for more details). The ContextMatcher makes the use of ContextFst objects more efficient by avoiding the allocation of more states than necessary (the issue is that with the normal matcher, every time we want any arc out of a state, we would have to allocate the destination states of all other arcs out of that state). There is an associated function, ComposeContextFst(), which performs FST composition using this matcher when the left-hand argument of the composition is of type ContextFst. There is also a similar function, ComposeContext(), which additionally creates the ContextFst object itself.
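For concreteness, here is a hedged sketch of composing C with LG in this style. The argument order of ComposeContext() below is an assumption from memory and varies across Kaldi versions; the authoritative declaration is in fstext/context-fst.h:

```cpp
// Hedged sketch: compose the dynamically generated context transducer C
// with LG to produce CLG, letting Kaldi build the ContextFst internally.
// The ComposeContext() argument list is assumed, not verified.
#include <vector>
#include "base/kaldi-common.h"
#include "fstext/context-fst.h"

void MakeCLG(const std::vector<kaldi::int32> &disambig_syms,  // input-side ids
             fst::VectorFst<fst::StdArc> *lg,                 // LG input
             fst::VectorFst<fst::StdArc> *clg,                // CLG output
             std::vector<std::vector<kaldi::int32> > *ilabel_info) {
  const kaldi::int32 N = 3, P = 1;  // triphone context width, central position
  fst::ComposeContext(disambig_syms, N, P, lg, clg, ilabel_info);
}
```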

Avoiding weight pushing

We deal with the whole question of weight pushing in a slightly different way from the traditional recipe. Weight pushing in the log semiring can be helpful in speeding up search (weight pushing is an FST operation that ensures that the arc probabilities out of each state "sum to one", in the appropriate sense). However, in some cases weight pushing can have bad effects. The problem is that statistical language models, when represented as FSTs, generally "add up to more than one", because some words are counted twice (directly, and via backoff arcs).
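To see why, consider a toy backoff bigram state (all numbers invented for illustration): a word with an explicit bigram probability is also reachable through the backoff arc, so the outgoing probability mass of the FST state exceeds one even though the language model itself is properly normalized:

```cpp
#include <iostream>

int main() {
  // Toy backoff bigram state for history "the" (all numbers invented).
  // Unigram distribution: p(cat) = 0.3, p(dog) = 0.7.
  // Explicit bigram: p(cat | the) = 0.4.
  double p_uni_cat = 0.3, p_uni_dog = 0.7;
  double p_bi_cat = 0.4;
  // Proper backoff normalization: the backoff weight alpha redistributes
  // the remaining mass over the words WITHOUT an explicit bigram.
  double alpha = (1.0 - p_bi_cat) / (1.0 - p_uni_cat);  // = 0.6 / 0.7
  // The true distribution sums to one:
  double true_sum = p_bi_cat + alpha * p_uni_dog;        // = 1.0
  // But in the FST, the backoff arc leads to the full unigram state, so
  // "cat" is counted twice: directly, and again via the backoff arc.
  double fst_sum = p_bi_cat + alpha * (p_uni_cat + p_uni_dog);  // ~ 1.257
  std::cout << "true sum = " << true_sum
            << ", FST sum = " << fst_sum << '\n';
  return 0;
}
```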

We decided to avoid pushing weights altogether, and to handle the whole issue in a different way. First, a definition: a "stochastic" FST is one in which the weights out of each state sum to one. The reader should assume that we are talking about the log semiring here, not the tropical one, so that "sum to one" really means sum, not "take the max". We ensure that each stage of graph creation has the property that if the previous stage was stochastic, the next stage will also be stochastic. That is: if G is stochastic, then LG will be stochastic; if LG is stochastic, then det(LG) will be stochastic; if det(LG) is stochastic, then min(det(LG)) will be stochastic; and so on. This means that each individual operation must, in the appropriate sense, "preserve stochasticity". Now, it would be possible to do this in a trivial but not very useful way: for example, we could just try the push-weights algorithm, and if it seems to be failing because, say, the original G summed up to more than one, then we throw up our hands in horror and announce failure. This would not be very helpful.

We want to preserve stochasticity in a stronger sense than this. First, for G, measure the minimum and the maximum, over all states, of the sum of the outgoing arc probabilities plus the final probability. This is what our program "fstisstochastic" does. If G is stochastic, both of these numbers will be one (you would actually see zeros from the program, because we operate in log space; this is what the log semiring means). We want to preserve stochasticity in the following sense: this minimum and maximum never "get worse", i.e. they never move farther away from one. In fact, this happens naturally whenever our algorithms preserve stochasticity in a "local" way. There are a number of algorithms that need to preserve stochasticity, including:

• Minimization
• Determinization
• Epsilon removal
• Composition (with particular FSTs on the left)

There are also one or two minor algorithms that need to preserve stochasticity, like adding a subsequential-symbol loop. Minimization naturally preserves stochasticity, as long as we don't do any weight pushing as part of it (we use our program "fstminimizeencoded", which does minimization without weight pushing). Determinization preserves stochasticity as long as we do it in the same semiring in which we want to preserve stochasticity (this means the log semiring; this is why we use our program fstdeterminizestar with the option --determinize-in-log=true). Regarding epsilon removal: firstly, we have our own version of epsilon removal, "RemoveEpsLocal()" (fstrmepslocal), which does not guarantee to remove all epsilons but does guarantee never to "blow up". This algorithm is unusual among FST algorithms in that, to do what we need it to do and preserve stochasticity, it needs to "keep track of" two semirings at the same time. That is, if it is to preserve equivalence in the tropical semiring and stochasticity in the log semiring, which is what we need in practice, it actually has to "know about" both semirings simultaneously. This seems to be an edge case where the "semiring" concept somewhat breaks down.

Composition on the left with the lexicon L, the context FST C and the H transducer (which encodes the HMM structure) all have to preserve stochasticity. Let's discuss this in the abstract: we ask, when composing A o B, what are sufficient properties that A must have so that A o B will be stochastic whenever B is stochastic? We believe these properties are:

• For any symbol sequence that can appear on the input of B, the inverse of A must accept that sequence (i.e. it must be possible for A to output that sequence), and:
• For any such symbol sequence (say, S), if we compose A with an unweighted linear FST with S on its input, the result will be stochastic.

These properties hold for C, L and H, at least if everything is properly normalized (i.e. if the lexicon weights sum to one for any given word, and if the HMMs in H are properly normalized and we don't use a probability scale). However, in practice our graph-creation recipes use a probability scale on the transition probabilities in the HMMs (similar to the acoustic scale). This means that the very last stages of graph creation typically don't preserve stochasticity. Also, if we are using a statistical language model, G will typically not be stochastic in the first place. What we do in that case is measure at the start how much G "deviates from stochasticity" (using the program fstisstochastic), and during the subsequent graph-creation stages (except for the very last one) we verify that the non-stochasticity does not "get worse" than it was at the beginning.
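As a rough illustration of the measurement that fstisstochastic performs, here is a toy sketch written against the public OpenFst API (illustrative code, not Kaldi's actual implementation; the function name is invented). It computes, in the log semiring, the min and max over states of the total outgoing weight; for a stochastic FST both values are 0 in log space:

```cpp
#include <algorithm>
#include <limits>
#include <fst/fstlib.h>

// Toy version of the check behind Kaldi's fstisstochastic: for each state,
// log-add the outgoing arc weights and the final weight, then track the min
// and max over states.  The values are -log(probability mass), so a
// stochastic FST yields 0 for both.
void MeasureStochasticity(const fst::VectorFst<fst::LogArc> &ifst,
                          float *min_val, float *max_val) {
  typedef fst::LogArc Arc;
  *min_val = std::numeric_limits<float>::infinity();
  *max_val = -std::numeric_limits<float>::infinity();
  for (fst::StateIterator<fst::VectorFst<Arc> > siter(ifst);
       !siter.Done(); siter.Next()) {
    Arc::StateId s = siter.Value();
    Arc::Weight sum = ifst.Final(s);  // Weight::Zero() if s is not final
    for (fst::ArcIterator<fst::VectorFst<Arc> > aiter(ifst, s);
         !aiter.Done(); aiter.Next())
      sum = fst::Plus(sum, aiter.Value().weight);  // log-semiring addition
    float v = sum.Value();  // -log of the probability mass out of state s
    *min_val = std::min(*min_val, v);
    *max_val = std::max(*max_val, v);
  }
}
```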
