Published by De Gruyter Mouton, January 29, 2020

Exploring vowel formant estimation through simulation-based techniques

  • Tyler Kendall and Charlotte Vaughn
From the journal Linguistics Vanguard

Abstract

This paper contributes insight into the sources of variability in vowel formant estimation, a major analytic activity in sociophonetics, by reviewing the outcomes of two simulations that manipulated the settings used for linear predictive coding (LPC)-based vowel formant estimation. Simulation 1 explores the range of frequency differences obtained when minor adjustments are made to the LPC settings and measurement timepoints chosen by trained analysts, in order to determine the range of variability that should be expected in sociophonetic vowel studies. Simulation 2 examines the variability that emerges when LPC settings are varied combinatorially around constant default settings, rather than around settings chosen by trained analysts. The impacts of different LPC settings are discussed as a way of demonstrating the inherent properties of LPC-based formant estimation. This work suggests that differences more fine-grained than about 10 Hz in F1 and 15–20 Hz in F2 are within the range of LPC-based formant estimation variability.
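To illustrate the technique at issue, the sketch below (not from the paper; a minimal Python illustration of the autocorrelation method of LPC, with an invented synthetic signal) estimates formant candidates from the complex poles of the LPC polynomial and shows how the estimates shift as one analysis setting, the LPC order, is varied:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_coeffs(x, order):
    """LPC coefficients via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])
    return np.concatenate(([1.0], a))

def formants(x, fs, order):
    """Formant candidates: frequencies of the LPC polynomial's complex poles."""
    a = lpc_coeffs(x * np.hamming(len(x)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one pole per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))    # crude bandwidth estimate
    keep = (freqs > 90) & (bws < 400)            # drop implausible resonances
    return np.sort(freqs[keep])

fs = 10000
# Synthetic "vowel": a 100 Hz pulse train through resonators at 500 and 1500 Hz.
sig = np.zeros(2048)
sig[::100] = 1.0
for fc, bw in [(500, 60), (1500, 90)]:
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * fc / fs
    sig = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], sig)

for order in (8, 10, 12):    # varying the LPC order perturbs the estimates
    print(order, np.round(formants(sig, fs, order), 1))
```

Even on this clean synthetic signal, the recovered frequencies move by a few hertz as the order changes; on real speech, with richer spectra and coarticulation, such setting-dependent drift is correspondingly larger.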

Acknowledgement

The data used for this project were collected with support from grants # BCS-0518264, BCS-1123460, and BCS-1122950 from the National Science Foundation.

References

Atal, B. S. & Suzanne Hanauer. 1971. Speech analysis and synthesis by linear prediction of the speech wave. Journal of the Acoustical Society of America 50. 637–655. https://doi.org/10.1121/1.1912679

Boersma, Paul & David Weenink. 2019. Praat: Doing phonetics by computer. Version 6.1. [software; available from http://www.fon.hum.uva.nl/praat/].

Di Paolo, Marianna, Malcah Yaeger-Dror & Alicia Beckford Wassink. 2011. Analyzing vowels. In Marianna Di Paolo & Malcah Yaeger-Dror (eds.), Sociophonetics: A student’s guide, 87–106. New York: Routledge.

Dissen, Yehoshua, Jacob Goldberger & Joseph Keshet. 2019. Formant estimation and tracking: A deep learning approach. Journal of the Acoustical Society of America 145. 642–653. https://doi.org/10.1121/1.5088048

Duckworth, Martin, Kirsty McDougall, Gea de Jong & Linda Shockey. 2011. Improving the consistency of formant measurement. International Journal of Speech, Language & Law 18. 35–51. https://doi.org/10.1558/ijsll.v18i1.35

Farrington, Charlie, Tyler Kendall & Valerie Fridland. 2018. Vowel dynamics in the Southern Vowel Shift. American Speech 93(2). 186–222. https://doi.org/10.1215/00031283-6926157

Harrison, Philip. 2004. Variability of formant measurements. York, UK: University of York MA dissertation.

Harrison, Philip. 2013. Making accurate formant measurements: An empirical investigation of the influence of the measurement tool, analysis settings and speaker on formant measurements. York, UK: University of York PhD dissertation.

Kendall, Tyler & Valerie Fridland. 2012. Variation in perception and production of mid front vowels in the U.S. Southern Vowel Shift. Journal of Phonetics 40(2). 289–306. https://doi.org/10.1016/j.wocn.2011.12.002

Kendall, Tyler & Valerie Fridland. 2017. Regional relationships among the low vowels of U.S. English: Evidence from production and perception. Language Variation and Change 29(2). 245–271. https://doi.org/10.1017/S0954394517000084

Kendall, Tyler & Erik R. Thomas. 2010. vowels: Vowel manipulation, normalization, and plotting in R. Version 1.2-2. [R software package; available from https://cran.r-project.org/web/packages/vowels/].

Kendall, Tyler & Charlotte Vaughn. 2015. Measurement variability in vowel formant estimation: A simulation experiment. In The Scottish Consortium for ICPhS 2015 (eds.), Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015). Glasgow: University of Glasgow.

Kewley-Port, Diane & Amy Neel. 2006. Perception of dynamic properties of speech: Peripheral and central processes. In Steven Greenberg & William A. Ainsworth (eds.), Listening to speech: An auditory perspective, 49–61. Mahwah, NJ: Lawrence Erlbaum. https://doi.org/10.4324/9780203933107-4

Kewley-Port, Diane & Yijian Zheng. 1999. Vowel formant discrimination: Towards more ordinary listening conditions. Journal of the Acoustical Society of America 106. 2945–2958. https://doi.org/10.1121/1.428134

Labov, William, Ingrid Rosenfelder & Josef Fruehwald. 2013. One hundred years of sound change in Philadelphia: Linear incrementation, reversal, and reanalysis. Language 89(1). 30–65. https://doi.org/10.1353/lan.2013.0015

Lobanov, Boris M. 1971. Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America 49. 606–608. https://doi.org/10.1121/1.1912396

Markel, John D. & Augustine H. Gray Jr. 1976. Linear prediction of speech. Berlin: Springer. https://doi.org/10.1007/978-3-642-66286-7

McAuliffe, Michael, Arlie Coles, Michael Goodale, Sarah Mihuc, Michael Wagner, Jane Stuart-Smith & Morgan Sonderegger. 2019. ISCAN: A system for integrated phonetic analysis across speech corpora. In Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019), 1322–1326. Melbourne, Australia, 5–9 August 2019.

Morrison, Geoffrey Stewart. 2008. Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. International Journal of Speech, Language & Law 15. 249–266.

Nolan, Francis & Catalin Grigoras. 2005. A case for formant analysis in forensic speaker identification. International Journal of Speech, Language & Law 12. 143–173. https://doi.org/10.1558/sll.2005.12.2.143

O’Shaughnessy, Douglas. 1988. Linear predictive coding: One popular technique for analyzing certain physical signals. IEEE Potentials 7(1). 29–32. https://doi.org/10.1109/45.1890

Reddy, Sravana & James Stanford. 2015. Toward completely automated vowel extraction: Introducing DARLA. Linguistics Vanguard 1(1). 15–28. https://doi.org/10.1515/lingvan-2015-0002

Rosenfelder, Ingrid, Josef Fruehwald, Keelan Evanini, Scott Seyfarth, Kyle Gorman, Hilary Prichard & Jiahong Yuan. 2014. FAVE (Forced Alignment and Vowel Extraction) program suite. Version 1.2.2. [software; available from https://github.com/JoFrhwld/FAVE].

Thomas, Erik R. 2011. Sociophonetics: An introduction. Houndmills: Palgrave. https://doi.org/10.1007/978-1-137-28561-4

Thomas, Erik R. & Tyler Kendall. 2007. NORM: The vowel normalization and plotting suite. Version 1.1. [online resource; available at http://lingtools.uoregon.edu/norm/; accessed February 2019].

Vallabha, Gautam K. & Betty Tuller. 2002. Systematic errors in the formant analysis of steady-state vowels. Speech Communication 38. 141–160. https://doi.org/10.1016/S0167-6393(01)00049-8

Watt, Dominic. 2010. The identification of the individual through speech. In Carmen Llamas & Dominic Watt (eds.), Language and identities, 76–85. Edinburgh: Edinburgh University Press. https://doi.org/10.1515/9780748635788-011

Watt, Dominic, Anne Fabricius & Tyler Kendall. 2011. More on vowels: Plotting and normalization. In Marianna Di Paolo & Malcah Yaeger-Dror (eds.), Sociophonetics: A student’s guide, 107–118. New York: Routledge.

Zhang, Cuiling, Geoffrey Stewart Morrison, Felipe Ochoa & Ewald Enzinger. 2013. Reliability of human-supervised formant-trajectory measurement for forensic voice comparison. Journal of the Acoustical Society of America 133. EL54–EL60. https://doi.org/10.1121/1.4773223

Appendix: Full results from Simulation 1

Tables A1 and A2 provide the vowel category averages of the median F1, F2, and F3 values for the 4 tokens per vowel category from Simulation 1, with speakers grouped by sex (Table A1 for males, Table A2 for females). The tables also provide the average IQRs (over the 4 tokens) and the average absolute differences between each token's median and the gold standard vowel measurement.

Inspecting the variability via the average IQRs in these tables, it is evident that some speaker–vowel category pairings obtain somewhat unstable estimates. The individual speakers’ IQRs range, for F1, from a low of 3.4 Hz (Ryan3381’s /i/) to a high of 135.5 Hz (Ann5805’s /ɑ/) (mean = 37.5 Hz) and, for F2, from a low of 19.7 Hz (Bryan2168’s /ɪ/) to a high of 1567.1 Hz (Paige5200’s /e/) (mean = 162.2 Hz).

Further, we find some support for Harrison’s (2013) observation that formant estimation error is not uniform across the vowel space. High vowels tend to obtain tighter distributions for F1, while low vowels obtain the widest distributions (as visible in Figure 2 in the main text). For F2, less consistent patterns emerge. For instance, /ɑ/ and /ɔ/ obtain some of the smallest IQRs, but /æ/, also a low vowel, obtains some very large IQRs. Generally, /e/ obtains the highest IQRs for F2. And, although /ɪ/ has some of the lowest F2 IQRs for some speakers, it also has some of the highest for others (941.6 Hz for Natalie2800 and 1174.2 Hz for Paige5200).

The range of variability across individual speakers and vowel classes reinforces the fact that LPC results depend on speaker-level and vowel token-level factors, but it does not point to specific sources of that variability. We do find systematic patterns, however, by comparing the average IQRs across speaker sex (male: F1 = 32.4, F2 = 80.9; female: F1 = 42.7, F2 = 243.4), individual speakers (F1 mean = 37.5 [range = 18.0–50.7]; F2 mean = 162.2 [range = 63.2–491.1]), and vowel categories (F1 mean = 37.5 [range = 13.5–68.3]; F2 mean = 162.2 [range = 49.0–402.9]), showing that indeed higher formant frequencies (i.e. female voices and formants above F1) track less well with LPC and result in less stable estimates (Vallabha and Tuller 2002; Zhang et al. 2013). Speakers and vowel categories, however, yield roughly similar IQR ranges, indicating that they contribute equivalently to LPC-based estimation variability.
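The three summary quantities reported in Tables A1 and A2 (Val, IQR, Dif) are straightforward to compute. The sketch below (with invented F1 estimates, not data from the study) shows the calculation for one hypothetical token:

```python
import numpy as np

# Hypothetical F1 estimates (Hz) for one vowel token, produced by
# perturbing the LPC settings and measurement timepoint as in Simulation 1.
estimates = np.array([512.3, 508.7, 515.1, 509.9, 530.4, 511.2, 507.8, 514.6])
gold = 510.0  # analyst's hand-checked "gold standard" measurement

median = np.median(estimates)          # the "Val" column (before averaging over tokens)
q1, q3 = np.percentile(estimates, [25, 75])
iqr = q3 - q1                          # spread of the estimate distribution ("IQR")
dif = abs(median - gold)               # absolute difference from gold standard ("Dif")

print(f"median = {median} Hz, IQR = {iqr} Hz, |Dif| = {dif} Hz")
```

The tables then average these per-token quantities over the 4 tokens in each speaker's vowel category.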

Table A1:

Simulation 1 results for individual male speakers.

             F1 Val  F1 IQR  F1 Dif  F2 Val  F2 IQR  F2 Dif  F3 Val  F3 IQR  F3 Dif
aaron3265
    /i/       271.5    26.5     7.2  2313.4    68.7    17.2  2800.4   408.6    44.3
    /ɪ/       390.0    40.9     8.1  1964.9    46.4    12.1  2598.2   125.1    13.7
    /e/       416.5    37.5     8.2  2094.1    79.8    25.6  2657.4   198.1    40.5
    /ɛ/       525.9    47.6    14.7  1791.7    45.8    13.5  2581.1   186.2    44.1
    /æ/       645.5    29.3     5.6  1661.6    53.2     8.5  2474.9   141.2    49.1
    /ɑ/       680.1    37.2    19.4  1137.8    40.4    26.0  2397.0   101.8    32.3
    /ɔ/       633.9    67.3    26.1  1043.3   105.4    21.9  2407.0   385.2    29.8
    /ʌ/       613.1    48.0     9.7  1297.7    50.8    14.3  2479.8    61.8    16.6
    /o/       494.8    53.0    18.6  1093.9    79.1    16.2  2352.2    90.9    15.6
    /u/       324.1    21.8     6.9  1585.9   109.2    23.1  2255.8    48.1     5.7
    Mn            –    40.9    12.5       –    67.9    17.8       –   174.7    29.2
    SD            –    13.6     6.9       –    24.9     6.0       –   126.7    15.3
Bryan2168
    /i/       286.4     6.4     0.4  2437.3    67.4    16.1  3016.0   326.2    47.4
    /ɪ/       392.8     9.6     2.6  2095.6    19.7     7.0  2885.9    51.1    14.9
    /e/       438.1    15.3     3.5  2191.5   154.8    15.0  2930.2   224.0    13.2
    /ɛ/       568.0    16.0     4.1  1880.8    29.7     2.7  2848.5    65.2    10.8
    /æ/       780.6    30.9     9.2  1711.7   145.1    10.3  2690.1   306.2    67.7
    /ɑ/       701.0    26.6     6.7  1051.4    22.5     5.4  2660.3   135.6    10.8
    /ɔ/       635.6    20.2    10.8   970.2    39.5    11.5  2668.0   901.9    25.8
    /ʌ/       643.1    14.5     5.0  1277.4    42.3     7.1  2753.6    60.1    16.3
    /o/       490.3    30.6     8.3  1101.6    77.5     9.5  2626.5   109.2     3.0
    /u/       340.2    10.3     3.3  1392.5   107.5    24.9  2374.5   103.0    13.4
    Mn            –    18.0     5.4       –    70.6    11.0       –   228.3    22.3
    SD            –     8.8     3.3       –    49.9     6.4       –   256.9    20.0
Eric1510
    /i/       260.5     5.6     0.9  2309.4   301.0    13.1  2831.5   418.5   118.7
    /ɪ/       446.2    12.4     2.7  1809.7    24.9     6.4  2682.2    71.6     6.9
    /e/       474.5    10.3     3.7  1987.3   340.7     6.0  2683.3   291.3    24.0
    /ɛ/       580.7    13.1     3.0  1717.0    31.3     6.5  2612.8   155.9    11.8
    /æ/       709.0    17.3     5.4  1581.4   141.8    11.4  2373.9   739.4    30.5
    /ɑ/       669.3    27.3     9.4  1129.2    23.2    12.3  2417.1    87.2     5.6
    /ɔ/       629.5    74.7    30.7  1062.6   118.7    30.4  2471.4   407.0    16.0
    /ʌ/       611.3    20.6     4.6  1327.8    47.4     6.4  2476.8   330.7    19.2
    /o/       515.7    32.2     5.5  1149.0    54.4     8.7  2379.4   217.4    25.7
    /u/       337.9    11.1     4.3  1362.1   151.1    27.4  2214.1   268.3    23.9
    Mn            –    22.5     7.0       –   123.5    12.9       –   298.7    28.2
    SD            –    20.1     8.6       –   114.8     8.9       –   196.0    32.8
Johnny4055
    /i/       245.0     7.6     2.2  2475.1    63.7    12.3  3067.0   284.5    78.4
    /ɪ/       373.1    30.0     3.9  1969.6    48.6    18.6  2681.0    57.4    19.3
    /e/       403.9    32.0     4.1  2163.5   115.4    15.8  2635.5    91.8    53.8
    /ɛ/       563.1    44.2    20.4  1706.9    40.9     9.5  2627.6    57.9    36.0
    /æ/       771.0    37.6     6.3  1565.7    64.2    12.7  2485.9   162.2    21.9
    /ɑ/       657.9   124.4    31.8   998.7   102.8    25.4  2216.5   381.5    10.6
    /ɔ/       634.1    63.0    30.0   962.6    53.8    17.0  2208.7    54.6    22.7
    /ʌ/       607.2    55.2    14.0  1290.2    53.7    10.0  2466.4    87.9    20.3
    /o/       442.5    37.3    13.1  1050.1    62.6    24.6  2465.9   101.6    13.1
    /u/       266.2     9.9     3.0  1373.8   190.5    21.8  2392.8    68.3    16.1
    Mn            –    44.1    12.9       –    79.6    16.8       –   134.8    29.2
    SD            –    33.1    11.2       –    45.6     5.8       –   111.5    21.4
Ryan3381
    /i/       257.5     3.4     1.5  2334.7    57.1     7.3  3070.3   168.2    16.4
    /ɪ/       405.5    22.6     4.4  2044.0    37.9     8.5  2683.8    89.0    17.5
    /e/       422.2    11.2     3.5  2175.8   102.2    19.1  2700.1   238.7    28.4
    /ɛ/       529.3    17.9     5.2  1857.6    50.5    10.8  2570.9   143.4    18.5
    /æ/       730.4    24.3     4.5  1743.0    47.7     8.3  2434.1   157.0    10.3
    /ɑ/       742.3   134.0    21.6  1133.7    67.9    15.5  2165.3   352.3    41.2
    /ɔ/       696.2    77.7    17.2  1067.5    27.3     8.9  2296.2   256.7    17.0
    /ʌ/       613.3    35.3     8.2  1308.5    58.6    12.0  2496.0   199.9    37.0
    /o/       467.2    25.0     8.5  1131.1    72.1    17.8  2183.0    34.0     8.1
    /u/       293.2    12.5     3.9  1480.4   110.3    29.2  2245.4    57.3     6.0
    Mn            –    36.4     7.9       –    63.2    13.7       –   169.7    20.0
    SD            –    39.9     6.5       –    26.3     6.8       –    97.2    11.9
Note: Mean for each vowel category for individual token medians (Val), mean interquartile ranges for individual token distributions (IQR), and mean absolute differences from gold standard values (Dif), separated by formant, for F1, F2, and F3. All values in Hz. Means and standard deviations are provided for IQRs and differences.

Table A2:

Simulation 1 results for individual female speakers.

             F1 Val  F1 IQR  F1 Dif  F2 Val  F2 IQR  F2 Dif  F3 Val  F3 IQR  F3 Dif
Ann5805
    /i/       364.2    11.8     2.7  2878.2    54.4     7.6  3298.2   182.0    28.2
    /ɪ/       490.2    18.5     3.4  2392.7    36.5     6.2  3167.9   120.7    18.1
    /e/       455.1    28.8     6.0  2619.7    88.7    10.5  3167.1   261.6    11.6
    /ɛ/       730.8    28.9    10.7  2173.0   145.9    12.2  2993.3    92.7     7.8
    /æ/       874.2    54.8     7.3  2088.8   147.5    60.9  2778.3   127.6    34.8
    /ɑ/       928.2   135.5    15.8  1301.0    49.9     7.6  2825.5   117.7    38.4
    /ɔ/       739.6   101.6    37.9  1096.4    61.4    22.0  3028.1    96.2    26.5
    /ʌ/       814.6    47.0    14.2  1727.9    58.2    11.3  2983.1   201.4    36.1
    /o/       547.9    25.8     7.5  1184.7    59.0     7.3  2972.5   108.5    10.6
    /u/       411.1    11.5     4.8  1770.2    86.1    19.7  2942.1    39.7    12.6
    Mn            –    46.4    11.0       –    78.8    16.5       –   134.8    22.5
    SD            –    41.2    10.4       –    39.0    16.5       –    63.5    11.7
Jocelyn1675
    /i/       352.6    25.7     1.2  2828.4    69.1     6.4  3167.1   133.1    38.9
    /ɪ/       507.6    19.6     2.4  2296.7    35.4     5.8  3142.1    67.8    13.3
    /e/       445.2     8.2     0.8  2569.1    80.9     6.5  3105.2   395.2    23.8
    /ɛ/       675.0    28.9     2.1  2069.4   161.4     9.8  3095.2   327.6    19.7
    /æ/       861.9    44.6     7.2  1838.2    76.9     3.7  2867.2   224.3    16.2
    /ɑ/       834.1    36.4     6.8  1264.9    30.3     6.6  3150.8    48.3     9.2
    /ɔ/       808.3    21.9     5.8  1268.9    31.5     2.9  3154.5    31.0     7.9
    /ʌ/       758.3    28.7    15.3  1714.6    44.1    19.9  2966.7   186.3    18.9
    /o/       563.0    25.8    11.8  1383.5   103.6    44.4  2845.3   173.8    39.0
    /u/       440.9     6.7     2.7  1954.8   233.6    43.0  2836.9   145.3    26.9
    Mn            –    24.7     5.6       –    86.7    14.9       –   173.3    21.4
    SD            –    11.5     4.8       –    65.5    15.9       –   118.0    11.0
Lindsey1595
    /i/       329.0    32.3     1.5  2815.5    52.4     3.7  3410.9   106.6    10.2
    /ɪ/       521.2    30.9     5.6  2234.5    57.0    13.0  3103.6    69.0     9.6
    /e/       414.2    32.6     9.4  2626.4   102.6    29.2  3168.4   195.7    39.8
    /ɛ/       698.5    43.9    15.6  2043.7   393.7    31.8  3082.9   325.0     8.9
    /æ/       870.0   134.0    36.4  1835.7   414.4    27.4  2991.2   538.3    28.8
    /ɑ/       812.6    52.5    14.0  1119.3    80.9    24.0  2807.6    54.5    11.3
    /ɔ/       801.3    55.8    19.1  1205.1   115.0    43.6  2807.3   350.4    15.8
    /ʌ/       731.0    88.8    18.6  1435.4   380.3    96.6  2964.8   641.3     4.3
    /o/       527.6    29.4    10.2  1156.7   118.1    29.0  2869.3    90.5    13.6
    /u/       371.3     6.4     2.9  1816.6   207.5    57.2  2717.6   111.3    28.5
    Mn            –    50.7    13.3       –   192.2    35.6       –   248.3    17.1
    SD            –    36.4    10.2       –   147.3    26.0       –   208.5    11.4
Natalie2800
    /i/       364.8    47.3    12.7  3082.5   119.2    25.4  3625.5   284.8    26.0
    /ɪ/       485.7    30.0    22.6  2609.0   941.6    12.7  3412.7   853.3    36.8
    /e/       449.9    20.7    10.0  2939.7  1396.3    28.2  3303.2   511.8    83.7
    /ɛ/       749.0    37.6    15.4  2357.2   410.5    11.4  3341.1   879.7    46.4
    /æ/       966.4   113.7    22.7  2210.4   354.0    30.1  3093.5   764.7    61.3
    /ɑ/       808.5    80.6    41.2  1050.7    41.7     3.4  3027.3   132.2    31.2
    /ɔ/       817.2    78.0    22.9  1016.3    42.3    15.5  2967.7   547.7    41.8
    /ʌ/       803.7    42.0     9.7  1567.4   165.9    19.5  3139.7   863.3    40.9
    /o/       548.9    30.9    13.5  1221.8    99.2    19.1  3014.0    81.8    37.2
    /u/       424.0    11.1     6.9  1820.8   112.1    34.3  2912.3    70.7    23.0
    Mn            –    49.2    17.8       –   368.3    20.0       –   499.0    42.8
    SD            –    31.9    10.1       –   452.3     9.6       –   335.7    18.0
Paige5200
    /i/       274.2    27.9    12.2  2838.1   112.4    31.0  3355.5   396.3   364.2
    /ɪ/       475.9    30.7    23.5  2165.0  1174.2    11.3  2963.1   801.3   490.2
    /e/       485.7    25.8    28.0  2362.0  1567.1    27.0  2916.8   841.0   455.1
    /ɛ/       651.0    42.9    14.3  1907.1   787.3     7.4  2836.9   927.2   730.8
    /æ/       870.8   106.1    19.9  1733.7   456.6    26.7  2543.8   656.9   874.2
    /ɑ/       781.7    28.7    10.3  1154.3    30.8     8.6  2456.2   313.8   928.2
    /ɔ/       775.9    49.5    11.1  1145.0    50.8    18.1  2315.5   180.8   739.6
    /ʌ/       756.0    48.5    16.1  1419.2   362.1    34.8  2515.8   799.9   814.6
    /o/       580.5    29.5    16.7  1438.9   223.2    44.8  2488.1   342.3   547.9
    /u/       443.8    33.9    10.3  1795.7   146.7    15.0  2719.4   204.4   411.1
    Mn            –    42.4    16.2       –   491.1    22.5       –   546.4       –
    SD            –    24.0     6.0       –   523.8    12.4       –   287.2       –
Received: 2019-03-07
Accepted: 2019-06-12
Published Online: 2020-01-29

© 2020 Walter de Gruyter GmbH, Berlin/Boston
