UvA-DARE (Digital Academic Repository)

Why phonetically-motivated constraints do not lead to phonetic determinism: The relevance of aspiration in cueing NC sequences in Tumbuka

This paper examines the role of phonetic cues to postnasal laryngeal contrasts, language-specific differences in the use of these cues, and the phonetic naturalness of the different cues. While many studies have shown that long stop closure duration is a well-established cue to voicelessness in the postnasal context (see, e.g., Cohn & Riehl 2012, who claim this to be a universal property), the present study focusses on the role of aspiration noise in maintaining a voicing contrast in the postnasal environment. It provides experimental data from the Bantu language Tumbuka to illustrate that aspiration noise can preserve a postnasal laryngeal contrast even when stop closure duration is short. Though typologically less common, we show that the use of aspiration as a cue is also phonetically motivated. Furthermore, we show that such phonetic motivation should not be directly incorporated into phonology (e.g., as markedness constraints in OT). Instead, we employ the BiPhon model (Boersma 2007), which allows for a strict distinction between the modules of phonetics and phonology, and which formalizes the mapping of phonetic cues onto phonological representations via cue constraints, avoiding the problem of phonetic determinism.


Introduction
How phonetic information should be incorporated into or influence phonological analysis is a persistent question for the phonetics-phonology interface (Hamann 2011; Hyman 2001). As Hyman (2001) shows, accounting for the outputs of homorganic nasal-stop sequences (NC) is a fruitful area to test the limits of what he terms "phonetic determinism" in accounting both for cross-linguistically common phonological patterns and for less commonly attested patterns. While it is uncontroversial that one cross-linguistically [...] linguistic variation in the realization of NC sequences.

The paper is structured as follows. In section 2 we survey phonetic studies of the realization of NC cross-linguistically, concentrating on the relative importance of four phonetic cues (voicing during stop closure, strength of the release burst, duration of stop closure, and aspiration) in determining the realization and interpretation of postnasal laryngeal quality. In section 3, we present a phonetic study of postnasal aspiration in Tumbuka, which we show motivates aspiration as an independent cue from duration of stop closure. In section 4, we describe how phonetic cues to postnasal laryngeal quality can be incorporated into a phonological analysis, adopting the BiPhon model (Boersma 2007), that accounts for the apparently contradictory variation in all the postnasal processes schematized in (1). Section 5 concludes.

Phonetic factors conditioning the laryngeal quality of postnasal stops 2
In this section and the next, we provide evidence from phonetic studies for the relative importance of four perceptual cues to the postnasal laryngeal quality of obstruent stops: voicing during stop closure, strength of the release burst, the duration of stop closure, and aspiration. The first two cues will be treated together. The different weight given to these perceptual cues can account for the different attested outputs of NC schematized in (1), as we shall argue in section 4.

Voicing during stop closure and strength of release burst
The phonetic naturalness of postnasal voicing has been well-established in work like Ohala & Ohala (1991), Hayes & Stivers (2000), and Solé (2009, 2012), and it is considered the most common laryngeal alternation in the NT context: see, e.g., Herbert (1986) and Pater (1999). It is found in languages as geographically diverse as Japanese, Terena, Quichua, Zoque, some Italian dialects, as well as in numerous Bantu and other Niger-Congo languages. (For examples and discussion, see, e.g., Choti 2015; Herbert 1986; Hyman 2001; Kadima 1969; Ohala & Ohala 1991; Nasukawa 2005; Pater 1999; Rosenthall 1989; Solé 2009, and Steriade 1993.) We illustrate postnasal voicing with data from Kimatuumbi (Bantu P.13, Tanzania), where the source of the homorganic nasal is a prefix belonging to the set of what Odden (1996: 88ff.) calls the N-prefixes, such as the class 10 nominal prefix. 3 Examples (2c) and (2d) show that voiceless plosives are realized as voiced in this context:

(2) Postnasal neutralization of voicing contrast in Kimatuumbi class 10 nouns (Odden 1996: 89, 91)
    Class 11 singular              Class 10 plural (with an N-prefix)
a.  lų-báų 'rib'                   m-báų 'ribs'
b.  lų-góį 'braided rope'          ŋ-góį 'braided ropes'
c.  lų-paláaį́ 'bald head'          m-baláaį 'bald heads'
d.  lų-tiníká 'cut'                n-diníká 'cuts'

As Solé's (2009, 2012) thoughtful overview of the phonetic literature demonstrates, the typological preference for post-nasal voicing in plosives has both an aerodynamic and a perceptual basis. Aerodynamically, the voicing of the nasal in NT sequences is extended into the oral stop (compared to postvocalic stops) because initial velar leakage and continued velic raising even after velic closure cause an expansion of the oral cavity volume, which lowers the pressure in the oral cavity and therefore prolongs transglottal air flow. As a consequence, the plosive, even in NT sequences, is partially voiced. This can further lead to a shorter pressure build-up during the oral closure and hence a weaker stop burst than expected for a voiceless stop. Partial voicing and a weak burst can lead the post-nasal plosive to be interpreted phonologically as a fully voiced stop, since a noisy release is an important perceptual cue for voiceless stops. This re-interpretation motivates synchronic or diachronic phonological processes of stop voicing in post-nasal position, often resulting in neutralization of a laryngeal contrast, as we saw in (2), above.

2 We are not concerned with whether the NC sequences are phonologically complex unary segments (prenasalized stops or post-ploded nasals) or consonant clusters. As Ladefoged & Maddieson (1986) argue, deciding whether a nasal plus stop sequence is a unary segment or a cluster is not a phonetic issue but concerns solely the phonology of the language in question. Indeed, Cohn & Riehl's (2012) experimental findings for six Austronesian languages show that the phonological status of NC is not reflected in the internal timing of the nasal and oral closure in these sequences. For a general discussion of the phonological interpretation of NC as a cluster or a complex unary segment in Bantu languages, see Downing (2005).
3 The morphological context of NC sequences crucially conditions the output in Kimatuumbi, as Odden (1996) shows. Homorganic nasals that have as their source what Odden (1996: 82ff) calls a muͅ-prefix (such as the 2nd person singular subject prefix) do not trigger voicing on a following stop: m-paánde 'you should plant'; n-teleké 'you should cook'. These kinds of morphological conditions are another reason why a phonetically motivated constraint *NT cannot account for all the phonological alternations found in the NC context, as Hyman (2001) makes clear. An acute accent indicates High tone in the Kimatuumbi data. Unfortunately, our sources for the other languages do not consistently mark tone, and so we follow them in omitting it. In any case, tone is orthogonal to the discussion of the laryngeal alternations.

Duration of stop closure
Even though postnasal voicing of obstruent stops is a common process, it is easy enough to find languages where a voiceless T versus voiced D contrast is not neutralized postnasally, in spite of the need to overcome the "phonetically unnatural" voicelessness of obstruents in this position. Riehl's (2008) detailed phonetic study of NC sequences in a set of Austronesian languages provides several examples, including Manado Malay (Austronesian; East Indonesia):

(3) Postnasal voicing contrast in Manado Malay (Riehl 2008: 76, 208, 217)
a.  ambe 'to take'        vs.  ampa 'four'
b.  tanda 'sign'          vs.  tanta 'aunt'
c.  paŋge 'to call out'   vs.  paŋko 'to hold on lap'

Languages that maintain a post-nasal voicing contrast seem to employ specific strategies to make this contrast more perceptible. As Beddor (2007) puts it, the voiceless stops in such languages show a "resistance to velar leakage [...] that might diminish their voiceless percept" (p. 250/1). The cues for voiceless stops that are typically compromised in post-nasal position, namely a silent closure phase, i.e., absence of low-frequency energy, and an audible release burst (Ohala & Ohala 1993), can be enhanced 4 by lengthening the oral closure, which automatically leads to more pressure build-up and a stronger burst. Additionally, the relative duration of a long oral closure is enhanced by a shortening of the preceding nasal closure. Such an increase in voiceless oral closure duration and decrease in preceding nasal duration leads to an increase in perception of NT, as shown by Beddor (2009) in an identification task with American English listeners. Acoustic studies on languages with a contrast between ND and NT provide support for the role of duration of oral plosive closure in enhancing the perception of voicelessness, as they reveal a pattern of shorter nasal closure and a longer oral plosive closure for NT compared to ND sequences. Riehl (2008) and Cohn & Riehl (2012), for instance, found for all six of the Austronesian languages they discuss that in NT sequences, the nasal and the oral stop each took up approximately half of the total closure duration, while in ND sequences, the nasal comprised most of the total closure duration and the oral closure duration was extremely short. 5 A similar durational trade-off between long nasal and short oral closure for ND but not for NT has also been found in French and Sundanese (Cohn 1990). In Ikalanga (Bantu S.16), which has NT sequences only in loans, Beddor (2007, Figure 4) found a longer oral closure for NT compared to ND, but no difference in the duration of their nasal components.
Nguni Bantu languages like Xhosa (S.41, South Africa) and Zulu (S.42, South Africa) have a three-way laryngeal contrast for all non-labial stops, including clicks: {Th, T, T̤}, where T̤ stands for a voiceless depressor consonant. 6 This is illustrated in the data from Zulu below:

(4) Laryngeal contrasts in Zulu (Chen & Downing 2011; Doke et al. 1996)
a.  Th   ba-ya-khaba 'they are kicking'
b.  T    ba-ya-kakwa 'they are being surrounded'
c.  T̤    isi-gaba 'section; piece, class 7'

As noted since at least Doke (1926), post-nasally, only a two-way contrast is found: between the plain, variably ejective, voiceless stops and voiced ( [...]

In a phonetic study of Xhosa (where, as in Zulu, NT can be optionally ejected and ND is a voiced depressor), Jessen (2002) observed a longer oral closure for NT, and three of the four speakers tested showed a shorter nasal closure in NT than in ND. In short, there is a body of evidence confirming Cohn's (1990) observation that there is a systematic cross-linguistic asymmetry in the relative duration of the nasal and oral components of NT vs. ND clusters. As schematized in Figure 1 (cf. also Solé 2012: 133 and Stanton 2016: 1092), the nasal and oral parts in NT sequences often take up roughly half of the total duration, while in ND sequences, the nasal component often takes up most of the duration of the sequence and the oral component is quite short.

Figure 1: Graphic illustration of the reported asymmetry in the duration of the nasal (light grey) and the oral (dark grey) parts in homorganic nasal-plosive sequences with voiceless plosive (NT; upper row) and voiced plosive (ND; lower row) due to contrast enhancement (based on the values provided by Cohn & Riehl 2012 for intervocalic position).
6 Work like Doke (1926, 1961), Traill et al. (1987), Giannini et al. (1988), Jessen (2001), and Chen & Downing (2011) has clearly established that depressor stops in Zulu are not phonetically voiced. It remains a puzzle, however, what the source of the depressor effect of these consonants on tone might be. The abbreviation we have chosen for the depressor stops, T̤, captures this ambiguity: the sounds are voiceless, yet have an effect on a following tone that would be more expected if they were (breathy) voiced.
7 Zulu has a particularly complex set of laryngeal contrasts in stops and a correspondingly complex set of postnasal laryngeal alternations, in fact. The only implosive /ɓ/ is realized postnasally as a depressor, not an implosive, and depressor stops (T̤) in postnasal position are truly voiced (ND̤), hence /N+ɓ/ and /N+b̤/ are neutralized. The partially voiced ("soft") k does not occur in post-nasal position. See Doke (1961) for detailed discussion.

Articulatorily, Solé (2012) accounts for the timing difference between NT and ND by an earlier raising of the velum for NT. Beddor (2007) and Cohn & Riehl (2012), on the other hand, ascribe it to a temporal shift in a constant-sized nasal gesture. 8 Whatever the articulatory explanation, it is clear that the stop closure duration asymmetry is potentially an important perceptual cue to postnasal laryngeal contrasts.
Indeed, even languages like Setswana (S.30; Botswana, South Africa) that do not contrast NT and ND postnasally provide evidence for the importance of stop closure duration as a cue to laryngeal quality. Gouskova et al. (2011), Zsiga & Tlale Boyer (2017) and Zsiga (2018) present the results of a careful investigation of postnasal stop realization in the Sengwato dialect of Setswana (henceforth, Setswana). As shown in (6) and (7), below, in word-initial position one finds a three-way laryngeal contrast for stem-initial plosives (D, T, Th):

These laryngeal contrasts are not all realized post-nasally. Only voiceless stops are found, and they can contrast in aspiration (Gouskova et al. 2011; Zsiga 2018; Zsiga & Tlale Boyer 2017):

(7) Postnasal neutralization in Setswana (Zsiga & Boyer 2017: 344, 349)
a.  bala [...]

That is, Setswana appears to illustrate post-nasal devoicing, typical of Sotho-Tswana languages, which Hyman (2001) argues is phonetically unnatural. 9 However, phonetic studies provide a motivation for this uncommon outcome. In phrase-medial intervocalic position T and D are actually both partially voiced. They remain perceptually distinct due to a significantly longer closure duration for voiceless stops as well as a small difference in VOT. In postnasal position, there is also no significant difference in voicing, but crucially in this context the distinction in closure duration for T and D is neutralized. Zsiga (2018) proposes that because the closure is relatively long (more similar to that of phrase-medial intervocalic voiceless stops than voiced ones) and the release is fortis and voiceless, both T and D are interpreted as voiceless postnasally, even though they are both partially voiced. (See Solé 2012 for a similar proposal based on a phonetic study of laryngeal quality in another Sotho-Tswana dialect, reported in Solé et al. 2010.)
In short, one does not need to appeal to a phonetically unnatural *ND constraint to account for "postnasal devoicing" once one has a more careful look at the phonetically natural, if uncommon, details of the realization and interpretation of these NC sequences. The role of postnasal aspiration in Setswana is discussed in the next section.
8 A recent articulatory study by Carignan et al. (2019) on German NT/ND sequences indicates that while the velar gesture for the nasal indeed starts earlier in NT than in ND, it is also of lesser magnitude and length, calling into question the cross-linguistic validity of the constant-sized nasal gesture hypothesis by Beddor and Cohn & Riehl.
9 The post-nasal laryngeal patterns in the Tswana dialects appear to show a great deal of variation: see the discussions in Boyer & Zsiga (2013), Coetzee et al. (2007), Coetzee & Pretorius (2010), Hyman (2001), and Solé et al. (2010). We concentrate in this paper on the Sengwato Setswana dialect. See the cited works for a detailed discussion of the distribution of Setswana plosives and the full range of postnasal alternations.

Closure duration as a perceptual cue to postnasal laryngeal quality in languages with postnasal aspiration

The studies surveyed up to now have dealt with NT sequences that have a voiceless but non-aspirated oral stop, though it has to be noted that many of the studies described do not explicitly provide details about aspiration. In this section, we review the few acoustic studies of NC sequences in languages that have postnasal aspiration, contrasting NTh with ND and/or NT.
American English is a well-studied language where NTh contrasts with ND (intervocalically). It has been shown in studies like Raphael et al. (1975) for nasal duration and Beddor (2007, 2009) that NTh has shorter nasal and longer oral closure than ND, the expected closure asymmetry for voiceless vs. voiced postnasal stops schematized in Figure 1, above.
In Sukuma (Bantu F.21) a post-nasal voiceless plosive is also realized as aspirated (NTh), contrasting with ND. 10 Maddieson & Ladefoged (1993: 279-280) analyzed data from one speaker of Sukuma, and found that the nasal portion of NTh was significantly shorter than that of ND, while its oral stop closure was almost twice as long as that of ND. The duration of the oral closure in their study depended on the place of articulation, with alveolars and labials tending to be longer than palatals and velars. In Sukuma, NTh vs. ND thus also shows the stop closure asymmetry schematized in Figure 1.
One of the languages where NT contrasts with NTh and where experimental data on this contrast exists is Setswana. Recall from (6) and (7) that Setswana contrasts stem-initial D, T and Th in word-initial position (and intervocalically), but only contrasts T and Th in the postnasal context. 11 Gouskova et al.'s (2011) phonetic study compared NTh to Th for labial and alveolar place of articulation in Setswana, and found that for all six speakers in their study, the duration of both the oral closure and aspiration were very similar in post-vocalic and post-nasal position. They also compared NTh and NT and showed that they do not differ significantly with respect to the duration of the oral closure, which is relatively long for both NT and NTh compared to intervocalic D. Aspiration thus seems to be the main cue distinguishing Th from other plosives in all contexts, and while the closure duration asymmetry is important in distinguishing voiced from voiceless stops in phrase-medial (intervocalic) position, as noted above, this seems not to hold for the contrast between NTh and NT.
Hmong Daw (Hmongic, Viet Nam), another language with a NT vs. NTh contrast, has been investigated by Maddieson & Ladefoged (1993: 259-260), who provide data for one speaker. 12 They show that the duration of nasal and oral closure for both NT and NTh strongly depends on the place of articulation, with very long nasals and very short oral closures for bilabials and shorter nasals but longer oral closures for uvulars. The duration of aspiration, on the other hand, remains similar across the different places of articulation. Though this study does not report on a difference in timing between NT and NTh, the spectrograms and waveforms provided (Maddieson & Ladefoged 1993: 260) indicate that the nasals are longer and the oral closures shorter in NT than in NTh. In Hmong Daw, NT vs. NTh thus has a nasal-oral closure duration asymmetry similar to that illustrated in Figure 1, above, for ND vs. NT in languages with a voicing contrast. Table 1 summarizes the timing asymmetries between nasal and oral closure duration that we could infer from the acoustic studies on languages with a post-nasal aspiration contrast.
10 Sukuma also has aspirated nasals (Nh), and most phonetic studies of Sukuma primarily focus on the phonetic realization and the possible diachronic emergence of this segmental class. See Maddieson (1991); Maddieson & Ladefoged (1993); Huffman & Hinnebusch (1998). However, this set of segments is not relevant to the present study and will not be discussed.
11 The facts are somewhat more complex than this. The interested reader should consult Gouskova et al. (2011) and Zsiga (2018) for details. We rely for our analysis on Zsiga & Tlale Boyer's (2017) and Zsiga's (2018) interpretation of the Setswana data collected in their phonetic studies.
12 Maddieson & Ladefoged (1993) show with their phonetic data that Smalley's (1976) interpretation of the Hmong Daw contrast as one between ND and NTh is incorrect.

Table 1: Languages with a post-nasal laryngeal contrast involving aspiration, and their timing asymmetry patterns, where ">" stands for "In the sequence on the left, nasal is longer and oral closure shorter than in the sequence on the right". For reference: languages with a voicing contrast without aspiration show the pattern ND > NT.

Languages                   Timing asymmetry
Sukuma, American English    ND > NTh
Hmong Daw                   NT > NTh
Setswana                    NT = NTh

The summary in Table 1 shows that NTh has the same pattern as NT schematized in Figure 1 in languages like Sukuma and American English, where it contrasts with ND. In Hmong Daw, where it contrasts with NT, the latter seems to take over the role of ND, and the two show the expected asymmetry. In Setswana, on the other hand, where NTh also contrasts with NT, the two behave the same with respect to timing of nasal and oral closure.
Apart from the findings for Setswana, the results therefore confirm Cohn (1990) and Cohn & Riehl's (2012) hypothesis that an asymmetry in closure duration is a crucial cross-linguistic cue to a postnasal voicing contrast. Setswana points in the direction that aspiration can take over the role as distinguishing factor for a postnasal voicing contrast. These results raise the question of why postnasal aspiration is quite frequently found, if it commonly only serves to redundantly reinforce longer closure duration as a cue to voicelessness. To address this issue, we undertook a phonetic study of NC sequences in Tumbuka, presented in the next section.

A case study of postnasal laryngeal properties in Tumbuka: The role of aspiration
Cross-linguistically, it is quite common for postnasal voiceless stops to be obligatorily aspirated, which arguably enhances a postnasal contrast between voiced and voiceless stops. 13 This is particularly well [...] Kenstowicz & Kisseberth 1977: 211; Nurse & Hinnebusch 1993: 155-157). Cross-Bantu surveys by Kerremans (1980: 169), Hinnebusch (1975), Herbert (1985), Hyman (2001) and Odden (2015) discuss a number of additional cases. Outside the Bantu language family, post-nasal aspiration is found, e.g., in some dialects of Icelandic (Helgason 2001) and in Lizu (Chirkova & Chen 2013), a Tibeto-Burman language spoken in the Sichuan Province of the People's Republic of China. Postnasal aspiration in a language with a two-way laryngeal contrast for stops is illustrated with data from Kongo (Bantu H.14-16, Congo-Kinshasa, Congo-Brazzaville):

Postnasal enhancement of voicing contrast in Kongo (Meinhof 1932: 158; Carter 1984: [...]

Because there are very few phonetic studies of languages with postnasal aspiration, we conducted an investigation of plosive realization in Tumbuka (Bantu N.21, Malawi) to test further Cohn & Riehl's (2012) hypothesis that long oral closure duration is universally the principal cue to postnasal voicelessness and to evaluate the relative importance of aspiration as a cue. Like some other Malawian Bantu languages (e.g., Cinsenga (N.41) and Chichewa (N.31), both discussed in Miti 2001), Tumbuka has a three-way laryngeal contrast (D, T, Th) for plosives, which is neutralized to D vs. Th in the postnasal context. Table 2 provides an overview of the portion of the Tumbuka consonant inventory that is relevant for the present study. 14 The data in (9) illustrate the three-way laryngeal contrast. Unless mentioned otherwise, all the data comes from the first author's elicitation notes.

(9) Tumbuka three-way laryngeal contrast; ku- is the infinitive prefix (Vail 1972: 6 and elicitation notes)
a.  ku-pula 'to pound'                ku-pʰula 'to save'         ku-binkha 'to be dirty'
b.  ku-tola 'to be married [man]'     ku-tʰola 'to pull out'     ku-dangira 'to precede'
c.  ku-kama 'to squeeze; milk'        ku-kʰala 'to dwell; sit'   ku-ganda 'to bump; hit'

These contrasts are partially neutralized, to NTh vs. ND, in the postnasal context. NT does not occur morpheme-internally in Tumbuka, and an underlying N+T is obligatorily realized as NTh. For example, class 9/10 nouns begin with a homorganic nasal consonant, and a following stop can only be voiceless aspirated or voiced, as shown in (10a). Productive postnasal aspiration alternations are found following morphemes which are realized as homorganic nasals, such as the first singular subject prefix (10b, c) or the simple copula (10d). Many more examples like these are found in our elicitation questionnaire, provided in the Appendix.
(10) Postnasal aspiration neutralization in Tumbuka (Vail 1972 and elicitation notes)
a.  ŋkʰalo '9.custom, habit'       mbuzi '9.goat'
b.  wa-ka-ndi-tumila          vs.  ŋ-kʰa-tumikila                  *ŋ-ka-tumikila
    2SUBJ-PAST-me-sent.for         1STSGSUBJ-PAST-sent.for.PASS
    's/he sent me for'             'I was sent for'
c.  ku-tuma 'to send'              n-tʰum-e 'I should send (subjunctive)'
d.  pasi 'on the ground'           m-pʰasi 'it's on the ground'    *m-pasi
    koma 'small, flat basket'      ŋ-kʰoma 'it is a small flat basket'

T and Th only robustly contrast in root-initial position; elsewhere only T occurs. Because root-initial position realizes all phonemic contrasts, we consider it a position of prominence, following work like Beckman (1997). Tumbuka is a phrasal stress language, with the correlates of stress being lengthening of the phrase-penult syllable and association of a High tone with the penult syllable (Downing 2006, 2019). 16 The penult is, then, also considered a position of prominence when it realizes these stress correlates. As work like Hubbard (1994) has shown for other Bantu languages, both root-initial consonants and consonants in the onset of syllables with phrasal stress are commonly longer in duration than other consonants in the word, as one would expect if they are in prominent positions.

14 The present study excludes the fricatives of Tumbuka, /β, v, f, z, s, ɣ, h/, because (i) they only have a two-way voicing contrast and (ii) voiceless fricatives cannot occur after a homorganic nasal (Vail 1972: 6-8). The only occurring homorganic nasal-fricative sequences are /nv/ and /nz/; all other voiced fricatives undergo a hardening process in this environment (Vail 1972: 17).
15 We follow Vail (1972: 6) in calling this place of articulation "palatal" (as is quite common in Bantu literature), as it patterns phonologically with the palatal glide [j] and nasal [ɲ].
16 Vowel length is not contrastive in Tumbuka (Vail 1972).
In the next section, we provide details of our acoustic study. Section 3.2 discusses the implications of our results for the timing of the acoustic events in ND and NTh sequences, and the role of aspiration in maintaining post-nasal voicing contrasts.

Acoustic study
In the following study, we investigate whether the timing asymmetry reported for the acoustic events in ND and NT, with a much longer nasal and a shorter oral segment for ND compared to NT, as shown in Figure 1 and discussed in sections 2.2 and 2.3 above, also holds for the ND vs. NTh contrast in Tumbuka. For this, we compared the duration of the nasal and oral closures in ND and NTh sequences. In addition, we also included the duration of plosives and nasals in intervocalic position, as this allowed us to compare the overall duration of NC clusters to their counterparts in non-cluster position. Furthermore, we investigated whether the aspiration in NTh is comparable in duration to that in Th, i.e., whether aspiration is equally strong and thus can function as a perceptual cue to the laryngeal contrast in post-nasal position. For this, we compared the duration of the aspiration (including the burst) in NTh sequences to that of Th in intervocalic context.

Participants and stimuli
We recorded 7 adult native speakers of Tumbuka (3 male, 4 female) from Northern Malawi, all multilingual. 17 The recordings were made in Mzuzu and Zomba, Malawi, during a fieldwork trip by the first author in 2013, and were conducted in relatively quiet rooms, though some background noise could not be avoided. Recordings were made directly onto a MacBook Pro laptop, using a SoundProjects LSM microphone, with a sampling frequency of 44.1 kHz. The data collection was performed in accordance with the LSA's ethics statement for linguistic fieldwork (https://www.linguisticsociety.org/content/revisedethics-statement).
The participants read sentences presented via a computer screen that contained D, T, Th, ND, NTh and N at the beginning of prominent syllables, mostly stressed, often root-initial. These segments or segment sequences were preceded and followed by a vowel, thus they never occurred sentence-initially. Sentence-initial NC was excluded, as its nasal was often devoiced in line with the findings by Maddieson & Ladefoged (1993), who report nasal devoicing of NCs in utterance-initial position for several languages, including the Bantu languages Kwambi (R.23), Pokomo (E.71) and Bondei (G.24).
Our material contained a total of 108 sentences of varying complexity from 3 to 18 syllables. The full set is provided in the Appendix. An example sentence from our stimulus set is given in (11); the prominent syllable is bolded.
I-PAST-sew 10.mat
'I sewed the mats'

The first word in (11) also contains two instances of NC, which were not included in our analysis, as they did not meet the criteria of being both sentence-medial and in a prominent syllable.
Speakers were asked to produce at least four repetitions of each sentence. Due to background noise that cannot be avoided under fieldwork conditions, several of the tokens had to be excluded. An overview of the number of tokens obtained is given in Table 3.

Annotation and measurements
For each NC sequence, we annotated three acoustic events: nasal closure (N), oral closure (C), and burst with possible aspiration (B). An illustration of this annotation is given in Figure 2 with a representative waveform and spectrogram. In this figure, the preceding and following vowels (V) are also indicated, though they were not included in the analysis. All of the annotations and measurements were performed in Praat (Boersma & Weenink 2017).
The boundary between a vowel and a following nasal was determined on the basis of the abrupt change in formants in the spectrogram and the abrupt change in amplitude and shape of the waveform. This method was also applied for tokens that showed clear instances of vowel nasalization. The boundary between the nasal and the following oral closure phase was again based on changes in formants and amplitude. For NTh sequences, the cessation of voicing was used as an additional criterion. The boundary between oral closure and burst was set where the burst noise started, which was usually clearly visible both in the spectrogram and the waveform. Burst and aspiration noise, though mostly distinguishable, were not annotated or measured separately in the present study. End of aspiration was set at the point where voice bar and vowel formants started. Note that what we labelled as burst plus aspiration (B) included also affrication noise for the palatal place of articulation.
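The segmentation criteria above yield four boundaries per NC token, from which the three measured durations follow by subtraction. The sketch below illustrates this step only; the field names and the boundary times are hypothetical and are not taken from the study's annotation files.

```python
from dataclasses import dataclass


@dataclass
class NCToken:
    """Boundary times in seconds for one annotated NC token (hypothetical layout)."""
    nasal_start: float   # vowel-to-nasal boundary
    oral_start: float    # nasal-to-oral-closure boundary
    burst_start: float   # onset of burst noise
    vowel_start: float   # onset of voice bar and vowel formants


def durations_ms(tok: NCToken) -> dict:
    """Return the durations of N, C and B (burst plus aspiration) in milliseconds."""
    return {
        "N": round((tok.oral_start - tok.nasal_start) * 1000, 1),
        "C": round((tok.burst_start - tok.oral_start) * 1000, 1),
        "B": round((tok.vowel_start - tok.burst_start) * 1000, 1),
    }


# Invented boundary times, for illustration only.
example = NCToken(nasal_start=0.100, oral_start=0.165,
                  burst_start=0.195, vowel_start=0.250)
print(durations_ms(example))
```

Because burst and aspiration were not segmented separately, B in this sketch spans everything from burst onset to the start of the following vowel.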
A few tokens of NTh (all produced by speaker 2) did not have an oral closure phase at all; see the illustration in Figure 3.
As is visible in Figure 3, this token shows neither an abrupt change in form and amplitude of the waveform nor an abrupt change in formants in the spectrogram. It seems to be an instance where the voicing went on right up to the burst, though it is also possible that background noise simply blurred the expected oral closure. Tokens like these were not included in the calculations of the oral closure but were in those of the nasal and aspiration duration. The same speaker also produced a few instances of ND with no clear oral closure phase. Again, these tokens were not included in the calculations of the oral closure.
For plosives and nasals in intervocalic position, we annotated the closure phase (C or N, depending on the item), and for Th items also the burst plus aspiration noise (B). For this, we employed the same criteria as described for the NC items above. We did not annotate or analyze the burst of non-aspirated plosives in intervocalic position.
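The bookkeeping implied by this annotation scheme can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function names `durations_ms` and `mean_duration`, and the two example tokens are hypothetical and are not taken from the study's Praat annotations. The sketch shows how a token without a detectable oral closure can still contribute to the nasal (N) and burst-plus-aspiration (B) averages while being left out of the oral closure (C) average.

```python
from statistics import mean

def durations_ms(intervals):
    """Map each annotated label (N, C, B) to its duration in milliseconds."""
    return {label: (end - start) * 1000.0 for label, start, end in intervals}

# Invented example tokens (boundary times in seconds), NOT data from the study.
tokens = [
    {"type": "NTh",
     "intervals": [("N", 0.100, 0.165), ("C", 0.165, 0.198), ("B", 0.198, 0.250)]},
    # A token with no detectable oral closure: no C interval was annotated.
    {"type": "NTh",
     "intervals": [("N", 0.200, 0.270), ("B", 0.270, 0.330)]},
]

def mean_duration(tokens, label):
    """Average a label's duration over only those tokens that carry the label."""
    values = []
    for token in tokens:
        durations = durations_ms(token["intervals"])
        if label in durations:
            values.append(durations[label])
    return mean(values) if values else None

for label in ("N", "C", "B"):
    print(label, mean_duration(tokens, label))
```

Averaging per label rather than per token is what allows closure-less tokens to be excluded from one measure without discarding them altogether.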
Because our recordings were relatively noisy (fieldwork conditions with background noise), we had to refrain from measuring the percentage of voicing in nasal or oral closure phases and the amplitude of the bursts.

Results
The statistical analyses were carried out with linear mixed-effects models by using the lme4 package (Bates et al. 2015) in R (R Development Core Team, 2016). The statistical models featured the dependent variable duration of nasal closure, oral closure, or aspiration noise. Predictors were type, i.e., ND vs. NTh sequence (for the dependent variables duration of nasal and oral closure), or Th vs. NTh (for the dependent variable duration of aspiration), and place of articulation, all contrast-coded. Additionally, two random variables were taken into consideration: speaker and word. The models also account for type and place, and the interaction between them, as random slopes per speaker. Statistical significance was assessed by employing the lmerTest package (Kuznetsova et al. 2017).
Figure 4 summarizes the results of the duration measurements. We first compared the duration of the nasal parts of NTh and ND. The mean duration of the nasal in ND is 70 ms, which is 6.67 ms longer than in NTh (p = 0.012). There was a significant effect of place of articulation, with labial nasal closures being longer than alveolar ones (by 14.3 ms; p = 0.0003), and velar longer than palatal ones (by 10.6 ms; p = 0.0069), but no interaction between type and place.
For the duration of the oral closures in NTh and ND, we again found a difference: the oral component of ND had a mean duration of 26 ms, and was 7.63 ms shorter than that of NTh (p = 0.0243). There was also an effect of place of articulation, as the closure duration for a front articulation (labial or alveolar) was significantly longer than that for a back articulation (palatal and velar) (by 5.15 ms; p = 0.049). Furthermore, labials had longer closure durations than alveolars (by 8.03 ms; p = 0.029). Again, there was no interaction between type and place.
The durations of nasal and oral closures in NTh and ND for each speaker are provided in Figure 5. We can see in Figure 5 that for all speakers the duration of the nasal closures (white boxes) is considerably longer than the duration of the oral closures (grey boxes), both in ND (left two columns in each speaker panel) and NTh (right two columns). Furthermore, the nasal closures are slightly longer in ND than in NTh for all speakers. With respect to the oral closure, five speakers show a slightly shorter closure in ND than in NTh, while two speakers (5 and 6) show a minimal difference in the other direction. For none of the speakers does the duration of the nasal approach the duration of the stop in NTh, in contrast to the almost equal duration of oral and nasal closure in NT predicted by Cohn & Riehl (2012), cf. Figure 1.
When the durations in the NC sequences are compared to those of intervocalic segments, the mean duration of the intervocalic single nasal is longer than the nasal parts in the NC sequences, namely almost as long as the nasal and oral closures of NC taken together, and the oral closures of NC are much shorter than the oral closures for intervocalic plosives.
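To make the phrase "contrast-coded" concrete, the following sketch builds one possible set of orthogonal contrasts for type and place. The three place comparisons (front vs. back, labial vs. alveolar, velar vs. palatal) are the ones reported in the results above; the numerical coding values (±0.5) and all identifiers are assumptions for illustration, since the exact coding scheme used in the models is not specified in the text.

```python
from itertools import combinations

PLACES = ["labial", "alveolar", "palatal", "velar"]

# Assumed coding values (+/-0.5); only the comparisons themselves come from the text.
PLACE_CONTRASTS = {
    "front_vs_back":      {"labial": 0.5, "alveolar": 0.5, "palatal": -0.5, "velar": -0.5},
    "labial_vs_alveolar": {"labial": 0.5, "alveolar": -0.5, "palatal": 0.0, "velar": 0.0},
    "velar_vs_palatal":   {"labial": 0.0, "alveolar": 0.0, "palatal": -0.5, "velar": 0.5},
}
TYPE_CONTRAST = {"ND": -0.5, "NTh": 0.5}

def encode(seq_type, place):
    """Return the numeric predictor row for one token."""
    row = {"type": TYPE_CONTRAST[seq_type]}
    for name, weights in PLACE_CONTRASTS.items():
        row[name] = weights[place]
    return row

def dot(a, b):
    """Dot product of two contrasts over the four places."""
    return sum(a[p] * b[p] for p in PLACES)

# Each contrast sums to zero and every pair is orthogonal.
sums = {name: sum(w.values()) for name, w in PLACE_CONTRASTS.items()}
orthogonal = all(dot(PLACE_CONTRASTS[a], PLACE_CONTRASTS[b]) == 0
                 for a, b in combinations(PLACE_CONTRASTS, 2))
print(sums, orthogonal)
print(encode("NTh", "velar"))
```

With a coding of this kind, each place coefficient in a model corresponds to one of the pairwise or grouped comparisons rather than to a difference from a single baseline level.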
For the analysis of the duration of aspiration, we took into consideration that the palatals are affricates while all other places of articulation are stops, and that their affrication noise was included in the aspiration phase in our annotation, which we therefore expected to be much longer than the aspiration duration of the other places of articulation. For this reason, we created two separate models, one for the palatals and one for the other three places of articulation. The factor that following high vowels can cause longer aspiration noise (see, e.g., Yavaș 2009) was not included in our models.
For the palatals, there was a main effect of type, and aspiration plus affrication for NTh was longer than that for Th (by 28.85 ms, p = 0.00651). For the other places of articulation, there was also a main effect of type, but here the aspiration for NTh was shorter than that for Th (by 15.29 ms, p = 0.0131). There was no interaction with place of articulation for the non-palatals. Aspiration duration per speaker is given for the plosives in Figure 6, and for the palatal affricates in Figure 7.
As Figure 6 shows, the aspiration for the plosives is shorter in NTh than in Th for all speakers but one; speaker 2 has a similar duration for the two and shows remarkable variation in the aspiration duration for NTh. The palatal affricate in Figure 7 shows the reverse tendency: here, the aspiration plus affrication is longer for NTh than for Th, for all speakers. This difference between affricates and the plosives justifies the separate statistical analyses we performed. In sum, none of the speakers show durational patterns that are in contrast with the averages.
Figure 6: Duration of aspiration noise in NTh (grey) and Th (white) for the plosive stops split by speakers.
Figure 7: Duration of aspiration noise in NTh (grey) and Th (white) for the palatal affricates split by speakers.

Discussion of phonetic results
Following the stop closure asymmetry hypothesis (Cohn & Riehl 2012) discussed in section 2.2 above, we expected the nasal closure to be considerably shorter in NTh than in ND sequences, and the oral stop closure to be considerably longer in NTh than in ND sequences, as schematized in the upper two rows of Figure 8. Our results, however, give a different picture, as can be seen by comparing the upper part of Figure 8 with the lower part. Though the nasal in ND is longer than that in NTh (by 6.67 ms) and the oral closure in ND is shorter than that in NTh (by 7.63 ms; both statistically significant), the duration of both the nasal and the oral closures in ND and NTh are very similar. We thus did not find in Tumbuka the same striking closure duration asymmetry between ND and NT that Cohn & Riehl (2012) predicted to be universal. The individual data that was given in Figure 5 showed that all of the speakers have the same relative durational difference, and none of them matches Cohn & Riehl's prediction. We conclude that the voiceless oral closure and its closure duration might be less reliable cues to a postnasal laryngeal contrast in Tumbuka than in languages that display Cohn & Riehl's durational asymmetry more clearly, contradicting Cohn's (1990), Cohn & Riehl's (2012) and Riehl's (2008) proposal that stop closure duration asymmetry is universally the most important cue for a postnasal voicing contrast. This is supported by the fact that in our data from Tumbuka we found instances of NTh where we could not detect a clear oral closure, cf. Figure 3.
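The size of the asymmetry can be made explicit with a short calculation from the means reported in the results: 70 ms for the nasal in ND, 26 ms for the oral closure in ND, and the differences of 6.67 ms and 7.63 ms. The sketch below simply derives the implied NTh means and the nasal-to-oral ratios. The variable names are hypothetical, and the derived NTh values are arithmetic consequences of the reported figures, not independently reported measurements.

```python
# Means and differences as reported in the results (in ms).
nd_nasal = 70.0
nd_oral = 26.0
nasal_diff = 6.67   # nasal in ND is longer than in NTh by this amount
oral_diff = 7.63    # oral closure in ND is shorter than in NTh by this amount

# Implied NTh means (derived here, not separately reported).
nth_nasal = nd_nasal - nasal_diff
nth_oral = nd_oral + oral_diff

# Nasal-to-oral duration ratio for each sequence type.
ratio_nd = nd_nasal / nd_oral
ratio_nth = nth_nasal / nth_oral

print(f"NTh nasal {nth_nasal:.2f} ms, NTh oral {nth_oral:.2f} ms")
print(f"nasal/oral ratio: ND {ratio_nd:.2f}, NTh {ratio_nth:.2f}")
```

On these figures the nasal portion stays clearly longer than the oral closure in both sequence types, whereas a ratio close to 1 would be expected for the voiceless sequence if nasal and oral closures were almost equal in duration.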
The small but significant difference in oral stop duration between ND and NTh that we found in the present study could be entirely due to the fact that the closure duration of voiceless stops is generally longer than that of voiced stops: see, e.g., Lisker (1957) for intervocalic stops in English, Slis & Cohen (1969) for Dutch, Keating (1980) for Polish, Abdelli-Beruh (2004) for French, and also Misnadin (2016) for Madurese with a three-way laryngeal contrast, where the voiced stops have shorter closures than the voiceless aspirated and non-aspirated stops.
We propose that the presence of aspiration, with a duration that is comparable to aspiration in simplex stops, is a crucial perceptual cue to the voicelessness of a postnasal stop in Tumbuka, overriding its short stop closure duration that might otherwise cue voicing. Aspiration thus also seems to be a crucial cue to the voicing contrast in postnasal position in Tumbuka.
Our finding that place of articulation has an effect on the duration of nasal and oral closure is in line with the cross-linguistic observation that the duration of both closures decreases when one moves from a more front place of articulation to one further back, forming a gradation from labial to alveolar to velar (Stathopoulos & Weismer 1983;Maddieson 1996). Since the focus of the present study is the contrast in postnasal voicing, we will not take up this finding in the remainder of the article.
To sum up the discussion in sections 2 and 3, one finds clear cross-linguistic patterns in homorganic nasal-stop sequences, such as postnasal voicing or the implementation of the voiceless postnasal stop with a long closure duration, and these tendencies have phonetic motivations. However, this does not lead to phonetic determinism in the laryngeal quality of postnasal stops. As Solé (2012) has also observed, languages can draw on different phonetic strategies in their realization of postnasal obstruents. We have seen in the analysis of Tumbuka in this section that the long silent closure phase shown to be typical for voiceless postnasals in other languages is not present, and that instead aspiration most plausibly is the main cue to voicelessness. In that respect, Tumbuka is similar to Setswana, discussed in section 2, where aspiration in NTh is the only cue to distinguish it from NT, since both NTh and NT have long oral closure phases. Oral closure duration is thus a parameter in which Setswana differs from Tumbuka, which emphasizes that closure duration and aspiration are distinct cues to postnasal laryngeal quality. Two strategies, lengthening the oral closure duration and use of aspiration, are thus independently possible, both resulting in a strong release burst to enhance the percept of voicelessness.

A phonetically-motivated phonological account of postnasal processes and counterprocesses
To account for the patterns surveyed in sections 2 and 3, we propose that languages can choose among several cues to implement postnasal laryngeal contrasts, the main ones being: to increase or decrease the duration of stop closure and the concomitant increase or decrease of the duration of nasal closure, and the additional use of aspiration. In this section, we take up how these phonetic cues can be incorporated into a phonological analysis, adopting the Bidirectional Phonetics & Phonology model (BiPhon; Boersma 2007), of the processes and counterprocesses discussed in Hyman's (2001) survey of postnasal NC alternations, repeated in (12) for convenience, while avoiding phonetic determinism:
(12) Postnasal processes and counterprocesses in Bantu (adapted from Hyman 2001: 169, fig. (38)), repeated from (1)
Section 4.1 elaborates the assumptions about the phonetics-phonology interface that motivate our analysis, focusing on postnasal (de-)voicing (12a) and (de-)aspiration (12b). In section 4.2, we then discuss possible phonetic motivations for the processes of (de-)affrication (12c) and (de-)nasalization (12d). Due to a lack of sufficient phonetic studies, the discussion in section 4.2 is necessarily somewhat speculative.

Phonetic motivations and their incorporation into phonological analyses
On the basis of the attested patterns summarized in (12), Hyman (2001) shows that a phonetically-motivated constraint like *NT is too restrictive in predicting the possible range of postnasal obstruents. Hyman argues that this problem exists with all accounts that incorporate phonetics directly into phonology, because "phoneticizing" phonology in the way that phonetically-defined constraints like *NT do, almost necessarily defines a narrower range of possible phonological systems than what is attested. Another point of criticism that can be raised against many approaches that directly incorporate phonetics into phonology is that phonetic gestures and auditory information are not the equivalent of abstract, categorical phonological units: the two types of representations are incompatible and should not be conflated. Confusion often arises because the same symbols are used to represent, on the one hand, a phonetically detailed transcription of acoustic characteristics or articulatory gestures, and on the other hand, abstract phonological entities such as feature bundles. For example, the IPA symbol [t] represents both a segment realized phonetically with tongue tip raising and with total obstruction in the vocal tract but without vocal fold vibration, as well as a segment characterized by the phonological feature bundle [-voice, -sonorant, +consonantal, CORONAL].
(For a more detailed critique, see work like Hamann 2011: 211f.)

A model that distinguishes phonology and phonetics
These issues related to how phonetic information is best incorporated into phonological theory are directly addressed in the Bidirectional Phonetics and Phonology (henceforth, BiPhon) grammar model, developed by Boersma (2007), which we will employ in the analyses in this section. BiPhon keeps the phonetic and the phonological modules distinct, but models them together, and in so doing is able to account not only for more obviously phonetically-motivated processes but also for corresponding apparent counterprocesses, as we show below. The phonological module of BiPhon consists of two representations, an underlying and a surface phonological form, in line with traditional generative phonological theory as put forward in work since Chomsky & Halle (1968). We follow work like Pierrehumbert (1990) and Cohn (1993) in assuming that while the phonological module deals with abstract, discrete and symbolic categories and phonological processes, the phonetics module is responsible for gradient, continuous, physical categories and processes, involving auditory and articulatory representations. How these four representations are ordered and connected in BiPhon is shown in Figure 9. Note that we follow here the BiPhon notation of enclosing underlying forms in pipes (| |), phonological surface forms in slashes (/ /), and phonetic forms, both auditory and articulatory, in square brackets ([ ]). An additional difference between BiPhon and traditional grammar models is that the latter only define phonology as the mapping of underlying to surface form, i.e., the production direction, while BiPhon also includes the reverse processing direction, i.e., perception (from auditory to surface form) and recognition (from surface to underlying form). The recognition process is responsible for the undoing of phonological processes to enable listeners to access the underlying forms stored in the lexicon.
The mappings between the forms that hold for one processing direction are also used for the other processing direction, hence the bidirectionality, with the exception of sensorimotor mappings between the auditory and the articulatory form, which only play a role in the production direction.
The representations and mappings assumed in BiPhon are not restricted to a specific type of formal implementation, and both OT and Neural Network formalizations exist. (For the former, see, e.g., Boersma 2007, Boersma & Hamann 2009; for the latter, see, e.g., Boersma, Benders & Seinhorst 2020.) Our analyses below employ OT-BiPhon.
sequences and are devoid of phonetic content. They are thus arbitrary and language-specific, in line with substance-free phonological approaches such as Hale & Reiss (2000), Blaho (2008), Iosad (2012), and Hall (2014). They contrast with traditional markedness constraints employed in OT, which need to be motivated by extra-grammatical considerations such as articulatory difficulty, perceptual saliency, cross-linguistic tendencies and/or acquisitional biases. (See Bermúdez-Otero & Börjars 2006 for discussion.) Similarly, in BiPhon phonological features are considered to be substance free and constructed simply on the basis of the phonological behavior that the learners observe in their language, for example the need to group segments into classes that undergo or trigger phonological processes. (For similar ideas, see, e.g., Mielke 2008.) That these phonotactic restrictions and phonological features are nevertheless quite often phonetically grounded is due to the fact that they are learned on the basis of, and connected to, the phonetic representations described in the next section.

The phonetics module and cue constraints as phonetics-phonology interface
In BiPhon, the phonetic module consists of both auditory and articulatory representations. The auditory form is assumed to be primary and is directly linked to the phonological surface form, since language learners of spoken languages start with auditory cues and need to learn how to map them onto slowly emerging phonological representations. The mapping of auditory representations onto phonological representations is therefore considered the locus of the interface between phonetics and phonology and will be the focus of the analysis in this section.
19 For the sake of brevity and clarity, we have excluded from our formalization the role of underlying laryngeal contrasts and whether a process results in neutralization (as for instance in Kimatuumbi where both NT and ND are realized as [ND], or in Zulu where both NTh and NT' are realized as [NT'], i.e., with ejection) or in enhancement (e.g., in Kongo where NT is realized as [NTh] and is therefore perceptually more distinct from [ND]). For a detailed formalization of the neutralization cases in Tumbuka and Zulu which includes the role of FAITHFULNESS constraints to underlying laryngeal contrasts, see Hamann & Downing (2017).
The mapping between auditory cues and phonological categories is formalized with cue constraints. To give an example of a cue constraint of relevance to the present study, the following observation "the presence of vocal fold vibration during closure in the auditory signal should not be interpreted as a voiceless stop in the surface phonological form" can be formalized as the cue constraint *[˷] /T/, where the small wiggly line stands for the periodic low-frequency murmur that vocal fold vibration causes during a stop closure. The cue constraint, *[ _ ] /D/, formalizes that the presence of a silent closure phase "[ _ ]" should not be interpreted as a voiced stop. Note that the negative formulation of these constraints reflects the fact that in perception it is usually not only a single cue in the form of the highest-ranked positive constraint that decides on the winner, but an interaction of several cues, though this only becomes visible in tableaux with more than two candidates (see, e.g., Boersma & Hamann 2009). The tableaux in (13) illustrate how the two cue constraints that were just introduced evaluate the perception of input auditory forms with low-frequency murmur (left) and silence (right) during closure.
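The evaluation performed by such perception tableaux can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not an implementation of BiPhon itself: the function names `violations` and `perceive`, the string labels for the cues, and the ASCII stand-in `~` for the murmur symbol are all hypothetical. Each cue constraint is read as "do not perceive this cue as this category", and the candidate with the best violation profile under strict ranking wins.

```python
MURMUR, SILENCE = "murmur", "silence"

# Cue constraints as (name, auditory cue, surface category it must not map to).
NO_MURMUR_AS_T = ("*[~]/T/", MURMUR, "/T/")    # "~" stands in for the murmur symbol
NO_SILENCE_AS_D = ("*[_]/D/", SILENCE, "/D/")

def violations(constraint, cues, candidate):
    """1 if the cue is present in the input and the candidate is the banned category."""
    _, cue, category = constraint
    return int(cue in cues and candidate == category)

def perceive(cues, ranking, candidates=("/T/", "/D/")):
    """Return the candidate(s) whose violation profile is best under strict ranking."""
    profiles = {c: tuple(violations(k, cues, c) for k in ranking) for c in candidates}
    best = min(profiles.values())   # tuples compare lexicographically, like ranked constraints
    return [c for c in candidates if profiles[c] == best]

ranking = [NO_MURMUR_AS_T, NO_SILENCE_AS_D]
print(perceive({MURMUR}, ranking))    # closure with murmur only
print(perceive({SILENCE}, ranking))   # silent closure only
```

With a single cue in the input, the ranking of the two constraints does not matter: murmur alone is perceived as /D/ and silence alone as /T/, because only one constraint is ever violated.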

Postnasal voicing and aspiration
Recall from the discussion in section 2, above, that for articulatory reasons a continuation of voicing from the nasal into the following voiceless stops occurs in NT sequences. Therefore, the presence of some voicing during closure [˷] alone is an insufficient cue to exclude /NT/, and might lead listeners to confuse /NT/ and /ND/. This is formalized with the perception tableau in (14), where [n] in the auditory input stands for the lower-frequency formant structure that is typical for nasals, and [˷_ ] for a stop closure that first shows some voicing murmur and is then silent.
(14) indicate that due to the contradictory auditory cues, both /ND/ and /NT/ are valid percepts. The result of such contradictory cues can be that language learners might give more weight to one cue than the other. Quite frequently, the presence of voicing is given more weight than the presence of silence in the closure, a tendency aided by the fact that silence is easily masked by background noise. Formally, this assignment of more weight to the voicing cue means that the cue constraint *[˷] /T/ is ranked higher than *[ _ ] /D/, as exemplified in tableau (15).
In a subsequent production of /ND/, language learners will implement it without any silence in the closure, due to the bidirectional use of the cue constraint *[ _ ] /D/, which in the production direction is interpreted as "a voiced stop in the surface form should not be realized with a silent stop closure in the auditory signal." Alternatively, language users can enhance the postnasal contrast schematized in (14) by lengthening the voiceless part of the closure phase to [n˷__ ], which also results in a stronger burst noise [ t ], as exemplified in tableau (16).
(17) is enhanced by postnasal aspiration, cf. (12b). Whether this voiceless plosive is interpreted as phonologically aspirated or not depends on phonological factors in the language, hence our ambiguous notation /NT(h)/.
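The dependence of the percept on the ranking of the two cue constraints can be illustrated with a small factorial typology for the ambiguous input, i.e., a closure containing both murmur and silence. The sketch rests on assumptions that should be read as such: unranked constraints are modelled as one stratum whose violations are summed, which is a simplification adopted here for illustration, and the identifiers (`CONSTRAINTS`, `profile`, `winners`) and the ASCII `~` for the murmur symbol are hypothetical.

```python
from itertools import permutations

# name -> (auditory cue, surface category the cue must not be mapped to)
CONSTRAINTS = {
    "*[~]/T/": ("murmur", "/NT/"),    # "~" stands in for the murmur symbol
    "*[_]/D/": ("silence", "/ND/"),
}

def profile(cues, candidate, strata):
    """Violations per stratum; constraints sharing a stratum are unranked (assumption)."""
    return tuple(
        sum(1 for name in stratum
            if CONSTRAINTS[name][0] in cues and CONSTRAINTS[name][1] == candidate)
        for stratum in strata
    )

def winners(cues, strata, candidates=("/ND/", "/NT/")):
    """Candidates with the lexicographically smallest violation profile."""
    profiles = {c: profile(cues, c, strata) for c in candidates}
    best = min(profiles.values())
    return [c for c in candidates if profiles[c] == best]

ambiguous = {"murmur", "silence"}     # closure that is first voiced, then silent

results = {"unranked": winners(ambiguous, [list(CONSTRAINTS)])}
for order in permutations(CONSTRAINTS):
    results[" >> ".join(order)] = winners(ambiguous, [[name] for name in order])

for ranking, percepts in results.items():
    print(f"{ranking:22} -> {percepts}")
```

Under these assumptions, the unranked grammar leaves both /ND/ and /NT/ as possible percepts, ranking the murmur constraint higher yields /ND/ (postnasal voicing), and the opposite ranking yields /NT/.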

Postnasal devoicing and de-aspiration
Setswana is a language that departs from the common solution to the ambiguous voicing cues in [n˷_] by weighing the silence part in the closure more importantly, rather than by giving more weight to the coarticulatorily caused presence of short murmur (as in tableau (15)). The result of reranking these two cues is schematized in (18). In the subsequent articulation of /NT/, speakers will try to employ the bidirectional cue constraint *[˷] /T/, "a voiceless stop in the surface form should not be realized with a voiced stop closure in the auditory signal," and not realize /NT/ with any voicing. However, since this voicing is a co-articulatory by-product that cannot be avoided, speakers will tend to implement a longer silent closure phase, i.e., [n˷__ t ], to make the silence phase longer and therefore more salient. This weighting of cues by the listeners and the ranking of the respective cue constraints in (18) gives rise to postnasal devoicing, cf. (12a). This shows that a seemingly unnatural process can arise due to the fact that some languages assign more importance to different cues than other languages, so that the choice of how perceptual cues are ranked is language-specific. 20
20 An alternative, diachronic analysis of postnasal devoicing in Sotho-Tswana and other languages is provided by Beguš (2019), adopting proposals from Hyman (2001) and Dickens (1984). In this approach, the process of postnasal devoicing is hypothesized to involve a 3-step blurring process: 1-all stops spirantize unless they are postnasal; 2-all stops devoice (this would only affect postnasal stops, since these are the only stops following step 1); 3-all fricatives become stops, reintroducing the voiced stops eliminated by step 2. As we can see, these three steps define a Duke of York gambit, which has been critiqued since Pullum (1976; see Gleim 2019 for recent discussion). Furthermore, two of the crucial steps in this diachronic scenario are controversial, at best. Both Hyman and Dickens suggest that proto-Sotho-Tswana might not have had a series of voiced stops but rather, voiced fricatives, eliminating step 1. Second, it is controversial whether [voice] is synchronically a contrastive feature for Setswana stops. Since [d] is an allophone of /l/, [g] does not occur, and /b/ is variably realized as [β], step 3 does not seem to have applied to restore a set of voiced stops in Setswana. (See Hyman 2001 and Zsiga 2018 for discussion.) If the blurring process that leads to postnasal devoicing requires that all three steps be attested for a language, then it is unclear whether it accounts for synchronic Setswana. We would be equally cautious in proposing that other languages which have been described as having postnasal devoicing should be given the same analysis as the one we provide for Setswana. Until such time as careful phonetic documentation of the phenomena exists for all the languages which have postnasal devoicing, however, the issue must be set aside.
A factor that plays an important role in the language-specific choice of cues is whether a cue is already used for other contrasts in the language. Tumbuka, for instance, employs aspiration contrastively in the non-nasal context, and is therefore more likely to also employ this cue in postnasal position, to resolve a perceptually non-salient contrast. Zulu, just like Tumbuka, has a three-way laryngeal contrast, but employs optional ejection ['] as a cue for the plain voiceless series. In addition, Zulu has clicks with the same laryngeal contrasts as the stops, but with partly affricated releases that mask aspiration. As a result of this masking, aspiration is unreliable in distinguishing between aspirated and plain voiceless clicks. Because these cues are not restricted to clicks, aspiration is an unreliable cue to distinguish between aspirated and plain voiceless stops in general in Zulu.
In terms of a BiPhon formalization, this means that the cue constraint *[ th ] /T/, "The presence of aspiration noise in the auditory signal should not be interpreted as plain voiceless stop in the surface form," and the cue constraint *[ t ] /Th/, "The absence of aspiration noise in the auditory signal should not be interpreted as an aspirated voiceless stop in the surface form," are both ranked low in Zulu. A perception tableau for a postnasal voiceless stop with aspiration noise illustrating these constraints is given in (19).
(19) formalizes the non-salience of aspiration noise as a cue to voicelessness in Zulu. Optional ejection, on the other hand, is a very salient cue for non-aspiration, as the highest-ranked cue constraint in (19) and (20)
Zulu hence optimally neutralizes NT/NTh in favor of (optionally ejective) NT, a seemingly unnatural process of postnasal de-aspiration, cf. (12b), since this results in a perceptually distinct postnasal contrast which does not require additional articulatory effort for speakers of a language that already employs ejection articulations and ejection noise as perceptual cues.
21 As mentioned already in section 2.2, the closely related language Xhosa shows a long stop closure duration in the plain voiceless /T/, especially in postnasal position, which distinguishes it from aspirated /Th/ (Jessen 2002). If long closure duration is also employed as cue for /T/ in Zulu, then the perception tableau in (20) should additionally include a high-ranked constraint *[__] /Th/ that prefers /NT/, strengthening our argument that in this language /NT/ has much stronger perceptual cues than /NTh/.

(De-)affrication and (de-)nasalization: Possible phonetic motivations
In this section, we take up the possible phonetic motivations for the processes and counterprocesses of (de-)affrication and (de-)nasalization schematized in (12c, d). Most of these alternations have not received the same phonetic attention as the ones presented in preceding sections, so our discussion is necessarily somewhat speculative. The discussion is intended to provide hypotheses for future phonetic and phonological investigation of these processes by proposing the kinds of perceptual cues that could be relevant in accounting for the data.
(21) Kongo postnasal affrication (Carter 1984); ku- is the infinitive prefix
a. ku-N-fila → ku-m-pfila 'to lead me'
b. ku-N-vuna → ku-m-bvuna 'to deceive me'
c. ku-N-siba → ku-n-tsiba 'to curse me'
Carter (1984) attributes postnasal affrication to premature velic closure (see, too, Ohala 1993 and Solé 2012), while Zsiga (2018) and Zsiga & Tlale Boyer (2017) consider affrication another aspect of fortition in the postnasal context. (Note in (12) that Kongo, Zulu and Setswana maintain voiceless stops postnasally.) Warner's (2001) study of stop epenthesis in English and Dutch proposes that both articulatory and perceptual factors potentially motivate postnasal affrication. Premature velic closure can be analyzed, articulatorily, as a misalignment of the nasal release with the onset of frication. The epenthetic stop also provides a perceptual cue to the place of the nasal that is obscured in the pre-fricative context. In short, affrication has a clear phonetic motivation.
De-affrication is also attested in the postnasal context (though, unfortunately, there are no phonetic studies that we know of for this phenomenon). De-affrication is illustrated with data from Shona cited in Hyman (2001: 170):
(22) Shona de-affrication (Hannan 1984)
a. -bvuma 'agree, admit'    m-vum-o 'permission, agreement, class 9'
b. mu-dzuwe 'swing, class 3'    n-zuwe 'swing, class 10'
Our proposal is that de-affrication could be perceptually motivated by the short stop closure duration that we have shown in sections 2 and 3 is commonly found in the postnasal context, both with voiced and voiceless stops. It is plausible to assume that reduction of the stop closure would lead to de-affrication: recall from section 3 that in Tumbuka the plosive closure duration of postnasal aspirated stops was in some instances almost non-existent. An articulatory motivation for de-affrication is again misalignment of nasal release with the onset of frication that has also been proposed for affrication, though in this case the velic movement is later than expected (both types of misalignment are due to the relative sluggishness of the velum compared to other articulators).
Carter (1984) proposes that postnasal denasalization (12d) in Kongo has the same articulatory motivation as postnasal affrication in the same language, namely, premature velic closure. (See, too, Solé 2012.) Data illustrating denasalization in closely related Yaka (Bantu H.30; Congo-Kinshasa, Angola) is given in (23b), where N- is the 1st singular subject prefix and -idi is a past tense suffix. As we can see, the underlying stem-initial nasal is denasalized following the nasal prefix. The examples in (23) illustrate that N+N and N+D neutralize to N+D in the postnasal context.

Conclusion
As an extension of Hyman's (2001) influential study, we have shown the importance of studying the phonetic motivation of not only common but also not so common postnasal laryngeal alternations in more detail, paying special attention to the potential auditory cues that might be present, such as aspiration noise in Tumbuka postnasal voiceless plosives. This is a necessary step away both from a too simplistic assumption of "phonetic determinism", i.e., the assumption that phonetic motivation has to be directly incorporated into phonology, and also from the assumption that every phonological contrast employs the same phonetic cues in every language. We have shown that by keeping the articulatory and perceptual motivations in the phonetics and letting them interact with the phonological representations, the BiPhon model is able to account for the influence of phonetics on phonology (and vice versa), without lumping the two indiscriminately together. We have further argued that other contrasts besides the postnasal one, i.e., the phoneme inventory and set of phonological contrasts as a whole, are important for an account of the phonetics and phonology of postnasal processes in any specific language.
Follow-up perceptual studies are needed to test the use of potential auditory cues to complement the existing acoustical research, and in the case of Tumbuka, to test to what extent aspiration is used as a perceptual cue to the ND-NTh contrast. Furthermore, large-scale whole-language studies are necessary to enable a complete account of all auditory cues to phonemic contrasts that are of relevance in any given language. In Tumbuka, for example, another potential cue that might play a role in the perception of postnasal laryngeal contrast is the duration of the preceding vowel (see the acoustic study by Hamann, Miatto & Downing 2019). Furthermore, the fact that closure duration, besides cuing voicing, is also a cue to place of articulation (cf. the acoustic findings in section 3.1.3, which are in line with the universal tendency that closure duration decreases when place of articulation moves back in the vocal tract) would need to be taken into account in such whole-language simulations.
In conclusion, we agree with Hyman (2001: 172) that "the existence of […] 'processes' vs. 'counterprocesses' may simply highlight the richness and complexity of the phonetics-phonology interface." We hope our study has confirmed Hyman's suspicion that "both the processes and counter-processes are phonetically driven, but by different, sometimes contradictory demands" that can be resolved differently due to differences in the overall phonetic and phonological properties of different languages.

Abbreviations
We follow the Leipzig glossing rules (https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf) for the abbreviations in the interlinear glosses, as adapted for Bantu languages by van de Velde et al. (2019). Note that numbers in the glosses refer to noun agreement classes. We have abbreviated the glossing somewhat to help the reader focus on the aspects of the morphology that are important in our study, i.e., positions of prominence and contexts for alternations. Detailed discussion of Tumbuka morphology can be found in Vail (1972) and Chavula (2016). Finally, note that the Tumbuka sentences are given in the orthography, as this is the form that was presented to the speakers when we made the recordings.

a-mama wa-ku-pura ngoma
1a-woman 1.SUBJ-PROG-pound 9.maize
The woman is pounding maize.

17Q
kasi ŵ-a-khara pa mi-pando panji pasi?
Q 2.SUBJ-PRF-sit LOC 4-chair or on.the.ground
Are they sitting on chairs or on the ground?

17A
m-phasi pa waka
COP-on.the.ground bare/merely
It's on bare ground.

59
n-khu-kunkhula pasi na ŵa-na
1SG.SUBJ-PROG-roll on.the.ground with 2-child
I am rolling on the ground with the children.

65
pulikizga=ni ma-kani sono
listen=PL 6-news now
Listen to the news now.