“Natural” stress patterns and dependencies between edge alignment and quantity sensitivity

We conducted an artificial language learning experiment to study learning asymmetries that might reveal latent preferences relating to, and any dependencies between, the edge alignment and quantity sensitivity (QS) parameters in stress patterning. We used a poverty of the stimulus approach to teach American English speakers an unbounded QS stress rule (stress a single CV: syllable) and either a left- or right-aligning quantity-insensitive (QI) rule if only light syllables were present. Forms with two CV: syllables were withheld in the learning phase and added in the test phase, forcing participants to choose between left- and right-aligning options for the QS rule. Participants learned the left- and right-edge QI rules equally well, and also the basic QS rule. Response patterns for words with two CV: syllables suggest biases favoring a left-aligning QS rule with a left-edge QI default. Our results also suggest that a left-aligning QS pattern with a right-edge QI default was least favored. We argue that stress patterns shown to be preferred based on evidence from ease-of-learning and participants’ untrained generalizations can be considered more natural than less favored opposing patterns. We suggest that cognitive biases revealed by artificial stress learning studies may have contributed to shaping stress typology.


Introduction
Half a century of research in metrical phonology has delivered a well-articulated typology of word stress patterns found in the world's languages along with formal theories intended to account for asymmetries in the stress typology (Liberman & Prince 1977; Hayes 1985, 1995; Prince 1983; Hammond 1984; Halle & Vergnaud 1987, among others). All post-SPE metrical theories are componential in that they model stress patterns in terms of a limited set of interacting parameters with opposing settings. 1 One such parameter is edge alignment: stress patterns orient to either the left or right edge of the prosodic word. A second key parameter is quantity (or weight) sensitivity. In quantity-sensitive (QS) stress patterns, word stress is sensitive to the internal structure of syllables. In QS languages, syllables with two morae are stress attractors. There are no known weight-sensitive languages in which stress is preferentially assigned to monomoraic syllables.
* The authors are grateful to Rajka Smiljanić for recording stimulus materials for the project, and to Scott Myers for his helpful input at various stages of the project. This research was presented at the Manchester Phonology Meeting in May, 2019. We also thank Andrew Nevins for his editorial handling of the submission and the anonymous PDA reviewers for their helpful feedback.
1 Or constraints with opposing requirements. In this work, we refer to parameters identified in Hayes (1985), as it is convenient to think of oppositions in binary terms, and because we wish to distance ourselves from any particular formal theory (e.g., Optimality Theory, and its claims about independent constraints and constraint rankings).
In this article, we report the results of an artificial language learning (ALL) experiment in which we explored native American English speakers' preferences in regard to the edge alignment parameter in stress patterning, and whether the presence of a weight-sensitive component influences edge alignment preferences. This experiment belongs to a research program whose broadest objective is to discover evidence that can shed light on what it means for a stress pattern to be natural, and then relate any findings back to the stress typology. Do asymmetries in the stress typology exist because some known parameter settings are more natural than others? And relatedly, when stress parameters interact, do the effects of one more naturally take precedence over the effects of the other? Finally, how can naturalness in stress patterning be measured? Before describing the present research in detail, we first discuss what we mean by "naturalness" and provide background on the most relevant prior ALL research on stress pattern learning.

Naturalness and the problem with frequency statistics
The concept of "naturalness", applied to linguistic patterns, can be thought of in various ways. Obviously, if a stress pattern occurs in a human language, then it is natural in that sense; conversely, non-occurring patterns must not be natural. Adopting a more gradient metric, it would be reasonable to try to relate naturalness to frequency asymmetries in the stress typology. More prevalent stress patterns and structures could be considered more natural than patterns which occur less often. Print resources and the availability of searchable, annotated databases provide some basis for judging the prevalence of many characteristics of stress patterns (e.g. Gordon 2016). 2 With high-quality statistical information about frequency, we might ask, for example: if left-aligned stress patterns are more widely attested than right-edge patterns (see below), is left-alignment the more natural option?
Unfortunately, frequency cannot be assumed as a proxy for naturalness with full confidence, given a lack of comprehensive frequency counts for a wide range of stress patterns. Some useful information about edge alignment in quantity-insensitive (QI) stress patterns is provided in works such as Hyman (1977), Gordon (2002), and van der Hulst & Goedemans (2009; the online version of StressTyp in 2009) and aggregated in Gordon (2016). Figure 1 (from Gordon 2016: 177, displayed with Gordon's caption) reveals asymmetries among languages reported to have a single fixed stress positioned at or near one edge of the word or the other. While the frequency information from the three surveys differs, perhaps due to variation in sampling, averaging over these sources suggests that a single fixed stress on an initial syllable is most common, followed by stress on a final or penultimate syllable. In contrast, languages with a single fixed stress on a peninitial or antepenultimate syllable are scarce.
Strikingly, given their theoretical prominence, binary iterative stress patterns may be much less common than non-iterative patterns: Gordon (2002) reports that in a sample of 262 languages with predictable, QI word stress, 167 languages (63.7%) had a single, fixed stress and only 52 (19.8%) had patterns in which more than one stress was assigned. Of the latter, 38 languages had alternating stress and 14 had a pattern in which stress occurs on a syllable at/near each edge in words that are long enough. Figure 2 (from Gordon 2016: 179, with Gordon's caption) reveals asymmetries for four binary alternating stress patterns, based on frequency counts in Gordon (2002; 38 languages) and Goedemans (2010; 171 languages). Stress on odd syllables counting from the left is most common, followed by stress on even syllables counting from the right. These distributions are formally analyzed in terms of syllabic trochees assigned iteratively beginning at the left or right edge, respectively. Both of these patterns are much more common than the mirror image patterns (stress on even syllables counted from the left, or odd syllables counted from the right). Theoretically, phonologists interpret these asymmetries to mean that trochaic feet are more common than iambic feet in QI stress systems. From a language-processing perspective, it is reasonable to think that the greater prevalence of stress patterns with stress on word-initial syllables may be related to the usefulness of stress cues in signaling beginnings of new words in speech, a point made by Cutler & Carter (1987), who found positive evidence for this proposal in a psycholinguistic study with English speakers.
Compiled information for quantity-sensitive (QS) stress based on high-quality surveys is less available, but details from WALS (https://www.eva.mpg.de/lingua/research/tool.php), cited in Gordon (2016: 207), suggest a different balance: a sample of 48 languages with iterative QS stress patterns was evenly split between stress iterating from left-to-right vs. from right-to-left. In the same sample of QS systems, interestingly, regardless of the origin of footing (left vs. right), the syllable with primary stress tended to be positioned at the right edge (36 of 48 languages). This asymmetry hints at a dependency between weight sensitivity and alignment in stress patterning that deserves a closer look, and it provides a rationale for the study described in this article.
If it is reliable, the finding that binary alternating stress patterns are rarer than patterns with a single fixed stress might seem counterintuitive to phonologists; historically, theoretical discussions have tended to emphasize iterative stress patterns, as these have provided fertile ground for understanding the constraints (or parameters) which underpin the stress typology. But there is also reason to question the reliability of the frequency information we have. As Gordon (2016) points out, iterative stress patterns may well be underreported; some primary sources may describe main word stress and fail to mention secondary stress. Conversely, alternating secondary stress might in some cases be overreported (see Newlin-Łukowicz 2012, for Polish), given that humans tend to perceive sound sequences rhythmically, even in the absence of physical cues marking rhythmic contrasts (e.g. Bolton 1894). Recent reports of serious flaws in published, well-known descriptions of stress patterns (see Tabain et al. 2014; de Lacy 2014) highlight the importance of rigorously confirming impressionistic descriptions of stress patterns, which have often been provided by researchers who are not native speakers of the languages they have described.
The problem is clear: if we lack sufficient information about the prevalence of a wide range of stress patterns, and if we have low confidence in the information we do have, then ideas about natural stress patterning cannot rely on frequency statistics alone.

Naturalness and cognitive preferences
We present as our working assumption that a stress pattern (or one of its components) is "natural" to the extent that it is optimized by at least one aspect of human biology. Patterns that are grounded in the mechanics of speech articulation, or which facilitate speech perception, have long been considered more natural than patterns which are not grounded in these ways (e.g. Diehl & Kluender 1989; Archangeli & Pulleyblank 1994; Lindblom 2000). Many biologically grounded sound patterns can be shown to be preferred by humans.
But speech perception and production are not the only human systems for which sound patterns can be optimized and, accordingly, associated with some kind of asymmetry (or general preference) in nature. Some general preferences are based in other aspects of human cognition, for example humans' natural tendencies to group objects (visual or auditory) according to Gestalt principles. We expect some general preferences or biases found among speakers of diverse languages to be cognitively based and, as such, they should be quantifiable independently of noise introduced by native language experience. We further suggest that cognitive preferences for particular components of stress patterns have helped to shape the stress typology. We are not the first to think along these lines. Moreton (2008) draws a distinction between analytic (cognitive) bias and channel bias in considering influences on the evolution of phonological patterns. Hayes (1995) proposed that perceptual biases described by the Iambic/Trochaic Law (ITL) are the cognitive precursors for asymmetries in his foot inventory. The ITL asserts that listeners associate more intense sounds with group onsets and longer sounds with group endings, and there is some evidence to support these claims. 3 The burgeoning psycholinguistic literature on ITL effects attests to the value of studying prosodic patterns in laboratory settings where inferences about study participants' preferences are based on measuring their responses in cognitively based tasks.
In the core psycholinguistic research on ITL effects among human adults, inferences about perceptual grouping biases have been based on participants' responses when they were given two-alternative forced choice or recall-based tasks. Another way in which cognitive biases can be studied is by measuring participants' performance in artificial language learning (ALL) experiments (e.g. Wilson 2003; Finley & Badecker 2009; Culbertson 2012). If natural stress patterns are those which are preferred by humans (are cognitively grounded, per our working assumption), then learning biases can be interpreted as indicators of cognitively based preferences.
ALL studies of stress acquisition can produce two kinds of information that can be useful. First, learning biases can be inferred based on the ease with which humans learn features of unfamiliar stress patterns presented to them in an exposure or training phase of an ALL study. Ease-of-learning can be quantified as an accuracy score based on participants' decisions in a post-exposure/training test phase in which they indicate whether or not they think new vocabulary items belong to the artificial language (AL). This method tests the ability to match novel forms to trained patterns. The second kind of information is provided by ALL studies that add a poverty of the stimulus component. In such studies, participants are trained and tested as just described, but crucial data bearing on a component of the pattern are withheld during exposure. During the test phase, new forms are introduced that require participants to extend what they have learned in making an untrained generalization. The poverty of the stimulus method can be useful in revealing latent selection biases that are not reflected by an accuracy score in a strict exposure-and-matching study. The study we describe in section 2 provides both types of information.
3 Psycholinguistic studies of ITL effects among speakers of diverse languages have assembled a nuanced picture. There is convincing evidence that humans prefer sound groupings in which louder sounds come first regardless of their language background. On the other hand, there does not seem to be a universal preference for groupings with longer sounds in final position and where this preference has been found, it is weaker than the intensity-based effect and more sensitive to contextual influences. A review of the recent literature on ITL and related effects is found in Crowhurst (2019).

Background
Given the paradigm's promise, surprisingly little ALL research has explored humans' learning of stress patterns. Some research has modelled the learnability of stress patterns computationally (e.g. Heinz 2009; Stanton 2016; Staubs 2014), for example, by implementing a gradual learning algorithm (GLA) for learning a constraint hierarchy given a data set and a set of weighted constraints. Such studies can be very useful, especially when results from the machine learner are considered together with other information. In one useful contribution, Stanton (2016) explores the midpoint pathology, which refers to an odd set of predicted but non-occurring patterns that are generated when antilapse constraints are highly ranked (Kager 2012). Stanton argues that midpoint systems are unattested not because metrical theory doesn't allow them, but because they are hard to learn, for two reasons. The first is that long forms needed to learn most midpoint systems are low-frequency and therefore unavailable to human learners. The second is that information about stress location is inconsistent in midpoint systems. In one such system, for example, a stress window shifts from the right to the left edge, as words get longer. Machine learning studies can be valuable as a proxy for human learners when investigating unattested patterns, and given that it is difficult to study the acquisition of a stress pattern with any precision in the L1 context.
In the two key human ALL studies, Carpenter (2010, 2016) compared the ability of monolingual American English and Laurentian (Canadian) French speakers to learn "natural" (attested) and "unnatural" (unreported and not predicted under standard theoretical assumptions) stress patterns. In Carpenter (2010), participants were taught quality-sensitive stress rules: in the natural condition, nonce words presented stress on the leftmost of any syllables containing a low vowel, else on the initial syllable when only high vowels were present. In the unnatural condition, the pattern was inverted so that syllables with high vowels were stress attractors. In Carpenter (2016), the stress rules were quantity-sensitive: the special, stress-attracting syllables were closed CVC syllables in the natural condition and open CV syllables in the unnatural condition. In both studies, Carpenter found that participants in both language groups learned the natural pattern more successfully than the unnatural pattern, although there were between-group differences (the English speakers achieved higher accuracy scores and with less training than the French speakers). Carpenter's findings are valuable for establishing that an ALL paradigm can be used to study stress pattern learning, and because they suggest that humans can learn patterns that do and don't occur, but may be better at learning attested patterns.
The stress rules taught to participants in Carpenter (2010, 2016) exemplify a "default-to-same" edge stress pattern, one of four related unbounded stress patterns in a mini-typology that reflects different settings of the edge alignment and quantity-sensitivity parameters. 4 This typology is schematically represented in Figure 3. In the pattern Carpenter taught to her participants, shown in Figure (3a), the "special" and default rules are left-aligning; stress falls on the leftmost syllable with a stress-attracting property, and when no special syllables are present, stress defaults to the same (left) edge. The mirror-image, right-aligning default-to-same edge stress pattern is represented in Figure (3d). In default-to-same edge patterns, alignment settings for the special and general rules do not compete. Other languages are reported to have a "default-to-opposite" stress pattern with conflicting edge alignment: in the default case, stress falls on the peripheral syllable at the opposite word edge when no stress attractors are present. While all of the patterns in Figure 3 are reported to occur in languages (see Figure 10 in our concluding discussion), the existence of default-to-opposite patterns (Figure 3b and 3c) has been questioned (Gordon 2000). Such challenges raise the issue of whether the dependencies between quantity sensitivity and edge alignment that are implicit in default-to-opposite patterns exist. Other information discussed in section 1.1 suggests the possibility of dependencies between quantity sensitivity and edge alignment in iterative QS stress systems.
4 In the early research on this typology, quality sensitivity was treated as a species of weight sensitivity (Kiparsky 1973
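The four-way mini-typology described above can be stated procedurally. The following is a minimal Python sketch for exposition only: the function names `assign_stress` and `pattern_name`, and the encoding of words as strings of `H` (stress-attracting) and `L` syllables, are illustrative assumptions and do not come from any of the cited analyses.

```python
def assign_stress(weights, qs_edge, default_edge):
    """Return the 0-based index of the stressed syllable.

    weights      -- string of 'H' (stress attractor) and 'L', one per syllable
    qs_edge      -- 'left' or 'right': which attractor wins when several occur
    default_edge -- 'left' or 'right': edge stressed when no attractor occurs
    (All names here are hypothetical, chosen for illustration.)
    """
    heavies = [i for i, w in enumerate(weights) if w == "H"]
    if heavies:
        # Special (weight-sensitive) rule: leftmost or rightmost attractor.
        return heavies[0] if qs_edge == "left" else heavies[-1]
    # Default rule: peripheral syllable at the default edge.
    return 0 if default_edge == "left" else len(weights) - 1


def pattern_name(qs_edge, default_edge):
    """Label a parameter combination in the terms used in the text."""
    return "default-to-same" if qs_edge == default_edge else "default-to-opposite"


for qs_edge in ("left", "right"):
    for default_edge in ("left", "right"):
        stressed = {w: assign_stress(w, qs_edge, default_edge)
                    for w in ("LLL", "LHL", "HLH")}
        print(qs_edge, default_edge, pattern_name(qs_edge, default_edge), stressed)
```

Only words with no attractor (here `LLL`) or with more than one (here `HLH`) distinguish the four settings; a word with a single attractor such as `LHL` receives the same stress under all of them.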
In our research, we were interested in studying ease-of-learning in relation to edge-alignment oppositions; given the reported prevalence of word-initial stress in QI systems (Figures 1 and 2), we wanted to test whether a left-aligning pattern of prominence might be easier to learn than a right-aligning pattern. In addition, given the possibility of typological dependencies between quantity sensitivity and edge alignment in both bounded and unbounded stress systems, we were interested in discovering whether edge-based asymmetries might be influenced by syllable weight in a study in which the learning target was an unbounded pattern of prominence. 5 We used a poverty of the stimulus approach to study the following two questions. First, when English speakers are taught a default edge-aligning QI prominence rule and a QS prominence rule that is ambiguous with respect to alignment, which QS rule will they infer when the provision of critical new forms in a post-exposure test phase requires them to choose between aligning edges? 6 Second, are there dependencies between the trained edge for the QI rule and the edge participants infer for the "updated" QS rule?
In this study, adult native English speakers in two learning groups were exposed to a set of trisyllabic nonce words with a single prominent syllable. A subset of these items (the QI set) contained only CV syllables (light syllables, or L); another subset (the QS set) contained a single syllable with a long vowel, CV:, in initial, medial or final position (the heavy syllable or H). 7 One syllable in every trisyllable was louder and higher pitched than the others; this we refer to as the prominent syllable (transcribed with an acute accent). Participants were to learn a QS and a QI generalization about the distribution of prominent syllables. The QS rule was that the prominent syllable must be the heavy syllable, if there was one (e.g. pá:teko, paté:ko or patekó:). The QI rule was that when no heavy syllables were present, prominence would fall on the short-voweled syllable in word-initial position (kétopa, for a group trained on the left edge), or the final syllable (ketopá for a group trained on the right edge). Words with two heavy syllables (the "hold out" set) were withheld during exposure and then added in the subsequent test phase.
The experiment had two phases. In Phase 1, the exposure phase, participants listened to and pronounced lists of 12 words, one by one. Each list was followed by review trials in which a two-alternative forced choice task tested participants' recall for items in the preceding list. In review trials, participants heard "decision pairs" consisting of a trisyllable from the preceding list (the target), and a foil, the same trisyllable with prominence on a different syllable. Participants were tasked with identifying the target. Phase 2 was a test in which participants responded to similar decision pairs based on novel trisyllables. This time, the target was defined as the member of each decision pair that belonged to the AL. In addition to new stimuli representing the exposure patterns, Phase 2 also included decision pairs for a Novel Patterns condition, which consisted of the trisyllables with two Hs (withheld during exposure). The Novel Patterns stimuli required participants to "update" the QS rule by deciding whether the leftmost or the rightmost heavy syllable should be the prominent one. This required participants to make an untrained generalization about edge alignment for the QS set and in these cases, the question was whether they would prefer prominence on the H closest to the trained edge (a default-to-same pattern), or on the one furthest from the trained edge (a default-to-opposite pattern).
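The logic by which a Novel Patterns response is interpreted can be sketched as follows. This is a hedged illustration in Python, not the scoring procedure actually used in the study; the function name `classify_choice` and the `H`/`L` string encoding are assumptions introduced here.

```python
def classify_choice(weights, chosen_index, trained_edge):
    """Classify a response to a trisyllable with two heavy syllables.

    weights      -- e.g. 'HHL', 'HLH' or 'LHH'
    chosen_index -- 0-based position of the H the participant made prominent
    trained_edge -- 'left' or 'right': the QI default edge taught in exposure
    (Hypothetical helper for illustration only.)
    """
    heavies = [i for i, w in enumerate(weights) if w == "H"]
    if len(heavies) != 2 or chosen_index not in heavies:
        raise ValueError("expected two H syllables and a choice between them")
    leftmost, rightmost = heavies
    # The H closest to the trained edge is the leftmost H for the left-edge
    # group and the rightmost H for the right-edge group.
    nearest = leftmost if trained_edge == "left" else rightmost
    return "default-to-same" if chosen_index == nearest else "default-to-opposite"


print(classify_choice("HHL", 0, "left"))   # leftmost H, left-edge group
print(classify_choice("HHL", 0, "right"))  # leftmost H, right-edge group
```

The same choice (prominence on the leftmost H) therefore counts as default-to-same for a participant trained on the left edge and as default-to-opposite for a participant trained on the right edge.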

Hypotheses
The study's design allowed for a variety of different outcomes, and competing hypotheses were possible. We expected participants in both groups to learn the trained default edge when no long vowel was present, but we anticipated a learning asymmetry that favored the left edge, for two reasons. First, if the typological asymmetry favoring prominence on initial syllables in QI stress systems (see Figures 1 and 2) reflects a general analytic bias, then we might expect to find this asymmetry reflected in participants' decisions in the ALL study. The second reason was that in participants' native language, English, initial stress predominates in nouns and adjectives with the structure LLL; words like Pamela and trinity are more common than medially stressed words like banana and vanilla. Also, while the stimuli used in the study were trisyllabic, we note that initial stress is also highly prevalent among English bisyllables, whereas bisyllables with stress only on the second syllable are less common (Cutler & Carter 1987). On the other hand, stress on the last syllable of an LLL form is not an English pattern. Both considerations led us to expect performance in the group with the right-edge default stress pattern to be weaker. 8 We expected both participant groups to learn the basic QS rule. As long as they did, there were competing hypotheses as to what they might decide when presented with the hold-out set (trisyllables with two Hs) in the test phase. One possibility was that response patterns might reveal a bias favoring prominence on the rightmost H, whatever the trained default edge had been, given the asymmetry favoring right-edge primary stress in QS systems reported by Gordon (2016: 207). In this case, assuming that participants learned the trained edge in the LLL condition, a preference for prominence on the rightmost H would be consistent with a default-to-opposite pattern in the left-edge group and a default-to-same pattern in the right-edge group.
Alternatively, if Gordon (2000) is right to question whether true default-to-opposite patterns occur naturally, then we might expect a bias favoring prominence on the H closest to the trained edge in the LLL condition.
A final possibility relates to the phonological observation that information-rich and/or prominent phonological structures are often restricted to initial positions in languages, a species of positional prominence. In languages with vowel harmony, for example, the full range of vowel contrasts is often restricted to word-initial syllables (e.g. Turkish, Clements & Sezer 1982; Shona, Beckman 1997; Yaka, Hyman 1998). Cutler & Carter (1987) report that the proportion of stress-initial lexical words in running English speech is high, and they argue that English speakers tend to infer a word boundary before a stressed syllable. Stressed long vowels are salient, information-rich segments; they can encode information both about phonological length contrasts and surface stress contrasts. If there is a tendency to prefer salient, information-rich syllables to occur earlier rather than later in words (if not precisely at the beginning), then we might expect participants to favor pronunciations with stress on the leftmost long V when more than one is present, whatever the trained edge. That is, we might expect the responses in the left-edge group to reflect a default-to-same edge preference, and for decisions in the right-edge group to be consistent with a default-to-opposite edge preference.
We did not anticipate that interference from pre-existing biases would be as problematic when a heavy syllable was present as when there was none: English trisyllables in which two light syllables combine with an initial, medial or final heavy syllable (whether closed or open with a tense vowel or diphthong) are common (e.g. mandible, agenda, Brigadoon), and primary stress falls on the heavy syllable.

Stimuli and design
Vocabulary items constructed for the AL were trisyllables in which light CV and heavy CV: syllables combined an onset in the set /p t k/ and one of the vowels /a e o/, or their long counterparts. Only non-high vowels were used, to minimize unwanted perceptual effects that might be associated with height-dependent differences between vowels. To create perceptual distance between the AL and the participants' native language (English), syllables were recorded by a native speaker of Croatian. Croatian does not have unstressed vowel reduction, as English does, and our intention was to create stimuli that participants would perceive as un-English-like. Syllables were recorded individually in the Croatian carrier phrase Reći ___ tri puta 'Say ___ three times'. A token of each syllable with modal voicing throughout was excised from the carrier phrase, segmenting at the beginning of the stop closure and at the beginning of the closure for [t] in tri.
The study's design called for trisyllables with seven arrangements of L and H syllables. A QI pattern with short Vs, LLL, and three QS patterns with one long V, HLL, LHL and LLH, were used in both the exposure and test phases. Three QS patterns with two long Vs, HHL, HLH and LHH (the novel "hold out" set) were used only in the test phase. Syllables were concatenated into trisyllables subject to the restrictions that no CV combination was repeated and no C occurred more than twice. The occurrence of Vs was not restricted in this way. A trisyllabic combination of CV sequences was never used twice with the same pattern of long and short Vs; however, some CV combinations were used with different length patterns in different phases of the experiment (e.g. tapoke in the exposure phase and tapo:ke: in the test phase). The occurrence of Cs and Vs in different positions was balanced across items used in each phase of the study. Every trisyllable was prefixed with a 500 ms silent period. The study materials are provided in Appendix A.
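The concatenation restrictions can be illustrated with a short Python sketch. It rests on two assumptions that the text does not state explicitly: that both restrictions apply within a single trisyllable, and that a "CV combination" is a consonant-vowel pairing irrespective of vowel length. The names `is_valid` and `apply_weights` are hypothetical, and the sketch does not reproduce the positional balancing or the actual item selection (see Appendix A for the materials used).

```python
from itertools import product

ONSETS = "ptk"
VOWELS = "aeo"
# The nine segmental CV combinations, ignoring vowel length.
SYLLABLES = [c + v for c, v in product(ONSETS, VOWELS)]


def is_valid(sylls):
    """Check the two stated restrictions on a sequence of three CV syllables."""
    no_repeated_cv = len(set(sylls)) == len(sylls)
    onsets = [s[0] for s in sylls]
    no_c_more_than_twice = all(onsets.count(c) <= 2 for c in ONSETS)
    return no_repeated_cv and no_c_more_than_twice


def apply_weights(sylls, weights):
    """Mark heavy syllables with ':' for a long vowel (H) and leave L as is."""
    return "".join(s + (":" if w == "H" else "") for s, w in zip(sylls, weights))


candidates = [s for s in product(SYLLABLES, repeat=3) if is_valid(s)]
print(len(candidates))                           # 486 admissible CV sequences
print(apply_weights(("ta", "po", "ke"), "LLL"))  # tapoke
print(apply_weights(("ta", "po", "ke"), "LHH"))  # tapo:ke:
```

Under these assumptions the restrictions exclude only a small part of the space of possible sequences, so the items actually used represent a further selection, balanced by position as described above.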
In addition to the duration cue, prominence was simulated by manipulating overall intensity and pitch. 9 Prior to concatenation, syllables were edited individually in Praat (Boersma & Weenink 2019) to normalize vowel duration and intensity. Duration was adjusted to measure 150 ms and 250 ms in short and long Vs, respectively, by copying or deleting every other voicing period from the sound wave as needed, preserving the vowel's natural amplitude contour. Intensity was normalized by changing gain to 65 dB in non-prominent Vs and 68 dB in prominent Vs. After concatenation, every trisyllable was given a pitch contour in which f0 was level at 225 Hz during the initial, medial, or final third of the prominent V (depending on whether prominence was initial, medial, or final) and declined smoothly to plateau at 180 Hz in non-prominent vowels.
Two lists of twelve vocabulary items were prepared for the exposure phase. List 1 contained only QI trisyllables (LLL), prepared with word-initial and word-final prominence for the left- and right-edge groups, respectively. List 2 contained four trisyllables for each QS pattern (HLL, LHL and LLH). List 2 items, all with prominence on H, were the same for the two participant groups.
Six QI trisyllables and six QS trisyllables (balanced for pattern) were selected for use in post-list review trials, structured as "decision pairs". Each decision pair consisted of a target, a trisyllable from the preceding list, together with a foil, a competing pronunciation with prominence in a different position. As an example, the List 2 target pekó:te was paired with a foil péko:te, with prominence (incorrectly) on the first syllable. Another decision pair combined pekó:te with the other foil, peko:té, with final prominence. These decision pairs (pekó:te - péko:te and pekó:te - peko:té) were presented twice, with target and foil in counterbalanced order. In all, 24 review trials followed each list (6 target words x 2 incorrect foils x 2 orders). Including the 500 ms of silence preceding each trisyllable, the longest decision pairs measured just under 3 sec in duration.
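The construction of review trials can be made concrete with a small Python sketch. It is an illustration under assumptions, not the procedure used to build the experiment: the helper names `render` and `decision_pairs` and the representation of a word as a list of syllables plus a prominence index are introduced here for exposition.

```python
ACUTE = {"a": "á", "e": "é", "o": "ó"}


def render(sylls, prominent):
    """Spell a word, marking the prominent syllable's vowel with an acute accent."""
    out = []
    for i, s in enumerate(sylls):
        if i == prominent:
            # s[0] is the onset, s[1] the vowel, s[2:] an optional length mark.
            s = s[0] + ACUTE[s[1]] + s[2:]
        out.append(s)
    return "".join(out)


def decision_pairs(sylls, target_pos):
    """Pair the target with each foil, in both presentation orders."""
    target = render(sylls, target_pos)
    pairs = []
    for foil_pos in range(len(sylls)):
        if foil_pos == target_pos:
            continue
        foil = render(sylls, foil_pos)
        pairs.append((target, foil))
        pairs.append((foil, target))
    return pairs


pairs = decision_pairs(["pe", "ko:", "te"], 1)
for first, second in pairs:
    print(first, "-", second)
print(6 * len(pairs))  # 24 review trials per list
```

Each target yields four decision pairs (two foils, two orders), so six targets give the 24 review trials that followed each list.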
The test phase consisted of another 168 decision pairs based on a new set of trisyllables. In a "Trained Patterns" condition, 15 new LLL trisyllables were used in 60 decision pairs (two target/foil pairs per new trisyllable, with two counterbalanced orders). Fifteen new QS combinations, balanced for H's position, were used in another 60 decision pairs. For decision pairs in a "Novel Patterns" condition, eight trisyllables were prepared for each of three new patterns, HHL, HLH and LHH. Novel Patterns decision pairs combined pronunciations in which prominence was on one H or the other, and were counterbalanced. For example, the trisyllable ka:te:po was presented in the decision pairs ká:te:po - ka:té:po and ka:té:po - ká:te:po. (No Novel Patterns decision pairs allowed participants to choose between prominence on H or L.) In all, there were 48 Novel Patterns decision pairs (8 trisyllables x 3 patterns x 2 orders).
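The trial counts for the test phase follow directly from the design, as the brief Python tally below shows. The function `novel_pattern_pairs` and the variable names are illustrative assumptions; the item counts are those given in the text.

```python
def novel_pattern_pairs(weights):
    """Ordered pairs of prominence positions for a word with two H syllables."""
    first, second = [i for i, w in enumerate(weights) if w == "H"]
    # Prominence on one H vs. the other, in both presentation orders.
    return [(first, second), (second, first)]


ITEMS_PER_NOVEL_PATTERN = 8
novel = sum(ITEMS_PER_NOVEL_PATTERN * len(novel_pattern_pairs(p))
            for p in ("HHL", "HLH", "LHH"))

trained_qi = 15 * 2 * 2  # 15 LLL items x 2 foils x 2 orders
trained_qs = 15 * 2 * 2  # 15 QS items  x 2 foils x 2 orders

total = trained_qi + trained_qs + novel
print(trained_qi, trained_qs, novel, total)  # 60 60 48 168
```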

Testing procedures
All procedures followed a protocol (#2017-08-0089) approved by the Institutional Review Board at the University of Texas at Austin. Consent procedures and a short background questionnaire were administered when participants arrived. Afterwards, seated in a sound-treated room in a campus phonetics laboratory, participants were told they would learn to recognize words belonging to an unfamiliar, constructed language by their sound. They were instructed to pay attention to word stress, which was exemplified using the English words banana and acrobat.
Training began with List 1 (pattern LLL), which participants moved through by pressing a button to hear each word in sequence. They were instructed to pronounce every word to themselves before proceeding to the next item. List 1 was followed by the 24 List 1 review trials, then List 2 and the List 2 review trials. On review trials, participants were instructed to indicate which member of the decision pair they had heard in the preceding list (the target) by pressing a designated button on a response pad. No response feedback was provided. To maximize learning opportunities, Lists 1 and 2 and the review trials were repeated once. Participants could pause, if they wished, between lists and before advancing to the test phase. Participants were told that the vocabulary they would hear in the test phase was new, although similar to items they had heard during training. Their task was to indicate which member of each decision pair they thought belonged to the AL (the target).
Participants listened to stimuli through studio-quality Audio-Technica headphones connected to the 13-inch Apple Pro laptop used to run the experiment, which was controlled by SuperLab 5.0 (Cedrus Corporation). The order in which items were presented at every stage was randomized by the software. Most participants completed the experiment in 25 minutes or less.

Participants
Thirty-eight participants were tested in the left-edge group and 43 in the right-edge group. The threshold for inclusion in the statistical analysis was an accuracy score of 70% on the second round of review trials in the exposure phase. Twenty-five participants in the left-edge group (18 women and 7 men) and 24 in the right-edge group (17 women and 7 men) met this criterion. While this exclusion rate is high, we believe it may be due to the absence of performance feedback during training. All participants were native speakers of American English and none but one had had substantial L2 exposure prior to age 13. One participant had begun to learn Spanish in classes in elementary school. All participants had studied at least one foreign language in high school or college. All participants but one were members of the university community and were recruited through a university-wide events bulletin, announcements in undergraduate classes, and by word of mouth. Participants were paid $10 or received course credit for their time. They ranged in age from 18 to 45, with most participants aged between 18 and 25.

Data handling and statistical procedures
The response data were cleaned by eliminating trials on which participants had pressed a non-sanctioned button, and responses with reaction times (RT) shorter than 1,000 ms or longer than 10,000 ms (measured from the beginning of the stimulus event). In fact, there were no RTs between 408 ms and 1000 ms. Most short RTs were below 100 ms, which suggested that the participant might have pressed a response button twice by accident. The upper threshold of 10,000 ms was arbitrarily determined. No stimulus event was longer than 3,000 ms and we considered that a 10,000 ms RT allowed participants 7 seconds after hearing each decision pair to respond. Of the 10,584 observations expected per the design, these exclusions left 10,509 data points for the statistical analysis, an exclusion rate of .71%. The raw data, organized by category and response type, are presented in Appendix B.
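The cleaning criteria and the reported exclusion rate can be illustrated with a minimal sketch. The button labels and the function name `keep_trial` are hypothetical; only the RT thresholds and the two observation counts come from the text.

```python
RT_MIN_MS = 1_000    # lower threshold stated in the text
RT_MAX_MS = 10_000   # upper threshold stated in the text
VALID_BUTTONS = {"first", "second"}  # hypothetical labels for the two sanctioned buttons

def keep_trial(button, rt_ms):
    """Return True if a trial survives the cleaning criteria."""
    return button in VALID_BUTTONS and RT_MIN_MS <= rt_ms <= RT_MAX_MS

expected = 10_584   # observations expected per the design
retained = 10_509   # data points left after exclusions
exclusion_rate = 100 * (expected - retained) / expected

# With 49 included participants, the expected total implies 216 trials each.
# Assumption: this matches 168 test trials plus 48 second-round review trials.
per_participant = expected // 49

print(round(exclusion_rate, 2), per_participant)
```

The 75 excluded observations correspond to the exclusion rate of .71% reported above.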
The data were analyzed in R and R Studio (R Core Team 2018; RStudio Team 2016). As a first step, exact binomial tests of proportions (α = .05) were conducted to determine whether the proportion of positively coded responses for each type of decision pair was significantly better than would be expected by chance. These tests provided an indication of participants' success in generalizing the rules they learned in the training phase to new examples of the same patterns, and of any latent bias in the Novel Patterns condition.
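For readers unfamiliar with the exact binomial test, the following sketch shows the computation against a chance level of .5. It is a standard-library illustration, not the R code used in the study; the function names and the response counts are hypothetical. The two-sided p-value sums the probabilities of all outcomes that are no more likely than the observed one, which is the usual definition for an exact test.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binom_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null probability p.
    """
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        prob
        for prob in (binom_pmf(i, n, p) for i in range(n + 1))
        if prob <= observed * tolerance
    )
    return min(1.0, total)

# Hypothetical counts: 70 positively coded responses out of 100 trials.
p_value = exact_binom_two_sided(70, 100)
print(p_value < 0.05)
```

A p-value below α = .05 together with an observed proportion above .5 indicates performance better than chance; the same test also flags proportions significantly below chance.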
Variations in participants' response patterns across the inventory of decision pair types (which differed by Pattern and Foil) were explored by fitting mixed-effects models (likelihood ratio tests) to the response data, as described in the sections devoted to the training and test results, using the mixed function in the afex package (Singmann et al. 2019), a wrapper for lmer which provides convenience functions for the analysis of factorial data. The dependent variable was a binomial and the interpretation of positively coded responses differed depending on the analysis (as explained below). Edge was a between-subjects factor that encoded the trained edge in the QI condition (two levels, left and right). The within-subjects factors were Pattern, with four levels in the analyses of the Trained Patterns data (LLL, HLL, LHL and LLH), and three levels in the separate, Novel Patterns analysis (HHL, HLH, and LHH). Foil coded the position of the prominent syllable in the distractors paired with targets in decision pairs (two levels, the position closest to the left edge, in foils like péko:te and to:téka; and the position closer to the right edge, in foils like peko:té and to:teká). The inclusion of Edge and Pattern in the statistical analysis was hypothesis driven (see section 2.2). Foil was included because, given that participants' task required them to select one member of each decision pair and reject the other, we were interested in the types of errors participants might make. In decision pairs for the QS patterns, in particular, if participants failed to reject a foil/distractor in which prominence incorrectly occurred at the word-edge consistent with LLL forms presented during exposure, this could be interpreted as an effect of the trained edge. Significant interactions in the mixed models analysis were explored post hoc by making pairwise comparisons using the emmeans package in R (Lenth 2020).

Recall accuracy in the training phase
Participants were included in the statistical analysis for the test conditions if they scored at least 70% on the second-round review trials in the exposure phase. This score measured recall accuracy, the participants' success in identifying trisyllables they had heard in a preceding vocabulary list. Our primary purpose in analyzing these data was to test for significant group-based differences in recall accuracy as a prelude to analyzing the test data. A visual inspection of Figure 4, which represents the training results by Pattern, indicates that performance in the left- and right-edge groups was comparable.
Recall accuracy was high in both groups overall, although lower for trisyllables ending in a long vowel (pattern LLH). Exact binomial tests conducted on the recall data for both groups, combined, indicated that the true probability of success (PoS) for each pattern, averaged over Foil, was significantly greater than .5, the level of chance. (The outcomes of the binomial tests are shown in Table C1 in Appendix C.) The purpose of the statistical analysis was to test for group-based differences based on the trained edge as a precursor to analyzing the response data for the Trained Patterns test condition. To this end, a mixed effects model was fitted to the recall data using the mixed function in the afex package, with Edge as a fixed effect, Participant as a random intercept and random slopes for Pattern and Foil. Edge was not significant in this model (F(1, 47) = 2.06, p = .16).
In the Trained Patterns test condition, the dependent variable measured congruent responses. Congruent responses successfully identified targets in decision pairs; because the stimuli were new, however, the target was the trisyllable whose prominence position matched the exposure patterns.
A comparison of Figures 4 and 5 shows that success in the test phase was lower overall than on the review trials. We expected this, as participants were not confirming identity in test trials, but were applying learned rules to new forms. Figure 5 shows that success in the two groups was comparable for the QI pattern, but there were differences among the QS patterns: left-trained participants were more successful than right-trained participants in choosing the HLL target, and less successful for LLH. The proportion of congruent responses was more even across the four patterns in the right-trained group overall, although there were differences by type of foil (see below). Exact binomial tests conducted for each category (Pattern by Foil) graphed in Figure 5 indicated that the proportion of congruent responses was significantly greater than .5 in all but four cases (see Table C2 in Appendix C). In the left-trained group, success for pattern LLH was significantly lower than chance would predict when the foil had initial prominence ('LLH). (An uptick ['] indicates prominence on the following syllable.) Success was not significantly better than chance for pattern LHL when the foil had initial prominence ('LHL) in the left-edge group or final prominence (LH'L) in the right-edge group, nor for pattern HLL in the right-edge group when the foil had final prominence (HL'L). These results suggest that participants were less successful in choosing the target when the foil had prominence at or closer to the trained edge.
To test the statistical strength of the differences seen in Figure 5, a mixed effects model was fitted to the Trained Patterns response data using the mixed function in the afex package in R. This model included Edge, Pattern, Foil and the interactions Edge:Pattern and Pattern:Foil in fixed effects; random slopes for Pattern and Foil; and Participant as a random intercept. 10 The output of this model, shown in Table 1, reveals that the fixed effect of Pattern was significant. The main effects of Edge and Foil were not significant, but their effects are seen in the significant values associated with the interactions Edge:Pattern, Edge:Foil and Pattern:Foil (discussed below). The significant interactions in Table 1 were explored by making pairwise comparisons using the emmeans package in R (Lenth 2020). The interpretable contrasts which were revealed to be significant are presented in Table 2. (The full set of interpretable comparisons and the code used in the statistical analysis are provided in Tables C3, C4 and C5 in Appendix C.)
For the Edge:Foil interaction in Table 1, none of the pairwise comparisons was significant. Despite a tendency noted in Figure 5 for participants to make more mistakes when a foil had prominence at the trained left edge, in certain QS patterns, there was no statistical support for concluding that this was an effect of training.
10 The model in Table 1 is the most complex model that converged. The maximal model, with a random slope for Pattern*Foil and the three-way interaction Edge*Foil*Pattern in fixed effects, failed to converge.
To help with the interpretation of the significant Edge:Pattern interaction in Table 1 and the pairwise comparisons in Table 2a, the response data for Pattern in the two groups, averaged over Foil, are represented in Figure 6. The graph shows that outcomes for Pattern were more variable in the left-edge group (white bars) than in the right-edge group (grey bars). In the pairwise comparisons based on the omnibus model (Table 1), the contrasts between patterns HLL and LHL, and between HLL and LLH, were significant in the left-edge group. In other words, the greater success in identifying targets when H occurred at the trained edge, compared to the other QS patterns, was significant in the left-trained group. We interpret this to mean that the trained left-edge bias in the left-edge group may have been enhanced when the first V was long. Our contention that this was an effect of the trained edge is supported by the finding that the between-group difference for pattern HLL was marginally significant (comparison HLL (left) vs. HLL (right) in Table 2). None of the by-pattern differences seen in Figure 5b for the right-trained group were significant.
The Pattern:Foil interaction can be meaningfully interpreted only for the QS patterns because target/foil combinations were different for pattern LLL in the left- and right-trained groups. Figure 7 graphs the outcomes for patterns HLL, LHL and LLH by Pattern and Foil. The only contrast that was significant in the pairwise comparison was for pattern LLH: the result in Table 2b confirms that participants were significantly less successful in choosing the target LL'H when the competitor had prominence on the initial L (i.e. 'LLH) than when it had medial prominence, L'LH. This can be seen in the rightmost bars in Figure 7. A comparison with Figure 5 indicates that this effect can be attributed to the weaker (well below chance) performance in the left-edge group. While the difference between LHL and LLH in the left-edge group was not significant in the pairwise comparisons, the finding that the left-edge group was least successful with pattern LLH is still notable. We associate this outcome with the phenomenon of preboundary lengthening (Fletcher 2013, and references cited therein): if left-trained participants discounted vowel duration in final position, they may have fallen back on the default left-edge pattern and treated the 'LLH foil as an 'LLL target. The difference between the two groups for pattern LLH, although apparent in Figure 6, was not significant in the pairwise comparisons.
How can the findings in the Trained Patterns test condition be interpreted? We begin by observing that the study's outcomes confirm our initial expectations: participants in both groups learned the QI rule for their group and the basic QS rule well enough to perform significantly better than chance would predict for most pattern/foil combinations (see the binomial tests in Table C2, Appendix C). We did not find a stronger effect of left-edge training for the LLL pattern; performance was comparable in the two participant groups in the QI condition.
This outcome shows that English-speaking study participants were able to learn a simple prominence-based pattern that is not present in their native language (final prominence in LLL), and they performed as well as participants learning the mirror image pattern (initial prominence in LLL) which is consistent with English stress rules in some lexical classes (at least, nouns and adjectives).
While participants learned the QS rule (prominence cues should mark the heavy syllable, if one was present), the results revealed interesting effects of the trained edge on participants' success for the QS patterns. In the left-trained group, success for the HLL pattern was significantly higher than for the LHL and LLH patterns, suggesting that the effects of edge training and syllable weight combined. (Success for HLL was even higher than for LLL in the left-edge group, but this result was not significant.) In general, left-trained participants were more likely to pick the member of a decision pair with initial prominence, which resulted in more errors for patterns LHL and LLH when the foil had initial prominence. In these cases, left-trained participants failed to reject the foils 'LHL and 'LLH in favor of the L'HL and LL'H targets, and the proportion of failures in these cases was higher than would be expected by chance. Left-trained participants were particularly bad at rejecting 'LLH foils in favor of the LL'H targets. We see this not purely as an effect of the trained edge, but as related to pre-boundary lengthening: participants trained on the left edge were not primed to attend to final syllables, and we suggest that they were discounting the increased duration of final H syllables in this position. We did not find comparable differences in the right-trained group, where success for patterns LLL and LLH was comparable; if right-trained participants were discounting the duration of final H syllables, then they may simply have perceived 'LLH foils as 'LLL targets. There were other effects of the trained edge in the right-edge group in that participants' worst performance was for patterns LHL and HLL, where they failed to reject foils with prominence at the trained right edge (LH'L and HL'L) in favor of targets with prominence on H (L'HL and 'HLL) more often than in other conditions.
The binomial tests in Table C2 (Appendix C) indicate that participants' success was not significantly better than chance in these cases.

Which heavy syllable? Inferring an edge-aligned QS rule
In the Novel Patterns condition, decision pairs offered participants the choice between prominence on the first or the second of two heavy syllables (e.g. kó:te:pa vs. ko:té:pa). While the participants' task was the same as in the Trained Patterns condition (to identify the member in each decision pair which they thought belonged to the AL), a positively coded response in the Novel Patterns condition meant the participant had selected the trisyllable with prominence on the leftmost H. Figure 8, which graphs the results by Edge and Pattern, reveals a bias favoring the leftmost H for all patterns (HHL, HLH and LHH) in the left-edge group, and no clear preference in the right-edge group.
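The choice participants faced can be made concrete by enumerating the candidate grammars. The sketch below is an illustration only: the function name `prominence_position` and its parameters are assumptions, and it models just two settings, the alignment of the QS rule (which heavy syllable wins when there are several) and the QI default edge (used when no heavy syllable is present).

```python
def prominence_position(pattern, qs_align, qi_default):
    """Return the 0-based index of the prominent syllable.

    pattern    -- string of 'H' (heavy, CV:) and 'L' (light) syllables
    qs_align   -- 'left' or 'right': which H is chosen if there are several
    qi_default -- 'left' or 'right': edge that is prominent if there is no H
    """
    heavies = [i for i, syllable in enumerate(pattern) if syllable == "H"]
    if heavies:
        return heavies[0] if qs_align == "left" else heavies[-1]
    return 0 if qi_default == "left" else len(pattern) - 1

# Training data (LLL plus words with a single H) cannot distinguish the
# two QS alignments; only the withheld two-H patterns can.
for pattern in ("LLL", "LHL", "HHL", "HLH", "LHH"):
    left = prominence_position(pattern, "left", "left")
    right = prominence_position(pattern, "right", "left")
    print(pattern, left, right)
```

For words with zero or one heavy syllable the two QS alignments yield the same output, which is why responses to HHL, HLH and LHH are informative about an untrained preference.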
The output of the best-fitting model for the Novel Patterns condition, which included the terms Edge, Pattern and Edge*Pattern in fixed effects, a random slope for Edge*Pattern, and Participant as a random intercept, indicated that the effect of Edge was significant (F(1, 47) = 6.52, p = .01). (The output for the full model is provided in Table C7, Appendix C.) Binomial tests of proportions confirmed that the bias favoring prominence on the leftmost H in the left-edge group was significantly greater than .5 for all of the patterns HHL, HLH and LHH (see Table C6, Appendix C). Although differences across categories are apparent in Figure 8 for the left-trained group, the output of the mixed effects model indicated that the fixed effect of Pattern was not significant.
However, the group-level outcomes do not tell the whole story. Figure 9 provides a more granular view of response patterns at the participant level, aggregating over Pattern. The x-axis represents the percentage of responses favoring prominence on the leftmost H, while the y-axis represents the number of participants in each bin. In this graph we see that all but three of the participants in the left-edge group (white bars) show a response bias favoring prominence on the leftmost H. A majority of the left-trained participants (20 of 24) cluster in the bins between 60 and 89.9%, with the 60-69.9% bin representing the central tendency (M = .625, SD = .152). On the other hand, the preferences of the right-trained participants (grey bars) were somewhat more distributed (M = .518, SD = .140). Half of the participants (12 of 24) showed a response bias favoring prominence on the first H, and the other half on the second H. These distributions demonstrate that in general, exposure to the left-edge QI pattern predicted a left-oriented bias for the QS pattern, but right-edge experience was not strongly associated with either a left- or right-oriented bias.
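The participant-level binning behind a histogram of this kind can be sketched as follows. The function names and the response counts are hypothetical; the sketch assumes 10-point bins with the top bin closed at 100%, and uses integer arithmetic so that bin boundaries are not affected by floating-point rounding.

```python
from collections import Counter

def bin_lower_bound(n_leftmost, n_trials, width=10):
    """Lower bound (in percent) of the bin for one participant.

    n_leftmost -- number of responses favoring the leftmost H
    n_trials   -- number of valid Novel Patterns responses
    """
    percent = (100 * n_leftmost) // n_trials  # integer floor of the percentage
    return min(percent // width * width, 100 - width)

def histogram(per_participant):
    """Count participants per bin from (n_leftmost, n_trials) pairs."""
    return Counter(bin_lower_bound(k, n) for k, n in per_participant)

# Hypothetical participants, each with 48 Novel Patterns trials.
counts = histogram([(30, 48), (24, 48), (36, 48), (48, 48)])
print(sorted(counts.items()))
```

A participant with 30 of 48 leftmost-H responses (62.5%) falls in the bin whose lower bound is 60, corresponding to the 60-69.9% bin in the text.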

Main findings
At the group level, participants in our study learned the QI rule equally well regardless of left- or right-edge exposure. We initially expected that participants might be more successful in learning the left-edge QI rule for two reasons. The first is the typological observation that initial main stress seems to be more common than final main stress in QI stress systems (see Figure 1). As noted in section 1, this asymmetry might be linked to the utility of initial stress in signaling beginnings of words in speech (Cutler & Carter 1987). A more compelling reason, we thought, related to participants' native experience of English. In English, initial stress predominates in LLL trisyllables (e.g., Pamela, trinity) whereas the mirror image pattern, with stress on the last of three light syllables, is not an English pattern. That the trained edge was learned equally well in both groups, contrary to our expectation, is an important finding: it argues that adults whose native language has a stress contrast can learn an AL with non-native characteristics, and their ability to do so is not completely constrained by L1 biases (see section 5.2, below). This should not necessarily be surprising, since adults can, after all, learn foreign languages, but it is useful to have evidence from a controlled ALL study.
We expected participants in both groups to learn the QS rule, and this expectation was broadly confirmed by the study's results. In the Trained Patterns condition, success in identifying targets was significantly above chance in most Pattern/Foil combinations. Interestingly, in light of the outcome for the LLL pattern, we observed effects of the trained edge in both participant groups for the QS patterns. The influence of trained edge was most dramatic in the left-edge group, where success was significantly higher when the heavy syllable was at the trained edge (the HLL pattern) than when it was not (the LHL and LLH patterns).
There was more variability across patterns in the left-edge group: success was significantly higher for HLL than for the other QS patterns. Success across patterns was more even in the right-edge group and none of the differences seen in Figures 5b, 6 and 7 for this group were significant. Our interpretation of the higher success for pattern HLL is that left-edge participants' attention to the word-initial syllable was enhanced when the vowel in that position was long. We suggested that the lower success in the left-edge group for pattern LLH may reflect participants' discounting vowel duration in final position due to their implicit expectation of pre-boundary lengthening. By contrast, performance for patterns LLL and LLH in the right-edge group was roughly comparable (and as noted, differences by Pattern were not significant in the right-edge group). The result for pattern LLH in the right-edge group is hard to interpret: it might suggest a greater sensitivity to final vowel duration in the right-edge group, which could also be interpreted as an effect of the trained edge. On the other hand, right-edge participants might also have discounted final vowel duration, in which case they might not have been distinguishing LLH from LLL stimuli.
When differences by Foil were examined, success identifying targets was lower in both groups when the foil had stress on a light syllable at the trained edge. In each group, decision pairs for two patterns had targets with stress exactly at the trained edge (LLL and HLL in the left-edge group; LLL and LLH in the right-edge group). In the other two patterns (LHL and LLH in the left-edge group; LHL and HLL in the right-edge group), one target/foil combination provided an opportunity to reject the target in favor of a foil with stress on the L at the trained edge. Participants in both groups chose the target less often when this was the case. That is, for pattern LHL, success for a target like peká:te was lower in the left-edge group when the foil was péka:te, and lower in the right-edge group when the foil was peka:té. This effect was especially pronounced for pattern LLH in the left-edge group, where a target like kopeté: was rejected significantly more often when the foil had initial stress, kópete:.
In the Novel Patterns condition, we observed a preference for stress on the leftmost of two heavy syllables in the left-edge group for all patterns (HHL, HLH and LHH), a bias which shows the influence of the trained edge. In contrast, there was no obvious effect of the trained edge in the right-edge group; when asked to indicate whether stress on the first or the second of two long Vs was more characteristic of the AL, the outcome revealed no clear preference, overall. However, when performance was examined across participants, a clear difference between the two groups was again observed: training on the left edge predicted a left-edge bias (21 of the 24 left-trained participants), whereas right-edge training did not predict a preference for either edge; participants were evenly divided between a bias favoring one edge or the other. The indeterminate outcome in the right-edge group was interesting, given that the findings in the Trained Patterns condition indicated that these participants had clearly learned both the QI and the QS rules, and had shown limited evidence of a bias favoring the trained edge when a single long V was present.

Can the outcomes be explained by native language experience? A more detailed look
We expect native language experience to have an effect on L2 learning, including in ALL studies. L1 effects on learnability are interesting in their own right and worthy of future investigation. In this section, we try to pick apart the possible influence of experience with English stress patterns and the effects of exposure to AL forms in the study. Where might biases due to AL exposure be observed in our data, and which patterns are not so easily explained in these terms?
Native language characteristics can influence L2 learning at every level of the grammar, including the acquisition of prosody (e.g. Portin et al. 2008;Vainio et al. 2014;Rasier & Hiligsmann 2007), and L1 influences can be either positive or negative. While some research has shown that phonological "wordlikeness" in the L1 improves vocabulary learning in ALL studies (as in other L2 learning), other research supports a "low-wordlike" advantage, especially at the initial stages of learning, as novelty can make less word-like items more salient (Bartolotti & Marian 2017, and works cited therein). Learners are influenced by L1 stress characteristics in acquiring both novel L1 vocabulary and in straightforwardly perceptual tasks. Sulpizio & McQueen (2011) show that Italian speakers' abstract knowledge of lexical stress patterns in the native language facilitated their learning and recognition of novel L1 words. In the literature on perception, research on stress deafness has shown that speakers of languages with predictable or no word-level stress (e.g. Finnish, Hungarian, French) have more difficulty perceiving the stress contrast in nonce words than speakers of languages with unpredictable stress, such as Spanish (Peperkamp et al. 2010, and references cited there). 11 As for our participants, two different kinds of information available to native English speakers could have influenced the study's outcomes. The first would be frequency statistics about word stress position, both in speech and in the lexicon. The second would be native speakers' intuitive knowledge of English stress rules. Metrical analyses since Chomsky & Halle (1968), at least, have claimed that although English stress patterns are partially lexicalized, rules positioning primary stress in all word classes operate mostly at the right edge.
To have a sense for the types of words native English speakers actually encounter, we conducted a simple survey of primary stress in nouns in the CELEX database, using the Reelex interface, Version 0.4.5 (http://web.phonetik.uni-frankfurt.de/simplex.html). 12 Table 3 provides information for underived nouns and Table 4 for morphologically complex nouns. The information in Tables 3 and 4 paints a striking picture of asymmetries in word length and stress position in the data English speakers are exposed to. In Table 3, we see that monosyllables account for 80.02% and bisyllables another 17.8% of tokens in the CELEX database for monomorphemic nouns. The lexical counts (distinct words) are less extreme (44.86% for monosyllables and 39.92% for bisyllables). Trisyllables account for a paltry 2.08% of monomorphemic tokens and a larger share, 12.96% of lexical nouns in CELEX.
The counts for monosyllables in Table 4 are dramatically lower; this is expected, given that inflectional and derivational suffixes commonly add one or more syllables. Bisyllables account for 31.47% of tokens and 31.09% of lexical words. The percentage of derived trisyllables is slightly higher, 36.70% of tokens and 36.07% of distinct words. Words of more than three syllables account for only 3% of distinct words and about .01% of tokens in Table 3. The percentages of derived words (Table 4) longer than three syllables are higher: 32.20% of distinct words and 28.7% of tokens.
11 In English, the distribution of primary and secondary stress is generally predictable but varies depending on the positions of heavy syllables. English speakers are clearly able to perceive stress contrasts.
12 We are grateful to Henning Reetz for providing this resource.
When stress position is taken into account, we find that bisyllables overwhelmingly have initial stress: 90.46% of lexical words and 91.92% of tokens in Figure 10; and 90.1% of lexical words and 93.99% of tokens in Table 4. Table 3 shows that among the monomorphemic trisyllabic lexical words, the ratio of initial to medial stress is roughly 2:1, and only 6.8% of lexical words have final stress. Among trisyllabic tokens, initial stress predominates (80.29% of tokens), with medially stressed tokens running far behind (15.82%) and tokens with final stress a distant third (3.89%). The disparities among morphologically complex trisyllables (Table 4) are less pronounced, but here initial stress still predominates, among lexical words at least (67.86%). Among tokens, the percentage of medially stressed trisyllables is higher, 55.36%, compared to trisyllables with initial stress, 42.01%.
Counting only monomorphemic and morphologically complex nouns in CELEX, 75.27% are monomorphemes. Having no basis to infer morphological complexity (or take account of other special circumstances), we assumed that participants in our study would treat the trisyllabic exposure stimuli as morphologically underived and paradigmatically regular; the figure just cited suggests that our assumption is reasonable. 13 If the CELEX counts (which are actually based on print sources) are at all representative of speech, then English speakers' "language-in-use" exposure to nouns is overwhelmingly skewed toward words of fewer than three syllables, and the predominance of word-initial main stress is clear, both in the lexical and the token count.
The initial stress bias indicated by our CELEX counts might seem to predict the success of the group in our study who learned the left-edge QI rule, but it does not predict the success of the other group in learning the right edge. One way to think about this is that high-frequency patterns do not account for everything a language user learns, and a relatively small number of forms, it would seem, can provide the data needed to form a different generalization. It is generally accepted that the distribution of stress in English is mostly rule governed, although complicated by factors such as contextual quantity-sensitivity, lexical class and morphological conditioning, and the existence of subpatterns and exceptions which may be associated with loan status, for example (Chomsky & Halle 1968; Hammond 1984; Kager 1989). 14 The counts in Tables 3 and 4 do not take account of differences in syllable weight and the positions of heavy syllables. In many initially stressed words, the first syllable is heavy (e.g. fountain), and the rule for English would be that heavy syllables in other positions also attract main stress (for example, giráffe, agénda, Sàskatóon). Phonologists argue that English has a right-edge stress rule (Chomsky & Halle 1968; Hammond 1984; Kager 1989; Domahs et al. 2014). Primary word stress is restricted to a word-final trisyllabic window, and which syllable in the trisyllabic window has primary stress depends on factors including syllable quantity and extrametricality, which varies depending on lexical class and derived vs. underived status (e.g., Hayes 1982). Initial stress in a trisyllable (such as the LLL and HLL patterns in our study) is consistent with a word-final trisyllabic stress window, and indeed, we note that the right-edge location of the window can only be learned based on long words (e.g. hàmamèlidánthemum, sùpercàlifràgilìsticsèxpiàlidócious), which, as the CELEX counts attest, are low frequency in English.
It is also clear from bisyllables and words of four or more syllables in which multiple syllables are stressed (pòntóon, Mìnnesóta) that the primary stress in English is located by a right-edge rule. 15 Given arguments that at least the English main stress rule works from the right word edge, we think that a right-edge learning advantage could have been considered as a competing hypothesis. If there are reasons to think that English speakers might be sensitive to either the right or the left word edge in stress positioning, what might our results have to tell us about the kind of implicit L1 knowledge that may have influenced participants' stress pattern learning in our study: knowledge of phonological rules? Or knowledge of frequency statistics relating to the distribution of stressed syllables in words (the initial stress bias indicated by the CELEX counts)? These issues cannot be completely teased apart in a study in which all stimuli were trisyllabic: since English's phonological rules allow primary stress to appear no further to the left than the antepenultimate syllable, demonstrating an influence of the phonological generalization would require observations based on stimuli longer than three syllables. An ALL study that interrogates the issues raised in this paragraph would be a useful future undertaking.
13 A future study of this type could control participants' assumptions about lexical class by pairing stimuli with images of objects in the exposure phase, or presenting stimuli in a frame of the type "This is a ____". In either case, the "foreign" words would be learned as nouns.
14 See Hammond (1999) for an excellent description of English word stress patterns and review of the literature.
15 In trisyllables, initial main stress in HLH words (and also LLH forms like Bígelòw, chíckadèe) is the result of a stress-retracting rule (the so-called Nightingale Rule). The medial CVC syllables in nightingale and Arkansas pattern as light between stresses. Word-final CVC syllables also pattern as light; only syllables containing certain vowels and diphthongs attract stress word-finally in nouns.
We can offer suggestions based on the findings of the current study. As noted in section 5.1, the finding that the trained edge in the LLL pattern was learned equally well in the two participant groups is important: it argues against a completely constraining role for L1 biases, at least in the QI condition. Participants' success in learning the left-edge QI rule was not surprising, given the prevalence of the 'LLL pattern in English (e.g. trinity). However, if this dominant native pattern influenced participants' learning, then right-trained participants should not have demonstrated so strong a trained edge effect in the QI condition, given that LL'L is not allowed in English: word-final light syllables are never stressed in English. Moreover, if left-trained participants were relying on implicit L1 knowledge (whether of a right-aligning phonological rule or of more probabilistic distributional statistics), we would have expected to observe larger differences depending on stress position in the foil. Specifically, we would have expected fewer congruent decisions when the foil was L'LL than when it was LL'L; L'LL forms exist as exceptional patterns in English (e.g. banána, vanílla) and should compete with the 'LLL pattern, while the un-English-like LL'L pattern should have been rejected more frequently. In sum, our findings do not support the conclusion that participants' learning of the trained edge in the QI condition was influenced by L1 knowledge of stress positioning; this is contradicted by the success of the right-edge group.
We observed effects of the trained edge in the QS Trained Patterns and Novel Patterns conditions. These were most pronounced in the left-edge group, but we argue that they were also present in the right-edge group. We expected success in the QS Trained Patterns condition to be above chance regardless of trained edge because although the phonetic exponents of stress in our study differed from English, underived trisyllabic nouns with a single heavy syllable with primary stress in any position are familiar to English speakers (e.g. góndola, agénda, Sàskatóon). In the left-edge group, as noted, we did find a significant left-edge advantage for the QS pattern HLL compared to patterns LHL and LLH, and in the Novel Patterns condition, forms with stress on the first of the two heavy syllables were chosen significantly more often than chance.
Importantly, words in which only one of two heavy syllables attracts stress do not exist in English. However, if participants were influenced by their L1 knowledge of primary stress patterns, then the left-edge participants' bias favoring stress on the leftmost heavy syllable in the Novel Patterns condition might be expected, at least for the HLH and LHH patterns. Among underived trisyllables with a HLH pattern, stress on the first heavy syllable is the general rule, as in níghtingàle and Árkansàs (Hayes 1982; Hammond 1999), although forms with main stress on the second H also exist (e.g. kàngaróo, Tìmbuktú). While the LHH pattern seems to be less common (perhaps in part because the set of syllables patterning as heavy in final position is more restricted than in nonfinal position), main stress naturally falls on the first H (e.g. stalágmìte; Nanáimò, a place name in British Columbia). 16 However, this explanation is not available for the HHL pattern; in words of this shape, English's rules place main stress on the second H (e.g. bàndána, Hèlsínki, Phìlémon).
What is also not easily explained in terms of an influencing role for L1 stress knowledge is the absence of any overall bias in the right-edge group. There were no significant differences across Patterns in either the Trained or the Novel Patterns condition in the right-edge group, and in the Novel Patterns condition, there was no group-level preference for stress on either the leftmost or rightmost H. Our interpretation of the outcomes we observed in the left-edge group is that an initial stress bias for which there is some evidence in English may have enhanced the effects of left-edge training in our study, even though the initial stress bias is stronger in English LLL and HLL words than in LHL words. (It should be noted that in LLH nouns in English, the initial syllable is assigned stress, whether primary, as in chíckadèe, or secondary, as in Sàskatóon.) Left-edge training in the study may have made these participants more sensitive to vowel length in initial position than elsewhere, but did nothing to increase their sensitivity to vowel length in final position, where they may have discounted increased duration, which we have suggested is related to the expectation of preboundary lengthening. We do not see any obvious influence of L1 stress patterns in the right-edge outcomes for the QI and QS Trained Patterns, but we do see modest effects of L2 training in the study. The absence of significant differences by Pattern in the right-edge group suggests that right-edge training may have worked against participants' knowledge of native English stress patterns and perhaps, to some extent, against a natural tendency to discount final vowel length. Harder to explain is the greater variability at the participant level in the Novel Patterns condition in the right-edge group: as many participants preferred stress on the first as on the second of the two heavy syllables in HHL, HLH and LHH items.
This outcome suggests neither a strong influence of the trained right edge nor, at least not clearly, an influence of participants' implicit knowledge of how trisyllables with multiple heavy syllables are assigned stress in English.
Anticipating the discussion in section 6, we note that the by-participant variability in the Novel Patterns condition may reflect a more general dispreference for stress on the last of a series of heavy syllables, when there is a choice. We recognize the need for caution in raising the possibility of general analytic biases, given that only English speakers were tested in our study. On the one hand, we would argue that testing the learning of opposing characteristics of stress patterns among speakers of the same language is a good starting place for studies of this kind: given that the participants' exposure to native stress patterns was presumably comparable in both groups, it is possible to draw careful conclusions about differences based on training. However, we would emphasize that this study is a first step, and achieving a fuller understanding of general biases will require studies with speakers of languages whose prosodic characteristics differ.
Summarizing, our results point to L2 training effects and the possible influence of knowledge about native stress patterns in the left-edge group. Our interpretation of the outcomes in the left-edge group is that an initial stress bias which exists in English and left-edge training together may be responsible for a left-edge bias favoring trisyllables with stress on an initial heavy syllable and on the first of two heavy syllables. Importantly, not all of the ways in which the left-edge bias was revealed in the left-edge group can be explained in terms of L1 biases, and we conclude that we have demonstrated effects of exposure to the AL pattern. We did not see clear evidence for L1 biases in the results for the right-edge group. We observed a strong effect of the trained right edge for the QI pattern and moderate effects of the trained edge in right-trained participants' biases for the QS patterns.
2014; Tabain et al. 2013). The general motivation for this study, as a first step, is the idea that naturalness in sound patterning (in this case, phonological stress) can also be studied by conducting ALL experiments, by now a well-established methodology in laboratory phonological research. In our study, participants were exposed to new patterns and tested to see both how well they recognized a form they had previously heard when it was presented together with a different stress pronunciation, and how well they generalized the exposure patterns to novel forms. If participants' ability to perform these cognitive tasks is superior for one pattern, as compared with another (possibly opposing) pattern, then there may be grounds for considering the first pattern to be more natural than the second, in that sense. We believe that information from studies of this kind can potentially be used to support or challenge conclusions about natural stress patterning based on typological frequency statistics.
It is important for us to signal that we cannot know for certain that the participants in our study learned a stress pattern per se; the most we can do is make inferences about what they perceived based on their responses in cognitive tasks. Our participants learned a prominence-based pattern which was modelled after stress patterns found in languages, and the cues used to signal prominence were typical of stress cues documented in many languages (see references cited in the early sections of the paper). We believe that any biases observed in a study like ours, if they can be teased apart from biases related to native language experience, can shed light on basic preferences that over time, may have contributed to shaping the stress typology in human languages.
We were interested in preferences relating to the edge alignment and quantity sensitivity parameters. Typological information suggests that in QI stress systems, both iterative and noniterative main stress occurs more frequently on word-initial syllables than closer to the right edge (see Figures 1 and 2). As noted earlier, a general bias favoring word-initial stress makes sense, given its utility for the word segmentation task; some researchers have argued that English speakers associate stressed syllables with beginnings of new words (Cutler & Carter 1987). The typological left-edge asymmetry is weaker in QS stress systems. As noted in section 1.1, Gordon (2016: 207) observes a different balance in a set of 48 languages with iterative QS stress. The origin of footing was at the left edge in half of these languages, and at the right edge in the other half. Regardless of the direction of footing (left vs. right), the primary stress was positioned at the right edge in 75% of the languages in the same sample (36 of 48 languages). This asymmetry hints at a dependency between weight sensitivity and alignment that provided a rationale for our study.
The default-to-opposite/default-to-same mini-typology that we have used as a point of reference in this study provides evidence of a different kind of dependency between edge alignment and weight-sensitivity, but in unbounded stress systems. These patterns were first introduced schematically in Figure 3. Figure 10 provides lists of languages which are reported to exemplify the patterns in this mini-typology (Hayes 1995; Gordon 2000; Walker 1997; and Heinz 2009, citing Bailey 1995). 18 These lists bring together the more straightforward cases discussed in the literature, 19 and we note again that there are questions about the reliability or consistency of the sources from which these lists are drawn. A case in point is Khalkha Mongolian, which is not included because it has been variably described as a right-orienting default-to-same edge system (quadrant (c)) by Street (1963) and Walker (1997), but as a default-to-opposite edge system (quadrant (d)) by Bosson (1964) and Poppe (1970).
18 Some of the languages in Figure 10 have additional contingencies. For example, some (at least Kashmiri, Buriat, Classical Arabic, and Northwestern Mari) do not allow stress on a heavy syllable in final position.
19 Other, somewhat more complicated examples in dominant/recessive accentual systems such as Abzhuy Abkhaz (Spruit 1986) and Cupeño (Hill & Hill 1968; Crowhurst 1994) also exist. Primary sources for the languages in Figure 10 can be found in the works cited.
Figure 10: Languages with default-to-same edge and default-to-opposite edge stress systems. (In some systems, the special syllable may be characterized by some quality other than weight.)
Given the above concerns, any observations based on information from so small a sample can only be provisional: languages with these patterns are not widely attested, and some of the languages in Figure 10 are related varieties (spoken in regions of the former Soviet Union). Nonetheless, we observe that default-to-same patterns predominate over default-to-opposite patterns in this set, and the main gap in the typology is in quadrant (b): patterns with a right-edge QI and a left-edge QS rule seem to be less common.
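The four logically possible unbounded patterns in this mini-typology can be stated compactly as a procedure. The following Python sketch is illustrative only: the function names, the edge labels, and the representation of a word as a string of `H` (heavy) and `L` (light) syllables are assumptions introduced here, not part of the study's materials or of any cited analysis.

```python
def assign_stress(word: str, qs_edge: str, qi_edge: str) -> int:
    """Return the 0-based index of the main-stressed syllable.

    word    -- string of 'H' (heavy) and 'L' (light) syllables, e.g. 'LHH'
    qs_edge -- 'left' or 'right': which heavy syllable is stressed if any exist
    qi_edge -- 'left' or 'right': default edge when all syllables are light
    (All names here are hypothetical, chosen for illustration.)
    """
    if not word or set(word) - {"H", "L"}:
        raise ValueError("word must be a non-empty string of 'H' and 'L'")
    for edge in (qs_edge, qi_edge):
        if edge not in ("left", "right"):
            raise ValueError("edges must be 'left' or 'right'")
    heavy = [i for i, syllable in enumerate(word) if syllable == "H"]
    if heavy:
        # Quantity-sensitive rule: leftmost or rightmost heavy syllable.
        return heavy[0] if qs_edge == "left" else heavy[-1]
    # Quantity-insensitive default: the syllable at the designated word edge.
    return 0 if qi_edge == "left" else len(word) - 1


def system_type(qs_edge: str, qi_edge: str) -> str:
    """Classify a pair of edge settings in the terms used in the text."""
    return "default-to-same" if qs_edge == qi_edge else "default-to-opposite"


if __name__ == "__main__":
    for qs in ("left", "right"):
        for qi in ("left", "right"):
            positions = {w: assign_stress(w, qs, qi) for w in ("LLL", "HLH", "LHH")}
            print(system_type(qs, qi), f"QS={qs}", f"QI={qi}", positions)
```

On this formulation, the two settings are logically independent, so any dependency between them must come from the learner or from typology rather than from the rule format itself.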
Turning to the results of our study, it is not possible to infer a natural preference for left- or right-aligning stress in the QI context; participants learned the QI rule equally well, regardless of edge exposure. However, at the level of the system in which both the QI and QS rules operated, it is possible to say more. The decision patterns of 69.4% of our participants (34 of 49) were consistent with a default-to-same edge pattern. Of the 25 participants who learned a left-edge QI rule, 22 inferred a left-edge QS pattern when given the opportunity to do so. Only 3 of the 25 participants with left-edge training inferred a default-to-opposite stress system. The picture was quite different for the participants who learned the right-edge QI rule. Twelve of these participants inferred a right-aligning QS rule, and 12 a left-aligning one.
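The proportions in this paragraph can be cross-checked from the reported counts. The short Python sketch below assumes that the left-trained group comprised 25 participants (22 + 3) and the right-trained group 24 (12 + 12); the variable names are introduced here for illustration.

```python
# Participants by trained QI edge (outer key) and inferred QS edge (inner key),
# using the counts reported in the text.
counts = {
    "left": {"left": 22, "right": 3},
    "right": {"left": 12, "right": 12},
}

total = sum(sum(group.values()) for group in counts.values())
same_edge = sum(counts[edge][edge] for edge in counts)  # QI edge == QS edge
opposite_edge = total - same_edge
share_same = round(100 * same_edge / total, 1)

print(total, same_edge, opposite_edge, share_same)
```

Under these assumptions the default-to-same share works out to 34 of 49 participants, which matches the reported 69.4%, and the default-to-opposite group numbers 15.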
Summarizing, just over two-thirds of our participants showed patterns consistent with a default-to-same edge pattern, with the left QI/left QS option dominating. The smaller share of the participants were consistent with a default-to-opposite pattern. Interestingly, the asymmetry in this group did not turn out as one would expect if the typological information reflected in Figure 10 were reliable: the default-to-opposite pattern most prevalent in our data was seen with participants who learned the right-edge QI rule and inferred a left-edge QS rule. 20 Taking our results as suggestive, and without making firm claims about universality, how could we think about asymmetries in the decision behavior of our participants in language processing terms? And how can we relate the default-to-same and default-to-opposite patterns we are discussing to notions of complexity in formal terms?
Beginning with the formal, it is worth noting that default-to-same patterns are easier to model in optimality-theoretic terms (Prince & Smolensky 2004) insofar as these analyses require only simple constraints and no ambiguity in constraint rankings. The left QI/left QS pattern requires a dominant constraint *StressedL, which punishes stress on light syllables, and requires the gradient alignment constraint Align-Left (assign a cost to every syllable standing between the main stressed syllable and the left word-edge) to be ranked above its mirror-image right-edge counterpart. The result of this constraint ranking is shown in Tables 5 and 6, for QI and QS forms, respectively.
Default-to-opposite edge patterns are more complex to analyze in OT because they require a dependency which can be modelled as a Boolean conjunction: *StressedL is conjoined with one of the alignment constraints and the conjunction is violated only when both conjuncts are violated. Align-Left and Align-Right are ranked below the conjunction. Some replication of constraints and ambiguity in their rankings is involved because whichever of the alignment constraints is conjoined with *StressedL, its mirror image must dominate in the hierarchy below. Tables 8 and 9 model the analysis for the default-to-opposite pattern which was most consistent with the decision behavior for the participants in our study, the right QI/left QS pattern.
We caution yet again that we cannot make too much of the difference between the lengths of the lists in Figure 10 and the decision patterns we observed in our study, given uncertainties about the extent to which the information in Figure 10 can be considered representative.
One of our goals at the outset was to probe for whether learning asymmetries in our ALL study would reflect dependencies between quantity sensitivity and the aligning edge.
One of our main questions was this: if participants learn a QI edge-aligning pattern, and are then given a task requiring them to infer the aligning edge for a QS pattern, will they choose the same edge, or will they choose the other edge? In the first (same) case, there is no obvious dependency between the two parameters, but in the second there would be, if adding a weight-sensitive dimension changed alignment preferences. We have seen in the OT models above that it is easier to model the no-dependency (default-to-same edge) patterns, and harder to model the default-to-opposite patterns in which there are dependencies.
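The contrast between the two kinds of ranking can be made concrete with a small evaluator. The Python sketch below is a minimal illustration under stated assumptions, not a reproduction of the paper's tableaux: candidates are taken to be the possible positions of a single main stress, words are strings of `H` and `L` syllables, evaluation is by strict domination (lexicographic comparison of violation profiles), and plain *StressedL is left out of the default-to-opposite ranking because the text does not place it explicitly. All function names are hypothetical.

```python
from typing import Callable, Sequence

# A constraint maps (word, index of stressed syllable) to a violation count.
Constraint = Callable[[str, int], int]


def stressed_light(word: str, i: int) -> int:
    """*StressedL: one violation if the stressed syllable is light."""
    return 1 if word[i] == "L" else 0


def align_left(word: str, i: int) -> int:
    """Align-Left (gradient): syllables between the stress and the left edge."""
    return i


def align_right(word: str, i: int) -> int:
    """Align-Right (gradient): syllables between the stress and the right edge."""
    return len(word) - 1 - i


def conjoin(c1: Constraint, c2: Constraint) -> Constraint:
    """Boolean conjunction: violated only when both conjuncts are violated."""
    def conjunction(word: str, i: int) -> int:
        return 1 if c1(word, i) > 0 and c2(word, i) > 0 else 0
    return conjunction


def optimal_stress(word: str, ranking: Sequence[Constraint]) -> int:
    """Return the stress position whose violation profile is best under the ranking."""
    return min(range(len(word)), key=lambda i: tuple(c(word, i) for c in ranking))


# Default-to-same (left QI / left QS): *StressedL >> Align-Left >> Align-Right
LEFT_LEFT = [stressed_light, align_left, align_right]

# Default-to-opposite (right QI / left QS):
# [*StressedL & Align-Right] >> Align-Left >> Align-Right
RIGHT_QI_LEFT_QS = [conjoin(stressed_light, align_right), align_left, align_right]

if __name__ == "__main__":
    for word in ("LLL", "HLL", "LHL", "LLH", "HLH", "LHH", "HHL"):
        print(word, optimal_stress(word, LEFT_LEFT), optimal_stress(word, RIGHT_QI_LEFT_QS))
```

In this sketch the default-to-same ranking uses only the three simple constraints, whereas the default-to-opposite ranking needs the conjoined constraint in addition to both alignment constraints, which is the sense in which the text describes it as harder to model.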
Asymmetries in the decision patterns of participants in our study leaned in favor of the no-dependency default-to-same edge patterns, and this was overwhelmingly the case for participants who learned the left-edge QI rule. Furthermore, for the smaller number of participants whose response asymmetries were consistent with a default-to-opposite pattern, the QS rule was in most cases (12 of 15 participants) the left-aligning one. Tendencies observed in studies such as this one are probabilistic and as such, are not so easily modelled in optimality-theoretic terms. We can, perhaps, make sense of these tendencies if we turn again to the basic word segmentation problem. A left-aligning, default-to-same edge bias fits what we have said earlier about the utility of prominence cues (stress) in signaling new words. The ability to learn a right-edge QI rule (as our participants did) confirms that humans can use prominence cues demarcatively to identify word endings (and of course, there are languages with demarcative final stress). What is interesting is that when given the opportunity to infer a QS edge-aligning rule, half of our right-trained participants chose the left.
Packaging these observations in terms of the issue of dependencies, we found strong evidence for default-to-same edge patterns, with no particular dependencies between edge alignment and quantity sensitivity. The most prevalent tendency we observed was for the left-aligning default-to-same edge patterns. Where the asymmetries in our participants' decision patterns did seem to suggest a dependency (the default-to-opposite cases), the dependency favored a left-aligning QS rule. We believe that our results may suggest a dispreference for dependencies, in this type of case, but also strong evidence for positioning prominence cues closer to the beginnings of words. This outcome is not completely predicted by stress patterns in English, the participants' native language.
The focus of the current study has been on edge alignment in a set of prominence-based patterns which can be related to stress patterns occurring in languages. However, alignment is an important component of other types of phonological patterning. In one common type of tone pattern, H tone associates to a tone-bearing unit near the left edge of a verbal constituent and spreads rightward (e.g., Shona, Myers 1990). Tones do not typically align at the right and spread to the left. Mimetic palatalization in Japanese presents a segmental example of the pattern in Figure 10, quadrant (d): in mimetic forms, the rightmost non-rhotic coronal C is palatalized, and if there are no coronals, then the leftmost C is palatalized (Mester & Itô 1989). Future studies using the ALL paradigm could fruitfully investigate and compare the naturalness of edge alignment options in different phonological domains.
In conclusion, our findings indicate that adult English speakers can learn a prominence-based pattern that differs from stress in their native language, but that shared features with L1 stress patterns (in this case, aligning to the left) may make an exposure pattern easier to learn. The outcomes of our study also broadly suggest that while adults can learn a left-aligning and a right-aligning quantity-insensitive prominence-based pattern equally well, there may be a greater tendency to infer a left-aligning than a right-aligning quantity-sensitive stress pattern when learners have the opportunity to choose. In seeking to study learning asymmetries, our approach in this study has been to teach speakers of the same native language prominence-based patterns characterized by opposing features, and we believe that there is value in this approach. We believe that ultimately, learning asymmetries observed in a study like ours, if they can be teased apart from biases related to the native language, may be a source of information about natural stress patterning. In writing of learning asymmetries, or that one stress pattern may be more readily learned in the ALL context, we make no claims about the ease with which children might learn one pattern as opposed to another in the L1 context. What we are pointing out is that asymmetries in response behavior in psycholinguistic studies can potentially reveal cognitive biases which all humans have and which, over time, may have contributed to shaping the stress typology. This study represents a beginning in a more extended program of ALL research on stress patterns, and we expect that any firm claims of connection between cognitive biases and stress patterning in human languages must be informed by the results of similar studies conducted with speakers of diverse languages.
Note: Results are averaged over the levels of Foil. Degrees-of-freedom method: Kenward-Roger. P-value adjustment: Tukey method for comparing a family of 8 estimates.