Wikisözlük
tkwiktionary
https://tk.wiktionary.org/wiki/Ba%C5%9F_Sahypa
MediaWiki 1.47.0-wmf.7
case-sensitive
Media
Ýörite
Çekişme
Ulanyjy
Ulanyjy çekişme
Wikisözlük
Wikisözlük çekişme
Faýl
Faýl çekişme
MediaWiki
MediaWiki çekişme
Şablon
Şablon çekişme
Ýardam
Ýardam çekişme
Kategoriýa
Kategoriýa çekişme
TimedText
TimedText talk
Module
Module talk
Event
Event talk
Wikisözlük:Interfeýs administratorlary
4
8163
27697
2026-06-21T14:00:55Z
Umarxon III
2840
Sahypa döretdi, mazmuny: ''''Interfeýs administratorlary''' — CSS we JavaScript bilen ýazylan Wikisözlüknyň hyzmat sahypalaryny (MediaWiki: Common.js, MediaWiki: Vector.css we [[Special:Gadjets]]) sanawynda görkezilen gadjet sahypalaryny redaktirlemäge hukugy bolan ulanyjylardyr. Bu sahypalarda Wikisözlüknyň mazmunynyň görkezilişini üýtgetmek, sahypalaryň özüni alyp barşyny üýtgetmek ýa-da çylşyrymly gurallary döretmek üçin ähli Wikisözlük redaktorlarynyň...'
27697
wikitext
text/x-wiki
'''Interface administrators''' are users who have the right to edit Wikisözlük's service pages written in CSS and JavaScript (such as MediaWiki:Common.js, MediaWiki:Vector.css, and the gadget pages listed at [[Special:Gadgets]]). These pages contain program code that runs in the browsers of all Wikisözlük editors and readers in order to change how Wikisözlük content is displayed, change how pages behave, or build complex tools. Interface administrators can also edit other pages in the MediaWiki namespace, as well as other users' scripts and styles, with the owner's consent or when there are technical problems with how they work.
== Grounds for granting the flag ==
Because the interface administrator flag requires a high level of technical competence as well as a high level of community trust, the candidate must already hold the administrator, bureaucrat, or engineer flag. In exceptional cases, the interface administrator flag may be granted, to solve temporary tasks, to participants of the Russian Wikisözlük who hold this flag in other language editions or Foundation projects, or who have provided technical maintenance for other language editions or projects for a sufficient period of time.
The flag is granted on the page [[Wikisözlük:Interfeýs dolandyryjylaryna haýyş|Requests for interface administrator status]]. In the request, the participant must demonstrate technical competence by any of the means listed in the section "Technical competence of interface administrators". In addition, during the discussion the candidate may be asked questions aimed at testing their technical competence.
The discussion lasts at least one week. After that, the bureaucrats assess the arguments about the candidate's technical competence and the level of community trust in the candidate, and decide whether to grant the flag.
Bureaucrats may grant the flag temporarily in order to deal with changes that negatively affect the operation of Wikisözlük or that were clearly made against consensus. In that case the flag is granted for the period needed to complete these actions.
== Technical competence of interface administrators ==
The technical competence of a participant requesting the interface administrator flag is a mandatory requirement for obtaining it. Competence may be demonstrated in any form:
* Edits to templates and to personal CSS and JS;
* Examples of, or proposals for, changes to gadgets or to site-wide CSS or JS, with a ready implementation;
* Work on third-party resources (for example, GitHub), including in other Foundation projects, that can confirm the participant's technical competence and is clearly linked to that specific participant.
== Submitting requests simultaneously ==
Although only a participant who holds the administrator, bureaucrat, or engineer flag can hold the interface administrator flag, a participant may submit requests for these flags at the same time. In that case, the request for the interface administrator flag is decided last. Because requests for the interface administrator and engineer flags are held as discussions, the participant may submit a single combined request instead of two separate ones.
If a request for the administrator or bureaucrat flag is unsuccessful, the engineer flag may still be granted, provided that the request for the interface administrator flag shows that the requirements for that flag are met and that nothing prevents the candidate from being granted the engineer flag under the Wikisözlük engineers policy, section "Granting and removing the flag" (actions against consensus, and so on).
[[Kategoriýa:Wikisözlük]]
f0ic7npnsozfev5h0bnyxdv9hnev9rq
Wikisözlük:Interfeýs dolandyryjylaryna haýyş
4
8164
27698
2026-06-21T14:03:47Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'Bu sahypa [[Wikisözlük:Interfeýs administratorlarý|interfeýs administratorynyň]] baýdagyny almak üçin haýyşlary galdyrar. <inputbox> type=commenttitle preload=Wikisözlük:Interfeýs dolandyryjylaryna haýyş/Şablon editintro=Şablon:Editintro/IASS page=Wikisözlük:Interfeýs dolandyryjylaryna haýyş default={{U|{{subst<noinclude></noinclude>:REVISIONUSER}}}} buttonlabel=Haýyş goýuň hidden=yes </inputbox>'
27698
wikitext
text/x-wiki
This page is for submitting requests for the [[Wikisözlük:Interfeýs administratorlarý|interface administrator]] flag.
<inputbox>
type=commenttitle
preload=Wikisözlük:Interfeýs dolandyryjylaryna haýyş/Şablon
editintro=Şablon:Editintro/IASS
page=Wikisözlük:Interfeýs dolandyryjylaryna haýyş
default={{U|{{subst<noinclude></noinclude>:REVISIONUSER}}}}
buttonlabel=Haýyş goýuň
hidden=yes
</inputbox>
77upav40mm16bw9155aa68vve24hvh4
27699
27698
2026-06-21T14:07:04Z
Umarxon III
2840
27699
wikitext
text/x-wiki
This page is for submitting requests for the [[Wikisözlük:Interfeýs administratorlary|interface administrator]] flag.
<inputbox>
type=commenttitle
preload=Wikisözlük:Interfeýs dolandyryjylaryna haýyş/Şablon
editintro=Şablon:Editintro/IASS
page=Wikisözlük:Interfeýs dolandyryjylaryna haýyş
default={{U|{{subst<noinclude></noinclude>:REVISIONUSER}}}}
buttonlabel=Haýyş goýuň
hidden=yes
</inputbox>
== [[Ulanyjy:Umarxon III|Umarxon III]] ==
Hello! I have been making useful edits to the Turkmen Wikipedia for a long time. At present, the Wikisözlük in this language has no gadgets. To add them, I need to obtain interface administrator rights. I hope you will support me. [[Ulanyjy:Umarxon III|Umarxon III]] ([[Ulanyjy çekişme:Umarxon III|talk]]) 14:06, 21 June 2026 (UTC).
br7hmc5gbe89eisyzraeto32wd21fmn
Module:ar-stripdiacritics
828
8165
27700
2026-06-21T14:16:26Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local m_str_utils = require("Module:string utilities") local find = m_str_utils.find local gsub = m_str_utils.gsub local U = m_str_utils.char local taTwiil = U(0x640) local waSla = U(0x671) -- diacritics ordinarily removed by entry_name replacements local Arabic_diacritics = U(0x64B, 0x64C, 0x64D, 0x64E, 0x64F, 0x650, 0x651, 0x652, 0x656, 0x670, 0x6DF, 0x6E0, 0x6E1) -- replace alif waṣl with alif -- remove tatweel and diacritics: fathatan, dammatan, kasratan...'
27700
Scribunto
text/plain
local m_str_utils = require("Module:string utilities")
local find = m_str_utils.find
local gsub = m_str_utils.gsub
local U = m_str_utils.char
local taTwiil = U(0x640)
local waSla = U(0x671)
-- diacritics ordinarily removed by entry_name replacements
local Arabic_diacritics = U(0x64B, 0x64C, 0x64D, 0x64E, 0x64F, 0x650, 0x651, 0x652, 0x656, 0x670, 0x6DF, 0x6E0, 0x6E1)
-- replace alif waṣl with alif
-- remove tatweel and diacritics: fathatan, dammatan, kasratan, fatha,
-- damma, kasra, shadda, sukun, subscript alif, superscript (dagger) alif,
-- sifr mustadir, sifr mustatil, variant sukun
local replacements = {
from = {U(0x0671), "[" .. U(0x640) .. Arabic_diacritics .. "]"},
to = {U(0x0627)},
}
local export = {}
function export.stripDiacritics(text, lang, sc)
if text == waSla or find(text, "^" .. taTwiil .. "?[" .. Arabic_diacritics .. "]" .. "$") then
return text
end
for i, from in ipairs(replacements.from) do
local to = replacements.to[i] or ""
text = gsub(text, from, to)
end
return text
end
return export
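--[==[
The stripping logic above can be tried outside MediaWiki. What follows is a minimal
standalone sketch, not this module's implementation: it assumes plain Lua with no
[[Module:string utilities]], and the names `cp`, `removable` and `strip_diacritics`
are hypothetical. It omits the early return for a lone diacritic or a lone alif waṣl.

```lua
-- Encode one code point below U+10000 as UTF-8 (enough for the Arabic block).
local function cp(code)
	if code < 0x80 then
		return string.char(code)
	elseif code < 0x800 then
		return string.char(0xC0 + math.floor(code / 0x40), 0x80 + code % 0x40)
	end
	return string.char(0xE0 + math.floor(code / 0x1000),
		0x80 + math.floor(code / 0x40) % 0x40, 0x80 + code % 0x40)
end

local ALIF, ALIF_WASLA, TATWEEL = cp(0x627), cp(0x671), cp(0x640)

-- Characters to delete: tatweel plus the diacritics listed in the module.
local removable = {[TATWEEL] = true}
for _, code in ipairs {0x64B, 0x64C, 0x64D, 0x64E, 0x64F, 0x650, 0x651, 0x652,
		0x656, 0x670, 0x6DF, 0x6E0, 0x6E1} do
	removable[cp(code)] = true
end

-- Walk the string one UTF-8 character at a time (lead byte plus continuation bytes).
local function strip_diacritics(text)
	local out = {}
	for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
		if ch == ALIF_WASLA then
			out[#out + 1] = ALIF -- alif waṣl becomes a plain alif
		elseif not removable[ch] then
			out[#out + 1] = ch
		end
	end
	return table.concat(out)
end
```

Iterating over whole characters avoids byte-level character classes, which would
split multi-byte UTF-8 sequences in plain Lua.
]==]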
2lgsh3j6uhu231mbh9l3b1ycea09psv
Module:ar-verb
828
8166
27701
2026-06-21T14:18:09Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} --[=[ This module implements {{ar-conj}} and provides the underlying conjugation functions for {{ar-verb}} (whose actual formatting is done in [[Module:ar-headword]]). Author: User:Benwing, from an early version (2013-2014) by User:Atitarev, User:ZxxZxxZ. ]=] --[=[ TERMINOLOGY: -- "slot" = A particular combination of tense/mood/person/number/etc. Example slot names for verbs are "past_1s" (past tense first-person singular), "juss_pass_3...'
27701
Scribunto
text/plain
local export = {}
--[=[
This module implements {{ar-conj}} and provides the underlying conjugation functions for {{ar-verb}}
(whose actual formatting is done in [[Module:ar-headword]]).
Author: User:Benwing, from an early version (2013-2014) by User:Atitarev, User:ZxxZxxZ.
]=]
--[=[
TERMINOLOGY:
-- "slot" = A particular combination of tense/mood/person/number/etc.
Example slot names for verbs are "past_1s" (past tense first-person singular), "juss_pass_3fp" (non-past jussive
passive third-person feminine plural), "ap" (active participle). Each slot is filled with zero or more forms.
-- "form" = The conjugated Arabic form representing the value of a given slot.
-- "lemma" = The dictionary form of a given Arabic term. For Arabic, normally the third person masculine singular past,
although other forms may be used if this form is missing (e.g. in passive-only verbs or verbs lacking the past).
]=]
--[=[
FIXME:
1. Finish unimplemented conjugation types. Only IX-final-weak left (extremely rare, possibly only one verb اِعْمَايَ
(according to Haywood and Nahmad p. 244, who are very specific about the irregular occurrence of alif + yā instead
of expected اِعْمَيَّ with doubled yā). Not in Hans Wehr. NOTE: Not true about this, cf. form IX اِرْعَوَى "to desist,
to repent, to see the light". Also note form XII اِخْضَوْضَرَ = form IX اِخْضَرَّ "to be or become green".
[DONE except for اِعْمَايَ]
2. Implement irregular verbs as special cases and recognize them, e.g.
-- laysa "to not be"; only exists in the past tense, no non-past, no imperative, no participles, no passive, no
verbal noun. Irregular alternation las-/lays-. [IMPLEMENTABLE USING OVERRIDES]
-- istaḥā yastaḥī "be ashamed of" -- this is complex according to Hans Wehr because there are two verbs, regular
istaḥyā yastaḥyī "to spare (someone)'s life" and irregular istaḥyā yastaḥyī "to be ashamed to face (someone)",
which is irregular because it has the alternate irregular form istaḥā yastaḥī which only applies to this meaning.
Currently we follow Haywood and Nahmad in saying that both varieties can be spelled istaḥyā/istaḥā/istaḥḥā, but we
should instead use a variant= param similar to حَيَّ to distinguish the two possibilities, and maybe not include
istaḥḥā.
-- ʿayya/ʿayiya yaʿayyu/yaʿyā "to not find the right way, be incapable of, stammer, falter, fall ill". This appears
to be a mixture of a geminate and final-weak verb. Unclear what the whole paradigm looks like. Do the
consonant-ending parts in the past follow the final-weak paradigm? Is it the same in the non-past? Or can you
conjugate the non-past fully as either geminate or final-weak?
-- اِنْمَحَى inmaḥā or يمَّحَى immaḥā "to be effaced, obliterated; to disappear, vanish" has irregular assimilation of inm-
to imm- as an alternative. inmalasa "to become smooth; to glide; to slip away; to escape" also has immalasa as an
alternative. The only other form VII verbs in Hans Wehr beginning with -m- are inmalaḵa "to be pulled out, torn
out, wrenched" and inmāʿa "to be melted, to melt, to dissolve", which are not listed with imm- alternatives, but
might have them; if so, we should handle this generally. [DONE]
-- يَرَعَ yaraʕa yariʕu "to be a coward, to be chickenhearted" as an alternative form of يَرِعَ yariʕa yayraʕu (as given in
Wehr). [IMPLEMENTABLE USING OVERRIDES]
3. Implement individual override parameters for each paradigm part. See Module:fro-verb for an example of how to do this
generally. Note that {{temp|ar-conj-I}} and other of the older templates already had such individual override params.
[DONE]
Irregular verbs already implemented:
-- [ḥayya/ḥayiya yaḥyā "live" -- behaves like a normal final-weak verb
(e.g. past first singular ḥayītu) except in the past-tense parts with
vowel-initial endings (all the third person except for the third feminine
plural). The normal singular and dual endings have -yiya- in them, which
compresses to -yya-, with the normal endings the less preferred ones.
In masculine third plural, expected ḥayū is replaced by ḥayyū by
analogy to the -yy- parts, and the regular form is not given as an
alternant in John Mace. Barron's 201 verbs appears to have the regular
ḥayū as the part, however. Note also that final -yā appears with tall
alif. This appears to be a spelling convention of Arabic, also applying
in ḥayyā (form II, "to keep (someone) alive") and 'aḥyā (form IV,
"to animate, revive, give birth to, give new life to").] -- implemented
-- [ittaxadha yattaxidhu "take"] -- implemented
-- [sa'ala yas'alu "ask" with alternative jussive/imperative yasal/sal] -- implemented
-- [ra'ā yarā "see"] -- implemented
-- ['arā yurī "show"] -- implemented
-- ['akala ya'kulu "eat" with imperative kul] -- implemented
-- ['axadha ya'xudhu "take" with imperative xudh] -- implemented
-- ['amara ya'muru "order" with imperative mur] -- implemented
--]=]
local force_cat = false -- set to true for debugging
-- if true, always maintain manual translit during processing, and compare against full translit at the end
local debug_translit = false
local lang = require("Module:languages").getByCode("ar")
local m_links = require("Module:links")
local m_string_utilities = require("Module:string utilities")
local m_table = require("Module:table")
local ar_utilities = require("Module:ar-utilities")
local ar_nominals = require("Module:ar-nominals")
local iut = require("Module:inflection utilities")
local put = require("Module:parse utilities")
local pron_qualifier_module = "Module:pron qualifier"
local list_to_text = mw.text.listToText
local rfind = m_string_utilities.find
local rsubn = m_string_utilities.gsub
local rmatch = m_string_utilities.match
local rsplit = m_string_utilities.split
local usub = m_string_utilities.sub
local ulen = m_string_utilities.len
local u = m_string_utilities.char
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local dump = mw.dumpObject
-- Within this module, conjugations are the functions that do the actual
-- conjugating by creating the parts of a basic verb.
-- They are defined further down.
local conjugations = {}
-- hamza variants
local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء
local HAMZA_ON_ALIF = u(0x0623)
local HAMZA_ON_W = u(0x0624)
local HAMZA_UNDER_ALIF = u(0x0625)
local HAMZA_ON_Y = u(0x0626)
local HAMZA_ANY = "[" .. HAMZA .. HAMZA_ON_ALIF .. HAMZA_UNDER_ALIF .. HAMZA_ON_W .. HAMZA_ON_Y .. "]"
local HAMZA_PH = u(0xFFF0) -- hamza placeholder
local BAD = u(0xFFF1)
local BORDER = u(0xFFF2)
-- diacritics
local A = u(0x064E) -- fatḥa
local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn)
local U = u(0x064F) -- ḍamma
local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn)
local I = u(0x0650) -- kasra
local IN = u(0x064D) -- kasratān (kasra tanwīn)
local SK = u(0x0652) -- sukūn = no vowel
local SH = u(0x0651) -- šadda = gemination of consonants
local DAGGER_ALIF = u(0x0670)
local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. "]"
-- Pattern matching short vowels
local AIU = "[" .. A .. I .. U .. "]"
-- Pattern matching short vowels or sukūn
local AIUSK = "[" .. A .. I .. U .. SK .. "]"
-- Pattern matching any diacritics that may be on a consonant
local DIACRITIC = SH .. "?" .. DIACRITIC_ANY_BUT_SH
-- translit_patterns
local vowels = "aeiouāēīōū"
local NV = "[^" .. vowels .. "]"
local dia = {a = A, i = I, u = U}
local undia = {[A] = "a", [I] = "i", [U] = "u", ["-"] = "-"}
-- various letters and signs
local ALIF = u(0x0627) -- ʾalif = ا
local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى
local AMAD = u(0x0622) -- ʾalif madda = آ
local TAM = u(0x0629) -- tāʾ marbūṭa = ة
local T = u(0x062A) -- tāʾ = ت
local HYPHEN = u(0x0640)
local N = u(0x0646) -- nūn = ن
local W = u(0x0648) -- wāw = و
local Y = u(0x064A) -- yāʾ = ي
local S = "س"
local M = "م"
local LRM = u(0x200e) -- left-to-right mark
-- common combinations
local AH = A .. TAM
local AT = A .. T
local AA = A .. ALIF
local AAMAQ = A .. AMAQ
local AAH = AA .. TAM
local AAT = AA .. T
local II = I .. Y
local UU = U .. W
local AY = A .. Y
local AW = A .. W
local AYSK = AY .. SK
local AWSK = AW .. SK
local NA = N .. A
local NI = N .. I
local AAN = AA .. N
local AANI = AA .. NI
local AYNI = AYSK .. NI
local AWNA = AWSK .. NA
local AYNA = AYSK .. NA
local AYAAT = AY .. AAT
local UNU = "[" .. UN .. U .. "]"
local MA = M .. A
local MU = M .. U
local TA = T .. A
local TU = T .. U
local _I = ALIF .. I
local _U = ALIF .. U
local translit_cache = {
-- hamza variants
[HAMZA] = "ʔ",
[HAMZA_ON_ALIF] = "ʔ",
[HAMZA_ON_W] = "ʔ",
[HAMZA_UNDER_ALIF] = "ʔ",
[HAMZA_ON_Y] = "ʔ",
[HAMZA_PH] = "ʔ",
-- diacritics
[A] = "a",
[AN] = "an",
[U] = "u",
[UN] = "un",
[I] = "i",
[IN] = "in",
[SK] = "",
[SH] = "*", -- handled specially
[DAGGER_ALIF] = "ā",
-- various letters and signs
[""] = "",
[ALIF] = BAD, -- we should never be transliterating ALIF by itself, as its translit in isolation is ambiguous
[AMAQ] = BAD,
[AMAD] = "ʔā",
[TAM] = "",
[T] = "t",
[N] = "n",
[W] = "w",
[Y] = "y",
[S] = "s",
[M] = "m",
[LRM] = "",
-- common combinations
[AH] = "a",
[AT] = "at",
[AA] = "ā",
[AAMAQ] = "ā",
[AAH] = "āh",
[AAT] = "āt",
[II] = "ī",
[UU] = "ū",
[AY] = "ay",
[AW] = "aw",
[AYSK] = "ay",
[AWSK] = "aw",
[NA] = "na",
[NI] = "ni",
[AAN] = "ān",
[AANI] = "āni",
[AYNI] = "ayni",
[AWNA] = "awna",
[AYNA] = "ayna",
[AYAAT] = "ayāt",
[MA] = "ma",
[MU] = "mu",
[TA] = "ta",
[TU] = "tu",
[_I] = "i",
[_U] = "u",
}
local function transliterate(text)
local cached = translit_cache[text]
if cached then
if cached == BAD then
error(("Internal error: Unable to transliterate %s because explicitly marked as BAD"):format(text))
end
return cached
end
local tr = (lang:transliterate(text))
if not tr then
error(("Internal error: Unable to transliterate: %s"):format(text))
end
translit_cache[text] = tr
return tr
end
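--[==[
The cache-with-sentinel pattern used by transliterate() can be sketched on its own.
This is an illustration under stated assumptions, not the module's code:
`slow_transliterate` is a hypothetical stand-in for lang:transliterate(), and the
sentinel here is a unique table rather than a private-use character.

```lua
local BAD = {} -- unique sentinel marking inputs that must never be transliterated
local calls = 0

-- Hypothetical stand-in for the real (expensive) transliterator.
local function slow_transliterate(text)
	calls = calls + 1
	return text:upper()
end

local cache = {
	[""] = "",
	["alif"] = BAD, -- ambiguous in isolation, so refuse rather than guess
}

local function transliterate(text)
	local cached = cache[text]
	if cached ~= nil then
		if cached == BAD then
			error(("Unable to transliterate %s: explicitly marked as BAD"):format(text))
		end
		return cached
	end
	local tr = slow_transliterate(text)
	if not tr then
		error(("Unable to transliterate: %s"):format(text))
	end
	cache[text] = tr -- remember the result so later calls are table lookups
	return tr
end
```

Seeding the cache with a sentinel turns a silent wrong answer into an immediate
error for inputs that are known to be ambiguous.
]==]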
local all_person_number_list = {
"1s",
"2ms",
"2fs",
"3ms",
"3fs",
"2d",
"3md",
"3fd",
"1p",
"2mp",
"2fp",
"3mp",
"3fp"
}
local function make_person_number_slot_accel_list(list)
local slot_accel_list = {}
return slot_accel_list
end
local imp_person_number_list = {}
for _, pn in ipairs(all_person_number_list) do
if pn:find("^2") then
table.insert(imp_person_number_list, pn)
end
end
local passive_types = m_table.listToSet {
"pass", -- verb has both active and passive
"ipass", -- verb is active with impersonal passive
"nopass", -- verb is active-only
"onlypass", -- verb is passive-only
"onlypass-impers", -- verb itself is impersonal, meaning passive-only with impersonal passive
}
local indicator_flags = m_table.listToSet {
"nopast", "no_nonpast", "noimp",
"nocat", -- don't categorize or include annotations about this; useful in suppletive parts of verbs
"reduced", -- verb has assimilation/reduction of initial coronals
"altgem", -- form X with alternative past geminate forms with final-weak endings
}
export.potential_lemma_slots = {"past_3ms", "past_pass_3ms", "ind_3ms", "ind_pass_3ms", "imp_2ms"}
export.unsettable_slots = {}
for _, potential_lemma_slot in ipairs(export.potential_lemma_slots) do
table.insert(export.unsettable_slots, potential_lemma_slot .. "_linked")
end
-- We don't set the active participle directly for form I because we don't want stative verbs (with past vowel i or u)
-- to default to فَاعِل. Instead we set the special slot 'ap1' and later copy it to 'ap' for non-stative verbs. The user
-- meanwhile can explicitly request the فَاعِل form for active participles for stative verbs using `ap:+`.
table.insert(export.unsettable_slots, "ap1") -- primary default فَاعِل for form I active participles
table.insert(export.unsettable_slots, "ap2") -- secondary default فَعِيل for form I active participles (stative I)
table.insert(export.unsettable_slots, "ap3") -- secondary default فَعِل for form I active participles (stative II)
table.insert(export.unsettable_slots, "apcd") -- secondary default أَفْعَل for form I active participles (color/defect)
table.insert(export.unsettable_slots, "apan") -- secondary default فَعْلَان for form I active participles (in -ān)
table.insert(export.unsettable_slots, "pp2") -- secondary default فَعِيل for form I passive participles (same as ap2)
table.insert(export.unsettable_slots, "vn2") -- secondary default فِعَال for form III verbal nouns
export.unsettable_slots_set = m_table.listToSet(export.unsettable_slots)
local default_indicator_to_active_participle_slot = {
["+"] = "ap1",
["++"] = "ap2",
["+++"] = "ap3",
["+cd"] = "apcd",
["+an"] = "apan",
}
local slots_that_may_be_uncertain = {
vn = "verbal noun",
ap = "active participle",
}
-- Initialize all the slots for which we generate forms.
local function add_slots(alternant_multiword_spec)
alternant_multiword_spec.verb_slots = {
{"ap", "act|part"},
{"pp", "pass|part"},
{"vn", "vnoun"},
}
for _, unsettable_slot in ipairs(export.unsettable_slots) do
table.insert(alternant_multiword_spec.verb_slots, {unsettable_slot, "-"})
end
-- Add entries for a slot with person/number variants.
-- `slot_prefix` is the prefix of the slot, typically specifying the tense/aspect.
-- `tag_suffix` is a string listing the set of inflection tags to add after the person/number tags.
-- `person_number_list` is a list of the person/number slot suffixes to add to `slot_prefix`.
local function add_personal_slot(slot_prefix, tag_suffix, person_number_list)
for _, persnum in ipairs(person_number_list) do
local slot = slot_prefix .. "_" .. persnum
local accel = persnum:gsub("(.)", "%1|") .. tag_suffix
table.insert(alternant_multiword_spec.verb_slots, {slot, accel})
end
end
local tenses = {
{"past", "past|%s"},
{"ind", "non-past|%s|ind"},
{"sub", "non-past|%s|sub"},
{"juss", "non-past|%s|juss"},
}
for _, slot_accel in ipairs(tenses) do
local slot, accel = unpack(slot_accel)
for _, voice in ipairs {"act", "pass"} do
add_personal_slot(voice == "act" and slot or slot .. "_pass", accel:format(voice),
all_person_number_list)
end
end
add_personal_slot("imp", "imp", imp_person_number_list)
alternant_multiword_spec.verb_slots_map = {}
for _, slot_accel in ipairs(alternant_multiword_spec.verb_slots) do
local slot, accel = unpack(slot_accel)
alternant_multiword_spec.verb_slots_map[slot] = accel
end
end
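--[==[
To see which slot names and accelerator tags add_slots() produces for the personal
forms, the loops above can be reproduced in isolation. This is a simplified sketch:
`build_slots` is a hypothetical name, and the participle, verbal noun and unsettable
slots are left out.

```lua
local unpack = unpack or table.unpack -- works on Lua 5.1 and on 5.2+

local person_numbers = {"1s", "2ms", "2fs", "3ms", "3fs", "2d", "3md", "3fd",
	"1p", "2mp", "2fp", "3mp", "3fp"}

local function build_slots()
	local slots = {}
	local function add_personal_slot(slot_prefix, tag_suffix, list)
		for _, persnum in ipairs(list) do
			-- "2ms" becomes "2|m|s|"; the parentheses drop gsub's match count.
			local accel = (persnum:gsub("(.)", "%1|")) .. tag_suffix
			slots[slot_prefix .. "_" .. persnum] = accel
		end
	end
	local tenses = {
		{"past", "past|%s"},
		{"ind", "non-past|%s|ind"},
		{"sub", "non-past|%s|sub"},
		{"juss", "non-past|%s|juss"},
	}
	for _, slot_accel in ipairs(tenses) do
		local slot, accel = unpack(slot_accel)
		for _, voice in ipairs {"act", "pass"} do
			add_personal_slot(voice == "act" and slot or slot .. "_pass",
				accel:format(voice), person_numbers)
		end
	end
	-- Imperatives exist only for the second person.
	local imperative = {}
	for _, pn in ipairs(person_numbers) do
		if pn:find("^2") then
			imperative[#imperative + 1] = pn
		end
	end
	add_personal_slot("imp", "imp", imperative)
	return slots
end
```

For example, build_slots().juss_pass_3fp is the tag string
"3|f|p|non-past|pass|juss", and no "imp_3ms" key exists.
]==]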
local overridable_stems = {}
local slot_override_param_mods = {
footnote = {
item_dest = "footnotes",
store = "insert",
},
alt = {},
t = {
-- [[Module:links]] expects the gloss in "gloss".
item_dest = "gloss",
},
gloss = {},
g = {
-- [[Module:links]] expects the genders in "g". `sublist = true` automatically splits on comma (optionally
-- with surrounding whitespace).
item_dest = "genders",
sublist = true,
},
pos = {},
lit = {},
id = {},
-- Qualifiers and labels
q = {
type = "qualifier",
},
qq = {
type = "qualifier",
},
l = {
type = "labels",
},
ll = {
type = "labels",
},
}
local function generate_obj(formval, parse_err, prefix, is_slot_override)
local val, uncertain = formval:match("^(.*)(%?)$")
val = val or formval
uncertain = not not uncertain
local ar, translit = val:match("^(.*)//(.*)$")
if not ar then
ar = val
end
if ar == "" then
if uncertain then
ar = "?"
else
error(("Can't specify blank value for override for %s override '%s'"):format(
is_slot_override and "slot" or "stem", prefix))
end
end
return {form = ar, translit = translit, uncertain = uncertain}
end
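--[==[
The override syntax handled by generate_obj() is "FORM", "FORM//TRANSLIT", or either
of those followed by "?" to mark the value as uncertain. A minimal standalone sketch
of that parsing follows; `parse_override` is a hypothetical name, and the error
message is simplified.

```lua
local function parse_override(formval)
	-- A trailing "?" marks the value as uncertain.
	local val, uncertain = formval:match("^(.*)(%?)$")
	val = val or formval
	-- "FORM//TRANSLIT" carries a manual transliteration after the double slash.
	local form, translit = val:match("^(.*)//(.*)$")
	form = form or val
	if form == "" then
		if not uncertain then
			error("Can't specify blank value for override")
		end
		form = "?" -- a bare "?" means the form itself is unknown
	end
	return {form = form, translit = translit, uncertain = uncertain ~= nil}
end
```

A bare "?" is accepted and yields the form "?", while an entirely empty value is
rejected.
]==]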
local function parse_inline_modifiers(comma_separated_group, parse_err, prefix, is_slot_override)
local function this_generate_obj(formval, parse_err)
return generate_obj(formval, parse_err, prefix, is_slot_override)
end
return put.parse_inline_modifiers_from_segments {
group = comma_separated_group,
props = {
param_mods = slot_override_param_mods,
parse_err = parse_err,
generate_obj = this_generate_obj,
pre_normalize_modifiers = function(data)
local modtext = data.modtext
modtext = modtext:match("^(%[.*%])$")
if modtext then
return ("<footnote:%s>"):format(modtext)
end
return data.modtext
end,
},
}
end
local function allow_multiple_values_for_override(comma_separated_groups, data, is_slot_override)
local retvals = {}
for _, comma_separated_group in ipairs(comma_separated_groups) do
local retval
if is_slot_override then
retval = parse_inline_modifiers(comma_separated_group, data.parse_err, data.prefix, is_slot_override)
else
retval = generate_obj(comma_separated_group[1], data.parse_err, data.prefix, is_slot_override)
retval.footnotes = data.fetch_footnotes(comma_separated_group)
end
table.insert(retvals, retval)
end
for _, form in ipairs(retvals) do
if form.form == "+" or default_indicator_to_active_participle_slot[form.form] then
if form.form ~= "+" and default_indicator_to_active_participle_slot[form.form] and not is_slot_override then
error(("Stem override '%s' cannot use %s to request a secondary default"):format(
data.prefix, form.form))
end
data.base.slot_override_uses_default[data.prefix] = true
end
end
for _, form in ipairs(retvals) do
if form.form == "-" then
data.base.slot_explicitly_missing[data.prefix] = true
break
end
end
if data.base.slot_explicitly_missing[data.prefix] then
for _, form in ipairs(retvals) do
if form.form ~= "-" then
data.parse_err(("For slot or stem '%s', saw both - and a value other than -, which isn't allowed"):
format(data.prefix))
end
end
return nil
end
return retvals
end
local function simple_choice(choices)
return function(separated_groups, data)
if #separated_groups > 1 then
data.parse_err("For spec '" .. data.prefix .. ":', only one value currently allowed")
end
if #separated_groups[1] > 1 then
data.parse_err("For spec '" .. data.prefix .. ":', no footnotes currently allowed")
end
local choice = separated_groups[1][1]
if not m_table.contains(choices, choice) then
data.parse_err("For spec '" .. data.prefix .. ":', saw value '" .. choice .. "' but expected one of '" ..
table.concat(choices, ",") .. "'")
end
return choice
end
end
for _, overridable_stem in ipairs {
"past",
"past_v",
"past_c",
"past_pass",
"past_pass_v",
"past_pass_c",
"nonpast",
"nonpast_v",
"nonpast_c",
"nonpast_pass",
"nonpast_pass_v",
"nonpast_pass_c",
"imp",
"imp_v",
"imp_c",
} do
overridable_stems[overridable_stem] = allow_multiple_values_for_override
end
overridable_stems.past_final_weak_vowel = simple_choice { "ay", "aw", "ī", "ū" }
overridable_stems.past_pass_final_weak_vowel = simple_choice { "ay", "aw", "ī", "ū" }
overridable_stems.nonpast_final_weak_vowel = simple_choice { "ā", "ī", "ū" }
overridable_stems.nonpast_pass_final_weak_vowel = simple_choice { "ā", "ī", "ū" }
-------------------------------------------------------------------------------
-- Utility functions --
-------------------------------------------------------------------------------
-- version of rsubn() that discards all but the first return value
local function rsub(term, foo, bar)
return (rsubn(term, foo, bar))
end
-- version of rsubn() that returns a second value: a boolean indicating whether a substitution was made.
local function rsubb(term, foo, bar)
local retval, nsubs = rsubn(term, foo, bar)
return retval, nsubs > 0
end
-- Concatenate one or more strings or form objects.
local function q(...)
local not_all_strings = debug_translit
local has_manual_translit = debug_translit
for i = 1, select("#", ...) do
local argt = select(i, ...)
if not argt then
error(("Internal error: Saw nil at index %s: %s"):format(i, dump({...})))
end
if type(argt) ~= "string" then
not_all_strings = true
if argt.translit then
has_manual_translit = true
break
end
end
end
if not not_all_strings then
-- just strings, concatenate directly
return table.concat({...})
end
local formvals = {}
local translit = has_manual_translit and {} or nil
local footnotes
for i = 1, select("#", ...) do
local argt = select(i, ...)
if type(argt) == "string" then
formvals[i] = argt
if has_manual_translit then
translit[i] = transliterate(argt)
end
else
formvals[i] = argt.form
if has_manual_translit then
translit[i] = argt.translit or transliterate(argt.form)
end
footnotes = iut.combine_footnotes(footnotes, argt.footnotes)
end
end
-- FIXME: Do we want to support other properties?
return {
form = table.concat(formvals),
translit = has_manual_translit and table.concat(translit) or nil,
footnotes = footnotes,
}
end
-- Return the formval associated with `rad` (a radical or past/non-past vowel, either a string or form object).
local function rget(rad)
if type(rad) == "string" then
return rad
elseif type(rad) == "table" then
return rad.form
else
error(("Internal error: Unexpected type for radical or past/non-past vowel: %s"):format(dump(rad)))
end
end
export.rget = rget -- for use in [[Module:ar-headword]]
-- Return the footnotes associated with `rad` (a radical or past/non-past vowel, either a string or form object).
local function rget_footnotes(rad)
if type(rad) == "string" then
return nil
elseif type(rad) == "table" then
return rad.footnotes
else
error(("Internal error: Unexpected type for radical or past/non-past vowel: %s"):format(dump(rad)))
end
end
-- Return true if the formval associated with `rad` (a radical or past/non-past vowel, either a string or form object)
-- is `val`.
local function req(rad, val)
return rget(rad) == val
end
-- Map `vow` (a past/non-past vowel, either a string or form object without translit) by passing the formval through
-- `fn`. Don't call this on radicals because they may have manual translit and it isn't clear how to handle that.
local function map_vowel(vow, fn)
if type(vow) == "string" then
return fn(vow)
elseif type(vow) == "table" then
return {form = fn(vow.form), footnotes = vow.footnotes}
else
error(("Internal error: Unexpected type for past/non-past vowel: %s"):format(dump(vow)))
end
end
local function get_radicals_3(vowel_spec)
return vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3, vowel_spec.past, vowel_spec.nonpast
end
local function get_radicals_4(vowel_spec)
return vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3, vowel_spec.rad4
end
local function is_final_weak(base, vowel_spec)
return vowel_spec.weakness == "final-weak" or base.form == "XV"
end
local function link_term(text, face, id)
return m_links.full_link({lang = lang, term = text, tr = "-", id = id}, face)
end
local function tag_text(text, tag, class)
return m_links.full_link({lang = lang, alt = text, tr = "-"})
end
local function track(page)
require("Module:debug/track")("ar-verb/" .. page)
return true
end
local function track_if_ar_conj(base, page)
if base.alternant_multiword_spec.source_template == "ar-conj" then
require("Module:debug/track")("ar-verb/" .. page)
end
return true
end
local function reorder_shadda(word)
-- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
-- replaced with short-vowel+shadda during NFC normalisation, which
-- MediaWiki does for all Unicode strings; however, it makes various
-- processes inconvenient, so undo it.
word = rsub(word, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1")
return word
end
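--[==[
The effect of reorder_shadda() can be checked in plain Lua by working on UTF-8 bytes
directly. This sketch assumes no [[Module:string utilities]]; it relies on the fact
that every diacritic involved is a two-byte sequence with lead byte 0xD9 (217). The
constant names are hypothetical.

```lua
local SHADDA = "\217\145" -- U+0651
-- Lead byte 0xD9 followed by the second byte of U+064B..U+0650, U+0652 or U+0670.
local DIACRITIC_BUT_SHADDA = "\217[\139-\144\146\176]"

-- Move a shadda that follows a vowel diacritic so that it precedes it.
local function reorder_shadda(word)
	return (word:gsub("(" .. DIACRITIC_BUT_SHADDA .. ")" .. SHADDA, SHADDA .. "%1"))
end
```

The lead byte 217 can never occur as a continuation byte, so the pattern cannot match
across a character boundary.
]==]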
-------------------------------------------------------------------------------
-- Basic functions to inflect tenses --
-------------------------------------------------------------------------------
local function skip_slot(base, slot, allow_overrides)
if base.slot_explicitly_missing[slot] then
return true
end
if not allow_overrides and base.slot_overrides[slot] and not base.slot_override_uses_default[slot] then
-- Skip any slots for which there are overrides, except those that request the default value using +, ++, etc.
return true
end
if base.passive == "nopass" and (slot == "pp" or slot:find("_pass")) then
return true
elseif base.passive == "onlypass" and slot ~= "pp" and slot ~= "vn" and not slot:find("_pass") then
return true
elseif base.passive == "ipass" and slot:find("_pass") and not slot:find("3ms") then
return true
elseif base.passive == "onlypass-impers" and slot ~= "pp" and slot ~= "vn" and (not slot:find("_pass") or
slot:find("_pass") and not slot:find("3ms")) then
return true
end
if base.nopast and slot:find("^past_") then
return true
end
if base.noimp and slot:find("^imp_") then
return true
end
if base.no_nonpast and (slot:find("^ind_") or slot:find("^sub_") or slot:find("^juss")) then
return true
end
return false
end
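-- Illustrative sketch only, kept inside a long comment so that it adds no code or locals to the module. It isolates
-- the passive-related checks of skip_slot() above to show which slots each `passive` setting suppresses. The helper
-- name `skipped_by_passive` and the sample slot names are assumptions made for this example.
--[==[
local function skipped_by_passive(passive, slot)
	local is_pass = slot:find("_pass") ~= nil
	local is_3ms = slot:find("3ms") ~= nil
	if passive == "nopass" then
		return slot == "pp" or is_pass
	elseif passive == "onlypass" then
		return slot ~= "pp" and slot ~= "vn" and not is_pass
	elseif passive == "ipass" then
		-- impersonal passive: only the third-person masculine singular passive survives
		return is_pass and not is_3ms
	elseif passive == "onlypass-impers" then
		return slot ~= "pp" and slot ~= "vn" and not (is_pass and is_3ms)
	end
	return false
end

print(skipped_by_passive("ipass", "past_pass_3ms"))   --> false
print(skipped_by_passive("ipass", "past_pass_3fs"))   --> true
print(skipped_by_passive("nopass", "pp"))             --> true
print(skipped_by_passive("onlypass", "ind_1s"))       --> true
print(skipped_by_passive("onlypass-impers", "vn"))    --> false
]==]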
local function basic_combine_stem_ending(stem, ending)
return stem .. ending
end
local function basic_combine_stem_ending_tr(stem, ending)
return stem .. ending
end
-- Concatenate `prefixes`, `stems` and `endings` (any of which may be an abbreviated form list, i.e. strings, form
-- objects or lists of strings or form objects) and store into `slot`. If a user-supplied override exists for the slot,
-- nothing will happen unless `allow_overrides` is provided.
local function add3(base, slot, prefixes, stems, endings, allow_overrides)
if skip_slot(base, slot, allow_overrides) then
return
end
-- Optimization since the prefixes are almost always single strings.
if type(prefixes) == "string" then
local function do_combine_stem_ending(stem, ending)
return prefixes .. stem .. ending
end
local function do_combine_stem_ending_tr(stem, ending)
return transliterate(prefixes) .. stem .. ending
end
iut.add_forms(base.forms, slot, stems, endings, do_combine_stem_ending, transliterate,
do_combine_stem_ending_tr, base.form_footnotes)
else
iut.add_multiple_forms(base.forms, slot, {prefixes, stems, endings}, basic_combine_stem_ending, transliterate,
basic_combine_stem_ending_tr, base.form_footnotes)
end
end
-- Insert one or more forms in `form_or_forms` into `slot`. `form_or_forms` is an abbreviated form list (see comment at
-- top of [[Module:inflection utilities]]). If a user-supplied override exists for the slot, nothing will happen unless
-- `allow_overrides` is provided. BEWARE: One form object should never occur in two different slots, or twice in a given
-- slot; if taking a form object from an existing slot, make sure to shallowCopy() it.
local function insert_form_or_forms(base, slot, form_or_forms, allow_overrides, uncertain)
if not skip_slot(base, slot, allow_overrides) then
-- Some optimizations of the most common case of inserting a single string.
if type(form_or_forms) == "string" and not base.form_footnotes then
form_or_forms = {form = form_or_forms, uncertain = uncertain}
iut.insert_form(base.forms, slot, form_or_forms)
else
local list = iut.convert_to_general_list_form(form_or_forms, base.form_footnotes)
if uncertain then
for _, formobj in ipairs(list) do
formobj.uncertain = true
end
end
iut.insert_forms(base.forms, slot, list)
end
end
end
-- Insert `string_or_form` into both the ap2 and pp2 slots, shallowCopying a form object to make sure no form objects
-- occur in two slots.
local function insert_ap2_pp2(base, string_or_form)
insert_form_or_forms(base, "ap2", string_or_form)
if type(string_or_form) == "table" then
string_or_form = m_table.shallowCopy(string_or_form)
end
insert_form_or_forms(base, "pp2", string_or_form)
end
-- Convert `stemforms` (a string, a form object, or a list of strings and/or form objects) into "general form" (a list
-- of form objects) and map `fn` over the list of objects. `fn` is passed two arguments (form value and translit) and
-- should likewise return the new form value and translit. Footnotes will be preserved. FIXME: Preserve other metadata.
local function map_general(stemforms, fn)
return iut.map_forms(iut.convert_to_general_list_form(stemforms), fn)
end
-- Similar to map_general() except that `fn` should return a single value (one or more strings or form objects), instead
-- of two values (form value and translit), and the resulting value(s) from all calls to `fn` will be flattened to
-- construct the overall return value. Footnotes will be preserved. FIXME: Preserve other metadata.
local function flatmap_general(stemforms, fn)
return iut.flatmap_forms(iut.convert_to_general_list_form(stemforms), fn)
end
-- Given user-supplied stem overrides in `base`, construct any derived stem overrides (e.g. vowel-specific or
-- consonant-specific variants), and truncate initial y-/ي- in any non-past overrides.
local function construct_stems(base)
local stems = base.stem_overrides
stems.past_v = stems.past_v or stems.past
stems.past_c = stems.past_c or stems.past
stems.past_pass_v = stems.past_pass_v or stems.past_pass
stems.past_pass_c = stems.past_pass_c or stems.past_pass
stems.nonpast_v = stems.nonpast_v or stems.nonpast
stems.nonpast_c = stems.nonpast_c or stems.nonpast
stems.nonpast_pass_v = stems.nonpast_pass_v or stems.nonpast_pass
stems.nonpast_pass_c = stems.nonpast_pass_c or stems.nonpast_pass
stems.imp_v = stems.imp_v or stems.imp
stems.imp_c = stems.imp_c or stems.imp
local function truncate_nonpast_initial_cons(stem_type, form, translit)
if form == "+" then
return form, translit
end
if not form:find("^" .. Y) then
error(("Form value %s for stem type '%s' should begin with ي"):format(form, stem_type))
end
form = form:gsub("^" .. Y, "")
if translit then
if not translit:find("^y") then
error(("Translit value %s for stem type '%s' should begin with y"):format(translit, stem_type))
end
translit = translit:gsub("^y", "")
end
return form, translit
end
for _, nonpast_stem_type in ipairs { "nonpast_v", "nonpast_c", "nonpast_pass_v", "nonpast_pass_c" } do
if stems[nonpast_stem_type] then
stems[nonpast_stem_type] = map_general(stems[nonpast_stem_type], function(form, translit)
return truncate_nonpast_initial_cons(nonpast_stem_type, form, translit)
end)
end
end
end
-- Given user-specified overrides for stem `stemname`, return overrides with occurrences of + replaced by
-- `default_stem`. If no overrides, return `default_stem`, or {} if no default.
local function override_stem_if_needed(base, stemname, default_stem)
local overrides = base.stem_overrides[stemname]
if not overrides then
return default_stem or {}
end
return map_general(overrides, function(form, translit)
if form ~= "+" and default_indicator_to_active_participle_slot[form] then
error(("Stem overrides cannot use secondary default indicators but saw %s in stem override '%s'"):format(
form, stemname))
end
if form == "+" then
if translit then
error(("Cannot supply manual translit along with + for stem override '%s'"):format(stemname))
end
if not default_stem then
error(("Cannot use + for stem override '%s' because no default is available"):format(stemname))
end
if type(default_stem) ~= "string" then
error(("Internal error: Default stem for '%s' is not a string: %s"):format(stemname, dump(default_stem)))
end
return default_stem
end
return form, translit
end)
end
-------------------------------------------------------------------------------
-- Properties of different verbal forms --
-------------------------------------------------------------------------------
local allowed_vforms = {"I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX",
"X", "XI", "XII", "XIII", "XIV", "XV", "Iq", "IIq", "IIIq", "IVq"}
local allowed_vforms_set = m_table.listToSet(allowed_vforms)
local allowed_vforms_with_weakness = m_table.shallowCopy(allowed_vforms)
-- The user needs to be able to explicitly specify that a form-I verb (specifically one whose initial radical is و) is
-- sound. Cf. wajiʕa yawjaʕu (not #yajaʕu) "to ache, to hurt". In general, i~a and u~u verbs whose initial radical is و
-- seem to not assimilate the first radical; cf. وقح "to be shameless", variously waqaḥa~yaqiḥu, waquḥa~yawquḥu and
-- waqiḥa~yawqaḥu, whereas a~i verbs (wafaḍa~yafiḍu "to rush"), i~i verbs (wafiqa~yafiqu "to be proper, to be suitable")
-- and a~a verbs (waḍaʕa~yaḍaʕu "to set down, to place") do assimilate. But there are naturally exceptions, e.g.
-- waṭiʔa~yaṭaʔu "to tread, to trample"; wasiʕa~yasaʕu "to be spacious; to be well-off"; waṯiʔa~yaṯaʔu "to get bruised,
-- to be sprained". Also beware of waniya~yawnā "to be faint; to languish", which is sound in the first radical and
-- final-weak in the last radical. Nonetheless, the regularity of the patterns mentioned above suggests we should provide
-- them as defaults.
-- Note that there are other cases of unexpectedly sound verbs, e.g. izdawaja~yazdawiju "to be in pairs", layisa~yalyasu
-- "to be valiant, to be brave", ʔaḥwaja~yuḥwiju "to need", istahwana~yastahwinu "to consider easy", sawisa~yaswasu "to
-- be or become moth-eaten or worm-eaten" (vs. sāsa~yasūsu "to govern, to rule" from the same radicals), ʕawira~yaʕwaru
-- "to be one-eyed", istajwaba~yastajwibu "to interrogate", etc. But in these cases there is no need for explicit user
-- specification as the lemma itself specifies the unexpected soundness.
for _, form_with_weakness in ipairs { "I-sound", "I-assimilated", "none-sound", "none-hollow", "none-geminate",
"none-final-weak" } do
table.insert(allowed_vforms_with_weakness, form_with_weakness)
end
local allowed_vforms_with_weakness_set = m_table.listToSet(allowed_vforms_with_weakness)
local function vform_supports_final_weak(vform)
return vform ~= "XI" and vform ~= "XV" and vform ~= "IVq"
end
local function vform_supports_geminate(vform)
return vform == "I" or vform == "III" or vform == "IV" or vform == "VI" or vform == "VII" or vform == "VIII" or
vform == "X"
end
local function vform_supports_hollow(vform)
return vform == "I" or vform == "IV" or vform == "VII" or vform == "VIII" or vform == "X"
end
local function vform_probably_impersonal_passive(vform, weakness, past_vowel, nonpast_vowel)
return vform == "I" and req(past_vowel, I) or vform == "V" or vform == "VI" or vform == "X" or vform == "IIq"
end
local function vform_probably_full_passive(vform)
return vform == "II" or vform == "III" or vform == "IV" or vform == "Iq"
end
local function vform_probably_no_passive(vform, weakness, past_vowel, nonpast_vowel)
return vform == "I" and req(past_vowel, U) or vform == "VII" or vform == "IX" or
vform == "XI" or vform == "XII" or vform == "XIII" or vform == "XIV" or vform == "XV" or
vform == "IIIq" or vform == "IVq"
end
-- Active vforms II, III, IV, Iq use non-past prefixes in -u- instead of -a-.
local function prefix_vowel_from_vform(vform)
if vform == "II" or vform == "III" or vform == "IV" or vform == "Iq" then
return "u"
else
return "a"
end
end
-- True if the active non-past takes a-vocalization rather than i-vocalization in its last syllable.
local function vform_nonpast_a_vowel(vform)
return vform == "V" or vform == "VI" or vform == "XV" or vform == "IIq"
end
-- True if the `passive` spec indicates a passive-only verb.
local function is_passive_only(passive)
return passive == "onlypass" or passive == "onlypass-impers"
end
export.is_passive_only = is_passive_only -- for use in [[Module:ar-headword]]
-------------------------------------------------------------------------------
-- Properties of specific sounds --
-------------------------------------------------------------------------------
-- Is radical wāw (و) or yāʾ (ي)?
local function is_waw_ya(rad)
return req(rad, W) or req(rad, Y)
end
-- Check that radical is wāw (و) or yāʾ (ي), error if not
local function check_waw_ya(rad)
if not is_waw_ya(rad) then
error("Expecting weak radical: '" .. rget(rad) .. "' should be " .. W .. " or " .. Y)
end
end
-- Form-I verb حيّ or حيي and form-X verb استحيا or استحى
local function hayy_radicals(rad1, rad2, rad3)
return req(rad1, "ح") and req(rad2, Y) and is_waw_ya(rad3)
end
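-- Wrap the remaining definitions in a function to work around Lua's limit on locals per function; without it, loading
-- fails with "Lua error at line 1514: main function has more than 200 local variables".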
local function create_conjugations()
-------------------------------------------------------------------------------
-- Radicals associated with various irregular verbs --
-------------------------------------------------------------------------------
-- Form-I verb أخذ or form-VIII verb اتخذ
local function axadh_radicals(rad1, rad2, rad3)
return req(rad1, HAMZA) and req(rad2, "خ") and req(rad3, "ذ")
end
-- Form-I verbs whose imperative has a reduced form: أكل, أخذ and أمر. Return "shortonly" if only short-form
-- imperatives exist (أكل and أخذ) or "shortlong" if long-form imperatives also exist (أمر); the long forms are
-- used after a clitic like فَ or وَ. Returns false for any other verb.
local function reduced_imperative_verb(rad1, rad2, rad3)
return axadh_radicals(rad1, rad2, rad3) and "shortonly" or
req(rad1, HAMZA) and req(rad2, "ك") and req(rad3, "ل") and "shortonly" or
req(rad1, HAMZA) and req(rad2, "م") and req(rad3, "ر") and "shortlong"
end
-- Form-I verb رأى and form-IV verb أرى
local function raa_radicals(rad1, rad2, rad3)
return req(rad1, "ر") and req(rad2, HAMZA) and is_waw_ya(rad3)
end
-- Form-I verb سأل
local function saal_radicals(rad1, rad2, rad3)
return req(rad1, "س") and req(rad2, HAMZA) and req(rad3, "ل")
end
-- Form-I verb كان
local function kaan_radicals(rad1, rad2, rad3)
return req(rad1, "ك") and req(rad2, W) and req(rad3, N)
end
-------------------------------------------------------------------------------
-- Sets of past endings --
-------------------------------------------------------------------------------
-- The 13 endings of the sound/hollow/geminate past tense.
local past_endings = {
-- singular
SK .. TU, SK .. TA, SK .. "تِ", A, A .. "تْ",
-- dual
SK .. "تُمَا", AA, A .. "تَا",
-- plural
SK .. "نَا", SK .. "تُمْ",
-- shadda + vowel diacritic ends up in the wrong order due to Unicode
-- bug, so keep them separate to avoid this
SK .. "تُن" .. SH .. A, UU .. ALIF, SK .. "نَ"
}
-- Make endings for final-weak past in -aytu or -awtu. AYAW is AY or AW as appropriate. Note that AA and AW are
-- global variables.
local function make_past_endings_ay_aw(ayaw, third_sg_masc)
return {
-- singular
ayaw .. SK .. TU, ayaw .. SK .. TA, ayaw .. SK .. "تِ",
third_sg_masc, A .. "تْ",
-- dual
ayaw .. SK .. "تُمَا", ayaw .. AA, A .. "تَا",
-- plural
ayaw .. SK .. "نَا", ayaw .. SK .. "تُمْ",
-- shadda + vowel diacritic ends up in the wrong order due to Unicode
-- bug, so keep them separate to avoid this
ayaw .. SK .. "تُن" .. SH .. A, AW .. SK .. ALIF, ayaw .. SK .. "نَ"
}
end
-- past final-weak -aytu endings
local past_endings_ay = make_past_endings_ay_aw(AY, AAMAQ)
-- past final-weak -awtu endings
local past_endings_aw = make_past_endings_ay_aw(AW, AA)
-- used for alternative endings for form-X geminate verbs like اِسْتَمَرَّ
local past_endings_ay_12_person_only = {
-- singular
AY .. SK .. TU, AY .. SK .. TA, AY .. SK .. "تِ",
{}, {},
--dual
AY .. SK .. "تُمَا", {}, {},
-- plural
AY .. SK .. "نَا", AY .. SK .. "تُمْ",
-- shadda + vowel diacritic ends up in the wrong order due to Unicode
-- bug, so keep them separate to avoid this
AY .. SK .. "تُن" .. SH .. A, {}, {},
}
-- Make endings for final-weak past in -ītu or -ūtu. IIUU is ī or ū as appropriate. Note that AA and UU are global
-- variables.
local function make_past_endings_ii_uu(iiuu)
return {
-- singular
iiuu .. TU, iiuu .. TA, iiuu .. "تِ", iiuu .. A, iiuu .. A .. "تْ",
-- dual
iiuu .. "تُمَا", iiuu .. AA, iiuu .. A .. "تَا",
-- plural
iiuu .. "نَا", iiuu .. "تُمْ",
-- shadda + vowel diacritic ends up in the wrong order due to Unicode
-- bug, so keep them separate to avoid this
iiuu .. "تُن" .. SH .. A, UU .. ALIF, iiuu .. "نَ"
}
end
-- past final-weak -ītu endings
local past_endings_ii = make_past_endings_ii_uu(II)
-- past final-weak -ūtu endings
local past_endings_uu = make_past_endings_ii_uu(UU)
-------------------------------------------------------------------------------
-- Sets of non-past prefixes and endings --
-------------------------------------------------------------------------------
local nonpast_prefix_consonants = {
-- singular
HAMZA, T, T, Y, T,
-- dual
T, Y, T,
-- plural
N, T, T, Y, Y
}
-- There are only five distinct endings in all non-past verbs. Make any set of non-past endings given these five
-- distinct endings.
local function make_nonpast_endings(null, fem, dual, pl, fempl)
return {
-- singular
null, null, fem, null, null,
-- dual
dual, dual, dual,
-- plural
null, pl, fempl, pl, fempl
}
end
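-- Illustrative sketch only, kept inside a long comment so that it adds no locals to this function. It shows how the
-- five distinct endings passed to make_nonpast_endings() fan out over the 13 person/number slots. The romanized
-- endings and the slot names in `pnums` are assumptions for illustration; the module itself passes Arabic-script
-- constants and takes its slot names from `all_person_number_list`, defined elsewhere.
--[==[
local function make_nonpast_endings(null, fem, dual, pl, fempl)
	return {
		-- singular
		null, null, fem, null, null,
		-- dual
		dual, dual, dual,
		-- plural
		null, pl, fempl, pl, fempl
	}
end

local pnums = {"1s", "2ms", "2fs", "3ms", "3fs", "2d", "3md", "3fd", "1p", "2mp", "2fp", "3mp", "3fp"}
-- Sound indicative endings in romanization: -u, -īna, -āni, -ūna, -na.
local endings = make_nonpast_endings("u", "īna", "āni", "ūna", "na")
for i, pnum in ipairs(pnums) do
	print(pnum, endings[i])
end
]==]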
-- endings for non-past indicative
local ind_endings = make_nonpast_endings(
U,
II .. NA,
AANI,
UU .. NA,
SK .. NA
)
-- Make the endings for non-past subjunctive/jussive, given the vowel diacritic used in "null" endings
-- (1s/2ms/3ms/3fs/1p).
local function make_sub_juss_endings(dia_null)
return make_nonpast_endings(
dia_null,
II,
AA,
UU .. ALIF,
SK .. NA
)
end
-- endings for non-past subjunctive
local sub_endings = make_sub_juss_endings(A)
-- endings for non-past jussive
local juss_endings = make_sub_juss_endings(SK)
-- endings for alternative geminate non-past jussive in -a; same as subjunctive
local juss_endings_alt_a = sub_endings
-- endings for alternative geminate non-past jussive in -i
local juss_endings_alt_i = make_sub_juss_endings(I)
-- Endings for final-weak non-past indicative in -ā. Note that AY, AW and AAMAQ are global variables.
local ind_endings_aa = make_nonpast_endings(
AAMAQ,
AYSK .. NA,
AY .. AANI,
AWSK .. NA,
AYSK .. NA
)
-- Make endings for final-weak non-past indicative in -ī or -ū; IIUU is ī or ū as appropriate. Note that II and UU
-- are global variables.
local function make_ind_endings_ii_uu(iiuu)
return make_nonpast_endings(
iiuu,
II .. NA,
iiuu .. AANI,
UU .. NA,
iiuu .. NA
)
end
-- endings for final-weak non-past indicative in -ī
local ind_endings_ii = make_ind_endings_ii_uu(II)
-- endings for final-weak non-past indicative in -ū
local ind_endings_uu = make_ind_endings_ii_uu(UU)
-- Endings for final-weak non-past subjunctive in -ā. Note that AY, AW, ALIF, AAMAQ are global variables.
local sub_endings_aa = make_nonpast_endings(
AAMAQ,
AYSK,
AY .. AA,
AWSK .. ALIF,
AYSK .. NA
)
-- Make endings for final-weak non-past subjunctive in -ī or -ū. IIUU is ī or ū as appropriate. Note that AA, II,
-- UU, ALIF are global variables.
local function make_sub_endings_ii_uu(iiuu)
return make_nonpast_endings(
iiuu .. A,
II,
iiuu .. AA,
UU .. ALIF,
iiuu .. NA
)
end
-- endings for final-weak non-past subjunctive in -ī
local sub_endings_ii = make_sub_endings_ii_uu(II)
-- endings for final-weak non-past subjunctive in -ū
local sub_endings_uu = make_sub_endings_ii_uu(UU)
-- endings for final-weak non-past jussive in -ā
local juss_endings_aa = make_nonpast_endings(
A,
AYSK,
AY .. AA,
AWSK .. ALIF,
AYSK .. NA
)
-- Make endings for final-weak non-past jussive in -ī or -ū. IU is short i or u, IIUU is long ī or ū as appropriate.
-- Note that AA, II, UU, ALIF are global variables.
local function make_juss_endings_ii_uu(iu, iiuu)
return make_nonpast_endings(
iu,
II,
iiuu .. AA,
UU .. ALIF,
iiuu .. NA
)
end
-- endings for final-weak non-past jussive in -ī
local juss_endings_ii = make_juss_endings_ii_uu(I, II)
-- endings for final-weak non-past jussive in -ū
local juss_endings_uu = make_juss_endings_ii_uu(U, UU)
-------------------------------------------------------------------------------
-- Sets of imperative endings --
-------------------------------------------------------------------------------
-- Extract the second person jussive endings to get corresponding imperative endings.
local function imperative_endings_from_jussive(endings)
return {endings[2], endings[3], endings[6], endings[10], endings[11]}
end
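-- Illustrative sketch only, kept inside a long comment. Indices 2, 3, 6, 10 and 11 of a 13-element ending list are
-- the second-person slots (2ms, 2fs, 2d, 2mp, 2fp), which is why they yield the five imperative endings. The
-- romanized jussive endings below are an assumption for illustration, with "" standing in for the sukūn.
--[==[
local function imperative_endings_from_jussive(endings)
	return {endings[2], endings[3], endings[6], endings[10], endings[11]}
end

local juss = {"", "", "ī", "", "", "ā", "ā", "ā", "", "ū", "na", "ū", "na"}
print(table.concat(imperative_endings_from_jussive(juss), ","))   --> ,ī,ā,ū,na
]==]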
-- normal imperative endings
local imp_endings = imperative_endings_from_jussive(juss_endings)
-- alternative geminate imperative endings in -a
local imp_endings_alt_a = imperative_endings_from_jussive(juss_endings_alt_a)
-- alternative geminate imperative endings in -i
local imp_endings_alt_i = imperative_endings_from_jussive(juss_endings_alt_i)
-- final-weak imperative endings in -ā
local imp_endings_aa = imperative_endings_from_jussive(juss_endings_aa)
-- final-weak imperative endings in -ī
local imp_endings_ii = imperative_endings_from_jussive(juss_endings_ii)
-- final-weak imperative endings in -ū
local imp_endings_uu = imperative_endings_from_jussive(juss_endings_uu)
-------------------------------------------------------------------------------
-- Basic functions to inflect tenses --
-------------------------------------------------------------------------------
-- Add to `base` the inflections for the tense indicated by `tense` (the prefix in the slot names, e.g. 'past'
-- or 'juss_pass'), formed by combining the `prefixes`, `stems` and `endings`. Each of `prefixes`, `stems` and
-- `endings` is either a sequence of 5 (for the imperative) or 13 (for other tenses) abbreviated form lists (each of
-- which is either a string, a form object, or a list of strings and/or form objects; see
-- [[Module:inflection utilities]] for more info). Alternatively, any of `prefixes`, `stems` or `endings` can be a
-- single-element list containing an abbreviated form list, with an additional key `all_same` set to true, or (as a
-- special case) a single string; in the latter cases, the same value is used for all 5 or 13 slots. If existing
-- inflections already exist, they will be added to, not overridden. `pnums` is the list of person/number slot name
-- suffixes, which must match up with the elements in `prefixes`, `stems` and `endings` (i.e. 5 for imperative, 13
-- otherwise).
local function inflect_tense_1(base, tense, prefixes, stems, endings, pnums)
if not prefixes or not stems or not endings then
return
end
local function verify_affixes(affixname, affixes)
local function interr(msg)
error(("Internal error: For tense '%s', '%s' %s: %s"):format(tense, affixname, msg, dump(affixes)))
end
if type(affixes) == "string" then
-- do nothing
elseif type(affixes) ~= "table" then
interr("is not a table or string")
elseif affixes.all_same then
if #affixes ~= 1 then
interr(("with all_same = true should have length 1 but has length %s"):format(#affixes))
end
else
if #affixes ~= #pnums then
interr(("should have length %s but has length %s"):format(#pnums, #affixes))
end
end
end
verify_affixes("prefixes", prefixes)
verify_affixes("stems", stems)
verify_affixes("endings", endings)
local function get_affix(affixes, i)
if type(affixes) == "string" then
return affixes
elseif affixes.all_same then
return affixes[1]
else
return affixes[i]
end
end
for i, pnum in ipairs(pnums) do
local prefix = get_affix(prefixes, i)
local stem = get_affix(stems, i)
local ending = get_affix(endings, i)
local slot = tense .. "_" .. pnum
add3(base, slot, prefix, stem, ending)
end
end
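-- Illustrative sketch only, kept inside a long comment. It shows the three shapes that inflect_tense_1() accepts
-- for `prefixes`, `stems` and `endings`, using a copy of the get_affix() helper above. The sample values are
-- assumptions made for this example.
--[==[
local function get_affix(affixes, i)
	if type(affixes) == "string" then
		return affixes
	elseif affixes.all_same then
		return affixes[1]
	else
		return affixes[i]
	end
end

print(get_affix("ya", 3))                          --> ya    (single string: same for every slot)
print(get_affix({"ktub", all_same = true}, 3))     --> ktub  (one-element list flagged all_same)
print(get_affix({"a", "b", "c"}, 3))               --> c     (one entry per slot)
]==]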
-- Add to `base` the inflections for the tense indicated by `tense` (the prefix in the slot names, e.g. 'past'
-- or 'juss_pass'), formed by combining the `prefixes`, `stems` and `endings`. This is a simple wrapper around
-- inflect_tense_1() that applies to all tenses other than the imperative; see inflect_tense_1() for more
-- information about the parameters.
local function inflect_tense(base, tense, prefixes, stems, endings)
inflect_tense_1(base, tense, prefixes, stems, endings, all_person_number_list)
end
-- Like inflect_tense() but for the imperative, which has only five parts instead of 13 and no prefixes.
local function inflect_tense_imp(base, stems, endings)
inflect_tense_1(base, "imp", "", stems, endings, imp_person_number_list)
end
-------------------------------------------------------------------------------
-- Functions to inflect the past tense --
-------------------------------------------------------------------------------
-- Generate past verbs using specified vowel and consonant stems; works for sound, assimilated, hollow, and geminate
-- verbs, active and passive.
local function past_2stem_conj(base, tense, v_stem, c_stem, footnote_12)
local passive = tense:find("_pass") and "_pass" or ""
-- Override stems with user-specified stems if available.
v_stem = override_stem_if_needed(base, "past" .. passive .. "_v", v_stem)
local c_stem_12 = c_stem
if footnote_12 then
c_stem_12 = iut.combine_form_and_footnotes(c_stem_12, footnote_12)
end
c_stem_12 = override_stem_if_needed(base, "past" .. passive .. "_c", c_stem_12)
local c_stem_3 = override_stem_if_needed(base, "past" .. passive .. "_c", c_stem)
inflect_tense(base, tense, "", {
-- singular
c_stem_12, c_stem_12, c_stem_12, v_stem, v_stem,
-- dual
c_stem_12, v_stem, v_stem,
-- plural
c_stem_12, c_stem_12, c_stem_12, v_stem, c_stem_3
}, past_endings)
end
-- Generate past verbs using single specified stem; works for sound and assimilated verbs, active and passive.
local function past_1stem_conj(base, tense, stem)
past_2stem_conj(base, tense, stem, stem)
end
-------------------------------------------------------------------------------
-- Functions to inflect non-past tenses --
-------------------------------------------------------------------------------
-- Generate non-past conjugation, with two stems, for vowel-initial and consonant-initial endings, respectively.
-- Useful for active and passive; for all forms; for all weaknesses (sound, assimilated, hollow, final-weak and
-- geminate) and for all types of non-past (indicative, subjunctive, jussive) except for the imperative. (There is a
-- separate wrapper function below for geminate jussives because they have three alternants.) Both stems may be the
-- same, e.g. for sound verbs.
-- `prefix_vowel` will be either "a" or "u". `endings` should be an array of 13 items. If `endings` is nil or
-- omitted, infer the endings from the tense. If `jussive` is true, or `endings` is nil and `tense` indicates
-- jussive, use the jussive pattern of vowel/consonant stems (different from the normal ones).
local function nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, endings, jussive)
local passive = tense:find("_pass") and "_pass" or ""
-- Override stems with user-specified stems if available.
v_stem = override_stem_if_needed(base, "nonpast" .. passive .. "_v",
v_stem and q(dia[prefix_vowel], v_stem) or nil)
c_stem = override_stem_if_needed(base, "nonpast" .. passive .. "_c",
c_stem and q(dia[prefix_vowel], c_stem) or nil)
if not endings then
if tense:find("^ind") then
endings = ind_endings
elseif tense:find("^sub") then
endings = sub_endings
elseif tense:find("^juss") then
jussive = true
endings = juss_endings
else
error("Internal error: Unrecognized tense '" .. tense .."'")
end
end
if not jussive then
inflect_tense(base, tense, nonpast_prefix_consonants, {
-- singular
v_stem, v_stem, v_stem, v_stem, v_stem,
-- dual
v_stem, v_stem, v_stem,
-- plural
v_stem, v_stem, c_stem, v_stem, c_stem
}, endings)
else
inflect_tense(base, tense, nonpast_prefix_consonants, {
-- singular
-- 'adlul, tadlul, tadullī, yadlul, tadlul
c_stem, c_stem, v_stem, c_stem, c_stem,
-- dual
-- tadullā, yadullā, tadullā
v_stem, v_stem, v_stem,
-- plural
-- nadlul, tadullū, tadlulna, yadullū, yadlulna
c_stem, v_stem, c_stem, v_stem, c_stem
}, endings)
end
end
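-- Illustrative sketch only, kept inside a long comment. It rebuilds the geminate jussive forms quoted in the
-- comments of nonpast_2stem_conj() above from the jussive pattern of consonant and vowel stems. Romanized prefixes,
-- stems and endings are assumptions for illustration; the prefix vowel "a" is already folded into the stems here.
--[==[
local prefixes = {"ʔ", "t", "t", "y", "t", "t", "y", "t", "n", "t", "t", "y", "y"}
local endings = {"", "", "ī", "", "", "ā", "ā", "ā", "", "ū", "na", "ū", "na"}
local v, c = "adull", "adlul"   -- stems used before vowel-initial and consonant-initial (or null) endings
local stems = {c, c, v, c, c, v, v, v, c, v, c, v, c}
local forms = {}
for i = 1, 13 do
	forms[i] = prefixes[i] .. stems[i] .. endings[i]
end
print(table.concat(forms, ", "))
--> ʔadlul, tadlul, tadullī, yadlul, tadlul, tadullā, yadullā, tadullā, nadlul, tadullū, tadlulna, yadullū, yadlulna
]==]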
-- Generate non-past conjugation with one stem (no distinct stems for vowel-initial and consonant-initial endings).
-- See nonpast_2stem_conj().
local function nonpast_1stem_conj(base, tense, prefix_vowel, stem, endings, jussive)
nonpast_2stem_conj(base, tense, prefix_vowel, stem, stem, endings, jussive)
end
-- Generate the active/passive jussive of geminate verbs. There are three alternants, two with terminations -a and -i and one
-- in a null termination with a distinct pattern of vowel/consonant stem usage. See nonpast_2stem_conj() for a
-- description of the arguments.
local function jussive_gem_conj(base, tense, prefix_vowel, v_stem, c_stem)
-- alternative in -a
nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings_alt_a)
-- alternative in -i
nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings_alt_i)
-- alternative in -null; requires different combination of v_stem and
-- c_stem since the null endings require the c_stem (e.g. "tadlul" here)
-- whereas the corresponding endings above in -a or -i require the v_stem
-- (e.g. "tadulla, tadulli" above)
nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings, "jussive")
end
-------------------------------------------------------------------------------
-- Functions to inflect the imperative --
-------------------------------------------------------------------------------
-- Generate imperative conjugation, with two stems, for vowel-initial and consonant-initial endings, respectively.
-- Useful for all forms, and for all weaknesses other than final-weak. Note that the two stems may be the same
-- (specifically for sound and assimilated verbs). If `endings` is nil or omitted, use `imp_endings`. If `alt_gem`
-- is specified, use the pattern of vowel and consonant stems appropriate for the alternative geminate imperatives
-- whose masculine singular ends in -a or -i instead of having no ending.
local function make_2stem_imperative(base, v_stem, c_stem, endings, alt_gem)
endings = endings or imp_endings
-- Override stems with user-specified stems if available.
v_stem = override_stem_if_needed(base, "imp_v", v_stem)
c_stem = override_stem_if_needed(base, "imp_c", c_stem)
if alt_gem then
inflect_tense_imp(base, {v_stem, v_stem, v_stem, v_stem, c_stem}, endings)
else
inflect_tense_imp(base, {c_stem, v_stem, v_stem, v_stem, c_stem}, endings)
end
end
-- Generate imperative parts for sound or assimilated verbs.
local function make_1stem_imperative(base, stem)
make_2stem_imperative(base, stem, stem)
end
-- Generate imperative parts for geminate verbs form I (also IV, VII, VIII, X).
local function make_gem_imperative(base, v_stem, c_stem)
make_2stem_imperative(base, v_stem, c_stem, imp_endings_alt_a, "alt gem")
make_2stem_imperative(base, v_stem, c_stem, imp_endings_alt_i, "alt gem")
make_2stem_imperative(base, v_stem, c_stem)
end
-------------------------------------------------------------------------------
-- Functions to inflect entire verbs --
-------------------------------------------------------------------------------
-- Generate finite parts of a sound verb (also works for assimilated verbs) from five stems (past and non-past,
-- active and passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u").
local function make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem,
prefix_vowel)
past_1stem_conj(base, "past", past_stem)
past_1stem_conj(base, "past_pass", past_pass_stem)
nonpast_1stem_conj(base, "ind", prefix_vowel, nonpast_stem)
nonpast_1stem_conj(base, "sub", prefix_vowel, nonpast_stem)
nonpast_1stem_conj(base, "juss", prefix_vowel, nonpast_stem)
nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem)
nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem)
nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem)
make_1stem_imperative(base, imp_stem)
end
local function past_final_weak_endings_from_vowel(vowel)
if vowel == "ay" then
return past_endings_ay
elseif vowel == "aw" then
return past_endings_aw
elseif vowel == "ī" then
return past_endings_ii
elseif vowel == "ū" then
return past_endings_uu
elseif not vowel then
return nil
else
error(("Internal error: Unrecognized past final-weak vowel spec '%s'"):format(vowel))
end
end
local function nonpast_final_weak_endings_from_vowel(vowel)
if vowel == "ā" then
return ind_endings_aa, sub_endings_aa, juss_endings_aa, imp_endings_aa
elseif vowel == "ī" then
return ind_endings_ii, sub_endings_ii, juss_endings_ii, imp_endings_ii
elseif vowel == "ū" then
return ind_endings_uu, sub_endings_uu, juss_endings_uu, imp_endings_uu
elseif not vowel then
return nil
else
error(("Internal error: Unrecognized non-past final-weak vowel spec '%s'"):format(vowel))
end
end
-- Generate finite parts of a final-weak verb from five stems (past and non-past, active and passive, plus
-- imperative), the past active ending vowel (ay, aw, ī or ū), the non-past active ending vowel (ā, ī or ū) and the
-- prefix vowel in the active non-past (a or u).
local function make_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem,
past_ending_vowel, nonpast_ending_vowel, prefix_vowel)
past_stem = override_stem_if_needed(base, "past", past_stem)
past_pass_stem = override_stem_if_needed(base, "past_pass", past_pass_stem)
-- Don't call override_stem_if_needed() here for non-past stems; it's called in nonpast_2stem_conj().
imp_stem = override_stem_if_needed(base, "imp", imp_stem)
-- + not supported for ending vowel overrides
past_ending_vowel = base.stem_overrides.past_final_weak_vowel or past_ending_vowel
local past_pass_ending_vowel = base.stem_overrides.past_pass_final_weak_vowel or "ī"
nonpast_ending_vowel = base.stem_overrides.nonpast_final_weak_vowel or nonpast_ending_vowel
local nonpast_pass_ending_vowel = base.stem_overrides.nonpast_pass_final_weak_vowel or "ā"
local past_endings = past_final_weak_endings_from_vowel(past_ending_vowel)
local past_pass_endings = past_final_weak_endings_from_vowel(past_pass_ending_vowel)
local ind_endings, sub_endings, juss_endings, imp_endings =
nonpast_final_weak_endings_from_vowel(nonpast_ending_vowel)
local ind_pass_endings, sub_pass_endings, juss_pass_endings =
nonpast_final_weak_endings_from_vowel(nonpast_pass_ending_vowel)
inflect_tense(base, "past", "", {past_stem, all_same = 1}, past_endings)
inflect_tense(base, "past_pass", "", {past_pass_stem, all_same = 1}, past_pass_endings)
nonpast_1stem_conj(base, "ind", prefix_vowel, nonpast_stem, ind_endings)
nonpast_1stem_conj(base, "sub", prefix_vowel, nonpast_stem, sub_endings)
nonpast_1stem_conj(base, "juss", prefix_vowel, nonpast_stem, juss_endings)
nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem, ind_pass_endings)
nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem, sub_pass_endings)
nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem, juss_pass_endings)
inflect_tense_imp(base, {imp_stem, all_same = 1}, imp_endings)
end
-- Generate finite parts of an augmented (form II+) final-weak verb from five stems (past and non-past, active and
-- passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u") and a flag indicating if it
-- behaves like a form V/VI verb in taking non-past endings in -ā instead of -ī.
local function make_augmented_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem,
imp_stem, prefix_vowel, form56)
make_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, "ay",
form56 and "ā" or "ī", prefix_vowel)
end
-- Generate finite parts of an augmented (form II+) sound or final-weak verb, given:
-- * `base` (conjugation data structure);
-- * `vowel_spec` (radicals, weakness);
-- * `past_stem_base` (active past stem minus last syllable (= -al or -ā));
-- * `nonpast_stem_base` (non-past stem minus last syllable (= -al/-il or -ā/-ī));
-- * `past_pass_stem_base` (passive past stem minus last syllable (= -il or -ī));
-- * `vn` (verbal noun).
local function make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base,
past_pass_stem_base, vn)
insert_form_or_forms(base, "vn", vn)
local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3
local final_weak = is_final_weak(base, vowel_spec)
local prefix_vowel = prefix_vowel_from_vform(base.verb_form)
local form56 = vform_nonpast_a_vowel(base.verb_form)
local a_base_suffix = final_weak and "" or q(A, lastrad)
local i_base_suffix = final_weak and "" or q(I, lastrad)
-- past and non-past stems, active and passive
local past_stem = q(past_stem_base, a_base_suffix)
-- In forms V and VI, the last stem vowel of the finite non-past is /a/ in
-- both active and passive, but the active participle has /i/ and the
-- passive participle has /a/. Elsewhere, there is consistent /i/ in the
-- active non-past and participle, and consistent /a/ in the passive
-- non-past and participle. Hence, forms V and VI differ only in the finite
-- non-past active (but not the active participle), so we have to split the
-- finite non-past stem and the active participle stem.
local nonpast_stem = q(nonpast_stem_base, form56 and a_base_suffix or i_base_suffix)
local ap_stem = q(nonpast_stem_base, i_base_suffix)
local past_pass_stem = q(past_pass_stem_base, i_base_suffix)
local nonpast_pass_stem = q(nonpast_stem_base, a_base_suffix)
-- imperative stem
local imp_stem = q(past_stem_base, form56 and a_base_suffix or i_base_suffix)
-- make parts
if final_weak then
make_augmented_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem,
prefix_vowel, form56)
else
make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, prefix_vowel)
end
-- active and passive participle
if final_weak then
insert_form_or_forms(base, "ap", q(MU, ap_stem, IN))
insert_form_or_forms(base, "pp", q(MU, nonpast_pass_stem, AN, AMAQ))
else
insert_form_or_forms(base, "ap", q(MU, ap_stem))
insert_form_or_forms(base, "pp", q(MU, nonpast_pass_stem))
end
end
-- Generate finite parts of a hollow or geminate verb from ten stems (vowel and consonant stems for each of past and
-- non-past, active and passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u"), plus a
-- flag indicating if we are a geminate verb, plus an optional footnote (`altgem_note`) that is passed on to the
-- active past conjugation.
local function make_hollow_geminate_verb(base, geminate, past_v_stem, past_c_stem, past_pass_v_stem,
past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem, nonpast_pass_c_stem, imp_v_stem,
imp_c_stem, prefix_vowel, altgem_note)
past_2stem_conj(base, "past", past_v_stem, past_c_stem, altgem_note)
past_2stem_conj(base, "past_pass", past_pass_v_stem, past_pass_c_stem)
nonpast_2stem_conj(base, "ind", prefix_vowel, nonpast_v_stem, nonpast_c_stem)
nonpast_2stem_conj(base, "sub", prefix_vowel, nonpast_v_stem, nonpast_c_stem)
nonpast_2stem_conj(base, "ind_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem)
nonpast_2stem_conj(base, "sub_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem)
if geminate then
jussive_gem_conj(base, "juss", prefix_vowel, nonpast_v_stem, nonpast_c_stem)
jussive_gem_conj(base, "juss_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem)
make_gem_imperative(base, imp_v_stem, imp_c_stem)
else
nonpast_2stem_conj(base, "juss", prefix_vowel, nonpast_v_stem, nonpast_c_stem)
nonpast_2stem_conj(base, "juss_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem)
make_2stem_imperative(base, imp_v_stem, imp_c_stem)
end
end
-- Generate finite parts of an augmented (form II+) hollow verb, given:
-- * `base` (conjugation data structure);
-- * `vowel_spec` (radicals, weakness);
-- * `past_stem_base` (invariable part of active past stem);
-- * `nonpast_stem_base` (invariable part of nonpast stem);
-- * `past_pass_stem_base` (invariable part of passive past stem);
-- * `vn` (verbal noun).
local function make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
insert_form_or_forms(base, "vn", vn)
local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3
local form410 = base.verb_form == "IV" or base.verb_form == "X"
local prefix_vowel = prefix_vowel_from_vform(base.verb_form)
local a_base_suffix_v, a_base_suffix_c
local i_base_suffix_v, i_base_suffix_c
a_base_suffix_v = q(AA, lastrad) -- 'af-āl-a, inf-āl-a
a_base_suffix_c = q(A, lastrad) -- 'af-al-tu, inf-al-tu
i_base_suffix_v = q(II, lastrad) -- 'uf-īl-a, unf-īl-a
i_base_suffix_c = q(I, lastrad) -- 'uf-il-tu, unf-il-tu
-- past and non-past stems, active and passive, for vowel-initial and
-- consonant-initial endings
local past_v_stem = q(past_stem_base, a_base_suffix_v)
local past_c_stem = q(past_stem_base, a_base_suffix_c)
-- yu-f-īl-u, ya-staf-īl-u but yanf-āl-u, yaft-āl-u
local nonpast_v_stem = q(nonpast_stem_base, form410 and i_base_suffix_v or a_base_suffix_v)
local nonpast_c_stem = q(nonpast_stem_base, form410 and i_base_suffix_c or a_base_suffix_c)
local past_pass_v_stem = q(past_pass_stem_base, i_base_suffix_v)
local past_pass_c_stem = q(past_pass_stem_base, i_base_suffix_c)
local nonpast_pass_v_stem = q(nonpast_stem_base, a_base_suffix_v)
local nonpast_pass_c_stem = q(nonpast_stem_base, a_base_suffix_c)
-- imperative stem
local imp_v_stem = q(past_stem_base, form410 and i_base_suffix_v or a_base_suffix_v)
local imp_c_stem = q(past_stem_base, form410 and i_base_suffix_c or a_base_suffix_c)
-- make parts
make_hollow_geminate_verb(base, false, past_v_stem, past_c_stem, past_pass_v_stem,
past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem,
nonpast_pass_c_stem, imp_v_stem, imp_c_stem, prefix_vowel)
-- active participle
insert_form_or_forms(base, "ap", q(MU, nonpast_v_stem))
-- passive participle
insert_form_or_forms(base, "pp", q(MU, nonpast_pass_v_stem))
end
-- Generate finite parts of an augmented (form II+) geminate verb, given:
-- * `base` (conjugation data structure);
-- * `vowel_spec` (radicals, weakness);
-- * `past_stem_base` (invariable part of active past stem; this and the stem bases below will end with a consonant
-- for forms IV, X, IVq, and a short vowel for the others);
-- * `nonpast_stem_base` (invariable part of nonpast stem);
-- * `past_pass_stem_base` (invariable part of passive past stem);
-- * `vn` (verbal noun);
-- * `altgem_note` (footnote to add to active past 1/2-person forms, when alternative forms are supplied [form X]).
local function make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base,
past_pass_stem_base, vn, altgem_note)
insert_form_or_forms(base, "vn", vn)
local vform = base.verb_form
local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3
local prefix_vowel = prefix_vowel_from_vform(vform)
local a_base_suffix_v, a_base_suffix_c
local i_base_suffix_v, i_base_suffix_c
if vform == "IV" or vform == "X" or vform == "IVq" then
a_base_suffix_v = q(A, lastrad, SH) -- 'af-all
a_base_suffix_c = q(SK, lastrad, A, lastrad) -- 'af-lal
i_base_suffix_v = q(I, lastrad, SH) -- yuf-ill
i_base_suffix_c = q(SK, lastrad, I, lastrad) -- yuf-lil
else
a_base_suffix_v = q(lastrad, SH) -- fā-ll, infa-ll
a_base_suffix_c = q(lastrad, A, lastrad) -- fā-lal, infa-lal
i_base_suffix_v = q(lastrad, SH) -- yufā-ll, yanfa-ll
i_base_suffix_c = q(lastrad, I, lastrad) -- yufā-lil, yanfa-lil
end
-- past and non-past stems, active and passive, for vowel-initial and
-- consonant-initial endings
local past_v_stem = q(past_stem_base, a_base_suffix_v)
local past_c_stem = q(past_stem_base, a_base_suffix_c)
local nonpast_v_stem = q(nonpast_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_v or i_base_suffix_v)
local nonpast_c_stem = q(nonpast_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_c or i_base_suffix_c)
-- NOTE: Formerly had a comment that "vform III and VI passive past do not have contracted parts, only
-- uncontracted parts, which are added separately by those functions". This is based on Mace
-- "Arabic Verbs and Essential Grammar" (1999) entry 63 (continued), which shows passive ḥūjija but no ḥūjja;
-- but that is apparently a mistake, as (1) verb tables in other books do show contracted passive parts for
-- these forms; (2) there is no mention of such an exception on p. 99, which explains how geminate ("doubled")
-- verbs work (on the contrary, it says "The contracted and uncontracted pairs (see above) are found all
-- over Forms III and VI of the doubled verbs").
local past_pass_v_stem = q(past_pass_stem_base, i_base_suffix_v)
local past_pass_c_stem = q(past_pass_stem_base, i_base_suffix_c)
local nonpast_pass_v_stem = q(nonpast_stem_base, a_base_suffix_v)
local nonpast_pass_c_stem = q(nonpast_stem_base, a_base_suffix_c)
-- imperative stem
local imp_v_stem = q(past_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_v or i_base_suffix_v)
local imp_c_stem = q(past_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_c or i_base_suffix_c)
-- make parts
make_hollow_geminate_verb(base, "geminate", past_v_stem, past_c_stem, past_pass_v_stem,
past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem,
nonpast_pass_c_stem, imp_v_stem, imp_c_stem, prefix_vowel, altgem_note)
-- active participle
insert_form_or_forms(base, "ap", q(MU, nonpast_v_stem))
-- passive participle
insert_form_or_forms(base, "pp", q(MU, nonpast_pass_v_stem))
end
-------------------------------------------------------------------------------
-- Conjugation functions for specific conjugation types --
-------------------------------------------------------------------------------
local function form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1)
local imp_vowel = map_vowel(nonpast_vowel, function(vow)
if vow == A or vow == I then
return I
elseif vow == U then
return U
elseif not skip_slot(base, "imp_2ms") then
error(("Internal error: Non-past vowel %s isn't a, i, or u, should have been caught earlier"):format(
dump(nonpast_vowel)))
else
-- Passive-only; imperative won't ever be displayed so it doesn't matter.
return I
end
end)
-- Mace ("Arabic Verbs and Essential Grammar", p. 63: [https://archive.org/details/arabicverbsessen00john/page/62/mode/2up])
-- claims that initial hamza is assimilated/elided into a long vowel in the form-I imperative, but apparently
-- this isn't correct.
local vowel_on_alif = map_vowel(imp_vowel, function(vow)
return ALIF .. vow
end)
return q(vowel_on_alif, rad1, SK)
end
-- Implement form-I sound or assimilated verb. ASSIMILATED is true for assimilated verbs.
local function make_form_i_sound_assimilated_verb(base, vowel_spec, assimilated)
local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)
-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.
-- past and non-past stems, active and passive
local past_stem = q(rad1, A, rad2, past_vowel, rad3)
local nonpast_stem = assimilated and q(rad2, nonpast_vowel, rad3) or
q(rad1, SK, rad2, nonpast_vowel, rad3)
local past_pass_stem = q(rad1, U, rad2, I, rad3)
local nonpast_pass_stem = q(rad1, SK, rad2, A, rad3)
-- imperative stem
-- check for irregular verb with reduced imperative (أَخَذَ or أَكَلَ or أَمَرَ)
local reducedimp = reduced_imperative_verb(rad1, rad2, rad3)
if reducedimp then
base.irregular = true
end
local imp_stem_suffix = q(rad2, nonpast_vowel, rad3)
local long_imp_stem_base = form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1)
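-- The short imperative stem base is empty (assimilated verbs and verbs with a reduced imperative).
local short_imp_stem_base = ""
local imp_stem = q((assimilated or reducedimp) and short_imp_stem_base or long_imp_stem_base, imp_stem_suffix)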
-- make parts
make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, "a")
if reducedimp == "shortlong" then
make_1stem_imperative(base, iut.combine_form_and_footnotes(q(long_imp_stem_base, imp_stem_suffix),
mw.getCurrentFrame():preprocess("[used especially with a clitic such as {{m|ar|فَ}} or {{m|ar|وَ}}]")))
end
-- Check for irregular verb سَأَلَ with alternative jussive and imperative. Calling this after make_sound_verb()
-- adds additional entries to the paradigm parts.
if saal_radicals(rad1, rad2, rad3) then
base.irregular = true
nonpast_1stem_conj(base, "juss", "a", "سَل")
nonpast_1stem_conj(base, "juss_pass", "u", "سَل")
make_1stem_imperative(base, "سَل")
end
-- Active participle.
insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, I, rad3))
-- Insert alternative active participle (stative type I) فَعِيل. Since not all verbs have this, we require that
-- verbs that do have it specify it explicitly; a shortcut ++ is provided to make this easier (e.g. <ap:++> to
-- indicate that the alternative form should be used for the active participle, <ap:+,++> to indicate that both
-- forms can be used, and <ap:-> to indicate that there is no active participle). The same form is used for
-- secondary default passive participle.
insert_ap2_pp2(base, q(rad1, A, rad2, II, rad3))
-- Active participle, stative type II فَعِل (+++).
insert_form_or_forms(base, "ap3", q(rad1, A, rad2, I, rad3))
-- Active participle, color/defect أَفْعَل (+cd).
insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, A, rad3))
-- Active participle, -ān فَعْلَان (+an).
insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN))
-- Passive participle.
insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, UU, rad3))
end
conjugations["I-sound"] = function(base, vowel_spec)
make_form_i_sound_assimilated_verb(base, vowel_spec, false)
end
conjugations["none-sound"] = function(base, vowel_spec)
-- All default stems are nil.
make_sound_verb(base)
end
conjugations["none-hollow"] = function(base, vowel_spec)
-- All default stems are nil.
make_hollow_geminate_verb(base, false)
end
conjugations["none-geminate"] = function(base, vowel_spec)
-- All default stems are nil.
make_hollow_geminate_verb(base, "geminate")
end
conjugations["none-final-weak"] = function(base, vowel_spec)
-- All default stems are nil.
make_final_weak_verb(base)
end
conjugations["I-assimilated"] = function(base, vowel_spec)
make_form_i_sound_assimilated_verb(base, vowel_spec, "assimilated")
end
local function make_form_i_hayy_verb(base, vowel_spec)
-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.
base.irregular = true
-- past and non-past stems, active and passive, and imperative stem
local past_c_stem = "حَيِي"
local past_v_stem_long = past_c_stem
local past_v_stem_short = "حَيّ"
local past_pass_c_stem = "حُيِي"
local past_pass_v_stem_long = past_pass_c_stem
local past_pass_v_stem_short = "حُيّ"
local nonpast_stem = "حْي"
local nonpast_pass_stem = nonpast_stem
local imp_stem = _I .. nonpast_stem
-- make parts
past_2stem_conj(base, "past", {}, past_c_stem)
past_2stem_conj(base, "past_pass", {}, past_pass_c_stem)
local variant = vowel_spec.variant or "both"
if variant == "short" or variant == "both" then
past_2stem_conj(base, "past", past_v_stem_short, {})
past_2stem_conj(base, "past_pass", past_pass_v_stem_short, {})
end
-- Declared local so that this helper does not leak into the global environment.
local function inflect_long_variant(tense, long_stem, short_stem)
inflect_tense_1(base, tense, "",
{long_stem, long_stem, long_stem, long_stem, short_stem},
{past_endings[4], past_endings[5], past_endings[7], past_endings[8],
past_endings[12]},
{"3ms", "3fs", "3md", "3fd", "3mp"})
end
if variant == "long" or variant == "both" then
inflect_long_variant("past", past_v_stem_long, past_v_stem_short)
inflect_long_variant("past_pass", past_pass_v_stem_long, past_pass_v_stem_short)
end
nonpast_1stem_conj(base, "ind", "a", nonpast_stem, ind_endings_aa)
nonpast_1stem_conj(base, "sub", "a", nonpast_stem, sub_endings_aa)
nonpast_1stem_conj(base, "juss", "a", nonpast_stem, juss_endings_aa)
nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem, ind_endings_aa)
nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem, sub_endings_aa)
nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem, juss_endings_aa)
inflect_tense_imp(base, {imp_stem, all_same = 1}, imp_endings_aa)
-- active and passive participles apparently do not exist for this verb
end
-- Implement form-I final-weak or assimilated+final-weak verb. ASSIMILATED is true for assimilated verbs.
local function make_form_i_final_weak_verb(base, vowel_spec, assimilated)
local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)
-- حَيَّ or حَيِيَ is weird enough that we handle it in a separate function.
if hayy_radicals(rad1, rad2, rad3) then
make_form_i_hayy_verb(base, vowel_spec)
return
end
-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.
-- Past and non-past stems, active and passive, and imperative stem.
local past_stem = q(rad1, A, rad2)
local past_pass_stem = q(rad1, U, rad2)
local nonpast_stem, nonpast_pass_stem, imp_stem
if raa_radicals(rad1, rad2, rad3) then
base.irregular = true
nonpast_stem = rad1
nonpast_pass_stem = rad1
imp_stem = rad1
else
nonpast_pass_stem = q(rad1, SK, rad2)
if assimilated then
nonpast_stem = rad2
imp_stem = rad2
else
nonpast_stem = nonpast_pass_stem
imp_stem = q(form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1), rad2)
end
end
-- Make parts.
local past_ending_vowel =
req(rad3, Y) and req(past_vowel, A) and "ay" or
req(rad3, W) and req(past_vowel, A) and "aw" or
req(past_vowel, I) and "ī" or "ū"
-- Try to preserve footnotes attached to the third radical and/or past and/or non-past vowels.
local past_footnotes = iut.combine_footnotes(rget_footnotes(rad3), rget_footnotes(past_vowel))
local nonpast_ending_vowel = req(nonpast_vowel, A) and "ā" or req(nonpast_vowel, I) and "ī" or "ū"
local nonpast_footnotes = iut.combine_footnotes(rget_footnotes(rad3), rget_footnotes(nonpast_vowel))
make_final_weak_verb(base,
iut.combine_form_and_footnotes(past_stem, past_footnotes),
iut.combine_form_and_footnotes(past_pass_stem, past_footnotes),
iut.combine_form_and_footnotes(nonpast_stem, nonpast_footnotes),
iut.combine_form_and_footnotes(nonpast_pass_stem, nonpast_footnotes),
iut.combine_form_and_footnotes(imp_stem, nonpast_footnotes),
past_ending_vowel, nonpast_ending_vowel, "a")
-- Active participle.
insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, IN))
-- Active participle, stative type I فَعِيّ (++). FIXME: Is this correct when rad3 is W?
insert_ap2_pp2(base, q(rad1, A, rad2, II, SH))
-- Active participle, stative type II فَعٍ (+++). FIXME: Any examples of this to verify it's correct?
insert_form_or_forms(base, "ap3", q(rad1, A, rad2, IN))
-- Active participle, color/defect أَفْعَى (+cd).
insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, AAMAQ))
-- Active participle, -ān فَعْيَان or فَعْوَان (+an).
-- FIXME: Any examples of this for both rad3 = W and rad3 = Y to verify it's correct?
insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN))
-- Passive participle.
insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, req(rad3, Y) and II or UU, SH))
end
conjugations["I-final-weak"] = function(base, vowel_spec)
make_form_i_final_weak_verb(base, vowel_spec, false)
end
conjugations["I-assimilated+final-weak"] = function(base, vowel_spec)
make_form_i_final_weak_verb(base, vowel_spec, "assimilated")
end
conjugations["I-hollow"] = function(base, vowel_spec)
local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)
-- In some sense, hollow vowels i~i and u~u are more "correct" than a~i and a~u, but the latter follow the
-- pattern of other form-I verbs, so we map i~i to a~i and u~u to a~u in infer_radicals(). Now however we have
-- to undo this to get the actual past vowel based on the non-past vowel.
if req(past_vowel, A) then
past_vowel = map_vowel(past_vowel, function(vow)
return req(nonpast_vowel, A) and I or rget(nonpast_vowel)
end)
end
local lengthened_nonpast = map_vowel(nonpast_vowel, function(vow)
return vow == U and UU or vow == I and II or AA
end)
-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.
-- active past stems - vowel (v) and consonant (c)
local past_v_stem = q(rad1, AA, rad3)
local past_c_stem = q(rad1, past_vowel, rad3)
-- active non-past stems - vowel (v) and consonant (c)
local nonpast_v_stem = q(rad1, lengthened_nonpast, rad3)
local nonpast_c_stem = q(rad1, nonpast_vowel, rad3)
-- passive past stems - vowel (v) and consonant (c)
-- fīla, filtu
local past_pass_v_stem = q(rad1, II, rad3)
local past_pass_c_stem = q(rad1, I, rad3)
-- passive non-past stems - vowel (v) and consonant (c)
-- yufāla/yufalna
-- stem is built differently but conjugation is identical to sound verbs
local nonpast_pass_v_stem = q(rad1, AA, rad3)
local nonpast_pass_c_stem = q(rad1, A, rad3)
-- imperative stem
local imp_v_stem = nonpast_v_stem
local imp_c_stem = nonpast_c_stem
-- make parts
make_hollow_geminate_verb(base, false, past_v_stem, past_c_stem, past_pass_v_stem,
past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem,
nonpast_pass_c_stem, imp_v_stem, imp_c_stem, "a")
if kaan_radicals(rad1, rad2, rad3) then
local endings = make_nonpast_endings(U, {}, {}, {}, {})
inflect_tense(base, "juss", nonpast_prefix_consonants, q(A, rad1), endings)
base.irregular = true
end
-- Active participle.
insert_form_or_forms(base, "ap1", req(rad3, HAMZA) and q(rad1, AA, HAMZA, IN) or
q(rad1, AA, HAMZA, I, rad3))
-- Active participle, stative type I فَيِّد (++). FIXME: Any examples of this to verify it's correct?
insert_ap2_pp2(base, q(rad1, A, Y, SH, I, rad3))
-- Active participle, stative type II فَيِد (+++). FIXME: Any examples of this to verify it's correct?
insert_form_or_forms(base, "ap3", q(rad1, A, Y, I, rad3))
-- Active participle, color/defect أَفْيَد or أَفْوَد (+cd). FIXME: Any examples of this to verify it's correct?
insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, A, rad3))
-- Active participle, -ān فَيْدَان or فَوْدَان (+an). Example: جَاعَ "to be hungry", act part جَوْعَان
insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN))
-- Passive participle.
insert_form_or_forms(base, "pp", q(MA, rad1, req(rad2, Y) and II or UU, rad3))
end
conjugations["I-geminate"] = function(base, vowel_spec)
local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)
-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.
-- active past stems - vowel (v) and consonant (c)
local past_v_stem = q(rad1, A, rad2, SH)
local past_c_stem = q(rad1, A, rad2, past_vowel, rad2)
-- active non-past stems - vowel (v) and consonant (c)
local nonpast_v_stem = q(rad1, nonpast_vowel, rad2, SH)
local nonpast_c_stem = q(rad1, SK, rad2, nonpast_vowel, rad2)
-- passive past stems - vowel (v) and consonant (c)
-- dulla/dulilta
local past_pass_v_stem = q(rad1, U, rad2, SH)
local past_pass_c_stem = q(rad1, U, rad2, I, rad2)
-- passive non-past stems - vowel (v) and consonant (c)
-- yudallu/yudlalna
-- stem is built differently but conjugation is identical to sound verbs
local nonpast_pass_v_stem = q(rad1, A, rad2, SH)
local nonpast_pass_c_stem = q(rad1, SK, rad2, A, rad2)
-- imperative stem
local imp_v_stem = q(rad1, nonpast_vowel, rad2, SH)
local imp_c_stem = q(form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1), rad2, nonpast_vowel, rad2)
-- make parts
make_hollow_geminate_verb(base, "geminate", past_v_stem, past_c_stem, past_pass_v_stem,
past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem,
nonpast_pass_c_stem, imp_v_stem, imp_c_stem, "a")
-- Active participle.
insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, SH))
-- Active participle, stative type I فَعِيع (++). FIXME: Any examples of this to verify it's correct?
insert_ap2_pp2(base, q(rad1, A, rad2, II, rad2))
-- Active participle, stative type II فَعّ (+++). Example: بَرَّ "to be pious", active participle بَرّ
insert_form_or_forms(base, "ap3", q(rad1, A, rad2, SH))
-- Active participle, color/defect أَفَعّ (+cd).
-- Example: لَصَّ "to be thievish, to steal repeatedly", active participle أَلَصّ.
insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, A, rad2, SH))
-- Active participle, -ān فَعَّان (+an). FIXME: Any examples of this to verify it's correct?
insert_form_or_forms(base, "apan", q(rad1, A, rad2, SH, AAN))
-- Passive participle.
insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, UU, rad2))
end
-- Return the ta- (active, past and non-past) and tu- (passive past) prefixes for a form II/III/V/VI verb.
-- Form V and VI verbs normally use ta- and tu-, but reduced (base.reduced) verbs use different prefixes. Form II
-- and III verbs have no prefix.
local function form_ii_iii_v_vi_ta_tu_prefix(base, rad1)
local vform = base.verb_form
if vform == "V" or vform == "VI" then
if base.reduced then
-- To simplify the code, we generate two rad1's with a sukūn between them, which is cleaned up in
-- postprocessing.
return q(_I, rad1, SK), q(rad1, SK), q(_U, rad1, SK)
else
return TA, TA, TU
end
else
return "", "", ""
end
end
-- Make form II or V sound or final-weak verb.
local function make_form_ii_v_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
local vform = base.verb_form
local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1)
local vn = vform == "V" and
q(ta_past_prefix, rad1, A, rad2, SH, final_weak and IN or q(U, rad3)) or
q(TA, rad1, SK, rad2, II, final_weak and AH or rad3)
-- various stem bases
local past_stem_base = q(ta_past_prefix, rad1, A, rad2, SH)
local nonpast_stem_base = q(ta_nonpast_prefix, rad1, A, rad2, SH)
local past_pass_stem_base = q(tu_past_prefix, rad1, U, rad2, SH)
-- make parts
make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
end
conjugations["II-sound"] = function(base, vowel_spec)
make_form_ii_v_sound_final_weak_verb(base, vowel_spec)
end
conjugations["II-final-weak"] = function(base, vowel_spec)
make_form_ii_v_sound_final_weak_verb(base, vowel_spec)
end
local function make_form_iii_alt_vn(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
-- Insert alternative verbal noun فِعَال. Since not all verbs have this, we require that verbs that do have it
-- specify it explicitly; a shortcut ++ is provided to make this easier (e.g. <vn:+,++> to indicate that
-- both the normal verbal noun مُفَاعَلَة and secondary verbal noun فِعَال are available).
insert_form_or_forms(base, "vn2", q(rad1, I, rad2, AA, final_weak and HAMZA or rad3))
end
-- Make form III or VI sound or final-weak verb.
local function make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
local vform = base.verb_form
local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1)
local vn = vform == "VI" and
q(ta_past_prefix, rad1, AA, rad2, final_weak and IN or q(U, rad3)) or
q(MU, rad1, AA, rad2, final_weak and AAH or q(A, rad3, AH))
-- various stem bases
local past_stem_base = q(ta_past_prefix, rad1, AA, rad2)
local nonpast_stem_base = q(ta_nonpast_prefix, rad1, AA, rad2)
local past_pass_stem_base = q(tu_past_prefix, rad1, UU, rad2)
-- make parts
make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
if vform == "III" then
make_form_iii_alt_vn(base, vowel_spec)
end
end
conjugations["III-sound"] = function(base, vowel_spec)
make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
end
conjugations["III-final-weak"] = function(base, vowel_spec)
make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
end
-- Make form III or VI geminate verb.
local function make_form_iii_vi_geminate_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vform = base.verb_form
local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1)
-- Alternative verbal noun فِعَال will be inserted when we add sound parts below.
local vn = vform == "VI" and q(ta_past_prefix, rad1, AA, rad2, SH) or q(MU, rad1, AA, rad2, SH, AH)
-- Various stem bases.
local past_stem_base = q(ta_past_prefix, rad1, AA)
local nonpast_stem_base = q(ta_nonpast_prefix, rad1, AA)
local past_pass_stem_base = q(tu_past_prefix, rad1, UU)
-- Make parts.
local variant = vowel_spec.variant or "short"
if variant == "short" or variant == "both" then
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
-- Also add alternative sound (non-compressed) parts. This will lead to some duplicate entries, but they are
-- removed during addition.
if variant == "long" or variant == "both" then
make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
elseif vform == "III" then
-- Still need to add the alternative form-III verbal noun.
make_form_iii_alt_vn(base, vowel_spec)
end
end
conjugations["III-geminate"] = function(base, vowel_spec)
make_form_iii_vi_geminate_verb(base, vowel_spec)
end
-- Make form IV sound or final-weak verb.
local function make_form_iv_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
-- core of stem base, minus stem prefixes
local stem_core
-- check for irregular verb أَرَى
local is_raa = raa_radicals(rad1, rad2, rad3)
if is_raa then
base.irregular = true
stem_core = rad1
else
stem_core = q(rad1, SK, rad2)
end
-- verbal noun
local vn = is_raa and
q(HAMZA, I, stem_core, AA, HAMZA, AH) or
q(HAMZA, I, stem_core, AA, final_weak and HAMZA or rad3)
-- various stem bases
local past_stem_base = q(HAMZA, A, stem_core)
local nonpast_stem_base = stem_core
local past_pass_stem_base = q(HAMZA, U, stem_core)
-- make parts
make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
end
conjugations["IV-sound"] = function(base, vowel_spec)
make_form_iv_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IV-final-weak"] = function(base, vowel_spec)
make_form_iv_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IV-hollow"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
-- verbal noun
local vn = q(HAMZA, I, rad1, AA, rad3, AH)
-- various stem bases
local past_stem_base = q(HAMZA, A, rad1)
local nonpast_stem_base = rad1
local past_pass_stem_base = q(HAMZA, U, rad1)
-- make parts
make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["IV-geminate"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = q(HAMZA, I, rad1, SK, rad2, AA, rad2)
-- various stem bases
local past_stem_base = q(HAMZA, A, rad1)
local nonpast_stem_base = rad1
local past_pass_stem_base = q(HAMZA, U, rad1)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["V-sound"] = function(base, vowel_spec)
make_form_ii_v_sound_final_weak_verb(base, vowel_spec)
end
conjugations["V-final-weak"] = function(base, vowel_spec)
make_form_ii_v_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VI-sound"] = function(base, vowel_spec)
make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VI-final-weak"] = function(base, vowel_spec)
make_form_iii_vi_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VI-geminate"] = function(base, vowel_spec)
make_form_iii_vi_geminate_verb(base, vowel_spec)
end
-- Make a verbal noun of the general form that applies to forms VII and above. RAD12 is the first consonant cluster
-- (after initial اِ) and RAD34 is the second consonant cluster. RAD5 is the final consonant.
local function high_form_verbal_noun(rad12, rad34, rad5)
return q(_I, rad12, I, rad34, AA, rad5)
end
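-- Worked example (a sketch, not taken from the module's tests): assuming I is kasra, AA is fatḥa + alif and
-- base.reduced is unset, and taking the hypothetical radicals ك س ر, form VII passes rad12 = نْك (from
-- form_vii_nrad1), rad34 = س and rad5 = ر. The pieces then concatenate as اِ + نْك + kasra + س + fatḥa-alif + ر,
-- giving the verbal noun اِنْكِسَار (inkisār).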
-- Populate a sound or final-weak verb for any of the various high-numbered augmented forms (form VII and up) that
-- have up to 5 consonants in two clusters in the stem and the same pattern of vowels between. Some of these
-- consonants in certain verb parts are w's, which leads to apparent anomalies in certain stems of these parts, but
-- these anomalies are handled automatically in postprocessing, where we resolve sequences of iwC -> īC, uwC -> ūC,
-- w + sukūn + w -> w + shadda.
-- RAD12 is the first consonant cluster (after initial اِ) and RAD34 is the second consonant cluster. RAD5 is the
-- final consonant.
local function make_high_form_sound_final_weak_verb(base, vowel_spec, rad12, rad34, rad5)
local final_weak = is_final_weak(base, vowel_spec)
local vn = high_form_verbal_noun(rad12, rad34, final_weak and HAMZA or rad5)
-- various stem bases
local nonpast_stem_base = q(rad12, A, rad34)
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, rad12, U, rad34)
-- make parts
make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
end
local function form_vii_nrad1(base, rad1)
if base.reduced then
if not req(rad1, M) then
error(("Internal error: Form VII first radical %s is not م but .reduced specified; should have been caught earlier"):
format(rget(rad1)))
end
return M .. SH
else
return q("نْ", rad1)
end
end
-- Make form VII sound or final-weak verb.
local function make_form_vii_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
make_high_form_sound_final_weak_verb(base, vowel_spec, form_vii_nrad1(base, rad1), rad2, rad3)
end
conjugations["VII-sound"] = function(base, vowel_spec)
make_form_vii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VII-final-weak"] = function(base, vowel_spec)
make_form_vii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VII-hollow"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local nrad1 = form_vii_nrad1(base, rad1)
local vn = high_form_verbal_noun(nrad1, Y, rad3)
-- various stem bases
local nonpast_stem_base = nrad1
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, nrad1)
-- make parts
make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["VII-geminate"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local nrad1 = form_vii_nrad1(base, rad1)
local vn = high_form_verbal_noun(nrad1, rad2, rad2)
-- various stem bases
local nonpast_stem_base = q(nrad1, A)
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, nrad1, U)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
-- Return Form VIII verbal noun.
local function form_viii_verbal_noun(base, vowel_spec, rad1, rad2, rad3)
local final_weak = is_final_weak(base, vowel_spec)
rad3 = final_weak and HAMZA or rad3
return {high_form_verbal_noun(vowel_spec.form_viii_assim, rad2, rad3)}
end
-- Make form VIII sound or final-weak verb.
local function make_form_viii_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
-- check for irregular verb اِتَّخَذَ
if axadh_radicals(rad1, rad2, rad3) then
base.irregular = true
rad1 = T
end
make_high_form_sound_final_weak_verb(base, vowel_spec, vowel_spec.form_viii_assim, rad2, rad3)
end
conjugations["VIII-sound"] = function(base, vowel_spec)
make_form_viii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VIII-final-weak"] = function(base, vowel_spec)
make_form_viii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["VIII-hollow"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = form_viii_verbal_noun(base, vowel_spec, rad1, Y, rad3)
-- various stem bases
local nonpast_stem_base = vowel_spec.form_viii_assim
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, nonpast_stem_base)
-- make parts
make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["VIII-geminate"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = form_viii_verbal_noun(base, vowel_spec, rad1, rad2, rad2)
-- various stem bases
local nonpast_stem_base = q(vowel_spec.form_viii_assim, A)
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, vowel_spec.form_viii_assim, U)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["IX-sound"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = q(_I, rad1, SK, rad2, I, rad3, AA, rad3)
-- various stem bases
local nonpast_stem_base = q(rad1, SK, rad2, A)
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, rad1, SK, rad2, U)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["IX-final-weak"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
make_high_form_sound_final_weak_verb(base, vowel_spec, q(rad1, SK, rad2), rad3, rad3)
end
-- Populate a sound or final-weak verb for any of the various high-numbered
-- augmented forms that have 5 consonants in the stem and the same pattern of
-- vowels. Some of these consonants in certain verb parts are w's, which leads to
-- apparent anomalies in certain stems of these parts, but these anomalies
-- are handled automatically in postprocessing, where we resolve sequences of
-- iwC -> īC, uwC -> ūC, w + sukūn + w -> w + shadda.
local function make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, rad3, rad4, rad5)
make_high_form_sound_final_weak_verb(base, vowel_spec, q(rad1, SK, rad2), q(rad3, SK, rad4), rad5)
end
-- Make form X sound or final-weak verb.
local function make_form_x_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
-- check for irregular verb اِسْتَحْيَا (also اِسْتَحَى)
local is_hayy = hayy_radicals(rad1, rad2, rad3)
local variant = vowel_spec.variant or "both"
if not is_hayy or variant == "long" or variant == "both" then
make_high5_form_sound_final_weak_verb(base, vowel_spec, S, T, rad1, rad2, rad3)
end
if is_hayy and (variant == "short" or variant == "both") then
base.irregular = true
-- Add alternative entries to the verbal paradigms. Any duplicates are removed during addition.
make_high_form_sound_final_weak_verb(base, vowel_spec, S .. SK .. T, rad1, rad3)
end
end
conjugations["X-sound"] = function(base, vowel_spec)
make_form_x_sound_final_weak_verb(base, vowel_spec)
end
conjugations["X-final-weak"] = function(base, vowel_spec)
make_form_x_sound_final_weak_verb(base, vowel_spec)
end
conjugations["X-hollow"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = q(base.reduced and "اِسْ" or "اِسْتِ", rad1, AA, rad3, AH)
-- various stem bases
local past_stem_base = q(base.reduced and "اِسْ" or "اِسْتَ", rad1)
local nonpast_stem_base = q(base.reduced and "سْ" or "سْتَ", rad1)
local past_pass_stem_base = q(base.reduced and "اُسْ" or "اُسْتُ", rad1)
-- make parts
make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
conjugations["X-geminate"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = q("اِسْتِ", rad1, SK, rad2, AA, rad2)
-- various stem bases
local past_stem_base = q("اِسْتَ", rad1)
local nonpast_stem_base = q("سْتَ", rad1)
local past_pass_stem_base = q("اُسْتُ", rad1)
-- make parts
if base.altgem then
inflect_tense(base, "past", "", {q(past_stem_base, A, rad2, SH), all_same = 1},
past_endings_ay_12_person_only)
end
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn,
base.altgem and "[uncommon]" or nil)
end
conjugations["XI-sound"] = function(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local vn = q(_I, rad1, SK, rad2, II, rad3, AA, rad3)
-- various stem bases
local nonpast_stem_base = q(rad1, SK, rad2, AA)
local past_stem_base = q(_I, nonpast_stem_base)
local past_pass_stem_base = q(_U, rad1, SK, rad2, UU)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
-- Probably no form XI final-weak, since already geminate in form; would behave as XI-sound.
-- Make form XII sound or final-weak verb.
local function make_form_xii_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, W, rad2, rad3)
end
conjugations["XII-sound"] = function(base, vowel_spec)
make_form_xii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["XII-final-weak"] = function(base, vowel_spec)
make_form_xii_sound_final_weak_verb(base, vowel_spec)
end
-- Make form XIII sound or final-weak verb.
local function make_form_xiii_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, W, W, rad3)
end
conjugations["XIII-sound"] = function(base, vowel_spec)
make_form_xiii_sound_final_weak_verb(base, vowel_spec)
end
conjugations["XIII-final-weak"] = function(base, vowel_spec)
make_form_xiii_sound_final_weak_verb(base, vowel_spec)
end
-- Make a form XIV or XV sound or final-weak verb. Last radical appears twice (if`anlala / yaf`anlilu) so if it were
-- w or y you'd get if`anwā / yaf`anwī or if`anyā / yaf`anyī, i.e. unlike for most augmented verbs, the identity of
-- the radical matters.
local function make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3 = get_radicals_3(vowel_spec)
local lastrad = base.verb_form == "XV" and Y or rad3
make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, N, rad3, lastrad)
end
conjugations["XIV-sound"] = function(base, vowel_spec)
make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec)
end
conjugations["XIV-final-weak"] = function(base, vowel_spec)
make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec)
end
conjugations["XV-sound"] = function(base, vowel_spec)
make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec)
end
-- Probably no form XV final-weak, since already final-weak in form; would behave as XV-sound.
-- Make form Iq or IIq sound or final-weak verb.
local function make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
local vform = base.verb_form
local vn = vform == "IIq" and
q(TA, rad1, A, rad2, SK, rad3, (final_weak and IN or q(U, rad4))) or
q(rad1, A, rad2, SK, rad3, (final_weak and AAH or q(A, rad4, AH)))
local ta_pref = vform == "IIq" and TA or ""
local tu_pref = vform == "IIq" and TU or ""
-- various stem bases
local past_stem_base = q(ta_pref, rad1, A, rad2, SK, rad3)
local nonpast_stem_base = past_stem_base
local past_pass_stem_base = q(tu_pref, rad1, U, rad2, SK, rad3)
-- make parts
make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base,
vn)
end
conjugations["Iq-sound"] = function(base, vowel_spec)
make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec)
end
conjugations["Iq-final-weak"] = function(base, vowel_spec)
make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IIq-sound"] = function(base, vowel_spec)
make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IIq-final-weak"] = function(base, vowel_spec)
make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec)
end
-- Make form IIIq sound or final-weak verb.
local function make_form_iiiq_sound_final_weak_verb(base, vowel_spec)
local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec)
make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, N, rad3, rad4)
end
conjugations["IIIq-sound"] = function(base, vowel_spec)
make_form_iiiq_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IIIq-final-weak"] = function(base, vowel_spec)
make_form_iiiq_sound_final_weak_verb(base, vowel_spec)
end
conjugations["IVq-sound"] = function(base, vowel_spec)
local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec)
local vn = q(_I, rad1, SK, rad2, I, rad3, SK, rad4, AA, rad4)
-- various stem bases
local past_stem_base = q(_I, rad1, SK, rad2, A, rad3)
local nonpast_stem_base = q(rad1, SK, rad2, A, rad3)
local past_pass_stem_base = q(_U, rad1, SK, rad2, U, rad3)
-- make parts
make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn)
end
-- Probably no form IVq final-weak, since already geminate in form; would behave as IVq-sound.
end
create_conjugations()
-------------------------------------------------------------------------------
-- Guts of main conjugation function --
-------------------------------------------------------------------------------
-- Given form, weakness and radicals, check to make sure the radicals present are allowable for the weakness. Hamzas on
-- alif/wāw/yāʾ seats are never allowed (should always appear as hamza-on-the-line), and various weaknesses have various
-- strictures on allowable consonants.
local function check_radicals(form, weakness, rad1, rad2, rad3, rad4)
local function hamza_check(index, rad)
if rad == HAMZA_ON_ALIF or rad == HAMZA_UNDER_ALIF or
rad == HAMZA_ON_W or rad == HAMZA_ON_Y then
error("Radical " .. index .. " is " .. rad .. " but should be ء (hamza on the line)")
end
end
local function check_waw_ya(index, rad)
if not is_waw_ya(rad) then
error("Radical " .. index .. " is " .. rad .. " but should be و or ي")
end
end
local function check_not_waw_ya(index, rad)
if is_waw_ya(rad) then
error("In a sound verb, radical " .. index .. " should not be و or ي")
end
end
hamza_check(1, rad1)
hamza_check(2, rad2)
hamza_check(3, rad3)
hamza_check(4, rad4)
if weakness == "assimilated" or weakness == "assimilated+final-weak" then
if rad1 ~= W then
error("Radical 1 is " .. rad1 .. " but should be و")
end
-- don't check that non-assimilated form I verbs don't have wāw as their
-- first radical because some form-I verbs exist where a first-radical wāw
-- behaves as sound, e.g. wajuha yawjuhu "to be distinguished".
end
if weakness == "final-weak" or weakness == "assimilated+final-weak" then
if rad4 then
check_waw_ya(4, rad4)
else
check_waw_ya(3, rad3)
end
elseif vform_supports_final_weak(form) then
-- non-final-weak verbs cannot have weak final radical if there's a corresponding
-- final-weak verb category. I think this is safe. We may have problems with
-- ḥayya/ḥayiya yaḥyā if we treat it as a geminate verb.
if rad4 then
check_not_waw_ya(4, rad4)
else
check_not_waw_ya(3, rad3)
end
end
if weakness == "hollow" then
check_waw_ya(2, rad2)
-- don't check that non-hollow verbs in forms that support hollow verbs
-- don't have wāw or yāʾ as their second radical because some verbs exist
-- where a middle-radical wāw/yāʾ behaves as sound, e.g. form-VIII izdawaja
-- "to be in pairs".
end
if weakness == "geminate" then
if rad4 then
error("Internal error: No geminate quadriliterals, should not be seen")
end
if rad2 ~= rad3 then
error("Weakness is geminate; radical 3 is " .. rad3 .. " but should be same as radical 2 " .. rad2)
end
elseif vform_supports_geminate(form) then
-- non-geminate verbs cannot have second and third radical same if there's
-- a corresponding geminate verb category. I think this is safe. We
-- don't fuss over double wāw or double yāʾ because this could legitimately
-- be a final-weak verb with middle wāw/yāʾ, treated as sound.
if rad4 then
error("Internal error: No quadriliterals should support geminate verbs")
end
if rad2 == rad3 and not is_waw_ya(rad2) then
error("Weakness is '" .. weakness .. "'; radical 2 and 3 are same at " .. rad2 .. " but should not be; consider making weakness 'geminate'")
end
end
end
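-- Worked example (a sketch; assumes radicals arrive as plain strings, is_waw_ya() is true only for و and ي, and the
-- hypothetical call shown here):
--   check_radicals("I", "geminate", "م", "د", "ر")
-- passes the hamza checks and the final-radical check, skips the assimilated and hollow branches, and then fails in
-- the geminate branch with "Weakness is geminate; radical 3 is ر but should be same as radical 2 د", because
-- radicals 2 and 3 differ.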
-- array of substitutions; each element is a 2-entry array FROM, TO; do it
-- this way so the concatenations only get evaluated once
local postprocess_subs = {
-- reorder short-vowel + shadda -> shadda + short-vowel for easier processing
{"(" .. AIU .. ")" .. SH, SH .. "%1"},
-- same letter separated by sukūn should instead use shadda; happens e.g. in kun-nā "we were"
{"(.)" .. SK .. "%1", "%1" .. SH},
---------------------------- assimilated verbs ----------------------------
-- iw, iy -> ī (assimilated verbs)
{I .. W .. SK, II},
{I .. Y .. SK, II},
-- uw, uy -> ū (assimilated verbs)
{U .. W .. SK, UU},
{U .. Y .. SK, UU},
-------------- final -yā uses tall alif not alif maqṣūra ------------------
{"(" .. Y .. SH .. "?" .. A .. ")" .. AMAQ, "%1" .. ALIF},
----------------------- handle hamza assimilation -------------------------
-- initial hamza + short-vowel + hamza + sukūn -> hamza + long vowel
{HAMZA .. A .. HAMZA .. SK, HAMZA .. A .. ALIF},
{HAMZA .. I .. HAMZA .. SK, HAMZA .. I .. Y},
{HAMZA .. U .. HAMZA .. SK, HAMZA .. U .. W}
}
local postprocess_tr_subs = {
{"ī([" .. vowels .. "y*])", "iy%1"},
{"ū([" .. vowels .. "w*])", "uw%1"},
{"(.)%*", "%1%1"}, -- implement shadda
---------------------------- assimilated verbs ----------------------------
-- iw, iy -> ī (assimilated verbs)
{"iw([^" .. vowels .. "w])", "ī%1"},
{"iy([^" .. vowels .. "y])", "ī%1"},
-- uw, uy -> ū (assimilated verbs)
{"uw([^" .. vowels .. "w])", "ū%1"},
{"uy([^" .. vowels .. "y])", "ū%1"},
----------------------- handle hamza assimilation -------------------------
-- initial hamza + short-vowel + hamza + sukūn -> hamza + long vowel
{"ʔaʔ(" .. NV .. ")", "ʔā%1"},
{"ʔiʔ(" .. NV .. ")", "ʔī%1"},
{"ʔuʔ(" .. NV .. ")", "ʔū%1"},
}
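-- Worked example (a sketch; assumes rsub behaves like mw.ustring.gsub, `vowels` holds the transliteration vowel
-- letters, and the hypothetical inputs below): the shadda rule turns "rad*a" into "radda", and the assimilation
-- rule turns "ʔiwjāb" into "ʔījāb", since "iw" is followed by the non-vowel "j".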
-- Post-process verb parts to eliminate phonological anomalies. Many of the changes, particularly the tricky ones,
-- involve converting hamza to have the proper seat. The rules for this are complicated and are documented on the
-- [[w:Hamza]] Wikipedia page. In some cases there are alternatives allowed, and we handle them below by returning
-- multiple possibilities.
local function postprocess_term(term)
if term == "?" then
return "?"
end
-- Add BORDER at text boundaries.
term = BORDER .. term .. BORDER
-- Do the main post-processing, based on the pattern substitutions in postprocess_subs.
for _, sub in ipairs(postprocess_subs) do
term = rsub(term, sub[1], sub[2])
end
term = term:gsub(BORDER, "")
if not rfind(term, HAMZA) then
return term
end
term = term:gsub(HAMZA, HAMZA_PH)
term = ar_utilities.process_hamza(term)
if #term == 1 then
term = term[1]
end
return term
end
local function postprocess_translit(translit)
if translit == "?" then
return "?"
end
-- Add BORDER at text boundaries.
translit = BORDER .. translit .. BORDER
-- Do the main post-processing, based on the pattern substitutions in postprocess_tr_subs.
for _, sub in ipairs(postprocess_tr_subs) do
translit = rsub(translit, sub[1], sub[2])
end
translit = translit:gsub(BORDER, "")
return translit
end
local function postprocess_forms(base)
local converted_values = {}
for slot, forms in pairs(base.forms) do
local need_dedup = false
for i, form in ipairs(forms) do
local term = postprocess_term(form.form)
local translit = form.translit and postprocess_translit(form.translit) or nil
if term ~= form.form or translit ~= form.translit then
need_dedup = true
end
converted_values[i] = {term, translit}
end
if need_dedup then
local temp_dedup = {}
for i = 1, #forms do
local new_term, new_translit = unpack(converted_values[i])
if type(new_term) == "table" then
for _, nt in ipairs(new_term) do
local new_formobj = {
form = nt,
translit = new_translit,
footnotes = forms[i].footnotes,
}
iut.insert_form(temp_dedup, "temp", new_formobj)
end
else
local new_formobj = {
form = new_term,
translit = new_translit,
footnotes = forms[i].footnotes,
}
iut.insert_form(temp_dedup, "temp", new_formobj)
end
end
base.forms[slot] = temp_dedup.temp
end
end
end
local function process_slot_overrides(base)
for slot, forms in pairs(base.slot_overrides) do
local existing_values = base.forms[slot]
base.forms[slot] = nil
for _, form in ipairs(forms) do
-- + in active participle for form I requests slot ap1
if form.form == "+" and (base.verb_form ~= "I" or slot ~= "ap") then
if not existing_values then
error(("Slot '%s' requested the default value but no such value available"):format(slot))
end
-- We maintain an invariant that no two slots share a form object (although they may share the footnote
-- lists inside the form objects). However, there is no need to copy the form objects here because there
-- is a one-to-one correspondence between slots and slot overrides, i.e. you can't have a default value
-- go into two slots.
insert_form_or_forms(base, slot, existing_values, "allow overrides", form.uncertain)
elseif default_indicator_to_active_participle_slot[form.form] then
if form.form == "++" then
if slot ~= "vn" and slot ~= "ap" and slot ~= "pp" then
error(("Secondary default value request '++' only applicable to verbal nouns and participles, but found in slot '%s'"):
format(slot))
end
else
if slot ~= "ap" then
error(("Secondary default value request '%s' only applicable to active participles, but found in slot '%s'"):
format(form.form, slot))
end
end
local secondary_default_slot =
slot == "vn" and "vn2" or slot == "pp" and "pp2" or
default_indicator_to_active_participle_slot[form.form]
local existing_values = base.forms[secondary_default_slot]
if not existing_values then
error(("Slot '%s' requested a secondary default value using '%s' but no such value available"):
format(slot, form.form))
end
-- See comment above about the lack of need to copy the form objects.
insert_form_or_forms(base, slot, existing_values, "allow overrides", form.uncertain)
-- To make sure there aren't shared form objects.
base.forms[secondary_default_slot] = nil
else
insert_form_or_forms(base, slot, form, "allow overrides", form.uncertain)
end
end
end
-- Now, for non-stative form-I verbs, fill the active participle slot from ap1 unless it should be missing (e.g.
-- passive-only or user specified 'ap:-').
if base.verb_form == "I" and not base.forms.ap and base.forms.ap1 and not skip_slot(base, "ap") then
local saw_non_stative = false
for _, vowel_spec in ipairs(base.conj_vowels) do
if req(vowel_spec.past, A) then
saw_non_stative = true
break
end
end
if saw_non_stative then
base.forms.ap = base.forms.ap1
-- To make sure there aren't shared form objects.
base.forms.ap1 = nil
end
end
end
local function handle_lemma_linked(base)
-- Compute linked versions of potential lemma slots, for use in {{ar-verb}}. We substitute the original lemma
-- (before removing links) for forms that are the same as the lemma, if the original lemma has links.
for _, slot in ipairs(export.potential_lemma_slots) do
if base.forms[slot] then
insert_form_or_forms(base, slot .. "_linked", iut.map_forms(base.forms[slot], function(form)
if form == base.lemma and rfind(base.linked_lemma, "%[%[") then
return base.linked_lemma
else
return form
end
end))
end
end
end
-- Process specs given by the user using 'addnote[SLOTSPEC][FOOTNOTE][FOOTNOTE][...]'.
local function process_addnote_specs(base)
for _, spec in ipairs(base.addnote_specs) do
for _, slot_spec in ipairs(spec.slot_specs) do
slot_spec = "^" .. slot_spec .. "$"
for slot, forms in pairs(base.forms) do
if rfind(slot, slot_spec) then
-- To save on memory, side-effect the existing forms.
for _, form in ipairs(forms) do
form.footnotes = iut.combine_footnotes(form.footnotes, spec.footnotes)
end
end
end
end
end
end
local function add_missing_links_to_forms(base)
-- Any forms without links should get them now. Redundant ones will be stripped later.
for slot, forms in pairs(base.forms) do
for _, form in ipairs(forms) do
if not form.form:find("%[%[") then
form.form = "[[" .. form.form .. "]]"
end
end
end
end
local function conjugate_verb(base)
construct_stems(base)
for _, vowel_spec in ipairs(base.conj_vowels) do
-- Reconstruct conjugation type from verb form and (possibly inferred) weakness.
local conj_type = base.verb_form .. "-" .. vowel_spec.weakness
-- Check that the conjugation type is recognized.
if not conjugations[conj_type] then
error("Unknown conjugation type '" .. conj_type .. "'")
end
-- The way the conjugation functions work is they always add entries to the appropriate parts of the paradigm
-- (each of which is an array), rather than setting the values. This makes it possible to call more than one
-- conjugation function and essentially get a paradigm of the "either A or B" kind. Doing this may insert
-- duplicate entries into a particular paradigm part, but this is not a problem because we check for duplicate
-- entries when adding them, and don't insert in that case.
conjugations[conj_type](base, vowel_spec)
end
postprocess_forms(base)
process_slot_overrides(base)
-- This should happen before add_missing_links_to_forms() so that the comparison `form == base.lemma` in
-- handle_lemma_linked() works correctly and compares unlinked forms to unlinked forms.
handle_lemma_linked(base)
process_addnote_specs(base)
if not base.alternant_multiword_spec.args.noautolinkverb then
add_missing_links_to_forms(base)
end
end
local function parse_indicator_spec(angle_bracket_spec)
-- Store the original angle bracket spec so we can reconstruct the overall conj spec with the lemma(s) in them.
local base = {
angle_bracket_spec = angle_bracket_spec,
conj_vowels = {},
root_consonants = {},
user_stem_overrides = {},
user_slot_overrides = {},
slot_explicitly_missing = {},
slot_uncertain = {},
slot_override_uses_default = {},
addnote_specs = {},
}
local function parse_err(msg)
error(msg .. ": " .. angle_bracket_spec)
end
local function fetch_footnotes(separated_group)
local footnotes
for j = 2, #separated_group - 1, 2 do
if separated_group[j + 1] ~= "" then
parse_err("Extraneous text after bracketed footnotes: '" .. table.concat(separated_group) .. "'")
end
if not footnotes then
footnotes = {}
end
table.insert(footnotes, separated_group[j])
end
return footnotes
end
local inside = angle_bracket_spec:match("^<(.*)>$")
assert(inside)
local segments = put.parse_multi_delimiter_balanced_segment_run(inside, {{"[", "]"}, {"<", ">"}})
local dot_separated_groups = put.split_alternating_runs_and_strip_spaces(segments, "%.")
-- The first dot-separated element must specify the verb form, e.g. IV or IIq. If the form is I, it needs to include
-- the past and non-past vowels, e.g. I/a~u for kataba ~ yaktubu. More than one vowel can be given,
-- comma-separated, and more than one past~non-past pair can be given, slash-separated, e.g. I/a,u~u/i~a for form I
-- كمل, which can be conjugated as kamala/kamula ~ yakmulu or kamila ~ yakmalu. An individual vowel spec must be one
-- of a, i or u and in general (a) at least one past~non-past pair must be given, and (b) both past and non-past
-- vowels must be given even though sometimes the vowel can be determined from the unvocalized form. An exception is
-- passive-only verbs, where the vowels can't in general be determined (except indirectly in some cases by looking
-- at an associated non-passive verb); in that case, the vowel~vowel spec can be left out.
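-- Illustrative indicator specs (a sketch based only on the parsing code in this function; the combinations shown are
-- hypothetical and not taken from the module's documentation):
--   <I/a~u>          form I, past vowel a, non-past vowel u
--   <I/a,u~u/i~a>    form I with two slash-separated past~non-past pairs, the first with two past vowels
--   <IV>             augmented form, so no vowels are given
--   <X.var:short>    verb form followed by a dot-separated 'var:' indicator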
local slash_separated_groups = put.split_alternating_runs_and_strip_spaces(dot_separated_groups[1], "/")
local form_spec = slash_separated_groups[1]
base.form_footnotes = fetch_footnotes(form_spec)
if form_spec[1] == "" then
parse_err("Missing verb form")
end
if not allowed_vforms_with_weakness_set[form_spec[1]] then
parse_err(("Unrecognized verb form '%s', should be one of %s"):format(
form_spec[1], list_to_text(allowed_vforms, nil, " or ")))
end
if form_spec[1]:find("%-") then
base.verb_form, base.explicit_weakness = form_spec[1]:match("^(.-)%-(.*)$")
else
base.verb_form = form_spec[1]
end
if #slash_separated_groups > 1 then
if base.verb_form ~= "I" then
parse_err(("Past~non-past vowels can only be specified when verb form is I, but saw form '%s'"):format(
base.verb_form))
end
for i = 2, #slash_separated_groups do
local slash_separated_group = slash_separated_groups[i]
local tilde_separated_groups = put.split_alternating_runs_and_strip_spaces(slash_separated_group, "~")
if #tilde_separated_groups ~= 2 then
parse_err(("Expected two tilde-separated vowel specs: %s"):format(table.concat(slash_separated_group)))
end
local function parse_conj_vowels(tilde_separated_group, vtype)
local conj_vowel_objects = {}
local comma_separated_groups = put.split_alternating_runs_and_strip_spaces(tilde_separated_group, ",")
for _, comma_separated_group in ipairs(comma_separated_groups) do
local conj_vowel = comma_separated_group[1]
if conj_vowel ~= "a" and conj_vowel ~= "i" and conj_vowel ~= "u" then
parse_err(("Expected %s conjugation vowel '%s' to be one of a, i or u in %s"):format(
vtype, conj_vowel, table.concat(slash_separated_group)))
end
conj_vowel = dia[conj_vowel]
local conj_vowel_footnotes = fetch_footnotes(comma_separated_group)
-- Try to use strings when possible as it makes q() significantly more efficient.
if conj_vowel_footnotes then
table.insert(conj_vowel_objects, {form = conj_vowel, footnotes = conj_vowel_footnotes})
else
table.insert(conj_vowel_objects, conj_vowel)
end
end
return conj_vowel_objects
end
local conj_vowel_spec = {
past = parse_conj_vowels(tilde_separated_groups[1], "past"),
nonpast = parse_conj_vowels(tilde_separated_groups[2], "non-past"),
}
table.insert(base.conj_vowels, conj_vowel_spec)
end
end
for i = 2, #dot_separated_groups do
local dot_separated_group = dot_separated_groups[i]
local first_element = dot_separated_group[1]
if first_element == "addnote" then
local spec_and_footnotes = fetch_footnotes(dot_separated_group)
if not spec_and_footnotes or #spec_and_footnotes < 2 then
parse_err("Spec with 'addnote' should be of the form 'addnote[SLOTSPEC][FOOTNOTE][FOOTNOTE][...]'")
end
local slot_spec = table.remove(spec_and_footnotes, 1)
local slot_spec_inside = rmatch(slot_spec, "^%[(.*)%]$")
if not slot_spec_inside then
parse_err("Internal error: slot_spec " .. slot_spec .. " should be surrounded with brackets")
end
local slot_specs = rsplit(slot_spec_inside, ",")
-- FIXME: Here, [[Module:it-verb]] called strip_spaces(). Generally we don't do this. Should we?
table.insert(base.addnote_specs, {slot_specs = slot_specs, footnotes = spec_and_footnotes})
elseif first_element:find("^var:") then
if #dot_separated_group > 1 then
parse_err(("Can't attach footnotes to 'var:' spec '%s'"):format(first_element))
end
base.variant = first_element:match("^var:(.*)$")
elseif first_element:find("^I+V?:") then
local root_cons, root_cons_value = first_element:match("^(I+V?):(.*)$")
local root_index
if root_cons == "I" then
root_index = 1
elseif root_cons == "II" then
root_index = 2
elseif root_cons == "III" then
root_index = 3
elseif root_cons == "IV" then
root_index = 4
if not base.verb_form:find("q$") then
parse_err(("Can't specify root consonant IV for non-quadriliteral verb form '%s': %s"):format(
base.verb_form, first_element))
end
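else
-- The pattern above also matches strings such as 'IIII' or 'IIV'; reject them rather than indexing with nil.
parse_err(("Unrecognized root consonant index '%s' in spec '%s'; expected I, II, III or IV"):format(
root_cons, first_element))
end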
local cons, translit = root_cons_value:match("^(.*)//(.*)$")
if not cons then
cons = root_cons_value
end
local root_footnotes = fetch_footnotes(dot_separated_group)
if not translit and not root_footnotes then
base.root_consonants[root_index] = cons
else
base.root_consonants[root_index] = {form = cons, translit = translit, footnotes = root_footnotes}
end
elseif first_element:find("^[a-z][a-z0-9_]*:") then
local slot_or_stem, remainder = first_element:match("^(.-):(.*)$")
dot_separated_group[1] = remainder
local comma_separated_groups = put.split_alternating_runs_and_strip_spaces(dot_separated_group, "[,،]")
if overridable_stems[slot_or_stem] then
if base.user_stem_overrides[slot_or_stem] then
parse_err("Overridable stem '" .. slot_or_stem .. "' specified twice")
end
base.user_stem_overrides[slot_or_stem] = overridable_stems[slot_or_stem](comma_separated_groups,
{prefix = slot_or_stem, base = base, parse_err = parse_err, fetch_footnotes = fetch_footnotes})
else -- assume a form override; we validate further later when the possible slots are available
if base.user_slot_overrides[slot_or_stem] then
parse_err("Form override '" .. slot_or_stem .. "' specified twice")
end
base.user_slot_overrides[slot_or_stem] = allow_multiple_values_for_override(comma_separated_groups,
{prefix = slot_or_stem, base = base, parse_err = parse_err, fetch_footnotes = fetch_footnotes},
"is form override")
end
elseif indicator_flags[first_element] then
if #dot_separated_group > 1 then
parse_err("No footnotes allowed with '" .. first_element .. "' spec")
end
if base[first_element] then
parse_err("Spec '" .. first_element .. "' specified twice")
end
base[first_element] = true
else
local passive, uncertain = first_element:match("^(.*)(%?)$")
passive = passive or first_element
uncertain = not not uncertain
if passive_types[passive] then
if #dot_separated_group > 1 then
parse_err("No footnotes allowed with '" .. passive .. "' spec")
end
if base.passive then
parse_err("Value for passive type specified twice")
end
base.passive = passive
base.passive_uncertain = uncertain
else
parse_err("Unrecognized spec '" .. first_element .. "'")
end
end
end
return base
end
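-- Illustrative sketch only; nothing in this module calls it. It shows, under simplifying assumptions, how a
-- root-consonant indicator such as 'II:و//w' is split into a radical index, a consonant and an optional manual
-- transliteration. The name `sketch_parse_root_spec` is hypothetical, and footnote handling is left out.
local function sketch_parse_root_spec(spec)
	-- Split the Roman-numeral index from the value after the colon.
	local numeral, value = spec:match("^(I+V?):(.*)$")
	if not numeral then
		return nil
	end
	local index = ({I = 1, II = 2, III = 3, IV = 4})[numeral]
	if not index then
		-- The pattern also admits strings like 'IIII'; treat those as unrecognized.
		return nil
	end
	-- An optional '//' separates the consonant from its manual transliteration.
	local cons, translit = value:match("^(.*)//(.*)$")
	return index, cons or value, translit
end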
-- Normalize all lemmas, substituting the pagename for blank lemmas and adding links to multiword lemmas.
local function normalize_all_lemmas(alternant_multiword_spec, head)
-- (1) Add links to all before and after text. Remember the original text so we can reconstruct the verb spec later.
if not alternant_multiword_spec.args.noautolinktext then
iut.add_links_to_before_and_after_text(alternant_multiword_spec, "remember original")
end
-- (2) Remove any links from the lemma, but remember the original form so we can use it below in the 'lemma_linked'
-- form.
iut.map_word_specs(alternant_multiword_spec, function(base)
if base.lemma == "" then
base.lemma = head
end
base.user_specified_lemma = base.lemma
base.lemma = m_links.remove_links(base.lemma)
base.user_specified_verb = base.lemma
base.verb = base.user_specified_verb
local linked_lemma
if alternant_multiword_spec.args.noautolinkverb or base.user_specified_lemma:find("%[%[") then
linked_lemma = base.user_specified_lemma
else
-- Add links to the lemma so the user doesn't specifically need to, since we preserve
-- links in multiword lemmas and include links in non-lemma forms rather than allowing
-- the entire form to be a link.
linked_lemma = iut.add_links(base.user_specified_lemma)
end
base.linked_lemma = linked_lemma
end)
end
-- Determine weakness from radicals. Used when root given in place of lemma (e.g. for {{ar-verb forms}}).
local function weakness_from_radicals(form, rad1, rad2, rad3, rad4)
local weakness = nil
local quadlit = form:find("q$")
-- If weakness unspecified, derive from radicals.
if not quadlit then
if is_waw_ya(rad3) and rad1 == W and form == "I" then
weakness = "assimilated+final-weak"
elseif is_waw_ya(rad3) and vform_supports_final_weak(form) then
weakness = "final-weak"
elseif rad2 == rad3 and vform_supports_geminate(form) then
weakness = "geminate"
elseif is_waw_ya(rad2) and vform_supports_hollow(form) then
weakness = "hollow"
elseif rad1 == W and form == "I" then
weakness = "assimilated"
else
weakness = "sound"
end
else
if is_waw_ya(rad4) then
weakness = "final-weak"
else
weakness = "sound"
end
end
return weakness
end
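-- Illustrative sketch only; nothing in this module calls it. It restates the triliteral branch of
-- weakness_from_radicals() for form I alone, assuming that form I supports every weakness type and that the weak
-- radicals are the literal letters و and ي. The name `sketch_weakness_form_i` is hypothetical. The point is the
-- order of the checks: final weakness is tested before gemination, gemination before hollowness, and plain
-- assimilation last.
local function sketch_weakness_form_i(r1, r2, r3)
	local waw, ya = "و", "ي"
	local function weak(r)
		return r == waw or r == ya
	end
	if weak(r3) and r1 == waw then
		return "assimilated+final-weak"
	elseif weak(r3) then
		return "final-weak"
	elseif r2 == r3 then
		return "geminate"
	elseif weak(r2) then
		return "hollow"
	elseif r1 == waw then
		return "assimilated"
	else
		return "sound"
	end
end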
-- Join the infixed tāʔ (ت) to the first radical in form VIII verbs. This may cause assimilation of the tāʔ to the
-- radical or in some cases the radical to the tāʔ. Used when a root is supplied instead of a lemma (which already has
-- the appropriate assimilation in it).
local function form_viii_join_ta(rad)
if rad == W or rad == Y or rad == "ت" then return "تّ"
elseif rad == "د" then return "دّ"
elseif rad == "ث" then return "ثّ"
elseif rad == "ذ" then return "ذّ"
elseif rad == "ز" then return "زْد"
elseif rad == "ص" then return "صْط"
elseif rad == "ض" then return "ضْط"
elseif rad == "ط" then return "طّ"
elseif rad == "ظ" then return "ظّ"
else return rad .. SK .. "ت"
end
end
local function detect_indicator_spec(base)
base.forms = {}
base.stem_overrides = {}
base.slot_overrides = {}
if not base.conj_vowels[1] then
-- These may be converted to inferred vowels. If not, we throw an error if form I and not passive-only.
base.conj_vowels = {{
past = "-",
nonpast = "-",
}}
else
-- If multiple vowels specified for a given vowel type (e.g. a,u~u), expand so that each spec in
-- `base.conj_vowels` has exactly one past vowel and one non-past vowel.
local expansion = {}
for _, spec in ipairs(base.conj_vowels) do
for _, past in ipairs(spec.past) do
for _, nonpast in ipairs(spec.nonpast) do
table.insert(expansion, {past = past, nonpast = nonpast})
end
end
end
base.conj_vowels = expansion
end
local vform = base.verb_form
-- check for quadriliteral form (Iq, IIq, IIIq, IVq)
base.quadlit = not not vform:find("q$")
-- Infer radicals as necessary. We infer a separate set of radicals for each past~non-past vowel combination because
-- they may be different (particularly with form-I hollow verbs).
for _, vowel_spec in ipairs(base.conj_vowels) do
-- NOTE: rad1, rad2, etc. refer to user-specified radicals, which are formobj tables that optionally specify an
-- explicit manual translit, whereas ir1, ir2, etc. refer to inferred radicals, which are either strings or
-- lists of possible radicals.
local rads = base.root_consonants
local rad1, rad2, rad3, rad4 = rads[1], rads[2], rads[3], rads[4]
-- Default any unspecified radicals to radicals determined from the headword. The returned radicals may be
-- lists of possible radicals, where the first radical should be chosen if the user didn't explicitly specify a
-- radical but all are allowed. If `ambig = true` is set in the table, the radical is considered ambiguous and
-- categories won't be created for weak radicals.
local weakness, ir1, ir2, ir3, ir4
if vform ~= "none" then
ir1, ir2, ir3 = rmatch(base.lemma, "^([^_])_([^_])_([^_])$")
if not ir1 then
ir1, ir2, ir3, ir4 = rmatch(base.lemma, "^([^_])_([^_])_([^_])_([^_])$")
end
if ir1 then
-- root given instead of lemma
weakness = weakness_from_radicals(vform, ir1, ir2, ir3, ir4)
if vform == "VIII" then
vowel_spec.form_viii_assim = form_viii_join_ta(ir1)
end
else
local ret = export.infer_radicals {
headword = base.lemma,
vform = vform,
passive = base.passive,
past_vowel = vowel_spec.past,
nonpast_vowel = vowel_spec.nonpast,
is_reduced = base.reduced,
}
weakness, ir1, ir2, ir3, ir4 = ret.weakness, ret.rad1, ret.rad2, ret.rad3, ret.rad4
vowel_spec.form_viii_assim = ret.form_viii_assim
vowel_spec.past = ret.past_vowel
vowel_spec.nonpast = ret.nonpast_vowel
vowel_spec.variant = base.variant or ret.variant
end
end
-- For most ambiguous radicals, the choice of radical doesn't matter because it doesn't affect the conjugation
-- one way or another. For form I hollow verbs, however, it definitely does. In fact, the choice of radical is
-- critical even beyond the past and non-past vowels because it affects the form of the passive participle. So,
-- check for this and signal an error if the radical could not be inferred and is not given explicitly.
if vform == "I" and type(ir2) == "table" and ir2.need_radical and not rad2 then
error("Unable to guess middle radical of hollow form I verb; need to specify radical explicitly")
end
if vform == "I" and not is_passive_only(base.passive) and (
rget(vowel_spec.past) == "-" or rget(vowel_spec.nonpast) == "-") then
error("Form I verb that isn't passive-only or final-weak must have past~non-past vowels specified")
end
-- Convert ambiguous radicals.
local function regularize_inferred_radical(rad)
if type(rad) == "table" then
if rad.ambig then
return {form = rad[1], ambig = true}
else
return rad[1]
end
else
return rad
end
end
-- Return the appropriate radical at index `index` (1 through 4), based either on the user-specified radical
-- `user_radical` or (if unspecified) `inferred_radical`, inferred from the unvocalized lemma. Two values are
-- returned, the "regularized" version of the radical (where ambiguous inferred radicals are converted to their
-- most likely actual radical) and the non-regularized version. The returned values are form objects rather than
-- strings.
local function fetch_radical(user_radical, inferred_radical, index)
if not user_radical then
return regularize_inferred_radical(inferred_radical), inferred_radical
else
local rad_formval = rget(user_radical)
if type(inferred_radical) == "table" then
local allowed_radical_set = m_table.listToSet(inferred_radical)
if not allowed_radical_set[rad_formval] then
error(("For lemma %s, radical %s ambiguously inferred as %s but user radical incompatibly given as %s"):
format(base.lemma, index,
list_to_text(inferred_radical, nil, " or "), rad_formval))
end
elseif rad_formval ~= inferred_radical then
error(("For lemma %s, radical %s inferred as %s but user radical incompatibly given as %s"):
format(base.lemma, index, inferred_radical, rad_formval))
end
return user_radical, user_radical
end
end
if vform ~= "none" then
vowel_spec.rad1, vowel_spec.unreg_rad1 = fetch_radical(rad1, ir1, 1)
vowel_spec.rad2, vowel_spec.unreg_rad2 = fetch_radical(rad2, ir2, 2)
vowel_spec.rad3, vowel_spec.unreg_rad3 = fetch_radical(rad3, ir3, 3)
if base.quadlit then
vowel_spec.rad4, vowel_spec.unreg_rad4 = fetch_radical(rad4, ir4, 4)
end
end
if vform == "I" then
-- If explicit weakness given using 'I-sound' or 'I-assimilated', we may need to adjust the inferred weakness.
if base.explicit_weakness == "sound" then
if weakness == "assimilated" then
weakness = "sound"
elseif weakness == "assimilated+final-weak" then
-- Verbs like waniya~yawnā "to be faint; to languish" (although the defaults should handle this
-- correctly)
weakness = "final-weak"
else
error(("Can't specify form 'I-sound' when inferred weakness is '%s' for lemma %s"):format(
weakness, base.lemma))
end
elseif base.explicit_weakness == "assimilated" then
if weakness == "sound" then
-- i~a verbs like waṭiʔa~yaṭaʔu "to tread, to trample"; wasiʕa~yasaʕu "to be spacious; to be well-off";
-- waṯiʔa~yaṯaʔu "to get bruised, to be sprained", which would default to sound.
weakness = "assimilated"
elseif weakness == "final-weak" then
-- For completeness; not clear if any verbs occur where this is needed. (There are plenty of
-- assimilated+final-weak verbs but the defaults should take care of them.)
weakness = "assimilated+final-weak"
else
error(("Can't specify form 'I-assimilated' when inferred weakness is '%s' for lemma %s"):format(
weakness, base.lemma))
end
elseif base.explicit_weakness then
error(("Internal error: Unrecognized value '%s' for base.explicit_weakness"):format(base.explicit_weakness))
end
elseif vform == "none" then
weakness = base.explicit_weakness
elseif base.explicit_weakness then
error(("Internal error: Explicit weakness should not be specifiable except with forms I and none, but saw explicit weakness '%s' with verb form '%s'"):
format(base.explicit_weakness, vform))
end
vowel_spec.weakness = weakness
if vform ~= "none" then
-- Error if radicals are wrong given the weakness. More likely to happen if the weakness is explicitly given
-- rather than inferred. Will also happen if certain incorrect letters are included as radicals e.g. hamza on
-- top of various letters, alif maqṣūra, tā' marbūṭa.
check_radicals(vform, weakness, rget(vowel_spec.rad1), rget(vowel_spec.rad2), rget(vowel_spec.rad3),
base.quadlit and rget(vowel_spec.rad4) or nil)
end
-- Check the variant value.
local form_iii_vi_geminate = (vform == "III" or vform == "VI") and rget(vowel_spec.rad2) == rget(vowel_spec.rad3) and
not req(vowel_spec.rad2, Y)
local hayy_i_x = hayy_radicals(vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3) and (vform == "I" or vform == "X")
if form_iii_vi_geminate or hayy_i_x then
if vowel_spec.variant and vowel_spec.variant ~= "long" and vowel_spec.variant ~= "short" and vowel_spec.variant ~= "both" then
error(("For form-III/VI geminate verb or form-I/X verb with ح-ي-ي radicals, saw unrecognized 'var:%s' value; should be 'var:long', 'var:short' or 'var:both'"):format(
vowel_spec.variant))
end
elseif vowel_spec.variant then
error(("Variant value 'var:%s' not allowed in this context"):format(vowel_spec.variant))
end
end
-- If form I, regroup expanded vowels for display purposes.
if vform == "I" then
local group_by_past = {}
for _, vowel_spec in ipairs(base.conj_vowels) do
m_table.insertIfNot(group_by_past, {
past = undia[rget(vowel_spec.past)],
nonpasts = {undia[rget(vowel_spec.nonpast)]},
}, {
key = function(obj) return obj.past end,
combine = function(obj1, obj2)
for _, nonpast in ipairs(obj2.nonpasts) do
m_table.insertIfNot(obj1.nonpasts, nonpast)
end
end,
})
end
local group_by_nonpast = {}
for _, vowel_spec in ipairs(group_by_past) do
m_table.insertIfNot(group_by_nonpast, {
pasts = {vowel_spec.past},
nonpasts = vowel_spec.nonpasts,
}, {
key = function(obj) return obj.nonpasts end,
combine = function(obj1, obj2)
for _, past in ipairs(obj2.pasts) do
m_table.insertIfNot(obj1.pasts, past)
end
end,
})
end
base.grouped_conj_vowels = group_by_nonpast
end
-- Set value of passive. If not specified, default is yes for forms II, III, IV and Iq; no but uncertainly for
-- forms VII, IX, XI - XV and IIIq - IVq, as well as form I with past vowel u; impersonal but uncertainly for form
-- V, VI, X and IIq, as well as form I with past vowel i; and yes but uncertainly for the remainder (form I with
-- past vowel only a and form VIII).
if not base.passive then
base.passive_defaulted = true
-- Temporary tracking for defaulted passives by verb form, weakness and (for form I) past/non-past vowels.
track_if_ar_conj(base, "passive-defaulted/" .. vform)
for _, vowel_spec in ipairs(base.conj_vowels) do
track_if_ar_conj(base, "passive-defaulted/" .. vform.. "/" .. vowel_spec.weakness)
if vform == "I" then
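-- The vowels may be form objects (when footnotes are attached), so unwrap them with rget() before the lookup.
local past_nonpast = ("%s~%s"):format(undia[rget(vowel_spec.past)], undia[rget(vowel_spec.nonpast)])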
track_if_ar_conj(base, "passive-defaulted/I/" .. past_nonpast)
track_if_ar_conj(base, "passive-defaulted/I/" .. vowel_spec.weakness .. "/" .. past_nonpast)
end
end
if vform_probably_full_passive(vform) then
base.passive = "pass"
else
base.passive_uncertain = true
for _, vowel_spec in ipairs(base.conj_vowels) do
if vform_probably_no_passive(vform, vowel_spec.weakness, vowel_spec.past, vowel_spec.nonpast) then
base.passive = "nopass"
break
elseif vform_probably_impersonal_passive(vform, vowel_spec.weakness, vowel_spec.past,
vowel_spec.nonpast) then
base.passive = "ipass"
break
end
end
base.passive = base.passive or "pass"
end
end
-- NOTE: Currently there are no built-in stems or form overrides for Arabic; this code is inherited from
-- [[Module:ca-verb]], where such things do exist, and is kept for generality in case we decide in the future to
-- implement such things.
-- Override built-in verb stems and overrides with user-specified ones.
for stem, values in pairs(base.user_stem_overrides) do
base.stem_overrides[stem] = values
end
for slot, values in pairs(base.user_slot_overrides) do
if not base.alternant_multiword_spec.verb_slots_map[slot] then
error("Unrecognized override slot '" .. slot .. "': " .. base.angle_bracket_spec)
end
if export.unsettable_slots_set[slot] then
error("Slot '" .. slot .. "' cannot be set using an override: " .. base.angle_bracket_spec)
end
if skip_slot(base, slot, "allow overrides") then
error("Override slot '" .. slot ..
"' would be skipped based on the passive, 'noimp' and/or 'no_nonpast' settings: " ..
base.angle_bracket_spec)
end
base.slot_overrides[slot] = values
end
if base.verb_form == "none-final-weak" then
for _, stem_type in ipairs { "past", "past_pass", "nonpast", "nonpast_pass" } do
if base.stem_overrides[stem_type .. "_c"] or base.stem_overrides[stem_type .. "_v"] then
error(("Specify past stem for verb type 'none-final-weak' using '%s:...' not '%s_c:...' or '%s_v:...'"):
format(stem_type, stem_type, stem_type))
end
end
for _, stem_type in ipairs { "past", "nonpast" } do
if base.stem_overrides[stem_type] and not base.stem_overrides[stem_type .. "_final_weak_vowel"] then
error(("For verb type 'none-final-weak', if '%s:...' specified, so must '%s_final_weak_vowel:...'"):
format(stem_type, stem_type))
end
end
end
end
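-- Illustrative sketch only; nothing in this module calls it. It isolates the vowel expansion performed at the start
-- of detect_indicator_spec(): a spec such as a,u~u lists several past vowels against one non-past vowel, and the
-- expansion takes the cross product so that every resulting entry pairs exactly one past vowel with one non-past
-- vowel. The name `sketch_expand_conj_vowels` is hypothetical, and plain strings stand in for form objects.
local function sketch_expand_conj_vowels(conj_vowels)
	local expansion = {}
	for _, spec in ipairs(conj_vowels) do
		for _, past in ipairs(spec.past) do
			for _, nonpast in ipairs(spec.nonpast) do
				table.insert(expansion, {past = past, nonpast = nonpast})
			end
		end
	end
	return expansion
end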
local function detect_all_indicator_specs(alternant_multiword_spec)
add_slots(alternant_multiword_spec)
alternant_multiword_spec.verb_forms = {}
-- This means at least one individual base had the slot marked as explicitly missing. Another base (e.g. when
-- there are multiple alternants) might have a value for the slot. In practice, we only respect this when there are
-- no overall values in the slot and `slot_uncertain` isn't set; in this case, we display "no ..." for the slot
-- instead of simply not displaying anything for the slot.
alternant_multiword_spec.slot_explicitly_missing = {}
-- This means at least one individual base had no values for the slot and the slot marked as explicitly uncertain.
-- Note that this is different from a value being present but marked as uncertain (e.g. if an override was given
-- with a ? after it); this causes the form object for the value to have `uncertain = true` set. If there are no
-- overall values in the slot and `slot_uncertain` is set, we display this in the headword.
alternant_multiword_spec.slot_uncertain = {}
iut.map_word_specs(alternant_multiword_spec, function(base)
-- So arguments, etc. can be accessed. WARNING: Creates circular reference.
base.alternant_multiword_spec = alternant_multiword_spec
detect_indicator_spec(base)
if not base.nocat then
m_table.insertIfNot(alternant_multiword_spec.verb_forms, base.verb_form)
end
if base.passive_uncertain then
alternant_multiword_spec.passive_uncertain = true
end
for slot, _ in pairs(base.slot_explicitly_missing) do
alternant_multiword_spec.slot_explicitly_missing[slot] = true
end
end)
end
local function determine_slot_uncertainty_from_forms(alternant_multiword_spec)
iut.map_word_specs(alternant_multiword_spec, function(base)
-- If no verbal noun and verb form is not 'none' (manually-specified stems) — which currently only happens for
-- form I — and the verbal noun wasn't explicitly indicated as missing using <vn:->, we assume it's just
-- unknown/unspecified rather than missing. Same with active participles.
for uncertain_slot, _ in pairs(slots_that_may_be_uncertain) do
if not base.forms[uncertain_slot] and base.verb_form ~= "none" and not skip_slot(base, uncertain_slot) then
base.slot_uncertain[uncertain_slot] = true
end
end
-- Propagate slot uncertainty up. Currently only the verbal noun can have this set but we write the code
-- generally.
for slot, _ in pairs(base.slot_uncertain) do
alternant_multiword_spec.slot_uncertain[slot] = true
end
end)
-- If slot is uncertain and has no value, explicitly set its value to "?".
for uncertain_slot, _ in pairs(slots_that_may_be_uncertain) do
if not alternant_multiword_spec.forms[uncertain_slot] and
alternant_multiword_spec.slot_uncertain[uncertain_slot] then
alternant_multiword_spec.forms[uncertain_slot] = {{form = "?"}}
end
end
end
-- Determine certain properties of the verb from the overall forms, such as whether the verb is active-only or
-- passive-only, is impersonal, lacks an imperative, etc.
local function determine_verb_properties_from_forms(alternant_multiword_spec)
alternant_multiword_spec.has_active = false
alternant_multiword_spec.has_passive = false
alternant_multiword_spec.has_non_impers_active = false
alternant_multiword_spec.has_non_impers_passive = false
alternant_multiword_spec.has_imp = false
alternant_multiword_spec.has_past = false
alternant_multiword_spec.has_nonpast = false
for slot, _ in pairs(alternant_multiword_spec.forms) do
if slot == "ap" or slot:find("[123]") and not slot:find("_pass") then
alternant_multiword_spec.has_active = true
end
if slot == "pp" or slot:find("[123]") and slot:find("_pass") then
alternant_multiword_spec.has_passive = true
end
if slot:find("[123]") and not slot:find("pass_[123]") and not slot:find("3ms") then
alternant_multiword_spec.has_non_impers_active = true
end
if slot:find("pass_[123]") and not slot:find("3ms") then
alternant_multiword_spec.has_non_impers_passive = true
end
if slot:find("^imp_") then
alternant_multiword_spec.has_imp = true
end
if slot:find("^past_") then
alternant_multiword_spec.has_past = true
end
if slot:find("^ind_") or slot:find("^sub_") or slot:find("^juss_") then
alternant_multiword_spec.has_nonpast = true
end
end
end
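-- Illustrative sketch only; nothing in this module calls it. It applies the same pattern tests as
-- determine_verb_properties_from_forms() to a single slot name and returns the resulting flags, which may help when
-- checking how a given slot is classified. The name `sketch_classify_slot` is hypothetical, and slot names such as
-- 'past_pass_3ms' or 'imp_2ms' are assumed to follow the naming used elsewhere in the module.
local function sketch_classify_slot(slot)
	local has_person = slot:find("[123]") ~= nil
	local is_passive = slot:find("_pass") ~= nil
	return {
		active = slot == "ap" or (has_person and not is_passive),
		passive = slot == "pp" or (has_person and is_passive),
		imperative = slot:find("^imp_") ~= nil,
		past = slot:find("^past_") ~= nil,
		nonpast = slot:find("^ind_") ~= nil or slot:find("^sub_") ~= nil or slot:find("^juss_") ~= nil,
	}
end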
local function add_categories_and_annotation(alternant_multiword_spec, base, multiword_lemma, insert_ann, insert_cat)
-- Useful e.g. in constructing suppletive verbs out of parts. For a verb like جاء or أتى whose imperative comes
-- from the unrelated verb تعالى, we don't want the latter verb showing up in categories or annotations.
if base.nocat then
return
end
local vform = base.verb_form
if vform ~= "none" then
insert_ann("form", vform)
insert_cat("form-" .. vform .. " verbs")
end
if base.reduced then
insert_ann("reduced", "reduced")
if vform ~= "none" then
insert_cat("form-" .. vform .. " reduced verbs")
end
end
if base.quadlit then
insert_cat("verbs with quadriliteral roots")
end
if base.passive_defaulted then
insert_cat("verbs with defaulted passive")
end
for _, vowel_spec in ipairs(base.conj_vowels) do
local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec)
local final_weak = is_final_weak(base, vowel_spec)
local weakness = vowel_spec.weakness
-- We have to distinguish weakness by form and weakness by conjugation. Weakness by form merely indicates the
-- presence of weak letters in certain positions in the radicals. Weakness by conjugation is related to how the
-- verbs are conjugated. For example, form-II verbs that are "hollow by form" (middle radical is wāw or yāʾ) are
-- conjugated as sound verbs. Another example: form-I verbs with initial wāw are "assimilated by form" and most
-- are assimilated by conjugation as well, but a few are sound by conjugation, e.g. wajuha yawjuhu "to be
-- distinguished" (rather than wajuha yajuhu); similarly for some hollow-by-form verbs in various forms, e.g.
-- form VIII izdawaja yazdawiju "to be in pairs" (rather than izdāja yazdāju). Categories referring to weakness
-- always refer to weakness by conjugation; weakness by form is distinguished only by categories such as
-- [[:Category:Arabic form-III verbs with و as second radical]].
insert_ann("weakness", weakness)
if vform ~= "none" then
insert_cat(("%s form-%s verbs"):format(weakness, vform))
end
local function radical_is_ambiguous(rad)
return type(rad) == "table" and rad.ambig
end
local function radical_is_unambiguous_weak(rad)
return not radical_is_ambiguous(rad) and (is_waw_ya(rad) or req(rad, HAMZA))
end
if vform ~= "none" then
local ur1, ur2, ur3, ur4 =
vowel_spec.unreg_rad1, vowel_spec.unreg_rad2, vowel_spec.unreg_rad3, vowel_spec.unreg_rad4
-- Create headword categories based on the radicals. Do the following before
-- converting the Latin radicals into Arabic ones so we distinguish
-- between ambiguous and non-ambiguous radicals.
if radical_is_ambiguous(ur1) or radical_is_ambiguous(ur2) or radical_is_ambiguous(ur3) or
ur4 and radical_is_ambiguous(ur4) then
insert_cat("verbs with ambiguous radicals")
end
if radical_is_unambiguous_weak(ur1) then
insert_cat("form-" .. vform .. " verbs with " .. rget(ur1) .. " as first radical")
end
if radical_is_unambiguous_weak(ur2) then
insert_cat("form-" .. vform .. " verbs with " .. rget(ur2) .. " as second radical")
end
if radical_is_unambiguous_weak(ur3) then
insert_cat("form-" .. vform .. " verbs with " .. rget(ur3) .. " as third radical")
end
if ur4 and radical_is_unambiguous_weak(ur4) then
insert_cat("form-" .. vform .. " verbs with " .. rget(ur4) .. " as fourth radical")
end
end
end
if vform == "I" and not is_passive_only(base.passive) then
for _, vowel_spec in ipairs(base.grouped_conj_vowels) do
insert_ann("vowels",
("%s ~ %s"):format(table.concat(vowel_spec.pasts, "/"), table.concat(vowel_spec.nonpasts, "/")))
for _, past in ipairs(vowel_spec.pasts) do
for _, nonpast in ipairs(vowel_spec.nonpasts) do
if past == "-" or nonpast == "-" then
error(("Internal error: Saw form I past vowel %s and non-past vowel %s but - in place of vowel should have triggered an error earlier"):format(
past, nonpast))
end
insert_cat(("form-I verbs with past vowel %s and non-past vowel %s"):format(past, nonpast))
end
end
end
end
for slot, name in pairs(slots_that_may_be_uncertain) do
if base.slot_uncertain[slot] then
-- An unspecified and non-defaulted verbal noun (form I) is considered uncertain rather than explicitly
-- missing. Use <vn:-> to explicitly indicate the lack of verbal noun. Same for form-I stative active
-- participles.
insert_cat(("verbs with unknown or uncertain %ss"):format(name))
end
end
if base.irregular then
insert_ann("irreg", "irregular")
insert_cat("irregular verbs")
end
end
-- Compute the categories to add the verb to, as well as the annotation to display in the conjugation title bar. We
-- combine the code to do these functions as both categories and title bar contain similar information.
local function compute_categories_and_annotation(alternant_multiword_spec)
alternant_multiword_spec.categories = {}
local ann = {}
alternant_multiword_spec.annotation = ann
ann.form = {}
ann.weakness = {}
ann.vowels = {}
ann.passive = nil
ann.reduced = {}
ann.irreg = {}
ann.defective = {}
local multiword_lemma = false
for _, slot in ipairs(export.potential_lemma_slots) do
if alternant_multiword_spec.forms[slot] then
for _, formobj in ipairs(alternant_multiword_spec.forms[slot]) do
if formobj.form:find(" ") then
multiword_lemma = true
break
end
end
break
end
end
local function insert_ann(anntype, value)
m_table.insertIfNot(alternant_multiword_spec.annotation[anntype], value)
end
local function insert_cat(cat, also_when_multiword)
-- Don't place multiword terms in categories like 'Arabic form-II verbs' to avoid spamming the categories with
-- such terms.
if also_when_multiword or not multiword_lemma then
m_table.insertIfNot(alternant_multiword_spec.categories, "Arabic " .. cat)
end
end
iut.map_word_specs(alternant_multiword_spec, function(base)
add_categories_and_annotation(alternant_multiword_spec, base, multiword_lemma, insert_ann, insert_cat)
end)
for slot, name in pairs(slots_that_may_be_uncertain) do
if alternant_multiword_spec.forms[slot] then
for _, form in ipairs(alternant_multiword_spec.forms[slot]) do
if form.uncertain then
if form.form == "?" then
insert_cat(("verbs with explicitly unknown %ss"):format(name))
else
insert_cat(("verbs needing %s checked"):format(name))
end
break
end
end
end
end
if alternant_multiword_spec.has_active then
if alternant_multiword_spec.has_passive and alternant_multiword_spec.has_non_impers_passive then
insert_cat("verbs with full passive")
ann.passive = "full passive"
elseif alternant_multiword_spec.has_passive then
insert_cat("verbs with impersonal passive")
ann.passive = "impersonal passive"
else
insert_cat("verbs lacking passive forms")
ann.passive = "no passive"
end
else
if alternant_multiword_spec.has_non_impers_passive then
insert_cat("passive verbs")
insert_cat("verbs with full passive")
ann.passive = "passive-only"
else
insert_cat("passive verbs")
insert_cat("impersonal verbs")
insert_cat("verbs with impersonal passive")
ann.passive = "impersonal (passive-only)"
end
end
if alternant_multiword_spec.passive_uncertain then
insert_cat("verbs needing passive checked")
ann.passive = ann.passive .. ' <abbr title="passive status uncertain">(?)</abbr>'
end
if alternant_multiword_spec.has_active and not alternant_multiword_spec.has_imp then
insert_ann("defective", "no imperative")
insert_cat("verbs lacking imperative forms")
end
if not alternant_multiword_spec.has_past then
insert_ann("defective", "no past")
insert_cat("verbs lacking past forms")
end
if not alternant_multiword_spec.has_nonpast then
insert_ann("defective", "no non-past")
insert_cat("verbs lacking non-past forms")
end
local ann_parts = {}
local function insert_ann_part(part, conj)
local val = table.concat(ann[part], conj or " or ")
if val ~= "" and val ~= "regular" then
table.insert(ann_parts, val)
end
end
insert_ann_part("form")
insert_ann_part("weakness")
insert_ann_part("reduced")
insert_ann_part("vowels")
if ann.passive then
table.insert(ann_parts, ann.passive)
end
insert_ann_part("irreg")
insert_ann_part("defective", ", ")
alternant_multiword_spec.annotation = table.concat(ann_parts, ", ")
end
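-- Illustrative sketch only; nothing in this module calls it. It mirrors how the title-bar annotation is assembled at
-- the end of compute_categories_and_annotation(): list-valued parts are joined with " or " (", " for the defective
-- part), empty or "regular" parts are dropped, and the survivors are joined with ", ". The name
-- `sketch_join_annotation` and the `order` parameter are hypothetical conveniences for this sketch.
local function sketch_join_annotation(ann, order)
	local parts = {}
	for _, key in ipairs(order) do
		local val = ann[key]
		if type(val) == "table" then
			val = table.concat(val, key == "defective" and ", " or " or ")
		end
		if val and val ~= "" and val ~= "regular" then
			table.insert(parts, val)
		end
	end
	return table.concat(parts, ", ")
end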
local function show_forms(alternant_multiword_spec)
local lemmas = {}
for _, slot in ipairs(export.potential_lemma_slots) do
if alternant_multiword_spec.forms[slot] then
for _, formobj in ipairs(alternant_multiword_spec.forms[slot]) do
table.insert(lemmas, formobj)
end
break
end
end
alternant_multiword_spec.lemmas = lemmas -- save for later use in make_table()
alternant_multiword_spec.vn = alternant_multiword_spec.forms.vn -- save for later use in make_table()
-- Reconstruct the original verb spec without overrides for verbal nouns and participles, since those specific slots
-- are ignored by {{ar-verb form}}. Compute this once beforehand; `transform_accel_obj` is called repeatedly on each
-- form and we don't want to compute this repeatedly.
local reconstructed_verb_spec = iut.reconstruct_original_spec(alternant_multiword_spec, {
preprocess_angle_bracket_spec = function(spec)
spec = spec:match("^<(.*)>$")
assert(spec)
local segments = put.parse_multi_delimiter_balanced_segment_run(spec, {{"[", "]"}, {"<", ">"}})
local dot_separated_groups = put.split_alternating_runs_and_strip_spaces(segments, "%.")
-- Rejoin each dot-separated group into a single string, since we aren't actually going to do any parsing
-- of bracket-bounded textual runs; then filter out overrides for verbal nouns and participles.
local filtered_indicators = {}
for _, dot_separated_group in ipairs(dot_separated_groups) do
local indicator = table.concat(dot_separated_group)
-- FIXME: Do we want to filter out any other indicators?
if not (indicator:find("^vn:") or indicator:find("^[ap]p:")) then
table.insert(filtered_indicators, indicator)
end
end
return ("<%s>"):format(table.concat(filtered_indicators, "."))
end,
})
-- If we're dealing with a single word, no alternants and a single verb form, use the auto-conjugation-fetching
-- variant.
local reconstructed_lemma, inside = reconstructed_verb_spec:match("^([^ <>()]+)(%b<>)$")
if inside and alternant_multiword_spec.verb_forms[1] and not alternant_multiword_spec.verb_forms[2] then
reconstructed_verb_spec = ("+%s<%s>"):format(reconstructed_lemma, alternant_multiword_spec.verb_forms[1])
end
local function transform_accel_obj(slot, formobj, accel_obj)
if not accel_obj then
return accel_obj
end
if slot == "ap" or slot == "pp" or slot == "vn" then
-- FIXME: [[Module:accel]] can't correctly handle more than one verb form for participles and verbal nouns
accel_obj.form = slot .. "-" .. table.concat(alternant_multiword_spec.verb_forms, ",")
else
accel_obj.form = "verb-form-" .. reconstructed_verb_spec
end
return accel_obj
end
local function generate_link(data)
local form = data.form
local term = form.formval_for_link
local alt = form.alt
if term == "?" then
term = nil
alt = "?"
end
local link = m_links.full_link {
lang = lang, term = term, tr = "-", accel = form.accel_obj,
alt = alt, gloss = form.gloss, genders = form.genders, pos = form.pos, lit = form.lit, id = form.id,
} .. iut.get_footnote_text(form.footnotes, data.footnote_obj)
if form.q and form.q[1] or form.qq and form.qq[1] or form.l and form.l[1] or form.ll and form.ll[1] then
link = require(pron_qualifier_module).format_qualifiers {
lang = lang,
text = link,
q = form.q,
qq = form.qq,
l = form.l,
ll = form.ll,
}
end
return link
end
local props = {
lang = lang,
lemmas = lemmas,
transform_accel_obj = transform_accel_obj,
generate_link = generate_link,
slot_list = alternant_multiword_spec.verb_slots,
include_translit = true,
}
iut.show_forms(alternant_multiword_spec.forms, props)
end
-------------------------------------------------------------------------------
-- Functions to create inflection tables --
-------------------------------------------------------------------------------
-- Make the conjugation table. Called from export.show().
local function make_table(alternant_multiword_spec)
local text = mw.getCurrentFrame():expandTemplate{
title = 'inflection-table-top',
args = {
title = 'Conjugation of {title}',
tall = 'yes',
palette = "green",
category = 'conjugation',
class = 'tr-alongside', -- temp hack to prevent extra line break
}
}
text = text .. [=[
! colspan="6" | verbal noun<br /><<الْمَصْدَر>>
| colspan="7" | {vn}
]=]
if alternant_multiword_spec.has_active then
text = text .. [=[
|-
! colspan="6" | active participle<br /><<اِسْم الْفَاعِل>>
| colspan="7" | {ap}
]=]
end
if alternant_multiword_spec.has_passive then
text = text .. [=[
|-
! colspan="6" | passive participle<br /><<اِسْم الْمَفْعُول>>
| colspan="7" | {pp}
]=]
end
text = text .. [=[
|-
! colspan="999" class="separator" |
]=]
if alternant_multiword_spec.has_active then
text = text .. [=[
|-
! colspan="12" class="outer" | active voice<br /><<الْفِعْل الْمَعْلُوم>>
|-
! colspan="2" |
! colspan="3" | singular<br /><<الْمُفْرَد>>
! rowspan="12" class="separator" |
! colspan="2" | dual<br /><<الْمُثَنَّى>>
! rowspan="12" class="separator" |
! colspan="3"| plural<br /><<الْجَمْع>>
|-
! colspan="2"|
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
|-
! rowspan="2" | past (perfect) indicative<br /><<الْمَاضِي>>
! class="secondary" | m
| rowspan="2" | {past_1s}
| {past_2ms}
| {past_3ms}
| rowspan="2" | {past_2d}
| {past_3md}
| rowspan="2" | {past_1p}
| {past_2mp}
| {past_3mp}
|-
! class="secondary" | f
| {past_2fs}
| {past_3fs}
| {past_3fd}
| {past_2fp}
| {past_3fp}
|-
! rowspan="2" | non-past (imperfect) indicative<br /><<الْمُضَارِع الْمَرْفُوع>>
! class="secondary" | m
| rowspan="2" | {ind_1s}
| {ind_2ms}
| {ind_3ms}
| rowspan="2" | {ind_2d}
| {ind_3md}
| rowspan="2" | {ind_1p}
| {ind_2mp}
| {ind_3mp}
|-
! class="secondary" | f
| {ind_2fs}
| {ind_3fs}
| {ind_3fd}
| {ind_2fp}
| {ind_3fp}
|-
! rowspan="2" | subjunctive<br /><<الْمُضَارِع الْمَنْصُوب>>
! class="secondary" | m
| rowspan="2" | {sub_1s}
| {sub_2ms}
| {sub_3ms}
| rowspan="2" | {sub_2d}
| {sub_3md}
| rowspan="2" | {sub_1p}
| {sub_2mp}
| {sub_3mp}
|-
! class="secondary" | f
| {sub_2fs}
| {sub_3fs}
| {sub_3fd}
| {sub_2fp}
| {sub_3fp}
|-
! rowspan="2" | jussive<br /><<الْمُضَارِع الْمَجْزُوم>>
! class="secondary" | m
| rowspan="2" | {juss_1s}
| {juss_2ms}
| {juss_3ms}
| rowspan="2" | {juss_2d}
| {juss_3md}
| rowspan="2" | {juss_1p}
| {juss_2mp}
| {juss_3mp}
|-
! class="secondary" | f
| {juss_2fs}
| {juss_3fs}
| {juss_3fd}
| {juss_2fp}
| {juss_3fp}
|-
! rowspan="2" | imperative<br /><<الْأَمْر>>
! class="secondary" | m
| rowspan="2" |
| {imp_2ms}
| rowspan="2" |
| rowspan="2" | {imp_2d}
| rowspan="2" |
| rowspan="2" |
| {imp_2mp}
| rowspan="2" |
|-
! class="secondary" | f
| {imp_2fs}
| {imp_2fp}
]=]
end
if alternant_multiword_spec.has_passive then
text = text .. [=[
|-
! colspan="999" class="separator" |
|-
! colspan="12" class="outer" | passive voice<br /><<الْفِعْل الْمَجْهُول>>
|-
! colspan="2" |
! colspan="3" | singular<br /><<الْمُفْرَد>>
! rowspan="10" class="separator" |
! colspan="2" | dual<br /><<الْمُثَنَّى>>
! rowspan="10" class="separator" |
! colspan="3" | plural<br /><<الْجَمْع>>
|-
! colspan="2" |
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
|-
! rowspan="2" | past (perfect) indicative<br /><<الْمَاضِي>>
! class="secondary" | m
| rowspan="2" | {past_pass_1s}
| {past_pass_2ms}
| {past_pass_3ms}
| rowspan="2" | {past_pass_2d}
| {past_pass_3md}
| rowspan="2" | {past_pass_1p}
| {past_pass_2mp}
| {past_pass_3mp}
|-
! class="secondary" | f
| {past_pass_2fs}
| {past_pass_3fs}
| {past_pass_3fd}
| {past_pass_2fp}
| {past_pass_3fp}
|-
! rowspan="2" | non-past (imperfect) indicative<br /><<الْمُضَارِع الْمَرْفُوع>>
! class="secondary" | m
| rowspan="2" | {ind_pass_1s}
| {ind_pass_2ms}
| {ind_pass_3ms}
| rowspan="2" | {ind_pass_2d}
| {ind_pass_3md}
| rowspan="2" | {ind_pass_1p}
| {ind_pass_2mp}
| {ind_pass_3mp}
|-
! class="secondary" | f
| {ind_pass_2fs}
| {ind_pass_3fs}
| {ind_pass_3fd}
| {ind_pass_2fp}
| {ind_pass_3fp}
|-
! rowspan="2" | subjunctive<br /><<الْمُضَارِع الْمَنْصُوب>>
! class="secondary" | m
| rowspan="2" | {sub_pass_1s}
| {sub_pass_2ms}
| {sub_pass_3ms}
| rowspan="2" | {sub_pass_2d}
| {sub_pass_3md}
| rowspan="2" | {sub_pass_1p}
| {sub_pass_2mp}
| {sub_pass_3mp}
|-
! class="secondary" | f
| {sub_pass_2fs}
| {sub_pass_3fs}
| {sub_pass_3fd}
| {sub_pass_2fp}
| {sub_pass_3fp}
|-
! rowspan="2" | jussive<br /><<الْمُضَارِع الْمَجْزُوم>>
! class="secondary" | m
| rowspan="2" | {juss_pass_1s}
| {juss_pass_2ms}
| {juss_pass_3ms}
| rowspan="2" | {juss_pass_2d}
| {juss_pass_3md}
| rowspan="2" | {juss_pass_1p}
| {juss_pass_2mp}
| {juss_pass_3mp}
|-
! class="secondary" | f
| {juss_pass_2fs}
| {juss_pass_3fs}
| {juss_pass_3fd}
| {juss_pass_2fp}
| {juss_pass_3fp}
]=]
end
text = text .. mw.getCurrentFrame():expandTemplate{
title = 'inflection-table-bottom',
args = {
notes = '{footnote}',
}
}
local forms = alternant_multiword_spec.forms
if not alternant_multiword_spec.lemmas then
forms.title = "—"
else
local linked_lemmas = {}
for _, form in ipairs(alternant_multiword_spec.lemmas) do
table.insert(linked_lemmas, link_term(form.form, "term"))
end
forms.title = table.concat(linked_lemmas, ", ")
end
local ann_parts = {}
if alternant_multiword_spec.annotation ~= "" then
table.insert(ann_parts, alternant_multiword_spec.annotation)
end
if alternant_multiword_spec.vn then
local linked_vns = {}
for _, form in ipairs(alternant_multiword_spec.vn) do
table.insert(linked_vns, link_term(form.form, "term"))
end
table.insert(ann_parts, (#linked_vns > 1 and "verbal nouns" or "verbal noun") .. " " ..
table.concat(linked_vns, ", "))
end
local annotation = table.concat(ann_parts, ", ")
if annotation ~= "" then
forms.title = forms.title .. " (" .. annotation .. ")"
end
-- Format the table.
local tagged_table = rsub(text, "<<(.-)>>", tag_text)
return m_string_utilities.format(tagged_table, forms)
end
-------------------------------------------------------------------------------
-- External entry points --
-------------------------------------------------------------------------------
-- Append two lists `l1` and `l2`, removing duplicates. If either is {nil}, just return the other.
local function combine_lists(l1, l2)
-- combine_footnotes() does exactly what we want.
return iut.combine_footnotes(l1, l2)
end
local function combine_metadata(data)
local src1 = data.form1
local src2 = data.form2
local dest = data.dest_form
dest.uncertain = src1.uncertain or src2.uncertain
if src1.genders and src2.genders and not m_table.deepEquals(src1.genders, src2.genders) then
-- do nothing
else
dest.genders = src1.genders or src2.genders
end
if src1.pos and src2.pos and src1.pos ~= src2.pos then
-- do nothing
else
dest.pos = src1.pos or src2.pos
end
-- Don't copy .alt, .gloss, .lit, .id, which describe a single term and don't extend to multiword terms.
dest.q = combine_lists(src1.q, src2.q)
dest.qq = combine_lists(src1.qq, src2.qq)
dest.l = combine_lists(src1.l, src2.l)
dest.ll = combine_lists(src1.ll, src2.ll)
end
-- Externally callable function to parse and conjugate a verb given user-specified arguments.
-- Return value is WORD_SPEC, an object where the conjugated forms are in `WORD_SPEC.forms`
-- for each slot. If there are no values for a slot, the slot key will be missing. The value
-- for a given slot is a list of objects {form=FORM, footnotes=FOOTNOTES}. Exception: if `args.json` is set
-- and `source_template` is "ar-conj", the return value is instead a JSON string serializing WORD_SPEC.
function export.do_generate_forms(args, source_template, headword_head)
local PAGENAME = mw.loadData("Module:headword/data").pagename
local function in_template_space()
return mw.title.getCurrentTitle().nsText == "Template"
end
-- Determine the verb spec we're being asked to generate the conjugation of. This may be taken from the current page
-- title or the value of |pagename=; but not when called from {{ar-verb form}}, where the page title is a
-- non-lemma form. Note that the verb spec may omit the lemma; e.g. it may be "<II>". For this reason, we use the
-- value of `pagename` computed here down below, when calling normalize_all_lemmas().
local pagename = source_template ~= "ar-verb form" and args.pagename or PAGENAME
local head = headword_head or pagename
local arg1 = args[1]
if not arg1 then
if (pagename == "ar-conj" or pagename == "ar-verb" or pagename == "ar-verb form") and in_template_space() then
arg1 = "كتب<I/a~u.pass>"
else
arg1 = "<>"
end
end
-- When called from {{ar-verb form}}, determine the non-lemma form whose inflections we're being asked to
-- determine. This normally comes from the page title or the value of |pagename=.
local verb_form_of_form
if source_template == "ar-verb form" then
verb_form_of_form = args.pagename
if not verb_form_of_form then
if PAGENAME == "ar-verb form" and in_template_space() then
verb_form_of_form = "كتبت"
else
verb_form_of_form = PAGENAME
end
end
end
local incorporated_headword_head_into_lemma = false
if arg1:find("^<.*>$") then -- missing lemma
if head:find(" ") then
-- If multiword lemma, try to add arg spec after the first word.
-- Try to preserve the brackets in the part after the verb, but don't do it
-- if there aren't the same number of left and right brackets in the verb
-- (which means the verb was linked as part of a larger expression).
local first_word, post = rmatch(head, "^(.-)( .*)$")
local left_brackets = rsub(first_word, "[^%[]", "")
local right_brackets = rsub(first_word, "[^%]]", "")
if #left_brackets == #right_brackets then
arg1 = iut.remove_redundant_links(first_word) .. arg1 .. post
incorporated_headword_head_into_lemma = true
else
-- Try again using the form without links.
local linkless_head = m_links.remove_links(head)
if linkless_head:find(" ") then
first_word, post = rmatch(linkless_head, "^(.-)( .*)$")
arg1 = first_word .. arg1 .. post
else
error("Unable to incorporate <...> spec into explicit head due to a multiword linked verb or " ..
"unbalanced brackets; please include <> explicitly: " .. arg1)
end
end
else
-- Will be incorporated through `head` below in the call to normalize_all_lemmas().
incorporated_headword_head_into_lemma = true
end
end
local parse_props = {
parse_indicator_spec = parse_indicator_spec,
angle_brackets_omittable = true,
allow_blank_lemma = true,
}
local alternant_multiword_spec = iut.parse_inflected_text(arg1, parse_props)
alternant_multiword_spec.pos = pos or "verbs"
alternant_multiword_spec.args = args
alternant_multiword_spec.source_template = source_template
alternant_multiword_spec.verb_form_of_form = verb_form_of_form
alternant_multiword_spec.incorporated_headword_head_into_lemma = incorporated_headword_head_into_lemma
normalize_all_lemmas(alternant_multiword_spec, head)
detect_all_indicator_specs(alternant_multiword_spec)
local inflect_props = {
lang = lang,
slot_list = alternant_multiword_spec.verb_slots,
inflect_word_spec = conjugate_verb,
combine_metadata = combine_metadata,
-- We add links around the generated verbal forms rather than allow the entire multiword
-- expression to be a link, so ensure that user-specified links get included as well.
include_user_specified_links = true,
}
iut.inflect_multiword_or_alternant_multiword_spec(alternant_multiword_spec, inflect_props)
if debug_translit then
for slot, forms in pairs(alternant_multiword_spec.forms) do
for _, form in ipairs(forms) do
if form.translit then
local full_form_translit = (lang:transliterate(m_links.remove_links(form.form)))
if full_form_translit ~= form.translit then
error(("Internal error: For slot '%s', form '%s' incremental translit '%s' not same as full translit '%s'"):
format(slot, form.form, form.translit, full_form_translit))
end
end
form.form = iut.remove_redundant_links(form.form)
end
end
end
-- Remove redundant brackets around entire forms.
for slot, forms in pairs(alternant_multiword_spec.forms) do
for _, form in ipairs(forms) do
form.form = iut.remove_redundant_links(form.form)
end
end
determine_slot_uncertainty_from_forms(alternant_multiword_spec)
determine_verb_properties_from_forms(alternant_multiword_spec)
compute_categories_and_annotation(alternant_multiword_spec)
if args.json and source_template == "ar-conj" then
-- There is a circular reference in `base.alternant_multiword_spec`, which points back to top level.
iut.map_word_specs(alternant_multiword_spec, function(base)
base.alternant_multiword_spec = nil
end)
return require("Module:JSON").toJSON(alternant_multiword_spec)
end
return alternant_multiword_spec
end
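--[==[
Usage sketch for export.do_generate_forms() from another module. This is illustrative only and not part of the
module's logic. Assumptions: the module is reachable as "Module:ar-verb" (the module name is not stated in this file)
and the caller runs under Scribunto, so `mw` is available. `args` can be a plain Lua table, as in the call made from
export.verb_forms().

	local m_ar_verb = require("Module:ar-verb") -- hypothetical module path
	local spec = m_ar_verb.do_generate_forms({"كتب<I/a~u.pass>"}, "ar-conj")
	-- Slots without values are missing from `spec.forms`, so guard against nil.
	for _, formobj in ipairs(spec.forms.past_3ms or {}) do
		mw.log(formobj.form, formobj.translit)
	end
]==]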
-- Entry point for {{ar-conj}}. Template-callable function to parse and conjugate a verb given
-- user-specified arguments and generate a displayable table of the conjugated forms.
function export.show(frame)
local parent_args = frame:getParent().args
local params = {
[1] = {},
["noautolinktext"] = {type = "boolean"},
["noautolinkverb"] = {type = "boolean"},
["t"] = {}, -- for use by {{ar-verb form}}; otherwise ignored
["id"] = {}, -- for use by {{ar-verb form}}; otherwise ignored
["pagename"] = {}, -- for testing/documentation pages
["json"] = {type = "boolean"}, -- for bot use
}
local args = require("Module:parameters").process(parent_args, params)
local alternant_multiword_spec = export.do_generate_forms(args, "ar-conj")
if type(alternant_multiword_spec) == "string" then
-- JSON return value
return alternant_multiword_spec
end
show_forms(alternant_multiword_spec)
return make_table(alternant_multiword_spec) ..
require("Module:utilities").format_categories(alternant_multiword_spec.categories, lang, nil, nil, force_cat)
end
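--[==[
Invocation sketch for export.show(), in wikitext. Assumption: the wrapper template {{ar-conj}} calls this function
through something like {{#invoke:ar-verb|show}} (the module name is not stated in this file). Arguments are read from
the parent frame, so they are given to the template rather than to #invoke.

	{{ar-conj|كتب<I/a~u.pass>}}
	{{ar-conj|<I/a~u.pass>|pagename=كتب}}
	{{ar-conj|كتب<I/a~u.pass>|json=1}}

The second call omits the lemma, which is then taken from |pagename= (or from the page title when |pagename= is
absent). The third call returns the JSON serialization of the generated forms instead of a table.
]==]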
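-- Entry point for {{ar-verb forms}}. Template-callable function that lists the verb forms (I, II, etc.) derived from
-- a root, along with their verbal nouns and participles, autogenerating any forms that aren't explicitly given.
function export.verb_forms(frame)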
local parargs = frame:getParent().args
local params = {
[1] = {},
[2] = {},
[3] = {},
[4] = {},
[5] = {},
pagename = {},
}
for _, form in ipairs(allowed_vforms) do
-- FIXME: We go up to 5 here. The code supports unlimited variants but it's unlikely we will ever see more than
-- 2.
for index = 1, 5 do
local prefix = index == 1 and form or form .. index
params[prefix .. "-pv"] = {}
for _, extn in ipairs { "", "-vn", "-ap", "-pp" } do
params[prefix .. extn] = {}
params[prefix .. extn .. "-head"] = {}
-- FIXME: No -tr?
params[prefix .. extn .. "-gloss"] = {}
end
end
end
local args = require("Module:parameters").process(parargs, params)
local i = 1
local past_vowel_re = "^[aui,]*$"
local combined_root = nil
if not args[i] or rfind(args[i], past_vowel_re) then
combined_root = args.pagename or mw.loadData("Module:headword/data").pagename
if not rfind(combined_root, "^([^ ]) ([^ ]) ([^ ])$") and not
rfind(combined_root, "^([^ ]) ([^ ]) ([^ ]) ([^ ])$") then
error("When inferring roots from page title, need three or four space-separated radicals: " .. combined_root)
end
elseif rfind(args[i], " ") then
combined_root = args[i]
i = i + 1
else
local separate_roots = {}
while args[i] and not rfind(args[i], past_vowel_re) do
table.insert(separate_roots, args[i])
i = i + 1
end
combined_root = table.concat(separate_roots, " ")
end
local past_vowel = args[i]
i = i + 1
if past_vowel and not rfind(past_vowel, past_vowel_re) then
error("Unrecognized past vowel, should be 'a', 'i', 'u', 'a,u', etc. or empty: " .. past_vowel)
end
-- Spaces interfere with parsing as a unit in [[Module:inflection utilities]], so replace with underscore.
combined_root = combined_root:gsub(" ", "_")
local split_root = rsplit(combined_root, "_")
-- Map from verb forms (I, II, etc.) to a table of verb properties,
-- which has entries e.g. for "verb" (either true to autogenerate the verb
-- head, or an explicitly specified verb head using e.g. argument "I-head"),
-- and for "verb-gloss" (which comes from e.g. the argument "I" or "I-gloss"),
-- and for "vn" and "vn-gloss", "ap" and "ap-gloss", "pp" and "pp-gloss".
local verb_properties = {}
for _, form in ipairs(allowed_vforms) do
local formpropslist = {}
local derivs = {{"verb", ""}, {"vn", "-vn"}, {"ap", "-ap"}, {"pp", "-pp"}}
local index = 1
while true do
local formprops = {}
local prefix = index == 1 and form or form .. index
if prefix == "I" then
formprops.pv = past_vowel
end
if args[prefix .. "-pv"] then
formprops.pv = args[prefix .. "-pv"]
end
for _, deriv in ipairs(derivs) do
local prop = deriv[1]
local extn = deriv[2]
if args[prefix .. extn] == "+" then
formprops[prop] = true
elseif args[prefix .. extn] == "-" then
formprops[prop] = false
elseif args[prefix .. extn] then
formprops[prop] = true
formprops[prop .. "-gloss"] = args[prefix .. extn]
end
if args[prefix .. extn .. "-head"] then
-- An explicit head replaces any on/off setting made above.
formprops[prop] = args[prefix .. extn .. "-head"]
end
if args[prefix .. extn .. "-gloss"] then
if formprops[prop] == nil then
formprops[prop] = true
end
formprops[prop .. "-gloss"] = args[prefix .. extn .. "-gloss"]
end
end
if formprops.verb then
-- If a verb form specified, also turn on vn (unless form I, with
-- unpredictable vn) and ap, and maybe pp, according to form,
-- weakness and past vowel. But don't turn these on if there's
-- an explicit on/off specification for them (e.g. I-pp=-).
if form ~= "I" and formprops.vn == nil then
formprops.vn = true
end
if formprops.ap == nil then
formprops.ap = true
end
local weakness = weakness_from_radicals(form, split_root[1], split_root[2], split_root[3],
split_root[4])
if formprops.pp == nil and not vform_probably_no_passive(form,
weakness, rsplit(formprops.pv or "", ","), {}) then
formprops.pp = true
end
if formprops.verb == true or formprops.vn == true or formprops.ap == true or formprops.pp == true then
formprops.need_autogen = true
end
table.insert(formpropslist, formprops)
index = index + 1
else
break
end
end
table.insert(verb_properties, {form, formpropslist})
end
-- Go through and create the verb form derivations as necessary, when they haven't been explicitly given.
for _, vplist in ipairs(verb_properties) do
local vform = vplist[1]
for _, props in ipairs(vplist[2]) do
if props.need_autogen then
local form_with_vowels
if vform == "I" then
local pv = props.pv
if not pv then
-- Make up likely past and non-past vowels based on weakness and actual radical.
if split_root[3] == W then -- final-weak
form_with_vowels = "I/a~u"
elseif split_root[3] == Y then
form_with_vowels = "I/a~i"
elseif split_root[2] == W then --hollow
form_with_vowels = "I/u~u"
elseif split_root[2] == Y then
form_with_vowels = "I/i~i"
else
-- most common; doesn't matter so much since we're not displaying the non-past
form_with_vowels = "I/a~u"
end
else
local pvs = rsplit(pv, ",")
local vowel_sufs = {}
for _, pv in ipairs(pvs) do
local vowel_spec
if pv == "a" then
-- Past vowel is given as 'a'; make up a likely non-past vowel based on weakness and actual radical.
if split_root[3] == W then -- final-weak
vowel_spec = "a~u"
elseif split_root[3] == Y then
vowel_spec = "a~i"
elseif split_root[2] == W then --hollow
vowel_spec = "a~u"
elseif split_root[2] == Y then
vowel_spec = "a~i"
else
-- most common; doesn't matter so much since we're not displaying the non-past
vowel_spec = "a~u"
end
elseif pv == "i" then
-- most common; doesn't matter so much since we're not displaying the non-past
vowel_spec = "i~a"
elseif pv == "u" then
-- most common; doesn't matter so much since we're not displaying the non-past
vowel_spec = "u~u"
else
error(("Internal error: Bad past vowel '%s' in {{ar-verb forms}}"):format(pv))
end
table.insert(vowel_sufs, vowel_spec)
end
form_with_vowels = "I/" .. table.concat(vowel_sufs, "/")
end
else
form_with_vowels = vform
end
local angle_bracket_spec = ("%s<%s.pass>"):format(combined_root, form_with_vowels)
local alternant_multiword_spec = export.do_generate_forms({angle_bracket_spec}, "ar-verb forms")
local function format_forms(forms)
if not forms then
return "-" -- FIXME: Throw an error?
end
local formatted = {}
for _, form in ipairs(forms) do
if form.translit then
table.insert(formatted, ("%s//%s"):format(form.form, form.translit))
else
table.insert(formatted, form.form)
end
end
return table.concat(formatted, ",")
end
if props.verb == true then
props.verb = format_forms(alternant_multiword_spec.forms.past_3ms)
end
for _, deriv in ipairs({"vn", "ap", "pp"}) do
if props[deriv] == true then
props[deriv] = format_forms(alternant_multiword_spec.forms[deriv])
end
end
end
end
end
-- Go through and output the result
local formtextarr = {}
for _, vplist in ipairs(verb_properties) do
local form = vplist[1]
for _, props in ipairs(vplist[2]) do
local textarr = {}
if props.verb then
local text = "* '''[[Appendix:Arabic verbs#Form " .. form .. "|Form " .. form .. "]]''': "
local linktext = {}
local splitheads = rsplit(props.verb, "[,،]")
for _, head in ipairs(splitheads) do
table.insert(linktext, m_links.full_link({lang = lang, term = head, gloss = props["verb-gloss"]}))
end
text = text .. table.concat(linktext, ", ")
table.insert(textarr, text)
for _, derivengl in ipairs({{"vn", "Verbal noun"}, {"ap", "Active participle"}, {"pp", "Passive participle"}}) do
local deriv = derivengl[1]
local engl = derivengl[2]
if props[deriv] then
local text = "** " .. engl .. ": "
local linktext = {}
local splitheads = rsplit(props[deriv], "[,،]")
for _, head in ipairs(splitheads) do
local ar, translit = head:match("^(.*)//(.-)$")
if not ar then
ar = head
end
table.insert(linktext, m_links.full_link {lang = lang, term = ar, tr = translit,
gloss = props[deriv .. "-gloss"]} )
end
text = text .. table.concat(linktext, ", ")
table.insert(textarr, text)
end
end
table.insert(formtextarr, table.concat(textarr, "\n"))
end
end
end
return table.concat(formtextarr, "\n")
end
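--[==[
Invocation sketch for export.verb_forms(), in wikitext. Assumption: the wrapper template {{ar-verb forms}} calls this
function through something like {{#invoke:ar-verb|verb_forms}} (the module name is not stated in this file). The
glosses are made up for illustration.

	{{ar-verb forms|ك ت ب|a|I=to write|II=+}}
	{{ar-verb forms|ك|ت|ب|a|I=to write|I-pp=-}}
	{{ar-verb forms|a|I=to write}}

The first call gives the root as one space-separated argument and the second as separate radicals; in both, the
following argument is the form-I past vowel. The third call omits the root, which is then inferred from |pagename=
or the page title and must consist of three or four space-separated radicals. A value of + autogenerates a form,
- suppresses it, and any other value turns the form on and is used as its gloss.
]==]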
-- Infer radicals from lemma headword (i.e. 3rd masculine singular past) and verb form (I, II, etc.). Throw an error if
-- headword is malformed. A given returned radical may actually be a list of possible radicals, where the first one
-- should be used if the user didn't explicitly give the radical. If the list contains a field `ambig = true`, the
-- radical is considered ambiguous and should not be categorized. `is_reduced` indicates that the user specified
-- `.reduced` to indicate that the verb form is reduced by assimilation and/or haplology (typically archaic Koranic
-- forms such as اِدَّارَأَ instead of تَدَارَأَ; or اِسْطَاعَ instead of اِسْتِطَاعَ; etc.).
function export.infer_radicals(data)
local headword, vform, passive, past_vowel, nonpast_vowel, is_reduced =
data.headword, data.vform, data.passive, data.past_vowel, data.nonpast_vowel, data.is_reduced
past_vowel = past_vowel or "-"
nonpast_vowel = nonpast_vowel or "-"
local function verify_vowel(vowel, param)
if vowel ~= A and vowel ~= I and vowel ~= U and vowel ~= "-" then
error(("Internal error: Bad value for %s: %s (should be Arabic diacritic vowel or '-')"):format(
param, vowel))
end
end
verify_vowel(past_vowel, "past_vowel")
verify_vowel(nonpast_vowel, "nonpast_vowel")
local ch = {}
local form_viii_assim, variant
-- sub out alif-madda for easier processing
headword = rsub(headword, AMAD, HAMZA .. ALIF)
local function infer_err(msg, noann)
local anns = {}
local nohead, novform
if noann == "nohead" then
nohead = true
elseif noann == "novform" then
novform = true
elseif noann == "nohead-vform" then
nohead = true
novform = true
elseif noann then
error(("Internal error: Unrecognized value for 'noann': %s"):format(dump(noann)))
end
if not nohead then
table.insert(anns, ("headword=%s"):format(data.headword))
end
if not novform then
table.insert(anns, ("verb form=%s"):format(data.vform))
end
anns = table.concat(anns, ", ")
if anns ~= "" then
anns = ": " .. anns
end
error(msg .. anns)
end
local len = ulen(headword)
local expected_length
-- extract the headword letters into an array
for i = 1, len do
table.insert(ch, usub(headword, i, i))
end
-- check that the letter at the given index is the given string, or
-- is one of the members of the given array
local function check(index, must)
local letter = ch[index]
-- Check for a missing letter up front, so that the list case below doesn't try to concatenate nil.
if not letter then
infer_err("Letter " .. index .. " is nil")
end
if type(must) == "string" then
if letter ~= must then
infer_err(("For verb form %s, letter %s must be %s, not %s"):format(vform, index, must, letter),
"novform")
end
elseif not m_table.contains(must, letter) then
infer_err("For verb form " .. vform .. ", letter " .. index ..
" must be one of " .. table.concat(must, " ") .. ", not " .. letter, "novform")
end
end
-- Check that length of headword is within [min, max]
local function check_len(min, max)
if min and len < min then
infer_err(("Not enough letters for verb form %s, expected at least %s"):format(vform, min), "novform")
end
if max and len > max then
infer_err(("Too many letters for verb form %s, expected at most %s"):format(vform, max), "novform")
end
end
-- If the vowels are i~a or u~u, a form I verb beginning with w- normally keeps the w in the non-past. Otherwise it
-- loses it (i.e. it is "assimilated").
local function form_I_w_non_assimilated()
return req(past_vowel, I) and req(nonpast_vowel, A) or req(past_vowel, U) and req(nonpast_vowel, U)
end
-- Convert radicals to canonical form (handle various hamza varieties and check for misplaced alif or alif maqṣūra;
-- legitimate cases of these letters are handled above).
local function convert(rad, index)
if type(rad) == "table" then
for i, r in ipairs(rad) do
rad[i] = convert(r, index)
end
return rad
elseif rad == HAMZA_ON_ALIF or rad == HAMZA_UNDER_ALIF or
rad == HAMZA_ON_W or rad == HAMZA_ON_Y then
return HAMZA
elseif rad == AMAQ then
infer_err("Radical " .. index .. " must not be alif maqṣūra")
elseif rad == ALIF then
infer_err("Radical " .. index .. " must not be alif")
else
return rad
end
end
local quadlit = vform:find("q$")
-- find first radical, start of second/third radicals, check for
-- required letters
local radstart, rad1, rad2, rad3, rad4
local weakness
if vform == "I" or vform == "II" then
rad1 = ch[1]
radstart = 2
elseif vform == "III" then
rad1 = ch[1]
check(2, {ALIF, W}) -- W occurs in passive-only verbs
radstart = 3
elseif vform == "IV" then
-- this would be alif-madda but we replaced it with hamza-alif above.
if ch[1] == HAMZA and ch[2] == ALIF then
rad1 = HAMZA
else
check(1, HAMZA_ON_ALIF)
rad1 = ch[2]
end
radstart = 3
elseif vform == "V" then
check(1, is_reduced and ALIF or T)
rad1 = ch[2]
radstart = 3
elseif vform == "VI" then
check(1, is_reduced and ALIF or T)
if ch[2] == AMAD then
rad1 = HAMZA
radstart = 3
else
rad1 = ch[2]
check(3, {ALIF, W}) -- W occurs in passive-only verbs
radstart = 4
end
elseif vform == "VII" then
check(1, ALIF)
if is_reduced then
check(2, M)
rad1 = M
radstart = 3
else
check(2, N)
rad1 = ch[3]
radstart = 4
end
elseif vform == "VIII" then
check(1, ALIF)
rad1 = ch[2]
if rad1 == "د" then
rad1 = {"د", "ذ"} -- not considered ambiguous since it's usually د
radstart = 3
form_viii_assim = "دّ"
elseif rad1 == "ظ" and ch[3] == "ط" and len >= 5 then
-- [[اظطلم]], variant of [[اظلم]]
radstart = 4
form_viii_assim = "ظْط"
elseif rad1 == "ذ" and ch[3] == "د" and len >= 5 then
-- [[اذدكر]], variant of [[اذكر]]
radstart = 4
form_viii_assim = "ذْد"
elseif rad1 == T or rad1 == "ث" or rad1 == "ذ" or rad1 == "ط" or rad1 == "ظ" then
radstart = 3
form_viii_assim = rad1 .. SH
elseif rad1 == "ز" then
check(3, "د")
radstart = 4
form_viii_assim = "زْد"
elseif rad1 == "ص" or rad1 == "ض" then
check(3, "ط")
radstart = 4
form_viii_assim = rad1 .. SK .. "ط"
else
check(3, T)
radstart = 4
rad1 = convert(rad1, 1)
form_viii_assim = rad1 .. SK .. "ت"
end
if rad1 == T then
-- Radical is ambiguous, might be ت or و or ي but doesn't affect conjugation. Note that there are no
-- form-VIII verbs with initial radical ي given in Hans Wehr but Lane mentions at least:
-- - (page 2973) اِتَّأَسَ, with assimilation of the ي to ت, from root ي ء س;
-- - (page 2975) اِتَّبَسَ non-past يَتَّبِسُ and alternative اِيتَبَسَ non-past يَاتَبِسُ from the root ي ب س;
-- - (page 2976) اِتَّسَرَ non-past يَتَّسِرُ or alternatively يَأْتَسِرُ with hamza preserved from the root ي س ر.
-- These alternative forms seem very rare and probably not worth worrying about, but if we want to handle
-- them, we can do it when the time comes.
rad1 = {T, W, Y, ambig = true}
-- اِتَّخَذَ irregularly has hamza as the radical but assimilates like و
if ch[3] == "خ" and ch[4] == "ذ" then
rad1[4] = HAMZA
end
end
elseif vform == "IX" then
check(1, ALIF)
rad1 = ch[2]
radstart = 3
elseif vform == "X" then
check(1, ALIF)
check(2, S)
if is_reduced then
rad1 = ch[3]
radstart = 4
else
check(3, T)
rad1 = ch[4]
radstart = 5
end
elseif vform == "Iq" then
rad1 = ch[1]
rad2 = ch[2]
radstart = 3
elseif vform == "IIq" then
check(1, T)
rad1 = ch[2]
rad2 = ch[3]
radstart = 4
elseif vform == "IIIq" then
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
check(4, N)
radstart = 5
elseif vform == "IVq" then
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
radstart = 4
elseif vform == "XI" then
check_len(5, 5)
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
check(4, ALIF)
rad3 = ch[5]
weakness = "sound"
elseif vform == "XII" then
check(1, ALIF)
rad1 = ch[2]
if ch[3] ~= ch[5] then
infer_err("For verb form XII, letters 3 and 5 should be the same", "novform")
end
check(4, W)
radstart = 5
elseif vform == "XIII" then
check_len(5, 5)
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
check(4, W)
rad3 = ch[5]
if rad3 == AMAQ then
weakness = "final-weak"
else
weakness = "sound"
end
elseif vform == "XIV" then
check_len(6, 6)
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
check(4, N)
rad3 = ch[5]
if ch[6] == AMAQ then
check_waw_ya(rad3)
weakness = "final-weak"
else
if ch[5] ~= ch[6] then
infer_err("For verb form XIV, letters 5 and 6 should be the same", "novform")
end
weakness = "sound"
end
elseif vform == "XV" then
check_len(6, 6)
check(1, ALIF)
rad1 = ch[2]
rad2 = ch[3]
check(4, N)
rad3 = ch[5]
if rad3 == Y then
check(6, ALIF)
else
check(6, AMAQ)
end
weakness = "sound"
else
error("Internal error: Unrecognized verb form " .. vform)
end
-- Process the last two radicals. RADSTART is the index of the first of the two. If it's nil then all radicals have
-- already been processed above, and we don't do anything.
if radstart then
-- There must (normally) be one or two letters left.
if len == radstart then
if vform == "I" and ch[len] == Y then
-- short form حَيَّ
weakness = "final-weak"
rad2 = Y
rad3 = Y
variant = "short"
elseif vform == "IV" and rad1 == "ر" and ch[len] == AMAQ then
-- irregular verb أَرَى
weakness = "final-weak"
rad2 = HAMZA
rad3 = Y
elseif vform == "X" and rad1 == "ح" and ch[len] == AMAQ then
-- irregular verb اِسْتَحَى
weakness = "final-weak"
rad2 = Y
rad3 = Y
variant = "short"
else
-- If one letter left, then it's a geminate verb. If the letter is alif or alif maqṣūra, it will trigger
-- an error down the line.
if vform_supports_geminate(vform) then
weakness = "geminate"
rad2 = ch[len]
rad3 = ch[len]
if vform == "III" or vform == "VI" then
variant = "short"
end
else
infer_err("Apparent geminate verb, but geminate verbs not allowed for this verb form")
end
end
elseif quadlit then
-- Process last two radicals of a quadriliteral verb form.
rad3 = ch[radstart]
rad4 = ch[radstart + 1]
expected_length = radstart + 1
check_len(expected_length)
if rad4 == AMAQ or (rad4 == ALIF and rad3 == Y) or rad4 == Y then
-- rad4 can be Y in passive-only verbs.
if vform_supports_final_weak(vform) then
weakness = "final-weak"
-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws in a row); it could be wāw or
-- yāʾ, but doesn't affect the conjugation.
rad4 = rad3 == W and {Y, W, ambig = true} or {W, Y, ambig = true}
else
infer_err("Last radical is " .. rad4 .. " but verb form " .. vform ..
" doesn't support final-weak verbs", "novform")
end
else
weakness = "sound"
end
else
-- Process last two radicals of a triliteral verb form.
rad2 = ch[radstart]
rad3 = ch[radstart + 1]
expected_length = radstart + 1
check_len(expected_length)
if vform == "I" and (is_waw_ya(rad3) or rad3 == ALIF or rad3 == AMAQ) then
local inferred_past_vowel, inferred_nonpast_vowel
-- Check for final-weak form I verb. It can end in tall alif (rad3 = wāw) or alif maqṣūra (rad3 = yāʾ)
-- or a wāw or yāʾ (with a past vowel of i or u, e.g. nasiya/yansā "forget" or with a passive-only
-- verb).
if rad1 == W and not form_I_w_non_assimilated() then
weakness = "assimilated+final-weak"
else
weakness = "final-weak"
end
if rad3 == ALIF then
rad3 = W
inferred_past_vowel = A
inferred_nonpast_vowel = U
if is_passive_only(passive) then
infer_err("Final-weak form-I passive verbs should end in yāʔ (ي), not tall alif (ا)", "novform")
end
elseif rad3 == AMAQ then
rad3 = Y
inferred_past_vowel = A
inferred_nonpast_vowel = I
if is_passive_only(passive) then
infer_err("Final-weak form-I passive verbs should end in yāʔ (ي), not alif maqṣūra (ى)",
"novform")
end
elseif rad1 == "ح" and rad2 == Y and rad3 == Y then
-- Long variant حَيِيَ.
inferred_past_vowel = I
inferred_nonpast_vowel = A
variant = "long"
else
if not is_passive_only(passive) then
-- does a non-passive final-weak verb in -uwa ever happen? (YES: e.g. [[رخو]] "to be slack")
inferred_past_vowel = rad3 == Y and I or U
inferred_nonpast_vowel = A
end
-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws); it could be wāw or yāʾ, but
-- doesn't affect the conjugation.
rad3 = (rad1 == W or rad2 == W) and {Y, W, ambig = true} or {W, Y, ambig = true} -- ambiguous
end
if inferred_past_vowel then
local raw_past_vowel = rget(past_vowel)
local raw_nonpast_vowel = rget(nonpast_vowel)
if raw_past_vowel ~= "-" then
if raw_past_vowel ~= inferred_past_vowel then
infer_err(("Final-weak form-I verb inferred past vowel %s, which disagrees with " ..
"explicitly specified %s"):format(undia[inferred_past_vowel], undia[raw_past_vowel]), "novform")
else
-- in case of footnote in past_vowel
inferred_past_vowel = past_vowel
end
end
if raw_nonpast_vowel ~= "-" and raw_nonpast_vowel ~= A and inferred_nonpast_vowel == U then
-- if inferred as I or A, the reality can be the reverse; form-I final-weak verbs with a~a and
-- i~i exist, e.g. سَعَى/يَسْعَى, وَلِيَ/يَلِي. Weird verb [[صها]] (also written [[صهى]]) has non-past
-- يصهى so we can't throw an error in this situation.
if raw_nonpast_vowel ~= inferred_nonpast_vowel then
infer_err(("Final-weak form-I verb inferred non-past vowel %s, which disagrees with " ..
"explicitly specified %s"):format(undia[inferred_nonpast_vowel], undia[raw_nonpast_vowel]), "novform")
else
-- in case of footnote in nonpast_vowel
inferred_nonpast_vowel = nonpast_vowel
end
end
end
if not is_passive_only(passive) then
if rget(past_vowel) == "-" then
past_vowel = inferred_past_vowel
end
if rget(nonpast_vowel) == "-" then
nonpast_vowel = inferred_nonpast_vowel
end
end
elseif vform == "IX" and is_waw_ya(rad3) and len == radstart + 2 and ch[len] == AMAQ then
-- Final-weak form IX verbs like اِرْعَوَى "to desist, to repent, to see the light".
weakness = "final-weak"
expected_length = radstart + 2
elseif vform == "X" and rad1 == "ح" and rad2 == Y and rad3 == ALIF then
-- Long variant اِسْتَحْيَا.
weakness = "final-weak"
rad3 = Y
variant = "long"
elseif rad3 == AMAQ or (rad2 == Y and rad3 == ALIF) or rad3 == Y then
-- rad3 == Y happens in passive-only verbs.
if vform_supports_final_weak(vform) then
weakness = "final-weak"
else
infer_err("Last radical is " .. rad3 .. " but verb form doesn't support final-weak verbs")
end
-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws); it could be wāw or yāʾ, but
-- doesn't affect the conjugation.
rad3 = (rad1 == W or rad2 == W) and {Y, W, ambig = true} or {W, Y, ambig = true}
elseif rad2 == ALIF then
if vform_supports_hollow(vform) then
weakness = "hollow"
local function set_past_to_a()
if req(past_vowel, A) then
-- already set
elseif req(past_vowel, "-") or req(past_vowel, rget(nonpast_vowel)) then
past_vowel = A
else
infer_err(("Form I hollow verb with nonpast vowel set to '%s' must have past vowel set to 'a' or the same value, not %s"):
format(undia[rget(nonpast_vowel)], undia[rget(past_vowel)]), "novform")
end
end
if vform == "I" and req(nonpast_vowel, U) then
rad2 = W
set_past_to_a()
elseif vform == "I" and req(nonpast_vowel, I) then
rad2 = Y
set_past_to_a()
else
if req(nonpast_vowel, A) and not req(past_vowel, I) then
infer_err(("Form I hollow verb with nonpast vowel set to 'a' must have past vowel set to 'i', not %s"):
format(undia[rget(past_vowel)]), "novform")
end
-- Ambiguous radical; could be wāw or yāʾ; if verb form I, it's critical to get this right, and
-- the caller checks for this situation and throws an error if non-past vowel is "a" and second
-- radical isn't explicitly given.
rad2 = {W, Y, ambig = true, need_radical = true}
end
else
infer_err("Second radical is alif but verb form doesn't support hollow verbs")
end
elseif vform == "I" and rad1 == W and not form_I_w_non_assimilated() then
weakness = "assimilated"
elseif rad2 == rad3 and (vform == "III" or vform == "VI") then
weakness = "geminate"
variant = "long"
else
weakness = "sound"
end
end
if expected_length then
check_len(expected_length, expected_length)
end
end
rad1 = convert(rad1, 1)
rad2 = convert(rad2, 2)
rad3 = convert(rad3, 3)
rad4 = convert(rad4, 4)
if not weakness then
error("Internal error: Returned weakness from infer_radicals() is nil")
end
return {
weakness = weakness,
rad1 = rad1,
rad2 = rad2,
rad3 = rad3,
rad4 = rad4,
past_vowel = past_vowel,
nonpast_vowel = nonpast_vowel,
form_viii_assim = form_viii_assim,
variant = variant,
}
end
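-- Illustrative sketch only (not called anywhere in this module): the branches above store an ambiguous final
-- radical as a two-element table flagged with `ambig = true`, listing wāw first unless an earlier radical is
-- already wāw. The helper name and the literal letters below are assumptions made for this example; the module
-- itself uses its own W and Y constants.
local function example_ambiguous_final_radical(rad1, rad2)
	local W, Y = "و", "ي" -- assumed to match the module's W and Y
	if rad1 == W or rad2 == W then
		-- Avoid two wāws in a row by guessing yāʾ first.
		return {Y, W, ambig = true}
	end
	return {W, Y, ambig = true}
end
-- Usage: example_ambiguous_final_radical("ر", "م")[1] is wāw, while
-- example_ambiguous_final_radical("ش", "و")[1] is yāʾ.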
-- bot interface to infer_radicals()
function export.infer_radicals_json(frame)
local iparams = {
headword = {},
vform = {},
passive = {},
past_vowel = {},
nonpast_vowel = {},
is_reduced = {type = "boolean"},
}
local iargs = require("Module:parameters").process(frame.args, iparams)
return require("Module:JSON").toJSON(export.infer_radicals(iargs))
end
-- Infer the vocalization of a participle from its unvocalized headword, the verb form (I, II, etc.), the weakness
-- (which may be nil if unknown) and whether the participle is active or passive. Throw an error if the headword is
-- malformed. Returns the fully vocalized participle as a string.
function export.infer_participle_vocalization(headword, vform, weakness, is_active)
local chars = {}
local orig_headword = headword
-- Sub out alif-madda for easier processing.
headword = rsub(headword, AMAD, HAMZA .. ALIF)
local len = ulen(headword)
-- Extract the headword letters into an array.
for i = 1, len do
table.insert(chars, usub(headword, i, i))
end
local function form_intro_error_msg()
return ("For verb form %s %s%s participle %s, "):format(vform, orig_headword ~= headword and "normalized " or
"", is_active and "active" or "passive", headword)
end
local function err(msg)
error(form_intro_error_msg() .. msg, 1)
end
-- Check that length of headword is within [min, max].
local function check_len(min, max)
if min and len < min then
err(("expected at least %s letters but saw %s"):format(min, len))
elseif max and len > max then
err(("expected at most %s letters but saw %s"):format(max, len))
end
end
-- Get the character at `ind`, making sure it exists.
local function c(ind)
check_len(ind)
return chars[ind]
end
-- Check that the letter at the given index is the given string, or is one of the members of the given array
local function check(index, must)
local letter = chars[index]
local function make_possible_values()
if type(must) == "string" then
return must
else
return list_to_text(must, nil, " or ")
end
end
if not letter then
err(("expected a letter (specifically %s) at position %s, but participle is too short"):format(
make_possible_values(), index))
end
local matches
if type(must) == "string" then
matches = letter == must
else
matches = m_table.contains(must, letter)
end
if not matches then
err(("letter %s at index %s must be %s"):format(letter, index, make_possible_values()))
end
end
local function check_weakness(values, allow_missing, invert_condition)
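local function make_possible_weaknesses()
-- Build a quoted copy rather than modifying the caller's `values` table in place.
local quoted = {}
for i, val in ipairs(values) do
quoted[i] = "'" .. val .. "'"
end
return list_to_text(quoted, nil, " or ")
end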
if allow_missing and invert_condition then
error("Internal error: Can't specify both allow_missing and invert_condition")
end
if not weakness then
if allow_missing or invert_condition then
return
else
err(("weakness is unspecified but must be %s"):format(make_possible_weaknesses()))
end
else
local matches = m_table.contains(values, weakness)
if invert_condition and matches then
err(("weakness '%s' must not be %s"):format(weakness, make_possible_weaknesses()))
elseif not invert_condition and not matches then
err(("weakness '%s' must be %s"):format(weakness, make_possible_weaknesses()))
end
end
end
local vocalized
local function handle_possibly_final_weak(sound_prefix, expected_length)
check_len(expected_length, expected_length)
if c(expected_length) == AMAQ then
-- passive final-weak
if is_active then
err("participle ending in alif maqṣūra (ى) only allowed for passive participles")
end
check_weakness({"final-weak", "assimilated+final-weak"}, "allow missing")
vocalized = sound_prefix .. AN .. AMAQ
else
-- all others behave as if sound
check_weakness({"final-weak", "assimilated+final-weak"}, nil, "invert condition")
vocalized = sound_prefix .. (is_active and I or A) .. c(expected_length)
end
end
if not (vform == "I" and is_active) then
-- all participles except verb form I active begin in م-.
check(1, M)
end
if vform == "I" then
if is_active then
check(2, ALIF)
local sound_prefix = c(1) .. AA .. c(3)
if len == 3 then
if c(3) == HAMZA then
-- Either hollow with hamzated third radical, e.g. [[شاء]] active participle 'شَاءٍ', or final-weak
-- with hamzated second radical, e.g. [[رأى]] active participle 'رَاءٍ'. Theoretically (?), also
-- geminate with hamzated second/third radical, but I don't know if any such verbs exist.
if weakness == "geminate" then
vocalized = sound_prefix .. SH
else
check_weakness({"hollow", "final-weak"}, "allow missing")
vocalized = sound_prefix .. IN
end
else
check_weakness({"final-weak", "geminate"})
if weakness == "geminate" then
vocalized = sound_prefix .. SH
else
vocalized = sound_prefix .. IN
end
end
else
check_len(4, 4)
-- we will convert back to alif maqṣūra below as needed
vocalized = sound_prefix .. I .. c(4)
end
else
-- assimilated verbs: regular, e.g. مَوْزُون "weighed"
-- geminate verbs: regular, e.g. مَبْلُول "moistened"
-- third-hamzated verbs: مَبْرُوء
-- hollow verbs: مَقُود "led, driven"; مَزِيد "added, increased"
-- hollow first-hamzated verbs: مَئِيض "returned, reverted"; مَأْيُوس "despaired" (NOTE: formation is sound);
-- مَأُود or مَؤُود "bent; depleted"
-- hollow third-hamzated verbs: مَشِيء "willed, intended", مَضُوء "glittered?"
-- final-weak: مَلْقِيّ "found, encountered"; مَصْغُوّ "inclined"
-- hollow + final-weak: مَشْوِيّ "fried, grilled", مَهْوِيّ "loved"
-- first-hamzated + hollow + final-weak: مَأْوِيّ "received hospitably"
local sound_prefix = MA .. c(2) .. SK .. c(3)
if len == 5 then
-- sound, assimilated or geminate
check(4, W)
vocalized = sound_prefix .. UU .. c(5)
else
check_len(4, 4)
if c(4) == W then
-- final-weak third-wāw
vocalized = sound_prefix .. U .. W .. SH
elseif c(4) == Y then
-- final-weak third-yāʾ
vocalized = sound_prefix .. I .. Y .. SH
else
-- hollow
check(3, {W, Y})
if c(3) == W then
vocalized = MA .. c(2) .. UU .. c(4)
else
vocalized = MA .. c(2) .. II .. c(4)
end
end
end
end
elseif vform == "II" or vform == "V" or vform == "XII" or vform == "XIII" or vform == "Iq" or vform == "IIq" or
vform == "IIIq" then
local sound_prefix, expected_length
if vform == "II" then
sound_prefix = MU .. c(2) .. A .. c(3) .. SH
expected_length = 4
elseif vform == "V" then
check(2, T)
sound_prefix = MU .. T .. A .. c(3) .. A .. c(4) .. SH
expected_length = 5
elseif vform == "XII" then
-- e.g. [[احدودب]] "to be or become convex or humpbacked", مُحْدَوْدِب (active);
-- [[اثنونى]] "to be bent; to be doubled up", مُثْنَوْنٍ (active)
check(4, W)
if c(3) ~= c(5) then
err(("third letter %s should be the same as the fifth letter %s"):format(c(3), c(5)))
end
sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. W .. SK .. c(5)
expected_length = 6
elseif vform == "XIII" then
-- e.g. [[اخروط]] "to get entangled; to extend", مُخْرَوِّط (active), مُخْرَوَّط (passive)
check(4, W)
sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. W .. SH
expected_length = 5
elseif vform == "Iq" then
sound_prefix = MU .. c(2) .. A .. c(3) .. SK .. c(4)
expected_length = 5
elseif vform == "IIq" then
check(2, T)
sound_prefix = MU .. T .. A .. c(3) .. A .. c(4) .. SK .. c(5)
expected_length = 6
elseif vform == "IIIq" then
-- e.g. [[اخرنطم]] "to be proud and angry"
check(4, N)
sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. N .. SK .. c(5)
expected_length = 6
else
error("Internal error: Unhandled verb form " .. vform)
end
if len == expected_length - 1 then
-- active final-weak
if not is_active then
err(("length-%s participle only allowed for active participles"):format(len))
end
check_weakness({"final-weak", "assimilated+final-weak"}, "allow missing")
vocalized = sound_prefix .. IN
else
handle_possibly_final_weak(sound_prefix, expected_length)
end
elseif vform == "III" or vform == "VI" then
local sound_prefix, expected_length
if vform == "VI" then
check(2, T)
check(4, ALIF)
sound_prefix = MU .. T .. A .. c(3) .. AA .. c(5)
expected_length = 6
else
check(3, ALIF)
sound_prefix = MU .. c(2) .. AA .. c(4)
expected_length = 5
end
if len == expected_length - 1 then
-- active final-weak or active or passive geminate
if is_active then
check_weakness({"geminate", "final-weak", "assimilated+final-weak"})
if weakness == "geminate" then
vocalized = sound_prefix .. SH
else
vocalized = sound_prefix .. IN
end
else
check_weakness({"geminate"}, "allow missing")
vocalized = sound_prefix .. SH
end
else
handle_possibly_final_weak(sound_prefix, expected_length)
end
elseif vform == "IV" or vform == "X" then
-- form IV:
-- sound: مُرْسِخ (active, "entrenching"), مُرْسَخ (passive, "entrenched")
-- first-hamzated (like sound): مُؤْيِس (active, "causing to despair"), مُؤْيَس (passive, "caused to despair")
-- final-weak: مُكْرٍ (active, "renting out"), مُكْرًى (passive, "rented out")
-- assimilated: مُورِد (active, "transferring"), مُورَد (passive, "transferred"); same when first-Y, e.g.
-- أَيْقَنَ "to be certain of": مُوقِن (active), مُوقَن (passive)
-- assimilated + final-weak: مُورٍ (active, "setting fire, kindling"), مُورًى (passive, "set fire, kindled")
-- geminate: مُمِدّ (active, "granting, helping"), مُمَدّ (passive, "granted, helped")
-- hollow: مُزِيل (active, "eliminating"), مُزَال (passive, "eliminated")
-- hollow + final-weak: مُعْيٍ (active, "tiring"), مُعْيًى (passive, "tired")
local sound_prefix, expected_length
if vform == "X" then
check(2, S)
check(3, T)
sound_prefix = MU .. S .. SK .. T .. A .. c(4)
expected_length = 6
else
sound_prefix = MU .. c(2)
expected_length = 4
end
if len == expected_length and c(len - 1) == Y and c(len) ~= AMAQ then
-- active hollow
if not is_active then
err("this shape only allowed for active participles")
end
check_weakness({"hollow"}, "allow missing")
vocalized = sound_prefix .. II .. c(len)
elseif len == expected_length and c(len - 1) == ALIF then
-- passive hollow
if is_active then
err("this shape only allowed for passive participles")
end
check_weakness({"hollow"}, "allow missing")
vocalized = sound_prefix .. AA .. c(len)
elseif len == expected_length - 1 then
-- active final-weak or active or passive geminate
if is_active then
check_weakness({"geminate", "final-weak", "assimilated+final-weak"})
if weakness == "geminate" then
vocalized = sound_prefix .. I .. c(len) .. SH
elseif vform == "IV" and c(2) == W then
-- assimilated final-weak
vocalized = sound_prefix .. c(len) .. IN
else
vocalized = sound_prefix .. SK .. c(len) .. IN
end
else
check_weakness({"geminate"}, "allow missing")
vocalized = sound_prefix .. A .. c(len) .. SH
end
else
if vform == "IV" and c(2) == W then
-- assimilated, possibly final-weak
sound_prefix = sound_prefix .. c(expected_length - 1)
else
sound_prefix = sound_prefix .. SK .. c(expected_length - 1)
end
handle_possibly_final_weak(sound_prefix, expected_length)
end
elseif vform == "VII" or vform == "VIII" then
-- form VII (passive participles are fairly rare but do exist):
-- sound: مُنْكَتِب (active "subscribing"), مُنْكَتَب (passive "subscribed")
-- geminate: مُنْضَمّ (both active "joining, containing" and passive "joined, contained")
-- final-weak: مُنْطَلٍ (active "fooling (someone)"), مُنْطَلًى (passive "fooled")
-- final-weak with medial wāw: مُنْطَوٍ (active "involving"), مُنْطَوًى (passive "involved")
-- hollow: مُنْقَاد (both active "complying with" and passive "complied with")
--
-- for form VIII, the same variants exist but things are complicated by assimilations involving the template T.
-- sound third-hamzated no assimilation: مُبْتَدِئ (active "beginning"), مُبْتَدَأ (passive "begun")
-- geminate no assimilation: مُبْتَزّ (both active "robbing" and passive "robbed")
-- final-weak no assimilation: مُبْتَنٍ (active "building"), مُبْتَنًى (passive "built")
-- final-weak with medial wāw no assimilation: مُحْتَوٍ (active "containing"), مُحْتَوًى (passive "contained")
-- hollow no assimilation: مُخْتَار (both active "choosing" and passive "chosen")
--
-- sound with total assimilation: مُتَّبِع (active "following"), مُتَّبَع (passive "followed")
-- sound with total assimilation, assimilating wāw: مُتَّعِد (active "threatening"), مُتَّعَد (passive "threatened")
-- sound with total assimilation, irregularly assimilating hamza: مُتَّخِذ (active "taking"), مُتَّخَذ (passive "taken")
-- sound with total assimilation (to ḏāl, producing dāl): مُدَّخِر (active "reserving"), مُدَّخَر (passive "reserved")
-- sound with total assimilation (to ḏāl): مُذَّكِر (active "remembering"), مُذَّكَر (passive "remembered")
-- sound with total assimilation (to ṭāʔ): مُطَّرِح (active "discarding"), مُطَّرَح (passive "discarded")
-- sound with total assimilation (to ẓāʔ): مُظَّلِم (active "tolerating"), مُظَّلَم (passive "tolerated")
-- final-weak with total assimilation, assimilating wāw: مُتَّقٍ (active "guarding against"), مُتَّقًى (passive "guarded against")
-- final-weak with total assimilation (to ṯāʔ): مُثَّنٍ (active "undulating"), مُثَّنًى (passive "undulated")
-- final-weak with total assimilation (to dāl): مُدَّعٍ (active "claiming"), مُدَّعًى (passive "claimed")
-- sound with partial assimilation (to zayn): مُزْدَهِر (active "thriving"), مُزْدَهَر (passive "thrived")
-- sound with medial wāw with partial assimilation (to zayn): مُزْدَوِج (active "appearing twice")
-- sound with partial assimilation (to ṣād): مُصْطَبِح (active "illuminating"), مُصْطَبَح (passive, "illuminated")
-- sound with partial assimilation (to ḍād): مُضْطَرِب (active "to be disturbed"; no passive)
-- geminate with partial assimilation (to ṣād): مُصْطَبّ (both active "effusing" and passive "effused")
-- geminate with partial assimilation (to ḍād): مُضْطَرّ (both active "forcing" and passive "forced")
-- final-weak with partial assimilation (to ṣād): مُصْطَلٍ (active "warming"), مُصْطَلًى (passive "warmed")
-- hollow with partial assimilation (to zayn): مُزْدَاد (both active "increasing" and passive "increased")
-- hollow with partial assimilation (to ṣad): مُصْطَاد (both active "hunting" and passive "hunted")
local sound_prefix, sufind
if vform == "VII" then
check(2, N)
sound_prefix = MU .. N .. SK .. c(3)
sufind = 4
else
local c2 = c(2)
if c2 == T or c2 == "د" or c2 == "ث" or c2 == "ذ" or c2 == "ط" or c2 == "ظ" then
-- full assimilation
sound_prefix = MU .. c2 .. SH
sufind = 3
else
-- partial or no assimilation
if c2 == "ز" then
check(3, "د")
elseif c2 == "ص" or c2 == "ض" then
check(3, "ط")
else
check(3, T)
end
sound_prefix = MU .. c2 .. SK .. c(3)
sufind = 4
end
end
if c(sufind) == ALIF then
-- hollow, active or passive
check_len(sufind + 1, sufind + 1)
check_weakness({"hollow"}, "allow missing")
vocalized = sound_prefix .. AA .. c(sufind + 1)
elseif len == sufind then
-- active final-weak or active or passive geminate
if is_active then
check_weakness({"geminate", "final-weak", "assimilated+final-weak"})
if weakness == "geminate" then
vocalized = sound_prefix .. A .. c(len) .. SH
else
vocalized = sound_prefix .. A .. c(len) .. IN
end
else
check_weakness({"geminate"}, "allow missing")
vocalized = sound_prefix .. A .. c(len) .. SH
end
else
sound_prefix = sound_prefix .. A .. c(sufind)
handle_possibly_final_weak(sound_prefix, sufind + 1)
end
elseif vform == "IX" then
check_len(4, 4)
vocalized = MU .. c(2) .. SK .. c(3) .. A .. c(4) .. SH
elseif vform == "IVq" then
-- e.g. [[اذلعب]] "to scamper away", مُذْلَعِبّ (active), مُذْلَعَبّ (passive);
-- [[اطمأن]] "to remain quietly; to be certain", مُطْمَئِنّ (active), مُطْمَأَنّ (passive)
check_len(5, 5)
local sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. c(4)
if is_active then
vocalized = sound_prefix .. I .. c(5) .. SH
else
vocalized = sound_prefix .. A .. c(5) .. SH
end
elseif vform == "XI" then
-- e.g. [[احمار]] "to turn red, to blush", مُحْمَارّ (active)
check_len(5, 5)
check(4, ALIF)
vocalized = MU .. c(2) .. SK .. c(3) .. AA .. c(5) .. SH
elseif vform == "XIV" or vform == "XV" then
-- FIXME: Implement. No examples in Wiktionary currently; need to look up in a grammar.
error("Support for verb form " .. vform .. " not implemented yet")
else
error("Don't recognize verb form " .. vform)
end
vocalized = rsub(vocalized, HAMZA .. AA, AMAD)
local reconstructed_headword = lang:stripDiacritics(vocalized)
if reconstructed_headword ~= orig_headword then
error(("Internal error: Vocalized participle %s doesn't match original participle %s"):format(
vocalized, orig_headword))
end
return vocalized
end
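-- Illustrative sketch only (not called anywhere in this module): a stripped-down version of the sound form-I
-- passive participle branch above (pattern مفعول), written without the module's helpers. The function name, the
-- UTF-8 splitting pattern and the diacritic byte escapes are assumptions made for this example; the real code uses
-- the module's MA, SK and UU constants and reports problems through err() instead of returning nil.
local function example_vocalize_form_I_passive(headword)
	-- fatḥa U+064E, ḍamma U+064F, sukūn U+0652 as UTF-8 byte escapes
	local FATHA, DAMMA, SUKUN = "\217\142", "\217\143", "\217\146"
	local chars = {}
	-- Split the UTF-8 string into characters (one lead byte plus its continuation bytes).
	for ch in headword:gmatch("[\1-\127\194-\244][\128-\191]*") do
		chars[#chars + 1] = ch
	end
	-- A sound participle has five letters: م + radical 1 + radical 2 + و + radical 3.
	if #chars ~= 5 or chars[1] ~= "م" or chars[4] ~= "و" then
		return nil
	end
	return chars[1] .. FATHA .. chars[2] .. SUKUN .. chars[3] .. DAMMA .. chars[4] .. chars[5]
end
-- Usage: example_vocalize_form_I_passive("مكتوب") inserts fatḥa, sukūn and ḍamma after the first three letters,
-- while a headword of the wrong shape such as "كتب" yields nil.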
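-- bot interface to infer_participle_vocalization(); the vocalized string is returned directly rather than being
-- JSON-encoded
function export.infer_participle_vocalization_json(frame)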
local iparams = {
[1] = {required = true},
[2] = {required = true},
["weakness"] = {},
["passive"] = {type = "boolean"}
}
local iargs = require("Module:parameters").process(frame.args, iparams)
return export.infer_participle_vocalization(iargs[1], iargs[2], iargs.weakness, not iargs.passive)
end
return export
gopkz51p5nvpnjqfkifs9ek7o5xi1ai
Module:ar-translit
828
8167
27702
2026-06-21T14:19:22Z
Umarxon III
2840
Created page with content: '-- Authors: Benwing, ZxxZxxZ, Atitarev local export = {} local m_str_utils = require("Module:string utilities") local gcodepoint = m_str_utils.gcodepoint local rfind = m_str_utils.find local rsubn = m_str_utils.gsub local rmatch = m_str_utils.match local rsplit = m_str_utils.split local U = m_str_utils.char local unpack = unpack or table.unpack -- Lua 5.2 compatibility -- assigned below local has_diacritics -- version of rsubn() that discards all but the fir...'
27702
Scribunto
text/plain
-- Authors: Benwing, ZxxZxxZ, Atitarev
local export = {}
local m_str_utils = require("Module:string utilities")
local gcodepoint = m_str_utils.gcodepoint
local rfind = m_str_utils.find
local rsubn = m_str_utils.gsub
local rmatch = m_str_utils.match
local rsplit = m_str_utils.split
local U = m_str_utils.char
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
-- assigned below
local has_diacritics
-- version of rsubn() that discards all but the first return value
local function rsub(term, foo, bar)
local retval = rsubn(term, foo, bar)
return retval
end
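-- Illustrative sketch (assumption: plain string.gsub stands in for the module's rsubn): gsub-style functions
-- return the new string plus a match count, and the extra value leaks into argument lists and table constructors.
-- Wrapping the call, as rsub() does, keeps only the string. The helper name below is hypothetical.
local function example_count_values(...)
	return select("#", ...)
end
-- example_count_values(string.gsub("a-b", "%-", "+")) is 2, whereas
-- example_count_values((string.gsub("a-b", "%-", "+"))) is 1 because the extra parentheses truncate the call to
-- a single value.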
local zwnj = U(0x200C) -- zero-width non-joiner
local alif_madda = U(0x622)
local alif_hamza_below = U(0x625)
local alif = U(0x627)
local taa_marbuuTa = U(0x629)
local laam = U(0x644)
local waaw = U(0x648)
local alif_maqSuura = U(0x649)
local yaa = U(0x64A)
local fatHataan = U(0x64B)
local Dammataan = U(0x64C)
local kasrataan = U(0x64D)
local fatHa = U(0x64E)
local Damma = U(0x64F)
local kasra = U(0x650)
local shadda = U(0x651)
local sukuun = U(0x652)
local dagger_alif = U(0x670)
local alif_waSl = U(0x671)
--local zwj = U(0x200D) -- zero-width joiner
local lrm = U(0x200E) -- left-to-right mark
local rlm = U(0x200F) -- right-to-left mark
-- Occurs after al- in allaḏī and variants so that we can implement elision of
-- a- after a preceding vowel, after which we remove the marker.
local alladi_marker = U(0xFFF0)
local tt = {
-- consonants
["ب"]="b", ["ت"]="t", ["ث"]="ṯ", ["ج"]="j", ["ح"]="ḥ", ["خ"]="ḵ",
["د"]="d", ["ذ"]="ḏ", ["ر"]="r", ["ز"]="z", ["س"]="s", ["ش"]="š",
["ص"]="ṣ", ["ض"]="ḍ", ["ط"]="ṭ", ["ظ"]="ẓ", ["ع"]="ʕ", ["غ"]="ḡ",
["ف"]="f", ["ق"]="q", ["ك"]="k", ["ڪ"]="k", ["ل"]="l", ["م"]="m", ["ن"]="n",
["ه"]="h",
-- tāʾ marbūṭa (special) - always after a fátḥa (a), silent at the end of
-- an utterance, "t" in ʾiḍāfa or with pronounced tanwīn. We catch
-- most instances of tāʾ marbūṭa before we get to this stage.
[taa_marbuuTa]="t", -- tāʾ marbūṭa = ة
-- control characters
[zwnj]="-", -- ZWNJ (zero-width non-joiner)
-- [zwj]="", -- ZWJ (zero-width joiner)
-- rare letters
["پ"]="p", ["چ"]="č", ["ژ"]="ž", ["ڤ"]="v", ["ڥ"]="v", ["گ"]="g",
["ڨ"]="g", ["ڧ"]="q", ["ڢ"]="f", ["ں"]="n", ["ڭ"]="g",
-- semivowels or long vowels, alif, hamza, special letters
["ا"]="ā", -- ʾalif
-- hamzated letters
["أ"]="ʔ", -- hamza over alif
[alif_hamza_below]="ʔ", -- hamza under alif
["ؤ"]="ʔ", -- hamza over wāw
["ئ"]="ʔ", -- hamza over yā
["ء"]="ʔ", -- hamza on the line
-- long vowels
[waaw]="w", --"ū" after ḍamma (u) and not before diacritic
[yaa]="y", --"ī" after kasra (i) and not before diacritic
[alif_maqSuura]="ā", -- ʾalif maqṣūra
[alif_madda]="ʔā", -- ʾalif madda
[alif_waSl]= "", -- hamzatu l-waṣl
[dagger_alif] = "ā", -- ʾalif xanjariyya = dagger ʾalif (Koranic diacritic)
-- short vowels, šádda and sukūn
[fatHataan]="an", -- fatḥatan
[Dammataan]="un", -- ḍammatan
[kasrataan]="in", -- kasratan
[fatHa]="a", -- fatḥa
[Damma]="u", -- ḍamma
[kasra]="i", -- kasra
-- šadda - doubled consonant
[sukuun]="", --sukūn - no vowel
-- ligatures
["ﻻ"]="lā",
["ﷲ"]="llāh",
-- taṭwīl
["ـ"]="", -- taṭwīl, no sound
-- numerals
["١"]="1", ["٢"]="2", ["٣"]="3", ["٤"]="4", ["٥"]="5",
["٦"]="6", ["٧"]="7", ["٨"]="8", ["٩"]="9", ["٠"]="0",
-- punctuation (leave on separate lines)
["؟"]="?", -- question mark
["«"]='“', -- quotation mark
["»"]='”', -- quotation mark
["٫"]=".", -- decimal point
["٬"]=",", -- thousands separator
["٪"]="%", -- percent sign
["،"]=",", -- comma
["؛"]=";" -- semicolon
}
local sun_letters = "تثدذرزسشصضطظلن"
-- For use in implementing sun-letter assimilation of ال (al-)
local ttsun1 = {}
local ttsun2 = {}
local ttsun3 = {}
for cp in gcodepoint(sun_letters) do
local ch = U(cp)
ttsun1[ch] = tt[ch]
ttsun2["l-" .. ch] = tt[ch] .. "-" .. ch
table.insert(ttsun3, tt[ch])
end
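-- Illustrative sketch only (not used by the module): what the ttsun2 entries built in the loop above look like and
-- how such a table can drive a gsub replacement. The helper name, the two-letter sample and the byte-range pattern
-- are assumptions made for this example; the module iterates with gcodepoint() and matches with its own
-- Unicode-aware rsub().
local function example_build_sun_table(letters, translit)
	local by_prefix = {}
	-- Split the UTF-8 string into characters (one lead byte plus its continuation bytes).
	for ch in letters:gmatch("[\1-\127\194-\244][\128-\191]*") do
		-- Key: "l-" plus the Arabic letter. Value: the letter's transliteration, a dash, then the letter itself.
		by_prefix["l-" .. ch] = translit[ch] .. "-" .. ch
	end
	return by_prefix
end
-- Usage: with tbl = example_build_sun_table("تش", {["ت"] = "t", ["ش"] = "š"}), the call
-- ("al-شمس"):gsub("l%-[\216-\219][\128-\191]", tbl) replaces "l-ش" with "š-ش", so the lām of the article is
-- replaced by the transliterated sun letter. Keys missing from the table leave the match unchanged.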
-- For use in implementing elision of al-
local sun_letters_tr = table.concat(ttsun3, "")
local consonants_needing_vowels = "بتثجحخدذرزسشصضطظعغفقكڪلمنهپچژڤگڨڧڢںڭأإؤئءةﷲ"
-- consonants on the right side; includes alif madda
local rconsonants = consonants_needing_vowels .. "ويآ"
-- consonants on the left side; does not include alif madda
local lconsonants = consonants_needing_vowels .. "وي"
-- Arabic semicolon, comma, question mark; taṭwīl; period, exclamation point,
-- single quote for bold/italic, double quotes for quoted material
local punctuation = "؟،؛" .. "ـ" .. ".!'" .. '"'
local space_like = "%s'" .. '"'
local space_like_class = "[" .. space_like .. "]"
local numbers = "١٢٣٤٥٦٧٨٩٠"
local before_diacritic_checking_subs = {
------------ transformations prior to checking for diacritics --------------
-- random Koranic marks and presentation forms
{U(0x06E1), sukuun}, -- "Small High Dotless Head of Khah" (variant of sukūn)
{U(0x06DA), ""}, -- "Small High Jeem"
{U(0x06DF), ""}, -- "Small High Rounded Zero" (FIXME: correct?)
{U(0x08F0), U(0x64B)}, -- "Open Fathatan"
{U(0x08F1), U(0x64C)}, -- "Open Dammatan"
{U(0x08F2), U(0x64D)}, -- "Open Kasratan"
{U(0x06E4), ""}, -- "Small High Madda" (FIXME: correct?)
{U(0x06D6), ""}, -- "Small High Ligature Sad with Lam with Alef Maksura" (FIXME: there are others we need to do)
{U(0x06E5), "و"},
{U(0x06E6), "ي"},
-- convert llh for allāh into ll+shadda+dagger-alif+h
{"لله", "للّٰه"},
-- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
-- replaced with short-vowel+shadda during NFC normalisation, which
-- MediaWiki does for all Unicode strings; however, it makes the
-- transliteration process inconvenient, so undo it.
{"([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. dagger_alif .. "])" .. shadda, shadda .. "%1"},
-- ignore Koranic gemination at beginning of word due to assimilation of preceding consonant
{" ([" .. lconsonants .. "])" .. shadda, " %1"},
-- ignore alif jamīla (otiose alif in 3pl verb forms)
-- #1: handle ḍamma + wāw + alif (final -ū)
{Damma .. waaw .. alif, Damma .. waaw},
-- #2: handle wāw + sukūn + alif (final -w in -aw in defective verbs)
-- this must go before the generation of w, which removes the waw here.
{waaw .. sukuun .. alif, waaw .. sukuun},
-- ignore final alif or alif maqṣūra following fatḥatan (e.g. in accusative
-- singular or words like عَصًا "stick" or هُدًى "guidance"; this is called
-- tanwin nasb)
{fatHataan .. "[" .. alif .. alif_maqSuura .. "]", fatHataan},
-- same but with the fatḥatan placed over the alif or alif maqṣūra
-- instead of over the previous letter (considered a misspelling but
-- common)
{"[" .. alif .. alif_maqSuura .. "]" .. fatHataan, fatHataan},
-- tāʾ marbūṭa should always be preceded by fatḥa, alif, alif madda or
-- dagger alif; infer fatḥa if not
{"([^" .. fatHa .. alif .. alif_madda .. dagger_alif .. "])" .. taa_marbuuTa, "%1" .. fatHa .. taa_marbuuTa},
-- similarly for alif between consonants, possibly marked with shadda
-- (does not apply to initial alif, which is silent when not marked with
-- hamza, or final alif, which might be pronounced as -an)
{"([" .. lconsonants .. "]" .. shadda .. "?)" .. alif .. "([" .. rconsonants .. "])",
"%1" .. fatHa .. alif .. "%2"},
-- infer fatḥa in case of non-fatḥa + alif/alif-maqṣūra + dagger alif
{"([^" .. fatHa .. "])([" .. alif .. alif_maqSuura .. "]" .. dagger_alif .. ")", "%1" .. fatHa .. "%2"},
-- infer kasra when hamza-under-alif is not followed by kasra or kasratan
{alif_hamza_below .. "([^" .. kasra .. kasrataan .. "])", alif_hamza_below .. kasra .. "%1"},
-- ignore dagger alif placed over regular alif or alif maqṣūra
{"([" .. alif .. alif_maqSuura .. "])" .. dagger_alif, "%1"},
----------- rest of these concern definite article alif-lām ----------
-- in kasra/ḍamma + alif + lam, make alif into hamzatu l-waṣl, so we
-- handle cases like بِالتَّوْفِيق (bi-t-tawfīq) correctly
{"([" .. Damma .. kasra .. "])" .. alif .. laam, "%1" .. alif_waSl .. laam},
-- al + consonant + shadda (only recognize word-initially if regular alif): remove shadda
{"^(" .. alif .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
{"(" .. space_like_class .. alif .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
{"(" .. alif_waSl .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
-- handle l- hamzatu l-waṣl or word-initial al-
{"^" .. alif .. fatHa .. "?" .. laam, "al-"},
{"(" .. space_like_class .. ")" .. alif .. fatHa .. "?" .. laam, "%1al-"},
-- next one for bi-t-tawfīq
{"([" .. Damma .. kasra .. "])" .. alif_waSl .. fatHa .. "?" .. laam, "%1-l-"},
-- next one for remaining hamzatu l-waṣl (at beginning of word)
{alif_waSl .. fatHa .. "?" .. laam, "l-"},
-- special casing if the l in al- has a shadda on it (as in الَّذِي "that"),
-- so we don't mistakenly double the dash; insert a special marker here so
-- that we know later to elide the a- after a vowel
{"l%-" .. shadda, "l" .. alladi_marker .. "l"},
-- implement assimilation of sun letters
{"l%-[" .. sun_letters .. "]", ttsun2},
}
-- Transliterate the word(s) in TEXT. LANG (the language) and SC (the script)
-- are ignored. OMIT_I3RAAB means leave out final short vowels (ʾiʿrāb).
-- GRAY_I3RAAB means render the transliterated final short vowels (ʾiʿrāb) in gray.
-- FORCE_TRANSLIT causes even non-vocalized text to be transliterated
-- (normally the function checks for non-vocalized text and returns nil,
-- since such text is ambiguous in transliteration).
function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit)
-- make it possible to call this function from a template; note that
-- gray_i3raab cannot be set this way (template argument 5 maps to force_translit)
if type(text) == "table" then
local function f(x) return (x ~= "") and x or nil end
text, lang, sc, omit_i3raab, force_translit =
f(text.args[1]), f(text.args[2]), f(text.args[3]), f(text.args[4]), f(text.args[5])
end
for _, sub in ipairs(before_diacritic_checking_subs) do
text = rsub(text, sub[1], sub[2])
end
if not force_translit and not has_diacritics(text) then
require("Module:debug").track("ar-translit/lacking diacritics")
return nil
end
------------ transformations after checking for diacritics --------------
-- Replace plain alif with hamzatu l-waṣl when followed by fatḥa/ḍamma/kasra.
-- Must go after handling of initial al-, which distinguishes alif-fatḥa
-- from alif w/hamzatu l-waṣl. Must go before generation of ū and ī, which
-- eliminate the ḍamma/kasra.
text = rsub(text, alif .. "([" .. fatHa .. Damma .. kasra .. "])", alif_waSl .. "%1")
-- ḍamma + waw not followed by a diacritic is ū, otherwise w
text = rsub(text, Damma .. waaw .. "([^" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. shadda .. sukuun .. dagger_alif .. "])", "ū%1")
text = rsub(text, Damma .. waaw .. "$", "ū")
-- kasra + yaa not followed by a diacritic (or ū from prev step) is ī, otherwise y
text = rsub(text, kasra .. yaa .. "([^" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. shadda .. sukuun .. dagger_alif .. "ū])", "ī%1")
text = rsub(text, kasra .. yaa .. "$", "ī")
-- convert shadda to double letter.
text = rsub(text, "(.)" .. shadda, "%1%1")
if not omit_i3raab and gray_i3raab then -- show ʾiʿrāb grayed in transliteration
-- decide whether to gray out the t in ﺓ. If word begins with al- or l-, yes.
-- Otherwise, no if word ends in a/i/u, yes if ends in an/in/un.
text = rsub(text, "^(a?l%-[^%s]+)" .. taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. "])",
'%1<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%2')
text = rsub(text, "(" .. space_like_class .. "a?l%-[^%s]+)" .. taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. "])",
'%1<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%2')
text = rsub(text, taa_marbuuTa .. "([" .. fatHa .. Damma .. kasra .. "])", "t%1")
text = rsub(text, taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. "])",
'<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%1')
text = rsub(text, ".", {
[fatHataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">an</span>',
[kasrataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">in</span>',
[Dammataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">un</span>'
})
text = rsub(text, "([" .. fatHa .. Damma .. kasra .. "])(" .. space_like_class .. ")",
function(vowel, space)
local vowel_repl = {
[fatHa] = '<span style="color: var(--wikt-palette-grey-8,#888)">a</span> ',
[kasra] = '<span style="color: var(--wikt-palette-grey-8,#888)">i</span> ',
[Damma] = '<span style="color: var(--wikt-palette-grey-8,#888)">u</span> '
}
return vowel_repl[vowel] .. space
end
)
text = rsub(text, "[" .. fatHa .. Damma .. kasra .. "]$", {
[fatHa] = '<span style="color: var(--wikt-palette-grey-8,#888)">a</span>',
[kasra] = '<span style="color: var(--wikt-palette-grey-8,#888)">i</span>',
[Damma] = '<span style="color: var(--wikt-palette-grey-8,#888)">u</span>'
})
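-- merge adjacent gray spans; "(", ")" and "-" are magic in Lua patterns and must be escaped
text = rsub(text, '</span><span style="color: var%(%-%-wikt%-palette%-grey%-8,#888%)">', "")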
elseif omit_i3raab then -- omit ʾiʿrāb in transliteration
text = rsub(text, "[" .. fatHataan .. Dammataan .. kasrataan .. "]", "")
text = rsub(text, "[" .. fatHa .. Damma .. kasra .. "](" .. space_like_class .. ")", "%1")
text = rsub(text, "[" .. fatHa .. Damma .. kasra .. "]$", "")
end
-- tāʾ marbūṭa should not be rendered by -t if word-final even when
-- ʾiʿrāb (desinential inflection) is shown; instead, use (t) before
-- whitespace, nothing when final; but render final -ﺍﺓ and -ﺁﺓ as -āh,
-- consistent with Wehr's dictionary
-- Left-to-right or right-to-left mark at end of text will prevent tāʾ marbūṭa
-- from being transliterated correctly.
text = string.gsub(text, lrm, "")
text = string.gsub(text, rlm, "")
text = rsub(text, "([" .. alif .. alif_madda .. "])" .. taa_marbuuTa .. "$", "%1h")
-- Ignore final tāʾ marbūṭa (it appears as "a" due to the preceding
-- short vowel). Need to do this after graying or omitting word-final
-- ʾiʿrāb.
text = rsub(text, taa_marbuuTa .. "$", "")
text = rsub(text, taa_marbuuTa .. "(%p)", "%1")
if not omit_i3raab then -- show ʾiʿrāb in transliteration
text = rsub(text, taa_marbuuTa .. "(" .. space_like_class .. ")", "(t)%1")
else
-- When omitting ʾiʿrāb, show all non-absolutely-final instances of
-- tāʾ marbūṭa as (t), with trailing ʾiʿrāb omitted.
text = rsub(text, taa_marbuuTa, "(t)")
end
-- tatwīl should be rendered as - at beginning or end of word. It will
-- be rendered as nothing in the middle of a word (FIXME, do we want
-- this?)
text = rsub(text, "^ـ", "-")
text = rsub(text, "(" .. space_like_class .. ")ـ",
"%1-")
text = rsub(text, "ـ$", "-")
text = rsub(text, "ـ(" .. space_like_class .. ")", "-%1")
-- Now convert remaining Arabic chars according to table.
text = rsub(text, ".", tt)
text = rsub(text, "aā", "ā")
-- Implement elision of al- after a final vowel. We do this
-- conservatively, only handling elision of the definite article and related
-- terms (specifically, relative pronoun الَّذِي (allaḏī) and variants) rather
-- than elision in other cases of hamzat al-waṣl (e.g. form-I imperatives
-- or form-VII and above verbal nouns) partly because elision in
-- these cases isn't so common in MSA and partly to avoid excessive
-- elision in case of words written with initial bare alif instead of
-- properly with hamzated alif. Possibly we should reconsider.
text = rsub(text, "([aiuāīū]'* +'*)a([" .. sun_letters_tr .. "][%-" .. alladi_marker .. "])",
"%1%2")
if gray_i3raab then
text = rsub(text, "([aiuāīū]'*</span>'* +'*)a([" .. sun_letters_tr .. "][%-" .. alladi_marker .. "])",
"%1%2")
end
-- remove indicator of allaḏī, which has served its purpose
text = rsub(text, alladi_marker, "")
-- Special-case the transliteration of allāh, without the hyphen.
text = rsub(text, "^(a?)l%-lāh", "%1llāh")
text = rsub(text, "(" .. space_like_class .. "a?)l%-lāh", "%1llāh")
-- Compress multiple spaces, which may occur e.g. when removing Koranic diacritics.
text = rsub(text, "(%s)%s+", "%1")
return text
end
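--[==[
The gray-ʾiʿrāb branch of export.tr() joins neighbouring gray spans by deleting the
"</span><span ...>" sequence between them. The span markup contains "(", ")" and "-",
which are magic characters in Lua patterns, so a literal match needs escaping. The
following is a minimal standalone sketch of that idea in plain Lua; `pattern_escape`,
`gray` and `merge_adjacent_gray` are hypothetical helpers written for this example and
are not part of the module, which uses its own rsub() wrapper.

```lua
local GRAY_OPEN = '<span style="color: var(--wikt-palette-grey-8,#888)">'
local GRAY_CLOSE = "</span>"

-- Prefix every non-alphanumeric character with % so it matches literally.
local function pattern_escape(s)
	return (s:gsub("%W", "%%%0"))
end

-- Wrap a piece of transliteration in the gray span markup.
local function gray(s)
	return GRAY_OPEN .. s .. GRAY_CLOSE
end

-- Collapse "</span><span ...>" between two neighbouring gray spans.
local function merge_adjacent_gray(text)
	return (text:gsub(pattern_escape(GRAY_CLOSE .. GRAY_OPEN), ""))
end

local merged = merge_adjacent_gray("madrasa" .. gray("t") .. gray("un"))
```

Here `merged` holds a single gray span containing "tun". Passing the unescaped markup as
a pattern would not match, because "(" would be read as the start of a capture.
]==]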
local has_diacritics_subs = {
-- FIXME! What about lam-alif ligature?
-- remove punctuation and shadda
-- must go before removing final consonants
{"[" .. punctuation .. shadda .. "]", ""},
-- Remove consonants at end of word or utterance, so that we're OK with
-- words lacking iʿrāb (must go before removing other consonants).
-- If you want to catch places without iʿrāb, comment out the next two lines.
{"[" .. lconsonants .. "]$", ""},
{"[" .. lconsonants .. "]([%)%]}]?" .. space_like_class .. ")", "%1"},
-- remove consonants (or alif) when followed by diacritics
-- must go after removing shadda
-- do not remove the diacritics yet because we need them to handle
-- long-vowel sequences of diacritic + pseudo-consonant
{"[" .. lconsonants .. alif .. "]([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. sukuun .. dagger_alif .. "])", "%1"},
-- the following two must go after removing consonants w/diacritics because
-- we only want to treat vocalic wāw/yā' in them (we want to have removed
-- wāw/yā' followed by a diacritic)
-- remove ḍamma + wāw
{Damma .. waaw, ""},
-- remove kasra + yā'
{kasra .. yaa, ""},
-- remove fatḥa/fatḥatan + alif/alif-maqṣūra
{"[" .. fatHataan .. fatHa .. "][" .. alif .. alif_maqSuura .. "]", ""},
-- remove diacritics
{"[" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. sukuun .. dagger_alif .. "]", ""},
-- remove numbers, hamzatu l-waṣl, alif madda
{"[" .. numbers .. "ٱ" .. "آ" .. "]", ""},
-- remove non-Arabic characters
{"[^" .. U(0x0600) .. "-" .. U(0x06FF) .. U(0x0750) .. "-" .. U(0x077F) ..
U(0x08A0) .. "-" .. U(0x08FF) .. U(0xFB50) .. "-" .. U(0xFDFF) ..
U(0xFE70) .. "-" .. U(0xFEFF) .. "]", ""}
}
-- declared as local above
function has_diacritics(text)
local orig_text = text
local count
text, count = rsubn(text, "[" .. lrm .. rlm .. "]", "")
if count > 0 then
require("Module:debug").track("ar-translit/lrm or rlm")
end
for _, sub in ipairs(has_diacritics_subs) do
text = rsub(text, unpack(sub))
end
if #text > 0 then
mw.log(("Check for missing diacritics failed; original text '%s', text without diacritics '%s'"):format(
orig_text, text))
end
return #text == 0
end
-- Return true if transliteration TR is an irregular transliteration of
-- ARABIC. Return false if ARABIC can't be transliterated. For purposes of
-- establishing regularity, hyphens are ignored and word-final tāʾ marbūṭa
-- can be transliterated as "(t)", "" or "t".
function export.irregular_translit(arabic, tr)
if not arabic or arabic == "" or not tr or tr == "" then
return false
end
local regtr = export.tr(arabic)
if not regtr or regtr == tr then
return false
end
local arwords = rsplit(arabic, " ")
local regwords = rsplit(regtr, " ")
local words = rsplit(tr, " ")
if #regwords ~= #words or #regwords ~= #arwords then
return true
end
for i=1,#regwords do
local regword = regwords[i]
local word = words[i]
local arword = arwords[i]
-- Resolve final (t) in auto-translit to t, h or nothing
if rfind(regword, "%(t%)$") then
regword = rfind(word, "āh$") and rsub(regword, "%(t%)$", "h") or
rfind(word, "t$") and rsub(regword, "%(t%)$", "t") or
rsub(regword, "%(t%)$", "")
end
-- Resolve clitics + short a + alif-lām, which may get auto-transliterated
-- to contain long ā, to short a if the manual translit has it; note
-- that currently in cases with assimilated l, the auto-translit will
-- fail, so we won't ever get here and don't have to worry about
-- auto-translit l against manual-translit assimilated char.
local clitic_chars = "^[وفكل]" -- separate line to avoid L2R display weirdness
if rfind(arword, clitic_chars .. fatHa .. "?[" .. alif .. alif_waSl .. "]" .. laam) and rfind(word, "^[wfkl]a%-") then
regword = rsub(regword, "^([wfkl])ā", "%1a")
end
-- Ignore hyphens when comparing
if rsub(regword, "%-", "") ~= rsub(word, "%-", "") then
return true
end
end
return false
end
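--[==[
The comparison in export.irregular_translit() treats a final "(t)" in the automatic
transliteration as matching "h", "t" or nothing in the manual one, and ignores hyphens.
The sketch below restates just those two steps in plain Lua so they can be tried in
isolation. `resolve_final_t` and `same_ignoring_hyphens` are hypothetical names used
only for this example, and the sample words are illustrative inputs, not module output.

```lua
-- Rewrite a trailing "(t)" in the automatic translit to whatever the manual
-- translit ends in: "h" after "ā", "t", or nothing.
local function resolve_final_t(regword, word)
	if not regword:find("%(t%)$") then
		return regword
	end
	local repl = word:find("āh$") and "h" or word:find("t$") and "t" or ""
	return (regword:gsub("%(t%)$", repl))
end

-- Compare two transliterations with all hyphens removed.
local function same_ignoring_hyphens(a, b)
	return a:gsub("%-", "") == b:gsub("%-", "")
end

local resolved = resolve_final_t("madrasa(t)", "madrasat")
```

A word pair counts as regular when `same_ignoring_hyphens(resolve_final_t(auto, manual), manual)`
is true.
]==]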
return export
eevhmsd5a9n05vuzlsjd3c1063cl6fb
Module:ar-headword
828
8168
27703
2026-06-21T14:20:28Z
Umarxon III
2840
Created page with content: '-- Author: primarily Benwing2; some work by Fenakhay, Erutuon; early version by Rua local export = {} local pos_functions = {} local force_cat = false -- for testing; if true, categories appear in non-mainspace pages local ar_translit = require("Module:ar-translit") local ar_verb_module = "Module:ar-verb" local ar_utilities_module = "Module:ar-utilities" local ar = require(ar_utilities_module) local en_utilities_module = "Module:en-utilities" local headword_mo...'
27703
Scribunto
text/plain
-- Author: primarily Benwing2; some work by Fenakhay, Erutuon; early version by Rua
local export = {}
local pos_functions = {}
local force_cat = false -- for testing; if true, categories appear in non-mainspace pages
local ar_translit = require("Module:ar-translit")
local ar_verb_module = "Module:ar-verb"
local ar_utilities_module = "Module:ar-utilities"
local ar = require(ar_utilities_module)
local en_utilities_module = "Module:en-utilities"
local headword_module = "Module:headword"
local headword_utilities_module = "Module:headword utilities"
local links_module = "Module:links"
local inflection_utilities_module = "Module:inflection utilities"
local parse_utilities_module = "Module:parse utilities"
local require_when_needed = require("Module:utilities/require when needed")
local remove_links = require_when_needed(links_module, "remove_links")
local m_table = require("Module:table")
local m_str_utils = require("Module:string utilities")
local m_en_utilities = require_when_needed(en_utilities_module)
local m_headword_utilities = require_when_needed(headword_utilities_module)
local glossary_link = require_when_needed(headword_utilities_module, "glossary_link")
local boolean_param = {type = "boolean"}
local list_to_set = m_table.listToSet
local rfind = m_str_utils.find
local rmatch = m_str_utils.match
local rsubn = m_str_utils.gsub
local u = m_str_utils.char
local rsplit = m_str_utils.split
local insert = table.insert
local concat = table.concat
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local langcode = "ar"
local lang = require("Module:languages").getByCode(langcode)
local langname = lang:getCanonicalName()
local TEMPCOMMA = u(0xFFF0)
local TEMPARCOMMA = u(0xFFF1)
local misc_pos_with_gender = list_to_set {
"suffixes",
"adjective forms",
"noun forms",
"proper noun forms",
"pronoun forms",
"determiner forms",
}
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
local dump = mw.dumpObject
-- version of mw.ustring.gsub() that discards all but the first return value
local function rsub(term, foo, bar)
local retval = rsubn(term, foo, bar)
return retval
end
local function ine(val)
if val == "" then return nil else return val end
end
-- Replace comma with a temporary char in comma + whitespace.
local function escape_comma_whitespace(run)
local escaped = false
if run:find("\\,") then
run = run:gsub("\\,", "\\" .. TEMPCOMMA)
escaped = true
end
if run:find("\\،") then
run = run:gsub("\\،", "\\" .. TEMPARCOMMA)
escaped = true
end
if run:find(",%s") then
run = run:gsub(",(%s)", TEMPCOMMA .. "%1")
escaped = true
end
if run:find("،%s") then
run = run:gsub("،(%s)", TEMPARCOMMA .. "%1")
escaped = true
end
return run, escaped
end
-- Undo replacement of comma with a temporary char in comma + whitespace.
local function unescape_comma_whitespace(run)
return (run:gsub(TEMPCOMMA, ","):gsub(TEMPARCOMMA, "،"))
end
-- Split an argument on comma or Arabic comma, but not either type of comma followed by whitespace.
local function split_on_comma(val)
if rfind(val, "[,،]%s") or val:find("\\") then
return require(parse_utilities_module).split_escaping(val, "[,،]", false, escape_comma_whitespace, unescape_comma_whitespace)
else
return rsplit(val, "[,،]")
end
end
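--[==[
The three helpers above implement one idea: hide every "comma + whitespace" behind a
placeholder character, split on the remaining bare commas, then restore the hidden
commas. The following is a simplified standalone sketch of that round trip in plain Lua.
It handles only the ASCII comma (no Arabic comma, no backslash escapes), writes the
U+FFF0 placeholder as raw UTF-8 bytes, and uses a hand-written splitter in place of the
module's rsplit; all of that is an assumption made for illustration.

```lua
-- U+FFF0 encoded as UTF-8; stands in for the module's TEMPCOMMA.
local TEMPCOMMA = "\239\191\176"

local function escape_comma_whitespace(run)
	return (run:gsub(",(%s)", TEMPCOMMA .. "%1"))
end

local function unescape_comma_whitespace(run)
	return (run:gsub(TEMPCOMMA, ","))
end

-- Split on bare commas only; "comma + space" survives inside a piece.
local function split_on_comma(val)
	local pieces = {}
	for piece in (escape_comma_whitespace(val) .. ","):gmatch("(.-),") do
		table.insert(pieces, unescape_comma_whitespace(piece))
	end
	return pieces
end

local parts = split_on_comma("a, b,c")
```

With this input, `parts` has two elements: "a, b" and "c".
]==]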
local function replace_tr_ending(tr, from, to)
if not tr then
return nil
end
local pref = tr:match("^(.*)" .. from .. "$")
if not pref then
error(("Translit '%s' does not end in -%s, as expected"):format(tr, from))
end
return pref .. to
end
-----------------------------------------------------------------------------------------
-- Tracking functions --
-----------------------------------------------------------------------------------------
local trackfn = require("Module:debug/track")
local function track(page)
trackfn(langcode .. "-headword/" .. page)
return true
end
--[==[
Examples of what you can find by looking at what links to the given
pages:
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized]]
all unvocalized pages
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/pl]]
all unvocalized pages where the plural is unvocalized,
whether specified using pl=, pl2=, etc.
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head]]
all unvocalized pages where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/noun]]
all nouns excluding proper nouns, collective nouns,
singulative nouns where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/proper noun]]
all proper nouns where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/not proper noun]]
all words that are not proper nouns
where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/adjective]]
all adjectives where any parameter is unvocalized;
currently only works for heads,
so equivalent to .../unvocalized/head/adjective
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-empty-head]]
all pages with an empty head
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-manual-translit]]
all unvocalized pages with manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-manual-translit/head/noun]]
all nouns where the head is unvocalized but has manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-no-translit]]
all unvocalized pages without manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab]]
all pages with any parameter containing i3rab
of either -un, -u, -a or -i
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-un]]
all pages with any parameter containing an -un i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-un/pl]]
all pages where a form specified using pl=, pl2=, etc.
contains an -un i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-u/head]]
all pages with a head containing an -u i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab/head/proper noun]]
all proper nouns with a head containing i3rab
of either -un, -u, -a or -i
In general, the format is one of the following:
Wiktionary:Tracking/ar-headword/FIRSTLEVEL
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/ARGNAME
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/POS
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/ARGNAME/POS
FIRSTLEVEL can be one of "unvocalized", "unvocalized-empty-head" or its
opposite "unvocalized-specified", "unvocalized-manual-translit" or its
opposite "unvocalized-no-translit", "i3rab", "i3rab-un", "i3rab-u",
"i3rab-a", or "i3rab-i".
ARGNAME is either "head" or an argument such as "pl", "f", "cons", etc.
This automatically includes arguments specified as head2=, pl3=, etc.
POS is a part of speech, lowercase and singular, e.g. "noun",
"adjective", "proper noun", "collective noun", etc. or
"not proper noun", which includes all parts of speech but proper nouns.
]==]
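--[==[
As a hedged illustration of the naming scheme described above, the following standalone
sketch lists the page names that would be recorded for one FIRSTLEVEL/ARGNAME/POS
combination. `tracking_pages` is a hypothetical helper written for this example; it
follows the same branching as the nested dotrack() in track_form() but returns the
names instead of calling the tracking module.

```lua
-- Hypothetical helper: returns the tracking page names for one event.
local function tracking_pages(firstlevel, argname, pos)
	local base = "ar-headword/" .. firstlevel
	local pages = {base, base .. "/" .. argname}
	if pos then
		table.insert(pages, base .. "/" .. pos)
		table.insert(pages, base .. "/" .. argname .. "/" .. pos)
		-- Everything except proper nouns is also filed under "not proper noun".
		if pos ~= "proper noun" then
			table.insert(pages, base .. "/not proper noun")
			table.insert(pages, base .. "/" .. argname .. "/not proper noun")
		end
	end
	return pages
end

local pages = tracking_pages("unvocalized", "pl", "noun")
```

For a common noun this yields six names, for a proper noun four, and two when no part
of speech is supplied.
]==]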
local function track_form(argname, form, translit, pos)
form = ar.reorder_shadda(remove_links(form))
local function dotrack(page)
track(page)
track(page .. "/" .. argname)
if pos then
track(page .. "/" .. pos)
track(page .. "/" .. argname .. "/" .. pos)
if pos ~= "proper noun" then
track(page .. "/not proper noun")
track(page .. "/" .. argname .. "/not proper noun")
end
end
end
local function track_i3rab(arabic, tr)
if rfind(form, arabic .. "$") then
dotrack("i3rab")
dotrack("i3rab-" .. tr)
end
end
track_i3rab(ar.UN, "un")
track_i3rab(ar.U, "u")
track_i3rab(ar.A, "a")
track_i3rab(ar.I, "i")
if form == "" or not (lang:transliterate(form)) then
dotrack("unvocalized")
if form == "" then
dotrack("unvocalized-empty-head")
else
dotrack("unvocalized-specified")
end
if translit then
dotrack("unvocalized-manual-translit")
else
dotrack("unvocalized-no-translit")
end
end
end
-----------------------------------------------------------------------------------------
-- Inflection-parsing functions --
-----------------------------------------------------------------------------------------
-- Construct the default construct state or informal form of a term in lemma format. Usually this is the same as the
-- lemma but is different for final-weak nouns and adjectives ending in -n in their lemma. NOTE: Input must be
-- shadda-reordered for this to work properly.
local function default_construct_state_or_informal(term, tr)
local pref = term:match("^(.*)" .. ar.HAMZA .. ar.IN .."$")
-- Hamza on the line with -in changes to hamza-on-yā with -ī.
if pref then
return pref .. ar.HAMZA_ON_YA .. ar.II, replace_tr_ending(tr, "in", "ī")
end
-- Otherwise just change -in to -ī.
pref = term:match("^(.*)" .. ar.IN .. "$")
if pref then
return pref .. ar.II, replace_tr_ending(tr, "in", "ī")
end
-- Change -an with alif maqṣūra to -ā with alif maqṣūra.
pref = term:match("^(.*)" .. ar.AN .. ar.AMAQ .. "$")
if pref then
return pref .. ar.AAMAQ, replace_tr_ending(tr, "an", "ā")
end
-- Change -an with tall alif (e.g. عَصًا) to -ā with tall alif.
pref = term:match("^(.*)" .. ar.AN .. ar.ALIF .. "$")
if pref then
return pref .. ar.AA, replace_tr_ending(tr, "an", "ā")
end
return term, tr
end
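--[==[
The transliteration half of the rule above can be tried on its own. The sketch below
copies the contract of replace_tr_ending() and pairs it with `default_construct_tr`, a
hypothetical translit-only simplification: the real function decides from the Arabic
spelling, whereas this example looks only at the Latin ending, which is an assumption
made to keep it runnable in plain Lua.

```lua
-- Same contract as the module's replace_tr_ending: nil passes through,
-- and a translit that lacks the expected ending raises an error.
local function replace_tr_ending(tr, from, to)
	if not tr then
		return nil
	end
	local pref = tr:match("^(.*)" .. from .. "$")
	if not pref then
		error(("Translit '%s' does not end in -%s, as expected"):format(tr, from))
	end
	return pref .. to
end

-- Hypothetical simplification: -in becomes -ī, -an becomes -ā, anything else is kept.
local function default_construct_tr(tr)
	if tr:match("in$") then
		return replace_tr_ending(tr, "in", "ī")
	elseif tr:match("an$") then
		return replace_tr_ending(tr, "an", "ā")
	end
	return tr
end

local cons = default_construct_tr("qāḍin")
```

A term whose transliteration ends in neither -in nor -an comes back unchanged, which is
the case where no separate construct state is displayed.
]==]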
local function generate_construct_state_or_informal_default(data, args)
local heads = data.heads
local consobjs = {}
local different_cons = false
for _, headobj in ipairs(heads) do
local consterm, constr = default_construct_state_or_informal(headobj.term, headobj.tr)
different_cons = different_cons or consterm ~= headobj.term or constr ~= headobj.tr
local consobj = m_table.shallowCopy(headobj)
consobj.term = consterm
consobj.tr = constr
insert(consobjs, consobj)
end
if different_cons then
return consobjs
else
return {}
end
end
local noun_field_cons = {
field = "cons", label = "<<construct state>>", generate_default = generate_construct_state_or_informal_default,
default_when_not_explicit = function(args, data) return true end,
}
local noun_field_inf = {field = "inf", label = "informal"}
local noun_field_obl = {field = "obl", label = "<<oblique>>"}
local noun_field_def = {field = "def", label = "<<definite>> state"}
local noun_inflections = {
noun_field_cons,
noun_field_inf,
noun_field_obl,
noun_field_def,
}
local adj_field_inf = {
field = "inf", label = "informal", generate_default = generate_construct_state_or_informal_default,
default_when_not_explicit = function(args, data) return true end,
}
local adj_field_obl = noun_field_obl
local adj_field_def = noun_field_def
local adjective_inflections = {
adj_field_inf,
adj_field_obl,
adj_field_def,
}
local function has_construct_state(data)
return data.pos_category ~= "adjectives"
end
local function parse_nominal_inflection(paramname, val, parse_err)
return m_headword_utilities.parse_term_with_modifiers {
val = val,
paramname = paramname,
splitchar = ",",
include_mods = {"tr", "g"},
}
end
local function make_nominal_inflection_param_mod_spec(paramname)
return {convert = function(val, parse_err)
return parse_nominal_inflection(paramname, val, parse_err)
end}
end
-- Parse an inflection. The raw arguments come from `args[field]`, which is parsed for inline modifiers. Multiple
-- comma-separated values are allowed.
local function parse_inflection(data, args, field, is_head)
local argfield = field
local argpref = field
if type(argfield) == "table" then
argpref = argfield[2]
argfield = argfield[1]
end
local include_mods
if is_head then
include_mods = {"tr"}
else
include_mods = {"tr", "g"}
for _, spec in ipairs(has_construct_state(data) and noun_inflections or adjective_inflections) do
insert(include_mods, {spec.field, make_nominal_inflection_param_mod_spec(argpref .. "." .. spec.field)})
end
end
if is_head then
local retval
if args[argfield] then
retval = m_headword_utilities.parse_term_with_modifiers {
val = args[argfield],
paramname = field,
splitchar = ",",
is_head = is_head,
include_mods = include_mods,
}
end
return retval or {}
else
return m_headword_utilities.parse_term_list_with_modifiers {
forms = args[argfield],
paramname = field,
splitchar = ",",
is_head = is_head,
include_mods = include_mods,
}
end
end
local function insert_inflection(data, terms, label, accel, defgender, track_field, no_label, usually_no_label)
local track_pos = m_en_utilities.singularize(data.pos_category)
for _, termobj in ipairs(terms) do
-- If the user supplied a construct state or informal form for the term with a value of "+", substitute the
-- default value for the term. If the user supplied a value of "--", they want no value displayed. Otherwise,
-- if the user didn't supply any value, we check to see if the default construct state or informal form is
-- different from the lemma and display it if so; this applies particularly to terms in '-in' and '-an', where
-- the default construct state or informal form is almost always correct.
local field = has_construct_state(data) and "cons" or "inf"
if not termobj[field] then
local defcons, defconstr = default_construct_state_or_informal(termobj.term, termobj.tr)
if termobj.term ~= defcons or termobj.tr ~= defconstr then
-- We don't want to copy qualifiers, labels, etc. from the term object because we're a subinflection of
-- the term object.
termobj[field] = {{term = defcons, tr = defconstr}}
end
elseif termobj[field][1].term == "--" then
if termobj[field][2] then
error("Can't specify more than one value for <" .. field .. ":...> if first value is '--', meaning \"don't insert anything\"")
end
termobj[field] = nil
else
for i, consobj in ipairs(termobj[field]) do
if consobj.term == "+" then
if consobj.tr then
error("Can't specify translit for default value '+'")
end
consobj.term, consobj.tr = default_construct_state_or_informal(termobj.term, termobj.tr)
elseif consobj.term == "~" then
if consobj.tr then
error("Can't specify translit for term-requesting value '~'")
end
consobj.term, consobj.tr = termobj.term, termobj.tr
end
end
end
if defgender and not termobj.genders then
termobj.genders = {{spec = defgender}}
end
local function insert_nested_inflection(field, label)
if termobj[field] then
m_headword_utilities.insert_inflection {
headdata = data,
inflobj = termobj,
terms = termobj[field],
label = label
}
end
end
for _, spec in ipairs(has_construct_state(data) and noun_inflections or adjective_inflections) do
insert_nested_inflection(spec.field, spec.label)
end
track_form(track_field, termobj.term, termobj.tr, track_pos)
end
m_headword_utilities.insert_inflection {
headdata = data,
terms = terms,
label = label,
accel = accel and {form = accel} or nil,
no_label = no_label,
usually_no_label = usually_no_label,
}
end
-----------------------------------------------------------------------------------------
-- Main entry point --
-----------------------------------------------------------------------------------------
function export.show(frame)
local iparams = {
[1] = true,
}
local iargs = require("Module:parameters").process(frame.args, iparams)
local parargs = frame:getParent().args
local poscat = iargs[1]
local pos_in_1 = not poscat
if pos_in_1 then
poscat = ine(parargs[1]) or
mw.title.getCurrentTitle().fullText == "Template:" .. langcode .. "-head" and "interjection" or
error("Part of speech must be specified in 1=")
poscat = require(headword_module).canonicalize_pos(poscat)
end
local indexing_poscat = pos_in_1 and (misc_pos_with_gender[poscat] and "head_with_gender" or "head") or poscat
local params = {
["suffix"] = boolean_param,
["nosuffix"] = boolean_param,
["id"] = true,
["json"] = boolean_param,
["pagename"] = {}, -- for testing
}
if pos_in_1 then
params[1] = {required = true} -- required but ignored as already processed above
end
local head_is_head = pos_functions[indexing_poscat] and pos_functions[indexing_poscat].head_is_not_1
local headfield = head_is_head and "head" or pos_in_1 and 2 or 1
params[headfield] = head_is_head and true or {default = "+"}
params.head2 = {replaced_by = false, instead = "use multiple comma-separated values in |" .. headfield .. "="}
local tr_replaced_by = {replaced_by = false, instead = "use <tr:...> inline modifier on |" .. headfield .. "="}
params.tr = tr_replaced_by
params.tr2 = tr_replaced_by
if pos_functions[indexing_poscat] then
for key, val in pairs(pos_functions[indexing_poscat].params()) do
params[key] = val
end
end
local args = require("Module:parameters").process(parargs, params)
local pagename = args.pagename or mw.loadData("Module:headword/data").pagename
local data = {
lang = lang,
pos_category = poscat,
orig_pos_category = poscat,
categories = {},
heads = {},
genders = {},
inflections = {enable_auto_translit = true},
pagename = pagename,
id = args.id,
sort_key = args.sort,
force_cat_output = force_cat,
-- We expect a head always so the redundant head cat will be inaccurate.
no_redundant_head_cat = true,
}
data.heads = parse_inflection(data, args, headfield, "is_head")
for _, headobj in ipairs(data.heads) do
if headobj.term == "+" then
headobj.term = pagename
end
end
data.is_suffix = false
if args.suffix or (
not args.nosuffix and pagename:find("^%-") and poscat ~= "suffixes" and poscat ~= "suffix forms"
) then
data.is_suffix = true
data.pos_category = "suffixes"
local singular_poscat = m_en_utilities.singularize(poscat)
insert(data.categories, langname .. " " .. singular_poscat .. "-forming suffixes")
insert(data.inflections, {label = singular_poscat .. "-forming suffix"})
end
if pos_functions[indexing_poscat] then
pos_functions[indexing_poscat].func(data, args)
end
-- Do this after calling pos_functions[poscat].func() as it may modify data.heads (as verbs do).
local irreg_translit = false
for _, head in ipairs(data.heads) do
if ar_translit.irregular_translit(head.term, head.tr) then
irreg_translit = true
break
end
end
if irreg_translit then
insert(data.categories, langname .. " terms with irregular pronunciations")
end
if args.json then
return require("Module:JSON").toJSON(data)
end
return require(headword_module).full_headword(data)
end
-----------------------------------------------------------------------------------------
-- Gender handling --
-----------------------------------------------------------------------------------------
local valid_bare_genders = {false, "m", "f", "mf", "mfbysense", "mfequiv"}
local valid_bare_numbers = {false, "d", "p"}
local valid_bare_animacies = {false, "pr", "np"}
local valid_genders = {}
for _, gender in ipairs(valid_bare_genders) do
for _, number in ipairs(valid_bare_numbers) do
for _, animacy in ipairs(valid_bare_animacies) do
local parts = {}
local function ins_part(part)
if part then
insert(parts, part)
end
end
ins_part(gender)
ins_part(number)
ins_part(animacy)
local full_gender = concat(parts, "-")
valid_genders[full_gender == "" and "?" or full_gender] = true
end
end
end
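--[=[
The nested loops above enumerate every gender/number/animacy combination. A minimal standalone sketch of the same
idea follows; `build_valid_genders` is a hypothetical name used only for illustration, not part of this module.

```lua
-- Build the set of all "gender-number-animacy" specs; `false` means "component omitted".
local function build_valid_genders(genders, numbers, animacies)
	local valid = {}
	for _, gender in ipairs(genders) do
		for _, number in ipairs(numbers) do
			for _, animacy in ipairs(animacies) do
				local parts = {}
				for _, part in ipairs({gender, number, animacy}) do
					if part then
						table.insert(parts, part)
					end
				end
				local full = table.concat(parts, "-")
				-- The fully empty combination is stored under "?" (unknown gender).
				valid[full == "" and "?" or full] = true
			end
		end
	end
	return valid
end

local valid = build_valid_genders(
	{false, "m", "f", "mf", "mfbysense", "mfequiv"},
	{false, "d", "p"},
	{false, "pr", "np"}
)
local count = 0
for _ in pairs(valid) do
	count = count + 1
end
```

Because `ipairs` stops only at `nil`, the leading `false` entries are still visited, which is what lets each
component be optional.
]=]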
local function is_masc_sg(g)
return g == "m" or g == "m-pr" or g == "m-np"
end
local function is_fem_sg(g)
return g == "f" or g == "f-pr" or g == "f-np"
end
local function is_masc_fem_sg(g)
g = g:gsub("%-pr", ""):gsub("%-np", "")
return g == "mf" or g == "mfequiv" or g == "mfbysense"
end
local function add_gender_params(params, default)
params[2] = {type = "genders", default = default or "?", template_default = "m"}
params["g2"] = {replaced_by = false, instead = "use comma-separated values in |g="}
end
-- Handle gender in params 2=, inserting into `data.genders`. Also, if a lemma, insert categories into `data.categories`
-- if the gender is unexpected for the form of the noun. (Note: If there are multiple genders,
-- [[Module:gender and number]] will automatically insert 'Arabic POS with multiple genders'.)
local function handle_gender(data, args, nonlemma, field)
if not args[field or 2] then
return
end
for _, gspec in ipairs(args[field or 2]) do
if not valid_genders[gspec.spec] then
error("Unrecognized gender: " .. gspec.spec)
end
end
data.genders = args[field or 2]
if nonlemma then
return
end
for _, gspec in ipairs(data.genders) do
local g = gspec.spec
if is_masc_sg(g) or is_fem_sg(g) or is_masc_fem_sg(g) then
local head = data.heads[1]
if head then
head = rsub(ar.reorder_shadda(remove_links(head.term)), ar.UNUOPT .. "$", "")
local ends_with_tam = rfind(head, "^[^ ]*" .. ar.TAM .. "$") or
rfind(head, "^[^ ]*" .. ar.TAM .. " ")
if (is_masc_sg(g) or is_masc_fem_sg(g)) and ends_with_tam then
insert(data.categories, langname .. " masculine terms with feminine ending")
elseif (is_fem_sg(g) or is_masc_fem_sg(g)) and not ends_with_tam and
not rfind(head, "[" .. ar.ALIF .. ar.AMAQ .. "]$") and
not rfind(head, ar.ALIF .. ar.HAMZA .. "$") then
insert(data.categories, langname .. " feminine terms lacking feminine ending")
end
end
end
end
end
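--[=[
The gender/ending mismatch check in handle_gender() can be sketched on unvocalized input. This is a simplified
illustration only: it assumes the head carries no diacritics or links, handles just "m" and "f", and uses a
hypothetical helper `ending_mismatch` plus literal letters in place of the `ar.*` constants.

```lua
local TAM, ALIF, AMAQ, HAMZA = "ة", "ا", "ى", "ء"

local function ends_with(s, suffix)
	return s:sub(-#suffix) == suffix
end

-- Return the category suffix the check would add, or nil if the ending is expected.
local function ending_mismatch(gender, head)
	-- Only the first word of a multiword head is checked for tā' marbūṭa.
	local first_word = head:match("^[^ ]*")
	local has_tam = ends_with(first_word, TAM)
	if gender == "m" and has_tam then
		return "masculine terms with feminine ending"
	elseif gender == "f" and not has_tam and not ends_with(head, ALIF)
		and not ends_with(head, AMAQ) and not ends_with(head, ALIF .. HAMZA) then
		return "feminine terms lacking feminine ending"
	end
	return nil
end
```

For instance, a masculine noun written with a final tā' marbūṭa is flagged, while a feminine noun ending in alif
maqṣūra is not.
]=]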
-----------------------------------------------------------------------------------------
-- Inflection handlers --
-----------------------------------------------------------------------------------------
-- Add list parameters to `params` (a structure as passed to [[Module:parameters]]) for a parameter named `argpref`.
-- If `argpref` is "*", add the nominal inflection parameters for construct state, definite state, etc. Related
-- transliteration and gender parameters are no longer supported in favor of inline modifiers, and error messages are
-- output if these parameters are used.
local function add_infl_params(params, argpref)
params[argpref] = {list = true, disallow_holes = true}
params[argpref .. "tr"] = {replaced_by = false, instead = "use <tr:...> inline modifier on |" .. argpref .. "="}
params[argpref .. "g"] = {replaced_by = false, instead = "use <g:...> inline modifier on |" .. argpref .. "="}
end
--[=[
Fetch a list of inflections from the arguments in `args` and insert them into `data.inflections`, where `data` is the
structure passed to [[Module:headword]]. `spec` describes the inflection: `spec.field` is the argument name (e.g.
"pl") and `spec.label` is the label shown in the headword (e.g. "plural"). If `spec.generate_default` is specified, it
should be a function of three arguments (`data`, `args`, `defspec`) that returns the default terms when "+" (or another
key of `spec.allowed_defspecs`) is explicitly given, or when no values are given and `spec.default_when_not_explicit`
returns true. The value "~" copies the headword(s), and "--" as the sole value suppresses the inflection. If no default
applies and the user gave no values, no inflection is inserted.
]=]
local function handle_infl(data, args, spec)
local newinfls = parse_inflection(data, args, spec.field, false)
if not newinfls[1] and spec.default_when_not_explicit and spec.default_when_not_explicit(data, args) then
newinfls = {{term = "+"}}
end
if spec.handle then
spec.handle(data, args, newinfls)
end
local default_specs = spec.allowed_defspecs
if not default_specs then
default_specs = spec.generate_default and {["+"] = true} or {}
end
local saw_defspec = false
for _, newinfl in ipairs(newinfls) do
if default_specs[newinfl.term] or newinfl.term == "~" then
saw_defspec = true
break
end
end
if saw_defspec then
local newnewinfls = {}
for _, newinfl in ipairs(newinfls) do
if default_specs[newinfl.term] then
if newinfl.tr then
error("Can't specify translit for default value '" .. newinfl.term .. "'")
end
local definfls = spec.generate_default(data, args, newinfl.term)
for _, definfl in ipairs(definfls) do
m_headword_utilities.combine_termobj_qualifiers_labels(definfl, newinfl)
insert(newnewinfls, definfl)
end
elseif newinfl.term == "~" then
if newinfl.tr then
error("Can't specify translit for head-requesting value '~'")
end
for _, headobj in ipairs(data.heads) do
headobj = m_table.shallowCopy(headobj)
m_headword_utilities.combine_termobj_qualifiers_labels(headobj, newinfl)
insert(newnewinfls, headobj)
end
else
insert(newnewinfls, newinfl)
end
end
newinfls = newnewinfls
end
if newinfls[1] then
if newinfls[1].term == "--" then
if newinfls[2] then
error("Can't specify more than one term if first term is '--', meaning \"don't insert anything\"")
end
else
insert_inflection(data, newinfls, spec.label, nil, spec.defgender, spec.field, spec.no_label,
spec.usually_no_label)
end
end
end
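--[=[
The expansion of "+" (generated default) and "~" (copy of the headword) performed in handle_infl() can be sketched
as follows. `expand_defaults` is a hypothetical, stripped-down helper; it ignores qualifiers, labels and the
transliteration checks of the real function.

```lua
-- Replace "+" with generated defaults and "~" with copies of the heads; keep other terms as they are.
local function expand_defaults(infls, heads, generate_default)
	local out = {}
	for _, infl in ipairs(infls) do
		if infl.term == "+" then
			for _, definfl in ipairs(generate_default()) do
				table.insert(out, definfl)
			end
		elseif infl.term == "~" then
			for _, head in ipairs(heads) do
				table.insert(out, {term = head.term, tr = head.tr})
			end
		else
			table.insert(out, infl)
		end
	end
	return out
end

local expanded = expand_defaults(
	{{term = "kutub"}, {term = "+"}, {term = "~"}},
	{{term = "kitab"}},
	function() return {{term = "kitabat"}} end
)
```

The order of the user-supplied values is preserved, so explicit terms and defaults can be mixed freely.
]=]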
local function add_infl_list_params(params, infl_list)
for _, infl in ipairs(infl_list) do
add_infl_params(params, infl.field)
end
end
local function handle_infl_list_args(data, args, infl_list)
for _, infl in ipairs(infl_list) do
handle_infl(data, args, infl)
end
end
-----------------------------------------------------------------------------------------
-- Default ending generators --
-----------------------------------------------------------------------------------------
local function make_conditional_default(specs)
return function(data, args)
local heads = data.heads
if not heads[1] then
heads = {{term = data.pagename}}
end
local newobjs = {}
for _, headobj in ipairs(heads) do
local term = ar.reorder_shadda(headobj.term)
local tr = headobj.tr
local matched = false
for _, spec in ipairs(specs) do
local from, fromtr, to, totr = unpack(spec)
-- Declare `pref` locally so it does not leak into (or get rejected as) a global.
local pref
if from:find("^%^") then
pref = rmatch(term, from .. "$")
else
pref = rmatch(term, "^(.*)" .. from .. "$")
end
if pref then
term = pref .. to
tr = replace_tr_ending(tr, fromtr, totr)
matched = true
headobj = m_table.shallowCopy(headobj)
headobj.term = ar.undo_reorder_shadda(term)
headobj.tr = tr
insert(newobjs, headobj)
break
end
end
if not matched then
error(("Internal error: No matching spec: head=%s"):format(dump(headobj)))
end
end
return newobjs
end
end
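--[=[
The rule tables passed to make_conditional_default() are tried in order and the first matching ending wins. A
minimal sketch of that first-match rewriting, applied to transliteration only; the two-element rule format and the
name `make_default_tr` are assumptions made for this illustration.

```lua
-- Each rule is {from_ending, to_ending}; the empty ending "" acts as a catch-all.
local function make_default_tr(rules)
	return function(term)
		for _, rule in ipairs(rules) do
			local from, to = rule[1], rule[2]
			local pref = term:match("^(.*)" .. from .. "$")
			if pref then
				return pref .. to
			end
		end
		error("No matching rule for " .. term)
	end
end

local default_feminine_tr = make_default_tr {
	{"an", "āh"},
	{"in", "iya"},
	{"", "a"},
}
```

More specific endings must be listed before the catch-all, otherwise they would never be reached.
]=]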
local default_feminine = make_conditional_default {
{ar.AN .. ar.AMAQ, "an", ar.AAH, "āh"},
{ar.AN .. ar.ALIF, "an", ar.AAH, "āh"}, -- e.g. مُحْيًا
{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAH, "iya"},
{ar.IN, "in", ar.IYAH, "iya"},
{"", "", ar.AH, "a"},
}
local default_masculine = make_conditional_default {
-- tall alif substitutes for alif maqṣūra after a yāʔ
{ar.Y .. ar.AAH, "āh", ar.AN .. ar.ALIF, "an"},
{ar.AAH, "āh", ar.AN .. ar.AMAQ, "an"},
-- handle the common case of final-weak feminine active participle with preceding hamza;
-- the hamza-on-yāʔ always converts back to hamza on the line when preceded by ā (alif) but
-- may not otherwise, so we just leave it alone in that case
{ar.ALIF .. ar.HAMZA_ON_YA .. ar.IYAH, "iya", ar.HAMZA .. ar.IN, "in"},
{ar.IYAH, "iya", ar.IN, "in"},
{ar.AH, "a", "", ""},
{"", "", "", ""},
}
local default_masculine_plural = make_conditional_default {
{ar.AN .. ar.AMAQ, "an", ar.AWN, "awn"},
{ar.AN .. ar.ALIF, "an", ar.AWN, "awn"}, -- e.g. مُحْيًا
{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_WAW .. ar.UUN, "ūn"},
{ar.IN, "in", ar.UUN, "ūn"},
{"", "", ar.UUN, "ūn"},
}
local default_feminine_plural = make_conditional_default {
-- صَلَاة pl. صَلَوَات and أَدَاة pl. أَدَوَات and similar; but نَوَاة and وَفَاة with a و in them become نَوَيَات and وَفَيَات;
-- and longer terms like مُبَارَاة and كُمَّثْرَاة invariably form their plural in -يَات.
{"^([^و]" .. ar.A .. "[^و])" .. ar.AAH, "āh", ar.A .. ar.W .. ar.AAT, "awāt"},
{ar.AAH, "āh", ar.AYAAT, "ayāt"},
{ar.AN .. ar.AMAQ, "an", ar.AYAAT, "ayāt"},
{ar.AN .. ar.ALIF, "an", ar.AYAAT, "ayāt"}, -- e.g. مُحْيًا
{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAAT, "iyāt"},
{ar.IN, "in", ar.IYAAT, "iyāt"},
{ar.AH, "a", ar.AAT, "āt"},
{"", "", ar.AAT, "āt"},
}
local default_masculine_dual = make_conditional_default {
{ar.AN .. ar.AMAQ, "an", ar.AYAAN, "ayān"},
{ar.AN .. ar.ALIF, "an", ar.AYAAN, "ayān"}, -- e.g. مُحْيًا
{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAAN, "iyān"},
{ar.IN, "in", ar.IYAAN, "iyān"},
{"", "", ar.AAN, "ān"},
}
local default_feminine_dual = make_conditional_default {
{ar.AN .. ar.AMAQ, "an", ar.AATAAN, "ātān"},
{ar.AN .. ar.ALIF, "an", ar.AATAAN, "ātān"}, -- e.g. مُحْيًا
{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IY .. ar.ATAAN, "iyatān"},
{ar.IN, "in", ar.IY .. ar.ATAAN, "iyatān"},
{"", "", ar.ATAAN, "atān"},
}
-- Return whether `term` is a nisba noun or adjective, ending in -iyy or -iyyah. `nisba_val` is the value of
-- args.nisba; if non-nil, it overrides any auto-determination based on the shape of the term.
local function term_is_nisba(term, nisba_val)
if nisba_val ~= nil then
return nisba_val
end
term = ar.reorder_shadda(term) -- necessary to avoid issues with e.g. أُورُوبِّيّ.
local pref = rmatch(term, "^(.*)" .. ar.IYY .. ar.UN .. "?$")
if not pref then
pref = rmatch(term, "^(.*)" .. ar.IYYAH .. ar.UN .. "?$")
end
-- Avoid false positives for words like قَوِيّ "strong" and صَبِيّ "boy". There may be other false positives
-- but this should catch most of them and will avoid very many false negatives.
return pref and not rfind(pref, "^[^ا]" .. ar.A .. ".$")
end
-----------------------------------------------------------------------------------------
-- Adjectives --
-----------------------------------------------------------------------------------------
local function is_defaulting_adjective(data, args)
return data.orig_pos_category == "defaulting adjectives"
end
local adj_field_elative = {field = "el", label = "<<elative>>"}
local adj_inflections = {
adj_field_inf,
adj_field_obl,
adj_field_def,
{field = "f", label = "feminine", generate_default = default_feminine,
default_when_not_explicit = is_defaulting_adjective},
{field = "d", label = "masculine dual", generate_default = default_masculine_dual},
{field = "fd", label = "feminine dual", generate_default = default_feminine_dual},
{field = "cpl", label = "common plural"},
{field = "pl", label = "masculine plural", generate_default = default_masculine_plural,
default_when_not_explicit = is_defaulting_adjective},
{field = "fpl", label = "feminine plural", generate_default = default_feminine_plural,
default_when_not_explicit = is_defaulting_adjective},
}
local function get_adj_params()
local params = {}
add_infl_list_params(params, adj_inflections)
add_infl_params(params, "el")
params.nisba = boolean_param
return params
end
local function handle_adj_args(data, args)
handle_infl_list_args(data, args, adj_inflections)
handle_infl(data, args, adj_field_elative)
for _, headobj in ipairs(data.heads) do
if term_is_nisba(headobj.term, args.nisba) then
insert(data.categories, langname .. " relative adjectives (nisba)")
break
end
end
end
pos_functions["adjectives"] = {
params = get_adj_params,
func = handle_adj_args,
}
pos_functions["defaulting adjectives"] = {
params = get_adj_params,
func = function(data, args)
data.pos_category = "adjectives"
handle_adj_args(data, args)
end,
}
-----------------------------------------------------------------------------------------
-- Nouns, etc. --
-----------------------------------------------------------------------------------------
local function get_masc_or_feminine_gender(data, default_type)
local saw_m, saw_f, saw_mf
for _, gender in ipairs(data.genders) do
if is_masc_sg(gender.spec) then
saw_m = true
elseif is_fem_sg(gender.spec) then
saw_f = true
elseif is_masc_fem_sg(gender.spec) then
saw_mf = true
end
end
if saw_mf or saw_m and saw_f then
error("Can't generate default for " .. default_type .. " when gender is both masculine and feminine")
elseif saw_m then
return "m"
elseif saw_f then
return "f"
else
error("Can't generate default for " .. default_type .. " when gender is not specified as " ..
"masculine or feminine singular")
end
end
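--[=[
A standalone sketch of how a single default gender is picked from a list of gender specs. `pick_gender` is a
hypothetical, reduced version of get_masc_or_feminine_gender() that leaves out the "mf" variants.

```lua
local function is_masc_sg(g)
	return g == "m" or g == "m-pr" or g == "m-np"
end

local function is_fem_sg(g)
	return g == "f" or g == "f-pr" or g == "f-np"
end

-- Return "m" or "f" when the specs agree; raise an error otherwise.
local function pick_gender(genders)
	local saw_m, saw_f = false, false
	for _, gender in ipairs(genders) do
		if is_masc_sg(gender.spec) then
			saw_m = true
		elseif is_fem_sg(gender.spec) then
			saw_f = true
		end
	end
	if saw_m and saw_f then
		error("gender is both masculine and feminine")
	elseif saw_m then
		return "m"
	elseif saw_f then
		return "f"
	end
	error("gender is not masculine or feminine singular")
end
```

Callers that want to tolerate the error cases can wrap the call in `pcall`.
]=]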
local function is_defaulting_noun(data, args)
return data.orig_pos_category == "defaulting nouns"
end
local noun_field_dual = {
field = "d", label = "dual",
generate_default = function(data, args)
local gender = get_masc_or_feminine_gender(data, "noun dual")
if gender == "m" then
return default_masculine_dual(data, args)
else
return default_feminine_dual(data, args)
end
end,
}
local noun_field_plural = {
field = "pl", label = "plural",
generate_default = function(data, args, defspec)
local gender = get_masc_or_feminine_gender(data, "noun plural")
if gender == "m" then
if defspec == "+f" then
return default_feminine_plural(data, args)
else
return default_masculine_plural(data, args)
end
elseif defspec == "+f" then
error("Can't specify '+f' with feminine gender; just use '+'")
else
return default_feminine_plural(data, args)
end
end,
-- Handle the case where pl=-, indicating an uncountable noun.
handle = function(data, args, terms)
if terms[1] and terms[1].term == "-" then
insert(data.categories, langname .. " uncountable nouns")
if args.pauc and args.pauc[1] then
error("Can't specify paucals when pl=-")
end
end
end,
allowed_defspecs = {["+"] = true, ["+f"] = true},
default_when_not_explicit = is_defaulting_noun,
no_label = "<<uncountable>>",
usually_no_label = "usually <<uncountable>>",
}
local noun_field_paucal = {
field = "pauc", label = "<<paucal>>", generate_default = default_feminine_plural,
}
local noun_field_feminine = {
field = "f", label = "feminine", generate_default = default_feminine,
default_when_not_explicit = function(data, args)
if data.orig_pos_category ~= "defaulting nouns" then
return nil
end
local gender = get_masc_or_feminine_gender(data, "defaulting-if-masculine noun feminine")
return gender == "m"
end,
}
local noun_field_masculine = {
field = "m", label = "masculine", generate_default = default_masculine,
default_when_not_explicit = function(data, args)
if data.orig_pos_category ~= "defaulting nouns" then
return nil
end
local gender = get_masc_or_feminine_gender(data, "defaulting-if-feminine noun masculine")
return gender == "f"
end,
}
local noun_basic_inflections = {
noun_field_cons,
noun_field_inf,
noun_field_obl,
noun_field_def,
}
local noun_shared_inflections = {
noun_field_dual,
noun_field_plural,
}
local noun_extra_inflections = {
noun_field_paucal,
noun_field_feminine,
noun_field_masculine,
}
local function get_noun_params()
local params = {}
add_gender_params(params)
add_infl_list_params(params, noun_basic_inflections)
add_infl_list_params(params, noun_shared_inflections)
add_infl_list_params(params, noun_extra_inflections)
params.nisba = boolean_param
return params
end
local function handle_noun_args(data, args)
handle_gender(data, args)
handle_infl_list_args(data, args, noun_basic_inflections)
handle_infl_list_args(data, args, noun_shared_inflections)
handle_infl_list_args(data, args, noun_extra_inflections)
for _, headobj in ipairs(data.heads) do
if term_is_nisba(headobj.term, args.nisba) then
insert(data.categories, langname .. " relative nouns (nisba)")
break
end
end
end
pos_functions["nouns"] = {
params = get_noun_params,
func = handle_noun_args,
}
pos_functions["defaulting nouns"] = {
params = get_noun_params,
func = function(data, args)
data.pos_category = "nouns"
handle_noun_args(data, args)
end,
}
local noun_field_singulative = {field = "sing", label = "<<singulative>>", defgender = "f", generate_default = default_feminine}
local noun_field_collective = {field = "coll", label = "<<collective>>", defgender = "m", generate_default = default_masculine}
local function handle_sing_coll_noun_infls(data, args, otherinfl, otherlabel, othergender)
-- Handle sing= (corresponding singulative noun) or coll= (corresponding collective noun) and their gender
handle_infl(data, args, otherinfl, otherlabel, nil, othergender)
handle_infl_list_args(data, args, sing_coll_noun_inflections)
end
local function get_singulative_collective_noun_params(defgender, otherinfl)
local params = {}
add_gender_params(params, defgender)
add_infl_list_params(params, noun_basic_inflections)
add_infl_params(params, otherinfl)
add_infl_list_params(params, noun_shared_inflections)
add_infl_params(params, "pauc")
return params
end
pos_functions["collective nouns"] = {
params = function() return get_singulative_collective_noun_params("m", "sing") end,
func = function(data, args)
data.pos_category = "nouns"
insert(data.categories, langname .. " collective nouns")
m_headword_utilities.insert_fixed_inflection {
headdata = data,
label = "<<collective>>",
}
handle_gender(data, args)
handle_infl_list_args(data, args, noun_basic_inflections)
handle_infl(data, args, noun_field_singulative)
handle_infl_list_args(data, args, noun_shared_inflections)
handle_infl(data, args, noun_field_paucal)
end
}
pos_functions["singulative nouns"] = {
params = function() return get_singulative_collective_noun_params("f", "coll") end,
func = function(data, args)
data.pos_category = "nouns"
insert(data.categories, langname .. " singulative nouns")
m_headword_utilities.insert_fixed_inflection {
headdata = data,
label = "<<singulative>>",
}
handle_gender(data, args)
handle_infl_list_args(data, args, noun_basic_inflections)
handle_infl(data, args, noun_field_collective)
handle_infl_list_args(data, args, noun_shared_inflections)
handle_infl(data, args, noun_field_paucal)
end
}
-- FIXME: Do numerals really behave almost as nouns? They vary by masc/fem.
pos_functions["numerals"] = {
params = get_noun_params,
func = function(data, args)
insert(data.categories, langname .. " cardinal numbers")
handle_noun_args(data, args)
end
}
pos_functions["proper nouns"] = {
params = get_noun_params,
func = handle_noun_args,
}
local function get_pronoun_params()
local params = {}
add_gender_params(params)
add_infl_list_params(params, noun_basic_inflections)
add_infl_list_params(params, noun_shared_inflections)
add_infl_params(params, "f")
return params
end
pos_functions["pronouns"] = {
params = get_pronoun_params,
func = function(data, args)
handle_gender(data, args)
handle_infl_list_args(data, args, noun_basic_inflections)
handle_infl_list_args(data, args, noun_shared_inflections)
handle_infl(data, args, noun_field_feminine)
end
}
-----------------------------------------------------------------------------------------
-- Non-lemma forms --
-----------------------------------------------------------------------------------------
local valid_forms = list_to_set(
{ "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII",
"XIII", "XIV", "XV", "Iq", "IIq", "IIIq", "IVq" })
-- FIXME: Partly duplicated in [[Module:ar-inflections]].
local function handle_conj_form(data, args)
local form = args[2]
if form then
if not valid_forms[form] then
error("Invalid verb conjugation form " .. form)
end
insert(data.inflections, { label = "[[Appendix:Arabic verbs#Form " .. form .. "|form " .. form .. "]]" })
end
end
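--[=[
Validation against a set, as done in handle_conj_form(), can be sketched like this. The definition of
`list_to_set` shown here is an assumption (it is not defined in this part of the module), and `form_label` is a
hypothetical helper.

```lua
-- Turn a list into a lookup set so membership tests are a single table access.
local function list_to_set(list)
	local set = {}
	for _, item in ipairs(list) do
		set[item] = true
	end
	return set
end

local known_forms = list_to_set({"I", "II", "III", "Iq"})

-- Return the appendix link label for a known conjugation form, or raise an error.
local function form_label(form)
	if not known_forms[form] then
		error("Invalid verb conjugation form " .. form)
	end
	return "[[Appendix:Arabic verbs#Form " .. form .. "|form " .. form .. "]]"
end
```
]=]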
pos_functions["verb forms"] = {
params = function()
return {
[2] = {},
}
end,
func = function(data, args)
handle_conj_form(data, args)
end
}
local function get_participle_params()
local params = get_adj_params()
params[2] = {}
return params
end
pos_functions["active participles"] = {
params = get_participle_params,
func = function(data, args)
data.pos_category = "participles"
insert(data.categories, langname .. " active participles")
handle_conj_form(data, args)
handle_infl_list_args(data, args, adj_inflections)
end
}
pos_functions["passive participles"] = {
params = get_participle_params,
func = function(data, args)
data.pos_category = "participles"
insert(data.categories, langname .. " passive participles")
handle_conj_form(data, args)
handle_infl_list_args(data, args, adj_inflections)
end
}
-----------------------------------------------------------------------------------------
-- Verbs --
-----------------------------------------------------------------------------------------
pos_functions["verbs"] = {
head_is_not_1 = true,
params = function() return {
[1] = {},
-- Comma-separated lists with possible inline modifiers
["past"] = {},
["past1s"] = {},
["nonpast"] = {},
["vn"] = {},
["noautolinktext"] = {type = "boolean"},
["noautolinkverb"] = {type = "boolean"},
} end,
func = function(data, args)
local ar_verb = require(ar_verb_module)
local alternant_multiword_spec =
args[1] ~= "-" and ar_verb.do_generate_forms(args, "ar-verb", data.pagename) or nil
local function do_slot(slots_to_check, override, label, slot_is_headword)
-- Do this even with an override so we can return the correct filled slot.
local slot, slotval
if alternant_multiword_spec then
for _, potential_slot in ipairs(slots_to_check) do
slotval = alternant_multiword_spec.forms[potential_slot]
if slotval then
slot = potential_slot
break
end
end
end
local function get_slot_values()
local terms = {}
for _, form in ipairs(slotval) do
local term = {
term = form.form,
id = form.id,
genders = form.genders,
pos = form.pos,
lit = form.lit,
}
term.tr = form.translit
if form.footnotes then
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(form.footnotes)
term.q = quals
term.refs = refs
end
insert(terms, term)
end
return terms
end
if override then
local override_param_mods = {
alt = {},
t = {
-- [[Module:headword]] expects the gloss in "gloss".
item_dest = "gloss",
},
gloss = {},
g = {
-- [[Module:headword]] expects the genders in "genders".
item_dest = "genders",
type = "genders",
},
pos = {},
lit = {},
id = {},
-- Qualifiers and labels
q = {
type = "qualifier",
},
qq = {
type = "qualifier",
},
l = {
type = "labels",
},
ll = {
type = "labels",
},
ref = {
-- [[Module:headword]] expects the references in "refs".
item_dest = "refs",
type = "references",
},
}
local function generate_obj(formval, parse_err)
if formval == "+" then
return {term = "+", underlying_terms = get_slot_values()}
end
local val, uncertain = formval:match("^(.*)(%?)$")
val = val or formval
uncertain = not not uncertain
local ar, translit = val:match("^(.*)//(.*)$")
if not ar then
-- Use `val` rather than `formval` so a trailing "?" (uncertainty marker) is not kept in the term.
ar = val
end
local retval = {term = ar, uncertain = uncertain}
retval.tr = translit
return retval
end
local terms
if override:find("<") then
terms = require(parse_utilities_module).parse_inline_modifiers(override, {
paramname = paramname,
param_mods = override_param_mods,
generate_obj = generate_obj,
splitchar = "[,،]",
escape_fun = escape_comma_whitespace,
unescape_fun = unescape_comma_whitespace,
})
else
terms = split_on_comma(override)
for i, split in ipairs(terms) do
terms[i] = generate_obj(split)
end
end
-- See if + was supplied and we have to potentially flatten multiple default terms and harmonize
-- default properties with override properties.
local saw_underlying_terms = false
for _, term in ipairs(terms) do
if term.underlying_terms then
saw_underlying_terms = true
break
end
end
if saw_underlying_terms then
-- Flatten any default terms, copying the corresponding override properties over the default
-- properties. Non-default terms get inserted directly.
local flattened = {}
for _, term in ipairs(terms) do
if term.underlying_terms then
for _, underlying in ipairs(term.underlying_terms) do
for k, v in pairs(term) do
if k ~= "term" and k ~= "underlying_terms" then
if k == "uncertain" then
underlying.uncertain = underlying.uncertain or v
elseif type(v) ~= "table" or v[1] then
-- Don't copy empty lists (which are the default) over possibly non-empty
-- lists.
underlying[k] = v
end
end
end
insert(flattened, underlying)
end
else
insert(flattened, term)
end
end
terms = flattened
end
if not slot_is_headword then
terms.label = label
end
return terms, slot
elseif not alternant_multiword_spec then
return nil, slot
else
if not slotval then
if slot_is_headword then
-- FIXME, put "uncertain" as qualifier? Does this ever happen?
return nil, slot
elseif alternant_multiword_spec.slot_uncertain[slot] then
return {label = label .. " uncertain"}, slot
elseif alternant_multiword_spec.slot_explicitly_missing[slot] then
return {label = "no " .. label}, slot
else
-- just say nothing about this slot
return nil, slot
end
end
local terms = get_slot_values()
if not slot_is_headword then
terms.label = label
end
return terms, slot
end
end
local gloss_parts = {}
-- `alternant_multiword_spec` is nil when 1=- was given, so guard every access to it.
for _, vform in ipairs(alternant_multiword_spec and alternant_multiword_spec.verb_forms or {}) do
insert(gloss_parts, "[[Appendix:Arabic verbs#Form " .. vform .. "|" .. vform .. "]]")
end
if gloss_parts[1] then
data.gloss = concat(gloss_parts, ", ")
end
if data.heads[1] and args.past then
error("Can't specify both head= and past= to {{ar-verb}}; prefer past=")
end
if alternant_multiword_spec and not alternant_multiword_spec.has_active then
insert(data.inflections, {label = "passive-only"})
end
-- Do this always so `past_slot` is correctly filled.
local past, past_slot = do_slot(ar_verb.potential_lemma_slots, args.past, "-", "slot is headword")
if data.heads[1] then
-- user specified head=; don't override with past= or slot 'past_3sm' etc.
else
if past then
data.heads = past
end
end
local should_do_past1s = not not args.past1s
if not should_do_past1s and alternant_multiword_spec then
local is_form_I = false
for _, vform in ipairs(alternant_multiword_spec.verb_forms) do
if vform == "I" then
is_form_I = true
break
end
end
if is_form_I then
require(inflection_utilities_module).map_word_specs(alternant_multiword_spec, function(base)
if base.verb_form == "I" then
for _, vowel_spec in ipairs(base.conj_vowels) do
-- For form-I geminate verbs, the final vowel of the past is elided in the citation form.
-- We want to display it for all cases other than active a~u and a~i (the most common
-- cases).
if vowel_spec.weakness == "geminate" then
if ar_verb.is_passive_only(base.passive) then
should_do_past1s = true
break
end
local past_vowel = ar_verb.rget(vowel_spec.past)
local nonpast_vowel = ar_verb.rget(vowel_spec.nonpast)
if not (past_vowel == ar.A and (nonpast_vowel == ar.U or nonpast_vowel == ar.I)) then
should_do_past1s = true
break
end
end
end
-- FIXME, provide way of breaking early from map_word_specs().
end
end)
end
end
local past1s
if should_do_past1s then
past1s = do_slot({"past_1s", "past_pass_1s"}, args.past1s, "first-person singular past")
if past1s then
insert(data.inflections, past1s)
end
end
local nonpast_slots
if not past_slot or past_slot:find("^past_") then
nonpast_slots = {"ind_3ms", "ind_pass_3ms", "imp_2ms"}
else
nonpast_slots = {}
end
local nonpast, _ = do_slot(nonpast_slots, args.nonpast, "non-past")
if nonpast then
insert(data.inflections, nonpast)
end
local vn, _ = do_slot({"vn"}, args.vn, "verbal noun")
if vn then
insert(data.inflections, vn)
end
-- FIXME: Should we insert categories? Conjugation also does it and is more likely to be accurate.
--for _, cat in ipairs(alternant_multiword_spec.categories) do
-- insert(data.categories, cat)
--end
--[=[
-- FIXME: Review this to see if we need to port it.
-- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to
-- incorporate any links in that head into the 1= specification, use the infinitive generated by
-- [[Module:pt-verb]] in place of the user-specified or auto-generated head. This was copied from
-- [[Module:it-headword]], where doing this gets accents marked on the verb(s). We don't have accents marked on
-- the verb but by doing this we do get any footnotes on the infinitive propagated here. Don't do this if the
-- user gave multiple heads or gave a head with a multiword-linked verbal expression such as Italian
-- '[[dare esca]] [[al]] [[fuoco]]' (FIXME: give Portuguese equivalent).
if not data.user_specified_heads[1] or (
not data.user_specified_heads[2] and alternant_multiword_spec.incorporated_headword_head_into_lemma
) then
data.heads = {}
for _, lemma_obj in ipairs(alternant_multiword_spec.forms.infinitive_linked) do
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes)
insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs})
end
end
]=]
end
}
-----------------------------------------------------------------------------------------
-- Generic parts of speech --
-----------------------------------------------------------------------------------------
pos_functions.head_with_gender = {
params = function()
return {
[3] = {type = "genders"},
}
end,
func = function(data, args)
handle_gender(data, args, "nonlemma", 3)
end,
}
return export
Module:ar-pronunciation
828
8169
27704
2026-06-21T14:23:35Z
Umarxon III
2840
Created page with content: 'local export = {} local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local audio_module = "Module:audio" local parse_utilities_module = "Module:parse utilities" local rfind = m_str_utils.find local rsplit = m_str_utils.split local ugsub = m_str_utils.gsub local ulen = m_str_utils.len local ulower = m_str_utils.lower local usub = m_str_utils.sub local concat = table.concat local insert = table.insert local lang = req...'
27704
Scribunto
text/plain
local export = {}
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local audio_module = "Module:audio"
local parse_utilities_module = "Module:parse utilities"
local rfind = m_str_utils.find
local rsplit = m_str_utils.split
local ugsub = m_str_utils.gsub
local ulen = m_str_utils.len
local ulower = m_str_utils.lower
local usub = m_str_utils.sub
local concat = table.concat
local insert = table.insert
local lang = require("Module:languages").getByCode("ar")
local sc = require("Module:scripts").getByCode("Arab")
local correspondences = {
["ʾ"] = "ʔ",
["ṯ"] = "θ",
["j"] = "d͡ʒ",
["ḥ"] = "ħ",
["ḵ"] = "x",
["ḏ"] = "ð",
["š"] = "ʃ",
["ṣ"] = "sˤ",
["ḍ"] = "dˤ",
["ṭ"] = "tˤ",
["ẓ"] = "ðˤ",
["ž"] = "ʒ",
["ʿ"] = "ʕ",
["ḡ"] = "ɣ",
["ḷ"] = "lˤ",
["ū"] = "uː",
["ī"] = "iː",
["ā"] = "aː",
["y"] = "j",
["g"] = "ɡ",
["ē"] = "eː",
["ō"] = "oː",
[""] = "",
}
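--[=[
How this table is applied is not shown in this part of the module. One plausible approach, sketched here under
that assumption, is a single pass over UTF-8 characters; `to_ipa` and `sample` are hypothetical names and the
table is only a small excerpt.

```lua
local sample = {
	["j"] = "d͡ʒ",
	["y"] = "j",
	["ʿ"] = "ʕ",
	["š"] = "ʃ",
}

-- Map each UTF-8 character through the table; characters without an entry are kept unchanged.
local function to_ipa(translit)
	-- One lead byte followed by any continuation bytes (128-191) forms one character.
	return (translit:gsub("[^\128-\191][\128-\191]*", function(ch)
		return sample[ch]
	end))
end
```

A single pass matters because the mappings overlap: rewriting `y` to `j` first and then `j` to `d͡ʒ` in a second
pass would corrupt every original `y`.
]=]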
local vowels = "aāeēiīoōuū"
local vowel = "[" .. vowels .. "]"
local long_vowels = "āēīōū"
local long_vowel = "[" .. long_vowels .. "]"
local consonant = "[^" .. vowels .. ". -]"
local syllabify_pattern = "(" .. vowel .. ")(" .. consonant .. "?)(" .. consonant .. "?)(" .. vowel .. ")"
local tie = "‿"
local closed_syllable_shortening_pattern = "(" .. long_vowel .. ")(" .. tie .. ")" .. "(" .. consonant .. ")"
local function rsub(term, foo, bar)
local retval = ugsub(term, foo, bar)
return retval
end
local function generate_obj(respelling)
return { respelling = respelling }
end
local function combine_qualifiers(qual1, qual2)
if not qual1 then
return qual2
end
if not qual2 then
return qual1
end
local qualifiers = m_table.deepCopy(qual1)
for _, qual in ipairs(qual2) do
m_table.insertIfNot(qualifiers, qual)
end
return qualifiers
end
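--[=[
A dependency-free sketch of the merge performed by combine_qualifiers(). It assumes qualifiers are plain strings,
so a shallow copy and `==` comparison stand in for `m_table.deepCopy` and `m_table.insertIfNot`; the name
`merge_qualifiers` is hypothetical.

```lua
-- Merge two qualifier lists, keeping order and dropping duplicates from the second list.
local function merge_qualifiers(qual1, qual2)
	if not qual1 then
		return qual2
	end
	if not qual2 then
		return qual1
	end
	local merged = {}
	for _, qual in ipairs(qual1) do
		table.insert(merged, qual)
	end
	for _, qual in ipairs(qual2) do
		local seen = false
		for _, existing in ipairs(merged) do
			if existing == qual then
				seen = true
				break
			end
		end
		if not seen then
			table.insert(merged, qual)
		end
	end
	return merged
end
```

Copying the first list avoids mutating a table that the caller may still be using.
]=]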
local function split_on_comma(term)
if not term then
return nil
end
if term:find(",%s") or term:find("\\") then
return require(parse_utilities_module).split_on_comma(term)
else
return rsplit(term, ",")
end
end
local function parse_respellings_with_modifiers(respelling, paramname)
if respelling:find("[<%[]") then
local put = require(parse_utilities_module)
local segments = put.parse_multi_delimiter_balanced_segment_run(respelling, { { "<", ">" }, { "[", "]" } })
local comma_separated_groups = put.split_alternating_runs_on_comma(segments)
local retval = {}
for _, group in ipairs(comma_separated_groups) do
local j = 2
while j <= #group do
if not group[j]:find("^<.*>$") then
group[j - 1] = group[j - 1] .. group[j] .. group[j + 1]
table.remove(group, j)
table.remove(group, j)
else
j = j + 2
end
end
local param_mods = {
q = { type = "qualifier" },
qq = { type = "qualifier" },
a = { type = "labels" },
aa = { type = "labels" },
ref = { item_dest = "refs", type = "references" },
}
table.insert(retval, put.parse_inline_modifiers_from_segments {
group = group,
arg = respelling,
props = {
paramname = paramname,
param_mods = param_mods,
generate_obj = generate_obj,
},
})
end
return retval
else
local retval = {}
for _, item in ipairs(split_on_comma(respelling)) do
table.insert(retval, generate_obj(item))
end
return retval
end
end
local function parse_pron_modifier(arg, paramname, generate_obj, param_mods, splitchar)
splitchar = splitchar or ","
if arg:find("<") then
param_mods.q = { type = "qualifier" }
param_mods.qq = { type = "qualifier" }
param_mods.a = { type = "labels" }
param_mods.aa = { type = "labels" }
param_mods.ref = { item_dest = "refs", type = "references" }
return require(parse_utilities_module).parse_inline_modifiers(arg, {
param_mods = param_mods,
generate_obj = generate_obj,
paramname = paramname,
splitchar = splitchar,
})
else
local retval = {}
local split_arg = splitchar == "," and split_on_comma(arg) or rsplit(arg, splitchar)
for _, term in ipairs(split_arg) do
table.insert(retval, generate_obj(term))
end
return retval
end
end
local function parse_audio(lang, arg, pagename, paramname)
local param_mods = {
IPA = { sublist = true },
text = {},
t = { item_dest = "gloss" },
gloss = {},
pos = {},
lit = {},
g = { item_dest = "genders", sublist = true },
bad = {},
cap = { item_dest = "caption" },
}
local function process_special_chars(val)
if not val then
return val
end
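-- Use a function replacement so that a "%" in the page name is taken literally.
return (val:gsub("#", function() return pagename end))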
end
local function generate_audio_obj(arg)
return { file = process_special_chars(arg) }
end
local retvals = parse_pron_modifier(arg, paramname, generate_audio_obj, param_mods, "%s*;%s*")
for _, retval in ipairs(retvals) do
retval.lang = lang
retval.text = process_special_chars(retval.text)
retval.caption = process_special_chars(retval.caption)
local textobj = require(audio_module).construct_audio_textobj(retval)
retval.text = textobj
retval.gloss = nil
retval.pos = nil
retval.lit = nil
retval.genders = nil
end
return retvals
end
local function parse_regional_phonetics(ph_arg, pagename)
if not ph_arg or ph_arg == "" then
return {}
end
local regionals = {}
for _, item in ipairs(rsplit(ph_arg, "%s*;%s*")) do
local audio = nil
local item_no_mod = item:gsub("<a:([^>]+)>", function(a)
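-- Function replacement: a "%" in the page name must not be read as an escape.
audio = a:gsub("#", function() return pagename end)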
return ""
end)
local region, ipa = item_no_mod:match("^([^:]+):(.+)$")
if region and ipa then
local regions = rsplit(region, "%s*,%s*")
table.insert(regionals, { regions = regions, ipa = ipa, audio = audio })
end
end
return regionals
end
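-- Illustrative sketch only, not called by this module. The |ph= value read by
-- parse_regional_phonetics() above is a ";"-separated list of items of the form
-- REGION[,REGION...]:IPA[<a:FILE>], where "#" in FILE stands for the page name.
-- The hypothetical helper example_parse_regional below restates that format with
-- plain Lua string functions (no rsplit), and additionally trims whitespace
-- around the IPA. It is an assumption-laden rewrite for reading purposes, not
-- the module's own parser.
local function example_parse_regional(ph_arg, pagename)
	local regionals = {}
	-- Append one ";" so that the last item is captured by the same pattern.
	for item in (ph_arg .. ";"):gmatch("(.-);") do
		local audio
		-- Pull out an optional <a:FILE> modifier and expand "#" in it.
		local stripped = item:gsub("<a:([^>]+)>", function(a)
			audio = a:gsub("#", function() return pagename end)
			return ""
		end)
		local region, ipa = stripped:match("^%s*([^:]+):%s*(.-)%s*$")
		if region and ipa and ipa ~= "" then
			local regions = {}
			for r in region:gmatch("[^,]+") do
				regions[#regions + 1] = r:match("^%s*(.-)%s*$")
			end
			regionals[#regionals + 1] = { regions = regions, ipa = ipa, audio = audio }
		end
	end
	return regionals
end
-- With made-up sample values,
-- example_parse_regional("Egypt, Levant: ˈkelma<a:Ar-#.ogg>; Iraq: ˈkalma", "kalima")
-- yields two entries: regions {"Egypt", "Levant"} with audio "Ar-kalima.ogg",
-- and regions {"Iraq"} with no audio.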
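-- Insert syllable breaks (.) into a transliteration. Within a word, a single
-- consonant between vowels goes with the following vowel (V.CV) and a cluster
-- of two is split (VC.CV). A word-final vowel followed by a space and two
-- consonants is joined to the next word with a tie (V‿C.C).
local function syllabify(text)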
text = ugsub(text, "%-(" .. consonant .. ")%-(" .. consonant .. ")", "%1.%2")
text = ugsub(text, "%-", ".")
for _ = 1, 2 do
text = ugsub(
text,
syllabify_pattern,
function(a, b, c, d)
if c == "" and b ~= "" then
c, b = b, ""
end
return a .. b .. "." .. c .. d
end
)
end
text = ugsub(text, "(" .. vowel .. ") (" .. consonant .. ")%.?(" ..
consonant .. ")", "%1" .. tie .. "%2.%3")
return text
end
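-- Shorten a long vowel that is joined by a tie (‿) to a following consonant,
-- i.e. a word-final long vowel whose syllable is closed by the next word.
local function closed_syllable_shortening(text)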
local shorten = {
["ā"] = "a",
["ē"] = "e",
["ī"] = "i",
["ō"] = "o",
["ū"] = "u",
}
text = ugsub(text,
closed_syllable_shortening_pattern,
function(vowel, tie, consonant)
return shorten[vowel] .. tie .. consonant
end)
return text
end
function export.link(term)
return require("Module:links").full_link { term = term, lang = lang, sc = sc }
end
function export.toIPA(list, silent_error)
local translit
if list.tr then
translit = list.tr
elseif list.term then
require("Module:script utilities").checkScript(list.term, "Arab")
translit = lang:transliterate(list.term)
if not translit then
if silent_error then
return ''
else
error('Module:ar-translit failed to generate a transliteration from "' .. list.term .. '".')
end
end
else
if silent_error then
return ''
else
error('No Arabic text or transliteration was provided to the function "toIPA".')
end
end
translit = ugsub(translit, "llāh", "ḷḷāh")
translit = ugsub(translit, "([iī] ?)ḷḷ", "%1ll")
translit = ugsub(translit, "%(t%)", "")
translit = ugsub(translit, "(" .. vowel .. ") " .. vowel, "%1 ")
translit = ugsub(translit, "%-?l%-?", "l")
translit = syllabify(translit)
translit = closed_syllable_shortening(translit)
local output = ugsub(translit, ".", correspondences)
output = ugsub(output, "%-", "")
return output
end
function export.get_pron_info(terms, pagename, paramname)
if #terms == 1 and terms[1].respelling == "-" then
return { pron_list = nil }
end
local pron_list = {}
local brackets = "/%s/"
for _, term in ipairs(terms) do
local respelling = term.respelling
local ar_term, tr
if not respelling or respelling == "" or respelling == "#" then
ar_term = pagename
elseif rfind(respelling, "[a-zA-Z]") then
tr = respelling
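-- Use rfind rather than string.find: the latter is byte-based, so a range of
-- multi-byte characters would also match unrelated non-ASCII text.
elseif rfind(respelling, "[ء-ي]") then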
ar_term = respelling
else
tr = respelling
end
local pron = export.toIPA({ term = ar_term, tr = tr }, false)
if pron and pron ~= "" then
local bracketed_pron = brackets:format(pron)
table.insert(pron_list, {
pron = bracketed_pron,
q = term.q,
qq = term.qq,
a = term.a,
aa = term.aa,
refs = term.refs,
})
end
end
return { pron_list = pron_list }
end
function export.show_old(frame)
local params = {
[1] = { list = true, allow_holes = true },
["tr"] = { list = true, allow_holes = true },
["qual"] = { list = true, allow_holes = true },
["nl"] = { type = "boolean" },
["ann"] = {},
}
local args = require("Module:parameters").process(frame:getParent().args, params)
local ar_terms = args[1]
local transliterations = args.tr
local qualifiers = args.qual
local nl = args.nl
if not (ar_terms.maxindex > 0 or transliterations.maxindex > 0) then
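-- Compare the namespace number (10 = Template) so that the check does not
-- depend on the wiki's localized namespace name.
if mw.title.getCurrentTitle().namespace == 10 then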
ar_terms[1] = "كَلِمَة"
ar_terms.maxindex = 1
else
error(
'Please provide vocalized Arabic in the first parameter of {{[[Template:ar-IPA|ar-IPA]]}}, or transliteration in the "tr" parameter.')
end
end
local pronunciations = {}
for i = 1, math.max(ar_terms.maxindex, transliterations.maxindex) do
local ar_term = ar_terms[i]
local tr = transliterations[i]
local qual = qualifiers[i]
if not (ar_term or tr) then
error("There is a gap in the parameters. Provide either |" .. i .. "= or |tr" .. i .. "=.")
elseif ar_term and tr then
mw.logObject("Duplicate parameters |" .. i .. "= and |tr" .. i .. "= in {{ar-IPA}}.")
end
local pron = export.toIPA { term = ar_term, tr = tr }
table.insert(pronunciations, { pron = "/" .. pron .. "/", qualifiers = qual and { qual } or nil })
end
local anntext = ""
if args.ann then
anntext = args.ann
if args.ann:find("%+") then
local anndefs = {}
for i = 1, ar_terms.maxindex do
local ar_term = ar_terms[i]
if ar_term then
table.insert(anndefs, "'''" .. ar_term .. "'''")
end
end
if anndefs[1] then
anndefs = table.concat(anndefs, ", ")
anntext = anntext:gsub("%+", require("Module:string utilities").replacement_escape(anndefs))
end
end
anntext = require("Module:qualifier").format_qualifier(anntext, "", "") .. ": "
end
if nl then
return anntext .. require("Module:IPA").format_IPA_multiple(lang, pronunciations)
else
return anntext .. require("Module:IPA").format_IPA_full { lang = lang, items = pronunciations }
end
end
function export.show(frame)
local parent_args = frame:getParent().args
local process = require("Module:parameters").process
local params = {
[1] = {},
["audios"] = {},
["a"] = { alias_of = "audios" },
["ph"] = {},
["pagename"] = {},
["indent"] = {},
["ann"] = {},
}
local args = process(parent_args, params)
local pagename = args.pagename or mw.loadData("Module:headword/data").pagename
local indent = args.indent or "*"
local termspec = args[1] or "#"
local terms = parse_respellings_with_modifiers(termspec, 1)
local pronobj = export.get_pron_info(terms, pagename, 1)
local regional_phonetics = parse_regional_phonetics(args.ph, pagename)
local parts = {}
local function ins(text)
table.insert(parts, text)
end
local anntext = ""
if args.ann then
anntext = args.ann
if args.ann:find("%+") then
local anndefs = {}
for _, term in ipairs(terms) do
local respelling = term.respelling
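-- rfind, not string.find: a byte-based range would also match non-Arabic,
-- non-ASCII respellings such as transliterations with diacritics.
if respelling and rfind(respelling, "[ء-ي]") then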
table.insert(anndefs, "'''" .. respelling .. "'''")
end
end
if anndefs[1] then
anndefs = table.concat(anndefs, ", ")
anntext = anntext:gsub("%+", require("Module:string utilities").replacement_escape(anndefs))
end
end
anntext = require("Module:qualifier").format_qualifier(anntext, "", "") .. ": "
end
if pronobj.pron_list and #pronobj.pron_list > 0 then
local formatted = require("Module:IPA").format_IPA_full { lang = lang, items = pronobj.pron_list }
ins(indent .. anntext .. mw.ustring.toNFC(formatted))
end
if args.audios then
local format_audio = require("Module:audio").format_audio
local audio_objs = parse_audio(lang, args.audios, pagename, "audios")
for i, audio_obj in ipairs(audio_objs) do
if #audio_objs > 1 and not audio_obj.caption then
audio_obj.caption = "Audio " .. i
end
ins("\n" .. indent .. " " .. format_audio(audio_obj))
end
end
if #regional_phonetics > 0 then
local m_IPA = require("Module:IPA")
local m_accent = require("Module:accent qualifier")
for _, regional in ipairs(regional_phonetics) do
local regions = regional.regions
local ipa = regional.ipa
local pron_item = { pron = "[" .. ipa .. "]" }
local formatted_ipa = m_IPA.format_IPA_full { lang = lang, items = { pron_item } }
local formatted_region = m_accent.format_qualifiers(lang, regions)
local line = "\n" .. indent .. indent .. " " .. formatted_region .. " " .. mw.ustring.toNFC(formatted_ipa)
if regional.audio then
local audio_obj = {
lang = lang,
file = regional.audio,
}
local textobj = require(audio_module).construct_audio_textobj(audio_obj)
audio_obj.text = textobj
line = line .. " " .. require("Module:audio").format_audio(audio_obj)
end
ins(line)
end
end
return concat(parts)
end
return export
fufezqvhcygwotxk2txtm6ulorript3
Module:ar-nominals
828
8170
27705
2026-06-21T14:58:09Z
Umarxon III
2840
Sahypa döretdi, mazmuny: '-- Author: Benwing, based on early version by CodeCat. --[[ FIXME: Nouns/adjectives to create to exemplify complex declensions: -- riḍan (رِضًا or رِضًى) --]] local m_utilities = require("Module:utilities") local m_links = require("Module:links") local ar_utilities = require("Module:ar-utilities") local lang = require("Module:languages").getByCode("ar") local u = require("Module:string/char") local rfind = mw.ustring.find local rsubn = mw.ustring.g...'
27705
Scribunto
text/plain
-- Author: Benwing, based on early version by CodeCat.
--[[
FIXME: Nouns/adjectives to create to exemplify complex declensions:
-- riḍan (رِضًا or رِضًى)
--]]
local m_utilities = require("Module:utilities")
local m_links = require("Module:links")
local ar_utilities = require("Module:ar-utilities")
local lang = require("Module:languages").getByCode("ar")
local u = require("Module:string/char")
local rfind = mw.ustring.find
local rsubn = mw.ustring.gsub
local rmatch = mw.ustring.match
local rsplit = mw.text.split
-- This is used in place of a transliteration when no manual
-- translit is specified and we're unable to automatically generate
-- one (typically because some vowel diacritics are missing).
local BOGUS_CHAR = u(0xFFFD)
-- hamza variants
local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء
local HAMZA_ON_ALIF = u(0x0623)
local HAMZA_ON_W = u(0x0624)
local HAMZA_UNDER_ALIF = u(0x0625)
local HAMZA_ON_Y = u(0x0626)
local HAMZA_ANY = "[" .. HAMZA .. HAMZA_ON_ALIF .. HAMZA_UNDER_ALIF .. HAMZA_ON_W .. HAMZA_ON_Y .. "]"
local HAMZA_PH = u(0xFFF0) -- hamza placeholder
-- various letters
local ALIF = u(0x0627) -- ʾalif = ا
local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى
local AMAD = u(0x0622) -- ʾalif madda = آ
local TAM = u(0x0629) -- tāʾ marbūṭa = ة
local T = u(0x062A) -- tāʾ = ت
local HYPHEN = u(0x0640) -- taṭwīl (kashida) = ـ
local N = u(0x0646) -- nūn = ن
local W = u(0x0648) -- wāw = و
local Y = u(0x064A) -- yā = ي
-- diacritics
local A = u(0x064E) -- fatḥa
local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn)
local U = u(0x064F) -- ḍamma
local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn)
local I = u(0x0650) -- kasra
local IN = u(0x064D) -- kasratān (kasra tanwīn)
local SK = u(0x0652) -- sukūn = no vowel
local SH = u(0x0651) -- šadda = gemination of consonants
local DAGGER_ALIF = u(0x0670)
local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. "]"
-- common combinations
local NA = N .. A
local NI = N .. I
local AH = A .. TAM
local AT = A .. T
local AA = A .. ALIF
local AAMAQ = A .. AMAQ
local AAH = AA .. TAM
local AAT = AA .. T
local II = I .. Y
local IIN = II .. N
local IINA = II .. NA
local IY = II
local UU = U .. W
local UUN = UU .. N
local UUNA = UU .. NA
local AY = A .. Y
local AW = A .. W
local AYSK = AY .. SK
local AWSK = AW .. SK
local AAN = AA .. N
local AANI = AA .. NI
local AYN = AYSK .. N
local AYNI = AYSK .. NI
local AWN = AWSK .. N
local AWNA = AWSK .. NA
local AYNA = AYSK .. NA
local AYAAT = AY .. AAT
local UNU = "[" .. UN .. U .. "]"
-- optional diacritics/letters
local AOPT = A .. "?"
local AOPTA = A .. "?" .. ALIF
local IOPT = I .. "?"
local UOPT = U .. "?"
local UNOPT = UN .. "?"
local UNUOPT = UNU .. "?"
local SKOPT = SK .. "?"
-- lists of consonants
-- exclude tāʾ marbūṭa because we don't want it treated as a consonant
-- in patterns like أَفْعَل
local consonants_needing_vowels_no_tam = "بتثجحخدذرزسشصضطظعغفقكلمنهپچڤگڨڧأإؤئء"
-- consonants on the right side; includes alif madda
local rconsonants_no_tam = consonants_needing_vowels_no_tam .. "ويآ"
-- consonants on the left side; does not include alif madda
local lconsonants_no_tam = consonants_needing_vowels_no_tam .. "وي"
local CONS = "[" .. lconsonants_no_tam .. "]"
local CONSPAR = "([" .. lconsonants_no_tam .. "])"
local LRM = u(0x200E) --left-to-right mark
-- First syllable or so of elative/color-defect adjective
local ELCD_START = "^" .. HAMZA_ON_ALIF .. AOPT .. CONSPAR
local export = {}
--------------------
-- Utility functions
--------------------
function ine(x) -- If Not Empty
if x == nil then
return nil
elseif rfind(x, '^".*"$') then
local ret = rmatch(x, '^"(.*)"$')
return ret
elseif rfind(x, "^'.*'$") then
local ret = rmatch(x, "^'(.*)'$")
return ret
elseif x == "" then
return nil
else
return x
end
end
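-- Illustrative sketch only, not used by this module. It restates the convention
-- that ine() above implements, using plain Lua string functions: an empty
-- string means "not given" (nil), while a value wrapped in matching quotes is
-- unwrapped, so '""' is the way to pass a genuinely empty string. The name
-- example_ine is hypothetical.
local function example_ine(x)
	if x == nil then
		return nil
	end
	local unquoted = x:match('^"(.*)"$') or x:match("^'(.*)'$")
	if unquoted then
		return unquoted -- may be "", which is still truthy in Lua
	end
	if x == "" then
		return nil
	end
	return x
end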
-- Compare two items, recursively comparing arrays.
-- FIXME, doesn't work for tables that aren't arrays.
function equals(x, y)
if type(x) == "table" and type(y) == "table" then
if #x ~= #y then
return false
end
for key, value in ipairs(x) do
if not equals(value, y[key]) then
return false
end
end
return true
end
return x == y
end
-- true if array contains item
function contains(tab, item)
for _, value in pairs(tab) do
if equals(value, item) then
return true
end
end
return false
end
-- append to array if element not already present
function insert_if_not(tab, item)
if not contains(tab, item) then
table.insert(tab, item)
end
end
-- version of rsubn() that discards all but the first return value
function rsub(term, foo, bar)
local retval = rsubn(term, foo, bar)
return retval
end
-- version of rsub() that asserts that a match occurred
function assert_rsub(term, foo, bar)
local retval, numsub = rsubn(term, foo, bar)
assert(numsub > 0)
return retval
end
function make_link(arabic)
--return m_links.full_link(nil, arabic, lang, nil, "term", nil, {tr = "-"}, false)
return m_links.full_link({lang = lang, alt = arabic}, "term")
end
function track(page)
require("Module:debug").track("ar-nominals/" .. page)
return true
end
-------------------------------------
-- Functions for building inflections
-------------------------------------
-- Functions that do the actual inflecting by creating the forms of a basic term.
local inflections = {}
local max_mods = 9 -- maximum number of modifiers
local mod_list = {"mod"} -- list of "mod", "mod2", "mod3", ...
for i=2,max_mods do
table.insert(mod_list, "mod" .. i)
end
-- Create and return the 'data' structure that will hold all of the
-- generated declensional forms, as well as other ancillary information
-- such as the possible numbers, genders and cases, and the actual numbers
-- and states to store (in 'data.numbers' and 'data.states' respectively).
function init_data()
-- FORMS contains a table of forms for each inflectional category,
-- e.g. "nom_sg_ind" for nouns or "nom_m_sg_ind" for adjectives. The value
-- of an entry is an array of alternatives (e.g. different plurals), where
-- each alternative is either a string of the form "ARABIC" or
-- "ARABIC/TRANSLIT", or an array of such strings (this is used for
-- alternative spellings involving different hamza seats,
-- e.g. مُبْتَدَؤُون or مُبْتَدَأُون). Alternative hamza spellings are separated
-- in display by an "inner separator" (/), while alternatives on
-- the level of different plurals are separated by an "outer separator" (;).
return {forms = {}, title = nil, categories = {},
allgenders = {"m", "f"},
allstates = {"ind", "def", "con"},
allnumbers = {"sg", "du", "pl"},
states = {}, -- initialized later
numbers = {}, -- initialized later
engnumbers = {sg="singular", du="dual", pl="plural",
coll="collective", sing="singulative", pauc="paucal"},
engnumberscap = {sg="singular", du="dual", pl="plural",
coll="collective", sing="singulative", pauc="paucal (3-10)"},
allcases = {"nom", "acc", "gen", "inf"},
allcases_with_lemma = {"nom", "acc", "gen", "inf", "lemma"},
-- index into endings array indicating correct ending for given
-- combination of state and case
statecases = {
ind = {nom = 1, acc = 2, gen = 3, inf = 10, lemma = 13},
def = {nom = 4, acc = 5, gen = 6, inf = 11, lemma = 14},
-- used for a definite adjective modifying a construct-state noun
defcon = {nom = 4, acc = 5, gen = 6, inf = 11, lemma = 14},
con = {nom = 7, acc = 8, gen = 9, inf = 12, lemma = 15},
},
}
end
-- Initialize and return ARGS, ORIGARGS and DATA (see init_data()).
-- ARGS is a table of user-supplied arguments, massaged from the original
-- arguments by converting empty-string arguments to nil and appending
-- translit arguments to their base arguments with a separating slash.
-- ORIGARGS is the original table of arguments.
function init(origargs)
-- Massage arguments by converting empty arguments to nil, and
-- "" or '' arguments to empty.
local args = {}
for k, v in pairs(origargs) do
args[k] = ine(v)
end
-- Further massage arguments by appending translit arguments to the
-- corresponding base arguments, with a slash separator, as is expected
-- in the rest of the code.
--
-- FIXME: We should consider separating translit and base arguments by the
-- separators ; , | (used in overrides; see handle_lemma_and_overrides())
-- and matching up individual parts, to allow separate translit arguments
-- to be specified for overrides. But maybe not; the point of allowing
-- separate translit arguments is for compatibility with headword
-- templates such as "ar-noun" and "ar-adj", and those templates don't
-- handle override arguments.
local function dotr(arg, argtr)
if not args[arg] then
error("Argument '" .. argtr .. "' specified without the corresponding base argument '" .. arg .. "'")
end
args[arg] = args[arg] .. "/" .. args[argtr]
end
-- By convention, corresponding to arg 1 is tr; corresponding to
-- head2, head3, ... is tr2, tr3, ...; corresponding to
-- modhead2, modhead3, ... is modtr2, modtr3, ...; corresponding to
-- modNhead2, modNhead3, ... is modNtr2, modNtr3, ...; corresponding to
-- all other arguments FOO, FOO2, ... is FOOtr, FOO2tr, ...
for k, v in pairs(args) do
if k == "tr" then
dotr(1, "tr")
elseif rfind(k, "tr[0-9]+$") then
dotr(assert_rsub(k, "tr([0-9]+)$", "head%1"), k)
elseif rfind(k, "tr$") then
dotr(assert_rsub(k, "tr$", ""), k)
end
end
-- Construct data.
local data = init_data()
return args, origargs, data
end
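-- Illustrative sketch only, not called by this module. It spells out the naming
-- convention applied in init() above for pairing a translit argument with its
-- base argument: "tr" goes with positional argument 1, "trN" (optionally
-- prefixed, e.g. "modtr2") goes with "headN" under the same prefix, and any
-- other "FOOtr" goes with "FOO". The helper name example_base_arg_for is
-- hypothetical; it returns nil for keys that are not translit arguments.
local function example_base_arg_for(key)
	if key == "tr" then
		return 1
	end
	local prefix, n = key:match("^(.-)tr([0-9]+)$")
	if prefix then
		return prefix .. "head" .. n
	end
	return key:match("^(.+)tr$")
end
-- For instance, example_base_arg_for("modtr3") is "modhead3" and
-- example_base_arg_for("pl2tr") is "pl2".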
-- Parse the user-specified state spec and other related arguments. The
-- user can specify, using idafaN=, how modifiers are related to previous
-- words. The user can also manually specify which states are to appear;
-- whether to omit the definite article in the definite state; and
-- how/whether to restrict modifiers to a particular state, case or number.
-- Normally the modN_* parameters and basestate= do not need to be set
-- directly; instead, use idafaN=. It may be necessary to explicitly
-- specify state= in the presence of proper nouns or definite-only
-- adjectival expressions. NOTE: At the time this function is called,
-- data.numbers has not yet been initialized.
function parse_state_etc_spec(data, args)
local function check(arg, dataval, allvalues)
if args[arg] then
if not contains(allvalues, args[arg]) then
error("For " .. arg .. "=, value '" .. args[arg] .. "' should be one of " ..
table.concat(allvalues, ", "))
end
data[dataval] = args[arg]
end
end
local function check_boolean(arg, dataval)
check(arg, dataval, {"yes", "no"})
if data[dataval] == "yes" then
data[dataval] = true
elseif data[dataval] == "no" then
data[dataval] = false
end
end
-- Make sure no holes in mod values
for i=1,(#mod_list)-1 do
if args[mod_list[i+1]] and not args[mod_list[i]] then
error("Hole in modifier arguments -- " .. mod_list[i+1] ..
" present but not " .. mod_list[i])
end
end
-- FIXME! Remove this once we're sure there are no instances of mod2
-- that haven't been converted to modhead2.
if args["mod2"] then
track("mod2")
end
-- Set default value; may be overridden e.g. by arg["state"] or
-- by idafaN=.
data.states = data.allstates
-- List of pairs of idafaN/modN parameters
local idafa_mod_list = {{"idafa", "mod"}}
for i=2,max_mods do
table.insert(idafa_mod_list, {"idafa" .. i, "mod" .. i})
end
-- True if the value of an |idafa= param is a valid adjectival modifier
-- value.
local function valid_adjectival_idafaval(idafaval)
return idafaval == "adj" or idafaval == "adj-base" or
idafaval == "adj-mod" or rfind(idafaval, "^adj%-mod[0-9]+$")
end
-- Extract the referent (base or modifier) of an adjectival |idafa= param.
-- Assumes the value is valid.
local function adjectival_idafaval_referent(idafaval)
if idafaval == "adj" then
return "base"
end
return assert_rsub(idafaval, "^adj%-", "")
end
-- Convert a base/mod spec to an index: 0=base, 1=mod, 2=mod2, etc.
local function basemod_to_index(basemod)
if basemod == "base" then return 0 end
if basemod == "mod" then return 1 end
return tonumber(assert_rsub(basemod, "^mod", ""))
end
-- Recognize idafa spec and handle it.
-- We do the following:
-- (1) Check that if idafaN= is given, then modN= is also given.
-- (2) Check that adjectival modifiers aren't followed by idafa modifiers.
-- (3) Check that adjectival modifiers are modifying the base or an
-- ʾidāfa modifier, not another adjectival modifier.
-- (4) Support idafa values "adj-base", "adj-mod", "adj-mod2", "adj"
-- (="adj-base") etc. and check that we're referring to an earlier
-- word.
-- (5) For ʾidāfa modifiers, set basestate=con, set modN_case=gen,
-- set modN_idafa=true, and set modN_number to the number specified
-- in the parameter value (e.g. 'sg' or 'def-pl'); and if the
-- parameter value specifies a state (e.g. 'def' or 'ind-du'),
-- set modN_state= to this value, and if this is the last ʾidāfa
-- modifier, also set state= to this value; if this is not the last
-- ʾidāfa modifier, set modN_state=con and disallow a state to be
-- specified in the parameter value.
-- (6) For adjectival modifiers of the base, do nothing.
-- (7) For adjectival modifiers of ʾidāfa modifiers, set modN_case=gen;
-- set modN_idafa=false; and set modN_number=, modN_numgen= and
-- modN_state= to match the values of the idafa modifier.
-- error checking and find last ʾidāfa modifier
local last_is_idafa = true
local last_idafa_mod = "base"
for _, idafa_mod in ipairs(idafa_mod_list) do
local idafaparam = idafa_mod[1]
local mod = idafa_mod[2]
local idafaval = args[idafaparam]
if idafaval then
local paramval = idafaparam .. "=" .. idafaval
if not args[mod] then
error("'" .. idafaparam .. "' parameter without corresponding '"
.. mod .. "' parameter")
end
if not valid_adjectival_idafaval(idafaval) then
-- We're a construct (ʾidāfa) modifier
if not last_is_idafa then
error("ʾidāfa modifier " .. paramval .. " follows adjectival modifier")
end
last_idafa_mod = mod
else
last_is_idafa = false
local adjref = adjectival_idafaval_referent(idafaval)
if adjref ~= "base" then
if basemod_to_index(adjref) >= basemod_to_index(mod) then
error(paramval .. " can only refer to an earlier element")
end
local idafaref = assert_rsub(adjref, "^mod", "idafa")
if not args[idafaref] then
error(paramval .. " cannot refer to a missing modifier")
elseif valid_adjectival_idafaval(args[idafaref]) then
error(paramval .. " cannot refer to an adjectival modifier")
end
end
end
end
end
-- Now go through and set all the modN_ data values appropriately.
for _, idafa_mod in ipairs(idafa_mod_list) do
local idafaparam = idafa_mod[1]
local mod = idafa_mod[2]
local idafaval = args[idafaparam]
if idafaval then
local paramval = idafaparam .. "=" .. idafaval
local bad_idafa = true
if idafaval == "yes" then
idafaval = "sg"
end
if idafaval == "ind-def" or contains(data.allstates, idafaval) then
idafaval = idafaval .. "-sg"
end
if not idafaval then
bad_idafa = false
elseif valid_adjectival_idafaval(idafaval) then
local adjref = adjectival_idafaval_referent(idafaval)
if adjref ~= "base" then
data[mod .. "_case"] = "gen"
data[mod .. "_state"] = data[adjref .. "_state"]
-- if agreement is with ind-def, make it def
if data[mod .. "_state"] == "ind-def" then
data[mod .. "_state"] = "def"
end
data[mod .. "_number"] = data[adjref .. "_number"]
data[mod .. "_numgen"] = data[adjref .. "_numgen"]
data[mod .. "_idafa"] = false
end
bad_idafa = false
elseif contains(data.allnumbers, idafaval) then
data.basestate = "con"
data[mod .. "_case"] = "gen"
data[mod .. "_number"] = idafaval
data[mod .. "_idafa"] = true
if mod ~= last_idafa_mod then
data[mod .. "_state"] = "con"
end
bad_idafa = false
elseif rfind(idafaval, "%-") then
local state_num = rsplit(idafaval, "%-")
-- Support ind-def as a possible value. We set modstate to
-- ind-def, which will signal definite agreement with adjectival
-- modifiers; then later on we change the value to ind.
if #state_num == 3 and state_num[1] == "ind" and state_num[2] == "def" then
state_num[1] = "ind-def"
state_num[2] = state_num[3]
table.remove(state_num)
end
if #state_num == 2 then
local state = state_num[1]
local num = state_num[2]
if (state == "ind-def" or contains(data.allstates, state))
and contains(data.allnumbers, num) then
if mod == last_idafa_mod then
if state == "ind-def" then
data.states = {"def"}
else
data.states = {state}
end
else
error(paramval .. " cannot specify a state because it is not the last ʾidāfa modifier")
end
data.basestate = "con"
data[mod .. "_case"] = "gen"
data[mod .. "_state"] = state
data[mod .. "_number"] = num
data[mod .. "_idafa"] = true
bad_idafa = false
end
end
end
if bad_idafa then
error(paramval .. " should be one of yes, def, sg, def-sg, adj, adj-base, adj-mod, adj-mod2 or similar")
end
end
end
if args["state"] == "ind-def" then
data.states = {"def"}
data.basestate = "ind"
elseif args["state"] then
data.states = rsplit(args["state"], ",")
for _, state in ipairs(data.states) do
if not contains(data.allstates, state) then
error("For state=, value '" .. state .. "' should be one of " ..
table.concat(data.allstates, ", "))
end
end
end
-- Now process explicit settings, so that they can override the
-- settings based on idafaN=.
check("basestate", "basestate", data.allstates)
check_boolean("noirreg", "noirreg")
check_boolean("omitarticle", "omitarticle")
data.prefix = args.prefix
for _, mod in ipairs(mod_list) do
check(mod .. "state", mod .. "_state", data.allstates)
check(mod .. "case", mod .. "_case", data.allcases)
check(mod .. "number", mod .. "_number", data.allnumgens)
check(mod .. "numgen", mod .. "_numgen", data.allnumgens)
check_boolean(mod .. "idafa", mod .. "_idafa")
check_boolean(mod .. "omitarticle", mod .. "_omitarticle")
data[mod .. "_prefix"] = args[mod .. "prefix"]
end
-- Make sure modN_numgen is initialized, to modN_number if necessary.
-- This simplifies logic in certain places, e.g. call_inflections().
-- Also convert ind-def to ind.
for _, mod in ipairs(mod_list) do
data[mod .. "_numgen"] = data[mod .. "_numgen"] or data[mod .. "_number"]
if data[mod .. "_state"] == "ind-def" then
data[mod .. "_state"] = "ind"
end
end
end
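-- Illustrative sketch only, not called by this module. It shows how the
-- non-adjectival values of idafaN= are read in parse_state_etc_spec() above:
-- "yes" is shorthand for "sg", a bare state gets "-sg" appended, and otherwise
-- the value is either a bare NUMBER or STATE-NUMBER (with "ind-def" counting as
-- a single state). The hypothetical helper example_split_idafa returns
-- state, number (state is nil when none was given) or nil, nil for values it
-- does not recognise. It deliberately ignores the adjectival values ("adj",
-- "adj-mod2", ...) and the rule that only the last ʾidāfa modifier may carry
-- a state.
local function example_split_idafa(idafaval)
	local states = { ind = true, def = true, con = true, ["ind-def"] = true }
	local numbers = { sg = true, du = true, pl = true }
	if idafaval == "yes" then
		idafaval = "sg"
	end
	if states[idafaval] then
		idafaval = idafaval .. "-sg"
	end
	if numbers[idafaval] then
		return nil, idafaval
	end
	-- The number follows the last hyphen; everything before it is the state.
	local state, num = idafaval:match("^(.+)%-([^%-]+)$")
	if state and states[state] and numbers[num] then
		return state, num
	end
	return nil, nil
end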
-- Parse the user-specified number spec. The user can manually specify which
-- numbers are to appear. Return true if |number= was specified.
function parse_number_spec(data, args)
if args["number"] then
data.numbers = rsplit(args["number"], ",")
for _, num in ipairs(data.numbers) do
if not contains(data.allnumbers, num) then
error("For number=, value '" .. num .. "' should be one of " ..
table.concat(data.allnumbers, ", "))
end
end
return true
else
data.numbers = data.allnumbers
return false
end
end
-- Determine which numbers will appear using the logic for nouns.
-- See comment just below.
function determine_noun_numbers(data, args, pls)
-- Can manually specify which numbers are to appear, and exactly those
-- numbers will appear. Otherwise, if any plurals given, duals and plurals
-- appear; else, only singular (on the assumption that the word is a proper
-- noun or abstract noun that exists only in the singular); however,
-- singular won't appear if "-" given for singular, and similarly for dual.
if not parse_number_spec(data, args) then
data.numbers = {}
local sgarg1 = args[1]
local duarg1 = args["d"]
if sgarg1 ~= "-" then
table.insert(data.numbers, "sg")
end
if #pls["base"] > 0 then
-- Dual appears if either: explicit dual stem (not -) is given, or
-- default dual is used and explicit singular stem (not -) is given.
if (duarg1 and duarg1 ~= "-") or (not duarg1 and sgarg1 ~= "-") then
table.insert(data.numbers, "du")
end
table.insert(data.numbers, "pl")
elseif duarg1 and duarg1 ~= "-" then
-- If explicit dual but no plural given, include it. Useful for
-- dual tantum words.
table.insert(data.numbers, "du")
end
end
end
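-- Illustrative sketch only, not called by this module. It isolates the default
-- rule used by determine_noun_numbers() above when number= is not given, as a
-- pure function of three inputs: the singular argument, the dual argument
-- (nil if absent) and whether any plural was supplied. The name
-- example_noun_numbers and its parameters are hypothetical.
local function example_noun_numbers(sgarg, duarg, has_plural)
	local numbers = {}
	if sgarg ~= "-" then
		numbers[#numbers + 1] = "sg"
	end
	if has_plural then
		-- Dual appears with an explicit dual stem, or with the default dual
		-- when the singular has not been suppressed.
		if (duarg and duarg ~= "-") or (not duarg and sgarg ~= "-") then
			numbers[#numbers + 1] = "du"
		end
		numbers[#numbers + 1] = "pl"
	elseif duarg and duarg ~= "-" then
		-- Explicit dual without a plural (dual tantum).
		numbers[#numbers + 1] = "du"
	end
	return numbers
end
-- A noun with a plural gets {"sg", "du", "pl"}; one without gets {"sg"};
-- suppressing the singular with "-" while giving a plural leaves {"pl"}.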
-- For stem STEM, convert to stem-and-type format and insert stem and type
-- into RESULTS, checking to make sure it's not already there. SGS is the
-- list of singular items to base derived forms off of (masculine or feminine
-- as appropriate), an array of length-two arrays of {COMBINED_STEM, TYPE} as
-- returned by stem_and_type(); ISFEM is true if this is feminine gender;
-- NUM is "sg", "du" or "pl". POS is the part of speech, generally "noun" or
-- "adjective".
function insert_stems(stem, results, sgs, isfem, num, pos)
if stem == "-" then
return
end
for _, sg in ipairs(sgs) do
local combined_stem, ty = export.stem_and_type(stem,
sg[1], sg[2], isfem, num, pos)
insert_if_not(results, {combined_stem, ty})
end
end
-- Handle manually specified overrides of individual forms. Separate
-- outer-level alternants with ; or , or the Arabic equivalents; separate
-- inner-level alternants with | (we can't use / because it's already in
-- use separating Arabic from translit).
--
-- Also determine lemma and allow it to be overridden.
-- Also allow POS (part of speech) to be overridden.
function handle_lemma_and_overrides(data, args)
local function handle_override(arg)
if args[arg] then
local ovval = {}
local alts1 = rsplit(args[arg], "[;,؛،]")
for _, alt1 in ipairs(alts1) do
local alts2 = rsplit(alt1, "|")
table.insert(ovval, alts2)
end
data.forms[arg] = ovval
end
end
local function do_overrides(mod)
for _, numgen in ipairs(data.allnumgens) do
for _, state in ipairs(data.allstates) do
for _, case in ipairs(data.allcases) do
local arg = mod .. case .. "_" .. numgen .. "_" .. state
handle_override(arg)
if args[arg] and not data.noirreg then
insert_cat(data, mod, numgen,
"Arabic NOUNs with irregular SINGULAR",
"SINGULAR of irregular NOUN")
end
end
end
end
end
do_overrides("")
for _, mod in ipairs(mod_list) do
do_overrides(mod .. "_")
end
local function get_lemma(mod)
for _, numgen in ipairs(data.numgens()) do
for _, state in ipairs(data.states) do
local arg = mod .. "lemma_" .. numgen .. "_" .. state
if data.forms[arg] and #data.forms[arg] > 0 then
return data.forms[arg]
end
end
end
return nil
end
data.forms["lemma"] = get_lemma("")
for _, mod in ipairs(mod_list) do
data.forms[mod .. "_lemma"] = get_lemma(mod .. "_")
end
handle_override("lemma")
for _, mod in ipairs(mod_list) do
handle_override(mod .. "_lemma")
end
end
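-- Illustrative sketch only, not called by this module. It shows the two-level
-- shape that handle_override() above stores in data.forms: outer alternants
-- separated by ";" or "," (or the Arabic ؛ and ،), each holding inner alternants
-- separated by "|". The hypothetical helper example_split_override uses only
-- plain Lua string functions, so the multi-byte Arabic separators are first
-- normalised to ";" instead of being placed in a character class.
local function example_split_override(value)
	value = value:gsub("؛", ";"):gsub("،", ";"):gsub(",", ";")
	local outer = {}
	for alt in (value .. ";"):gmatch("(.-);") do
		local inner = {}
		for form in (alt .. "|"):gmatch("(.-)|") do
			inner[#inner + 1] = form
		end
		outer[#outer + 1] = inner
	end
	return outer
end
-- example_split_override("a/x;b|c") gives { {"a/x"}, {"b", "c"} }; the "/" that
-- separates Arabic from translit is left alone.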
-- Return the part of speech based on the part of speech contained in
-- data.pos and MOD (either "", "mod_", "mod2_", etc., same as in
-- do_gender_number_1()). If we're a modifier, don't use data.pos but
-- instead choose based on whether modifier is adjectival or nominal
-- (ʾiḍāfa).
function get_pos(data, mod)
local ismod = mod ~= ""
if not ismod then
return data.pos
elseif data[mod .. "idafa"] then
return "noun"
else
return "adjective"
end
end
-- Find the stems associated with a particular gender/number combination.
-- ARGS is the set of all arguments. ARGPREFS is an array of argument prefixes
-- (e.g. "f" for the actual arguments "f", "f2", ..., for the feminine
-- singular; we allow more than one to handle "cpl"). SGS is a
-- "stem-type list" (see do_gender_number()), and is the list of stems to
-- base derived forms off of (masculine or feminine as appropriate), an array
-- of length-two arrays of {COMBINED_STEM, TYPE} as returned by
-- stem_and_type(). DEFAULT, ISFEM and NUM are as in do_gender_number().
-- MOD is either "", "mod_", "mod2_", etc. depending if we're working on a
-- base or modifier argument (in the latter case, basically if the argument
-- begins with "mod").
function do_gender_number_1(data, args, argprefs, sgs, default, isfem, num, mod)
local results = {}
local function handle_stem(stem)
insert_stems(stem, results, sgs, isfem, num, get_pos(data, mod))
end
-- If no arguments specified, use the default instead.
local need_default = true
for _, argpref in ipairs(argprefs) do
if args[argpref] then
need_default = false
break
end
end
if need_default then
if not default then
return results
end
handle_stem(default)
return results
end
-- For explicitly specified arguments, make sure there's at least one
-- stem to generate off of; otherwise specifying e.g. 'sing=- pauc=فُلَان'
-- won't override paucal.
if #sgs == 0 then
sgs = {{"", ""}}
end
for _, argpref in ipairs(argprefs) do
if args[argpref] then
handle_stem(args[argpref])
end
local i = 2
while args[argpref .. i] do
handle_stem(args[argpref .. i])
i = i + 1
end
end
return results
end
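-- Illustration of how the numbered arguments ARGPREF, ARGPREF2, ARGPREF3, ...
-- are walked in do_gender_number_1() above. A minimal standalone sketch, not
-- part of the module; collect_numbered() and the sample values are
-- hypothetical.
--[===[
local function collect_numbered(args, argpref)
    local found = {}
    if args[argpref] then
        table.insert(found, args[argpref])
    end
    local i = 2
    while args[argpref .. i] do
        table.insert(found, args[argpref .. i])
        i = i + 1
    end
    return found
end

-- A gap in the numbering ends the scan, so pl4 is ignored here.
local found = collect_numbered({pl = "X", pl2 = "Y", pl4 = "Z"}, "pl")
assert(#found == 2 and found[1] == "X" and found[2] == "Y")
]===]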
-- For a given gender/number combination, parse and return the full set
-- of stems for both base and modifier. The return value is a
-- "stem specification", i.e. table with a "base" key for the base, a
-- "mod" key for the first modifier (see below), a "mod2" key for the
-- second modifier, etc. listing all stems for both the base and modifier(s).
-- The value of each key is a "stem-type list", i.e. an array of stem-type
-- pairs, where each element is a size-two array of {COMBINED_STEM, STEM_TYPE}.
-- COMBINED_STEM is a stem with attached transliteration in the form
-- STEM/TRANSLIT (where the transliteration is either manually specified in
-- the stem argument, e.g. 'pl=لُورْدَات/lordāt', or auto-transliterated from
-- the Arabic, with BOGUS_CHAR substituting for the transliteration if
-- auto-translit fails). STEM_TYPE is the declension of the stem, either
-- manually specified, e.g. 'بَبَّغَاء:di' for manually-specified diptote, or
-- auto-detected (see stem_and_type() and detect_type()).
--
-- DATA and ARGS are as in init(). ARGPREFS is an array of the prefixes for
-- the argument(s) specifying the stem (and optional translit and declension
-- type). For a given ARGPREF, we check ARGPREF, ARGPREF2, ARGPREF3, ... in
-- turn for the base, and modARGPREF, modARGPREF2, modARGPREF3, ... in turn
-- for the first modifier, and mod2ARGPREF, mod2ARGPREF2, mod2ARGPREF3, ...
-- for the second modifier, etc. SGS is a stem specification (see above),
-- giving the stems that are used to base derived forms off of (e.g. if a stem
-- type "smp" appears in place of a stem, the sound masculine plural of the
-- stems in SGS will be derived). DEFAULT is a single stem (i.e. a string) that
-- is used when no stems were explicitly given by the user (typically either
-- "f", "m", "d" or "p"), or nil for no default. ISFEM is true if we're
-- accumulating stems for a feminine number/gender category, and NUM is the
-- number (expected to be "sg", "du" or "pl") of the number/gender category
-- we're accumulating stems for.
--
-- About bases and modifiers: Note that e.g. in the noun phrase يَوْم الاِثْنَيْن
-- the head noun يَوْم is the base and the noun الاِثْنَيْن is the modifier.
-- In a noun phrase like البَحْر الأَبْيَض المُتَوَسِّط, there are two modifiers.
-- Note that modifiers come in two varieties, adjectival modifiers and
-- construct (ʾiḍāfa) modifiers. The first above noun phrase is an example
-- of a noun phrase with a construct modifier, where the base is fixed in
-- the construct state and the modifier is fixed in number and case
-- (which is always genitive) and possibly in state. The second above noun
-- phrase is an example of a noun phrase with two adjectival modifiers.
-- A construct modifier is generally a noun, whereas an adjectival modifier
-- is an adjective that usually agrees in state, number and case with the
-- base noun. (Note that in the case of multiple modifiers, it is possible
-- for e.g. the second modifier to be an adjectival modifier that agrees
-- with the first, construct, modifier, in which case its case will be fixed
-- to genitive, its number will be fixed to the same number as the first
-- modifier and its state will vary or not depending on whether the first
-- modifier's state varies. It is not possible in general to distinguish
-- adjectival and construct modifiers by looking at the values of
-- modN_state, modN_case or modN_number, since e.g. a third modifier could
-- have all of them specified and be either kind. Thus we have modN_idafa,
-- which is true for a construct modifier, false otherwise.)
function do_gender_number(data, args, argprefs, sgs, default, isfem, num)
local results = do_gender_number_1(data, args, argprefs, sgs["base"],
default, isfem, num, "")
local basemodtable = {base=results}
for _, mod in ipairs(mod_list) do
local modn_argprefs = {}
for _, argpref in ipairs(argprefs) do
table.insert(modn_argprefs, mod .. argpref)
end
local modn_results = do_gender_number_1(data, args, modn_argprefs,
sgs[mod] or {}, default, isfem, num, mod .. "_")
basemodtable[mod] = modn_results
end
return basemodtable
end
-- Generate inflections for the given combined stem and type, for MOD
-- (either "" if we're working on the base or "mod_", "mod2_", etc. if we're
-- working on a modifier) and NUMGEN (number or number-gender combination,
-- of the sort that forms part of the keys in DATA.FORMS).
function call_inflection(combined_stem, ty, data, mod, numgen)
if ty == "-" then
return
end
if not inflections[ty] then
error("Unknown inflection type '" .. ty .. "'")
end
local ar, tr = split_arabic_tr(combined_stem)
inflections[ty](ar, tr, data, mod, numgen)
end
-- Generate inflections for the stems of a given number/gender combination
-- and for either the base or the modifier. STEMTYPES is a stem-type list
-- (see do_gender_number()), listing all the stems and corresponding
-- declension types. MOD is either "", "mod_", "mod2_", etc. depending on
-- whether we're working on the base or a modifier. NUMGEN is the number or
-- number-gender combination we're working on, of the sort that forms part
-- of the keys in DATA.FORMS, e.g. "sg" or "m_sg".
function call_inflections(stemtypes, data, mod, numgen)
local mod_with_modnumgen = mod ~= "" and data[mod .. "numgen"]
-- If modN_numgen= is given, do nothing if NUMGEN isn't the same
if mod_with_modnumgen and data[mod .. "numgen"] ~= numgen then
return
end
-- always call inflection() if mod_with_modnumgen since it may affect
-- other numbers (cf. يَوْم الاِثْنَيْن)
if mod_with_modnumgen or contains(data.numbers, rsub(numgen, "^.*_", "")) then
for _, stemtype in ipairs(stemtypes) do
call_inflection(stemtype[1], stemtype[2], data, mod, numgen)
end
end
end
-- Generate the entire set of inflections for a noun or adjective.
-- Also handle any manually-specified part of speech and any manual
-- inflection overrides. INFLECTIONS is an array with one element per
-- number or number/gender combination, where each element is a size-two
-- array of a stem specification (containing the set of stems and
-- corresponding declension types for the base and any modifiers;
-- see do_gender_number()) and a NUMGEN string, i.e. a string identifying
-- the number or number/gender in question (e.g. "sg", "du", "pl",
-- "m_sg", "f_pl", etc.).
function do_inflections_and_overrides(data, args, inflections)
-- do this before generating inflections so POS change is reflected in
-- categories
if args["pos"] then
data.pos = args["pos"]
end
for _, inflection in ipairs(inflections) do
call_inflections(inflection[1]["base"] or {}, data, "", inflection[2])
for _, mod in ipairs(mod_list) do
call_inflections(inflection[1][mod] or {}, data,
mod .. "_", inflection[2])
end
end
handle_lemma_and_overrides(data, args)
end
-- Helper function for get_heads(). Parses the stems for either the
-- base or the modifier (see do_gender_number()). ARG1 is the argument
-- for the first stem and ARGN is the prefix of the arguments for the
-- remaining stems. For example, for the singular base, ARG1=1 and
-- ARGN="head"; for the first singular modifier, ARG1="mod" and
-- ARGN="modhead"; for the plural base, ARG1=ARGN="pl". The arguments
-- other than the first are numbered 2, 3, ..., which is appended to
-- ARGN. MOD is either "", "mod_", "mod2_", etc. depending if we're
-- working on a base or modifier argument. The returned value is an
-- array of stems, where each element is a size-two array of
-- {COMBINED_STEM, STEM_TYPE}. See do_gender_number().
function get_heads_1(data, args, arg1, argn, mod)
if not args[arg1] then
return {}
end
local heads
if args[arg1] == "-" then
heads = {{"", "-"}}
else
heads = {}
insert_stems(args[arg1], heads, {{args[arg1], ""}}, false, "sg",
get_pos(data, mod))
end
local i = 2
while args[argn .. i] do
local arg = args[argn .. i]
insert_stems(arg, heads, {{arg, ""}}, false, "sg",
get_pos(data, mod))
i = i + 1
end
return heads
end
-- Very similar to do_gender_number(), and returns the same type of
-- structure, but works specifically for the stems of the head (the
-- most basic gender/number combination, e.g. singular for nouns,
-- masculine singular for adjectives and gendered nouns, collective
-- for collective nouns, etc.), including both base and modifier.
-- See do_gender_number(). Note that the actual return value is
-- two items, the first of which is the same type of structure
-- returned by do_gender_number() and the second of which is a boolean
-- indicating whether we were called from within a template documentation
-- page (in which case no user-specified arguments exist and we
-- substitute sample ones). The reason for this boolean is to indicate
-- whether sample arguments need to be substituted for other numbers
-- as well.
function get_heads(data, args, headtype)
if not args[1] and mw.title.getCurrentTitle().nsText == "Template" then
return {base={{"{{{1}}}", "tri"}}}, true
end
if not args[1] then error("Parameter 1 (" .. headtype .. " stem) may not be empty.") end
local base = get_heads_1(data, args, 1, "head", "")
local basemodtable = {base=base}
for _, mod in ipairs(mod_list) do
local modn = get_heads_1(data, args, mod, mod .. "head", mod .. "_")
basemodtable[mod] = modn
end
return basemodtable, false
end
-- The main entry point for noun tables.
function export.show_noun(frame)
local args, origargs, data = init(frame:getParent().args)
data.pos = "noun"
data.numgens = function() return data.numbers end
data.allnumgens = data.allnumbers
local sgs, is_template = get_heads(data, args, "singular")
local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
do_gender_number(data, args, {"pl", "cpl"}, sgs, nil, false, "pl")
-- always do dual so cases like يَوْم الاِثْنَيْن work -- a singular with
-- a dual modifier, where data.numbers refers only to the singular
-- but we need to go ahead and compute the dual so it parses the
-- "modd" modifier dual argument. When the modifier dual argument
-- is parsed, it will store the resulting dual declension for اِثْنَيْن
-- in the modifier slot for all numbers, including specifically
-- the singular.
local dus = do_gender_number(data, args, {"d"}, sgs, "d", false, "du")
parse_state_etc_spec(data, args)
determine_noun_numbers(data, args, pls)
do_inflections_and_overrides(data, args,
{{sgs, "sg"}, {dus, "du"}, {pls, "pl"}})
-- Make the table
return make_noun_table(data)
end
-- Return true if any stem in STEM_SPEC (a stem specification; see
-- do_gender_number()) is feminine in form, i.e. its Arabic part ends in TAM,
-- optionally followed by a case ending matched by UNUOPT.
function any_feminine(data, stem_spec)
for basemod, stemtypelist in pairs(stem_spec) do
-- Only check modifiers if modN_numgen= not given. If not given, the
-- modifier needs to be declined for all numgens; else only for the
-- given numgen, which should be explicitly specified.
if not (basemod ~= "base" and data[basemod .. "_numgen"]) then
for _, stemtype in ipairs(stemtypelist) do
if rfind(stemtype[1], TAM .. UNUOPT .. "/") then
return true
end
end
end
end
return false
end
-- Return true if every stem in STEM_SPEC (a stem specification; see
-- do_gender_number()) is feminine in form, i.e. its Arabic part ends in TAM,
-- optionally followed by a case ending matched by UNUOPT.
function all_feminine(data, stem_spec)
for basemod, stemtypelist in pairs(stem_spec) do
-- Only check modifiers if modN_numgen= not given. If not given, the
-- modifier needs to be declined for all numgens; else only for the
-- given numgen, which should be explicitly specified.
if not (basemod ~= "base" and data[basemod .. "_numgen"]) then
for _, stemtype in ipairs(stemtypelist) do
if not rfind(stemtype[1], TAM .. UNUOPT .. "/") then
return false
end
end
end
end
return true
end
-- The main entry point for collective noun tables.
function export.show_coll_noun(frame)
local args, origargs, data = init(frame:getParent().args)
data.pos = "noun"
data.allnumbers = {"coll", "sing", "du", "pauc", "pl"}
data.engnumberscap["pl"] = "plural of variety"
data.numgens = function() return data.numbers end
data.allnumgens = data.allnumbers
local colls, is_template = get_heads(data, args, "collective")
local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
do_gender_number(data, args, {"pl", "cpl"}, colls, nil, false, "pl")
parse_state_etc_spec(data, args)
-- If collective noun is already feminine in form, don't try to
-- form a feminine singulative
local collfem = any_feminine(data, colls)
local sings = do_gender_number(data, args, {"sing"}, colls,
not collfem and "f" or nil, true, "sg")
local singfem = all_feminine(data, sings)
local dus = do_gender_number(data, args, {"d"}, sings, "d", singfem, "du")
local paucs = do_gender_number(data, args, {"pauc"}, sings, "paucp",
singfem, "pl")
-- Can manually specify which numbers are to appear, and exactly those
-- numbers will appear. Otherwise, if any plurals given, plurals appear,
-- and if singulative given, dual and paucal appear.
if not parse_number_spec(data, args) then
data.numbers = {}
if args[1] ~= "-" then
table.insert(data.numbers, "coll")
end
if #sings["base"] > 0 then
table.insert(data.numbers, "sing")
end
if #dus["base"] > 0 then
table.insert(data.numbers, "du")
end
if #paucs["base"] > 0 then
table.insert(data.numbers, "pauc")
end
if #pls["base"] > 0 then
table.insert(data.numbers, "pl")
end
end
-- Generate the collective, singulative, dual, paucal and plural forms
do_inflections_and_overrides(data, args,
{{colls, "coll"}, {sings, "sing"}, {dus, "du"}, {paucs, "pauc"}, {pls, "pl"}})
-- Make the table
return make_noun_table(data)
end
-- The main entry point for singulative noun tables.
function export.show_sing_noun(frame)
local args, origargs, data = init(frame:getParent().args)
data.pos = "noun"
data.allnumbers = {"sing", "coll", "du", "pauc", "pl"}
data.engnumberscap["pl"] = "plural of variety"
data.numgens = function() return data.numbers end
data.allnumgens = data.allnumbers
parse_state_etc_spec(data, args)
local sings, is_template = get_heads(data, args, "singulative")
-- If all singulative nouns feminine in form, form a masculine collective
local singfem = all_feminine(data, sings)
local colls = do_gender_number(data, args, {"coll"}, sings,
singfem and "m" or nil, false, "sg")
local dus = do_gender_number(data, args, {"d"}, sings, "d", singfem, "du")
local paucs = do_gender_number(data, args, {"pauc"}, sings, "paucp",
singfem, "pl")
local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
do_gender_number(data, args, {"pl", "cpl"}, colls, nil, false, "pl")
-- Can manually specify which numbers are to appear, and exactly those
-- numbers will appear. Otherwise, if any plurals given, plurals appear;
-- if singulative given or derivable, it and dual and paucal will appear.
if not parse_number_spec(data, args) then
data.numbers = {}
if args[1] ~= "-" then
table.insert(data.numbers, "sing")
end
if #colls["base"] > 0 then
table.insert(data.numbers, "coll")
end
if #dus["base"] > 0 then
table.insert(data.numbers, "du")
end
if #paucs["base"] > 0 then
table.insert(data.numbers, "pauc")
end
if #pls["base"] > 0 then
table.insert(data.numbers, "pl")
end
end
-- Generate the singulative, collective, dual, paucal and plural forms
do_inflections_and_overrides(data, args,
{{sings, "sing"}, {colls, "coll"}, {dus, "du"}, {paucs, "pauc"}, {pls, "pl"}})
-- Make the table
return make_noun_table(data)
end
-- The implementation of the main entry point for adjective and
-- gendered noun tables.
function show_gendered(frame, isadj, pos)
local args, origargs, data = init(frame:getParent().args)
data.pos = pos
data.numgens = function()
local numgens = {}
for _, gender in ipairs(data.allgenders) do
for _, number in ipairs(data.numbers) do
table.insert(numgens, gender .. "_" .. number)
end
end
return numgens
end
data.allnumgens = {}
for _, gender in ipairs(data.allgenders) do
for _, number in ipairs(data.allnumbers) do
table.insert(data.allnumgens, gender .. "_" .. number)
end
end
parse_state_etc_spec(data, args)
local msgs = get_heads(data, args, 'masculine singular')
-- Always do all of these so cases like يَوْم الاِثْنَيْن work.
-- See comment in show_noun().
local fsgs = do_gender_number(data, args, {"f"}, msgs, "f", true, "sg")
local mdus = do_gender_number(data, args, {"d"}, msgs, "d", false, "du")
local fdus = do_gender_number(data, args, {"fd"}, fsgs, "d", true, "du")
local mpls = do_gender_number(data, args, {"pl", "cpl"}, msgs,
isadj and "p" or nil, false, "pl")
local fpls = do_gender_number(data, args, {"fpl", "cpl"}, fsgs, "fp",
true, "pl")
if isadj then
parse_number_spec(data, args)
else
determine_noun_numbers(data, args, mpls)
end
-- Generate the singular, dual and plural forms
do_inflections_and_overrides(data, args,
{{msgs, "m_sg"}, {fsgs, "f_sg"}, {mdus, "m_du"}, {fdus, "f_du"},
{mpls, "m_pl"}, {fpls, "f_pl"}})
-- Make the table
if isadj then
return make_adj_table(data)
else
return make_gendered_noun_table(data)
end
end
-- The main entry point for gendered noun tables.
function export.show_gendered_noun(frame)
return show_gendered(frame, false, "noun")
end
-- The main entry point for numeral tables. Same as using show_gendered_noun()
-- with pos=numeral.
function export.show_numeral(frame)
return show_gendered(frame, false, "numeral")
end
-- The main entry point for adjective tables.
function export.show_adj(frame)
return show_gendered(frame, true, "adjective")
end
-- Inflection functions
function do_translit(term)
return (lang:transliterate(term)) or track("cant-translit") and BOGUS_CHAR
end
function split_arabic_tr(term)
if term == "" then
return "", ""
elseif not rfind(term, "/") then
return term, do_translit(term)
else
local splitvals = rsplit(term, "/")
if #splitvals ~= 2 then
error("Must have at most one slash in a combined Arabic/translit expr: '" .. term .. "'")
end
return splitvals[1], splitvals[2]
end
end
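-- Illustration of the ARABIC/TRANSLIT convention handled by split_arabic_tr()
-- above. A minimal standalone sketch, not part of the module: split_combined()
-- is hypothetical and returns nil for the transliteration where the module
-- would auto-transliterate.
--[===[
local function split_combined(term)
    if term == "" then
        return "", ""
    end
    local slash = string.find(term, "/", 1, true)
    if not slash then
        return term, nil
    end
    if string.find(term, "/", slash + 1, true) then
        error("Must have at most one slash: '" .. term .. "'")
    end
    return string.sub(term, 1, slash - 1), string.sub(term, slash + 1)
end

local ar, tr = split_combined("لُورْدَات/lordāt")
assert(ar == "لُورْدَات" and tr == "lordāt")
-- More than one slash is rejected, as in the module.
assert(pcall(split_combined, "a/b/c") == false)
]===]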
function reorder_shadda(word)
-- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
-- replaced with short-vowel+shadda during NFC normalisation, which
-- MediaWiki does for all Unicode strings; however, it makes the
-- detection process inconvenient, so undo it.
word = rsub(word, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1")
return word
end
-- Combine PREFIX, AR/TR, and ENDING in that order. PREFIX and ENDING
-- can be of the form ARABIC/TRANSLIT. The Arabic and translit parts are
-- separated out and grouped together, resulting in a string of the
-- form ARABIC/TRANSLIT (TRANSLIT will always be present, computed
-- automatically if not present in the source). The return value is actually a
-- list of ARABIC/TRANSLIT strings because hamza resolution is applied to
-- ARABIC, which may produce multiple outcomes (all of which will have the
-- same TRANSLIT).
function combine_with_ending(prefix, ar, tr, ending)
local prefixar, prefixtr = split_arabic_tr(prefix)
local endingar, endingtr = split_arabic_tr(ending)
-- When calling hamza_seat(), leave out prefixes, which we expect to be
-- clitics like وَ. (In case the prefix is a separate word, it won't matter
-- whether we include it in the text passed to hamza_seat().)
local allar = hamza_seat(ar .. endingar)
-- Before a vowel-initial ending, convert stem-final -ī to -iy and -ū to -uw
-- in the transliteration, e.g. ...īān becomes ...iyān (cf. kubrī "bridge").
if rfind(endingtr, "^[aeiouāēīōū]") then
if rfind(tr, "ī$") then
tr = rsub(tr, "ī$", "iy")
elseif rfind(tr, "ū$") then
tr = rsub(tr, "ū$", "uw")
end
end
tr = prefixtr .. tr .. endingtr
local allartr = {}
for _, arval in ipairs(allar) do
table.insert(allartr, prefixar .. arval .. "/" .. tr)
end
return allartr
end
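-- Illustration of the transliteration adjustment in combine_with_ending()
-- above: a stem-final ī or ū becomes iy or uw before a vowel-initial ending.
-- A minimal standalone sketch, not part of the module; join_translit() and
-- its helpers are hypothetical and compare bytes directly instead of using
-- rfind()/rsub().
--[===[
local VOWELS = {"a", "e", "i", "o", "u", "ā", "ē", "ī", "ō", "ū"}

local function starts_with_vowel(s)
    for _, v in ipairs(VOWELS) do
        if string.sub(s, 1, #v) == v then
            return true
        end
    end
    return false
end

-- Replace SUFFIX at the end of S with REPL; also report whether it matched.
local function replace_suffix(s, suffix, repl)
    if #s >= #suffix and string.sub(s, -#suffix) == suffix then
        return string.sub(s, 1, #s - #suffix) .. repl, true
    end
    return s, false
end

local function join_translit(tr, endingtr)
    if starts_with_vowel(endingtr) then
        local changed
        tr, changed = replace_suffix(tr, "ī", "iy")
        if not changed then
            tr = replace_suffix(tr, "ū", "uw")
        end
    end
    return tr .. endingtr
end

assert(join_translit("kubrī", "ān") == "kubriyān")
assert(join_translit("kitāb", "ān") == "kitābān")
]===]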
-- Combine PREFIX, STEM/TR and ENDING in that order and insert into the
-- list of items in DATA[KEY], initializing it if empty and making sure
-- not to insert duplicates. ENDING can be a list of endings, which will be
-- distributed over the remaining parts. PREFIX and/or ENDING can be
-- of the form ARABIC/TRANSLIT (the stem is already split into Arabic STEM
-- and Latin TR). Note that what's inserted into DATA[KEY] is actually a
-- list of ARABIC/TRANSLIT strings; if more than one is present in the list,
-- they represent hamza variants, i.e. different ways of writing a hamza
-- sound, such as مُبْتَدَؤُون vs. مُبْتَدَأُون (see init_data()).
function add_inflection(data, key, prefix, stem, tr, ending)
if data.forms[key] == nil then
data.forms[key] = {}
end
if type(ending) ~= "table" then
ending = {ending}
end
for _, endingval in ipairs(ending) do
insert_if_not(data.forms[key],
combine_with_ending(prefix, stem, tr, endingval))
end
end
-- Form inflections from combination of STEM, with transliteration TR,
-- and ENDINGS (and definite article where necessary, plus any specified
-- prefixes) and store in DATA, for the number or gender/number
-- determined by MOD ("", "mod_", "mod2_", etc.; see call_inflection()) and
-- NUMGEN ("sg", "du", "pl", or "m_sg", "f_pl", etc. for adjectives). ENDINGS
-- is an array of 15 values, each of which is a string or array of
-- alternatives. The order of ENDINGS is indefinite nom, acc, gen; definite
-- nom, acc, gen; construct-state nom, acc, gen; informal indefinite, definite,
-- construct; lemma indefinite, definite, construct. (Normally the lemma is
-- based off of the indefinite, but if the inflection has been restricted to
-- particular states, it comes from one of those states, in the order
-- indefinite, definite, construct.) See also add_inflection() for more info
-- on exactly what is inserted into DATA.
function add_inflections(stem, tr, data, mod, numgen, endings)
stem = canon_hamza(stem)
assert(#endings == 15)
local ismod = mod ~= ""
-- If working on modifier and modN_numgen= is given, it better agree with
-- NUMGEN; the case where it doesn't agree should have been caught in
-- call_inflections().
if ismod and data[mod .. "numgen"] then
assert(data[mod .. "numgen"] == numgen)
end
-- Each form stored by add_inflection() is a list of combined ar/tr strings
-- with the ending tacked on. There may be more than one because of alternative
-- hamza seats, e.g. مُبْتَدَؤُون or مُبْتَدَأُون (mubtadaʾūn "(grammatical) subjects").
local defstem, deftr
if stem == "?" or data[mod .. "omitarticle"] then
defstem = stem
deftr = tr
else
-- apply sun-letter assimilation and hamzat al-wasl elision
defstem = rsub("الْ" .. stem, "^الْ([سشصتثطدذضزژظنرل])", "ال%1ّ")
defstem = rsub(defstem, "^الْ([اٱ])([ًٌٍَُِ])", "ال%2%1")
deftr = rsub("al-" .. tr, "^al%-([sšṣtṯṭdḏḍzžẓnrḷ])", "a%1-%1")
end
-- For a given MOD spec, is the previous word (base or modifier) a noun?
-- We assume the base is always a noun in this case, and otherwise
-- look at the value of modN_idafa.
local function prev_mod_is_noun(mod)
if mod == "mod_" then
return true
end
if mod == "mod2_" then
return data["mod_idafa"]
end
local modnum = assert_rsub(mod, "^mod([0-9]+)_$", "%1")
modnum = modnum - 1
return data["mod" .. modnum .. "_idafa"]
end
local numgens = ismod and data[mod .. "numgen"] and data.numgens() or {numgen}
-- "defcon" means definite adjective modifying construct state noun. We
-- add a ... before the adjective (and after the construct-state noun) to
-- indicate that a nominal modifier would go between noun and adjective.
local stems = {ind = stem, def = defstem, con = stem,
defcon = "... " .. defstem}
local trs = {ind = tr, def = deftr, con = tr, defcon = "... " .. deftr}
for _, ng in ipairs(numgens) do
for _, state in ipairs(data.allstates) do
for _, case in ipairs(data.allcases_with_lemma) do
-- We are generating the inflections for STATE, but sometimes
-- we want to use the inflected form of a different state, e.g.
-- if modN_state= or basestate= is set to some particular state.
-- If we're dealing with an adjectival modifier, then in
-- place of "con" we use "defcon" if immediately after a noun
-- (see comment above), else "def".
local thestate = ismod and data[mod .. "state"] or
ismod and not data[mod .. "idafa"] and state == "con" and
(prev_mod_is_noun(mod) and "defcon" or "def") or
not ismod and data.basestate or
state
local is_lemmainf = case == "lemma" or case == "inf"
-- Don't substitute value of modcase for lemma/informal "cases"
local thecase = is_lemmainf and case or
ismod and data[mod .. "case"] or case
add_inflection(data, mod .. case .. "_" .. ng .. "_" .. state,
data[mod .. "prefix"] or "",
stems[thestate], trs[thestate],
endings[data.statecases[thestate][thecase]])
end
end
end
end
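-- Illustration of the sun-letter assimilation applied to the transliteration
-- of the definite article in add_inflections() above. A minimal standalone
-- sketch, not part of the module; def_translit() is hypothetical and matches
-- the initial letter by byte prefix instead of using rsub().
--[===[
local SUN_LETTERS = {"s", "š", "ṣ", "t", "ṯ", "ṭ", "d", "ḏ", "ḍ", "z", "ž",
    "ẓ", "n", "r", "ḷ"}

local function def_translit(tr)
    for _, letter in ipairs(SUN_LETTERS) do
        if string.sub(tr, 1, #letter) == letter then
            -- al- + sun letter: the l assimilates to the following consonant
            return "a" .. letter .. "-" .. tr
        end
    end
    return "al-" .. tr
end

assert(def_translit("šams") == "aš-šams")
assert(def_translit("qamar") == "al-qamar")
]===]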
-- Insert into a category and a type variable (e.g. m_sg_type) for the
-- declension type of a particular declension (e.g. masculine singular for
-- adjectives). MOD and NUMGEN are as in call_inflection(). CATVALUE is the
-- category and ENGVALUE is the English description of the declension type.
-- In these values, NOUN is replaced with either "noun" or "adjective",
-- SINGULAR is replaced with the English equivalent of the number in NUMGEN
-- (e.g. "singular", "dual" or "plural") while BROKSING is the same but uses
-- "broken plural" in place of "plural" and "broken paucal" in place of
-- "paucal".
function insert_cat(data, mod, numgen, catvalue, engvalue)
local singpl = data.engnumbers[rsub(numgen, "^.*_", "")]
assert(singpl ~= nil)
local broksingpl = rsub(singpl, "plural", "broken plural")
broksingpl = rsub(broksingpl, "paucal", "broken paucal")
if rfind(broksingpl, "broken plural") and (rfind(catvalue, "BROKSING") or
rfind(engvalue, "BROKSING")) then
table.insert(data.categories, "Arabic " .. data.pos .. "s with broken plural")
end
if rfind(catvalue, "irregular") or rfind(engvalue, "irregular") then
table.insert(data.categories, "Arabic irregular " .. data.pos .. "s")
end
catvalue = rsub(catvalue, "NOUN", data.pos)
catvalue = rsub(catvalue, "SINGULAR", singpl)
catvalue = rsub(catvalue, "BROKSING", broksingpl)
engvalue = rsub(engvalue, "NOUN", data.pos)
engvalue = rsub(engvalue, "SINGULAR", singpl)
engvalue = rsub(engvalue, "BROKSING", broksingpl)
-- add links to specialised grammatical terms
engvalue = rsub(engvalue, "triptote", "[[triptote]]")
engvalue = rsub(engvalue, "diptote", "[[diptote]]")
engvalue = rsub(engvalue, "broken plural", "BBB")
engvalue = rsub(engvalue, "sound plural", "SSS")
engvalue = rsub(engvalue, "broken", "[[broken plural|broken]]")
engvalue = rsub(engvalue, "sound", "[[sound plural|sound]]")
engvalue = rsub(engvalue, "BBB", "[[broken plural]]")
engvalue = rsub(engvalue, "SSS", "[[sound plural]]")
if mod == "" and catvalue ~= "" then
insert_if_not(data.categories, catvalue)
end
if engvalue ~= "" then
local key = mod .. numgen .. "_type"
if data.forms[key] == nil then
data.forms[key] = {}
end
insert_if_not(data.forms[key], engvalue)
end
if contains(data.states, "def") and not contains(data.states, "ind") then
insert_if_not(data.categories, "Arabic definite " .. data.pos .. "s")
end
end
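-- Illustration of the placeholder substitution performed by insert_cat()
-- above. A minimal standalone sketch, not part of the module: expand() is
-- hypothetical, and the English number name is passed in directly where the
-- module looks it up in data.engnumbers.
--[===[
local function expand(template, pos, singpl)
    local broksingpl = string.gsub(singpl, "plural", "broken plural")
    broksingpl = string.gsub(broksingpl, "paucal", "broken paucal")
    local result = string.gsub(template, "NOUN", pos)
    result = string.gsub(result, "SINGULAR", singpl)
    result = string.gsub(result, "BROKSING", broksingpl)
    return result
end

assert(expand("Arabic NOUNs with irregular SINGULAR", "noun", "plural")
    == "Arabic nouns with irregular plural")
assert(expand("Arabic NOUNs with triptote BROKSING", "adjective", "plural")
    == "Arabic adjectives with triptote broken plural")
assert(expand("Arabic NOUNs with triptote BROKSING", "noun", "singular")
    == "Arabic nouns with triptote singular")
]===]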
-- Return true if we're handling modifier inflections and the modifier's
-- case is limited to an oblique case (gen or acc; typically genitive,
-- in an ʾiḍāfa construction). This is used when returning lemma
-- inflections -- the modifier part of the lemma should agree in case
-- with modifier's case if it's restricted in case.
function mod_oblique(mod, data)
return mod ~= "" and data[mod .. "case"] and (
data[mod .. "case"] == "acc" or data[mod .. "case"] == "gen")
end
-- Similar to mod_oblique but specifically when the modifier case is
-- limited to the accusative (which is rare or nonexistent in practice).
function mod_acc(mod, data)
return mod ~= "" and data[mod .. "case"] and data[mod .. "case"] == "acc"
end
-- Handle triptote and diptote inflections
function triptote_diptote(stem, tr, data, mod, numgen, is_dip, lc)
-- Remove any case ending
if rfind(stem, "[" .. UN .. U .. "]$") then
stem = rsub(stem, "[" .. UN .. U .. "]$", "")
tr = rsub(tr, "un?$", "")
end
-- special-case for صلوة pronounced ṣalāh; check translit
local is_aah = rfind(stem, TAM .. "$") and rfind(tr, "āh$")
if rfind(stem, TAM .. "$") then
if rfind(tr, "h$") then
tr = rsub(tr, "h$", "t")
elseif not rfind(tr, "t$") then
tr = tr .. "t"
end
end
add_inflections(stem, tr, data, mod, numgen,
{is_dip and U or UN,
is_dip and A or AN .. ((rfind(stem, "[" .. HAMZA_ON_ALIF .. TAM .. "]$")
or rfind(stem, "[" .. AMAD .. ALIF .. "]" .. HAMZA .. "$")
) and "" or ALIF),
is_dip and A or IN,
U, A, I,
lc and UU or U,
lc and AA or A,
lc and II or I,
{}, {}, {}, -- omit informal inflections
{}, {}, {}, -- omit lemma inflections
})
-- add category and informal and lemma inflections
local tote = lc and "long construct" or is_dip and "diptote" or "triptote"
local singpl_tote = "BROKSING " .. tote
local cat_prefix = "Arabic NOUNs with " .. tote .. " BROKSING"
-- since we're checking translit for -āh we probably don't need to
-- check stem too
if is_aah or rfind(stem, "[" .. AMAD .. ALIF .. "]" .. TAM .. "$") then
add_inflections(stem, rsub(tr, "t$", ""), data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
"/t", "/t", "/t", -- informal pron. is -āt
"/h", "/h", "/t", -- lemma uses -āh
})
insert_cat(data, mod, numgen, cat_prefix .. " in -āh",
singpl_tote .. " in " .. make_link(HYPHEN .. AAH))
elseif rfind(stem, TAM .. "$") then
add_inflections(stem, rsub(tr, "t$", ""), data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
"", "", "/t",
"", "", "/t",
})
insert_cat(data, mod, numgen, cat_prefix .. " in -a",
singpl_tote .. " in " .. make_link(HYPHEN .. AH))
elseif lc then
add_inflections(stem, tr, data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
"", "", UU,
"", "", UU,
})
insert_cat(data, mod, numgen, cat_prefix,
singpl_tote)
else
-- also special-case the nisba ending, which has an informal
-- pronunciation.
if rfind(stem, IY .. SH .. "$") then
local infstem = rsub(stem, SH .. "$", "")
local inftr = rsub(tr, "iyy$", "ī")
-- add informal and lemma inflections separately
add_inflections(infstem, inftr, data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
"", "", "",
{}, {}, {},
})
add_inflections(stem, tr, data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
{}, {}, {},
"", "", "",
})
else
add_inflections(stem, tr, data, mod, numgen,
{{}, {}, {},
{}, {}, {},
{}, {}, {},
"", "", "",
"", "", "",
})
end
insert_cat(data, mod, numgen, "Arabic NOUNs with basic " .. tote .. " BROKSING",
"basic " .. singpl_tote)
end
end
-- Regular triptote
inflections["tri"] = function(stem, tr, data, mod, numgen)
triptote_diptote(stem, tr, data, mod, numgen, false)
end
-- Regular diptote
inflections["di"] = function(stem, tr, data, mod, numgen)
triptote_diptote(stem, tr, data, mod, numgen, true)
end
-- Elative and color/defect adjective: usually same as diptote,
-- might be invariable
function elative_color_defect(stem, tr, data, mod, numgen)
if rfind(stem, "[" .. ALIF .. AMAQ .. "]$") then
invariable(stem, tr, data, mod, numgen)
else
triptote_diptote(stem, tr, data, mod, numgen, true)
end
end
-- Elative: usually same as diptote, might be invariable
inflections["el"] = function(stem, tr, data, mod, numgen)
elative_color_defect(stem, tr, data, mod, numgen)
end
-- Color/defect adjective: Same as elative
inflections["cd"] = function(stem, tr, data, mod, numgen)
elative_color_defect(stem, tr, data, mod, numgen)
end
-- Triptote with lengthened ending in the construct state
inflections["lc"] = function(stem, tr, data, mod, numgen)
triptote_diptote(stem, tr, data, mod, numgen, false, true)
end
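-- Defective in -in. TRI selects the "triptote" variant, whose indefinite
-- accusative ends in -iyan; otherwise the "diptote" variant (indefinite
-- accusative in -iya) is used.
function in_defective(stem, tr, data, mod, numgen, tri)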
if not rfind(stem, IN .. "$") then
error("'in' declension stem should end in -in: '" .. stem .. "'")
end
stem = rsub(stem, IN .. "$", "")
tr = rsub(tr, "in$", "")
local acc_ind_ending = tri and IY .. AN .. ALIF or IY .. A
add_inflections(stem, tr, data, mod, numgen,
{IN, acc_ind_ending, IN,
II, IY .. A, II,
II, IY .. A, II,
II, II, II,
-- FIXME: What should happen with the lemma when modifier case
-- is limited to the accusative and modifier state is e.g. definite?
-- Should the lemma end in -iya or -ī? In practice this will rarely
-- if ever happen.
mod_acc(mod, data) and acc_ind_ending or IN, II, II,
})
local tote = tri and "triptote" or "diptote"
insert_cat(data, mod, numgen, "Arabic NOUNs with " .. tote .. " BROKSING in -in",
"BROKSING " .. tote .. " in " .. make_link(HYPHEN .. IN))
end
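-- Pick the variant of a defective -in stem: plurals of the layālin pattern
-- are "diin" (diptote), all other -in words are "triin" (triptote).
function detect_in_type(stem, ispl)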
if ispl and rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. IN .. "$") then -- layālin
return "diin"
else -- other -in words
return "triin"
end
end
-- Defective in -in
inflections["in"] = function(stem, tr, data, mod, numgen)
in_defective(stem, tr, data, mod, numgen,
detect_in_type(stem, rfind(numgen, "pl")) == "triin")
end
-- Defective in -in, force "triptote" variant
inflections["triin"] = function(stem, tr, data, mod, numgen)
in_defective(stem, tr, data, mod, numgen, true)
end
-- Defective in -in, force "diptote" variant
inflections["diin"] = function(stem, tr, data, mod, numgen)
in_defective(stem, tr, data, mod, numgen, false)
end
-- Defective in -an (comes in two variants, depending on spelling with tall alif or alif maqṣūra)
inflections["an"] = function(stem, tr, data, mod, numgen)
local tall_alif
if rfind(stem, AN .. ALIF .. "$") then
tall_alif = true
stem = rsub(stem, AN .. ALIF .. "$", "")
elseif rfind(stem, AN .. AMAQ .. "$") then
tall_alif = false
stem = rsub(stem, AN .. AMAQ .. "$", "")
else
error("Invalid stem for 'an' declension type: " .. stem)
end
tr = rsub(tr, "an$", "")
if tall_alif then
add_inflections(stem, tr, data, mod, numgen,
{AN .. ALIF, AN .. ALIF, AN .. ALIF,
AA, AA, AA,
AA, AA, AA,
AA, AA, AA,
AN .. ALIF, AA, AA,
})
else
add_inflections(stem, tr, data, mod, numgen,
{AN .. AMAQ, AN .. AMAQ, AN .. AMAQ,
AAMAQ, AAMAQ, AAMAQ,
AAMAQ, AAMAQ, AAMAQ,
AAMAQ, AAMAQ, AAMAQ,
AN .. AMAQ, AAMAQ, AAMAQ,
})
end
-- FIXME: Should we distinguish between tall alif and alif maqṣūra?
insert_cat(data, mod, numgen, "Arabic NOUNs with BROKSING in -an",
"BROKSING in " .. make_link(HYPHEN .. AN .. (tall_alif and ALIF or AMAQ)))
end
function invariable(stem, tr, data, mod, numgen)
add_inflections(stem, tr, data, mod, numgen,
{"", "", "",
"", "", "",
"", "", "",
"", "", "",
"", "", "",
})
insert_cat(data, mod, numgen, "Arabic NOUNs with invariable BROKSING",
"BROKSING invariable")
end
-- Invariable in -ā (non-loanword type)
inflections["inv"] = function(stem, tr, data, mod, numgen)
invariable(stem, tr, data, mod, numgen)
end
-- Invariable in -ā (loanword type, behaving in the dual as if ending in -a, I think!)
inflections["lwinv"] = function(stem, tr, data, mod, numgen)
invariable(stem, tr, data, mod, numgen)
end
-- Duals
inflections["d"] = function(stem, tr, data, mod, numgen)
if rfind(stem, ALIF .. NI .. "?$") then
stem = rsub(stem, AOPTA .. NI .. "?$", "")
elseif rfind(stem, AMAD .. NI .. "?$") then
stem = rsub(stem, AMAD .. NI .. "?$", HAMZA_PH)
else
error("Dual stem should end in -ān(i): '" .. stem .. "'")
end
tr = rsub(tr, "āni?$", "")
local mo = mod_oblique(mod, data)
add_inflections(stem, tr, data, mod, numgen,
{AANI, AYNI, AYNI,
AANI, AYNI, AYNI,
AA, AYSK, AYSK,
AYN, AYN, AYSK,
mo and AYN or AAN, mo and AYN or AAN, mo and AYSK or AA,
})
insert_cat(data, mod, numgen, "", "dual in " .. make_link(HYPHEN .. AANI))
end
-- Sound masculine plural
inflections["smp"] = function(stem, tr, data, mod, numgen)
if not rfind(stem, UUNA .. "?$") then
error("Sound masculine plural stem should end in -ūn(a): '" .. stem .. "'")
end
stem = rsub(stem, UUNA .. "?$", "")
tr = rsub(tr, "ūna?$", "")
local mo = mod_oblique(mod, data)
add_inflections(stem, tr, data, mod, numgen,
{UUNA, IINA, IINA,
UUNA, IINA, IINA,
UU, II, II,
IIN, IIN, II,
mo and IIN or UUN, mo and IIN or UUN, mo and II or UU,
})
-- use SINGULAR because conceivably this might be used with the paucal
-- instead of plural
insert_cat(data, mod, numgen, "Arabic NOUNs with sound masculine SINGULAR",
"sound masculine SINGULAR")
end
-- Sound feminine plural
inflections["sfp"] = function(stem, tr, data, mod, numgen)
if not rfind(stem, "[" .. ALIF .. AMAD .. "]" .. T .. UN .. "?$") then
error("Sound feminine plural stem should end in -āt(un): '" .. stem .. "'")
end
stem = rsub(stem, UN .. "$", "")
tr = rsub(tr, "un$", "")
add_inflections(stem, tr, data, mod, numgen,
{UN, IN, IN,
U, I, I,
U, I, I,
"", "", "",
"", "", "",
})
-- use SINGULAR because this might be used with the paucal
-- instead of plural
insert_cat(data, mod, numgen, "Arabic NOUNs with sound feminine SINGULAR",
"sound feminine SINGULAR")
end
-- Plural of defective in -an
inflections["awnp"] = function(stem, tr, data, mod, numgen)
if not rfind(stem, AWNA .. "?$") then
error("'awnp' plural stem should end in -awn(a): '" .. stem .. "'")
end
stem = rsub(stem, AWNA .. "?$", "")
tr = rsub(tr, "awna?$", "")
local mo = mod_oblique(mod, data)
add_inflections(stem, tr, data, mod, numgen,
{AWNA, AYNA, AYNA,
AWNA, AYNA, AYNA,
AWSK, AYSK, AYSK,
AYN, AYN, AYSK,
mo and AYN or AWN, mo and AYN or AWN, mo and AYSK or AWSK,
})
-- use SINGULAR because conceivably this might be used with the paucal
-- instead of plural
insert_cat(data, mod, numgen, "Arabic NOUNs with sound SINGULAR in -awna",
"sound SINGULAR in " .. make_link(HYPHEN .. AWNA))
end
-- Unknown
inflections["?"] = function(stem, tr, data, mod, numgen)
add_inflections("?", "?", data, mod, numgen,
{"", "", "",
"", "", "",
"", "", "",
"", "", "",
"", "", "",
})
insert_cat(data, mod, numgen, "Arabic NOUNs with unknown SINGULAR",
"SINGULAR unknown")
end
-- Detect declension of noun or adjective stem or lemma. We allow triptotes,
-- diptotes and sound plurals to either come with ʾiʿrāb or not. We detect
-- some cases where vowels are missing, when it seems fairly unambiguous to
-- do so. ISFEM is true if we are dealing with a feminine stem (not
-- currently used and needs to be rethought). NUM is "sg", "du", or "pl",
-- depending on the number of the stem.
--
-- POS is the part of speech, generally "noun" or "adjective". Used to
-- distinguish nouns and adjectives of the فَعْلَان type. There are nouns of
-- this type and they generally are triptotes, e.g. قَطْرَان "tar"
-- and شَيْطَان "devil". An additional complication is that the user can set
-- the POS to something else, like "numeral". We don't use this POS for
-- modifiers, where we determine whether they are noun-like or adjective-like
-- according to whether mod_idafa= is true.
--
-- Some unexpectedly diptote nouns/adjectives:
--
-- jiʿrān in ʾabū jiʿrān "dung beetle"
-- distributive numbers: ṯunāʾ "two at a time", ṯulāṯ/maṯlaṯ "three at a time",
-- rubāʿ "four at a time" (not a regular diptote pattern, cf. triptote
-- junāḥ "misdemeanor, sin", nujār "origin, root", nuḥām "flamingo")
-- jahannam (f.) "hell"
-- many names: jilliq/jillaq "Damascus", judda/jidda "Jedda", jibrīl (and
-- variants) "Gabriel", makka "Mecca", etc.
-- jibriyāʾ "pride"
-- kibriyāʾ "glory, pride"
-- babbaḡāʾ "parrot"
-- ʿayāyāʾ "incapable, tired"
-- suwaidāʾ "black bile, melancholy"
-- Note also: ʾajhar "day-blind" (color-defect) and ʾajhar "louder" (elative)
function export.detect_type(stem, isfem, num, pos)
local function dotrack(word)
track(word)
track(word .. "/" .. pos)
return true
end
-- Not strictly necessary because the caller (stem_and_type) already
-- reorders, but won't hurt, and may be necessary if this function is
-- called from an external caller.
stem = reorder_shadda(stem)
local origstem = stem
-- So that we don't get tripped up by alif madda, we replace alif madda
-- with the sequence hamza + fatḥa + alif before the regexps below.
stem = rsub(stem, AMAD, HAMZA .. AA)
if num == "du" then
if rfind(stem, ALIF .. NI .. "?$") then
return "d"
else
error("Malformed stem for dual, should end in the nominative dual ending -ān(i): '" .. origstem .. "'")
end
end
if rfind(stem, IN .. "$") then -- -in words
return detect_in_type(stem, num == "pl")
elseif rfind(stem, AN .. "[" .. ALIF .. AMAQ .. "]$") then
return "an"
elseif rfind(stem, AN .. "$") then
error("Malformed stem, fatḥatan should be over second-to-last letter: " .. origstem)
elseif num == "pl" and rfind(stem, AW .. SKOPT .. N .. AOPT .. "$") then
return "awnp"
elseif num == "pl" and rfind(stem, ALIF .. T .. UNOPT .. "$") and
-- Avoid getting tripped up by plurals like ʾawqāt "times",
-- ʾaḥwāt "fishes", ʾabyāt "verses", ʾazyāt "oils", ʾaṣwāt "voices",
-- ʾamwāt "dead (pl.)".
not rfind(stem, HAMZA_ON_ALIF .. A .. CONS .. SK .. CONS .. AAT .. UNOPT .. "$") then
return "sfp"
elseif num == "pl" and rfind(stem, W .. N .. AOPT .. "$") and
-- Avoid getting tripped up by plurals like ʿuyūn "eyes",
-- qurūn "horns" (note we check for U between first two consonants
-- so we correctly ignore cases like sinūn "years" (from sana),
-- riʾūn "lungs" (from riʾa) and banūn "sons" (from ibn)).
not rfind(stem, "^" .. CONS .. U .. CONS .. UUN .. AOPT .. "$") then
return "smp"
elseif rfind(stem, UN .. "$") then -- explicitly specified triptotes (we catch sound feminine plurals above)
return "tri"
elseif rfind(stem, U .. "$") then -- explicitly specified diptotes
return "di"
elseif -- num == "pl" and
( -- various diptote plural patterns; these are diptote even in the singular (e.g. yanāyir "January", falāfil "falafel", tuʾabāʾ "yawn, fatigue").
-- Currently we sometimes end up with such plural patterns in the "singular" in a singular
-- ʾidāfa construction with plural modifier. (FIXME: These should be fixed to the correct number.)
rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. IOPT .. Y .. "?" .. CONS .. "$") and dotrack("fawaakih") or -- fawākih, daqāʾiq, makātib, mafātīḥ
rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. SH .. "$")
and not rfind(stem, "^" .. T) and dotrack("mawaadd") or -- mawādd, maqāmm, ḍawāll; exclude t- so we don't catch form-VI verbal nouns like taḍādd (HACK!!!)
rfind(stem, "^" .. CONS .. U .. CONS .. AOPT .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("wuzaraa") or -- wuzarāʾ "ministers", juhalāʾ "ignorant (pl.)"
rfind(stem, ELCD_START .. SKOPT .. CONS .. IOPT .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("asdiqaa") or -- ʾaṣdiqāʾ
rfind(stem, ELCD_START .. IOPT .. CONS .. SH .. AOPTA .. HAMZA .. "$") and dotrack("aqillaa") -- ʾaqillāʾ, ʾajillāʾ "important (pl.)", ʾaḥibbāʾ "lovers"
) then
return "di"
elseif num == "sg" and ( -- diptote singular patterns (nouns/adjectives)
rfind(stem, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("qamraa") or -- qamrāʾ "moon-white, moonlight"; baydāʾ "desert"; ṣaḥrāʾ "desert-like, desert"; tayhāʾ "trackless, desolate region"; not pl. to avoid catching e.g. ʾabnāʾ "sons", ʾaḥmāʾ "fathers-in-law", ʾamlāʾ "steppes, deserts" (pl. of malan), ʾanbāʾ "reports" (pl. of nabaʾ)
rfind(stem, ELCD_START .. SK .. CONS .. A .. CONS .. "$") and dotrack("abyad") or -- ʾabyaḍ "white", ʾakbar "greater"; FIXME nouns like ʾaʿzab "bachelor", ʾaḥmad "Ahmed" but not ʾarnab "rabbit", ʾanjar "anchor", ʾabjad "abjad", ʾarbaʿ "four", ʾandar "threshing floor" (cf. diptote ʾandar "rarer")
rfind(stem, ELCD_START .. A .. CONS .. SH .. "$") and dotrack("alaff") or -- ʾalaff "plump", ʾaḥabb "more desirable"
-- do the following on the origstem so we can check specifically for alif madda
rfind(origstem, "^" .. AMAD .. CONS .. A .. CONS .. "$") and dotrack("aalam") -- ʾālam "more painful", ʾāḵar "other"
) then
return "di"
elseif num == "sg" and pos == "adjective" and ( -- diptote singular patterns (adjectives)
rfind(stem, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. N .. "$") and dotrack("kaslaan") or -- kaslān "lazy", ʿaṭšān "thirsty", jawʿān "hungry", ḡaḍbān "angry", tayhān "wandering, perplexed"; but not nouns like qaṭrān "tar", šayṭān "devil", mawtān "plague", maydān "square"
-- rfind(stem, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. N .. "$") and dotrack("laffaa") -- excluded because of too many false positives e.g. ḵawwān "disloyal", not to mention nouns like jannān "gardener"; only diptote example I can find is ʿayyān "incapable, weary" (diptote per Lane but not Wehr)
rfind(stem, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. HAMZA .. "$") and dotrack("laffaa") -- laffāʾ "plump (fem.)"; but not nouns like jarrāʾ "runner", ḥaddāʾ "camel driver", lawwāʾ "wryneck"
) then
return "di"
elseif rfind(stem, AMAQ .. "$") then -- kaslā, ḏikrā (spelled with alif maqṣūra)
return "inv"
elseif rfind(stem, "[" .. ALIF .. SK .. "]" .. Y .. AOPTA .. "$") then -- dunyā, hadāyā (spelled with tall alif after yāʾ)
return "inv"
elseif rfind(stem, ALIF .. "$") then -- kāmērā, lībiyā (spelled with tall alif; we catch dunyā and hadāyā above)
return "lwinv"
elseif rfind(stem, II .. "$") then -- cases like كُوبْرِي kubrī "bridge" and صَوَانِي ṣawānī pl. of ṣīniyya; modern words that would probably end with -in
dotrack("ii")
return "inv"
elseif rfind(stem, UU .. "$") then -- FIXME: Does this occur? Check the tracking
dotrack("uu")
return "inv"
else
return "tri"
end
end
-- Replace hamza (of any sort) at the end of a word, possibly followed by
-- a nominative case ending or -in or -an, with HAMZA_PH, and replace alif
-- madda at the end of a word with HAMZA_PH plus fatḥa + alif. To undo these
-- changes, use hamza_seat().
function canon_hamza(word)
word = rsub(word, AMAD .. "$", HAMZA_PH .. AA)
word = rsub(word, HAMZA_ANY .. "([" .. UN .. U .. IN .. "]?)$", HAMZA_PH .. "%1")
word = rsub(word, HAMZA_ANY .. "(" .. AN .. "[" .. ALIF .. AMAQ .. "])$", HAMZA_PH .. "%1")
return word
end
-- Supply the appropriate hamza seat(s) for a placeholder hamza.
function hamza_seat(word)
if rfind(word, HAMZA_PH) then -- optimization to avoid many regexp substs
return ar_utilities.process_hamza(word)
end
return {word}
end
--[[
-- Supply the appropriate hamza seat for a placeholder hamza in a combined
-- Arabic/transliteration expression.
function split_and_hamza_seat(word)
if rfind(word, HAMZA_PH) then -- optimization to avoid many regexp substs
local ar, tr = split_arabic_tr(word)
-- FIXME: Do something with all values returned
ar = ar_utilities.process_hamza(ar)[1]
return ar .. "/" .. tr
end
return word
end
--]]
-- Return stem and type of an argument given the singular stem and whether
-- this is a plural argument. WORD may be of the form ARABIC, ARABIC/TR,
-- ARABIC:TYPE, ARABIC/TR:TYPE, or TYPE, for Arabic stem ARABIC with
-- transliteration TR and of type (i.e. declension) TYPE. If the type
-- is omitted, it is auto-detected using detect_type(). If the transliteration
-- is omitted, it is auto-transliterated from the Arabic. If only the type
-- is present, it is an inferred type (e.g. sound plural "sfp", "smp" or "awnp"),
-- in which case the stem and translit are generated from the singular by
-- regular rules. SG may be of the form ARABIC/TR or ARABIC. ISFEM is true
-- if WORD is a feminine stem. NUM is either "sg", "du" or "pl" according to
-- the number of the stem. The return value will be in the ARABIC/TR format.
--
-- POS is the part of speech, generally "noun" or "adjective". Used to
-- distinguish nouns and adjectives of the فَعْلَان type. There are nouns of
-- this type and they generally are triptotes, e.g. قَطْرَان "tar"
-- and شَيْطَان "devil". An additional complication is that the user can set
-- the POS to something else, like "numeral". We don't use this POS for
-- modifiers, where we determine whether they are noun-like or adjective-like
-- according to whether mod_idafa= is true.
function export.stem_and_type(word, sg, sgtype, isfem, num, pos)
local rettype = nil
if rfind(word, ":") then
local split = rsplit(word, ":")
if #split > 2 then
error("More than one colon found in argument: '" .. word .. "'")
end
word, rettype = split[1], split[2]
end
local ar, tr = split_arabic_tr(word)
-- Need to reorder shaddas here so that shadda at the end of a stem
-- followed by ʾiʿrāb or a plural ending or whatever can get processed
-- correctly. This processing happens in various places so make sure
-- we return the reordered Arabic in all circumstances.
ar = reorder_shadda(ar)
local artr = ar .. "/" .. tr
-- Now return split-out ARABIC/TR and TYPE, with shaddas reordered in
-- the Arabic.
if rettype then
return artr, rettype
end
-- Likewise, do shadda reordering for the singular.
local sgar, sgtr = split_arabic_tr(sg)
sgar = reorder_shadda(sgar)
-- Apply a substitution to the singular Arabic and translit. If a
-- substitution could be made, return the combined ARABIC/TR with
-- substitutions made; else, return nil. The Arabic has ARFROM
-- replaced with ARTO, while the translit has TRFROM replaced with
-- TRTO; if that doesn't match, TRFROM2 is replaced with TRTO2, and
-- failing that, TRFROM3 with TRTO3.
local function sub(arfrom, arto, trfrom, trto, trfrom2, trto2, trfrom3, trto3)
if rfind(sgar, arfrom) then
local arret = rsub(sgar, arfrom, arto)
local trret = sgtr
if rfind(sgtr, trfrom) then
trret = rsub(sgtr, trfrom, trto)
elseif trfrom2 and rfind(sgtr, trfrom2) then
trret = rsub(sgtr, trfrom2, trto2)
elseif trfrom3 and rfind(sgtr, trfrom3) then
trret = rsub(sgtr, trfrom3, trto3)
elseif not rfind(sgtr, BOGUS_CHAR) then
error("Transliteration '" .. sgtr .. "' does not have same ending as Arabic '" .. sgar .. "'")
end
return arret .. "/" .. trret
else
return nil
end
end
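--[[
-- Illustrative sketch only, not part of the module: the translit fallback
-- chain used by sub() above, reduced to plain Lua string functions. The
-- name sub_translit and the sample words are hypothetical, and this sketch
-- returns nil where sub() raises an error (or tolerates BOGUS_CHAR).
local function sub_translit(tr, ...)
	local rules = {...}
	-- Try each (pattern, replacement) pair in order; the first pattern
	-- that matches wins.
	for i = 1, #rules, 2 do
		local from, to = rules[i], rules[i + 1]
		if tr:find(from) then
			-- Extra parentheses drop gsub's second return value (the count).
			return (tr:gsub(from, to))
		end
	end
	return nil
end
-- Both spellings of the singular reach the same plural translit:
-- sub_translit("madrasa", "a$", "āt", "atun?$", "āt")    --> "madrasāt"
-- sub_translit("madrasatun", "a$", "āt", "atun?$", "āt") --> "madrasāt"
--]]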
if (num ~= "sg" or not isfem) and (word == "elf" or word == "cdf" or word == "intf" or word == "rf" or word == "f") then
error("Inference of form for inflection type '" .. word .. "' only allowed in singular feminine")
end
if num ~= "du" and word == "d" then
error("Inference of form for inflection type '" .. word .. "' only allowed in dual")
end
if num ~= "pl" and (word == "sfp" or word == "smp" or word == "awnp" or word == "cdp" or word == "sp" or word == "fp" or word == "p") then
error("Inference of form for inflection type '" .. word .. "' only allowed in plural")
end
local function is_intensive_adj(ar)
return rfind(ar, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. N .. UOPT .. "$") or
rfind(ar, "^" .. CONS .. A .. CONS .. SK .. AMAD .. N .. UOPT .. "$") or
rfind(ar, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. N .. UOPT .. "$")
end
local function is_feminine_cd_adj(ar)
return pos == "adjective" and
(rfind(ar, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. HAMZA .. UOPT .. "$") or -- ḥamrāʾ/ʿamyāʾ/bayḍāʾ
rfind(ar, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. HAMZA .. UOPT .. "$") -- laffāʾ
)
end
local function is_elcd_adj(ar)
return rfind(ar, ELCD_START .. SK .. CONS .. A .. CONS .. UOPT .. "$") or -- ʾabyaḍ "white", ʾakbar "greater"
rfind(ar, ELCD_START .. A .. CONS .. SH .. UOPT .. "$") or -- ʾalaff "plump", ʾaqall "fewer"
rfind(ar, ELCD_START .. SK .. CONS .. AAMAQ .. "$") or -- ʾaʿmā "blind", ʾadnā "lower"
rfind(ar, "^" .. AMAD .. CONS .. A .. CONS .. UOPT .. "$") -- ʾālam "more painful", ʾāḵar "other"
end
if word == "?" or
(rfind(word, "^[a-z][a-z]*$") and sgtype == "?") then
--if 'word' is a type, actual value inferred from sg; if sgtype is ?,
--propagate it to all derived types
return "", "?"
end
if word == "intf" then
if not is_intensive_adj(sgar) then
error("Singular stem not in CACCān form: " .. sgar)
end
local ret = (
sub(AMAD .. N .. UOPT .. "$", AMAD, "nu?$", "") or -- ends in -ʾān
sub(AOPTA .. N .. UOPT .. "$", AMAQ, "nu?$", "") -- ends in -ān
)
return ret, "inv"
end
if word == "elf" then
local ret = (
sub(ELCD_START .. SK .. "[" .. Y .. W .. "]" .. A .. CONSPAR .. UOPT .. "$",
"%1" .. UU .. "%2" .. AMAQ, "ʔa(.)[yw]a(.)u?", "%1ū%2ā") or -- ʾajyad
sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$",
"%1" .. U .. "%2" .. SK .. "%3" .. AMAQ, "ʔa(.)(.)a(.)u?", "%1u%2%3ā") or -- ʾakbar
sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$",
"%1" .. U .. "%2" .. SH .. AMAQ, "ʔa(.)a(.)%2u?", "%1u%2%2ā") or -- ʾaqall
sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$",
"%1" .. U .. "%2" .. SK .. Y .. ALIF, "ʔa(.)(.)ā", "%1u%2yā") or -- ʾadnā
sub("^" .. AMAD .. CONSPAR .. A .. CONSPAR .. UOPT .. "$",
HAMZA_ON_ALIF .. U .. "%1" .. SK .. "%2" .. AMAQ, "ʔā(.)a(.)u?", "ʔu%1%2ā") -- ʾālam "more painful", ʾāḵar "other"
)
if not ret then
error("Singular stem not an elative adjective: " .. sgar)
end
return ret, "inv"
end
if word == "cdf" then
local ret = (
sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$",
"%1" .. A .. "%2" .. SK .. "%3" .. AA .. HAMZA, "ʔa(.)(.)a(.)u?", "%1a%2%3āʔ") or -- ʾaḥmar
sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$",
"%1" .. A .. "%2" .. SH .. AA .. HAMZA, "ʔa(.)a(.)%2u?", "%1a%2%2āʔ") or -- ʾalaff
sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$",
"%1" .. A .. "%2" .. SK .. Y .. AA .. HAMZA, "ʔa(.)(.)ā", "%1a%2yāʔ") -- ʾaʿmā
)
if not ret then
error("Singular stem not a color/defect adjective: " .. sgar)
end
return ret, "cd" -- so plural will be correct
end
-- Regular feminine -- add ة, possibly with stem modifications
if word == "rf" then
sgar = canon_hamza(sgar)
if rfind(sgar, TAM .. UNUOPT .. "$") then
--Don't do this or we have problems when forming singulative from
--collective with a construct modifier that's feminine
--error("Singular stem is already feminine: " .. sgar)
return sgar .. "/" .. sgtr, "tri"
end
local ret = (
sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AAH, "an$", "āh") or -- ends in -an
sub(IN .. "$", IY .. AH, "in$", "iya") or -- ends in -in
sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AAH, "ā$", "āh") or -- ends in alif or alif maqṣūra
-- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can
-- do a single Arabic regexp to cover both because we want to
-- remove u(n) from the translit only when ʾiʿrāb is present to
-- lessen the risk of removing -un in the actual stem. We also
-- allow for cases where the ʾiʿrāb is present in Arabic but not
-- in translit.
sub(UNU .. "$", AH, "un?$", "a", "$", "a") or -- anything else + -u(n)
sub("$", AH, "$", "a") -- anything else
)
return ret, "tri"
end
if word == "f" then
if sgtype == "cd" then
return export.stem_and_type("cdf", sg, sgtype, true, "sg", pos)
elseif sgtype == "el" then
return export.stem_and_type("elf", sg, sgtype, true, "sg", pos)
elseif sgtype == "di" and is_intensive_adj(sgar) then
return export.stem_and_type("intf", sg, sgtype, true, "sg", pos)
elseif sgtype == "di" and is_elcd_adj(sgar) then
-- If form is elative or color-defect, we don't know which of
-- the two it is, and each has a special feminine which isn't
-- the regular "just add ة", so shunt to unknown. This will
-- ensure that ?'s appear in place of the inflection -- also
-- for dual and plural.
return export.stem_and_type("?", sg, sgtype, true, "sg", pos)
else
return export.stem_and_type("rf", sg, sgtype, true, "sg", pos)
end
end
if word == "rm" then
sgar = canon_hamza(sgar)
--Don't do this or we have problems when forming collective from
--singulative with a construct modifier that's not feminine,
--e.g. شَجَرَة التُفَّاح
--if not rfind(sgar, TAM .. UNUOPT .. "$") then
-- error("Singular stem is not feminine: " .. sgar)
--end
local ret = (
sub(AAH .. UNUOPT .. "$", AN .. AMAQ, "ātun?$", "an", "ā[ht]$", "an") or -- in -āh
sub(IY .. AH .. UNUOPT .. "$", IN, "iyatun?$", "in", "iya$", "in") or -- ends in -iya
sub(AOPT .. TAM .. UNUOPT .. "$", "", "atun?$", "", "a$", "") or --ends in -a
sub("$", "", "$", "") -- do nothing
)
return ret, "tri"
end
if word == "m" then
-- FIXME: handle cd (color-defect)
-- FIXME: handle el (elative)
-- FIXME: handle int (intensive)
return export.stem_and_type("rm", sg, sgtype, false, "sg", pos)
end
-- The plural used for feminine adjectives. If the singular type is
-- color/defect or it looks like a feminine color/defect adjective,
-- use color/defect plural. Otherwise shunt to sound feminine plural.
if word == "fp" then
if sgtype == "cd" or is_feminine_cd_adj(sgar) then
return export.stem_and_type("cdp", sg, sgtype, true, "pl", pos)
else
return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos)
end
end
if word == "sp" then
if sgtype == "cd" then
return export.stem_and_type("cdp", sg, sgtype, isfem, "pl", pos)
elseif isfem then
return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos)
elseif sgtype == "an" then
return export.stem_and_type("awnp", sg, sgtype, false, "pl", pos)
else
return export.stem_and_type("smp", sg, sgtype, false, "pl", pos)
end
end
-- Conservative plural, as used for masculine plural adjectives.
-- If singular type is color-defect, shunt to color-defect plural; else
-- shunt to unknown, so ? appears in place of the inflections.
if word == "p" then
if sgtype == "cd" then
return export.stem_and_type("cdp", sg, sgtype, isfem, "pl", pos)
else
return export.stem_and_type("?", sg, sgtype, isfem, "pl", pos)
end
end
-- Special plural used for paucal plurals of singulatives. If the singular
-- ends in -ة (most common), use strong feminine plural; if it ends in -iyy
-- (next most common), use strong masculine plural; otherwise default to "p"
-- (conservative plural).
if word == "paucp" then
if rfind(sgar, TAM .. UNUOPT .. "$") then
return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos)
elseif rfind(sgar, IY .. SH .. UNUOPT .. "$") then
return export.stem_and_type("smp", sg, sgtype, false, "pl", pos)
else
return export.stem_and_type("p", sg, sgtype, isfem, "pl", pos)
end
end
if word == "d" then
sgar = canon_hamza(sgar)
local ret = (
sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AY .. AAN, "an$", "ayān") or -- ends in -an
sub(IN .. "$", IY .. AAN, "in$", "iyān") or -- ends in -in
sgtype == "lwinv" and sub(AOPTA .. "$", AT .. AAN, "[āa]$", "atān") or -- lwinv, ends in alif; allow translit with short -a
sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AY .. AAN, "ā$", "ayān") or -- ends in alif or alif maqṣūra
-- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can
-- do a single Arabic regexp to cover both because we want to
-- remove u(n) from the translit only when ʾiʿrāb is present to
-- lessen the risk of removing -un in the actual stem. We also
-- allow for cases where the ʾiʿrāb is present in Arabic but not
-- in translit.
--
-- NOTE: Collapsing the "h$" and "$" cases into "h?$" doesn't work
-- in the case of words ending in -āh, which end up having the
-- translit end in -tāntān.
sub(TAM .. UNU .. "$", T .. AAN, "[ht]un?$", "tān", "h$", "tān", "$", "tān") or -- ends in tāʾ marbuṭa + -u(n)
sub(TAM .. "$", T .. AAN, "h$", "tān", "$", "tān") or -- ends in tāʾ marbuṭa
-- Same here as above
sub(UNU .. "$", AAN, "un?$", "ān", "$", "ān") or -- anything else + -u(n)
sub("$", AAN, "$", "ān") -- anything else
)
return ret, "d"
end
-- Strong feminine plural in -āt, possibly with stem modifications
if word == "sfp" then
sgar = canon_hamza(sgar)
sgar = rsub(sgar, AMAD .. "(" .. TAM .. UNUOPT .. ")$", HAMZA_PH .. AA .. "%1")
sgar = rsub(sgar, HAMZA_ANY .. "(" .. AOPT .. TAM .. UNUOPT .. ")$", HAMZA_PH .. "%1")
local ret = (
sub(AOPTA .. TAM .. UNUOPT .. "$", AYAAT, "ā[ht]$", "ayāt", "ātun?$", "ayāt") or -- ends in -āh
sub(AOPT .. TAM .. UNUOPT .. "$", AAT, "a$", "āt", "atun?$", "āt") or -- ends in -a
sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AYAAT, "an$", "ayāt") or -- ends in -an
sub(IN .. "$", IY .. AAT, "in$", "iyāt") or -- ends in -in
sgtype == "inv" and (
sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AYAAT, "ā$", "ayāt") -- ends in alif or alif maqṣūra
) or
sgtype == "lwinv" and (
sub(AOPTA .. "$", AAT, "[āa]$", "āt") -- loanword ending in tall alif; allow translit with short -a
) or
-- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can
-- do a single Arabic regexp to cover both because we want to
-- remove u(n) from the translit only when ʾiʿrāb is present to
-- lessen the risk of removing -un in the actual stem. We also
-- allow for cases where the ʾiʿrāb is present in Arabic but not
-- in translit.
sub(UNU .. "$", AAT, "un?$", "āt", "$", "āt") or -- anything else + -u(n)
sub("$", AAT, "$", "āt") -- anything else
)
return ret, "sfp"
end
if word == "smp" then
sgar = canon_hamza(sgar)
local ret = (
sub(IN .. "$", UUN, "in$", "ūn") or -- ends in -in
-- See comments above for why we have two cases, one for UNU and
-- one for non-UNU
sub(UNU .. "$", UUN, "un?$", "ūn", "$", "ūn") or -- anything else + -u(n)
sub("$", UUN, "$", "ūn") -- anything else
)
return ret, "smp"
end
-- Color/defect plural; singular must be masculine or feminine
-- color/defect adjective
if word == "cdp" then
local ret = (
sub(ELCD_START .. SK .. W .. A .. CONSPAR .. UOPT .. "$",
"%1" .. UU .. "%2", "ʔa(.)wa(.)u?", "%1ū%2") or -- ʾaswad
sub(ELCD_START .. SK .. Y .. A .. CONSPAR .. UOPT .. "$",
"%1" .. II .. "%2", "ʔa(.)ya(.)u?", "%1ī%2") or -- ʾabyaḍ
sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$",
"%1" .. U .. "%2" .. SK .. "%3", "ʔa(.)(.)a(.)u?", "%1u%2%3") or -- ʾaḥmar
sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$",
"%1" .. U .. "%2" .. SH, "ʔa(.)a(.)%2u?", "%1u%2%2") or -- ʾalaff
sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$",
"%1" .. U .. "%2" .. Y, "ʔa(.)(.)ā", "%1u%2y") or -- ʾaʿmā
sub("^" .. CONSPAR .. A .. W .. SKOPT .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. UU .. "%2", "(.)aw(.)āʔu?", "%1ū%2") or -- sawdāʾ
sub("^" .. CONSPAR .. A .. Y .. SKOPT .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. II .. "%2", "(.)ay(.)āʔu?", "%1ī%2") or -- bayḍāʾ
sub("^" .. CONSPAR .. A .. CONSPAR .. SK .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. U .. "%2" .. SK .. "%3", "(.)a(.)(.)āʔu?", "%1u%2%3") or -- ḥamrāʾ/ʿamyāʾ
sub("^" .. CONSPAR .. A .. CONSPAR .. SH .. AA .. HAMZA .. UOPT .. "$", "%1" .. U .. "%2" .. SH, "(.)a(.)%2āʔu?", "%1u%2%2") -- laffāʾ
)
if not ret then
error("For 'cdp', singular must be masculine or feminine color/defect adjective: " .. sgar)
end
return ret, "tri"
end
if word == "awnp" then
local ret = (
sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AWSK .. N, "an$", "awn") -- ends in -an
)
if not ret then
error("For 'awnp', singular must end in -an: " .. sgar)
end
return ret, "awnp"
end
return artr, export.detect_type(ar, isfem, num, pos)
end
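--[[
-- Illustrative sketch only, not part of the module: how the WORD argument
-- of stem_and_type() breaks down into ARABIC, TR and TYPE. It uses plain
-- Lua string functions in place of rsplit() and split_arabic_tr(); the name
-- parse_word_spec is hypothetical, and splitting on "/" is an assumption
-- based on the ARABIC/TR format described above. Unlike the module, it
-- does not auto-transliterate when TR is omitted (TR is returned as nil).
local function parse_word_spec(word)
	local rettype
	-- Plain-text search (fourth argument true), so ":" is not a pattern.
	local colon = word:find(":", 1, true)
	if colon then
		if word:find(":", colon + 1, true) then
			error("More than one colon found in argument: '" .. word .. "'")
		end
		word, rettype = word:sub(1, colon - 1), word:sub(colon + 1)
	end
	local ar, tr = word, nil
	local slash = word:find("/", 1, true)
	if slash then
		ar, tr = word:sub(1, slash - 1), word:sub(slash + 1)
	end
	return ar, tr, rettype
end
-- parse_word_spec("كِتَاب/kitāb:tri") --> "كِتَاب", "kitāb", "tri"
-- parse_word_spec("sfp")              --> "sfp", nil, nil
--]]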
-- need LRM here so multiple Arabic plurals end up agreeing in order with
-- the transliteration
local outersep = LRM .. "; "
local innersep = LRM .. "/"
-- Subfunction of show_form(), used to implement recursively generating
-- all combinations of elements from FORM and from each of the items in
-- LIST_OF_MODS, both of which are either arrays of strings or arrays of
-- arrays of strings, where the strings are in the form ARABIC/TRANSLIT,
-- as described in show_form(). TRAILING_ARTRMODS is an array of ARTRMOD
-- items, each of which is a two-element array of ARMOD (Arabic) and TRMOD
-- (transliteration), accumulating all of the suffixes generated so far
-- in the recursion process. Each time we recur we take the last MOD item
-- off of LIST_OF_MODS, separate each element in MOD into its Arabic and
-- Latin parts and to each Arabic/Latin pair we add all elements in
-- TRAILING_ARTRMODS, passing the newly generated list of ARTRMOD items
-- down the next recursion level with the shorter LIST_OF_MODS. We end up
-- returning a string to insert into the Wiki-markup table.
function show_form_1(form, list_of_mods, trailing_artrmods, use_parens)
if #list_of_mods == 0 then
local arabicvals = {}
local latinvals = {}
local parenvals = {}
-- Accumulate separately the Arabic and transliteration into
-- ARABICVALS and LATINVALS, then concatenate each down below.
-- However, if USE_PARENS, we put each transliteration directly
-- after the corresponding Arabic, in parens, and put the results
-- in PARENVALS, which get concatenated below. (This is used in the
-- title of the declension table.)
for _, artrmod in ipairs(trailing_artrmods) do
assert(#artrmod == 2)
local armod = artrmod[1]
local trmod = artrmod[2]
for _, subform in ipairs(form) do
local ar_span, tr_span
local ar_subspan, tr_subspan
local ar_subspans = {}
local tr_subspans = {}
if type(subform) ~= "table" then
subform = {subform}
end
for _, subsubform in ipairs(subform) do
local arabic, translit = split_arabic_tr(subsubform)
if arabic == "-" then
ar_subspan = "—"
tr_subspan = "—"
elseif arabic == "?" then
ar_subspan = "?"
tr_subspan = "?"
else
tr_subspan = (rfind(translit, BOGUS_CHAR) or rfind(trmod, BOGUS_CHAR)) and "?" or
require("Module:script utilities").tag_translit(translit .. trmod, lang, "default", 'style="color: var(--wikt-palette-grey-8,#888);"')
-- implement elision of al- after vowel
tr_subspan = rsub(tr_subspan, "([aeiouāēīōū][ %-])a([sšṣtṯṭdḏḍzžẓnrḷl]%-)", "%1%2")
tr_subspan = rsub(tr_subspan, "([aeiouāēīōū][ %-])a(llāh)", "%1%2")
ar_subspan = m_links.full_link({lang = lang, term = arabic .. armod, tr = "-"})
end
insert_if_not(ar_subspans, ar_subspan)
insert_if_not(tr_subspans, tr_subspan)
end
ar_span = table.concat(ar_subspans, innersep)
tr_span = table.concat(tr_subspans, innersep)
if use_parens then
table.insert(parenvals, ar_span .. " (" .. tr_span .. ")")
else
table.insert(arabicvals, ar_span)
table.insert(latinvals, tr_span)
end
end
end
if use_parens then
return table.concat(parenvals, outersep)
else
local arabic_span = table.concat(arabicvals, outersep)
local latin_span = table.concat(latinvals, outersep)
if arabic_span == "?" then
return "?"
else
return arabic_span .. "<br />" .. latin_span
end
end
else
local last_mods = table.remove(list_of_mods)
local artrmods = {}
for _, mod in ipairs(last_mods) do
if type(mod) ~= "table" then
mod = {mod}
end
for _, submod in ipairs(mod) do
local armod, trmod = split_arabic_tr(submod)
-- If the value is -, we need to create a blank entry
-- rather than skipping it; if we have no entries at any
-- level, then there will be no overall entries at all
-- because the inside of the loop at the next level will
-- never be executed.
if armod == "-" then
armod = ""
trmod = ""
end
if armod ~= "" then armod = ' ' .. armod end
if trmod ~= "" then trmod = ' ' .. trmod end
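for _, trailing_artrmod in ipairs(trailing_artrmods) do
local trailing_armod = trailing_artrmod[1]
local trailing_trmod = trailing_artrmod[2]
-- Build each combination in fresh locals; appending to ARMOD
-- and TRMOD in place would carry earlier suffixes over into
-- later combinations when TRAILING_ARTRMODS has more than one
-- element.
local artrmod = {armod .. trailing_armod, trmod .. trailing_trmod}
table.insert(artrmods, artrmod)
end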
end
end
return show_form_1(form, list_of_mods, artrmods, use_parens)
end
end
-- Generate a string to substitute into a particular form in a Wiki-markup
-- table. FORM is the set of inflected forms corresponding to the base,
-- either an array of strings (referring e.g. to different possible plurals)
-- or an array of arrays of strings (the first level referring e.g. to
-- different possible plurals and the inner level referring typically to
-- hamza-spelling variants). LIST_OF_MODS is an array of MODS elements, one
-- per modifier. Each MODS element is the set of inflected forms corresponding
-- to the modifier and is of the same form as FORM, i.e. an array of strings
-- or an array of arrays of strings. Each string is typically of the form
-- "ARABIC/TRANSLIT", i.e. an Arabic string and a Latin string separated
-- by a slash. We loop over all possible combinations of elements from
-- each array; this requires recursion.
function show_form(form, list_of_mods, use_parens)
if not form then
return "—"
elseif type(form) ~= "table" then
error("a non-table value was given in the list of inflected forms.")
end
if #form == 0 then
return "—"
end
-- We need to start the recursion with the third parameter containing
-- one blank element rather than no elements, otherwise no elements
-- will be propagated to the next recursion level.
return show_form_1(form, list_of_mods, {{"", ""}}, use_parens)
end
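-- A sketch of the argument shapes described above. The strings are
-- placeholders invented for illustration (not taken from any entry), and
-- the call assumes the helpers used by show_form_1() are in scope:
--
--   -- two alternative plurals, the second with two spelling variants
--   local form = {"AR1/tr1", {"AR2a/tr2a", "AR2b/tr2b"}}
--   -- one modifier with a single inflected form
--   local list_of_mods = {{"MOD/mod"}}
--   local cell = show_form(form, list_of_mods, false)
--
-- With USE_PARENS false, the result is the Arabic alternatives joined by
-- OUTERSEP (variants by INNERSEP), then "<br />", then the transliterations
-- joined the same way. Note that show_form_1() consumes LIST_OF_MODS via
-- table.remove(), so a fresh list should be passed on each call.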
-- Create a Wiki-markup table using the values in DATA and the template in
-- WIKICODE.
function make_table(data, wikicode)
-- Function used as replace arg of call to rsub(). Replace the
-- specified param with its (HTML) value. The param references appear
-- as {{{PARAM}}} in the wikicode.
local function repl(param)
if param == "pos" then
return data.pos
elseif param == "info" then
return data.title and " (" .. data.title .. ")" or ""
elseif rfind(param, "type$") then
return table.concat(data.forms[param] or {"—"}, outersep .. "<br>")
else
local list_of_mods = {}
for _, mod in ipairs(mod_list) do
local mods = data.forms[mod .. "_" .. param]
if not mods or #mods == 0 then
-- We need one blank element rather than no element,
-- otherwise no elements will be propagated from one
-- recursion level to the next.
mods = {""}
end
table.insert(list_of_mods, mods)
end
return show_form(data.forms[param], list_of_mods, param == "lemma")
end
end
-- For states not in the list of those to be displayed, clear out the
-- corresponding inflections so they appear as a dash.
for _, state in ipairs(data.allstates) do
if not contains(data.states, state) then
for _, numgen in ipairs(data.numgens()) do
for _, case in ipairs(data.allcases) do
data.forms[case .. "_" .. numgen .. "_" .. state] = {}
end
end
end
end
return rsub(wikicode, "{{{([a-z_]+)}}}", repl) .. m_utilities.format_categories(data.categories, lang)
end
-- Generate part of the noun table for a given number spec NUM (e.g. sg)
function generate_noun_num(num)
return [=[! indefinite
! definite
! construct
|-
! informal
| {{{inf_]=] .. num .. [=[_ind}}}
| {{{inf_]=] .. num .. [=[_def}}}
| {{{inf_]=] .. num .. [=[_con}}}
|-
! nominative
| {{{nom_]=] .. num .. [=[_ind}}}
| {{{nom_]=] .. num .. [=[_def}}}
| {{{nom_]=] .. num .. [=[_con}}}
|-
! accusative
| {{{acc_]=] .. num .. [=[_ind}}}
| {{{acc_]=] .. num .. [=[_def}}}
| {{{acc_]=] .. num .. [=[_con}}}
|-
! genitive
| {{{gen_]=] .. num .. [=[_ind}}}
| {{{gen_]=] .. num .. [=[_def}}}
| {{{gen_]=] .. num .. [=[_con}}}
]=]
end
-- Make the noun table
function make_noun_table(data)
local wikicode = mw.getCurrentFrame():expandTemplate{
title = 'inflection-table-top',
args = {
title = 'Declension of {{{pos}}} {{{lemma}}}',
tall = 'yes',
palette = "green",
category = 'declension',
class = 'tr-alongside', -- temp hack to prevent extra line break
}
}
for _, num in ipairs(data.numbers) do
if num == "du" then
wikicode = wikicode .. [=[|-
! class="outer" | dual
]=] .. generate_noun_num("du")
else
wikicode = wikicode .. [=[|-
! class="outer" rowspan=2 | ]=] .. data.engnumberscap[num] .. "\n" .. [=[
! class="outer" style="font-style:normal" colspan=3 | {{{]=] .. num .. [=[_type}}}
|-
]=] .. generate_noun_num(num)
end
end
wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' }
return make_table(data, wikicode)
end
-- Generate part of the gendered-noun table for a given number spec
-- NUM (e.g. sg); masculine and feminine columns are emitted side by side
function generate_gendered_noun_num(num)
return [=[|-
! indefinite
! definite
! construct
! indefinite
! definite
! construct
|-
! informal
| {{{inf_m_]=] .. num .. [=[_ind}}}
| {{{inf_m_]=] .. num .. [=[_def}}}
| {{{inf_m_]=] .. num .. [=[_con}}}
| {{{inf_f_]=] .. num .. [=[_ind}}}
| {{{inf_f_]=] .. num .. [=[_def}}}
| {{{inf_f_]=] .. num .. [=[_con}}}
|-
! nominative
| {{{nom_m_]=] .. num .. [=[_ind}}}
| {{{nom_m_]=] .. num .. [=[_def}}}
| {{{nom_m_]=] .. num .. [=[_con}}}
| {{{nom_f_]=] .. num .. [=[_ind}}}
| {{{nom_f_]=] .. num .. [=[_def}}}
| {{{nom_f_]=] .. num .. [=[_con}}}
|-
! accusative
| {{{acc_m_]=] .. num .. [=[_ind}}}
| {{{acc_m_]=] .. num .. [=[_def}}}
| {{{acc_m_]=] .. num .. [=[_con}}}
| {{{acc_f_]=] .. num .. [=[_ind}}}
| {{{acc_f_]=] .. num .. [=[_def}}}
| {{{acc_f_]=] .. num .. [=[_con}}}
|-
! genitive
| {{{gen_m_]=] .. num .. [=[_ind}}}
| {{{gen_m_]=] .. num .. [=[_def}}}
| {{{gen_m_]=] .. num .. [=[_con}}}
| {{{gen_f_]=] .. num .. [=[_ind}}}
| {{{gen_f_]=] .. num .. [=[_def}}}
| {{{gen_f_]=] .. num .. [=[_con}}}
]=]
end
-- Make the gendered noun table
function make_gendered_noun_table(data)
local wikicode = mw.getCurrentFrame():expandTemplate{
title = 'inflection-table-top',
args = {
title = 'Declension of {{{pos}}} {{{lemma}}}',
tall = 'yes',
palette = "green",
category = 'declension',
class = 'tr-alongside', -- temp hack to prevent extra line break
}
}
for _, num in ipairs(data.numbers) do
if num == "du" then
wikicode = wikicode .. [=[|-
! class="outer" rowspan=2 | dual
! class="outer" colspan=3 | masculine
! class="outer" colspan=3 | feminine
]=] .. generate_gendered_noun_num("du")
else
wikicode = wikicode .. [=[|-
! class="outer" rowspan=3 | ]=] .. data.engnumberscap[num] .. "\n" .. [=[
! class="outer" colspan=3 | masculine
! class="outer" colspan=3 | feminine
|-
! class="outer" style="font-style:normal" colspan=3 | {{{m_]=] .. num .. [=[_type}}}
! class="outer" style="font-style:normal" colspan=3 | {{{f_]=] .. num .. [=[_type}}}
]=] .. generate_gendered_noun_num(num)
end
end
wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' }
return make_table(data, wikicode)
end
-- Generate part of the adjective table for a given number spec NUM (e.g. sg);
-- masculine and feminine columns are emitted side by side
function generate_adj_num(num)
return [=[|-
! indefinite
! definite
! indefinite
! definite
|-
! informal
| {{{inf_m_]=] .. num .. [=[_ind}}}
| {{{inf_m_]=] .. num .. [=[_def}}}
| {{{inf_f_]=] .. num .. [=[_ind}}}
| {{{inf_f_]=] .. num .. [=[_def}}}
|-
! nominative
| {{{nom_m_]=] .. num .. [=[_ind}}}
| {{{nom_m_]=] .. num .. [=[_def}}}
| {{{nom_f_]=] .. num .. [=[_ind}}}
| {{{nom_f_]=] .. num .. [=[_def}}}
|-
! accusative
| {{{acc_m_]=] .. num .. [=[_ind}}}
| {{{acc_m_]=] .. num .. [=[_def}}}
| {{{acc_f_]=] .. num .. [=[_ind}}}
| {{{acc_f_]=] .. num .. [=[_def}}}
|-
! genitive
| {{{gen_m_]=] .. num .. [=[_ind}}}
| {{{gen_m_]=] .. num .. [=[_def}}}
| {{{gen_f_]=] .. num .. [=[_ind}}}
| {{{gen_f_]=] .. num .. [=[_def}}}
]=]
end
-- Make the adjective table
function make_adj_table(data)
local wikicode = mw.getCurrentFrame():expandTemplate{
title = 'inflection-table-top',
args = {
title = 'Declension of {{{pos}}} {{{lemma}}}',
tall = 'yes',
palette = "green",
category = 'declension',
class = 'tr-alongside', -- temp hack to prevent extra line break
}
}
if contains(data.numbers, "sg") then
wikicode = wikicode .. [=[|-
! class="outer" rowspan=3 | singular
! class="outer" colspan=2 | masculine
! class="outer" colspan=2 | feminine
|-
! class="outer" style="font-style:normal" colspan=2 | {{{m_sg_type}}}
! class="outer" style="font-style:normal" colspan=2 | {{{f_sg_type}}}
]=] .. generate_adj_num("sg")
end
if contains(data.numbers, "du") then
wikicode = wikicode .. [=[|-
! class="outer" rowspan=2 | dual
! class="outer" colspan=2 | masculine
! class="outer" colspan=2 | feminine
]=] .. generate_adj_num("du")
end
if contains(data.numbers, "pl") then
wikicode = wikicode .. [=[|-
! class="outer" rowspan=3 | plural
! class="outer" colspan=2 | masculine
! class="outer" colspan=2 | feminine
|-
! class="outer" style="font-style:normal" colspan=2 | {{{m_pl_type}}}
! class="outer" style="font-style:normal" colspan=2 | {{{f_pl_type}}}
]=] .. generate_adj_num("pl")
end
wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' }
return make_table(data, wikicode)
end
return export
-- For Vim, so we get 4-space tabs
-- vim: set ts=4 sw=4 noet:
6y30719qp23rwl6phtl5bn8lc8lkw5a
Module:ar-utilities
828
8171
27706
2026-06-21T14:59:12Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local m_utilities = require("Module:utilities") local lang = require("Module:languages").getByCode("ar") local sc = require("Module:scripts").getByCode("Arab") local rfind = mw.ustring.find local rsubn = mw.ustring.gsub local u = require("Module:string/char") local consonants = "[بتثجحخدذرزسشصضطظعغقفلكمنهويء]" -- version of rsubn() that discards all but the first return value function export.rsub(term, foo, bar)...'
27706
Scribunto
text/plain
local export = {}
local m_utilities = require("Module:utilities")
local lang = require("Module:languages").getByCode("ar")
local sc = require("Module:scripts").getByCode("Arab")
local rfind = mw.ustring.find
local rsubn = mw.ustring.gsub
local u = require("Module:string/char")
local consonants = "[بتثجحخدذرزسشصضطظعغقفلكمنهويء]"
-- version of rsubn() that discards all but the first return value
function export.rsub(term, foo, bar)
local retval = rsubn(term, foo, bar)
return retval
end
local rsub = export.rsub
-- synthesize a frame so that exported functions meant to be called from
-- templates can be called from the debug console.
function export.debug_frame(parargs, args)
return { args = args, getParent = function() return { args = parargs } end }
end
function export.catfix()
return m_utilities.catfix(lang, sc)
end
--------------------------- diacritics, letters and combinations ------------------------------
-- hamza variants
local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء
local HAMZA_ON_ALIF = u(0x0623)
local HAMZA_ON_WAW = u(0x0624)
local HAMZA_UNDER_ALIF = u(0x0625)
local HAMZA_ON_YA = u(0x0626)
local HAMZA_PH = u(0xFFF0) -- hamza placeholder
export.HAMZA = HAMZA
export.HAMZA_ON_ALIF = HAMZA_ON_ALIF
export.HAMZA_ON_WAW = HAMZA_ON_WAW
export.HAMZA_UNDER_ALIF = HAMZA_UNDER_ALIF
export.HAMZA_ON_YA = HAMZA_ON_YA
export.HAMZA_PH = HAMZA_PH
-- diacritics
local A = u(0x064E) -- fatḥa
local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn)
local U = u(0x064F) -- ḍamma
local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn)
local I = u(0x0650) -- kasra
local IN = u(0x064D) -- kasratān (kasra tanwīn)
local SK = u(0x0652) -- sukūn = no vowel
local SH = u(0x0651) -- šadda = gemination of consonants
local DAGGER_ALIF = u(0x0670)
-- Pattern matching any diacritics that may be on a consonant other than shadda
local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. "]"
-- Pattern matching short vowels
local AIU = "[" .. A .. I .. U .. "]"
-- Pattern matching any diacritics that may be on a consonant
local DIACRITIC = SH .. "?" .. DIACRITIC_ANY_BUT_SH
export.A = A
export.AN = AN
export.U = U
export.UN = UN
export.I = I
export.IN = IN
export.SK = SK
export.SH = SH
export.DAGGER_ALIF = DAGGER_ALIF
-- Pattern matching any diacritics that may be on a consonant other than shadda
export.DIACRITIC_ANY_BUT_SH = DIACRITIC_ANY_BUT_SH
-- Pattern matching short vowels
export.AIU = AIU
-- Pattern matching any diacritics that may be on a consonant
export.DIACRITIC = DIACRITIC
-- various letters and signs
local ALIF = u(0x0627) -- ʾalif = ا
local ALIF_WASLA = u(0x0671) -- ʾalif waṣla = hamzatu l-waṣl = ٱ
local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى
local AMAD = u(0x0622) -- ʾalif madda = آ
local TAM = u(0x0629) -- tāʾ marbūṭa = ة
local WAW = u(0x0648) -- wāw = و
local W = WAW
local YA = u(0x064A) -- yā = ي
local Y = YA
local T = u(0x062A) -- tāʾ = ت
local HYPHEN = u(0x0640)
local N = u(0x0646) -- nūn = ن
local LRM = u(0x200E) -- left-to-right mark
export.ALIF = ALIF
export.ALIF_WASLA = ALIF_WASLA
export.AMAQ = AMAQ
export.AMAD = AMAD
export.TAM = TAM
export.WAW = WAW
export.W = W
export.YA = YA
export.Y = Y
export.T = T
export.HYPHEN = HYPHEN
export.N = N
export.LRM = LRM
-- common combinations
local AW = A .. W -- diphthong, construct state of some final-weak nouns, 3sm past of some final-weak verbs, etc.
local AY = A .. Y -- diphthong, construct state of most final-weak nouns, 3sm past of most final-weak verbs, etc.
local IY = I .. Y -- equivalent to long ī
local UW = U .. W -- equivalent to long ū
local AA = A .. ALIF -- long ā
local AAMAQ = A .. AMAQ -- vocalized ʾalif maqṣūra
local II = IY -- long ī
local IIN = IY .. N -- short strong masculine oblique plural ending
local IINA = IIN .. A -- full strong masculine oblique plural ending
local UU = UW -- long ū
local UUN = UU .. N -- short strong masculine nominative plural ending
local UUNA = UUN .. A -- full strong masculine nominative plural ending
local AWN = AW .. SK .. N -- short verbal ending of some final-weak verbs
local AWNA = AWN .. A -- full verbal ending of some final-weak verbs
local AYN = AY .. SK .. N -- short oblique dual ending, verbal ending of some final-weak verbs
local AYNI = AYN .. I -- full oblique dual ending
local AYNA = AYN .. A -- full verbal ending of some final-weak verbs
local AAN = AA .. N -- short nominative dual ending
local AANI = AAN .. I -- full nominative dual ending
local UNU = "[" .. UN .. U .. "]" -- matches nominative singular of strong masculine triptotes and diptotes
local UNUOPT = UNU .. "?" -- optional equivalent of UNU, for short forms
local AH = A .. TAM -- feminine ending
local AAH = AA .. TAM -- final-weak feminine ending
local AAT = AA .. T -- short strong feminine plural ending
local AATUN = AAT .. UN -- full strong nominative feminine plural ending
local IYAH = I .. Y .. AH -- ending of some final-weak feminines
local AYAAT = AY .. AAT -- final-weak plural ending
local AYAAN = AY .. AAN -- final-weak dual ending
local IYAAT = IY .. AAT -- final-weak plural ending
local IYAAN = IY .. AAN -- final-weak dual ending
local IYY = IY .. SH -- masculine nisba ending
local IYYAH = IY .. SH .. AH -- feminine nisba ending
local ATAAN = A .. T .. AAN -- feminine dual ending
local AATAAN = AAT .. AAN -- final-weak feminine dual ending
-- other possibilities (currently found in verb module):
-- AT, AYSK, AWSK, N, NA, NI, M, MA, MU, TA, TU, _I = ALIF .. I, _U = ALIF .. U
export.AW = AW
export.AY = AY
export.IY = IY
export.UW = UW
export.AA = AA
export.AAMAQ = AAMAQ
export.II = II
export.IIN = IIN
export.IINA = IINA
export.UU = UU
export.UUN = UUN
export.UUNA = UUNA
export.AWN = AWN
export.AWNA = AWNA
export.AYN = AYN
export.AYNI = AYNI
export.AYNA = AYNA
export.AAN = AAN
export.AANI = AANI
export.UNU = UNU
export.UNUOPT = UNUOPT
export.AH = AH
export.AAH = AAH
export.AAT = AAT
export.AATUN = AATUN
export.IYAH = IYAH
export.AYAAT = AYAAT
export.AYAAN = AYAAN
export.IYAAT = IYAAT
export.IYAAN = IYAAN
export.IYY = IYY
export.IYYAH = IYYAH
export.ATAAN = ATAAN
export.AATAAN = AATAAN
function export.reorder_shadda(text)
-- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
-- replaced with short-vowel+shadda during NFC normalisation, which
-- MediaWiki does for all Unicode strings; however, it makes the
-- detection process inconvenient, so undo it. (For example, the code in
-- remove_in would fail to detect the -in in مُتَرَبٍّ because the shadda
-- would come after the -in.)
text = rsub(text, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1")
return text
end
function export.undo_reorder_shadda(text)
return mw.ustring.toNFC(text)
end
--------------------------- hamza processing ------------------------------
local hamza_subs = {
--------------------------- handle initial hamza --------------------------
-- put initial hamza on a seat according to following vowel.
{ "^" .. HAMZA_PH .. "([" .. I .. YA .. "])", HAMZA_UNDER_ALIF .. "%1" },
{ " " .. HAMZA_PH .. "([" .. I .. YA .. "])", " " .. HAMZA_UNDER_ALIF .. "%1" },
{ "^" .. HAMZA_PH, HAMZA_ON_ALIF }, -- otherwise (a, u or no vowel), seat on alif
{ " " .. HAMZA_PH, " " .. HAMZA_ON_ALIF }, -- otherwise (a, u or no vowel), seat on alif
----------------------------- handle final hamza --------------------------
-- "final" hamza may be followed by a short vowel or tanwīn sequence
-- use a previous short vowel to get the seat
{ "(" .. AIU .. ")(" .. HAMZA_PH .. ")(" .. DIACRITIC .. "?)$",
function(v, ham, diacrit)
ham = v == I and HAMZA_ON_YA or v == U and HAMZA_ON_WAW or HAMZA_ON_ALIF
return v .. ham .. diacrit
end
},
{ "(" .. AIU .. ")(" .. HAMZA_PH .. ")(" .. DIACRITIC .. "? )",
function(v, ham, diacrit)
ham = v == I and HAMZA_ON_YA or v == U and HAMZA_ON_WAW or HAMZA_ON_ALIF
return v .. ham .. diacrit
end
},
-- else hamza is on the line
{ HAMZA_PH .. "(" .. DIACRITIC .. "?)$", HAMZA .. "%1" },
---------------------------- handle medial hamza --------------------------
-- if long vowel or diphthong precedes, we need to ignore it.
{ "([" .. AMAD .. ALIF .. WAW .. YA .. "]" .. SK .. "?)(" .. HAMZA_PH .. ")(" .. SH .. "?)([^ ])",
function(prec, ham, shad, v2)
ham = (v2 == I or v2 == YA) and HAMZA_ON_YA or
(v2 == U or v2 == WAW) and HAMZA_ON_WAW or
rfind(prec, YA) and HAMZA_ON_YA or
HAMZA
return prec .. ham .. shad .. v2
end
},
-- otherwise, seat of medial hamza relates to vowels on one or both sides.
{ "([^ ])(" .. HAMZA_PH .. ")(" .. SH .. "?)(" .. AN .. "?[^ ])",
function(v1, ham, shad, v2)
ham = (v1 == I or v2 == I or v2 == YA) and HAMZA_ON_YA or
(v1 == U or v2 == U or v2 == WAW) and HAMZA_ON_WAW or
-- special exception for the accusative ending, in words like
-- جُزْءًا (juzʾan). By the rules of Thackston pp. 281-282 a
-- hamza-on-alif should appear, but that would result in
-- two alifs in a row, which is generally forbidden.
-- According to Haywood/Nahmad pp. 114-115, after sukūn before
-- the accusative ending (including when a pronominal suffix
-- follows) hamza is written on yāʾ if the previous letter
-- is connecting, else on the line. The only examples they
-- give involve preceding non-connecting z (جُزْءًا juzʾan and
-- جُزْءَهُ juzʾahu) and preceding diphthongs, with the only
-- connecting letter being yāʾ, where we have hamza-on-yāʾ
-- anyway by the preceding regexp. Haywood/Nahmad's rule seems
-- too complicated, and since it conflicts with Thackston,
-- we only implement the case where otherwise two alifs would
-- appear with the indefinite accusative ending.
v2 == AN .. ALIF and HAMZA or
HAMZA_ON_ALIF
return v1 .. ham .. shad .. v2
end
},
--------------------------- handle alif madda -----------------------------
{ HAMZA_ON_ALIF .. A .. "?" .. ALIF, AMAD },
----------------------- catch any remaining hamzas ------------------------
{ HAMZA_PH, HAMZA }
}
function export.process_hamza(term)
-- convert HAMZA_PH into appropriate hamza seat
for _, sub in ipairs(hamza_subs) do
term = rsub(term, sub[1], sub[2])
end
-- sequence of hamza-on-wāw + wāw is problematic and leads to a preferred
-- alternative with some other type of hamza, as well as the original
-- sequence; sequence of wāw + hamza-on-wāw + wāw is especially problematic
-- and leads to two different alternatives with the original sequence not
-- one of them
if rfind(term, WAW .. "ؤُو") then
return { rsub(term, WAW .. "ؤُو", WAW .. "ئُو"), rsub(term, WAW .. "ؤُو", WAW .. "ءُو") }
elseif rfind(term, YA .. "ؤُو") then
return { rsub(term, YA .. "ؤُو", YA .. "ئُو"), term }
elseif rfind(term, ALIF .. "ؤُو") then
-- Here John Mace "Arabic Verbs" is inconsistent. In past-tense parts,
-- the preferred alternative has hamza on the line, whereas in
-- non-past parts the preferred alternative has hamza-on-yāʾ even
-- though the sequence of vowels is identical. It's too complicated to
-- propagate information about tense through to here so pick one.
return { rsub(term, ALIF .. "ؤُو", ALIF .. "ئُو"), term }
-- no alternative spelling in sequence of U/A + hamza-on-wāw + U + wāw;
-- sequence of I + hamza-on-wāw + U + wāw does not occur (has
-- hamza-on-yāʾ instead)
else
return { term }
end
end
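-- A usage sketch for the debug console, assuming the module is loaded under
-- the hypothetical local name `m`. The term is built from the exported
-- constants so that the hamza placeholder sits between two fatḥas:
--
--   local m = require("Module:ar-utilities")
--   local term = "س" .. m.A .. m.HAMZA_PH .. m.A .. "ل" .. m.A
--   local spellings = m.process_hamza(term)
--   -- process_hamza() always returns an array; the first entry is the
--   -- spelling listed first above, and further entries (if any) are
--   -- alternative spellings.
--   for _, spelling in ipairs(spellings) do
--       mw.log(spelling)
--   end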
----------------------------------- misc junk ---------------------------------
-- Used in {{ar-adj-in}} so that we can specify a full lemma rather than
-- requiring the user to truncate the -in ending. FIXME: Move ar-adj-in
-- into Lua.
function export.remove_in(frame)
local lemma = frame.args[1] or error("Lemma required.")
return rsub(export.reorder_shadda(lemma), IN .. "$", "")
end
-- Used in {{ar-adj-an}} so that we can specify a full lemma rather than
-- requiring the user to truncate the -an ending. FIXME: Move ar-adj-an
-- into Lua.
function export.remove_an(frame)
local lemma = frame.args[1] or error("Lemma required.")
return rsub(export.reorder_shadda(lemma), AN .. AMAQ .. "$", "")
end
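-- A debug-console sketch showing how export.debug_frame() can stand in for a
-- real frame when calling the two functions above. It assumes the module is
-- loaded under the hypothetical local name `m` and that `lemma` holds a fully
-- vocalized lemma ending in -in:
--
--   local m = require("Module:ar-utilities")
--   -- first argument: parent (template) args; second: direct #invoke args
--   local frame = m.debug_frame({}, {lemma})
--   mw.log(m.remove_in(frame)) -- the lemma without its final kasratān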
-- Compare two words and find the alternation pattern (vowel changes, prefixes, suffixes etc.).
-- Still a work in progress: currently a stub that always returns nil.
function export.find_pattern(word1, word2)
return nil
end
function export.etymology(frame)
local text, categories = {}, {}
local linkText
local frame_params = {
[1] = { required = true },
}
local frame_args = require("Module:parameters").process(frame.args, frame_params)
local anchor = frame_args[1]
local data = {
["color adjective"] = {
anchor = "Color or defect adjectives",
text = "color adjective",
categories = { "color/defect adjectives" },
},
["defect adjective"] = {
anchor = "Color or defect adjectives",
text = "defect adjective",
categories = { "color/defect adjectives" },
},
}
local params = {
[1] = {},
["nocat"] = { type = "boolean", default = false },
["lc"] = { type = "boolean", default = false },
["nocap"] = { alias_of = "lc" },
["notext"] = { type = "boolean", default = false },
}
local args = require("Module:parameters").process(frame:getParent().args, params)
if anchor and data[anchor] then
local data = data[anchor]
anchor = data.anchor or error('The data table does not include an anchor for "' .. anchor .. '".')
linkText = data.text or error('The data table does not include link text for "' .. anchor .. '".')
if not args.lc then
linkText = rsubn(linkText, "^%a", function(a) return mw.ustring.upper(a) end)
end
if not args.notext then
table.insert(text, "[[Appendix:Arabic nominals#" .. anchor .. "|" .. linkText .. "]]")
end
if not args.nocat then
table.insert(categories, m_utilities.format_categories(data.categories, lang))
end
else
error('The anchor "' .. tostring(anchor) .. '" is not found in the list of anchors.')
end
return table.concat(text) .. table.concat(categories)
end
return export
b5yp8g0vr2gkhegct05upo1dfcwoet9
Module:anchors
828
8172
27707
2026-06-21T15:00:35Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local string_utilities_module = "Module:string utilities" local anchor_encode = mw.uri.anchorEncode local concat = table.concat local insert = table.insert local language_anchor -- Defined below. local function decode_entities(...) decode_entities = require(string_utilities_module).decode_entities return decode_entities(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return...'
27707
Scribunto
text/plain
local export = {}
local string_utilities_module = "Module:string utilities"
local anchor_encode = mw.uri.anchorEncode
local concat = table.concat
local insert = table.insert
local language_anchor -- Defined below.
local function decode_entities(...)
decode_entities = require(string_utilities_module).decode_entities
return decode_entities(...)
end
local function encode_entities(...)
encode_entities = require(string_utilities_module).encode_entities
return encode_entities(...)
end
-- Returns the anchor text to be used as the fragment of a link to a language section.
function export.language_anchor(lang, id)
return anchor_encode(lang:getFullName() .. ": " .. id)
end
language_anchor = export.language_anchor
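-- A sketch of how the anchor might be obtained, assuming `lang` is a language
-- object from [[Module:languages]] and "foo" is a hypothetical sense id:
--
--   local lang = require("Module:languages").getByCode("ar")
--   local fragment = export.language_anchor(lang, "foo")
--   -- `fragment` is mw.uri.anchorEncode() applied to the language's full
--   -- name, followed by ": " and the id; it is meant to be used after "#"
--   -- in a link target.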
-- Normalizes input text (removes formatting etc.), which can then be used as an anchor in an `id=` field.
function export.normalize_anchor(str)
return decode_entities(anchor_encode(str))
end
function export.make_anchors(ids)
local anchors = {}
for i = 1, #ids do
local id = ids[i]
local el = mw.html.create("span")
:addClass("template-anchor")
:attr("id", anchor_encode(id))
:attr("data-id", id)
insert(anchors, tostring(el))
end
return concat(anchors)
end
function export.senseid(lang, id, tag_name)
-- The following tag is opened but never closed; it is unclear where it is supposed to be closed.
-- With <li> this doesn't matter, as it is closed automatically.
-- With <p> it is a problem.
-- mw.html cannot be used here, as it always closes tags.
return "<" .. tag_name .. " class=\"senseid\" id=\"" .. language_anchor(lang, id) .. "\" data-lang=\"" .. lang:getCode() .. "\" data-id=\"" .. encode_entities(id) .. "\">"
end
function export.etymid(lang, id)
-- Use a <ul> tag to ensure spacing doesn't get messed up.
local el = mw.html.create("ul")
:addClass("etymid")
:attr("id", language_anchor(lang, id))
:attr("data-lang", lang:getCode())
:attr("data-id", id)
return tostring(el)
end
function export.etymonid(lang, id, opts)
opts = opts or {}
-- Use a <ul> tag to ensure spacing doesn't get messed up.
local el = mw.html.create("ul")
:addClass("etymonid")
:attr("data-lang", lang:getCode())
if id then
el:attr("id", language_anchor(lang, id))
el:attr("data-id", id)
end
if opts.no_tree then
el:attr("data-no-tree", "1")
end
if opts.title then
el:attr("data-title", opts.title)
end
if opts.empty_tree then
el:attr("data-empty-tree", "1")
end
if opts.ety_tree_json then
el:attr("data-ety-tree-json", opts.ety_tree_json)
end
return tostring(el)
end
return export
2o0jqln8y0bxvpfhe66eogs2k88qkap
Module:alternative forms
828
8173
27708
2026-06-21T15:01:33Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local labels_module = "Module:labels" local links_module = "Module:links" local parameter_utilities_module = "Module:parameter utilities" local function track(page) require("Module:debug/track")("alter/" .. page) end --[==[ Main function for displaying alternative forms. Extracted out from the template-callable function so this can be called by other modules (in particular, [[Module:descendants tree]]). `allow_self_link` causes terms the sam...'
27708
Scribunto
text/plain
local export = {}
local labels_module = "Module:labels"
local links_module = "Module:links"
local parameter_utilities_module = "Module:parameter utilities"
local function track(page)
require("Module:debug/track")("alter/" .. page)
end
--[==[
Main function for displaying alternative forms. Extracted out from the template-callable function so this can be
called by other modules (in particular, [[Module:descendants tree]]). `allow_self_link` causes terms the same as the
pagename to be shown normally; otherwise they are displayed unlinked. `default_separator` controls the separator between
terms when the user didn't use a special separator term like ";" (defaulting to ", ").
]==]
function export.display_alternative_forms(parent_args, pagename, allow_self_link, default_separator)
local params = {
[1] = {required = true, type = "language", default = "en"},
[2] = {list = true, allow_holes = true},
}
local m_param_utils = require(parameter_utilities_module)
local param_mods = m_param_utils.construct_param_mods {
{group = {"link", "ref"}},
-- For compatibility, we need to turn off separate_no_index for q= and qq=.
{group = "q", separate_no_index = false},
-- We currently don't support unindexed l= and ll=.
{group = "l", require_index = true},
}
local items, args = m_param_utils.parse_list_with_inline_modifiers_and_separate_params {
params = params,
param_mods = param_mods,
raw_args = parent_args,
termarg = 2,
parse_lang_prefix = true,
track_module = "alter",
lang = 1,
sc = "sc.default",
stop_when = function(data)
local stop = not data.any_param_at_index
if stop and parent_args[data.orig_index + 1] == nil then
track("actual hole in params")
end
return stop
end,
default_separator = default_separator,
}
if not items[1] then
error("No items found!")
end
local lang = args[1]
local raw_labels = {}
-- Extract the labels and make sure none are blank or omitted.
local last_item_index = items[#items].orig_index
if last_item_index < args[2].maxindex then
for i = last_item_index + 2, args[2].maxindex do
if not args[2][i] then
-- Indices in `i` start at 1 but parameters start at 2, so add 1 to the shown index.
error("Missing/blank item not allowed in [[Template:alt]] labels, but saw such an item in parameter "
.. (i + 1))
end
table.insert(raw_labels, args[2][i])
end
end
-- Make sure there aren't property parameters after the last item (i.e. corresponding to labels).
for k, v in pairs(args) do
-- Look for named list parameters. We check:
-- (1) key is a string (excludes the term param, which is a number);
-- (2) value is a table, i.e. a list;
-- (3) v.maxindex is set (i.e. allow_holes was used);
-- (4) v.maxindex is past the index of the last term.
if type(k) == "string" and type(v) == "table" and v.maxindex and v.maxindex > last_item_index then
local set_values = {}
for i = last_item_index + 1, v.maxindex do
if v[i] then
table.insert(set_values, i)
end
end
error(("Extraneous values for %s= (set at position%s %s)"):format(k, #set_values > 1 and "s" or "",
table.concat(set_values, ",")))
end
end
if not allow_self_link then
-- If the to-be-linked term is the same as the pagename, display it unlinked.
for _, item in ipairs(items) do
if item.term and lang:stripDiacritics(item.term) == pagename then
track("term is pagename")
item.alt = item.alt or item.term
item.term = nil
end
end
end
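--[==[
Illustrative sketch only, not part of this module: the self-link suppression above, isolated as plain Lua. The helper
name `unlink_self` and the `normalize` callback are hypothetical; the callback stands in for `lang:stripDiacritics`.

```lua
local function unlink_self(items, pagename, normalize)
	for _, item in ipairs(items) do
		-- Only items that actually have a term can collide with the page name.
		if item.term and normalize(item.term) == pagename then
			item.alt = item.alt or item.term -- keep the visible text
			item.term = nil -- drop the link target
		end
	end
	return items
end

local items = unlink_self(
	{ { term = "colour" }, { term = "color" }, { alt = "no term" } },
	"color",
	function(term) return term end -- identity normalizer, for the sketch only
)
```

In this sketch only the second item loses its link target; items without a term are skipped untouched.
]==]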
local labels
if #raw_labels > 0 then
labels = require(labels_module).process_raw_labels { labels = raw_labels, lang = lang, nocat = true }
end
local parts = {}
local function ins(part)
table.insert(parts, part)
end
-- Construct the final output.
-- First the items, including separators, left and right regular qualifiers and left and right per-item labels.
for _, item in ipairs(items) do
ins(item.separator)
local text = require(links_module).full_link(item, nil, allow_self_link, "show qualifiers")
ins(text)
end
-- If there are labels, construct them now and append to final output.
if labels then
if lang:hasTranslit() then
ins(" — " .. require(labels_module).format_processed_labels {
labels = labels, lang = lang
})
else
ins(" " .. require(labels_module).format_processed_labels {
labels = labels, lang = lang, open = "(", close = ")"
})
end
end
return table.concat(parts)
end
--[==[
Template-callable function for displaying alternative forms.
]==]
function export.create(frame)
local parent_args = frame:getParent().args
return export.display_alternative_forms(parent_args, mw.loadData("Module:headword/data").pagename)
end
return export
mhx02egmbnof8g4pu8dgdyw5qclxj2k
Module:anagrams
828
8174
27709
2026-06-22T06:36:31Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local m_links = require("Module:links") local export = {} function export.show(frame) local params = { [1] = {required = true, type = "full language", default = "und"}, [2] = {required = true, default = "anagram", list = true}, ["a"] = true, } local args = require("Module:parameters").process(frame:getParent().args, params) for i, val in ipairs(args[2]) do args[2][i] = m_links.full_link({lang = args[1], term = val}) end return table.concat(ar...'
27709
Scribunto
text/plain
local m_links = require("Module:links")
local export = {}
function export.show(frame)
local params = {
[1] = {required = true, type = "full language", default = "und"},
[2] = {required = true, default = "anagram", list = true},
["a"] = true,
}
local args = require("Module:parameters").process(frame:getParent().args, params)
for i, val in ipairs(args[2]) do
args[2][i] = m_links.full_link({lang = args[1], term = val})
end
return table.concat(args[2], ", ")
end
return export
cbd4ykft3oy42d5o4i3uazieynl3v5a
Module:ar-link
828
8175
27710
2026-06-22T06:37:34Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local U = require("Module:string/char") -- Derived from Arabic data table in [[Module:languages/data/2]]. local entry_name_replacements = { [U(0x0671)] = U(0x0627), [U(0x064B)] = "", [U(0x064C)] = "", [U(0x064D)] = "", [U(0x064E)] = "", [U(0x064F)] = "", [U(0x0650)] = "", [U(0x0651)] = "", [U(0x0652)] = "", [U(0x0670)] = "", [U(0x0640)] = "", } local function make_entry_name(text) return (text:gsub("[%z\1-\127\194-\244][\128-\191]*", entr...'
27710
Scribunto
text/plain
local export = {}
local U = require("Module:string/char")
-- Derived from Arabic data table in [[Module:languages/data/2]].
local entry_name_replacements = {
[U(0x0671)] = U(0x0627), [U(0x064B)] = "", [U(0x064C)] = "", [U(0x064D)] = "",
[U(0x064E)] = "", [U(0x064F)] = "", [U(0x0650)] = "", [U(0x0651)] = "",
[U(0x0652)] = "", [U(0x0670)] = "", [U(0x0640)] = "",
}
local function make_entry_name(text)
return (text:gsub("[%z\1-\127\194-\244][\128-\191]*", entry_name_replacements))
end
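--[==[
Illustrative sketch only: how the byte pattern in `make_entry_name` walks a UTF-8 string one character at a time and
looks each character up in a replacement table. The table below is a shortened, hypothetical version of
`entry_name_replacements`, written with byte escapes so that it runs without [[Module:string/char]].

```lua
-- Keys are single UTF-8 characters written as byte escapes.
local replacements = {
	["\217\177"] = "\216\167", -- U+0671 ALEF WASLA -> U+0627 ALEF
	["\217\142"] = "", -- U+064E FATHA is removed
}

local function strip_marks(text)
	-- One lead byte (ASCII or 0xC2-0xF4) plus any continuation bytes is one character.
	-- Characters with no table entry are left unchanged by gsub.
	return (text:gsub("[%z\1-\127\194-\244][\128-\191]*", replacements))
end

-- KAF + FATHA + TEH + FATHA + BEH + FATHA
local stripped = strip_marks("\217\131\217\142\216\170\217\142\216\168\217\142")
```

The outer parentheses in the `return` discard the match count that `gsub` returns as a second value.
]==]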
local function link(entry, text)
return '<span class="Arab" lang="ar">[['
.. make_entry_name(entry)
.. '#Arabic|' .. (text or entry) .. ']]</span>‎'
end
function export.link(frame)
local text = frame.args[1]
if not text then
return nil
end
local transliterate = require("Module:memoize")(require "Module:ar-translit".tr)
local open_paren = ' <span class="mention-gloss-paren annotation-paren">(</span><span class="tr Latn" xml:lang="ar-Latn" lang="ar-Latn">'
local close_paren = '</span><span class="mention-gloss-paren annotation-paren">)</span>'
return (text
:gsub(
"%[%[([^%]]+)%]%]",
function (link_text)
local entry, text = link_text:match("^([^|]+)|(.+)$")
entry, text = entry or link_text, text or link_text
local translit = transliterate(text)
if translit then
return link(entry, text)
.. open_paren .. translit .. close_paren
else
return link(entry, text)
end
end))
end
return export
0mw6bj6ewmpvyx16aqly4rvowarxafg
Module:category link/templates
828
8176
27711
2026-06-22T06:38:26Z
Umarxon III
2840
Sahypa döretdi, mazmuny: '-- Prevent substitution. if mw.isSubsting() then return require("Module:unsubst") end local make_link = require("Module:category link").make_link local process_params = require("Module:parameters").process local unpack = unpack or table.unpack -- Lua 5.2 compatibility local export = {} function export.category_t(frame) return make_link(unpack(process_params(frame:getParent().args, { [1] = {required = true, allow_empty = true, no_trim = true}, [2] = {allo...'
27711
Scribunto
text/plain
-- Prevent substitution.
if mw.isSubsting() then
return require("Module:unsubst")
end
local make_link = require("Module:category link").make_link
local process_params = require("Module:parameters").process
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local export = {}
function export.category_t(frame)
return make_link(unpack(process_params(frame:getParent().args, {
[1] = {required = true, allow_empty = true, no_trim = true},
[2] = {allow_empty = true, no_trim = true},
})))
end
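--[==[
Illustrative sketch only: why `unpack or table.unpack` is needed and how a processed argument list is spread into
positional parameters. The `make_link` below is a hypothetical stand-in, not the function exported by
[[Module:category link]].

```lua
local unpack = unpack or table.unpack -- global in Lua 5.1, table.unpack in 5.2 and later

-- Hypothetical stand-in: joins a category name and an optional label.
local function make_link(name, label)
	return "[[:Category:" .. name .. "|" .. (label or name) .. "]]"
end

local args = { "English nouns", "nouns" }
local result = make_link(unpack(args))
```

Note that `unpack` only spreads the sequence part of the table, so a missing second argument simply arrives as `nil`.
]==]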
return export
99l3255tcq0j0mvel3osy2rm41dv30q
Module:doublet table
828
8177
27712
2026-06-22T06:39:14Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local get_by_name = require("Module:languages").getByCanonicalName local m_links = require("Module:links") local auto_subtable = require("Module:auto-subtable") local langs = require("Module:languages/cache") local function quote(word) return "“" .. word .. "”" end local function trim(word) return string.match(word, "%s*(.-)%s*$") end local link local function make_link(lang, qualifier) return function(word) return link(word, lan...'
27712
Scribunto
text/plain
local export = {}
local get_by_name = require("Module:languages").getByCanonicalName
local m_links = require("Module:links")
local auto_subtable = require("Module:auto-subtable")
local langs = require("Module:languages/cache")
local function quote(word)
return "“" .. word .. "”"
end
local function trim(word)
return string.match(word, "^%s*(.-)%s*$")
end
local link
local function make_link(lang, qualifier)
return function(word)
return link(word, lang, qualifier)
end
end
-- Create a link out of `word` (which may be multipart, with the parts separated by a slash or by "and") in language
-- `lang`, with non-gloss text `qualifier` to include. Normally, qualifiers ought to be `.qq` or `.ll`, but the
-- previous version faked full links totally manually and included the qualifiers inside of the gloss/translit parens
-- like the `.ng` argument does, so we maintain some of the old code in format_qualifier() and include the result as
-- non-gloss text. (Declared local above to make a forward reference.)
function link(word, lang, qualifier)
if word == "" then
return "—"
end
if word:find("\127", nil, true) then
return (word:gsub("^(.-)( ?\127'\"`UNIQ%-%-%w+%-[%dA-F]+%-%-?QINU`\"'\127)",
function(text, space_and_strip_marker)
return link(text, lang, qualifier) .. space_and_strip_marker
end)
)
end
if word:find(" and ", nil, true) then
return (word:gsub("(.+) and (.+)", function (first, second)
return link(first, lang, qualifier) .. " and " .. link(second, lang, qualifier)
end))
end
if word:find("[[", nil, true) then
return (word:gsub("%[%[([^%]]+)%]%]", make_link(lang, qualifier)))
end
local entry, link_text, sense_id
if word:find("|", nil, true) then
entry, link_text = word:match("^([^|]+)|(.+)$")
if not entry then
error("Malformed piped link: " .. word)
end
else
entry = word
end
-- moule$mussel -> moule#French-mussel (assuming lang is French)
if entry:match("%$") then
entry, sense_id = entry:match("([^$]+)$(.+)$")
if not entry then
error("Malformed sense id: " .. word)
end
link_text = entry
end
if not link_text then
link_text = entry or word
end
return m_links.full_link {
lang = lang,
term = mw.text.killMarkers(entry),
alt = link_text,
id = sense_id,
ng = qualifier,
}
end
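--[==[
Illustrative sketch only: the piped-link and sense-id parsing done by `link`, isolated as plain Lua. The helper name
`parse_cell` is hypothetical, and unlike `link` it keeps an explicit piped display text when a sense id is also given.

```lua
-- Returns entry, display text and sense id for one cell.
local function parse_cell(word)
	local entry, link_text, sense_id = word, nil, nil
	if word:find("|", 1, true) then
		entry, link_text = word:match("^([^|]+)|(.+)$")
		if not entry then
			error("Malformed piped link: " .. word)
		end
	end
	if entry:find("$", 1, true) then
		-- "%$" is a literal dollar sign; the final "$" anchors the end of the string.
		local target, id = entry:match("^([^$]+)%$(.+)$")
		if not target then
			error("Malformed sense id: " .. word)
		end
		entry, sense_id = target, id
	end
	return entry, link_text or entry, sense_id
end
```

For example, `parse_cell("moule$mussel")` yields the entry `moule` with sense id `mussel`, while a cell such as `|x`
raises the piped-link error.
]==]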
local function gsub_or_nil(str, pattern, repl)
local result, count = string.gsub(str, pattern, repl)
if count == 0 then
return nil
end
return result
end
local langs_by_name = {}
setmetatable(langs_by_name, {
-- Auto-create language objects: langs_by_name.English -> language object for English.
__index = function(self, key)
local lang = get_by_name(mw.text.killMarkers(key), true)
self[key] = lang
return lang
end
})
local function link_language_names(text)
return (text:gsub("%[%[([^%]]+)%]%]", function (name)
return langs_by_name[name]:makeWikipediaLink()
end))
end
local comma_placeholder = "\1"
local semicolon_placeholder = "\2"
local placeholder_convert = {
[comma_placeholder] = ",", [semicolon_placeholder] = ";",
[","] = comma_placeholder, [";"] = semicolon_placeholder,
}
local function format_qualifier(qualifier_content, link_text, lang)
if qualifier_content:find("\127", nil, true) then
return (qualifier_content:gsub("[^\127]+ ?", format_qualifier))
end
if qualifier_content:find('"', nil, true) then
return (qualifier_content
:gsub(comma_placeholder, placeholder_convert)
:gsub(
'"([^"]+)"',
function (gloss)
return quote(gloss:gsub("[,;]", placeholder_convert))
end)
:gsub(
"[^,;]+",
function (item)
if item:find("“", nil, true) then
return item
else
return "''" .. item .. "''"
end
end)
:gsub("[" .. comma_placeholder .. semicolon_placeholder .. "]", placeholder_convert)
)
else
return (qualifier_content
:gsub(comma_placeholder, placeholder_convert)
:gsub("[^,;]+", "''%1''")
)
end
end
local function link_and_make_qualifier(cell, lang)
if not cell then
return ""
end
if cell:find(",", nil, true) then
return (cell
-- Replace commas in qualifiers with placeholders, so that the function
-- doesn't confuse commas in qualifiers and commas that separate words.
:gsub("%([^%)]+%)", function (qualifier)
return qualifier:gsub(",", placeholder_convert)
end)
:gsub("([^,]+)(,? ?)", function(text, comma)
return link_and_make_qualifier(text, lang) .. comma
end)
)
elseif cell:find("/", nil, true) then
return (cell
:gsub("([^/]+)( ?/? ?)", function(text, slash)
return link_and_make_qualifier(text, lang) .. slash
end)
)
elseif cell:find("(", nil, true) then
return gsub_or_nil(
cell,
"(.-) %(([^%)]+)%)",
function (link_text, qualifier_content)
return link(link_text, lang, link_language_names(format_qualifier(qualifier_content, link_text, lang)))
end)
or error("Ill-formed qualifier in " .. quote(cell) .. " for " .. lang:getCanonicalName() .. ".")
end
return link(cell, lang)
end
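--[==[
Illustrative sketch only: the placeholder trick used above to keep commas inside parenthesized qualifiers from being
treated as term separators. The helper name `split_cell` is hypothetical, and the sketch only splits; it does not
build links.

```lua
local comma_placeholder = "\1"

local function split_cell(cell)
	-- 1. Hide commas that sit inside "(...)".
	local protected = cell:gsub("%([^%)]+%)", function(qualifier)
		return (qualifier:gsub(",", comma_placeholder))
	end)
	-- 2. Split on the remaining commas, trim, and restore the hidden commas.
	local parts = {}
	for part in protected:gmatch("[^,]+") do
		part = part:gsub("^%s+", ""):gsub("%s+$", "")
		parts[#parts + 1] = (part:gsub(comma_placeholder, ","))
	end
	return parts
end

local parts = split_cell("chat (rare, dated), chien")
```

The control character `\1` is assumed never to occur in template input, which is what makes it safe as a placeholder.
]==]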
local function link_term_list(text, lang)
if text:find("[[", nil, true) then
return (text:gsub("%[%[([^%]]+)%]%]", make_link(lang)))
end
return (text:gsub("([^,]+)", make_link(lang)))
end
local function make_table(rows, column_number_to_lang, arg_count)
local output = {}
for i, header_cell in ipairs(rows[1]) do
output[i] = ("! %s"):format(header_cell)
end
local row_count_for_headers_at_bottom = 10
local headers_at_bottom = #rows > row_count_for_headers_at_bottom
local headers
if headers_at_bottom then
headers = "|-\n" .. table.concat(output, "\n")
end
table.insert(output, 1, '{| class="wikitable sortable"')
local column_count = #column_number_to_lang
local column_number = column_count
local row_number = 1 -- Header is row 1.
table.insert(output, "|-")
for _ = column_count + 1, arg_count do
if column_number == column_count then
column_number = 1
row_number = row_number + 1
table.insert(output, "|-")
else
column_number = column_number + 1
end
local lang = langs[column_number_to_lang[column_number]]
local content = rows[row_number][column_number]
table.insert(output, ('| data-sort-value="%s" | %s'):format(
m_links.remove_links(lang:stripDiacritics(content:match("[^,(]+") or content)),
link_and_make_qualifier(content, lang)))
end
if headers_at_bottom then
table.insert(output, headers)
end
table.insert(output, "|}")
return table.concat(output, "\n")
end
function export.doublet_table(frame)
local args = frame:getParent().args
if not args.langs then
return
end
local column_number_to_lang = {}
local column_count = 0
for lang in args.langs:gmatch("[^, ]+") do
column_count = column_count + 1
column_number_to_lang[column_count] = lang
end
local rows = auto_subtable()
local column_number = 0
local row_number = 1
local arg_count
for i, arg in ipairs(args) do
arg_count = i
if column_number == column_count then
column_number = 1
row_number = row_number + 1
else
column_number = column_number + 1
end
rows[row_number][column_number] = trim(arg)
end
return make_table(rows, column_number_to_lang, arg_count)
end
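--[==[
Illustrative sketch only: how `doublet_table` lays a flat list of cells into rows of `column_count` columns. The
`auto_subtable` defined here is a minimal assumption standing in for Module:auto-subtable, and `fill_rows` is a
hypothetical name.

```lua
-- Minimal auto-vivifying table: indexing a missing row creates it.
local function auto_subtable()
	return setmetatable({}, {
		__index = function(self, key)
			local sub = {}
			rawset(self, key, sub)
			return sub
		end,
	})
end

local function fill_rows(cells, column_count)
	local rows = auto_subtable()
	local row_number, column_number = 1, 0
	for _, cell in ipairs(cells) do
		if column_number == column_count then
			-- Current row is full: wrap to the first column of the next row.
			column_number = 1
			row_number = row_number + 1
		else
			column_number = column_number + 1
		end
		rows[row_number][column_number] = cell
	end
	return rows
end

local rows = fill_rows({ "French", "Spanish", "chose", "cosa", "cause", "causa" }, 2)
```

Row 1 holds the header cells, and every later row holds one doublet set per language column.
]==]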
local function make_family_doublet_table(rows, column_count)
local Array = require("Module:array")
local output = Array()
for i, header_cell in ipairs(rows[1]) do
if i == 1 then
-- Assumes the language name is a single capitalized word.
-- Works in [[Appendix:Romance doublets]].
header_cell = header_cell:gsub("^(%u%l+) (.+)$", function (language_name, terms)
return language_name .. " " .. link_term_list(terms, langs_by_name[language_name])
end)
output:insert(("|+ %s"):format(header_cell))
output:insert("!")
else
output:insert(("! %s"):format(header_cell))
end
end
local row_count_for_headers_at_bottom = 10
local headers_at_bottom = #rows > row_count_for_headers_at_bottom
local headers
if headers_at_bottom then
headers = "|-\n" .. output:concat("\n")
end
output:insert(1, '{| class="wikitable"')
for i = 2, #rows do
if rows[i][1] == "See also" then
output:insert(('|-\n| colspan="%d" style="text-align: center; font-weight: bold;" | See also')
:format(column_count))
else
local lang = langs_by_name[rows[i][1]]
output:insert("|-\n! " .. rows[i][1]) -- link language name?
for j = 2, column_count do
output:insert("| " .. link_and_make_qualifier(rows[i][j], lang))
end
end
end
if headers_at_bottom then
output:insert(headers)
end
output:insert("|}")
return output:concat("\n")
end
-- Copies sequential numbered arguments and counts them (while ignoring "See also").
local function process_args(args)
local count = 0
local new_args = {}
for i, v in ipairs(args) do
v = trim(v)
if v ~= "See also" then
count = count + 1
end
new_args[i] = v
end
return new_args, count
end
function export.family_doublets(frame)
local args = frame:getParent().args
local column_count = tonumber(args.cols) or error("Provide the number of columns in the |cols= parameter.")
local arg_count
args, arg_count = process_args(args) -- Warning! Removes named parameters!
if arg_count % column_count ~= 0 then
error(
string.format(
"There are %d cell parameters but %d columns. The number of cells should be a multiple of the number of columns.",
arg_count, column_count))
end
local rows = auto_subtable()
local column_number = 0
local row_number = 1
for _, arg in ipairs(args) do
if column_number == column_count then
column_number = 1
row_number = row_number + 1
else
column_number = column_number + 1
end
rows[row_number][column_number] = arg
if arg == "See also" then
column_number = 0
row_number = row_number + 1
end
end
rows:un_auto_subtable() -- to avoid problems with below function
return make_family_doublet_table(rows, column_count)
end
return export
6iixx4arsp3m6bupfczbk72okvq83eu
Module:grc-appendix
828
8178
27713
2026-06-22T06:40:22Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local items = { ["contraction"] = "contraction", ["contracted"] = "contraction", ["first declension"] = "first declension", ["second declension"] = "second declension", ["third declension"] = "third declension", ["enclitics"] = "enclitics", ["numerals"] = "numerals", ["correlatives"] = "correlatives", ["nouns"] = "nouns", [""] = "", [""] = "", } return export'
27713
Scribunto
text/plain
local export = {}
local items = {
["contraction"] = "contraction",
["contracted"] = "contraction",
["first declension"] = "first declension",
["second declension"] = "second declension",
["third declension"] = "third declension",
["enclitics"] = "enclitics",
["numerals"] = "numerals",
["correlatives"] = "correlatives",
["nouns"] = "nouns",
[""] = "",
[""] = "",
}
return export
62lyp9ktuuv9trvtaoeo4hfi9xk0oja
Module:grc-link/data
828
8179
27714
2026-06-22T06:41:35Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local make_auto_subtabler = require("Module:auto-subtable") local content = mw.title.new("Appendix:Ancient Greek endings"):getContent() local endings = {} -- Find entries for endings marked by the syntax "; [[...]]". -- Store them in a table. for anchor in content:gmatch("\n; %[%[([^%]]+)%]%]") do if anchor:find("[\128-\255]") then for suffix in anchor:gmatch("%-[^%s,]+") do endings[suffix] = true end end end local shares_ending = make_auto_subtabler...'
27714
Scribunto
text/plain
local make_auto_subtabler = require("Module:auto-subtable")
local content = mw.title.new("Appendix:Ancient Greek endings"):getContent()
local endings = {}
-- Find entries for endings marked by the syntax "; [[...]]".
-- Store them in a table.
for anchor in content:gmatch("\n; %[%[([^%]]+)%]%]") do
if anchor:find("[\128-\255]") then
for suffix in anchor:gmatch("%-[^%s,]+") do
endings[suffix] = true
end
end
end
local shares_ending = make_auto_subtabler()
-- The actual purpose of this data module:
-- Check if each ending ends with the characters of any smaller endings, by
-- snipping off progressively larger pieces of the ending and comparing them to
-- all other endings.
-- If so, store the ending in an array indexed by the shorter ending.
-- For instance, -εσθαι ends with the characters of -αι and -σθαι.
for ending in pairs(endings) do
for i = 1, mw.ustring.len(ending) - 2 do -- Ignore the first two characters because of the hyphen.
local sub_ending = "-" .. mw.ustring.sub(ending, -i)
if endings[sub_ending] then
table.insert(shares_ending[sub_ending], ending)
end
end
end
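--[==[
Illustrative sketch only: the shared-ending index built above, using ASCII stand-ins for the Greek endings so that
byte-based `string.sub` and `#` can replace `mw.ustring.sub` and `mw.ustring.len`. The sample endings are made up for
the sketch.

```lua
local endings = { ["-esthai"] = true, ["-sthai"] = true, ["-ai"] = true }

local shares_ending = {}
for ending in pairs(endings) do
	-- Take progressively longer tails, stopping before the tail covers the whole ending.
	for i = 1, #ending - 2 do
		local sub_ending = "-" .. ending:sub(-i)
		if endings[sub_ending] then
			shares_ending[sub_ending] = shares_ending[sub_ending] or {}
			table.insert(shares_ending[sub_ending], ending)
		end
	end
end
-- pairs() has no defined order, so sort before relying on positions.
for _, list in pairs(shares_ending) do
	table.sort(list)
end
```

Here `-ai` ends up listing both longer endings, and `-sthai` lists only `-esthai`.
]==]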
shares_ending:un_auto_subtable()
return { shares_ending = shares_ending }
shatopxunukohzewfrh759dh24p398a
Module:grc-link
828
8180
27715
2026-06-22T06:42:58Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local function remove_macron_breve(text) return mw.ustring.toNFD(text):gsub("\204[\132\134]", "") end local function link(text) return '<span class="Polyt" lang="grc">[[' .. remove_macron_breve(text) .. '#Ancient Greek|' .. text .. ']]</span>' end local function anchor_link(text) return '<span class="Polyt" lang="grc">[[#' .. text .. '|' .. text .. ']]</span>' end local function tag(text) return '<span class="Polyt" lang="grc">'...'
27715
Scribunto
text/plain
local export = {}
local function remove_macron_breve(text)
return mw.ustring.toNFD(text):gsub("\204[\132\134]", "")
end
local function link(text)
return '<span class="Polyt" lang="grc">[['
.. remove_macron_breve(text)
.. '#Ancient Greek|' .. text .. ']]</span>'
end
local function anchor_link(text)
return '<span class="Polyt" lang="grc">[[#'
.. text
.. '|' .. text .. ']]</span>'
end
local function tag(text)
return '<span class="Polyt" lang="grc">' .. text .. '</span>'
end
local function individual_anchor(text)
return '<span id="' .. text .. '"></span>'
end
local function make_anchors(text)
if text:find(",") then
local anchors = {}
for word in text:gmatch("[^, ]+") do
table.insert(anchors, individual_anchor(word))
end
return table.concat(anchors)
else
return individual_anchor(text)
end
end
local function count(text, pattern, bytepattern)
local _, count = (bytepattern and string.gsub or mw.ustring.gsub)(text, pattern, "")
return count
end
local function get_length(text)
return count(text, "[%z\1-\127\194-\244][\128-\191]*", true)
end
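--[==[
Illustrative sketch only: counting UTF-8 characters with a byte pattern, as `get_length` does through `count`. The
helper name `utf8_length` is hypothetical.

```lua
local function utf8_length(text)
	-- gsub returns the number of matches as its second result;
	-- each match is one lead byte plus its continuation bytes.
	local _, n = text:gsub("[%z\1-\127\194-\244][\128-\191]*", "")
	return n
end

-- Two Greek letters (U+03BB, U+03CC), four bytes in UTF-8.
local greek = "\206\187\207\140"
local length = utf8_length(greek)
```

Using the plain `string` library here avoids the slower `mw.ustring` functions when only a character count is needed.
]==]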
local U = require("Module:string/char")
local acute = U(0x301)
local grave = U(0x300)
local circumflex = U(0x342)
local function check(text)
if get_length(text) == 1 then
return ""
end
local errors = {}
text = mw.ustring.toNFD(text)
if count(text, grave) > 0 then
table.insert(errors, "Grave found!")
end
local accent_count = count(text, "[" .. acute .. circumflex .. "]")
if accent_count > 1 then
table.insert(errors, "Too many accents!")
elseif accent_count == 0 and text:sub(-1) ~= "-" then
table.insert(errors, "No accent!")
end
if errors[1] then
return ' <span style="color: goldenrod;">' .. table.concat(errors, " ") .. '</span>'
else
return ""
end
end
-- For [[Appendix:Ancient Greek endings]]; using individual templates is way too slow.
function export.link_Greek(frame)
local text = frame:getParent().args[1]
if text then
local data = mw.loadData "Module:grc-link/data"
local macron = mw.ustring.char(0x304)
local breve = mw.ustring.char(0x306)
local subscript = mw.ustring.char(0x345)
local replacements = {
[macron] = "a", -- macron
[breve] = "b", -- breve
[subscript] = "c", -- iota subscript
}
local get_sort_value = require("Module:memoize")(function (suffix)
suffix = mw.ustring.gsub(mw.ustring.toNFD(suffix),
"[" .. macron .. breve .. subscript .. "]",
replacements)
return suffix
end)
local entries = {}
local i, j, entry, pos
while true do
i, j, entry = text:find("(...-)\n;", pos)
if i == nil then
table.insert(entries, text:sub(pos or 1))
break
end
table.insert(entries, entry)
pos = j - 1
end
return (table.concat(
require("Module:fun").map(
-- Automatically list other suffixes that share the same last
-- few letters, using [[Module:grc-link/data]].
function (entry)
if entry:find("\n;") then
local shares_ending
for headword in entry:match("\n; %[%[([^%]]+)%]%]"):gmatch("%-[^%s,]+") do
if data.shares_ending[headword] then
shares_ending = shares_ending or {}
for _, suffix in ipairs(data.shares_ending[headword]) do
table.insert(shares_ending, "[[" .. suffix .. "]]")
end
end
end
if shares_ending then
table.sort(
shares_ending,
function (ending1, ending2)
return get_sort_value(ending1) < get_sort_value(ending2)
end)
return entry .. "\n: See also " .. table.concat(shares_ending, ", ") .. "."
end
end
return entry
end,
entries))
:gsub(
"(\n?;? ?)%[%[((%-?)[^%]]+)%]%]",
function (preceding, link_text, hyphen)
if link_text:find("[\206\207\225]") then -- leading bytes for Greek and Coptic block and leading byte for Greek Extended block
if preceding == "\n; " then
return preceding .. make_anchors(link_text) .. tag(link_text)
else
if hyphen == "-" then
return preceding .. anchor_link(link_text)
else
return preceding .. link(link_text) .. check(link_text)
end
end
end
end)
:gsub(
"(&[^;]+;)(&[^;]+;)",
'<span class="Polyt" lang="grc">[[%2|%1%2]]</span>')
:gsub("\n$", ""))
end
end
-- Used in [[User:Erutuon/Classical Greek prose]].
function export.link_and_transliterate(frame)
local text = frame.args[1] or frame:getParent().args[1]
if not text then
return
end
local open_paren = ' <span class="mention-gloss-paren annotation-paren">(</span><span class="tr Latn" xml:lang="grc-Latn" lang="grc-Latn">'
local close_paren = '</span><span class="mention-gloss-paren annotation-paren">)</span>'
local column_value = '10em'
return '<div style="-moz-columns: ' .. column_value .. '; -webkit-columns: ' .. column_value .. '; columns: ' .. column_value .. ';">'
.. text
:gsub(
"%[%[([^%]]+)%]%]",
function (link_text)
return link(link_text)
.. open_paren .. (require("Module:languages").getByCode("grc"):transliterate(link_text)) .. close_paren
end)
:gsub("\n$", "")
.. '</div>'
end
function export.strongs_list(frame)
local text = frame.args[1]
local function strip_diacritics(word)
return mw.ustring.toNFD(word):gsub("[\204\205][\128\129\136\147\148\130\133]", "")
-- U+0300 \204\128 COMBINING GRAVE ACCENT
-- U+0301 \204\129 COMBINING ACUTE ACCENT
-- U+0308 \204\136 COMBINING DIAERESIS
-- U+0313 \204\147 COMBINING COMMA ABOVE
-- U+0314 \204\148 COMBINING REVERSED COMMA ABOVE
-- U+0342 \205\130 COMBINING GREEK PERISPOMENI
-- U+0345 \205\133 COMBINING GREEK YPOGEGRAMMENI
end
local function get_first_letter(word)
return mw.ustring.upper(strip_diacritics(word:match("^%-?([%z\1-\127\194-\244][\128-\191]*)")))
end
local prev_letter
return text:gsub(
"%f[^\n%z]([^\t\n]+)\t([^\t\n]+)",
function(number, word)
local header = ""
local letter = get_first_letter(word)
if letter ~= prev_letter then
if number ~= "1" then
header = "</ul>\n\n"
end
header = header .. ('===%s – %04d===\n<ul class="plainlinks" style="column-width: 12em;">\n'):format(letter, tonumber(number))
prev_letter = letter
end
return header
.. '<li> [https://www.blueletterbible.org/lexicon/g' .. number .. "/wlc G" .. number
.. "]: " .. link(word)
.. (word:find(" ", 1, true) and ("<br>(" .. word:gsub("[^ ]+", link) .. ")") or "")
end) .. "</ul>"
end
return export
g84o250ck5y6ht0x2q9wt4ki8ymvwgx
Module:ja-link
828
8181
27716
2026-06-22T06:43:50Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} local m_links = require("Module:links") local m_string_utils = require("Module:string utilities") local ugsub = m_string_utils.gsub local upper = m_string_utils.upper local kana_to_romaji = require("Module:Hrkt-translit").tr -- [[Module:languages]] -- [[Module:parameters]] -- [[Module:script utilities]] -- [[Module:ja-ruby]] -- [[Module:Hrkt-translit]] function export.link(data, options) options = options or {} data.lang = data.lang or req...'
27716
Scribunto
text/plain
local export = {}
local m_links = require("Module:links")
local m_string_utils = require("Module:string utilities")
local ugsub = m_string_utils.gsub
local upper = m_string_utils.upper
local kana_to_romaji = require("Module:Hrkt-translit").tr
-- [[Module:languages]]
-- [[Module:parameters]]
-- [[Module:script utilities]]
-- [[Module:ja-ruby]]
-- [[Module:Hrkt-translit]]
function export.link(data, options)
options = options or {}
data.lang = data.lang or require'Module:languages'.getByCode'ja'
local kana_for_rom = data.kana or data.lemma
if not data.kana then
data.lemma = data.lemma:gsub('[ %%%^%-%.]', '')
end
local ruby
if data.kana and data.lemma ~= data.kana then
ruby = require('Module:ja-ruby').ruby_auto{
term = data.lemma,
kana = data.kana,
options = options.rubyOptions,
}
else
require("Module:debug").track('ja-link/no ruby')
ruby = data.lemma
end
if ruby:match'%[%[.+%]%]' then
require("Module:debug").track('ja-link/manual wikilink')
data.term = ruby
elseif data.linkto == "" or data.linkto == "-" then
require("Module:debug").track('ja-link/disabled link')
data.alt = ruby
else
data.term = data.linkto or data.lemma:gsub('[ %%]', '')
data.alt = ruby
end
if data.tr ~= '-' then
if not data.tr then
data.tr = m_links.remove_links(kana_to_romaji(kana_for_rom, data.lang:getCode(), nil, {hist = options.hist}))
if options.caps then
require("Module:debug").track("ja-link/caps")
data.tr = ugsub(data.tr, "%f[^%s%c%p]%l", upper)
end
else
if options.hist then require("Module:debug").track("ja-link/parameter hist unused") end
end
data.tr = "<i>" .. data.tr .. "</i>"
end
data.lemma = nil
data.kana = nil
data.linkto = nil
return m_links.full_link(data, options.face, not options.disableSelfLink)
end
function export.show(frame)
local alias_of_3 = {alias_of = 3}
local boolean = {type = "boolean"}
local args = require("Module:parameters").process(frame:getParent().args, {
[1] = {required = true},
[2] = true,
[3] = true,
['gloss'] = alias_of_3,
['t'] = alias_of_3,
['linkto'] = {allow_empty = true},
['rom'] = true,
['lit'] = true,
['pos'] = true,
['id'] = true,
['hist'] = boolean,
['caps'] = boolean,
['self'] = {type = "boolean", default = false},
})
return export.link({
lang = frame.args[1] and require'Module:languages'.getByCode(frame.args[1]),
lemma = args[1],
kana = args[2],
gloss = args[3],
lit = args["lit"],
pos = args["pos"],
id = args["id"],
linkto = args["linkto"],
tr = args["rom"],
}, {
caps = args["caps"],
hist = args["hist"],
disableSelfLink = args["self"],
})
end
return export
146glug46mxs4kvdx4wlctbgfrhmz4v
Module:ja-link/fast
828
8182
27717
2026-06-22T06:44:41Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} -- Used in [[Wiktionary:Frequency lists/Japanese]]. Converts bare links to -- {{l/ja}}-type links. function export.link(frame) local text = frame.args[1] if not text then return nil end local function link(text) return '<span class="Jpan" lang="ja">[[' .. text .. '#Japanese|' .. text .. ']]</span>' end return (text :gsub( "%[%[([^%]]+)%]%]", link)) end return export'
27717
Scribunto
text/plain
local export = {}
-- Used in [[Wiktionary:Frequency lists/Japanese]]. Converts bare links to
-- {{l/ja}}-type links.
function export.link(frame)
local text = frame.args[1]
if not text then
return nil
end
local function link(text)
return '<span class="Jpan" lang="ja">[['
.. text
.. '#Japanese|' .. text .. ']]</span>'
end
return (text
:gsub(
"%[%[([^%]]+)%]%]",
link))
end
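--[==[
Illustrative sketch only: the bare-link conversion above run on a sample string in plain Lua. The wrapper name
`convert` is hypothetical; the `link` helper mirrors the one inside `export.link`.

```lua
local function link(text)
	return '<span class="Jpan" lang="ja">[[' .. text .. '#Japanese|' .. text .. ']]</span>'
end

local function convert(text)
	-- The capture excludes the brackets, so `link` receives only the page name.
	return (text:gsub("%[%[([^%]]+)%]%]", link))
end

local converted = convert("a [[neko]] b")
```

Text without double-bracket links passes through unchanged, because `gsub` only rewrites what the pattern matches.
]==]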
return export
4wcbighghsc4ym5bapy3g6m2phenjns
Module:links
828
8183
27718
2026-06-22T06:45:40Z
Umarxon III
2840
Sahypa döretdi, mazmuny: 'local export = {} --[=[ [[Unsupported titles]], pages with high memory usage, extraction modules and part-of-speech names are listed at [[Module:links/data]]. Other modules used: [[Module:script utilities]] [[Module:scripts]] [[Module:languages]] and its submodules [[Module:gender and number]] [[Module:debug/track]] ]=] local anchors_module = "Module:anchors" local debug_track_module = "Module:debug/track" local form_of_module = "Module:form of"...'
27718
Scribunto
text/plain
local export = {}
--[=[
[[Unsupported titles]], pages with high memory usage,
extraction modules and part-of-speech names are listed
at [[Module:links/data]].
Other modules used:
[[Module:script utilities]]
[[Module:scripts]]
[[Module:languages]] and its submodules
[[Module:gender and number]]
[[Module:debug/track]]
]=]
local anchors_module = "Module:anchors"
local debug_track_module = "Module:debug/track"
local form_of_module = "Module:form of"
local gender_and_number_module = "Module:gender and number"
local languages_module = "Module:languages"
local load_module = "Module:load"
local memoize_module = "Module:memoize"
local pages_module = "Module:pages"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local script_utilities_module = "Module:script utilities"
local string_encode_entities_module = "Module:string/encode entities"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"
local concat = table.concat
local find = string.find
local get_current_title = mw.title.getCurrentTitle
local insert = table.insert
local ipairs = ipairs
local match = string.match
local new_title = mw.title.new
local pairs = pairs
local remove = table.remove
local sub = string.sub
local toNFC = mw.ustring.toNFC
local tostring = tostring
local type = type
local unstrip = mw.text.unstrip
local NAMESPACE = get_current_title().nsText
local function anchor_encode(...)
anchor_encode = require(memoize_module)(mw.uri.anchorEncode, true)
return anchor_encode(...)
end
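--[==[
Illustrative sketch only: the lazy-loading idiom used by `anchor_encode` and the functions that follow it. Each stub
replaces itself with the real function on first use, so the cost of loading is paid only if the function is called.
`load_upper` is a hypothetical stand-in for an expensive `require`.

```lua
local load_count = 0

local function load_upper()
	load_count = load_count + 1
	return string.upper
end

local function upper(...)
	-- First call: swap this stub for the real function, then forward the arguments.
	upper = load_upper()
	return upper(...)
end

local first = upper("abc")
local second = upper("def")
```

This works because `local function upper` declares the local before the body is compiled, so the assignment inside
the body rebinds the same variable that later calls use.
]==]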
local function debug_track(...)
debug_track = require(debug_track_module)
return debug_track(...)
end
local function decode_entities(...)
decode_entities = require(string_utilities_module).decode_entities
return decode_entities(...)
end
local function decode_uri(...)
decode_uri = require(string_utilities_module).decode_uri
return decode_uri(...)
end
-- Can't yet replace, as the [[Module:string utilities]] version no longer has automatic double-encoding prevention, which requires changes here to account for.
local function encode_entities(...)
encode_entities = require(string_encode_entities_module)
return encode_entities(...)
end
local function extend(...)
extend = require(table_module).extend
return extend(...)
end
local function find_best_script_without_lang(...)
find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang
return find_best_script_without_lang(...)
end
local function format_categories(...)
format_categories = require(utilities_module).format_categories
return format_categories(...)
end
local function format_genders(...)
format_genders = require(gender_and_number_module).format_genders
return format_genders(...)
end
local function format_qualifiers(...)
format_qualifiers = require(pron_qualifier_module).format_qualifiers
return format_qualifiers(...)
end
local function get_current_L2(...)
get_current_L2 = require(pages_module).get_current_L2
return get_current_L2(...)
end
local function get_lang(...)
get_lang = require(languages_module).getByCode
return get_lang(...)
end
local function get_script(...)
get_script = require(scripts_module).getByCode
return get_script(...)
end
local function language_anchor(...)
language_anchor = require(anchors_module).language_anchor
return language_anchor(...)
end
local function load_data(...)
load_data = require(load_module).load_data
return load_data(...)
end
local function request_script(...)
request_script = require(script_utilities_module).request_script
return request_script(...)
end
local function shallow_copy(...)
shallow_copy = require(table_module).shallowCopy
return shallow_copy(...)
end
local function split(...)
split = require(string_utilities_module).split
return split(...)
end
local function tag_text(...)
tag_text = require(script_utilities_module).tag_text
return tag_text(...)
end
local function tag_translit(...)
tag_translit = require(script_utilities_module).tag_translit
return tag_translit(...)
end
local function trim(...)
trim = require(string_utilities_module).trim
return trim(...)
end
local function u(...)
u = require(string_utilities_module).char
return u(...)
end
local function ulower(...)
ulower = require(string_utilities_module).lower
return ulower(...)
end
local function umatch(...)
umatch = require(string_utilities_module).match
return umatch(...)
end
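--[===[
Illustrative sketch (not part of this module's API): the wrappers above all follow one lazy-loading pattern, in which a local function replaces itself with the real implementation on its first call, so that `require` only runs for modules that are actually used. The standalone example below shows the pattern in plain Lua. `load_upper` and `load_count` are hypothetical stand-ins for a `require(...)` call and are not names from this module.

```lua
local load_count = 0

-- Hypothetical stand-in for an expensive require() call.
local function load_upper()
	load_count = load_count + 1
	return string.upper
end

-- `local function` declares the local before the body is compiled, so the
-- assignment inside the body rebinds this same variable.
local function upper(...)
	upper = load_upper()
	return upper(...)
end

local first = upper("abc")  -- triggers the load, then calls the real function
local second = upper("def") -- calls the real function directly
```

After the first call, `upper` refers directly to the loaded function, so later calls skip the wrapper entirely.
]===]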
local m_headword_data
local function get_headword_data()
m_headword_data = load_data("Module:headword/data")
return m_headword_data
end
local function track(page, code)
local tracking_page = "links/" .. page
debug_track(tracking_page)
if code then
debug_track(tracking_page .. "/" .. code)
end
end
local function selective_trim(...)
-- Unconditionally trimmed charset.
local always_trim =
"\194\128-\194\159" .. -- U+0080-009F (C1 control characters)
"\194\173" .. -- U+00AD (soft hyphen)
"\226\128\170-\226\128\174" .. -- U+202A-202E (directionality formatting characters)
"\226\129\166-\226\129\169" -- U+2066-2069 (directionality formatting characters)
-- Standard trimmed charset.
local standard_trim = "%s" .. -- (default whitespace charset)
"\226\128\139-\226\128\141" .. -- U+200B-200D (zero-width spaces)
always_trim
-- If there are non-whitespace characters, trim all characters in `standard_trim`.
-- Otherwise, only trim the characters in `always_trim`.
selective_trim = function(text)
if text == "" then
return text
end
local trimmed = trim(text, standard_trim)
if trimmed ~= "" then
return trimmed
end
return trim(text, always_trim)
end
return selective_trim(...)
end
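--[===[
Illustrative sketch (not part of this module's API): the two-tier trimming in `selective_trim` can be tried out in plain Lua. The real function relies on the Unicode-aware `trim` from [[Module:string utilities]] and on multibyte character ranges; because plain string patterns are byte-oriented, this sketch assumes an ASCII stand-in (`_`) for the always-trimmed invisible characters, and its local `trim` is a simplified assumption, not the library function.

```lua
-- Byte-oriented trim; `charset` is the inside of a Lua pattern set.
local function trim(text, charset)
	return (text:match("^[" .. charset .. "]*(.-)[" .. charset .. "]*$"))
end

local always_trim = "_" -- hypothetical stand-in for invisible formatting characters
local standard_trim = "%s" .. always_trim

local function selective_trim(text)
	if text == "" then
		return text
	end
	local trimmed = trim(text, standard_trim)
	if trimmed ~= "" then
		return trimmed
	end
	-- Nothing but whitespace and formatting characters: keep the whitespace.
	return trim(text, always_trim)
end
```

The fallback branch is what lets a term consisting only of a space survive trimming instead of collapsing to the empty string.
]===]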
local function escape(text, str)
local rep
repeat
text, rep = text:gsub("\\\\(\\*" .. str .. ")", "\5%1")
until rep == 0
return (text:gsub("\\" .. str, "\6"))
end
local function unescape(text, str)
return (text
:gsub("\5", "\\")
:gsub("\6", str))
end
-- Remove bold, italics, soft hyphens, strip markers and HTML tags.
local function remove_formatting(str)
str = str
:gsub("('*)'''(.-'*)'''", "%1%2")
:gsub("('*)''(.-'*)''", "%1%2")
:gsub("\194\173", "") -- U+00AD (soft hyphen)
return (unstrip(str)
:gsub("<[^<>]+>", ""))
end
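--[===[
Illustrative sketch (not part of this module's API): the pattern-based stripping in `remove_formatting` can be exercised outside Scribunto. This standalone version omits the `mw.text.unstrip` step, which is only available inside MediaWiki; `strip_markup` is a hypothetical name used for the example.

```lua
local function strip_markup(str)
	str = str
		:gsub("('*)'''(.-'*)'''", "%1%2") -- bold
		:gsub("('*)''(.-'*)''", "%1%2")   -- italics
		:gsub("\194\173", "")             -- U+00AD (soft hyphen)
	return (str:gsub("<[^<>]+>", ""))     -- HTML tags
end

local cleaned = strip_markup("'''ab\194\173c''' <i>d</i>")
```

The bold pattern runs before the italic pattern so that a triple apostrophe is not consumed as an italic marker plus a stray apostrophe.
]===]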
--[==[Takes an input and splits on a double slash (taking account of escaping backslashes).]==]
function export.split_on_slashes(text)
if text:find("\\", nil, true) then
track("escaped", "split_on_slashes")
end
text = split(escape(text, "//"), "//", true) or {}
for i, v in ipairs(text) do
text[i] = unescape(v, "//")
if v == "" then
text[i] = false
end
end
return text
end
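--[===[
Illustrative sketch (not part of this module's API): how backslash escaping interacts with splitting on `//`. The `escape` and `unescape` helpers are copied from above; `split_plain` is a hypothetical plain-text splitter standing in for `split` from [[Module:string utilities]], and tracking is omitted.

```lua
local function escape(text, str)
	local rep
	repeat
		-- A doubled backslash before the separator becomes placeholder \5.
		text, rep = text:gsub("\\\\(\\*" .. str .. ")", "\5%1")
	until rep == 0
	-- A backslash-escaped separator becomes placeholder \6.
	return (text:gsub("\\" .. str, "\6"))
end

local function unescape(text, str)
	return (text:gsub("\5", "\\"):gsub("\6", str))
end

local function split_plain(text, sep)
	local parts, start = {}, 1
	while true do
		local s, e = text:find(sep, start, true)
		if not s then
			parts[#parts + 1] = text:sub(start)
			return parts
		end
		parts[#parts + 1] = text:sub(start, s - 1)
		start = e + 1
	end
end

local function split_on_slashes(text)
	local parts = split_plain(escape(text, "//"), "//")
	for i, v in ipairs(parts) do
		-- Empty segments are reported as false, as in the function above.
		parts[i] = v ~= "" and unescape(v, "//") or false
	end
	return parts
end

local escaped_parts = split_on_slashes("a\\//b//c") -- input text: a\//b//c
local empty_parts = split_on_slashes("a////b")
```

In the first call the escaped `\//` stays inside the first segment, so only the second `//` splits the text.
]===]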
--[==[Takes a wikilink and outputs the link target and display text. By default, the link target will be returned as a title object, but if `allow_bad_target` is set it will be returned as a string, and no check will be performed as to whether it is a valid link target.]==]
function export.get_wikilink_parts(text, allow_bad_target)
-- TODO: replace `allow_bad_target` with `allow_unsupported`, with support for links to unsupported titles, including escape sequences.
if ( -- Filters out anything but "[[...]]" with no intermediate "[[" or "]]".
not match(text, "^()%[%[") or -- Faster than sub(text, 1, 2) ~= "[[".
find(text, "[[", 3, true) or
find(text, "]]", 3, true) ~= #text - 1
) then
return nil, nil
end
local pipe, title, display = find(text, "|", 3, true)
if pipe then
title, display = sub(text, 3, pipe - 1), sub(text, pipe + 1, -3)
else
title = sub(text, 3, -3)
display = title
end
if allow_bad_target then
return title, display
end
title = new_title(title)
-- No title object means the target is invalid.
if title == nil then
return nil, nil
-- If the link target starts with "#" then mw.title.new returns a broken
-- title object, so grab the current title and give it the correct fragment.
elseif title.prefixedText == "" then
local fragment = title.fragment
if fragment == "" then -- [[#]] isn't valid
return nil, nil
end
title = get_current_title()
title.fragment = fragment
end
return title, display
end
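--[===[
Illustrative sketch (not part of this module's API): the string-only path of `get_wikilink_parts` (the behaviour with `allow_bad_target` set) written in plain Lua, without title objects. `wikilink_parts` is a hypothetical name used for the example.

```lua
local function wikilink_parts(text)
	-- Accept only "[[...]]" with no further "[[" or "]]" inside.
	if not text:match("^%[%[")
		or text:find("[[", 3, true)
		or text:find("]]", 3, true) ~= #text - 1 then
		return nil, nil
	end
	local pipe = text:find("|", 3, true)
	if pipe then
		return text:sub(3, pipe - 1), text:sub(pipe + 1, -3)
	end
	local title = text:sub(3, -3)
	return title, title
end

local target, display = wikilink_parts("[[foo|bar]]")
```

Input containing more than one link, such as `[[a]] [[b]]`, is rejected here and is left to the embedded-link processing further down.
]===]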
-- Does the work of export.get_fragment, but can be called directly to avoid unnecessary checks for embedded links.
local function get_fragment(text)
text = escape(text, "#")
-- Replace numeric character references with the corresponding character (&#39; → '),
-- as they contain #, which would otherwise cause the numeric character reference to be
-- misparsed (wa&#39;a → pagename wa&, fragment 39;a).
text = decode_entities(text)
local target, fragment = text:match("^(.-)#(.+)$")
target = target or text
target = unescape(target, "#")
fragment = fragment and unescape(fragment, "#")
return target, fragment
end
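--[===[
Illustrative sketch (not part of this module's API): the core target/fragment split used by `get_fragment`, reduced to the pattern match alone. Escape handling and entity decoding are deliberately left out, and `split_fragment` is a hypothetical name.

```lua
local function split_fragment(text)
	-- The lazy (.-) stops at the first "#"; (.+) requires a non-empty fragment.
	local target, fragment = text:match("^(.-)#(.+)$")
	return target or text, fragment
end

local page, section = split_fragment("word#Etymology 2")
```

A trailing `#` with nothing after it does not match, so the whole text is returned as the target and the fragment is nil.
]===]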
--[==[Takes a link target and outputs the actual target and the fragment (if any).]==]
function export.get_fragment(text)
if text:find("\\", nil, true) then
track("escaped", "get_fragment")
end
-- If there are no embedded links, process input.
local open = find(text, "[[", nil, true)
if not open then
return get_fragment(text)
end
local close = find(text, "]]", open + 2, true)
if not close then
return get_fragment(text)
-- If there is one, but it's redundant (i.e. encloses everything with no pipe), remove and process.
elseif open == 1 and close == #text - 1 and not find(text, "|", 3, true) then
return get_fragment(sub(text, 3, -3))
end
-- Otherwise, return the input.
return text
end
--[==[
Given a link target as passed to `full_link()`, get the actual page that the target refers to. This removes
bold, italics, strip markers and HTML; calls `makeEntryName()` for the language in question; converts targets
beginning with `*` to the Reconstruction namespace; and converts appendix-constructed languages to the Appendix
namespace. Returns up to three values:
# the actual page to link to, or {nil} to not link to anything;
# how the target should be displayed, if the user didn't explicitly specify any display text; generally the
same as the original target, but minus any anti-asterisk !!;
# the value `true` if the target had a backslash-escaped * in it (FIXME: explain this more clearly).
]==]
function export.get_link_page_with_auto_display(target, lang, sc, plain)
local orig_target = target
if not target then
return nil
elseif target:find("\\", nil, true) then
track("escaped", "get_link_page")
end
target = remove_formatting(target)
if target:sub(1, 1) == ":" then
track("initial colon")
-- FIXME, the auto_display (second return value) should probably remove the colon
return target:sub(2), orig_target
end
local prefix = target:match("^(.-):")
-- Convert any escaped colons
target = target:gsub("\\:", ":")
if prefix then
-- If this is a link to another namespace or an interwiki link, ensure there's an initial colon and then
-- return what we have (so that it works as a conventional link, and doesn't do anything weird like add the term
-- to a category.)
prefix = ulower(trim(prefix))
if prefix ~= "" and (
load_data("Module:data/namespaces")[prefix] or
load_data("Module:data/interwikis")[prefix]
) then
return target, orig_target
end
end
-- Check if the term is reconstructed and remove any asterisk. Also check for anti-asterisk (!!).
-- Otherwise, handle the escapes.
local reconstructed, escaped, anti_asterisk
if not plain then
target, reconstructed = target:gsub("^%*(.)", "%1")
if reconstructed == 0 then
target, anti_asterisk = target:gsub("^!!(.)", "%1")
if anti_asterisk == 1 then
-- Remove !! from original. FIXME! We do it this way because the call to remove_formatting() above
-- may cause non-initial !! to be interpreted as anti-asterisks. We should surely move the
-- remove_formatting() call later.
orig_target = orig_target:gsub("^!!", "")
end
end
end
target, escaped = target:gsub("^(\\-)\\%*", "%1*")
if not (sc and sc:getCode() ~= "None") then
sc = lang:findBestScript(target)
end
-- Remove carets if they are used to capitalize parts of transliterations (unless they have been escaped).
if (not sc:hasCapitalization()) and sc:isTransliterated() and target:match("%^") then
target = escape(target, "^")
:gsub("%^", "")
target = unescape(target, "^")
end
-- Get the entry name for the language.
target = lang:makeEntryName(target, sc, reconstructed == 1 or lang:hasType("appendix-constructed"))
-- If the link contains unexpanded template parameters, then don't create a link.
if target:match("{{{.-}}}") then
-- FIXME: Should we return the original target as the default display value (second return value)?
return nil
end
-- Link to appendix for reconstructed terms and terms in appendix-only languages. Plain links interpret *
-- literally, however.
if reconstructed == 1 then
if lang:getFullCode() == "und" then
-- Return the original target as default display value. If we don't do this, we wrongly get
-- [Term?] displayed instead.
return nil, orig_target
end
target = "Reconstruction:" .. lang:getFullName() .. "/" .. target
-- Reconstructed languages and substrates require an initial *.
elseif anti_asterisk ~= 1 and (lang:hasType("reconstructed") or lang:getFamilyCode() == "qfa-sub") then
error(("The specified language %s is unattested, while the term '%s' does not begin with '*' to indicate that it is reconstructed.")
:format(lang:getCanonicalName(), orig_target))
elseif lang:hasType("appendix-constructed") then
target = "Appendix:" .. lang:getFullName() .. "/" .. target
else
target = target
end
return target, orig_target, escaped > 0
end
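--[===[
Illustrative sketch (not part of this module's API): how the substitution counts returned by `gsub` are used above to detect a leading reconstruction asterisk or an anti-asterisk `!!`. `classify_prefix` is a hypothetical helper that isolates just that step.

```lua
local function classify_prefix(target)
	local reconstructed, anti_asterisk
	-- The (.) capture means a bare "*" or "!!" with nothing after it is left alone.
	target, reconstructed = target:gsub("^%*(.)", "%1")
	if reconstructed == 0 then
		target, anti_asterisk = target:gsub("^!!(.)", "%1")
	end
	return target, reconstructed == 1, anti_asterisk == 1
end

local term, is_reconstructed, is_anti = classify_prefix("*foo")
```

Because the `!!` check only runs when no asterisk was removed, a target cannot be flagged as both reconstructed and anti-asterisked.
]===]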
function export.get_link_page(target, lang, sc, plain)
local target, auto_display, escaped = export.get_link_page_with_auto_display(target, lang, sc, plain)
return target, escaped
end
-- Make a link from a given link's parts
local function make_link(link, lang, sc, id, isolated, cats, no_alt_ast, plain)
-- Convert percent encoding to plaintext.
link.target = link.target and decode_uri(link.target, "PATH")
link.fragment = link.fragment and decode_uri(link.fragment, "PATH")
-- Find fragments (if one isn't already set).
-- Prevents {{l|en|word#Etymology 2|word}} from linking to [[word#Etymology 2#English]].
-- # can be escaped as \#.
if link.target and link.fragment == nil then
link.target, link.fragment = get_fragment(link.target)
end
-- Process the target
local auto_display, escaped
link.target, auto_display, escaped = export.get_link_page_with_auto_display(link.target, lang, sc, plain)
-- Create a default display form.
-- If the target is "" then it's a link like [[#English]], which refers to the current page.
if auto_display == "" then
auto_display = (m_headword_data or get_headword_data()).pagename
end
-- If the display is the target and the reconstruction * has been escaped, remove the escaping backslash.
if escaped then
auto_display = auto_display:gsub("\\([^\\]*%*)", "%1", 1)
end
-- Process the display form.
if link.display then
local orig_display = link.display
link.display = lang:makeDisplayText(link.display, sc, true)
if cats then
auto_display = lang:makeDisplayText(auto_display, sc)
-- If the alt text is the same as what would have been automatically generated, then the alt parameter is redundant (e.g. {{l|en|foo|foo}}, {{l|en|w:foo|foo}}, but not {{l|en|w:foo|w:foo}}).
-- If they're different, but the alt text could have been entered as the term parameter without it affecting the target page, then the target parameter is redundant (e.g. {{l|ru|фу|фу́}}).
-- If `no_alt_ast` is true, use pcall to catch the error which will be thrown if this is a reconstructed lang and the alt text doesn't have *.
if link.display == auto_display then
insert(cats, lang:getFullName() .. " links with redundant alt parameters")
else
local ok, check
if no_alt_ast then
ok, check = pcall(export.get_link_page, orig_display, lang, sc, plain)
else
ok = true
check = export.get_link_page(orig_display, lang, sc, plain)
end
if ok and link.target == check then
insert(cats, lang:getFullName() .. " links with redundant target parameters")
end
end
end
else
link.display = lang:makeDisplayText(auto_display, sc)
end
if not link.target then
return link.display
end
-- If the target is the same as the current page, there is no sense id
-- and either the language code is "und" or the current L2 is the current
-- language then return a "self-link" like the software does.
if link.target == get_current_title().prefixedText then
local fragment, current_L2 = link.fragment, get_current_L2()
if (
fragment and fragment == current_L2 or
not (id or fragment) and (lang:getFullCode() == "und" or lang:getFullName() == current_L2)
) then
return tostring(mw.html.create("strong")
:addClass("selflink")
:wikitext(link.display))
end
end
-- Add fragment. Do not add a section link to "Undetermined", as such sections do not exist and are invalid.
-- TabbedLanguages handles links without a section by linking to the "last visited" section, but adding
-- "Undetermined" would break that feature. For localized prefixes that would cause a syntax error, use the
-- format ["xyz"] = true.
local prefix = link.target:match("^:*([^:]+):")
prefix = prefix and ulower(prefix)
if prefix ~= "category" and not (prefix and load_data("Module:data/interwikis")[prefix]) then
if (link.fragment or link.target:sub(-1) == "#") and not plain then
track("fragment", lang:getFullCode())
if cats then
insert(cats, lang:getFullName() .. " links with manual fragments")
end
end
if not link.fragment then
if id then
link.fragment = lang:getFullCode() == "und" and anchor_encode(id) or language_anchor(lang, id)
elseif lang:getFullCode() ~= "und" and not (link.target:match("^Appendix:") or link.target:match("^Reconstruction:")) then
link.fragment = anchor_encode(lang:getFullName())
end
end
end
-- Put inward-facing square brackets around a link to isolated spacing character(s).
if isolated and #link.display > 0 and not umatch(decode_entities(link.display), "%S") then
link.display = "]" .. link.display .. "["
end
link.target = link.target:gsub("^(:?)(.*)", function(m1, m2)
return m1 .. encode_entities(m2, "#%&+/:<=>@[\\]_{|}")
end)
link.fragment = link.fragment and encode_entities(remove_formatting(link.fragment), "#%&+/:<=>@[\\]_{|}")
return "[[" ..
link.target:gsub("^[^:]", ":%0") .. (link.fragment and "#" .. link.fragment or "") .. "|" .. link.display .. "]]"
end
-- Split a link into its parts
local function parse_link(linktext)
local link = { target = linktext }
local target = link.target
link.target, link.display = target:match("^(..-)|(.+)$")
if not link.target then
link.target = target
link.display = target
end
-- There's no point in processing these, as they aren't real links.
local target_lower = link.target:lower()
for _, false_positive in ipairs({ "category", "cat", "file", "image" }) do
if target_lower:match("^" .. false_positive .. ":") then
return nil
end
end
link.display = decode_entities(link.display)
link.target, link.fragment = get_fragment(link.target)
-- So that make_link does not look for a fragment again.
if not link.fragment then
link.fragment = false
end
return link
end
local function check_params_ignored_when_embedded(alt, lang, id, cats)
if alt then
track("alt-ignored")
if cats then
insert(cats, lang:getFullName() .. " links with ignored alt parameters")
end
end
if id then
track("id-ignored")
if cats then
insert(cats, lang:getFullName() .. " links with ignored id parameters")
end
end
end
-- Find embedded links and ensure they link to the correct section.
local function process_embedded_links(text, alt, lang, sc, id, cats, no_alt_ast, plain)
-- Process the non-linked text.
text = lang:makeDisplayText(text, sc, true)
-- If the text begins with * and another character, then act as if each link begins with *. However, don't do this if the * is contained within a link at the start. E.g. `|*[[foo]]` would set all_reconstructed to true, while `|[[*foo]]` would not.
local all_reconstructed = false
if not plain then
-- anchor_encode removes links etc.
if anchor_encode(text):sub(1, 1) == "*" then
all_reconstructed = true
end
-- Otherwise, handle any escapes.
text = text:gsub("^(\\-)\\%*", "%1*")
end
check_params_ignored_when_embedded(alt, lang, id, cats)
local function process_link(space1, linktext, space2)
local capture = "[[" .. linktext .. "]]"
local link = parse_link(linktext)
-- Return unprocessed false positives untouched (e.g. categories).
if not link then
return capture
end
if all_reconstructed then
if link.target:find("^!!") then
-- Check for anti-asterisk !! at the beginning of a target, indicating that a reconstructed term
-- wants a part of the term to link to a non-reconstructed term, e.g. Old English
-- {{ang-noun|m|head=*[[!!Crist|Cristes]] [[!!mæsseǣfen]]}}.
link.target = link.target:sub(3)
-- Also remove !! from the display, which may have been copied from the target (as in mæsseǣfen in
-- the example above).
link.display = link.display:gsub("^!!", "")
elseif not link.target:match("^%*") then
link.target = "*" .. link.target
end
end
linktext = make_link(link, lang, sc, id, false, nil, no_alt_ast, plain)
:gsub("^%[%[", "\3")
:gsub("%]%]$", "\4")
return space1 .. linktext .. space2
end
-- Use chars 1 and 2 as temporary substitutions, so that we can use charsets. These are converted to chars 3 and 4 by process_link, which means we can convert any remaining chars 1 and 2 back to square brackets (i.e. those not part of a link).
text = text
:gsub("%[%[", "\1")
:gsub("%]%]", "\2")
-- If the script uses ^ to capitalize transliterations, make sure that any carets preceding links are on the inside, so that they get processed with the following text.
if (
text:find("^", nil, true) and
not sc:hasCapitalization() and
sc:isTransliterated()
) then
text = escape(text, "^")
:gsub("%^\1", "\1%^")
text = unescape(text, "^")
end
text = text:gsub("\1(%s*)([^\1\2]-)(%s*)\2", process_link)
-- Remove the extra * at the beginning of a language link if it's immediately followed by a link whose display begins with * too.
if all_reconstructed then
text = text:gsub("^%*\3([^|\1-\4]+)|%*", "\3%1|*")
end
return (text
:gsub("[\1\3]", "[[")
:gsub("[\2\4]", "]]")
)
end
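--[===[
Illustrative sketch (not part of this module's API): the placeholder technique used in `process_embedded_links`. Double brackets are swapped for single control characters so that a character class can exclude them, each balanced link is rewritten and marked with a second pair of placeholders, and every placeholder is finally restored. `map_links` and its callback parameter are hypothetical; whitespace captures and caret handling are left out.

```lua
local function map_links(text, fn)
	-- Chars 1 and 2 stand for "[[" and "]]" so that [^\1\2] can be used.
	text = text:gsub("%[%[", "\1"):gsub("%]%]", "\2")
	-- Processed links are re-wrapped in chars 3 and 4.
	text = text:gsub("\1([^\1\2]-)\2", function(inner)
		return "\3" .. fn(inner) .. "\4"
	end)
	-- Restore both processed and stray brackets.
	return (text:gsub("[\1\3]", "[["):gsub("[\2\4]", "]]"))
end

local result = map_links("x [[foo]] y [[bar]]", string.upper)
```

Unbalanced brackets never match the link pattern, so they pass through the final restoration step unchanged.
]===]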
local function simple_link(term, fragment, alt, lang, sc, id, cats, no_alt_ast, srwc)
local plain
if lang == nil then
lang, plain = get_lang("und"), true
end
-- Get the link target and display text. If the term is the empty string, treat the input as a link to the current page.
if term == "" then
term = get_current_title().prefixedText
elseif term then
local new_term, new_alt = export.get_wikilink_parts(term, true)
if new_term then
check_params_ignored_when_embedded(alt, lang, id, cats)
-- [[|foo]] links are treated as plaintext "[[|foo]]".
-- FIXME: Pipes should be handled via a proper escape sequence, as they can occur in unsupported titles.
if new_term == "" then
term, alt = nil, term
else
local title = new_title(new_term)
if title then
local ns = title.namespace
-- File: and Category: links should be returned as-is.
if ns == 6 or ns == 14 then
return term
end
end
term, alt = new_term, new_alt
if cats then
if not (srwc and srwc(term, alt)) then
insert(cats, lang:getFullName() .. " links with redundant wikilinks")
end
end
end
end
end
if alt then
alt = selective_trim(alt)
if alt == "" then
alt = nil
end
end
-- If there's nothing to process, return nil.
if not (term or alt) then
return nil
end
-- If there is no script, get one.
if not sc then
sc = lang:findBestScript(alt or term)
end
-- Embedded wikilinks need to be processed individually.
if term then
local open = find(term, "[[", nil, true)
if open and find(term, "]]", open + 2, true) then
return process_embedded_links(term, alt, lang, sc, id, cats, no_alt_ast, plain)
end
term = selective_trim(term)
end
-- If not, make a link using the parameters.
return make_link({
target = term,
display = alt,
fragment = fragment
}, lang, sc, id, true, cats, no_alt_ast, plain)
end
--[==[Creates a basic link to the given term. It links to the language section (such as <code>==English==</code>), but it does not add language and script wrappers, so any code that uses this function should call the <code class="n">[[Module:script utilities#tag_text|tag_text]]</code> from [[Module:script utilities]] to add such wrappers itself at some point.
The first argument, <code class="n">data</code>, may contain the following items, a subset of the items used in the <code class="n">data</code> argument of <code class="n">full_link</code>. If any other items are included, they are ignored.
{ {
term = entry_to_link_to,
alt = link_text_or_displayed_text,
lang = language_object,
id = sense_id,
} }
; <code class="n">term</code>
: Text to turn into a link. This is generally the name of a page. The text can contain wikilinks already embedded in it. These are processed individually just like a single link would be. The <code class="n">alt</code> argument is ignored in this case.
; <code class="n">alt</code> (''optional'')
: The alternative display for the link, if different from the linked page. If this is {{code|lua|nil}}, the <code class="n">term</code> argument is used instead (much like regular wikilinks). If <code class="n">term</code> contains wikilinks, this argument is ignored and has no effect. (Links in which the alt is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/alt-ignored}}.)
; <code class="n">lang</code>
: The [[Module:languages#Language objects|language object]] for the term being linked. If this argument is defined, the function will determine the language's canonical name (see [[Template:language data documentation]]), and point the link or links in the <code class="n">term</code> to the language's section of an entry, or to a language-specific senseid if the <code class="n">id</code> argument is defined.
; <code class="n">id</code> (''optional'')
: Sense id string. If this argument is defined, the link will point to a language-specific sense id ({{ll|en|identifier|id=HTML}}) created by the template {{temp|senseid}}. A sense id consists of the language's canonical name, a hyphen (<code>-</code>), and the string that was supplied as the <code class="n">id</code> argument. This is useful when a term has more than one sense in a language. If the <code class="n">term</code> argument contains wikilinks, this argument is ignored. (Links in which the sense id is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/id-ignored}}.)
The second argument is as follows:
; <code class="n">allow_self_link</code>
: If {{code|lua|true}}, the function will also generate links to the current page. The default ({{code|lua|false}}) will not generate a link but generate a bolded "self link" instead.
The following special options are processed for each link (both simple text and with embedded wikilinks):
* The target page name will be processed to generate the correct entry name. This is done by the [[Module:languages#makeEntryName|makeEntryName]] function in [[Module:languages]], using the <code class="n">entry_name</code> replacements in the language's data file (see [[Template:language data documentation]] for more information). This function is generally used to automatically strip dictionary-only diacritics that are not part of the normal written form of a language.
* If the text starts with <code class="n">*</code>, then the term is considered a reconstructed term, and a link to the Reconstruction: namespace will be created. If the text contains embedded wikilinks, then <code class="n">*</code> is automatically applied to each one individually, while preserving the displayed form of each link as it was given. This allows linking to phrases containing multiple reconstructed terms, while only showing the * once at the beginning.
* If the text starts with <code class="n">:</code>, then the link is treated as "raw" and the above steps are skipped. This can be used in rare cases where the page name begins with <code class="n">*</code> or if diacritics should not be stripped. For example:
** {{temp|l|en|*nix}} links to the nonexistent page [[Reconstruction:English/nix]] (<code class="n">*</code> is interpreted as a reconstruction), but {{temp|l|en|:*nix}} links to [[*nix]].
** {{temp|l|sl|Franche-Comté}} links to the nonexistent page [[Franche-Comte]] (<code>é</code> is converted to <code>e</code> by <code class="n">makeEntryName</code>), but {{temp|l|sl|:Franche-Comté}} links to [[Franche-Comté]].]==]
function export.language_link(data)
if type(data) ~= "table" then
error(
"The first argument to the function language_link must be a table. See Module:links/documentation for more information.")
elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
track("escaped", "language_link")
end
-- Categorize links to "und".
local lang, cats = data.lang, data.cats
if cats and lang:getCode() == "und" then
insert(cats, "Undetermined language links")
end
return simple_link(
data.term,
data.fragment,
data.alt,
lang,
data.sc,
data.id,
cats,
data.no_alt_ast,
data.suppress_redundant_wikilink_cat
)
end
function export.plain_link(data)
if type(data) ~= "table" then
error(
"The first argument to the function plain_link must be a table. See Module:links/documentation for more information.")
elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
track("escaped", "plain_link")
end
return simple_link(
data.term,
data.fragment,
data.alt,
nil,
data.sc,
data.id,
data.cats,
data.no_alt_ast,
data.suppress_redundant_wikilink_cat
)
end
--[==[Replace any links with links to the correct section, but don't link the whole text if no embedded links are found. Returns the display text form.]==]
function export.embedded_language_links(data)
if type(data) ~= "table" then
error(
"The first argument to the function embedded_language_links must be a table. See Module:links/documentation for more information.")
elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
track("escaped", "embedded_language_links")
end
local term, lang, sc = data.term, data.lang, data.sc
-- If we don't have a script, get one.
if not sc then
sc = lang:findBestScript(term)
end
-- Do we have embedded wikilinks? If so, they need to be processed individually.
local open = find(term, "[[", nil, true)
if open and find(term, "]]", open + 2, true) then
return process_embedded_links(term, data.alt, lang, sc, data.id, data.cats, data.no_alt_ast)
end
-- If not, return the display text.
term = selective_trim(term)
-- FIXME: Double-escape any percent-signs, because we don't want to treat non-linked text as having percent-encoded characters. This is a hack: percent-decoding should come out of [[Module:languages]] and only dealt with in this module, as it's specific to links.
term = term:gsub("%%", "%%25")
return lang:makeDisplayText(term, sc, true)
end
function export.mark(text, item_type, face, lang)
local tag = { "", "" }
if item_type == "gloss" then
tag = { '<span class="mention-gloss-double-quote">“</span><span class="mention-gloss">',
'</span><span class="mention-gloss-double-quote">”</span>' }
if type(text) == "string" and text:match("^''[^'].*''$") then
-- Temporary tracking for mention glosses that are entirely italicized or bolded, which is probably
-- wrong. (Note that this will also find bolded mention glosses since they use triple apostrophes.)
track("italicized-mention-gloss", lang and lang:getFullCode() or nil)
end
elseif item_type == "tr" then
if face == "term" then
tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr mention-tr Latn">',
'</span>' }
else
tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr Latn">', '</span>' }
end
elseif item_type == "ts" then
-- \226\129\160 = word joiner (zero-width non-breaking space) U+2060
tag = { '<span class="ts mention-ts Latn">/\226\129\160', '\226\129\160/</span>' }
elseif item_type == "pos" then
tag = { '<span class="ann-pos">', '</span>' }
elseif item_type == "non-gloss" then
tag = { '<span class="ann-non-gloss">', '</span>' }
elseif item_type == "annotations" then
tag = { '<span class="mention-gloss-paren annotation-paren">(</span>',
'<span class="mention-gloss-paren annotation-paren">)</span>' }
elseif item_type == "infl" then
tag = { '<span class="ann-infl">', '</span>' }
end
if type(text) == "string" then
return tag[1] .. text .. tag[2]
else
return ""
end
end
local pos_tags
--[==[Formats the annotations that are displayed with a link created by {{code|lua|full_link}}. Annotations are the extra bits of information that are displayed following the linked term, and include things such as gender, transliteration, gloss and so on.
* The first argument is a table possessing some or all of the following keys:
*:; <code class="n">genders</code>
*:: Table containing a list of gender specifications in the style of [[Module:gender and number]].
*:; <code class="n">tr</code>
*:: Transliteration.
*:; <code class="n">gloss</code>
*:: Gloss that translates the term in the link, or gives some other descriptive information.
*:; <code class="n">pos</code>
*:: Part of speech of the linked term. If the given argument matches one of the aliases in `pos_aliases` in [[Module:headword/data]], or consists of a part of speech or alias followed by `f` (for a non-lemma form), expand it appropriately. Otherwise, just show the given text as it is.
*:; <code class="n">ng</code>
*:: Arbitrary non-gloss descriptive text for the link. This should be used in preference to putting descriptive text in `gloss` or `pos`.
*:; <code class="n">lit</code>
*:: Literal meaning of the term, if the usual meaning is figurative or idiomatic.
*:; <code class="n">infl</code>
*:: Table containing a list of grammar tags in the style of [[Module:form of]] `tagged_inflections`.
*:Any of the above values can be omitted from the <code class="n">data</code> argument. If a completely empty table is given (with no annotations at all), then an empty string is returned.
* The second argument is a string. Valid values are listed in [[Module:script utilities/data]] "data.translit" table.]==]
function export.format_link_annotations(data, face)
local output = {}
-- Interwiki link
if data.interwiki then
insert(output, data.interwiki)
end
-- Genders
if type(data.genders) ~= "table" then
data.genders = { data.genders }
end
if data.genders and #data.genders > 0 then
local genders, gender_cats = format_genders(data.genders, data.lang)
insert(output, " " .. genders)
if gender_cats then
local cats = data.cats
if cats then
extend(cats, gender_cats)
end
end
end
local annotations = {}
-- Transliteration and transcription
-- Either list may be absent, so read the first items defensively.
local tr = data.tr and data.tr[1]
local ts = data.ts and data.ts[1]
if tr or ts then
local kind
if face == "term" then
kind = face
else
kind = "default"
end
if tr and ts then
insert(annotations, tag_translit(tr, data.lang, kind) .. " " .. export.mark(ts, "ts"))
elseif ts then
insert(annotations, export.mark(ts, "ts"))
else
insert(annotations, tag_translit(tr, data.lang, kind))
end
end
-- Gloss/translation
if data.gloss then
insert(annotations, export.mark(data.gloss, "gloss"))
end
-- Part of speech
if data.pos then
-- debug category for pos= containing transcriptions
if data.pos:match("/[^><]-/") then
data.pos = data.pos .. "[[Category:links likely containing transcriptions in pos]]"
end
-- Canonicalize part of speech aliases as well as non-lemma aliases like 'nf' or 'nounf' for "noun form".
pos_tags = pos_tags or (m_headword_data or get_headword_data()).pos_aliases
local pos = pos_tags[data.pos]
if not pos and data.pos:find("f$") then
local pos_form = data.pos:sub(1, -2)
-- We only expand something ending in 'f' if the result is a recognized non-lemma POS.
pos_form = (pos_tags[pos_form] or pos_form) .. " form"
if (m_headword_data or get_headword_data()).nonlemmas[pos_form .. "s"] then
pos = pos_form
end
end
insert(annotations, export.mark(pos or data.pos, "pos"))
end
-- Inflection data
if data.infl then
local m_form_of = require(form_of_module)
-- Split tag sets manually, since tagged_inflections creates a numbered list, and we do not want that.
local infl_outputs = {}
local tag_sets = m_form_of.split_tag_set(data.infl)
for _, tag_set in ipairs(tag_sets) do
table.insert(infl_outputs,
m_form_of.tagged_inflections({ tags = tag_set, lang = data.lang, nocat = true, nolink = true, nowrap = true }))
end
insert(annotations, export.mark(table.concat(infl_outputs, "; "), "infl"))
end
-- Non-gloss text
if data.ng then
insert(annotations, export.mark(data.ng, "non-gloss"))
end
-- Literal/sum-of-parts meaning
if data.lit then
insert(annotations, "literally " .. export.mark(data.lit, "gloss"))
end
-- Provide a hook to insert additional annotations such as nested inflections.
if data.postprocess_annotations then
data.postprocess_annotations {
data = data,
annotations = annotations
}
end
if #annotations > 0 then
insert(output, " " .. export.mark(concat(annotations, ", "), "annotations"))
end
return concat(output)
end
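--[=[
Illustrative sketch (not part of this module's API): the part-of-speech handling in format_link_annotations above can be
tried in isolation. The two tables below are hypothetical stand-ins for `pos_aliases` and `nonlemmas` in
[[Module:headword/data]]; their contents are assumptions chosen only for the example.

```lua
-- Hypothetical stand-ins for the alias and non-lemma tables.
local pos_aliases = { n = "noun", v = "verb", a = "adjective" }
local nonlemmas = { ["noun forms"] = true, ["verb forms"] = true }

local function expand_pos(pos)
	-- A direct alias match wins.
	local expanded = pos_aliases[pos]
	if not expanded and pos:find("f$") then
		-- Strip the trailing "f" and try to build "<part of speech> form".
		local base = pos:sub(1, -2)
		local candidate = (pos_aliases[base] or base) .. " form"
		-- Accept the expansion only if its plural is a known non-lemma category.
		if nonlemmas[candidate .. "s"] then
			expanded = candidate
		end
	end
	-- Otherwise fall back to the text exactly as given.
	return expanded or pos
end
```

With these sample tables, both "nf" and "nounf" expand to "noun form", while a word that merely ends in "f" (such as
"proof") is returned unchanged because "proo forms" is not a recognized non-lemma category.
]=]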
-- Encode certain characters to avoid various delimiter-related issues at various stages. We need to encode < and >
-- because they end up forming part of CSS class names inside of <span ...> and will interfere with finding the end
-- of the HTML tag. I first tried converting them to URL encoding, i.e. %3C and %3E; they then appear in the URL as
-- %253C and %253E, which get mapped back to %3C and %3E when passed to [[Module:accel]]. But mapping them to &lt;
-- and &gt; somehow works magically without any further work; they appear in the URL as < and >, and get passed to
-- [[Module:accel]] as < and >. I have no idea who along the chain of calls is doing the encoding and decoding. If
-- someone knows, please modify this comment appropriately!
local accel_char_map
local function get_accel_char_map()
accel_char_map = {
["%"] = ".",
[" "] = "_",
["_"] = u(0xFFF0),
["<"] = "&lt;",
[">"] = "&gt;",
}
return accel_char_map
end
local function encode_accel_param_chars(param)
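-- `%%` is a literal percent sign; a bare `%` inside the set would only escape the space after it.
return (param:gsub("[%% <>_]", accel_char_map or get_accel_char_map()))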
end
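--[=[
Illustrative sketch (not part of this module's API): a standalone version of the table-driven character encoding used
for accelerator parameters. The names `char_map` and `encode_chars` are hypothetical, the map is a reduced sample,
and U+FFF0 is written as its UTF-8 bytes so that no helper library is needed.

```lua
-- Reduced sample map; "\239\191\176" is U+FFF0 encoded as UTF-8.
local char_map = {
	["%"] = ".",
	[" "] = "_",
	["_"] = "\239\191\176",
}

local function encode_chars(param)
	-- With a table as the replacement, gsub looks up each matched character as a key.
	-- Inside the set, "%%" is a literal percent sign.
	-- The outer parentheses drop gsub's second return value (the match count).
	return (param:gsub("[%% _]", char_map))
end
```

Because string.gsub makes a single left-to-right pass, the underscore produced from a space is never re-encoded as
U+FFF0, so a decoder can still tell original spaces from original underscores.
]=]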
local function encode_accel_param(prefix, param)
if not param then
return ""
end
if type(param) == "table" then
local filled_params = {}
-- There may be gaps in the sequence, especially for translit params.
local maxindex = 0
for k in pairs(param) do
if type(k) == "number" and k > maxindex then
maxindex = k
end
end
for i = 1, maxindex do
filled_params[i] = param[i] or ""
end
-- [[Module:accel]] splits these up again.
param = concat(filled_params, "*~!")
end
-- This is decoded again by [[WT:ACCEL]].
return prefix .. encode_accel_param_chars(param)
end
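--[=[
Illustrative sketch (not part of this module's API): the gap-filling step in encode_accel_param above, isolated as a
hypothetical helper named `join_sparse`. It assumes the list is keyed by positive integers, possibly with holes.

```lua
local function join_sparse(list, sep)
	-- Find the largest numeric key; `#list` is unreliable when the list has gaps.
	local maxindex = 0
	for k in pairs(list) do
		if type(k) == "number" and k > maxindex then
			maxindex = k
		end
	end
	-- Fill the holes with empty strings so that table.concat sees a proper sequence.
	local filled = {}
	for i = 1, maxindex do
		filled[i] = list[i] or ""
	end
	return table.concat(filled, sep)
end
```

For example, a list with items at positions 1 and 3 joined with "*~!" keeps an empty slot for position 2, so the
receiving side can split the string and recover the original positions.
]=]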
local function insert_if_not_blank(list, item)
if item == "" then
return
end
insert(list, item)
end
local function get_class(lang, tr, accel, nowrap)
if not accel and not nowrap then
return ""
end
local classes = {}
if accel then
insert(classes, "form-of lang-" .. lang:getFullCode())
local form = accel.form
if form then
insert(classes, encode_accel_param_chars(form) .. "-form-of")
end
insert_if_not_blank(classes, encode_accel_param("gender-", accel.gender))
insert_if_not_blank(classes, encode_accel_param("pos-", accel.pos))
insert_if_not_blank(classes, encode_accel_param("transliteration-", accel.translit or (tr ~= "-" and tr or nil)))
insert_if_not_blank(classes, encode_accel_param("target-", accel.target))
insert_if_not_blank(classes, encode_accel_param("origin-", accel.lemma))
insert_if_not_blank(classes, encode_accel_param("origin_transliteration-", accel.lemma_translit))
if accel.no_store then
insert(classes, "form-of-nostore")
end
end
if nowrap then
insert(classes, nowrap)
end
return concat(classes, " ")
end
-- Add any left or right regular or accent qualifiers, labels or references to a formatted term. `data` is the object
-- specifying the term, which should optionally contain:
-- * a language object in `lang`; required if any accent qualifiers or labels are given;
-- * left regular qualifiers in `q` (an array of strings or a single string); an empty array or blank string will be
-- ignored;
-- * right regular qualifiers in `qq` (an array of strings or a single string); an empty array or blank string will be
-- ignored;
-- * left accent qualifiers in `a` (an array of strings); an empty array will be ignored;
-- * right accent qualifiers in `aa` (an array of strings); an empty array will be ignored;
-- * left labels in `l` (an array of strings); an empty array will be ignored;
-- * right labels in `ll` (an array of strings); an empty array will be ignored;
-- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text`
-- (formatted reference text) and optionally `name` and/or `group`.
-- `formatted` is the formatted version of the term itself.
local function add_qualifiers_and_refs_to_term(data, formatted)
local q = data.q
if type(q) == "string" then
q = { q }
end
local qq = data.qq
if type(qq) == "string" then
qq = { qq }
end
if q and q[1] or qq and qq[1] or data.a and data.a[1] or data.aa and data.aa[1] or data.l and data.l[1] or
data.ll and data.ll[1] or data.refs and data.refs[1] then
formatted = format_qualifiers {
lang = data.lang,
text = formatted,
q = q,
qq = qq,
a = data.a,
aa = data.aa,
l = data.l,
ll = data.ll,
refs = data.refs,
}
end
return formatted
end
--[==[
Creates a full link, with annotations (see `[[#format_link_annotations|format_link_annotations]]`), in the style of {{tl|l}} or {{tl|m}}.
The first argument, `data`, must be a table. It contains the various elements that can be supplied as parameters to {{tl|l}} or {{tl|m}}:
{ {
term = entry_to_link_to,
alt = link_text_or_displayed_text,
lang = language_object,
sc = script_object,
track_sc = boolean,
no_nonstandard_sc_cat = boolean,
fragment = link_fragment,
id = sense_id,
genders = { "gender1", "gender2", ... },
tr = transliteration,
respect_link_tr = boolean,
ts = transcription,
gloss = gloss,
pos = part_of_speech_tag,
ng = non_gloss_text,
lit = literal_translation,
infl = { "form_of_grammar_tag1", "form_of_grammar_tag2", ... },
no_alt_ast = boolean,
accel = {accelerated_creation_tags},
interwiki = interwiki,
pretext = "text_at_beginning" or nil,
posttext = "text_at_end" or nil,
q = { "left_qualifier1", "left_qualifier2", ...} or "left_qualifier",
qq = { "right_qualifier1", "right_qualifier2", ...} or "right_qualifier",
l = { "left_label1", "left_label2", ...},
ll = { "right_label1", "right_label2", ...},
a = { "left_accent_qualifier1", "left_accent_qualifier2", ...},
aa = { "right_accent_qualifier1", "right_accent_qualifier2", ...},
refs = { "formatted_ref1", "formatted_ref2", ...} or { {text = "text", name = "name", group = "group"}, ... },
show_qualifiers = boolean,
} }
Any one of the items in the `data` table may be {nil}, but an error will be shown if neither `term` nor `alt` nor `tr`
is present. Thus, calling {full_link{ term = term, lang = lang, sc = sc }}, where `term` is the page to link to (which
may have diacritics that will be stripped and/or embedded bracketed links) and `lang` is a
[[Module:languages#Language objects|language object]] from [[Module:languages]], will give a plain link similar to the
one produced by the template {{tl|l}}, and calling {full_link( { term = term, lang = lang, sc = sc }, "term" )} will
give a link similar to the one produced by the template {{tl|m}}.
The function will:
* Try to determine the script, based on the characters found in the `term` or `alt` argument, if the script was not
given. If a script is given and `track_sc` is {true}, it will check whether the input script is the same as the one
which would have been automatically generated and add the category [[:Category:LANG terms with redundant script codes]]
if yes, or [[:Category:LANG terms with non-redundant manual script codes]] if no. This should be used when the input
script object is directly determined by a template's `sc` parameter.
* Call `[[#language_link|language_link]]` on the `term` or `alt` forms, to remove diacritics in the page name, process
any embedded wikilinks and create links to Reconstruction or Appendix pages when necessary.
* Call `[[Module:script utilities#tag_text]]` to add the appropriate language and script tags to the term and
italicize terms written in the Latin script if necessary. Accelerated creation tags, as used by [[WT:ACCEL]], are
included.
* Generate a transliteration, based on the `alt` or `term` arguments, if the script is not Latin, no transliteration was
provided in `tr` and the combination of the term's language and script support automatic transliteration. The
transliteration itself will be linked if both `.respect_link_tr` is specified and the language of the term has the
`link_tr` property set for the script of the term; but not otherwise.
* Add the annotations (transliteration, gender, gloss, etc.) after the link.
* If `no_alt_ast` is specified, then the `alt` text does not need to contain an asterisk if the language is
reconstructed. This should only be used by modules which really need to allow links to reconstructions that don't
display asterisks (e.g. number boxes).
* If `pretext` or `posttext` is specified, this is text to (respectively) prepend or append to the output, directly
before processing qualifiers, labels and references. This can be used to add arbitrary extra text inside of the
qualifiers, labels and references.
* If `show_qualifiers` is specified or the `show_qualifiers` argument is given, then left and right qualifiers, accent
qualifiers, labels and references will be displayed, otherwise they will be ignored. (This is because a fair amount of
code stores qualifiers, labels and/or references in these fields and displays them itself, rather than expecting
{full_link()} to display them.)]==]
function export.full_link(data, face, allow_self_link, show_qualifiers)
if type(data) ~= "table" then
error("The first argument to the function full_link must be a table. "
.. "See Module:links/documentation for more information.")
elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
track("escaped", "full_link")
end
-- Prevent data from being destructively modified.
local data = shallow_copy(data)
-- FIXME: this shouldn't be added to `data`, as that means the input table needs to be cloned.
data.cats = {}
-- Categorize links to "und".
local lang, cats = data.lang, data.cats
if cats and lang:getCode() == "und" then
insert(cats, "Undetermined language links")
end
local terms = { true }
-- Generate multiple forms if applicable.
for _, param in ipairs { "term", "alt" } do
if type(data[param]) == "string" and data[param]:find("//", nil, true) then
data[param] = export.split_on_slashes(data[param])
elseif type(data[param]) == "string" and not (type(data.term) == "string" and data.term:find("//", nil, true)) then
if not data.no_generate_forms then
data[param] = lang:generateForms(data[param])
else
data[param] = { data[param] }
end
else
data[param] = {}
end
end
for _, param in ipairs { "sc", "tr", "ts" } do
data[param] = { data[param] }
end
for _, param in ipairs { "term", "alt", "sc", "tr", "ts" } do
for i in pairs(data[param]) do
terms[i] = true
end
end
-- Create the link
local output = {}
local id, no_alt_ast, srwc, accel, nevercalltr = data.id, data.no_alt_ast, data.suppress_redundant_wikilink_cat,
data.accel, data.never_call_transliteration_module
local link_tr = data.respect_link_tr and lang:link_tr(data.sc[1])
for i in ipairs(terms) do
local link
-- Is there any text to show?
if (data.term[i] or data.alt[i]) then
-- Try to detect the script if it was not provided
local display_term = data.alt[i] or data.term[i]
local best = lang:findBestScript(display_term)
-- no_nonstandard_sc_cat is intended for use in [[Module:interproject]]
if (
not data.no_nonstandard_sc_cat and
best:getCode() == "None" and
find_best_script_without_lang(display_term):getCode() ~= "None"
) then
insert(cats, lang:getFullName() .. " terms in nonstandard scripts")
end
if not data.sc[i] then
data.sc[i] = best
-- Track uses of sc parameter.
elseif data.track_sc then
if data.sc[i]:getCode() == best:getCode() then
insert(cats, lang:getFullName() .. " terms with redundant script codes")
else
insert(cats, lang:getFullName() .. " terms with non-redundant manual script codes")
end
end
-- If using a discouraged character sequence, add to maintenance category
if data.sc[i]:hasNormalizationFixes() == true then
if (data.term[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.term[i])) ~= toNFC(data.term[i])) or (data.alt[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.alt[i])) ~= toNFC(data.alt[i])) then
insert(cats, "Pages using discouraged character sequences")
end
end
link = simple_link(
data.term[i],
data.fragment,
data.alt[i],
lang,
data.sc[i],
id,
cats,
no_alt_ast,
srwc
)
end
-- simple_link can return nil, so check if a link has been generated.
if link then
-- Add "nowrap" class to prefixes in order to prevent wrapping after the hyphen
local nowrap
local display_term = data.alt[i] or data.term[i]
if display_term and (display_term:find("^%-") or display_term:find("^־")) then -- Hebrew maqqef -- FIXME, use hyphens from [[Module:affix]]
nowrap = "nowrap"
end
link = tag_text(link, lang, data.sc[i], face, get_class(lang, data.tr[i], accel, nowrap))
else
--[[ No term to show.
Is there at least a transliteration we can work from? ]]
link = request_script(lang, data.sc[i])
-- No link to show, and no transliteration either. Show a term request (unless it's a substrate, as they rarely take terms).
if (link == "" or (not data.tr[i]) or data.tr[i] == "-") and lang:getFamilyCode() ~= "qfa-sub" then
-- If there are multiple terms, break the loop instead.
if i > 1 then
remove(output)
break
elseif NAMESPACE ~= "Template" then
insert(cats, lang:getFullName() .. " term requests")
end
link = "<small>[Term?]</small>"
end
end
insert(output, link)
if i < #terms then insert(output, "<span class=\"Zsym mention\" style=\"font-size:100%;\"> / </span>") end
end
-- When suppress_tr is true, do not show or generate any transliteration
if data.suppress_tr then
data.tr[1] = nil
else
-- TODO: Currently only handles the first transliteration, pending consensus on how to handle multiple translits for multiple forms, as this is not always desirable (e.g. traditional/simplified Chinese).
if data.tr[1] == "" or data.tr[1] == "-" then
data.tr[1] = nil
else
local phonetic_extraction = load_data("Module:links/data").phonetic_extraction
phonetic_extraction = phonetic_extraction[lang:getCode()] or phonetic_extraction[lang:getFullCode()]
if phonetic_extraction then
data.tr[1] = data.tr[1] or
require(phonetic_extraction).getTranslit(export.remove_links(data.alt[1] or data.term[1]))
elseif (data.term[1] or data.alt[1]) and data.sc[1]:isTransliterated() then
-- Track whenever there is manual translit. The categories below like 'terms with redundant transliterations'
-- aren't sufficient because they only work with reference to automatic translit and won't operate at all in
-- languages without any automatic translit, like Persian and Hebrew.
if data.tr[1] then
local full_code = lang:getFullCode()
track("manual-tr", full_code)
end
if not nevercalltr then
-- Try to generate a transliteration.
local text = data.alt[1] or data.term[1]
if not link_tr then
text = export.remove_links(text, true)
end
local automated_tr = lang:transliterate(text, data.sc[1])
if automated_tr then
local manual_tr = data.tr[1]
if manual_tr then
if export.remove_links(manual_tr) == export.remove_links(automated_tr) then
insert(cats, lang:getFullName() .. " terms with redundant transliterations")
else
-- Prevents Arabic root categories from flooding the tracking categories.
if NAMESPACE ~= "Category" then
insert(cats,
lang:getFullName() .. " terms with non-redundant manual transliterations")
end
end
end
if not manual_tr or lang:overrideManualTranslit(data.sc[1]) then
data.tr[1] = automated_tr
end
end
end
end
end
end
-- Link to the transliteration entry for languages that require this
if data.tr[1] and link_tr and not data.tr[1]:match("%[%[(.-)%]%]") then
data.tr[1] = simple_link(
data.tr[1],
nil,
nil,
lang,
get_script("Latn"),
nil,
cats,
no_alt_ast,
srwc
)
elseif data.tr[1] and not link_tr then
-- Remove the pseudo-HTML tags added by remove_links.
data.tr[1] = data.tr[1]:gsub("</?link>", "")
end
if data.tr[1] and not umatch(data.tr[1], "[^%s%p]") then data.tr[1] = nil end
insert(output, export.format_link_annotations(data, face))
if data.pretext then
insert(output, 1, data.pretext)
end
if data.posttext then
insert(output, data.posttext)
end
local categories = cats[1] and format_categories(cats, lang, "-", nil, nil, data.sc) or ""
output = concat(output)
if show_qualifiers or data.show_qualifiers then
output = add_qualifiers_and_refs_to_term(data, output)
end
return output .. categories
end
--[==[Replaces all wikilinks with their displayed text, and removes any categories. This function can be invoked either from a template or from another module.
Strips links: deletes category links, the targets of piped links, and any double square brackets involved in links (other than file links, which are untouched). If `tag` is set, then any links removed will be given pseudo-HTML tags, which allow the substitution functions in [[Module:languages]] to properly subdivide the text in order to reduce the chance of substitution failures in modules which scrape pages like [[Module:zh-translit]].
FIXME: This is quite hacky. We probably want this to be integrated into [[Module:languages]], but we can't do that until we know that nothing is pushing pipe linked transliterations through it for languages which don't have link_tr set.
* <code><nowiki>[[page|displayed text]]</nowiki></code> → <code><nowiki>displayed text</nowiki></code>
* <code><nowiki>[[page and displayed text]]</nowiki></code> → <code><nowiki>page and displayed text</nowiki></code>
* <code><nowiki>[[Category:English lemmas|WORD]]</nowiki></code> → ''(nothing)'']==]
function export.remove_links(text, tag)
if type(text) == "table" then
text = text.args[1]
end
if not text or text == "" then
return ""
end
text = text
:gsub("%[%[", "\1")
:gsub("%]%]", "\2")
-- Parse internal links for the display text.
text = text:gsub("(\1)([^\1\2]-)(\2)",
function(c1, c2, c3)
-- Don't remove files.
for _, false_positive in ipairs({ "file", "image" }) do
if c2:lower():match("^" .. false_positive .. ":") then return c1 .. c2 .. c3 end
end
-- Remove categories completely.
for _, false_positive in ipairs({ "category", "cat" }) do
if c2:lower():match("^" .. false_positive .. ":") then return "" end
end
-- In piped links, remove all text before the pipe, unless it's the final character (i.e. the pipe trick), in which case just remove the pipe.
c2 = c2:match("^[^|]*|(.+)") or c2:match("([^|]+)|$") or c2
if tag then
return "<link>" .. c2 .. "</link>"
else
return c2
end
end)
text = text
:gsub("\1", "[[")
:gsub("\2", "]]")
return text
end
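--[=[
Illustrative sketch (not part of this module's API): the pipe handling inside remove_links above, reduced to a
hypothetical helper named `display_text` that receives only the text between the double square brackets. It omits
the file and category checks that the real function performs first.

```lua
local function display_text(inner)
	return inner:match("^[^|]*|(.+)")  -- "page|text": keep what follows the first pipe
		or inner:match("([^|]+)|$")    -- "page|" (pipe trick): keep the page, drop the pipe
		or inner                       -- "page": nothing to strip
end
```

The order of the three alternatives matters: the first pattern requires at least one character after the pipe, so a
trailing pipe falls through to the second pattern instead of yielding an empty display text.
]=]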
function export.section_link(link)
if type(link) ~= "string" then
error("The first argument to section_link was a " .. type(link) .. ", but it should be a string.")
elseif link:find("\\", nil, true) then
track("escaped", "section_link")
end
local target, section = get_fragment((link:gsub("_", " ")))
if not section then
error("No \"#\" delineating a section name")
end
return simple_link(
target,
section,
target .. " § " .. section
)
end
return export
go1a5j6fymqq8baizjbz87vusf7r536
Module:links/templates
828
8184
27719
2026-06-22T06:46:36Z
Umarxon III
2840
Created page with content: '-- Prevent substitution. if mw.isSubsting() then return require("Module:unsubst") end local export = {} local links_module = "Module:links" local process_params = require("Module:parameters").process local remove = table.remove local upper = require("Module:string utilities").upper --[=[ Modules used: [[Module:links]] [[Module:languages]] [[Module:scripts]] [[Module:parameters]] [[Module:debug]] ]=] do local function get_args(frame) -- `compat` is a...'
27719
Scribunto
text/plain
-- Prevent substitution.
if mw.isSubsting() then
return require("Module:unsubst")
end
local export = {}
local links_module = "Module:links"
local process_params = require("Module:parameters").process
local remove = table.remove
local upper = require("Module:string utilities").upper
--[=[
Modules used:
[[Module:links]]
[[Module:languages]]
[[Module:scripts]]
[[Module:parameters]]
[[Module:debug]]
]=]
do
local function get_args(frame)
-- `compat` is a compatibility mode for {{term}}.
-- If given a nonempty value, the function uses lang= to specify the
-- language, and all the positional parameters shift one number lower.
local iargs = frame.args
iargs.compat = iargs.compat and iargs.compat ~= ""
iargs.langname = iargs.langname and iargs.langname ~= ""
iargs.notself = iargs.notself and iargs.notself ~= ""
local alias_of_4 = {alias_of = 4}
local boolean = {type = "boolean"}
local params = {
[1] = {required = true, type = "language", default = "und"},
[2] = true,
[3] = true,
[4] = true,
g = {list = true, type = "genders", flatten = true},
gloss = alias_of_4,
id = true,
lit = true,
ng = true,
pos = true,
sc = {type = "script"},
t = alias_of_4,
tr = true,
ts = true,
q = {type = "qualifier"},
qq = {type = "qualifier"},
l = {type = "labels"},
ll = {type = "labels"},
ref = {type = "references"},
["accel-form"] = true,
["accel-translit"] = true,
["accel-lemma"] = true,
["accel-lemma-translit"] = true,
["accel-gender"] = true,
["accel-nostore"] = boolean,
}
if iargs.compat then
params.lang = {type = "language", default = "und"}
remove(params, 1)
alias_of_4.alias_of = 3
end
if iargs.langname then
params.w = boolean
end
return process_params(frame:getParent().args, params), iargs
end
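--[=[
Illustrative sketch (not part of this module's API): how the compatibility branch of get_args above shifts the
positional parameters. The helper name `drop_language_slot` and the string placeholders are hypothetical; the real
parameter spec holds tables and booleans rather than strings.

```lua
local function drop_language_slot(params, gloss_alias)
	-- Removing index 1 shifts the remaining positional entries down by one;
	-- named keys such as `tr` are left untouched.
	table.remove(params, 1)
	-- The gloss alias has to follow the gloss to its new position.
	gloss_alias.alias_of = gloss_alias.alias_of - 1
	return params
end

local gloss_alias = { alias_of = 4 }
local params = drop_language_slot(
	{ "language", "term", "alt", "gloss", tr = "translit" },
	gloss_alias
)
```

After the call, the term sits at position 1, the gloss at position 3, and `gloss_alias.alias_of` is 3, which mirrors
the `remove(params, 1)` and `alias_of_4.alias_of = 3` pair in the function above.
]=]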
-- Used in [[Template:l]] and [[Template:m]].
function export.l_term_t(frame)
local args, iargs = get_args(frame)
local compat = iargs.compat
local lang = args[compat and "lang" or 1]
-- Tracking for und.
if not compat and lang:getCode() == "und" then
require("Module:debug").track("link/und")
end
local term = args[(compat and 1 or 2)]
local alt = args[(compat and 2 or 3)]
term = term ~= "" and term or nil
if not term and not alt and iargs.demo then
term = iargs.demo
end
local langname = iargs.langname and (
args.w and lang:makeWikipediaLink() or
lang:getCanonicalName()
) or nil
if langname and term == "-" then
return langname
end
-- Forward the information to full_link
return (langname and langname .. " " or "") .. require(links_module).full_link(
{
lang = lang,
sc = args.sc,
track_sc = true,
term = term,
alt = alt,
gloss = args[4],
id = args.id,
tr = args.tr,
ts = args.ts,
genders = args.g,
pos = args.pos,
ng = args.ng,
lit = args.lit,
q = args.q,
qq = args.qq,
l = args.l,
ll = args.ll,
refs = args.ref,
show_qualifiers = true,
accel = args["accel-form"] and {
form = args["accel-form"],
translit = args["accel-translit"],
lemma = args["accel-lemma"],
lemma_translit = args["accel-lemma-translit"],
gender = args["accel-gender"],
no_store = args["accel-nostore"], -- get_class() in [[Module:links]] reads `accel.no_store`
} or nil
},
iargs.face,
not iargs.notself
)
end
-- Used in [[Template:link-annotations]].
function export.l_annotations_t(frame)
local args, iargs = get_args(frame)
-- Forward the information to format_link_annotations
return require(links_module).format_link_annotations(
{
lang = args[1],
tr = { args.tr },
ts = { args.ts },
genders = args.g,
pos = args.pos,
ng = args.ng,
lit = args.lit
},
iargs.face
)
end
end
-- Used in [[Template:ll]].
do
local function get_args(frame)
return process_params(frame:getParent().args, {
[1] = {required = true, type = "language", default = "und"},
[2] = {allow_empty = true},
[3] = true,
id = true,
sc = {type = "script"},
})
end
function export.ll(frame)
local args = get_args(frame)
local lang = args[1]
local sc = args.sc
local term = args[2]
term = term ~= "" and term or nil
return require(links_module).language_link{
lang = lang,
sc = sc,
term = term,
alt = args[3],
id = args.id
} or "<small>[Term?]</small>" ..
require("Module:utilities").format_categories(
{lang:getFullName() .. " term requests"},
lang, "-", nil, nil, sc
)
end
end
function export.def_t(frame)
local args = process_params(frame:getParent().args, {
[1] = {required = true, default = ""},
})
local face = frame.args.face
local ret = require("Module:script utilities").tag_definition(require(links_module).embedded_language_links{
term = args[1],
lang = require("Module:languages").getByCode("en"),
sc = require("Module:scripts").getByCode("Latn")
}, face)
if face == "non-gloss" then
return ret
end
return '<span class="mention-gloss-paren">(</span>' .. ret .. '<span class="mention-gloss-paren">)</span>'
end
function export.linkify_t(frame)
local args = process_params(frame:getParent().args, {
[1] = {required = true, default = ""},
})
args[1] = mw.text.trim(args[1])
if args[1] == "" or args[1]:find("[[", nil, true) then
return args[1]
end
return "[[" .. args[1] .. "]]"
end
function export.cap_t(frame)
local args = process_params(frame:getParent().args, {
[1] = {required = true},
[2] = true,
lang = {type = "language", default = "en"},
})
local term = args[1]
return require(links_module).full_link{
lang = args.lang,
term = term,
alt = term:gsub("^.[\128-\191]*", upper) .. (args[2] or "")
}
end
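--[=[
Illustrative sketch (not part of this module's API): the pattern used by cap_t above to isolate the first UTF-8
character of a string. The helper names are hypothetical. The sketch defaults to `string.upper`, which in the usual
C locale only changes ASCII letters; cap_t itself passes `upper` from [[Module:string utilities]].

```lua
local function first_char(text)
	-- "^." matches the lead byte; "[\128-\191]*" matches any UTF-8 continuation bytes after it.
	return text:match("^.[\128-\191]*")
end

local function capitalize_first(text, upper_fn)
	upper_fn = upper_fn or string.upper
	-- gsub calls upper_fn with the matched first character and substitutes the result.
	-- The outer parentheses discard gsub's match count.
	return (text:gsub("^.[\128-\191]*", upper_fn))
end
```

Matching the continuation bytes together with the lead byte means a multi-byte first character is handed to the
uppercasing function whole, rather than being split in the middle.
]=]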
function export.section_link_t(frame)
local args = process_params(frame:getParent().args, {
[1] = {},
})
return require(links_module).section_link(args[1])
end
return export
49ywpqwyhc1meeszxumsw7m3fxck38v
Module:links/testcases
828
8185
27720
2026-06-22T06:47:40Z
Umarxon III
2840
Created page with content: '--[=[ Unit tests for [[Module:links]]. Click talk page to run tests. ]=] local p = require('Module:UnitTests') local m_links = require('Module:links') local m_util = require('Module:utilities') local get_lang_by_code = require("Module:languages").getByCode local function tag(lang_code, sc_code) return function (text) return '<span class="' .. sc_code .. '" lang="' .. lang_code .. '">' .. text .. '</span>' end end local options = { nowiki = true, show_diff...'
27720
Scribunto
text/plain
--[=[
Unit tests for [[Module:links]]. Click talk page to run tests.
]=]
local p = require('Module:UnitTests')
local m_links = require('Module:links')
local m_util = require('Module:utilities')
local get_lang_by_code = require("Module:languages").getByCode
local function tag(lang_code, sc_code)
return function (text)
return '<span class="' .. sc_code .. '" lang="' .. lang_code .. '">' .. text .. '</span>'
end
end
local options = { nowiki = true, show_difference = true }
function p:check_link(example, expected)
self:preprocess_equals(example, expected, options)
end
function p:test_links()
local frame = mw.getCurrentFrame()
local temp = frame.args.temp or "l"
local compat = frame.args.compat
local lang = compat and "lang=" or ""
local link_examples = {
'anchor',
{
'{{' .. temp .. '|' .. lang .. 'en|-er#Etymology 2|-er}}',
'<span class="Latn" lang="en">[[-er#Etymology 2|-er]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'en|[[-er#Etymology 2|-er]]}}',
'<span class="Latn" lang="en">[[-er#Etymology 2|-er]]</span>'
},
'character entity references in link target',
{
'{{' .. temp .. '|' .. lang .. 'nia|wa&#39;a}}',
'<span class="Latn" lang="nia">[[wa\'a#Nias|wa\'a]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'nia|wa&apos;a}}',
'<span class="Latn" lang="nia">[[wa\'a#Nias|wa\'a]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'ja|恵‌美|恵&#8204;美}}',
'<span class="Jpan" lang="ja">[[恵美#Japanese|恵&#8204;美]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'en|&}}',
'<span class="None" lang="en">[[Unsupported titles/Amp#English|&]]</span>'
},
'simple linking', -- ([[Module:languages]])
{
'{{' .. temp .. '|' .. lang .. 'la|verbum}}',
'<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span>'
},
'using wikilinks',
{
'{{' .. temp .. '|' .. lang .. 'en|[[God]] be [[with]] [[you]]}}',
'<span class="Latn" lang="en">[[God#English|God]] be [[with#English|with]] [[you#English|you]]</span>'
},
'alternative text',
{
'{{' .. temp .. '|' .. lang .. 'en|go|went}}',
'<span class="Latn" lang="en">[[go#English|went]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'en|to [[go]]|went}}',
'<span class="Latn" lang="en">to [[go#English|go]]</span>'
},
'sense id',
{
'{{' .. temp .. '|' .. lang .. 'en|go|id=game}}',
'<span class="Latn" lang="en">[[go#English-game|go]]</span>'
},
'constructed terms', -- ([[Module:languages]])
{
'{{' .. temp .. '|' .. lang .. 'sjn|mithril}}',
'<span class="Latn" lang="sjn">[[Appendix:Sindarin/mithril|mithril]]</span>'
},
'reconstructed terms', -- ([[Module:languages]])
{
'{{' .. temp .. '|' .. lang .. 'ine-pro|*bʰréh₂tēr}}',
'<span class="Latn" lang="ine-pro">[[Reconstruction:Proto-Indo-European/bʰréh₂tēr|*bʰréh₂tēr]]</span>'
},
{
'{{#iferror:{{' .. temp .. '|' .. lang .. 'ine-pro|bʰréh₂tēr}}|Script error}}',
'Script error'
},
{
'{{' .. temp .. '|' .. lang .. 'sla-pro|[[*dьnь]] [[*serda]]}}',
'<span class="Latn" lang="sla-pro">[[Reconstruction:Proto-Slavic/dьnь|*dьnь]] [[Reconstruction:Proto-Slavic/serda|*serda]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'la|verbum .. [[verbum]] .. [[*verbum]] .. [[*verbum|verbum]] .. [[*verbum|*verba]]}}',
'<span class="Latn" lang="la">verbum .. [[verbum#Latin|verbum]] .. [[Reconstruction:Latin/verbum|*verbum]] .. [[Reconstruction:Latin/verbum|verbum]] .. [[Reconstruction:Latin/verbum|*verba]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'sla-pro|*[[serda]]}}',
'<span class="Latn" lang="sla-pro">*[[Reconstruction:Proto-Slavic/serda|serda]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'sla-pro|*[[*serda]] .. [[*serda]] .. [[serda]] .. [[*serda|serda]] .. [[*serda|*serda]]}}',
'<span class="Latn" lang="sla-pro">[[Reconstruction:Proto-Slavic/*serda|*serda]] .. [[Reconstruction:Proto-Slavic/*serda|*serda]] .. [[Reconstruction:Proto-Slavic/serda|serda]] .. [[Reconstruction:Proto-Slavic/*serda|serda]] .. [[Reconstruction:Proto-Slavic/*serda|*serda]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'sla-pro|*[[dьnь|alt1]] [[serda|alt2]]}}',
'<span class="Latn" lang="sla-pro">*[[Reconstruction:Proto-Slavic/dьnь|alt1]] [[Reconstruction:Proto-Slavic/serda|alt2]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'und|[[attested]] .. [[*unattested]] .. [[*unattested|unattested-alt]]}}',
'<span class="Zyyy" lang="und">[[attested|attested]] .. *unattested .. unattested-alt</span>[[Category:Undetermined language links]]'
},
'script detection', -- (lang_obj:findBestScript())
{
'{{' .. temp .. '|' .. lang .. 'sh|српски}} / {{' .. temp .. '|' .. lang .. 'sh|srpski}}',
'<span class="Cyrl" lang="sh">[[српски#Serbo-Croatian|српски]]</span> / <span class="Latn" lang="sh">[[srpski#Serbo-Croatian|srpski]]</span>'
},
'target page\'s title', -- (Language:stripDiacritics())
{
'{{' .. temp .. '|' .. lang .. 'la|verbō}}',
'<span class="Latn" lang="la">[[verbo#Latin|verbō]]</span>'
},
'gender and number', -- ([[Module:gender and number]])
{
'{{' .. temp .. '|' .. lang .. 'la|verbum|g=m}}',
'<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span> <span class="gender"><abbr title="masculine gender">m</abbr></span>'
},
{
'{{' .. temp .. '|' .. lang .. 'la|verbum|g=m|g2=f}}',
'<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span> <span class="gender"><abbr title="masculine gender">m</abbr> or <abbr title="feminine gender">f</abbr></span>'
},
'transliteration',
{
'{{' .. temp .. '|' .. lang .. 'ar|كلمة|tr=kalima}}',
'<span class="Arab" lang="ar">[[كلمة#Arabic|كلمة]]</span>‎ <span class="mention-gloss-paren annotation-paren">(</span><span lang="ar-Latn" class="tr Latn">kalima</span><span class="mention-gloss-paren annotation-paren">)</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'ru|русский}}',
'<span class="Cyrl" lang="ru">[[русский#Russian|русский]]</span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="ru-Latn" class="tr Latn">russkij</span><span class="mention-gloss-paren annotation-paren">)</span>'
},
'gloss',
{
'{{' .. temp .. '|' .. lang .. 'ru|русский|gloss=Russian}}',
'<span class="Cyrl" lang="ru">[[русский#Russian|русский]]</span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="ru-Latn" class="tr Latn">russkij</span>, <span class="mention-gloss-double-quote">“</span><span class="mention-gloss">Russian</span><span class="mention-gloss-double-quote">”</span><span class="mention-gloss-paren annotation-paren">)</span>'
},
'Wikipedia link',
{
'{{' .. temp .. '|' .. lang .. 'en|w:word}}',
'<span class="Latn" lang="en">[[w:word|word]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'en|[[w:English language]]}}',
'<span class="Latn" lang="en">[[w:English language|w:English language]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'en|[[wikipedia:English language]]}}',
'<span class="Latn" lang="en">[[wikipedia:English language|wikipedia:English language]]</span>'
},
'Linking to titles with special characters: asterisk, slash',
{
'{{' .. temp .. '|' .. lang .. 'mul|/}}',
'<span class="None" lang="mul">[[:/#Translingual|/]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'mul|//}}',
'<span class="None" lang="mul">[[://#Translingual|//]]</span>'
},
{
'{{' .. temp .. '|' .. lang .. 'mul|*}}',
'<span class="None" lang="mul">[[*#Translingual|*]]</span>'
},
}
self:iterate(link_examples, 'check_link')
end
function p:check_strip_diacritics(lang_code, unstripped, stripped)
local lang_obj = get_lang_by_code(lang_code)
local sc_code = lang_obj:findBestScript(unstripped):getCode()
self:equals(
('[%s] <i class="mention %s" lang="%s">%s</i>'):format(lang_code, sc_code, lang_code, unstripped),
lang_obj:stripDiacritics(unstripped),
stripped,
{ display = tag(lang_code, sc_code) }
)
end
function p:test_remove_diacritics()
-- Add new cases to the table below, one per line, in the form:
-- { lang_code, unstripped, stripped }
local examples = {
{ 'ru', 'ба́бушка', 'бабушка' },
{ 'mk', 'ЃѓЌќ - е́а́́', 'ЃѓЌќ - еа' }, -- [[w:Macedonian alphabet]]
{ 'sh', 'Łł ĆćŃńŹź Ŭŭ - ȁàȃáā ȐȒŔ ѝӣ', 'Łł ĆćŃńŹź Ŭŭ - aaaaa RRR ии' }, -- [[w:Serbian Cyrillic alphabet]] / [[w:Gaj's Latin alphabet]]
{ 'grc', 'ᾱ, ᾱ́, ᾰ̓́', 'α, ά, ἄ' },
}
self:iterate(examples, 'check_strip_diacritics')
end
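-- Illustrative sketch only, not called by the tests in this module. It assumes
-- that self:iterate() hands each table row to the named check method as
-- separate arguments and treats plain string rows as section headings; the
-- real implementation lives in the unit-test framework and may differ.
-- example_iterate is a hypothetical name introduced for this illustration.
local function example_iterate(runner, rows, method)
	local unpack_row = table.unpack or unpack -- Lua 5.2+ or Lua 5.1
	local calls = 0
	for _, row in ipairs(rows) do
		if type(row) == 'table' then
			local fn = method
			if type(method) == 'string' then
				fn = runner[method] -- look the check method up by name
			end
			fn(runner, unpack_row(row))
			calls = calls + 1
		end
		-- String rows (headings) are skipped in this sketch.
	end
	return calls
end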
function p:test_section_link()
local examples = {
{
"w:Hindustani phonology#Vowels [ɛ], [ɛː]",
"[[w:Hindustani phonology#Vowels_%5B%C9%9B%5D,_%5B%C9%9B%CB%90%5D|"
.. "w:Hindustani phonology § Vowels [ɛ], [ɛː]]]"
},
}
self:iterate(
examples,
function (self, page, expected)
self:equals(
mw.text.nowiki(page),
m_links.section_link(page),
expected)
end)
end
return p
epu79d5ol18lhi07uskbf2m3w0ae2zr
Module:nn-inf
828
8186
27721
2026-06-22T06:48:37Z
Umarxon III
2840
Created page with content: 'local lang = require("Module:languages").getByCode("nn") local links = require("Module:links") local export = {} function export.main(frame) local PAGENAME = mw.loadData("Module:headword/data").pagename local args = frame:getParent().args local root = '' local separator = '' if args[1] and args[1] ~= '' then root = args[1] else root = PAGENAME:gsub('[ae]$', '') end if args[2] and args[2] ~= '' then separator = args[2] else separator = '...'
27721
Scribunto
text/plain
local lang = require("Module:languages").getByCode("nn")
local links = require("Module:links")
local export = {}
function export.main(frame)
local PAGENAME = mw.loadData("Module:headword/data").pagename
local args = frame:getParent().args
local root = ''
local separator = ''
if args[1] and args[1] ~= '' then
root = args[1]
else
root = PAGENAME:gsub('[ae]$', '')
end
if args[2] and args[2] ~= '' then
separator = args[2]
else
separator = ','
end
if separator == ',' then
separator = ', '
end
local linkA = links.full_link{term = root .. 'a', lang = lang}
local linkE = links.full_link{term = root .. 'e', lang = lang}
return linkA .. separator .. linkE
end
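-- Illustrative sketch only, not used by export.main. The hypothetical helper
-- example_infinitive_pair mirrors the fallback logic above with plain strings
-- in place of links.full_link, so the defaults can be checked outside
-- Scribunto: a missing root is derived from the page name by dropping a final
-- "a" or "e", and a missing or bare "," separator becomes ", ".
local function example_infinitive_pair(pagename, root, separator)
	if not root or root == '' then
		-- The parentheses keep only the first value returned by gsub.
		root = (pagename:gsub('[ae]$', ''))
	end
	if not separator or separator == '' or separator == ',' then
		separator = ', '
	end
	return root .. 'a' .. separator .. root .. 'e'
end
-- For instance, example_infinitive_pair('kasta') returns 'kasta, kaste'.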
return export
9p1t1c0rn0kfdm6ec9pv8n920y03jv0
Module:quick link
828
8187
27722
2026-06-22T06:49:45Z
Umarxon III
2840
Created page with content: 'local export = {} local function validate_lang(lang) return type(lang) == "table" and type(lang.getCode) == "function" end -- Entry names are processed with this function. Catalan has no entry-name -- replacements, but apparently uses straight apostrophes in entry titles and -- curly ones in displayed text. local function curly_apostrophe_to_straight(str) return (str:gsub("’", "'")) end -- Find grammatical terms such as "first person" and "plural" in part...'
27722
Scribunto
text/plain
local export = {}
local function validate_lang(lang)
return type(lang) == "table" and type(lang.getCode) == "function"
end
-- Entry names are processed with this function. Catalan has no entry-name
-- replacements, but apparently uses straight apostrophes in entry titles and
-- curly ones in displayed text.
local function curly_apostrophe_to_straight(str)
return (str:gsub("’", "'"))
end
-- Find grammatical terms such as "first person" and "plural" in part of the
-- Catalan personal pronouns table and link them.
-- FIXME: This is a massive hack and should not be in this module.
local function link_terms_in_ca_table(text)
local ordinals = { "first", "second", "third" }
text = text:gsub(
"((%d)%l+ %l+)",
function(whole_match, number)
return "[[" .. ordinals[tonumber(number)] .. " person|"
.. whole_match:gsub(" ", " ") .. "]]"
end)
text = text:gsub(
'![^\n]+singular.-class="notes%-row"',
function(table_interior)
return table_interior:gsub(
"%l+",
function(word)
local link
if word == "singular" or word == "neuter" or word:find "i[nv]e$"
or word:find "al$" then
link = word
elseif word == "majestic" then
link = "majestic plural|majestic"
end
if link then
return "[[" .. link .. "]]"
end
end)
end)
return text
end
function export.main(frame)
local params = {
title = {required = true},
lang = {type = "language"},
}
local args = require("Module:parameters").process(frame.args, params)
local title = args.title
local lang = args.lang
local content = frame:preprocess("{{" .. title .. "}}")
local m_links = require("Module:links")
local function full_link(entry, text)
if not text then
-- FIXME!!! Another nasty hack.
local curly_to_straight = curly_apostrophe_to_straight(entry)
-- text is always nil in this branch, so compare against the entry itself.
if curly_to_straight ~= entry then
text = entry
entry = curly_to_straight
end
end
return m_links.full_link { term = entry, alt = text, lang = lang, }
end
-- Declared local to avoid leaking a global; only gsub's first return value is kept.
local linked_content = content:gsub(
"%b[]",
function (potential_link)
if potential_link:sub(2, 2) == "[" and potential_link:sub(-2, -2) == "]" then
local link_contents = potential_link:sub(3, -3) -- strip off outer brackets
local target, text
if link_contents:find("|") then
target, text = link_contents:match("^([^|]+)|(.+)$")
else
target = link_contents
end
if target:find("^([^:]+):") or target:find("#") then
return potential_link
else
return full_link(target, text)
end
end
end)
if lang:getCode() == "ca" then
linked_content = link_terms_in_ca_table(linked_content)
end
return linked_content
end
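-- Illustrative sketch only, not called by this module. The hypothetical helper
-- example_split_wikilink restates the bracket handling in export.main for a
-- single "%b[]" match: it accepts only "[[...]]", splits "target|text", and
-- reports whether the target is a plain entry name (no "prefix:" and no "#").
-- Unlike the code above, it falls back to the whole contents when the
-- "target|text" match fails; that difference is an assumption of the sketch.
local function example_split_wikilink(potential_link)
	if potential_link:sub(1, 2) ~= "[[" or potential_link:sub(-2) ~= "]]" then
		return nil -- single brackets, e.g. an external link
	end
	local link_contents = potential_link:sub(3, -3) -- strip off outer brackets
	local target, text = link_contents:match("^([^|]+)|(.+)$")
	if not target then
		target = link_contents
	end
	local is_plain = not (target:find("^[^:]+:") or target:find("#", 1, true))
	return target, text, is_plain
end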
return export
hia51mnglcb225uwtxz8p1tf0z7efcl
Module:ru-link
828
8188
27723
2026-06-22T06:50:53Z
Umarxon III
2840
Created page with content: 'local export = {} local full_link = require 'Module:links'.full_link local ru = require 'Module:languages'.getByCode 'ru' -- Just guessing at some of these! local abbreviations = { a = 'adjective', advpro = 'adverbial pronoun', anum = 'adjectival numeral', apro = 'adjectival pronoun', conj = 'conjunction', init = 'initialism', intj = 'interjection', num = 'numeral', part = 'particle', pr = 'preposition', s = 'substantive', spro = 'substantival pronoun', v...'
27723
Scribunto
text/plain
local export = {}
local full_link = require 'Module:links'.full_link
local ru = require 'Module:languages'.getByCode 'ru'
-- Just guessing at some of these!
local abbreviations = {
a = 'adjective', advpro = 'adverbial pronoun', anum = 'adjectival numeral',
apro = 'adjectival pronoun', conj = 'conjunction', init = 'initialism',
intj = 'interjection', num = 'numeral', part = 'particle',
pr = 'preposition', s = 'substantive', spro = 'substantival pronoun',
v = 'verb',
}
function export.link_list(frame)
local list = frame.args[1]
list = list:gsub(
'# ([^,]+), ([^\n]+)',
function (word, POS)
return '# ' .. full_link { lang = ru, term = word,
pos = abbreviations[POS] or POS }
end)
return list
end
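-- Illustrative sketch only, not called by this module. The hypothetical helper
-- example_link_list applies the same "# word, POS" pattern as link_list, but
-- takes a caller-supplied link_fn (an assumption standing in for full_link)
-- and a two-entry abbreviation table, so it can run outside Scribunto.
-- Note that [^,]+ can run across a line break when a line has no comma.
local function example_link_list(list, link_fn)
	local example_abbreviations = { s = 'substantive', v = 'verb' }
	-- The parentheses drop gsub's second return value (the match count).
	return (list:gsub(
		'# ([^,]+), ([^\n]+)',
		function (word, POS)
			return '# ' .. link_fn(word, example_abbreviations[POS] or POS)
		end))
end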
return export
9osmvyhzd5k2kn3e9zk178kz8it8ht2