Wikisözlük (tkwiktionary), https://tk.wiktionary.org/wiki/Ba%C5%9F_Sahypa, MediaWiki 1.47.0-wmf.7, case-sensitive

Page: Wikisözlük:Interfeýs administratorlary (namespace 4, page ID 8163)
Revision 27697 by Umarxon III (user ID 2840), 2026-06-21T14:00:55Z. Edit summary: page created. Content model: wikitext (text/x-wiki).

'''Interface administrators''' are users who have the right to edit Wiktionary's service pages written in CSS and JavaScript (MediaWiki:Common.js, MediaWiki:Vector.css, and the gadget pages listed at [[Special:Gadgets]]). These pages contain program code that runs in the browsers of all Wiktionary editors and readers in order to change how Wiktionary content is displayed, change how pages behave, or build complex tools. Interface administrators may also edit other pages in the MediaWiki namespace, as well as other users' scripts and styles, either with the owner's consent or when there are technical problems with how they work.

== Grounds for granting the flag ==
Because the interface administrator flag requires a high level of technical skill as well as a high level of community trust, a candidate must already hold the administrator, bureaucrat, or engineer flag.
In exceptional cases, the interface administrator flag may be granted, in order to resolve temporary issues, to Russian Wiktionary participants who hold this flag in other language editions or Foundation projects, or who have technically maintained other language editions or projects for a sufficient length of time.

The flag is granted on the Wiktionary page [[Wikisözlük:Interfeýs dolandyryjylaryna haýyş|Requests for interface administrator]]. In the request, the participant must demonstrate technical skills by any of the means listed in the section "Technical skills of interface administrators". In addition, during the discussion the candidate may be asked questions aimed at checking those skills. The discussion lasts at least one week. After that, the bureaucrats evaluate the arguments about technical skill and about the level of community trust in the candidate, and decide whether to grant the flag.

Bureaucrats may also grant the flag temporarily in order to fix changes that adversely affect the operation of Wiktionary or that were made against clear consensus. In that case the flag is granted for the period needed to complete those actions.

== Technical skills of interface administrators ==
Technical skill is a mandatory requirement for a participant requesting the interface administrator flag. It may be demonstrated in any of the following forms:
* Changes to templates and to personal CSS and JS;
* Proposals for changes to gadgets or to site-wide CSS or JS, with examples or a ready implementation;
* Third-party resources (for example, GitHub), including work in other Foundation projects, that can demonstrate the participant's technical skills and are clearly associated with that participant.

== Submitting applications simultaneously ==
Although a participant can hold the interface administrator flag only together with the administrator, bureaucrat, or engineer flag, the participant may apply for these flags at the same time.
However, the application for the interface administrator flag is evaluated last. Because applications for the interface administrator and engineer flags take the form of a discussion, the participant may submit a single combined application instead of two separate ones. If the application for the administrator or bureaucrat flag is unsuccessful, provided that the interface administrator application shows that the requirements for granting this flag are met and that there are no arguments against granting the candidate the engineer flag under Wikisözlük: Engineers # Granting and removing the flag (non-consensus actions, etc.).

[[Kategoriýa:Wikisözlük]]

Page: Wikisözlük:Interfeýs dolandyryjylaryna haýyş (namespace 4, page ID 8164)
Revision 27698 by Umarxon III (user ID 2840), 2026-06-21T14:03:47Z. Edit summary: page created. Content model: wikitext (text/x-wiki).

This page is for leaving requests for the [[Wikisözlük:Interfeýs administratorlarý|interface administrator]] flag.

<inputbox>
type=commenttitle
preload=Wikisözlük:Interfeýs dolandyryjylaryna haýyş/Şablon
editintro=Şablon:Editintro/IASS
page=Wikisözlük:Interfeýs dolandyryjylaryna haýyş
default={{U|{{subst<noinclude></noinclude>:REVISIONUSER}}}}
buttonlabel=Submit a request
hidden=yes
</inputbox>

Revision 27699 (parent 27698) by Umarxon III (user ID 2840), 2026-06-21T14:07:04Z. Content model: wikitext (text/x-wiki).

This page is for leaving requests for the [[Wikisözlük:Interfeýs administratorlary|interface administrator]] flag.

<inputbox>
type=commenttitle
preload=Wikisözlük:Interfeýs dolandyryjylaryna haýyş/Şablon
editintro=Şablon:Editintro/IASS
page=Wikisözlük:Interfeýs dolandyryjylaryna haýyş
default={{U|{{subst<noinclude></noinclude>:REVISIONUSER}}}}
buttonlabel=Submit a request
hidden=yes
</inputbox>

== [[Ulanyjy:Umarxon III|Umarxon III]] ==
Hello! I have been making useful edits on the Turkmen Wikipedia for a long time. At present, the Wiktionary in this language has no gadgets. To add them, I need interface administrator rights. I hope you will support me. [[Ulanyjy:Umarxon III|Umarxon III]] ([[Ulanyjy çekişme:Umarxon III|talk]]) 14:06, 21 June 2026 (UTC).

Page: Module:ar-stripdiacritics (namespace 828, page ID 8165)
Revision 27700 by Umarxon III (user ID 2840), 2026-06-21T14:16:26Z. Edit summary: page created. Content model: Scribunto (text/plain).

local m_str_utils = require("Module:string utilities")

local find = m_str_utils.find
local gsub = m_str_utils.gsub
local U = m_str_utils.char

local taTwiil = U(0x640)
local waSla = U(0x671)
-- diacritics ordinarily removed by entry_name replacements
local Arabic_diacritics = U(0x64B, 0x64C, 0x64D, 0x64E, 0x64F, 0x650, 0x651, 0x652, 0x656, 0x670, 0x6DF, 0x6E0, 0x6E1)

-- replace alif waṣl with alif
-- remove tatweel and diacritics: fathatan, dammatan, kasratan, fatha,
-- damma, kasra, shadda, sukun, subscript alif, superscript (dagger) alif,
-- sifr mustadir, sifr mustatil, variant sukun
local replacements = {
	from = {U(0x0671), "[" .. U(0x640) .. Arabic_diacritics .. "]"},
	to = {U(0x0627)},
}

local export = {}

function export.stripDiacritics(text, lang, sc)
	-- Leave a bare alif waṣl, or a single diacritic (optionally carried by a tatweel), untouched.
	if text == waSla or find(text, "^" .. taTwiil .. "?[" .. Arabic_diacritics .. "]" .. "$") then
		return text
	end

	for i, from in ipairs(replacements.from) do
		local to = replacements.to[i] or ""
		text = gsub(text, from, to)
	end

	return text
end

return export

Page: Module:ar-verb (namespace 828, page ID 8166)
Revision 27701 by Umarxon III (user ID 2840), 2026-06-21T14:18:09Z. Edit summary: page created. Content model: Scribunto (text/plain).

local export = {}

--[=[
This module implements {{ar-conj}} and provides the underlying conjugation functions for {{ar-verb}}
(whose actual formatting is done in [[Module:ar-headword]]).

Author: User:Benwing, from an early version (2013-2014) by User:Atitarev, User:ZxxZxxZ.
]=]

--[=[
TERMINOLOGY:

-- "slot" = A particular combination of tense/mood/person/number/etc.
	 Example slot names for verbs are "past_1s" (past tense first-person singular), "juss_pass_3fp" (non-past jussive
	 passive third-person feminine plural), "ap" (active participle). Each slot is filled with zero or more forms.

-- "form" = The conjugated Arabic form representing the value of a given slot.

-- "lemma" = The dictionary form of a given Arabic term. For Arabic, normally the third person masculine singular past,
	 although other forms may be used if this form is missing (e.g. in passive-only verbs or verbs lacking the past).
]=]

--[=[

FIXME:

1. Finish unimplemented conjugation types.
   Only IX-final-weak left (extremely rare, possibly only one verb اِعْمَايَ (according to Haywood and Nahmad p. 244,
   who are very specific about the irregular occurrence of alif + yā instead of expected اِعْمَيَّ with doubled yā).
   Not in Hans Wehr. NOTE: Not true about this, cf. form IX اِرْعَوَى "to desist, to repent, to see the light".
   Also note form XII اِخْضَوْضَرَ = form IX اِخْضَرَّ "to be or become green". [DONE except for اِعْمَايَ]

2. Implement irregular verbs as special cases and recognize them, e.g.
   -- laysa "to not be"; only exists in the past tense, no non-past, no imperative, no participles, no passive,
      no verbal noun. Irregular alternation las-/lays-. [IMPLEMENTABLE USING OVERRIDES]
   -- istaḥā yastaḥī "be ashamed of" -- this is complex according to Hans Wehr because there are two verbs, regular
      istaḥyā yastaḥyī "to spare (someone)'s life" and irregular istaḥyā yastaḥyī "to be ashamed to face (someone)",
      which is irregular because it has the alternate irregular form istaḥā yastaḥī which only applies to this meaning.
      Currently we follow Haywood and Nahmad in saying that both varieties can be spelled istaḥyā/istaḥā/istaḥḥā, but
      we should instead use a variant= param similar to حَيَّ to distinguish the two possibilities, and maybe not
      include istaḥḥā.
   -- ʿayya/ʿayiya yaʿayyu/yaʿyā "to not find the right way, be incapable of, stammer, falter, fall ill". This appears
      to be a mixture of a geminate and final-weak verb. Unclear what the whole paradigm looks like. Do the
      consonant-ending parts in the past follow the final-weak paradigm? Is it the same in the non-past? Or can you
      conjugate the non-past fully as either geminate or final-weak?
   -- اِنْمَحَى inmaḥā or يمَّحَى immaḥā "to be effaced, obliterated; to disappear, vanish" has irregular assimilation
      of inm- to imm- as an alternative. inmalasa "to become smooth; to glide; to slip away; to escape" also has
      immalasa as an alternative.
      The only other form VII verbs in Hans Wehr beginning with -m- are inmalaḵa "to be pulled out, torn out,
      wrenched" and inmāʿa "to be melted, to melt, to dissolve", which are not listed with imm- alternatives, but
      might have them; if so, we should handle this generally. [DONE]
   -- يَرَعَ yaraʕa yariʕu "to be a coward, to be chickenhearted" as an alternative form of يَرِعَ yariʕa yayraʕu
      (as given in Wehr). [IMPLEMENTABLE USING OVERRIDES]

3. Implement individual override parameters for each paradigm part. See Module:fro-verb for an example of how to do
   this generally. Note that {{temp|ar-conj-I}} and other of the older templates already had such individual override
   params. [DONE]

Irregular verbs already implemented:

-- [ḥayya/ḥayiya yaḥyā "live" -- behaves like a normal final-weak verb (e.g. past first singular ḥayītu) except in
   the past-tense parts with vowel-initial endings (all the third person except for the third feminine plural). The
   normal singular and dual endings have -yiya- in them, which compresses to -yya-, with the normal endings the less
   preferred ones. In masculine third plural, expected ḥayū is replaced by ḥayyū by analogy to the -yy- parts, and the
   regular form is not given as an alternant in John Mace. Barron's 201 verbs appears to have the regular ḥayū as the
   part, however. Note also that final -yā appears with tall alif. This appears to be a spelling convention of Arabic,
   also applying in ḥayyā (form II, "to keep (someone) alive") and 'aḥyā (form IV, "to animate, revive, give birth to,
   give new life to").]
   -- implemented
-- [ittaxadha yattaxidhu "take"] -- implemented
-- [sa'ala yas'alu "ask" with alternative jussive/imperative yasal/sal] -- implemented
-- [ra'ā yarā "see"] -- implemented
-- ['arā yurī "show"] -- implemented
-- ['akala ya'kulu "eat" with imperative kul] -- implemented
-- ['axadha ya'xudhu "take" with imperative xudh] -- implemented
-- ['amara ya'muru "order" with imperative mur] -- implemented

--]=]

local force_cat = false -- set to true for debugging
-- if true, always maintain manual translit during processing, and compare against full translit at the end
local debug_translit = false

local lang = require("Module:languages").getByCode("ar")
local m_links = require("Module:links")
local m_string_utilities = require("Module:string utilities")
local m_table = require("Module:table")
local ar_utilities = require("Module:ar-utilities")
local ar_nominals = require("Module:ar-nominals")
local iut = require("Module:inflection utilities")
local put = require("Module:parse utilities")
local pron_qualifier_module = "Module:pron qualifier"

local list_to_text = mw.text.listToText
local rfind = m_string_utilities.find
local rsubn = m_string_utilities.gsub
local rmatch = m_string_utilities.match
local rsplit = m_string_utilities.split
local usub = m_string_utilities.sub
local ulen = m_string_utilities.len
local u = m_string_utilities.char
local unpack = unpack or table.unpack -- Lua 5.2 compatibility

local dump = mw.dumpObject

-- Within this module, conjugations are the functions that do the actual
-- conjugating by creating the parts of a basic verb.
-- They are defined further down.
local conjugations = {}

-- hamza variants
local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء
local HAMZA_ON_ALIF = u(0x0623)
local HAMZA_ON_W = u(0x0624)
local HAMZA_UNDER_ALIF = u(0x0625)
local HAMZA_ON_Y = u(0x0626)
local HAMZA_ANY = "[" .. HAMZA .. HAMZA_ON_ALIF .. HAMZA_UNDER_ALIF .. HAMZA_ON_W .. HAMZA_ON_Y .. "]"
local HAMZA_PH = u(0xFFF0) -- hamza placeholder
local BAD = u(0xFFF1)
local BORDER = u(0xFFF2)

-- diacritics
local A = u(0x064E) -- fatḥa
local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn)
local U = u(0x064F) -- ḍamma
local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn)
local I = u(0x0650) -- kasra
local IN = u(0x064D) -- kasratān (kasra tanwīn)
local SK = u(0x0652) -- sukūn = no vowel
local SH = u(0x0651) -- šadda = gemination of consonants
local DAGGER_ALIF = u(0x0670)
local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. "]"
-- Pattern matching short vowels
local AIU = "[" .. A .. I .. U .. "]"
-- Pattern matching short vowels or sukūn
local AIUSK = "[" .. A .. I .. U .. SK .. "]"
-- Pattern matching any diacritics that may be on a consonant
local DIACRITIC = SH .. "?" .. DIACRITIC_ANY_BUT_SH

-- translit_patterns
local vowels = "aeiouāēīōū"
local NV = "[^" .. vowels .. "]"

local dia = {a = A, i = I, u = U}
local undia = {[A] = "a", [I] = "i", [U] = "u", ["-"] = "-"}

-- various letters and signs
local ALIF = u(0x0627) -- ʾalif = ا
local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى
local AMAD = u(0x0622) -- ʾalif madda = آ
local TAM = u(0x0629) -- tāʾ marbūṭa = ة
local T = u(0x062A) -- tāʾ = ت
local HYPHEN = u(0x0640)
local N = u(0x0646) -- nūn = ن
local W = u(0x0648) -- wāw = و
local Y = u(0x064A) -- yāʾ = ي
local S = "س"
local M = "م"
local LRM = u(0x200e) -- left-to-right mark

-- common combinations
local AH = A .. TAM
local AT = A .. T
local AA = A .. ALIF
local AAMAQ = A .. AMAQ
local AAH = AA .. TAM
local AAT = AA .. T
local II = I .. Y
local UU = U .. W
local AY = A .. Y
local AW = A .. W
local AYSK = AY .. SK
local AWSK = AW .. SK
local NA = N .. A
local NI = N .. I
local AAN = AA .. N
local AANI = AA .. NI
local AYNI = AYSK .. NI
local AWNA = AWSK .. NA
local AYNA = AYSK .. NA
local AYAAT = AY .. AAT
local UNU = "[" .. UN .. U .. "]"
local MA = M .. A
local MU = M .. U
local TA = T .. A
local TU = T .. U
local _I = ALIF .. I
local _U = ALIF .. U

local translit_cache = {
	-- hamza variants
	[HAMZA] = "ʔ",
	[HAMZA_ON_ALIF] = "ʔ",
	[HAMZA_ON_W] = "ʔ",
	[HAMZA_UNDER_ALIF] = "ʔ",
	[HAMZA_ON_Y] = "ʔ",
	[HAMZA_PH] = "ʔ",

	-- diacritics
	[A] = "a",
	[AN] = "an",
	[U] = "u",
	[UN] = "un",
	[I] = "i",
	[IN] = "in",
	[SK] = "",
	[SH] = "*", -- handled specially
	[DAGGER_ALIF] = "ā",

	-- various letters and signs
	[""] = "",
	[ALIF] = BAD, -- we should never be transliterating ALIF by itself, as its translit in isolation is ambiguous
	[AMAQ] = BAD,
	[AMAD] = "ʔā",
	[TAM] = "",
	[T] = "t",
	[N] = "n",
	[W] = "w",
	[Y] = "y",
	[S] = "s",
	[M] = "m",
	[LRM] = "",

	-- common combinations
	[AH] = "a",
	[AT] = "at",
	[AA] = "ā",
	[AAMAQ] = "ā",
	[AAH] = "āh",
	[AAT] = "āt",
	[II] = "ī",
	[UU] = "ū",
	[AY] = "ay",
	[AW] = "aw",
	[AYSK] = "ay",
	[AWSK] = "aw",
	[NA] = "na",
	[NI] = "ni",
	[AAN] = "ān",
	[AANI] = "āni",
	[AYNI] = "ayni",
	[AWNA] = "awna",
	[AYNA] = "ayna",
	[AYAAT] = "ayāt",
	[MA] = "ma",
	[MU] = "mu",
	[TA] = "ta",
	[TU] = "tu",
	[_I] = "i",
	[_U] = "u",
}

local function transliterate(text)
	local cached = translit_cache[text]
	if cached then
		if cached == BAD then
			error(("Internal error: Unable to transliterate %s because explicitly marked as BAD"):format(text))
		end
		return cached
	end
	local tr = (lang:transliterate(text))
	if not tr then
		error(("Internal error: Unable to transliterate: %s"):format(text))
	end
	translit_cache[text] = tr
	return tr
end

local all_person_number_list = {
	"1s",
	"2ms",
	"2fs",
	"3ms",
	"3fs",
	"2d",
	"3md",
	"3fd",
	"1p",
	"2mp",
	"2fp",
	"3mp",
	"3fp"
}

local function make_person_number_slot_accel_list(list)
	local slot_accel_list = {}
	return slot_accel_list
end

local imp_person_number_list = {}
for _, pn in ipairs(all_person_number_list) do
	if pn:find("^2") then
		table.insert(imp_person_number_list, pn)
	end
end

local passive_types = m_table.listToSet {
	"pass", -- verb has both active and passive
	"ipass", -- verb is active with impersonal passive
	"nopass", -- verb is active-only
	"onlypass", -- verb is passive-only
	"onlypass-impers",
	-- verb itself is impersonal, meaning passive-only with impersonal passive
}

local indicator_flags = m_table.listToSet {
	"nopast",
	"no_nonpast",
	"noimp",
	"nocat", -- don't categorize or include annotations about this; useful in suppletive parts of verbs
	"reduced", -- verb has assimilation/reduction of initial coronals
	"altgem", -- form X with alternative past geminate forms with final-weak endings
}

export.potential_lemma_slots = {"past_3ms", "past_pass_3ms", "ind_3ms", "ind_pass_3ms", "imp_2ms"}

export.unsettable_slots = {}
for _, potential_lemma_slot in ipairs(export.potential_lemma_slots) do
	table.insert(export.unsettable_slots, potential_lemma_slot .. "_linked")
end
-- We don't set the active participle directly for form I because we don't want stative verbs (with past vowel i or u)
-- to default to فَاعِل. Instead we set the special slot 'ap1' and later copy it to 'ap' for non-stative verbs. The user
-- meanwhile can explicitly request the فَاعِل form for active participles for stative verbs using `ap:+`.
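
-- Illustrative sketch (not part of the upstream module): one way the `ap:+` style indicators mentioned above could be
-- resolved to a default active-participle slot. The lookup table repeats the entries of this module's own
-- `default_indicator_to_active_participle_slot`; the function name `example_resolve_ap_indicator` is hypothetical and
-- exists only for this illustration.
local function example_resolve_ap_indicator(value)
	local indicator_to_slot = {
		["+"] = "ap1", -- primary default
		["++"] = "ap2", -- stative I
		["+++"] = "ap3", -- stative II
		["+cd"] = "apcd", -- color/defect
		["+an"] = "apan", -- in -ān
	}
	-- Returns the name of the default slot to copy from, or nil when `value` is an explicit form rather than an
	-- indicator.
	return indicator_to_slot[value]
end
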
table.insert(export.unsettable_slots, "ap1") -- primary default فَاعِل for form I active participles
table.insert(export.unsettable_slots, "ap2") -- secondary default فَعِيل for form I active participles (stative I)
table.insert(export.unsettable_slots, "ap3") -- secondary default فَعِل for form I active participles (stative II)
table.insert(export.unsettable_slots, "apcd") -- secondary default أَفْعَل for form I active participles (color/defect)
table.insert(export.unsettable_slots, "apan") -- secondary default فَعْلَان for form I active participles (in -ān)
table.insert(export.unsettable_slots, "pp2") -- secondary default فَعِيل for form I passive participles (same as ap2)
table.insert(export.unsettable_slots, "vn2") -- secondary default فِعَال for form III verbal nouns
export.unsettable_slots_set = m_table.listToSet(export.unsettable_slots)

local default_indicator_to_active_participle_slot = {
	["+"] = "ap1",
	["++"] = "ap2",
	["+++"] = "ap3",
	["+cd"] = "apcd",
	["+an"] = "apan",
}

local slots_that_may_be_uncertain = {
	vn = "verbal noun",
	ap = "active participle",
}

-- Initialize all the slots for which we generate forms.
local function add_slots(alternant_multiword_spec)
	alternant_multiword_spec.verb_slots = {
		{"ap", "act|part"},
		{"pp", "pass|part"},
		{"vn", "vnoun"},
	}
	for _, unsettable_slot in ipairs(export.unsettable_slots) do
		table.insert(alternant_multiword_spec.verb_slots, {unsettable_slot, "-"})
	end

	-- Add entries for a slot with person/number variants.
	-- `slot_prefix` is the prefix of the slot, typically specifying the tense/aspect.
	-- `tag_suffix` is a string listing the set of inflection tags to add after the person/number tags.
	-- `person_number_list` is a list of the person/number slot suffixes to add to `slot_prefix`.
	local function add_personal_slot(slot_prefix, tag_suffix, person_number_list)
		for _, persnum in ipairs(person_number_list) do
			local slot = slot_prefix .. "_" .. persnum
			local accel = persnum:gsub("(.)", "%1|") .. tag_suffix
			table.insert(alternant_multiword_spec.verb_slots, {slot, accel})
		end
	end

	local tenses = {
		{"past", "past|%s"},
		{"ind", "non-past|%s|ind"},
		{"sub", "non-past|%s|sub"},
		{"juss", "non-past|%s|juss"},
	}
	for _, slot_accel in ipairs(tenses) do
		local slot, accel = unpack(slot_accel)
		for _, voice in ipairs {"act", "pass"} do
			add_personal_slot(voice == "act" and slot or slot .. "_pass", accel:format(voice), all_person_number_list)
		end
	end
	add_personal_slot("imp", "imp", imp_person_number_list)

	alternant_multiword_spec.verb_slots_map = {}
	for _, slot_accel in ipairs(alternant_multiword_spec.verb_slots) do
		local slot, accel = unpack(slot_accel)
		alternant_multiword_spec.verb_slots_map[slot] = accel
	end
end

local overridable_stems = {}

local slot_override_param_mods = {
	footnote = {
		item_dest = "footnotes",
		store = "insert",
	},
	alt = {},
	t = {
		-- [[Module:links]] expects the gloss in "gloss".
		item_dest = "gloss",
	},
	gloss = {},
	g = {
		-- [[Module:links]] expects the genders in "g". `sublist = true` automatically splits on comma (optionally
		-- with surrounding whitespace).
		item_dest = "genders",
		sublist = true,
	},
	pos = {},
	lit = {},
	id = {},
	-- Qualifiers and labels
	q = {
		type = "qualifier",
	},
	qq = {
		type = "qualifier",
	},
	l = {
		type = "labels",
	},
	ll = {
		type = "labels",
	},
}

local function generate_obj(formval, parse_err, prefix, is_slot_override)
	local val, uncertain = formval:match("^(.*)(%?)$")
	val = val or formval
	uncertain = not not uncertain
	local ar, translit = val:match("^(.*)//(.*)$")
	if not ar then
		ar = val
	end
	if ar == "" then
		if uncertain then
			ar = "?"
		else
			error(("Can't specify blank value for override for %s override '%s'"):format(
				is_slot_override and "slot" or "stem", prefix))
		end
	end
	return {form = ar, translit = translit, uncertain = uncertain}
end

local function parse_inline_modifiers(comma_separated_group, parse_err, prefix, is_slot_override)
	local function this_generate_obj(formval, parse_err)
		return generate_obj(formval, parse_err, prefix, is_slot_override)
	end
	return put.parse_inline_modifiers_from_segments {
		group = comma_separated_group,
		props = {
			param_mods = slot_override_param_mods,
			parse_err = parse_err,
			generate_obj = this_generate_obj,
			pre_normalize_modifiers = function(data)
				local modtext = data.modtext
				modtext = modtext:match("^(%[.*%])$")
				if modtext then
					return ("<footnote:%s>"):format(modtext)
				end
				return data.modtext
			end,
		},
	}
end

local function allow_multiple_values_for_override(comma_separated_groups, data, is_slot_override)
	local retvals = {}
	for _, comma_separated_group in ipairs(comma_separated_groups) do
		local retval
		if is_slot_override then
			retval = parse_inline_modifiers(comma_separated_group, data.parse_err)
		else
			retval = generate_obj(comma_separated_group[1], data.parse_err, data.prefix, is_slot_override)
			retval.footnotes = data.fetch_footnotes(comma_separated_group)
		end
		table.insert(retvals, retval)
	end
	for _, form in ipairs(retvals) do
		if form.form == "+" or default_indicator_to_active_participle_slot[form.form] then
			if form.form ~= "+" and default_indicator_to_active_participle_slot[form.form] and
				not is_slot_override then
				error(("Stem override '%s' cannot use %s to request a secondary default"):format(
					data.prefix, form.form))
			end
			data.base.slot_override_uses_default[data.prefix] = true
		end
	end
	for _, form in ipairs(retvals) do
		if form.form == "-" then
			data.base.slot_explicitly_missing[data.prefix] = true
			break
		end
	end
	if data.base.slot_explicitly_missing[data.prefix] then
		for _, form in ipairs(retvals) do
			if form.form ~= "-" then
				data.parse_err(("For slot or stem '%s', saw both - and a value other than -, which isn't allowed"):
					format(data.prefix))
			end
		end
		return nil
	end
	return retvals
end

local function simple_choice(choices)
	return function(separated_groups, data)
		if #separated_groups > 1 then
			data.parse_err("For spec '" .. data.prefix .. ":', only one value currently allowed")
		end
		if #separated_groups[1] > 1 then
			data.parse_err("For spec '" .. data.prefix .. ":', no footnotes currently allowed")
		end
		local choice = separated_groups[1][1]
		if not m_table.contains(choices, choice) then
			data.parse_err("For spec '" .. data.prefix .. ":', saw value '" .. choice .. "' but expected one of '" ..
				table.concat(choices, ",") .. "'")
		end
		return choice
	end
end

for _, overridable_stem in ipairs {
	"past",
	"past_v",
	"past_c",
	"past_pass",
	"past_pass_v",
	"past_pass_c",
	"nonpast",
	"nonpast_v",
	"nonpast_c",
	"nonpast_pass",
	"nonpast_pass_v",
	"nonpast_pass_c",
	"imp",
	"imp_v",
	"imp_c",
} do
	overridable_stems[overridable_stem] = allow_multiple_values_for_override
end

overridable_stems.past_final_weak_vowel = simple_choice { "ay", "aw", "ī", "ū" }
overridable_stems.past_pass_final_weak_vowel = simple_choice { "ay", "aw", "ī", "ū" }
overridable_stems.nonpast_final_weak_vowel = simple_choice { "ā", "ī", "ū" }
overridable_stems.nonpast_pass_final_weak_vowel = simple_choice { "ā", "ī", "ū" }


-------------------------------------------------------------------------------
--                              Utility functions                            --
-------------------------------------------------------------------------------

-- version of rsubn() that discards all but the first return value
local function rsub(term, foo, bar)
	return (rsubn(term, foo, bar))
end

-- version of rsubn() that returns a 2nd argument boolean indicating whether a substitution was made.
local function rsubb(term, foo, bar)
	local retval, nsubs = rsubn(term, foo, bar)
	return retval, nsubs > 0
end

-- Concatenate one or more strings or form objects.
local function q(...)
	local not_all_strings = debug_translit
	local has_manual_translit = debug_translit
	for i = 1, select("#", ...) do
		local argt = select(i, ...)
		if not argt then
			error(("Internal error: Saw nil at index %s: %s"):format(i, dump({...})))
		end
		if type(argt) ~= "string" then
			not_all_strings = true
			if argt.translit then
				has_manual_translit = true
				break
			end
		end
	end

	if not not_all_strings then
		-- just strings, concatenate directly
		return table.concat({...})
	end

	local formvals = {}
	local translit = has_manual_translit and {} or nil
	local footnotes

	for i = 1, select("#", ...) do
		local argt = select(i, ...)
		if type(argt) == "string" then
			formvals[i] = argt
			if has_manual_translit then
				translit[i] = transliterate(argt)
			end
		else
			formvals[i] = argt.form
			if has_manual_translit then
				translit[i] = argt.translit or transliterate(argt.form)
			end
			footnotes = iut.combine_footnotes(footnotes, argt.footnotes)
		end
	end

	-- FIXME: Do we want to support other properties?
	return {
		form = table.concat(formvals),
		translit = has_manual_translit and table.concat(translit) or nil,
		footnotes = footnotes,
	}
end

-- Return the formval associated with `rad` (a radical or past/non-past vowel, either a string or form object).
local function rget(rad)
	if type(rad) == "string" then
		return rad
	elseif type(rad) == "table" then
		return rad.form
	else
		error(("Internal error: Unexpected type for radical or past/non-past vowel: %s"):format(dump(rad)))
	end
end
export.rget = rget -- for use in [[Module:ar-headword]]

-- Return the footnotes associated with `rad` (a radical or past/non-past vowel, either a string or form object).
local function rget_footnotes(rad)
	if type(rad) == "string" then
		return nil
	elseif type(rad) == "table" then
		return rad.footnotes
	else
		error(("Internal error: Unexpected type for radical or past/non-past vowel: %s"):format(dump(rad)))
	end
end

-- Return true if the formval associated with `rad` (a radical or past/non-past vowel, either a string or form object)
-- is `val`.
local function req(rad, val)
	return rget(rad) == val
end

-- Map `vow` (a past/non-past vowel, either a string or form object without translit) by passing the formval through
-- `fn`. Don't call this on radicals because they may have manual translit and it isn't clear how to handle that.
local function map_vowel(vow, fn)
	if type(vow) == "string" then
		return fn(vow)
	elseif type(vow) == "table" then
		return {form = fn(vow.form), footnotes = vow.footnotes}
	else
		error(("Internal error: Unexpected type for past/non-past vowel: %s"):format(dump(vow)))
	end
end

local function get_radicals_3(vowel_spec)
	return vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3, vowel_spec.past, vowel_spec.nonpast
end

local function get_radicals_4(vowel_spec)
	return vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3, vowel_spec.rad4
end

local function is_final_weak(base, vowel_spec)
	return vowel_spec.weakness == "final-weak" or base.form == "XV"
end

local function link_term(text, face, id)
	return m_links.full_link({lang = lang, term = text, tr = "-", id = id}, face)
end

local function tag_text(text, tag, class)
	return m_links.full_link({lang = lang, alt = text, tr = "-"})
end

local function track(page)
	require("Module:debug/track")("ar-verb/" .. page)
	return true
end

local function track_if_ar_conj(base, page)
	if base.alternant_multiword_spec.source_template == "ar-conj" then
		require("Module:debug/track")("ar-verb/" .. page)
	end
	return true
end

local function reorder_shadda(word)
	-- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
	-- replaced with short-vowel+shadda during NFC normalisation, which
	-- MediaWiki does for all Unicode strings; however, it makes various
	-- processes inconvenient, so undo it.
	word = rsub(word, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1")
	return word
end

-------------------------------------------------------------------------------
--                        Basic functions to inflect tenses                  --
-------------------------------------------------------------------------------

local function skip_slot(base, slot, allow_overrides)
	if base.slot_explicitly_missing[slot] then
		return true
	end
	if not allow_overrides and base.slot_overrides[slot] and not base.slot_override_uses_default[slot] then
		-- Skip any slots for which there are overrides, except those that request the default value using +, ++, etc.
		return true
	end

	if base.passive == "nopass" and (slot == "pp" or slot:find("_pass")) then
		return true
	elseif base.passive == "onlypass" and slot ~= "pp" and slot ~= "vn" and not slot:find("_pass") then
		return true
	elseif base.passive == "ipass" and slot:find("_pass") and not slot:find("3ms") then
		return true
	elseif base.passive == "onlypass-impers" and slot ~= "pp" and slot ~= "vn" and
		(not slot:find("_pass") or slot:find("_pass") and not slot:find("3ms")) then
		return true
	end

	if base.nopast and slot:find("^past_") then
		return true
	end

	if base.noimp and slot:find("^imp_") then
		return true
	end

	if base.no_nonpast and (slot:find("^ind_") or slot:find("^sub_") or slot:find("^juss")) then
		return true
	end

	return false
end

local function basic_combine_stem_ending(stem, ending)
	return stem .. ending
end

local function basic_combine_stem_ending_tr(stem, ending)
	return stem .. ending
end

-- Concatenate `prefixes`, `stems` and `endings` (any of which may be an abbreviate form list, i.e. strings, form
-- objects or lists of strings or form objects) and store into `slot`. If a user-supplied override exists for the slot,
-- nothing will happen unless `allow_overrides` is provided.
local function add3(base, slot, prefixes, stems, endings, allow_overrides)
	if skip_slot(base, slot, allow_overrides) then
		return
	end

	-- Optimization since the prefixes are almost always single strings.
	if type(prefixes) == "string" then
		local function do_combine_stem_ending(stem, ending)
			return prefixes .. stem .. ending
		end
		local function do_combine_stem_ending_tr(stem, ending)
			return transliterate(prefixes) .. stem .. ending
		end
		iut.add_forms(base.forms, slot, stems, endings, do_combine_stem_ending, transliterate,
			do_combine_stem_ending_tr, base.form_footnotes)
	else
		iut.add_multiple_forms(base.forms, slot, {prefixes, stems, endings}, basic_combine_stem_ending, transliterate,
			basic_combine_stem_ending_tr, base.form_footnotes)
	end
end

-- Insert one or more forms in `form_or_forms` into `slot`. `form_or_forms` is an abbreviated form list (see comment at
-- top of [[Module:inflection utilities]]). If a user-supplied override exists for the slot, nothing will happen unless
-- `allow_overrides` is provided. BEWARE: One form object should never occur in two different slots, or twice in a given
-- slot; if taking a form object from an existing slot, make sure to shallowCopy() it.
local function insert_form_or_forms(base, slot, form_or_forms, allow_overrides, uncertain)
	if not skip_slot(base, slot, allow_overrides) then
		-- Some optimizations of the most common case of inserting a single string.
		if type(form_or_forms) == "string" and not base.form_footnotes then
			form_or_forms = {form = form_or_forms, uncertain = uncertain}
			iut.insert_form(base.forms, slot, form_or_forms)
		else
			local list = iut.convert_to_general_list_form(form_or_forms, base.form_footnotes)
			if uncertain then
				for _, formobj in ipairs(list) do
					formobj.uncertain = true
				end
			end
			iut.insert_forms(base.forms, slot, list)
		end
	end
end

-- Insert `string_or_form` into both the ap2 and pp2 slots, shallowCopying a form object to make sure no form objects
-- occur in two slots.
local function insert_ap2_pp2(base, string_or_form)
	insert_form_or_forms(base, "ap2", string_or_form)
	if type(string_or_form) == "table" then
		string_or_form = m_table.shallowCopy(string_or_form)
	end
	insert_form_or_forms(base, "pp2", string_or_form)
end

-- Convert `stemforms` (a string, a form object, or a list of strings and/or form objects) into "general form" (a list
-- of form objects) and map `fn` over the list of objects. `fn` is passed two arguments (form value and translit) and
-- should likewise return the new form value and translit. Footnotes will be preserved. FIXME: Preserve other metadata.
local function map_general(stemforms, fn)
	return iut.map_forms(iut.convert_to_general_list_form(stemforms), fn)
end

-- Similar to map_general() except that `fn` should return a single value (one or more strings or form objects), instead
-- of two values (form value and translit), and the resulting value(s) from all calls to `fn` will be flattened to
-- construct the overall return value. Footnotes will be preserved. FIXME: Preserve other metadata.
local function flatmap_general(stemforms, fn)
	return iut.flatmap_forms(iut.convert_to_general_list_form(stemforms), fn)
end

-- Given user-supplied stem overrides in `base`, construct any derived stem overrides (e.g. vowel-specific or
-- consonant-specific variants), and truncate initial y-/ي- in any non-past overrides.
local function construct_stems(base)
	local stems = base.stem_overrides
	stems.past_v = stems.past_v or stems.past
	stems.past_c = stems.past_c or stems.past
	stems.past_pass_v = stems.past_pass_v or stems.past_pass
	stems.past_pass_c = stems.past_pass_c or stems.past_pass
	stems.nonpast_v = stems.nonpast_v or stems.nonpast
	stems.nonpast_c = stems.nonpast_c or stems.nonpast
	stems.nonpast_pass_v = stems.nonpast_pass_v or stems.nonpast_pass
	stems.nonpast_pass_c = stems.nonpast_pass_c or stems.nonpast_pass
	stems.imp_v = stems.imp_v or stems.imp
	stems.imp_c = stems.imp_c or stems.imp

	local function truncate_nonpast_initial_cons(stem_type, form, translit)
		if form == "+" then
			return form, translit
		end
		if not form:find("^" .. Y) then
			error(("Form value %s for stem type '%s' should begin with ي"):format(form, stem_type))
		end
		form = form:gsub("^" .. Y, "")
		if translit then
			if not translit:find("^y") then
				error(("Translit value %s for stem type '%s' should begin with y"):format(translit, stem_type))
			end
			translit = translit:gsub("^y", "")
		end
		return form, translit
	end

	for _, nonpast_stem_type in ipairs { "nonpast_v", "nonpast_c", "nonpast_pass_v", "nonpast_pass_c" } do
		if stems[nonpast_stem_type] then
			stems[nonpast_stem_type] = map_general(stems[nonpast_stem_type], function(form, translit)
				return truncate_nonpast_initial_cons(nonpast_stem_type, form, translit)
			end)
		end
	end
end

-- Given user-specified overrides for stem `stemname`, return overrides with occurrences of + replaced by
-- `default_stem`. If no overrides, return `default_stem`, or {} if no default.
local function override_stem_if_needed(base, stemname, default_stem)
	local overrides = base.stem_overrides[stemname]
	if not overrides then
		return default_stem or {}
	end
	return map_general(overrides, function(form, translit)
		if form ~= "+" and default_indicator_to_active_participle_slot[form] then
			error(("Stem overrides cannot use secondary default indicators but saw %s in stem override '%s'"):format(
				form, stemname))
		end
		if form == "+" then
			if translit then
				error(("Cannot supply manual translit along with + for stem override '%s'"):format(stemname))
			end
			if not default_stem then
				error(("Cannot use + for stem override '%s' because no default is available"):format(stemname))
			end
			if type(default_stem) ~= "string" then
				error(("Internal error: Default stem for '%s' is not a string: %s"):format(stemname,
					dump(default_stem)))
			end
			return default_stem
		end
		return form, translit
	end)
end

-------------------------------------------------------------------------------
--                    Properties of different verbal forms                   --
-------------------------------------------------------------------------------

local allowed_vforms = {"I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "XIII", "XIV", "XV",
	"Iq", "IIq", "IIIq", "IVq"}
local allowed_vforms_set = m_table.listToSet(allowed_vforms)
local allowed_vforms_with_weakness = m_table.shallowCopy(allowed_vforms)

-- The user needs to be able to explicitly specify that a form-I verb (specifically one whose initial radical is و) is
-- sound. Cf. wajiʕa yawjaʕu (not #yajaʕu) "to ache, to hurt". In general, i~a and u~u verbs whose initial radical is و
-- seem to not assimilate the first radical; cf. وقح "to be shameless", variously waqaḥa~yaqiḥu, waquḥa~yawquḥu and
-- waqiḥa~yawqaḥu, whereas a~i verbs (wafaḍa~yafiḍu "to rush"), i~i verbs (wafiqa~yafiqu "to be proper, to be suitable")
-- and a~a verbs (waḍaʕa~yaḍaʕu "to set down, to place") do assimilate. But there are naturally exceptions, e.g.
-- waṭiʔa~yaṭaʔu "to tread, to trample"; wasiʕa~yasaʕu "to be spacious; to be well-off"; waṯiʔa~yaṯaʔu "to get bruised,
-- to be sprained". Also beware of waniya~yawnā "to be faint; to languish", which is sound in the first radical and
-- final-weak in the last radical. Nonetheless, the regularity of the patterns mentioned above suggests we should
-- provide them as defaults.

-- Note that there are other cases of unexpectedly sound verbs, e.g. izdawaja~yazdawiju "to be in pairs", layisa~yalyasu
-- "to be valiant, to be brave", ʔaḥwaja~yuḥwiju "to need", istahwana~yastahwinu "to consider easy", sawisa~yaswasu "to
-- be or become moth-eaten or worm-eaten" (vs. sāsa~yasūsu "to govern, to rule" from the same radicals), ʕawira~yaʕwaru
-- "to be one-eyed", istajwaba~yastajwibu "to interrogate", etc. But in these cases there is no need for explicit user
-- specification as the lemma itself specifies the unexpected soundness.
for _, form_with_weakness in ipairs { "I-sound", "I-assimilated", "none-sound", "none-hollow", "none-geminate",
	"none-final-weak" } do
	table.insert(allowed_vforms_with_weakness, form_with_weakness)
end
local allowed_vforms_with_weakness_set = m_table.listToSet(allowed_vforms_with_weakness)

local function vform_supports_final_weak(vform)
	return vform ~= "XI" and vform ~= "XV" and vform ~= "IVq"
end

local function vform_supports_geminate(vform)
	return vform == "I" or vform == "III" or vform == "IV" or vform == "VI" or vform == "VII" or vform == "VIII" or
		vform == "X"
end

local function vform_supports_hollow(vform)
	return vform == "I" or vform == "IV" or vform == "VII" or vform == "VIII" or vform == "X"
end

local function vform_probably_impersonal_passive(vform, weakness, past_vowel, nonpast_vowel)
	return vform == "I" and req(past_vowel, I) or vform == "V" or vform == "VI" or vform == "X" or vform == "IIq"
end

local function vform_probably_full_passive(vform)
	return vform == "II" or vform == "III" or vform == "IV" or vform == "Iq"
end

local function
vform_probably_no_passive(vform, weakness, past_vowel, nonpast_vowel)
	return vform == "I" and req(past_vowel, U) or vform == "VII" or vform == "IX" or vform == "XI" or vform == "XII" or
		vform == "XIII" or vform == "XIV" or vform == "XV" or vform == "IIIq" or vform == "IVq"
end

-- Active vforms II, III, IV, Iq use non-past prefixes in -u- instead of -a-.
local function prefix_vowel_from_vform(vform)
	if vform == "II" or vform == "III" or vform == "IV" or vform == "Iq" then
		return "u"
	else
		return "a"
	end
end

-- True if the active non-past takes a-vocalization rather than i-vocalization in its last syllable.
local function vform_nonpast_a_vowel(vform)
	return vform == "V" or vform == "VI" or vform == "XV" or vform == "IIq"
end

-- True if the `passive` spec indicates a passive-only verb.
local function is_passive_only(passive)
	return passive == "onlypass" or passive == "onlypass-impers"
end
export.is_passive_only = is_passive_only -- for use in [[Module:ar-headword]]

-------------------------------------------------------------------------------
--                        Properties of specific sounds                      --
-------------------------------------------------------------------------------

-- Is radical wāw (و) or yāʾ (ي)?
local function is_waw_ya(rad)
	return req(rad, W) or req(rad, Y)
end

-- Check that radical is wāw (و) or yāʾ (ي), error if not
local function check_waw_ya(rad)
	if not is_waw_ya(rad) then
		error("Expecting weak radical: '" .. rget(rad) .. "' should be " .. W .. " or " .. Y)
	end
end

-- Form-I verb حيّ or حيي and form-X verb استحيا or استحى
local function hayy_radicals(rad1, rad2, rad3)
	return req(rad1, "ح") and req(rad2, Y) and is_waw_ya(rad3)
end

-- The definitions below are wrapped in a function because keeping them at the top level triggers
-- "Lua error at line 1514: main function has more than 200 local variables".
local function create_conjugations()
	-------------------------------------------------------------------------------
	--                Radicals associated with various irregular verbs           --
	-------------------------------------------------------------------------------

	-- Form-I verb أخذ or form-VIII verb اتخذ
	local function axadh_radicals(rad1, rad2, rad3)
		return req(rad1, HAMZA) and req(rad2, "خ") and req(rad3, "ذ")
	end

	-- Form-I verb whose imperative has a reduced form: أكل and أخذ and أمر. Return "shortonly" if only
	-- short-form imperatives exist (أكل and أخذ) or "shortlong" if long-form imperatives also exist (أمر);
	-- they are used after a clitic like فَ and وَ.
	local function reduced_imperative_verb(rad1, rad2, rad3)
		return axadh_radicals(rad1, rad2, rad3) and "shortonly" or
			req(rad1, HAMZA) and req(rad2, "ك") and req(rad3, "ل") and "shortonly" or
			req(rad1, HAMZA) and req(rad2, "م") and req(rad3, "ر") and "shortlong"
	end

	-- Form-I verb رأى and form-IV verb أرى
	local function raa_radicals(rad1, rad2, rad3)
		return req(rad1, "ر") and req(rad2, HAMZA) and is_waw_ya(rad3)
	end

	-- Form-I verb سأل
	local function saal_radicals(rad1, rad2, rad3)
		return req(rad1, "س") and req(rad2, HAMZA) and req(rad3, "ل")
	end

	-- Form-I verb كان
	local function kaan_radicals(rad1, rad2, rad3)
		return req(rad1, "ك") and req(rad2, W) and req(rad3, N)
	end

	-------------------------------------------------------------------------------
	--                               Sets of past endings                        --
	-------------------------------------------------------------------------------

	-- The 13 endings of the sound/hollow/geminate past tense.
	local past_endings = {
		-- singular
		SK .. TU, SK .. TA, SK .. "تِ", A, A .. "تْ",
		-- dual
		SK .. "تُمَا", AA, A .. "تَا",
		-- plural
		SK .. "نَا", SK .. "تُمْ",
		-- shadda + vowel diacritic ends up in the wrong order due to Unicode
		-- bug, so keep them separate to avoid this
		SK .. "تُن" .. SH .. A, UU .. ALIF, SK .. "نَ"
	}

	-- Make endings for final-weak past in -aytu or -awtu. AYAW is AY or AW as appropriate.
	-- Note that AA and AW are global variables.
	local function make_past_endings_ay_aw(ayaw, third_sg_masc)
		return {
			-- singular
			ayaw .. SK .. TU, ayaw .. SK .. TA, ayaw .. SK .. "تِ",
			third_sg_masc, A .. "تْ",
			-- dual
			ayaw .. SK .. "تُمَا", ayaw .. AA, A .. "تَا",
			-- plural
			ayaw .. SK .. "نَا", ayaw .. SK .. "تُمْ",
			-- shadda + vowel diacritic ends up in the wrong order due to Unicode
			-- bug, so keep them separate to avoid this
			ayaw .. SK .. "تُن" .. SH .. A, AW .. SK .. ALIF, ayaw .. SK .. "نَ"
		}
	end

	-- past final-weak -aytu endings
	local past_endings_ay = make_past_endings_ay_aw(AY, AAMAQ)
	-- past final-weak -awtu endings
	local past_endings_aw = make_past_endings_ay_aw(AW, AA)
	-- used for alternative endings for form-X geminate verbs like اِسْتَمَرَّ
	local past_endings_ay_12_person_only = {
		-- singular
		AY .. SK .. TU, AY .. SK .. TA, AY .. SK .. "تِ",
		{}, {},
		-- dual
		AY .. SK .. "تُمَا", {}, {},
		-- plural
		AY .. SK .. "نَا", AY .. SK .. "تُمْ",
		-- shadda + vowel diacritic ends up in the wrong order due to Unicode
		-- bug, so keep them separate to avoid this
		AY .. SK .. "تُن" .. SH .. A, {}, {},
	}

	-- Make endings for final-weak past in -ītu or -ūtu. IIUU is ī or ū as appropriate. Note that AA and UU are global
	-- variables.
	local function make_past_endings_ii_uu(iiuu)
		return {
			-- singular
			iiuu .. TU, iiuu .. TA, iiuu .. "تِ", iiuu .. A, iiuu .. A .. "تْ",
			-- dual
			iiuu .. "تُمَا", iiuu .. AA, iiuu .. A .. "تَا",
			-- plural
			iiuu .. "نَا", iiuu .. "تُمْ",
			-- shadda + vowel diacritic ends up in the wrong order due to Unicode
			-- bug, so keep them separate to avoid this
			iiuu .. "تُن" .. SH .. A, UU .. ALIF, iiuu .. "نَ"
		}
	end

	-- past final-weak -ītu endings
	local past_endings_ii = make_past_endings_ii_uu(II)
	-- past final-weak -ūtu endings
	local past_endings_uu = make_past_endings_ii_uu(UU)

	-------------------------------------------------------------------------------
	--                    Sets of non-past prefixes and endings                  --
	-------------------------------------------------------------------------------

	local nonpast_prefix_consonants = {
		-- singular
		HAMZA, T, T, Y, T,
		-- dual
		T, Y, T,
		-- plural
		N, T, T, Y, Y
	}

	-- There are only five distinct endings in all non-past verbs. Make any set of non-past endings given these five
	-- distinct endings.
	local function make_nonpast_endings(null, fem, dual, pl, fempl)
		return {
			-- singular
			null, null, fem, null, null,
			-- dual
			dual, dual, dual,
			-- plural
			null, pl, fempl, pl, fempl
		}
	end

	-- endings for non-past indicative
	local ind_endings = make_nonpast_endings(
		U,
		II .. NA,
		AANI,
		UU .. NA,
		SK .. NA
	)

	-- Make the endings for non-past subjunctive/jussive, given the vowel diacritic used in "null" endings
	-- (1s/2ms/3ms/3fs/1p).
	local function make_sub_juss_endings(dia_null)
		return make_nonpast_endings(
			dia_null,
			II,
			AA,
			UU .. ALIF,
			SK .. NA
		)
	end

	-- endings for non-past subjunctive
	local sub_endings = make_sub_juss_endings(A)

	-- endings for non-past jussive
	local juss_endings = make_sub_juss_endings(SK)

	-- endings for alternative geminate non-past jussive in -a; same as subjunctive
	local juss_endings_alt_a = sub_endings

	-- endings for alternative geminate non-past jussive in -i
	local juss_endings_alt_i = make_sub_juss_endings(I)

	-- Endings for final-weak non-past indicative in -ā. Note that AY, AW and AAMAQ are global variables.
	local ind_endings_aa = make_nonpast_endings(
		AAMAQ,
		AYSK .. NA,
		AY .. AANI,
		AWSK .. NA,
		AYSK .. NA
	)

	-- Make endings for final-weak non-past indicative in -ī or -ū; IIUU is ī or ū as appropriate. Note that II and UU
	-- are global variables.
	local function make_ind_endings_ii_uu(iiuu)
		return make_nonpast_endings(
			iiuu,
			II ..
				NA,
			iiuu .. AANI,
			UU .. NA,
			iiuu .. NA
		)
	end

	-- endings for final-weak non-past indicative in -ī
	local ind_endings_ii = make_ind_endings_ii_uu(II)

	-- endings for final-weak non-past indicative in -ū
	local ind_endings_uu = make_ind_endings_ii_uu(UU)

	-- Endings for final-weak non-past subjunctive in -ā. Note that AY, AW, ALIF, AAMAQ are global variables.
	local sub_endings_aa = make_nonpast_endings(
		AAMAQ,
		AYSK,
		AY .. AA,
		AWSK .. ALIF,
		AYSK .. NA
	)

	-- Make endings for final-weak non-past subjunctive in -ī or -ū. IIUU is ī or ū as appropriate. Note that AA, II,
	-- UU, ALIF are global variables.
	local function make_sub_endings_ii_uu(iiuu)
		return make_nonpast_endings(
			iiuu .. A,
			II,
			iiuu .. AA,
			UU .. ALIF,
			iiuu .. NA
		)
	end

	-- endings for final-weak non-past subjunctive in -ī
	local sub_endings_ii = make_sub_endings_ii_uu(II)

	-- endings for final-weak non-past subjunctive in -ū
	local sub_endings_uu = make_sub_endings_ii_uu(UU)

	-- endings for final-weak non-past jussive in -ā
	local juss_endings_aa = make_nonpast_endings(
		A,
		AYSK,
		AY .. AA,
		AWSK .. ALIF,
		AYSK .. NA
	)

	-- Make endings for final-weak non-past jussive in -ī or -ū. IU is short i or u, IIUU is long ī or ū as appropriate.
	-- Note that AA, II, UU, ALIF are global variables.
	local function make_juss_endings_ii_uu(iu, iiuu)
		return make_nonpast_endings(
			iu,
			II,
			iiuu .. AA,
			UU .. ALIF,
			iiuu .. NA
		)
	end

	-- endings for final-weak non-past jussive in -ī
	local juss_endings_ii = make_juss_endings_ii_uu(I, II)

	-- endings for final-weak non-past jussive in -ū
	local juss_endings_uu = make_juss_endings_ii_uu(U, UU)

	-------------------------------------------------------------------------------
	--                           Sets of imperative endings                      --
	-------------------------------------------------------------------------------

	-- Extract the second person jussive endings to get corresponding imperative endings.
	local function imperative_endings_from_jussive(endings)
		return {endings[2], endings[3], endings[6], endings[10], endings[11]}
	end

	-- normal imperative endings
	local imp_endings = imperative_endings_from_jussive(juss_endings)

	-- alternative geminate imperative endings in -a
	local imp_endings_alt_a = imperative_endings_from_jussive(juss_endings_alt_a)

	-- alternative geminate imperative endings in -i
	local imp_endings_alt_i = imperative_endings_from_jussive(juss_endings_alt_i)

	-- final-weak imperative endings in -ā
	local imp_endings_aa = imperative_endings_from_jussive(juss_endings_aa)

	-- final-weak imperative endings in -ī
	local imp_endings_ii = imperative_endings_from_jussive(juss_endings_ii)

	-- final-weak imperative endings in -ū
	local imp_endings_uu = imperative_endings_from_jussive(juss_endings_uu)

	-------------------------------------------------------------------------------
	--                     Basic functions to inflect tenses                     --
	-------------------------------------------------------------------------------

	-- Add to `base` the inflections for the tense indicated by `tense` (the prefix in the slot names, e.g. 'past'
	-- or 'juss_pass'), formed by combining the `prefixes`, `stems` and `endings`. Each of `prefixes`, `stems` and
	-- `endings` is either a sequence of 5 (for the imperative) or 13 (for other tenses) abbreviated form lists (each of
	-- which is either a string, a form object, or a list of strings and/or form objects; see
	-- [[Module:inflection utilities]] for more info). Alternatively, any of `prefixes`, `stems` or `endings` can be a
	-- single-element list containing an abbreviated form list, with an additional key `all_same` set to true, or (as a
	-- special case) a single string; in the latter cases, the same value is used for all 5 or 13 slots. If existing
	-- inflections already exist, they will be added to, not overridden. `pnums` is the list of person/number slot name
	-- suffixes, which must match up with the elements in `prefixes`, `stems` and `endings` (i.e.
	-- 5 for imperative, 13 otherwise).
	local function inflect_tense_1(base, tense, prefixes, stems, endings, pnums)
		if not prefixes or not stems or not endings then
			return
		end
		local function verify_affixes(affixname, affixes)
			local function interr(msg)
				error(("Internal error: For tense '%s', '%s' %s: %s"):format(tense, affixname, msg, dump(affixes)))
			end
			if type(affixes) == "string" then
				-- do nothing
			elseif type(affixes) ~= "table" then
				interr("is not a table or string")
			elseif affixes.all_same then
				if #affixes ~= 1 then
					interr(("with all_same = true should have length 1 but has length %s"):format(#affixes))
				end
			else
				if #affixes ~= #pnums then
					interr(("should have length %s but has length %s"):format(#pnums, #affixes))
				end
			end
		end

		verify_affixes("prefixes", prefixes)
		verify_affixes("stems", stems)
		verify_affixes("endings", endings)

		local function get_affix(affixes, i)
			if type(affixes) == "string" then
				return affixes
			elseif affixes.all_same then
				return affixes[1]
			else
				return affixes[i]
			end
		end

		for i, pnum in ipairs(pnums) do
			local prefix = get_affix(prefixes, i)
			local stem = get_affix(stems, i)
			local ending = get_affix(endings, i)
			local slot = tense .. "_" .. pnum
			add3(base, slot, prefix, stem, ending)
		end
	end

	-- Add to `base` the inflections for the tense indicated by `tense` (the prefix in the slot names, e.g. 'past'
	-- or 'juss_pass'), formed by combining the `prefixes`, `stems` and `endings`. This is a simple wrapper around
	-- inflect_tense_1() that applies to all tenses other than the imperative; see inflect_tense_1() for more
	-- information about the parameters.
	local function inflect_tense(base, tense, prefixes, stems, endings)
		inflect_tense_1(base, tense, prefixes, stems, endings, all_person_number_list)
	end

	-- Like inflect_tense() but for the imperative, which has only five parts instead of 13 and no prefixes.
	local function inflect_tense_imp(base, stems, endings)
		inflect_tense_1(base, "imp", "", stems, endings, imp_person_number_list)
	end

	-------------------------------------------------------------------------------
	--                      Functions to inflect the past tense                  --
	-------------------------------------------------------------------------------

	-- Generate past verbs using specified vowel and consonant stems; works for sound, assimilated, hollow, and geminate
	-- verbs, active and passive.
	local function past_2stem_conj(base, tense, v_stem, c_stem, footnote_12)
		local passive = tense:find("_pass") and "_pass" or ""
		-- Override stems with user-specified stems if available.
		v_stem = override_stem_if_needed(base, "past" .. passive .. "_v", v_stem)
		local c_stem_12 = c_stem
		if footnote_12 then
			c_stem_12 = iut.combine_form_and_footnotes(c_stem_12, footnote_12)
		end
		c_stem_12 = override_stem_if_needed(base, "past" .. passive .. "_c", c_stem_12)
		local c_stem_3 = override_stem_if_needed(base, "past" .. passive .. "_c", c_stem)
		inflect_tense(base, tense, "", {
			-- singular
			c_stem_12, c_stem_12, c_stem_12, v_stem, v_stem,
			-- dual
			c_stem_12, v_stem, v_stem,
			-- plural
			c_stem_12, c_stem_12, c_stem_12, v_stem, c_stem_3
		}, past_endings)
	end

	-- Generate past verbs using single specified stem; works for sound and assimilated verbs, active and passive.
	local function past_1stem_conj(base, tense, stem)
		past_2stem_conj(base, tense, stem, stem)
	end

	-------------------------------------------------------------------------------
	--                      Functions to inflect non-past tenses                 --
	-------------------------------------------------------------------------------

	-- Generate non-past conjugation, with two stems, for vowel-initial and consonant-initial endings, respectively.
	-- Useful for active and passive; for all forms; for all weaknesses (sound, assimilated, hollow, final-weak and
	-- geminate) and for all types of non-past (indicative, subjunctive, jussive) except for the imperative.
	-- (There is a separate wrapper function below for geminate jussives because they have three alternants.) Both stems
	-- may be the same, e.g. for sound verbs.
	-- `prefix_vowel` will be either "a" or "u". `endings` should be an array of 13 items. If `endings` is nil or
	-- omitted, infer the endings from the tense. If `jussive` is true, or `endings` is nil and `tense` indicates the
	-- jussive, use the jussive pattern of vowel/consonant stems (different from the normal ones).
	local function nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, endings, jussive)
		local passive = tense:find("_pass") and "_pass" or ""
		-- Override stems with user-specified stems if available.
		v_stem = override_stem_if_needed(base, "nonpast" .. passive .. "_v",
			v_stem and q(dia[prefix_vowel], v_stem) or nil)
		c_stem = override_stem_if_needed(base, "nonpast" .. passive .. "_c",
			c_stem and q(dia[prefix_vowel], c_stem) or nil)
		if not endings then
			if tense:find("^ind") then
				endings = ind_endings
			elseif tense:find("^sub") then
				endings = sub_endings
			elseif tense:find("^juss") then
				jussive = true
				endings = juss_endings
			else
				error("Internal error: Unrecognized tense '" .. tense .."'")
			end
		end
		if not jussive then
			inflect_tense(base, tense, nonpast_prefix_consonants, {
				-- singular
				v_stem, v_stem, v_stem, v_stem, v_stem,
				-- dual
				v_stem, v_stem, v_stem,
				-- plural
				v_stem, v_stem, c_stem, v_stem, c_stem
			}, endings)
		else
			inflect_tense(base, tense, nonpast_prefix_consonants, {
				-- singular
				-- 'adlul, tadlul, tadullī, yadlul, tadlul
				c_stem, c_stem, v_stem, c_stem, c_stem,
				-- dual
				-- tadullā, yadullā, tadullā
				v_stem, v_stem, v_stem,
				-- plural
				-- nadlul, tadullū, tadlulna, yadullū, yadlulna
				c_stem, v_stem, c_stem, v_stem, c_stem
			}, endings)
		end
	end

	-- Generate non-past conjugation with one stem (no distinct stems for vowel-initial and consonant-initial endings).
	-- See nonpast_2stem_conj().
	local function nonpast_1stem_conj(base, tense, prefix_vowel, stem, endings, jussive)
		nonpast_2stem_conj(base, tense, prefix_vowel, stem, stem, endings, jussive)
	end

	-- Generate the active/passive jussive of geminate verbs. There are three alternants, two with terminations -a and
	-- -i and one with a null termination and a distinct pattern of vowel/consonant stem usage. See
	-- nonpast_2stem_conj() for a description of the arguments.
	local function jussive_gem_conj(base, tense, prefix_vowel, v_stem, c_stem)
		-- alternative in -a
		nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings_alt_a)
		-- alternative in -i
		nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings_alt_i)
		-- alternative in -null; requires different combination of v_stem and
		-- c_stem since the null endings require the c_stem (e.g. "tadlul" here)
		-- whereas the corresponding endings above in -a or -i require the v_stem
		-- (e.g. "tadulla, tadulli" above)
		nonpast_2stem_conj(base, tense, prefix_vowel, v_stem, c_stem, juss_endings, "jussive")
	end

	-------------------------------------------------------------------------------
	--                      Functions to inflect the imperative                  --
	-------------------------------------------------------------------------------

	-- Generate imperative conjugation, with two stems, for vowel-initial and consonant-initial endings, respectively.
	-- Useful for all forms, and for all weaknesses other than final-weak. Note that the two stems may be the same
	-- (specifically for sound and assimilated verbs). If `endings` is nil or omitted, use `imp_endings`. If `alt_gem`
	-- is specified, use the pattern of vowel and consonant stems appropriate for the alternative geminate imperatives
	-- that use a null ending of -a or -i instead of an empty ending.
	local function make_2stem_imperative(base, v_stem, c_stem, endings, alt_gem)
		endings = endings or imp_endings
		-- Override stems with user-specified stems if available.
		v_stem = override_stem_if_needed(base, "imp_v", v_stem)
		c_stem = override_stem_if_needed(base, "imp_c", c_stem)
		if alt_gem then
			inflect_tense_imp(base, {v_stem, v_stem, v_stem, v_stem, c_stem}, endings)
		else
			inflect_tense_imp(base, {c_stem, v_stem, v_stem, v_stem, c_stem}, endings)
		end
	end

	-- Generate imperative parts for sound or assimilated verbs.
	local function make_1stem_imperative(base, stem)
		make_2stem_imperative(base, stem, stem)
	end

	-- Generate imperative parts for geminate verbs form I (also IV, VII, VIII, X).
	local function make_gem_imperative(base, v_stem, c_stem)
		make_2stem_imperative(base, v_stem, c_stem, imp_endings_alt_a, "alt gem")
		make_2stem_imperative(base, v_stem, c_stem, imp_endings_alt_i, "alt gem")
		make_2stem_imperative(base, v_stem, c_stem)
	end

	-------------------------------------------------------------------------------
	--                        Functions to inflect entire verbs                  --
	-------------------------------------------------------------------------------

	-- Generate finite parts of a sound verb (also works for assimilated verbs) from five stems (past and non-past,
	-- active and passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u").
	local function make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem,
		prefix_vowel)
		past_1stem_conj(base, "past", past_stem)
		past_1stem_conj(base, "past_pass", past_pass_stem)
		nonpast_1stem_conj(base, "ind", prefix_vowel, nonpast_stem)
		nonpast_1stem_conj(base, "sub", prefix_vowel, nonpast_stem)
		nonpast_1stem_conj(base, "juss", prefix_vowel, nonpast_stem)
		nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem)
		nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem)
		nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem)
		make_1stem_imperative(base, imp_stem)
	end

	local function past_final_weak_endings_from_vowel(vowel)
		if vowel == "ay" then
			return past_endings_ay
		elseif vowel == "aw" then
			return past_endings_aw
		elseif vowel == "ī" then
			return past_endings_ii
		elseif vowel == "ū" then
			return past_endings_uu
		elseif not vowel then
			return nil
		else
			error(("Internal error: Unrecognized past final-weak vowel spec '%s'"):format(vowel))
		end
	end

	local function nonpast_final_weak_endings_from_vowel(vowel)
		if vowel == "ā" then
			return ind_endings_aa, sub_endings_aa, juss_endings_aa, imp_endings_aa
		elseif vowel == "ī" then
			return ind_endings_ii, sub_endings_ii, juss_endings_ii, imp_endings_ii
		elseif vowel == "ū" then
			return ind_endings_uu, sub_endings_uu, juss_endings_uu, imp_endings_uu
		elseif not vowel then
			return nil
		else
			error(("Internal error: Unrecognized non-past final-weak vowel spec '%s'"):format(vowel))
		end
	end

	-- Generate finite parts of a final-weak verb from five stems (past and non-past, active and passive, plus
	-- imperative), the past active ending vowel (ay, aw, ī or ū), the non-past active ending vowel (ā, ī or ū) and the
	-- prefix vowel in the active non-past (a or u).
	local function make_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem,
		past_ending_vowel, nonpast_ending_vowel, prefix_vowel)
		past_stem = override_stem_if_needed(base, "past", past_stem)
		past_pass_stem = override_stem_if_needed(base, "past_pass", past_pass_stem)
		-- Don't call override_stem_if_needed() here for non-past stems; it's called in nonpast_2stem_conj().
		imp_stem = override_stem_if_needed(base, "imp", imp_stem)
		-- + not supported for ending vowel overrides
		past_ending_vowel = base.stem_overrides.past_final_weak_vowel or past_ending_vowel
		local past_pass_ending_vowel = base.stem_overrides.past_pass_final_weak_vowel or "ī"
		nonpast_ending_vowel = base.stem_overrides.nonpast_final_weak_vowel or nonpast_ending_vowel
		local nonpast_pass_ending_vowel = base.stem_overrides.nonpast_pass_final_weak_vowel or "ā"
		local past_endings = past_final_weak_endings_from_vowel(past_ending_vowel)
		local past_pass_endings = past_final_weak_endings_from_vowel(past_pass_ending_vowel)
		local ind_endings, sub_endings, juss_endings, imp_endings =
			nonpast_final_weak_endings_from_vowel(nonpast_ending_vowel)
		local ind_pass_endings, sub_pass_endings, juss_pass_endings =
			nonpast_final_weak_endings_from_vowel(nonpast_pass_ending_vowel)
		inflect_tense(base, "past", "", {past_stem, all_same = 1}, past_endings)
		inflect_tense(base, "past_pass", "", {past_pass_stem, all_same = 1}, past_pass_endings)
		nonpast_1stem_conj(base, "ind", prefix_vowel, nonpast_stem, ind_endings)
		nonpast_1stem_conj(base, "sub", prefix_vowel, nonpast_stem, sub_endings)
		nonpast_1stem_conj(base, "juss", prefix_vowel, nonpast_stem, juss_endings)
		nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem, ind_pass_endings)
		nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem, sub_pass_endings)
		nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem, juss_pass_endings)
		inflect_tense_imp(base, {imp_stem, all_same = 1}, imp_endings)
	end

	-- Generate finite parts of an augmented
-- (form II+) final-weak verb from five stems (past and non-past, active and
-- passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u") and a flag indicating if it
-- behaves like a form V/VI verb in taking non-past endings in -ā instead of -ī.
local function make_augmented_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem,
	imp_stem, prefix_vowel, form56)
	make_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, "ay",
		form56 and "ā" or "ī", prefix_vowel)
end

-- Generate finite parts of an augmented (form II+) sound or final-weak verb, given:
-- * `base` (conjugation data structure);
-- * `vowel_spec` (radicals, weakness);
-- * `past_stem_base` (active past stem minus last syllable (= -al or -ā));
-- * `nonpast_stem_base` (non-past stem minus last syllable (= -al/-il or -ā/-ī));
-- * `past_pass_stem_base` (passive past stem minus last syllable (= -il or -ī));
-- * `vn` (verbal noun).
local function make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base,
	past_pass_stem_base, vn)
	insert_form_or_forms(base, "vn", vn)

	local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3
	local final_weak = is_final_weak(base, vowel_spec)
	local prefix_vowel = prefix_vowel_from_vform(base.verb_form)
	local form56 = vform_nonpast_a_vowel(base.verb_form)
	local a_base_suffix = final_weak and "" or q(A, lastrad)
	local i_base_suffix = final_weak and "" or q(I, lastrad)

	-- past and non-past stems, active and passive
	local past_stem = q(past_stem_base, a_base_suffix)
	-- In forms 5 and 6, the finite non-past has /a/ as its last stem vowel
	-- in both active and passive, but /i/ in the active participle and /a/
	-- in the passive participle. Elsewhere, consistent /i/ in active non-past
	-- and participle, consistent /a/ in passive non-past and participle.
-- Hence, forms 5 and 6 differ only in the non-past active (but not -- active participle), so we have to split the finite non-past stem and -- active participle stem. local nonpast_stem = q(nonpast_stem_base, form56 and a_base_suffix or i_base_suffix) local ap_stem = q(nonpast_stem_base, i_base_suffix) local past_pass_stem = q(past_pass_stem_base, i_base_suffix) local nonpast_pass_stem = q(nonpast_stem_base, a_base_suffix) -- imperative stem local imp_stem = q(past_stem_base, form56 and a_base_suffix or i_base_suffix) -- make parts if final_weak then make_augmented_final_weak_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, prefix_vowel, form56) else make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, prefix_vowel) end -- active and passive participle if final_weak then insert_form_or_forms(base, "ap", q(MU, ap_stem, IN)) insert_form_or_forms(base, "pp", q(MU, nonpast_pass_stem, AN, AMAQ)) else insert_form_or_forms(base, "ap", q(MU, ap_stem)) insert_form_or_forms(base, "pp", q(MU, nonpast_pass_stem)) end end -- Generate finite parts of a hollow or geminate verb from ten stems (vowel and consonant stems for each of past and -- non-past, active and passive, plus imperative) plus the prefix vowel in the active non-past ("a" or "u"), plus a -- flag indicating if we are a geminate verb. 
local function make_hollow_geminate_verb(base, geminate, past_v_stem, past_c_stem, past_pass_v_stem, past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem, nonpast_pass_c_stem, imp_v_stem, imp_c_stem, prefix_vowel, altgem_note) past_2stem_conj(base, "past", past_v_stem, past_c_stem, altgem_note) past_2stem_conj(base, "past_pass", past_pass_v_stem, past_pass_c_stem) nonpast_2stem_conj(base, "ind", prefix_vowel, nonpast_v_stem, nonpast_c_stem) nonpast_2stem_conj(base, "sub", prefix_vowel, nonpast_v_stem, nonpast_c_stem) nonpast_2stem_conj(base, "ind_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem) nonpast_2stem_conj(base, "sub_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem) if geminate then jussive_gem_conj(base, "juss", prefix_vowel, nonpast_v_stem, nonpast_c_stem) jussive_gem_conj(base, "juss_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem) make_gem_imperative(base, imp_v_stem, imp_c_stem) else nonpast_2stem_conj(base, "juss", prefix_vowel, nonpast_v_stem, nonpast_c_stem) nonpast_2stem_conj(base, "juss_pass", "u", nonpast_pass_v_stem, nonpast_pass_c_stem) make_2stem_imperative(base, imp_v_stem, imp_c_stem) end end -- Generate finite parts of an augmented (form II+) hollow verb, given: -- * `base` (conjugation data structure); -- * `vowel_spec` (radicals, weakness); -- * `past_stem_base` (invariable part of active past stem); -- * `nonpast_stem_base` (invariable part of nonpast stem); -- * `past_pass_stem_base` (invariable part of passive past stem); -- * `vn` (verbal noun). 
local function make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) insert_form_or_forms(base, "vn", vn) local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3 local form410 = base.verb_form == "IV" or base.verb_form == "X" local prefix_vowel = prefix_vowel_from_vform(base.verb_form) local a_base_suffix_v, a_base_suffix_c local i_base_suffix_v, i_base_suffix_c a_base_suffix_v = q(AA, lastrad) -- 'af-āl-a, inf-āl-a a_base_suffix_c = q(A, lastrad) -- 'af-al-tu, inf-al-tu i_base_suffix_v = q(II, lastrad) -- 'uf-īl-a, unf-īl-a i_base_suffix_c = q(I, lastrad) -- 'uf-il-tu, unf-il-tu -- past and non-past stems, active and passive, for vowel-initial and -- consonant-initial endings local past_v_stem = q(past_stem_base, a_base_suffix_v) local past_c_stem = q(past_stem_base, a_base_suffix_c) -- yu-f-īl-u, ya-staf-īl-u but yanf-āl-u, yaft-āl-u local nonpast_v_stem = q(nonpast_stem_base, form410 and i_base_suffix_v or a_base_suffix_v) local nonpast_c_stem = q(nonpast_stem_base, form410 and i_base_suffix_c or a_base_suffix_c) local past_pass_v_stem = q(past_pass_stem_base, i_base_suffix_v) local past_pass_c_stem = q(past_pass_stem_base, i_base_suffix_c) local nonpast_pass_v_stem = q(nonpast_stem_base, a_base_suffix_v) local nonpast_pass_c_stem = q(nonpast_stem_base, a_base_suffix_c) -- imperative stem local imp_v_stem = q(past_stem_base, form410 and i_base_suffix_v or a_base_suffix_v) local imp_c_stem = q(past_stem_base, form410 and i_base_suffix_c or a_base_suffix_c) -- make parts make_hollow_geminate_verb(base, false, past_v_stem, past_c_stem, past_pass_v_stem, past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem, nonpast_pass_c_stem, imp_v_stem, imp_c_stem, prefix_vowel) -- active participle insert_form_or_forms(base, "ap", q(MU, nonpast_v_stem)) -- passive participle insert_form_or_forms(base, "pp", q(MU, nonpast_pass_v_stem)) end -- Generate finite parts of an augmented (form II+) geminate 
verb, given: -- * `base` (conjugation data structure); -- * `vowel_spec` (radicals, weakness); -- * `past_stem_base` (invariable part of active past stem; this and the stem bases below will end with a consonant -- for forms IV, X, IVq, and a short vowel for the others); -- * `nonpast_stem_base` (invariable part of nonpast stem); -- * `past_pass_stem_base` (invariable part of passive past stem); -- * `vn` (verbal noun); -- * `altgem_note` (footnote to add to active past 1/2-person forms, when alternative forms are supplied [form X]). local function make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn, altgem_note) insert_form_or_forms(base, "vn", vn) local vform = base.verb_form local lastrad = base.quadlit and vowel_spec.rad4 or vowel_spec.rad3 local prefix_vowel = prefix_vowel_from_vform(vform) local a_base_suffix_v, a_base_suffix_c local i_base_suffix_v, i_base_suffix_c if vform == "IV" or vform == "X" or vform == "IVq" then a_base_suffix_v = q(A, lastrad, SH) -- 'af-all a_base_suffix_c = q(SK, lastrad, A, lastrad) -- 'af-lal i_base_suffix_v = q(I, lastrad, SH) -- yuf-ill i_base_suffix_c = q(SK, lastrad, I, lastrad) -- yuf-lil else a_base_suffix_v = q(lastrad, SH) -- fā-ll, infa-ll a_base_suffix_c = q(lastrad, A, lastrad) -- fā-lal, infa-lal i_base_suffix_v = q(lastrad, SH) -- yufā-ll, yanfa-ll i_base_suffix_c = q(lastrad, I, lastrad) -- yufā-lil, yanfa-lil end -- past and non-past stems, active and passive, for vowel-initial and -- consonant-initial endings local past_v_stem = q(past_stem_base, a_base_suffix_v) local past_c_stem = q(past_stem_base, a_base_suffix_c) local nonpast_v_stem = q(nonpast_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_v or i_base_suffix_v) local nonpast_c_stem = q(nonpast_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_c or i_base_suffix_c) -- NOTE: Formerly had a comment that "vform III and VI passive past do not have contracted parts, only -- uncontracted parts, 
	-- which are added separately by those functions". This is based on Mace
	-- "Arabic Verbs and Essentials of Grammar" (1999) entry 63 (continued), which shows passive ḥūjija but no ḥūjja;
	-- but that is apparently a mistake, as (1) verb tables in other books do show contracted passive parts for
	-- these forms; (2) there is no mention of such an exception on p. 99, which explains how geminate ("doubled")
	-- verbs work (on the contrary, it says "The contracted and uncontracted pairs (see above) are found all
	-- over Forms III and VI of the doubled verbs").
	local past_pass_v_stem = q(past_pass_stem_base, i_base_suffix_v)
	local past_pass_c_stem = q(past_pass_stem_base, i_base_suffix_c)
	local nonpast_pass_v_stem = q(nonpast_stem_base, a_base_suffix_v)
	local nonpast_pass_c_stem = q(nonpast_stem_base, a_base_suffix_c)

	-- imperative stem
	local imp_v_stem = q(past_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_v or i_base_suffix_v)
	local imp_c_stem = q(past_stem_base, vform_nonpast_a_vowel(vform) and a_base_suffix_c or i_base_suffix_c)

	-- make parts
	make_hollow_geminate_verb(base, "geminate", past_v_stem, past_c_stem, past_pass_v_stem,
		past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem,
		nonpast_pass_c_stem, imp_v_stem, imp_c_stem, prefix_vowel, altgem_note)

	-- active participle
	insert_form_or_forms(base, "ap", q(MU, nonpast_v_stem))
	-- passive participle
	insert_form_or_forms(base, "pp", q(MU, nonpast_pass_v_stem))
end

-------------------------------------------------------------------------------
--            Conjugation functions for specific conjugation types           --
-------------------------------------------------------------------------------

local function form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1)
	local imp_vowel = map_vowel(nonpast_vowel, function(vow)
		if vow == A or vow == I then
			return I
		elseif vow == U then
			return U
		elseif not skip_slot(base, "imp_2ms") then
			error(("Internal error: Non-past vowel %s isn't a, i, or u, should have been caught earlier"):format(
				dump(nonpast_vowel)))
		else
			-- Passive-only; imperative won't ever be displayed so it doesn't matter.
			return I
		end
	end)

	-- Mace ("Arabic Verbs and Essentials of Grammar" p. 63: [https://archive.org/details/arabicverbsessen00john/page/62/mode/2up])
	-- claims that initial hamza is assimilated/elided into a long vowel in the form-I imperative, but apparently
	-- this isn't correct.
	local vowel_on_alif = map_vowel(imp_vowel, function(vow) return ALIF .. vow end)
	return q(vowel_on_alif, rad1, SK)
end

-- Implement form-I sound or assimilated verb. ASSIMILATED is true for assimilated verbs.
local function make_form_i_sound_assimilated_verb(base, vowel_spec, assimilated)
	local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)

	-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied

	-- past and non-past stems, active and passive
	local past_stem = q(rad1, A, rad2, past_vowel, rad3)
	local nonpast_stem = assimilated and q(rad2, nonpast_vowel, rad3) or
		q(rad1, SK, rad2, nonpast_vowel, rad3)
	local past_pass_stem = q(rad1, U, rad2, I, rad3)
	local nonpast_pass_stem = q(rad1, SK, rad2, A, rad3)

	-- imperative stem
	-- check for irregular verb with reduced imperative (أَخَذَ or أَكَلَ or أَمَرَ)
	local reducedimp = reduced_imperative_verb(rad1, rad2, rad3)
	if reducedimp then
		base.irregular = true
	end
	local imp_stem_suffix = q(rad2, nonpast_vowel, rad3)
	local long_imp_stem_base = form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1)
	local short_imp_stem_base = ""
	local imp_stem = q((assimilated or reducedimp) and "" or long_imp_stem_base, imp_stem_suffix)

	-- make parts
	make_sound_verb(base, past_stem, past_pass_stem, nonpast_stem, nonpast_pass_stem, imp_stem, "a")

	if reducedimp == "shortlong" then
		make_1stem_imperative(base, iut.combine_form_and_footnotes(q(long_imp_stem_base, imp_stem_suffix),
			mw.getCurrentFrame():preprocess("[used especially with a clitic such as {{m|ar|فَ}} or {{m|ar|وَ}}]")))
	end

	-- Check for irregular
verb سَأَلَ with alternative jussive and imperative. Calling this after make_sound_verb() -- adds additional entries to the paradigm parts. if saal_radicals(rad1, rad2, rad3) then base.irregular = true nonpast_1stem_conj(base, "juss", "a", "سَل") nonpast_1stem_conj(base, "juss_pass", "u", "سَل") make_1stem_imperative(base, "سَل") end -- Active participle. insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, I, rad3)) -- Insert alternative active participle (stative type I) فَعِيل. Since not all verbs have this, we require that -- verbs that do have it specify it explicitly; a shortcut ++ is provided to make this easier (e.g. <ap:++> to -- indicate that the alternative form should be used for the active participle, <ap:+,++> to indicate that both -- forms can be used, and <ap:-> to indicate that there is no active participle). The same form is used for -- secondary default passive participle. insert_ap2_pp2(base, q(rad1, A, rad2, II, rad3)) -- Active participle, stative type II فَعِل (+++). insert_form_or_forms(base, "ap3", q(rad1, A, rad2, I, rad3)) -- Active participle, color/defect أَفْعَل (+cd). insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, A, rad3)) -- Active participle, -ān فَعْلَان (+an). insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN)) -- Passive participle. insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, UU, rad3)) end conjugations["I-sound"] = function(base, vowel_spec) make_form_i_sound_assimilated_verb(base, vowel_spec, false) end conjugations["none-sound"] = function(base, vowel_spec) -- All default stems are nil. make_sound_verb(base) end conjugations["none-hollow"] = function(base, vowel_spec) -- All default stems are nil. make_hollow_geminate_verb(base, false) end conjugations["none-geminate"] = function(base, vowel_spec) -- All default stems are nil. make_hollow_geminate_verb(base, "geminate") end conjugations["none-final-weak"] = function(base, vowel_spec) -- All default stems are nil. 
	make_final_weak_verb(base)
end

conjugations["I-assimilated"] = function(base, vowel_spec)
	make_form_i_sound_assimilated_verb(base, vowel_spec, "assimilated")
end

local function make_form_i_hayy_verb(base, vowel_spec)
	-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied
	base.irregular = true

	-- past and non-past stems, active and passive, and imperative stem
	local past_c_stem = "حَيِي"
	local past_v_stem_long = past_c_stem
	local past_v_stem_short = "حَيّ"
	local past_pass_c_stem = "حُيِي"
	local past_pass_v_stem_long = past_pass_c_stem
	local past_pass_v_stem_short = "حُيّ"

	local nonpast_stem = "حْي"
	local nonpast_pass_stem = nonpast_stem
	local imp_stem = _I .. nonpast_stem

	-- make parts

	past_2stem_conj(base, "past", {}, past_c_stem)
	past_2stem_conj(base, "past_pass", {}, past_pass_c_stem)
	local variant = vowel_spec.variant or "both"
	if variant == "short" or variant == "both" then
		past_2stem_conj(base, "past", past_v_stem_short, {})
		past_2stem_conj(base, "past_pass", past_pass_v_stem_short, {})
	end
	-- Declared local so the helper does not leak into the global environment.
	local function inflect_long_variant(tense, long_stem, short_stem)
		inflect_tense_1(base, tense, "",
			{long_stem, long_stem, long_stem, long_stem, short_stem},
			{past_endings[4], past_endings[5], past_endings[7], past_endings[8], past_endings[12]},
			{"3ms", "3fs", "3md", "3fd", "3mp"})
	end
	if variant == "long" or variant == "both" then
		inflect_long_variant("past", past_v_stem_long, past_v_stem_short)
		inflect_long_variant("past_pass", past_pass_v_stem_long, past_pass_v_stem_short)
	end

	nonpast_1stem_conj(base, "ind", "a", nonpast_stem, ind_endings_aa)
	nonpast_1stem_conj(base, "sub", "a", nonpast_stem, sub_endings_aa)
	nonpast_1stem_conj(base, "juss", "a", nonpast_stem, juss_endings_aa)
	nonpast_1stem_conj(base, "ind_pass", "u", nonpast_pass_stem, ind_endings_aa)
	nonpast_1stem_conj(base, "sub_pass", "u", nonpast_pass_stem, sub_endings_aa)
	nonpast_1stem_conj(base, "juss_pass", "u", nonpast_pass_stem, juss_endings_aa)
	inflect_tense_imp(base, {imp_stem, all_same = 1}, imp_endings_aa)

	-- active and passive participles apparently do not exist for this verb
end

-- Implement form-I final-weak or assimilated+final-weak verb. ASSIMILATED is true for assimilated verbs.
local function make_form_i_final_weak_verb(base, vowel_spec, assimilated)
	local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec)

	-- حَيَّ or حَيِيَ is weird enough that we handle it as a separate function.
	if hayy_radicals(rad1, rad2, rad3) then
		make_form_i_hayy_verb(base, vowel_spec)
		return
	end

	-- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied.

	-- Past and non-past stems, active and passive, and imperative stem.
	local past_stem = q(rad1, A, rad2)
	local past_pass_stem = q(rad1, U, rad2)
	local nonpast_stem, nonpast_pass_stem, imp_stem
	if raa_radicals(rad1, rad2, rad3) then
		base.irregular = true
		nonpast_stem = rad1
		nonpast_pass_stem = rad1
		imp_stem = rad1
	else
		nonpast_pass_stem = q(rad1, SK, rad2)
		if assimilated then
			nonpast_stem = rad2
			imp_stem = rad2
		else
			nonpast_stem = nonpast_pass_stem
			imp_stem = q(form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1), rad2)
		end
	end

	-- Make parts.
	local past_ending_vowel =
		req(rad3, Y) and req(past_vowel, A) and "ay" or
		req(rad3, W) and req(past_vowel, A) and "aw" or
		req(past_vowel, I) and "ī" or "ū"
	-- Try to preserve footnotes attached to the third radical and/or past and/or non-past vowels.
local past_footnotes = iut.combine_footnotes(rget_footnotes(rad3), rget_footnotes(past_vowel)) local nonpast_ending_vowel = req(nonpast_vowel, A) and "ā" or req(nonpast_vowel, I) and "ī" or "ū" local nonpast_footnotes = iut.combine_footnotes(rget_footnotes(rad3), rget_footnotes(nonpast_vowel)) make_final_weak_verb(base, iut.combine_form_and_footnotes(past_stem, past_footnotes), iut.combine_form_and_footnotes(past_pass_stem, past_footnotes), iut.combine_form_and_footnotes(nonpast_stem, nonpast_footnotes), iut.combine_form_and_footnotes(nonpast_pass_stem, nonpast_footnotes), iut.combine_form_and_footnotes(imp_stem, nonpast_footnotes), past_ending_vowel, nonpast_ending_vowel, "a") -- Active participle. insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, IN)) -- Active participle, stative type I فَعِيّ (++). FIXME: Is this correct when rad3 is W? insert_ap2_pp2(base, q(rad1, A, rad2, II, SH)) -- Active participle, stative type II فَعٍ (+++). FIXME: Any examples of this to verify it's correct? insert_form_or_forms(base, "ap3", q(rad1, A, rad2, IN)) -- Active participle, color/defect أَفْعَى (+cd). insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, AAMAQ)) -- Active participle, -ān فَعْيَان or فَعْوَان (+an). -- FIXME: Any examples of this for both rad3 = W and y to verify it's correct? insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN)) -- Passive participle. 
insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, req(rad3, Y) and II or UU, SH)) end conjugations["I-final-weak"] = function(base, vowel_spec) make_form_i_final_weak_verb(base, vowel_spec, false) end conjugations["I-assimilated+final-weak"] = function(base, vowel_spec) make_form_i_final_weak_verb(base, vowel_spec, "assimilated") end conjugations["I-hollow"] = function(base, vowel_spec) local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec) -- In some sense, hollow vowels i~i and u~u are more "correct" than a~i and a~u, but the latter follow the -- pattern of other form-I verbs, so we map i~i to a~i and u~u to a~u in infer_radicals(). Now however we have -- to undo this to get the actual past vowel based on the non-past vowel. if req(past_vowel, A) then past_vowel = map_vowel(past_vowel, function(vow) return req(nonpast_vowel, A) and I or rget(nonpast_vowel) end) end local lengthened_nonpast = map_vowel(nonpast_vowel, function(vow) return vow == U and UU or vow == I and II or AA end) -- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied. 
-- active past stems - vowel (v) and consonant (c) local past_v_stem = q(rad1, AA, rad3) local past_c_stem = q(rad1, past_vowel, rad3) -- active non-past stems - vowel (v) and consonant (c) local nonpast_v_stem = q(rad1, lengthened_nonpast, rad3) local nonpast_c_stem = q(rad1, nonpast_vowel, rad3) -- passive past stems - vowel (v) and consonant (c) -- 'ufīla, 'ufiltu local past_pass_v_stem = q(rad1, II, rad3) local past_pass_c_stem = q(rad1, I, rad3) -- passive non-past stems - vowel (v) and consonant (c) -- yufāla/yufalna -- stem is built differently but conjugation is identical to sound verbs local nonpast_pass_v_stem = q(rad1, AA, rad3) local nonpast_pass_c_stem = q(rad1, A, rad3) -- imperative stem local imp_v_stem = nonpast_v_stem local imp_c_stem = nonpast_c_stem -- make parts make_hollow_geminate_verb(base, false, past_v_stem, past_c_stem, past_pass_v_stem, past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem, nonpast_pass_c_stem, imp_v_stem, imp_c_stem, "a") if kaan_radicals(rad1, rad2, rad3) then local endings = make_nonpast_endings(U, {}, {}, {}, {}) inflect_tense(base, "juss", nonpast_prefix_consonants, q(A, rad1), endings) base.irregular = true end -- Active participle. insert_form_or_forms(base, "ap1", req(rad3, HAMZA) and q(rad1, AA, HAMZA, IN) or q(rad1, AA, HAMZA, I, rad3)) -- Active participle, stative type I فَيِّد (++). FIXME: Any examples of this to verify it's correct? insert_ap2_pp2(base, q(rad1, A, Y, SH, I, rad3)) -- Active participle, stative type II فَيِد (+++). FIXME: Any examples of this to verify it's correct? insert_form_or_forms(base, "ap3", q(rad1, A, Y, I, rad3)) -- Active participle, color/defect أَفّيَد or أَفّوَد (+cd). FIXME: Any examples of this to verify it's correct? insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, SK, rad2, A, rad3)) -- Active participle, -ān فَيْدَان or فَوْدَان (+an). 
Example: جَاعَ "to be hungry", act part جَوْعَان insert_form_or_forms(base, "apan", q(rad1, A, rad2, SK, rad3, AAN)) -- Passive participle. insert_form_or_forms(base, "pp", q(MA, rad1, req(rad2, Y) and II or UU, rad3)) end conjugations["I-geminate"] = function(base, vowel_spec) local rad1, rad2, rad3, past_vowel, nonpast_vowel = get_radicals_3(vowel_spec) -- Verbal nouns (maṣādir) for form I are unpredictable and have to be supplied. -- active past stems - vowel (v) and consonant (c) local past_v_stem = q(rad1, A, rad2, SH) local past_c_stem = q(rad1, A, rad2, past_vowel, rad2) -- active non-past stems - vowel (v) and consonant (c) local nonpast_v_stem = q(rad1, nonpast_vowel, rad2, SH) local nonpast_c_stem = q(rad1, SK, rad2, nonpast_vowel, rad2) -- passive past stems - vowel (v) and consonant (c) -- dulla/dulilta local past_pass_v_stem = q(rad1, U, rad2, SH) local past_pass_c_stem = q(rad1, U, rad2, I, rad2) -- passive non-past stems - vowel (v) and consonant (c) --yudallu/yudlalna -- stem is built differently but conjugation is identical to sound verbs local nonpast_pass_v_stem = q(rad1, A, rad2, SH) local nonpast_pass_c_stem = q(rad1, SK, rad2, A, rad2) -- imperative stem local imp_v_stem = q(rad1, nonpast_vowel, rad2, SH) local imp_c_stem = q(form_i_imp_stem_through_rad1(base, nonpast_vowel, rad1), rad2, nonpast_vowel, rad2) -- make parts make_hollow_geminate_verb(base, "geminate", past_v_stem, past_c_stem, past_pass_v_stem, past_pass_c_stem, nonpast_v_stem, nonpast_c_stem, nonpast_pass_v_stem, nonpast_pass_c_stem, imp_v_stem, imp_c_stem, "a") -- Active participle. insert_form_or_forms(base, "ap1", q(rad1, AA, rad2, SH)) -- Active participle, stative type I فَعِيع (++). FIXME: Any examples of this to verify it's correct? insert_ap2_pp2(base, q(rad1, A, rad2, II, rad2)) -- Active participle, stative type II فَعّ (+++). 
Example: بَرَّ "to be pious", active participle بَرّ insert_form_or_forms(base, "ap3", q(rad1, A, rad2, SH)) -- Active participle, color/defect أَفَعّ (+cd). -- Example: لَصَّ "to be thievish, to steal repeatedly", active participle أَلَصّ. insert_form_or_forms(base, "apcd", q(HAMZA, A, rad1, A, rad2, SH)) -- Active participle, -ān فَعَّان (+an). FIXME: Any examples of this to verify it's correct? insert_form_or_forms(base, "apan", q(rad1, A, rad2, SH, AAN)) -- Passive participle. insert_form_or_forms(base, "pp", q(MA, rad1, SK, rad2, UU, rad2)) end -- Return the ta- (active, past and non-past) and tu- (passive past) prefixes for a form II/III/V/VI verb. -- Form V and VI verbs normally use ta- and tu-, but reduced (base.reduced) verbs use different prefixes. Form II -- and III verbs have no prefix. local function form_ii_iii_v_vi_ta_tu_prefix(base, rad1) local vform = base.verb_form if vform == "V" or vform == "VI" then if base.reduced then -- To simplify the code, we generate two rad1's with a sukūn between them, which is cleaned up in -- postprocessing. return q(_I, rad1, SK), q(rad1, SK), q(_U, rad1, SK) else return TA, TA, TU end else return "", "", "" end end -- Make form II or V sound or final-weak verb. 
local function make_form_ii_v_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) local vform = base.verb_form local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1) local vn = vform == "V" and q(ta_past_prefix, rad1, A, rad2, SH, final_weak and IN or q(U, rad3)) or q(TA, rad1, SK, rad2, II, final_weak and AH or rad3) -- various stem bases local past_stem_base = q(ta_past_prefix, rad1, A, rad2, SH) local nonpast_stem_base = q(ta_nonpast_prefix, rad1, A, rad2, SH) local past_pass_stem_base = q(tu_past_prefix, rad1, U, rad2, SH) -- make parts make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["II-sound"] = function(base, vowel_spec) make_form_ii_v_sound_final_weak_verb(base, vowel_spec) end conjugations["II-final-weak"] = function(base, vowel_spec) make_form_ii_v_sound_final_weak_verb(base, vowel_spec) end local function make_form_iii_alt_vn(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) -- Insert alternative verbal noun فِعَال. Since not all verbs have this, we require that verbs that do have it -- specify it explicitly; a shortcut ++ is provided to make this easier (e.g. <vn:+,++> to indicate that -- both the normal verbal noun مُفَاعَلَة and secondary verbal noun فِعَال are available). insert_form_or_forms(base, "vn2", q(rad1, I, rad2, AA, final_weak and HAMZA or rad3)) end -- Make form III or VI sound or final-weak verb. 
local function make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) local vform = base.verb_form local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1) local vn = vform == "VI" and q(ta_past_prefix, rad1, AA, rad2, final_weak and IN or q(U, rad3)) or q(MU, rad1, AA, rad2, final_weak and AAH or q(A, rad3, AH)) -- various stem bases local past_stem_base = q(ta_past_prefix, rad1, AA, rad2) local nonpast_stem_base = q(ta_nonpast_prefix, rad1, AA, rad2) local past_pass_stem_base = q(tu_past_prefix, rad1, UU, rad2) -- make parts make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) if vform == "III" then make_form_iii_alt_vn(base, vowel_spec) end end conjugations["III-sound"] = function(base, vowel_spec) make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) end conjugations["III-final-weak"] = function(base, vowel_spec) make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) end -- Make form III or VI geminate verb. local function make_form_iii_vi_geminate_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vform = base.verb_form local ta_past_prefix, ta_nonpast_prefix, tu_past_prefix = form_ii_iii_v_vi_ta_tu_prefix(base, rad1) -- Alternative verbal noun فِعَال will be inserted when we add sound parts below. local vn = vform == "VI" and q(ta_past_prefix, rad1, AA, rad2, SH) or q(MU, rad1, AA, rad2, SH, AH) -- Various stem bases. local past_stem_base = q(ta_past_prefix, rad1, AA) local nonpast_stem_base = q(ta_nonpast_prefix, rad1, AA) local past_pass_stem_base = q(tu_past_prefix, rad1, UU) -- Make parts. 
local variant = vowel_spec.variant or "short" if variant == "short" or variant == "both" then make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end -- Also add alternative sound (non-compressed) parts. This will lead to some duplicate entries, but they are -- removed during addition. if variant == "long" or variant == "both" then make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) elseif vform == "III" then -- Still need to add the alternative form-III verbal noun. make_form_iii_alt_vn(base, vowel_spec) end end conjugations["III-geminate"] = function(base, vowel_spec) make_form_iii_vi_geminate_verb(base, vowel_spec) end -- Make form IV sound or final-weak verb. local function make_form_iv_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) -- core of stem base, minus stem prefixes local stem_core -- check for irregular verb أَرَى local is_raa = raa_radicals(rad1, rad2, rad3) if is_raa then base.irregular = true stem_core = rad1 else stem_core = q(rad1, SK, rad2) end -- verbal noun local vn = is_raa and q(HAMZA, I, stem_core, AA, HAMZA, AH) or q(HAMZA, I, stem_core, AA, final_weak and HAMZA or rad3) -- various stem bases local past_stem_base = q(HAMZA, A, stem_core) local nonpast_stem_base = stem_core local past_pass_stem_base = q(HAMZA, U, stem_core) -- make parts make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["IV-sound"] = function(base, vowel_spec) make_form_iv_sound_final_weak_verb(base, vowel_spec) end conjugations["IV-final-weak"] = function(base, vowel_spec) make_form_iv_sound_final_weak_verb(base, vowel_spec) end conjugations["IV-hollow"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) -- verbal noun local vn = q(HAMZA, I, rad1, AA, rad3, AH) -- various stem bases local past_stem_base = 
q(HAMZA, A, rad1) local nonpast_stem_base = rad1 local past_pass_stem_base = q(HAMZA, U, rad1) -- make parts make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["IV-geminate"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = q(HAMZA, I, rad1, SK, rad2, AA, rad2) -- various stem bases local past_stem_base = q(HAMZA, A, rad1) local nonpast_stem_base = rad1 local past_pass_stem_base = q(HAMZA, U, rad1) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["V-sound"] = function(base, vowel_spec) make_form_ii_v_sound_final_weak_verb(base, vowel_spec) end conjugations["V-final-weak"] = function(base, vowel_spec) make_form_ii_v_sound_final_weak_verb(base, vowel_spec) end conjugations["VI-sound"] = function(base, vowel_spec) make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) end conjugations["VI-final-weak"] = function(base, vowel_spec) make_form_iii_vi_sound_final_weak_verb(base, vowel_spec) end conjugations["VI-geminate"] = function(base, vowel_spec) make_form_iii_vi_geminate_verb(base, vowel_spec) end -- Make a verbal noun of the general form that applies to forms VII and above. RAD12 is the first consonant cluster -- (after initial اِ) and RAD34 is the second consonant cluster. RAD5 is the final consonant. local function high_form_verbal_noun(rad12, rad34, rad5) return q(_I, rad12, I, rad34, AA, rad5) end -- Populate a sound or final-weak verb for any of the various high-numbered augmented forms (form VII and up) that -- have up to 5 consonants in two clusters in the stem and the same pattern of vowels between. 
Some of these -- consonants in certain verb parts are w's, which leads to apparent anomalies in certain stems of these parts, but -- these anomalies are handled automatically in postprocessing, where we resolve sequences of iwC -> īC, uwC -> ūC, -- w + sukūn + w -> w + shadda. -- RAD12 is the first consonant cluster (after initial اِ) and RAD34 is the second consonant cluster. RAD5 is the -- final consonant. local function make_high_form_sound_final_weak_verb(base, vowel_spec, rad12, rad34, rad5) local final_weak = is_final_weak(base, vowel_spec) local vn = high_form_verbal_noun(rad12, rad34, final_weak and HAMZA or rad5) -- various stem bases local nonpast_stem_base = q(rad12, A, rad34) local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, rad12, U, rad34) -- make parts make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end local function form_vii_nrad1(base, rad1) if base.reduced then if not req(rad1, M) then error(("Internal error: Form VII first radical %s is not م but .reduced specified; should have been caught earlier"): format(rget(rad1))) end return M .. SH else return q("نْ", rad1) end end -- Make form VII sound or final-weak verb. 
local function make_form_vii_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) make_high_form_sound_final_weak_verb(base, vowel_spec, form_vii_nrad1(base, rad1), rad2, rad3) end conjugations["VII-sound"] = function(base, vowel_spec) make_form_vii_sound_final_weak_verb(base, vowel_spec) end conjugations["VII-final-weak"] = function(base, vowel_spec) make_form_vii_sound_final_weak_verb(base, vowel_spec) end conjugations["VII-hollow"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local nrad1 = form_vii_nrad1(base, rad1) local vn = high_form_verbal_noun(nrad1, Y, rad3) -- various stem bases local nonpast_stem_base = nrad1 local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, nrad1) -- make parts make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["VII-geminate"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local nrad1 = form_vii_nrad1(base, rad1) local vn = high_form_verbal_noun(nrad1, rad2, rad2) -- various stem bases local nonpast_stem_base = q(nrad1, A) local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, nrad1, U) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end -- Return Form VIII verbal noun. local function form_viii_verbal_noun(base, vowel_spec, rad1, rad2, rad3) local final_weak = is_final_weak(base, vowel_spec) rad3 = final_weak and HAMZA or rad3 return {high_form_verbal_noun(vowel_spec.form_viii_assim, rad2, rad3)} end -- Make form VIII sound or final-weak verb. 
local function make_form_viii_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) -- check for irregular verb اِتَّخَذَ if axadh_radicals(rad1, rad2, rad3) then base.irregular = true rad1 = T end make_high_form_sound_final_weak_verb(base, vowel_spec, vowel_spec.form_viii_assim, rad2, rad3) end conjugations["VIII-sound"] = function(base, vowel_spec) make_form_viii_sound_final_weak_verb(base, vowel_spec) end conjugations["VIII-final-weak"] = function(base, vowel_spec) make_form_viii_sound_final_weak_verb(base, vowel_spec) end conjugations["VIII-hollow"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = form_viii_verbal_noun(base, vowel_spec, rad1, Y, rad3) -- various stem bases local nonpast_stem_base = vowel_spec.form_viii_assim local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, nonpast_stem_base) -- make parts make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["VIII-geminate"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = form_viii_verbal_noun(base, vowel_spec, rad1, rad2, rad2) -- various stem bases local nonpast_stem_base = q(vowel_spec.form_viii_assim, A) local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, vowel_spec.form_viii_assim, U) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["IX-sound"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = q(_I, rad1, SK, rad2, I, rad3, AA, rad3) -- various stem bases local nonpast_stem_base = q(rad1, SK, rad2, A) local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, rad1, SK, rad2, U) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end 
conjugations["IX-final-weak"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) make_high_form_sound_final_weak_verb(base, vowel_spec, q(rad1, SK, rad2), rad3, rad3) end -- Populate a sound or final-weak verb for any of the various high-numbered -- augmented forms that have 5 consonants in the stem and the same pattern of -- vowels. Some of these consonants in certain verb parts are w's, which leads to -- apparent anomalies in certain stems of these parts, but these anomalies -- are handled automatically in postprocessing, where we resolve sequences of -- iwC -> īC, uwC -> ūC, w + sukūn + w -> w + shadda. local function make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, rad3, rad4, rad5) make_high_form_sound_final_weak_verb(base, vowel_spec, q(rad1, SK, rad2), q(rad3, SK, rad4), rad5) end -- Make form X sound or final-weak verb. local function make_form_x_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) -- check for irregular verb اِسْتَحْيَا (also اِسْتَحَى) local is_hayy = hayy_radicals(rad1, rad2, rad3) local variant = vowel_spec.variant or "both" if not is_hayy or variant == "long" or variant == "both" then make_high5_form_sound_final_weak_verb(base, vowel_spec, S, T, rad1, rad2, rad3) end if is_hayy and (variant == "short" or variant == "both") then base.irregular = true -- Add alternative entries to the verbal paradigms. Any duplicates are removed during addition. make_high_form_sound_final_weak_verb(base, vowel_spec, S .. SK .. 
T, rad1, rad3) end end conjugations["X-sound"] = function(base, vowel_spec) make_form_x_sound_final_weak_verb(base, vowel_spec) end conjugations["X-final-weak"] = function(base, vowel_spec) make_form_x_sound_final_weak_verb(base, vowel_spec) end conjugations["X-hollow"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = q(base.reduced and "اِسْ" or "اِسْتِ", rad1, AA, rad3, AH) -- various stem bases local past_stem_base = q(base.reduced and "اِسْ" or "اِسْتَ", rad1) local nonpast_stem_base = q(base.reduced and "سْ" or "سْتَ", rad1) local past_pass_stem_base = q(base.reduced and "اُسْ" or "اُسْتُ", rad1) -- make parts make_augmented_hollow_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["X-geminate"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = q("اِسْتِ", rad1, SK, rad2, AA, rad2) -- various stem bases local past_stem_base = q("اِسْتَ", rad1) local nonpast_stem_base = q("سْتَ", rad1) local past_pass_stem_base = q("اُسْتُ", rad1) -- make parts if base.altgem then inflect_tense(base, "past", "", {q(past_stem_base, A, rad2, SH), all_same = 1}, past_endings_ay_12_person_only) end make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn, base.altgem and "[uncommon]" or nil) end conjugations["XI-sound"] = function(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local vn = q(_I, rad1, SK, rad2, II, rad3, AA, rad3) -- various stem bases local nonpast_stem_base = q(rad1, SK, rad2, AA) local past_stem_base = q(_I, nonpast_stem_base) local past_pass_stem_base = q(_U, rad1, SK, rad2, UU) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end -- Probably no form XI final-weak, since already geminate in form; would behave as XI-sound. -- Make form XII sound or final-weak verb. 
local function make_form_xii_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, W, rad2, rad3) end conjugations["XII-sound"] = function(base, vowel_spec) make_form_xii_sound_final_weak_verb(base, vowel_spec) end conjugations["XII-final-weak"] = function(base, vowel_spec) make_form_xii_sound_final_weak_verb(base, vowel_spec) end -- Make form XIII sound or final-weak verb. local function make_form_xiii_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, W, W, rad3) end conjugations["XIII-sound"] = function(base, vowel_spec) make_form_xiii_sound_final_weak_verb(base, vowel_spec) end conjugations["XIII-final-weak"] = function(base, vowel_spec) make_form_xiii_sound_final_weak_verb(base, vowel_spec) end -- Make a form XIV or XV sound or final-weak verb. Last radical appears twice (if`anlala / yaf`anlilu) so if it were -- w or y you'd get if`anwā / yaf`anwī or if`anyā / yaf`anyī, i.e. unlike for most augmented verbs, the identity of -- the radical matters. local function make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3 = get_radicals_3(vowel_spec) local lastrad = base.verb_form == "XV" and Y or rad3 make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, N, rad3, lastrad) end conjugations["XIV-sound"] = function(base, vowel_spec) make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec) end conjugations["XIV-final-weak"] = function(base, vowel_spec) make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec) end conjugations["XV-sound"] = function(base, vowel_spec) make_form_xiv_xv_sound_final_weak_verb(base, vowel_spec) end -- Probably no form XV final-weak, since already final-weak in form; would behave as XV-sound. -- Make form Iq or IIq sound or final-weak verb. 
local function make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) local vform = base.verb_form local vn = vform == "IIq" and q(TA, rad1, A, rad2, SK, rad3, (final_weak and IN or q(U, rad4))) or q(rad1, A, rad2, SK, rad3, (final_weak and AAH or q(A, rad4, AH))) local ta_pref = vform == "IIq" and TA or "" local tu_pref = vform == "IIq" and TU or "" -- various stem bases local past_stem_base = q(ta_pref, rad1, A, rad2, SK, rad3) local nonpast_stem_base = past_stem_base local past_pass_stem_base = q(tu_pref, rad1, U, rad2, SK, rad3) -- make parts make_augmented_sound_final_weak_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end conjugations["Iq-sound"] = function(base, vowel_spec) make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec) end conjugations["Iq-final-weak"] = function(base, vowel_spec) make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec) end conjugations["IIq-sound"] = function(base, vowel_spec) make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec) end conjugations["IIq-final-weak"] = function(base, vowel_spec) make_form_iq_iiq_sound_final_weak_verb(base, vowel_spec) end -- Make form IIIq sound or final-weak verb. 
local function make_form_iiiq_sound_final_weak_verb(base, vowel_spec) local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec) make_high5_form_sound_final_weak_verb(base, vowel_spec, rad1, rad2, N, rad3, rad4) end conjugations["IIIq-sound"] = function(base, vowel_spec) make_form_iiiq_sound_final_weak_verb(base, vowel_spec) end conjugations["IIIq-final-weak"] = function(base, vowel_spec) make_form_iiiq_sound_final_weak_verb(base, vowel_spec) end conjugations["IVq-sound"] = function(base, vowel_spec) local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec) local vn = q(_I, rad1, SK, rad2, I, rad3, SK, rad4, AA, rad4) -- various stem bases local past_stem_base = q(_I, rad1, SK, rad2, A, rad3) local nonpast_stem_base = q(rad1, SK, rad2, A, rad3) local past_pass_stem_base = q(_U, rad1, SK, rad2, U, rad3) -- make parts make_augmented_geminate_verb(base, vowel_spec, past_stem_base, nonpast_stem_base, past_pass_stem_base, vn) end -- Probably no form IVq final-weak, since already geminate in form; would behave as IVq-sound. end create_conjugations() ------------------------------------------------------------------------------- -- Guts of main conjugation function -- ------------------------------------------------------------------------------- -- Given form, weakness and radicals, check to make sure the radicals present are allowable for the weakness. Hamzas on -- alif/wāw/yāʾ seats are never allowed (should always appear as hamza-on-the-line), and various weaknesses have various -- strictures on allowable consonants. local function check_radicals(form, weakness, rad1, rad2, rad3, rad4) local function hamza_check(index, rad) if rad == HAMZA_ON_ALIF or rad == HAMZA_UNDER_ALIF or rad == HAMZA_ON_W or rad == HAMZA_ON_Y then error("Radical " .. index .. " is " .. rad .. " but should be ء (hamza on the line)") end end local function check_waw_ya(index, rad) if not is_waw_ya(rad) then error("Radical " .. index .. " is " .. rad .. 
" but should be و or ي") end end local function check_not_waw_ya(index, rad) if is_waw_ya(rad) then error("In a sound verb, radical " .. index .. " should not be و or ي") end end -- Pass the radical index explicitly; hamza_check() takes (index, rad). hamza_check(1, rad1) hamza_check(2, rad2) hamza_check(3, rad3) hamza_check(4, rad4) if weakness == "assimilated" or weakness == "assimilated+final-weak" then if rad1 ~= W then error("Radical 1 is " .. rad1 .. " but should be و") end -- don't check that non-assimilated form I verbs don't have wāw as their -- first radical because some form-I verbs exist where a first-radical wāw -- behaves as sound, e.g. wajuha yawjuhu "to be distinguished". end if weakness == "final-weak" or weakness == "assimilated+final-weak" then if rad4 then check_waw_ya(4, rad4) else check_waw_ya(3, rad3) end elseif vform_supports_final_weak(form) then -- non-final-weak verbs cannot have weak final radical if there's a corresponding -- final-weak verb category. I think this is safe. We may have problems with -- ḥayya/ḥayiya yaḥyā if we treat it as a geminate verb. if rad4 then check_not_waw_ya(4, rad4) else check_not_waw_ya(3, rad3) end end if weakness == "hollow" then check_waw_ya(2, rad2) -- don't check that non-hollow verbs in forms that support hollow verbs -- don't have wāw or yāʾ as their second radical because some verbs exist -- where a middle-radical wāw/yāʾ behaves as sound, e.g. form-VIII izdawaja -- "to be in pairs". end if weakness == "geminate" then if rad4 then error("Internal error: No geminate quadriliterals, should not be seen") end if rad2 ~= rad3 then error("Weakness is geminate; radical 3 is " .. rad3 .. " but should be same as radical 2 " .. rad2) end elseif vform_supports_geminate(form) then -- non-geminate verbs cannot have second and third radical same if there's -- a corresponding geminate verb category. I think this is safe. We -- don't fuss over double wāw or double yāʾ because this could legitimately -- be a final-weak verb with middle wāw/yāʾ, treated as sound. 
if rad4 then error("Internal error: No quadriliterals should support geminate verbs") end if rad2 == rad3 and not is_waw_ya(rad2) then error("Weakness is '" .. weakness .. "'; radical 2 and 3 are same at " .. rad2 .. " but should not be; consider making weakness 'geminate'") end end end -- array of substitutions; each element is a 2-entry array FROM, TO; do it -- this way so the concatenations only get evaluated once local postprocess_subs = { -- reorder short-vowel + shadda -> shadda + short-vowel for easier processing {"(" .. AIU .. ")" .. SH, SH .. "%1"}, ----------same letter separated by sukūn should instead use shadda--------- ------------happens e.g. in kun-nā "we were".----------------- {"(.)" .. SK .. "%1", "%1" .. SH}, ---------------------------- assimilated verbs ---------------------------- -- iw, iy -> ī (assimilated verbs) {I .. W .. SK, II}, {I .. Y .. SK, II}, -- uw, uy -> ū (assimilated verbs) {U .. W .. SK, UU}, {U .. Y .. SK, UU}, -------------- final -yā uses tall alif not alif maqṣūra ------------------ {"(" .. Y .. SH .. "?" .. A .. ")" .. AMAQ, "%1" .. ALIF}, ----------------------- handle hamza assimilation ------------------------- -- initial hamza + short-vowel + hamza + sukūn -> hamza + long vowel {HAMZA .. A .. HAMZA .. SK, HAMZA .. A .. ALIF}, {HAMZA .. I .. HAMZA .. SK, HAMZA .. I .. Y}, {HAMZA .. U .. HAMZA .. SK, HAMZA .. U .. W} } local postprocess_tr_subs = { {"ī([" .. vowels .. "y*])", "iy%1"}, {"ū([" .. vowels .. "w*])", "uw%1"}, {"(.)%*", "%1%1"}, -- implement shadda ---------------------------- assimilated verbs ---------------------------- -- iw, iy -> ī (assimilated verbs) {"iw([^" .. vowels .. "w])", "ī%1"}, {"iy([^" .. vowels .. "y])", "ī%1"}, -- uw, uy -> ū (assimilated verbs) {"uw([^" .. vowels .. "w])", "ū%1"}, {"uy([^" .. vowels .. "y])", "ū%1"}, ----------------------- handle hamza assimilation ------------------------- -- initial hamza + short-vowel + hamza + sukūn -> hamza + long vowel {"ʔaʔ(" .. NV .. 
")", "ʔā%1"}, {"ʔiʔ(" .. NV .. ")", "ʔī%1"}, {"ʔuʔ(" .. NV .. ")", "ʔū%1"}, } -- Post-process verb parts to eliminate phonological anomalies. Many of the changes, particularly the tricky ones, -- involve converting hamza to have the proper seat. The rules for this are complicated and are documented on the -- [[w:Hamza]] Wikipedia page. In some cases there are alternatives allowed, and we handle them below by returning -- multiple possibilities. local function postprocess_term(term) if term == "?" then return "?" end -- Add BORDER at text boundaries. term = BORDER .. term .. BORDER -- Do the main post-processing, based on the pattern substitutions in postprocess_subs. for _, sub in ipairs(postprocess_subs) do term = rsub(term, sub[1], sub[2]) end term = term:gsub(BORDER, "") if not rfind(term, HAMZA) then return term end term = term:gsub(HAMZA, HAMZA_PH) term = ar_utilities.process_hamza(term) if #term == 1 then term = term[1] end return term end local function postprocess_translit(translit) if translit == "?" then return "?" end -- Add BORDER at text boundaries. translit = BORDER .. translit .. BORDER -- Do the main post-processing, based on the pattern substitutions in postprocess_tr_subs. 
for _, sub in ipairs(postprocess_tr_subs) do translit = rsub(translit, sub[1], sub[2]) end translit = translit:gsub(BORDER, "") return translit end local function postprocess_forms(base) local converted_values = {} for slot, forms in pairs(base.forms) do local need_dedup = false for i, form in ipairs(forms) do local term = postprocess_term(form.form) local translit = form.translit and postprocess_translit(form.translit) or nil if term ~= form.form or translit ~= form.translit then need_dedup = true end converted_values[i] = {term, translit} end if need_dedup then local temp_dedup = {} for i = 1, #forms do local new_term, new_translit = unpack(converted_values[i]) if type(new_term) == "table" then for _, nt in ipairs(new_term) do local new_formobj = { form = nt, translit = new_translit, footnotes = forms[i].footnotes, } iut.insert_form(temp_dedup, "temp", new_formobj) end else local new_formobj = { form = new_term, translit = new_translit, footnotes = forms[i].footnotes, } iut.insert_form(temp_dedup, "temp", new_formobj) end end base.forms[slot] = temp_dedup.temp end end end local function process_slot_overrides(base) for slot, forms in pairs(base.slot_overrides) do local existing_values = base.forms[slot] base.forms[slot] = nil for _, form in ipairs(forms) do -- + in active participle for form I requests slot ap1 if form.form == "+" and (base.verb_form ~= "I" or slot ~= "ap") then if not existing_values then error(("Slot '%s' requested the default value but no such value available"):format(slot)) end -- We maintain an invariant that no two slots share a form object (although they may share the footnote -- lists inside the form objects). However, there is no need to copy the form objects here because there -- is a one-to-one correspondence between slots and slot overrides, i.e. you can't have a default value -- go into two slots. 
insert_form_or_forms(base, slot, existing_values, "allow overrides", form.uncertain) elseif default_indicator_to_active_participle_slot[form.form] then if form.form == "++" then if slot ~= "vn" and slot ~= "ap" and slot ~= "pp" then error(("Secondary default value request '++' only applicable to verbal nouns and participles, but found in slot '%s'"): format(slot)) end else if slot ~= "ap" then error(("Secondary default value request '%s' only applicable to active participles, but found in slot '%s'"): format(form.form, slot)) end end local secondary_default_slot = slot == "vn" and "vn2" or slot == "pp" and "pp2" or default_indicator_to_active_participle_slot[form.form] local existing_values = base.forms[secondary_default_slot] if not existing_values then error(("Slot '%s' requested a secondary default value using '%s' but no such value available"): format(slot, form.form)) end -- See comment above about the lack of need to copy the form objects. insert_form_or_forms(base, slot, existing_values, "allow overrides", form.uncertain) -- To make sure there aren't shared form objects. base.forms[secondary_default_slot] = nil else insert_form_or_forms(base, slot, form, "allow overrides", form.uncertain) end end end -- Now, for non-stative form-I verbs, fill the active participle slot from ap1 unless it should be missing (e.g. -- passive-only or user specified 'ap:-'). if base.verb_form == "I" and not base.forms.ap and base.forms.ap1 and not skip_slot(base, "ap") then local saw_non_stative = false for _, vowel_spec in ipairs(base.conj_vowels) do if req(vowel_spec.past, A) then saw_non_stative = true break end end if saw_non_stative then base.forms.ap = base.forms.ap1 -- To make sure there aren't shared form objects. base.forms.ap1 = nil end end end local function handle_lemma_linked(base) -- Compute linked versions of potential lemma slots, for use in {{ar-verb}}. 
We substitute the original lemma -- (before removing links) for forms that are the same as the lemma, if the original lemma has links. for _, slot in ipairs(export.potential_lemma_slots) do if base.forms[slot] then insert_form_or_forms(base, slot .. "_linked", iut.map_forms(base.forms[slot], function(form) if form == base.lemma and rfind(base.linked_lemma, "%[%[") then return base.linked_lemma else return form end end)) end end end -- Process specs given by the user using 'addnote[SLOTSPEC][FOOTNOTE][FOOTNOTE][...]'. local function process_addnote_specs(base) for _, spec in ipairs(base.addnote_specs) do for _, slot_spec in ipairs(spec.slot_specs) do slot_spec = "^" .. slot_spec .. "$" for slot, forms in pairs(base.forms) do if rfind(slot, slot_spec) then -- To save on memory, side-effect the existing forms. for _, form in ipairs(forms) do form.footnotes = iut.combine_footnotes(form.footnotes, spec.footnotes) end end end end end end local function add_missing_links_to_forms(base) -- Any forms without links should get them now. Redundant ones will be stripped later. for slot, forms in pairs(base.forms) do for _, form in ipairs(forms) do if not form.form:find("%[%[") then form.form = "[[" .. form.form .. "]]" end end end end local function conjugate_verb(base) construct_stems(base) for _, vowel_spec in ipairs(base.conj_vowels) do -- Reconstruct conjugation type from verb form and (possibly inferred) weakness. local conj_type = base.verb_form .. "-" .. vowel_spec.weakness -- Check that the conjugation type is recognized. if not conjugations[conj_type] then error("Unknown conjugation type '" .. conj_type .. "'") end -- The way the conjugation functions work is they always add entries to the appropriate parts of the paradigm -- (each of which is an array), rather than setting the values. This makes it possible to call more than one -- conjugation function and essentially get a paradigm of the "either A or B" kind. 
Doing this may insert -- duplicate entries into a particular paradigm part, but this is not a problem because we check for duplicate -- entries when adding them, and don't insert in that case. conjugations[conj_type](base, vowel_spec) end postprocess_forms(base) process_slot_overrides(base) -- This should happen before add_missing_links_to_forms() so that the comparison `form == base.lemma` in -- handle_lemma_linked() works correctly and compares unlinked forms to unlinked forms. handle_lemma_linked(base) process_addnote_specs(base) if not base.alternant_multiword_spec.args.noautolinkverb then add_missing_links_to_forms(base) end end local function parse_indicator_spec(angle_bracket_spec) -- Store the original angle bracket spec so we can reconstruct the overall conj spec with the lemma(s) in them. local base = { angle_bracket_spec = angle_bracket_spec, conj_vowels = {}, root_consonants = {}, user_stem_overrides = {}, user_slot_overrides = {}, slot_explicitly_missing = {}, slot_uncertain = {}, slot_override_uses_default = {}, addnote_specs = {}, } local function parse_err(msg) error(msg .. ": " .. angle_bracket_spec) end local function fetch_footnotes(separated_group) local footnotes for j = 2, #separated_group - 1, 2 do if separated_group[j + 1] ~= "" then parse_err("Extraneous text after bracketed footnotes: '" .. table.concat(separated_group) .. "'") end if not footnotes then footnotes = {} end table.insert(footnotes, separated_group[j]) end return footnotes end local inside = angle_bracket_spec:match("^<(.*)>$") assert(inside) local segments = put.parse_multi_delimiter_balanced_segment_run(inside, {{"[", "]"}, {"<", ">"}}) local dot_separated_groups = put.split_alternating_runs_and_strip_spaces(segments, "%.") -- The first dot-separated element must specify the verb form, e.g. IV or IIq. If the form is I, it needs to include -- the past and non-past vowels, e.g. I/a~u for kataba ~ yaktubu. 
More than one vowel can be given, -- comma-separated, and more than one past~non-past pair can be given, slash-separated, e.g. I/a,u~u/i~a for form I -- كمل, which can be conjugated as kamala/kamula ~ yakmulu or kamila ~ yakmalu. An individual vowel spec must be one -- of a, i or u and in general (a) at least one past~non-past pair must be given, and (b) both past and non-past -- vowels must be given even though sometimes the vowel can be determined from the unvocalized form. An exception is -- passive-only verbs, where the vowels can't in general be determined (except indirectly in some cases by looking -- at an associated non-passive verb); in that case, the vowel~vowel spec can be left out. local slash_separated_groups = put.split_alternating_runs_and_strip_spaces(dot_separated_groups[1], "/") local form_spec = slash_separated_groups[1] base.form_footnotes = fetch_footnotes(form_spec) if form_spec[1] == "" then parse_err("Missing verb form") end if not allowed_vforms_with_weakness_set[form_spec[1]] then parse_err(("Unrecognized verb form '%s', should be one of %s"):format( form_spec[1], list_to_text(allowed_vforms, nil, " or "))) end if form_spec[1]:find("%-") then base.verb_form, base.explicit_weakness = form_spec[1]:match("^(.-)%-(.*)$") else base.verb_form = form_spec[1] end if #slash_separated_groups > 1 then if base.verb_form ~= "I" then parse_err(("Past~non-past vowels can only be specified when verb form is I, but saw form '%s'"):format( base.verb_form)) end for i = 2, #slash_separated_groups do local slash_separated_group = slash_separated_groups[i] local tilde_separated_groups = put.split_alternating_runs_and_strip_spaces(slash_separated_group, "~") if #tilde_separated_groups ~= 2 then parse_err(("Expected two tilde-separated vowel specs: %s"):format(table.concat(slash_separated_group))) end local function parse_conj_vowels(tilde_separated_group, vtype) local conj_vowel_objects = {} local comma_separated_groups = 
put.split_alternating_runs_and_strip_spaces(tilde_separated_group, ",") for _, comma_separated_group in ipairs(comma_separated_groups) do local conj_vowel = comma_separated_group[1] if conj_vowel ~= "a" and conj_vowel ~= "i" and conj_vowel ~= "u" then parse_err(("Expected %s conjugation vowel '%s' to be one of a, i or u in %s"):format( vtype, conj_vowel, table.concat(slash_separated_group))) end conj_vowel = dia[conj_vowel] local conj_vowel_footnotes = fetch_footnotes(comma_separated_group) -- Try to use strings when possible as it makes q() significantly more efficient. if conj_vowel_footnotes then table.insert(conj_vowel_objects, {form = conj_vowel, footnotes = conj_vowel_footnotes}) else table.insert(conj_vowel_objects, conj_vowel) end end return conj_vowel_objects end local conj_vowel_spec = { past = parse_conj_vowels(tilde_separated_groups[1], "past"), nonpast = parse_conj_vowels(tilde_separated_groups[2], "non-past"), } table.insert(base.conj_vowels, conj_vowel_spec) end end for i = 2, #dot_separated_groups do local dot_separated_group = dot_separated_groups[i] local first_element = dot_separated_group[1] if first_element == "addnote" then local spec_and_footnotes = fetch_footnotes(dot_separated_group) if #spec_and_footnotes < 2 then parse_err("Spec with 'addnote' should be of the form 'addnote[SLOTSPEC][FOOTNOTE][FOOTNOTE][...]'") end local slot_spec = table.remove(spec_and_footnotes, 1) local slot_spec_inside = rmatch(slot_spec, "^%[(.*)%]$") if not slot_spec_inside then parse_err("Internal error: slot_spec " .. slot_spec .. " should be surrounded with brackets") end local slot_specs = rsplit(slot_spec_inside, ",") -- FIXME: Here, [[Module:it-verb]] called strip_spaces(). Generally we don't do this. Should we? 
table.insert(base.addnote_specs, {slot_specs = slot_specs, footnotes = spec_and_footnotes}) elseif first_element:find("^var:") then if #dot_separated_group > 1 then parse_err(("Can't attach footnotes to 'var:' spec '%s'"):format(first_element)) end base.variant = first_element:match("^var:(.*)$") elseif first_element:find("^I+V?:") then local root_cons, root_cons_value = first_element:match("^(I+V?):(.*)$") local root_index if root_cons == "I" then root_index = 1 elseif root_cons == "II" then root_index = 2 elseif root_cons == "III" then root_index = 3 elseif root_cons == "IV" then root_index = 4 if not base.verb_form:find("q$") then parse_err(("Can't specify root consonant IV for non-quadriliteral verb form '%s': %s"):format( base.verb_form, first_element)) end end local cons, translit = root_cons_value:match("^(.*)//(.*)$") if not cons then cons = root_cons_value end local root_footnotes = fetch_footnotes(dot_separated_group) if not translit and not root_footnotes then base.root_consonants[root_index] = cons else base.root_consonants[root_index] = {form = cons, translit = translit, footnotes = root_footnotes} end elseif first_element:find("^[a-z][a-z0-9_]*:") then local slot_or_stem, remainder = first_element:match("^(.-):(.*)$") dot_separated_group[1] = remainder local comma_separated_groups = put.split_alternating_runs_and_strip_spaces(dot_separated_group, "[,،]") if overridable_stems[slot_or_stem] then if base.user_stem_overrides[slot_or_stem] then parse_err("Overridable stem '" .. slot_or_stem .. "' specified twice") end base.user_stem_overrides[slot_or_stem] = overridable_stems[slot_or_stem](comma_separated_groups, {prefix = slot_or_stem, base = base, parse_err = parse_err, fetch_footnotes = fetch_footnotes}) else -- assume a form override; we validate further later when the possible slots are available if base.user_slot_overrides[slot_or_stem] then parse_err("Form override '" .. slot_or_stem .. 
"' specified twice") end base.user_slot_overrides[slot_or_stem] = allow_multiple_values_for_override(comma_separated_groups, {prefix = slot_or_stem, base = base, parse_err = parse_err, fetch_footnotes = fetch_footnotes}, "is form override") end elseif indicator_flags[first_element] then if #dot_separated_group > 1 then parse_err("No footnotes allowed with '" .. first_element .. "' spec") end if base[first_element] then parse_err("Spec '" .. first_element .. "' specified twice") end base[first_element] = true else local passive, uncertain = first_element:match("^(.*)(%?)$") passive = passive or first_element uncertain = not not uncertain if passive_types[passive] then if #dot_separated_group > 1 then parse_err("No footnotes allowed with '" .. passive .. "' spec") end if base.passive then parse_err("Value for passive type specified twice") end base.passive = passive base.passive_uncertain = uncertain else parse_err("Unrecognized spec '" .. first_element .. "'") end end end return base end -- Normalize all lemmas, substituting the pagename for blank lemmas and adding links to multiword lemmas. local function normalize_all_lemmas(alternant_multiword_spec, head) -- (1) Add links to all before and after text. Remember the original text so we can reconstruct the verb spec later. if not alternant_multiword_spec.args.noautolinktext then iut.add_links_to_before_and_after_text(alternant_multiword_spec, "remember original") end -- (2) Remove any links from the lemma, but remember the original form so we can use it below in the 'lemma_linked' -- form. 
iut.map_word_specs(alternant_multiword_spec, function(base) if base.lemma == "" then base.lemma = head end base.user_specified_lemma = base.lemma base.lemma = m_links.remove_links(base.lemma) base.user_specified_verb = base.lemma base.verb = base.user_specified_verb local linked_lemma if alternant_multiword_spec.args.noautolinkverb or base.user_specified_lemma:find("%[%[") then linked_lemma = base.user_specified_lemma else -- Add links to the lemma so the user doesn't specifically need to, since we preserve -- links in multiword lemmas and include links in non-lemma forms rather than allowing -- the entire form to be a link. linked_lemma = iut.add_links(base.user_specified_lemma) end base.linked_lemma = linked_lemma end) end -- Determine weakness from radicals. Used when root given in place of lemma (e.g. for {{ar-verb forms}}). local function weakness_from_radicals(form, rad1, rad2, rad3, rad4) local weakness = nil local quadlit = form:find("q$") -- If weakness unspecified, derive from radicals. if not quadlit then if is_waw_ya(rad3) and rad1 == W and form == "I" then weakness = "assimilated+final-weak" elseif is_waw_ya(rad3) and vform_supports_final_weak(form) then weakness = "final-weak" elseif rad2 == rad3 and vform_supports_geminate(form) then weakness = "geminate" elseif is_waw_ya(rad2) and vform_supports_hollow(form) then weakness = "hollow" elseif rad1 == W and form == "I" then weakness = "assimilated" else weakness = "sound" end else if is_waw_ya(rad4) then weakness = "final-weak" else weakness = "sound" end end return weakness end -- Join the infixed tāʔ (ت) to the first radical in form VIII verbs. This may cause assimilation of the tāʔ to the -- radical or in some cases the radical to the tāʔ. Used when a root is supplied instead of a lemma (which already has -- the appropriate assimilation in it). 
local function form_viii_join_ta(rad)
	if rad == W or rad == Y or rad == "ت" then
		return "تّ"
	elseif rad == "د" then
		return "دّ"
	elseif rad == "ث" then
		return "ثّ"
	elseif rad == "ذ" then
		return "ذّ"
	elseif rad == "ز" then
		return "زْد"
	elseif rad == "ص" then
		return "صْط"
	elseif rad == "ض" then
		return "ضْط"
	elseif rad == "ط" then
		return "طّ"
	elseif rad == "ظ" then
		return "ظّ"
	else
		return rad .. SK .. "ت"
	end
end

local function detect_indicator_spec(base)
	base.forms = {}
	base.stem_overrides = {}
	base.slot_overrides = {}

	if not base.conj_vowels[1] then
		-- These may be converted to inferred vowels. If not, we throw an error if form I and not passive-only.
		base.conj_vowels = {{
			past = "-",
			nonpast = "-",
		}}
	else
		-- If multiple vowels specified for a given vowel type (e.g. a,u~u), expand so that each spec in
		-- `base.conj_vowels` has exactly one past vowel and one non-past vowel.
		local expansion = {}
		for _, spec in ipairs(base.conj_vowels) do
			for _, past in ipairs(spec.past) do
				for _, nonpast in ipairs(spec.nonpast) do
					table.insert(expansion, {past = past, nonpast = nonpast})
				end
			end
		end
		base.conj_vowels = expansion
	end

	local vform = base.verb_form

	-- check for quadriliteral form (Iq, IIq, IIIq, IVq)
	base.quadlit = not not vform:find("q$")

	-- Infer radicals as necessary. We infer a separate set of radicals for each past~non-past vowel combination because
	-- they may be different (particularly with form-I hollow verbs).
	for _, vowel_spec in ipairs(base.conj_vowels) do
		-- NOTE: rad1, rad2, etc. refer to user-specified radicals, which are formobj tables that optionally specify an
		-- explicit manual translit, whereas ir1, ir2, etc. refer to inferred radicals, which are either strings or
		-- lists of possible radicals.
		local rads = base.root_consonants
		local rad1, rad2, rad3, rad4 = rads[1], rads[2], rads[3], rads[4]

		-- Default any unspecified radicals to radicals determined from the headword.
The returned radicals may be -- lists of possible radicals, where the first radical should be chosen if the user didn't explicitly specify a -- radical but all are allowed. If `ambig = true` is set in the table, the radical is considered ambiguous and -- categories won't be created for weak radicals. local weakness, ir1, ir2, ir3, ir4 if vform ~= "none" then ir1, ir2, ir3 = rmatch(base.lemma, "^([^_])_([^_])_([^_])$") if not ir1 then ir1, ir2, ir3, ir4 = rmatch(base.lemma, "^([^_])_([^_])_([^_])_([^_])$") end if ir1 then -- root given instead of lemma weakness = weakness_from_radicals(vform, ir1, ir2, ir3, ir4) if vform == "VIII" then vowel_spec.form_viii_assim = form_viii_join_ta(ir1) end else local ret = export.infer_radicals { headword = base.lemma, vform = vform, passive = base.passive, past_vowel = vowel_spec.past, nonpast_vowel = vowel_spec.nonpast, is_reduced = base.reduced, } weakness, ir1, ir2, ir3, ir4 = ret.weakness, ret.rad1, ret.rad2, ret.rad3, ret.rad4 vowel_spec.form_viii_assim = ret.form_viii_assim vowel_spec.past = ret.past_vowel vowel_spec.nonpast = ret.nonpast_vowel vowel_spec.variant = base.variant or ret.variant end end -- For most ambiguous radicals, the choice of radical doesn't matter because it doesn't affect the conjugation -- one way or another. For form I hollow verbs, however, it definitely does. In fact, the choice of radical is -- critical even beyond the past and non-past vowels because it affects the form of the passive participle. So, -- check for this and signal an error if the radical could not be inferred and is not given explicitly. 
if vform == "I" and type(ir2) == "table" and ir2.need_radical and not rad2 then error("Unable to guess middle radical of hollow form I verb; need to specify radical explicitly") end if vform == "I" and not is_passive_only(base.passive) and ( rget(vowel_spec.past) == "-" or rget(vowel_spec.nonpast) == "-") then error("Form I verb that isn't passive-only or final-weak must have past~non-past vowels specified") end -- Convert ambiguous radicals. local function regularize_inferred_radical(rad) if type(rad) == "table" then if rad.ambig then return {form = rad[1], ambig = true} else return rad[1] end else return rad end end -- Return the appropriate radical at index `index` (1 through 4), based either on the user-specified radical -- `user_radical` or (if unspecified) `inferred_radical`, inferred from the unvocalized lemma. Two values are -- returned, the "regularized" version of the radical (where ambiguous inferred radicals are converted to their -- most likely actual radical) and the non-regularized version. The returned values are form objects rather than -- strings. 
local function fetch_radical(user_radical, inferred_radical, index) if not user_radical then return regularize_inferred_radical(inferred_radical), inferred_radical else local rad_formval = rget(user_radical) if type(inferred_radical) == "table" then local allowed_radical_set = m_table.listToSet(inferred_radical) if not allowed_radical_set[rad_formval] then error(("For lemma %s, radical %s ambiguously inferred as %s but user radical incompatibly given as %s"): format(base.lemma, index, list_to_text(inferred_radical, nil, " or "), rad_formval)) end elseif rad_formval ~= inferred_radical then error(("For lemma %s, radical %s inferred as %s but user radical incompatibly given as %s"): format(base.lemma, index, inferred_radical, rad_formval)) end return user_radical, user_radical end end if vform ~= "none" then vowel_spec.rad1, vowel_spec.unreg_rad1 = fetch_radical(rad1, ir1, 1) vowel_spec.rad2, vowel_spec.unreg_rad2 = fetch_radical(rad2, ir2, 2) vowel_spec.rad3, vowel_spec.unreg_rad3 = fetch_radical(rad3, ir3, 3) if base.quadlit then vowel_spec.rad4, vowel_spec.unreg_rad4 = fetch_radical(rad4, ir4, 4) end end if vform == "I" then -- If explicit weakness given using 'I-sound' or 'I-assimilated', we may need to adjust the inferred weakness. if base.explicit_weakness == "sound" then if weakness == "assimilated" then weakness = "sound" elseif weakness == "assimilated+final-weak" then -- Verbs like waniya~yawnā "to be faint; to languish" (although the defaults should handle this -- correctly) weakness = "final-weak" else error(("Can't specify form 'I-sound' when inferred weakness is '%s' for lemma %s"):format( weakness, base.lemma)) end elseif base.explicit_weakness == "assimilated" then if weakness == "sound" then -- i~a verbs like waṭiʔa~yaṭaʔu "to tread, to trample"; wasiʕa~yasaʕu "to be spacious; to be well-off"; -- waṯiʔa~yaṯaʔu "to get bruised, to be sprained", which would default to sound. 
weakness = "assimilated" elseif weakness == "final-weak" then -- For completeness; not clear if any verbs occur where this is needed. (There are plenty of -- assimilated+final-weak verbs but the defaults should take care of them.) weakness = "assimilated+final-weak" else error(("Can't specify form 'I-assimilated' when inferred weakness is '%s' for lemma %s"):format( weakness, base.lemma)) end elseif base.explicit_weakness then error(("Internal error: Unrecognized value '%s' for base.explicit_weakness"):format(base.explicit_weakness)) end elseif vform == "none" then weakness = base.explicit_weakness elseif base.explicit_weakness then error(("Internal error: Explicit weakness should not be specifiable except with forms I and none, but saw explicit weakness '%s' with verb form '%s'"): format(base.explicit_weakness, vform)) end vowel_spec.weakness = weakness if vform ~= "none" then -- Error if radicals are wrong given the weakness. More likely to happen if the weakness is explicitly given -- rather than inferred. Will also happen if certain incorrect letters are included as radicals e.g. hamza on -- top of various letters, alif maqṣūra, tā' marbūṭa. check_radicals(vform, weakness, rget(vowel_spec.rad1), rget(vowel_spec.rad2), rget(vowel_spec.rad3), base.quadlit and rget(vowel_spec.rad4) or nil) end -- Check the variant value. 
local form_iii_vi_geminate = (vform == "III" or vform == "VI") and rget(vowel_spec.rad2) == rget(vowel_spec.rad3) and not req(vowel_spec.rad2, Y) local hayy_i_x = hayy_radicals(vowel_spec.rad1, vowel_spec.rad2, vowel_spec.rad3) and (vform == "I" or vform == "X") if form_iii_vi_geminate or hayy_i_x then if vowel_spec.variant and vowel_spec.variant ~= "long" and vowel_spec.variant ~= "short" and vowel_spec.variant ~= "both" then error(("For form-III/VI geminate verb or form-I/X verb with ح-ي-ي radicals, saw unrecognized 'var:%s' value; should be 'var:long', 'var:short' or 'var:both'"):format( vowel_spec.variant)) end elseif vowel_spec.variant then error(("Variant value 'var:%s' not allowed in this context"):format(vowel_spec.variant)) end end -- If form I, regroup expanded vowels for display purposes. if vform == "I" then local group_by_past = {} for _, vowel_spec in ipairs(base.conj_vowels) do m_table.insertIfNot(group_by_past, { past = undia[rget(vowel_spec.past)], nonpasts = {undia[rget(vowel_spec.nonpast)]}, }, { key = function(obj) return obj.past end, combine = function(obj1, obj2) for _, nonpast in ipairs(obj2.nonpasts) do m_table.insertIfNot(obj1.nonpasts, nonpast) end end, }) end local group_by_nonpast = {} for _, vowel_spec in ipairs(group_by_past) do m_table.insertIfNot(group_by_nonpast, { pasts = {vowel_spec.past}, nonpasts = vowel_spec.nonpasts, }, { key = function(obj) return obj.nonpasts end, combine = function(obj1, obj2) for _, past in ipairs(obj2.pasts) do m_table.insertIfNot(obj1.pasts, past) end end, }) end base.grouped_conj_vowels = group_by_nonpast end -- Set value of passive. If not specified, default is yes for forms II, III, IV and Iq; no but uncertainly for -- forms VII, IX, XI - XV and IIIq - IVq, as well as form I with past vowel u; impersonal but uncertainly for form -- V, VI, X and IIq, as well as form I with past vowel i; and yes but uncertainly for the remainder (form I with -- past vowel only a and form VIII). 
	if not base.passive then
		base.passive_defaulted = true
		-- Temporary tracking for defaulted passives by verb form, weakness and (for form I) past/non-past vowels.
		track_if_ar_conj(base, "passive-defaulted/" .. vform)
		for _, vowel_spec in ipairs(base.conj_vowels) do
			track_if_ar_conj(base, "passive-defaulted/" .. vform .. "/" .. vowel_spec.weakness)
			if vform == "I" then
				-- Use rget() because the vowels may be form objects (with footnotes) rather than plain strings.
				local past_nonpast = ("%s~%s"):format(undia[rget(vowel_spec.past)], undia[rget(vowel_spec.nonpast)])
				track_if_ar_conj(base, "passive-defaulted/I/" .. past_nonpast)
				track_if_ar_conj(base, "passive-defaulted/I/" .. vowel_spec.weakness .. "/" .. past_nonpast)
			end
		end
		if vform_probably_full_passive(vform) then
			base.passive = "pass"
		else
			base.passive_uncertain = true
			for _, vowel_spec in ipairs(base.conj_vowels) do
				if vform_probably_no_passive(vform, vowel_spec.weakness, vowel_spec.past, vowel_spec.nonpast) then
					base.passive = "nopass"
					break
				elseif vform_probably_impersonal_passive(vform, vowel_spec.weakness, vowel_spec.past,
					vowel_spec.nonpast) then
					base.passive = "ipass"
					break
				end
			end
			base.passive = base.passive or "pass"
		end
	end

	-- NOTE: Currently there are no built-in stems or form overrides for Arabic; this code is inherited from
	-- [[Module:ca-verb]], where such things do exist, and is kept for generality in case we decide in the future to
	-- implement such things.

	-- Override built-in verb stems and overrides with user-specified ones.
	for stem, values in pairs(base.user_stem_overrides) do
		base.stem_overrides[stem] = values
	end
	for slot, values in pairs(base.user_slot_overrides) do
		if not base.alternant_multiword_spec.verb_slots_map[slot] then
			error("Unrecognized override slot '" .. slot .. "': " .. base.angle_bracket_spec)
		end
		if export.unsettable_slots_set[slot] then
			error("Slot '" .. slot .. "' cannot be set using an override: " .. base.angle_bracket_spec)
		end
		if skip_slot(base, slot, "allow overrides") then
			error("Override slot '" .. slot ..
				"' would be skipped based on the passive, 'noimp' and/or 'no_nonpast' settings: " ..
				base.angle_bracket_spec)
		end
		base.slot_overrides[slot] = values
	end

	if base.verb_form == "none-final-weak" then
		for _, stem_type in ipairs { "past", "past_pass", "nonpast", "nonpast_pass" } do
			if base.stem_overrides[stem_type .. "_c"] or base.stem_overrides[stem_type .. "_v"] then
				error(("Specify stem for verb type 'none-final-weak' using '%s:...' not '%s_c:...' or '%s_v:...'"):
					format(stem_type, stem_type, stem_type))
			end
		end
		for _, stem_type in ipairs { "past", "nonpast" } do
			-- Only complain when the stem is given without its accompanying final-weak vowel.
			if base.stem_overrides[stem_type] and not base.stem_overrides[stem_type .. "_final_weak_vowel"] then
				error(("For verb type 'none-final-weak', if '%s:...' specified, so must '%s_final_weak_vowel:...'"):
					format(stem_type, stem_type))
			end
		end
	end
end

local function detect_all_indicator_specs(alternant_multiword_spec)
	add_slots(alternant_multiword_spec)
	alternant_multiword_spec.verb_forms = {}
	-- This means at least one individual base had the slot marked as explicitly missing. Another base (e.g. when
	-- there are multiple alternants) might have a value for the slot. In practice, we only respect this when there are
	-- no overall values in the slot and `slot_uncertain` isn't set; in this case, we display "no ..." for the slot
	-- instead of simply not displaying anything for the slot.
	alternant_multiword_spec.slot_explicitly_missing = {}
	-- This means at least one individual base had no values for the slot and the slot marked as explicitly uncertain.
	-- Note that this is different from a value being present but marked as uncertain (e.g. if an override was given
	-- with a ? after it); this causes the form object for the value to have `uncertain = true` set. If there are no
	-- overall values in the slot and `slot_uncertain` is set, we display this in the headword.
	alternant_multiword_spec.slot_uncertain = {}
	iut.map_word_specs(alternant_multiword_spec, function(base)
		-- So arguments, etc. can be accessed.
		-- WARNING: Creates circular reference.
		base.alternant_multiword_spec = alternant_multiword_spec
		detect_indicator_spec(base)
		if not base.nocat then
			m_table.insertIfNot(alternant_multiword_spec.verb_forms, base.verb_form)
		end
		if base.passive_uncertain then
			alternant_multiword_spec.passive_uncertain = true
		end
		for slot, _ in pairs(base.slot_explicitly_missing) do
			alternant_multiword_spec.slot_explicitly_missing[slot] = true
		end
	end)
end

local function determine_slot_uncertainty_from_forms(alternant_multiword_spec)
	iut.map_word_specs(alternant_multiword_spec, function(base)
		-- If there is no verbal noun (which currently only happens for form I), the verb form is not 'none'
		-- (manually-specified stems), and the verbal noun wasn't explicitly indicated as missing using <vn:->, we
		-- assume it's just unknown/unspecified rather than missing. Same with active participles.
		for uncertain_slot, _ in pairs(slots_that_may_be_uncertain) do
			-- `vform` is not defined in this function, so check the verb form on the base directly.
			if not base.forms[uncertain_slot] and base.verb_form ~= "none" and
				not skip_slot(base, uncertain_slot) then
				base.slot_uncertain[uncertain_slot] = true
			end
		end
		-- Propagate slot uncertainty up. Currently only the verbal noun can have this set but we write the code
		-- generally.
		for slot, _ in pairs(base.slot_uncertain) do
			alternant_multiword_spec.slot_uncertain[slot] = true
		end
	end)

	-- If slot is uncertain and has no value, explicitly set its value to "?".
	for uncertain_slot, _ in pairs(slots_that_may_be_uncertain) do
		if not alternant_multiword_spec.forms[uncertain_slot] and
			alternant_multiword_spec.slot_uncertain[uncertain_slot] then
			alternant_multiword_spec.forms[uncertain_slot] = {{form = "?"}}
		end
	end
end

-- Determine certain properties of the verb from the overall forms, such as whether the verb is active-only or
-- passive-only, is impersonal, lacks an imperative, etc.
local function determine_verb_properties_from_forms(alternant_multiword_spec) alternant_multiword_spec.has_active = false alternant_multiword_spec.has_passive = false alternant_multiword_spec.has_non_impers_active = false alternant_multiword_spec.has_non_impers_passive = false alternant_multiword_spec.has_imp = false alternant_multiword_spec.has_past = false alternant_multiword_spec.has_nonpast = false for slot, _ in pairs(alternant_multiword_spec.forms) do if slot == "ap" or slot:find("[123]") and not slot:find("_pass") then alternant_multiword_spec.has_active = true end if slot == "pp" or slot:find("[123]") and slot:find("_pass") then alternant_multiword_spec.has_passive = true end if slot:find("[123]") and not slot:find("pass_[123]") and not slot:find("3ms") then alternant_multiword_spec.has_non_impers_active = true end if slot:find("pass_[123]") and not slot:find("3ms") then alternant_multiword_spec.has_non_impers_passive = true end if slot:find("^imp_") then alternant_multiword_spec.has_imp = true end if slot:find("^past_") then alternant_multiword_spec.has_past = true end if slot:find("^ind_") or slot:find("^sub_") or slot:find("^juss_") then alternant_multiword_spec.has_nonpast = true end end end local function add_categories_and_annotation(alternant_multiword_spec, base, multiword_lemma, insert_ann, insert_cat) -- Useful e.g. in constructing suppletive verbs out of parts. For a verb like جاء or أتى whose imperative comes -- from the unrelated verb تعالى, we don't want the latter verb showing up in categories or annotations. if base.nocat then return end local vform = base.verb_form if vform ~= "none" then insert_ann("form", vform) insert_cat("form-" .. vform .. " verbs") end if base.reduced then insert_ann("reduced", "reduced") if vform ~= "none" then insert_cat("form-" .. vform .. 
" reduced verbs") end end if base.quadlit then insert_cat("verbs with quadriliteral roots") end if base.passive_defaulted then insert_cat("verbs with defaulted passive") end for _, vowel_spec in ipairs(base.conj_vowels) do local rad1, rad2, rad3, rad4 = get_radicals_4(vowel_spec) local final_weak = is_final_weak(base, vowel_spec) local weakness = vowel_spec.weakness -- We have to distinguish weakness by form and weakness by conjugation. Weakness by form merely indicates the -- presence of weak letters in certain positions in the radicals. Weakness by conjugation is related to how the -- verbs are conjugated. For example, form-II verbs that are "hollow by form" (middle radical is wāw or yāʾ) are -- conjugated as sound verbs. Another example: form-I verbs with initial wāw are "assimilated by form" and most -- are assimilated by conjugation as well, but a few are sound by conjugation, e.g. wajuha yawjuhu "to be -- distinguished" (rather than wajuha yajuhu); similarly for some hollow-by-form verbs in various forms, e.g. -- form VIII izdawaja yazdawiju "to be in pairs" (rather than izdāja yazdāju). Categories referring to weakness -- always refer to weakness by conjugation; weakness by form is distinguished only by categories such as -- [[:Category:Arabic form-III verbs with و as second radical]]. insert_ann("weakness", weakness) if vform ~= "none" then insert_cat(("%s form-%s verbs"):format(weakness, vform)) end local function radical_is_ambiguous(rad) return type(rad) == "table" and rad.ambig end local function radical_is_unambiguous_weak(rad) return not radical_is_ambiguous(rad) and (is_waw_ya(rad) or req(rad, HAMZA)) end if vform ~= "none" then local ur1, ur2, ur3, ur4 = vowel_spec.unreg_rad1, vowel_spec.unreg_rad2, vowel_spec.unreg_rad3, vowel_spec.unreg_rad4 -- Create headword categories based on the radicals. Do the following before -- converting the Latin radicals into Arabic ones so we distinguish -- between ambiguous and non-ambiguous radicals. 
			if radical_is_ambiguous(ur1) or radical_is_ambiguous(ur2) or radical_is_ambiguous(ur3) or
				ur4 and radical_is_ambiguous(ur4) then
				insert_cat("verbs with ambiguous radicals")
			end
			if radical_is_unambiguous_weak(ur1) then
				insert_cat("form-" .. vform .. " verbs with " .. rget(ur1) .. " as first radical")
			end
			if radical_is_unambiguous_weak(ur2) then
				insert_cat("form-" .. vform .. " verbs with " .. rget(ur2) .. " as second radical")
			end
			if radical_is_unambiguous_weak(ur3) then
				insert_cat("form-" .. vform .. " verbs with " .. rget(ur3) .. " as third radical")
			end
			if ur4 and radical_is_unambiguous_weak(ur4) then
				insert_cat("form-" .. vform .. " verbs with " .. rget(ur4) .. " as fourth radical")
			end
		end
	end

	if vform == "I" and not is_passive_only(base.passive) then
		for _, vowel_spec in ipairs(base.grouped_conj_vowels) do
			insert_ann("vowels", ("%s ~ %s"):format(table.concat(vowel_spec.pasts, "/"),
				table.concat(vowel_spec.nonpasts, "/")))
			for _, past in ipairs(vowel_spec.pasts) do
				for _, nonpast in ipairs(vowel_spec.nonpasts) do
					if past == "-" or nonpast == "-" then
						-- The message contains %s placeholders, so it must be formatted before being raised.
						error(("Internal error: Saw form I past vowel %s and non-past vowel %s but - in place of vowel should have triggered an error earlier"):format(
							past, nonpast))
					end
					insert_cat(("form-I verbs with past vowel %s and non-past vowel %s"):format(past, nonpast))
				end
			end
		end
	end

	for slot, name in pairs(slots_that_may_be_uncertain) do
		if base.slot_uncertain[slot] then
			-- An unspecified and non-defaulted verbal noun (form I) is considered uncertain rather than explicitly
			-- missing. Use <vn:-> to explicitly indicate the lack of verbal noun. Same for form-I stative active
			-- participles.
			insert_cat(("verbs with unknown or uncertain %ss"):format(name))
		end
	end

	if base.irregular then
		insert_ann("irreg", "irregular")
		insert_cat("irregular verbs")
	end
end

-- Compute the categories to add the verb to, as well as the annotation to display in the conjugation title bar.
We -- combine the code to do these functions as both categories and title bar contain similar information. local function compute_categories_and_annotation(alternant_multiword_spec) alternant_multiword_spec.categories = {} local ann = {} alternant_multiword_spec.annotation = ann ann.form = {} ann.weakness = {} ann.vowels = {} ann.passive = nil ann.reduced = {} ann.irreg = {} ann.defective = {} local multiword_lemma = false for _, slot in ipairs(export.potential_lemma_slots) do if alternant_multiword_spec.forms[slot] then for _, formobj in ipairs(alternant_multiword_spec.forms[slot]) do if formobj.form:find(" ") then multiword_lemma = true break end end break end end local function insert_ann(anntype, value) m_table.insertIfNot(alternant_multiword_spec.annotation[anntype], value) end local function insert_cat(cat, also_when_multiword) -- Don't place multiword terms in categories like 'Arabic form-II verbs' to avoid spamming the categories with -- such terms. if also_when_multiword or not multiword_lemma then m_table.insertIfNot(alternant_multiword_spec.categories, "Arabic " .. cat) end end iut.map_word_specs(alternant_multiword_spec, function(base) add_categories_and_annotation(alternant_multiword_spec, base, multiword_lemma, insert_ann, insert_cat) end) for slot, name in pairs(slots_that_may_be_uncertain) do if alternant_multiword_spec.forms[slot] then for _, form in ipairs(alternant_multiword_spec.forms[slot]) do if form.uncertain then if form.form == "?" 
then insert_cat(("verbs with explicitly unknown %ss"):format(name)) else insert_cat(("verbs needing %s checked"):format(name)) end break end end end end if alternant_multiword_spec.has_active then if alternant_multiword_spec.has_passive and alternant_multiword_spec.has_non_impers_passive then insert_cat("verbs with full passive") ann.passive = "full passive" elseif alternant_multiword_spec.has_passive then insert_cat("verbs with impersonal passive") ann.passive = "impersonal passive" else insert_cat("verbs lacking passive forms") ann.passive = "no passive" end else if alternant_multiword_spec.has_non_impers_passive then insert_cat("passive verbs") insert_cat("verbs with full passive") ann.passive = "passive-only" else insert_cat("passive verbs") insert_cat("impersonal verbs") insert_cat("verbs with impersonal passive") ann.passive = "impersonal (passive-only)" end end if alternant_multiword_spec.passive_uncertain then insert_cat("verbs needing passive checked") ann.passive = ann.passive .. 
' <abbr title="passive status uncertain">(?)</abbr>' end if alternant_multiword_spec.has_active and not alternant_multiword_spec.has_imp then insert_ann("defective", "no imperative") insert_cat("verbs lacking imperative forms") end if not alternant_multiword_spec.has_past then insert_ann("defective", "no past") insert_cat("verbs lacking past forms") end if not alternant_multiword_spec.has_nonpast then insert_ann("defective", "no non-past") insert_cat("verbs lacking non-past forms") end local ann_parts = {} local function insert_ann_part(part, conj) local val = table.concat(ann[part], conj or " or ") if val ~= "" and val ~= "regular" then table.insert(ann_parts, val) end end insert_ann_part("form") insert_ann_part("weakness") insert_ann_part("reduced") insert_ann_part("vowels") if ann.passive then table.insert(ann_parts, ann.passive) end insert_ann_part("irreg") insert_ann_part("defective", ", ") alternant_multiword_spec.annotation = table.concat(ann_parts, ", ") end local function show_forms(alternant_multiword_spec) local lemmas = {} for _, slot in ipairs(export.potential_lemma_slots) do if alternant_multiword_spec.forms[slot] then for _, formobj in ipairs(alternant_multiword_spec.forms[slot]) do table.insert(lemmas, formobj) end break end end alternant_multiword_spec.lemmas = lemmas -- save for later use in make_table() alternant_multiword_spec.vn = alternant_multiword_spec.forms.vn -- save for later use in make_table() -- Reconstruct the original verb spec without overrides for verbal nouns and participles, since those specific slots -- are ignored by {{ar-verb form}}. Compute this once beforehand; `transform_accel_obj` is called repeatedly on each -- form and we don't want to compute this repeatedly. 
local reconstructed_verb_spec = iut.reconstruct_original_spec(alternant_multiword_spec, { preprocess_angle_bracket_spec = function(spec) spec = spec:match("^<(.*)>$") assert(spec) local segments = put.parse_multi_delimiter_balanced_segment_run(spec, {{"[", "]"}, {"<", ">"}}) local dot_separated_groups = put.split_alternating_runs_and_strip_spaces(segments, "%.") -- Rejoin each dot-separated group into a single string, since we aren't actually going to do any parsing -- of bracket-bounded textual runs; then filter out overrides for verbal nouns and participles. local filtered_indicators = {} for _, dot_separated_group in ipairs(dot_separated_groups) do local indicator = table.concat(dot_separated_group) -- FIXME: Do we want to filter out any other indicators? if not (indicator:find("^vn:") or indicator:find("^[ap]p:")) then table.insert(filtered_indicators, indicator) end end return ("<%s>"):format(table.concat(filtered_indicators, ".")) end, }) -- If we're dealing with a single word, no alternants and a single verb form, use the auto-conjugation-fetching -- variant. local reconstructed_lemma, inside = reconstructed_verb_spec:match("^([^ <>()]+)(%b<>)$") if inside and alternant_multiword_spec.verb_forms[1] and not alternant_multiword_spec.verb_forms[2] then reconstructed_verb_spec = ("+%s<%s>"):format(reconstructed_lemma, alternant_multiword_spec.verb_forms[1]) end local function transform_accel_obj(slot, formobj, accel_obj) if not accel_obj then return accel_obj end if slot == "ap" or slot == "pp" or slot == "vn" then -- FIXME: [[Module:accel]] can't correctly handle more than one verb form for participles and verbal nouns accel_obj.form = slot .. "-" .. table.concat(alternant_multiword_spec.verb_forms, ",") else accel_obj.form = "verb-form-" .. reconstructed_verb_spec end return accel_obj end local function generate_link(data) local form = data.form local term = form.formval_for_link local alt = form.alt if term == "?" then term = nil alt = "?" 
		end
		local link = m_links.full_link {
			lang = lang,
			term = term,
			tr = "-",
			accel = form.accel_obj,
			alt = alt,
			gloss = form.gloss,
			genders = form.genders,
			pos = form.pos,
			lit = form.lit,
			id = form.id,
		} .. iut.get_footnote_text(form.footnotes, data.footnote_obj)
		if form.q and form.q[1] or form.qq and form.qq[1] or form.l and form.l[1] or form.ll and form.ll[1] then
			link = require(pron_qualifier_module).format_qualifiers {
				lang = lang,
				text = link,
				q = form.q,
				qq = form.qq,
				l = form.l,
				ll = form.ll,
			}
		end
		return link
	end

	local props = {
		lang = lang,
		lemmas = lemmas,
		transform_accel_obj = transform_accel_obj,
		generate_link = generate_link,
		slot_list = alternant_multiword_spec.verb_slots,
		include_translit = true,
	}
	iut.show_forms(alternant_multiword_spec.forms, props)
end

-------------------------------------------------------------------------------
--                    Functions to create inflection tables                  --
-------------------------------------------------------------------------------

-- Make the conjugation table. Called from export.show().
local function make_table(alternant_multiword_spec)
	local text = mw.getCurrentFrame():expandTemplate{
		title = 'inflection-table-top',
		args = {
			title = 'Conjugation of {title}',
			tall = 'yes',
			palette = "green",
			category = 'conjugation',
			class = 'tr-alongside', -- temp hack to prevent extra line break
		}
	}

	text = text .. [=[

! colspan="6" | verbal noun<br /><<الْمَصْدَر>>
| colspan="7" | {vn}
]=]

	if alternant_multiword_spec.has_active then
		text = text .. [=[
|-
! colspan="6" | active participle<br /><<اِسْم الْفَاعِل>>
| colspan="7" | {ap}
]=]
	end

	if alternant_multiword_spec.has_passive then
		text = text .. [=[
|-
! colspan="6" | passive participle<br /><<اِسْم الْمَفْعُول>>
| colspan="7" | {pp}
]=]
	end

	text = text .. [=[
|-
! colspan="999" class="separator" |
]=]

	if alternant_multiword_spec.has_active then
		text = text .. [=[
|-
! colspan="12" class="outer" | active voice<br /><<الْفِعْل الْمَعْلُوم>>
|-
! colspan="2" |
! colspan="3" | singular<br /><<الْمُفْرَد>>
! rowspan="12" class="separator" |
! colspan="2" | dual<br /><<الْمُثَنَّى>>
! rowspan="12" class="separator" |
! colspan="3" | plural<br /><<الْجَمْع>>
|-
! colspan="2" |
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
|-
! rowspan="2" | past (perfect) indicative<br /><<الْمَاضِي>>
! class="secondary" | m
| rowspan="2" | {past_1s}
| {past_2ms}
| {past_3ms}
| rowspan="2" | {past_2d}
| {past_3md}
| rowspan="2" | {past_1p}
| {past_2mp}
| {past_3mp}
|-
! class="secondary" | f
| {past_2fs}
| {past_3fs}
| {past_3fd}
| {past_2fp}
| {past_3fp}
|-
! rowspan="2" | non-past (imperfect) indicative<br /><<الْمُضَارِع الْمَرْفُوع>>
! class="secondary" | m
| rowspan="2" | {ind_1s}
| {ind_2ms}
| {ind_3ms}
| rowspan="2" | {ind_2d}
| {ind_3md}
| rowspan="2" | {ind_1p}
| {ind_2mp}
| {ind_3mp}
|-
! class="secondary" | f
| {ind_2fs}
| {ind_3fs}
| {ind_3fd}
| {ind_2fp}
| {ind_3fp}
|-
! rowspan="2" | subjunctive<br /><<الْمُضَارِع الْمَنْصُوب>>
! class="secondary" | m
| rowspan="2" | {sub_1s}
| {sub_2ms}
| {sub_3ms}
| rowspan="2" | {sub_2d}
| {sub_3md}
| rowspan="2" | {sub_1p}
| {sub_2mp}
| {sub_3mp}
|-
! class="secondary" | f
| {sub_2fs}
| {sub_3fs}
| {sub_3fd}
| {sub_2fp}
| {sub_3fp}
|-
! rowspan="2" | jussive<br /><<الْمُضَارِع الْمَجْزُوم>>
! class="secondary" | m
| rowspan="2" | {juss_1s}
| {juss_2ms}
| {juss_3ms}
| rowspan="2" | {juss_2d}
| {juss_3md}
| rowspan="2" | {juss_1p}
| {juss_2mp}
| {juss_3mp}
|-
! class="secondary" | f
| {juss_2fs}
| {juss_3fs}
| {juss_3fd}
| {juss_2fp}
| {juss_3fp}
|-
! rowspan="2" | imperative<br /><<الْأَمْر>>
! class="secondary" | m
| rowspan="2" |
| {imp_2ms}
| rowspan="2" |
| rowspan="2" | {imp_2d}
| rowspan="2" |
| rowspan="2" |
| {imp_2mp}
| rowspan="2" |
|-
! class="secondary" | f
| {imp_2fs}
| {imp_2fp}
]=]
	end

	if alternant_multiword_spec.has_passive then
		text = text .. [=[
|-
! colspan="999" class="separator" |
|-
! colspan="12" class="outer" | passive voice<br /><<الْفِعْل الْمَجْهُول>>
|-
! colspan="2" |
! colspan="3" | singular<br /><<الْمُفْرَد>>
! rowspan="10" class="separator" |
! colspan="2" | dual<br /><<الْمُثَنَّى>>
! rowspan="10" class="separator" |
! colspan="3" | plural<br /><<الْجَمْع>>
|-
! colspan="2" |
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
! 1<sup>st</sup> person<br /><<الْمُتَكَلِّم>>
! 2<sup>nd</sup> person<br /><<الْمُخَاطَب>>
! 3<sup>rd</sup> person<br /><<الْغَائِب>>
|-
! rowspan="2" | past (perfect) indicative<br /><<الْمَاضِي>>
! class="secondary" | m
| rowspan="2" | {past_pass_1s}
| {past_pass_2ms}
| {past_pass_3ms}
| rowspan="2" | {past_pass_2d}
| {past_pass_3md}
| rowspan="2" | {past_pass_1p}
| {past_pass_2mp}
| {past_pass_3mp}
|-
! class="secondary" | f
| {past_pass_2fs}
| {past_pass_3fs}
| {past_pass_3fd}
| {past_pass_2fp}
| {past_pass_3fp}
|-
! rowspan="2" | non-past (imperfect) indicative<br /><<الْمُضَارِع الْمَرْفُوع>>
! class="secondary" | m
| rowspan="2" | {ind_pass_1s}
| {ind_pass_2ms}
| {ind_pass_3ms}
| rowspan="2" | {ind_pass_2d}
| {ind_pass_3md}
| rowspan="2" | {ind_pass_1p}
| {ind_pass_2mp}
| {ind_pass_3mp}
|-
! class="secondary" | f
| {ind_pass_2fs}
| {ind_pass_3fs}
| {ind_pass_3fd}
| {ind_pass_2fp}
| {ind_pass_3fp}
|-
! rowspan="2" | subjunctive<br /><<الْمُضَارِع الْمَنْصُوب>>
! class="secondary" | m
| rowspan="2" | {sub_pass_1s}
| {sub_pass_2ms}
| {sub_pass_3ms}
| rowspan="2" | {sub_pass_2d}
| {sub_pass_3md}
| rowspan="2" | {sub_pass_1p}
| {sub_pass_2mp}
| {sub_pass_3mp}
|-
! class="secondary" | f
| {sub_pass_2fs}
| {sub_pass_3fs}
| {sub_pass_3fd}
| {sub_pass_2fp}
| {sub_pass_3fp}
|-
! rowspan="2" | jussive<br /><<الْمُضَارِع الْمَجْزُوم>>
! class="secondary" | m
| rowspan="2" | {juss_pass_1s}
| {juss_pass_2ms}
| {juss_pass_3ms}
| rowspan="2" | {juss_pass_2d}
| {juss_pass_3md}
| rowspan="2" | {juss_pass_1p}
| {juss_pass_2mp}
| {juss_pass_3mp}
|-
! class="secondary" | f
| {juss_pass_2fs}
| {juss_pass_3fs}
| {juss_pass_3fd}
| {juss_pass_2fp}
| {juss_pass_3fp}
]=]
	end

	text = text .. mw.getCurrentFrame():expandTemplate{
		title = 'inflection-table-bottom',
		args = {
			notes = '{footnote}',
		}
	}

	local forms = alternant_multiword_spec.forms

	if not alternant_multiword_spec.lemmas then
		forms.title = "—"
	else
		local linked_lemmas = {}
		for _, form in ipairs(alternant_multiword_spec.lemmas) do
			table.insert(linked_lemmas, link_term(form.form, "term"))
		end
		forms.title = table.concat(linked_lemmas, ", ")
	end

	local ann_parts = {}
	if alternant_multiword_spec.annotation ~= "" then
		table.insert(ann_parts, alternant_multiword_spec.annotation)
	end
	if alternant_multiword_spec.vn then
		local linked_vns = {}
		for _, form in ipairs(alternant_multiword_spec.vn) do
			table.insert(linked_vns, link_term(form.form, "term"))
		end
		table.insert(ann_parts, (#linked_vns > 1 and "verbal nouns" or "verbal noun") .. " " ..
			table.concat(linked_vns, ", "))
	end
	local annotation = table.concat(ann_parts, ", ")
	if annotation ~= "" then
		forms.title = forms.title .. " (" .. annotation .. ")"
	end

	-- Format the table.
	local tagged_table = rsub(text, "<<(.-)>>", tag_text)
	return m_string_utilities.format(tagged_table, forms)
end

-------------------------------------------------------------------------------
--                              External entry points                        --
-------------------------------------------------------------------------------

-- Append two lists `l1` and `l2`, removing duplicates. If either is {nil}, just return the other.
local function combine_lists(l1, l2)
	-- combine_footnotes() does exactly what we want.
	return iut.combine_footnotes(l1, l2)
end

local function combine_metadata(data)
	local src1 = data.form1
	local src2 = data.form2
	local dest = data.dest_form
	dest.uncertain = src1.uncertain or src2.uncertain
	if src1.genders and src2.genders and not m_table.deepEquals(src1.genders, src2.genders) then
		-- do nothing
	else
		dest.genders = src1.genders or src2.genders
	end
	if src1.pos and src2.pos and src1.pos ~= src2.pos then
		-- do nothing
	else
		dest.pos = src1.pos or src2.pos
	end
	-- Don't copy .alt, .gloss, .lit, .id, which describe a single term and don't extend to multiword terms.
	dest.q = combine_lists(src1.q, src2.q)
	dest.qq = combine_lists(src1.qq, src2.qq)
	dest.l = combine_lists(src1.l, src2.l)
	dest.ll = combine_lists(src1.ll, src2.ll)
end

-- Externally callable function to parse and conjugate a verb given user-specified arguments.
-- Return value is WORD_SPEC, an object where the conjugated forms are in `WORD_SPEC.forms`
-- for each slot. If there are no values for a slot, the slot key will be missing. The value
-- for a given slot is a list of objects {form=FORM, footnotes=FOOTNOTES}.
function export.do_generate_forms(args, source_template, headword_head)
	local PAGENAME = mw.loadData("Module:headword/data").pagename
	local function in_template_space()
		return mw.title.getCurrentTitle().nsText == "Template"
	end

	-- Determine the verb spec we're being asked to generate the conjugation of.
	-- This may be taken from the current page title or the value of |pagename=; but not when called from
	-- {{ar-verb form}}, where the page title is a non-lemma form. Note that the verb spec may omit the lemma; e.g. it
	-- may be "<II>". For this reason, we use the value of `pagename` computed here down below, when calling
	-- normalize_all_lemmas().
	local pagename = source_template ~= "ar-verb form" and args.pagename or PAGENAME
	local head = headword_head or pagename
	local arg1 = args[1]

	if not arg1 then
		if (pagename == "ar-conj" or pagename == "ar-verb" or pagename == "ar-verb form") and in_template_space() then
			arg1 = "كتب<I/a~u.pass>"
		else
			arg1 = "<>"
		end
	end

	-- When called from {{ar-verb form}}, determine the non-lemma form whose inflections we're being asked to
	-- determine. This normally comes from the page title or the value of |pagename=.
	local verb_form_of_form
	if source_template == "ar-verb form" then
		verb_form_of_form = args.pagename
		if not verb_form_of_form then
			if PAGENAME == "ar-verb form" and in_template_space() then
				verb_form_of_form = "كتبت"
			else
				verb_form_of_form = PAGENAME
			end
		end
	end

	local incorporated_headword_head_into_lemma = false
	if arg1:find("^<.*>$") then -- missing lemma
		if head:find(" ") then
			-- If multiword lemma, try to add arg spec after the first word.
			-- Try to preserve the brackets in the part after the verb, but don't do it
			-- if there aren't the same number of left and right brackets in the verb
			-- (which means the verb was linked as part of a larger expression).
			local first_word, post = rmatch(head, "^(.-)( .*)$")
			local left_brackets = rsub(first_word, "[^%[]", "")
			local right_brackets = rsub(first_word, "[^%]]", "")
			if #left_brackets == #right_brackets then
				arg1 = iut.remove_redundant_links(first_word) .. arg1 .. post
				incorporated_headword_head_into_lemma = true
			else
				-- Try again using the form without links.
				local linkless_head = m_links.remove_links(head)
				if linkless_head:find(" ") then
					first_word, post = rmatch(linkless_head, "^(.-)( .*)$")
					arg1 = first_word .. arg1 .. post
				else
					error("Unable to incorporate <...> spec into explicit head due to a multiword linked verb or " ..
						"unbalanced brackets; please include <> explicitly: " .. arg1)
				end
			end
		else
			-- Will be incorporated through `head` below in the call to normalize_all_lemmas().
			incorporated_headword_head_into_lemma = true
		end
	end

	local parse_props = {
		parse_indicator_spec = parse_indicator_spec,
		angle_brackets_omittable = true,
		allow_blank_lemma = true,
	}
	local alternant_multiword_spec = iut.parse_inflected_text(arg1, parse_props)
	alternant_multiword_spec.pos = pos or "verbs"
	alternant_multiword_spec.args = args
	alternant_multiword_spec.source_template = source_template
	alternant_multiword_spec.verb_form_of_form = verb_form_of_form
	alternant_multiword_spec.incorporated_headword_head_into_lemma = incorporated_headword_head_into_lemma

	normalize_all_lemmas(alternant_multiword_spec, head)
	detect_all_indicator_specs(alternant_multiword_spec)
	local inflect_props = {
		lang = lang,
		slot_list = alternant_multiword_spec.verb_slots,
		inflect_word_spec = conjugate_verb,
		combine_metadata = combine_metadata,
		-- We add links around the generated verbal forms rather than allow the entire multiword
		-- expression to be a link, so ensure that user-specified links get included as well.
		include_user_specified_links = true,
	}
	iut.inflect_multiword_or_alternant_multiword_spec(alternant_multiword_spec, inflect_props)

	if debug_translit then
		for slot, forms in pairs(alternant_multiword_spec.forms) do
			for _, form in ipairs(forms) do
				if form.translit then
					local full_form_translit = (lang:transliterate(m_links.remove_links(form.form)))
					if full_form_translit ~= form.translit then
						error(("Internal error: For slot '%s', form '%s' incremental translit '%s' not same as full translit '%s'"):
							format(slot, form.form, form.translit, full_form_translit))
					end
				end
				form.form = iut.remove_redundant_links(form.form)
			end
		end
	end

	-- Remove redundant brackets around entire forms.
	for slot, forms in pairs(alternant_multiword_spec.forms) do
		for _, form in ipairs(forms) do
			form.form = iut.remove_redundant_links(form.form)
		end
	end

	determine_slot_uncertainty_from_forms(alternant_multiword_spec)
	determine_verb_properties_from_forms(alternant_multiword_spec)
	compute_categories_and_annotation(alternant_multiword_spec)
	if args.json and source_template == "ar-conj" then
		-- There is a circular reference in `base.alternant_multiword_spec`, which points back to top level.
		iut.map_word_specs(alternant_multiword_spec, function(base)
			base.alternant_multiword_spec = nil
		end)
		return require("Module:JSON").toJSON(alternant_multiword_spec)
	end
	return alternant_multiword_spec
end

-- Entry point for {{ar-conj}}. Template-callable function to parse and conjugate a verb given
-- user-specified arguments and generate a displayable table of the conjugated forms.
function export.show(frame)
	local parent_args = frame:getParent().args
	local params = {
		[1] = {},
		["noautolinktext"] = {type = "boolean"},
		["noautolinkverb"] = {type = "boolean"},
		["t"] = {}, -- for use by {{ar-verb form}}; otherwise ignored
		["id"] = {}, -- for use by {{ar-verb form}}; otherwise ignored
		["pagename"] = {}, -- for testing/documentation pages
		["json"] = {type = "boolean"}, -- for bot use
	}
	local args = require("Module:parameters").process(parent_args, params)
	local alternant_multiword_spec = export.do_generate_forms(args, "ar-conj")
	if type(alternant_multiword_spec) == "string" then
		-- JSON return value
		return alternant_multiword_spec
	end
	show_forms(alternant_multiword_spec)
	return make_table(alternant_multiword_spec) ..
		require("Module:utilities").format_categories(alternant_multiword_spec.categories, lang, nil, nil, force_cat)
end

function export.verb_forms(frame)
	local parargs = frame:getParent().args
	local params = {
		[1] = {},
		[2] = {},
		[3] = {},
		[4] = {},
		[5] = {},
		pagename = {},
	}
	for _, form in ipairs(allowed_vforms) do
		-- FIXME: We go up to 5 here. The code supports unlimited variants but it's unlikely we will ever see more
		-- than 2.
		for index = 1, 5 do
			local prefix = index == 1 and form or form .. index
			params[prefix .. "-pv"] = {}
			for _, extn in ipairs { "", "-vn", "-ap", "-pp" } do
				params[prefix .. extn] = {}
				params[prefix .. extn .. "-head"] = {}
				-- FIXME: No -tr?
				params[prefix .. extn .. "-gloss"] = {}
			end
		end
	end

	local args = require("Module:parameters").process(parargs, params)

	local i = 1
	local past_vowel_re = "^[aui,]*$"
	local combined_root = nil
	if not args[i] or rfind(args[i], past_vowel_re) then
		combined_root = args.pagename or mw.loadData("Module:headword/data").pagename
		if not rfind(combined_root, "^([^ ]) ([^ ]) ([^ ])$") and
			not rfind(combined_root, "^([^ ]) ([^ ]) ([^ ]) ([^ ])$") then
			error("When inferring roots from page title, need three or four space-separated radicals: " ..
				combined_root)
		end
	elseif rfind(args[i], " ") then
		combined_root = args[i]
		i = i + 1
	else
		local separate_roots = {}
		while args[i] and not rfind(args[i], past_vowel_re) do
			table.insert(separate_roots, args[i])
			i = i + 1
		end
		combined_root = table.concat(separate_roots, " ")
	end
	local past_vowel = args[i]
	i = i + 1
	if past_vowel and not rfind(past_vowel, past_vowel_re) then
		error("Unrecognized past vowel, should be 'a', 'i', 'u', 'a,u', etc. or empty: " .. past_vowel)
	end

	-- Spaces interfere with parsing as a unit in [[Module:inflection utilities]], so replace with underscore.
	combined_root = combined_root:gsub(" ", "_")
	local split_root = rsplit(combined_root, "_")

	-- Map from verb forms (I, II, etc.) to a table of verb properties,
	-- which has entries e.g. for "verb" (either true to autogenerate the verb
	-- head, or an explicitly specified verb head using e.g. argument "I-head"),
	-- and for "verb-gloss" (which comes from e.g. the argument "I" or "I-gloss"),
	-- and for "vn" and "vn-gloss", "ap" and "ap-gloss", "pp" and "pp-gloss".
	local verb_properties = {}
	for _, form in ipairs(allowed_vforms) do
		local formpropslist = {}
		local derivs = {{"verb", ""}, {"vn", "-vn"}, {"ap", "-ap"}, {"pp", "-pp"}}
		local index = 1
		while true do
			local formprops = {}
			local prefix = index == 1 and form or form .. index
			if prefix == "I" then
				formprops.pv = past_vowel
			end
			if args[prefix .. "-pv"] then
				formprops.pv = args[prefix .. "-pv"]
			end
			for _, deriv in ipairs(derivs) do
				local prop = deriv[1]
				local extn = deriv[2]
				if args[prefix .. extn] == "+" then
					formprops[prop] = true
				elseif args[prefix .. extn] == "-" then
					formprops[prop] = false
				elseif args[prefix .. extn] then
					formprops[prop] = true
					formprops[prop .. "-gloss"] = args[prefix .. extn]
				end
				if args[prefix .. extn .. "-head"] then
					if formprops[prop] == nil then
						formprops[prop] = true
					end
					formprops[prop] = args[prefix .. extn .. "-head"]
				end
				if args[prefix .. extn .. "-gloss"] then
					if formprops[prop] == nil then
						formprops[prop] = true
					end
					formprops[prop .. "-gloss"] = args[prefix .. extn .. "-gloss"]
				end
			end
			if formprops.verb then
				-- If a verb form specified, also turn on vn (unless form I, with
				-- unpredictable vn) and ap, and maybe pp, according to form,
				-- weakness and past vowel. But don't turn these on if there's
				-- an explicit on/off specification for them (e.g. I-pp=-).
				if form ~= "I" and formprops.vn == nil then
					formprops.vn = true
				end
				if formprops.ap == nil then
					formprops.ap = true
				end
				local weakness = weakness_from_radicals(form, split_root[1], split_root[2], split_root[3],
					split_root[4])
				if formprops.pp == nil and not vform_probably_no_passive(form, weakness,
					rsplit(formprops.pv or "", ","), {}) then
					formprops.pp = true
				end
				if formprops.verb == true or formprops.vn == true or formprops.ap == true or formprops.pp == true then
					formprops.need_autogen = true
				end
				table.insert(formpropslist, formprops)
				index = index + 1
			else
				break
			end
		end
		table.insert(verb_properties, {form, formpropslist})
	end

	-- Go through and create the verb form derivations as necessary, when they haven't been explicitly given.
	for _, vplist in ipairs(verb_properties) do
		local vform = vplist[1]
		for _, props in ipairs(vplist[2]) do
			if props.need_autogen then
				local form_with_vowels
				if vform == "I" then
					local pv = props.pv
					if not pv then
						-- Make up likely past vowels based on weakness and actual radical.
						if split_root[3] == W then -- final-weak
							form_with_vowels = "I/a~u"
						elseif split_root[3] == Y then
							form_with_vowels = "I/a~i"
						elseif split_root[2] == W then -- hollow
							form_with_vowels = "I/u~u"
						elseif split_root[2] == Y then
							form_with_vowels = "I/i~i"
						else
							-- most common; doesn't matter so much since we're not displaying the non-past
							form_with_vowels = "I/a~u"
						end
					else
						local pvs = rsplit(pv, ",")
						local vowel_sufs = {}
						for _, pv in ipairs(pvs) do
							local vowel_spec
							if pv == "a" then
								-- Make up likely past vowels based on weakness and actual radical.
								if split_root[3] == W then -- final-weak
									vowel_spec = "a~u"
								elseif split_root[3] == Y then
									vowel_spec = "a~i"
								elseif split_root[2] == W then -- hollow
									vowel_spec = "a~u"
								elseif split_root[2] == Y then
									vowel_spec = "a~i"
								else
									-- most common; doesn't matter so much since we're not displaying the non-past
									vowel_spec = "a~u"
								end
							elseif pv == "i" then
								-- most common; doesn't matter so much since we're not displaying the non-past
								vowel_spec = "i~a"
							elseif pv == "u" then
								-- most common; doesn't matter so much since we're not displaying the non-past
								vowel_spec = "u~u"
							else
								error(("Internal error: Bad past vowel '%s' in {{ar-verb forms}}"):format(pv))
							end
							table.insert(vowel_sufs, vowel_spec)
						end
						form_with_vowels = "I/" .. table.concat(vowel_sufs, "/")
					end
				else
					form_with_vowels = vform
				end
				local angle_bracket_spec = ("%s<%s.pass>"):format(combined_root, form_with_vowels)
				local alternant_multiword_spec = export.do_generate_forms({angle_bracket_spec}, "ar-verb forms")
				local function format_forms(forms)
					if not forms then
						return "-" -- FIXME: Throw an error?
					end
					local formatted = {}
					for _, form in ipairs(forms) do
						if form.translit then
							table.insert(formatted, ("%s//%s"):format(form.form, form.translit))
						else
							table.insert(formatted, form.form)
						end
					end
					return table.concat(formatted, ",")
				end
				if props.verb == true then
					props.verb = format_forms(alternant_multiword_spec.forms.past_3ms)
				end
				for _, deriv in ipairs({"vn", "ap", "pp"}) do
					if props[deriv] == true then
						props[deriv] = format_forms(alternant_multiword_spec.forms[deriv])
					end
				end
			end
		end
	end

	-- Go through and output the result
	local formtextarr = {}
	for _, vplist in ipairs(verb_properties) do
		local form = vplist[1]
		for _, props in ipairs(vplist[2]) do
			local textarr = {}
			if props.verb then
				local text = "* '''[[Appendix:Arabic verbs#Form " .. form .. "|Form " .. form .. "]]''': "
				local linktext = {}
				local splitheads = rsplit(props.verb, "[,،]")
				for _, head in ipairs(splitheads) do
					table.insert(linktext, m_links.full_link({lang = lang, term = head, gloss = props["verb-gloss"]}))
				end
				text = text .. table.concat(linktext, ", ")
				table.insert(textarr, text)
				for _, derivengl in ipairs({{"vn", "Verbal noun"}, {"ap", "Active participle"}, {"pp", "Passive participle"}}) do
					local deriv = derivengl[1]
					local engl = derivengl[2]
					if props[deriv] then
						local text = "** " .. engl .. ": "
						local linktext = {}
						local splitheads = rsplit(props[deriv], "[,،]")
						for _, head in ipairs(splitheads) do
							-- Autogenerated forms may carry a manual transliteration as "FORM//TRANSLIT".
							local ar, translit = head:match("^(.*)//(.-)$")
							if not ar then
								ar = head
							end
							table.insert(linktext, m_links.full_link {
								lang = lang, term = ar, tr = translit, gloss = props[deriv .. "-gloss"]
							})
						end
						text = text .. table.concat(linktext, ", ")
						table.insert(textarr, text)
					end
				end
				table.insert(formtextarr, table.concat(textarr, "\n"))
			end
		end
	end

	return table.concat(formtextarr, "\n")
end

-- Infer radicals from lemma headword (i.e. 3rd masculine singular past) and verb form (I, II, etc.). Throw an error if
-- headword is malformed. A given returned radical may actually be a list of possible radicals, where the first one
-- should be used if the user didn't explicitly give the radical. If the list contains a field `ambig = true`, the
-- radical is considered ambiguous and should not be categorized. `is_reduced` indicates that the user specified
-- `.reduced` to indicate that the verb form is reduced by assimilation and/or haplology (typically archaic Koranic
-- forms such as اِدَّارَأَ instead of تَدَارَأَ; or اِسْطَاعَ instead of اِسْتِطَاعَ; etc.).
function export.infer_radicals(data)
	local headword, vform, passive, past_vowel, nonpast_vowel, is_reduced =
		data.headword, data.vform, data.passive, data.past_vowel, data.nonpast_vowel, data.is_reduced
	past_vowel = past_vowel or "-"
	nonpast_vowel = nonpast_vowel or "-"
	local function verify_vowel(vowel, param)
		if vowel ~= A and vowel ~= I and vowel ~= U and vowel ~= "-" then
			error(("Internal error: Bad value for %s: %s (should be Arabic diacritic vowel or '-')"):format(
				param, vowel))
		end
	end
	verify_vowel(past_vowel, "past_vowel")
	verify_vowel(nonpast_vowel, "nonpast_vowel")

	local ch = {}
	local form_viii_assim, variant
	-- sub out alif-madda for easier processing
	headword = rsub(headword, AMAD, HAMZA .. ALIF)

	-- Throw an error with message `msg`, annotated with the headword and verb form unless suppressed through `noann`
	-- ("nohead", "novform" or "nohead-vform").
	local function infer_err(msg, noann)
		local anns = {}
		local nohead, novform
		if noann == "nohead" then
			nohead = true
		elseif noann == "novform" then
			novform = true
		elseif noann == "nohead-vform" then
			nohead = true
			novform = true
		elseif noann then
			error(("Internal error: Unrecognized value for 'noann': %s"):format(dump(noann)))
		end
		if not nohead then
			table.insert(anns, ("headword=%s"):format(data.headword))
		end
		if not novform then
			table.insert(anns, ("verb form=%s"):format(data.vform))
		end
		anns = table.concat(anns, ", ")
		if anns ~= "" then
			anns = ": " .. anns
		end
		error(msg .. anns)
	end

	local len = ulen(headword)
	local expected_length

	-- extract the headword letters into an array
	for i = 1, len do
		table.insert(ch, usub(headword, i, i))
	end

	-- check that the letter at the given index is the given string, or
	-- is one of the members of the given array
	local function check(index, must)
		local letter = ch[index]
		-- Fail early with a clear message if the headword is too short; this covers both the string and the array
		-- form of `must` (the array branch would otherwise try to concatenate a nil letter into its message).
		if not letter then
			infer_err("Letter " .. index .. " is nil")
		end
		if type(must) == "string" then
			if letter ~= must then
				infer_err(("For verb form %s, letter %s must be %s, not %s"):format(vform, index, must, letter),
					"novform")
			end
		elseif not m_table.contains(must, letter) then
			infer_err("For verb form " .. vform .. ", radical " .. index .. " must be one of " ..
				table.concat(must, " ") .. ", not " .. letter, "novform")
		end
	end

	-- Check that length of headword is within [min, max]
	local function check_len(min, max)
		if min and len < min then
			infer_err(("Not enough letters for verb form %s, expected at least %s"):format(vform, min), "novform")
		end
		if max and len > max then
			infer_err(("Too many letters for verb form %s, expected at most %s"):format(vform, max), "novform")
		end
	end

	-- If the vowels are i~a or u~u, a form I verb beginning with w- normally keeps the w in the non-past. Otherwise it
	-- loses it (i.e. it is "assimilated").
	local function form_I_w_non_assimilated()
		return req(past_vowel, I) and req(nonpast_vowel, A) or req(past_vowel, U) and req(nonpast_vowel, U)
	end

	-- Convert radicals to canonical form (handle various hamza varieties and check for misplaced alif or alif maqṣūra;
	-- legitimate cases of these letters are handled above).
	local function convert(rad, index)
		if type(rad) == "table" then
			for i, r in ipairs(rad) do
				rad[i] = convert(r, index)
			end
			return rad
		elseif rad == HAMZA_ON_ALIF or rad == HAMZA_UNDER_ALIF or rad == HAMZA_ON_W or rad == HAMZA_ON_Y then
			return HAMZA
		elseif rad == AMAQ then
			infer_err("Radical " .. index .. " must not be alif maqṣūra")
		elseif rad == ALIF then
			infer_err("Radical " .. index .. " must not be alif")
		else
			return rad
		end
	end

	local quadlit = vform:find("q$")

	-- find first radical, start of second/third radicals, check for
	-- required letters
	local radstart, rad1, rad2, rad3, rad4
	local weakness
	if vform == "I" or vform == "II" then
		rad1 = ch[1]
		radstart = 2
	elseif vform == "III" then
		rad1 = ch[1]
		check(2, {ALIF, W}) -- W occurs in passive-only verbs
		radstart = 3
	elseif vform == "IV" then
		-- this would be alif-madda but we replaced it with hamza-alif above.
		if ch[1] == HAMZA and ch[2] == ALIF then
			rad1 = HAMZA
		else
			check(1, HAMZA_ON_ALIF)
			rad1 = ch[2]
		end
		radstart = 3
	elseif vform == "V" then
		check(1, is_reduced and ALIF or T)
		rad1 = ch[2]
		radstart = 3
	elseif vform == "VI" then
		check(1, is_reduced and ALIF or T)
		if ch[2] == AMAD then
			rad1 = HAMZA
			radstart = 3
		else
			rad1 = ch[2]
			check(3, {ALIF, W}) -- W occurs in passive-only verbs
			radstart = 4
		end
	elseif vform == "VII" then
		check(1, ALIF)
		if is_reduced then
			check(2, M)
			rad1 = M
			radstart = 3
		else
			check(2, N)
			rad1 = ch[3]
			radstart = 4
		end
	elseif vform == "VIII" then
		check(1, ALIF)
		rad1 = ch[2]
		if rad1 == "د" then
			rad1 = {"د", "ذ"} -- not considered ambiguous since it's usually د
			radstart = 3
			form_viii_assim = "دّ"
		elseif rad1 == "ظ" and ch[3] == "ط" and len >= 5 then -- [[اظطلم]], variant of [[اظلم]]
			radstart = 4
			form_viii_assim = "ظْط"
		elseif rad1 == "ذ" and ch[3] == "د" and len >= 5 then -- [[اذدكر]], variant of [[اذكر]]
			radstart = 4
			form_viii_assim = "ذْد"
		elseif rad1 == T or rad1 == "ث" or rad1 == "ذ" or rad1 == "ط" or rad1 == "ظ" then
			radstart = 3
			form_viii_assim = rad1 .. SH
		elseif rad1 == "ز" then
			check(3, "د")
			radstart = 4
			form_viii_assim = "زْد"
		elseif rad1 == "ص" or rad1 == "ض" then
			check(3, "ط")
			radstart = 4
			form_viii_assim = rad1 .. SK .. "ط"
		else
			check(3, T)
			radstart = 4
			rad1 = convert(rad1, 1)
			form_viii_assim = rad1 .. SK .. "ت"
		end
		if rad1 == T then
			-- Radical is ambiguous, might be ت or و or ي but doesn't affect conjugation. Note that there are no
			-- form-VIII verbs with initial radical ي given in Hans Wehr but Lane mentions at least:
			-- - (page 2973) اِتَّأَسَ, with assimilation of the ي to ت, from root ي ء س;
			-- - (page 2975) اِتَّبَسَ non-past يَتَّبِسُ and alternative اِيتَبَسَ non-past يَاتَبِسُ from the root ي ب س;
			-- - (page 2976) اِتَّسَرَ non-past يَتَّسِرُ or alternatively يَأْتَسِرُ with hamza preserved from the root ي س ر.
			-- These alternative forms seem very rare and probably not worth worrying about, but if we want to handle
			-- them, we can do it when the time comes.
			rad1 = {T, W, Y, ambig = true}
			-- اِتَّخَذَ irregularly has hamza as the radical but assimilates like و
			if ch[3] == "خ" and ch[4] == "ذ" then
				rad1[4] = HAMZA
			end
		end
	elseif vform == "IX" then
		check(1, ALIF)
		rad1 = ch[2]
		radstart = 3
	elseif vform == "X" then
		check(1, ALIF)
		check(2, S)
		if is_reduced then
			rad1 = ch[3]
			radstart = 4
		else
			check(3, T)
			rad1 = ch[4]
			radstart = 5
		end
	elseif vform == "Iq" then
		rad1 = ch[1]
		rad2 = ch[2]
		radstart = 3
	elseif vform == "IIq" then
		check(1, T)
		rad1 = ch[2]
		rad2 = ch[3]
		radstart = 4
	elseif vform == "IIIq" then
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		check(4, N)
		radstart = 5
	elseif vform == "IVq" then
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		radstart = 4
	elseif vform == "XI" then
		check_len(5, 5)
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		check(4, ALIF)
		rad3 = ch[5]
		weakness = "sound"
	elseif vform == "XII" then
		check(1, ALIF)
		rad1 = ch[2]
		if ch[3] ~= ch[5] then
			infer_err("For verb form XII, letters 3 and 5 should be the same", "novform")
		end
		check(4, W)
		radstart = 5
	elseif vform == "XIII" then
		check_len(5, 5)
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		check(4, W)
		rad3 = ch[5]
		if rad3 == AMAQ then
			weakness = "final-weak"
		else
			weakness = "sound"
		end
	elseif vform == "XIV" then
		check_len(6, 6)
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		check(4, N)
		rad3 = ch[5]
		if ch[6] == AMAQ then
			check_waw_ya(rad3)
			weakness = "final-weak"
		else
			if ch[5] ~= ch[6] then
				infer_err("For verb form XIV, letters 5 and 6 should be the same", "novform")
			end
			weakness = "sound"
		end
	elseif vform == "XV" then
		check_len(6, 6)
		check(1, ALIF)
		rad1 = ch[2]
		rad2 = ch[3]
		check(4, N)
		rad3 = ch[5]
		if rad3 == Y then
			check(6, ALIF)
		else
			check(6, AMAQ)
		end
		weakness = "sound"
	else
		error("Internal error: Unrecognized verb form " .. vform)
	end

	-- Process the last two radicals. RADSTART is the index of the first of the two.
	-- If it's nil then all radicals have already been processed above, and we don't do anything.
	if radstart then
		-- There must (normally) be one or two letters left.
		if len == radstart then
			if vform == "I" and ch[len] == Y then
				-- short form حَيَّ
				weakness = "final-weak"
				rad2 = Y
				rad3 = Y
				variant = "short"
			elseif vform == "IV" and rad1 == "ر" and ch[len] == AMAQ then
				-- irregular verb أَرَى
				weakness = "final-weak"
				rad2 = HAMZA
				rad3 = Y
			elseif vform == "X" and rad1 == "ح" and ch[len] == AMAQ then
				-- irregular verb اِسْتَحَى
				weakness = "final-weak"
				rad2 = Y
				rad3 = Y
				variant = "short"
			else
				-- If one letter left, then it's a geminate verb. If the letter is alif or alif maqṣūra, it will trigger
				-- an error down the line.
				if vform_supports_geminate(vform) then
					weakness = "geminate"
					rad2 = ch[len]
					rad3 = ch[len]
					if vform == "III" or vform == "VI" then
						variant = "short"
					end
				else
					infer_err("Apparent geminate verb, but geminate verbs not allowed for this verb form")
				end
			end
		elseif quadlit then
			-- Process last two radicals of a quadriliteral verb form.
			rad3 = ch[radstart]
			rad4 = ch[radstart + 1]
			expected_length = radstart + 1
			check_len(expected_length)
			if rad4 == AMAQ or rad4 == ALIF and rad3 == Y or rad4 == Y then
				-- rad4 can be Y in passive-only verbs.
				if vform_supports_final_weak(vform) then
					weakness = "final-weak"
					-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws in a row); it could be wāw or
					-- yāʾ, but doesn't affect the conjugation.
					rad4 = rad3 == W and {Y, W, ambig = true} or {W, Y, ambig = true}
				else
					infer_err("Last radical is " .. rad4 .. " but verb form " .. vform ..
						" doesn't support final-weak verbs", "novform")
				end
			else
				weakness = "sound"
			end
		else
			-- Process last two radicals of a triliteral verb form.
			rad2 = ch[radstart]
			rad3 = ch[radstart + 1]
			expected_length = radstart + 1
			check_len(expected_length)
			if vform == "I" and (is_waw_ya(rad3) or rad3 == ALIF or rad3 == AMAQ) then
				local inferred_past_vowel, inferred_nonpast_vowel
				-- Check for final-weak form I verb. It can end in tall alif (rad3 = wāw) or alif maqṣūra (rad3 = yāʾ)
				-- or a wāw or yāʾ (with a past vowel of i or u, e.g. nasiya/yansā "forget" or with a passive-only
				-- verb).
				if rad1 == W and not form_I_w_non_assimilated() then
					weakness = "assimilated+final-weak"
				else
					weakness = "final-weak"
				end
				if rad3 == ALIF then
					rad3 = W
					inferred_past_vowel = A
					inferred_nonpast_vowel = U
					if is_passive_only(passive) then
						infer_err("Final-weak form-I passive verbs should end in yāʔ (ي), not tall alif (ا)", "novform")
					end
				elseif rad3 == AMAQ then
					rad3 = Y
					inferred_past_vowel = A
					inferred_nonpast_vowel = I
					if is_passive_only(passive) then
						infer_err("Final-weak form-I passive verbs should end in yāʔ (ي), not alif maqṣūra (ى)", "novform")
					end
				elseif rad1 == "ح" and rad2 == Y and rad3 == Y then
					-- Long variant حَيِيَ.
					inferred_past_vowel = I
					inferred_nonpast_vowel = A
					variant = "long"
				else
					if not is_passive_only(passive) then
						-- does a non-passive final-weak verb in -uwa ever happen? (YES: e.g. [[رجو]] "to be slack")
						inferred_past_vowel = rad3 == Y and I or U
						inferred_nonpast_vowel = A
					end
					-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws); it could be wāw or yāʾ, but
					-- doesn't affect the conjugation.
					rad3 = (rad1 == W or rad2 == W) and {Y, W, ambig = true} or {W, Y, ambig = true} -- ambiguous
				end
				if inferred_past_vowel then
					local raw_past_vowel = rget(past_vowel)
					local raw_nonpast_vowel = rget(nonpast_vowel)
					if raw_past_vowel ~= "-" then
						if raw_past_vowel ~= inferred_past_vowel then
							infer_err(("Final-weak form-I verb inferred past vowel %s, which disagrees with " ..
								"explicitly specified %s"):format(undia[inferred_past_vowel], undia[raw_past_vowel]),
								"novform")
						else
							-- in case of footnote in past_vowel
							inferred_past_vowel = past_vowel
						end
					end
					if raw_nonpast_vowel ~= "-" and raw_nonpast_vowel ~= A and inferred_nonpast_vowel == U then
						-- if inferred as I or A, the reality can be the reverse; form-I final-weak verbs with a~a and
						-- i~i exist, e.g. سَعَى/يَسْعَى, وَلِيَ/يَلِي. Weird verb [[صها]] (also written [[صهى]]) has non-past
						-- يصهى so we can't throw an error in this situation.
						if raw_nonpast_vowel ~= inferred_nonpast_vowel then
							infer_err(("Final-weak form-I verb inferred non-past vowel %s, which disagrees with " ..
								"explicitly specified %s"):format(undia[inferred_nonpast_vowel],
								undia[raw_nonpast_vowel]), "novform")
						else
							-- in case of footnote in nonpast_vowel
							inferred_nonpast_vowel = nonpast_vowel
						end
					end
				end
				if not is_passive_only(passive) then
					if rget(past_vowel) == "-" then
						past_vowel = inferred_past_vowel
					end
					if rget(nonpast_vowel) == "-" then
						nonpast_vowel = inferred_nonpast_vowel
					end
				end
			elseif vform == "IX" and is_waw_ya(rad3) and len == radstart + 2 and ch[len] == AMAQ then
				-- Final-weak form IX verbs like اِرْعَوَى "to desist, to repent, to see the light".
				weakness = "final-weak"
				expected_length = radstart + 2
			elseif vform == "X" and rad1 == "ح" and rad2 == Y and rad3 == ALIF then
				-- Long variant اِسْتَحْيَا.
				weakness = "final-weak"
				rad3 = Y
				variant = "long"
			elseif rad3 == AMAQ or rad2 == Y and rad3 == ALIF or rad3 == Y then
				-- rad3 == Y happens in passive-only verbs.
				if vform_supports_final_weak(vform) then
					weakness = "final-weak"
				else
					infer_err("Last radical is " .. rad3 .. " but verb form doesn't support final-weak verbs")
				end
				-- Ambiguous radical; randomly pick wāw as radical (but avoid two wāws); it could be wāw or yāʾ, but
				-- doesn't affect the conjugation.
rad3 = (rad1 == W or rad2 == W) and {Y, W, ambig = true} or {W, Y, ambig = true} elseif rad2 == ALIF then if vform_supports_hollow(vform) then weakness = "hollow" local function set_past_to_a() if req(past_vowel, A) then -- already set elseif req(past_vowel, "-") or req(past_vowel, rget(nonpast_vowel)) then past_vowel = A else infer_err(("Form I hollow verb with nonpast vowel set to '%s' must have past vowel set to 'a' or the same value, not %s"): format(undia[rget(nonpast_vowel)], undia[rget(past_vowel)]), "novform") end end if vform == "I" and req(nonpast_vowel, U) then rad2 = W set_past_to_a() elseif vform == "I" and req(nonpast_vowel, I) then rad2 = Y set_past_to_a() else if req(nonpast_vowel, A) and not req(past_vowel, I) then infer_err(("Form I hollow verb with nonpast vowel set to 'a' must have past vowel set to 'i', not %s"): format(undia[rget(past_vowel)]), "novform") end -- Ambiguous radical; could be wāw or yāʾ; if verb form I, it's critical to get this right, and -- the caller checks for this situation and throws an error if non-past vowel is "a" and second -- radical isn't explicitly given. 
rad2 = {W, Y, ambig = true, need_radical = true} end else infer_err("Second radical is alif but verb form doesn't support hollow verbs") end elseif vform == "I" and rad1 == W and not form_I_w_non_assimilated() then weakness = "assimilated" elseif rad2 == rad3 and (vform == "III" or vform == "VI") then weakness = "geminate" variant = "long" else weakness = "sound" end end if expected_length then check_len(expected_length, expected_length) end end rad1 = convert(rad1, 1) rad2 = convert(rad2, 2) rad3 = convert(rad3, 3) rad4 = convert(rad4, 4) if not weakness then error("Internal error: Returned weakness from infer_radicals() is nil") end return { weakness = weakness, rad1 = rad1, rad2 = rad2, rad3 = rad3, rad4 = rad4, past_vowel = past_vowel, nonpast_vowel = nonpast_vowel, form_viii_assim = form_viii_assim, variant = variant, } end -- bot interface to infer_radicals() function export.infer_radicals_json(frame) local iparams = { headword = {}, vform = {}, passive = {}, past_vowel = {}, nonpast_vowel = {}, is_reduced = {type = "boolean"}, } local iargs = require("Module:parameters").process(frame.args, iparams) return require("Module:JSON").toJSON(export.infer_radicals(iargs)) end -- Infer vocalization from participle headword (active or passive), verb form (I, II, etc.) and whether the headword is -- active or passive. Throw an error if headword is malformed. Returned radicals may contain Latin letters "t", "w" or "y" -- indicating ambiguous radicals guessed to be tāʾ, wāw or yāʾ respectively. function export.infer_participle_vocalization(headword, vform, weakness, is_active) local chars = {} local orig_headword = headword -- Sub out alif-madda for easier processing. headword = rsub(headword, AMAD, HAMZA .. ALIF) local len = ulen(headword) -- Extract the headword letters into an array. 
for i = 1, len do table.insert(chars, usub(headword, i, i)) end local function form_intro_error_msg() return ("For verb form %s %s%s participle %s, "):format(vform, orig_headword ~= headword and "normalized " or "", is_active and "active" or "passive", headword) end local function err(msg) error(form_intro_error_msg() .. msg, 1) end -- Check that length of headword is within [min, max]. local function check_len(min, max) if min and len < min then err(("expected at least %s letters but saw %s"):format(min, len)) elseif max and len > max then err(("expected at most %s letters but saw %s"):format(max, len)) end end -- Get the character at `ind`, making sure it exists. local function c(ind) check_len(ind) return chars[ind] end -- Check that the letter at the given index is the given string, or is one of the members of the given array local function check(index, must) local letter = chars[index] local function make_possible_values() if type(must) == "string" then return must else return list_to_text(must, nil, " or ") end end if not letter then err(("expected a letter (specifically %s) at position %s, but participle is too short"):format( make_possible_values(), index)) end local matches if type(must) == "string" then matches = letter == must else matches = m_table.contains(must, letter) end if not matches then err(("letter %s at index %s must be %s"):format(letter, index, make_possible_values())) end end local function check_weakness(values, allow_missing, invert_condition) local function make_possible_weaknesses() for i, val in ipairs(values) do values[i] = "'" .. val .. 
"'" end return list_to_text(values, nil, " or ") end if allow_missing and invert_condition then error("Internal error: Can't specify both allow_missing and invert_condition") end if not weakness then if allow_missing or invert_condition then return else err(("weakness is unspecified but must be %s"):format(make_possible_weaknesses())) end else local matches = m_table.contains(values, weakness) if invert_condition and matches then err(("weakness '%s' must not be %s"):format(weakness, make_possible_weaknesses())) elseif not invert_condition and not matches then err(("weakness '%s' must be %s"):format(weakness, make_possible_weaknesses())) end end end local vocalized local function handle_possibly_final_weak(sound_prefix, expected_length) check_len(expected_length, expected_length) if c(expected_length) == AMAQ then -- passive final-weak if is_active then err("participle in -ِى only allowed for passive participles") end check_weakness({"final-weak", "assimilated+final-weak"}, "allow missing") vocalized = sound_prefix .. AN .. AMAQ else -- all others behave as if sound check_weakness({"final-weak", "assimilated+final-weak"}, nil, "invert condition") vocalized = sound_prefix .. (is_active and I or A) .. c(expected_length) end end if not (vform == "I" and is_active) then -- all participles except verb form I active begin in م-. check(1, M) end if vform == "I" then if is_active then check(2, ALIF) local sound_prefix = c(1) .. AA .. c(3) if len == 3 then if c(3) == HAMZA then -- Either hollow with hamzated third radical, e.g. [[شاء]] active participle 'شَاءٍ', or final-weak -- with hamzated second radical, e.g. [[رأى]] active participle 'رَاءٍ'. Theoretically (?), also -- geminate with hamzated second/third radical, but I don't know if any such verbs exist. if weakness == "geminate" then vocalized = sound_prefix .. SH else check_weakness({"hollow", "final-weak"}, "allow missing") vocalized = sound_prefix .. 
IN end else check_weakness({"final-weak", "geminate"}) if weakness == "geminate" then vocalized = sound_prefix .. SH else vocalized = sound_prefix .. IN end end else check_len(4, 4) -- we will convert back to alif maqṣūra below as needed vocalized = sound_prefix .. I .. c(4) end else -- assimilated verbs: regular, e.g. مَوْزُون "weighed" -- geminate verbs: regular, e.g. مَبْلُول "moistened" -- third-hamzated verbs: مَبْرُوء -- hollow verbs: مَقُود "led, driven"; مَزِيد "added, increased" -- hollow first-hamzated verbs: مَئِيض "returned, reverted"; مَأْيُوس "despaired" (NOTE: formation is sound); -- مَأُود or مَؤُود "bent; depleted" -- hollow third-hamzated verbs: مَشِيء "willed, intended", مَضُوء "glittered?" -- final-weak: مَلْقِيّ "found, encountered"; مَصْغُوّ "inclined" -- hollow + final-weak: مَشْوِيّ "fried, grilled", مَهْوِيّ "loved" -- first-hamzated + hollow + final-weak: مَأْوِيّ "received hospitably" local sound_prefix = MA .. c(2) .. SK .. c(3) if len == 5 then -- sound, assimilated or geminate check(4, W) vocalized = sound_prefix .. UU .. c(5) else check_len(4, 4) if c(4) == W then -- final-weak third-wāw vocalized = sound_prefix .. U .. W .. SH elseif c(4) == Y then -- final-weak third-yāʾ vocalized = sound_prefix .. I .. Y .. SH else -- hollow check(3, {W, Y}) if c(3) == W then vocalized = MA .. c(2) .. UU .. c(4) else vocalized = MA .. c(2) .. II .. c(4) end end end end elseif vform == "II" or vform == "V" or vform == "XII" or vform == "XIII" or vform == "Iq" or vform == "IIq" or vform == "IIIq" then local sound_prefix, expected_length if vform == "II" then sound_prefix = MU .. c(2) .. A .. c(3) .. SH expected_length = 4 elseif vform == "V" then check(2, T) sound_prefix = MU .. T .. A .. c(3) .. A .. c(4) .. SH expected_length = 5 elseif vform == "XII" then -- e.g. 
			-- [[احدودب]] "to be or become convex or humpbacked", مُحْدَوْدِب (active);
			-- [[اثنونى]] "to be bent; to be doubled up", مُثْنَوْنٍ (active)
			check(4, W)
			if c(3) ~= c(5) then
				err(("third letter %s should be the same as the fifth letter %s"):format(c(3), c(5)))
			end
			sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. W .. SK .. c(5)
			expected_length = 6
		elseif vform == "XIII" then
			-- e.g. [[اخروط]] "to get entangled; to extend", مُخْرَوِّط (active), مُخْرَوَّط (passive)
			check(4, W)
			sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. W .. SH
			expected_length = 5
		elseif vform == "Iq" then
			sound_prefix = MU .. c(2) .. A .. c(3) .. SK .. c(4)
			expected_length = 5
		elseif vform == "IIq" then
			check(2, T)
			sound_prefix = MU .. T .. A .. c(3) .. A .. c(4) .. SK .. c(5)
			expected_length = 6
		elseif vform == "IIIq" then
			-- e.g. [[اخرنطم]] "to be proud and angry"
			-- the fourth letter is the infixed nūn that sound_prefix inserts below
			check(4, N)
			sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. N .. SK .. c(5)
			expected_length = 6
		else
			error("Internal error: Unhandled verb form " .. vform)
		end
		if len == expected_length - 1 then
			-- active final-weak
			if not is_active then
				err(("length-%s participle only allowed for active participles"):format(len))
			end
			check_weakness({"final-weak", "assimilated+final-weak"}, "allow missing")
			vocalized = sound_prefix .. IN
		else
			handle_possibly_final_weak(sound_prefix, expected_length)
		end
	elseif vform == "III" or vform == "VI" then
		local sound_prefix, expected_length
		if vform == "VI" then
			check(2, T)
			check(4, ALIF)
			sound_prefix = MU .. T .. A .. c(3) .. AA .. c(5)
			expected_length = 6
		else
			sound_prefix = MU .. c(2) .. AA .. c(4)
			expected_length = 5
		end
		if len == expected_length - 1 then
			-- active final-weak or active or passive geminate
			if is_active then
				check_weakness({"geminate", "final-weak", "assimilated+final-weak"})
				if weakness == "geminate" then
					vocalized = sound_prefix .. SH
				else
					vocalized = sound_prefix .. IN
				end
			else
				check_weakness({"geminate"}, "allow missing")
				vocalized = sound_prefix ..
SH end else handle_possibly_final_weak(sound_prefix, expected_length) end elseif vform == "IV" or vform == "X" then -- form IV: -- sound: مُرْسِخ (active, "entrenching"), مُرْسَخ (passive, "entrenched") -- first-hamzated (like sound): مُؤْيِس (active, "causing to despair"), مُؤْيَس (passive, "caused to despair") -- final-weak: مُكْرٍ (active, "renting out"), مُكْرًى (passive, "rented out") -- assimilated: مُورِد (active, "transferring"), مُورَد (passive, "transferred"); same when first-Y, e.g. -- أَيْقَنَ "to be certain of": مُوقِن (active), مُوقَن (passive) -- assimilated + final-weak: مُورٍ (active, "setting fire, kindling"), مُورًى (passive, "set fire, kindled") -- geminate: مُمِدّ (active, "granting, helping"), مُمَدّ (passive, "granted, helped") -- hollow: مُزِيل (active, "eliminating"), مُزَال (passive, "eliminated") -- hollow + final-weak: مُعْيٍ (active, "tiring"), مُعْيًى (passive, "tired") local sound_prefix, expected_length if vform == "X" then check(2, S) check(3, T) sound_prefix = MU .. S .. SK .. T .. A .. c(4) expected_length = 6 else sound_prefix = MU .. c(2) expected_length = 4 end if len == expected_length and c(len - 1) == Y and c(len) ~= AMAQ then -- active hollow if not is_active then err("this shape only allowed for active participles") end check_weakness({"hollow"}, "allow missing") vocalized = sound_prefix .. II .. c(len) elseif len == expected_length and c(len - 1) == ALIF then -- passive hollow if is_active then err("this shape only allowed for passive participles") end check_weakness({"hollow"}, "allow missing") vocalized = sound_prefix .. AA .. c(len) elseif len == expected_length - 1 then -- active final-weak or active or passive geminate if is_active then check_weakness({"geminate", "final-weak", "assimilated+final-weak"}) if weakness == "geminate" then vocalized = sound_prefix .. I .. c(len) .. SH elseif vform == "IV" and c(2) == W then -- assimilated final-weak vocalized = sound_prefix .. c(len) .. IN else vocalized = sound_prefix .. 
SK .. c(len) .. IN end else check_weakness({"geminate"}, "allow missing") vocalized = sound_prefix .. A .. c(len) .. SH end else if vform == "IV" and c(2) == W then -- assimilated, possibly final-weak sound_prefix = sound_prefix .. c(expected_length - 1) else sound_prefix = sound_prefix .. SK .. c(expected_length - 1) end handle_possibly_final_weak(sound_prefix, expected_length) end elseif vform == "VII" or vform == "VIII" then -- form VII (passive participles are fairly rare but do exist): -- sound: مُنْكَتِب (active "subscribing"), مُنْكَتَب (passive "subscribed") -- geminate: مُنْضَمّ (both active "joining, containing" and passive "joined, contained") -- final-weak: مُنْطَلٍ (active "fooling (someone)"), مُنْطَلًى (passive "fooled") -- final-weak with medial wāw: مُنْطَوٍ (active "involving"), مُنْطَوًى (passive "involved") -- hollow: مُنْقَاد (both active "complying with" and passive "complied with") -- -- for form VIII, the same variants exist but things are complicated by assimilations involving the template T. 
-- sound third-hamzated no assimilation: مُبْتَدِئ (active "beginning"), مُبْتَدَأ (passive "begun") -- geminate no assimilation: مُبْتَزّ (both active "robbing" and passive "robbed") -- final-weak no assimilation: مُبْتَنٍ (active "building"), مُبْتَنًى (passive "built") -- final-weak with medial wāw no assimilation: مُحْتَوٍ (active "containing"), مُحْتَوًى (passive "contained") -- hollow no assimilation: مُخْتَار (both active "choosing" and passive "chosen") -- -- sound with total assimilation: مُتَّبِع (active "following"), مُتَّبَع (passive "followed") -- sound with total assimilation, assimilating wāw: مُتَّعِد (active "threatening"), مُتَّعَد (passive "threatened") -- sound with total assimilation, irregularly assimilating hamza: مُتَّخِذ (active "taking"), مُتَّخَذ (passive "taken") -- sound with total assimilation (to ḏāl, producing dāl): مُدَّخِر (active "reserving"), مُدَّخَر (passive "reserved") -- sound with total assimilation (to ḏāl): مُذَّكِر (active "remembering"), مُذَّكَر (passive "remembered") -- sound with total assimilation (to ṭāʔ): مُطَّرِح (active "discarding"), مُطَّرَح (passive "discarded") -- sound with total assimilation (to ẓāʔ): مُظَّلِم (active "tolerating"), مُظَّلَم (passive "tolerated") -- final-weak with total assimilation, assimilating wāw: مُتَّقٍ (active "guarding against"), مُتَّقًى (passive "guarded against") -- final-weak with total assimilation (to ṯāʔ): مُثَّنٍ (active "undulating"), مُثَّنًى (passive "undulated") -- final-weak with total assimilation (to dāl): مُدَّعٍ (active "claiming"), مُدَّعًى (passive "claimed") -- sound with partial assimilation (to zayn): مُزْدَهِر (active "thriving"), مُزْدَهَر (passive "thrived") -- sound with medial wāw with partial assimilation (to zayn): مُزْدَوِج (active "appearing twice") -- sound with partial assimilation (to ṣād): مُصْطَبِح (active "illuminating"), مُصْطَبَح (passive, "illuminated") -- sound with partial assimilation (to ḍād): مُضْطَرِب (active "to be disturbed"; no 
passive) -- geminate with partial assimilation (to ṣād): مُصْطَبّ (both active "effusing" and passive "effused") -- geminate with partial assimilation (to ḍād): مُضْطَرّ (both active "forcing" and passive "forced") -- final-weak with partial assimilation (to ṣād): مُصْطَلٍ (active "warming"), مُصْطَلًى (passive "warmed") -- hollow with partial assimilation (to zayn): مُزْدَاد (both active "increasing" and passive "increased") -- hollow with partial assimilation (to ṣad): مُصْطَاد (both active "hunting" and passive "hunted") local sound_prefix, sufind if vform == "VII" then check(2, N) sound_prefix = MU .. N .. SK .. c(3) sufind = 4 else local c2 = c(2) if c2 == T or c2 == "د" or c2 == "ث" or c2 == "ذ" or c2 == "ط" or c2 == "ظ" then -- full assimilation sound_prefix = MU .. c2 .. SH sufind = 3 else -- partial or no assimilation if c2 == "ز" then check(3, "د") elseif c2 == "ص" or c2 == "ض" then check(3, "ط") else check(3, T) end sound_prefix = MU .. c2 .. SK .. c(3) sufind = 4 end end if c(sufind) == ALIF then -- hollow, active or passive check_len(sufind + 1, sufind + 1) check_weakness({"hollow"}, "allow missing") vocalized = sound_prefix .. AA .. c(sufind + 1) elseif len == sufind then -- active final-weak or active or passive geminate if is_active then check_weakness({"geminate", "final-weak", "assimilated+final-weak"}) if weakness == "geminate" then vocalized = sound_prefix .. A .. c(len) .. SH else vocalized = sound_prefix .. A .. c(len) .. IN end else check_weakness({"geminate"}, "allow missing") vocalized = sound_prefix .. A .. c(len) .. SH end else sound_prefix = sound_prefix .. A .. c(sufind) handle_possibly_final_weak(sound_prefix, sufind + 1) end elseif vform == "IX" then check_len(4, 4) vocalized = MU .. c(2) .. SK .. c(3) .. A .. c(4) .. SH elseif vform == "IVq" then -- e.g. 
		-- [[اذلعب]] "to scamper away", مُذْلَعِبّ (active), مُذْلَعَبّ (passive);
		-- [[اطمأن]] "to remain quietly; to be certain", مُطْمَئِنّ (active), مُطْمَأَنّ (passive)
		check_len(5, 5)
		local sound_prefix = MU .. c(2) .. SK .. c(3) .. A .. c(4)
		if is_active then
			vocalized = sound_prefix .. I .. c(5) .. SH
		else
			vocalized = sound_prefix .. A .. c(5) .. SH
		end
	elseif vform == "XI" then
		check_len(5, 5)
		check(4, ALIF)
		vocalized = MU .. c(2) .. SK .. c(3) .. AA .. c(5) .. SH
		-- e.g. [[احمار]] "to turn red, to blush", مُحْمَارّ (active)
	elseif vform == "XIV" or vform == "XV" then
		-- FIXME: Implement. No examples in Wiktionary currently; need to look up in a grammar.
		error("Support for verb form " .. vform .. " not implemented yet")
	else
		error("Don't recognize verb form " .. vform)
	end

	vocalized = rsub(vocalized, HAMZA .. AA, AMAD)
	local reconstructed_headword = lang:stripDiacritics(vocalized)
	if reconstructed_headword ~= orig_headword then
		error(("Internal error: Vocalized participle %s doesn't match original participle %s"):format(
			vocalized, orig_headword))
	end
	return vocalized
end

function export.infer_participle_vocalization_json(frame)
	local iparams = {
		[1] = {required = true},
		[2] = {required = true},
		["weakness"] = {},
		["passive"] = {type = "boolean"}
	}
	local iargs = require("Module:parameters").process(frame.args, iparams)
	return export.infer_participle_vocalization(iargs[1], iargs[2], iargs.weakness, not iargs.passive)
end

return export

Module:ar-translit 828 8167 27702 2026-06-21T14:19:22Z Umarxon III 2840 Page created 27702 Scribunto text/plain

-- Authors: Benwing, ZxxZxxZ, Atitarev

local export = {}

local m_str_utils = require("Module:string utilities")

local gcodepoint = m_str_utils.gcodepoint
local rfind = m_str_utils.find
local rsubn = m_str_utils.gsub
local rmatch = m_str_utils.match
local rsplit = m_str_utils.split
local U = m_str_utils.char
local unpack = unpack or table.unpack -- Lua 5.2 compatibility

-- assigned below
local has_diacritics

-- version of rsubn() that discards all but the first return value
local function rsub(term, foo, bar)
	local retval = rsubn(term, foo, bar)
	return retval
end

local zwnj = U(0x200C) -- zero-width non-joiner
local alif_madda = U(0x622)
local alif_hamza_below = U(0x625)
local alif = U(0x627)
local taa_marbuuTa = U(0x629)
local laam = U(0x644)
local waaw = U(0x648)
local alif_maqSuura = U(0x649)
local yaa = U(0x64A)
local fatHataan = U(0x64B)
local Dammataan = U(0x64C)
local kasrataan = U(0x64D)
local fatHa = U(0x64E)
local Damma = U(0x64F)
local kasra = U(0x650)
local shadda = U(0x651)
local sukuun = U(0x652)
local dagger_alif = U(0x670)
local alif_waSl = U(0x671)
--local zwj = U(0x200D) -- zero-width joiner
local lrm = U(0x200E) -- left-to-right mark
local rlm = U(0x200F) -- right-to-left mark
-- Occurs after al- in allaḏī and variants so that we can implement elision of
-- a- after a preceding vowel, after which we remove the marker.
local alladi_marker = U(0xFFF0)

local tt = {
	-- consonants
	["ب"]="b", ["ت"]="t", ["ث"]="ṯ", ["ج"]="j", ["ح"]="ḥ", ["خ"]="ḵ",
	["د"]="d", ["ذ"]="ḏ", ["ر"]="r", ["ز"]="z", ["س"]="s", ["ش"]="š",
	["ص"]="ṣ", ["ض"]="ḍ", ["ط"]="ṭ", ["ظ"]="ẓ", ["ع"]="ʕ", ["غ"]="ḡ",
	["ف"]="f", ["ق"]="q", ["ك"]="k", ["ڪ"]="k", ["ل"]="l", ["م"]="m", ["ن"]="n",
	["ه"]="h",
	-- tāʾ marbūṭa (special) - always after a fátḥa (a), silent at the end of
	-- an utterance, "t" in ʾiḍāfa or with pronounced tanwīn. We catch
	-- most instances of tāʾ marbūṭa before we get to this stage.
[taa_marbuuTa]="t", -- tāʾ marbūṭa = ة -- control characters [zwnj]="-", -- ZWNJ (zero-width non-joiner) -- [zwj]="", -- ZWJ (zero-width joiner) -- rare letters ["پ"]="p", ["چ"]="č", ["ژ"]="ž", ["ڤ"]="v", ["ڥ"]="v", ["گ"]="g", ["ڨ"]="g", ["ڧ"]="q", ["ڢ"]="f", ["ں"]="n", ["ڭ"]="g", -- semivowels or long vowels, alif, hamza, special letters ["ا"]="ā", -- ʾalif -- hamzated letters ["أ"]="ʔ", -- hamza over alif [alif_hamza_below]="ʔ", -- hamza under alif ["ؤ"]="ʔ", -- hamza over wāw ["ئ"]="ʔ", -- hamza over yā ["ء"]="ʔ", -- hamza on the line -- long vowels [waaw]="w", --"ū" after ḍamma (u) and not before diacritic [yaa]="y", --"ī" after kasra (i) and not before diacritic [alif_maqSuura]="ā", -- ʾalif maqṣūra [alif_madda]="ʔā", -- ʾalif madda [alif_waSl]= "", -- hamzatu l-waṣl [dagger_alif] = "ā", -- ʾalif xanjariyya = dagger ʾalif (Koranic diacritic) -- short vowels, šádda and sukūn [fatHataan]="an", -- fatḥatan [Dammataan]="un", -- ḍammatan [kasrataan]="in", -- kasratan [fatHa]="a", -- fatḥa [Damma]="u", -- ḍamma [kasra]="i", -- kasra -- šadda - doubled consonant [sukuun]="", --sukūn - no vowel -- ligatures ["ﻻ"]="lā", ["ﷲ"]="llāh", -- taṭwīl ["ـ"]="", -- taṭwīl, no sound -- numerals ["١"]="1", ["٢"]="2", ["٣"]="3", ["٤"]="4", ["٥"]="5", ["٦"]="6", ["٧"]="7", ["٨"]="8", ["٩"]="9", ["٠"]="0", -- punctuation (leave on separate lines) ["؟"]="?", -- question mark ["«"]='“', -- quotation mark ["»"]='”', -- quotation mark ["٫"]=".", -- decimal point ["٬"]=",", -- thousands separator ["٪"]="%", -- percent sign ["،"]=",", -- comma ["؛"]=";" -- semicolon } local sun_letters = "تثدذرزسشصضطظلن" -- For use in implementing sun-letter assimilation of ال (al-) local ttsun1 = {} local ttsun2 = {} local ttsun3 = {} for cp in gcodepoint(sun_letters) do local ch = U(cp) ttsun1[ch] = tt[ch] ttsun2["l-" .. ch] = tt[ch] .. "-" .. 
ch table.insert(ttsun3, tt[ch]) end -- For use in implementing elision of al- local sun_letters_tr = table.concat(ttsun3, "") local consonants_needing_vowels = "بتثجحخدذرزسشصضطظعغفقكڪلمنهپچژڤگڨڧڢںڭأإؤئءةﷲ" -- consonants on the right side; includes alif madda local rconsonants = consonants_needing_vowels .. "ويآ" -- consonants on the left side; does not include alif madda local lconsonants = consonants_needing_vowels .. "وي" -- Arabic semicolon, comma, question mark; taṭwīl; period, exclamation point, -- single quote for bold/italic, double quotes for quoted material local punctuation = "؟،؛" .. "ـ" .. ".!'" .. '"' local space_like = "%s'" .. '"' local space_like_class = "[" .. space_like .. "]" local numbers = "١٢٣٤٥٦٧٨٩٠" local before_diacritic_checking_subs = { ------------ transformations prior to checking for diacritics -------------- -- random Koranic marks and presentation forms {U(0x06E1), sukuun}, -- "Small High Dotless Head of Khah" (variant of sukūn) {U(0x06DA), ""}, -- "Small High Jeem" {U(0x06DF), ""}, -- "Small High Rounded Zero" (FIXME: correct?) {U(0x08F0), U(0x64B)}, -- "Open Fathatan" {U(0x08F1), U(0x64C)}, -- "Open Dammatan" {U(0x08F2), U(0x64D)}, -- "Open Kasratan" {U(0x06E4), ""}, -- "Small High Madda" (FIXME: correct?) {U(0x06D6), ""}, -- "Small High Ligature Sad with Lam with Alef Maksura" (FIXME: there are others we need to do) {U(0x06E5), "و"}, {U(0x06E6), "ي"}, -- convert llh for allāh into ll+shadda+dagger-alif+h {"لله", "للّٰه"}, -- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets -- replaced with short-vowel+shadda during NFC normalisation, which -- MediaWiki does for all Unicode strings; however, it makes the -- transliteration process inconvenient, so undo it. {"([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. dagger_alif .. "])" .. shadda, shadda .. "%1"}, -- ignore Koranic gemination at beginning of word due to assimilation of preceding consonant {" ([" .. lconsonants .. "])" .. 
shadda, " %1"}, -- ignore alif jamīla (otiose alif in 3pl verb forms) -- #1: handle ḍamma + wāw + alif (final -ū) {Damma .. waaw .. alif, Damma .. waaw}, -- #2: handle wāw + sukūn + alif (final -w in -aw in defective verbs) -- this must go before the generation of w, which removes the waw here. {waaw .. sukuun .. alif, waaw .. sukuun}, -- ignore final alif or alif maqṣūra following fatḥatan (e.g. in accusative -- singular or words like عَصًا "stick" or هُدًى "guidance"; this is called -- tanwin nasb) {fatHataan .. "[" .. alif .. alif_maqSuura .. "]", fatHataan}, -- same but with the fatḥatan placed over the alif or alif maqṣūra -- instead of over the previous letter (considered a misspelling but -- common) {"[" .. alif .. alif_maqSuura .. "]" .. fatHataan, fatHataan}, -- tāʾ marbūṭa should always be preceded by fatḥa, alif, alif madda or -- dagger alif; infer fatḥa if not {"([^" .. fatHa .. alif .. alif_madda .. dagger_alif .. "])" .. taa_marbuuTa, "%1" .. fatHa .. taa_marbuuTa}, -- similarly for alif between consonants, possibly marked with shadda -- (does not apply to initial alif, which is silent when not marked with -- hamza, or final alif, which might be pronounced as -an) {"([" .. lconsonants .. "]" .. shadda .. "?)" .. alif .. "([" .. rconsonants .. "])", "%1" .. fatHa .. alif .. "%2"}, -- infer fatḥa in case of non-fatḥa + alif/alif-maqṣūra + dagger alif {"([^" .. fatHa .. "])([" .. alif .. alif_maqSuura .. "]" .. dagger_alif .. ")", "%1" .. fatHa .. "%2"}, -- infer kasra in case of hamza-under-alif not + kasra {alif_hamza_below .. "([^" .. kasra .. kasrataan .. "])", alif_hamza_below .. kasra .. "%1"}, -- ignore dagger alif placed over regular alif or alif maqṣūra {"([" .. alif .. alif_maqSuura .. "])" .. dagger_alif, "%1"}, ----------- rest of these concern definite article alif-lām ---------- -- in kasra/ḍamma + alif + lam, make alif into hamzatu l-waṣl, so we -- handle cases like بِالتَّوْفِيق (bi-t-tawfīq) correctly {"([" .. Damma .. kasra .. "])" .. 
		alif .. laam, "%1" .. alif_waSl .. laam},
	-- al + consonant + shadda (only recognize word-initially if regular alif): remove shadda
	{"^(" .. alif .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
	{"(" .. space_like_class .. alif .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
	{"(" .. alif_waSl .. fatHa .. "?" .. laam .. "[" .. lconsonants .. "])" .. shadda, "%1"},
	-- handle l- hamzatu l-waṣl or word-initial al-
	{"^" .. alif .. fatHa .. "?" .. laam, "al-"},
	{"(" .. space_like_class .. ")" .. alif .. fatHa .. "?" .. laam, "%1al-"},
	-- next one for bi-t-tawfīq
	{"([" .. Damma .. kasra .. "])" .. alif_waSl .. fatHa .. "?" .. laam, "%1-l-"},
	-- next one for remaining hamzatu l-waṣl (at beginning of word)
	{alif_waSl .. fatHa .. "?" .. laam, "l-"},
	-- special casing if the l in al- has a shadda on it (as in الَّذِي "that"),
	-- so we don't mistakenly double the dash; insert a special marker here so
	-- that we know later to elide the a- after a vowel
	{"l%-" .. shadda, "l" .. alladi_marker .. "l"},
	-- implement assimilation of sun letters
	{"l%-[" .. sun_letters .. "]", ttsun2},
}

-- Transliterate the word(s) in TEXT. LANG (the language) and SC (the script)
-- are ignored. OMIT_I3RAAB means leave out final short vowels (ʾiʿrāb).
-- GRAY_I3RAAB means render the transliterated final short vowels (ʾiʿrāb) in gray.
-- FORCE_TRANSLIT causes even non-vocalized text to be transliterated
-- (normally the function checks for non-vocalized text and returns nil,
-- since such text is ambiguous in transliteration).
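--
-- Usage sketch (illustrative only, not part of the upstream module). It
-- assumes the module is saved as [[Module:ar-translit]]; the variable names
-- are hypothetical and no return values are asserted here.
--   local translit = require("Module:ar-translit")
--   -- vocalized input is transliterated
--   local full = translit.tr("كِتَابٌ")
--   -- with OMIT_I3RAAB set, final short vowels are left out
--   local bare = translit.tr("كِتَابٌ", nil, nil, true)
--   -- per the description above, unvocalized input should give nil ...
--   local none = translit.tr("كتاب")
--   -- ... unless FORCE_TRANSLIT is set
--   local forced = translit.tr("كتاب", nil, nil, nil, nil, true)
-- From wikitext the function can be reached through a frame, e.g.
--   {{#invoke:ar-translit|tr|كِتَابٌ}}
-- In that case positional arguments 4 and 5 are read as OMIT_I3RAAB and
-- FORCE_TRANSLIT, and GRAY_I3RAAB is not taken from the frame.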
function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit)
	-- make it possible to call this function from a template
	if type(text) == "table" then
		local function f(x) return (x ~= "") and x or nil end
		text, lang, sc, omit_i3raab, force_translit =
			f(text.args[1]), f(text.args[2]), f(text.args[3]), f(text.args[4]), f(text.args[5])
	end

	for _, sub in ipairs(before_diacritic_checking_subs) do
		text = rsub(text, sub[1], sub[2])
	end

	if not force_translit and not has_diacritics(text) then
		require("Module:debug").track("ar-translit/lacking diacritics")
		return nil
	end

	------------ transformations after checking for diacritics --------------
	-- Replace plain alif with hamzatu l-waṣl when followed by fatḥa/ḍamma/kasra.
	-- Must go after handling of initial al-, which distinguishes alif-fatḥa
	-- from alif w/hamzatu l-waṣl. Must go before generation of ū and ī, which
	-- eliminate the ḍamma/kasra.
	text = rsub(text, alif .. "([" .. fatHa .. Damma .. kasra .. "])", alif_waSl .. "%1")
	-- ḍamma + waw not followed by a diacritic is ū, otherwise w
	text = rsub(text, Damma .. waaw .. "([^" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. shadda .. sukuun .. dagger_alif .. "])", "ū%1")
	text = rsub(text, Damma .. waaw .. "$", "ū")
	-- kasra + yaa not followed by a diacritic (or ū from prev step) is ī, otherwise y
	text = rsub(text, kasra .. yaa .. "([^" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. shadda .. sukuun .. dagger_alif .. "ū])", "ī%1")
	text = rsub(text, kasra .. yaa .. "$", "ī")
	-- convert shadda to double letter.
	text = rsub(text, "(.)" .. shadda, "%1%1")
	if not omit_i3raab and gray_i3raab then -- show ʾiʿrāb grayed in transliteration
		-- decide whether to gray out the t in ﺓ. If word begins with al- or l-, yes.
		-- Otherwise, no if word ends in a/i/u, yes if ends in an/in/un.
		text = rsub(text, "^(a?l%-[^%s]+)" .. taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra ..
			"])", '%1<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%2')
		text = rsub(text, "(" .. space_like_class .. "a?l%-[^%s]+)" .. taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. "])", '%1<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%2')
		text = rsub(text, taa_marbuuTa .. "([" .. fatHa .. Damma .. kasra .. "])", "t%1")
		text = rsub(text, taa_marbuuTa .. "([" .. fatHataan .. Dammataan .. kasrataan .. "])", '<span style="color: var(--wikt-palette-grey-8,#888)">t</span>%1')
		text = rsub(text, ".", {
			[fatHataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">an</span>',
			[kasrataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">in</span>',
			[Dammataan] = '<span style="color: var(--wikt-palette-grey-8,#888)">un</span>'
		})
		text = rsub(text, "([" .. fatHa .. Damma .. kasra .. "])(" .. space_like_class .. ")",
			function(vowel, space)
				-- keep the lookup table local so it does not leak into the global environment
				local vowel_repl = {
					[fatHa] = '<span style="color: var(--wikt-palette-grey-8,#888)">a</span> ',
					[kasra] = '<span style="color: var(--wikt-palette-grey-8,#888)">i</span> ',
					[Damma] = '<span style="color: var(--wikt-palette-grey-8,#888)">u</span> '
				}
				return vowel_repl[vowel] .. space
			end
		)
		text = rsub(text, "[" .. fatHa .. Damma .. kasra .. "]$", {
			[fatHa] = '<span style="color: var(--wikt-palette-grey-8,#888)">a</span>',
			[kasra] = '<span style="color: var(--wikt-palette-grey-8,#888)">i</span>',
			[Damma] = '<span style="color: var(--wikt-palette-grey-8,#888)">u</span>'
		})
		text = rsub(text, '</span><span style="color: var(--wikt-palette-grey-8,#888)">', "")
	elseif omit_i3raab then -- omit ʾiʿrāb in transliteration
		text = rsub(text, "[" .. fatHataan .. Dammataan .. kasrataan .. "]", "")
		text = rsub(text, "[" .. fatHa .. Damma .. kasra .. "](" .. space_like_class .. ")", "%1")
		text = rsub(text, "[" .. fatHa .. Damma .. kasra ..
"]$", "") end -- tāʾ marbūṭa should not be rendered by -t if word-final even when -- ʾiʿrāb (desinential inflection) is shown; instead, use (t) before -- whitespace, nothing when final; but render final -ﺍﺓ and -ﺁﺓ as -āh, -- consistent with Wehr's dictionary -- Left-to-right or right-to-left mark at end of text will prevent tāʾ marbūṭa -- from being transliterated correctly. text = string.gsub(text, lrm, "") text = string.gsub(text, rlm, "") text = rsub(text, "([" .. alif .. alif_madda .. "])" .. taa_marbuuTa .. "$", "%1h") -- Ignore final tāʾ marbūṭa (it appears as "a" due to the preceding -- short vowel). Need to do this after graying or omitting word-final -- ʾiʿrāb. text = rsub(text, taa_marbuuTa .. "$", "") text = rsub(text, taa_marbuuTa .. "(%p)", "%1") if not omit_i3raab then -- show ʾiʿrāb in transliteration text = rsub(text, taa_marbuuTa .. "(" .. space_like_class .. ")", "(t)%1") else -- When omitting ʾiʿrāb, show all non-absolutely-final instances of -- tāʾ marbūṭa as (t), with trailing ʾiʿrāb omitted. text = rsub(text, taa_marbuuTa, "(t)") end -- tatwīl should be rendered as - at beginning or end of word. It will -- be rendered as nothing in the middle of a word (FIXME, do we want -- this?) text = rsub(text, "^ـ", "-") text = rsub(text, "(" .. space_like_class .. ")ـ", "%1-") text = rsub(text, "ـ$", "-") text = rsub(text, "ـ(" .. space_like_class .. ")", "-%1") -- Now convert remaining Arabic chars according to table. text = rsub(text, ".", tt) text = rsub(text, "aā", "ā") -- Implement elision of al- after a final vowel. We do this -- conservatively, only handling elision of the definite article and related -- terms (specifically, relative pronoun الَّذِي (allaḏī) and variants) rather -- than elision in other cases of hamzat al-waṣl (e.g. 
	-- form-I imperatives
	-- or form-VII and above verbal nouns) partly because elision in
	-- these cases isn't so common in MSA and partly to avoid excessive
	-- elision in case of words written with initial bare alif instead of
	-- properly with hamzated alif. Possibly we should reconsider.
	text = rsub(text, "([aiuāīū]'* +'*)a([" .. sun_letters_tr .. "][%-" .. alladi_marker .. "])", "%1%2")
	if gray_i3raab then
		text = rsub(text, "([aiuāīū]'*</span>'* +'*)a([" .. sun_letters_tr .. "][%-" .. alladi_marker .. "])", "%1%2")
	end
	-- remove indicator of allaḏī, which has served its purpose
	text = rsub(text, alladi_marker, "")
	-- Special-case the transliteration of allāh, without the hyphen.
	text = rsub(text, "^(a?)l%-lāh", "%1llāh")
	text = rsub(text, "(" .. space_like_class .. "a?)l%-lāh", "%1llāh")
	-- Compress multiple spaces, which may occur e.g. when removing Koranic diacritics.
	text = rsub(text, "(%s)%s+", "%1")

	return text
end

local has_diacritics_subs = {
	-- FIXME! What about lam-alif ligature?
	-- remove punctuation and shadda
	-- must go before removing final consonants
	{"[" .. punctuation .. shadda .. "]", ""},
	-- Remove consonants at end of word or utterance, so that we're OK with
	-- words lacking iʿrāb (must go before removing other consonants).
	-- If you want to catch places without iʿrāb, comment out the next two lines.
	{"[" .. lconsonants .. "]$", ""},
	{"[" .. lconsonants .. "]([%)%]}]?" .. space_like_class .. ")", "%1"},
	-- remove consonants (or alif) when followed by diacritics
	-- must go after removing shadda
	-- do not remove the diacritics yet because we need them to handle
	-- long-vowel sequences of diacritic + pseudo-consonant
	{"[" .. lconsonants .. alif .. "]([" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. sukuun .. dagger_alif .. "])", "%1"},
	-- the following two must go after removing consonants w/diacritics because
	-- we only want to treat vocalic wāw/yā' in them (we want to have removed
	-- wāw/yā' followed by a diacritic)
	-- remove ḍamma + wāw
	{Damma .. waaw, ""},
	-- remove kasra + yā'
	{kasra .. yaa, ""},
	-- remove fatḥa/fatḥatan + alif/alif-maqṣūra
	{"[" .. fatHataan .. fatHa .. "][" .. alif .. alif_maqSuura .. "]", ""},
	-- remove diacritics
	{"[" .. fatHataan .. Dammataan .. kasrataan .. fatHa .. Damma .. kasra .. sukuun .. dagger_alif .. "]", ""},
	-- remove numbers, hamzatu l-waṣl, alif madda
	{"[" .. numbers .. "ٱ" .. "آ" .. "]", ""},
	-- remove non-Arabic characters
	{"[^" .. U(0x0600) .. "-" .. U(0x06FF) .. U(0x0750) .. "-" .. U(0x077F) ..
		U(0x08A0) .. "-" .. U(0x08FF) .. U(0xFB50) .. "-" .. U(0xFDFF) ..
		U(0xFE70) .. "-" .. U(0xFEFF) .. "]", ""}
}

-- declared as local above
function has_diacritics(text)
	local orig_text = text
	local count
	text, count = rsubn(text, "[" .. lrm .. rlm .. "]", "")
	if count > 0 then
		require("Module:debug").track("ar-translit/lrm or rlm")
	end
	for _, sub in ipairs(has_diacritics_subs) do
		text = rsub(text, unpack(sub))
	end
	if #text > 0 then
		mw.log(("Check for missing diacritics failed; original text '%s', text without diacritics '%s'"):format(
			orig_text, text))
	end
	return #text == 0
end

-- Return true if transliteration TR is an irregular transliteration of
-- ARABIC. Return false if ARABIC can't be transliterated. For purposes of
-- establishing regularity, hyphens are ignored and word-final tāʾ marbūṭa
-- can be transliterated as "(t)", "" or "t".
function export.irregular_translit(arabic, tr)
	if not arabic or arabic == "" or not tr or tr == "" then
		return false
	end
	local regtr = export.tr(arabic)
	if not regtr or regtr == tr then
		return false
	end
	local arwords = rsplit(arabic, " ")
	local regwords = rsplit(regtr, " ")
	local words = rsplit(tr, " ")
	if #regwords ~= #words or #regwords ~= #arwords then
		return true
	end
	for i=1,#regwords do
		local regword = regwords[i]
		local word = words[i]
		local arword = arwords[i]
		-- Resolve final (t) in auto-translit to t, h or nothing
		if rfind(regword, "%(t%)$") then
			regword = rfind(word, "āh$") and rsub(regword, "%(t%)$", "h") or
				rfind(word, "t$") and rsub(regword, "%(t%)$", "t") or
				rsub(regword, "%(t%)$", "")
		end
		-- Resolve clitics + short a + alif-lām, which may get auto-transliterated
		-- to contain long ā, to short a if the manual translit has it; note
		-- that currently in cases with assimilated l, the auto-translit will
		-- fail, so we won't ever get here and don't have to worry about
		-- auto-translit l against manual-translit assimilated char.
		local clitic_chars = "^[وفكل]" -- separate line to avoid L2R display weirdness
		if rfind(arword, clitic_chars .. fatHa .. "?[" .. alif .. alif_waSl .. "]" ..
			laam) and rfind(word, "^[wfkl]a%-") then
			regword = rsub(regword, "^([wfkl])ā", "%1a")
		end
		-- Ignore hyphens when comparing
		if rsub(regword, "%-", "") ~= rsub(word, "%-", "") then
			return true
		end
	end
	return false
end

return export

Module:ar-headword 828 8168 27703 2026-06-21T14:20:28Z Umarxon III 2840 Page created 27703 Scribunto text/plain

-- Author: primarily Benwing2; some work by Fenakhay, Erutuon; early version by Rua

local export = {}
local pos_functions = {}

local force_cat = false -- for testing; if true, categories appear in non-mainspace pages

local ar_translit = require("Module:ar-translit")
local ar_verb_module = "Module:ar-verb"
local ar_utilities_module = "Module:ar-utilities"
local ar = require(ar_utilities_module)
local en_utilities_module = "Module:en-utilities"
local headword_module = "Module:headword"
local headword_utilities_module = "Module:headword utilities"
local links_module = "Module:links"
local inflection_utilities_module = "Module:inflection utilities"
local parse_utilities_module = "Module:parse utilities"

local require_when_needed = require("Module:utilities/require when needed")
local remove_links = require_when_needed(links_module, "remove_links")
local m_table = require("Module:table")
local m_str_utils = require("Module:string utilities")
local m_en_utilities = require_when_needed(en_utilities_module)
local m_headword_utilities = require_when_needed(headword_utilities_module)
local glossary_link =
	require_when_needed(headword_utilities_module, "glossary_link")
local boolean_param = {type = "boolean"}

local list_to_set = m_table.listToSet
local rfind = m_str_utils.find
local rmatch = m_str_utils.match
local rsubn = m_str_utils.gsub
local u = m_str_utils.char
local rsplit = m_str_utils.split
local insert = table.insert
local concat = table.concat
local unpack = unpack or table.unpack -- Lua 5.2 compatibility

local langcode = "ar"
local lang = require("Module:languages").getByCode(langcode)
local langname = lang:getCanonicalName()

local TEMPCOMMA = u(0xFFF0)
local TEMPARCOMMA = u(0xFFF1)

local misc_pos_with_gender = list_to_set {
	"suffixes",
	"adjective forms",
	"noun forms",
	"proper noun forms",
	"pronoun forms",
	"determiner forms",
}

-----------------------------------------------------------------------------------------
--                                  Utility functions                                  --
-----------------------------------------------------------------------------------------

local dump = mw.dumpObject

-- version of mw.ustring.gsub() that discards all but the first return value
local function rsub(term, foo, bar)
	local retval = rsubn(term, foo, bar)
	return retval
end

local function ine(val)
	if val == "" then return nil else return val end
end

-- Replace comma with a temporary char in comma + whitespace.
local function escape_comma_whitespace(run)
	local escaped = false

	if run:find("\\,") then
		run = run:gsub("\\,", "\\" .. TEMPCOMMA)
		escaped = true
	end
	if run:find("\\،") then
		run = run:gsub("\\،", "\\" .. TEMPARCOMMA)
		escaped = true
	end
	if run:find(",%s") then
		run = run:gsub(",(%s)", TEMPCOMMA .. "%1")
		escaped = true
	end
	if run:find("،%s") then
		run = run:gsub("،(%s)", TEMPARCOMMA .. "%1")
		escaped = true
	end
	return run, escaped
end

-- Undo replacement of comma with a temporary char in comma + whitespace.
local function unescape_comma_whitespace(run)
	return (run:gsub(TEMPCOMMA, ","):gsub(TEMPARCOMMA, "،"))
end

-- Split an argument on comma or Arabic comma, but not either type of comma followed by whitespace.
local function split_on_comma(val)
	if rfind(val, "[,،]%s") or val:find("\\") then
		return export.split_escaping(val, "[,،]", false, escape_comma_whitespace, unescape_comma_whitespace)
	else
		return rsplit(val, "[,،]")
	end
end

local function replace_tr_ending(tr, from, to)
	if not tr then
		return nil
	end
	local pref = tr:match("^(.*)" .. from .. "$")
	if not pref then
		error(("Translit '%s' does not end in -%s, as expected"):format(tr, from))
	end
	return pref .. to
end

-----------------------------------------------------------------------------------------
--                                  Tracking functions                                 --
-----------------------------------------------------------------------------------------

local trackfn = require("Module:debug/track")
local function track(page)
	trackfn(langcode .. "-headword/" .. page)
	return true
end

--[==[
Examples of what you can find by looking at what links to the given
pages:

[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized]]
	all unvocalized pages
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/pl]]
	all unvocalized pages where the plural is unvocalized, whether specified using pl=, pl2=, etc.
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head]]
	all unvocalized pages where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/nouns]]
	all nouns excluding proper nouns, collective nouns, singulative nouns where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/proper nouns]]
	all proper nouns where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/head/not proper nouns]]
	all words that are not proper nouns where the head is unvocalized
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized/adjectives]]
	all adjectives where any parameter is unvocalized; currently only works for heads, so equivalent to
	.../unvocalized/head/adjectives
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-empty-head]]
	all pages with an empty head
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-manual-translit]]
	all unvocalized pages with manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-manual-translit/head/nouns]]
	all nouns where the head is unvocalized but has manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/unvocalized-no-translit]]
	all unvocalized pages without manual translit
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab]]
	all pages with any parameter containing i3rab of either -un, -u, -a or -i
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-un]]
	all pages with any parameter containing an -un i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-un/pl]]
	all pages where a form specified using pl=, pl2=, etc.
	contains an -un i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab-u/head]]
	all pages with a head containing an -u i3rab ending
[[Special:WhatLinksHere/Wiktionary:Tracking/ar-headword/i3rab/head/proper nouns]]
	all proper nouns with a head containing i3rab of either -un, -u, -a or -i

In general, the format is one of the following:

Wiktionary:Tracking/ar-headword/FIRSTLEVEL
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/ARGNAME
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/POS
Wiktionary:Tracking/ar-headword/FIRSTLEVEL/ARGNAME/POS

FIRSTLEVEL can be one of "unvocalized", "unvocalized-empty-head" or its opposite "unvocalized-specified",
"unvocalized-manual-translit" or its opposite "unvocalized-no-translit", "i3rab", "i3rab-un", "i3rab-u", "i3rab-a", or
"i3rab-i".

ARGNAME is either "head" or an argument such as "pl", "f", "cons", etc. This automatically includes arguments specified
as head2=, pl3=, etc.

POS is a part of speech, lowercase and singular, e.g. "noun", "adjective", "proper noun", "collective noun", etc. or
"not proper noun", which includes all parts of speech but proper nouns.
]==]

local function track_form(argname, form, translit, pos)
	form = ar.reorder_shadda(remove_links(form))

	-- Both helpers are declared local so they do not leak into the global environment.
	local function dotrack(page)
		track(page)
		track(page .. "/" .. argname)
		if pos then
			track(page .. "/" .. pos)
			track(page .. "/" .. argname .. "/" .. pos)
			if pos ~= "proper noun" then
				track(page .. "/not proper noun")
				track(page .. "/" .. argname .. "/not proper noun")
			end
		end
	end

	local function track_i3rab(arabic, tr)
		if rfind(form, arabic .. "$") then
			dotrack("i3rab")
			dotrack("i3rab-" ..
				tr)
		end
	end

	track_i3rab(ar.UN, "un")
	track_i3rab(ar.U, "u")
	track_i3rab(ar.A, "a")
	track_i3rab(ar.I, "i")
	if form == "" or not (lang:transliterate(form)) then
		dotrack("unvocalized")
		if form == "" then
			dotrack("unvocalized-empty-head")
		else
			dotrack("unvocalized-specified")
		end
		if translit then
			dotrack("unvocalized-manual-translit")
		else
			dotrack("unvocalized-no-translit")
		end
	end
end

-----------------------------------------------------------------------------------------
--                            Inflection-parsing functions                             --
-----------------------------------------------------------------------------------------

-- Construct the default construct state or informal form of a term in lemma format. Usually this is the same as the
-- lemma but is different for final-weak nouns and adjectives ending in -n in their lemma. NOTE: Input must be
-- shadda-reordered for this to work properly.
local function default_construct_state_or_informal(term, tr)
	local pref = term:match("^(.*)" .. ar.HAMZA .. ar.IN .."$")
	-- Hamza on the line with -in changes to hamza-on-yā with -ī.
	if pref then
		return pref .. ar.HAMZA_ON_YA .. ar.II, replace_tr_ending(tr, "in", "ī")
	end
	-- Otherwise just change -in to -ī.
	pref = term:match("^(.*)" .. ar.IN .. "$")
	if pref then
		return pref .. ar.II, replace_tr_ending(tr, "in", "ī")
	end
	-- Change -an with alif maqṣūra to -ā with alif maqṣūra.
	pref = term:match("^(.*)" .. ar.AN .. ar.AMAQ .. "$")
	if pref then
		return pref .. ar.AAMAQ, replace_tr_ending(tr, "an", "ā")
	end
	-- Change -an with tall alif (e.g. عَصًا) to -ā with tall alif.
	pref = term:match("^(.*)" .. ar.AN .. ar.ALIF .. "$")
	if pref then
		return pref ..
			ar.AA, replace_tr_ending(tr, "an", "ā")
	end
	return term, tr
end

local function generate_construct_state_or_informal_default(data, args)
	local heads = data.heads
	local consobjs = {}
	local different_cons = false
	for _, headobj in ipairs(heads) do
		local consterm, constr = default_construct_state_or_informal(headobj.term, headobj.tr)
		different_cons = different_cons or consterm ~= headobj.term or constr ~= headobj.tr
		local consobj = m_table.shallowCopy(headobj)
		consobj.term = consterm
		consobj.tr = constr
		insert(consobjs, consobj)
	end
	if different_cons then
		return consobjs
	else
		return {}
	end
end

local noun_field_cons = {
	field = "cons",
	label = "<<construct state>>",
	generate_default = generate_construct_state_or_informal_default,
	default_when_not_explicit = function(args, data) return true end,
}
local noun_field_inf = {field = "inf", label = "informal"}
local noun_field_obl = {field = "obl", label = "<<oblique>>"}
local noun_field_def = {field = "def", label = "<<definite>> state"}
local noun_inflections = {
	noun_field_cons,
	noun_field_inf,
	noun_field_obl,
	noun_field_def,
}
local adj_field_inf = {
	field = "inf",
	label = "informal",
	generate_default = generate_construct_state_or_informal_default,
	default_when_not_explicit = function(args, data) return true end,
}
local adj_field_obl = noun_field_obl
local adj_field_def = noun_field_def
local adjective_inflections = {
	adj_field_inf,
	adj_field_obl,
	adj_field_def,
}

local function has_construct_state(data)
	return data.pos_category ~= "adjectives"
end

local function parse_nominal_inflection(paramname, val, parse_err)
	return m_headword_utilities.parse_term_with_modifiers {
		val = val,
		paramname = paramname,
		splitchar = ",",
		include_mods = {"tr", "g"},
	}
end

local function make_nominal_inflection_param_mod_spec(paramname)
	return {convert = function(val, parse_err) return parse_nominal_inflection(paramname, val, parse_err) end}
end

-- Parse an inflection.
-- The raw arguments come from `args[field]`, which is parsed for inline modifiers. Multiple
-- comma-separated values are allowed.
local function parse_inflection(data, args, field, is_head)
	local argfield = field
	local argpref = field
	if type(argfield) == "table" then
		argpref = argfield[2]
		argfield = argfield[1]
	end
	local include_mods
	if is_head then
		include_mods = {"tr"}
	else
		include_mods = {"tr", "g"}
		for _, spec in ipairs(has_construct_state(data) and noun_inflections or adjective_inflections) do
			insert(include_mods, {spec.field, make_nominal_inflection_param_mod_spec(argpref .. "." .. spec.field)})
		end
	end
	if is_head then
		local retval
		if args[argfield] then
			retval = m_headword_utilities.parse_term_with_modifiers {
				val = args[argfield],
				paramname = field,
				splitchar = ",",
				is_head = is_head,
				include_mods = include_mods,
			}
		end
		return retval or {}
	else
		return m_headword_utilities.parse_term_list_with_modifiers {
			forms = args[argfield],
			paramname = field,
			splitchar = ",",
			is_head = is_head,
			include_mods = include_mods,
		}
	end
end

local function insert_inflection(data, terms, label, accel, defgender, track_field, no_label, usually_no_label)
	local track_pos = m_en_utilities.singularize(data.pos_category)
	for _, termobj in ipairs(terms) do
		-- If the user supplied a construct state or informal form for the term with a value of "+", substitute the
		-- default value for the term. If the user supplied a value of "--", they want no value displayed. Otherwise,
		-- if the user didn't supply any value, we check to see if the default construct state or informal form is
		-- different from the lemma and display it if so; this applies particularly to terms in '-in' and '-an', where
		-- the default construct state or informal form is almost always correct.
		local field = has_construct_state(data) and "cons" or "inf"
		if not termobj[field] then
			local defcons, defconstr = default_construct_state_or_informal(termobj.term, termobj.tr)
			if termobj.term ~= defcons or termobj.tr ~= defconstr then
				-- We don't want to copy qualifiers, labels, etc. from the term object because we're a subinflection of
				-- the term object.
				termobj[field] = {{term = defcons, tr = defconstr}}
			end
		elseif termobj[field][1].term == "--" then
			if termobj[field][2] then
				error("Can't specify more than one value for <" .. field ..
					":...> if first value is '--', meaning \"don't insert anything\"")
			end
			termobj[field] = nil
		else
			for i, consobj in ipairs(termobj[field]) do
				if consobj.term == "+" then
					if consobj.tr then
						error("Can't specify translit for default value '+'")
					end
					consobj.term, consobj.tr = default_construct_state_or_informal(termobj.term, termobj.tr)
				elseif consobj.term == "~" then
					if consobj.tr then
						error("Can't specify translit for term-requesting value '~'")
					end
					consobj.term, consobj.tr = termobj.term, termobj.tr
				end
			end
		end
		if defgender and not termobj.genders then
			termobj.genders = {{spec = defgender}}
		end

		local function insert_nested_inflection(field, label)
			if termobj[field] then
				m_headword_utilities.insert_inflection {
					headdata = data,
					inflobj = termobj,
					terms = termobj[field],
					label = label
				}
			end
		end

		for _, spec in ipairs(has_construct_state(data) and noun_inflections or adjective_inflections) do
			insert_nested_inflection(spec.field, spec.label)
		end
		track_form(track_field, termobj.term, termobj.tr, track_pos)
	end
	m_headword_utilities.insert_inflection {
		headdata = data,
		terms = terms,
		label = label,
		accel = accel and {form = accel} or nil,
		no_label = no_label,
		usually_no_label = usually_no_label,
	}
end

-----------------------------------------------------------------------------------------
--                                  Main entry point                                   --
-----------------------------------------------------------------------------------------

function export.show(frame)
	local
		iparams = {
		[1] = true,
	}
	local iargs = require("Module:parameters").process(frame.args, iparams)
	local parargs = frame:getParent().args
	local poscat = iargs[1]
	local pos_in_1 = not poscat
	if pos_in_1 then
		poscat = ine(parargs[1]) or
			mw.title.getCurrentTitle().fullText == "Template:" .. langcode .. "-head" and "interjection" or
			error("Part of speech must be specified in 1=")
		poscat = require(headword_module).canonicalize_pos(poscat)
	end
	local indexing_poscat = pos_in_1 and (misc_pos_with_gender[poscat] and "head_with_gender" or "head") or poscat

	local params = {
		["suffix"] = boolean_param,
		["nosuffix"] = boolean_param,
		["id"] = true,
		["json"] = boolean_param,
		["pagename"] = {}, -- for testing
	}
	if pos_in_1 then
		params[1] = {required = true} -- required but ignored as already processed above
	end
	local head_is_head = pos_functions[indexing_poscat] and pos_functions[indexing_poscat].head_is_not_1
	local headfield = head_is_head and "head" or pos_in_1 and 2 or 1
	params[headfield] = head_is_head and true or {default = "+"}
	params.head2 = {replaced_by = false, instead = "use multiple comma-separated values in |" .. headfield .. "="}
	local tr_replaced_by = {replaced_by = false, instead = "use <tr:...> inline modifier on |" .. headfield .. "="}
	params.tr = tr_replaced_by
	params.tr2 = tr_replaced_by

	if pos_functions[indexing_poscat] then
		for key, val in pairs(pos_functions[indexing_poscat].params()) do
			params[key] = val
		end
	end

	-- `parargs` was already fetched from the parent frame above.
	local args = require("Module:parameters").process(parargs, params)

	local pagename = args.pagename or mw.loadData("Module:headword/data").pagename

	local data = {
		lang = lang,
		pos_category = poscat,
		orig_pos_category = poscat,
		categories = {},
		heads = {},
		genders = {},
		inflections = {enable_auto_translit = true},
		pagename = pagename,
		id = args.id,
		sort_key = args.sort,
		force_cat_output = force_cat,
		-- We expect a head always so the redundant head cat will be inaccurate.
		no_redundant_head_cat = true,
	}

	data.heads = parse_inflection(data, args, headfield, "is_head")
	for _, headobj in ipairs(data.heads) do
		if headobj.term == "+" then
			headobj.term = pagename
		end
	end

	data.is_suffix = false
	if args.suffix or (
		not args.nosuffix and pagename:find("^%-") and poscat ~= "suffixes" and poscat ~= "suffix forms"
	) then
		data.is_suffix = true
		data.pos_category = "suffixes"
		local singular_poscat = m_en_utilities.singularize(poscat)
		insert(data.categories, langname .. " " .. singular_poscat .. "-forming suffixes")
		insert(data.inflections, {label = singular_poscat .. "-forming suffix"})
	end

	if pos_functions[indexing_poscat] then
		pos_functions[indexing_poscat].func(data, args)
	end

	-- Do this after calling pos_functions[poscat].func() as it may modify data.heads (as verbs do).
	local irreg_translit = false
	for _, head in ipairs(data.heads) do
		if ar_translit.irregular_translit(head.term, head.tr) then
			irreg_translit = true
			break
		end
	end

	if irreg_translit then
		insert(data.categories, langname .. " terms with irregular pronunciations")
	end

	if args.json then
		return require("Module:JSON").toJSON(data)
	end

	return require(headword_module).full_headword(data)
end

-----------------------------------------------------------------------------------------
--                                   Gender handling                                   --
-----------------------------------------------------------------------------------------

local valid_bare_genders = {false, "m", "f", "mf", "mfbysense", "mfequiv"}
local valid_bare_numbers = {false, "d", "p"}
local valid_bare_animacies = {false, "pr", "np"}

local valid_genders = {}
for _, gender in ipairs(valid_bare_genders) do
	for _, number in ipairs(valid_bare_numbers) do
		for _, animacy in ipairs(valid_bare_animacies) do
			local parts = {}
			local function ins_part(part)
				if part then
					insert(parts, part)
				end
			end
			ins_part(gender)
			ins_part(number)
			ins_part(animacy)
			local full_gender = concat(parts, "-")
			valid_genders[full_gender == "" and "?"
				or full_gender] = true
		end
	end
end

local function is_masc_sg(g)
	return g == "m" or g == "m-pr" or g == "m-np"
end
local function is_fem_sg(g)
	return g == "f" or g == "f-pr" or g == "f-np"
end
local function is_masc_fem_sg(g)
	g = g:gsub("%-pr", ""):gsub("%-np", "")
	return g == "mf" or g == "mfequiv" or g == "mfbysense"
end

local function add_gender_params(params, default)
	params[2] = {type = "genders", default = default or "?", template_default = "m"}
	params["g2"] = {replaced_by = false, instead = "use comma-separated values in |g="}
end

-- Handle gender in params 2=, inserting into `data.genders`. Also, if a lemma, insert categories into `data.categories`
-- if the gender is unexpected for the form of the noun. (Note: If there are multiple genders,
-- [[Module:gender and number]] will automatically insert 'Arabic POS with multiple genders'.)
local function handle_gender(data, args, nonlemma, field)
	if not args[field or 2] then
		return
	end
	for _, gspec in ipairs(args[field or 2]) do
		if not valid_genders[gspec.spec] then
			error("Unrecognized gender: " .. gspec.spec)
		end
	end
	data.genders = args[field or 2]

	if nonlemma then
		return
	end

	for _, gspec in ipairs(data.genders) do
		local g = gspec.spec
		if is_masc_sg(g) or is_fem_sg(g) or is_masc_fem_sg(g) then
			local head = data.heads[1]
			if head then
				head = rsub(ar.reorder_shadda(remove_links(head.term)), ar.UNUOPT .. "$", "")
				local ends_with_tam = rfind(head, "^[^ ]*" .. ar.TAM .. "$") or
					rfind(head, "^[^ ]*" .. ar.TAM .. " ")
				if (is_masc_sg(g) or is_masc_fem_sg(g)) and ends_with_tam then
					insert(data.categories, langname .. " masculine terms with feminine ending")
				elseif (is_fem_sg(g) or is_masc_fem_sg(g)) and not ends_with_tam and
					not rfind(head, "[" .. ar.ALIF .. ar.AMAQ .. "]$") and
					not rfind(head, ar.ALIF .. ar.HAMZA .. "$") then
					insert(data.categories, langname .. " feminine terms lacking feminine ending")
				end
			end
		end
	end
end

-----------------------------------------------------------------------------------------
--                                 Inflection handlers                                 --
-----------------------------------------------------------------------------------------

-- Add list parameters to `params` (a structure as passed to [[Module:parameters]]) for a parameter named `argpref`.
-- If `argpref` is "*", add the nominal inflection parameters for construct state, definite state, etc. Related
-- transliteration and gender parameters are no longer supported in favor of inline modifiers, and error messages are
-- output if these parameters are used.
local function add_infl_params(params, argpref)
	params[argpref] = {list = true, disallow_holes = true}
	params[argpref .. "tr"] = {replaced_by = false, instead = "use <tr:...> inline modifier on |" .. argpref .. "="}
	params[argpref .. "g"] = {replaced_by = false, instead = "use <g:...> inline modifier on |" .. argpref .. "="}
end

--[=[
Fetch a list of inflections from the arguments in `args` based on argument `field` (e.g. "pl"). Label with `label`
(e.g. "plural"), which will appear in the headword. Insert into `data.inflections`, where `data` is the structure passed
to [[Module:headword]]. If `generate_default` is specified, it should be a function of two arguments (`data`, `args`),
which should generate the default value if no values are specified or if "+" is explicitly given. If `generate_default`
isn't specified and the user gave no values, no inflection will be inserted.
]=]
local function handle_infl(data, args, spec)
	local newinfls = parse_inflection(data, args, spec.field, false)
	if not newinfls[1] and spec.default_when_not_explicit and spec.default_when_not_explicit(data, args) then
		newinfls = {{term = "+"}}
	end
	if spec.handle then
		spec.handle(data, args, newinfls)
	end
	local default_specs = spec.allowed_defspecs
	if not default_specs then
		default_specs = spec.generate_default and {["+"] = true} or {}
	end

	local saw_defspec = false
	for _, newinfl in ipairs(newinfls) do
		if default_specs[newinfl.term] or newinfl.term == "~" then
			saw_defspec = true
			break
		end
	end
	if saw_defspec then
		local newnewinfls = {}
		for _, newinfl in ipairs(newinfls) do
			if default_specs[newinfl.term] then
				if newinfl.tr then
					error("Can't specify translit for default value '" .. newinfl.term .. "'")
				end
				local definfls = spec.generate_default(data, args, newinfl.term)
				for _, definfl in ipairs(definfls) do
					m_headword_utilities.combine_termobj_qualifiers_labels(definfl, newinfl)
					insert(newnewinfls, definfl)
				end
			elseif newinfl.term == "~" then
				if newinfl.tr then
					error("Can't specify translit for head-requesting value '~'")
				end
				for _, headobj in ipairs(data.heads) do
					headobj = m_table.shallowCopy(headobj)
					m_headword_utilities.combine_termobj_qualifiers_labels(headobj, newinfl)
					insert(newnewinfls, headobj)
				end
			else
				insert(newnewinfls, newinfl)
			end
		end
		newinfls = newnewinfls
	end

	if newinfls[1] then
		if newinfls[1].term == "--" then
			if newinfls[2] then
				error("Can't specify more than one term if first term is '--', meaning \"don't insert anything\"")
			end
		else
			insert_inflection(data, newinfls, spec.label, nil, spec.defgender, spec.field, spec.no_label,
				spec.usually_no_label)
		end
	end
end

local function add_infl_list_params(params, infl_list)
	for _, infl in ipairs(infl_list) do
		add_infl_params(params, infl.field)
	end
end

local function handle_infl_list_args(data, args, infl_list)
	for _, infl in ipairs(infl_list) do
		handle_infl(data, args, infl)
	end
end
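
--[==[
Illustrative sketch only; nothing below is called by this module, and `expand_sentinels` is a hypothetical name
introduced purely for this example. It is a simplified, standalone model of how handle_infl() above appears to treat
the special values "+", "~" and "--". Assumptions: plain strings stand in for term objects, `generate_default` takes
only the list of heads, and the merging of qualifiers and labels is left out.

local function expand_sentinels(infls, heads, generate_default)
	local out = {}
	for _, infl in ipairs(infls) do
		if infl == "+" and generate_default then
			-- "+" requests the generated default form(s)
			for _, def in ipairs(generate_default(heads)) do
				table.insert(out, def)
			end
		elseif infl == "~" then
			-- "~" requests a copy of every head
			for _, head in ipairs(heads) do
				table.insert(out, head)
			end
		else
			-- anything else (including "+" when no generator exists) is kept as given
			table.insert(out, infl)
		end
	end
	if out[1] == "--" then
		-- "--" means "display nothing" and must be the only value
		if out[2] then
			error("Can't specify more than one term if first term is '--'")
		end
		return {}
	end
	return out
end

-- "+" expands to the generated default, while ordinary values pass through unchanged.
local out = expand_sentinels({"+", "foo"}, {"bar"}, function(heads) return {heads[1] .. "s"} end)
-- `out` now holds "bars" followed by "foo".
]==]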
-----------------------------------------------------------------------------------------
-- Default ending generators --
-----------------------------------------------------------------------------------------

local function make_conditional_default(specs)
	return function(data, args)
		local heads = data.heads
		if not heads[1] then
			heads = {{term = data.pagename}}
		end
		local newobjs = {}
		for _, headobj in ipairs(heads) do
			local term = ar.reorder_shadda(headobj.term)
			local tr = headobj.tr
			local matched = false
			for _, spec in ipairs(specs) do
				local from, fromtr, to, totr = unpack(spec)
				-- Declared local so that the match result does not leak into the global environment.
				local pref
				if from:find("^%^") then
					-- `from` is anchored at the start and carries its own capture.
					pref = rmatch(term, from .. "$")
				else
					pref = rmatch(term, "^(.*)" .. from .. "$")
				end
				if pref then
					term = pref .. to
					tr = replace_tr_ending(tr, fromtr, totr)
					matched = true
					headobj = m_table.shallowCopy(headobj)
					headobj.term = ar.undo_reorder_shadda(term)
					headobj.tr = tr
					insert(newobjs, headobj)
					break
				end
			end
			if not matched then
				error(("Internal error: No matching spec: head=%s"):format(dump(headobj)))
			end
		end
		return newobjs
	end
end

local default_feminine = make_conditional_default {
	{ar.AN .. ar.AMAQ, "an", ar.AAH, "āh"},
	{ar.AN .. ar.ALIF, "an", ar.AAH, "āh"}, -- e.g. مُحْيًا
	{ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAH, "iya"},
	{ar.IN, "in", ar.IYAH, "iya"},
	{"", "", ar.AH, "a"},
}

local default_masculine = make_conditional_default {
	-- tall alif substitutes for alif maqṣūra after a yāʔ
	{ar.Y .. ar.AAH, "āh", ar.AN .. ar.ALIF, "an"},
	{ar.AAH, "āh", ar.AN .. ar.AMAQ, "an"},
	-- handle the common case of final-weak feminine active participle with preceding hamza;
	-- the hamza-on-yāʔ always converts back to hamza on the line when preceded by ā (alif) but
	-- may not otherwise, so we just leave it alone in that case
	{ar.ALIF .. ar.HAMZA_ON_YA .. ar.IYAH, "iya", ar.HAMZA .. ar.IN, "in"},
	{ar.IYAH, "iya", ar.IN, "in"},
	{ar.AH, "a", "", ""},
	{"", "", "", ""},
}

local default_masculine_plural = make_conditional_default {
	{ar.AN ..
ar.AMAQ, "an", ar.AWN, "awn"}, {ar.AN .. ar.ALIF, "an", ar.AWN, "awn"}, -- e.g. مُحْيًا {ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_WAW .. ar.UUN, "ūn"}, {ar.IN, "in", ar.UUN, "ūn"}, {"", "", ar.UUN, "ūn"}, } local default_feminine_plural = make_conditional_default { -- صَلَاة pl. صَلَوَات and أَدَاة pl. أَدَوَات and similar; but نَوَاة and وَفَاة with a و in them become نَوَيَات and وَفَيَات; -- and longer terms like مُبَارَاة and كُمَّثْرَاة invariably form their plural in -يَات. {"^([^و]" .. ar.A .. "[^و])" .. ar.AAH, "āh", ar.A .. ar.W .. ar.AAT, "awāt"}, {ar.AAH, "āh", ar.AYAAT, "ayāt"}, {ar.AN .. ar.AMAQ, "an", ar.AYAAT, "ayāt"}, {ar.AN .. ar.ALIF, "an", ar.AYAAT, "ayāt"}, -- e.g. مُحْيًا {ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAAT, "iyāt"}, {ar.IN, "in", ar.IYAAT, "iyāt"}, {ar.AH, "a", ar.AAT, "āt"}, {"", "", ar.AAT, "āt"}, } local default_masculine_dual = make_conditional_default { {ar.AN .. ar.AMAQ, "an", ar.AYAAN, "ayān"}, {ar.AN .. ar.ALIF, "an", ar.AYAAN, "ayān"}, -- e.g. مُحْيًا {ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IYAAN, "iyān"}, {ar.IN, "in", ar.IYAAN, "iyān"}, {"", "", ar.AAN, "ān"}, } local default_feminine_dual = make_conditional_default { {ar.AN .. ar.AMAQ, "an", ar.AATAAN, "ātān"}, {ar.AN .. ar.ALIF, "an", ar.AATAAN, "ātān"}, -- e.g. مُحْيًا {ar.HAMZA .. ar.IN, "in", ar.HAMZA_ON_YA .. ar.IY .. ar.ATAAN, "iyatān"}, {ar.IN, "in", ar.IY .. ar.ATAAN, "iyatān"}, {"", "", ar.ATAAN, "atān"}, } -- Return whether `term` is a nisba noun or adjective, ending in -iyy or -iyyah. `nisba_val` is the value of -- args.nisba; if non-nil, it overrides any auto-determination based on the shape of the term. local function term_is_nisba(term, nisba_val) if nisba_val ~= nil then return nisba_val end term = ar.reorder_shadda(term) -- necessary to avoid issues with e.g. أُورُوبِّيّ. local pref = rmatch(term, "^(.*)" .. ar.IYY .. ar.UN .. "?$") if not pref then pref = rmatch(term, "^(.*)" .. ar.IYYAH .. ar.UN .. 
"?$") end -- Avoid false positives for words like قَوِيّ "strong" and صَبِيّ "boy". There may be other false positives -- but this should catch most of them and will avoid very many false negatives. return pref and not rfind(pref, "^[^ا]" .. ar.A .. ".$") end ----------------------------------------------------------------------------------------- -- Adjectives -- ----------------------------------------------------------------------------------------- local function is_defaulting_adjective(data, args) return data.orig_pos_category == "defaulting adjectives" end local adj_field_elative = {field = "el", label = "<<elative>>"} local adj_inflections = { adj_field_inf, adj_field_obl, adj_field_def, {field = "f", label = "feminine", generate_default = default_feminine, default_when_not_explicit = is_defaulting_adjective}, {field = "d", label = "masculine dual", generate_default = default_masculine_dual}, {field = "fd", label = "feminine dual", generate_default = default_feminine_dual}, {field = "cpl", label = "common plural"}, {field = "pl", label = "masculine plural", generate_default = default_masculine_plural, default_when_not_explicit = is_defaulting_adjective}, {field = "fpl", label = "feminine plural", generate_default = default_feminine_plural, default_when_not_explicit = is_defaulting_adjective}, } local function get_adj_params() local params = {} add_infl_list_params(params, adj_inflections) add_infl_params(params, "el") params.nisba = boolean_param return params end local function handle_adj_args(data, args) handle_infl_list_args(data, args, adj_inflections) handle_infl(data, args, adj_field_elative) for _, headobj in ipairs(data.heads) do if term_is_nisba(headobj.term, args.nisba) then insert(data.categories, langname .. 
" relative adjectives (nisba)") break end end end pos_functions["adjectives"] = { params = get_adj_params, func = handle_adj_args, } pos_functions["defaulting adjectives"] = { params = get_adj_params, func = function(data, args) data.pos_category = "adjectives" handle_adj_args(data, args) end, } ----------------------------------------------------------------------------------------- -- Nouns, etc. -- ----------------------------------------------------------------------------------------- local function get_masc_or_feminine_gender(data, default_type) local saw_m, saw_f, saw_mf for _, gender in ipairs(data.genders) do if is_masc_sg(gender.spec) then saw_m = true elseif is_fem_sg(gender.spec) then saw_f = true elseif is_masc_fem_sg(gender.spec) then saw_mf = true end end if saw_mf or saw_m and saw_f then error("Can't generate default for " .. default_type .. " when gender is both masculine and feminine") elseif saw_m then return "m" elseif saw_f then return "f" else error("Can't generate default for " .. default_type .. " when gender is not specified as " .. 
"masculine or feminine singular") end end local function is_defaulting_noun(data, args) return data.orig_pos_category == "defaulting nouns" end local noun_field_dual = { field = "d", label = "dual", generate_default = function(data, args) local gender = get_masc_or_feminine_gender(data, "noun dual") if gender == "m" then return default_masculine_dual(data, args) else return default_feminine_dual(data, args) end end, } local noun_field_plural = { field = "pl", label = "plural", generate_default = function(data, args, defspec) local gender = get_masc_or_feminine_gender(data, "noun plural") if gender == "m" then if defspec == "+f" then return default_feminine_plural(data, args) else return default_masculine_plural(data, args) end elseif defspec == "+f" then error("Can't specify '+f' with feminine gender; just use '+'") else return default_feminine_plural(data, args) end end, -- Handle the case where pl=-, indicating an uncountable noun. handle = function(data, args, terms) if terms[1] and terms[1] == "-" then insert(data.categories, langname .. 
" uncountable nouns") if args.pauc and args.pauc[1] then error("Can't specify paucals when pl=-") end end end, allowed_defspecs = {["+"] = true, ["+f"] = true}, default_when_not_explicit = is_defaulting_noun, no_label = "<<uncountable>>", usually_no_label = "usually <<uncountable>>", } local noun_field_paucal = { field = "pauc", label = "<<paucal>>", generate_default = default_feminine_plural, } local noun_field_feminine = { field = "f", label = "feminine", generate_default = default_feminine, default_when_not_explicit = function(data, args) if data.orig_pos_category ~= "defaulting nouns" then return nil end local gender = get_masc_or_feminine_gender(data, "defaulting-if-masculine noun feminine") return gender == "m" end, } local noun_field_masculine = { field = "m", label = "masculine", generate_default = default_masculine, default_when_not_explicit = function(data, args) if data.orig_pos_category ~= "defaulting nouns" then return nil end local gender = get_masc_or_feminine_gender(data, "defaulting-if-feminine noun masculine") return gender == "f" end, } local noun_basic_inflections = { noun_field_cons, noun_field_inf, noun_field_obl, noun_field_def, } local noun_shared_inflections = { noun_field_dual, noun_field_plural, } local noun_extra_inflections = { noun_field_paucal, noun_field_feminine, noun_field_masculine, } local function get_noun_params() local params = {} add_gender_params(params) add_infl_list_params(params, noun_basic_inflections) add_infl_list_params(params, noun_shared_inflections) add_infl_list_params(params, noun_extra_inflections) params.nisba = boolean_param return params end local function handle_noun_args(data, args) handle_gender(data, args) handle_infl_list_args(data, args, noun_basic_inflections) handle_infl_list_args(data, args, noun_shared_inflections) handle_infl_list_args(data, args, noun_extra_inflections) for _, headobj in ipairs(data.heads) do if term_is_nisba(headobj.term, args.nisba) then insert(data.categories, langname .. 
" relative nouns (nisba)") break end end end pos_functions["nouns"] = { params = get_noun_params, func = handle_noun_args, } pos_functions["defaulting nouns"] = { params = get_noun_params, func = function(data, args) data.pos_category = "nouns" handle_noun_args(data, args) end, } local noun_field_singulative = {field = "sing", label = "<<singulative>>", defgender = "f", generate_default = default_feminine} local noun_field_collective = {field = "coll", label = "<<collective>>", defgender = "m", generate_default = default_masculine} local function handle_sing_coll_noun_infls(data, args, otherinfl, otherlabel, othergender) -- Handle sing= (corresponding singulative noun) or coll= (corresponding collective noun) and their gender handle_infl(data, args, otherinfl, otherlabel, nil, othergender) handle_infl_list_args(data, args, sing_coll_noun_inflections) end local function get_singulative_collective_noun_params(defgender, otherinfl) local params = {} add_gender_params(params, defgender) add_infl_list_params(params, noun_basic_inflections) add_infl_params(params, otherinfl) add_infl_list_params(params, noun_shared_inflections) add_infl_params(params, "pauc") return params end pos_functions["collective nouns"] = { params = function() return get_singulative_collective_noun_params("m", "sing") end, func = function(data, args) data.pos_category = "nouns" insert(data.categories, langname .. " collective nouns") m_headword_utilities.insert_fixed_inflection { headdata = data, label = "<<collective>>", } handle_gender(data, args) handle_infl_list_args(data, args, noun_basic_inflections) handle_infl(data, args, noun_field_singulative) handle_infl_list_args(data, args, noun_shared_inflections) handle_infl(data, args, noun_field_paucal) end } pos_functions["singulative nouns"] = { params = function() return get_singulative_collective_noun_params("f", "coll") end, func = function(data, args) data.pos_category = "nouns" insert(data.categories, langname .. 
" singulative nouns") m_headword_utilities.insert_fixed_inflection { headdata = data, label = "<<singulative>>", } handle_gender(data, args) handle_infl_list_args(data, args, noun_basic_inflections) handle_infl(data, args, noun_field_collective) handle_infl_list_args(data, args, noun_shared_inflections) handle_infl(data, args, noun_field_paucal) end } -- FIXME: Do numerals really behave almost as nouns? They vary by masc/fem. pos_functions["numerals"] = { params = get_noun_params, func = function(data, args) insert(data.categories, langname .. " cardinal numbers") handle_noun_args(data, args) end } pos_functions["proper nouns"] = { params = get_noun_params, func = handle_noun_args, } local function get_pronoun_params() local params = {} add_gender_params(params, defgender) add_infl_list_params(params, noun_basic_inflections) add_infl_list_params(params, noun_shared_inflections) add_infl_params(params, "f") return params end pos_functions["pronouns"] = { params = get_pronoun_params, func = function(data, args) handle_gender(data, args) handle_infl_list_args(data, args, noun_basic_inflections) handle_infl_list_args(data, args, noun_shared_inflections) handle_infl(data, args, noun_field_feminine) end } ----------------------------------------------------------------------------------------- -- Non-lemma forms -- ----------------------------------------------------------------------------------------- local valid_forms = list_to_set( { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "Iq", "IIq", "IIIq", "IVq" }) -- FIXME: Partly duplicated in [[Module:ar-inflections]]. local function handle_conj_form(data, args) local form = args[2] if form then if not valid_forms[form] then error("Invalid verb conjugation form " .. form) end insert(data.inflections, { label = "[[Appendix:Arabic verbs#Form " .. form .. "|form " .. form .. 
"]]" }) end end pos_functions["verb forms"] = { params = function() return { [2] = {}, } end, func = function(data, args) handle_conj_form(data, args) end } local function get_participle_params() local params = get_adj_params() params[2] = {} return params end pos_functions["active participles"] = { params = get_participle_params, func = function(data, args) data.pos_category = "participles" insert(data.categories, langname .. " active participles") handle_conj_form(data, args) handle_infl_list_args(data, args, adj_inflections) end } pos_functions["passive participles"] = { params = get_participle_params, func = function(data, args) data.pos_category = "participles" insert(data.categories, langname .. " passive participles") handle_conj_form(data, args) handle_infl_list_args(data, args, adj_inflections) end } ----------------------------------------------------------------------------------------- -- Verbs -- ----------------------------------------------------------------------------------------- pos_functions["verbs"] = { head_is_not_1 = true, params = function() return { [1] = {}, -- Comma-separated lists with possible inline modifiers ["past"] = {}, ["past1s"] = {}, ["nonpast"] = {}, ["vn"] = {}, ["noautolinktext"] = {type = "boolean"}, ["noautolinkverb"] = {type = "boolean"}, } end, func = function(data, args) local ar_verb = require(ar_verb_module) local alternant_multiword_spec = args[1] ~= "-" and ar_verb.do_generate_forms(args, "ar-verb", data.pagename) or nil local function do_slot(slots_to_check, override, label, slot_is_headword) -- Do this even with an override so we can return the correct filled slot. 
local slot, slotval if alternant_multiword_spec then for _, potential_slot in ipairs(slots_to_check) do slotval = alternant_multiword_spec.forms[potential_slot] if slotval then slot = potential_slot break end end end local function get_slot_values() local terms = {} for _, form in ipairs(slotval) do local term = { term = form.form, id = form.id, genders = form.genders, pos = form.pos, lit = form.lit, } term.tr = form.translit if form.footnotes then local quals, refs = require(inflection_utilities_module). convert_footnotes_to_qualifiers_and_references(form.footnotes) term.q = quals term.refs = refs end insert(terms, term) end return terms end if override then local override_param_mods = { alt = {}, t = { -- [[Module:headword]] expects the gloss in "gloss". item_dest = "gloss", }, gloss = {}, g = { -- [[Module:headword]] expects the genders in "genders". item_dest = "genders", type = "genders", }, pos = {}, lit = {}, id = {}, -- Qualifiers and labels q = { type = "qualifier", }, qq = { type = "qualifier", }, l = { type = "labels", }, ll = { type = "labels", }, ref = { -- [[Module:headword]] expects the references in "refs". 
						item_dest = "refs",
						type = "references",
					},
				}

				local function generate_obj(formval, parse_err)
					if formval == "+" then
						return {term = "+", underlying_terms = get_slot_values()}
					end
					-- A trailing "?" marks the form as uncertain.
					local val, uncertain = formval:match("^(.*)(%?)$")
					val = val or formval
					uncertain = not not uncertain
					-- An override of the form ARABIC//TRANSLIT carries a manual transliteration.
					local arabic, translit = val:match("^(.*)//(.*)$")
					if not arabic then
						arabic = val
					end
					local retval = {term = arabic, uncertain = uncertain}
					retval.tr = translit
					return retval
				end

				local terms
				if override:find("<") then
					terms = require(parse_utilities_module).parse_inline_modifiers(override, {
						paramname = paramname,
						param_mods = override_param_mods,
						generate_obj = generate_obj,
						splitchar = "[,،]",
						escape_fun = escape_comma_whitespace,
						unescape_fun = unescape_comma_whitespace,
					})
				else
					terms = split_on_comma(override)
					for i, split in ipairs(terms) do
						terms[i] = generate_obj(split)
					end
				end

				-- See if + was supplied and we have to potentially flatten multiple default terms and harmonize
				-- default properties with override properties.
				local saw_underlying_terms = false
				for _, term in ipairs(terms) do
					if term.underlying_terms then
						saw_underlying_terms = true
						break
					end
				end
				if saw_underlying_terms then
					-- Flatten any default terms, copying the corresponding override properties over the default
					-- properties. Non-default terms get inserted directly.
					local flattened = {}
					for _, term in ipairs(terms) do
						if term.underlying_terms then
							for _, underlying in ipairs(term.underlying_terms) do
								for k, v in pairs(term) do
									if k ~= "term" and k ~= "underlying_terms" then
										if k == "uncertain" then
											underlying.uncertain = underlying.uncertain or v
										elseif type(v) ~= "table" or v[1] then
											-- Don't copy empty lists (which are the default) over possibly non-empty
											-- lists.
underlying[k] = v end end end insert(flattened, underlying) end else insert(flattened, term) end end terms = flattened end if not slot_is_headword then terms.label = label end return terms, slot elseif not alternant_multiword_spec then return nil, slot else if not slotval then if slot_is_headword then -- FIXME, put "uncertain" as qualifier? Does this ever happen? return nil, slot elseif alternant_multiword_spec.slot_uncertain[slot] then return {label = label .. " uncertain"}, slot elseif alternant_multiword_spec.slot_explicitly_missing[slot] then return {label = "no " .. label}, slot else -- just say nothing about this slot return nil, slot end end local terms = get_slot_values() if not slot_is_headword then terms.label = label end return terms, slot end end local gloss_parts = {} for _, vform in ipairs(alternant_multiword_spec.verb_forms) do insert(gloss_parts, "[[Appendix:Arabic verbs#Form " .. vform .. "|" .. vform .. "]]") end if gloss_parts[1] then data.gloss = concat(gloss_parts, ", ") end if data.heads[1] and args.past then error("Can't specify both head= and past= to {{ar-verb}}; prefer past=") end if not alternant_multiword_spec.has_active then insert(data.inflections, {label = "passive-only"}) end -- Do this always so `past_slot` is correctly filled. local past, past_slot = do_slot(ar_verb.potential_lemma_slots, args.past, "-", "slot is headword") if data.heads[1] then -- user specified head=; don't override with past= or slot 'past_3sm' etc. 
else if past then data.heads = past end end local should_do_past1s = not not args.past1s if not should_do_past1s then local is_form_I = false for _, vform in ipairs(alternant_multiword_spec.verb_forms) do if vform == "I" then is_form_I = true break end end if is_form_I then require(inflection_utilities_module).map_word_specs(alternant_multiword_spec, function(base) if base.verb_form == "I" then for _, vowel_spec in ipairs(base.conj_vowels) do -- For form-I geminate verbs, the final vowel of the past is elided in the citation form. -- We want to display it for all cases other than active a~u and a~i (the most common -- cases). if vowel_spec.weakness == "geminate" then if ar_verb.is_passive_only(base.passive) then should_do_past1s = true break end local past_vowel = ar_verb.rget(vowel_spec.past) local nonpast_vowel = ar_verb.rget(vowel_spec.nonpast) if not (past_vowel == ar.A and (nonpast_vowel == ar.U or nonpast_vowel == ar.I)) then should_do_past1s = true break end end end -- FIXME, provide way of breaking early from map_word_specs(). end end) end end local past1s if should_do_past1s then past1s, _ = do_slot({"past_1s", "past_pass_1s"}, args.past1s, "first-person singular past") if past1s then insert(data.inflections, past1s) end end local nonpast_slots if not past_slot or past_slot:find("^past_") then nonpast_slots = {"ind_3ms", "ind_pass_3ms", "imp_2ms"} else nonpast_slots = {} end local nonpast, _ = do_slot(nonpast_slots, args.nonpast, "non-past") if nonpast then insert(data.inflections, nonpast) end local vn, _ = do_slot({"vn"}, args.vn, "verbal noun") if vn then insert(data.inflections, vn) end -- FIXME: Should we insert categories? Conjugation also does it and is more likely to be accurate. --for _, cat in ipairs(alternant_multiword_spec.categories) do -- insert(data.categories, cat) --end --[=[ -- FIXME: Review this to see if we need to port it. 
-- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to -- incorporate any links in that head into the 1= specification, use the infinitive generated by -- [[Module:pt-verb]] in place of the user-specified or auto-generated head. This was copied from -- [[Module:it-headword]], where doing this gets accents marked on the verb(s). We don't have accents marked on -- the verb but by doing this we do get any footnotes on the infinitive propagated here. Don't do this if the -- user gave multiple heads or gave a head with a multiword-linked verbal expression such as Italian -- '[[dare esca]] [[al]] [[fuoco]]' (FIXME: give Portuguese equivalent). if not data.user_specified_heads[1] or ( not data.user_specified_heads[2] and alternant_multiword_spec.incorporated_headword_head_into_lemma ) then data.heads = {} for _, lemma_obj in ipairs(alternant_multiword_spec.forms.infinitive_linked) do local quals, refs = require(inflection_utilities_module). convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes) insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs}) end end ]=] end } ----------------------------------------------------------------------------------------- -- Generic parts of speech -- ----------------------------------------------------------------------------------------- pos_functions.head_with_gender = { params = function() return { [3] = {type = "genders"}, } end, func = function(data, args) handle_gender(data, args, "nonlemma", 3) end, } return export 9vikodozg8ooobpmp9upyzh3n3sr6ey Module:ar-pronunciation 828 8169 27704 2026-06-21T14:23:35Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local audio_module = "Module:audio" local parse_utilities_module = "Module:parse utilities" local rfind = m_str_utils.find local rsplit = m_str_utils.split local ugsub = m_str_utils.gsub local ulen = 
m_str_utils.len local ulower = m_str_utils.lower local usub = m_str_utils.sub local concat = table.concat local insert = table.insert local lang = req...' 27704 Scribunto text/plain local export = {} local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local audio_module = "Module:audio" local parse_utilities_module = "Module:parse utilities" local rfind = m_str_utils.find local rsplit = m_str_utils.split local ugsub = m_str_utils.gsub local ulen = m_str_utils.len local ulower = m_str_utils.lower local usub = m_str_utils.sub local concat = table.concat local insert = table.insert local lang = require("Module:languages").getByCode("ar") local sc = require("Module:scripts").getByCode("Arab") local correspondences = { ["ʾ"] = "ʔ", ["ṯ"] = "θ", ["j"] = "d͡ʒ", ["ḥ"] = "ħ", ["ḵ"] = "x", ["ḏ"] = "ð", ["š"] = "ʃ", ["ṣ"] = "sˤ", ["ḍ"] = "dˤ", ["ṭ"] = "tˤ", ["ẓ"] = "ðˤ", ["ž"] = "ʒ", ["ʿ"] = "ʕ", ["ḡ"] = "ɣ", ["ḷ"] = "lˤ", ["ū"] = "uː", ["ī"] = "iː", ["ā"] = "aː", ["y"] = "j", ["g"] = "ɡ", ["ē"] = "eː", ["ō"] = "oː", [""] = "", } local vowels = "aāeēiīoōuū" local vowel = "[" .. vowels .. "]" local long_vowels = "āēīōū" local long_vowel = "[" .. long_vowels .. "]" local consonant = "[^" .. vowels .. ". -]" local syllabify_pattern = "(" .. vowel .. ")(" .. consonant .. "?)(" .. consonant .. "?)(" .. vowel .. ")" local tie = "‿" local closed_syllable_shortening_pattern = "(" .. long_vowel .. ")(" .. tie .. ")" .. "(" .. consonant .. 
")" local function rsub(term, foo, bar) local retval = ugsub(term, foo, bar) return retval end local function generate_obj(respelling) return { respelling = respelling } end local function combine_qualifiers(qual1, qual2) if not qual1 then return qual2 end if not qual2 then return qual1 end local qualifiers = m_table.deepCopy(qual1) for _, qual in ipairs(qual2) do m_table.insertIfNot(qualifiers, qual) end return qualifiers end local function split_on_comma(term) if not term then return nil end if term:find(",%s") or term:find("\\") then return require(parse_utilities_module).split_on_comma(term) else return rsplit(term, ",") end end local function parse_respellings_with_modifiers(respelling, paramname) if respelling:find("[<%[]") then local put = require(parse_utilities_module) local segments = put.parse_multi_delimiter_balanced_segment_run(respelling, { { "<", ">" }, { "[", "]" } }) local comma_separated_groups = put.split_alternating_runs_on_comma(segments) local retval = {} for _, group in ipairs(comma_separated_groups) do local j = 2 while j <= #group do if not group[j]:find("^<.*>$") then group[j - 1] = group[j - 1] .. group[j] .. 
group[j + 1] table.remove(group, j) table.remove(group, j) else j = j + 2 end end local param_mods = { q = { type = "qualifier" }, qq = { type = "qualifier" }, a = { type = "labels" }, aa = { type = "labels" }, ref = { item_dest = "refs", type = "references" }, } table.insert(retval, put.parse_inline_modifiers_from_segments { group = group, arg = respelling, props = { paramname = paramname, param_mods = param_mods, generate_obj = generate_obj, }, }) end return retval else local retval = {} for _, item in ipairs(split_on_comma(respelling)) do table.insert(retval, generate_obj(item)) end return retval end end local function parse_pron_modifier(arg, paramname, generate_obj, param_mods, splitchar) splitchar = splitchar or "," if arg:find("<") then param_mods.q = { type = "qualifier" } param_mods.qq = { type = "qualifier" } param_mods.a = { type = "labels" } param_mods.aa = { type = "labels" } param_mods.ref = { item_dest = "refs", type = "references" } return require(parse_utilities_module).parse_inline_modifiers(arg, { param_mods = param_mods, generate_obj = generate_obj, paramname = paramname, splitchar = splitchar, }) else local retval = {} local split_arg = splitchar == "," and split_on_comma(arg) or rsplit(arg, splitchar) for _, term in ipairs(split_arg) do table.insert(retval, generate_obj(term)) end return retval end end local function parse_audio(lang, arg, pagename, paramname) local param_mods = { IPA = { sublist = true }, text = {}, t = { item_dest = "gloss" }, gloss = {}, pos = {}, lit = {}, g = { item_dest = "genders", sublist = true }, bad = {}, cap = { item_dest = "caption" }, } local function process_special_chars(val) if not val then return val end return (val:gsub("#", pagename)) end local function generate_audio_obj(arg) return { file = process_special_chars(arg) } end local retvals = parse_pron_modifier(arg, paramname, generate_audio_obj, param_mods, "%s*;%s*") for _, retval in ipairs(retvals) do retval.lang = lang retval.text = 
process_special_chars(retval.text) retval.caption = process_special_chars(retval.caption) local textobj = require(audio_module).construct_audio_textobj(retval) retval.text = textobj retval.gloss = nil retval.pos = nil retval.lit = nil retval.genders = nil end return retvals end local function parse_regional_phonetics(ph_arg, pagename) if not ph_arg or ph_arg == "" then return {} end local regionals = {} for _, item in ipairs(rsplit(ph_arg, "%s*;%s*")) do local audio = nil local item_no_mod = item:gsub("<a:([^>]+)>", function(a) audio = a:gsub("#", pagename) return "" end) local region, ipa = item_no_mod:match("^([^:]+):(.+)$") if region and ipa then local regions = rsplit(region, "%s*,%s*") table.insert(regionals, { regions = regions, ipa = ipa, audio = audio }) end end return regionals end local function syllabify(text) text = ugsub(text, "%-(" .. consonant .. ")%-(" .. consonant .. ")", "%1.%2") text = ugsub(text, "%-", ".") for _ = 1, 2 do text = ugsub( text, syllabify_pattern, function(a, b, c, d) if c == "" and b ~= "" then c, b = b, "" end return a .. b .. "." .. c .. d end ) end text = ugsub(text, "(" .. vowel .. ") (" .. consonant .. ")%.?(" .. consonant .. ")", "%1" .. tie .. "%2.%3") return text end local function closed_syllable_shortening(text) local shorten = { ["ā"] = "a", ["ē"] = "e", ["ī"] = "i", ["ō"] = "o", ["ū"] = "u", } text = ugsub(text, closed_syllable_shortening_pattern, function(vowel, tie, consonant) return shorten[vowel] .. tie .. consonant end) return text end function export.link(term) return require("Module:links").full_link { term = term, lang = lang, sc = sc } end function export.toIPA(list, silent_error) local translit if list.tr then translit = list.tr elseif list.term then require("Module:script utilities").checkScript(list.term, "Arab") translit = lang:transliterate(list.term) if not translit then if silent_error then return '' else error('Module:ar-translit failed to generate a transliteration from "' .. list.term .. 
'".') end end else if silent_error then return '' else error('No Arabic text or transliteration was provided to the function "toIPA".') end end translit = ugsub(translit, "llāh", "ḷḷāh") translit = ugsub(translit, "([iī] ?)ḷḷ", "%1ll") translit = ugsub(translit, "%(t%)", "") translit = ugsub(translit, "(" .. vowel .. ") " .. vowel, "%1 ") translit = ugsub(translit, "%-?l%-?", "l") translit = syllabify(translit) translit = closed_syllable_shortening(translit) local output = ugsub(translit, ".", correspondences) output = ugsub(output, "%-", "") return output end function export.get_pron_info(terms, pagename, paramname) if #terms == 1 and terms[1].respelling == "-" then return { pron_list = nil } end local pron_list = {} local brackets = "/%s/" for _, term in ipairs(terms) do local respelling = term.respelling local ar_term, tr if not respelling or respelling == "" or respelling == "#" then ar_term = pagename elseif rfind(respelling, "[a-zA-Z]") then tr = respelling elseif respelling:find("[ء-ي]") then ar_term = respelling else tr = respelling end local pron = export.toIPA({ term = ar_term, tr = tr }, false) if pron and pron ~= "" then local bracketed_pron = brackets:format(pron) table.insert(pron_list, { pron = bracketed_pron, q = term.q, qq = term.qq, a = term.a, aa = term.aa, refs = term.refs, }) end end return { pron_list = pron_list } end function export.show_old(frame) local params = { [1] = { list = true, allow_holes = true }, ["tr"] = { list = true, allow_holes = true }, ["qual"] = { list = true, allow_holes = true }, ["nl"] = { type = "boolean" }, ["ann"] = {}, } local args = require("Module:parameters").process(frame:getParent().args, params) local ar_terms = args[1] local transliterations = args.tr local qualifiers = args.qual local nl = args.nl if not (ar_terms.maxindex > 0 or transliterations.maxindex > 0) then if mw.title.getCurrentTitle().nsText == "Template" then ar_terms[1] = "كَلِمَة" ar_terms.maxindex = 1 else error( 'Please provide vocalized Arabic 
in the first parameter of {{[[Template:ar-IPA|ar-IPA]]}}, or transliteration in the "tr" parameter.') end end local pronunciations = {} for i = 1, math.max(ar_terms.maxindex, transliterations.maxindex) do local ar_term = ar_terms[i] local tr = transliterations[i] local qual = qualifiers[i] if not (ar_term or tr) then error("There is a gap in the parameters. Provide either |" .. i .. "= or |tr" .. i .. "=.") elseif ar_term and tr then mw.logObject("Duplicate parameters |" .. i .. "= and |tr" .. i .. "= in {{ar-IPA}},") end local pron = export.toIPA { term = ar_term, tr = tr } table.insert(pronunciations, { pron = "/" .. pron .. "/", qualifiers = qual and { qual } or nil }) end local anntext = "" if args.ann then anntext = args.ann if args.ann:find("%+") then local anndefs = {} for i = 1, ar_terms.maxindex do local ar_term = ar_terms[i] if ar_term then table.insert(anndefs, "'''" .. ar_term .. "'''") end end if anndefs[1] then anndefs = table.concat(anndefs, ", ") anntext = anntext:gsub("%+", require("Module:string utilities").replacement_escape(anndefs)) end end anntext = require("Module:qualifier").format_qualifier(anntext, "", "") .. ":&#32;" end if nl then return anntext .. require("Module:IPA").format_IPA_multiple(lang, pronunciations) else return anntext .. 
require("Module:IPA").format_IPA_full { lang = lang, items = pronunciations } end end function export.show(frame) local parent_args = frame:getParent().args local process = require("Module:parameters").process local params = { [1] = {}, ["audios"] = {}, ["a"] = { alias_of = "audios" }, ["ph"] = {}, ["pagename"] = {}, ["indent"] = {}, ["ann"] = {}, } local args = process(parent_args, params) local pagename = args.pagename or mw.loadData("Module:headword/data").pagename local indent = args.indent or "*" local termspec = args[1] or "#" local terms = parse_respellings_with_modifiers(termspec, 1) local pronobj = export.get_pron_info(terms, pagename, 1) local regional_phonetics = parse_regional_phonetics(args.ph, pagename) local parts = {} local function ins(text) table.insert(parts, text) end local anntext = "" if args.ann then anntext = args.ann if args.ann:find("%+") then local anndefs = {} for _, term in ipairs(terms) do local respelling = term.respelling if respelling and respelling:find("[ء-ي]") then table.insert(anndefs, "'''" .. respelling .. "'''") end end if anndefs[1] then anndefs = table.concat(anndefs, ", ") anntext = anntext:gsub("%+", require("Module:string utilities").replacement_escape(anndefs)) end end anntext = require("Module:qualifier").format_qualifier(anntext, "", "") .. ":&#32;" end if pronobj.pron_list and #pronobj.pron_list > 0 then local formatted = require("Module:IPA").format_IPA_full { lang = lang, items = pronobj.pron_list } ins(indent .. anntext .. mw.ustring.toNFC(formatted)) end if args.audios then local format_audio = require("Module:audio").format_audio local audio_objs = parse_audio(lang, args.audios, pagename, "audios") for i, audio_obj in ipairs(audio_objs) do if #audio_objs > 1 and not audio_obj.caption then audio_obj.caption = "Audio " .. i end ins("\n" .. indent .. " " .. 
format_audio(audio_obj)) end end if #regional_phonetics > 0 then local m_IPA = require("Module:IPA") local m_accent = require("Module:accent qualifier") for _, regional in ipairs(regional_phonetics) do local regions = regional.regions local ipa = regional.ipa local pron_item = { pron = "[" .. ipa .. "]" } local formatted_ipa = m_IPA.format_IPA_full { lang = lang, items = { pron_item } } local formatted_region = m_accent.format_qualifiers(lang, regions) local line = "\n" .. indent .. indent .. " " .. formatted_region .. " " .. mw.ustring.toNFC(formatted_ipa) if regional.audio then local audio_obj = { lang = lang, file = regional.audio, } local textobj = require(audio_module).construct_audio_textobj(audio_obj) audio_obj.text = textobj line = line .. " " .. require("Module:audio").format_audio(audio_obj) end ins(line) end end return concat(parts) end return export Module:ar-nominals 828 8170 27705 2026-06-21T14:58:09Z Umarxon III 2840 Created page 27705 Scribunto text/plain -- Author: Benwing, based on early version by CodeCat. 
--[[ FIXME: Nouns/adjectives to create to exemplify complex declensions: -- riḍan (رِضًا or رِضًى) --]] local m_utilities = require("Module:utilities") local m_links = require("Module:links") local ar_utilities = require("Module:ar-utilities") local lang = require("Module:languages").getByCode("ar") local u = require("Module:string/char") local rfind = mw.ustring.find local rsubn = mw.ustring.gsub local rmatch = mw.ustring.match local rsplit = mw.text.split -- This is used in place of a transliteration when no manual -- translit is specified and we're unable to automatically generate -- one (typically because some vowel diacritics are missing). local BOGUS_CHAR = u(0xFFFD) -- hamza variants local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء local HAMZA_ON_ALIF = u(0x0623) local HAMZA_ON_W = u(0x0624) local HAMZA_UNDER_ALIF = u(0x0625) local HAMZA_ON_Y = u(0x0626) local HAMZA_ANY = "[" .. HAMZA .. HAMZA_ON_ALIF .. HAMZA_UNDER_ALIF .. HAMZA_ON_W .. HAMZA_ON_Y .. "]" local HAMZA_PH = u(0xFFF0) -- hamza placeholder -- various letters local ALIF = u(0x0627) -- ʾalif = ا local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى local AMAD = u(0x0622) -- ʾalif madda = آ local TAM = u(0x0629) -- tāʾ marbūṭa = ة local T = u(0x062A) -- tāʾ = ت local HYPHEN = u(0x0640) local N = u(0x0646) -- nūn = ن local W = u(0x0648) -- wāw = و local Y = u(0x064A) -- yā = ي -- diacritics local A = u(0x064E) -- fatḥa local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn) local U = u(0x064F) -- ḍamma local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn) local I = u(0x0650) -- kasra local IN = u(0x064D) -- kasratān (kasra tanwīn) local SK = u(0x0652) -- sukūn = no vowel local SH = u(0x0651) -- šadda = gemination of consonants local DAGGER_ALIF = u(0x0670) local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. "]" -- common combinations local NA = N .. A local NI = N .. I local AH = A .. TAM local AT = A .. T local AA = A .. ALIF local AAMAQ = A .. 
AMAQ local AAH = AA .. TAM local AAT = AA .. T local II = I .. Y local IIN = II .. N local IINA = II .. NA local IY = II local UU = U .. W local UUN = UU .. N local UUNA = UU .. NA local AY = A .. Y local AW = A .. W local AYSK = AY .. SK local AWSK = AW .. SK local AAN = AA .. N local AANI = AA .. NI local AYN = AYSK .. N local AYNI = AYSK .. NI local AWN = AWSK .. N local AWNA = AWSK .. NA local AYNA = AYSK .. NA local AYAAT = AY .. AAT local UNU = "[" .. UN .. U .. "]" -- optional diacritics/letters local AOPT = A .. "?" local AOPTA = A .. "?" .. ALIF local IOPT = I .. "?" local UOPT = U .. "?" local UNOPT = UN .. "?" local UNUOPT = UNU .. "?" local SKOPT = SK .. "?" -- lists of consonants -- exclude tāʾ marbūṭa because we don't want it treated as a consonant -- in patterns like أَفْعَل local consonants_needing_vowels_no_tam = "بتثجحخدذرزسشصضطظعغفقكلمنهپچڤگڨڧأإؤئء" -- consonants on the right side; includes alif madda local rconsonants_no_tam = consonants_needing_vowels_no_tam .. "ويآ" -- consonants on the left side; does not include alif madda local lconsonants_no_tam = consonants_needing_vowels_no_tam .. "وي" local CONS = "[" .. lconsonants_no_tam .. "]" local CONSPAR = "([" .. lconsonants_no_tam .. "])" local LRM = u(0x200E) --left-to-right mark -- First syllable or so of elative/color-defect adjective local ELCD_START = "^" .. HAMZA_ON_ALIF .. AOPT .. CONSPAR local export = {} -------------------- -- Utility functions -------------------- function ine(x) -- If Not Empty if x == nil then return nil elseif rfind(x, '^".*"$') then local ret = rmatch(x, '^"(.*)"$') return ret elseif rfind(x, "^'.*'$") then local ret = rmatch(x, "^'(.*)'$") return ret elseif x == "" then return nil else return x end end -- Compare two items, recursively comparing arrays. -- FIXME, doesn't work for tables that aren't arrays. 
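-- Illustrative cases: equals({1, {2}}, {1, {2}}) is true and
-- equals({1, 2}, {1, 3}) is false. The FIXME concerns tables with
-- non-array keys: equals({a = 1}, {a = 2}) is also true, because both
-- lengths are 0 and ipairs() visits no entries.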
function equals(x, y) if type(x) == "table" and type(y) == "table" then if #x ~= #y then return false end for key, value in ipairs(x) do if not equals(value, y[key]) then return false end end return true end return x == y end -- true if array contains item function contains(tab, item) for _, value in pairs(tab) do if equals(value, item) then return true end end return false end -- append to array if element not already present function insert_if_not(tab, item) if not contains(tab, item) then table.insert(tab, item) end end -- version of rsubn() that discards all but the first return value function rsub(term, foo, bar) local retval = rsubn(term, foo, bar) return retval end -- version of rsub() that asserts that a match occurred function assert_rsub(term, foo, bar) local retval, numsub = rsubn(term, foo, bar) assert(numsub > 0) return retval end function make_link(arabic) --return m_links.full_link(nil, arabic, lang, nil, "term", nil, {tr = "-"}, false) return m_links.full_link({lang = lang, alt = arabic}, "term") end function track(page) require("Module:debug").track("ar-nominals/" .. page) return true end ------------------------------------- -- Functions for building inflections ------------------------------------- -- Functions that do the actual inflecting by creating the forms of a basic term. local inflections = {} local max_mods = 9 -- maximum number of modifiers local mod_list = {"mod"} -- list of "mod", "mod2", "mod3", ... for i=2,max_mods do table.insert(mod_list, "mod" .. i) end -- Create and return the 'data' structure that will hold all of the -- generated declensional forms, as well as other ancillary information -- such as the possible numbers, genders and cases and the actual numbers -- and states to store (in 'data.numbers' and 'data.states' respectively). function init_data() -- FORMS contains a table of forms for each inflectional category, -- e.g. "nom_sg_ind" for nouns or "nom_m_sg_ind" for adjectives. 
The value -- of an entry is an array of alternatives (e.g. different plurals), where -- each alternative is either a string of the form "ARABIC" or -- "ARABIC/TRANSLIT", or an array of such strings (this is used for -- alternative spellings involving different hamza seats, -- e.g. مُبْتَدَؤُون or مُبْتَدَأُون). Alternative hamza spellings are separated -- in display by an "inner separator" (/), while alternatives on -- the level of different plurals are separated by an "outer separator" (;). return {forms = {}, title = nil, categories = {}, allgenders = {"m", "f"}, allstates = {"ind", "def", "con"}, allnumbers = {"sg", "du", "pl"}, states = {}, -- initialized later numbers = {}, -- initialized later engnumbers = {sg="singular", du="dual", pl="plural", coll="collective", sing="singulative", pauc="paucal"}, engnumberscap = {sg="singular", du="dual", pl="plural", coll="collective", sing="singulative", pauc="paucal (3-10)"}, allcases = {"nom", "acc", "gen", "inf"}, allcases_with_lemma = {"nom", "acc", "gen", "inf", "lemma"}, -- index into endings array indicating correct ending for given -- combination of state and case statecases = { ind = {nom = 1, acc = 2, gen = 3, inf = 10, lemma = 13}, def = {nom = 4, acc = 5, gen = 6, inf = 11, lemma = 14}, -- used for a definite adjective modifying a construct-state noun defcon = {nom = 4, acc = 5, gen = 6, inf = 11, lemma = 14}, con = {nom = 7, acc = 8, gen = 9, inf = 12, lemma = 15}, }, } end -- Initialize and return ARGS, ORIGARGS and DATA (see init_data()). -- ARGS is a table of user-supplied arguments, massaged from the original -- arguments by converting empty-string arguments to nil and appending -- translit arguments to their base arguments with a separating slash. -- ORIGARGS is the original table of arguments. function init(origargs) -- Massage arguments by converting empty arguments to nil, and -- "" or '' arguments to empty. 
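-- Illustrative cases, following ine() above: ine("") returns nil,
-- ine('""') and ine("''") return the empty string, and ine("foo")
-- returns "foo" unchanged.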
local args = {} for k, v in pairs(origargs) do args[k] = ine(v) end -- Further massage arguments by appending translit arguments to the -- corresponding base arguments, with a slash separator, as is expected -- in the rest of the code. -- -- FIXME: We should consider separating translit and base arguments by the -- separators ; , | (used in overrides; see handle_lemma_and_overrides()) -- and matching up individual parts, to allow separate translit arguments -- to be specified for overrides. But maybe not; the point of allowing -- separate translit arguments is for compatibility with headword -- templates such as "ar-noun" and "ar-adj", and those templates don't -- handle override arguments. local function dotr(arg, argtr) if not args[arg] then error("Argument '" .. argtr .."' specified but not corresponding base argument '" .. arg .. "'") end args[arg] = args[arg] .. "/" .. args[argtr] end -- By convention, corresponding to arg 1 is tr; corresponding to -- head2, head3, ... is tr2, tr3, ...; corresponding to -- modhead2, modhead3, ... is modtr2, modtr3, ...; corresponding to -- modNhead2, modNhead3, ... is modNtr2, modNtr3, ..; corresponding to -- all other arguments FOO, FOO2, ... is FOOtr, FOO2tr, ... for k, v in pairs(args) do if k == "tr" then dotr(1, "tr") elseif rfind(k, "tr[0-9]+$") then dotr(assert_rsub(k, "tr([0-9]+)$", "head%1"), k) elseif rfind(k, "tr$") then dotr(assert_rsub(k, "tr$", ""), k) end end -- Construct data. local data = init_data() return args, origargs, data end -- Parse the user-specified state spec and other related arguments. The -- user can specify, using idafaN=, how modifiers are related to previous -- words. The user can also manually specify which states are to appear; -- whether to omit the definite article in the definite state; and -- how/whether to restrict modifiers to a particular state, case or number. -- Normally the modN_* parameters and basestate= do not need to be set -- directly; instead, use idafaN=. 
It may be necessary to explicitly -- specify state= in the presence of proper nouns or definite-only -- adjectival expressions. NOTE: At the time this function is called, -- data.numbers has not yet been initialized. function parse_state_etc_spec(data, args) local function check(arg, dataval, allvalues) if args[arg] then if not contains(allvalues, args[arg]) then error("For " .. arg .. "=, value '" .. args[arg] .. "' should be one of " .. table.concat(allvalues, ", ")) end data[dataval] = args[arg] end end local function check_boolean(arg, dataval) check(arg, dataval, {"yes", "no"}) if data[dataval] == "yes" then data[dataval] = true elseif data[dataval] == "no" then data[dataval] = false end end -- Make sure no holes in mod values for i=1,(#mod_list)-1 do if args[mod_list[i+1]] and not args[mod_list[i]] then error("Hole in modifier arguments -- " .. mod_list[i+1] .. " present but not " .. mod_list[i]) end end -- FIXME! Remove this once we're sure there are no instances of mod2 -- that haven't been converted to modhead2. if args["mod2"] then track("mod2") end -- Set default value; may be overridden e.g. by arg["state"] or -- by idafaN=. data.states = data.allstates -- List of pairs of idafaN/modN parameters local idafa_mod_list = {{"idafa", "mod"}} for i=2,max_mods do table.insert(idafa_mod_list, {"idafa" .. i, "mod" .. i}) end -- True if the value of an |idafa= param is a valid adjectival modifier -- value. local function valid_adjectival_idafaval(idafaval) return idafaval == "adj" or idafaval == "adj-base" or idafaval == "adj-mod" or rfind(idafaval, "^adj%-mod[0-9]+$") end -- Extract the referent (base or modifier) of an adjectival |idafa= param. -- Assumes the value is valid. local function adjectival_idafaval_referent(idafaval) if idafaval == "adj" then return "base" end return assert_rsub(idafaval, "^adj%-", "") end -- Convert a base/mod spec to an index: 0=base, 1=mod, 2=mod2, etc. 
local function basemod_to_index(basemod) if basemod == "base" then return 0 end if basemod == "mod" then return 1 end return tonumber(assert_rsub(basemod, "^mod", "")) end -- Recognize idafa spec and handle it. -- We do the following: -- (1) Check that if idafaN= is given, then modN= is also given. -- (2) Check that adjectival modifiers aren't followed by idafa modifiers. -- (3) Check that adjectival modifiers are modifying the base or an -- ʾidāfa modifier, not another adjectival modifier. -- (4) Support idafa values "adj-base", "adj-mod", "adj-mod2", "adj" -- (="adj-base") etc. and check that we're referring to an earlier -- word. -- (5) For ʾidāfa modifiers, set basestate=con, set modN_case=gen, -- set modN_idafa=true, and set modN_number to the number specified -- in the parameter value (e.g. 'sg' or 'def-pl'); and if the -- parameter value specifies a state (e.g. 'def' or 'ind-du'), -- set modN_state= to this value, and if this is the last ʾidāfa -- modifier, also set state= to this value; if this is not the last -- ʾidāfa modifier, set modN_state=con and disallow a state to be -- specified in the parameter value. -- (6) For adjectival modifiers of the base, do nothing. -- (7) For adjectival modifiers of ʾidāfa modifiers, set modN_case=gen; -- set modN_idafa=false; and set modN_number=, modN_numgen= and -- modN_state= to match the values of the idafa modifier. -- error checking and find last ʾidāfa modifier local last_is_idafa = true local last_idafa_mod = "base" for _, idafa_mod in ipairs(idafa_mod_list) do local idafaparam = idafa_mod[1] local mod = idafa_mod[2] local idafaval = args[idafaparam] if idafaval then local paramval = idafaparam .. "=" .. idafaval if not args[mod] then error("'" .. idafaparam .. "' parameter without corresponding '" .. mod .. "' parameter") end if not valid_adjectival_idafaval(idafaval) then -- We're a construct (ʾidāfa) modifier if not last_is_idafa then error("ʾidāfa modifier " .. paramval .. 
" follows adjectival modifier") end last_idafa_mod = mod else last_is_idafa = false local adjref = adjectival_idafaval_referent(idafaval) if adjref ~= "base" then if basemod_to_index(adjref) >= basemod_to_index(mod) then error(paramval .. " can only refer to an earlier element") end local idafaref = assert_rsub(adjref, "^mod", "idafa") if not args[idafaref] then error(paramval .. " cannot refer to a missing modifier") elseif valid_adjectival_idafaval(args[idafaref]) then error(paramval .. " cannot refer to an adjectival modifier") end end end end end -- Now go through and set all the modN_ data values appropriately. for _, idafa_mod in ipairs(idafa_mod_list) do local idafaparam = idafa_mod[1] local mod = idafa_mod[2] local idafaval = args[idafaparam] if idafaval then local paramval = idafaparam .. "=" .. idafaval local bad_idafa = true if idafaval == "yes" then idafaval = "sg" end if idafaval == "ind-def" or contains(data.allstates, idafaval) then idafaval = idafaval .. "-sg" end if not idafaval then bad_idafa = false elseif valid_adjectival_idafaval(idafaval) then local adjref = adjectival_idafaval_referent(idafaval) if adjref ~= "base" then data[mod .. "_case"] = "gen" data[mod .. "_state"] = data[adjref .. "_state"] -- if agreement is with ind-def, make it def if data[mod .. "_state"] == "ind-def" then data[mod .. "_state"] = "def" end data[mod .. "_number"] = data[adjref .. "_number"] data[mod .. "_numgen"] = data[adjref .. "_numgen"] data[mod .. "_idafa"] = false end bad_idafa = false elseif contains(data.allnumbers, idafaval) then data.basestate = "con" data[mod .. "_case"] = "gen" data[mod .. "_number"] = idafaval data[mod .. "_idafa"] = true if mod ~= last_idafa_mod then data[mod .. "_state"] = "con" end bad_idafa = false elseif rfind(idafaval, "%-") then local state_num = rsplit(idafaval, "%-") -- Support ind-def as a possible value. 
We set modstate to -- ind-def, which will signal definite agreement with adjectival -- modifiers; then later on we change the value to ind. if #state_num == 3 and state_num[1] == "ind" and state_num[2] == "def" then state_num[1] = "ind-def" state_num[2] = state_num[3] table.remove(state_num) end if #state_num == 2 then local state = state_num[1] local num = state_num[2] if (state == "ind-def" or contains(data.allstates, state)) and contains(data.allnumbers, num) then if mod == last_idafa_mod then if state == "ind-def" then data.states = {"def"} else data.states = {state} end else error(paramval .. " cannot specify a state because it is not the last ʾidāfa modifier") end data.basestate = "con" data[mod .. "_case"] = "gen" data[mod .. "_state"] = state data[mod .. "_number"] = num data[mod .. "_idafa"] = true bad_idafa = false end end end if bad_idafa then error(paramval .. " should be one of yes, def, sg, def-sg, adj, adj-base, adj-mod, adj-mod2 or similar") end end end if args["state"] == "ind-def" then data.states = {"def"} data.basestate = "ind" elseif args["state"] then data.states = rsplit(args["state"], ",") for _, state in ipairs(data.states) do if not contains(data.allstates, state) then error("For state=, value '" .. state .. "' should be one of " .. table.concat(data.allstates, ", ")) end end end -- Now process explicit settings, so that they can override the -- settings based on idafaN=. check("basestate", "basestate", data.allstates) check_boolean("noirreg", "noirreg") check_boolean("omitarticle", "omitarticle") data.prefix = args.prefix for _, mod in ipairs(mod_list) do check(mod .. "state", mod .. "_state", data.allstates) check(mod .. "case", mod .. "_case", data.allcases) check(mod .. "number", mod .. "_number", data.allnumgens) check(mod .. "numgen", mod .. "_numgen", data.allnumgens) check_boolean(mod .. "idafa", mod .. "_idafa") check_boolean(mod .. "omitarticle", mod .. "_omitarticle") data[mod .. "_prefix"] = args[mod .. 
"prefix"] end -- Make sure modN_numgen is initialized, to modN_number if necessary. -- This simplifies logic in certain places, e.g. call_inflections(). -- Also convert ind-def to ind. for _, mod in ipairs(mod_list) do data[mod .. "_numgen"] = data[mod .. "_numgen"] or data[mod .. "_number"] if data[mod .. "_state"] == "ind-def" then data[mod.. "_state"] = "ind" end end end -- Parse the user-specified number spec. The user can manually specify which -- numbers are to appear. Return true if |number= was specified. function parse_number_spec(data, args) if args["number"] then data.numbers = rsplit(args["number"], ",") for _, num in ipairs(data.numbers) do if not contains(data.allnumbers, num) then error("For number=, value '" .. num .. "' should be one of " .. table.concat(data.allnumbers, ", ")) end end return true else data.numbers = data.allnumbers return false end end -- Determine which numbers will appear using the logic for nouns. -- See comment just below. function determine_noun_numbers(data, args, pls) -- Can manually specify which numbers are to appear, and exactly those -- numbers will appear. Otherwise, if any plurals given, duals and plurals -- appear; else, only singular (on the assumption that the word is a proper -- noun or abstract noun that exists only in the singular); however, -- singular won't appear if "-" given for singular, and similarly for dual. if not parse_number_spec(data, args) then data.numbers = {} local sgarg1 = args[1] local duarg1 = args["d"] if sgarg1 ~= "-" then table.insert(data.numbers, "sg") end if #pls["base"] > 0 then -- Dual appears if either: explicit dual stem (not -) is given, or -- default dual is used and explicit singular stem (not -) is given. if (duarg1 and duarg1 ~= "-") or (not duarg1 and sgarg1 ~= "-") then table.insert(data.numbers, "du") end table.insert(data.numbers, "pl") elseif duarg1 and duarg1 ~= "-" then -- If explicit dual but no plural given, include it. Useful for -- dual tantum words. 
table.insert(data.numbers, "du") end end end -- For stem STEM, convert to stem-and-type format and insert stem and type -- into RESULTS, checking to make sure it's not already there. SGS is the -- list of singular items to base derived forms off of (masculine or feminine -- as appropriate), an array of length-two arrays of {COMBINED_STEM, TYPE} as -- returned by stem_and_type(); ISFEM is true if this is feminine gender; -- NUM is "sg", "du" or "pl". POS is the part of speech, generally "noun" or -- "adjective". function insert_stems(stem, results, sgs, isfem, num, pos) if stem == "-" then return end for _, sg in ipairs(sgs) do local combined_stem, ty = export.stem_and_type(stem, sg[1], sg[2], isfem, num, pos) insert_if_not(results, {combined_stem, ty}) end end -- Handle manually specified overrides of individual forms. Separate -- outer-level alternants with ; or , or the Arabic equivalents; separate -- inner-level alternants with | (we can't use / because it's already in -- use separating Arabic from translit). -- -- Also determine lemma and allow it to be overridden. -- Also allow POS (part of speech) to be overridden. function handle_lemma_and_overrides(data, args) local function handle_override(arg) if args[arg] then local ovval = {} local alts1 = rsplit(args[arg], "[;,؛،]") for _, alt1 in ipairs(alts1) do local alts2 = rsplit(alt1, "|") table.insert(ovval, alts2) end data.forms[arg] = ovval end end local function do_overrides(mod) for _, numgen in ipairs(data.allnumgens) do for _, state in ipairs(data.allstates) do for _, case in ipairs(data.allcases) do local arg = mod .. case .. "_" .. numgen .. "_" .. state handle_override(arg) if args[arg] and not data.noirreg then insert_cat(data, mod, numgen, "Arabic NOUNs with irregular SINGULAR", "SINGULAR of irregular NOUN") end end end end end do_overrides("") for _, mod in ipairs(mod_list) do do_overrides(mod .. 
"_") end local function get_lemma(mod) for _, numgen in ipairs(data.numgens()) do for _, state in ipairs(data.states) do local arg = mod .. "lemma_" .. numgen .. "_" .. state if data.forms[arg] and #data.forms[arg] > 0 then return data.forms[arg] end end end return nil end data.forms["lemma"] = get_lemma("") for _, mod in ipairs(mod_list) do data.forms[mod .. "_lemma"] = get_lemma(mod .. "_") end handle_override("lemma") for _, mod in ipairs(mod_list) do handle_override(mod .. "_lemma") end end -- Return the part of speech based on the part of speech contained in -- data.pos and MOD (either "", "mod_", "mod2_", etc., same as in -- do_gender_number_1()). If we're a modifier, don't use data.pos but -- instead choose based on whether modifier is adjectival or nominal -- (ʾiḍāfa). function get_pos(data, mod) local ismod = mod ~= "" if not ismod then return data.pos elseif data[mod .. "idafa"] then return "noun" else return "adjective" end end -- Find the stems associated with a particular gender/number combination. -- ARGS is the set of all arguments. ARGPREFS is an array of argument prefixes -- (e.g. "f" for the actual arguments "f", "f2", ..., for the feminine -- singular; we allow more than one to handle "cpl"). SGS is a -- "stem-type list" (see do_gender_number()), and is the list of stems to -- base derived forms off of (masculine or feminine as appropriate), an array -- of length-two arrays of {COMBINED_STEM, TYPE} as returned by -- stem_and_type(). DEFAULT, ISFEM and NUM are as in do_gender_number(). -- MOD is either "", "mod_", "mod2_", etc. depending if we're working on a -- base or modifier argument (in the latter case, basically if the argument -- begins with "mod"). function do_gender_number_1(data, args, argprefs, sgs, default, isfem, num, mod) local results = {} local function handle_stem(stem) insert_stems(stem, results, sgs, isfem, num, get_pos(data, mod)) end -- If no arguments specified, use the default instead. 
local need_default = true for _, argpref in ipairs(argprefs) do if args[argpref] then need_default = false break end end if need_default then if not default then return results end handle_stem(default) return results end -- For explicitly specified arguments, make sure there's at least one -- stem to generate off of; otherwise specifying e.g. 'sing=- pauc=فُلَان' -- won't override paucal. if #sgs == 0 then sgs = {{"", ""}} end for _, argpref in ipairs(argprefs) do if args[argpref] then handle_stem(args[argpref]) end local i = 2 while args[argpref .. i] do handle_stem(args[argpref .. i]) i = i + 1 end end return results end -- For a given gender/number combination, parse and return the full set -- of stems for both base and modifier. The return value is a -- "stem specification", i.e. table with a "base" key for the base, a -- "mod" key for the first modifier (see below), a "mod2" key for the -- second modifier, etc. listing all stems for both the base and modifier(s). -- The value of each key is a "stem-type list", i.e. an array of stem-type -- pairs, where each element is a size-two array of {COMBINED_STEM, STEM_TYPE}. -- COMBINED_STEM is a stem with attached transliteration in the form -- STEM/TRANSLIT (where the transliteration is either manually specified in -- the stem argument, e.g. 'pl=لُورْدَات/lordāt', or auto-transliterated from -- the Arabic, with BOGUS_CHAR substituting for the transliteration if -- auto-translit fails). STEM_TYPE is the declension of the stem, either -- manually specified, e.g. 'بَبَّغَاء:di' for manually-specified diptote, or -- auto-detected (see stem_and_type() and detect_type()). -- -- DATA and ARGS are as in init(). ARGPREFS is an array of the prefixes for -- the argument(s) specifying the stem (and optional translit and declension -- type). For a given ARGPREF, we check ARGPREF, ARGPREF2, ARGPREF3, ... in -- turn for the base, and modARGPREF, modARGPREF2, modARGPREF3, ... 
in turn -- for the first modifier, and mod2ARGPREF, mod2ARGPREF2, mod2ARGPREF3, ... -- for the second modifier, etc. SGS is a stem specification (see above), -- giving the stems that are used to base derived forms off of (e.g. if a stem -- type "smp" appears in place of a stem, the sound masculine plural of the -- stems in SGS will be derived). DEFAULT is a single stem (i.e. a string) that -- is used when no stems were explicitly given by the user (typically either -- "f", "m", "d" or "p"), or nil for no default. ISFEM is true if we're -- accumulating stems for a feminine number/gender category, and NUM is the -- number (expected to be "sg", "du" or "pl") of the number/gender category -- we're accumulating stems for. -- -- About bases and modifiers: Note that e.g. in the noun phrase يَوْم الاِثْنَيْن -- the head noun يَوْم is the base and the noun الاِثْنَيْن is the modifier. -- In a noun phrase like البَحْر الأَبْيَض المُتَوَسِّط, there are two modifiers. -- Note that modifiers come in two varieties, adjectival modifiers and -- construct (ʾidāfa) modifiers. The first above noun phrase is an example -- of a noun phrase with a construct modifier, where the base is fixed in -- the construct state and the modifier is fixed in number and case -- (which is always genitive) and possibly in state. The second above noun -- phrase is an example of a noun phrase with two adjectival modifiers. -- A construct modifier is generally a noun, whereas an adjectival modifier -- is an adjective that usually agrees in state, number and case with the -- base noun. (Note that in the case of multiple modifiers, it is possible -- for e.g. the second modifier to be an adjectival modifier that agrees -- with the first, construct, modifier, in which case its case will be fixed -- to genitive, its number will be fixed to the same number as the first -- modifier and its state will vary or not depending on whether the first -- modifier's state varies. 
It is not possible in general to distinguish -- adjectival and construct modifiers by looking at the values of -- modN_state, modN_case or modN_number, since e.g. a third modifier could -- have all of them specified and be either kind. Thus we have modN_idafa, -- which is true for a construct modifier, false otherwise.) function do_gender_number(data, args, argprefs, sgs, default, isfem, num) local results = do_gender_number_1(data, args, argprefs, sgs["base"], default, isfem, num, "") local basemodtable = {base=results} for _, mod in ipairs(mod_list) do local modn_argprefs = {} for _, argpref in ipairs(argprefs) do table.insert(modn_argprefs, mod .. argpref) end local modn_results = do_gender_number_1(data, args, modn_argprefs, sgs[mod] or {}, default, isfem, num, mod .. "_") basemodtable[mod] = modn_results end return basemodtable end -- Generate inflections for the given combined stem and type, for MOD -- (either "" if we're working on the base or "mod_", "mod2_", etc. if we're -- working on a modifier) and NUMGEN (number or number-gender combination, -- of the sort that forms part of the keys in DATA.FORMS). function call_inflection(combined_stem, ty, data, mod, numgen) if ty == "-" then return end if not inflections[ty] then error("Unknown inflection type '" .. ty .. "'") end local ar, tr = split_arabic_tr(combined_stem) inflections[ty](ar, tr, data, mod, numgen) end -- Generate inflections for the stems of a given number/gender combination -- and for either the base or the modifier. STEMTYPES is a stem-type list -- (see do_gender_number()), listing all the stems and corresponding -- declension types. MOD is either "", "mod_", "mod2_", etc. depending on -- whether we're working on the base or a modifier. NUMGEN is the number or -- number-gender combination we're working on, of the sort that forms part -- of the keys in DATA.FORMS, e.g. "sg" or "m_sg". function call_inflections(stemtypes, data, mod, numgen) local mod_with_modnumgen = mod ~= "" and data[mod .. 
"numgen"] -- If modN_numgen= is given, do nothing if NUMGEN isn't the same if mod_with_modnumgen and data[mod .. "numgen"] ~= numgen then return end -- always call inflection() if mod_with_modnumgen since it may affect -- other numbers (cf. يَوْم الاِثْنَيْن) if mod_with_modnumgen or contains(data.numbers, rsub(numgen, "^.*_", "")) then for _, stemtype in ipairs(stemtypes) do call_inflection(stemtype[1], stemtype[2], data, mod, numgen) end end end -- Generate the entire set of inflections for a noun or adjective. -- Also handle any manually-specified part of speech and any manual -- inflection overrides. The value of INFLECTIONS is an array of stem -- specifications, one per number, where each element is a size-two -- array of a stem specification (containing the set of stems and -- corresponding declension types for the base and any modifiers; -- see do_gender_number()) and a NUMGEN string, i.e. a string identifying -- the number or number/gender in question (e.g. "sg", "du", "pl", -- "m_sg", "f_pl", etc.). function do_inflections_and_overrides(data, args, inflections) -- do this before generating inflections so POS change is reflected in -- categories if args["pos"] then data.pos = args["pos"] end for _, inflection in ipairs(inflections) do call_inflections(inflection[1]["base"] or {}, data, "", inflection[2]) for _, mod in ipairs(mod_list) do call_inflections(inflection[1][mod] or {}, data, mod .. "_", inflection[2]) end end handle_lemma_and_overrides(data, args) end -- Helper function for get_heads(). Parses the stems for either the -- base or the modifier (see do_gender_number()). ARG1 is the argument -- for the first stem and ARGN is the prefix of the arguments for the -- remaining stems. For example, for the singular base, ARG1=1 and -- ARGN="head"; for the first singular modifier, ARG1="mod" and -- ARGN="modhead"; for the plural base, ARG1=ARGN="pl". The arguments -- other than the first are numbered 2, 3, ..., which is appended to -- ARGN. 
-- MOD is either "", "mod_", "mod2_", etc. depending on whether we're
-- working on a base or modifier argument. The returned value is an
-- array of stems, where each element is a size-two array of
-- {COMBINED_STEM, STEM_TYPE}. See do_gender_number().
function get_heads_1(data, args, arg1, argn, mod)
    if not args[arg1] then
        return {}
    end
    local heads
    if args[arg1] == "-" then
        heads = {{"", "-"}}
    else
        heads = {}
        insert_stems(args[arg1], heads, {{args[arg1], ""}}, false, "sg",
            get_pos(data, mod))
    end
    local i = 2
    while args[argn .. i] do
        local arg = args[argn .. i]
        insert_stems(arg, heads, {{arg, ""}}, false, "sg",
            get_pos(data, mod))
        i = i + 1
    end
    return heads
end

-- Very similar to do_gender_number(), and returns the same type of
-- structure, but works specifically for the stems of the head (the
-- most basic gender/number combination, e.g. singular for nouns,
-- masculine singular for adjectives and gendered nouns, collective
-- for collective nouns, etc.), including both base and modifier.
-- See do_gender_number(). Note that the actual return value is
-- two items, the first of which is the same type of structure
-- returned by do_gender_number() and the second of which is a boolean
-- indicating whether we were called from within a template documentation
-- page (in which case no user-specified arguments exist and we
-- substitute sample ones). The reason for this boolean is to indicate
-- whether sample arguments need to be substituted for other numbers
-- as well.
function get_heads(data, args, headtype)
    if not args[1] and mw.title.getCurrentTitle().nsText == "Template" then
        return {base={{"{{{1}}}", "tri"}}}, true
    end

    if not args[1] then
        error("Parameter 1 (" .. headtype .. " stem) may not be empty.")
    end
    local base = get_heads_1(data, args, 1, "head", "")
    local basemodtable = {base=base}
    for _, mod in ipairs(mod_list) do
        local modn = get_heads_1(data, args, mod, mod .. "head", mod .. "_")
        basemodtable[mod] = modn
    end
    return basemodtable, false
end

-- The main entry point for noun tables.
function export.show_noun(frame)
    local args, origargs, data = init(frame:getParent().args)
    data.pos = "noun"
    data.numgens = function() return data.numbers end
    data.allnumgens = data.allnumbers

    local sgs, is_template = get_heads(data, args, "singular")
    local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
        do_gender_number(data, args, {"pl", "cpl"}, sgs, nil, false, "pl")
    -- Always compute the dual so that cases like يَوْم الاِثْنَيْن work: a singular
    -- with a dual modifier, where data.numbers refers only to the singular,
    -- but we need to go ahead and compute the dual so it parses the
    -- "modd" modifier dual argument. When the modifier dual argument
    -- is parsed, it will store the resulting dual declension for اِثْنَيْن
    -- in the modifier slot for all numbers, including specifically
    -- the singular.
    local dus = do_gender_number(data, args, {"d"}, sgs, "d", false, "du")

    parse_state_etc_spec(data, args)

    determine_noun_numbers(data, args, pls)

    do_inflections_and_overrides(data, args,
        {{sgs, "sg"}, {dus, "du"}, {pls, "pl"}})

    -- Make the table
    return make_noun_table(data)
end

function any_feminine(data, stem_spec)
    for basemod, stemtypelist in pairs(stem_spec) do
        -- Only check modifiers if modN_numgen= not given. If not given, the
        -- modifier needs to be declined for all numgens; else only for the
        -- given numgen, which should be explicitly specified.
        if not (basemod ~= "base" and data[basemod .. "_numgen"]) then
            for _, stemtype in ipairs(stemtypelist) do
                if rfind(stemtype[1], TAM .. UNUOPT .. "/") then
                    return true
                end
            end
        end
    end
    return false
end

function all_feminine(data, stem_spec)
    for basemod, stemtypelist in pairs(stem_spec) do
        -- Only check modifiers if modN_numgen= not given. If not given, the
        -- modifier needs to be declined for all numgens; else only for the
        -- given numgen, which should be explicitly specified.
        if not (basemod ~= "base" and data[basemod .. "_numgen"]) then
            for _, stemtype in ipairs(stemtypelist) do
                if not rfind(stemtype[1], TAM .. UNUOPT .. "/") then
                    return false
                end
            end
        end
    end
    return true
end

-- The main entry point for collective noun tables.
function export.show_coll_noun(frame)
    local args, origargs, data = init(frame:getParent().args)
    data.pos = "noun"
    data.allnumbers = {"coll", "sing", "du", "pauc", "pl"}
    data.engnumberscap["pl"] = "plural of variety"
    data.numgens = function() return data.numbers end
    data.allnumgens = data.allnumbers

    local colls, is_template = get_heads(data, args, "collective")
    local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
        do_gender_number(data, args, {"pl", "cpl"}, colls, nil, false, "pl")

    parse_state_etc_spec(data, args)

    -- If collective noun is already feminine in form, don't try to
    -- form a feminine singulative
    local collfem = any_feminine(data, colls)

    local sings = do_gender_number(data, args, {"sing"}, colls,
        not collfem and "f" or nil, true, "sg")
    local singfem = all_feminine(data, sings)
    local dus = do_gender_number(data, args, {"d"}, sings, "d", singfem, "du")
    local paucs = do_gender_number(data, args, {"pauc"}, sings, "paucp",
        singfem, "pl")

    -- Can manually specify which numbers are to appear, and exactly those
    -- numbers will appear. Otherwise, if any plurals given, plurals appear,
    -- and if singulative given, dual and paucal appear.
    if not parse_number_spec(data, args) then
        data.numbers = {}
        if args[1] ~= "-" then
            table.insert(data.numbers, "coll")
        end
        if #sings["base"] > 0 then
            table.insert(data.numbers, "sing")
        end
        if #dus["base"] > 0 then
            table.insert(data.numbers, "du")
        end
        if #paucs["base"] > 0 then
            table.insert(data.numbers, "pauc")
        end
        if #pls["base"] > 0 then
            table.insert(data.numbers, "pl")
        end
    end

    -- Generate the collective, singulative, dual, paucal and plural forms
    do_inflections_and_overrides(data, args,
        {{colls, "coll"}, {sings, "sing"}, {dus, "du"}, {paucs, "pauc"},
         {pls, "pl"}})

    -- Make the table
    return make_noun_table(data)
end

-- The main entry point for singulative noun tables.
function export.show_sing_noun(frame)
    local args, origargs, data = init(frame:getParent().args)
    data.pos = "noun"
    data.allnumbers = {"sing", "coll", "du", "pauc", "pl"}
    data.engnumberscap["pl"] = "plural of variety"
    data.numgens = function() return data.numbers end
    data.allnumgens = data.allnumbers

    parse_state_etc_spec(data, args)

    local sings, is_template = get_heads(data, args, "singulative")

    -- If all singulative nouns feminine in form, form a masculine collective
    local singfem = all_feminine(data, sings)
    local colls = do_gender_number(data, args, {"coll"}, sings,
        singfem and "m" or nil, false, "sg")
    local dus = do_gender_number(data, args, {"d"}, sings, "d", singfem, "du")
    local paucs = do_gender_number(data, args, {"pauc"}, sings, "paucp",
        singfem, "pl")
    local pls = is_template and {base={{"{{{pl}}}", "tri"}}} or
        do_gender_number(data, args, {"pl", "cpl"}, colls, nil, false, "pl")

    -- Can manually specify which numbers are to appear, and exactly those
    -- numbers will appear. Otherwise, if any plurals given, plurals appear;
    -- if singulative given or derivable, it and dual and paucal will appear.
    if not parse_number_spec(data, args) then
        data.numbers = {}
        if args[1] ~= "-" then
            table.insert(data.numbers, "sing")
        end
        if #colls["base"] > 0 then
            table.insert(data.numbers, "coll")
        end
        if #dus["base"] > 0 then
            table.insert(data.numbers, "du")
        end
        if #paucs["base"] > 0 then
            table.insert(data.numbers, "pauc")
        end
        if #pls["base"] > 0 then
            table.insert(data.numbers, "pl")
        end
    end

    -- Generate the singulative, collective, dual, paucal and plural forms
    do_inflections_and_overrides(data, args,
        {{sings, "sing"}, {colls, "coll"}, {dus, "du"}, {paucs, "pauc"},
         {pls, "pl"}})

    -- Make the table
    return make_noun_table(data)
end

-- The implementation of the main entry point for adjective and
-- gendered noun tables.
function show_gendered(frame, isadj, pos)
    local args, origargs, data = init(frame:getParent().args)
    data.pos = pos
    data.numgens = function()
        local numgens = {}
        for _, gender in ipairs(data.allgenders) do
            for _, number in ipairs(data.numbers) do
                table.insert(numgens, gender .. "_" .. number)
            end
        end
        return numgens
    end
    data.allnumgens = {}
    for _, gender in ipairs(data.allgenders) do
        for _, number in ipairs(data.allnumbers) do
            table.insert(data.allnumgens, gender .. "_" .. number)
        end
    end

    parse_state_etc_spec(data, args)

    local msgs = get_heads(data, args, 'masculine singular')
    -- Always do all of these so cases like يَوْم الاِثْنَيْن work.
    -- See comment in show_noun().
    local fsgs = do_gender_number(data, args, {"f"}, msgs, "f", true, "sg")
    local mdus = do_gender_number(data, args, {"d"}, msgs, "d", false, "du")
    local fdus = do_gender_number(data, args, {"fd"}, fsgs, "d", true, "du")
    local mpls = do_gender_number(data, args, {"pl", "cpl"}, msgs,
        isadj and "p" or nil, false, "pl")
    local fpls = do_gender_number(data, args, {"fpl", "cpl"}, fsgs, "fp",
        true, "pl")

    if isadj then
        parse_number_spec(data, args)
    else
        determine_noun_numbers(data, args, mpls)
    end

    -- Generate the singular, dual and plural forms
    do_inflections_and_overrides(data, args,
        {{msgs, "m_sg"}, {fsgs, "f_sg"}, {mdus, "m_du"}, {fdus, "f_du"},
         {mpls, "m_pl"}, {fpls, "f_pl"}})

    -- Make the table
    if isadj then
        return make_adj_table(data)
    else
        return make_gendered_noun_table(data)
    end
end

-- The main entry point for gendered noun tables.
function export.show_gendered_noun(frame)
    return show_gendered(frame, false, "noun")
end

-- The main entry point for numeral tables. Same as using show_gendered_noun()
-- with pos=numeral.
function export.show_numeral(frame)
    return show_gendered(frame, false, "numeral")
end

-- The main entry point for adjective tables.
function export.show_adj(frame)
    return show_gendered(frame, true, "adjective")
end

-- Inflection functions

function do_translit(term)
    return (lang:transliterate(term)) or track("cant-translit") and BOGUS_CHAR
end

function split_arabic_tr(term)
    if term == "" then
        return "", ""
    elseif not rfind(term, "/") then
        return term, do_translit(term)
    else
        local splitvals = rsplit(term, "/")
        if #splitvals ~= 2 then
            error("Must have at most one slash in a combined Arabic/translit expr: '" .. term .. "'")
        end
        return splitvals[1], splitvals[2]
    end
end

function reorder_shadda(word)
    -- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets
    -- replaced with short-vowel+shadda during NFC normalisation, which
    -- MediaWiki does for all Unicode strings; however, it makes the
    -- detection process inconvenient, so undo it.
    word = rsub(word, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1")
    return word
end

-- Combine PREFIX, AR/TR, and ENDING in that order. PREFIX and ENDING
-- can be of the form ARABIC/TRANSLIT. The Arabic and translit parts are
-- separated out and grouped together, resulting in a string of the
-- form ARABIC/TRANSLIT (TRANSLIT will always be present, computed
-- automatically if not present in the source). The return value is actually a
-- list of ARABIC/TRANSLIT strings because hamza resolution is applied to
-- ARABIC, which may produce multiple outcomes (all of which will have the
-- same TRANSLIT).
function combine_with_ending(prefix, ar, tr, ending)
    local prefixar, prefixtr = split_arabic_tr(prefix)
    local endingar, endingtr = split_arabic_tr(ending)
    -- When calling hamza_seat(), leave out prefixes, which we expect to be
    -- clitics like وَ. (In case the prefix is a separate word, it won't matter
    -- whether we include it in the text passed to hamza_seat().)
    local allar = hamza_seat(ar .. endingar)
    -- Before a vowel-initial ending, convert stem-final -ī to -iy and -ū to
    -- -uw, e.g. ...īān becomes ...iyān (cf. kubrī "bridge").
    if rfind(endingtr, "^[aeiouāēīōū]") then
        if rfind(tr, "ī$") then
            tr = rsub(tr, "ī$", "iy")
        elseif rfind(tr, "ū$") then
            tr = rsub(tr, "ū$", "uw")
        end
    end
    tr = prefixtr .. tr .. endingtr
    local allartr = {}
    for _, arval in ipairs(allar) do
        table.insert(allartr, prefixar .. arval .. "/" .. tr)
    end
    return allartr
end

-- Combine PREFIX, STEM/TR and ENDING in that order and insert into the
-- list of items in DATA[KEY], initializing it if empty and making sure
-- not to insert duplicates. ENDING can be a list of endings, which will be
-- distributed over the remaining parts. PREFIX and/or ENDING can be
-- of the form ARABIC/TRANSLIT (the stem is already split into Arabic STEM
-- and Latin TR). Note that what's inserted into DATA[KEY] is actually a
-- list of ARABIC/TRANSLIT strings; if more than one is present in the list,
-- they represent hamza variants, i.e.
-- different ways of writing a hamza
-- sound, such as مُبْتَدَؤُون vs. مُبْتَدَأُون (see init_data()).
function add_inflection(data, key, prefix, stem, tr, ending)
    if data.forms[key] == nil then
        data.forms[key] = {}
    end
    if type(ending) ~= "table" then
        ending = {ending}
    end
    for _, endingval in ipairs(ending) do
        insert_if_not(data.forms[key],
            combine_with_ending(prefix, stem, tr, endingval))
    end
end

-- Form inflections from combination of STEM, with transliteration TR,
-- and ENDINGS (and definite article where necessary, plus any specified
-- prefixes) and store in DATA, for the number or gender/number
-- determined by MOD ("", "mod_", "mod2_", etc.; see call_inflection()) and
-- NUMGEN ("sg", "du", "pl", or "m_sg", "f_pl", etc. for adjectives). ENDINGS
-- is an array of 15 values, each of which is a string or array of
-- alternatives. The order of ENDINGS is indefinite nom, acc, gen; definite
-- nom, acc, gen; construct-state nom, acc, gen; informal indefinite, definite,
-- construct; lemma indefinite, definite, construct. (Normally the lemma is
-- based off of the indefinite, but if the inflection has been restricted to
-- particular states, it comes from one of those states, in the order
-- indefinite, definite, construct.) See also add_inflection() for more info
-- on exactly what is inserted into DATA.
function add_inflections(stem, tr, data, mod, numgen, endings)
    stem = canon_hamza(stem)
    assert(#endings == 15)
    local ismod = mod ~= ""
    -- If working on modifier and modN_numgen= is given, it had better agree
    -- with NUMGEN; the case where it doesn't agree should have been caught in
    -- call_inflections().
    if ismod and data[mod .. "numgen"] then
        assert(data[mod .. "numgen"] == numgen)
    end
    -- Return a list of combined ar/tr forms, with the ending tacked on.
    -- There may be more than one form because of alternative hamza seats that
    -- may be supplied, e.g. مُبْتَدَؤُون or مُبْتَدَأُون (mubtadaʾūn "(grammatical) subjects").
    local defstem, deftr
    if stem == "?"
or data[mod .. "omitarticle"] then defstem = stem deftr = tr else -- apply sun-letter assimilation and hamzat al-wasl elision defstem = rsub("الْ" .. stem, "^الْ([سشصتثطدذضزژظنرل])", "ال%1ّ") defstem = rsub(defstem, "^الْ([اٱ])([ًٌٍَُِ])", "ال%2%1") deftr = rsub("al-" .. tr, "^al%-([sšṣtṯṭdḏḍzžẓnrḷ])", "a%1-%1") end -- For a given MOD spec, is the previous word (base or modifier) a noun? -- We assume the base is always a noun in this case, and otherwise -- look at the value of modN_idafa. local function prev_mod_is_noun(mod) if mod == "mod_" then return true end if mod == "mod2_" then return data["mod_idafa"] end modnum = assert_rsub(mod, "^mod([0-9]+)_$", "%1") modnum = modnum - 1 return data["mod" .. modnum .. "_idafa"] end local numgens = ismod and data[mod .. "numgen"] and data.numgens() or {numgen} -- "defcon" means definite adjective modifying construct state noun. We -- add a ... before the adjective (and after the construct-state noun) to -- indicate that a nominal modifier would go between noun and adjective. local stems = {ind = stem, def = defstem, con = stem, defcon = "... " .. defstem} local trs = {ind = tr, def = deftr, con = tr, defcon = "... " .. deftr} for _, ng in ipairs(numgens) do for _, state in ipairs(data.allstates) do for _, case in ipairs(data.allcases_with_lemma) do -- We are generating the inflections for STATE, but sometimes -- we want to use the inflected form of a different state, e.g. -- if modN_state= or basestate= is set to some particular state. -- If we're dealing with an adjectival modifier, then in -- place of "con" we use "defcon" if immediately after a noun -- (see comment above), else "def". local thestate = ismod and data[mod .. "state"] or ismod and not data[mod .. 
"idafa"] and state == "con" and (prev_mod_is_noun(mod) and "defcon" or "def") or not ismod and data.basestate or state local is_lemmainf = case == "lemma" or case == "inf" -- Don't substitute value of modcase for lemma/informal "cases" local thecase = is_lemmainf and case or ismod and data[mod .. "case"] or case add_inflection(data, mod .. case .. "_" .. ng .. "_" .. state, data[mod .. "prefix"] or "", stems[thestate], trs[thestate], endings[data.statecases[thestate][thecase]]) end end end end -- Insert into a category and a type variable (e.g. m_sg_type) for the -- declension type of a particular declension (e.g. masculine singular for -- adjectives). MOD and NUMGEN are as in call_inflection(). CATVALUE is the -- category and ENGVALUE is the English description of the declension type. -- In these values, NOUN is replaced with either "noun" or "adjective", -- SINGULAR is replaced with the English equivalent of the number in NUMGEN -- (e.g. "singular", "dual" or "plural") while BROKSING is the same but uses -- "broken plural" in place of "plural" and "broken paucal" in place of -- "paucal". function insert_cat(data, mod, numgen, catvalue, engvalue) local singpl = data.engnumbers[rsub(numgen, "^.*_", "")] assert(singpl ~= nil) local broksingpl = rsub(singpl, "plural", "broken plural") broksingpl = rsub(broksingpl, "paucal", "broken paucal") if rfind(broksingpl, "broken plural") and (rfind(catvalue, "BROKSING") or rfind(engvalue, "BROKSING")) then table.insert(data.categories, "Arabic " .. data.pos .. "s with broken plural") end if rfind(catvalue, "irregular") or rfind(engvalue, "irregular") then table.insert(data.categories, "Arabic irregular " .. data.pos .. 
"s") end catvalue = rsub(catvalue, "NOUN", data.pos) catvalue = rsub(catvalue, "SINGULAR", singpl) catvalue = rsub(catvalue, "BROKSING", broksingpl) engvalue = rsub(engvalue, "NOUN", data.pos) engvalue = rsub(engvalue, "SINGULAR", singpl) engvalue = rsub(engvalue, "BROKSING", broksingpl) -- add links to specialised grammatical terms engvalue = rsub(engvalue, "triptote", "[[triptote]]") engvalue = rsub(engvalue, "diptote", "[[diptote]]") engvalue = rsub(engvalue, "broken plural", "BBB") engvalue = rsub(engvalue, "sound plural", "SSS") engvalue = rsub(engvalue, "broken", "[[broken plural|broken]]") engvalue = rsub(engvalue, "sound", "[[sound plural|sound]]") engvalue = rsub(engvalue, "BBB", "[[broken plural]]") engvalue = rsub(engvalue, "SSS", "[[sound plural]]") if mod == "" and catvalue ~= "" then insert_if_not(data.categories, catvalue) end if engvalue ~= "" then local key = mod .. numgen .. "_type" if data.forms[key] == nil then data.forms[key] = {} end insert_if_not(data.forms[key], engvalue) end if contains(data.states, "def") and not contains(data.states, "ind") then insert_if_not(data.categories, "Arabic definite " .. data.pos .. "s") end end -- Return true if we're handling modifier inflections and the modifier's -- case is limited to an oblique case (gen or acc; typically genitive, -- in an ʾidāfa construction). This is used when returning lemma -- inflections -- the modifier part of the lemma should agree in case -- with modifier's case if it's restricted in case. function mod_oblique(mod, data) return mod ~= "" and data[mod .. "case"] and ( data[mod .. "case"] == "acc" or data[mod .. "case"] == "gen") end -- Similar to mod_oblique but specifically when the modifier case is -- limited to the accusative (which is rare or nonexistent in practice). function mod_acc(mod, data) return mod ~= "" and data[mod .. "case"] and data[mod .. 
"case"] == "acc" end -- Handle triptote and diptote inflections function triptote_diptote(stem, tr, data, mod, numgen, is_dip, lc) -- Remove any case ending if rfind(stem, "[" .. UN .. U .. "]$") then stem = rsub(stem, "[" .. UN .. U .. "]$", "") tr = rsub(tr, "un?$", "") end -- special-case for صلوة pronounced ṣalāh; check translit local is_aah = rfind(stem, TAM .. "$") and rfind(tr, "āh$") if rfind(stem, TAM .. "$") then if rfind(tr, "h$") then tr = rsub(tr, "h$", "t") elseif not rfind(tr, "t$") then tr = tr .. "t" end end add_inflections(stem, tr, data, mod, numgen, {is_dip and U or UN, is_dip and A or AN .. ((rfind(stem, "[" .. HAMZA_ON_ALIF .. TAM .. "]$") or rfind(stem, "[" .. AMAD .. ALIF .. "]" .. HAMZA .. "$") ) and "" or ALIF), is_dip and A or IN, U, A, I, lc and UU or U, lc and AA or A, lc and II or I, {}, {}, {}, -- omit informal inflections {}, {}, {}, -- omit lemma inflections }) -- add category and informal and lemma inflections local tote = lc and "long construct" or is_dip and "diptote" or "triptote" local singpl_tote = "BROKSING " .. tote local cat_prefix = "Arabic NOUNs with " .. tote .. " BROKSING" -- since we're checking translit for -āh we probably don't need to -- check stem too if is_aah or rfind(stem, "[" .. AMAD .. ALIF .. "]" .. TAM .. "$") then add_inflections(stem, rsub(tr, "t$", ""), data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, "/t", "/t", "/t", -- informal pron. is -āt "/h", "/h", "/t", -- lemma uses -āh }) insert_cat(data, mod, numgen, cat_prefix .. " in -āh", singpl_tote .. " in " .. make_link(HYPHEN .. AAH)) elseif rfind(stem, TAM .. "$") then add_inflections(stem, rsub(tr, "t$", ""), data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, "", "", "/t", "", "", "/t", }) insert_cat(data, mod, numgen, cat_prefix .. " in -a", singpl_tote .. " in " .. make_link(HYPHEN .. 
AH)) elseif lc then add_inflections(stem, tr, data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, "", "", UU, "", "", UU, }) insert_cat(data, mod, numgen, cat_prefix, singpl_tote) else -- also special-case the nisba ending, which has an informal -- pronunciation. if rfind(stem, IY .. SH .. "$") then local infstem = rsub(stem, SH .. "$", "") local inftr = rsub(tr, "iyy$", "ī") -- add informal and lemma inflections separately add_inflections(infstem, inftr, data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, "", "", "", {}, {}, {}, }) add_inflections(stem, tr, data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, "", "", "", }) else add_inflections(stem, tr, data, mod, numgen, {{}, {}, {}, {}, {}, {}, {}, {}, {}, "", "", "", "", "", "", }) end insert_cat(data, mod, numgen, "Arabic NOUNs with basic " .. tote .. " BROKSING", "basic " .. singpl_tote) end end -- Regular triptote inflections["tri"] = function(stem, tr, data, mod, numgen) triptote_diptote(stem, tr, data, mod, numgen, false) end -- Regular diptote inflections["di"] = function(stem, tr, data, mod, numgen) triptote_diptote(stem, tr, data, mod, numgen, true) end -- Elative and color/defect adjective: usually same as diptote, -- might be invariable function elative_color_defect(stem, tr, data, mod, numgen) if rfind(stem, "[" .. ALIF .. AMAQ .. 
"]$") then invariable(stem, tr, data, mod, numgen) else triptote_diptote(stem, tr, data, mod, numgen, true) end end -- Elative: usually same as diptote, might be invariable inflections["el"] = function(stem, tr, data, mod, numgen) elative_color_defect(stem, tr, data, mod, numgen) end -- Color/defect adjective: Same as elative inflections["cd"] = function(stem, tr, data, mod, numgen) elative_color_defect(stem, tr, data, mod, numgen) end -- Triptote with lengthened ending in the construct state inflections["lc"] = function(stem, tr, data, mod, numgen) triptote_diptote(stem, tr, data, mod, numgen, false, true) end function in_defective(stem, tr, data, mod, numgen, tri) if not rfind(stem, IN .. "$") then error("'in' declension stem should end in -in: '" .. stem .. "'") end stem = rsub(stem, IN .. "$", "") tr = rsub(tr, "in$", "") local acc_ind_ending = tri and IY .. AN .. ALIF or IY .. A add_inflections(stem, tr, data, mod, numgen, {IN, acc_ind_ending, IN, II, IY .. A, II, II, IY .. A, II, II, II, II, -- FIXME: What should happen with the lemma when modifier case -- is limited to the accusative and modifier state is e.g. definite? -- Should the lemma end in -iya or -ī? In practice this will rarely -- if ever happen. mod_acc(mod, data) and acc_ind_ending or IN, II, II, }) local tote = tri and "triptote" or "diptote" insert_cat(data, mod, numgen, "Arabic NOUNs with " .. tote .. " BROKSING in -in", "BROKSING " .. tote .. " in " .. make_link(HYPHEN .. IN)) end function detect_in_type(stem, ispl) if ispl and rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. IN .. 
"$") then -- layālin return "diin" else -- other -in words return "triin" end end -- Defective in -in inflections["in"] = function(stem, tr, data, mod, numgen) in_defective(stem, tr, data, mod, numgen, detect_in_type(stem, rfind(numgen, "pl")) == "triin") end -- Defective in -in, force "triptote" variant inflections["triin"] = function(stem, tr, data, mod, numgen) in_defective(stem, tr, data, mod, numgen, true) end -- Defective in -in, force "diptote" variant inflections["diin"] = function(stem, tr, data, mod, numgen) in_defective(stem, tr, data, mod, numgen, false) end -- Defective in -an (comes in two variants, depending on spelling with tall alif or alif maqṣūra) inflections["an"] = function(stem, tr, data, mod, numgen) local tall_alif if rfind(stem, AN .. ALIF .. "$") then tall_alif = true stem = rsub(stem, AN .. ALIF .. "$", "") elseif rfind(stem, AN .. AMAQ .. "$") then tall_alif = false stem = rsub(stem, AN .. AMAQ .. "$", "") else error("Invalid stem for 'an' declension type: " .. stem) end tr = rsub(tr, "an$", "") if tall_alif then add_inflections(stem, tr, data, mod, numgen, {AN .. ALIF, AN .. ALIF, AN .. ALIF, AA, AA, AA, AA, AA, AA, AA, AA, AA, AN .. ALIF, AA, AA, }) else add_inflections(stem, tr, data, mod, numgen, {AN .. AMAQ, AN .. AMAQ, AN .. AMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AAMAQ, AN .. AMAQ, AAMAQ, AAMAQ, }) end -- FIXME: Should we distinguish between tall alif and alif maqṣūra? insert_cat(data, mod, numgen, "Arabic NOUNs with BROKSING in -an", "BROKSING in " .. make_link(HYPHEN .. AN .. 
(tall_alif and ALIF or AMAQ))) end function invariable(stem, tr, data, mod, numgen) add_inflections(stem, tr, data, mod, numgen, {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", }) insert_cat(data, mod, numgen, "Arabic NOUNs with invariable BROKSING", "BROKSING invariable") end -- Invariable in -ā (non-loanword type) inflections["inv"] = function(stem, tr, data, mod, numgen) invariable(stem, tr, data, mod, numgen) end -- Invariable in -ā (loanword type, behaving in the dual as if ending in -a, I think!) inflections["lwinv"] = function(stem, tr, data, mod, numgen) invariable(stem, tr, data, mod, numgen) end -- Duals inflections["d"] = function(stem, tr, data, mod, numgen) if rfind(stem, ALIF .. NI .. "?$") then stem = rsub(stem, AOPTA .. NI .. "?$", "") elseif rfind(stem, AMAD .. NI .. "?$") then stem = rsub(stem, AMAD .. NI .. "?$", HAMZA_PH) else error("Dual stem should end in -ān(i): '" .. stem .. "'") end tr = rsub(tr, "āni?$", "") local mo = mod_oblique(mod, data) add_inflections(stem, tr, data, mod, numgen, {AANI, AYNI, AYNI, AANI, AYNI, AYNI, AA, AYSK, AYSK, AYN, AYN, AYSK, mo and AYN or AAN, mo and AYN or AAN, mo and AYSK or AA, }) insert_cat(data, mod, numgen, "", "dual in " .. make_link(HYPHEN .. AANI)) end -- Sound masculine plural inflections["smp"] = function(stem, tr, data, mod, numgen) if not rfind(stem, UUNA .. "?$") then error("Sound masculine plural stem should end in -ūn(a): '" .. stem .. "'") end stem = rsub(stem, UUNA .. 
"?$", "") tr = rsub(tr, "ūna?$", "") local mo = mod_oblique(mod, data) add_inflections(stem, tr, data, mod, numgen, {UUNA, IINA, IINA, UUNA, IINA, IINA, UU, II, II, IIN, IIN, II, mo and IIN or UUN, mo and IIN or UUN, mo and II or UU, }) -- use SINGULAR because conceivably this might be used with the paucal -- instead of plural insert_cat(data, mod, numgen, "Arabic NOUNs with sound masculine SINGULAR", "sound masculine SINGULAR") end -- Sound feminine plural inflections["sfp"] = function(stem, tr, data, mod, numgen) if not rfind(stem, "[" .. ALIF .. AMAD .. "]" .. T .. UN .. "?$") then error("Sound feminine plural stem should end in -āt(un): '" .. stem .. "'") end stem = rsub(stem, UN .. "$", "") tr = rsub(tr, "un$", "") add_inflections(stem, tr, data, mod, numgen, {UN, IN, IN, U, I, I, U, I, I, "", "", "", "", "", "", }) -- use SINGULAR because this might be used with the paucal -- instead of plural insert_cat(data, mod, numgen, "Arabic NOUNs with sound feminine SINGULAR", "sound feminine SINGULAR") end -- Plural of defective in -an inflections["awnp"] = function(stem, tr, data, mod, numgen) if not rfind(stem, AWNA .. "?$") then error("'awnp' plural stem should end in -awn(a): '" .. stem .. "'") end stem = rsub(stem, AWNA .. "?$", "") tr = rsub(tr, "awna?$", "") local mo = mod_oblique(mod, data) add_inflections(stem, tr, data, mod, numgen, {AWNA, AYNA, AYNA, AWNA, AYNA, AYNA, AWSK, AYSK, AYSK, AYN, AYN, AYSK, mo and AYN or AWN, mo and AYN or AWN, mo and AYSK or AWSK, }) -- use SINGULAR because conceivably this might be used with the paucal -- instead of plural insert_cat(data, mod, numgen, "Arabic NOUNs with sound SINGULAR in -awna", "sound SINGULAR in " .. make_link(HYPHEN .. 
AWNA)) end -- Unknown inflections["?"] = function(stem, tr, data, mod, numgen) add_inflections("?", "?", data, mod, numgen, {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", }) insert_cat(data, mod, numgen, "Arabic NOUNs with unknown SINGULAR", "SINGULAR unknown") end -- Detect declension of noun or adjective stem or lemma. We allow triptotes, -- diptotes and sound plurals to either come with ʾiʿrāb or not. We detect -- some cases where vowels are missing, when it seems fairly unambiguous to -- do so. ISFEM is true if we are dealing with a feminine stem (not -- currently used and needs to be rethought). NUM is "sg", "du", or "pl", -- depending on the number of the stem. -- -- POS is the part of speech, generally "noun" or "adjective". Used to -- distinguish nouns and adjectives of the فَعْلَان type. There are nouns of -- this type and they generally are triptotes, e.g. قَطْرَان "tar" -- and شَيْطَان "devil". An additional complication is that the user can set -- the POS to something else, like "numeral". We don't use this POS for -- modifiers, where we determine whether they are noun-like or adjective-like -- according to whether mod_idafa= is true. -- -- Some unexpectedly diptote nouns/adjectives: -- -- jiʿrān in ʾabū jiʿrān "dung beetle" -- distributive numbers: ṯunāʾ "two at a time", ṯulāṯ/maṯlaṯ "three at a time", -- rubāʿ "four at a time" (not a regular diptote pattern, cf. triptote -- junāḥ "misdemeanor, sin", nujār "origin, root", nuḥām "flamingo") -- jahannam (f.) "hell" -- many names: jilliq/jillaq "Damascus", judda/jidda "Jedda", jibrīl (and -- variants) "Gabriel", makka "Mecca", etc. -- jibriyāʾ "pride" -- kibriyāʾ "glory, pride" -- babbaḡāʾ "parrot" -- ʿayāyāʾ "incapable, tired" -- suwaidāʾ "black bile, melancholy" -- Note also: ʾajhar "day-blind" (color-defect) and ʾajhar "louder" (elative) function export.detect_type(stem, isfem, num, pos) local function dotrack(word) track(word) track(word .. "/" .. 
pos)
		return true
	end

	-- Not strictly necessary because the caller (stem_and_type) already
	-- reorders, but won't hurt, and may be necessary if this function is
	-- called from an external caller.
	stem = reorder_shadda(stem)
	local origstem = stem
	-- So that we don't get tripped up by alif madda, we replace alif madda
	-- with the sequence hamza + fatḥa + alif before the regexps below.
	stem = rsub(stem, AMAD, HAMZA .. AA)
	if num == "du" then
		if rfind(stem, ALIF .. NI .. "?$") then
			return "d"
		else
			error("Malformed stem for dual, should end in the nominative dual ending -ān(i): '" .. origstem .. "'")
		end
	end
	if rfind(stem, IN .. "$") then -- -in words
		return detect_in_type(stem, num == "pl")
	elseif rfind(stem, AN .. "[" .. ALIF .. AMAQ .. "]$") then
		return "an"
	elseif rfind(stem, AN .. "$") then
		error("Malformed stem, fatḥatan should be over second-to-last letter: " .. origstem)
	elseif num == "pl" and rfind(stem, AW .. SKOPT .. N .. AOPT .. "$") then
		return "awnp"
	elseif num == "pl" and rfind(stem, ALIF .. T .. UNOPT .. "$") and
		-- Avoid getting tripped up by plurals like ʾawqāt "times",
		-- ʾaḥwāt "fishes", ʾabyāt "verses", ʾazyāt "oils", ʾaṣwāt "voices",
		-- ʾamwāt "dead (pl.)".
		not rfind(stem, HAMZA_ON_ALIF .. A .. CONS .. SK .. CONS .. AAT .. UNOPT .. "$") then
		return "sfp"
	elseif num == "pl" and rfind(stem, W .. N .. AOPT .. "$") and
		-- Avoid getting tripped up by plurals like ʿuyūn "eyes",
		-- qurūn "horns" (note we check for U between first two consonants
		-- so we correctly ignore cases like sinūn "years" (from sana),
		-- riʾūn "lungs" (from riʾa) and banūn "sons" (from ibn)).
		not rfind(stem, "^" .. CONS .. U .. CONS .. UUN .. AOPT .. "$") then
		return "smp"
	elseif rfind(stem, UN .. "$") then -- explicitly specified triptotes (we catch sound feminine plurals above)
		return "tri"
	elseif rfind(stem, U .. "$") then -- explicitly specified diptotes
		return "di"
	elseif -- num == "pl" and
		( -- various diptote plural patterns; these are diptote even in the singular (e.g.
yanāyir "January", falāfil "falafel", tuʾabāʾ "yawn, fatigue" -- currently we sometimes end up with such plural patterns in the "singular" in a singular -- ʾidāfa construction with plural modifier. (FIXME: These should be fixed to the correct number.) rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. IOPT .. Y .. "?" .. CONS .. "$") and dotrack("fawaakih") or -- fawākih, daqāʾiq, makātib, mafātīḥ rfind(stem, "^" .. CONS .. AOPT .. CONS .. AOPTA .. CONS .. SH .. "$") and not rfind(stem, "^" .. T) and dotrack("mawaadd") or -- mawādd, maqāmm, ḍawāll; exclude t- so we don't catch form-VI verbal nouns like taḍādd (HACK!!!) rfind(stem, "^" .. CONS .. U .. CONS .. AOPT .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("wuzaraa") or -- wuzarāʾ "ministers", juhalāʾ "ignorant (pl.)" rfind(stem, ELCD_START .. SKOPT .. CONS .. IOPT .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("asdiqaa") or -- ʾaṣdiqāʾ rfind(stem, ELCD_START .. IOPT .. CONS .. SH .. AOPTA .. HAMZA .. "$") and dotrack("aqillaa") -- ʾaqillāʾ, ʾajillāʾ "important (pl.)", ʾaḥibbāʾ "lovers" ) then return "di" elseif num == "sg" and ( -- diptote singular patterns (nouns/adjectives) rfind(stem, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. HAMZA .. "$") and dotrack("qamraa") or -- qamrāʾ "moon-white, moonlight"; baydāʾ "desert"; ṣaḥrāʾ "desert-like, desert"; tayhāʾ "trackless, desolate region"; not pl. to avoid catching e.g. ʾabnāʾ "sons", ʾaḥmāʾ "fathers-in-law", ʾamlāʾ "steppes, deserts" (pl. of malan), ʾanbāʾ "reports" (pl. of nabaʾ) rfind(stem, ELCD_START .. SK .. CONS .. A .. CONS .. "$") and dotrack("abyad") or -- ʾabyaḍ "white", ʾakbar "greater"; FIXME nouns like ʾaʿzab "bachelor", ʾaḥmad "Ahmed" but not ʾarnab "rabbit", ʾanjar "anchor", ʾabjad "abjad", ʾarbaʿ "four", ʾandar "threshing floor" (cf. diptote ʾandar "rarer") rfind(stem, ELCD_START .. A .. CONS .. SH .. 
"$") and dotrack("alaff") or -- ʾalaff "plump", ʾaḥabb "more desirable" -- do the following on the origstem so we can check specifically for alif madda rfind(origstem, "^" .. AMAD .. CONS .. A .. CONS .. "$") and dotrack("aalam") -- ʾālam "more painful", ʾāḵar "other" ) then return "di" elseif num == "sg" and pos == "adjective" and ( -- diptote singular patterns (adjectives) rfind(stem, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. N .. "$") and dotrack("kaslaan") or -- kaslān "lazy", ʿaṭšān "thirsty", jawʿān "hungry", ḡaḍbān "angry", tayhān "wandering, perplexed"; but not nouns like qaṭrān "tar", šayṭān "devil", mawtān "plague", maydān "square" -- rfind(stem, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. N .. "$") and dotrack("laffaa") -- excluded because of too many false positives e.g. ḵawwān "disloyal", not to mention nouns like jannān "gardener"; only diptote example I can find is ʿayyān "incapable, weary" (diptote per Lane but not Wehr) rfind(stem, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. HAMZA .. "$") and dotrack("laffaa") -- laffāʾ "plump (fem.)"; but not nouns like jarrāʾ "runner", ḥaddāʾ "camel driver", lawwāʾ "wryneck" ) then return "di" elseif rfind(stem, AMAQ .. "$") then -- kaslā, ḏikrā (spelled with alif maqṣūra) return "inv" elseif rfind(stem, "[" .. ALIF .. SK .. "]" .. Y .. AOPTA .. "$") then -- dunyā, hadāyā (spelled with tall alif after yāʾ) return "inv" elseif rfind(stem, ALIF .. "$") then -- kāmērā, lībiyā (spelled with tall alif; we catch dunyā and hadāyā above) return "lwinv" elseif rfind(stem, II .. "$") then -- cases like كُوبْرِي kubrī "bridge" and صَوَانِي ṣawānī pl. of ṣīniyya; modern words that would probably end with -in dotrack("ii") return "inv" elseif rfind(stem, UU .. "$") then -- FIXME: Does this occur? 
		-- Check the tracking
		dotrack("uu")
		return "inv"
	else
		return "tri"
	end
end

-- Replace hamza (of any sort) at the end of a word, possibly followed by
-- a nominative case ending or -in or -an, with HAMZA_PH, and replace alif
-- madda at the end of a word with HAMZA_PH plus fatḥa + alif. To undo these
-- changes, use hamza_seat().
function canon_hamza(word)
	word = rsub(word, AMAD .. "$", HAMZA_PH .. AA)
	word = rsub(word, HAMZA_ANY .. "([" .. UN .. U .. IN .. "]?)$", HAMZA_PH .. "%1")
	word = rsub(word, HAMZA_ANY .. "(" .. AN .. "[" .. ALIF .. AMAQ .. "])$", HAMZA_PH .. "%1")
	return word
end

-- Supply the appropriate hamza seat(s) for a placeholder hamza.
function hamza_seat(word)
	if rfind(word, HAMZA_PH) then -- optimization to avoid many regexp substs
		return ar_utilities.process_hamza(word)
	end
	return {word}
end

--[[
-- Supply the appropriate hamza seat for a placeholder hamza in a combined
-- Arabic/transliteration expression.
function split_and_hamza_seat(word)
	if rfind(word, HAMZA_PH) then -- optimization to avoid many regexp substs
		local ar, tr = split_arabic_tr(word)
		-- FIXME: Do something with all values returned
		ar = ar_utilities.process_hamza(ar)[1]
		return ar .. "/" .. tr
	end
	return word
end
--]]

-- Return stem and type of an argument given the singular stem and whether
-- this is a plural argument. WORD may be of the form ARABIC, ARABIC/TR,
-- ARABIC:TYPE, ARABIC/TR:TYPE, or TYPE, for Arabic stem ARABIC with
-- transliteration TR and of type (i.e. declension) TYPE. If the type
-- is omitted, it is auto-detected using detect_type(). If the transliteration
-- is omitted, it is auto-transliterated from the Arabic. If only the type
-- is present, it is an inferable type such as a sound plural type ("sfp",
-- "smp" or "awnp"), in which case the stem and translit are generated from
-- the singular by regular rules. SG may be of the form ARABIC/TR or ARABIC.
-- ISFEM is true if WORD is a feminine stem. NUM is either "sg", "du" or "pl"
-- according to the number of the stem.
The return value will be in the ARABIC/TR format. -- -- POS is the part of speech, generally "noun" or "adjective". Used to -- distinguish nouns and adjectives of the فَعْلَان type. There are nouns of -- this type and they generally are triptotes, e.g. قَطْرَان "tar" -- and شَيْطَان "devil". An additional complication is that the user can set -- the POS to something else, like "numeral". We don't use this POS for -- modifiers, where we determine whether they are noun-like or adjective-like -- according to whether mod_idafa= is true. function export.stem_and_type(word, sg, sgtype, isfem, num, pos) local rettype = nil if rfind(word, ":") then local split = rsplit(word, ":") if #split > 2 then error("More than one colon found in argument: '" .. word .. "'") end word, rettype = split[1], split[2] end local ar, tr = split_arabic_tr(word) -- Need to reorder shaddas here so that shadda at the end of a stem -- followed by ʾiʿrāb or a plural ending or whatever can get processed -- correctly. This processing happens in various places so make sure -- we return the reordered Arabic in all circumstances. ar = reorder_shadda(ar) local artr = ar .. "/" .. tr -- Now return split-out ARABIC/TR and TYPE, with shaddas reordered in -- the Arabic. if rettype then return artr, rettype end -- Likewise, do shadda reordering for the singular. local sgar, sgtr = split_arabic_tr(sg) sgar = reorder_shadda(sgar) -- Apply a substitution to the singular Arabic and translit. If a -- substitution could be made, return the combined ARABIC/TR with -- substitutions made; else, return nil. The Arabic has ARFROM -- replaced with ARTO, while the translit has TRFROM replaced with -- TRTO, and if that doesn't match, replace TRFROM2 with TRTO2. 
local function sub(arfrom, arto, trfrom, trto, trfrom2, trto2, trfrom3, trto3) if rfind(sgar, arfrom) then local arret = rsub(sgar, arfrom, arto) local trret = sgtr if rfind(sgtr, trfrom) then trret = rsub(sgtr, trfrom, trto) elseif trfrom2 and rfind(sgtr, trfrom2) then trret = rsub(sgtr, trfrom2, trto2) elseif trfrom3 and rfind(sgtr, trfrom3) then trret = rsub(sgtr, trfrom3, trto3) elseif not rfind(sgtr, BOGUS_CHAR) then error("Transliteration '" .. sgtr .."' does not have same ending as Arabic '" .. sgar .. "'") end return arret .. "/" .. trret else return nil end end if (num ~= "sg" or not isfem) and (word == "elf" or word == "cdf" or word == "intf" or word == "rf" or word == "f") then error("Inference of form for inflection type '" .. word .. "' only allowed in singular feminine") end if num ~= "du" and word == "d" then error("Inference of form for inflection type '" .. word .. "' only allowed in dual") end if num ~= "pl" and (word == "sfp" or word == "smp" or word == "awnp" or word == "cdp" or word == "sp" or word == "fp" or word == "p") then error("Inference of form for inflection type '" .. word .. "' only allowed in plural") end local function is_intensive_adj(ar) return rfind(ar, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. N .. UOPT .. "$") or rfind(ar, "^" .. CONS .. A .. CONS .. SK .. AMAD .. N .. UOPT .. "$") or rfind(ar, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. N .. UOPT .. "$") end local function is_feminine_cd_adj(ar) return pos == "adjective" and (rfind(ar, "^" .. CONS .. A .. CONS .. SK .. CONS .. AOPTA .. HAMZA .. UOPT .. "$") or -- ʾḥamrāʾ/ʿamyāʾ/bayḍāʾ rfind(ar, "^" .. CONS .. A .. CONS .. SH .. AOPTA .. HAMZA .. UOPT .. "$") -- laffāʾ ) end local function is_elcd_adj(ar) return rfind(ar, ELCD_START .. SK .. CONS .. A .. CONS .. UOPT .. "$") or -- ʾabyaḍ "white", ʾakbar "greater" rfind(ar, ELCD_START .. A .. CONS .. SH .. UOPT .. "$") or -- ʾalaff "plump", ʾaqall "fewer" rfind(ar, ELCD_START .. SK .. CONS .. AAMAQ .. 
"$") or -- ʾaʿmā "blind", ʾadnā "lower" rfind(ar, "^" .. AMAD .. CONS .. A .. CONS .. UOPT .. "$") -- ʾālam "more painful", ʾāḵar "other" end if word == "?" or (rfind(word, "^[a-z][a-z]*$") and sgtype == "?") then --if 'word' is a type, actual value inferred from sg; if sgtype is ?, --propagate it to all derived types return "", "?" end if word == "intf" then if not is_intensive_adj(sgar) then error("Singular stem not in CACCān form: " .. sgar) end local ret = ( sub(AMAD .. N .. UOPT .. "$", AMAD, "nu?$", "") or -- ends in -ʾān sub(AOPTA .. N .. UOPT .. "$", AMAQ, "nu?$", "") -- ends in -ān ) return ret, "inv" end if word == "elf" then local ret = ( sub(ELCD_START .. SK .. "[" .. Y .. W .. "]" .. A .. CONSPAR .. UOPT .. "$", "%1" .. UU .. "%2" .. AMAQ, "ʔa(.)[yw]a(.)u?", "%1ū%2ā") or -- ʾajyad sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$", "%1" .. U .. "%2" .. SK .. "%3" .. AMAQ, "ʔa(.)(.)a(.)u?", "%1u%2%3ā") or -- ʾakbar sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$", "%1" .. U .. "%2" .. SH .. AMAQ, "ʔa(.)a(.)%2u?", "%1u%2%2ā") or -- ʾaqall sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$", "%1" .. U .. "%2" .. SK .. Y .. ALIF, "ʔa(.)(.)ā", "%1u%2yā") or -- ʾadnā sub("^" .. AMAD .. CONSPAR .. A .. CONSPAR .. UOPT .. "$", HAMZA_ON_ALIF .. U .. "%1" .. SK .. "%2" .. AMAQ, "ʔā(.)a(.)u?", "ʔu%1%2ā") -- ʾālam "more painful", ʾāḵar "other" ) if not ret then error("Singular stem not an elative adjective: " .. sgar) end return ret, "inv" end if word == "cdf" then local ret = ( sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$", "%1" .. A .. "%2" .. SK .. "%3" .. AA .. HAMZA, "ʔa(.)(.)a(.)u?", "%1a%2%3āʔ") or -- ʾaḥmar sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$", "%1" .. A .. "%2" .. SH .. AA .. HAMZA, "ʔa(.)a(.)%2u?", "%1a%2%2āʔ") or -- ʾalaff sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$", "%1" .. A .. "%2" .. SK .. Y .. AA .. 
HAMZA, "ʔa(.)(.)ā", "%1a%2yāʔ") -- ʾaʿmā ) if not ret then error("Singular stem not a color/defect adjective: " .. sgar) end return ret, "cd" -- so plural will be correct end -- Regular feminine -- add ة, possibly with stem modifications if word == "rf" then sgar = canon_hamza(sgar) if rfind(sgar, TAM .. UNUOPT .. "$") then --Don't do this or we have problems when forming singulative from --collective with a construct modifier that's feminine --error("Singular stem is already feminine: " .. sgar) return sgar .. "/" .. sgtr, "tri" end local ret = ( sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AAH, "an$", "āh") or -- ends in -an sub(IN .. "$", IY .. AH, "in$", "iya") or -- ends in -in sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AAH, "ā$", "āh") or -- ends in alif or alif maqṣūra -- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can -- do a single Arabic regexp to cover both because we want to -- remove u(n) from the translit only when ʾiʿrāb is present to -- lessen the risk of removing -un in the actual stem. We also -- allow for cases where the ʾiʿrāb is present in Arabic but not -- in translit. sub(UNU .. "$", AH, "un?$", "a", "$", "a") or -- anything else + -u(n) sub("$", AH, "$", "a") -- anything else ) return ret, "tri" end if word == "f" then if sgtype == "cd" then return export.stem_and_type("cdf", sg, sgtype, true, "sg", pos) elseif sgtype == "el" then return export.stem_and_type("elf", sg, sgtype, true, "sg", pos) elseif sgtype =="di" and is_intensive_adj(sgar) then return export.stem_and_type("intf", sg, sgtype, true, "sg", pos) elseif sgtype == "di" and is_elcd_adj(sgar) then -- If form is elative or color-defect, we don't know which of -- the two it is, and each has a special feminine which isn't -- the regular "just add ة", so shunt to unknown. This will -- ensure that ?'s appear in place of the inflection -- also -- for dual and plural. 
return export.stem_and_type("?", sg, sgtype, true, "sg", pos) else return export.stem_and_type("rf", sg, sgtype, true, "sg", pos) end end if word == "rm" then sgar = canon_hamza(sgar) --Don't do this or we have problems when forming collective from --singulative with a construct modifier that's not feminine, --e.g. شَجَرَة التُفَّاح --if not rfind(sgar, TAM .. UNUOPT .. "$") then -- error("Singular stem is not feminine: " .. sgar) --end local ret = ( sub(AAH .. UNUOPT .. "$", AN .. AMAQ, "ātun?$", "an", "ā[ht]$", "an") or -- in -āh sub(IY .. AH .. UNUOPT .. "$", IN, "iyatun?$", "in", "iya$", "in") or -- ends in -iya sub(AOPT .. TAM .. UNUOPT .. "$", "", "atun?$", "", "a$", "") or --ends in -a sub("$", "", "$", "") -- do nothing ) return ret, "tri" end if word == "m" then -- FIXME: handle cd (color-defect) -- FIXME: handle el (elative) -- FIXME: handle int (intensive) return export.stem_and_type("rm", sg, sgtype, false, "sg", pos) end -- The plural used for feminine adjectives. If the singular type is -- color/defect or it looks like a feminine color/defect adjective, -- use color/defect plural. Otherwise shunt to sound feminine plural. if word == "fp" then if sgtype == "cd" or is_feminine_cd_adj(sgar) then return export.stem_and_type("cdp", sg, sgtype, true, "pl", pos) else return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos) end end if word == "sp" then if sgtype == "cd" then return export.stem_and_type("cdp", sg, sgtype, isfem, "pl", pos) elseif isfem then return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos) elseif sgtype == "an" then return export.stem_and_type("awnp", sg, sgtype, false, "pl", pos) else return export.stem_and_type("smp", sg, sgtype, false, "pl", pos) end end -- Conservative plural, as used for masculine plural adjectives. -- If singular type is color-defect, shunt to color-defect plural; else -- shunt to unknown, so ? appears in place of the inflections. 
	if word == "p" then
		if sgtype == "cd" then
			return export.stem_and_type("cdp", sg, sgtype, isfem, "pl", pos)
		else
			return export.stem_and_type("?", sg, sgtype, isfem, "pl", pos)
		end
	end

	-- Special plural used for paucal plurals of singulatives. If the singular
	-- ends in -ة (most common), use strong feminine plural; if it ends in -iyy
	-- (next most common), use strong masculine plural; otherwise default to
	-- "p" (conservative plural).
	if word == "paucp" then
		if rfind(sgar, TAM .. UNUOPT .. "$") then
			return export.stem_and_type("sfp", sg, sgtype, true, "pl", pos)
		elseif rfind(sgar, IY .. SH .. UNUOPT .. "$") then
			return export.stem_and_type("smp", sg, sgtype, false, "pl", pos)
		else
			return export.stem_and_type("p", sg, sgtype, isfem, "pl", pos)
		end
	end

	if word == "d" then
		sgar = canon_hamza(sgar)
		local ret = (
			sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AY .. AAN, "an$", "ayān") or -- ends in -an
			sub(IN .. "$", IY .. AAN, "in$", "iyān") or -- ends in -in
			sgtype == "lwinv" and sub(AOPTA .. "$", AT .. AAN, "[āa]$", "atān") or -- lwinv, ends in alif; allow translit with short -a
			sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AY .. AAN, "ā$", "ayān") or -- ends in alif or alif maqṣūra
			-- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can
			-- do a single Arabic regexp to cover both because we want to
			-- remove u(n) from the translit only when ʾiʿrāb is present to
			-- lessen the risk of removing -un in the actual stem. We also
			-- allow for cases where the ʾiʿrāb is present in Arabic but not
			-- in translit.
			--
			-- NOTE: Collapsing the "h$" and "$" cases into "h?$" doesn't work
			-- in the case of words ending in -āh, which end up having the
			-- translit end in -tāntān.
			sub(TAM .. UNU .. "$", T .. AAN, "[ht]un?$", "tān", "h$", "tān", "$", "tān") or -- ends in tāʾ marbūṭa + -u(n)
			sub(TAM .. "$", T .. AAN, "h$", "tān", "$", "tān") or -- ends in tāʾ marbūṭa
			-- Same here as above
			sub(UNU ..
"$", AAN, "un?$", "ān", "$", "ān") or -- anything else + -u(n) sub("$", AAN, "$", "ān") -- anything else ) return ret, "d" end -- Strong feminine plural in -āt, possibly with stem modifications if word == "sfp" then sgar = canon_hamza(sgar) sgar = rsub(sgar, AMAD .. "(" .. TAM .. UNUOPT .. ")$", HAMZA_PH .. AA .. "%1") sgar = rsub(sgar, HAMZA_ANY .. "(" .. AOPT .. TAM .. UNUOPT .. ")$", HAMZA_PH .. "%1") local ret = ( sub(AOPTA .. TAM .. UNUOPT .. "$", AYAAT, "ā[ht]$", "ayāt", "ātun?$", "ayāt") or -- ends in -āh sub(AOPT .. TAM .. UNUOPT .. "$", AAT, "a$", "āt", "atun?$", "āt") or -- ends in -a sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AYAAT, "an$", "ayāt") or -- ends in -an sub(IN .. "$", IY .. AAT, "in$", "iyāt") or -- ends in -in sgtype == "inv" and ( sub(AOPT .. "[" .. ALIF .. AMAQ .. "]$", AYAAT, "ā$", "ayāt") -- ends in alif or alif maqṣūra ) or sgtype == "lwinv" and ( sub(AOPTA .. "$", AAT, "[āa]$", "āt") -- loanword ending in tall alif; allow translit with short -a ) or -- We separate the ʾiʿrāb and no-ʾiʿrāb cases even though we can -- do a single Arabic regexp to cover both because we want to -- remove u(n) from the translit only when ʾiʿrāb is present to -- lessen the risk of removing -un in the actual stem. We also -- allow for cases where the ʾiʿrāb is present in Arabic but not -- in translit. sub(UNU .. "$", AAT, "un?$", "āt", "$", "āt") or -- anything else + -u(n) sub("$", AAT, "$", "āt") -- anything else ) return ret, "sfp" end if word == "smp" then sgar = canon_hamza(sgar) local ret = ( sub(IN .. "$", UUN, "in$", "ūn") or -- ends in -in -- See comments above for why we have two cases, one for UNU and -- one for non-UNU sub(UNU .. "$", UUN, "un?$", "ūn", "$", "ūn") or -- anything else + -u(n) sub("$", UUN, "$", "ūn") -- anything else ) return ret, "smp" end -- Color/defect plural; singular must be masculine or feminine -- color/defect adjective if word == "cdp" then local ret = ( sub(ELCD_START .. SK .. W .. A .. CONSPAR .. UOPT .. "$", "%1" .. UU .. 
"%2", "ʔa(.)wa(.)u?", "%1ū%2") or -- ʾaswad sub(ELCD_START .. SK .. Y .. A .. CONSPAR .. UOPT .. "$", "%1" .. II .. "%2", "ʔa(.)ya(.)u?", "%1ī%2") or -- ʾabyaḍ sub(ELCD_START .. SK .. CONSPAR .. A .. CONSPAR .. UOPT .. "$", "%1" .. U .. "%2" .. SK .. "%3", "ʔa(.)(.)a(.)u?", "%1u%2%3") or -- ʾaḥmar sub(ELCD_START .. A .. CONSPAR .. SH .. UOPT .. "$", "%1" .. U .. "%2" .. SH, "ʔa(.)a(.)%2u?", "%1u%2%2") or -- ʾalaff sub(ELCD_START .. SK .. CONSPAR .. AAMAQ .. "$", "%1" .. U .. "%2" .. Y, "ʔa(.)(.)ā", "%1u%2y") or -- ʾaʿmā sub("^" .. CONSPAR .. A .. W .. SKOPT .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. UU .. "%2", "(.)aw(.)āʔu?", "%1ū%2") or -- sawdāʾ sub("^" .. CONSPAR .. A .. Y .. SKOPT .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. II .. "%2", "(.)ay(.)āʔu?", "%1ī%2") or -- bayḍāʾ sub("^" .. CONSPAR .. A .. CONSPAR .. SK .. CONSPAR .. AA .. HAMZA .. UOPT .. "$", "%1" .. U .. "%2" .. SK .. "%3", "(.)a(.)(.)āʔu?", "%1u%2%3") or -- ʾḥamrāʾ/ʿamyāʾ sub("^" .. CONSPAR .. A .. CONSPAR .. SH .. AA .. HAMZA .. UOPT .. "$", "%1" .. U .. "%2" .. SH, "(.)a(.)%2āʔu?", "%1u%2%2") -- laffāʾ ) if not ret then error("For 'cdp', singular must be masculine or feminine color/defect adjective: " .. sgar) end return ret, "tri" end if word == "awnp" then local ret = ( sub(AN .. "[" .. ALIF .. AMAQ .. "]$", AWSK .. N, "an$", "awn") -- ends in -an ) if not ret then error("For 'awnp', singular must end in -an: " .. sgar) end return ret, "awnp" end return artr, export.detect_type(ar, isfem, num, pos) end -- need LRM here so multiple Arabic plurals end up agreeing in order with -- the transliteration local outersep = LRM .. "; " local innersep = LRM .. "/" -- Subfunction of show_form(), used to implement recursively generating -- all combinations of elements from FORM and from each of the items in -- LIST_OF_MODS, both of which are either arrays of strings or arrays of -- arrays of strings, where the strings are in the form ARABIC/TRANSLIT, -- as described in show_form(). 
TRAILING_ARTRMODS is an array of ARTRMOD -- items, each of which is a two-element array of ARMOD (Arabic) and TRMOD -- (transliteration), accumulating all of the suffixes generated so far -- in the recursion process. Each time we recur we take the last MOD item -- off of LIST_OF_MODS, separate each element in MOD into its Arabic and -- Latin parts and to each Arabic/Latin pair we add all elements in -- TRAILING_ARTRMODS, passing the newly generated list of ARTRMOD items -- down the next recursion level with the shorter LIST_OF_MODS. We end up -- returning a string to insert into the Wiki-markup table. function show_form_1(form, list_of_mods, trailing_artrmods, use_parens) if #list_of_mods == 0 then local arabicvals = {} local latinvals = {} local parenvals = {} -- Accumulate separately the Arabic and transliteration into -- ARABICVALS and LATINVALS, then concatenate each down below. -- However, if USE_PARENS, we put each transliteration directly -- after the corresponding Arabic, in parens, and put the results -- in PARENVALS, which get concatenated below. (This is used in the -- title of the declension table.) for _, artrmod in ipairs(trailing_artrmods) do assert(#artrmod == 2) local armod = artrmod[1] local trmod = artrmod[2] for _, subform in ipairs(form) do local ar_span, tr_span local ar_subspan, tr_subspan local ar_subspans = {} local tr_subspans = {} if type(subform) ~= "table" then subform = {subform} end for _, subsubform in ipairs(subform) do local arabic, translit = split_arabic_tr(subsubform) if arabic == "-" then ar_subspan = "&mdash;" tr_subspan = "&mdash;" elseif arabic == "?" then ar_subspan = "?" tr_subspan = "?" else tr_subspan = (rfind(translit, BOGUS_CHAR) or rfind(trmod, BOGUS_CHAR)) and "?" or require("Module:script utilities").tag_translit(translit .. 
trmod, lang, "default", 'style="color: var(--wikt-palette-grey-8,#888);"')
						-- implement elision of al- after vowel
						tr_subspan = rsub(tr_subspan, "([aeiouāēīōū][ %-])a([sšṣtṯṭdḏḍzžẓnrḷl]%-)", "%1%2")
						tr_subspan = rsub(tr_subspan, "([aeiouāēīōū][ %-])a(llāh)", "%1%2")

						ar_subspan = m_links.full_link({lang = lang, term = arabic .. armod, tr = "-"})
					end

					insert_if_not(ar_subspans, ar_subspan)
					insert_if_not(tr_subspans, tr_subspan)
				end

				ar_span = table.concat(ar_subspans, innersep)
				tr_span = table.concat(tr_subspans, innersep)

				if use_parens then
					table.insert(parenvals, ar_span .. " (" .. tr_span .. ")")
				else
					table.insert(arabicvals, ar_span)
					table.insert(latinvals, tr_span)
				end
			end
		end

		if use_parens then
			return table.concat(parenvals, outersep)
		else
			local arabic_span = table.concat(arabicvals, outersep)
			local latin_span = table.concat(latinvals, outersep)
			if arabic_span == "?" then
				return "?"
			else
				return arabic_span .. "<br />" .. latin_span
			end
		end
	else
		local last_mods = table.remove(list_of_mods)
		local artrmods = {}
		for _, mod in ipairs(last_mods) do
			if type(mod) ~= "table" then
				mod = {mod}
			end
			for _, submod in ipairs(mod) do
				local armod, trmod = split_arabic_tr(submod)
				-- If the value is -, we need to create a blank entry
				-- rather than skipping it; if we have no entries at any
				-- level, then there will be no overall entries at all
				-- because the inside of the loop at the next level will
				-- never be executed.
				if armod == "-" then
					armod = ""
					trmod = ""
				end
				if armod ~= "" then armod = ' ' .. armod end
				if trmod ~= "" then trmod = ' ' .. trmod end
				for _, trailing_artrmod in ipairs(trailing_artrmods) do
					-- Build each combination from the unmodified ARMOD/TRMOD.
					-- Appending onto armod/trmod in place would carry the
					-- previous trailing suffix into the next combination.
					local combined_armod = armod .. trailing_artrmod[1]
					local combined_trmod = trmod .. trailing_artrmod[2]
					table.insert(artrmods, {combined_armod, combined_trmod})
				end
			end
		end
		return show_form_1(form, list_of_mods, artrmods, use_parens)
	end
end

-- Generate a string to substitute into a particular form in a Wiki-markup
-- table.
FORM is the set of inflected forms corresponding to the base, -- either an array of strings (referring e.g. to different possible plurals) -- or an array of arrays of strings (the first level referring e.g. to -- different possible plurals and the inner level referring typically to -- hamza-spelling variants). LIST_OF_MODS is an array of MODS elements, one -- per modifier. Each MODS element is the set of inflected forms corresponding -- to the modifier and is of the same form as FORM, i.e. an array of strings -- or an array of arrays of strings. Each string is typically of the form -- "ARABIC/TRANSLIT", i.e. an Arabic string and a Latin string separated -- by a slash. We loop over all possible combinations of elements from -- each array; this requires recursion. function show_form(form, list_of_mods, use_parens) if not form then return "&mdash;" elseif type(form) ~= "table" then error("a non-table value was given in the list of inflected forms.") end if #form == 0 then return "&mdash;" end -- We need to start the recursion with the third parameter containing -- one blank element rather than no elements, otherwise no elements -- will be propagated to the next recursion level. return show_form_1(form, list_of_mods, {{"", ""}}, use_parens) end -- Create a Wiki-markup table using the values in DATA and the template in -- WIKICODE. function make_table(data, wikicode) -- Function used as replace arg of call to rsub(). Replace the -- specified param with its (HTML) value. The param references appear -- as {{{PARAM}}} in the wikicode. local function repl(param) if param == "pos" then return data.pos elseif param == "info" then return data.title and " (" .. data.title .. ")" or "" elseif rfind(param, "type$") then return table.concat(data.forms[param] or {"&mdash;"}, outersep .. "<br>") else local list_of_mods = {} for _, mod in ipairs(mod_list) do local mods = data.forms[mod .. "_" .. 
param] if not mods or #mods == 0 then -- We need one blank element rather than no element, -- otherwise no elements will be propagated from one -- recursion level to the next. mods = {""} end table.insert(list_of_mods, mods) end return show_form(data.forms[param], list_of_mods, param == "lemma") end end -- For states not in the list of those to be displayed, clear out the -- corresponding inflections so they appear as a dash. for _, state in ipairs(data.allstates) do if not contains(data.states, state) then for _, numgen in ipairs(data.numgens()) do for _, case in ipairs(data.allcases) do data.forms[case .. "_" .. numgen .. "_" .. state] = {} end end end end return rsub(wikicode, "{{{([a-z_]+)}}}", repl) .. m_utilities.format_categories(data.categories, lang) end -- Generate part of the noun table for a given number spec NUM (e.g. sg) function generate_noun_num(num) return [=[! indefinite ! definite ! construct |- ! informal | {{{inf_]=] .. num .. [=[_ind}}} | {{{inf_]=] .. num .. [=[_def}}} | {{{inf_]=] .. num .. [=[_con}}} |- ! nominative | {{{nom_]=] .. num .. [=[_ind}}} | {{{nom_]=] .. num .. [=[_def}}} | {{{nom_]=] .. num .. [=[_con}}} |- ! accusative | {{{acc_]=] .. num .. [=[_ind}}} | {{{acc_]=] .. num .. [=[_def}}} | {{{acc_]=] .. num .. [=[_con}}} |- ! genitive | {{{gen_]=] .. num .. [=[_ind}}} | {{{gen_]=] .. num .. [=[_def}}} | {{{gen_]=] .. num .. [=[_con}}} ]=] end -- Make the noun table function make_noun_table(data) local wikicode = mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-top', args = { title = 'Declension of {{{pos}}} {{{lemma}}}', tall = 'yes', palette = "green", category = 'declension', class = 'tr-alongside', -- temp hack to prevent extra line break } } for _, num in ipairs(data.numbers) do if num == "du" then wikicode = wikicode .. [=[|- ! class="outer" | dual ]=] .. generate_noun_num("du") else wikicode = wikicode .. [=[|- ! class="outer" rowspan=2 | ]=] .. data.engnumberscap[num] .. "\n" .. [=[ ! 
class="outer" style="font-style:normal" colspan=3 | {{{]=] .. num .. [=[_type}}} |- ]=] .. generate_noun_num(num) end end wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' } return make_table(data, wikicode) end -- Generate part of the gendered-noun table for a given numgen spec -- NUM (e.g. m_sg) function generate_gendered_noun_num(num) return [=[|- ! indefinite ! definite ! construct ! indefinite ! definite ! construct |- ! informal | {{{inf_m_]=] .. num .. [=[_ind}}} | {{{inf_m_]=] .. num .. [=[_def}}} | {{{inf_m_]=] .. num .. [=[_con}}} | {{{inf_f_]=] .. num .. [=[_ind}}} | {{{inf_f_]=] .. num .. [=[_def}}} | {{{inf_f_]=] .. num .. [=[_con}}} |- ! nominative | {{{nom_m_]=] .. num .. [=[_ind}}} | {{{nom_m_]=] .. num .. [=[_def}}} | {{{nom_m_]=] .. num .. [=[_con}}} | {{{nom_f_]=] .. num .. [=[_ind}}} | {{{nom_f_]=] .. num .. [=[_def}}} | {{{nom_f_]=] .. num .. [=[_con}}} |- ! accusative | {{{acc_m_]=] .. num .. [=[_ind}}} | {{{acc_m_]=] .. num .. [=[_def}}} | {{{acc_m_]=] .. num .. [=[_con}}} | {{{acc_f_]=] .. num .. [=[_ind}}} | {{{acc_f_]=] .. num .. [=[_def}}} | {{{acc_f_]=] .. num .. [=[_con}}} |- ! genitive | {{{gen_m_]=] .. num .. [=[_ind}}} | {{{gen_m_]=] .. num .. [=[_def}}} | {{{gen_m_]=] .. num .. [=[_con}}} | {{{gen_f_]=] .. num .. [=[_ind}}} | {{{gen_f_]=] .. num .. [=[_def}}} | {{{gen_f_]=] .. num .. [=[_con}}} ]=] end -- Make the gendered noun table function make_gendered_noun_table(data) local wikicode = mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-top', args = { title = 'Declension of {{{pos}}} {{{lemma}}}', tall = 'yes', palette = "green", category = 'declension', class = 'tr-alongside', -- temp hack to prevent extra line break } } for _, num in ipairs(data.numbers) do if num == "du" then wikicode = wikicode .. [=[|- ! class="outer" rowspan=2 | dual ! class="outer" colspan=3 | masculine ! class="outer" colspan=3 | feminine ]=] .. 
generate_gendered_noun_num("du") else wikicode = wikicode .. [=[|- ! class="outer" rowspan=3 | ]=] .. data.engnumberscap[num] .. "\n" .. [=[ ! class="outer" colspan=3 | masculine ! class="outer" colspan=3 | feminine |- ! class="outer" style="font-style:normal" colspan=3 | {{{m_]=] .. num .. [=[_type}}} ! class="outer" style="font-style:normal" colspan=3 | {{{f_]=] .. num .. [=[_type}}} ]=] .. generate_gendered_noun_num(num) end end wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' } return make_table(data, wikicode) end -- Generate part of the adjective table for a given numgen spec NUM (e.g. m_sg) function generate_adj_num(num) return [=[|- ! indefinite ! definite ! indefinite ! definite |- ! informal | {{{inf_m_]=] .. num .. [=[_ind}}} | {{{inf_m_]=] .. num .. [=[_def}}} | {{{inf_f_]=] .. num .. [=[_ind}}} | {{{inf_f_]=] .. num .. [=[_def}}} |- ! nominative | {{{nom_m_]=] .. num .. [=[_ind}}} | {{{nom_m_]=] .. num .. [=[_def}}} | {{{nom_f_]=] .. num .. [=[_ind}}} | {{{nom_f_]=] .. num .. [=[_def}}} |- ! accusative | {{{acc_m_]=] .. num .. [=[_ind}}} | {{{acc_m_]=] .. num .. [=[_def}}} | {{{acc_f_]=] .. num .. [=[_ind}}} | {{{acc_f_]=] .. num .. [=[_def}}} |- ! genitive | {{{gen_m_]=] .. num .. [=[_ind}}} | {{{gen_m_]=] .. num .. [=[_def}}} | {{{gen_f_]=] .. num .. [=[_ind}}} | {{{gen_f_]=] .. num .. [=[_def}}} ]=] end -- Make the adjective table function make_adj_table(data) local wikicode = mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-top', args = { title = 'Declension of {{{pos}}} {{{lemma}}}', tall = 'yes', palette = "green", category = 'declension', class = 'tr-alongside', -- temp hack to prevent extra line break } } if contains(data.numbers, "sg") then wikicode = wikicode .. [=[|- ! class="outer" rowspan=3 | singular ! class="outer" colspan=2 | masculine ! class="outer" colspan=2 | feminine |- ! class="outer" style="font-style:normal" colspan=2 | {{{m_sg_type}}} ! 
class="outer" style="font-style:normal" colspan=2 | {{{f_sg_type}}} ]=] .. generate_adj_num("sg") end if contains(data.numbers, "du") then wikicode = wikicode .. [=[|- ! class="outer" rowspan=2 | dual ! class="outer" colspan=2 | masculine ! class="outer" colspan=2 | feminine ]=] .. generate_adj_num("du") end if contains(data.numbers, "pl") then wikicode = wikicode .. [=[|- ! class="outer" rowspan=3 | plural ! class="outer" colspan=2 | masculine ! class="outer" colspan=2 | feminine |- ! class="outer" style="font-style:normal" colspan=2 | {{{m_pl_type}}} ! class="outer" style="font-style:normal" colspan=2 | {{{f_pl_type}}} ]=] .. generate_adj_num("pl") end wikicode = wikicode .. mw.getCurrentFrame():expandTemplate{ title = 'inflection-table-bottom' } return make_table(data, wikicode) end return export -- For Vim, so we get 4-space tabs -- vim: set ts=4 sw=4 noet: 6y30719qp23rwl6phtl5bn8lc8lkw5a Module:ar-utilities 828 8171 27706 2026-06-21T14:59:12Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local m_utilities = require("Module:utilities") local lang = require("Module:languages").getByCode("ar") local sc = require("Module:scripts").getByCode("Arab") local rfind = mw.ustring.find local rsubn = mw.ustring.gsub local u = require("Module:string/char") local consonants = "[بتثجحخدذرزسشصضطظعغقفلكمنهويء]" -- version of rsubn() that discards all but the first return value function export.rsub(term, foo, bar)...' 
27706 Scribunto text/plain local export = {} local m_utilities = require("Module:utilities") local lang = require("Module:languages").getByCode("ar") local sc = require("Module:scripts").getByCode("Arab") local rfind = mw.ustring.find local rsubn = mw.ustring.gsub local u = require("Module:string/char") local consonants = "[بتثجحخدذرزسشصضطظعغقفلكمنهويء]" -- version of rsubn() that discards all but the first return value function export.rsub(term, foo, bar) local retval = rsubn(term, foo, bar) return retval end local rsub = export.rsub -- synthesize a frame so that exported functions meant to be called from -- templates can be called from the debug console. function export.debug_frame(parargs, args) return { args = args, getParent = function() return { args = parargs } end } end function export.catfix() return m_utilities.catfix(lang, sc) end --------------------------- diacritics, letters and combinations ------------------------------ -- hamza variants local HAMZA = u(0x0621) -- hamza on the line (stand-alone hamza) = ء local HAMZA_ON_ALIF = u(0x0623) local HAMZA_ON_WAW = u(0x0624) local HAMZA_UNDER_ALIF = u(0x0625) local HAMZA_ON_YA = u(0x0626) local HAMZA_PH = u(0xFFF0) -- hamza placeholder export.HAMZA = HAMZA export.HAMZA_ON_ALIF = HAMZA_ON_ALIF export.HAMZA_ON_WAW = HAMZA_ON_WAW export.HAMZA_UNDER_ALIF = HAMZA_UNDER_ALIF export.HAMZA_ON_YA = HAMZA_ON_YA export.HAMZA_PH = HAMZA_PH -- diacritics local A = u(0x064E) -- fatḥa local AN = u(0x064B) -- fatḥatān (fatḥa tanwīn) local U = u(0x064F) -- ḍamma local UN = u(0x064C) -- ḍammatān (ḍamma tanwīn) local I = u(0x0650) -- kasra local IN = u(0x064D) -- kasratān (kasra tanwīn) local SK = u(0x0652) -- sukūn = no vowel local SH = u(0x0651) -- šadda = gemination of consonants local DAGGER_ALIF = u(0x0670) -- Pattern matching any diacritics that may be on a consonant other than shadda local DIACRITIC_ANY_BUT_SH = "[" .. A .. I .. U .. AN .. IN .. UN .. SK .. DAGGER_ALIF .. 
"]" -- Pattern matching short vowels local AIU = "[" .. A .. I .. U .. "]" -- Pattern matching any diacritics that may be on a consonant local DIACRITIC = SH .. "?" .. DIACRITIC_ANY_BUT_SH export.A = A export.AN = AN export.U = U export.UN = UN export.I = I export.IN = IN export.SK = SK export.SH = SH export.DAGGER_ALIF = DAGGER_ALIF -- Pattern matching any diacritics that may be on a consonant other than shadda export.DIACRITIC_ANY_BUT_SH = DIACRITIC_ANY_BUT_SH -- Pattern matching short vowels export.AIU = AIU -- Pattern matching any diacritics that may be on a consonant export.DIACRITIC = DIACRITIC -- various letters and signs local ALIF = u(0x0627) -- ʾalif = ا local ALIF_WASLA = u(0x0671) -- ʾalif waṣla = hamzatu l-waṣl = ٱ local AMAQ = u(0x0649) -- ʾalif maqṣūra = ى local AMAD = u(0x0622) -- ʾalif madda = آ local TAM = u(0x0629) -- tāʾ marbūṭa = ة local WAW = u(0x0648) -- wāw = و local W = WAW local YA = u(0x064A) -- yā = ي local Y = YA local T = u(0x062A) -- tāʾ = ت local HYPHEN = u(0x0640) local N = u(0x0646) -- nūn = ن local LRM = u(0x200E) -- left-to-right mark export.ALIF = ALIF export.ALIF_WASLA = ALIF_WASLA export.AMAQ = AMAQ export.AMAD = AMAD export.TAM = TAM export.WAW = WAW export.W = W export.YA = YA export.Y = Y export.T = T export.HYPHEN = HYPHEN export.N = N export.LRM = LRM -- common combinations local AW = A .. W -- diphthong, construct state of some final-weak nouns, 3sm past of some final-weak verbs, etc. local AY = A .. Y -- diphthong, construct state of most final-weak nouns, 3sm past of most final-weak verbs, etc. local IY = I .. Y -- equivalent to long ī local UW = U .. W -- equivalent to long ū local AA = A .. ALIF -- long ā local AAMAQ = A .. AMAQ -- vocalized ʾalif maqṣūra local II = IY -- long ī local IIN = IY .. N -- short strong masculine oblique plural ending local IINA = IIN .. A -- full strong msaculine oblique plural ending local UU = UW -- long ū local UUN = UU .. 
N -- short strong masculine nominative plural ending local UUNA = UUN .. A -- full strong masculine nominative plural ending local AWN = AW .. SK .. N -- short verbal ending of some final-weak verbs local AWNA = AWN .. A -- full verbal ending of some final-weak verbs local AYN = AY .. SK .. N -- short oblique dual ending, verbal ending of some final-weak verbs local AYNI = AYN .. I -- full oblique dual ending local AYNA = AYN .. A -- full verbal ending of some final-weak verbs local AAN = AA .. N -- short nominative dual ending local AANI = AAN .. I -- full nominative dual ending local UNU = "[" .. UN .. U .. "]" -- matches nominative singular of strong masculine triptotes and diptotes local UNUOPT = UNU .. "?" -- optional equivalent of UNU, for short forms local AH = A .. TAM -- feminine ending local AAH = AA .. TAM -- final-weak feminine ending local AAT = AA .. T -- short strong feminine plural ending local AATUN = AAT .. UN -- full strong nominative feminine plural ending local IYAH = I .. Y .. AH -- ending of some final-weak feminines local AYAAT = AY .. AAT -- final-weak plural ending local AYAAN = AY .. AAN -- final-weak dual ending local IYAAT = IY .. AAT -- final-weak plural ending local IYAAN = IY .. AAN -- final-weak dual ending local IYY = IY .. SH -- masculine nisba ending local IYYAH = IY .. SH .. AH -- feminine nisba ending local ATAAN = A .. T .. AAN -- feminine dual ending local AATAAN = AAT .. AAN -- final-weak feminine dual ending -- other possibilities (currently found in verb module): -- AT, AYSK, AWSK, N, NA, NI, M, MA, MU, TA, TU, _I = ALIF .. I, _U = ALIF .. 
U export.AW = AW export.AY = AY export.IY = IY export.UW = UW export.AA = AA export.AAMAQ = AAMAQ export.II = II export.IIN = IIN export.IINA = IINA export.UU = UU export.UUN = UUN export.UUNA = UUNA export.AWN = AWN export.AWNA = AWNA export.AYN = AYN export.AYNI = AYNI export.AYNA = AYNA export.AAN = AAN export.AANI = AANI export.UNU = UNU export.UNUOPT = UNUOPT export.AH = AH export.AAH = AAH export.AAT = AAT export.AATUN = AATUN export.IYAH = IYAH export.AYAAT = AYAAT export.AYAAN = AYAAN export.IYAAT = IYAAT export.IYAAN = IYAAN export.IYY = IYY export.IYYAH = IYYAH export.ATAAN = ATAAN export.AATAAN = AATAAN function export.reorder_shadda(text) -- shadda+short-vowel (including tanwīn vowels, i.e. -an -in -un) gets -- replaced with short-vowel+shadda during NFC normalisation, which -- MediaWiki does for all Unicode strings; however, it makes the -- detection process inconvenient, so undo it. (For example, the code in -- remove_in would fail to detect the -in in مُتَرَبٍّ because the shadda -- would come after the -in.) text = rsub(text, "(" .. DIACRITIC_ANY_BUT_SH .. ")" .. SH, SH .. "%1") return text end function export.undo_reorder_shadda(text) return mw.ustring.toNFC(text) end --------------------------- hamza processing ------------------------------ local hamza_subs = { --------------------------- handle initial hamza -------------------------- -- put initial hamza on a seat according to following vowel. { "^" .. HAMZA_PH .. "([" .. I .. YA .. "])", HAMZA_UNDER_ALIF .. "%1" }, { " " .. HAMZA_PH .. "([" .. I .. YA .. "])", " " .. HAMZA_UNDER_ALIF .. "%1" }, { "^" .. HAMZA_PH, HAMZA_ON_ALIF }, -- if no vowel, assume a { " " .. HAMZA_PH, " " .. HAMZA_ON_ALIF }, -- if no vowel, assume a ----------------------------- handle final hamza -------------------------- -- "final" hamza may be followed by a short vowel or tanwīn sequence -- use a previous short vowel to get the seat { "(" .. AIU .. ")(" .. HAMZA_PH .. ")(" .. DIACRITIC .. 
"?)$", function(v, ham, diacrit) ham = v == I and HAMZA_ON_YA or v == U and HAMZA_ON_WAW or HAMZA_ON_ALIF return v .. ham .. diacrit end }, { "(" .. AIU .. ")(" .. HAMZA_PH .. ")(" .. DIACRITIC .. "? )", function(v, ham, diacrit) ham = v == I and HAMZA_ON_YA or v == U and HAMZA_ON_WAW or HAMZA_ON_ALIF return v .. ham .. diacrit end }, -- else hamza is on the line { HAMZA_PH .. "(" .. DIACRITIC .. "?)$", HAMZA .. "%1" }, ---------------------------- handle medial hamza -------------------------- -- if long vowel or diphthong precedes, we need to ignore it. { "([" .. AMAD .. ALIF .. WAW .. YA .. "]" .. SK .. "?)(" .. HAMZA_PH .. ")(" .. SH .. "?)([^ ])", function(prec, ham, shad, v2) ham = (v2 == I or v2 == YA) and HAMZA_ON_YA or (v2 == U or v2 == WAW) and HAMZA_ON_WAW or rfind(prec, YA) and HAMZA_ON_YA or HAMZA return prec .. ham .. shad .. v2 end }, -- otherwise, seat of medial hamza relates to vowels on one or both sides. { "([^ ])(" .. HAMZA_PH .. ")(" .. SH .. "?)(" .. AN .. "?[^ ])", function(v1, ham, shad, v2) ham = (v1 == I or v2 == I or v2 == YA) and HAMZA_ON_YA or (v1 == U or v2 == U or v2 == WAW) and HAMZA_ON_WAW or -- special exception for the accusative ending, in words like -- جُزْءًا (juzʾan). By the rules of Thackston pp. 281-282 a -- hamza-on-alif should appear, but that would result in -- two alifs in a row, which is generally forbidden. -- According to Haywood/Nahmad pp. 114-115, after sukūn before -- the accusative ending (including when a pronominal suffix -- follows) hamza is written on yāʾ if the previous letter -- is connecting, else on the line. The only examples they -- give involve preceding non-connecting z (جُزْءًا juzʾan and -- (جُزْءَهُ juzʾahu) and preceding diphthongs, with the only -- connecting letter being yāʾ, where we have hamza-on-yāʾ -- anyway by the preceding regexp. 
Haywood/Nahmad's rule seems -- too complicated, and since it conflicts with Thackston, -- we only implement the case where otherwise two alifs would -- appear with the indefinite accusative ending. v2 == AN .. ALIF and HAMZA or HAMZA_ON_ALIF return v1 .. ham .. shad .. v2 end }, --------------------------- handle alif madda ----------------------------- { HAMZA_ON_ALIF .. A .. "?" .. ALIF, AMAD }, ----------------------- catch any remaining hamzas ------------------------ { HAMZA_PH, HAMZA } } function export.process_hamza(term) -- convert HAMZA_PH into appropriate hamza seat for _, sub in ipairs(hamza_subs) do term = rsub(term, sub[1], sub[2]) end -- sequence of hamza-on-wāw + wāw is problematic and leads to a preferred -- alternative with some other type of hamza, as well as the original -- sequence; sequence of wāw + hamza-on-wāw + wāw is especially problematic -- and leads to two different alternatives with the original sequence not -- one of them if rfind(term, WAW .. "ؤُو") then return { rsub(term, WAW .. "ؤُو", WAW .. "ئُو"), rsub(term, WAW .. "ؤُو", WAW .. "ءُو") } elseif rfind(term, YA .. "ؤُو") then return { rsub(term, YA .. "ؤُو", YA .. "ئُو"), term } elseif rfind(term, ALIF .. "ؤُو") then -- Here John Mace "Arabic Verbs" is inconsistent. In past-tense parts, -- the preferred alternative has hamza on the line, whereas in -- non-past parts the preferred alternative has hamza-on-yāʾ even -- though the sequence of vowels is identical. It's too complicated to -- propagate information about tense through to here so pick one. return { rsub(term, ALIF .. "ؤُو", ALIF .. 
"ئُو"), term } -- no alternative spelling in sequence of U/A + hamza-on-wāw + U + wāw; -- sequence of I + hamza-on-wāw + U + wāw does not occur (has -- hamza-on-yāʾ instead) else return { term } end end ----------------------------------- misc junk --------------------------------- -- Used in {{ar-adj-in}} so that we can specify a full lemma rather than -- requiring the user to truncate the -in ending. FIXME: Move ar-adj-in -- into Lua. function export.remove_in(frame) local lemma = frame.args[1] or error("Lemma required.") return rsub(export.reorder_shadda(lemma), IN .. "$", "") end -- Used in {{ar-adj-an}} so that we can specify a full lemma rather than -- requiring the user to truncate the -an ending. FIXME: Move ar-adj-an -- into Lua. function export.remove_an(frame) local lemma = frame.args[1] or error("Lemma required.") return rsub(export.reorder_shadda(lemma), AN .. AMAQ .. "$", "") end -- Compare two words and find the alternation pattern (vowel changes, prefixes, suffixes etc.) -- Still a WIP, doesn't work correctly yet. 
function export.find_pattern(word1, word2) return nil end function export.etymology(frame) local text, categories = {}, {} local linkText local frame_params = { [1] = { required = true }, } local frame_args = require("Module:parameters").process(frame.args, frame_params) local anchor = frame_args[1] local data = { ["color adjective"] = { anchor = "Color or defect adjectives", text = "color adjective", categories = { "color/defect adjectives" }, }, ["defect adjective"] = { anchor = "Color or defect adjectives", text = "defect adjective", categories = { "color/defect adjectives" }, }, } local params = { [1] = {}, ["nocat"] = { type = "boolean", default = false }, ["lc"] = { type = "boolean", default = false }, ["nocap"] = { alias_of = "lc" }, ["notext"] = { type = "boolean", default = false }, } local args = require("Module:parameters").process(frame:getParent().args, params) if anchor and data[anchor] then local data = data[anchor] anchor = data.anchor or error('The data table does not include an anchor for "' .. anchor .. '".') linkText = data.text or error('The data table does not include link text for "' .. anchor .. '".') if not args.lc then linkText = rsubn(linkText, "^%a", function(a) return mw.ustring.upper(a) end) end if not args.notext then table.insert(text, "[[Appendix:Arabic nominals#" .. anchor .. "|" .. linkText .. "]]") end if not args.nocat then table.insert(categories, m_utilities.format_categories(data.categories, lang)) end else error('The anchor "' .. tostring(anchor) .. '" is not found in the list of anchors.') end return table.concat(text) .. table.concat(categories) end return export b5yp8g0vr2gkhegct05upo1dfcwoet9 Module:anchors 828 8172 27707 2026-06-21T15:00:35Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local string_utilities_module = "Module:string utilities" local anchor_encode = mw.uri.anchorEncode local concat = table.concat local insert = table.insert local language_anchor -- Defined below. 
local function decode_entities(...) decode_entities = require(string_utilities_module).decode_entities return decode_entities(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return...' 27707 Scribunto text/plain local export = {} local string_utilities_module = "Module:string utilities" local anchor_encode = mw.uri.anchorEncode local concat = table.concat local insert = table.insert local language_anchor -- Defined below. local function decode_entities(...) decode_entities = require(string_utilities_module).decode_entities return decode_entities(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return encode_entities(...) end -- Returns the anchor text to be used as the fragment of a link to a language section. function export.language_anchor(lang, id) return anchor_encode(lang:getFullName() .. ": " .. id) end language_anchor = export.language_anchor -- Normalizes input text (removes formatting etc.), which can then be used as an anchor in an `id=` field. function export.normalize_anchor(str) return decode_entities(anchor_encode(str)) end function export.make_anchors(ids) local anchors = {} for i = 1, #ids do local id = ids[i] local el = mw.html.create("span") :addClass("template-anchor") :attr("id", anchor_encode(id)) :attr("data-id", id) insert(anchors, tostring(el)) end return concat(anchors) end function export.senseid(lang, id, tag_name) -- The following tag is opened but never closed, where is it supposed to be closed? -- with <li> it doesn't matter, as it is closed automatically. -- with <p> it is a problem -- Cannot use mw.html here as it always closes tags return "<" .. tag_name .. " class=\"senseid\" id=\"" .. language_anchor(lang, id) .. "\" data-lang=\"" .. lang:getCode() .. "\" data-id=\"" .. encode_entities(id) .. "\">" end function export.etymid(lang, id) -- Use a <ul> tag to ensure spacing doesn't get messed up. 
local el = mw.html.create("ul") :addClass("etymid") :attr("id", language_anchor(lang, id)) :attr("data-lang", lang:getCode()) :attr("data-id", id) return tostring(el) end function export.etymonid(lang, id, opts) opts = opts or {} -- Use a <ul> tag to ensure spacing doesn't get messed up. local el = mw.html.create("ul") :addClass("etymonid") :attr("data-lang", lang:getCode()) if id then el:attr("id", language_anchor(lang, id)) el:attr("data-id", id) end if opts.no_tree then el:attr("data-no-tree", "1") end if opts.title then el:attr("data-title", opts.title) end if opts.empty_tree then el:attr("data-empty-tree", "1") end if opts.ety_tree_json then el:attr("data-ety-tree-json", opts.ety_tree_json) end return tostring(el) end return export 2o0jqln8y0bxvpfhe66eogs2k88qkap Module:alternative forms 828 8173 27708 2026-06-21T15:01:33Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local labels_module = "Module:labels" local links_module = "Module:links" local parameter_utilities_module = "Module:parameter utilities" local function track(page) require("Module:debug/track")("alter/" .. page) end --[==[ Main function for displaying alternative forms. Extracted out from the template-callable function so this can be called by other modules (in particular, [[Module:descendants tree]]). `allow_self_link` causes terms the sam...' 27708 Scribunto text/plain local export = {} local labels_module = "Module:labels" local links_module = "Module:links" local parameter_utilities_module = "Module:parameter utilities" local function track(page) require("Module:debug/track")("alter/" .. page) end --[==[ Main function for displaying alternative forms. Extracted out from the template-callable function so this can be called by other modules (in particular, [[Module:descendants tree]]). `allow_self_link` causes terms the same as the pagename to be shown normally; otherwise they are displayed unlinked. 
`default_separator` controls the separator between terms when the user didn't use a special separator term like ";" (defaulting to ", "). ]==] function export.display_alternative_forms(parent_args, pagename, allow_self_link, default_separator) local params = { [1] = {required = true, type = "language", default = "en"}, [2] = {list = true, allow_holes = true}, } local m_param_utils = require(parameter_utilities_module) local param_mods = m_param_utils.construct_param_mods { {group = {"link", "ref"}}, -- For compatibility, we need to turn off separate_no_index for q= and qq=. {group = "q", separate_no_index = false}, -- We currently don't support unindexed l= and ll=. {group = "l", require_index = true}, } local items, args = m_param_utils.parse_list_with_inline_modifiers_and_separate_params { params = params, param_mods = param_mods, raw_args = parent_args, termarg = 2, parse_lang_prefix = true, track_module = "alter", lang = 1, sc = "sc.default", stop_when = function(data) local stop = not data.any_param_at_index if stop and parent_args[data.orig_index + 1] == nil then track("actual hole in params") end return stop end, default_separator = default_separator, } if not items[1] then error("No items found!") end local lang = args[1] local raw_labels = {} -- Extract the labels and make sure none are blank or omitted. local last_item_index = items[#items].orig_index if last_item_index < args[2].maxindex then for i = last_item_index + 2, args[2].maxindex do if not args[2][i] then -- Indices in i start at 1 but parameters start at 2, so add 1 to shown index. error("Missing/blank item not allowed in [[Template:alt]] labels, but saw such an item in parameter " .. (i + 1)) end table.insert(raw_labels, args[2][i]) end end -- Make sure there aren't property parameters after the last item (i.e. corresponding to labels). for k, v in pairs(args) do -- Look for named list parameters. 
We check: -- (1) key is a string (excludes the term param, which is a number); -- (2) value is a table, i.e. a list; -- (3) v.maxindex is set (i.e. allow_holes was used); -- (4) v.maxindex is past the index of the last term. if type(k) == "string" and type(v) == "table" and v.maxindex and v.maxindex > last_item_index then local set_values = {} for i = last_item_index + 1, v.maxindex do if v[i] then table.insert(set_values, i) end end error(("Extraneous values for %s= (set at position%s %s)"):format(k, #set_values > 1 and "s" or "", table.concat(set_values, ","))) end end if not allow_self_link then -- If the to-be-linked term is the same as the pagename, display it unlinked. for _, item in ipairs(items) do if item.term and lang:stripDiacritics(item.term) == pagename then track("term is pagename") item.alt = item.alt or item.term item.term = nil end end end local labels if #raw_labels > 0 then labels = require(labels_module).process_raw_labels { labels = raw_labels, lang = lang, nocat = true } end local parts = {} local function ins(part) table.insert(parts, part) end -- Construct the final output. -- First the items, including separators, left and right regular qualifiers and left and right per-item labels. for _, item in ipairs(items) do ins(item.separator) local text = require(links_module).full_link(item, nil, allow_self_link, "show qualifiers") ins(text) end -- If there are labels, construct them now and append to final output. if labels then if lang:hasTranslit() then ins(" &mdash; " .. require(labels_module).format_processed_labels { labels = labels, lang = lang }) else ins(" " .. require(labels_module).format_processed_labels { labels = labels, lang = lang, open = "(", close = ")" }) end end return table.concat(parts) end --[==[ Template-callable function for displaying alternative forms. 
]==] function export.create(frame) local parent_args = frame:getParent().args return export.display_alternative_forms(parent_args, mw.loadData("Module:headword/data").pagename) end return export mhx02egmbnof8g4pu8dgdyw5qclxj2k Module:anagrams 828 8174 27709 2026-06-22T06:36:31Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local m_links = require("Module:links") local export = {} function export.show(frame) local params = { [1] = {required = true, type = "full language", default = "und"}, [2] = {required = true, default = "anagram", list = true}, ["a"] = true, } local args = require("Module:parameters").process(frame:getParent().args, params) for i, val in ipairs(args[2]) do args[2][i] = m_links.full_link({lang = args[1], term = val}) end return table.concat(ar...' 27709 Scribunto text/plain local m_links = require("Module:links") local export = {} function export.show(frame) local params = { [1] = {required = true, type = "full language", default = "und"}, [2] = {required = true, default = "anagram", list = true}, ["a"] = true, } local args = require("Module:parameters").process(frame:getParent().args, params) for i, val in ipairs(args[2]) do args[2][i] = m_links.full_link({lang = args[1], term = val}) end return table.concat(args[2], ", ") end return export cbd4ykft3oy42d5o4i3uazieynl3v5a Module:ar-link 828 8175 27710 2026-06-22T06:37:34Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local U = require("Module:string/char") -- Derived from Arabic data table in [[Module:languages/data/2]]. local entry_name_replacements = { [U(0x0671)] = U(0x0627), [U(0x064B)] = "", [U(0x064C)] = "", [U(0x064D)] = "", [U(0x064E)] = "", [U(0x064F)] = "", [U(0x0650)] = "", [U(0x0651)] = "", [U(0x0652)] = "", [U(0x0670)] = "", [U(0x0640)] = "", } local function make_entry_name(text) return (text:gsub("[%z\1-\127\194-\244][\128-\191]*", entr...' 
27710 Scribunto text/plain local export = {} local U = require("Module:string/char") -- Derived from Arabic data table in [[Module:languages/data/2]]. local entry_name_replacements = { [U(0x0671)] = U(0x0627), [U(0x064B)] = "", [U(0x064C)] = "", [U(0x064D)] = "", [U(0x064E)] = "", [U(0x064F)] = "", [U(0x0650)] = "", [U(0x0651)] = "", [U(0x0652)] = "", [U(0x0670)] = "", [U(0x0640)] = "", } local function make_entry_name(text) return (text:gsub("[%z\1-\127\194-\244][\128-\191]*", entry_name_replacements)) end local function link(entry, text) return '<span class="Arab" lang="ar">[[' .. make_entry_name(entry) .. '#Arabic|' .. (text or entry) .. ']]</span>&lrm;' end function export.link(frame) local text = frame.args[1] if not text then return nil end local transliterate = require("Module:memoize")(require "Module:ar-translit".tr) local open_paren = ' <span class="mention-gloss-paren annotation-paren">(</span><span class="tr Latn" xml:lang="ar-Latn" lang="ar-Latn">' local close_paren = '</span><span class="mention-gloss-paren annotation-paren">)</span>' return (text :gsub( "%[%[([^%]]+)%]%]", function (link_text) local entry, text = link_text:match("^([^|]+)|(.+)$") entry, text = entry or link_text, text or link_text local translit = transliterate(text) if translit then return link(entry, text) .. open_paren .. translit .. close_paren else return link(link_text) end end)) end return export 0mw6bj6ewmpvyx16aqly4rvowarxafg Module:category link/templates 828 8176 27711 2026-06-22T06:38:26Z Umarxon III 2840 Sahypa döretdi, mazmuny: '-- Prevent substitution. 
if mw.isSubsting() then return require("Module:unsubst") end local make_link = require("Module:category link").make_link local process_params = require("Module:parameters").process local unpack = unpack or table.unpack -- Lua 5.2 compatibility local export = {} function export.category_t(frame) return make_link(unpack(process_params(frame:getParent().args, { [1] = {required = true, allow_empty = true, no_trim = true}, [2] = {allo...' 27711 Scribunto text/plain -- Prevent substitution. if mw.isSubsting() then return require("Module:unsubst") end local make_link = require("Module:category link").make_link local process_params = require("Module:parameters").process local unpack = unpack or table.unpack -- Lua 5.2 compatibility local export = {} function export.category_t(frame) return make_link(unpack(process_params(frame:getParent().args, { [1] = {required = true, allow_empty = true, no_trim = true}, [2] = {allow_empty = true, no_trim = true}, }))) end return export 99l3255tcq0j0mvel3osy2rm41dv30q Module:doublet table 828 8177 27712 2026-06-22T06:39:14Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local get_by_name = require("Module:languages").getByCanonicalName local m_links = require("Module:links") local auto_subtable = require("Module:auto-subtable") local langs = require("Module:languages/cache") local function quote(word) return "“" .. word .. "”" end local function trim(word) return string.match(word, "%s*(.-)%s*$") end local link local function make_link(lang, qualifier) return function(word) return link(word, lan...' 27712 Scribunto text/plain local export = {} local get_by_name = require("Module:languages").getByCanonicalName local m_links = require("Module:links") local auto_subtable = require("Module:auto-subtable") local langs = require("Module:languages/cache") local function quote(word) return "“" .. word .. 
"”" end local function trim(word) return string.match(word, "%s*(.-)%s*$") end local link local function make_link(lang, qualifier) return function(word) return link(word, lang, qualifier) end end -- Create a link out of `word` (which may be multipart, with the parts separated by a slash or by "and") in language -- `lang`, with non-gloss text `qualifier` to include. Normally, qualifiers ought to be `.qq` or `.ll`, but the -- previous version faked full links totally manually and included the qualifiers inside of the gloss/translit parens -- like the `.ng` argument does, so we maintain some of the old code in format_qualifier() and include the result as -- non-gloss text. (Declared local above to make a forward reference.) function link(word, lang, qualifier) if word == "" then return "&mdash;" end if word:find("\127", nil, true) then return (word:gsub("^(.-)( ?\127'\"`UNIQ%-%-%w+%-[%dA-F]+%-%-?QINU`\"'\127)", function(text, space_and_strip_marker) return link(text, lang, qualifier) .. space_and_strip_marker end) ) end if word:find(" and ", nil, true) then return (word:gsub("(.+) and (.+)", function (first, second) return link(first, lang, qualifier) .. " and " .. link(second, lang, qualifier) end)) end if word:find("[[", nil, true) then return (word:gsub("%[%[([^%]]+)%]%]", make_link(lang, qualifier))) end local entry, link_text, sense_id if word:find("|", nil, true) then entry, link_text = word:match("^([^|]+)|(.+)$") if not entry then error("Malformed piped link: " .. word) end else entry = word end -- moule$mussel -> moule#French-mussel (assuming lang is French) if entry:match("%$") then entry, sense_id = entry:match("([^$]+)$(.+)$") if not entry then error("Malformed sense id: " .. 
entry) end link_text = entry end if not link_text then link_text = entry or word end return m_links.full_link { lang = lang, term = mw.text.killMarkers(entry), alt = link_text, id = sense_id, ng = qualifier, } end local function gsub_or_nil(str, pattern, repl) local result, count = string.gsub(str, pattern, repl) if count == 0 then return nil end return result end local langs_by_name = {} setmetatable(langs_by_name, { -- Auto-create language objects: langs.English -> language object for English. __index = function(self, key) local lang = get_by_name(mw.text.killMarkers(key), true) self[key] = lang return lang end }) local function link_language_names(text) return (text:gsub("%[%[([^%]]+)%]%]", function (name) return langs_by_name[name]:makeWikipediaLink() end)) end local comma_placeholder = "\1" local semicolon_placeholder = "\2" local placeholder_convert = { [comma_placeholder] = ",", [semicolon_placeholder] = ";", [","] = comma_placeholder, [";"] = semicolon_placeholder, } local function format_qualifier(qualifier_content, link_text, lang) if qualifier_content:find("\127", nil, true) then return (qualifier_content:gsub("[^\127]+ ?", format_qualifier)) end if qualifier_content:find('"', nil, true) then return (qualifier_content :gsub(comma_placeholder, placeholder_convert) :gsub( '"([^"]+)"', function (gloss) return quote(gloss:gsub("[,;]", placeholder_convert)) end) :gsub( "[^,;]+", function (item) if item:find("“", nil, true) then return item else return "''" .. item .. "''" end end) :gsub("[" .. comma_placeholder .. semicolon_placeholder .. "]", placeholder_convert) ) else return (qualifier_content :gsub(comma_placeholder, placeholder_convert) :gsub("[^,;]+", "''%1''") ) end end local function link_and_make_qualifier(cell, lang) if not cell then return "" end if cell:find(",", nil, true) then return (cell -- Replace commas in qualifiers with semicolons, so that the function -- doesn't confuse commas in qualifiers and commas that separate words. 
:gsub("%([^%)]+%)", function (qualifier) return qualifier:gsub(",", placeholder_convert) end) :gsub("([^,]+)(,? ?)", function(text, comma) return link_and_make_qualifier(text, lang) .. comma end) ) elseif cell:find("/", nil, true) then return (cell :gsub("([^/]+)( ?/? ?)", function(text, slash) return link_and_make_qualifier(text, lang) .. slash end) ) elseif cell:find("(", nil, true) then return gsub_or_nil( cell, "(.-) %(([^%)]+)%)", function (link_text, qualifier_content) return link(link_text, lang, link_language_names(format_qualifier(qualifier_content, link_text, lang))) end) or error("Ill-formed qualifier in " .. quote(cell) .. " for " .. lang:getCanonicalName() .. ".") end return link(cell, lang) end local function link_term_list(text, lang) if text:find("[[", nil, true) then return (text:gsub("%[%[([^%]]+)%]%]", make_link(lang))) end return (text:gsub("([^,]+)", make_link(lang))) end local function make_table(rows, column_number_to_lang, arg_count) local output = {} for i, header_cell in ipairs(rows[1]) do output[i] = ("! %s"):format(header_cell) end local row_count_for_headers_at_bottom = 10 local headers_at_bottom = #rows > row_count_for_headers_at_bottom local headers if headers_at_bottom then headers = "|-\n" .. table.concat(output, "\n") end table.insert(output, 1, '{| class="wikitable sortable"') local column_count = #column_number_to_lang local column_number = column_count local row_number = 1 -- Header is row 1. 
table.insert(output, "|-") for _ = column_count + 1, arg_count do if column_number == column_count then column_number = 1 row_number = row_number + 1 table.insert(output, "|-") else column_number = column_number + 1 end local lang = langs[column_number_to_lang[column_number]] local content = rows[row_number][column_number] table.insert(output, ('| data-sort-value="%s" | %s'):format( m_links.remove_links(lang:stripDiacritics(content:match("[^,(]+") or content)), link_and_make_qualifier(content, lang))) end if headers_at_bottom then table.insert(output, headers) end table.insert(output, "|}") return table.concat(output, "\n") end function export.doublet_table(frame) local args = frame:getParent().args if not args.langs then return end local column_number_to_lang = {} local column_count = 0 for lang in args.langs:gmatch("[^, ]+") do column_count = column_count + 1 column_number_to_lang[column_count] = lang end local rows = auto_subtable() local column_number = 0 local row_number = 1 local arg_count for i, arg in ipairs(args) do arg_count = i if column_number == column_count then column_number = 1 row_number = row_number + 1 else column_number = column_number + 1 end rows[row_number][column_number] = trim(arg) end return make_table(rows, column_number_to_lang, arg_count) end local function make_family_doublet_table(rows, column_count) local Array = require("Module:array") local output = Array() for i, header_cell in ipairs(rows[1]) do if i == 1 then -- Assumes the language name is a single capitalized word. -- Works in [[Appendix:Romance doublets]]. header_cell = header_cell:gsub("^(%u%l+) (.+)$", function (language_name, terms) return language_name .. " " .. link_term_list(terms, langs_by_name[language_name]) end) output:insert(("|+ %s"):format(header_cell)) output:insert("!") else output:insert(("! 
%s"):format(header_cell)) end end local row_count_for_headers_at_bottom = 10 local headers_at_bottom = #rows > row_count_for_headers_at_bottom local headers if headers_at_bottom then headers = "|-\n" .. output:concat("\n") end output:insert(1, '{| class="wikitable"') for i = 2, #rows do if rows[i][1] == "See also" then output:insert(('|-\n| colspan="%d" style="text-align: center; font-weight: bold;" | See also') :format(column_count)) else local lang = langs_by_name[rows[i][1]] output:insert("|-\n! " .. rows[i][1]) -- link language name? for j = 2, column_count do output:insert("| " .. link_and_make_qualifier(rows[i][j], lang)) end end end if headers_at_bottom then output:insert(headers) end output:insert("|}") return output:concat("\n") end -- Copies sequential numbered arguments and counts them (while ignoring "See also"). local function process_args(args) local count = 0 local new_args = {} for i, v in ipairs(args) do v = trim(v) if v ~= "See also" then count = count + 1 end new_args[i] = v end return new_args, count end function export.family_doublets(frame) local args = frame:getParent().args local column_count = tonumber(args.cols) or error("Provide the number of columns in the |cols= parameter.") local arg_count args, arg_count = process_args(args) -- Warning! Removes named parameters! if arg_count % column_count ~= 0 then error( string.format( "There are %d cell parameters but %d columns. 
The number of cells should be a multiple of the number of columns.",
                arg_count, column_count))
    end

    local rows = auto_subtable()
    local column_number = 0
    local row_number = 1

    for _, arg in ipairs(args) do
        if column_number == column_count then
            column_number = 1
            row_number = row_number + 1
        else
            column_number = column_number + 1
        end

        rows[row_number][column_number] = arg

        if arg == "See also" then
            column_number = 0
            row_number = row_number + 1
        end
    end

    rows:un_auto_subtable() -- to avoid problems with below function

    return make_family_doublet_table(rows, column_count)
end

return export
6iixx4arsp3m6bupfczbk72okvq83eu
Module:grc-appendix 828 8178 27713 2026-06-22T06:40:22Z Umarxon III 2840 Page created.
27713 Scribunto text/plain
local export = {}

local items = {
    ["contraction"] = "contraction",
    ["contracted"] = "contraction",
    ["first declension"] = "first declension",
    ["second declension"] = "second declension",
    ["third declension"] = "third declension",
    ["enclitics"] = "enclitics",
    ["numerals"] = "numerals",
    ["correlatives"] = "correlatives",
    ["nouns"] = "nouns",
    [""] = "",
    [""] = "",
}

return export
62lyp9ktuuv9trvtaoeo4hfi9xk0oja
Module:grc-link/data 828 8179 27714 2026-06-22T06:41:35Z Umarxon III 2840 Page created.
27714 Scribunto text/plain
local make_auto_subtabler = require("Module:auto-subtable")

local content = mw.title.new("Appendix:Ancient Greek endings"):getContent()

local endings = {}

-- Find entries for endings marked by the syntax "; [[...]]".
-- Store them in a table.
for anchor in content:gmatch("\n; %[%[([^%]]+)%]%]") do
    if anchor:find("[\128-\255]") then
        for suffix in anchor:gmatch("%-[^%s,]+") do
            endings[suffix] = true
        end
    end
end

local shares_ending = make_auto_subtabler()

-- The actual purpose of this data module:
-- Check if each ending ends with the characters of any smaller endings, by
-- snipping off progressively larger pieces of the ending and comparing them to
-- all other endings.
-- If so, store the ending in an array indexed by the shorter ending.
-- For instance, -εσθαι ends with the characters of -αι and -σθαι.
for ending in pairs(endings) do
    -- Stop two characters short of the full length: the hyphen is re-added
    -- below, and the full ending would only match itself.
    for i = 1, mw.ustring.len(ending) - 2 do
        local sub_ending = "-" .. mw.ustring.sub(ending, -i)
        if endings[sub_ending] then
            table.insert(shares_ending[sub_ending], ending)
        end
    end
end

shares_ending:un_auto_subtable()

return { shares_ending = shares_ending }
shatopxunukohzewfrh759dh24p398a
Module:grc-link 828 8180 27715 2026-06-22T06:42:58Z Umarxon III 2840 Page created.
27715 Scribunto text/plain local export = {} local function remove_macron_breve(text) return mw.ustring.toNFD(text):gsub("\204[\132\134]", "") end local function link(text) return '<span class="Polyt" lang="grc">[[' .. remove_macron_breve(text) .. '#Ancient Greek|' .. text .. ']]</span>' end local function anchor_link(text) return '<span class="Polyt" lang="grc">[[#' .. text .. '|' .. text .. ']]</span>' end local function tag(text) return '<span class="Polyt" lang="grc">' .. text .. '</span>' end local function individual_anchor(text) return '<span id="' .. text .. '"></span>' end local function make_anchors(text) if text:find(",") then local anchors = {} for word in text:gmatch("[^, ]+") do table.insert(anchors, individual_anchor(word)) end return table.concat(anchors) else return individual_anchor(text) end end local function count(text, pattern, bytepattern) local _, count = (bytepattern and string.gsub or mw.ustring.gsub)(text, pattern, "") return count end local function get_length(text) return count(text, "[%z\1-\127\194-\244][\128-\191]*", true) end local U = require("Module:string/char") local acute = U(0x301) local grave = U(0x300) local circumflex = U(0x342) local function check(text) if get_length(text) == 1 then return "" end local errors = {} text = mw.ustring.toNFD(text) if count(text, grave) > 0 then table.insert(errors, "Grave found!") end local accent_count = count(text, "[" .. acute .. circumflex .. "]") if accent_count > 1 then table.insert(errors, "Too many accents!") elseif accent_count == 0 and text:sub(-1) ~= "-" then table.insert(errors, "No accent!") end if errors[1] then return ' <span style="color: goldenrod;">' .. table.concat(errors, " ") .. '</span>' else return "" end end -- For [[Appendix:Ancient Greek endings]]; using individual templates is way too slow. 
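--[==[
Illustrative sketch, not part of the module: `get_length()` above counts characters by matching one UTF-8 lead byte together with its continuation bytes, using the byte-oriented `string.gsub` rather than `mw.ustring`. The standalone version below assumes plain Lua with no Scribunto libraries. `utf8_length` is a hypothetical name, and the NUL byte (`%z` in the module's Lua 5.1 pattern) is left out of the byte class so that the pattern does not depend on the Lua version.

```lua
-- Count UTF-8 encoded characters by counting lead bytes.
-- Assumption: `text` is valid UTF-8 and contains no NUL bytes.
local function utf8_length(text)
    -- One ASCII byte or one multi-byte lead byte, then any continuation bytes.
    local _, n = string.gsub(text, "[\1-\127\194-\244][\128-\191]*", "")
    return n
end

-- "-αι" written as bytes: hyphen, alpha (206 177), iota (206 185).
print(utf8_length("-\206\177\206\185")) -- 3
-- "o" followed by U+0301 COMBINING ACUTE ACCENT (204 129): two code points.
print(utf8_length("o\204\129")) -- 2
```

A decomposed letter plus accent counts as two characters here, which is why `check()` only skips strings whose length is exactly 1.
]==]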
function export.link_Greek(frame)
    local text = frame:getParent().args[1]

    if text then
        local data = mw.loadData "Module:grc-link/data"

        local macron = mw.ustring.char(0x304) -- U+0304 COMBINING MACRON
        local breve = mw.ustring.char(0x306) -- U+0306 COMBINING BREVE
        local subscript = mw.ustring.char(0x345) -- U+0345 COMBINING GREEK YPOGEGRAMMENI
        local replacements = {
            [macron] = "a", -- macron
            [breve] = "b", -- breve
            [subscript] = "c", -- iota subscript
        }
        local get_sort_value = require("Module:memoize")(function (suffix)
            suffix = mw.ustring.gsub(mw.ustring.toNFD(suffix),
                "[" .. macron .. breve .. subscript .. "]",
                replacements)
            return suffix
        end)

        local entries = {}
        local i, j, entry, pos
        while true do
            i, j, entry = text:find("(...-)\n;", pos)
            if i == nil then
                table.insert(entries, text:sub(pos or 1))
                break
            end
            table.insert(entries, entry)
            pos = j - 1
        end

        return (table.concat(
            require("Module:fun").map(
                -- Automatically list other suffixes that share the same last
                -- few letters, using [[Module:grc-link/data]].
                function (entry)
                    if entry:find("\n;") then
                        local shares_ending
                        for headword in entry:match("\n; %[%[([^%]]+)%]%]"):gmatch("%-[^%s,]+") do
                            if data.shares_ending[headword] then
                                shares_ending = shares_ending or {}
                                for _, suffix in ipairs(data.shares_ending[headword]) do
                                    table.insert(shares_ending, "[[" .. suffix .. "]]")
                                end
                            end
                        end

                        if shares_ending then
                            table.sort(
                                shares_ending,
                                function (ending1, ending2)
                                    return get_sort_value(ending1) < get_sort_value(ending2)
                                end)
                            return entry .. "\n: See also " .. table.concat(shares_ending, ", ") .. "."
                        end
                    end
                    return entry
                end,
                entries))
            :gsub(
                "(\n?;? ?)%[%[((%-?)[^%]]+)%]%]",
                function (preceding, link_text, hyphen)
                    if link_text:find("[\206\207\225]") then -- leading bytes for Greek and Coptic block and leading byte for Greek Extended block
                        if preceding == "\n; " then
                            return preceding .. make_anchors(link_text) .. tag(link_text)
                        else
                            if hyphen == "-" then
                                return preceding .. anchor_link(link_text)
                            else
                                return preceding .. link(link_text) ..
check(link_text) end end end end) :gsub( "(&[^;]+;)(&[^;]+;)", '<span class="Polyt" lang="grc">[[%2|%1%2]]</span>') :gsub("\n$", "")) end end -- Used in [[User:Erutuon/Classical Greek prose]]. function export.link_and_transliterate(frame) local text = frame.args[1] or frame:getParent().args[1] if not text then return end local open_paren = ' <span class="mention-gloss-paren annotation-paren">(</span><span class="tr Latn" xml:lang="grc-Latn" lang="grc-Latn">' local close_paren = '</span><span class="mention-gloss-paren annotation-paren">)</span>' local column_value = '10em' return '<div style="-moz-columns: ' .. column_value .. '; -webkit-columns: ' .. column_value .. '; columns: ' .. column_value .. ';">' .. text :gsub( "%[%[([^%]]+)%]%]", function (link_text) return link(link_text) .. open_paren .. (require("Module:languages").getByCode("grc"):transliterate(link_text)) .. close_paren end) :gsub("\n$", "") .. '</div>' end function export.strongs_list(frame) local text = frame.args[1] local function strip_diacritics(word) return mw.ustring.toNFD(word):gsub("[\204\205][\128\129\136\147\148\130\133]", "") -- U+0300 \204\128 COMBINING GRAVE ACCENT -- U+0301 \204\129 COMBINING ACUTE ACCENT -- U+0308 \204\136 COMBINING DIAERESIS -- U+0313 \204\147 COMBINING COMMA ABOVE -- U+0314 \204\148 COMBINING REVERSED COMMA ABOVE -- U+0342 \205\130 COMBINING GREEK PERISPOMENI -- U+0345 \205\133 COMBINING GREEK YPOGEGRAMMENI end local function get_first_letter(word) return mw.ustring.upper(strip_diacritics(word:match("^%-?([%z\1-\127\194-\244][\128-\191]*)"))) end local prev_letter return text:gsub( "%f[^\n%z]([^\t\n]+)\t([^\t\n]+)", function(number, word) local header = "" local letter = get_first_letter(word) if letter ~= prev_letter then if number ~= "1" then header = "</ul>\n\n" end header = header .. ('===%s &ndash; %04d===\n<ul class="plainlinks" style="column-width: 12em;">\n'):format(letter, tonumber(number)) prev_letter = letter end return header .. 
'<li> [https://www.blueletterbible.org/lexicon/g' .. number .. "/wlc G" .. number .. "]: " .. link(word) .. (word:find(" ", 1, true) and ("<br>(" .. word:gsub("[^ ]+", link) .. ")") or "") end) .. "</ul>" end return export g84o250ck5y6ht0x2q9wt4ki8ymvwgx Module:ja-link 828 8181 27716 2026-06-22T06:43:50Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local export = {} local m_links = require("Module:links") local m_string_utils = require("Module:string utilities") local ugsub = m_string_utils.gsub local upper = m_string_utils.upper local kana_to_romaji = require("Module:Hrkt-translit").tr -- [[Module:languages]] -- [[Module:parameters]] -- [[Module:script utilities]] -- [[Module:ja-ruby]] -- [[Module:Hrkt-translit]] function export.link(data, options) options = options or {} data.lang = data.lang or req...' 27716 Scribunto text/plain local export = {} local m_links = require("Module:links") local m_string_utils = require("Module:string utilities") local ugsub = m_string_utils.gsub local upper = m_string_utils.upper local kana_to_romaji = require("Module:Hrkt-translit").tr -- [[Module:languages]] -- [[Module:parameters]] -- [[Module:script utilities]] -- [[Module:ja-ruby]] -- [[Module:Hrkt-translit]] function export.link(data, options) options = options or {} data.lang = data.lang or require'Module:languages'.getByCode'ja' local kana_for_rom = data.kana or data.lemma if not data.kana then data.lemma = data.lemma:gsub('[ %%%^%-%.]', '') end local ruby if data.kana and data.lemma ~= data.kana then ruby = require('Module:ja-ruby').ruby_auto{ term = data.lemma, kana = data.kana, options = options.rubyOptions, } else require("Module:debug").track('ja-link/no ruby') ruby = data.lemma end if ruby:match'%[%[.+%]%]' then require("Module:debug").track('ja-link/manual wikilink') data.term = ruby elseif data.linkto == "" or data.linkto == "-" then require("Module:debug").track('ja-link/disabled link') data.alt = ruby else data.term = data.linkto or data.lemma:gsub('[ %%]', '') 
        data.alt = ruby
    end

    if data.tr ~= '-' then
        if not data.tr then
            data.tr = m_links.remove_links(kana_to_romaji(kana_for_rom, data.lang:getCode(), nil, {hist = options.hist}))
            if options.caps then
                require("Module:debug").track("ja-link/caps")
                data.tr = ugsub(data.tr, "%f[^%s%c%p]%l", upper)
            end
        else
            if options.hist then
                require("Module:debug").track("ja-link/parameter hist unused")
            end
        end
        data.tr = "<i>" .. data.tr .. "</i>"
    end

    data.lemma = nil
    data.kana = nil
    data.linkto = nil

    return m_links.full_link(data, options.face, not options.disableSelfLink)
end

function export.show(frame)
    local alias_of_3 = {alias_of = 3}
    local boolean = {type = "boolean"}
    local args = require("Module:parameters").process(frame:getParent().args, {
        [1] = {required = true},
        [2] = true,
        [3] = true,
        ['gloss'] = alias_of_3,
        ['t'] = alias_of_3,
        ['linkto'] = {allow_empty = true},
        ['rom'] = true,
        ['lit'] = true,
        ['pos'] = true,
        ['id'] = true,
        ['hist'] = boolean,
        ['caps'] = boolean,
        ['self'] = {type = "boolean", default = false},
    })
    return export.link({
        lang = frame.args[1] and require'Module:languages'.getByCode(frame.args[1]),
        lemma = args[1],
        kana = args[2],
        gloss = args[3],
        lit = args["lit"],
        pos = args["pos"],
        id = args["id"],
        linkto = args["linkto"],
        tr = args["rom"],
    }, {
        caps = args["caps"],
        hist = args["hist"],
        disableSelfLink = args["self"],
    })
end

return export
146glug46mxs4kvdx4wlctbgfrhmz4v
Module:ja-link/fast 828 8182 27717 2026-06-22T06:44:41Z Umarxon III 2840 Page created.
27717 Scribunto text/plain
local export = {}

-- Used in [[Wiktionary:Frequency lists/Japanese]].
-- Converts bare links to {{l/ja}}-type links.
function export.link(frame)
    local text = frame.args[1]

    if not text then
        return nil
    end

    local function link(text)
        return '<span class="Jpan" lang="ja">[[' .. text .. '#Japanese|' .. text .. ']]</span>'
    end

    return (text
        :gsub(
            "%[%[([^%]]+)%]%]",
            link))
end

return export
4wcbighghsc4ym5bapy3g6m2phenjns
Module:links 828 8183 27718 2026-06-22T06:45:40Z Umarxon III 2840 Page created.
27718 Scribunto text/plain
local export = {}

--[=[
    [[Unsupported titles]], pages with high memory usage,
    extraction modules and part-of-speech names are listed
    at [[Module:links/data]].
Other modules used: [[Module:script utilities]] [[Module:scripts]] [[Module:languages]] and its submodules [[Module:gender and number]] [[Module:debug/track]] ]=] local anchors_module = "Module:anchors" local debug_track_module = "Module:debug/track" local form_of_module = "Module:form of" local gender_and_number_module = "Module:gender and number" local languages_module = "Module:languages" local load_module = "Module:load" local memoize_module = "Module:memoize" local pages_module = "Module:pages" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local script_utilities_module = "Module:script utilities" local string_encode_entities_module = "Module:string/encode entities" local string_utilities_module = "Module:string utilities" local table_module = "Module:table" local utilities_module = "Module:utilities" local concat = table.concat local find = string.find local get_current_title = mw.title.getCurrentTitle local insert = table.insert local ipairs = ipairs local match = string.match local new_title = mw.title.new local pairs = pairs local remove = table.remove local sub = string.sub local toNFC = mw.ustring.toNFC local tostring = tostring local type = type local unstrip = mw.text.unstrip local NAMESPACE = get_current_title().nsText local function anchor_encode(...) anchor_encode = require(memoize_module)(mw.uri.anchorEncode, true) return anchor_encode(...) end local function debug_track(...) debug_track = require(debug_track_module) return debug_track(...) end local function decode_entities(...) decode_entities = require(string_utilities_module).decode_entities return decode_entities(...) end local function decode_uri(...) decode_uri = require(string_utilities_module).decode_uri return decode_uri(...) end -- Can't yet replace, as the [[Module:string utilities]] version no longer has automatic double-encoding prevention, which requires changes here to account for. local function encode_entities(...) 
encode_entities = require(string_encode_entities_module) return encode_entities(...) end local function extend(...) extend = require(table_module).extend return extend(...) end local function find_best_script_without_lang(...) find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang return find_best_script_without_lang(...) end local function format_categories(...) format_categories = require(utilities_module).format_categories return format_categories(...) end local function format_genders(...) format_genders = require(gender_and_number_module).format_genders return format_genders(...) end local function format_qualifiers(...) format_qualifiers = require(pron_qualifier_module).format_qualifiers return format_qualifiers(...) end local function get_current_L2(...) get_current_L2 = require(pages_module).get_current_L2 return get_current_L2(...) end local function get_lang(...) get_lang = require(languages_module).getByCode return get_lang(...) end local function get_script(...) get_script = require(scripts_module).getByCode return get_script(...) end local function language_anchor(...) language_anchor = require(anchors_module).language_anchor return language_anchor(...) end local function load_data(...) load_data = require(load_module).load_data return load_data(...) end local function request_script(...) request_script = require(script_utilities_module).request_script return request_script(...) end local function shallow_copy(...) shallow_copy = require(table_module).shallowCopy return shallow_copy(...) end local function split(...) split = require(string_utilities_module).split return split(...) end local function tag_text(...) tag_text = require(script_utilities_module).tag_text return tag_text(...) end local function tag_translit(...) tag_translit = require(script_utilities_module).tag_translit return tag_translit(...) end local function trim(...) trim = require(string_utilities_module).trim return trim(...) end local function u(...) 
u = require(string_utilities_module).char return u(...) end local function ulower(...) ulower = require(string_utilities_module).lower return ulower(...) end local function umatch(...) umatch = require(string_utilities_module).match return umatch(...) end local m_headword_data local function get_headword_data() m_headword_data = load_data("Module:headword/data") return m_headword_data end local function track(page, code) local tracking_page = "links/" .. page debug_track(tracking_page) if code then debug_track(tracking_page .. "/" .. code) end end local function selective_trim(...) -- Unconditionally trimmed charset. local always_trim = "\194\128-\194\159" .. -- U+0080-009F (C1 control characters) "\194\173" .. -- U+00AD (soft hyphen) "\226\128\170-\226\128\174" .. -- U+202A-202E (directionality formatting characters) "\226\129\166-\226\129\169" -- U+2066-2069 (directionality formatting characters) -- Standard trimmed charset. local standard_trim = "%s" .. -- (default whitespace charset) "\226\128\139-\226\128\141" .. -- U+200B-200D (zero-width spaces) always_trim -- If there are non-whitespace characters, trim all characters in `standard_trim`. -- Otherwise, only trim the characters in `always_trim`. selective_trim = function(text) if text == "" then return text end local trimmed = trim(text, standard_trim) if trimmed ~= "" then return trimmed end return trim(text, always_trim) end return selective_trim(...) end local function escape(text, str) local rep repeat text, rep = text:gsub("\\\\(\\*" .. str .. ")", "\5%1") until rep == 0 return (text:gsub("\\" .. str, "\6")) end local function unescape(text, str) return (text :gsub("\5", "\\") :gsub("\6", str)) end -- Remove bold, italics, soft hyphens, strip markers and HTML tags. 
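--[==[
Illustrative sketch, not part of the module: the `escape()` and `unescape()` helpers above swap backslash-escaped separators for placeholder bytes so that a plain split cannot cut through them. The standalone version below assumes plain Lua with no Scribunto libraries. `split_plain` is a hypothetical stand-in for the `split` function of [[Module:string utilities]], and this simplified `split_on_slashes` does not convert empty parts to `false` as the exported function does.

```lua
-- Copies of the module's escape()/unescape() helpers.
local function escape(text, str)
    local rep
    repeat
        -- An escaped backslash in front of `str` becomes the placeholder \5.
        text, rep = text:gsub("\\\\(\\*" .. str .. ")", "\5%1")
    until rep == 0
    -- A backslash-escaped `str` becomes the placeholder \6.
    return (text:gsub("\\" .. str, "\6"))
end

local function unescape(text, str)
    return (text:gsub("\5", "\\"):gsub("\6", str))
end

-- Hypothetical plain-text splitter (no pattern matching on `sep`).
local function split_plain(text, sep)
    local parts, start = {}, 1
    while true do
        local i, j = text:find(sep, start, true)
        if not i then
            parts[#parts + 1] = text:sub(start)
            return parts
        end
        parts[#parts + 1] = text:sub(start, i - 1)
        start = j + 1
    end
end

-- Split on "//" while leaving "\//" intact as a literal "//".
local function split_on_slashes(text)
    local parts = split_plain(escape(text, "//"), "//")
    for i, v in ipairs(parts) do
        parts[i] = unescape(v, "//")
    end
    return parts
end

local parts = split_on_slashes("a\\//b//c")
print(#parts, parts[1], parts[2]) -- 2, "a//b", "c" (tab-separated)
```

The same helper pair is reused with "#" and "^" as the protected string elsewhere in the module.
]==]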
local function remove_formatting(str) str = str :gsub("('*)'''(.-'*)'''", "%1%2") :gsub("('*)''(.-'*)''", "%1%2") :gsub("­", "") return (unstrip(str) :gsub("<[^<>]+>", "")) end --[==[Takes an input and splits on a double slash (taking account of escaping backslashes).]==] function export.split_on_slashes(text) if text:find("\\", nil, true) then track("escaped", "split_on_slashes") end text = split(escape(text, "//"), "//", true) or {} for i, v in ipairs(text) do text[i] = unescape(v, "//") if v == "" then text[i] = false end end return text end --[==[Takes a wikilink and outputs the link target and display text. By default, the link target will be returned as a title object, but if `allow_bad_target` is set it will be returned as a string, and no check will be performed as to whether it is a valid link target.]==] function export.get_wikilink_parts(text, allow_bad_target) -- TODO: replace `allow_bad_target` with `allow_unsupported`, with support for links to unsupported titles, including escape sequences. if ( -- Filters out anything but "[[...]]" with no intermediate "[[" or "]]". not match(text, "^()%[%[") or -- Faster than sub(text, 1, 2) ~= "[[". find(text, "[[", 3, true) or find(text, "]]", 3, true) ~= #text - 1 ) then return nil, nil end local pipe, title, display = find(text, "|", 3, true) if pipe then title, display = sub(text, 3, pipe - 1), sub(text, pipe + 1, -3) else title = sub(text, 3, -3) display = title end if allow_bad_target then return title, display end title = new_title(title) -- No title object means the target is invalid. if title == nil then return nil, nil -- If the link target starts with "#" then mw.title.new returns a broken -- title object, so grab the current title and give it the correct fragment. 
    elseif title.prefixedText == "" then
        local fragment = title.fragment
        if fragment == "" then -- [[#]] isn't valid
            return nil, nil
        end
        title = get_current_title()
        title.fragment = fragment
    end
    return title, display
end

-- Does the work of export.get_fragment, but can be called directly to avoid unnecessary checks for embedded links.
local function get_fragment(text)
    text = escape(text, "#")
    -- Replace numeric character references with the corresponding character (&#39; → '),
    -- as they contain #, which causes the numeric character reference to be
    -- misparsed (wa'a → wa&#39;a → pagename wa&, fragment 39;a).
    text = decode_entities(text)
    local target, fragment = text:match("^(.-)#(.+)$")
    target = target or text
    target = unescape(target, "#")
    fragment = fragment and unescape(fragment, "#")
    return target, fragment
end

--[==[Takes a link target and outputs the actual target and the fragment (if any).]==]
function export.get_fragment(text)
    if text:find("\\", nil, true) then
        track("escaped", "get_fragment")
    end
    -- If there are no embedded links, process input.
    local open = find(text, "[[", nil, true)
    if not open then
        return get_fragment(text)
    end
    local close = find(text, "]]", open + 2, true)
    if not close then
        return get_fragment(text)
    -- If there is one, but it's redundant (i.e. encloses everything with no pipe), remove and process.
    elseif open == 1 and close == #text - 1 and not find(text, "|", 3, true) then
        return get_fragment(sub(text, 3, -3))
    end
    -- Otherwise, return the input.
    return text
end

--[==[
Given a link target as passed to `full_link()`, get the actual page that the target refers to. This removes
bold, italics, strip markers and HTML; calls `makeEntryName()` for the language in question; converts targets
beginning with `*` to the Reconstruction namespace; and converts appendix-constructed languages to the Appendix
namespace. Returns up to three values:
# the actual page to link to, or {nil} to not link to anything;
# how the target should be displayed if the user didn't explicitly specify any display text; generally the
  same as the original target, but minus any anti-asterisk !!;
# the value `true` if the target began with a backslash-escaped `*`, i.e. a literal asterisk rather than a
  reconstruction marker.
]==]
function export.get_link_page_with_auto_display(target, lang, sc, plain)
    local orig_target = target
    if not target then
        return nil
    elseif target:find("\\", nil, true) then
        track("escaped", "get_link_page")
    end

    target = remove_formatting(target)

    if target:sub(1, 1) == ":" then
        track("initial colon")
        -- FIXME, the auto_display (second return value) should probably remove the colon
        return target:sub(2), orig_target
    end

    local prefix = target:match("^(.-):")
    -- Convert any escaped colons
    target = target:gsub("\\:", ":")
    if prefix then
        -- If this is a link to another namespace or an interwiki link, ensure there's an initial colon and then
        -- return what we have (so that it works as a conventional link, and doesn't do anything weird like add the term
        -- to a category.)
        prefix = ulower(trim(prefix))
        if prefix ~= "" and (
            load_data("Module:data/namespaces")[prefix] or
            load_data("Module:data/interwikis")[prefix]
        ) then
            return target, orig_target
        end
    end

    -- Check if the term is reconstructed and remove any asterisk. Also check for anti-asterisk (!!).
    -- Otherwise, handle the escapes.
    local reconstructed, escaped, anti_asterisk
    if not plain then
        target, reconstructed = target:gsub("^%*(.)", "%1")
        if reconstructed == 0 then
            target, anti_asterisk = target:gsub("^!!(.)", "%1")
            if anti_asterisk == 1 then
                -- Remove !! from original. FIXME! We do it this way because the call to remove_formatting() above
                -- may cause non-initial !! to be interpreted as anti-asterisks. We should surely move the
                -- remove_formatting() call later.
				orig_target = orig_target:gsub("^!!", "")
			end
		end
	end
	target, escaped = target:gsub("^(\\-)\\%*", "%1*")

	if not (sc and sc:getCode() ~= "None") then
		sc = lang:findBestScript(target)
	end

	-- Remove carets if they are used to capitalize parts of transliterations (unless they have been escaped).
	if (not sc:hasCapitalization()) and sc:isTransliterated() and target:match("%^") then
		target = escape(target, "^")
			:gsub("%^", "")
		target = unescape(target, "^")
	end

	-- Get the entry name for the language.
	target = lang:makeEntryName(target, sc, reconstructed == 1 or lang:hasType("appendix-constructed"))

	-- If the link contains unexpanded template parameters, then don't create a link.
	if target:match("{{{.-}}}") then
		-- FIXME: Should we return the original target as the default display value (second return value)?
		return nil
	end

	-- Link to appendix for reconstructed terms and terms in appendix-only languages. Plain links interpret *
	-- literally, however.
	if reconstructed == 1 then
		if lang:getFullCode() == "und" then
			-- Return the original target as default display value. If we don't do this, we wrongly get
			-- [Term?] displayed instead.
			return nil, orig_target
		end
		target = "Reconstruction:" .. lang:getFullName() .. "/" .. target
	-- Reconstructed languages and substrates require an initial *.
	elseif anti_asterisk ~= 1 and (lang:hasType("reconstructed") or lang:getFamilyCode() == "qfa-sub") then
		error(("The specified language %s is unattested, while the term '%s' does not begin with '*' to indicate that it is reconstructed.")
			:format(lang:getCanonicalName(), orig_target))
	elseif lang:hasType("appendix-constructed") then
		target = "Appendix:" .. lang:getFullName() .. "/" ..
			target
	end

	return target, orig_target, escaped > 0
end

function export.get_link_page(target, lang, sc, plain)
	local target, auto_display, escaped = export.get_link_page_with_auto_display(target, lang, sc, plain)
	return target, escaped
end

-- Make a link from a given link's parts
local function make_link(link, lang, sc, id, isolated, cats, no_alt_ast, plain)
	-- Convert percent encoding to plaintext.
	link.target = link.target and decode_uri(link.target, "PATH")
	link.fragment = link.fragment and decode_uri(link.fragment, "PATH")

	-- Find fragments (if one isn't already set).
	-- Prevents {{l|en|word#Etymology 2|word}} from linking to [[word#Etymology 2#English]].
	-- # can be escaped as \#.
	if link.target and link.fragment == nil then
		link.target, link.fragment = get_fragment(link.target)
	end

	-- Process the target
	local auto_display, escaped
	link.target, auto_display, escaped = export.get_link_page_with_auto_display(link.target, lang, sc, plain)

	-- Create a default display form.
	-- If the target is "" then it's a link like [[#English]], which refers to the current page.
	if auto_display == "" then
		auto_display = (m_headword_data or get_headword_data()).pagename
	end

	-- If the display is the target and the reconstruction * has been escaped, remove the escaping backslash.
	if escaped then
		auto_display = auto_display:gsub("\\([^\\]*%*)", "%1", 1)
	end

	-- Process the display form.
	if link.display then
		local orig_display = link.display
		link.display = lang:makeDisplayText(link.display, sc, true)
		if cats then
			auto_display = lang:makeDisplayText(auto_display, sc)
			-- If the alt text is the same as what would have been automatically generated, then the alt parameter is redundant (e.g. {{l|en|foo|foo}}, {{l|en|w:foo|foo}}, but not {{l|en|w:foo|w:foo}}).
			-- If they're different, but the alt text could have been entered as the term parameter without it affecting the target page, then the target parameter is redundant (e.g. {{l|ru|фу|фу́}}).
			-- If `no_alt_ast` is true, use pcall to catch the error which will be thrown if this is a reconstructed lang and the alt text doesn't have *.
			if link.display == auto_display then
				insert(cats, lang:getFullName() .. " links with redundant alt parameters")
			else
				local ok, check
				if no_alt_ast then
					ok, check = pcall(export.get_link_page, orig_display, lang, sc, plain)
				else
					ok = true
					check = export.get_link_page(orig_display, lang, sc, plain)
				end
				if ok and link.target == check then
					insert(cats, lang:getFullName() .. " links with redundant target parameters")
				end
			end
		end
	else
		link.display = lang:makeDisplayText(auto_display, sc)
	end

	if not link.target then
		return link.display
	end

	-- If the target is the same as the current page, there is no sense id
	-- and either the language code is "und" or the current L2 is the current
	-- language then return a "self-link" like the software does.
	if link.target == get_current_title().prefixedText then
		local fragment, current_L2 = link.fragment, get_current_L2()
		if (
			fragment and fragment == current_L2 or
			not (id or fragment) and (lang:getFullCode() == "und" or lang:getFullName() == current_L2)
		) then
			return tostring(mw.html.create("strong")
				:addClass("selflink")
				:wikitext(link.display))
		end
	end

	-- Add fragment. Do not add a section link to "Undetermined", as such sections do not exist and are invalid.
	-- TabbedLanguages handles links without a section by linking to the "last visited" section, but adding
	-- "Undetermined" would break that feature. For localized prefixes that cause syntax errors, please use the
	-- format: ["xyz"] = true.
	local prefix = link.target:match("^:*([^:]+):")
	prefix = prefix and ulower(prefix)

	if prefix ~= "category" and not (prefix and load_data("Module:data/interwikis")[prefix]) then
		if (link.fragment or link.target:sub(-1) == "#") and not plain then
			track("fragment", lang:getFullCode())
			if cats then
				insert(cats, lang:getFullName() .. " links with manual fragments")
			end
		end

		if not link.fragment then
			if id then
				link.fragment = lang:getFullCode() == "und" and anchor_encode(id) or language_anchor(lang, id)
			elseif lang:getFullCode() ~= "und" and not (link.target:match("^Appendix:") or link.target:match("^Reconstruction:")) then
				link.fragment = anchor_encode(lang:getFullName())
			end
		end
	end

	-- Put inward-facing square brackets around a link to isolated spacing character(s).
	if isolated and #link.display > 0 and not umatch(decode_entities(link.display), "%S") then
		link.display = "&#x5D;" .. link.display .. "&#x5B;"
	end

	link.target = link.target:gsub("^(:?)(.*)", function(m1, m2)
		return m1 .. encode_entities(m2, "#%&+/:<=>@[\\]_{|}")
	end)
	link.fragment = link.fragment and encode_entities(remove_formatting(link.fragment), "#%&+/:<=>@[\\]_{|}")
	return "[[" .. link.target:gsub("^[^:]", ":%0") .. (link.fragment and "#" .. link.fragment or "") .. "|" .. link.display .. "]]"
end

-- Split a link into its parts
local function parse_link(linktext)
	local link = { target = linktext }

	local target = link.target
	link.target, link.display = target:match("^(..-)|(.+)$")
	if not link.target then
		link.target = target
		link.display = target
	end

	-- There's no point in processing these, as they aren't real links.
	local target_lower = link.target:lower()
	for _, false_positive in ipairs({ "category", "cat", "file", "image" }) do
		if target_lower:match("^" .. false_positive .. ":") then
			return nil
		end
	end

	link.display = decode_entities(link.display)
	link.target, link.fragment = get_fragment(link.target)

	-- So that make_link does not look for a fragment again.
	if not link.fragment then
		link.fragment = false
	end

	return link
end

local function check_params_ignored_when_embedded(alt, lang, id, cats)
	if alt then
		track("alt-ignored")
		if cats then
			insert(cats, lang:getFullName() .. " links with ignored alt parameters")
		end
	end
	if id then
		track("id-ignored")
		if cats then
			insert(cats, lang:getFullName() .. " links with ignored id parameters")
		end
	end
end

-- Find embedded links and ensure they link to the correct section.
local function process_embedded_links(text, alt, lang, sc, id, cats, no_alt_ast, plain)
	-- Process the non-linked text.
	text = lang:makeDisplayText(text, sc, true)

	-- If the text begins with * and another character, then act as if each link begins with *. However, don't do this if the * is contained within a link at the start. E.g. `|*[[foo]]` would set all_reconstructed to true, while `|[[*foo]]` would not.
	local all_reconstructed = false
	if not plain then
		-- anchor_encode removes links etc.
		if anchor_encode(text):sub(1, 1) == "*" then
			all_reconstructed = true
		end
		-- Otherwise, handle any escapes.
		text = text:gsub("^(\\-)\\%*", "%1*")
	end

	check_params_ignored_when_embedded(alt, lang, id, cats)

	local function process_link(space1, linktext, space2)
		local capture = "[[" .. linktext .. "]]"

		local link = parse_link(linktext)

		-- Return unprocessed false positives untouched (e.g. categories).
		if not link then
			return capture
		end

		if all_reconstructed then
			if link.target:find("^!!") then
				-- Check for anti-asterisk !! at the beginning of a target, indicating that a reconstructed term
				-- wants a part of the term to link to a non-reconstructed term, e.g. Old English
				-- {{ang-noun|m|head=*[[!!Crist|Cristes]] [[!!mæsseǣfen]]}}.
				link.target = link.target:sub(3)
				-- Also remove !! from the display, which may have been copied from the target (as in mæsseǣfen in
				-- the example above).
				link.display = link.display:gsub("^!!", "")
			elseif not link.target:match("^%*") then
				link.target = "*" .. link.target
			end
		end

		linktext = make_link(link, lang, sc, id, false, nil, no_alt_ast, plain)
			:gsub("^%[%[", "\3")
			:gsub("%]%]$", "\4")

		return space1 .. linktext .. space2
	end

	-- Use chars 1 and 2 as temporary substitutions, so that we can use charsets.
	-- These are converted to chars 3 and 4 by process_link, which means we can convert any remaining chars 1 and 2 back to square brackets (i.e. those not part of a link).
	text = text
		:gsub("%[%[", "\1")
		:gsub("%]%]", "\2")
	-- If the script uses ^ to capitalize transliterations, make sure that any carets preceding links are on the inside, so that they get processed with the following text.
	if (
		text:find("^", nil, true) and
		not sc:hasCapitalization() and
		sc:isTransliterated()
	) then
		text = escape(text, "^")
			:gsub("%^\1", "\1%^")
		text = unescape(text, "^")
	end
	text = text:gsub("\1(%s*)([^\1\2]-)(%s*)\2", process_link)

	-- Remove the extra * at the beginning of a language link if it's immediately followed by a link whose display begins with * too.
	if all_reconstructed then
		text = text:gsub("^%*\3([^|\1-\4]+)|%*", "\3%1|*")
	end

	return (text
		:gsub("[\1\3]", "[[")
		:gsub("[\2\4]", "]]")
	)
end

local function simple_link(term, fragment, alt, lang, sc, id, cats, no_alt_ast, srwc)
	local plain
	if lang == nil then
		lang, plain = get_lang("und"), true
	end

	-- Get the link target and display text. If the term is the empty string, treat the input as a link to the current page.
	if term == "" then
		term = get_current_title().prefixedText
	elseif term then
		local new_term, new_alt = export.get_wikilink_parts(term, true)
		if new_term then
			check_params_ignored_when_embedded(alt, lang, id, cats)
			-- [[|foo]] links are treated as plaintext "[[|foo]]".
			-- FIXME: Pipes should be handled via a proper escape sequence, as they can occur in unsupported titles.
			if new_term == "" then
				term, alt = nil, term
			else
				local title = new_title(new_term)
				if title then
					local ns = title.namespace
					-- File: and Category: links should be returned as-is.
					if ns == 6 or ns == 14 then
						return term
					end
				end
				term, alt = new_term, new_alt
				if cats then
					if not (srwc and srwc(term, alt)) then
						insert(cats, lang:getFullName() .. " links with redundant wikilinks")
					end
				end
			end
		end
	end

	if alt then
		alt = selective_trim(alt)
		if alt == "" then
			alt = nil
		end
	end

	-- If there's nothing to process, return nil.
	if not (term or alt) then
		return nil
	end

	-- If there is no script, get one.
	if not sc then
		sc = lang:findBestScript(alt or term)
	end

	-- Embedded wikilinks need to be processed individually.
	if term then
		local open = find(term, "[[", nil, true)
		if open and find(term, "]]", open + 2, true) then
			return process_embedded_links(term, alt, lang, sc, id, cats, no_alt_ast, plain)
		end
		term = selective_trim(term)
	end

	-- If not, make a link using the parameters.
	return make_link({
		target = term,
		display = alt,
		fragment = fragment
	}, lang, sc, id, true, cats, no_alt_ast, plain)
end

--[==[Creates a basic link to the given term. It links to the language section (such as <code>==English==</code>), but it does not add language and script wrappers, so any code that uses this function should call the <code class="n">[[Module:script utilities#tag_text|tag_text]]</code> from [[Module:script utilities]] to add such wrappers itself at some point.

The first argument, <code class="n">data</code>, may contain the following items, a subset of the items used in the <code class="n">data</code> argument of <code class="n">full_link</code>. If any other items are included, they are ignored.
{ {
	term = entry_to_link_to,
	alt = link_text_or_displayed_text,
	lang = language_object,
	id = sense_id,
} }
; <code class="n">term</code>
: Text to turn into a link. This is generally the name of a page. The text can contain wikilinks already embedded in it. These are processed individually just like a single link would be. The <code class="n">alt</code> argument is ignored in this case.
; <code class="n">alt</code> (''optional'')
: The alternative display for the link, if different from the linked page. If this is {{code|lua|nil}}, the <code class="n">term</code> argument is used instead (much like regular wikilinks).
If <code class="n">term</code> contains wikilinks in it, this argument is ignored and has no effect. (Links in which the alt is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/alt-ignored}}.)
; <code class="n">lang</code>
: The [[Module:languages#Language objects|language object]] for the term being linked. If this argument is defined, the function will determine the language's canonical name (see [[Template:language data documentation]]), and point the link or links in the <code class="n">term</code> to the language's section of an entry, or to a language-specific senseid if the <code class="n">id</code> argument is defined.
; <code class="n">id</code> (''optional'')
: Sense id string. If this argument is defined, the link will point to a language-specific sense id ({{ll|en|identifier|id=HTML}}) created by the template {{temp|senseid}}. A sense id consists of the language's canonical name, a hyphen (<code>-</code>), and the string that was supplied as the <code class="n">id</code> argument. This is useful when a term has more than one sense in a language. If the <code class="n">term</code> argument contains wikilinks, this argument is ignored. (Links in which the sense id is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/id-ignored}}.)

The second argument is as follows:
; <code class="n">allow_self_link</code>
: If {{code|lua|true}}, the function will also generate links to the current page. The default ({{code|lua|false}}) will not generate a link but generate a bolded "self link" instead.

The following special options are processed for each link (both simple text and with embedded wikilinks):
* The target page name will be processed to generate the correct entry name.
This is done by the [[Module:languages#makeEntryName|makeEntryName]] function in [[Module:languages]], using the <code class="n">entry_name</code> replacements in the language's data file (see [[Template:language data documentation]] for more information). This function is generally used to automatically strip dictionary-only diacritics that are not part of the normal written form of a language.
* If the text starts with <code class="n">*</code>, then the term is considered a reconstructed term, and a link to the Reconstruction: namespace will be created. If the text contains embedded wikilinks, then <code class="n">*</code> is automatically applied to each one individually, while preserving the displayed form of each link as it was given. This allows linking to phrases containing multiple reconstructed terms, while only showing the * once at the beginning.
* If the text starts with <code class="n">:</code>, then the link is treated as "raw" and the above steps are skipped. This can be used in rare cases where the page name begins with <code class="n">*</code> or if diacritics should not be stripped. For example:
** {{temp|l|en|*nix}} links to the nonexistent page [[Reconstruction:English/nix]] (<code class="n">*</code> is interpreted as a reconstruction), but {{temp|l|en|:*nix}} links to [[*nix]].
** {{temp|l|sl|Franche-Comté}} links to the nonexistent page [[Franche-Comte]] (<code>é</code> is converted to <code>e</code> by <code class="n">makeEntryName</code>), but {{temp|l|sl|:Franche-Comté}} links to [[Franche-Comté]].]==]
function export.language_link(data)
	if type(data) ~= "table" then
		error("The first argument to the function language_link must be a table. See Module:links/documentation for more information.")
	elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
		track("escaped", "language_link")
	end

	-- Categorize links to "und".
	local lang, cats = data.lang, data.cats
	if cats and lang:getCode() == "und" then
		insert(cats, "Undetermined language links")
	end

	return simple_link(
		data.term,
		data.fragment,
		data.alt,
		lang,
		data.sc,
		data.id,
		cats,
		data.no_alt_ast,
		data.suppress_redundant_wikilink_cat
	)
end

function export.plain_link(data)
	if type(data) ~= "table" then
		error("The first argument to the function plain_link must be a table. See Module:links/documentation for more information.")
	elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
		track("escaped", "plain_link")
	end

	return simple_link(
		data.term,
		data.fragment,
		data.alt,
		nil,
		data.sc,
		data.id,
		data.cats,
		data.no_alt_ast,
		data.suppress_redundant_wikilink_cat
	)
end

--[==[Replace any links with links to the correct section, but don't link the whole text if no embedded links are found. Returns the display text form.]==]
function export.embedded_language_links(data)
	if type(data) ~= "table" then
		error("The first argument to the function embedded_language_links must be a table. See Module:links/documentation for more information.")
	elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then
		track("escaped", "embedded_language_links")
	end

	local term, lang, sc = data.term, data.lang, data.sc

	-- If we don't have a script, get one.
	if not sc then
		sc = lang:findBestScript(term)
	end

	-- Do we have embedded wikilinks? If so, they need to be processed individually.
	local open = find(term, "[[", nil, true)
	if open and find(term, "]]", open + 2, true) then
		return process_embedded_links(term, data.alt, lang, sc, data.id, data.cats, data.no_alt_ast)
	end

	-- If not, return the display text.
	term = selective_trim(term)
	-- FIXME: Double-escape any percent-signs, because we don't want to treat non-linked text as having percent-encoded characters.
	-- This is a hack: percent-decoding should come out of [[Module:languages]] and only be dealt with in this module, as it's specific to links.
	term = term:gsub("%%", "%%25")
	return lang:makeDisplayText(term, sc, true)
end

function export.mark(text, item_type, face, lang)
	local tag = { "", "" }

	if item_type == "gloss" then
		tag = { '<span class="mention-gloss-double-quote">“</span><span class="mention-gloss">',
			'</span><span class="mention-gloss-double-quote">”</span>' }
		if type(text) == "string" and text:match("^''[^'].*''$") then
			-- Temporary tracking for mention glosses that are entirely italicized or bolded, which is probably
			-- wrong. (Note that this will also find bolded mention glosses since they use triple apostrophes.)
			track("italicized-mention-gloss", lang and lang:getFullCode() or nil)
		end
	elseif item_type == "tr" then
		if face == "term" then
			tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr mention-tr Latn">',
				'</span>' }
		else
			tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr Latn">', '</span>' }
		end
	elseif item_type == "ts" then
		-- \226\129\160 = word joiner (zero-width non-breaking space) U+2060
		tag = { '<span class="ts mention-ts Latn">/\226\129\160', '\226\129\160/</span>' }
	elseif item_type == "pos" then
		tag = { '<span class="ann-pos">', '</span>' }
	elseif item_type == "non-gloss" then
		tag = { '<span class="ann-non-gloss">', '</span>' }
	elseif item_type == "annotations" then
		tag = { '<span class="mention-gloss-paren annotation-paren">(</span>',
			'<span class="mention-gloss-paren annotation-paren">)</span>' }
	elseif item_type == "infl" then
		tag = { '<span class="ann-infl">', '</span>' }
	end

	if type(text) == "string" then
		return tag[1] .. text .. tag[2]
	else
		return ""
	end
end

local pos_tags

--[==[Formats the annotations that are displayed with a link created by {{code|lua|full_link}}.
Annotations are the extra bits of information that are displayed following the linked term, and include things such as gender, transliteration, gloss and so on.
* The first argument is a table possessing some or all of the following keys:
*:; <code class="n">genders</code>
*:: Table containing a list of gender specifications in the style of [[Module:gender and number]].
*:; <code class="n">tr</code>
*:: Transliteration.
*:; <code class="n">gloss</code>
*:: Gloss that translates the term in the link, or gives some other descriptive information.
*:; <code class="n">pos</code>
*:: Part of speech of the linked term. If the given argument matches one of the aliases in `pos_aliases` in [[Module:headword/data]], or consists of a part of speech or alias followed by `f` (for a non-lemma form), expand it appropriately. Otherwise, just show the given text as it is.
*:; <code class="n">ng</code>
*:: Arbitrary non-gloss descriptive text for the link. This should be used in preference to putting descriptive text in `gloss` or `pos`.
*:; <code class="n">lit</code>
*:: Literal meaning of the term, if the usual meaning is figurative or idiomatic.
*:; <code class="n">infl</code>
*:: Table containing a list of grammar tags in the style of [[Module:form of]] `tagged_inflections`.
*:Any of the above values can be omitted from the <code class="n">data</code> argument. If a completely empty table is given (with no annotations at all), then an empty string is returned.
* The second argument is a string. Valid values are listed in the "data.translit" table of [[Module:script utilities/data]].]==]
function export.format_link_annotations(data, face)
	local output = {}

	-- Interwiki link
	if data.interwiki then
		insert(output, data.interwiki)
	end

	-- Genders
	if type(data.genders) ~= "table" then
		data.genders = { data.genders }
	end

	if data.genders and #data.genders > 0 then
		local genders, gender_cats = format_genders(data.genders, data.lang)
		insert(output, "&nbsp;" ..
			genders)
		if gender_cats then
			local cats = data.cats
			if cats then
				extend(cats, gender_cats)
			end
		end
	end

	local annotations = {}

	-- Transliteration and transcription. Read both values through guards so that a caller supplying only one of
	-- `tr` and `ts` does not trigger an attempt to index a nil value.
	local tr, ts = data.tr and data.tr[1], data.ts and data.ts[1]
	if tr or ts then
		local kind
		if face == "term" then
			kind = face
		else
			kind = "default"
		end

		if tr and ts then
			insert(annotations, tag_translit(tr, data.lang, kind) .. " " .. export.mark(ts, "ts"))
		elseif ts then
			insert(annotations, export.mark(ts, "ts"))
		else
			insert(annotations, tag_translit(tr, data.lang, kind))
		end
	end

	-- Gloss/translation
	if data.gloss then
		insert(annotations, export.mark(data.gloss, "gloss"))
	end

	-- Part of speech
	if data.pos then
		-- debug category for pos= containing transcriptions
		if data.pos:match("/[^><]-/") then
			data.pos = data.pos .. "[[Category:links likely containing transcriptions in pos]]"
		end

		-- Canonicalize part of speech aliases as well as non-lemma aliases like 'nf' or 'nounf' for "noun form".
		pos_tags = pos_tags or (m_headword_data or get_headword_data()).pos_aliases
		local pos = pos_tags[data.pos]
		if not pos and data.pos:find("f$") then
			local pos_form = data.pos:sub(1, -2)
			-- We only expand something ending in 'f' if the result is a recognized non-lemma POS.
			pos_form = (pos_tags[pos_form] or pos_form) .. " form"
			if (m_headword_data or get_headword_data()).nonlemmas[pos_form .. "s"] then
				pos = pos_form
			end
		end
		insert(annotations, export.mark(pos or data.pos, "pos"))
	end

	-- Inflection data
	if data.infl then
		local m_form_of = require(form_of_module)
		-- Split tag sets manually, since tagged_inflections creates a numbered list, and we do not want that.
		local infl_outputs = {}
		local tag_sets = m_form_of.split_tag_set(data.infl)
		for _, tag_set in ipairs(tag_sets) do
			table.insert(infl_outputs,
				m_form_of.tagged_inflections({ tags = tag_set, lang = data.lang, nocat = true, nolink = true, nowrap = true }))
		end
		insert(annotations, export.mark(table.concat(infl_outputs, "; "), "infl"))
	end

	-- Non-gloss text
	if data.ng then
		insert(annotations, export.mark(data.ng, "non-gloss"))
	end

	-- Literal/sum-of-parts meaning
	if data.lit then
		insert(annotations, "literally " .. export.mark(data.lit, "gloss"))
	end

	-- Provide a hook to insert additional annotations such as nested inflections.
	if data.postprocess_annotations then
		data.postprocess_annotations {
			data = data,
			annotations = annotations
		}
	end

	if #annotations > 0 then
		insert(output, " " .. export.mark(concat(annotations, ", "), "annotations"))
	end

	return concat(output)
end

-- Encode certain characters to avoid various delimiter-related issues at various stages. We need to encode < and >
-- because they end up forming part of CSS class names inside of <span ...> and will interfere with finding the end
-- of the HTML tag. I first tried converting them to URL encoding, i.e. %3C and %3E; they then appear in the URL as
-- %253C and %253E, which get mapped back to %3C and %3E when passed to [[Module:accel]]. But mapping them to &lt;
-- and &gt; somehow works magically without any further work; they appear in the URL as < and >, and get passed to
-- [[Module:accel]] as < and >. I have no idea who along the chain of calls is doing the encoding and decoding. If
-- someone knows, please modify this comment appropriately!
local accel_char_map
local function get_accel_char_map()
	accel_char_map = {
		["%"] = ".",
		[" "] = "_",
		["_"] = u(0xFFF0),
		["<"] = "&lt;",
		[">"] = "&gt;",
	}
	return accel_char_map
end

local function encode_accel_param_chars(param)
	return (param:gsub("[% <>_]", accel_char_map or get_accel_char_map()))
end

local function encode_accel_param(prefix, param)
	if not param then
		return ""
	end
	if type(param) == "table" then
		local filled_params = {}
		-- There may be gaps in the sequence, especially for translit params.
		local maxindex = 0
		for k in pairs(param) do
			if type(k) == "number" and k > maxindex then
				maxindex = k
			end
		end
		for i = 1, maxindex do
			filled_params[i] = param[i] or ""
		end
		-- [[Module:accel]] splits these up again.
		param = concat(filled_params, "*~!")
	end
	-- This is decoded again by [[WT:ACCEL]].
	return prefix .. encode_accel_param_chars(param)
end

local function insert_if_not_blank(list, item)
	if item == "" then
		return
	end
	insert(list, item)
end

local function get_class(lang, tr, accel, nowrap)
	if not accel and not nowrap then
		return ""
	end
	local classes = {}
	if accel then
		insert(classes, "form-of lang-" .. lang:getFullCode())
		local form = accel.form
		if form then
			insert(classes, encode_accel_param_chars(form) .. "-form-of")
		end
		insert_if_not_blank(classes, encode_accel_param("gender-", accel.gender))
		insert_if_not_blank(classes, encode_accel_param("pos-", accel.pos))
		insert_if_not_blank(classes, encode_accel_param("transliteration-", accel.translit or (tr ~= "-" and tr or nil)))
		insert_if_not_blank(classes, encode_accel_param("target-", accel.target))
		insert_if_not_blank(classes, encode_accel_param("origin-", accel.lemma))
		insert_if_not_blank(classes, encode_accel_param("origin_transliteration-", accel.lemma_translit))
		if accel.no_store then
			insert(classes, "form-of-nostore")
		end
	end
	if nowrap then
		insert(classes, nowrap)
	end
	return concat(classes, " ")
end

-- Add any left or right regular or accent qualifiers, labels or references to a formatted term.
-- `data` is the object specifying the term, which should optionally contain:
-- * a language object in `lang`; required if any accent qualifiers or labels are given;
-- * left regular qualifiers in `q` (an array of strings or a single string); an empty array or blank string will be
--   ignored;
-- * right regular qualifiers in `qq` (an array of strings or a single string); an empty array or blank string will be
--   ignored;
-- * left accent qualifiers in `a` (an array of strings); an empty array will be ignored;
-- * right accent qualifiers in `aa` (an array of strings); an empty array will be ignored;
-- * left labels in `l` (an array of strings); an empty array will be ignored;
-- * right labels in `ll` (an array of strings); an empty array will be ignored;
-- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text`
--   (formatted reference text) and optionally `name` and/or `group`.
-- `formatted` is the formatted version of the term itself.
local function add_qualifiers_and_refs_to_term(data, formatted)
	local q = data.q
	if type(q) == "string" then
		q = { q }
	end
	local qq = data.qq
	if type(qq) == "string" then
		qq = { qq }
	end
	if q and q[1] or qq and qq[1] or data.a and data.a[1] or data.aa and data.aa[1] or data.l and data.l[1] or
		data.ll and data.ll[1] or data.refs and data.refs[1] then
		formatted = format_qualifiers {
			lang = data.lang,
			text = formatted,
			q = q,
			qq = qq,
			a = data.a,
			aa = data.aa,
			l = data.l,
			ll = data.ll,
			refs = data.refs,
		}
	end

	return formatted
end

--[==[
Creates a full link, with annotations (see `[[#format_link_annotations|format_link_annotations]]`), in the style of {{tl|l}} or {{tl|m}}. The first argument, `data`, must be a table.
It contains the various elements that can be supplied as parameters to {{tl|l}} or {{tl|m}}:
{ {
	term = entry_to_link_to,
	alt = link_text_or_displayed_text,
	lang = language_object,
	sc = script_object,
	track_sc = boolean,
	no_nonstandard_sc_cat = boolean,
	fragment = link_fragment,
	id = sense_id,
	genders = { "gender1", "gender2", ... },
	tr = transliteration,
	respect_link_tr = boolean,
	ts = transcription,
	gloss = gloss,
	pos = part_of_speech_tag,
	ng = non-gloss text,
	lit = literal_translation,
	infl = { "form_of_grammar_tag1", "form_of_grammar_tag2", ... },
	no_alt_ast = boolean,
	accel = {accelerated_creation_tags},
	interwiki = interwiki,
	pretext = "text_at_beginning" or nil,
	posttext = "text_at_end" or nil,
	q = { "left_qualifier1", "left_qualifier2", ...} or "left_qualifier",
	qq = { "right_qualifier1", "right_qualifier2", ...} or "right_qualifier",
	l = { "left_label1", "left_label2", ...},
	ll = { "right_label1", "right_label2", ...},
	a = { "left_accent_qualifier1", "left_accent_qualifier2", ...},
	aa = { "right_accent_qualifier1", "right_accent_qualifier2", ...},
	refs = { "formatted_ref1", "formatted_ref2", ...} or { {text = "text", name = "name", group = "group"}, ... },
	show_qualifiers = boolean,
} }
Any one of the items in the `data` table may be {nil}, but an error will be shown if neither `term` nor `alt` nor `tr` is present.

Thus, calling {full_link{ term = term, lang = lang, sc = sc }}, where `term` is the page to link to (which may have diacritics that will be stripped and/or embedded bracketed links) and `lang` is a [[Module:languages#Language objects|language object]] from [[Module:languages]], will give a plain link similar to the one produced by the template {{tl|l}}, and calling {full_link( { term = term, lang = lang, sc = sc }, "term" )} will give a link similar to the one produced by the template {{tl|m}}.

The function will:
* Try to determine the script, based on the characters found in the `term` or `alt` argument, if the script was not given.
If a script is given and `track_sc` is {true}, it will check whether the input script is the same as the one which would have been automatically generated and add the category [[:Category:LANG terms with redundant script codes]] if yes, or [[:Category:LANG terms with non-redundant manual script codes]] if no. This should be used when the input script object is directly determined by a template's `sc` parameter.
* Call `[[#language_link|language_link]]` on the `term` or `alt` forms, to remove diacritics in the page name, process any embedded wikilinks and create links to Reconstruction or Appendix pages when necessary.
* Call `[[Module:script utilities#tag_text]]` to add the appropriate language and script tags to the term and italicize terms written in the Latin script if necessary. Accelerated creation tags, as used by [[WT:ACCEL]], are included.
* Generate a transliteration, based on the `alt` or `term` arguments, if the script is not Latin, no transliteration was provided in `tr` and the combination of the term's language and script support automatic transliteration. The transliteration itself will be linked if both `.respect_link_tr` is specified and the language of the term has the `link_tr` property set for the script of the term; but not otherwise.
* Add the annotations (transliteration, gender, gloss, etc.) after the link.
* If `no_alt_ast` is specified, then the `alt` text does not need to contain an asterisk if the language is reconstructed. This should only be used by modules which really need to allow links to reconstructions that don't display asterisks (e.g. number boxes).
* If `pretext` or `posttext` is specified, this is text to (respectively) prepend or append to the output, directly before processing qualifiers, labels and references. This can be used to add arbitrary extra text inside of the qualifiers, labels and references.
* If `show_qualifiers` is specified or the `show_qualifiers` argument is given, then left and right qualifiers, accent qualifiers, labels and references will be displayed, otherwise they will be ignored. (This is because a fair amount of code stores qualifiers, labels and/or references in these fields and displays them itself, rather than expecting {full_link()} to display them.)]==] function export.full_link(data, face, allow_self_link, show_qualifiers) if type(data) ~= "table" then error("The first argument to the function full_link must be a table. " .. "See Module:links/documentation for more information.") elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then track("escaped", "full_link") end -- Prevent data from being destructively modified. local data = shallow_copy(data) -- FIXME: this shouldn't be added to `data`, as that means the input table needs to be cloned. data.cats = {} -- Categorize links to "und". local lang, cats = data.lang, data.cats if cats and lang:getCode() == "und" then insert(cats, "Undetermined language links") end local terms = { true } -- Generate multiple forms if applicable. 
for _, param in ipairs { "term", "alt" } do if type(data[param]) == "string" and data[param]:find("//", nil, true) then data[param] = export.split_on_slashes(data[param]) elseif type(data[param]) == "string" and not (type(data.term) == "string" and data.term:find("//", nil, true)) then if not data.no_generate_forms then data[param] = lang:generateForms(data[param]) else data[param] = { data[param] } end else data[param] = {} end end for _, param in ipairs { "sc", "tr", "ts" } do data[param] = { data[param] } end for _, param in ipairs { "term", "alt", "sc", "tr", "ts" } do for i in pairs(data[param]) do terms[i] = true end end -- Create the link local output = {} local id, no_alt_ast, srwc, accel, nevercalltr = data.id, data.no_alt_ast, data.suppress_redundant_wikilink_cat, data.accel, data.never_call_transliteration_module local link_tr = data.respect_link_tr and lang:link_tr(data.sc[1]) for i in ipairs(terms) do local link -- Is there any text to show? if (data.term[i] or data.alt[i]) then -- Try to detect the script if it was not provided local display_term = data.alt[i] or data.term[i] local best = lang:findBestScript(display_term) -- no_nonstandard_sc_cat is intended for use in [[Module:interproject]] if ( not data.no_nonstandard_sc_cat and best:getCode() == "None" and find_best_script_without_lang(display_term):getCode() ~= "None" ) then insert(cats, lang:getFullName() .. " terms in nonstandard scripts") end if not data.sc[i] then data.sc[i] = best -- Track uses of sc parameter. elseif data.track_sc then if data.sc[i]:getCode() == best:getCode() then insert(cats, lang:getFullName() .. " terms with redundant script codes") else insert(cats, lang:getFullName() .. 
" terms with non-redundant manual script codes") end end -- If using a discouraged character sequence, add to maintenance category if data.sc[i]:hasNormalizationFixes() == true then if (data.term[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.term[i])) ~= toNFC(data.term[i])) or (data.alt[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.alt[i])) ~= toNFC(data.alt[i])) then insert(cats, "Pages using discouraged character sequences") end end link = simple_link( data.term[i], data.fragment, data.alt[i], lang, data.sc[i], id, cats, no_alt_ast, srwc ) end -- simple_link can return nil, so check if a link has been generated. if link then -- Add "nowrap" class to prefixes in order to prevent wrapping after the hyphen local nowrap local display_term = data.alt[i] or data.term[i] if display_term and (display_term:find("^%-") or display_term:find("^־")) then -- Hebrew maqqef -- FIXME, use hyphens from [[Module:affix]] nowrap = "nowrap" end link = tag_text(link, lang, data.sc[i], face, get_class(lang, data.tr[i], accel, nowrap)) else --[[ No term to show. Is there at least a transliteration we can work from? ]] link = request_script(lang, data.sc[i]) -- No link to show, and no transliteration either. Show a term request (unless it's a substrate, as they rarely take terms). if (link == "" or (not data.tr[i]) or data.tr[i] == "-") and lang:getFamilyCode() ~= "qfa-sub" then -- If there are multiple terms, break the loop instead. if i > 1 then remove(output) break elseif NAMESPACE ~= "Template" then insert(cats, lang:getFullName() .. 
" term requests") end link = "<small>[Term?]</small>" end end insert(output, link) if i < #terms then insert(output, "<span class=\"Zsym mention\" style=\"font-size:100%;\">&nbsp;/ </span>") end end -- When suppress_tr is true, do not show or generate any transliteration if data.suppress_tr then data.tr[1] = nil else -- TODO: Currently only handles the first transliteration, pending consensus on how to handle multiple translits for multiple forms, as this is not always desirable (e.g. traditional/simplified Chinese). if data.tr[1] == "" or data.tr[1] == "-" then data.tr[1] = nil else local phonetic_extraction = load_data("Module:links/data").phonetic_extraction phonetic_extraction = phonetic_extraction[lang:getCode()] or phonetic_extraction[lang:getFullCode()] if phonetic_extraction then data.tr[1] = data.tr[1] or require(phonetic_extraction).getTranslit(export.remove_links(data.alt[1] or data.term[1])) elseif (data.term[1] or data.alt[1]) and data.sc[1]:isTransliterated() then -- Track whenever there is manual translit. The categories below like 'terms with redundant transliterations' -- aren't sufficient because they only work with reference to automatic translit and won't operate at all in -- languages without any automatic translit, like Persian and Hebrew. if data.tr[1] then local full_code = lang:getFullCode() track("manual-tr", full_code) end if not nevercalltr then -- Try to generate a transliteration. local text = data.alt[1] or data.term[1] if not link_tr then text = export.remove_links(text, true) end local automated_tr = lang:transliterate(text, data.sc[1]) if automated_tr then local manual_tr = data.tr[1] if manual_tr then if export.remove_links(manual_tr) == export.remove_links(automated_tr) then insert(cats, lang:getFullName() .. " terms with redundant transliterations") else -- Prevents Arabic root categories from flooding the tracking categories. if NAMESPACE ~= "Category" then insert(cats, lang:getFullName() .. 
" terms with non-redundant manual transliterations") end end end if not manual_tr or lang:overrideManualTranslit(data.sc[1]) then data.tr[1] = automated_tr end end end end end end -- Link to the transliteration entry for languages that require this if data.tr[1] and link_tr and not data.tr[1]:match("%[%[(.-)%]%]") then data.tr[1] = simple_link( data.tr[1], nil, nil, lang, get_script("Latn"), nil, cats, no_alt_ast, srwc ) elseif data.tr[1] and not link_tr then -- Remove the pseudo-HTML tags added by remove_links. data.tr[1] = data.tr[1]:gsub("</?link>", "") end if data.tr[1] and not umatch(data.tr[1], "[^%s%p]") then data.tr[1] = nil end insert(output, export.format_link_annotations(data, face)) if data.pretext then insert(output, 1, data.pretext) end if data.posttext then insert(output, data.posttext) end local categories = cats[1] and format_categories(cats, lang, "-", nil, nil, data.sc) or "" output = concat(output) if show_qualifiers or data.show_qualifiers then output = add_qualifiers_and_refs_to_term(data, output) end return output .. categories end --[==[Replaces all wikilinks with their displayed text, and removes any categories. This function can be invoked either from a template or from another module. -- Strips links: deletes category links, the targets of piped links, and any double square brackets involved in links (other than file links, which are untouched). If `tag` is set, then any links removed will be given pseudo-HTML tags, which allow the substitution functions in [[Module:languages]] to properly subdivide the text in order to reduce the chance of substitution failures in modules which scrape pages like [[Module:zh-translit]]. -- FIXME: This is quite hacky. We probably want this to be integrated into [[Module:languages]], but we can't do that until we know that nothing is pushing pipe linked transliterations through it for languages which don't have link_tr set. 
* <code><nowiki>[[page|displayed text]]</nowiki></code> &rarr; <code><nowiki>displayed text</nowiki></code>
* <code><nowiki>[[page and displayed text]]</nowiki></code> &rarr; <code><nowiki>page and displayed text</nowiki></code>
* <code><nowiki>[[Category:English lemmas|WORD]]</nowiki></code> &rarr; ''(nothing)'']==]
function export.remove_links(text, tag)
	if type(text) == "table" then
		text = text.args[1]
	end

	if not text or text == "" then
		return ""
	end

	text = text
		:gsub("%[%[", "\1")
		:gsub("%]%]", "\2")

	-- Parse internal links for the display text.
	text = text:gsub("(\1)([^\1\2]-)(\2)",
		function(c1, c2, c3)
			-- Don't remove files.
			for _, false_positive in ipairs({ "file", "image" }) do
				if c2:lower():match("^" .. false_positive .. ":") then
					return c1 .. c2 .. c3
				end
			end
			-- Remove categories completely.
			for _, false_positive in ipairs({ "category", "cat" }) do
				if c2:lower():match("^" .. false_positive .. ":") then
					return ""
				end
			end
			-- In piped links, remove all text before the pipe, unless it's the final character (i.e. the pipe trick), in which case just remove the pipe.
			c2 = c2:match("^[^|]*|(.+)") or c2:match("([^|]+)|$") or c2
			if tag then
				return "<link>" .. c2 .. "</link>"
			else
				return c2
			end
		end)

	text = text
		:gsub("\1", "[[")
		:gsub("\2", "]]")

	return text
end

function export.section_link(link)
	if type(link) ~= "string" then
		error("The first argument to section_link was a " .. type(link) .. ", but it should be a string.")
	elseif link:find("\\", nil, true) then
		track("escaped", "section_link")
	end

	local target, section = get_fragment((link:gsub("_", " ")))

	if not section then
		error("No \"#\" delineating a section name")
	end

	return simple_link(
		target,
		section,
		target .. " §&nbsp;" .. section
	)
end

return export
go1a5j6fymqq8baizjbz87vusf7r536
Module:links/templates 828 8184 27719 2026-06-22T06:46:36Z Umarxon III 2840 Created page with content: '-- Prevent substitution.
if mw.isSubsting() then return require("Module:unsubst") end local export = {} local links_module = "Module:links" local process_params = require("Module:parameters").process local remove = table.remove local upper = require("Module:string utilities").upper --[=[ Modules used: [[Module:links]] [[Module:languages]] [[Module:scripts]] [[Module:parameters]] [[Module:debug]] ]=] do local function get_args(frame) -- `compat` is a...'
27719 Scribunto text/plain
-- Prevent substitution.
if mw.isSubsting() then
	return require("Module:unsubst")
end

local export = {}

local links_module = "Module:links"

local process_params = require("Module:parameters").process
local remove = table.remove
local upper = require("Module:string utilities").upper

--[=[
	Modules used:
	[[Module:links]]
	[[Module:languages]]
	[[Module:scripts]]
	[[Module:parameters]]
	[[Module:debug]]
]=]

do
	local function get_args(frame)
		-- `compat` is a compatibility mode for {{term}}.
		-- If given a nonempty value, the function uses lang= to specify the
		-- language, and all the positional parameters shift one number lower.
		local iargs = frame.args
		iargs.compat = iargs.compat and iargs.compat ~= ""
		iargs.langname = iargs.langname and iargs.langname ~= ""
		iargs.notself = iargs.notself and iargs.notself ~= ""

		local alias_of_4 = {alias_of = 4}
		local boolean = {type = "boolean"}
		local params = {
			[1] = {required = true, type = "language", default = "und"},
			[2] = true,
			[3] = true,
			[4] = true,
			g = {list = true, type = "genders", flatten = true},
			gloss = alias_of_4,
			id = true,
			lit = true,
			ng = true,
			pos = true,
			sc = {type = "script"},
			t = alias_of_4,
			tr = true,
			ts = true,
			q = {type = "qualifier"},
			qq = {type = "qualifier"},
			l = {type = "labels"},
			ll = {type = "labels"},
			ref = {type = "references"},
			["accel-form"] = true,
			["accel-translit"] = true,
			["accel-lemma"] = true,
			["accel-lemma-translit"] = true,
			["accel-gender"] = true,
			["accel-nostore"] = boolean,
		}

		if iargs.compat then
			params.lang = {type = "language", default = "und"}
			remove(params, 1)
			alias_of_4.alias_of = 3
		end

		if iargs.langname then
			params.w = boolean
		end

		return process_params(frame:getParent().args, params), iargs
	end

	-- Used in [[Template:l]] and [[Template:m]].
	function export.l_term_t(frame)
		local args, iargs = get_args(frame)

		local compat = iargs.compat
		local lang = args[compat and "lang" or 1]

		-- Tracking for und.
		if not compat and lang:getCode() == "und" then
			require("Module:debug").track("link/und")
		end

		local term = args[(compat and 1 or 2)]
		local alt = args[(compat and 2 or 3)]

		term = term ~= "" and term or nil
		if not term and not alt and iargs.demo then
			term = iargs.demo
		end

		local langname = iargs.langname and (
			args.w and lang:makeWikipediaLink() or
			lang:getCanonicalName()
		) or nil

		if langname and term == "-" then
			return langname
		end

		-- Forward the information to full_link
		return (langname and langname .. " " or "") ..
			require(links_module).full_link(
				{
					lang = lang,
					sc = args.sc,
					track_sc = true,
					term = term,
					alt = alt,
					gloss = args[4],
					id = args.id,
					tr = args.tr,
					ts = args.ts,
					genders = args.g,
					pos = args.pos,
					ng = args.ng,
					lit = args.lit,
					q = args.q,
					qq = args.qq,
					l = args.l,
					ll = args.ll,
					refs = args.ref,
					show_qualifiers = true,
					accel = args["accel-form"] and {
						form = args["accel-form"],
						translit = args["accel-translit"],
						lemma = args["accel-lemma"],
						lemma_translit = args["accel-lemma-translit"],
						gender = args["accel-gender"],
						nostore = args["accel-nostore"],
					} or nil
				},
				iargs.face,
				not iargs.notself
			)
	end

	-- Used in [[Template:link-annotations]].
	function export.l_annotations_t(frame)
		local args, iargs = get_args(frame)

		-- Forward the information to format_link_annotations
		return require(links_module).format_link_annotations(
			{
				lang = args[1],
				tr = { args.tr },
				ts = { args.ts },
				genders = args.g,
				pos = args.pos,
				ng = args.ng,
				lit = args.lit
			},
			iargs.face
		)
	end
end

-- Used in [[Template:ll]].
do
	local function get_args(frame)
		return process_params(frame:getParent().args, {
			[1] = {required = true, type = "language", default = "und"},
			[2] = {allow_empty = true},
			[3] = true,
			id = true,
			sc = {type = "script"},
		})
	end

	function export.ll(frame)
		local args = get_args(frame)

		local lang = args[1]
		local sc = args.sc
		local term = args[2]

		term = term ~= "" and term or nil

		return require(links_module).language_link{
			lang = lang,
			sc = sc,
			term = term,
			alt = args[3],
			id = args.id
		} or "<small>[Term?]</small>" ..
			require("Module:utilities").format_categories(
				{lang:getFullName() .. " term requests"},
				lang, "-", nil, nil, sc
			)
	end
end

function export.def_t(frame)
	local args = process_params(frame:getParent().args, {
		[1] = {required = true, default = ""},
	})

	local face = frame.args.face
	local ret = require("Module:script utilities").tag_definition(require(links_module).embedded_language_links{
		term = args[1],
		lang = require("Module:languages").getByCode("en"),
		sc = require("Module:scripts").getByCode("Latn")
	}, face)
	if face == "non-gloss" then
		return ret
	end
	return '<span class="mention-gloss-paren">(</span>' .. ret .. '<span class="mention-gloss-paren">)</span>'
end

function export.linkify_t(frame)
	local args = process_params(frame:getParent().args, {
		[1] = {required = true, default = ""},
	})

	args[1] = mw.text.trim(args[1])

	if args[1] == "" or args[1]:find("[[", nil, true) then
		return args[1]
	end
	return "[[" .. args[1] .. "]]"
end

function export.cap_t(frame)
	local args = process_params(frame:getParent().args, {
		[1] = {required = true},
		[2] = true,
		lang = {type = "language", default = "en"},
	})

	local term = args[1]
	return require(links_module).full_link{
		lang = args.lang,
		term = term,
		alt = term:gsub("^.[\128-\191]*", upper) .. (args[2] or "")
	}
end

function export.section_link_t(frame)
	local args = process_params(frame:getParent().args, {
		[1] = {},
	})

	return require(links_module).section_link(args[1])
end

return export
49ywpqwyhc1meeszxumsw7m3fxck38v
Module:links/testcases 828 8185 27720 2026-06-22T06:47:40Z Umarxon III 2840 Created page with content: '--[=[ Unit tests for [[Module:links]]. Click talk page to run tests. ]=] local p = require('Module:UnitTests') local m_links = require('Module:links') local m_util = require('Module:utilities') local get_lang_by_code = require("Module:languages").getByCode local function tag(lang_code, sc_code) return function (text) return '<span class="' .. sc_code .. '" lang="' .. lang_code .. '">' .. text .. '</span>' end end local options = { nowiki = true, show_diff...'
27720 Scribunto text/plain --[=[ Unit tests for [[Module:links]]. Click talk page to run tests. ]=] local p = require('Module:UnitTests') local m_links = require('Module:links') local m_util = require('Module:utilities') local get_lang_by_code = require("Module:languages").getByCode local function tag(lang_code, sc_code) return function (text) return '<span class="' .. sc_code .. '" lang="' .. lang_code .. '">' .. text .. '</span>' end end local options = { nowiki = true, show_difference = true } function p:check_link(example, expected) self:preprocess_equals(example, expected, options) end function p:test_links() local frame = mw.getCurrentFrame() local temp = frame.args.temp or "l" local compat = frame.args.compat local lang = compat and "lang=" or "" local link_examples = { 'anchor', { '{{' .. temp .. '|' .. lang .. 'en|-er#Etymology 2|-er}}', '<span class="Latn" lang="en">[[-er#Etymology 2|-er]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'en|[[-er#Etymology 2|-er]]}}', '<span class="Latn" lang="en">[[-er#Etymology 2|-er]]</span>' }, 'character entity references in link target', { '{{' .. temp .. '|' .. lang .. 'nia|wa&#39;a}}', '<span class="Latn" lang="nia">[[wa\'a#Nias|wa\'a]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'nia|wa&#x27;a}}', '<span class="Latn" lang="nia">[[wa\'a#Nias|wa\'a]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'ja|恵&#8204;美|恵&amp;#8204;美}}', '<span class="Jpan" lang="ja">[[恵‌美#Japanese|恵&amp;#8204;美]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'en|&amp;}}', '<span class="None" lang="en">[[Unsupported titles/Amp#English|&amp;]]</span>' }, 'simple linking', -- ([[Module:languages]]) { '{{' .. temp .. '|' .. lang .. 'la|verbum}}', '<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span>' }, 'using wikilinks', { '{{' .. temp .. '|' .. lang .. 'en|[[God]] be [[with]] [[you]]}}', '<span class="Latn" lang="en">[[God#English|God]] be [[with#English|with]] [[you#English|you]]</span>' }, 'alternative text', { '{{' .. temp .. '|' .. lang .. 
'en|go|went}}', '<span class="Latn" lang="en">[[go#English|went]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'en|to [[go]]|went}}', '<span class="Latn" lang="en">to [[go#English|go]]</span>' }, 'sense id', { '{{' .. temp .. '|' .. lang .. 'en|go|id=game}}', '<span class="Latn" lang="en">[[go#English-game|go]]</span>' }, 'constructed terms', -- ([[Module:languages]]) { '{{' .. temp .. '|' .. lang .. 'sjn|mithril}}', '<span class="Latn" lang="sjn">[[Appendix&#x3a;Sindarin/mithril|mithril]]</span>' }, 'reconstructed terms', -- ([[Module:languages]]) { '{{' .. temp .. '|' .. lang .. 'ine-pro|*bʰréh₂tēr}}', '<span class="Latn" lang="ine-pro">[[Reconstruction&#x3a;Proto-Indo-European/bʰréh₂tēr|*bʰréh₂tēr]]</span>' }, { '{{#iferror:{{' .. temp .. '|' .. lang .. 'ine-pro|bʰréh₂tēr}}|Script error}}', 'Script error' }, { '{{' .. temp .. '|' .. lang .. 'sla-pro|[[*dьnь]] [[*serda]]}}', '<span class="Latn" lang="sla-pro">[[Reconstruction&#x3a;Proto-Slavic/dьnь|*dьnь]] [[Reconstruction&#x3a;Proto-Slavic/serda|*serda]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'la|verbum .. [[verbum]] .. [[*verbum]] .. [[*verbum|verbum]] .. [[*verbum|*verba]]}}', '<span class="Latn" lang="la">verbum .. [[verbum#Latin|verbum]] .. [[Reconstruction&#x3a;Latin/verbum|*verbum]] .. [[Reconstruction&#x3a;Latin/verbum|verbum]] .. [[Reconstruction&#x3a;Latin/verbum|*verba]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'sla-pro|*[[serda]]}}', '<span class="Latn" lang="sla-pro">*[[Reconstruction&#x3a;Proto-Slavic/serda|serda]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'sla-pro|*[[*serda]] .. [[*serda]] .. [[serda]] .. [[*serda|serda]] .. [[*serda|*serda]]}}', '<span class="Latn" lang="sla-pro">[[Reconstruction&#x3a;Proto-Slavic/*serda|*serda]] .. [[Reconstruction&#x3a;Proto-Slavic/*serda|*serda]] .. [[Reconstruction&#x3a;Proto-Slavic/serda|serda]] .. [[Reconstruction&#x3a;Proto-Slavic/*serda|serda]] .. [[Reconstruction&#x3a;Proto-Slavic/*serda|*serda]]</span>' }, { '{{' .. temp .. '|' .. lang .. 
'sla-pro|*[[dьnь|alt1]] [[serda|alt2]]}}', '<span class="Latn" lang="sla-pro">*[[Reconstruction&#x3a;Proto-Slavic/dьnь|alt1]] [[Reconstruction&#x3a;Proto-Slavic/serda|alt2]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'und|[[attested]] .. [[*unattested]] .. [[*unattested|unattested-alt]]}}', '<span class="Zyyy" lang="und">[[attested|attested]] .. *unattested .. unattested-alt</span>[[Category:Undetermined language links]]' }, 'script detection', -- (lang_obj:findBestScript()) { '{{' .. temp .. '|' .. lang .. 'sh|српски}} / {{' .. temp .. '|' .. lang .. 'sh|srpski}}', '<span class="Cyrl" lang="sh">[[српски#Serbo-Croatian|српски]]</span> / <span class="Latn" lang="sh">[[srpski#Serbo-Croatian|srpski]]</span>' }, 'target page\'s title', -- (Language:stripDiacritics()) { '{{' .. temp .. '|' .. lang .. 'la|verbō}}', '<span class="Latn" lang="la">[[verbo#Latin|verbō]]</span>' }, 'gender and number', -- ([[Module:gender and number]]) { '{{' .. temp .. '|' .. lang .. 'la|verbum|g=m}}', '<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span>&nbsp;<span class="gender"><abbr title="masculine gender">m</abbr></span>' }, { '{{' .. temp .. '|' .. lang .. 'la|verbum|g=m|g2=f}}', '<span class="Latn" lang="la">[[verbum#Latin|verbum]]</span>&nbsp;<span class="gender"><abbr title="masculine gender">m</abbr> or <abbr title="feminine gender">f</abbr></span>' }, 'transliteration', { '{{' .. temp .. '|' .. lang .. 'ar|كلمة|tr=kalima}}', '<span class="Arab" lang="ar">[[كلمة#Arabic|كلمة]]</span>&lrm; <span class="mention-gloss-paren annotation-paren">(</span><span lang="ar-Latn" class="tr Latn">kalima</span><span class="mention-gloss-paren annotation-paren">)</span>' }, { '{{' .. temp .. '|' .. lang .. 'ru|русский}}', '<span class="Cyrl" lang="ru">[[русский#Russian|русский]]</span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="ru-Latn" class="tr Latn">russkij</span><span class="mention-gloss-paren annotation-paren">)</span>' }, 'gloss', { '{{' .. temp .. '|' .. 
lang .. 'ru|русский|gloss=Russian}}', '<span class="Cyrl" lang="ru">[[русский#Russian|русский]]</span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="ru-Latn" class="tr Latn">russkij</span>, <span class="mention-gloss-double-quote">“</span><span class="mention-gloss">Russian</span><span class="mention-gloss-double-quote">”</span><span class="mention-gloss-paren annotation-paren">)</span>' }, 'Wikipedia link', { '{{' .. temp .. '|' .. lang .. 'en|w:word}}', '<span class="Latn" lang="en">[[w:word|word]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'en|[[w:English language]]}}', '<span class="Latn" lang="en">[[w:English language|w:English language]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'en|[[wikipedia:English language]]}}', '<span class="Latn" lang="en">[[wikipedia:English language|wikipedia:English language]]</span>' }, 'Linking to titles with special characters: asterisk, slash', { '{{' .. temp .. '|' .. lang .. 'mul|/}}', '<span class="None" lang="mul">[[&#x3a;/#Translingual|/]]</span>' }, { '{{' .. temp .. '|' .. lang .. 'mul|//}}', '<span class="None" lang="mul">[[&#x3a;//#Translingual|//]]</span>' }, { '{{' .. temp .. '|' .. lang .. 
'mul|*}}', '<span class="None" lang="mul">[[*#Translingual|*]]</span>' }, } self:iterate(link_examples, 'check_link') end function p:check_strip_diacritics(lang_code, unstripped, stripped) local lang_obj = get_lang_by_code(lang_code) local sc_code = lang_obj:findBestScript(unstripped):getCode() self:equals( ('[%s] <i class="mention %s" lang="%s">%s</i>'):format(lang_code, sc_code, lang_code, unstripped), lang_obj:stripDiacritics(unstripped), stripped, { display = tag(lang_code, sc_code) } ) end function p:test_remove_diacritics() -- insert here lines of the form: local examples = { { 'ru', 'ба́бушка', 'бабушка' }, { 'mk', 'ЃѓЌќ - е́а́́', 'ЃѓЌќ - еа' }, -- [[w:Macedonian alphabet]] { 'sh', 'Łł ĆćŃńŹź Ŭŭ - ȁàȃáā ȐȒŔ ѝӣ', 'Łł ĆćŃńŹź Ŭŭ - aaaaa RRR ии' }, -- [[w:Serbian Cyrillic alphabet]] / [[w:Gaj's Latin alphabet]] { 'grc', 'ᾱ, ᾱ́, ᾰ̓́', 'α, ά, ἄ' }, } self:iterate(examples, 'check_strip_diacritics') end function p:test_section_link() local examples = { { "w:Hindustani phonology#Vowels [ɛ], [ɛː]", "[[w:Hindustani phonology#Vowels_%5B%C9%9B%5D,_%5B%C9%9B%CB%90%5D|" .. "w:Hindustani phonology §&nbsp;Vowels [ɛ], [ɛː]]]" }, } self:iterate( examples, function (self, page, expected) self:equals( mw.text.nowiki(page), m_links.section_link(page), expected) end) end return p epu79d5ol18lhi07uskbf2m3w0ae2zr Module:nn-inf 828 8186 27721 2026-06-22T06:48:37Z Umarxon III 2840 Sahypa döretdi, mazmuny: 'local lang = require("Module:languages").getByCode("nn") local links = require("Module:links") local export = {} function export.main(frame) local PAGENAME = mw.loadData("Module:headword/data").pagename local args = frame:getParent().args local root = '' local separator = '' if args[1] and args[1] ~= '' then root = args[1] else root = PAGENAME:gsub('[ae]$', '') end if args[2] and args[2] ~= '' then separator = args[2] else separator = '...' 
27721 Scribunto text/plain
local lang = require("Module:languages").getByCode("nn")
local links = require("Module:links")

local export = {}

function export.main(frame)
	local PAGENAME = mw.loadData("Module:headword/data").pagename
	local args = frame:getParent().args
	local root = ''
	local separator = ''

	if args[1] and args[1] ~= '' then
		root = args[1]
	else
		root = PAGENAME:gsub('[ae]$', '')
	end

	if args[2] and args[2] ~= '' then
		separator = args[2]
	else
		separator = ','
	end

	if separator == ',' then
		separator = ', '
	end

	local linkA = links.full_link{term = root .. 'a', lang = lang}
	local linkE = links.full_link{term = root .. 'e', lang = lang}

	return linkA .. separator .. linkE
end

return export
9p1t1c0rn0kfdm6ec9pv8n920y03jv0
Module:quick link 828 8187 27722 2026-06-22T06:49:45Z Umarxon III 2840 Created page with content: 'local export = {} local function validate_lang(lang) return type(lang) == "table" and type(lang.getCode) == "function" end -- Entry names are processed with this function. Catalan has no entry-name -- replacements, but apparently uses straight apostrophes in entry titles and -- curly ones in displayed text. local function curly_apostrophe_to_straight(str) return (str:gsub("’", "'")) end -- Find grammatical terms such as "first person" and "plural" in part...'
27722 Scribunto text/plain
local export = {}

local function validate_lang(lang)
	return type(lang) == "table" and type(lang.getCode) == "function"
end

-- Entry names are processed with this function. Catalan has no entry-name
-- replacements, but apparently uses straight apostrophes in entry titles and
-- curly ones in displayed text.
local function curly_apostrophe_to_straight(str)
	return (str:gsub("’", "'"))
end

-- Find grammatical terms such as "first person" and "plural" in part of the
-- Catalan personal pronouns table and link them.
-- FIXME: This is a massive hack and should not be in this module.
local function link_terms_in_ca_table(text)
	local ordinals = { "first", "second", "third" }
	text = text:gsub(
		"((%d)%l+ %l+)",
		function(whole_match, number)
			return "[[" .. ordinals[tonumber(number)] .. " person|" .. whole_match:gsub(" ", "&nbsp;") .. "]]"
		end)
	text = text:gsub(
		'![^\n]+singular.-class="notes%-row"',
		function(table_interior)
			return table_interior:gsub(
				"%l+",
				function(word)
					local link
					if word == "singular" or word == "neuter"
					or word:find "i[nv]e$" or word:find "al$" then
						link = word
					elseif word == "majestic" then
						link = "majestic plural|majestic"
					end
					if link then
						return "[[" .. link .. "]]"
					end
				end)
		end)
	return text
end

function export.main(frame)
	local params = {
		title = {required = true},
		lang = {type = "language"},
	}

	local args = require("Module:parameters").process(frame.args, params)

	local title = args.title
	local lang = args.lang

	local content = frame:preprocess("{{" .. title .. "}}")

	local m_links = require("Module:links")
	local function full_link(entry, text)
		if not text then
			-- FIXME!!! Another nasty hack.
			local curly_to_straight = curly_apostrophe_to_straight(entry)
			-- Compare against `entry`: `text` is always nil in this branch.
			if curly_to_straight ~= entry then
				text = entry
				entry = curly_to_straight
			end
		end
		return m_links.full_link {
			term = entry,
			alt = text,
			lang = lang,
		}
	end

	-- Declared local so the result does not leak into the global environment.
	local linked_content = content:gsub(
		"%b[]",
		function (potential_link)
			if potential_link:sub(2, 2) == "[" and potential_link:sub(-2, -2) == "]" then
				local link_contents = potential_link:sub(3, -3) -- strip off outer brackets
				local target, text
				if link_contents:find("|") then
					target, text = link_contents:match("^([^|]+)|(.+)$")
				else
					target = link_contents
				end
				if target:find("^([^:]+):") or target:find("#") then
					return potential_link
				else
					return full_link(target, text)
				end
			end
		end)

	if lang:getCode() == "ca" then
		linked_content = link_terms_in_ca_table(linked_content)
	end

	return linked_content
end

return export
hia51mnglcb225uwtxz8p1tf0z7efcl
Module:ru-link 828 8188 27723 2026-06-22T06:50:53Z Umarxon III 2840 Created page with content: 'local export = {} local full_link = require 'Module:links'.full_link local ru = require 'Module:languages'.getByCode 'ru' -- Just guessing at some of these! local abbreviations = { a = 'adjective', advpro = 'adverbial pronoun', anum = 'adjectival numeral', apro = 'adjectival pronoun', conj = 'conjunction', init = 'initialism', intj = 'interjection', num = 'numeral', part = 'particle', pr = 'preposition', s = 'substantive', spro = 'substantival pronoun', v...'
27723 Scribunto text/plain
local export = {}

local full_link = require 'Module:links'.full_link
local ru = require 'Module:languages'.getByCode 'ru'

-- Just guessing at some of these!
local abbreviations = {
	a = 'adjective',
	advpro = 'adverbial pronoun',
	anum = 'adjectival numeral',
	apro = 'adjectival pronoun',
	conj = 'conjunction',
	init = 'initialism',
	intj = 'interjection',
	num = 'numeral',
	part = 'particle',
	pr = 'preposition',
	s = 'substantive',
	spro = 'substantival pronoun',
	v = 'verb',
}

function export.link_list(frame)
	local list = frame.args[1]
	list = list:gsub(
		'# ([^,]+), ([^\n]+)',
		function (word, POS)
			return '# ' .. full_link { lang = ru, term = word, pos = abbreviations[POS] or POS }
		end)
	return list
end

return export
9osmvyhzd5k2kn3e9zk178kz8it8ht2
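The substitution performed by `link_list` in Module:ru-link can be tried outside Scribunto with a rough sketch like the one below. It reuses the module's pattern and a subset of its abbreviation table, but the `full_link` defined here is only a hypothetical stand-in (the real function in Module:links returns tagged wikitext and needs a language object), and the sample list is invented for illustration.

```lua
-- Hypothetical stand-in for Module:links' full_link so the sketch runs in
-- plain Lua; it only shows which fields link_list passes along.
local function full_link(data)
	return ("[[%s]] (%s)"):format(data.term, data.pos)
end

-- Subset of the abbreviation table from Module:ru-link.
local abbreviations = { s = 'substantive', v = 'verb' }

-- Same pattern as the module: "# word, POS" on each line.
local function link_list(list)
	-- gsub returns (string, count); the outer parentheses keep only the string.
	return (list:gsub('# ([^,]+), ([^\n]+)', function (word, POS)
		-- Unknown abbreviations fall through unchanged, as in the module.
		return '# ' .. full_link { term = word, pos = abbreviations[POS] or POS }
	end))
end

print(link_list('# слово, s\n# идти, v\n# очень, adv'))
```

Each list line is rewritten independently, so an abbreviation missing from the table (here `adv`) is passed to the link function as-is rather than raising an error.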