وِکیٖلۄغَتھ kswiktionary https://ks.wiktionary.org/wiki/%D8%A7%D9%8E%DB%81%D9%8E%D9%85_%D8%B5%D9%8E%D9%81%DB%81%D9%95 MediaWiki 1.46.0-wmf.26 case-sensitive میڈیا خاص کَتھ رُکُن رُکُن کَتھ وِکیٖلۄغَتھ وِکیٖلۄغَتھ کَتھ فَیِل فَیِل کَتھ میٖڈیاوِکی میٖڈیاوِکی کَتھ فرما فرما کَتھ مَدَتھ مَدَتھ کَتھ زٲژ زٲژ کَتھ TimedText TimedText talk Module Module talk Event Event talk رُکُن:James500 2 8564 31781 26929 2026-04-30T16:29:11Z James500 3095 /* */ Remove template 31781 wikitext text/x-wiki {{#babel:en}} [[en:User:James500]] eu13so1xoub6xvcjld6yea9o5qsrodx جۆم تہٕ کٔشیٖر 0 9520 31782 2026-05-01T09:32:37Z آیات محراج 3545 /* */ Page created 31782 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== From {{com|ks|جۆم|t1=[[Jammu]]|تہٕ|tr2=tụ|t2=[[and]]/[[&]]|کٔشیٖر|t3=[[Kashmir]]}}, likely a {{calque|ks|en|Jammu and Kashmir|nocap=1}}. ===وۄشژار=== * {{IPA|ks|/d͡ʒom tɨ kəʃiːr/}} ==ناوُن== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} t8pga2owi4rxcaj6zt8ogoxgct3576g 31783 31782 2026-05-01T09:33:00Z آیات محراج 3545 31783 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== From {{com|ks|جۆم|t1=[[Jammu]]|تہٕ|tr2=tụ|t2=[[and]]/[[&]]|کٔشیٖر|t3=[[Kashmir]]}}, likely a {{calque|ks|en|Jammu and Kashmir|nocap=1}}. ===وۄشژار=== * {{IPA|ks|/d͡ʒom tɨ kəʃiːr/}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} bjoosl94m56utrh5x8ksf9ilv5k5qa7 31789 31783 2026-05-01T09:44:08Z آیات محراج 3545 /* وۄشژار */ 31789 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== From {{com|ks|جۆم|t1=[[Jammu]]|تہٕ|tr2=tụ|t2=[[and]]/[[&]]|کٔشیٖر|t3=[[Kashmir]]}}, likely a {{calque|ks|en|Jammu and Kashmir|nocap=1}}. 
===وۄشژار=== * {{ks-noun|ipa=lat͡sʰul}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} m1ahkm5z2ogk823ubnellu4fv9j3pe8 31790 31789 2026-05-01T09:44:39Z آیات محراج 3545 31790 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== From {{com|ks|جۆم|t1=[[Jammu]]|تہٕ|tr2=tụ|t2=[[and]]/[[&]]|کٔشیٖر|t3=[[Kashmir]]}}, likely a {{calque|ks|en|Jammu and Kashmir|nocap=1}}. ===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} pcrxwkats5lc6o7xyynbfmi6x472r12 31792 31790 2026-05-01T09:51:31Z آیات محراج 3545 /* کٲشُر */ 31792 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|ks|Jammu and Kashmir}} پؠٹھ ٲسِتھ. ===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} jcr8j74oa2zocw0ln2ju4186da4d5r6 31793 31792 2026-05-01T09:52:01Z آیات محراج 3545 /* کٲشُر */ 31793 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|en|Jammu and Kashmir}} پؠٹھ ٲسِتھ. 
===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} r7yrwl48ykuqt3x0u1nsxgz39qbjpe7 31795 31793 2026-05-01T10:04:38Z آیات محراج 3545 /* کٲشُر */ 31795 wikitext text/x-wiki ==کٲشُر== {{wp|ks:}} ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|en|Jammu and Kashmir}} پؠٹھ ٲسِتھ. ===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} 6qsvmw9stailt8aytkiv6bwai3skzfe 31797 31795 2026-05-01T10:12:45Z آیات محراج 3545 31797 wikitext text/x-wiki ==کٲشُر== {{wp}} ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|en|Jammu and Kashmir}} پؠٹھ ٲسِتھ. ===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} 1f6lab7qjy2ujcdefzemfzcf8oqfpg7 31799 31797 2026-05-01T10:18:33Z آیات محراج 3545 /* کٲشُر */ 31799 wikitext text/x-wiki ==کٲشُر== {{wp|زَبان=en}} ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|en|Jammu and Kashmir}} پؠٹھ ٲسِتھ. 
===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} 9ifwna60beitlwp3mexdt67uie59elh 31800 31799 2026-05-01T10:19:13Z آیات محراج 3545 /* کٲشُر */ 31800 wikitext text/x-wiki ==کٲشُر== {{wp}} ===ناوُک آگُر=== {{l|ks|جۆم}} (jom, “Jammu”) +‎ تہٕ (tụ, “and/&”) +‎ {{l|ks|کٔشیٖر}} (kạśīr, “Kashmir”)، شایَد اَنٛگریٖزی لفٕظ {{l|en|Jammu and Kashmir}} پؠٹھ ٲسِتھ. ===وۄشژار=== * {{ks-noun|ipa=d͡ʒom tɨ kəʃiːr}} ===ناوُت=== # جۆم تہٕ کٔشیٖر (مَرکٔزی عَلاقہٕ) # تَوٲریٖخی طور جۆم تہٕ کٔشیٖر (شاہزٲدؠ عَلاقہٕ) تہٕ جۆم تہٕ کٔشیٖر (رِیاسَتھ) ====ہِوی لفٕظ==== * {{l|ks|جۆم}} * {{l|ks|کٔشیٖر}} 1f6lab7qjy2ujcdefzemfzcf8oqfpg7 فرما:com 10 9521 31784 2026-05-01T09:36:38Z آیات محراج 3545 Created page with "{{#invoke:affix/templates|compound}}<noinclude></noinclude>" 31784 wikitext text/x-wiki {{#invoke:affix/templates|compound}}<noinclude></noinclude> 6ijitfk9yv7podv4sexn145833639z6 Module:affix/templates 828 9522 31785 2026-05-01T09:38:08Z آیات محراج 3545 Created page with "local export = {} local m_affix = require("Module:affix") local m_utilities = require("Module:utilities") local en_utilities_module = "Module:en-utilities" local parameter_utilities_module = "Module:parameter utilities" local pseudo_loan_module = "Module:affix/pseudo-loan" local insert = table.insert local boolean_param = {type = "boolean"} local function is_property_key(k) return require(parameter_utilities_module).item_key_is_property(k) end local recognized_aff..." 
31785 Scribunto text/plain local export = {} local m_affix = require("Module:affix") local m_utilities = require("Module:utilities") local en_utilities_module = "Module:en-utilities" local parameter_utilities_module = "Module:parameter utilities" local pseudo_loan_module = "Module:affix/pseudo-loan" local insert = table.insert local boolean_param = {type = "boolean"} local function is_property_key(k) return require(parameter_utilities_module).item_key_is_property(k) end local recognized_affix_types = { prefix = "prefix", pre = "prefix", suffix = "suffix", suf = "suffix", interfix = "interfix", inter = "interfix", infix = "infix", ["in"] = "infix", circumfix = "circumfix", circum = "circumfix", ["non-affix"] = "non-affix", naf = "non-affix", root = "non-affix", } local function pre_normalize_affix_type(data) local modtext = data.modtext modtext = modtext:match("^<(.*)>$") if not modtext then error(("Internal error: Passed-in modifier isn't surrounded by angle brackets: %s"):format(data.modtext)) end if recognized_affix_types[modtext] then modtext = "type:" .. modtext end return "<" .. modtext .. ">" end -- Parse raw arguments. A single parameter `data` is passed in, with the following fields: -- * `raw_args`: The raw arguments to parse, normally taken from `frame:getParent().args`. -- * `extra_params`: An optional function of one argument that is called on the `params` structure before parsing; its -- purpose is to specify additional allowed parameters or possibly disable parameters. -- * `has_source`: There is a source-language parameter following 1= (which becomes the "destination" language -- parameter) and preceding the terms. This is currently used for {{pseudo-loan}}. -- * `ilang`: If given, it is a language object that serves as the default for the language. If specified, there is no -- language code specified in 1=; instead the term parameters start directly at 1= (or at 2= if `has_source` is -- given). 
-- * `require_index_for_pos`: There is no separate |pos= parameter distinct from |pos1=, |pos2=, etc. Instead, -- specifying |pos= results in an error. -- * `dont_require_index`: Allow |foo= to be specified as a synonym for |foo1= (except for |lit=, which remains -- distinct). -- * `allow_type`: Allow |type1=, |type2=, etc. or inline <type:...> for the affix type, and allow a separate |type= -- parameter for the etymology type (FIXME: this may be confusing; consider changing the etymology type to |etype=). -- * `allow_semicolon_separator`: Allow semicolon as a separator, displaying as " or ". This requires changes in the -- display of the output, to not always put a + between the items. -- -- Note that all language parameters are allowed to be etymology-only languages. -- -- Return five values ARGS, ITEMS, LANG_OBJ, SCRIPT_OBJ, SOURCE_LANG_OBJ where ARGS is a table of the parsed arguments; -- ITEMS is the list of parsed items; LANG_OBJ is the language object corresponding to the language code specified in 1= -- (or taken from `ilang` if given); SCRIPT_OBJ is the script object corresponding to sc= (if given, otherwise nil); and -- SOURCE_LANG_OBJ is the language object corresponding to the source-language code specified in 2= (or 1= if `ilang` is -- given) if `has_source` is specified (otherwise nil). local function parse_args(data) local raw_args = data.raw_args local has_source = data.has_source local ilang = data.ilang if raw_args.lang then error("The |lang= parameter is not used by this template. 
Place the language code in parameter 1 instead.") end local term_index = (ilang and 1 or 2) + (has_source and 1 or 0) local params = { [term_index] = {list = true, allow_holes = true}, ["sort"] = {}, ["nocap"] = boolean_param, -- always allow this even if not used, for use with {{surf}}, which adds it } if not ilang then params[1] = {required = true, type = "language", default = "und"} end local source_index if has_source then source_index = term_index - 1 params[source_index] = {required = true, type = "language", default = "und"} end local m_param_utils = require(parameter_utilities_module) local param_mod_source = {} if not data.dont_require_index then insert(param_mod_source, -- We want to require an index for all params (or use separate_no_index, which also requires an index for the -- param corresponding to the first item). {default = true, require_index = true} ) end insert(param_mod_source, {group = {"link", "ref", "lang", "q", "l", "infl"}}) -- Override lit= to be separate from lit1=. insert(param_mod_source, {param = "lit", separate_no_index = true}) if not data.dont_require_index and not data.require_index_for_pos then -- Override pos= to be separate from pos1=. 
insert(param_mod_source, {param = "pos", separate_no_index = true}) end if data.allow_type then insert(param_mod_source, {param = "type", separate_no_index = true}) end local param_mods = m_param_utils.construct_param_mods(param_mod_source) if data.extra_params then data.extra_params(params) end local items, args = m_param_utils.parse_list_with_inline_modifiers_and_separate_params { params = params, param_mods = param_mods, raw_args = raw_args, termarg = term_index, parse_lang_prefix = true, track_module = "homophones", -- the inclusion of &lrm; is what [[Module:affix]] has always done default_separator = data.allow_semicolon_separator and " +&lrm; " or nil, special_separators = data.allow_semicolon_separator and {[";"] = " or "} or nil, disallow_custom_separators = not data.allow_semicolon_separator, -- For compatibility, we need to not skip completely unspecified items. It is common, for example, to do -- {{suffix|lang||foo}} to generate "+ -foo". dont_skip_items = true, -- Allow e.g. <infix> to be specified in place of <type:infix>. pre_normalize_modifiers = pre_normalize_affix_type, -- Don't pass in `lang` or `sc`, as they will be used as defaults to initialize the items, which we don't want -- (particularly for `lang`), as the code in [[Module:affix]] uses the presence of `lang` as an indicator that -- a part-specific language was explicitly given. } local lang = ilang or args[1] local source if has_source then source = args[source_index] end -- For compatibility with the prior code, we need to convert items without term or properties to nil. for i = 1, #items do local item = items[i] local saw_item_property = item.term if not saw_item_property then for k, v in pairs(item) do if is_property_key(k) then saw_item_property = true break end end end if not saw_item_property then items[i] = nil elseif item.type then -- Validate and canonicalize affix types. 
if not recognized_affix_types[item.type] then local valid_types = {} for k in pairs(recognized_affix_types) do insert(valid_types, ("'%s'"):format(k)) end table.sort(valid_types) error(("Unrecognized affix type '%s' in item %s; valid values are %s"):format( item.type, item.itemno, table.concat(valid_types, ", "))) else item.type = recognized_affix_types[item.type] end end end if args.type and args.type.default and not m_affix.etymology_types[args.type.default] then error("Unrecognized etymology type: '" .. args.type.default .. "'") end return args, items, lang, args.sc.default, source end local function augment_affix_data(data, args, lang, sc) data.lang = lang data.sc = sc data.pos = args.pos and args.pos.default data.lit = args.lit and args.lit.default data.sort_key = args.sort data.type = args.type and args.type.default data.nocap = args.nocap data.notext = args.notext data.nocat = args.nocat data.force_cat = args.force_cat data.l = args.l.default data.ll = args.ll.default data.q = args.q.default data.qq = args.qq.default data.infl = args.infl.default return data end function export.affix(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown.
if not next(parts) and not args.type.default then if mw.title.getCurrentTitle().nsText == "Template" then parts = { {term = "prefix-"}, {term = "base"}, {term = "-suffix"} } else error("You must provide at least one part.") end end return m_affix.show_affix(augment_affix_data({ parts = parts }, args, lang, sc)) end function export.compound(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown. if not next(parts) and not args.type.default then if mw.title.getCurrentTitle().nsText == "Template" then parts = { {term = "first"}, {separator = " +&lrm; ", term = "second"} } else error("You must provide at least one part of a compound.") end end return m_affix.show_compound(augment_affix_data({ parts = parts }, args, lang, sc)) end -- FIXME: Temporary for check in compound_like() below for old-style {{contraction}} parameters. Remove eventually. local function ine(arg) if arg == "" then return nil else return arg end end function export.compound_like(frame) local iparams = { ["lang"] = {type = "language"}, ["template"] = {}, ["text"] = {}, ["oftext"] = {}, ["cat"] = {}, ["noaffixcat"] = boolean_param, ["dont_require_index"] = boolean_param, } local iargs = require("Module:parameters").process(frame.args, iparams) local parent_args = frame:getParent().args -- Error to catch most uses of old-style parameters for {{contraction}}. (FIXME: Remove eventually.) 
local term_param = iargs.lang and 1 or 2 if ine(parent_args[term_param + 2]) and not ine(parent_args[term_param + 1]) and not ine(parent_args.tr2) and not ine(parent_args.ts2) and not ine(parent_args.t2) and not ine(parent_args.gloss2) and not ine(parent_args.g2) and not ine(parent_args.alt2) then error(("You specified a term in %s= and not one in %s=. You probably meant to use t= to specify a gloss instead. " .. "If you intended to specify two terms, put the second term in %s=."):format(term_param + 2, term_param + 1, term_param + 1)) end if not ine(parent_args[term_param + 1]) and not ine(parent_args.alt2) and not ine(parent_args.tr2) and not ine(parent_args.ts2) and ine(parent_args.g2) then error(("You specified a gender in g2= but no term in %s=. You were probably trying to specify two genders for " .. "a single term. To do that, put both genders in g=, comma-separated."):format(term_param + 1)) end local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = parent_args, extra_params = extra_params, ilang = iargs.lang, dont_require_index = iargs.dont_require_index, -- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall -- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this -- necessary? 
require_index_for_pos = not iargs.dont_require_index, allow_semicolon_separator = true, } local template = iargs.template local nocat = args.nocat local notext = args.notext local text = not notext and iargs.text local oftext = not notext and (iargs.oftext or text and "of") local cat = not nocat and iargs.cat local noaffixcat = nocat or iargs.noaffixcat if not next(parts) then if mw.title.getCurrentTitle().nsText == "Template" then parts = { {term = "first"}, {separator = " +&lrm; ", term = "second"} } end end return m_affix.show_compound_like(augment_affix_data({ parts = parts, text = text, oftext = oftext, cat = cat, noaffixcat = noaffixcat }, args, lang, sc)) end function export.surface_analysis(frame) local function ine(arg) -- Since we're operating before calling [[Module:parameters]], we need to imitate how that module processes -- arguments, including trimming since numbered arguments don't have automatic whitespace trimming. if not arg then return arg end arg = mw.text.trim(arg) if arg == "" then arg = nil end return arg end local parent_args = frame:getParent().args local etymtext local arg1 = ine(parent_args[1]) if not arg1 then -- Allow omitted first argument to just display "By surface analysis". etymtext = "" elseif arg1:find("^%+") then -- If the first argument (normally a language code) is prefixed with a +, it's a template name. local template_name = arg1:sub(2) local new_args = {} for i, v in pairs(parent_args) do if type(i) == "number" then if i > 1 then new_args[i - 1] = v end else new_args[i] = v end end new_args.nocap = true etymtext = ", " .. frame:expandTemplate { title = template_name, args = new_args } end if etymtext then return (ine(parent_args.nocap) and "b" or "B") .. "y [[Appendix:Glossary#surface analysis|surface analysis]]" .. 
etymtext end local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = parent_args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown. if not next(parts) then if mw.title.getCurrentTitle().nsText == "Template" then parts = { {term = "first"}, {separator = " +&lrm; ", term = "second"} } else error("You must provide at least one part.") end end return m_affix.show_surface_analysis(augment_affix_data({ parts = parts }, args, lang, sc)) end local function check_max_items(items, max_allowed) if #items > max_allowed then local bad_item = items[max_allowed + 1] if bad_item.term then error(("At most %s terms can be specified but saw a term specified for term #%s") :format(max_allowed, max_allowed + 1)) else for k, v in pairs(bad_item) do if is_property_key(k) then error(("At most %s terms can be specified but saw a value for property '%s' of term #%s") :format(max_allowed, k, max_allowed + 1)) end end end error(("Internal error: Something wrong, %s items generated when there should be at most %s, but item #%s doesn't have a term or any properties") :format(#items, max_allowed, max_allowed + 1)) end end function export.circumfix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 3) local prefix = parts[1] local base = parts[2] local suffix = parts[3] -- Just to make sure someone didn't use the template in a silly way if not (prefix and base and suffix) then if mw.title.getCurrentTitle().nsText == "Template" then prefix = {term = "circumfix", alt = "prefix"} base = {term = "base"} suffix = {term = "circumfix", alt = "suffix"} else 
error("You must specify a prefix part, a base term and a suffix part.") end end return m_affix.show_circumfix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc)) end function export.confix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 3) local prefix = parts[1] local base = parts[3] and parts[2] or nil local suffix = parts[3] or parts[2] -- Just to make sure someone didn't use the template in a silly way if not (prefix and suffix) then if mw.title.getCurrentTitle().nsText == "Template" then prefix = {term = "prefix"} suffix = {term = "suffix"} else error("You must specify a prefix part, an optional base term and a suffix part.") end end return m_affix.show_confix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc)) end function export.pseudo_loan(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc, source = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, has_source = true, -- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall -- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this -- necessary? 
require_index_for_pos = true, allow_semicolon_separator = true, } return require(pseudo_loan_module).show_pseudo_loan( augment_affix_data({ source = source, parts = parts }, args, lang, sc)) end function export.infix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 2) local base = parts[1] local infix = parts[2] -- Just to make sure someone didn't use the template in a silly way if not (base and infix) then if mw.title.getCurrentTitle().nsText == "Template" then base = {term = "base"} infix = {term = "infix"} else error("You must provide a base term and an infix.") end end return m_affix.show_infix(augment_affix_data({ base = base, infix = infix }, args, lang, sc)) end function export.prefix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } local prefixes = parts local base = nil local max_prefix = 0 for k, v in pairs(prefixes) do max_prefix = math.max(k, max_prefix) end if max_prefix >= 2 then base = prefixes[max_prefix] prefixes[max_prefix] = nil end -- Just to make sure someone didn't use the template in a silly way if not next(prefixes) then if mw.title.getCurrentTitle().nsText == "Template" then base = {term = "base"} prefixes = { {term = "prefix"} } else error("You must provide at least one prefix.") end end return m_affix.show_prefix(augment_affix_data({ prefixes = prefixes, base = base }, args, lang, sc)) end function export.suffix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } local base = parts[1] local suffixes = {} for k, v in
pairs(parts) do suffixes[k - 1] = v end -- Just to make sure someone didn't use the template in a silly way if not next(suffixes) then if mw.title.getCurrentTitle().nsText == "Template" then base = {term = "base"} suffixes = { {term = "suffix"} } else error("You must provide at least one suffix.") end end return m_affix.show_suffix(augment_affix_data({ base = base, suffixes = suffixes }, args, lang, sc)) end function export.derivsee(frame) local iargs = frame.args local iparams = { ["derivtype"] = {}, } local iargs = require("Module:parameters").process(frame.args, iparams) local params = { ["head"] = {}, ["id"] = {}, ["sc"] = {type = "script"}, ["pos"] = {}, } local derivtype = iargs.derivtype params[1] = {required = "true", type = "language", default = "und"} params[2] = {} local args = require("Module:parameters").process(frame:getParent().args, params) local lang = args[1] local term = args[2] or args.head local id = args.id local sc = args.sc local pos = require(en_utilities_module).pluralize(args.pos or "term") if not term then local SUBPAGE = mw.loadData("Module:headword/data").pagename if lang:hasType("reconstructed") or mw.title.getCurrentTitle().nsText == "Reconstruction" then term = "*" .. SUBPAGE elseif lang:hasType("appendix-constructed") then term = SUBPAGE else term = SUBPAGE end end local category = nil local langname = lang:getFullName() if (derivtype == "compound" and pos == nil) then category = langname .. " compounds with " .. term elseif derivtype == "compound" and pos == "verbs" then category = langname .. " compound " .. pos .. " formed with " .. term elseif derivtype == "compound" then category = langname .. " compound " .. pos .. " with " .. term else category = langname .. " " .. pos .. " " .. derivtype .. "ed with " .. term .. (id and " (" .. id .. 
")" or "") end return require('Module:collapsible category tree').make{ lang = lang, sc = sc, category = category, } end return export 701ci4kau9yagdq4na1uqxca6idigvq Module:affix 828 9523 31786 2026-05-01T09:39:02Z آیات محراج 3545 Created page with "local export = {} local debug_force_cat = false -- if set to true, always display categories even on userspace pages local m_links = require("Module:links") local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local en_utilities_module = "Module:en-utilities" local etymology_module = "Module:etymology" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local utilities_module = "Module:..." 31786 Scribunto text/plain local export = {} local debug_force_cat = false -- if set to true, always display categories even on userspace pages local m_links = require("Module:links") local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local en_utilities_module = "Module:en-utilities" local etymology_module = "Module:etymology" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local utilities_module = "Module:utilities" -- Export this so the category code in [[Module:category tree/etymology]] can access it. 
export.affix_lang_data_module_prefix = "Module:affix/lang-data/" local rsub = m_str_utils.gsub local usub = m_str_utils.sub local ulen = m_str_utils.len local rfind = m_str_utils.find local rmatch = m_str_utils.match local pluralize = require(en_utilities_module).pluralize local u = m_str_utils.char local ucfirst = m_str_utils.ucfirst local unpack = unpack or table.unpack -- Lua 5.2 compatibility function export.affix_variants(canonical, variants) local mappings = {} for _, variant in ipairs(variants) do mappings[variant] = canonical end return mappings end function export.id_mapping(default, ids) local mapping = { default = default } if ids then for id, target in pairs(ids) do mapping[id] = target end end return mapping end function export.id_mapping_with_affix_variants(base, id_variants) local mappings = {} for id, variants in pairs(id_variants) do for _, variant in ipairs(variants) do mappings[variant] = export.id_mapping(base, {[id] = base}) end end return mappings end function export.merge_tables(...) local result = {} for i = 1, select('#', ...) do local t = select(i, ...) if t then for k, v in pairs(t) do result[k] = v end end end return result end -- Export this so the category code in [[Module:category tree/etymology]] can access it. export.langs_with_lang_specific_data = { ["az"] = true, ["fi"] = true, ["fr"] = true, ["izh"] = true, ["la"] = true, ["sah"] = true, ["tr"] = true, ["trk-pro"] = true, } local default_pos = "term" --[==[ intro: ===About different types of hyphens ("template", "display" and "lookup"):=== * The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left languages), it is different. 
* The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked, to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically: *# For East Asian languages, the display hyphen is always blank. *# For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) are allowed as template hyphens, where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen). * The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question. Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel. ===About different types of affixes ("template", "display", "link", "lookup" and "category"):=== * A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix, suffix, interfix or circumfix), but some of the older-style templates such as {{tl|suffix}}, {{tl|prefix}}, {{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. 
the second or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen. * A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ from the template affix for various reasons: *# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the display affix is `-käs`. *# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain languages have differences between the "template hyphen" specified in the template (which always needs to be specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of affix it is) and the display hyphen (see above), with corresponding differences between template and display affixes. * A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually the same as the display affix, but will differ in one of three circumstances: *# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers or piped links, as described above under "display affix". *# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example, in Finnish, the adjective-forming suffix {{m|fi|-kas}} appears as {{m|fi|-käs}} after front vowels, but logically both forms are the same suffix and should be linked and categorized the same. 
Similarly, in Latin, the negative and intensive prefixes spelled {{m|la|in-}} (etymologically two distinct prefixes) appear variously as {{m|la|il-}}, {{m|la|im-}} or {{m|la|ir-}} before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]] to convert Finnish {{m|fi|-käs}} to {{m|fi|-kas}} for linking and categorization purposes. Note that the affixes in the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the "lookup affix"; see below. * A "stripped link affix" is a link affix that has been passed through the language's `stripDiacritics()` function, which may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but Ottoman Turkish does not). Stripped link affixes are currently what are used in category names. * A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above under link affixes. There are actually two lookup stages: *# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using lookup hyphens). Note that this lookup does not occur if an explicit display form is given using {{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded link. 
*# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display form passed through the language's `stripDiacritics()` function, which strips out certain diacritics, but with the lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages). The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but also allow for mappings that are not sensitive in this fashion (e.g. Russian {{m|ru|-ливый}} occurs both stressed and unstressed, but is the same suffix either way). * A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas| Category:Finnish terms suffixed with ''-kas'']]. The category affix is currently always the same as the stripped link affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the corresponding display affix and regular link affix have a tatweel. As mentioned above, stripDiacritics() strips tatweel for Arabic, Persian and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template affix contains a ZWNJ, the display affix (and hence the link and category affixes) will have no hyphen attached in any case. ]==] ----------------------------------------------------------------------------------------- -- Template and display hyphens -- ----------------------------------------------------------------------------------------- --[=[ Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template (in the wikicode). See above. The key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' will match 'Arab'.
The value below is a string consisting of one or more hyphen characters. If there is more than one character, the default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.). Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we don't need to do script detection. Specifically, we only need to do script detection for a given language if (a) the language has multiple scripts; and (b) at least one of those scripts is listed below or in display_hyphens. ]=] local ZWNJ = u(0x200C) -- zero-width non-joiner local template_hyphens = { -- This covers all Arabic scripts. See above. ["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen ["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef" ["Mong"] = "᠊", -- FIXME! What about the following right-to-left scripts? -- Adlm (Adlam) -- Armi (Imperial Aramaic) -- Avst (Avestan) -- Cprt (Cypriot) -- Khar (Kharoshthi) -- Mand (Mandaic/Mandaean) -- Mani (Manichaean) -- Mend (Mende/Mende Kikakui) -- Narb (Old North Arabian) -- Nbat (Nabataean/Nabatean) -- Nkoo (N'Ko) -- Orkh (Orkhon runes) -- Phli (Inscriptional Pahlavi) -- Phlp (Psalter Pahlavi) -- Phlv (Book Pahlavi) -- Phnx (Phoenician) -- Prti (Inscriptional Parthian) -- Rohg (Hanifi Rohingya) -- Samr (Samaritan) -- Sarb (Old South Arabian) -- Sogd (Sogdian) -- Sogo (Old Sogdian) -- Syrc (Syriac) -- Thaa (Thaana) } -- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys -- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' -- will match 'Arab'. The value should be a single character. local lookup_hyphens = { ["Hebr"] = "־", -- This covers all Arabic scripts. See above. 
["Arab"] = "ـ", } -- Default display-hyphen function. local function default_display_hyphen(script, hyph) if not hyph then return template_hyphens[script] or "-" end return hyph end local function arab_get_display_hyphen(script, hyph) if not hyph then return "ـ" -- tatweel elseif hyph == ZWNJ then return "" else return hyph end end local function no_display_hyphen(script, hyph) return "" end -- Per-script function to return the correct display hyphen given the script and template hyphen. The function should -- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in -- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen -- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'. local display_hyphens = { -- This covers all Arabic scripts. See above. ["Arab"] = arab_get_display_hyphen, ["Bopo"] = no_display_hyphen, ["Hani"] = no_display_hyphen, ["Hans"] = no_display_hyphen, ["Hant"] = no_display_hyphen, -- The following is a mixture of several scripts. Hopefully the specs here are correct! ["Jpan"] = no_display_hyphen, ["Jurc"] = no_display_hyphen, ["Kitl"] = no_display_hyphen, ["Kits"] = no_display_hyphen, ["Laoo"] = no_display_hyphen, ["Nshu"] = no_display_hyphen, ["Shui"] = no_display_hyphen, ["Tang"] = no_display_hyphen, ["Thaa"] = no_display_hyphen, ["Thai"] = no_display_hyphen, ["Tibt"] = no_display_hyphen, } ----------------------------------------------------------------------------------------- -- Basic Utility functions -- ----------------------------------------------------------------------------------------- local function glossary_link(entry, text) text = text or entry return "[[Appendix:Glossary#" .. entry .. "|" .. text .. "]]" end local function track(page) if type(page) == "table" then for i, pg in ipairs(page) do page[i] = "affix/" .. pg end else page = "affix/" .. 
page end require("Module:debug/track")(page) end local function ine(val) return val ~= "" and val or nil end ----------------------------------------------------------------------------------------- -- Compound types -- ----------------------------------------------------------------------------------------- local function make_compound_type(typ, alttext) return { text = glossary_link(typ, alttext) .. " compound", cat = typ .. " compounds", } end -- Make a compound type entry with a simple rather than glossary link. -- These should be replaced with a glossary link when the entry in the glossary -- is created. local function make_non_glossary_compound_type(typ, alttext) local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]" return { text = link .. " compound", cat = typ .. " compounds", } end local function make_raw_compound_type(typ, alttext) return { text = glossary_link(typ, alttext), cat = pluralize(typ), } end local function make_borrowing_type(typ, alttext) return { text = glossary_link(typ, alttext), borrowing_type = pluralize(typ), } end export.etymology_types = { ["adapted borrowing"] = make_borrowing_type("adapted borrowing"), ["adap"] = "adapted borrowing", ["abor"] = "adapted borrowing", ["alliterative"] = make_non_glossary_compound_type("alliterative"), ["allit"] = "alliterative", ["antonymous"] = make_non_glossary_compound_type("antonymous"), ["ant"] = "antonymous", ["bahuvrihi"] = make_compound_type("bahuvrihi", "bahuvrīhi"), ["bahu"] = "bahuvrihi", ["bv"] = "bahuvrihi", ["coordinative"] = make_compound_type("coordinative"), ["coord"] = "coordinative", ["descriptive"] = make_compound_type("descriptive"), ["desc"] = "descriptive", ["determinative"] = make_compound_type("determinative"), ["det"] = "determinative", ["dvandva"] = make_compound_type("dvandva"), ["dva"] = "dvandva", ["dvigu"] = make_compound_type("dvigu"), ["dvi"] = "dvigu", ["endocentric"] = make_compound_type("endocentric"), ["endo"] = "endocentric", 
["exocentric"] = make_compound_type("exocentric"), ["exo"] = "exocentric", ["izafet I"] = make_compound_type("izafet I"), ["iz1"] = "izafet I", ["izafet II"] = make_compound_type("izafet II"), ["iz2"] = "izafet II", ["izafet III"] = make_compound_type("izafet III"), ["iz3"] = "izafet III", ["karmadharaya"] = make_compound_type("karmadharaya", "karmadhāraya"), ["karma"] = "karmadharaya", ["kd"] = "karmadharaya", ["kenning"] = make_raw_compound_type("kenning"), ["ken"] = "kenning", ["rhyming"] = make_non_glossary_compound_type("rhyming"), ["rhy"] = "rhyming", ["synonymous"] = make_non_glossary_compound_type("synonymous"), ["syn"] = "synonymous", ["tatpurusa"] = make_compound_type("tatpurusa", "tatpuruṣa"), ["tat"] = "tatpurusa", ["tp"] = "tatpurusa", } local function process_etymology_type(typ, nocap, notext, has_parts) local text_sections = {} local categories = {} local borrowing_type if typ then local typdata = export.etymology_types[typ] if type(typdata) == "string" then typdata = export.etymology_types[typdata] end if not typdata then error("Internal error: Unrecognized type '" .. typ .. "'") end local text = typdata.text if not nocap then text = ucfirst(text) end local cat = typdata.cat borrowing_type = typdata.borrowing_type local oftext = typdata.oftext or " of" if not notext then table.insert(text_sections, text) if has_parts then table.insert(text_sections, oftext) table.insert(text_sections, " ") end end if cat then table.insert(categories, cat) end end return text_sections, categories, borrowing_type end ----------------------------------------------------------------------------------------- -- Utility functions -- ----------------------------------------------------------------------------------------- -- Iterate an array up to the greatest integer index found. 
local function ipairs_with_gaps(t) local indices = m_table.numKeys(t) local max_index = #indices > 0 and math.max(unpack(indices)) or 0 local i = 0 return function() while i < max_index do i = i + 1 return i, t[i] end end end export.ipairs_with_gaps = ipairs_with_gaps --[==[ Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories, which are formatted by prepending the language name as found in `lang`. The value of an entry in `categories` can be either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>, sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be added even on userspace pages. ]==] function export.join_formatted_parts(data) local cattext local lang = data.data.lang local force_cat = data.data.force_cat or debug_force_cat if data.data.nocat then cattext = "" else for i, cat in ipairs(data.categories) do if type(cat) == "table" then data.categories[i] = require(utilities_module).format_categories(lang:getFullName() .. " " .. cat.cat, lang, cat.sort_key, cat.sort_base, force_cat) else data.categories[i] = require(utilities_module).format_categories(lang:getFullName() .. " " .. cat, lang, data.data.sort_key, nil, force_cat) end end cattext = table.concat(data.categories) end local result = table.concat(data.parts_formatted, not data.separator_already_added and " +&lrm; " or nil) .. (data.data.lit and ", literally " .. 
m_links.mark(data.data.lit, "gloss") or "") local q = data.data.q local qq = data.data.qq local l = data.data.l local ll = data.data.ll local infl = data.data.infl if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] or infl and infl[1] then result = require(pron_qualifier_module).format_qualifiers { lang = lang, text = result, q = q, qq = qq, l = l, ll = ll, infl = infl, } end return result .. cattext end local function pluralize(pos) if pos ~= "nouns" and usub(pos, -5) ~= "verbs" and usub(pos, -4) ~= "ives" then if pos:find("[sx]$") then pos = pos .. "es" else pos = pos .. "s" end end return pos end -- Remove links and call lang:stripDiacritics(term). local function strip_diacritics_no_links(lang, term) return lang:stripDiacritics(m_links.remove_links(term)) end --[=[ Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall language and script objects. This uses the overall language and script objects as defaults for the part and parses off any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we correctly do affix mapping even in the presence of fragments. ]=] local function canonicalize_part(part, lang, sc) if not part then return end -- Save the original (user-specified, part-specific) value of `lang`. If such a value is specified, we don't insert -- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than -- full_link() in [[Module:links]]. part.part_lang = part.lang part.lang = part.lang or lang part.sc = part.sc or sc local term = part.term if not term then return elseif not part.fragment then part.term, part.fragment = m_links.get_fragment(term) else part.term = m_links.get_fragment(term) end end --[==[ Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points. This should be called after `canonicalize_part()` is called on the part. 
This is a thin wrapper around `full_link()` in [[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used to access information for constructing the categories added by `format_derived()`. ]==] function export.link_term(part, data, include_separator) local result if part.part_lang then result = require(etymology_module).format_derived { lang = data.lang, terms = {part}, sources = {part.lang}, sort_key = data.sort_key, nocat = data.nocat, template_name = "affix", qualifiers_labels_on_outside = true, borrowing_type = data.borrowing_type, force_cat = data.force_cat or debug_force_cat, } else result = m_links.full_link(part, "term", nil, "show qualifiers") end if include_separator and part.separator then return part.separator .. result else return result end end local function canonicalize_script_code(scode) -- Convert fa-Arab, ur-Arab etc. to Arab. return (scode:gsub("^.*%-", "")) end ----------------------------------------------------------------------------------------- -- Affix-handling functions -- ----------------------------------------------------------------------------------------- -- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and -- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting -- default values as appropriate. Four values are returned: -- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN local function detect_script_and_hyphens(text, lang, sc) local scode -- 1. If the script is explicitly passed in, use it. if sc then scode = sc:getCode() else local possible_script_codes = lang:getScriptCodes() -- YUCK! 
`possible_script_codes` comes from loadData() so #possible_script_codes doesn't work (always returns 0). local num_possible_script_codes = m_table.length(possible_script_codes) if num_possible_script_codes == 0 then -- This shouldn't happen; if the language has no script codes, -- the list {"None"} should be returned. error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.") end if num_possible_script_codes == 1 then -- 2. If the language has only one possible script, use it. scode = possible_script_codes[1] else -- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[] -- or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn", -- which may not be technically correct but produces the right results because Latn has all default -- values for template_hyphens[] and display_hyphens[]. local may_have_nondefault_hyphen = false for _, script_code in ipairs(possible_script_codes) do script_code = canonicalize_script_code(script_code) if template_hyphens[script_code] or display_hyphens[script_code] then may_have_nondefault_hyphen = true break end end if not may_have_nondefault_hyphen then scode = "Latn" else scode = lang:findBestScript(text):getCode() end end end scode = canonicalize_script_code(scode) local template_hyphen = template_hyphens[scode] or "-" local lookup_hyphen = lookup_hyphens[scode] or "-" local display_hyphen = display_hyphens[scode] or default_display_hyphen return scode, template_hyphen, display_hyphen, lookup_hyphen end --[=[ Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string, specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen, or nil if no relevant template hyphen is present).
`thyph_re` is a Lua pattern (which must be enclosed in parens) that matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix). ]=] local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen) local function get_hyphen(hyph) if type(new_hyphen) == "string" then return new_hyphen end return new_hyphen(scode, hyph) end if affix_type == "non-affix" then return term elseif affix_type == "circumfix" then local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re .. "(.*)$") if not before or ulen(term) <= 3 then -- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to -- a circumfix. Also, if the term is just hyphen + space + hyphen, return it. return term end return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after elseif affix_type == "infix" or affix_type == "interfix" then local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "prefix" then local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$") if middle and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "suffix" then local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. 
(middle or term) else error(("Internal error: Unrecognized affix type '%s'"):format(affix_type)) end end --[=[ Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are language-specific according to `lang`, and may be ID-specific according to `affix_id`. The affixes as they appear in the lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The remaining parameters to this function are used to convert from template affixes to lookup affixes; see the reconstruct_term_per_hyphens() function above. If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments at the top of the file (specifically, the comments describing lookup affixes). The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID (where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or tables with keys `affix` and `id` (again, use `false` to indicate no ID). This allows an affix mapping to map from one ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the ID 'not').
]=] local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id) local function do_lookup(affix) -- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before -- or no hyphens. local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph) local function do_lookup_for_langcode(langcode) if export.langs_with_lang_specific_data[langcode] then local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode) if langdata.affix_mappings then local mapping = langdata.affix_mappings[lookup_affix] if mapping then if type(mapping) == "table" then mapping = mapping[affix_id] or mapping.default or mapping[affix_id or false] if mapping then return mapping end else return mapping end end end end end -- If `lang` is an etymology-only language, look for a mapping both for it and its full parent.
local langcode = lang:getCode() local mapping = do_lookup_for_langcode(langcode) if mapping then return mapping end local full_langcode = lang:getFullCode() if full_langcode ~= langcode then mapping = do_lookup_for_langcode(full_langcode) if mapping then return mapping end end return nil end if affix:find("%[%[") then return nil end return do_lookup(affix) or do_lookup(lang:stripDiacritics(affix)) or nil end --[==[ For a given template term in a given language (see the definition of "template affix" near the top of the file), possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"prefix"}, {"interfix"}, {"suffix"}, {"circumfix"} or {"non-affix"}) along with the corresponding link and display affixes (see definitions near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified). The term passed in should already have any fragment (after the # sign) parsed off of it. Four values are returned: `affix_type`, `link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected; in this case, the template term need not have any attached hyphens, and the appropriate hyphens will be added in the appropriate places. If `do_affix_mapping` is specified, look up the affix in the lang-specific affix mappings, as described in the comment at the top of the file; otherwise, the link and display terms will always be the same. (They will be the same in any case if the template term has a bracketed link in it or is not an affix.) If `return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens in the appropriate places; otherwise, it is the same as the display term. (This functionality is used in [[Module:category tree/affixes and compounds]] to convert link affixes into lookup affixes so that they can be looked up in the affix mapping tables.) 
]==] local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not term then return "non-affix", nil, nil, nil end if term == "^" then -- Indicates a null term to emulate the behavior of {{suffix|foo||bar}}. term = "" return "non-affix", term, term, term end if term:find("^%^") then -- HACK! ^ at the beginning of Korean languages has a special meaning, triggering capitalization of the -- transliteration. Don't interpret it as "force non-affix" for those languages. local langcode = lang:getCode() if langcode ~= "ko" and langcode ~= "okm" and langcode ~= "jje" then -- Formerly we allowed ^ to force non-affix type; this is now handled using an inline modifier -- <naf>, <root>, etc. Throw an error for the moment when the old way is encountered. error("Use of ^ to force non-affix status is no longer supported; use an inline modifier <naf> or <root> " .. "after the component") end end -- Remove an asterisk if the morpheme is reconstructed and add it back at the end. local reconstructed = "" if term:find("^%*") then reconstructed = "*" term = term:gsub("^%*", "") end local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc) thyph = "([" .. thyph .. "])" if not affix_type then if rfind(term, thyph .. " " .. thyph) then affix_type = "circumfix" else local has_beginning_hyphen = rfind(term, "^" .. thyph) local has_ending_hyphen = rfind(term, thyph .. 
"$") if has_beginning_hyphen and has_ending_hyphen then affix_type = "interfix" elseif has_ending_hyphen then affix_type = "prefix" elseif has_beginning_hyphen then affix_type = "suffix" else affix_type = "non-affix" end end end local link_term, display_term, lookup_term if affix_type == "non-affix" then link_term = term display_term = term lookup_term = term else display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph) if do_affix_mapping then link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id) -- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping -- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in -- either case, but in the latter case we can reuse the display term, which has already been converted. if link_term then link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph) else link_term = display_term end else link_term = display_term end if return_lookup_affix then lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph) else lookup_term = display_term end end link_term = reconstructed .. link_term display_term = reconstructed .. display_term lookup_term = reconstructed .. lookup_term return affix_type, link_term, display_term, lookup_term end --[==[ Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens in that place. For example, if `affix_type` == {"prefix"}, we'll add a hyphen onto the end if it's not already there (or is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin wrapper around `parse_term_for_affixes`; see the comments above that function for more information. 
Note that this function is exposed externally because it is called by [[Module:category tree/affixes and compounds]]; see the comment in `parse_term_for_affixes` for more information. ]==] function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not (affix_type == "prefix" or affix_type == "suffix" or affix_type == "circumfix" or affix_type == "infix" or affix_type == "interfix" or affix_type == "non-affix") then error("Internal error: Invalid affix type " .. (affix_type or "(nil)")) end local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) return link_term, display_term, lookup_term end ----------------------------------------------------------------------------------------- -- Main entry points -- ----------------------------------------------------------------------------------------- --[==[ Core categorization logic for affixes. This is shared between show_affix(), show_compound_like() and get_affix_categories_only(). Returns the categories array and other metadata needed for formatting. ]==] local function generate_affix_categories(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type -- Process each part local whole_words = 0 local is_affix_or_compound = false -- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when -- processing the first part for categorization, we may access the second part and need it already canonicalized. for i, part in ipairs_with_gaps(data.parts) do part = part or {} data.parts[i] = part canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). 
Store them in the part -- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they -- can be used in the loop below when categorizing. part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline -- modifiers. The intention in either case is not to link the term. part.term = ine(part.affix_link_term) -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil end if not data.noaffixcat then -- Now do categorization. for i, part in ipairs_with_gaps(data.parts) do local affix_type = part.affix_type if affix_type ~= "non-affix" then is_affix_or_compound = true -- Make a sort key. For the first part, use the second part as the sort key; the intention is that if the -- term has a prefix, sorting by the prefix won't be very useful so we sort by what follows, which is -- presumably the root. local part_sort_base = nil local part_sort = part.sort or data.sort_key if i == 1 and data.parts[2] and data.parts[2].term then local part2 = data.parts[2] -- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error -- by using the alt value if available. part_sort_base = ine(part2.affix_link_term) or ine(part2.alt) if part_sort_base then part_sort_base = strip_diacritics_no_links(part2.lang, part_sort_base) end end if part.pos and rfind(part.pos, "patronym") then table.insert(categories, {cat = "patronymics", sort_key = part_sort, sort_base = part_sort_base}) end if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then table.insert(categories, {cat = "diminutive " .. 
data.pos, sort_key = part_sort, sort_base = part_sort_base}) end -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if ine(part.affix_link_term) and not part.part_lang then table.insert(categories, {cat = data.pos .. " " .. affix_type .. "ed with " .. strip_diacritics_no_links(part.lang, part.affix_link_term) .. (part.id and " (" .. part.id .. ")" or ""), sort_key = part_sort, sort_base = part_sort_base}) end else whole_words = whole_words + 1 if whole_words == 2 then is_affix_or_compound = true table.insert(categories, "compound " .. data.pos) end end end -- Make sure there was either an affix or a compound (two or more non-affix terms). if not is_affix_or_compound and not data.allow_no_affixes_or_compounds then error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.") end end return text_sections, categories, borrowing_type end --[==[ Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to be displayed, and contains the following: * `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below). * `.sc`: Overall script object (usually omitted). Different from term-specific script objects. * `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language different from the overall `.lang` value (in such a case, the language name is shown along with the term and an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively modified. * `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech. * `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese. 
* `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the compound type is displayed before the parts (normally capitalized, unless `.nocap` is given). * `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.nocat`: Disable all categorization. * `.noaffixcat`: Disable affix (and compound) categorization. Relevant for e.g. blends, which may otherwise be incorrectly categorized as compound terms. * `.lit`: Overall literal definition. Different from term-specific literal definitions. * `.force_cat`: Always display categories, even on userspace pages. * `.surface_analysis`: Implement {{tl|surface analysis}}; adds `By surface analysis, ` before the parts. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_affix(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if data.surface_analysis then local text = "by " .. glossary_link("surface analysis") .. ", " if not data.nocap then text = ucfirst(text) end table.insert(text_sections, 1, text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Get only the categories that would be generated by show_affix(), without any text output or formatting. This is used by Module:etymon to get affix categorization.
Returns an array of category objects, where each entry is either a string (simple category name) or a table with keys `cat`, `sort_key`, and `sort_base` for more complex categorization. `data` should have the same structure as passed to show_affix(): * `.lang` (required): Overall language object * `.parts` (required): Array of affix part objects with `.term`, `.lang`, `.id`, etc. * `.pos`: Part of speech (defaults to "terms") * `.sort_key`: Overall sort key for categories '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.get_affix_categories_only(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) return categories end function export.show_surface_analysis(data) data.surface_analysis = true data.allow_no_affixes_or_compounds = true return export.show_affix(data) end --[==[ Implementation of {{tl|compound}}. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type local parts_formatted = {} table.insert(categories, "compound " .. data.pos) -- Make links out of all the parts local whole_words = 0 for i, part in ipairs(data.parts) do canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If the term is an interfix or the type was explicitly given, recognize it as such (which means e.g. that we -- will display the term without hyphens for East Asian languages). 
Otherwise, ignore the fact that it looks -- like an affix and display as specified in the template (but pay attention to the detected affix type for -- certain tracking purposes). if affix_type == "interfix" or (part.type and part.type ~= "non-affix") then -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with -- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with' -- category in this case, or if the term is in a different language. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. if link_term and link_term ~= "" and not part.part_lang then table.insert(categories, {cat = data.pos .. " " .. affix_type .. "ed with " .. strip_diacritics_no_links(part.lang, link_term), sort_key = part.sort or data.sort_key}) end part.term = link_term ~= "" and link_term or nil part.alt = part.alt or (display_term ~= link_term and display_term) or nil else if affix_type ~= "non-affix" then local langcode = data.lang:getCode() -- If `data.lang` is an etymology-only language, track both using its code and its full parent's code. track { affix_type, affix_type .. "/lang/" .. langcode } local full_langcode = data.lang:getFullCode() if langcode ~= full_langcode then track(affix_type .. "/lang/" .. full_langcode) end else whole_words = whole_words + 1 end end table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if whole_words == 1 then track("one whole word") elseif whole_words == 0 then track("looks like confix") end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates. 
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound_like(data) data.allow_no_affixes_or_compounds = true local text_sections, categories, borrowing_type = generate_affix_categories(data) if data.cat then table.insert(categories, data.cat) end -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if #data.parts > 0 and data.oftext then table.insert(text_sections, 1, " " .. data.oftext .. " ") end if data.text then table.insert(text_sections, 1, data.text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any relevant affix mappings. For example, if the desired affix type is "suffix", this will (in general) add a hyphen onto the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the "display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the part will be effectively removed.) `lang` and `sc` hold overall language and script objects. Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will contain [[-käs]]. This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) 
where the nature of the affix is specified by the template itself rather than auto-determined from the affix, as is the case with {{tl|affix}}. '''WARNING''': This destructively modifies `part`. ]==] local function make_part_into_affix(part, lang, sc, affix_type) canonicalize_part(part, lang, sc) local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id) part.term = link_term -- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of -- make_affix()) are the same. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil local Latn = require(scripts_module).getByCode("Latn") part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type) part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type) end local function track_wrong_affix_type(template, part, expected_affix_type) if part and not part.type then local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc) if affix_type ~= expected_affix_type then local part_name = expected_affix_type or "base" local langcode = part.lang:getCode() local full_langcode = part.lang:getFullCode() require("Module:debug/track") { template, template .. "/" .. part_name, template .. "/" .. part_name .. "/" .. (affix_type or "none"), template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode } -- If `part.lang` is an etymology-only language, track both using its code and its full parent's code. if full_langcode ~= langcode then require("Module:debug/track")( template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. 
full_langcode ) end end end end local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base) -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if part.term and not part.part_lang then local cat = pos .. " " .. affix_type .. "ed with " .. strip_diacritics_no_links(part.lang, part.term) .. (part.id and " (" .. part.id .. ")" or "") if sort_key or sort_base then table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base}) else table.insert(categories, cat) end end end --[==[ Implementation of {{tl|circumfix}}. '''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_circumfix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "prefix") make_part_into_affix(data.suffix, data.lang, data.sc, "suffix") track_wrong_affix_type("circumfix", data.prefix, "prefix") track_wrong_affix_type("circumfix", data.base, nil) track_wrong_affix_type("circumfix", data.suffix, "suffix") -- Create circumfix term. local circumfix = nil if data.prefix.term and data.suffix.term then circumfix = data.prefix.term .. " " .. data.suffix.term data.prefix.alt = data.prefix.alt or data.prefix.term data.suffix.alt = data.suffix.alt or data.suffix.term data.prefix.term = circumfix data.suffix.term = circumfix end -- Make links out of all the parts. 
local parts_formatted = {} local categories = {} local sort_base if data.base.term then sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) end table.insert(parts_formatted, export.link_term(data.prefix, data)) table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.suffix, data)) -- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language. if not data.prefix.part_lang then table.insert(categories, {cat=data.pos .. " circumfixed with " .. strip_diacritics_no_links(data.prefix.lang, circumfix), sort_key=data.sort_key, sort_base=sort_base}) end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|confix}}. '''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_confix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "prefix") make_part_into_affix(data.suffix, data.lang, data.sc, "suffix") track_wrong_affix_type("confix", data.prefix, "prefix") track_wrong_affix_type("confix", data.base, nil) track_wrong_affix_type("confix", data.suffix, "suffix") -- Make links out of all the parts. local parts_formatted = {} local prefix_sort_base if data.base and data.base.term then prefix_sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) elseif data.suffix.term then prefix_sort_base = strip_diacritics_no_links(data.suffix.lang, data.suffix.term) end -- Insert the categories and parts. 
local categories = {} table.insert(parts_formatted, export.link_term(data.prefix, data)) insert_affix_category(categories, data.pos, "prefix", data.prefix, data.sort_key, prefix_sort_base) if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) end table.insert(parts_formatted, export.link_term(data.suffix, data)) -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "suffix", data.suffix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|infix}}. '''WARNING''': This destructively modifies both `data` and `.base` and `.infix`. ]==] function export.show_infix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.infix, data.lang, data.sc, "infix") track_wrong_affix_type("infix", data.base, nil) track_wrong_affix_type("infix", data.infix, "infix") -- Make links out of all the parts. local parts_formatted = {} local categories = {} table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.infix, data)) -- Insert the categories. -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "infix", data.infix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|prefix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`. ]==] function export.show_prefix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. 
for i, prefix in ipairs(data.prefixes) do make_part_into_affix(prefix, data.lang, data.sc, "prefix") end for i, prefix in ipairs(data.prefixes) do track_wrong_affix_type("prefix", prefix, "prefix") end track_wrong_affix_type("prefix", data.base, nil) -- Make links out of all the parts. local parts_formatted = {} local first_sort_base = nil local categories = {} if data.prefixes[2] then first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.prefixes[2].lang, first_sort_base) end elseif data.base then first_sort_base = ine(data.base.term) or ine(data.base.alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.base.lang, first_sort_base) end end for i, prefix in ipairs(data.prefixes) do table.insert(parts_formatted, export.link_term(prefix, data)) insert_affix_category(categories, data.pos, "prefix", prefix, data.sort_key, i == 1 and first_sort_base or nil) end if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|suffix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`. ]==] function export.show_suffix(data) local categories = {} data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. for i, suffix in ipairs(data.suffixes) do make_part_into_affix(suffix, data.lang, data.sc, "suffix") end track_wrong_affix_type("suffix", data.base, nil) for i, suffix in ipairs(data.suffixes) do track_wrong_affix_type("suffix", suffix, "suffix") end -- Make links out of all the parts. 
local parts_formatted = {} if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end for i, suffix in ipairs(data.suffixes) do table.insert(parts_formatted, export.link_term(suffix, data)) end -- Insert the categories. for i, suffix in ipairs(data.suffixes) do -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "suffix", suffix) if suffix.pos and rfind(suffix.pos, "patronym") then table.insert(categories, "patronymics") end end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end return export mdstvxnr2kw23xqtqvre3ecwzj3cjlv Module:links 828 9524 31787 2026-05-01T09:41:05Z آیات محراج 3545 Content copied from en wiki 31787 Scribunto text/plain local export = {} --[=[ [[Unsupported titles]], pages with high memory usage, extraction modules and part-of-speech names are listed at [[Module:links/data]]. Other modules used: [[Module:script utilities]] [[Module:scripts]] [[Module:languages]] and its submodules [[Module:gender and number]] [[Module:debug/track]] ]=] local anchors_module = "Module:anchors" local debug_track_module = "Module:debug/track" local form_of_module = "Module:form of" local gender_and_number_module = "Module:gender and number" local languages_module = "Module:languages" local load_module = "Module:load" local memoize_module = "Module:memoize" local pages_module = "Module:pages" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local script_utilities_module = "Module:script utilities" local string_encode_entities_module = "Module:string/encode entities" local string_utilities_module = "Module:string utilities" local table_module = "Module:table" local utilities_module = "Module:utilities" local concat = table.concat local find = string.find local get_current_title = mw.title.getCurrentTitle local insert = table.insert local ipairs = 
ipairs local match = string.match local new_title = mw.title.new local pairs = pairs local remove = table.remove local sub = string.sub local toNFC = mw.ustring.toNFC local tostring = tostring local type = type local unstrip = mw.text.unstrip local NAMESPACE = get_current_title().nsText local function anchor_encode(...) anchor_encode = require(memoize_module)(mw.uri.anchorEncode, true) return anchor_encode(...) end local function debug_track(...) debug_track = require(debug_track_module) return debug_track(...) end local function decode_entities(...) decode_entities = require(string_utilities_module).decode_entities return decode_entities(...) end local function decode_uri(...) decode_uri = require(string_utilities_module).decode_uri return decode_uri(...) end -- Can't yet replace, as the [[Module:string utilities]] version no longer has automatic double-encoding prevention, which requires changes here to account for. local function encode_entities(...) encode_entities = require(string_encode_entities_module) return encode_entities(...) end local function extend(...) extend = require(table_module).extend return extend(...) end local function find_best_script_without_lang(...) find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang return find_best_script_without_lang(...) end local function format_categories(...) format_categories = require(utilities_module).format_categories return format_categories(...) end local function format_genders(...) format_genders = require(gender_and_number_module).format_genders return format_genders(...) end local function format_qualifiers(...) format_qualifiers = require(pron_qualifier_module).format_qualifiers return format_qualifiers(...) end local function get_current_L2(...) get_current_L2 = require(pages_module).get_current_L2 return get_current_L2(...) end local function get_lang(...) get_lang = require(languages_module).getByCode return get_lang(...) end local function get_script(...) 
get_script = require(scripts_module).getByCode return get_script(...) end local function language_anchor(...) language_anchor = require(anchors_module).language_anchor return language_anchor(...) end local function load_data(...) load_data = require(load_module).load_data return load_data(...) end local function request_script(...) request_script = require(script_utilities_module).request_script return request_script(...) end local function shallow_copy(...) shallow_copy = require(table_module).shallowCopy return shallow_copy(...) end local function split(...) split = require(string_utilities_module).split return split(...) end local function tag_text(...) tag_text = require(script_utilities_module).tag_text return tag_text(...) end local function tag_translit(...) tag_translit = require(script_utilities_module).tag_translit return tag_translit(...) end local function trim(...) trim = require(string_utilities_module).trim return trim(...) end local function u(...) u = require(string_utilities_module).char return u(...) end local function ulower(...) ulower = require(string_utilities_module).lower return ulower(...) end local function umatch(...) umatch = require(string_utilities_module).match return umatch(...) end local m_headword_data local function get_headword_data() m_headword_data = load_data("Module:headword/data") return m_headword_data end local function track(page, code) local tracking_page = "links/" .. page debug_track(tracking_page) if code then debug_track(tracking_page .. "/" .. code) end end local function selective_trim(...) -- Unconditionally trimmed charset. local always_trim = "\194\128-\194\159" .. -- U+0080-009F (C1 control characters) "\194\173" .. -- U+00AD (soft hyphen) "\226\128\170-\226\128\174" .. -- U+202A-202E (directionality formatting characters) "\226\129\166-\226\129\169" -- U+2066-2069 (directionality formatting characters) -- Standard trimmed charset. local standard_trim = "%s" .. 
-- (default whitespace charset) "\226\128\139-\226\128\141" .. -- U+200B-200D (zero-width spaces) always_trim -- If there are non-whitespace characters, trim all characters in `standard_trim`. -- Otherwise, only trim the characters in `always_trim`. selective_trim = function(text) if text == "" then return text end local trimmed = trim(text, standard_trim) if trimmed ~= "" then return trimmed end return trim(text, always_trim) end return selective_trim(...) end local function escape(text, str) local rep repeat text, rep = text:gsub("\\\\(\\*" .. str .. ")", "\5%1") until rep == 0 return (text:gsub("\\" .. str, "\6")) end local function unescape(text, str) return (text :gsub("\5", "\\") :gsub("\6", str)) end -- Remove bold, italics, soft hyphens, strip markers and HTML tags. local function remove_formatting(str) str = str :gsub("('*)'''(.-'*)'''", "%1%2") :gsub("('*)''(.-'*)''", "%1%2") :gsub("­", "") return (unstrip(str) :gsub("<[^<>]+>", "")) end --[==[Takes an input and splits on a double slash (taking account of escaping backslashes).]==] function export.split_on_slashes(text) if text:find("\\", nil, true) then track("escaped", "split_on_slashes") end text = split(escape(text, "//"), "//", true) or {} for i, v in ipairs(text) do text[i] = unescape(v, "//") if v == "" then text[i] = false end end return text end --[==[Takes a wikilink and outputs the link target and display text. By default, the link target will be returned as a title object, but if `allow_bad_target` is set it will be returned as a string, and no check will be performed as to whether it is a valid link target.]==] function export.get_wikilink_parts(text, allow_bad_target) -- TODO: replace `allow_bad_target` with `allow_unsupported`, with support for links to unsupported titles, including escape sequences. if ( -- Filters out anything but "[[...]]" with no intermediate "[[" or "]]". not match(text, "^()%[%[") or -- Faster than sub(text, 1, 2) ~= "[[". 
find(text, "[[", 3, true) or find(text, "]]", 3, true) ~= #text - 1 ) then return nil, nil end local pipe, title, display = find(text, "|", 3, true) if pipe then title, display = sub(text, 3, pipe - 1), sub(text, pipe + 1, -3) else title = sub(text, 3, -3) display = title end if allow_bad_target then return title, display end title = new_title(title) -- No title object means the target is invalid. if title == nil then return nil, nil -- If the link target starts with "#" then mw.title.new returns a broken -- title object, so grab the current title and give it the correct fragment. elseif title.prefixedText == "" then local fragment = title.fragment if fragment == "" then -- [[#]] isn't valid return nil, nil end title = get_current_title() title.fragment = fragment end return title, display end -- Does the work of export.get_fragment, but can be called directly to avoid unnecessary checks for embedded links. local function get_fragment(text) text = escape(text, "#") -- Replace numeric character references with the corresponding character (&#39; → '), -- as they contain #, which causes the numeric character reference to be -- misparsed (wa'a → wa&#39;a → pagename wa&, fragment 39;a). text = decode_entities(text) local target, fragment = text:match("^(.-)#(.+)$") target = target or text target = unescape(target, "#") fragment = fragment and unescape(fragment, "#") return target, fragment end --[==[Takes a link target and outputs the actual target and the fragment (if any).]==] function export.get_fragment(text) if text:find("\\", nil, true) then track("escaped", "get_fragment") end -- If there are no embedded links, process input. local open = find(text, "[[", nil, true) if not open then return get_fragment(text) end local close = find(text, "]]", open + 2, true) if not close then return get_fragment(text) -- If there is one, but it's redundant (i.e. encloses everything with no pipe), remove and process. 
elseif open == 1 and close == #text - 1 and not find(text, "|", 3, true) then return get_fragment(sub(text, 3, -3)) end -- Otherwise, return the input. return text end --[==[ Given a link target as passed to `full_link()`, get the actual page that the target refers to. This removes bold, italics, strip markers and HTML; calls `makeEntryName()` for the language in question; converts targets beginning with `*` to the Reconstruction namespace; and converts appendix-constructed languages to the Appendix namespace. Returns up to three values: # the actual page to link to, or {nil} to not link to anything; # how the target should be displayed, if the user didn't explicitly specify any display text; generally the same as the original target, but minus any anti-asterisk !!; # the value `true` if the target had a backslash-escaped * in it (FIXME: explain this more clearly). ]==] function export.get_link_page_with_auto_display(target, lang, sc, plain) local orig_target = target if not target then return nil elseif target:find("\\", nil, true) then track("escaped", "get_link_page") end target = remove_formatting(target) if target:sub(1, 1) == ":" then track("initial colon") -- FIXME, the auto_display (second return value) should probably remove the colon return target:sub(2), orig_target end local prefix = target:match("^(.-):") -- Convert any escaped colons target = target:gsub("\\:", ":") if prefix then -- If this is a link to another namespace or an interwiki link, ensure there's an initial colon and then -- return what we have (so that it works as a conventional link, and doesn't do anything weird like add the term -- to a category.) prefix = ulower(trim(prefix)) if prefix ~= "" and ( load_data("Module:data/namespaces")[prefix] or load_data("Module:data/interwikis")[prefix] ) then return target, orig_target end end -- Check if the term is reconstructed and remove any asterisk. Also check for anti-asterisk (!!). -- Otherwise, handle the escapes.
local reconstructed, escaped, anti_asterisk if not plain then target, reconstructed = target:gsub("^%*(.)", "%1") if reconstructed == 0 then target, anti_asterisk = target:gsub("^!!(.)", "%1") if anti_asterisk == 1 then -- Remove !! from original. FIXME! We do it this way because the call to remove_formatting() above -- may cause non-initial !! to be interpreted as anti-asterisks. We should surely move the -- remove_formatting() call later. orig_target = orig_target:gsub("^!!", "") end end end target, escaped = target:gsub("^(\\-)\\%*", "%1*") if not (sc and sc:getCode() ~= "None") then sc = lang:findBestScript(target) end -- Remove carets if they are used to capitalize parts of transliterations (unless they have been escaped). if (not sc:hasCapitalization()) and sc:isTransliterated() and target:match("%^") then target = escape(target, "^") :gsub("%^", "") target = unescape(target, "^") end -- Get the entry name for the language. target = lang:makeEntryName(target, sc, reconstructed == 1 or lang:hasType("appendix-constructed")) -- If the link contains unexpanded template parameters, then don't create a link. if target:match("{{{.-}}}") then -- FIXME: Should we return the original target as the default display value (second return value)? return nil end -- Link to appendix for reconstructed terms and terms in appendix-only languages. Plain links interpret * -- literally, however. if reconstructed == 1 then if lang:getFullCode() == "und" then -- Return the original target as default display value. If we don't do this, we wrongly get -- [Term?] displayed instead. return nil, orig_target end target = "Reconstruction:" .. lang:getFullName() .. "/" .. target -- Reconstructed languages and substrates require an initial *. 
elseif anti_asterisk ~= 1 and (lang:hasType("reconstructed") or lang:getFamilyCode() == "qfa-sub") then error(("The specified language %s is unattested, while the term '%s' does not begin with '*' to indicate that it is reconstructed.") : format(lang:getCanonicalName(), orig_target)) elseif lang:hasType("appendix-constructed") then target = "Appendix:" .. lang:getFullName() .. "/" .. target else target = target end return target, orig_target, escaped > 0 end function export.get_link_page(target, lang, sc, plain) local target, auto_display, escaped = export.get_link_page_with_auto_display(target, lang, sc, plain) return target, escaped end -- Make a link from a given link's parts local function make_link(link, lang, sc, id, isolated, cats, no_alt_ast, plain) -- Convert percent encoding to plaintext. link.target = link.target and decode_uri(link.target, "PATH") link.fragment = link.fragment and decode_uri(link.fragment, "PATH") -- Find fragments (if one isn't already set). -- Prevents {{l|en|word#Etymology 2|word}} from linking to [[word#Etymology 2#English]]. -- # can be escaped as \#. if link.target and link.fragment == nil then link.target, link.fragment = get_fragment(link.target) end -- Process the target local auto_display, escaped link.target, auto_display, escaped = export.get_link_page_with_auto_display(link.target, lang, sc, plain) -- Create a default display form. -- If the target is "" then it's a link like [[#English]], which refers to the current page. if auto_display == "" then auto_display = (m_headword_data or get_headword_data()).pagename end -- If the display is the target and the reconstruction * has been escaped, remove the escaping backslash. if escaped then auto_display = auto_display:gsub("\\([^\\]*%*)", "%1", 1) end -- Process the display form. 
if link.display then local orig_display = link.display link.display = lang:makeDisplayText(link.display, sc, true) if cats then auto_display = lang:makeDisplayText(auto_display, sc) -- If the alt text is the same as what would have been automatically generated, then the alt parameter is redundant (e.g. {{l|en|foo|foo}}, {{l|en|w:foo|foo}}, but not {{l|en|w:foo|w:foo}}). -- If they're different, but the alt text could have been entered as the term parameter without it affecting the target page, then the target parameter is redundant (e.g. {{l|ru|фу|фу́}}). -- If `no_alt_ast` is true, use pcall to catch the error which will be thrown if this is a reconstructed lang and the alt text doesn't have *. if link.display == auto_display then insert(cats, lang:getFullName() .. " links with redundant alt parameters") else local ok, check if no_alt_ast then ok, check = pcall(export.get_link_page, orig_display, lang, sc, plain) else ok = true check = export.get_link_page(orig_display, lang, sc, plain) end if ok and link.target == check then insert(cats, lang:getFullName() .. " links with redundant target parameters") end end end else link.display = lang:makeDisplayText(auto_display, sc) end if not link.target then return link.display end -- If the target is the same as the current page, there is no sense id -- and either the language code is "und" or the current L2 is the current -- language then return a "self-link" like the software does. if link.target == get_current_title().prefixedText then local fragment, current_L2 = link.fragment, get_current_L2() if ( fragment and fragment == current_L2 or not (id or fragment) and (lang:getFullCode() == "und" or lang:getFullName() == current_L2) ) then return tostring(mw.html.create("strong") :addClass("selflink") :wikitext(link.display)) end end -- Add fragment. Do not add a section link to "Undetermined", as such sections do not exist and are invalid. 
-- TabbedLanguages handles links without a section by linking to the "last visited" section, but adding -- "Undetermined" would break that feature. For localized prefixes that make syntax error, please use the -- format: ["xyz"] = true. local prefix = link.target:match("^:*([^:]+):") prefix = prefix and ulower(prefix) if prefix ~= "category" and not (prefix and load_data("Module:data/interwikis")[prefix]) then if (link.fragment or link.target:sub(-1) == "#") and not plain then track("fragment", lang:getFullCode()) if cats then insert(cats, lang:getFullName() .. " links with manual fragments") end end if not link.fragment then if id then link.fragment = lang:getFullCode() == "und" and anchor_encode(id) or language_anchor(lang, id) elseif lang:getFullCode() ~= "und" and not (link.target:match("^Appendix:") or link.target:match("^Reconstruction:")) then link.fragment = anchor_encode(lang:getFullName()) end end end -- Put inward-facing square brackets around a link to isolated spacing character(s). if isolated and #link.display > 0 and not umatch(decode_entities(link.display), "%S") then link.display = "&#x5D;" .. link.display .. "&#x5B;" end link.target = link.target:gsub("^(:?)(.*)", function(m1, m2) return m1 .. encode_entities(m2, "#%&+/:<=>@[\\]_{|}") end) link.fragment = link.fragment and encode_entities(remove_formatting(link.fragment), "#%&+/:<=>@[\\]_{|}") return "[[" .. link.target:gsub("^[^:]", ":%0") .. (link.fragment and "#" .. link.fragment or "") .. "|" .. link.display .. "]]" end -- Split a link into its parts local function parse_link(linktext) local link = { target = linktext } local target = link.target link.target, link.display = target:match("^(..-)|(.+)$") if not link.target then link.target = target link.display = target end -- There's no point in processing these, as they aren't real links. local target_lower = link.target:lower() for _, false_positive in ipairs({ "category", "cat", "file", "image" }) do if target_lower:match("^" .. 
false_positive .. ":") then return nil end end link.display = decode_entities(link.display) link.target, link.fragment = get_fragment(link.target) -- So that make_link does not look for a fragment again. if not link.fragment then link.fragment = false end return link end local function check_params_ignored_when_embedded(alt, lang, id, cats) if alt then track("alt-ignored") if cats then insert(cats, lang:getFullName() .. " links with ignored alt parameters") end end if id then track("id-ignored") if cats then insert(cats, lang:getFullName() .. " links with ignored id parameters") end end end -- Find embedded links and ensure they link to the correct section. local function process_embedded_links(text, alt, lang, sc, id, cats, no_alt_ast, plain) -- Process the non-linked text. text = lang:makeDisplayText(text, sc, true) -- If the text begins with * and another character, then act as if each link begins with *. However, don't do this if the * is contained within a link at the start. E.g. `|*[[foo]]` would set all_reconstructed to true, while `|[[*foo]]` would not. local all_reconstructed = false if not plain then -- anchor_encode removes links etc. if anchor_encode(text):sub(1, 1) == "*" then all_reconstructed = true end -- Otherwise, handle any escapes. text = text:gsub("^(\\-)\\%*", "%1*") end check_params_ignored_when_embedded(alt, lang, id, cats) local function process_link(space1, linktext, space2) local capture = "[[" .. linktext .. "]]" local link = parse_link(linktext) -- Return unprocessed false positives untouched (e.g. categories). if not link then return capture end if all_reconstructed then if link.target:find("^!!") then -- Check for anti-asterisk !! at the beginning of a target, indicating that a reconstructed term -- wants a part of the term to link to a non-reconstructed term, e.g. Old English -- {{ang-noun|m|head=*[[!!Crist|Cristes]] [[!!mæsseǣfen]]}}. link.target = link.target:sub(3) -- Also remove !! 
from the display, which may have been copied from the target (as in mæsseǣfen in -- the example above). link.display = link.display:gsub("^!!", "") elseif not link.target:match("^%*") then link.target = "*" .. link.target end end linktext = make_link(link, lang, sc, id, false, nil, no_alt_ast, plain) :gsub("^%[%[", "\3") :gsub("%]%]$", "\4") return space1 .. linktext .. space2 end -- Use chars 1 and 2 as temporary substitutions, so that we can use charsets. These are converted to chars 3 and 4 by process_link, which means we can convert any remaining chars 1 and 2 back to square brackets (i.e. those not part of a link). text = text :gsub("%[%[", "\1") :gsub("%]%]", "\2") -- If the script uses ^ to capitalize transliterations, make sure that any carets preceding links are on the inside, so that they get processed with the following text. if ( text:find("^", nil, true) and not sc:hasCapitalization() and sc:isTransliterated() ) then text = escape(text, "^") :gsub("%^\1", "\1%^") text = unescape(text, "^") end text = text:gsub("\1(%s*)([^\1\2]-)(%s*)\2", process_link) -- Remove the extra * at the beginning of a language link if it's immediately followed by a link whose display begins with * too. if all_reconstructed then text = text:gsub("^%*\3([^|\1-\4]+)|%*", "\3%1|*") end return (text :gsub("[\1\3]", "[[") :gsub("[\2\4]", "]]") ) end local function simple_link(term, fragment, alt, lang, sc, id, cats, no_alt_ast, srwc) local plain if lang == nil then lang, plain = get_lang("und"), true end -- Get the link target and display text. If the term is the empty string, treat the input as a link to the current page. if term == "" then term = get_current_title().prefixedText elseif term then local new_term, new_alt = export.get_wikilink_parts(term, true) if new_term then check_params_ignored_when_embedded(alt, lang, id, cats) -- [[|foo]] links are treated as plaintext "[[|foo]]". 
-- FIXME: Pipes should be handled via a proper escape sequence, as they can occur in unsupported titles. if new_term == "" then term, alt = nil, term else local title = new_title(new_term) if title then local ns = title.namespace -- File: and Category: links should be returned as-is. if ns == 6 or ns == 14 then return term end end term, alt = new_term, new_alt if cats then if not (srwc and srwc(term, alt)) then insert(cats, lang:getFullName() .. " links with redundant wikilinks") end end end end end if alt then alt = selective_trim(alt) if alt == "" then alt = nil end end -- If there's nothing to process, return nil. if not (term or alt) then return nil end -- If there is no script, get one. if not sc then sc = lang:findBestScript(alt or term) end -- Embedded wikilinks need to be processed individually. if term then local open = find(term, "[[", nil, true) if open and find(term, "]]", open + 2, true) then return process_embedded_links(term, alt, lang, sc, id, cats, no_alt_ast, plain) end term = selective_trim(term) end -- If not, make a link using the parameters. return make_link({ target = term, display = alt, fragment = fragment }, lang, sc, id, true, cats, no_alt_ast, plain) end --[==[Creates a basic link to the given term. It links to the language section (such as <code>==English==</code>), but it does not add language and script wrappers, so any code that uses this function should call the <code class="n">[[Module:script utilities#tag_text|tag_text]]</code> from [[Module:script utilities]] to add such wrappers itself at some point. The first argument, <code class="n">data</code>, may contain the following items, a subset of the items used in the <code class="n">data</code> argument of <code class="n">full_link</code>. If any other items are included, they are ignored. { { term = entry_to_link_to, alt = link_text_or_displayed_text, lang = language_object, id = sense_id, } } ; <code class="n">term</code> : Text to turn into a link. 
This is generally the name of a page. The text can contain wikilinks already embedded in it. These are processed individually just like a single link would be. The <code class="n">alt</code> argument is ignored in this case.
; <code class="n">alt</code> (''optional'') : The alternative display for the link, if different from the linked page. If this is {{code|lua|nil}}, the <code class="n">term</code> argument is used instead (much like regular wikilinks). If <code class="n">term</code> contains wikilinks in it, this argument is ignored and has no effect. (Links in which the alt is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/alt-ignored}}.)
; <code class="n">lang</code> : The [[Module:languages#Language objects|language object]] for the term being linked. If this argument is defined, the function will determine the language's canonical name (see [[Template:language data documentation]]), and point the link or links in the <code class="n">term</code> to the language's section of an entry, or to a language-specific senseid if the <code class="n">id</code> argument is defined.
; <code class="n">id</code> (''optional'') : Sense id string. If this argument is defined, the link will point to a language-specific sense id ({{ll|en|identifier|id=HTML}}) created by the template {{temp|senseid}}. A sense id consists of the language's canonical name, a hyphen (<code>-</code>), and the string that was supplied as the <code class="n">id</code> argument. This is useful when a term has more than one sense in a language. If the <code class="n">term</code> argument contains wikilinks, this argument is ignored. (Links in which the sense id is ignored are tracked with the tracking template {{whatlinkshere|tracking=links/id-ignored}}.)
The second argument is as follows:
; <code class="n">allow_self_link</code> : If {{code|lua|true}}, the function will also generate links to the current page.
The default ({{code|lua|false}}) will not generate a link but generate a bolded "self link" instead. The following special options are processed for each link (both simple text and with embedded wikilinks): * The target page name will be processed to generate the correct entry name. This is done by the [[Module:languages#makeEntryName|makeEntryName]] function in [[Module:languages]], using the <code class="n">entry_name</code> replacements in the language's data file (see [[Template:language data documentation]] for more information). This function is generally used to automatically strip dictionary-only diacritics that are not part of the normal written form of a language. * If the text starts with <code class="n">*</code>, then the term is considered a reconstructed term, and a link to the Reconstruction: namespace will be created. If the text contains embedded wikilinks, then <code class="n">*</code> is automatically applied to each one individually, while preserving the displayed form of each link as it was given. This allows linking to phrases containing multiple reconstructed terms, while only showing the * once at the beginning. * If the text starts with <code class="n">:</code>, then the link is treated as "raw" and the above steps are skipped. This can be used in rare cases where the page name begins with <code class="n">*</code> or if diacritics should not be stripped. For example: ** {{temp|l|en|*nix}} links to the nonexistent page [[Reconstruction:English/nix]] (<code class="n">*</code> is interpreted as a reconstruction), but {{temp|l|en|:*nix}} links to [[*nix]]. ** {{temp|l|sl|Franche-Comté}} links to the nonexistent page [[Franche-Comte]] (<code>é</code> is converted to <code>e</code> by <code class="n">makeEntryName</code>), but {{temp|l|sl|:Franche-Comté}} links to [[Franche-Comté]].]==] function export.language_link(data) if type(data) ~= "table" then error( "The first argument to the function language_link must be a table. 
See Module:links/documentation for more information.") elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then track("escaped", "language_link") end -- Categorize links to "und". local lang, cats = data.lang, data.cats if cats and lang:getCode() == "und" then insert(cats, "Undetermined language links") end return simple_link( data.term, data.fragment, data.alt, lang, data.sc, data.id, cats, data.no_alt_ast, data.suppress_redundant_wikilink_cat ) end function export.plain_link(data) if type(data) ~= "table" then error( "The first argument to the function plain_link must be a table. See Module:links/documentation for more information.") elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then track("escaped", "plain_link") end return simple_link( data.term, data.fragment, data.alt, nil, data.sc, data.id, data.cats, data.no_alt_ast, data.suppress_redundant_wikilink_cat ) end --[==[Replace any links with links to the correct section, but don't link the whole text if no embedded links are found. Returns the display text form.]==] function export.embedded_language_links(data) if type(data) ~= "table" then error( "The first argument to the function embedded_language_links must be a table. See Module:links/documentation for more information.") elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then track("escaped", "embedded_language_links") end local term, lang, sc = data.term, data.lang, data.sc -- If we don't have a script, get one. if not sc then sc = lang:findBestScript(term) end -- Do we have embedded wikilinks? If so, they need to be processed individually. local open = find(term, "[[", nil, true) if open and find(term, "]]", open + 2, true) then return process_embedded_links(term, data.alt, lang, sc, data.id, data.cats, data.no_alt_ast) end -- If not, return the display text. 
term = selective_trim(term) -- FIXME: Double-escape any percent-signs, because we don't want to treat non-linked text as having percent-encoded characters. This is a hack: percent-decoding should come out of [[Module:languages]] and only dealt with in this module, as it's specific to links. term = term:gsub("%%", "%%25") return lang:makeDisplayText(term, sc, true) end function export.mark(text, item_type, face, lang) local tag = { "", "" } if item_type == "gloss" then tag = { '<span class="mention-gloss-double-quote">“</span><span class="mention-gloss">', '</span><span class="mention-gloss-double-quote">”</span>' } if type(text) == "string" and text:match("^''[^'].*''$") then -- Temporary tracking for mention glosses that are entirely italicized or bolded, which is probably -- wrong. (Note that this will also find bolded mention glosses since they use triple apostrophes.) track("italicized-mention-gloss", lang and lang:getFullCode() or nil) end elseif item_type == "tr" then if face == "term" then tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr mention-tr Latn">', '</span>' } else tag = { '<span lang="' .. lang:getFullCode() .. '" class="tr Latn">', '</span>' } end elseif item_type == "ts" then -- \226\129\160 = word joiner (zero-width non-breaking space) U+2060 tag = { '<span class="ts mention-ts Latn">/\226\129\160', '\226\129\160/</span>' } elseif item_type == "pos" then tag = { '<span class="ann-pos">', '</span>' } elseif item_type == "non-gloss" then tag = { '<span class="ann-non-gloss">', '</span>' } elseif item_type == "annotations" then tag = { '<span class="mention-gloss-paren annotation-paren">(</span>', '<span class="mention-gloss-paren annotation-paren">)</span>' } elseif item_type == "infl" then tag = { '<span class="ann-infl">', '</span>' } end if type(text) == "string" then return tag[1] .. text .. tag[2] else return "" end end local pos_tags --[==[Formats the annotations that are displayed with a link created by {{code|lua|full_link}}. 
Annotations are the extra bits of information that are displayed following the linked term, and include things such as gender, transliteration, gloss and so on. * The first argument is a table possessing some or all of the following keys: *:; <code class="n">genders</code> *:: Table containing a list of gender specifications in the style of [[Module:gender and number]]. *:; <code class="n">tr</code> *:: Transliteration. *:; <code class="n">gloss</code> *:: Gloss that translates the term in the link, or gives some other descriptive information. *:; <code class="n">pos</code> *:: Part of speech of the linked term. If the given argument matches one of the aliases in `pos_aliases` in [[Module:headword/data]], or consists of a part of speech or alias followed by `f` (for a non-lemma form), expand it appropriately. Otherwise, just show the given text as it is. *:; <code class="n">ng</code> *:: Arbitrary non-gloss descriptive text for the link. This should be used in preference to putting descriptive text in `gloss` or `pos`. *:; <code class="n">lit</code> *:: Literal meaning of the term, if the usual meaning is figurative or idiomatic. *:; <code class="n">infl</code> *:: Table containing a list of grammar tags in the style of [[Module:form of]] `tagged_inflections`. *:Any of the above values can be omitted from the <code class="n">info</code> argument. If a completely empty table is given (with no annotations at all), then an empty string is returned. * The second argument is a string. Valid values are listed in [[Module:script utilities/data]] "data.translit" table.]==] function export.format_link_annotations(data, face) local output = {} -- Interwiki link if data.interwiki then insert(output, data.interwiki) end -- Genders if type(data.genders) ~= "table" then data.genders = { data.genders } end if data.genders and #data.genders > 0 then local genders, gender_cats = format_genders(data.genders, data.lang) insert(output, "&nbsp;" .. 
genders) if gender_cats then local cats = data.cats if cats then extend(cats, gender_cats) end end end local annotations = {} -- Transliteration and transcription if data.tr and data.tr[1] or data.ts and data.ts[1] then local kind if face == "term" then kind = face else kind = "default" end if data.tr[1] and data.ts[1] then insert(annotations, tag_translit(data.tr[1], data.lang, kind) .. " " .. export.mark(data.ts[1], "ts")) elseif data.ts[1] then insert(annotations, export.mark(data.ts[1], "ts")) else insert(annotations, tag_translit(data.tr[1], data.lang, kind)) end end -- Gloss/translation if data.gloss then insert(annotations, export.mark(data.gloss, "gloss")) end -- Part of speech if data.pos then -- debug category for pos= containing transcriptions if data.pos:match("/[^><]-/") then data.pos = data.pos .. "[[Category:links likely containing transcriptions in pos]]" end -- Canonicalize part of speech aliases as well as non-lemma aliases like 'nf' or 'nounf' for "noun form". pos_tags = pos_tags or (m_headword_data or get_headword_data()).pos_aliases local pos = pos_tags[data.pos] if not pos and data.pos:find("f$") then local pos_form = data.pos:sub(1, -2) -- We only expand something ending in 'f' if the result is a recognized non-lemma POS. pos_form = (pos_tags[pos_form] or pos_form) .. " form" if (m_headword_data or get_headword_data()).nonlemmas[pos_form .. "s"] then pos = pos_form end end insert(annotations, export.mark(pos or data.pos, "pos")) end -- Inflection data if data.infl then local m_form_of = require(form_of_module) -- Split tag sets manually, since tagged_inflections creates a numbered list, and we do not want that. 
local infl_outputs = {} local tag_sets = m_form_of.split_tag_set(data.infl) for _, tag_set in ipairs(tag_sets) do table.insert(infl_outputs, m_form_of.tagged_inflections({ tags = tag_set, lang = data.lang, nocat = true, nolink = true, nowrap = true })) end insert(annotations, export.mark(table.concat(infl_outputs, "; "), "infl")) end -- Non-gloss text if data.ng then insert(annotations, export.mark(data.ng, "non-gloss")) end -- Literal/sum-of-parts meaning if data.lit then insert(annotations, "literally " .. export.mark(data.lit, "gloss")) end -- Provide a hook to insert additional annotations such as nested inflections. if data.postprocess_annotations then data.postprocess_annotations { data = data, annotations = annotations } end if #annotations > 0 then insert(output, " " .. export.mark(concat(annotations, ", "), "annotations")) end return concat(output) end -- Encode certain characters to avoid various delimiter-related issues at various stages. We need to encode < and > -- because they end up forming part of CSS class names inside of <span ...> and will interfere with finding the end -- of the HTML tag. I first tried converting them to URL encoding, i.e. %3C and %3E; they then appear in the URL as -- %253C and %253E, which get mapped back to %3C and %3E when passed to [[Module:accel]]. But mapping them to &lt; -- and &gt; somehow works magically without any further work; they appear in the URL as < and >, and get passed to -- [[Module:accel]] as < and >. I have no idea who along the chain of calls is doing the encoding and decoding. If -- someone knows, please modify this comment appropriately! 
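--
-- Illustrative sketch (an editorial example, not part of the original comment): assuming the character map and the
-- two encoding helpers defined just below, the encoding would behave roughly as follows.
--   encode_accel_param_chars("a b")                      --> "a_b"                 (space becomes underscore)
--   encode_accel_param_chars("a<b>")                     --> "a&lt;b&gt;"          (angle brackets become entities)
--   encode_accel_param_chars("a_b")                      --> "a" .. u(0xFFF0) .. "b" (underscore becomes U+FFF0)
--   encode_accel_param("pos-", { "noun", nil, "verb" })  --> "pos-noun*~!*~!verb"  (gaps are filled with "")
--   encode_accel_param("pos-", nil)                      --> ""                    (missing params yield no class)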
local accel_char_map
local function get_accel_char_map()
	accel_char_map = {
		["%"] = ".",
		[" "] = "_",
		["_"] = u(0xFFF0),
		["<"] = "&lt;",
		[">"] = "&gt;",
	}
	return accel_char_map
end

local function encode_accel_param_chars(param)
	-- `%%` matches a literal percent sign inside the set; a bare `% ` would merely escape the space, leaving the
	-- ["%"] entry of the map unused.
	return (param:gsub("[%% <>_]", accel_char_map or get_accel_char_map()))
end

local function encode_accel_param(prefix, param)
	if not param then
		return ""
	end
	if type(param) == "table" then
		local filled_params = {}
		-- There may be gaps in the sequence, especially for translit params.
		local maxindex = 0
		for k in pairs(param) do
			if type(k) == "number" and k > maxindex then
				maxindex = k
			end
		end
		for i = 1, maxindex do
			filled_params[i] = param[i] or ""
		end
		-- [[Module:accel]] splits these up again.
		param = concat(filled_params, "*~!")
	end
	-- This is decoded again by [[WT:ACCEL]].
	return prefix .. encode_accel_param_chars(param)
end

local function insert_if_not_blank(list, item)
	if item == "" then
		return
	end
	insert(list, item)
end

local function get_class(lang, tr, accel, nowrap)
	if not accel and not nowrap then
		return ""
	end
	local classes = {}
	if accel then
		insert(classes, "form-of lang-" .. lang:getFullCode())
		local form = accel.form
		if form then
			insert(classes, encode_accel_param_chars(form) .. "-form-of")
		end
		insert_if_not_blank(classes, encode_accel_param("gender-", accel.gender))
		insert_if_not_blank(classes, encode_accel_param("pos-", accel.pos))
		insert_if_not_blank(classes, encode_accel_param("transliteration-", accel.translit or (tr ~= "-" and tr or nil)))
		insert_if_not_blank(classes, encode_accel_param("target-", accel.target))
		insert_if_not_blank(classes, encode_accel_param("origin-", accel.lemma))
		insert_if_not_blank(classes, encode_accel_param("origin_transliteration-", accel.lemma_translit))
		if accel.no_store then
			insert(classes, "form-of-nostore")
		end
	end
	if nowrap then
		insert(classes, nowrap)
	end
	return concat(classes, " ")
end

-- Add any left or right regular or accent qualifiers, labels or references to a formatted term.
`data` is the object -- specifying the term, which should optionally contain: -- * a language object in `lang`; required if any accent qualifiers or labels are given; -- * left regular qualifiers in `q` (an array of strings or a single string); an empty array or blank string will be -- ignored; -- * right regular qualifiers in `qq` (an array of strings or a single string); an empty array or blank string will be -- ignored; -- * left accent qualifiers in `a` (an array of strings); an empty array will be ignored; -- * right accent qualifiers in `aa` (an array of strings); an empty array will be ignored; -- * left labels in `l` (an array of strings); an empty array will be ignored; -- * right labels in `ll` (an array of strings); an empty array will be ignored; -- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text` -- (formatted reference text) and optionally `name` and/or `group`. -- `formatted` is the formatted version of the term itself. local function add_qualifiers_and_refs_to_term(data, formatted) local q = data.q if type(q) == "string" then q = { q } end local qq = data.qq if type(qq) == "string" then qq = { qq } end if q and q[1] or qq and qq[1] or data.a and data.a[1] or data.aa and data.aa[1] or data.l and data.l[1] or data.ll and data.ll[1] or data.refs and data.refs[1] then formatted = format_qualifiers { lang = data.lang, text = formatted, q = q, qq = qq, a = data.a, aa = data.aa, l = data.l, ll = data.ll, refs = data.refs, } end return formatted end --[==[ Creates a full link, with annotations (see `[[#format_link_annotations|format_link_annotations]]`), in the style of {{tl|l}} or {{tl|m}}. The first argument, `data`, must be a table. 
It contains the various elements that can be supplied as parameters to {{tl|l}} or {{tl|m}}: { { term = entry_to_link_to, alt = link_text_or_displayed_text, lang = language_object, sc = script_object, track_sc = boolean, no_nonstandard_sc_cat = boolean, fragment = link_fragment, id = sense_id, genders = { "gender1", "gender2", ... }, tr = transliteration, respect_link_tr = boolean, ts = transcription, gloss = gloss, pos = part_of_speech_tag, ng = non-gloss text, lit = literal_translation, infl = { "form_of_grammar_tag1", "form_of_grammar_tag2", ... }, no_alt_ast = boolean, accel = {accelerated_creation_tags}, interwiki = interwiki, pretext = "text_at_beginning" or nil, posttext = "text_at_end" or nil, q = { "left_qualifier1", "left_qualifier2", ...} or "left_qualifier", qq = { "right_qualifier1", "right_qualifier2", ...} or "right_qualifier", l = { "left_label1", "left_label2", ...}, ll = { "right_label1", "right_label2", ...}, a = { "left_accent_qualifier1", "left_accent_qualifier2", ...}, aa = { "right_accent_qualifier1", "right_accent_qualifier2", ...}, refs = { "formatted_ref1", "formatted_ref2", ...} or { {text = "text", name = "name", group = "group"}, ... }, show_qualifiers = boolean, } } Any one of the items in the `data` table may be {nil}, but an error will be shown if neither `term` nor `alt` nor `tr` is present. Thus, calling {full_link{ term = term, lang = lang, sc = sc }}, where `term` is the page to link to (which may have diacritics that will be stripped and/or embedded bracketed links) and `lang` is a [[Module:languages#Language objects|language object]] from [[Module:languages]], will give a plain link similar to the one produced by the template {{tl|l}}, and calling {full_link( { term = term, lang = lang, sc = sc }, "term" )} will give a link similar to the one produced by the template {{tl|m}}. The function will: * Try to determine the script, based on the characters found in the `term` or `alt` argument, if the script was not given. 
If a script is given and `track_sc` is {true}, it will check whether the input script is the same as the one which would have been automatically generated and add the category [[:Category:LANG terms with redundant script codes]] if yes, or [[:Category:LANG terms with non-redundant manual script codes]] if no. This should be used when the input script object is directly determined by a template's `sc` parameter. * Call `[[#language_link|language_link]]` on the `term` or `alt` forms, to remove diacritics in the page name, process any embedded wikilinks and create links to Reconstruction or Appendix pages when necessary. * Call `[[Module:script utilities#tag_text]]` to add the appropriate language and script tags to the term and italicize terms written in the Latin script if necessary. Accelerated creation tags, as used by [[WT:ACCEL]], are included. * Generate a transliteration, based on the `alt` or `term` arguments, if the script is not Latin, no transliteration was provided in `tr` and the combination of the term's language and script support automatic transliteration. The transliteration itself will be linked if both `.respect_link_tr` is specified and the language of the term has the `link_tr` property set for the script of the term; but not otherwise. * Add the annotations (transliteration, gender, gloss, etc.) after the link. * If `no_alt_ast` is specified, then the `alt` text does not need to contain an asterisk if the language is reconstructed. This should only be used by modules which really need to allow links to reconstructions that don't display asterisks (e.g. number boxes). * If `pretext` or `posttext` is specified, this is text to (respectively) prepend or append to the output, directly before processing qualifiers, labels and references. This can be used to add arbitrary extra text inside of the qualifiers, labels and references. 
* If `show_qualifiers` is specified or the `show_qualifiers` argument is given, then left and right qualifiers, accent qualifiers, labels and references will be displayed, otherwise they will be ignored. (This is because a fair amount of code stores qualifiers, labels and/or references in these fields and displays them itself, rather than expecting {full_link()} to display them.)]==] function export.full_link(data, face, allow_self_link, show_qualifiers) if type(data) ~= "table" then error("The first argument to the function full_link must be a table. " .. "See Module:links/documentation for more information.") elseif data.term and data.term:find("\\", nil, true) or data.alt and data.alt:find("\\", nil, true) then track("escaped", "full_link") end -- Prevent data from being destructively modified. local data = shallow_copy(data) -- FIXME: this shouldn't be added to `data`, as that means the input table needs to be cloned. data.cats = {} -- Categorize links to "und". local lang, cats = data.lang, data.cats if cats and lang:getCode() == "und" then insert(cats, "Undetermined language links") end local terms = { true } -- Generate multiple forms if applicable. 
for _, param in ipairs { "term", "alt" } do if type(data[param]) == "string" and data[param]:find("//", nil, true) then data[param] = export.split_on_slashes(data[param]) elseif type(data[param]) == "string" and not (type(data.term) == "string" and data.term:find("//", nil, true)) then if not data.no_generate_forms then data[param] = lang:generateForms(data[param]) else data[param] = { data[param] } end else data[param] = {} end end for _, param in ipairs { "sc", "tr", "ts" } do data[param] = { data[param] } end for _, param in ipairs { "term", "alt", "sc", "tr", "ts" } do for i in pairs(data[param]) do terms[i] = true end end -- Create the link local output = {} local id, no_alt_ast, srwc, accel, nevercalltr = data.id, data.no_alt_ast, data.suppress_redundant_wikilink_cat, data.accel, data.never_call_transliteration_module local link_tr = data.respect_link_tr and lang:link_tr(data.sc[1]) for i in ipairs(terms) do local link -- Is there any text to show? if (data.term[i] or data.alt[i]) then -- Try to detect the script if it was not provided local display_term = data.alt[i] or data.term[i] local best = lang:findBestScript(display_term) -- no_nonstandard_sc_cat is intended for use in [[Module:interproject]] if ( not data.no_nonstandard_sc_cat and best:getCode() == "None" and find_best_script_without_lang(display_term):getCode() ~= "None" ) then insert(cats, lang:getFullName() .. " terms in nonstandard scripts") end if not data.sc[i] then data.sc[i] = best -- Track uses of sc parameter. elseif data.track_sc then if data.sc[i]:getCode() == best:getCode() then insert(cats, lang:getFullName() .. " terms with redundant script codes") else insert(cats, lang:getFullName() .. 
" terms with non-redundant manual script codes") end end -- If using a discouraged character sequence, add to maintenance category if data.sc[i]:hasNormalizationFixes() == true then if (data.term[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.term[i])) ~= toNFC(data.term[i])) or (data.alt[i] and data.sc[i]:fixDiscouragedSequences(toNFC(data.alt[i])) ~= toNFC(data.alt[i])) then insert(cats, "Pages using discouraged character sequences") end end link = simple_link( data.term[i], data.fragment, data.alt[i], lang, data.sc[i], id, cats, no_alt_ast, srwc ) end -- simple_link can return nil, so check if a link has been generated. if link then -- Add "nowrap" class to prefixes in order to prevent wrapping after the hyphen local nowrap local display_term = data.alt[i] or data.term[i] if display_term and (display_term:find("^%-") or display_term:find("^־")) then -- Hebrew maqqef -- FIXME, use hyphens from [[Module:affix]] nowrap = "nowrap" end link = tag_text(link, lang, data.sc[i], face, get_class(lang, data.tr[i], accel, nowrap)) else --[[ No term to show. Is there at least a transliteration we can work from? ]] link = request_script(lang, data.sc[i]) -- No link to show, and no transliteration either. Show a term request (unless it's a substrate, as they rarely take terms). if (link == "" or (not data.tr[i]) or data.tr[i] == "-") and lang:getFamilyCode() ~= "qfa-sub" then -- If there are multiple terms, break the loop instead. if i > 1 then remove(output) break elseif NAMESPACE ~= "Template" then insert(cats, lang:getFullName() .. 
" term requests") end link = "<small>[Term?]</small>" end end insert(output, link) if i < #terms then insert(output, "<span class=\"Zsym mention\" style=\"font-size:100%;\">&nbsp;/ </span>") end end -- When suppress_tr is true, do not show or generate any transliteration if data.suppress_tr then data.tr[1] = nil else -- TODO: Currently only handles the first transliteration, pending consensus on how to handle multiple translits for multiple forms, as this is not always desirable (e.g. traditional/simplified Chinese). if data.tr[1] == "" or data.tr[1] == "-" then data.tr[1] = nil else local phonetic_extraction = load_data("Module:links/data").phonetic_extraction phonetic_extraction = phonetic_extraction[lang:getCode()] or phonetic_extraction[lang:getFullCode()] if phonetic_extraction then data.tr[1] = data.tr[1] or require(phonetic_extraction).getTranslit(export.remove_links(data.alt[1] or data.term[1])) elseif (data.term[1] or data.alt[1]) and data.sc[1]:isTransliterated() then -- Track whenever there is manual translit. The categories below like 'terms with redundant transliterations' -- aren't sufficient because they only work with reference to automatic translit and won't operate at all in -- languages without any automatic translit, like Persian and Hebrew. if data.tr[1] then local full_code = lang:getFullCode() track("manual-tr", full_code) end if not nevercalltr then -- Try to generate a transliteration. local text = data.alt[1] or data.term[1] if not link_tr then text = export.remove_links(text, true) end local automated_tr = lang:transliterate(text, data.sc[1]) if automated_tr then local manual_tr = data.tr[1] if manual_tr then if export.remove_links(manual_tr) == export.remove_links(automated_tr) then insert(cats, lang:getFullName() .. " terms with redundant transliterations") else -- Prevents Arabic root categories from flooding the tracking categories. if NAMESPACE ~= "Category" then insert(cats, lang:getFullName() .. 
" terms with non-redundant manual transliterations") end end end if not manual_tr or lang:overrideManualTranslit(data.sc[1]) then data.tr[1] = automated_tr end end end end end end -- Link to the transliteration entry for languages that require this if data.tr[1] and link_tr and not data.tr[1]:match("%[%[(.-)%]%]") then data.tr[1] = simple_link( data.tr[1], nil, nil, lang, get_script("Latn"), nil, cats, no_alt_ast, srwc ) elseif data.tr[1] and not link_tr then -- Remove the pseudo-HTML tags added by remove_links. data.tr[1] = data.tr[1]:gsub("</?link>", "") end if data.tr[1] and not umatch(data.tr[1], "[^%s%p]") then data.tr[1] = nil end insert(output, export.format_link_annotations(data, face)) if data.pretext then insert(output, 1, data.pretext) end if data.posttext then insert(output, data.posttext) end local categories = cats[1] and format_categories(cats, lang, "-", nil, nil, data.sc) or "" output = concat(output) if show_qualifiers or data.show_qualifiers then output = add_qualifiers_and_refs_to_term(data, output) end return output .. categories end --[==[Replaces all wikilinks with their displayed text, and removes any categories. This function can be invoked either from a template or from another module. -- Strips links: deletes category links, the targets of piped links, and any double square brackets involved in links (other than file links, which are untouched). If `tag` is set, then any links removed will be given pseudo-HTML tags, which allow the substitution functions in [[Module:languages]] to properly subdivide the text in order to reduce the chance of substitution failures in modules which scrape pages like [[Module:zh-translit]]. -- FIXME: This is quite hacky. We probably want this to be integrated into [[Module:languages]], but we can't do that until we know that nothing is pushing pipe linked transliterations through it for languages which don't have link_tr set. 
* <code><nowiki>[[page|displayed text]]</nowiki></code> &rarr; <code><nowiki>displayed text</nowiki></code> * <code><nowiki>[[page and displayed text]]</nowiki></code> &rarr; <code><nowiki>page and displayed text</nowiki></code> * <code><nowiki>[[Category:English lemmas|WORD]]</nowiki></code> &rarr; ''(nothing)'']==] function export.remove_links(text, tag) if type(text) == "table" then text = text.args[1] end if not text or text == "" then return "" end text = text :gsub("%[%[", "\1") :gsub("%]%]", "\2") -- Parse internal links for the display text. text = text:gsub("(\1)([^\1\2]-)(\2)", function(c1, c2, c3) -- Don't remove files. for _, false_positive in ipairs({ "file", "image" }) do if c2:lower():match("^" .. false_positive .. ":") then return c1 .. c2 .. c3 end end -- Remove categories completely. for _, false_positive in ipairs({ "category", "cat" }) do if c2:lower():match("^" .. false_positive .. ":") then return "" end end -- In piped links, remove all text before the pipe, unless it's the final character (i.e. the pipe trick), in which case just remove the pipe. c2 = c2:match("^[^|]*|(.+)") or c2:match("([^|]+)|$") or c2 if tag then return "<link>" .. c2 .. "</link>" else return c2 end end) text = text :gsub("\1", "[[") :gsub("\2", "]]") return text end function export.section_link(link) if type(link) ~= "string" then error("The first argument to section_link was a " .. type(link) .. ", but it should be a string.") elseif link:find("\\", nil, true) then track("escaped", "section_link") end local target, section = get_fragment((link:gsub("_", " "))) if not section then error("No \"#\" delineating a section name") end return simple_link( target, section, target .. " §&nbsp;" .. 
section
	)
end

return export
go1a5j6fymqq8baizjbz87vusf7r536
Module:string utilities 828 9525 31788 2026-05-01T09:41:57Z آیات محراج 3545 Created page with "local export = {} local function_module = "Module:fun" local load_module = "Module:load" local memoize_module = "Module:memoize" local string_char_module = "Module:string/char" local string_charset_escape_module = "Module:string/charsetEscape" local mw = mw local string = string local table = table local ustring = mw.ustring local byte = string.byte local char = string.char local concat = table.concat local find = string.find local format = string.format local gmatch..." 31788 Scribunto text/plain
local export = {}

local function_module = "Module:fun"
local load_module = "Module:load"
local memoize_module = "Module:memoize"
local string_char_module = "Module:string/char"
local string_charset_escape_module = "Module:string/charsetEscape"

local mw = mw
local string = string
local table = table
local ustring = mw.ustring

local byte = string.byte
local char = string.char
local concat = table.concat
local find = string.find
local format = string.format
local gmatch = string.gmatch
local gsub = string.gsub
local insert = table.insert
local len = string.len
local lower = string.lower
local match = string.match
local next = next
local require = require
local reverse = string.reverse
local select = select
local sort = table.sort
local sub = string.sub
local tonumber = tonumber
local tostring = tostring
local type = type
local ucodepoint = ustring.codepoint
local ufind = ustring.find
local ugcodepoint = ustring.gcodepoint
local ugmatch = ustring.gmatch
local ugsub = ustring.gsub
local ulower = ustring.lower
local umatch = ustring.match
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local upper = string.upper
local usub = ustring.sub
local uupper = ustring.upper

local memoize = require(memoize_module)

-- Defined below.
local codepoint
local explode_utf8
local format_fun
local get_charset
local gsplit
local pattern_escape
local pattern_simplifier
local replacement_escape
local title_case
local trim
local ucfirst
local ulen

--[==[
Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.
]==]
local function charset_escape(...)
	charset_escape = require(string_charset_escape_module)
	return charset_escape(...)
end

local function is_callable(...)
	is_callable = require(function_module).is_callable
	return is_callable(...)
end

local function load_data(...)
	load_data = require(load_module).load_data
	return load_data(...)
end

local function u(...)
	u = require(string_char_module)
	return u(...)
end

local function prepare_iter(str, pattern, str_lib, plain)
	local callable = is_callable(pattern)
	if str_lib or plain then
		return pattern, #str, string, callable
	elseif not callable then
		local simple = pattern_simplifier(pattern)
		if simple then
			return simple, #str, string, false
		end
	end
	return pattern, ulen(str), ustring, callable
end

--[==[
Returns {nil} if the input value is the empty string, or otherwise the same value.

If the input is a string and `do_trim` is set, the input value will be trimmed before returning; if the trimmed value is the empty string, returns {nil}.

If `quote_delimiters` is set, then any outer pair of quotation marks ({' '} or {" "}) surrounding the rest of the input string will be stripped, if present. The string will not be trimmed again, converted to {nil}, or have further quotation marks stripped, as it exists as a way to embed spaces or the empty string in an input. Genuine quotation marks may also be embedded this way (e.g. {"''foo''"} returns {"'foo'"}).
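
A minimal standalone sketch of the behaviour described here, for trying it outside the module. The `trim` helper below is a stand-in (an assumption) that only strips ASCII whitespace; the module's own `trim` is defined elsewhere and may treat more characters as whitespace.

```lua
-- Stand-in for the module's `trim` (assumption: ASCII whitespace only).
local function trim(str)
	return (str:match("^%s*(.-)%s*$"))
end

-- Same logic as is_not_empty, reproduced so that the example runs on its own.
local function is_not_empty(str, do_trim, quote_delimiters)
	if str == "" then
		return nil
	elseif type(str) ~= "string" then
		return str
	elseif do_trim then
		str = trim(str)
		if str == "" then
			return nil
		end
	end
	-- Strip one outer pair of matching quotation marks, if requested.
	return quote_delimiters and string.gsub(str, "^(['\"])(.*)%1$", "%2") or str
end

print(is_not_empty(""))                     -- nil
print(is_not_empty("  ", true))             -- nil
print(is_not_empty("  foo  ", true))        -- foo
print(is_not_empty("''foo''", false, true)) -- 'foo'
print(is_not_empty(42))                     -- 42
```

Note that quoting is what lets a caller pass a bare space through trimming: with both flags set, the input {"' '"} yields a single space rather than {nil}.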
]==]
function export.is_not_empty(str, do_trim, quote_delimiters)
	if str == "" then
		return nil
	elseif not (str and type(str) == "string") then
		return str
	elseif do_trim then
		str = trim(str)
		if str == "" then
			return nil
		end
	end
	return quote_delimiters and gsub(str, "^(['\"])(.*)%1$", "%2") or str
end

--[==[
Explodes a string into an array of UTF-8 characters. '''Warning''': this function assumes that the input is valid UTF-8 in order to optimize speed and memory use. Passing in an input containing non-UTF-8 byte sequences could result in unexpected behaviour.
]==]
function export.explode_utf8(str)
	local text, i = {}, 0
	for ch in gmatch(str, ".[\128-\191]*") do
		i = i + 1
		text[i] = ch
	end
	return text
end
explode_utf8 = export.explode_utf8

--[==[
Returns {true} if `str` is a valid UTF-8 string. This is true if, for each character, all of the following are true:
* It has the expected number of bytes, which is determined by the value of the leading byte: 1-byte characters are `0x00` to `0x7F`, 2-byte characters start with `0xC2` to `0xDF`, 3-byte characters start with `0xE0` to `0xEF`, and 4-byte characters start with `0xF0` to `0xF4`.
* The leading byte must not fall outside of the above ranges.
* The trailing byte(s) (if any) must be between `0x80` and `0xBF`.
* The character's codepoint must be between U+0000 (`0x00`) and U+10FFFF (`0xF4 0x8F 0xBF 0xBF`).
* The character cannot have an overlong encoding: for each byte length, the lowest theoretical encoding is equivalent to U+0000 (e.g. `0xE0 0x80 0x80`, the lowest theoretical 3-byte encoding, is exactly equivalent to U+0000). Encodings that use more than the minimum number of bytes are not considered valid, meaning that the first valid 3-byte character is `0xE0 0xA0 0x80` (U+0800), and the first valid 4-byte character is `0xF0 0x90 0x80 0x80` (U+10000).
Formally, 2-byte characters have leading bytes ranging from `0xC0` to `0xDF` (rather than `0xC2` to `0xDF`), but `0xC0 0x80` to `0xC1 0xBF` are overlong encodings, so it is simpler to say that the 2-byte range begins at `0xC2`.

If `allow_surrogates` is set, surrogates (U+D800 to U+DFFF) will be treated as valid UTF-8. Surrogates are used in UTF-16, which encodes codepoints U+0000 to U+FFFF with 2 bytes, and codepoints from U+10000 upwards using a pair of surrogates, which are taken together as a 4-byte unit. Since surrogates have no use in UTF-8, as it encodes higher codepoints in a different way, they are not considered valid in UTF-8 text.

However, there are limited circumstances where they may be necessary: for instance, JSON escapes characters using the format `\u0000`, which must contain exactly 4 hexadecimal digits; under the scheme, codepoints above U+FFFF must be escaped as the equivalent pair of surrogates, even though the text itself must be encoded in UTF-8 (e.g. U+10000 becomes `\uD800\uDC00`).
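
The surrogate arithmetic behind the JSON example can be sketched as follows. This is an illustration only and is not part of this module; `to_surrogates` and `json_escape` are hypothetical helper names.

```lua
-- Hypothetical helper: split a codepoint above U+FFFF into a UTF-16 surrogate pair.
local function to_surrogates(cp)
	assert(cp >= 0x10000 and cp <= 0x10FFFF, "codepoint must be in U+10000..U+10FFFF")
	local offset = cp - 0x10000
	local high = 0xD800 + math.floor(offset / 0x400) -- top 10 bits of the offset
	local low = 0xDC00 + offset % 0x400              -- bottom 10 bits of the offset
	return high, low
end

-- Hypothetical helper: JSON-style escape for a codepoint outside the Basic Multilingual Plane.
local function json_escape(cp)
	return string.format("\\u%04X\\u%04X", to_surrogates(cp))
end

print(json_escape(0x10000)) -- \uD800\uDC00
print(json_escape(0x1F600)) -- \uD83D\uDE00
```

Both halves of the pair fall in U+D800 to U+DFFF, which is why text that round-trips through such escapes may need `allow_surrogates`.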
]==]
function export.isutf8(str, allow_surrogates)
	for ch in gmatch(str, "[\128-\255][\128-\191]*") do
		if #ch > 4 then
			return false
		end
		local b1, b2, b3, b4 = byte(ch, 1, 4)
		if not (b2 and b2 >= 0x80 and b2 <= 0xBF) then
			return false -- 1-byte is always invalid, as gmatch excludes 0x00 to 0x7F
		elseif not b3 then -- 2-byte
			if not (b1 >= 0xC2 and b1 <= 0xDF) then -- b1 == 0xC0 or b1 == 0xC1 is overlong
				return false
			end
		elseif not (b3 >= 0x80 and b3 <= 0xBF) then -- trailing byte
			return false
		elseif not b4 then -- 3-byte
			if b1 > 0xEF then
				return false
			elseif b2 < 0xA0 then
				if b1 < 0xE1 then -- b1 == 0xE0 and b2 < 0xA0 is overlong
					return false
				end
			elseif b1 < 0xE0 or (b1 == 0xED and not allow_surrogates) then -- b1 == 0xED and b2 >= 0xA0 is a surrogate
				return false
			end
		elseif not (b4 >= 0x80 and b4 <= 0xBF) then -- 4-byte
			return false
		elseif b2 < 0x90 then
			if not (b1 >= 0xF1 and b1 <= 0xF4) then -- b1 == 0xF0 and b2 < 0x90 is overlong
				return false
			end
		elseif not (b1 >= 0xF0 and b1 <= 0xF3) then -- b1 == 0xF4 and b2 >= 0x90 is too high
			return false
		end
	end
	return true
end

do
	local charset_chars = {
		["\0"] = "%z", ["%"] = "%%", ["-"] = "%-", ["]"] = "%]", ["^"] = "%^"
	}
	charset_chars.__index = charset_chars

	local chars = setmetatable({
		["$"] = "%$", ["("] = "%(", [")"] = "%)", ["*"] = "%*", ["+"] = "%+",
		["."] = "%.", ["?"] = "%?", ["["] = "%["
	}, charset_chars)

	--[==[
	Escapes the magic characters used in a [[mw:Extension:Scribunto/Lua reference manual#Patterns|pattern]] (Lua's version of regular expressions): {$%()*+-.?[]^}, and converts the null character to {%z}. For example, {"^$()%.[]*+-?\0"} becomes {"%^%$%(%)%%%.%[%]%*%+%-%?%z"}. This is necessary when constructing a pattern involving arbitrary text (e.g. from user input).
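
A small illustration of why escaping matters when arbitrary text is searched for. This is a simplified stand-in, not this module's implementation: it uses a plain `string.gsub` with a fixed character set rather than a lookup table, and (as an assumption of the sketch) leaves out the null-character handling.

```lua
-- Simplified stand-in for pattern_escape: prefixes each magic character with "%".
-- (Assumption: null characters are not handled here.)
local function escape(str)
	return (str:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

local text = "1+1=2"
print(text:find("1+1"))         -- nil, because "+" is read as a quantifier
print(text:find(escape("1+1"))) -- 1 3
print(escape("^$()%.[]*+-?"))   -- %^%$%(%)%%%.%[%]%*%+%-%?
```

The outer parentheses around the `gsub` call discard its second return value (the substitution count), the same idiom this module uses.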
]==] function export.pattern_escape(str) return (gsub(str, "[%z$%%()*+%-.?[%]^]", chars)) end pattern_escape = export.pattern_escape --[==[ Escapes only {%}, which is the only magic character used in replacement [[mw:Extension:Scribunto/Lua reference manual#Patterns|patterns]] with string.gsub and mw.ustring.gsub. ]==] function export.replacement_escape(str) return (gsub(str, "%%", "%%%%")) end replacement_escape = export.replacement_escape local function case_insensitive_char(ch) local upper_ch = uupper(ch) if upper_ch == ch then ch = ulower(ch) if ch == upper_ch then return chars[ch] or ch end end return "[" .. (charset_chars[upper_ch] or upper_ch) .. (charset_chars[ch] or ch) .. "]" end local function iterate(str, str_len, text, n, start, _gsub, _sub, loc1, loc2) if not (loc1 and start <= str_len) then -- Add final chunk and return. n = n + 1 text[n] = _gsub(_sub(str, start), ".", chars) return elseif loc2 < loc1 then if _sub == sub then local b = byte(str, loc1) if b and b >= 128 then loc1 = loc1 + (b < 224 and 1 or b < 240 and 2 or 3) end end n = n + 1 text[n] = _gsub(_sub(str, start, loc1), ".", chars) start = loc1 + 1 if start > str_len then return end else -- Add chunk up to the current match. n = n + 1 text[n] = _gsub(_sub(str, start, loc1 - 1), ".", chars) -- Add current match. n = n + 1 text[n] = _gsub(_sub(str, loc1, loc2), ".", case_insensitive_char) start = loc2 + 1 end return n, start end --[==[ Escapes the magic characters used in a [[mw:Extension:Scribunto/Lua reference manual#Patterns|pattern]], and makes all characters case-insensitive. An optional pattern or find function (see {split}) may be supplied as the second argument, the third argument (`str_lib`) forces use of the string library, while the fourth argument (`plain`) turns any pattern matching facilities off in the optional pattern supplied. 
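
As a rough, ASCII-only illustration of the idea (not this module's implementation), each letter can be replaced by a two-character set. The sketch below assumes the input contains no pattern magic characters and no non-ASCII letters, both of which the function documented here does handle; `ascii_case_insensitive` is a hypothetical name.

```lua
-- Simplified sketch: replace each ASCII letter with a set such as "[Hh]".
-- (Assumption: the input contains no pattern magic characters.)
local function ascii_case_insensitive(str)
	return (str:gsub("%a", function(ch)
		return "[" .. ch:upper() .. ch:lower() .. "]"
	end))
end

local pattern = ascii_case_insensitive("hello")
print(pattern)                      -- [Hh][Ee][Ll][Ll][Oo]
print(("Say HeLLo"):match(pattern)) -- HeLLo
```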
]==] function export.case_insensitive_pattern(str, pattern_or_func, str_lib, plain) if pattern_or_func == nil then return (gsub(str, str_lib and "[^\128-\255]" or ".[\128-\191]*", case_insensitive_char)) end local text, n, start, str_len, _string, callable = {}, 0, 1 pattern_or_func, str_len, _string, callable = prepare_iter(str, pattern_or_func, str_lib, plain) local _find, _gsub, _sub = _string.find, _string.gsub, _string.sub if callable then repeat n, start = iterate(str, str_len, text, n, start, _gsub, _sub, pattern_or_func(str, start)) until not start -- Special case if the pattern is anchored to the start: "^" always -- anchors to the start position, not the start of the string, so get -- around this by only attempting one match with the pattern, then match -- the end of the string. elseif byte(pattern_or_func) == 0x5E then -- ^ n, start = iterate(str, str_len, text, n, start, _gsub, _sub, _find(str, pattern_or_func, start, plain)) if start ~= nil then iterate(str, str_len, text, n, start, _gsub, _sub, _find(str, "$", start, plain)) end else repeat n, start = iterate(str, str_len, text, n, start, _gsub, _sub, _find(str, pattern_or_func, start, plain)) until not start end return concat(text) end end do local character_classes local function get_character_classes() character_classes, get_character_classes = { [0x41] = true, [0x61] = true, -- Aa [0x43] = true, [0x63] = true, -- Cc [0x44] = true, [0x64] = true, -- Dd [0x4C] = true, [0x6C] = true, -- Ll [0x50] = true, [0x70] = true, -- Pp [0x53] = true, [0x73] = true, -- Ss [0x55] = true, [0x75] = true, -- Uu [0x57] = true, [0x77] = true, -- Ww [0x58] = true, [0x78] = true, -- Xx [0x5A] = true, -- z dealt with separately. 
}, nil return character_classes end local function check_sets_equal(set1, set2) local k2 for k1, v1 in next, set1 do local v2 = set2[k1] if v1 ~= v2 and (v2 == nil or not check_sets_equal(v1, v2)) then return false end k2 = next(set2, k2) end return next(set2, k2) == nil end local function check_sets(bytes) local key, set1, set = next(bytes) if set1 == true then return true elseif not check_sets(set1) then return false end while true do key, set = next(bytes, key) if not key then return true elseif not check_sets_equal(set, set1) then return false end end end local function make_charset(range) if #range == 1 then return char(range[1]) end sort(range) local compressed, n, start = {}, 0, range[1] for i = 1, #range do local this, nxt = range[i], range[i + 1] if nxt ~= this + 1 then n = n + 1 compressed[n] = this == start and char(this) or char(start) .. "-" .. char(this) start = nxt end end return "[" .. concat(compressed) .. "]" end local function parse_1_byte_charset(pattern, pos) local ch while true do pos, ch = match(pattern, "()([%%%]\192-\255])", pos) if ch == "%" then local nxt = byte(pattern, pos + 1) if not nxt or nxt >= 128 or (character_classes or get_character_classes())[nxt] then -- acdlpsuwxACDLPSUWXZ, but not z return false end pos = pos + 2 elseif ch == "]" then pos = pos + 1 return pos else return false end end end --[==[ Parses `pattern`, a ustring library pattern, and attempts to convert it into a string library pattern. If conversion isn't possible, returns false. ]==] function pattern_simplifier(pattern) if type(pattern) == "number" then return tostring(pattern) end local pos, capture_groups, start, n, output, ch, nxt_pos = 1, 0, 1, 0 while true do -- FIXME: use "()([%%(.[\128-\255])[\128-\191]?[\128-\191]?[\128-\191]?()" and ensure non-UTF8 always fails. 
pos, ch, nxt_pos = match(pattern, "()([%%(.[\192-\255])[\128-\191]*()", pos) if not ch then break end local nxt = byte(pattern, nxt_pos) if ch == "%" then if nxt == 0x62 then -- b local nxt2, nxt3 = byte(pattern, pos + 2, pos + 3) if not (nxt2 and nxt2 < 128 and nxt3 and nxt3 < 128) then return false end pos = pos + 4 elseif nxt == 0x66 then -- f nxt_pos = nxt_pos + 2 local nxt2, nxt3 = byte(pattern, nxt_pos - 1, nxt_pos) -- Only possible to convert a positive %f charset which is -- all ASCII, so use parse_1_byte_charset. if not (nxt2 == 0x5B and nxt3 and nxt3 ~= 0x5E and nxt3 < 128) then -- [^ return false elseif nxt3 == 0x5D then -- Initial ] is non-magic. nxt_pos = nxt_pos + 1 end pos = parse_1_byte_charset(pattern, nxt_pos) if not pos then return false end elseif nxt == 0x5A then -- Z nxt = byte(pattern, nxt_pos + 1) if nxt == 0x2A or nxt == 0x2D then -- *- pos = pos + 3 else if output == nil then output = {} end local ins = sub(pattern, start, pos - 1) .. "[\1-\127\192-\255]" n = n + 1 if nxt == 0x2B then -- + output[n] = ins .. "%Z*" pos = pos + 3 elseif nxt == 0x3F then -- ? output[n] = ins .. "?[\128-\191]*" pos = pos + 3 else output[n] = ins .. "[\128-\191]*" pos = pos + 2 end start = pos end elseif not nxt or (character_classes or get_character_classes())[nxt] then -- acdlpsuwxACDLPSUWX, but not Zz return false -- Skip the next character if it's ASCII. Otherwise, we will -- still need to do length checks. else pos = pos + (nxt < 128 and 2 or 1) end elseif ch == "(" then if nxt == 0x29 or capture_groups == 32 then -- ) return false end capture_groups = capture_groups + 1 pos = pos + 1 elseif ch == "." then if nxt == 0x2A or nxt == 0x2D then -- *- pos = pos + 2 else if output == nil then output = {} end local ins = sub(pattern, start, pos - 1) .. "[^\128-\191]" n = n + 1 if nxt == 0x2B then -- + output[n] = ins .. ".*" pos = pos + 2 elseif nxt == 0x3F then -- ? output[n] = ins .. "?[\128-\191]*" pos = pos + 2 else output[n] = ins .. 
"[\128-\191]*" pos = pos + 1 end start = pos end elseif ch == "[" then -- Fail negative charsets. TODO: 1-byte charsets should be safe. if nxt == 0x5E then -- ^ return false -- If the first character is "%", ch_len is determined by the -- next one instead. elseif nxt == 0x25 then -- % nxt = byte(pattern, nxt_pos + 1) elseif nxt == 0x5D then -- Initial ] is non-magic. nxt_pos = nxt_pos + 1 end if not nxt then return false end local ch_len = nxt < 128 and 1 or nxt < 224 and 2 or nxt < 240 and 3 or 4 if ch_len == 1 then -- Single-byte charset. pos = parse_1_byte_charset(pattern, nxt_pos) if not pos then return false end else -- Multibyte charset. -- TODO: 1-byte chars should be safe to mix with multibyte chars. CONFIRM THIS FIRST. local charset_pos, bytes = pos pos = pos + 1 while true do -- TODO: non-ASCII charset ranges. pos, ch, nxt_pos = match(pattern, "^()([^\128-\191])[\128-\191]*()", pos) -- If escaped, get the next character. No need to -- distinguish magic characters or character classes, -- as they'll all fail for having the wrong length -- anyway. if ch == "%" then pos, ch, nxt_pos = match(pattern, "^()([^\128-\191])[\128-\191]*()", nxt_pos) elseif ch == "]" then pos = nxt_pos break end if not (ch and nxt_pos - pos == ch_len) then return false elseif bytes == nil then bytes = {} end local bytes, last = bytes, nxt_pos - 1 for i = pos, last - 1 do local b = byte(pattern, i) local bytes_b = bytes[b] if bytes_b == nil then bytes_b = {} bytes[b] = bytes_b end bytes[b], bytes = bytes_b, bytes_b end bytes[byte(pattern, last)] = true pos = nxt_pos end if not pos then return false end nxt = byte(pattern, pos) if ( (nxt == 0x2A or nxt == 0x2D or nxt == 0x3F) or -- *-? (nxt == 0x2B and ch_len > 2) or -- + not check_sets(bytes) ) then return false end local ranges, b, key, next_byte = {}, 0 repeat key, next_byte = next(bytes) local range, n = {key}, 1 -- Loop starts on the second iteration. 
for key in next, bytes, key do n = n + 1 range[n] = key end b = b + 1 ranges[b] = range bytes = next_byte until next_byte == true if nxt == 0x2B then -- + local range1, range2 = ranges[1], ranges[2] ranges[1], ranges[3] = make_charset(range1), make_charset(range2) local n = #range2 for i = 1, #range1 do n = n + 1 range2[n] = range1[i] end ranges[2] = make_charset(range2) .. "*" pos = pos + 1 else for i = 1, #ranges do ranges[i] = make_charset(ranges[i]) end end if output == nil then output = {} end nxt = byte(pattern, pos) n = n + 1 output[n] = sub(pattern, start, charset_pos - 1) .. concat(ranges) .. ((nxt == 0x2A or nxt == 0x2B or nxt == 0x2D or nxt == 0x3F) and "%" or "") -- following *+-? now have to be escaped start = pos end elseif not nxt then break elseif nxt == 0x2B then -- + if nxt_pos - pos ~= 2 then return false elseif output == nil then output = {} end pos, nxt_pos = pos + 1, nxt_pos + 1 nxt = byte(pattern, nxt_pos) local ch2 = sub(pattern, pos, pos) n = n + 1 output[n] = sub(pattern, start, pos - 1) .. "[" .. ch .. ch2 .. "]*" .. ch2 .. ((nxt == 0x2A or nxt == 0x2B or nxt == 0x2D or nxt == 0x3F) and "%" or "") -- following *+-? now have to be escaped pos, start = nxt_pos, nxt_pos elseif nxt == 0x2A or nxt == 0x2D or nxt == 0x3F then -- *-? return false else pos = nxt_pos end end if start == 1 then return pattern end return concat(output) .. sub(pattern, start) end pattern_simplifier = memoize(pattern_simplifier, true) export.pattern_simplifier = pattern_simplifier end --[==[ Parses `charset`, the interior of a string or ustring library character set, and normalizes it into a string or ustring library pattern (e.g. {"abcd-g"} becomes {"[abcd-g]"}, and {"[]"} becomes {"[[%]]"}). The negative (`^`), range (`-`) and literal (`%`) magic characters work as normal, and character classes may be used (e.g. `%d` and `%w`), but opening and closing square brackets are sanitized so that they behave like ordinary characters. 
]==] function get_charset(charset) if type(charset) == "number" then return tostring(charset) end local pos, start, n, output = 1, 1, 0 if byte(charset) == 0x5E then -- ^ pos = pos + 1 end -- FIXME: "]" is non-magic if it's the first character in a charset. local nxt_pos, nxt while true do local new_pos, ch = match(charset, "()([%%%-%]])", pos) if not ch then break -- Skip percent escapes. Ranges can't start with them, either. elseif ch == "%" then pos = new_pos + 2 else -- If `ch` is a hyphen, get the character before iff it's at or ahead of `pos`. if ch == "-" and new_pos > pos then pos, nxt_pos, nxt = new_pos - 1, new_pos, ch ch = sub(charset, pos, pos) else pos, nxt_pos = new_pos, new_pos + 1 nxt = sub(charset, nxt_pos, nxt_pos) end -- Range. if nxt == "-" then if output == nil then output = {} end n = n + 1 output[n] = sub(charset, start, pos - 1) nxt_pos = nxt_pos + 1 nxt = sub(charset, nxt_pos, nxt_pos) -- Ranges fail if they end with a percent escape, so escape the hyphen to avoid undefined behaviour. if nxt == "" or nxt == "%" then n = n + 1 output[n] = (ch == "]" and "%]" or ch) .. "%-" start = nxt_pos nxt_pos = nxt_pos + 2 -- Since ranges can't contain "%]", since it's escaped, range inputs like "]-z" or "a-]" must be -- adjusted to the character before or after, plus "%]" (e.g. "%]^-z" or "a-\\%]"). The escaped "%]" is -- omitted if the range would be empty (i.e. if the first byte is greater than the second). else n = n + 1 output[n] = (ch == "]" and (byte(nxt) >= 0x5D and "%]^" or "^") or ch) .. "-" .. (nxt == "]" and (byte(ch) <= 0x5D and "\\%]" or "\\") or nxt) nxt_pos = nxt_pos + 1 start = nxt_pos end elseif ch == "-" or ch == "]" then if output == nil then output = {} end n = n + 1 output[n] = sub(charset, start, pos - 1) .. "%" .. ch start = nxt_pos end pos = nxt_pos end end if start == 1 then return "[" .. charset .. "]" end return "[" .. concat(output) .. sub(charset, start) .. 
"]" end get_charset = memoize(get_charset, true) export.get_charset = get_charset function export.len(str) return type(str) == "number" and len(str) or #str - #gsub(str, "[^\128-\191]+", "") end ulen = export.len function export.sub(str, i, j) str, i = type(str) == "number" and tostring(str) or str, i or 1 if i < 0 or j and j < 0 then return usub(str, i, j) elseif j and i > j or i > #str then return "" end local n, new_i = 0 for loc1, loc2 in gmatch(str, "()[^\128-\191]+()[\128-\191]*") do n = n + loc2 - loc1 if not new_i and n >= i then new_i = loc2 - (n - i) - 1 if not j then return sub(str, new_i) end end if j and n > j then return sub(str, new_i, loc2 - (n - j) - 1) end end return new_i and sub(str, new_i) or "" end do local function _find(str, loc1, loc2, ...) if loc1 and not match(str, "^()[^\128-\255]*$") then -- Use raw values of loc1 and loc2 to get loc1 and the length of the match. loc1, loc2 = ulen(sub(str, 1, loc1)), ulen(sub(str, loc1, loc2)) -- Offset length with loc1 to get loc2. loc2 = loc1 + loc2 - 1 end return loc1, loc2, ... 
end --[==[A version of find which uses string.find when possible, but otherwise uses mw.ustring.find.]==] function export.find(str, pattern, init, plain) init = init or 1 if init ~= 1 and not match(str, "^()[^\128-\255]*$") then return ufind(str, pattern, init, plain) elseif plain then return _find(str, find(str, pattern, init, true)) end local simple = pattern_simplifier(pattern) if simple then return _find(str, find(str, simple, init)) end return ufind(str, pattern, init) end end --[==[A version of match which uses string.match when possible, but otherwise uses mw.ustring.match.]==] function export.match(str, pattern, init) init = init or 1 if init ~= 1 and not match(str, "^()[^\128-\255]*$") then return umatch(str, pattern, init) end local simple = pattern_simplifier(pattern) if simple then return match(str, simple, init) end return umatch(str, pattern, init) end --[==[A version of gmatch which uses string.gmatch when possible, but otherwise uses mw.ustring.gmatch.]==] function export.gmatch(str, pattern) local simple = pattern_simplifier(pattern) if simple then return gmatch(str, simple) end return ugmatch(str, pattern) end --[==[A version of gsub which uses string.gsub when possible, but otherwise uses mw.ustring.gsub.]==] function export.gsub(str, pattern, repl, n) local simple = pattern_simplifier(pattern) if simple then return gsub(str, simple, repl, n) end return ugsub(str, pattern, repl, n) end --[==[ Like gsub, but pattern-matching facilities are turned off, so `pattern` and `repl` (if a string) are treated as literal. ]==] function export.plain_gsub(str, pattern, repl, n) return gsub(str, pattern_escape(pattern), type(repl) == "string" and replacement_escape(repl) or repl, n) end --[==[ Reverses a UTF-8 string; equivalent to string.reverse. ]==] function export.reverse(str) return reverse((gsub(str, "[\192-\255][\128-\191]*", reverse))) end function export.char(...) -- To be moved to [[Module:string/char]]. return u(...) 
end do local function utf8_err(func_name) error(format("bad argument #1 to '%s' (string is not UTF-8)", func_name), 4) end local function get_codepoint(func_name, b1, b2, b3, b4) if b1 <= 0x7F then return b1, 1 elseif not (b2 and b2 >= 0x80 and b2 <= 0xBF) then utf8_err(func_name) elseif b1 <= 0xDF then local cp = 0x40 * b1 + b2 - 0x3080 return cp >= 0x80 and cp or utf8_err(func_name), 2 elseif not (b3 and b3 >= 0x80 and b3 <= 0xBF) then utf8_err(func_name) elseif b1 <= 0xEF then local cp = 0x1000 * b1 + 0x40 * b2 + b3 - 0xE2080 return cp >= 0x800 and cp or utf8_err(func_name), 3 elseif not (b4 and b4 >= 0x80 and b4 <= 0xBF) then utf8_err(func_name) end local cp = 0x40000 * b1 + 0x1000 * b2 + 0x40 * b3 + b4 - 0x3C82080 return cp >= 0x10000 and cp <= 0x10FFFF and cp or utf8_err(func_name), 4 end function export.codepoint(str, i, j) if str == "" then return -- return nothing elseif type(str) == "number" then return byte(str, i, j) end i, j = i or 1, j == -1 and #str or i or 1 if i == 1 and j == 1 then return (get_codepoint("codepoint", byte(str, 1, 4))) elseif i < 0 or j < 0 then return ucodepoint(str, i, j) -- FIXME end local n, nb, ret, nr = 0, 1, {}, 0 while n < j do n = n + 1 if n < i then local b = byte(str, nb) nb = nb + (b < 128 and 1 or b < 224 and 2 or b < 240 and 3 or 4) else local b1, b2, b3, b4 = byte(str, nb, nb + 3) if not b1 then break end nr = nr + 1 local add ret[nr], add = get_codepoint("codepoint", b1, b2, b3, b4) nb = nb + add end end return unpack(ret) end codepoint = export.codepoint function export.gcodepoint(str, i, j) i, j = i or 1, j ~= -1 and j or nil if i < 0 or j and j < 0 then return ugcodepoint(str, i, j) -- FIXME end local n, nb = 1, 1 while n < i do local b = byte(str, nb) if not b then break end nb = nb + (b < 128 and 1 or b < 224 and 2 or b < 240 and 3 or 4) n = n + 1 end return function() if j and n > j then return nil end n = n + 1 local b1, b2, b3, b4 = byte(str, nb, nb + 3) if not b1 then return nil end local ret, add = 
get_codepoint("gcodepoint", b1, b2, b3, b4) nb = nb + add return ret end end end do local _ulower = ulower --[==[A version of lower which uses string.lower when possible, but otherwise uses mw.ustring.lower.]==] function export.lower(str) return (match(str, "^()[^\128-\255]*$") and lower or _ulower)(str) end end do local _uupper = uupper --[==[A version of upper which uses string.upper when possible, but otherwise uses mw.ustring.upper.]==] function export.upper(str) return (match(str, "^()[^\128-\255]*$") and upper or _uupper)(str) end end do local function add_captures(t, n, ...) if ... == nil then return end -- Insert any captures from the splitting pattern. local offset, capture = n - 1, ... while capture do n = n + 1 t[n] = capture capture = select(n - offset, ...) end return n end --[==[ Reimplementation of mw.text.split() that includes any capturing groups in the splitting pattern. This works like Python's re.split() function, except that it has Lua's behavior when the split pattern is empty (i.e. advancing by one character at a time; Python returns the whole remainder of the string). When possible, it will use the string library, but otherwise uses the ustring library. There are two optional parameters: `str_lib` forces use of the string library, while `plain` turns any pattern matching facilities off, treating `pattern` as literal. In addition, `pattern` may be a custom find function (or callable table), which takes the input string and start index as its two arguments, and must return the start and end index of the match, plus any optional captures, or nil if there are no further matches. By default, the start index will be calculated using the ustring library, unless `str_lib` or `plain` is set. ]==] function export.split(str, pattern_or_func, str_lib, plain) local iter, t, n = gsplit(str, pattern_or_func, str_lib, plain), {}, 0 repeat n = add_captures(t, n, iter()) until n == nil return t end export.capturing_split = export.split -- To be removed. 
end --[==[ Returns an iterator function, which iterates over the substrings returned by {split}. The first value returned is the string up to the splitting pattern, with any capture groups being returned as additional values on that iteration. ]==] function export.gsplit(str, pattern_or_func, str_lib, plain) local start, final, str_len, _string, callable = 1 pattern_or_func, str_len, _string, callable = prepare_iter(str, pattern_or_func, str_lib, plain) local _find, _sub = _string.find, _string.sub local function iter(loc1, loc2, ...) -- If no match, or there is but we're past the end of the string -- (which happens when the match is the empty string), then return -- the final chunk. if not loc1 then final = true return _sub(str, start) end -- Special case: If we match the empty string, then eat the -- next character; this avoids an infinite loop, and makes -- splitting by the empty string work the way mw.text.gsplit() does -- (including non-adjacent empty string matches with %f). If we -- reach the end of the string this way, set `final` to true, so we -- don't get stuck matching the empty string at the end. local chunk if loc2 < loc1 then -- If using the string library, we need to make sure we advance -- by one UTF-8 character. if _sub == sub then local b = byte(str, loc1) if b and b >= 128 then loc1 = loc1 + (b < 224 and 1 or b < 240 and 2 or 3) end end chunk = _sub(str, start, loc1) if loc1 >= str_len then final = true else start = loc1 + 1 end -- Eat chunk up to the current match. else chunk = _sub(str, start, loc1 - 1) start = loc2 + 1 end return chunk, ... end if callable then return function() if not final then return iter(pattern_or_func(str, start)) end end -- Special case if the pattern is anchored to the start: "^" always -- anchors to the start position, not the start of the string, so get -- around this by only attempting one match with the pattern, then match -- the end of the string.
elseif byte(pattern_or_func) == 0x5E then -- ^ local returned return function() if not returned then returned = true return iter(_find(str, pattern_or_func, start, plain)) elseif not final then return iter(_find(str, "$", start, plain)) end end end return function() if not final then return iter(_find(str, pattern_or_func, start, plain)) end end end gsplit = export.gsplit function export.count(str, pattern, plain) if plain then return select(2, gsub(str, pattern_escape(pattern), "")) end local simple = pattern_simplifier(pattern) if simple then return select(2, gsub(str, simple, "")) end return select(2, ugsub(str, pattern, "")) end function export.trim(str, charset, str_lib, plain) if charset == nil then -- "^.*%S" is the fastest trim algorithm except when strings only consist of characters to be trimmed, which are -- very slow due to catastrophic backtracking. gsub with "^%s*" gets around this by trimming such strings to "" -- first. return match(gsub(str, "^%s*", ""), "^.*%S") or "" elseif charset == "" then return str end charset = plain and ("[" .. charset_escape(charset) .. "]") or get_charset(charset) -- The pattern uses a non-greedy quantifier instead of the algorithm used for %s, because negative character sets -- are non-trivial to compute (e.g. "[^^-z]" becomes "[%^_-z]"). Plus, if the ustring library has to be used, there -- would be two callbacks into PHP, which is slower. local pattern = "^" .. charset .. "*(.-)" .. charset .. "*$" if not str_lib then local simple = pattern_simplifier(pattern) if not simple then return umatch(str, pattern) end pattern = simple end return match(str, pattern) end trim = export.trim do local entities local function get_entities() entities, get_entities = load_data("Module:data/entities"), nil return entities end local function decode_entity(hash, x, code) if hash == "" then return (entities or get_entities())[x ..
code] end local cp if x == "" then cp = match(code, "^()%d+$") and tonumber(code) else cp = match(code, "^()%x+$") and tonumber(code, 16) end return cp and (cp <= 0xD7FF or cp >= 0xE000 and cp <= 0x10FFFF) and u(cp) or nil end -- Non-ASCII characters aren't valid in proper HTML named entities, but MediaWiki uses them in some custom aliases -- which have also been included in [[Module:data/entities]]. function export.decode_entities(str) local amp = find(str, "&", nil, true) return amp and find(str, ";", amp, true) and gsub(str, "&(#?)([xX]?)([%w\128-\255]+);", decode_entity) or str end end do local entities local function get_entities() -- Memoized HTML entities (taken from mw.text.lua). entities, get_entities = { ["\""] = "&quot;", ["&"] = "&amp;", ["'"] = "&#039;", ["<"] = "&lt;", [">"] = "&gt;", ["\194\160"] = "&nbsp;", }, nil return entities end local function encode_entity(ch) local entity = (entities or get_entities())[ch] if entity == nil then local cp = codepoint(ch) -- U+D800 to U+DFFF are surrogates, so can't be encoded as entities. entity = cp and (cp <= 0xD7FF or cp >= 0xE000) and format("&#%d;", cp) or false entities[ch] = entity end return entity or nil end function export.encode_entities(str, charset, str_lib, plain) if charset == nil then return (gsub(str, "[\"&'<>\194]\160?", entities or get_entities())) elseif charset == "" then return str end local pattern = plain and ("[" .. charset_escape(charset) .. "]") or charset == "." and charset or get_charset(charset) if not str_lib then local simple = pattern_simplifier(pattern) if not simple then return (ugsub(str, pattern, encode_entity)) end pattern = simple end return (gsub(str, pattern, encode_entity)) end end do local function decode_path(code) return char(tonumber(code, 16)) end local function decode(lead, trail) if lead == "+" or lead == "_" then return " " .. trail elseif #trail == 2 then return decode_path(trail) end return lead .. 
trail end function export.decode_uri(str, enctype) enctype = enctype and upper(enctype) or "QUERY" if enctype == "PATH" then return find(str, "%", nil, true) and gsub(str, "%%(%x%x)", decode_path) or str elseif enctype == "QUERY" then return (find(str, "%", nil, true) or find(str, "+", nil, true)) and gsub(str, "([%%%+])(%x?%x?)", decode) or str elseif enctype == "WIKI" then return (find(str, "%", nil, true) or find(str, "_", nil, true)) and gsub(str, "([%%_])(%x?%x?)", decode) or str end error("bad argument #2 to 'decode_uri' (expected QUERY, PATH, or WIKI)", 2) end end do local function _remove_comments(str, pre) local head = find(str, "<!--", nil, true) if not head then return str end local ret, n = {sub(str, 1, head - 1)}, 1 while true do local loc = find(str, "-->", head + 4, true) if not loc then return pre and concat(ret) or concat(ret) .. sub(str, head) end head = loc + 3 loc = find(str, "<!--", head, true) if not loc then return concat(ret) .. sub(str, head) end n = n + 1 ret[n] = sub(str, head, loc - 1) head = loc end end --[==[ Removes any HTML comments from the input text. `stage` can be one of three options: * {"PRE"} (default) applies the method used by MediaWiki's preprocessor: all {{code|html|<nowiki><!-- ... --></nowiki>}} pairs are removed, as well as any text after an unclosed {{code|html|<nowiki><!--</nowiki>}}. This is generally suitable when parsing raw template or [[mw:Parser extension tags|parser extension tag]] code. (Note, however, that the actual method used by the preprocessor is considerably more complex and differs under certain conditions (e.g. comments inside nowiki tags); if full accuracy is absolutely necessary, use [[Module:template parser]] instead). * {"POST"} applies the method used to generate the final page output once all templates have been expanded: it loops over the text, removing any {{code|html|<nowiki><!-- ... --></nowiki>}} pairs until no more are found (e.g. {{code|html|<nowiki><!-<!-- ... -->- ... 
--></nowiki>}} would be fully removed), but any unclosed {{code|html|<nowiki><!--</nowiki>}} is ignored. This is suitable for handling links embedded in template inputs, where the {"PRE"} method will have already been applied by the native parser. * {"BOTH"} applies {"PRE"} then {"POST"}. ]==] function export.remove_comments(str, stage) if not stage or stage == "PRE" then return _remove_comments(str, true) end local processed = stage == "POST" and _remove_comments(str) or stage == "BOTH" and _remove_comments(str, true) or error("bad argument #2 to 'remove_comments' (expected PRE, POST, or BOTH)", 2) while processed ~= str do str = processed processed = _remove_comments(str) end return str end end do local byte_escapes local function get_byte_escapes() byte_escapes, get_byte_escapes = load_data("Module:string utilities/data").byte_escapes, nil return byte_escapes end local function escape_byte(b) return (byte_escapes or get_byte_escapes())[b] or format("\\%03d", byte(b)) end function export.escape_bytes(str) return (gsub(str, ".", escape_byte)) end end function export.format_fun(str, fun) return (gsub(str, "{(\\?)((\\?)[^{}]*)}", function(p1, name, p2) if #p1 + #p2 == 1 then return name == "op" and "{" or name == "cl" and "}" or error(mw.getCurrentFrame():getTitle() .. " format: unrecognized escape sequence '{\\" .. name .. "}'") elseif fun(name) and type(fun(name)) ~= "string" then error(mw.getCurrentFrame():getTitle() .. " format: \"" .. name .. "\" is a " .. type(fun(name)) .. ", not a string") end return fun(name) or error(mw.getCurrentFrame():getTitle() .. " format: \"" .. name .. "\" not found in table") end)) end format_fun = export.format_fun --[==[ This function, unlike {string.format} and {mw.ustring.format}, takes just two parameters, a format string and a table, and replaces all instances of { {param_name} } in the format string with the table's entry for {param_name}. 
The opening and closing brace characters can be escaped with { {\op} } and { {\cl} }, respectively. A table entry beginning with a backslash can be escaped by doubling the initial backslash. ====Examples==== * {string_utilities.format("{foo} fish, {bar} fish, {baz} fish, {quux} fish", {["foo"]="one", ["bar"]="two", ["baz"]="red", ["quux"]="blue"}) } *: produces: {"one fish, two fish, red fish, blue fish"} * {string_utilities.format("The set {\\op}1, 2, 3{\\cl} contains {\\\\hello} elements.", {["\\hello"]="three"})} *: produces: {"The set {1, 2, 3} contains three elements."} *:* Note that the single and double backslashes should be entered as double and quadruple backslashes when quoted in a literal string. ]==] function export.format(str, tbl) return format_fun(str, function(key) return tbl[key] end) end do local function do_uclcfirst(str, case_func) -- Re-case the first letter. local first, remainder = match(str, "^(.[\128-\191]*)(.*)") return first and (case_func(first) .. remainder) or "" end local function uclcfirst(str, case_func) -- Strip off any HTML tags at the beginning. This currently does not handle comments or <ref>...</ref> -- correctly; it's intended for text wrapped in <span> or the like, as happens when passing text through -- [[Module:links]]. local html_at_beginning = nil if str:match("^<") then while true do local html_tag, rest = str:match("^(<.->)(.*)$") if not html_tag then break end if not html_at_beginning then html_at_beginning = {} end insert(html_at_beginning, html_tag) str = rest end end -- If there's a link at the beginning, re-case the first letter of the -- link text. This pattern matches both piped and unpiped links. -- If the link is not piped, the second capture (linktext) will be empty. local link, linktext, remainder = match(str, "^%[%[([^|%]]+)%|?(.-)%]%](.*)$") local retval if link then retval = "[[" .. link .. "|" .. do_uclcfirst(linktext ~= "" and linktext or link, case_func) .. "]]" ..
remainder else retval = do_uclcfirst(str, case_func) end if html_at_beginning then retval = concat(html_at_beginning) .. retval end return retval end --[==[ Uppercase the first character of the input string, correctly handling one-part and two-part links, optionally surrounded by HTML tags such as `<nowiki><span>...</span></nowiki>`, possibly nested. Intended to correctly uppercase the first character of text that may include links that have been passed through `full_link()` in [[Module:links]] or a similar function. ]==] function export.ucfirst(str) return uclcfirst(str, uupper) end ucfirst = export.ucfirst --[==[ Lowercase the first character of the input string, correctly handling one-part and two-part links, optionally surrounded by HTML tags such as `<nowiki><span>...</span></nowiki>`, possibly nested. Intended to correctly lowercase the first character of text that may include links that have been passed through `full_link()` in [[Module:links]] or a similar function. ]==] function export.lcfirst(str) return uclcfirst(str, ulower) end --[==[Capitalizes each word of the input string. WARNING: May be broken in the presence of multiword links.]==] function export.capitalize(str) -- Capitalize multi-word that is separated by spaces -- by uppercasing the first letter of each part. return (ugsub(str, "%w+", ucfirst)) end local function do_title_case(first, remainder) first = uupper(first) return remainder == "" and first or (first .. ulower(remainder)) end --[==[ Capitalizes each word of the input string, with any further letters in each word being converted to lowercase. ]==] function export.title_case(str) return str == "" and "" or ugsub(str, "(%w)(%w*)", do_title_case) end title_case = export.title_case --[==[ Converts the input string to {{w|Camel case|CamelCase}}. Any non-word characters are treated as breaks between words. If `lower_first` is set, then the first character of the string will be lowercase (e.g. camelCase). 
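The conversion described above can be sketched with the standard string library alone. This is an ASCII-only illustration, not the module's code: `title_case_sketch` and `camel_case_sketch` are hypothetical names, and `string.gsub`, `string.upper` and `string.lower` stand in for the ustring functions the module relies on.

```lua
-- Hypothetical helper: uppercase the first word character, lowercase the rest.
local function title_case_sketch(word)
	return (word:gsub("(%w)(%w*)", function(first, rest)
		return first:upper() .. rest:lower()
	end))
end

-- Hypothetical helper mirroring the described behaviour for ASCII input only.
local function camel_case_sketch(str, lower_first)
	-- Each run of non-word characters is dropped and the word after it is title-cased.
	str = str:gsub("%W*(%w*)", title_case_sketch)
	if lower_first then
		-- Lowercase only the first byte, which is safe for ASCII input.
		return str:sub(1, 1):lower() .. str:sub(2)
	end
	return str
end

print(camel_case_sketch("hello world_foo"))       -- HelloWorldFoo
print(camel_case_sketch("hello world_foo", true)) -- helloWorldFoo
```

The underscore counts as a break here because `%w` covers only letters and digits. Input containing non-ASCII letters would need the ustring library, as in the module itself.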
]==] function export.camel_case(str, lower_first) str = ugsub(str, "%W*(%w*)", title_case) return lower_first and do_uclcfirst(str, ulower) or str end end do local function do_snake_case(nonword, word) return nonword == "" and word or "_" .. word end --[==[ Converts the input string to {{w|Snake case|snake_case}}. Any non-word characters are treated as breaks between words. ]==] function export.snake_case(str) return (ugsub(str, "(%W*)(%w*)", do_snake_case)) end end return export ndjfpg4lokleft04xzto19wigx7fczh Module:memoize 828 9526 31791 2026-05-01T09:45:26Z آیات محراج 3545 Created page with "local math_module = "Module:math" local table_pack_module = "Module:table/pack" local require = require local select = select local unpack = unpack or table.unpack -- Lua 5.2 compatibility -- table.pack: in Lua 5.2+, this is a function that wraps the parameters given -- into a table with the additional key `n` that contains the total number of -- parameters given. This is not available on Lua 5.1, so [[Module:table/pack]] -- provides the same functionality. local funct..." 31791 Scribunto text/plain local math_module = "Module:math" local table_pack_module = "Module:table/pack" local require = require local select = select local unpack = unpack or table.unpack -- Lua 5.2 compatibility -- table.pack: in Lua 5.2+, this is a function that wraps the parameters given -- into a table with the additional key `n` that contains the total number of -- parameters given. This is not available on Lua 5.1, so [[Module:table/pack]] -- provides the same functionality. local function pack(...) pack = require(table_pack_module) return pack(...) end local function sign(...) sign = require(math_module).sign return sign(...) end ----- M E M O I Z A T I O N----- -- Memoizes a function or callable table. -- Supports any number of arguments and return values. 
-- If the optional parameter `simple` is set, then the memoizer will use a faster implementation, but this is only compatible with one argument and one return value. If `simple` is set, additional arguments will be accepted, but this should only be done if those arguments will always be the same. -- Sentinels. local _nil, neg_0, pos_nan, neg_nan = {}, {}, {}, {} -- Certain values can't be used as table keys, so they require sentinels as well: e.g. f("foo", nil, "bar") would be memoized at memo["foo"][_nil]["bar"][memo]. These values are: -- nil. -- -0, which is equivalent to 0 in most situations, but becomes "-0" on conversion to string; it also behaves differently in some operations (e.g. 1/a evaluates to inf if a is 0, but -inf if a is -0). -- NaN and -NaN, which are the only values for which n == n is false; they only seem to differ on conversion to string ("nan" and "-nan"). local function get_key(x) if x == x then return x == nil and _nil or x == 0 and 1 / x < 0 and neg_0 or x end return sign(x) == 1 and pos_nan or neg_nan end -- Return values are memoized as tables of return values, which are looked up using each input argument as a key, followed by `memo`. e.g. if the input arguments were (1, 2, 3), the memo would be located at t[1][2][3][memo]. `memo` is always used as the final lookup key so that (for example) the memo for f(1, 2, 3), f[1][2][3][memo], doesn't interfere with the memo for f(1, 2), f[1][2][memo]. local function get_memo(memo, n, nargs, key, ...) key = get_key(key) local next_memo = memo[key] if next_memo == nil then next_memo = {} memo[key] = next_memo end memo = next_memo return n == nargs and memo or get_memo(memo, n + 1, nargs, ...) end -- Used to catch the function output values instead of using a table directly, -- since pack() returns a table with the key `n`, giving the number of return -- values, even if they are nil. This ensures that any nil return values after -- the last non-nil value will always be present (e.g. 
pack() gives {n = 0}, -- pack(nil) gives {n = 1}, pack(nil, "foo", nil) gives {[2] = "foo", n = 3} -- etc.). The distinction between nil and nothing affects some native functions -- (e.g. tostring() throws an error, but tostring(nil) returns "nil"), so it -- needs to be reconstructable from the memo. local function memoize_then_return(memo, _memo, ...) _memo[memo] = pack(...) return ... end return function(func, simple) local memo = {} if simple then return function(...) local key = get_key((...)) local output = memo[key] if output == nil then output = func(...) memo[key] = output == nil and _nil or output return output elseif output == _nil then return nil end return output end end return function(...) local nargs = select("#", ...) -- Since all possible inputs need to be memoized (including true, false -- and nil), the memo table itself is used as a sentinel to ensure that -- the table of arguments will always have a unique key. local _memo = nargs == 0 and memo or get_memo(memo, 1, nargs, ...) local output = _memo[memo] -- If get_memo() returned nil, call `func` with the arguments and catch -- the output with memoize_then_return(); this packs the return values -- into a table to memoize them, then returns them. Since the return -- values are available to it as `...`, this avoids the need to call -- unpack() on the memoized table on the first call, as they can be -- returned directly. if output == nil then return memoize_then_return(memo, _memo, func(...)) end -- Unpack from 1 to the original number of return values (memoized at -- key `n`); unpack() returns nil for any values not in output. 
return unpack(output, 1, output.n) end end 8gmyvbgmtmwh2pu7neciscs7liw1v30 Jammu and Kashmir 0 9527 31794 2026-05-01T10:00:54Z آیات محراج 3545 Created page with "==اَنٛگریٖزی== ===ناوُک آگُر=== ===وۄشژار=== [[فَیِل:LL-Q1860 (eng)-Sumxr-Jammu and Kashmir.wav]] ===ناوُت=== # [[جۆم تہٕ کٔشیٖر]]" 31794 wikitext text/x-wiki ==اَنٛگریٖزی== ===ناوُک آگُر=== ===وۄشژار=== [[فَیِل:LL-Q1860 (eng)-Sumxr-Jammu and Kashmir.wav]] ===ناوُت=== # [[جۆم تہٕ کٔشیٖر]] s2r29pm3gdfn1gm3pbxdnza2bnmth6p 31802 31794 2026-05-01T10:45:12Z آیات محراج 3545 31802 wikitext text/x-wiki ==اَنٛگریٖزی== {{wp|lang=en}} ===ناوُک آگُر=== ===وۄشژار=== [[فَیِل:LL-Q1860 (eng)-Sumxr-Jammu and Kashmir.wav]] ===ناوُت=== # [[جۆم تہٕ کٔشیٖر]] perxstyyzvw6rrh9ifi30gky34z3z15 فرما:wp 10 9528 31796 2026-05-01T10:09:57Z آیات محراج 3545 Created page with "<div style="float:right;border: 1px solid #aaa;background-color: #f9f9f9;color: black;margin: 0.5em 0 0.5em 1em;padding: 0.2em;float: right;clear: right;text-align: left;font-size: 88%;line-height: 1.5em;width: 20em;font-size: 90%;"> <div style="float: left;">[[File:Wikipedia-logo-v2.svg|49px|وِکیٖپیٖڈیا]]</div> <div style="margin-left: 60px;">'''[[:w:{{{1|Special:Search/{{lc:{{PAGENAME}}}}}}}|{{{2|{{{1|{{PAGENAME}}}}}}}}]]''' خٲطرٕ وِکیٖپیٖڈی..." 
31796 wikitext text/x-wiki <div style="float:right;border: 1px solid #aaa;background-color: #f9f9f9;color: black;margin: 0.5em 0 0.5em 1em;padding: 0.2em;float: right;clear: right;text-align: left;font-size: 88%;line-height: 1.5em;width: 20em;font-size: 90%;"> <div style="float: left;">[[File:Wikipedia-logo-v2.svg|49px|وِکیٖپیٖڈیا]]</div> <div style="margin-left: 60px;">'''[[:w:{{{1|Special:Search/{{lc:{{PAGENAME}}}}}}}|{{{2|{{{1|{{PAGENAME}}}}}}}}]]''' خٲطرٕ [[وِکیٖپیٖڈیا]],<br /> اَکھ آزاد اینسایکلوپیٖڈیا مَنٛز ؤچھِو</div> </div><noinclude> [[Category:Interwiki templates]] </noinclude> q9zscg3y5ifm9kzeu46ol5rb4kmz8ci 31798 31796 2026-05-01T10:14:57Z آیات محراج 3545 /* */ 31798 wikitext text/x-wiki <div style="float:right;border: 1px solid #aaa;background-color: #f9f9f9;color: black;margin: 0.5em 0 0.5em 1em;padding: 0.2em;float: right;clear: right;text-align: left;font-size: 88%;line-height: 1.5em;width: 20em;font-size: 90%;"> <div style="float: left;">[[File:Wikipedia-logo-v2.svg|49px|وِکیٖپیٖڈیا]]</div> <div style="margin-left: 60px;">'''[[:w:{{{1|Special:Search/{{lc:{{PAGENAME}}}}}}}|{{{2|{{{1|{{PAGENAME}}}}}}}}]]''' خٲطرٕ ؤچھِو [[وِکیٖپیٖڈیا]],<br /> اَکھ آزاد اینسایکلوپیٖڈیا</div> </div><noinclude> [[Category:Interwiki templates]] </noinclude> 0ttbj7iwnvi7ykdrnogjy52nf7wo6cb 31801 31798 2026-05-01T10:44:24Z آیات محراج 3545 /* */ 31801 wikitext text/x-wiki <div style="float:right;border: 1px solid #aaa;background-color: #f9f9f9;color: black;margin: 0.5em 0 0.5em 1em;padding: 0.2em;float: right;clear: right;text-align: left;font-size: 88%;line-height: 1.5em;width: 20em;font-size: 90%;"> <div style="float: left;">[[File:Wikipedia-logo-v2.svg|49px|وِکیٖپیٖڈیا]]</div> <div style="margin-left: 60px;">'''[[:w:{{#if:{{{lang|}}}|{{{lang}}}:}}{{{1|Special:Search/{{lc:{{PAGENAME}}}}}}}|{{{2|{{{1|{{PAGENAME}}}}}}}}]]''' خٲطرٕ ؤچھِو [[وِکیٖپیٖڈیا]],<br /> اَکھ آزاد اینسایکلوپیٖڈیا</div> </div><noinclude> [[Category:Interwiki templates]] </noinclude> 
emt4cji178sn4ll2dij38v40cpxlw9y وِکیٖپیٖڈیا 0 9529 31803 2026-05-01T10:55:32Z آیات محراج 3545 Created page with "==کٲشُر== ===ناوُک آگُر=== {{bor|ks|en|Wikipedia}}۔ ===وۄشژار=== {{ks-noun|ipa=vikīpīḍyā}} ===ناوُت=== # مُفت موادَس پؠٹھ مبنی اکھ آن لاین اینسایکلوپیٖڈیا یُس 2001 منٛز شۆروٗع گوٚو، تہٕ چھُ واریاہَن زبانن منٛز دٔستِیاب۔" 31803 wikitext text/x-wiki ==کٲشُر== ===ناوُک آگُر=== {{bor|ks|en|Wikipedia}}۔ ===وۄشژار=== {{ks-noun|ipa=vikīpīḍyā}} ===ناوُت=== # مُفت موادَس پؠٹھ مبنی اکھ آن لاین اینسایکلوپیٖڈیا یُس 2001 منٛز شۆروٗع گوٚو، تہٕ چھُ واریاہَن زبانن منٛز دٔستِیاب۔ an6i13ra0j7kcaw8icvdrcp5rkmq6i6 31804 31803 2026-05-01T10:56:35Z آیات محراج 3545 /* کٲشُر */ 31804 wikitext text/x-wiki ==کٲشُر== {{wp}} ===ناوُک آگُر=== {{bor|ks|en|Wikipedia}}۔ ===وۄشژار=== {{ks-noun|ipa=vikīpīḍyā}} ===ناوُت=== # مُفت موادَس پؠٹھ مبنی اکھ آن لاین اینسایکلوپیٖڈیا یُس 2001 منٛز شۆروٗع گوٚو، تہٕ چھُ واریاہَن زبانن منٛز دٔستِیاب۔ hqw0mtq82xkmuv33a66jn6h894pj9o6