Language proposal policy/SabineCretella
This page is no longer maintained and may be outdated: moved to m:Special projects subcommittees/Languages/Policy/SabineCretella.
Language proposal policy (SabineCretella proposal)
This is the draft policy for processing proposals for new language subdomains of existing projects. These are currently processed under a policy drafted by the community.
SabineCretella wrote this proposal on 06 June 2006.
New languages in general
The creation of new languages cannot at the moment be considered objective. Too many people simply vote on the basis of opinion rather than proven facts.
When can a language be considered a language? That is hard to say, because even when relying on Ethnologue or ISO 639-3 (see the note below) we cannot be sure that every language has been recognized as such. One example is Griko Salentino, which has two varieties that differ considerably in terminology (though of course they also have many things in common). Griko Salentino is a language derived from Greek and spoken in the south of Italy, in Puglia and Calabria. The Italian State considers it a minority language. So this is one of those cases where a language exists but we do not have a code for it. How would such a discussion proceed?
Anyway: to prove that a language is a language, certain requirements must be met in order to get an ISO 639-3 code.
Furthermore, there should also be some internal requirements, such as a completed Swadesh list on Wiktionary/WiktionaryZ (so that we can compare with other existing languages). We need a certain number of editors to make sure the project can survive (5 dedicated people are not enough: out of ten who say they want to contribute, normally only three really do or can). We should have at least some texts in that language before starting any new language (see Wikisource); a certain number of quotes, proverbs etc. would also serve.
ISO 639-3
To say whether a language is a language we normally cite the ISO 639 code. I would go further: for all new languages I would apply for an ISO 639-3 code, because it is much better and more detailed, even if not exhaustive.
To get an ISO 639-3 code, certain conditions must be met, for example a certain number of documents. I really do not know them all, but it would make sense that people who want their language to be recognized as a language have to request an ISO 639-3 code from ISO. Once these conditions are met, the language is a fact and can be considered as such.
There is already one contact at the standardization organization, and this person would also co-operate. The thing is: if we go that way, people who want new languages to be created must be really committed. Well, this should not be a problem, right? Otherwise the new project would not succeed either.
Wikipedia
Normally, and that is somewhat strange, a new language is nearly always about creating a new Wikipedia. Wikipedia has very specific requirements, and these cannot be met by only a handful of editors. Well, yes, if this handful of editors worked only on Wikipedia, including during the hours when other people are at work, they could reach remarkable results, but most often that is not possible. Now there is the requirement of 5 native speakers that we have had up to now: I say it is not enough. Take nap.wikipedia: we had quite strong support and approx. 10 people who wanted to contribute. Now what does contributing mean: writing articles, or also running bots? When it comes to writing articles, out of these 10 people we have 3 or 4 regular contributors. Now there is one fewer: owing to private problems I can still keep an eye on the project, but I had to reduce my workload and therefore had to choose where to spend my time, namely where it is of most use to the community. Every now and then I edit on nap.wikipedia, but if it were not for E. abu Filumena and GENNYSAR, the Neapolitan Wikipedia would be much poorer in content.
This is a project that I continuously do marketing for (mainly outside the Wikimedia community), and I care more about talking about the Neapolitan Wikipedia than about editing myself, since it is better for the community if I can get a new person to edit every now and then. We need different people, we need to discuss, we need a way to exchange opinions etc. Imagine what would happen if GENNYSAR and E. abu Filumena also had some private problems: the project would die or at least slow down a lot. So we need at least 5 to 10 really active members.
Getting these 5 to 10 real editors is problematic when your wiki is somewhat isolated from the others, connected only through interwiki links, and when there is no real exchange among these small projects, which have more or less the same basic problems.
In some time we will have the Multilingual Mediawiki. Considering the experience I gained with the creation of nap.wikipedia, I would say that we must give people the possibility to create content and to move ahead, to attract other people, not only from within the Wikimedia community but also from outside. This means they need to be able to show something. I can imagine a Multilingual Mediawiki where all the languages/dialects can create their portal and their pages, so that even a single person can start to edit and be joined by other people step by step. On that kind of wiki this person would have fewer problems, because there is less admin work to do. Not all help pages must be translated there; the portal must give links to meeting places where the project is discussed, plus information on whom to contact in case of questions or requests. In this way we can see whether it is a serious attempt to create a new project, and people wanting to work on it can do so without having to feel that others object to a project that could become valuable. It is then up to the editors to prove that they are working on it seriously, and even if such a project should have a longer pause, it will just be there waiting for the next person to come and take it up.
Babel templates based on ISO 639-3 should be a requirement, so that we can tell which editors are native speakers and which are near-native; this will help us see how many people are actually working on a language. I refer to ISO 639-3 since it is the only logical way to go: it is the only one where new languages can be requested. When a language code does not exist, we could ask people to apply for the code before the Wikipedia goes public; in that way we make sure that we are talking about a language and not a dialect. As long as the code is not there, the Wikipedia content simply remains on the multilingual pages, which are a kind of “trial” but must have serious content and not just experiments with how a wiki works.
Another thing to consider: we will soon have Wikidata at our disposal. This means that much content can be there automatically: names of cities, countries, distances, geocodes ... everything that is common to all Wikipedias can be in Wikidata. The calendar, too, can be held in a relational database; connected to a database structure similar to that of WiktionaryZ, it would allow new relevant dates of events, and the dates when people were born and died, to be inserted. This means that we only have to fill in a date once and it is available to all, and we only translate the description.
We have to consider all these things because they are on their way, and people can, for example, work on the translation of the calendar events and the structural data before the actual Wikipedia is created. I know that all this goes further than new language creation alone, but it has to be considered because it will be there. It does not make sense to change policy without considering these points.
Languages - and the people speaking them
On the page Proposals for closing projects I just read an interesting note (today: 20 June 2006). How do we make sure that people really speak that language and do not fake speaking it? How can we check that? I suppose that people proposing new projects should be known to the community, and since they are known, others should know whether they write that language or not. I do not have a clue right now how we can make sure of this point. I suppose that for 95% of language requests there will be no doubts, but what about those where we do have doubts? Hmmmm ...
Wiktionary/WiktionaryZ
When it comes to Wiktionary/WiktionaryZ I personally insist on ISO 639-3 codes. The reason is simple: it covers more languages and takes macrolanguages into account. That is even more important when it comes to lexicological data. Of course we can upload collections, but for a new language that has no collection the first thing to complete will be the Swadesh list. It is “only” 208 terms (or was it 207 ...?). It seems easy, but it is not: compiling it is often underestimated because words often have multiple meanings. The Swadesh list, as already said above, will help us to understand the differences. The number of general words needed to communicate in a language is approx. 2000.
It does not make sense to create a new Wiktionary; it makes sense to integrate into WiktionaryZ. I know that no decision has been taken on its integration into the Wikimedia Foundation. Being one of the creators, I, like all the other members of the WiktionaryZ committee, want it inside the Wikimedia Foundation. Should it for some reason turn out that this is not possible right now and that integration will take some more time, it makes sense to co-operate anyway. The reason is the same: new wikis are mainly about rare languages, so there is no huge user base that can take care of them. See the Sicilian Wiktionary: the community wanted it (it has approx. 6000 terms, mainly uploaded by bot). At the time of creation I would have preferred to have the content in the Italian Wiktionary, because it would have meant less admin work (now I check every now and then whether there is any vandalism), the language would have had more exposure to other people, and therefore, I suppose, more people would have worked on it. But the community wanted it differently. This means that it does not make much difference for the “big” languages whether they are on Wiktionary or WiktionaryZ, but it makes a lot of difference for the rare languages. When you are a member of a huge community you work better and have more fun than when you are confined to a small project.
Wikisource
Wikisource has problems similar to those of Wiktionary/WiktionaryZ. Here too I would propose a collective Multilingual Mediawiki installation for the rare languages. Content for rare languages is (normally) not plentiful, and therefore administration work would exceed work on content. At the beginning Wikisource was a project that held all texts in common. It makes sense to have separate projects for languages that are big enough in content. Here too, it makes more sense to collect content and, when it reaches a certain limit, export it to a Wikisource of its own.
Wikibooks
If Wikipedia has problems getting a large enough user base in rare languages, how difficult would that be for Wikibooks? Here too: see Wikisource.
Wikiquote
That is a question of its own. We should deal with Wikiquote separately, simply because it makes sense to consider it from different points of view: not only the collection of quotes, but also the re-usability of the data.
Wikinews
Well ... here too, an initial solution would be “one central place for rare languages”. I asked one of the people on the Italian Wikinews whether we could get a section for the Italian regional languages; I did not get an answer. I believe that rare languages will have news only every now and then, and that it would be about very local events: in Neapolitan, for example, a note about Cartoons on the Bay, which takes place every year in Amalfi, or about the Via Crucis in the various towns at Easter, or about the particular Neapolitan cribs and the feasts around them in the Christmas period. A separate wiki for each language, here too, would be too much admin work. The people working on the various projects, particularly in these rare languages, are mainly the same; they only have different content that needs a place where it can be stored. One shared wiki for all those small communities together also means that they can talk “across projects”, share experiences, and grow a community more easily. A Neapolitan speaker can, for example, understand most Sicilian texts and a large part of the Piemontese texts, and so can read them even without being able to speak or write those languages. It is like me reading Spanish: I can understand it quite well, but I cannot write it or really speak it.
Additional Notes
All these are things that come to mind considering what we have, what we need, and the difficulties I personally know. Please let me know about your doubts and experiences, your thoughts and also your personal feelings. Knowing them will help us with our very difficult task of finding a proper way to create new projects and of determining how we are going to decide that a new language is a language. I myself would opt for the ISO 639-3 way, co-operating closely with them. In that way, people who want a new language, one that is not yet considered to be a language, must prove that it is one. Of course we must also find out how to deal with languages that have no script at all, and how to get all that into our projects, because according to a study one language on earth dies every week. Here we should consider Commons as well, as a repository of sound files of people actually speaking the language, of photographic collections that show how these people live, etc. It will preserve a part of the culture of mankind for the future. It is relevant; it is important to do it.
My thoughts go much further, but for now we have to consider all these current necessities and problems in order to reach the goal. We should bear in mind that many people who now object to languages do so without knowing the language, often also because of politics. What we do within the Wikimedia Foundation must stay away from “personal opinions” and “national politics”: we must be objective and give anybody a chance. People who want to do something must show that they really want to do it. So creating basic requirements, such as the request for an ISO 639 code, proof that a language exists, and close co-operation with ISO on languages and on policies for deciding what actually is a language (without excluding those that have only an oral tradition up to now), is a very objective way to do things, even if it will often turn out to be a harder route to the creation of a new language than today's.
Where languages are spoken etc.
I am getting sick of explaining for maybe the hundredth time that nap=Neapolitan does not refer only to the language spoken in Naples, but to a huge area (including parts of Abruzzo, Puglia etc.), following the definition of Rohlfs. nap is really more than just one language: in the territory considered you can find other languages as well as dialects of Neapolitan and dialects of these other languages. People do not seem to be willing to understand this, and quite often discredit nap on that basis. It is often more of a political thing, but I cannot help that. So one thing would be fundamental: to have pages like those on Ethnologue, but in a wiki format where people can add information, something that could be a starting point for understanding what needs to be a separate language and what definitely is a dialect; where people can add information on whether certain requirements for "speaking a different language" are met (requirements set by ISO, but maybe also other information that proves that they exist). Wikistandards?--Sabine 07:13, 6 June 2006 (UTC)