Romani is a language divided into several varieties. It belongs to the Indo-Aryan branch of the Indo-European language family and is a direct successor to a dialect that must have been closely related, though not necessarily identical, to the vernacular basis of Sanskrit. Its varieties share features with both Central Indian and Northwest Indian languages. The linguistic evidence suggests that Romani first took part in an early development of the Central Indian languages and then, over a longer period, in the development of Northwest Indian languages such as Sindhi. It is therefore assumed that the speakers of Romani at that time came from central India and shifted their areas of settlement to northwest India from the 3rd century BC onwards. There is no agreement on the date of the further migration westward, but it can be placed between the 5th and 10th centuries, and several migration movements within this period should probably be assumed.
Romani dialects have thus developed independently of other Indian languages for more than 800 years, including at least 700 years in Europe. After its speakers arrived in Europe, Romani came under the influence of the Balkan languages, particularly in vocabulary and syntax, and above all under that of the Medieval Greek of the Byzantine period.
Older classifications assumed that Romani had divided into three main variants even before its arrival in Europe: Romani, which came to Europe in the 13th century, Domari in the Middle East and North Africa, and Lomavren in Armenia. Today, however, researchers assume that Romani and Lomavren are only distantly related and that Domari is an independent language that arrived in West Asia from India as early as the 7th century.
A long-standing categorization divided the dialects into Vlax (derived from the word Wallachian) and non-Vlax. According to this scheme, the Vlax were those Roma who lived in slavery for many centuries on the territory of Romania (Wallachia). The main distinguishing feature between the two groups was the degree of Romanian influence on their vocabulary. Vlax-speaking groups accounted for the largest number of speakers. Bernard Gilliat-Smith was the first to make this distinction and coined the term Vlax in 1915 in the book The Report on the Gypsy tribes of North East Bulgaria.
In recent decades, a number of scholars have undertaken a linguistic categorization of Romani dialects based on historical development and isoglosses. Much of this work has been done by the Bochum linguist Norbert Boretzky, who pioneered the systematic representation of structural features of Romani dialects on geographical maps. This work culminated in an Atlas of Romani Dialects, co-authored by Birgit Igla and published in 2005, which maps numerous isoglosses. At the University of Manchester, comparable work has been done by the linguist and former Romani rights activist Yaron Matras and his colleagues. Together with Viktor Elšík (now of Charles University, Prague), Matras built the Romani Morpho-Syntax Database, which is currently the largest compilation of data on dialects of Romani. Parts of this database can be accessed online via the Manchester Romani Project website.
Matras (2002, 2005) advocated a theory of geographical classification of Romani dialects based on the spatial distribution of innovations. According to this theory, Early Romani (as it was spoken in the Byzantine Empire) was brought to western and other parts of Europe by population migrations of Roma in the 14th and 15th centuries. These groups settled in various European regions during the 16th and 17th centuries and acquired competence in a variety of contact languages. Changes then set in that spread in wave-like patterns, producing the differences between individual dialects that can be seen today. According to Matras, there were two main centers of innovation: some changes appeared in Western Europe (Germany and the surrounding area) and spread eastward; other changes appeared in the Wallachian (Vlax) area and spread westward and southward. In addition, many regional and local isoglosses formed, creating a complex wave of language boundaries.
Matras points to the prothesis of j- in aro > jaro 'egg' and ov > jov 'he' as typical examples of west-to-east diffusion, and to the addition of prothetic a- in bijav > abijav as a typical example of east-to-west diffusion. His conclusion is that the differences formed in situ and not as a result of different waves of migration.
The rough classification follows Boretzky; the more precise classification follows the above-mentioned study by Matras (KS, from German Kontaktsprache, = main contact language):
- Northern Romani dialects in Northern, Western and Southern Europe, most of Poland, Russia and the Baltic States:
- Western Branch:
- Piedmontese Sinti in Italy (KS: Italian)
- Sinti-Romani (Sintitikes) in Germany (formerly also in Bohemia), Austria, Netherlands, Belgium (KS: German); not to be confused with Sinti-Manouche (which is a Para-Romani variety).
- Welsh-Romani (KS: English, Welsh), extinct since the middle of the 20th century
- Central Branch:
- Bergitka-Romani in Poland (KS: Polish)
- Čerhari-Romani in Romania (KS: Romanian)
- Northeastern dialects (Balto-Slavic dialects in the Baltic States)
- Čuxny-Romani in Estonia (KS: Estonian, Russian)
- Finnish Kalo (Fíntika Rómma) (KS: Finnish)
- Latvian Romani (Lotfika) in Latvia and Russia
- Lithuanian Romani in Lithuania and Baltic Russia
- Northern Russian Romani (Xaladitka) in Baltic Russia, spoken by the Ruska Roma people
- Polish Romani in Poland (KS: Polish)
- North Central Romani:
- Western Subbranch:
- Bohemian Romani
- Moravian Romani
- West Slovak Romani (KS: Slovak)
- Eastern Subbranch:
- Central Slovak Romani
- Eastern Slovak Romani (KS: Slovak)
- Ruthenian Romani
- Southern Polish Romani
- Southern Central Dialects
- Romungro-Romani in Slovakia (KS: Slovak) and Hungary (KS: Hungarian)
- Vend-Romani
- Burgenland Romani ('Roman') in Burgenland, Austria
- Hungarian Vend-Romani
- Prekmurski-Romani in northern Slovenia (KS: Slovenian)
- Southern Balkans I (northern branch of Balkan dialects, also called zis dialects):
- Southern Balkans II (Southern branch of the Balkan dialects)
- Arli-Romani in Southern Serbia and Montenegro (KS: Serbian), North Macedonia (KS: Macedonian) and Northern Greece (KS: Greek)
- Cocomanya-Romani in Bulgaria (KS: Bulgarian)
- Crimean Romani in Russia (KS: Russian, Tatar)
- Džambazi-Romani in North Macedonia
- Erli-Romani (Yerli) in Bulgaria (KS: Bulgarian)
- Gurvari-Romani in Hungary
- Romacilikanes in Greece (KS: Greek)
- Romani in the Rumelia region between Greece and Turkey (KS: Greek, Turkish)
- Sepeči-Romani in Greece (KS: Turkish)
- Sepečides-Romani in Volos (Greece) and Izmir (Turkey)
- Sofades-Romani in Greece (KS: Greek), spoken by the Sofades-Romani people.
- Ursari-Romani in Romania (KS: Romanian), spoken by the Ursari people.
- Iranian branch (Zargari-Romani in Iran) (KS: Farsi)
- Vlax-Romani:
- North Branch (also called Vlax I):
- Čekeši-Romani in Russia (KS: Russian, Moldovan)
- Kalderash-Romani in Romania (KS: Romanian), spoken by the Kalderash
- Lovari-Romani in the Czech Republic (KS: Czech) and Hungary (KS: Hungarian), spoken by the Lovari people
- Mačvaja-Romani
- Northern Ukrainian Romani in Ukraine (KS: Ukrainian)
- South branch (also called Vlax II):
- Agia-Varvara-Romani in Greece (KS: Greek)
- Gurbet-Romani in Serbia (KS: Serbian)
- Gurbet-Rabešte in Serbia and Montenegro (KS: Serbian)
- Kalburdžu-Romani in Bulgaria (KS: Bulgarian, Turkish)
- Moldavian Romani in Moldova (KS: Moldovan, Russian)
- Prizren-Romani in Serbia (KS: Serbian, Albanian)
- Rakarengo-Romani in Moldova (KS: Moldovan)
- Thracian Kalajdži-Romani (Vlaxurja) in Bulgaria (KS: Bulgarian)
To be distinguished from these varieties are the so-called Para-Romani languages, such as the English Anglo-Romani, the Scandinavian Romani rakripa, the Spanish Caló and the Basque Erromintxela. In these, not only the vocabulary but also the syntax and morphology are dominated by one of the contact languages, and they are therefore classified as variants of that contact language.