Brazilian Portuguese Lexicon - LexPorBR

Credits

Brazilian Portuguese Lexicon

The Brazilian Portuguese Lexicon was born from an idea written on a Post-it note at the beginning of my Ph.D in Lyon, France. My Ph.D is in psycholinguistics, and I investigate the morphological processing of verbal inflection in Brazilian Portuguese, in French, and in bilinguals with Brazilian Portuguese as their first language and French as their foreign language. I started the French experiments in 2012, selecting the stimuli from the Lexique corpus, which offers psycholinguistic and metalinguistic information for word selection (form frequency, word length, number of neighbors, reversed form, CVCV structure, etc.). In 2013, while preparing the experiments in Brazilian Portuguese, I realized that no psycholinguistic corpus of Brazilian Portuguese existed. During that search, I found the Linguateca website, which offers a number of Portuguese corpora, such as the NILC, and it was at that moment that I wrote "do the Brazilian Portuguese Lexicon" on a Post-it note.

Construction of the Brazilian Portuguese Lexicon

In 2014, with the concepts already organized, I began the construction of the Brazilian Portuguese Lexicon, which was divided into four stages: 1) construction of the corpus with the words and their psycholinguistic and metalinguistic information, 2) development of the web page in HTML as the interface between the user and the database, 3) implementation of the corpus in a MySQL database on a web server, and 4) programming in PHP of the functionality behind the resources and tools of the Brazilian Portuguese Lexicon. I then created the other pages, such as downloads, tools, updates, and credits. Currently, I am working on the page with the engines for generating Brazilian Portuguese pseudowords, as well as on the statistical linguistics page.

Alpha Version

The Brazilian Portuguese Lexicon was planned in three versions: 1) Alpha (2014), 2) Beta (2015), and 3) Delta (2016). The current Alpha version was launched on 03.25.2014, marking the birth of the Brazilian Portuguese Lexicon and the creation of the first psycholinguistic corpus of Brazilian Portuguese. The Alpha version is an orthographic corpus with information computed from the orthographic forms of Brazilian Portuguese words. The Beta version will provide information about: a) phonological form, b) syllables, and c) lemma. Finally, the Delta version will offer: a) morphological information, b) allomorphic information, c) word acquisition information, d) a word pronunciation engine, and, as far as possible, e) reaction time measures for the recognition of a large number of Brazilian Portuguese words and pseudowords, following the example of the Lexicon Projects.

Author

The Brazilian Portuguese Lexicon is being developed by Gustavo Lopez Estivalet during his Ph.D, financed between 2012 and 2016 by a scholarship from the Science without Borders (CsF) program of the National Council for Scientific and Technological Development (CNPq), Brazil. Gustavo Lopez Estivalet is pursuing his Ph.D in Lyon, France, at the Université Claude Bernard Lyon 1 (UCBL), in the Doctoral School of Neurosciences et Cognition (ED NSCo), under the supervision of Prof. Dr. Fanny Meunier. He works in the Laboratory on Language, Brain, and Cognition (L2C2), located at the Institute of Cognitive Sciences (ISC), which are funded by the National Center for Scientific Research (CNRS).

NILC/São Carlos Corpus

The Brazilian Portuguese Lexicon was developed from the corpus of the Interinstitutional Center for Computational Linguistics (NILC), based at the Institute of Mathematics and Computer Sciences (ICMC) of the University of São Paulo in the city of São Carlos (USP/São Carlos). The NILC corpus is openly and freely available on the Linguateca website, in the Access to corpora/Provision of corpora (AC/DC) section, under the heading NILC/São Carlos. Its structure and quantitative data are found on the Linguateca website under the heading NILC/São Carlos corpus description, and its full description under the heading NILC/São Carlos corpus offspring. The 13 NILC files (6 form files: adjectives, adverbs, grammatical, nouns, numerals, and verbs; and 7 lemma files: adjectives, adverbs, grammatical, nouns, proper names, numerals, and verbs) were downloaded in .txt format from the Linguateca website under the headings NILC/São Carlos corpus form lists and NILC/São Carlos corpus lemma lists. Specific details about the NILC can be found in the articles:

Pinheiro, G. M., & Aluísio, S. M. (2003). Corpus NILC: descrição e análise crítica com vistas ao projeto Lacio-Web. Série de Relatórios do Núcleo Interinstitucional de Lingüística Computacional NILC - ICMC - USP. São Carlos, SP: Universidade Federal de São Carlos (UFSCar).
Maria das Graças Volpe Nunes, Claudete M. Ghiraldelo, Gisele Montilha, Marcelo A. S. Turine, Maria Cristina F. de Oliveira, Ricardo Hasegawa, Ronaldo T. Martins & Osvaldo N. Oliveira Jr. Desenvolvimento de um sistema de revisão gramatical automática para o português do Brasil. In Anais do II Encontro para o Processamento de Português Escrito e Falado (Curitiba, PR, 21-22 de Outubro de 1996), Curitiba: CEFET-PR, pp. 71-80.
Maria das Graças Volpe Nunes, Fabiano M. Costa Vieira, Cláudia Zavaglia, Cássia R. C. Sossolote & Josélia Hernandez. A construção de um léxico para o português do Brasil: lições aprendidas e perspectivas. In Anais do II Encontro para o Processamento de Português Escrito e Falado (Curitiba, PR, 21-22 de Outubro de 1996), Curitiba: CEFET-PR, pp. 61-70.

Linguateca

What is the legal status of documents, tools, and working materials provided by Linguateca?
"None of the material that we provide is restricted to any group, and all of it was authorized (under the provided terms) by the respective authors or copyright holders. The conditions differ from resource to resource and are specified in each resource's documentation. The tools created by Linguateca are available under the GNU General Public License. However, note the fundamental difference between what is actually made available by us and what is merely cataloged by us: the former is found in the "Access to Resources" section, the latter in the "Catalog of Resources" section. The conditions of use of the latter must be confirmed with the respective authors."

Lexique

The Brazilian Portuguese Lexicon was inspired by Lexique, the psycholinguistic corpus of French created and maintained by Prof. Dr. Boris New and Prof. Dr. Christophe Pallier. Lexique has already provided information on French words for a series of studies and is a great example of a psycholinguistic corpus. A detailed description of Lexique can be found in the Lexique manual (Manuel du Lexique) and in the articles:

New, B., Pallier, C., Brysbaert, M., & Ferrand, L. (2004). Lexique 2: A new French lexical database. Behavior Research Methods, Instruments, & Computers, 36(3), 516-524. doi: 10.3758/bf03195598.
New, B., Pallier, C., Ferrand, L., & Matos, R. (2001). Une base de données lexicales du français contemporain sur internet: LEXIQUE™//A lexical database for contemporary French: LEXIQUE™. L'Année Psychologique, 447-462. doi: 10.3406/psy.2001.1341.

R Program and Packages

The Brazilian Portuguese Lexicon was developed in R from the linguistic data in the NILC .txt files. The number of orthographic neighbors (Coltheart's N) and the orthographic Levenshtein distance (OLD20) were calculated with the coltheart.N and old20 functions from the vwr package, developed by Prof. Dr. Emmanuel Keuleers. A number of functions from the languageR package, developed by Prof. Dr. R. Harald Baayen, were also used in the development of the Brazilian Portuguese Lexicon.
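
For readers unfamiliar with these two measures, the following Python sketch illustrates what they compute. It is not the vwr implementation and not the script used to build the Lexicon: the function names are hypothetical, and it assumes the usual definitions, namely that Coltheart's N counts the words of the same length that differ from the target by exactly one substituted letter, and that OLD20 is the mean Levenshtein distance from the target to its 20 closest words in the lexicon (here excluding the target itself, which is also an assumption).

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def coltheart_n(word: str, lexicon: Iterable[str]) -> int:
    """Number of same-length words differing by exactly one letter."""
    return sum(
        1
        for other in set(lexicon)
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean Levenshtein distance to the n closest words (OLD20 when n=20).

    Assumption: the target word itself is excluded from the comparison set.
    If the lexicon has fewer than n other words, all of them are used.
    """
    distances = sorted(
        levenshtein(word, other) for other in set(lexicon) if other != word
    )
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon contains no word other than the target")
    return sum(closest) / len(closest)
```

With a toy lexicon such as gato, pato, rato, mato, gata, gasto, and casa, the word "gato" has four orthographic neighbors (pato, rato, mato, gata), while "gasto" is one edit away but is not a neighbor in Coltheart's sense because its length differs.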

Creative Commons License

The Brazilian Portuguese Lexicon by Gustavo Lopez Estivalet is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on the work available at http://www.linguateca.pt/acesso/corpus.php?corpus=SAOCARLOS.
Permissions beyond the scope of this license may be available at http://www.lexicodoportugues.com/credits_en.php.

Acknowledgements

For the realization and success of the Brazilian Portuguese Lexicon, I thank the National Council for Scientific and Technological Development (CNPq) for the Ph.D scholarship from the Science without Borders (CsF) program. I thank my Ph.D supervisor Prof. Dr. Fanny Meunier and Prof. Dr. Michel Hoen, who understood the importance of developing a psycholinguistic corpus of Brazilian Portuguese. I thank Prof. Dr. Sandra M. Aluísio and Prof. Dr. Maria das Graças Volpe Nunes of the NILC for their help with materials, information, and discussions about the NILC, as well as for their support, motivation, and recognition of this work. I thank my colleagues Léo Varnet and Emmanuel Trouche for the discussions on scripts and algorithms for the development of the Brazilian Portuguese Lexicon. I thank the members of the online community who work with website development and database management for the forum discussions and tutorials they make available. I also thank Prof. Dr. Mailce Borges Mota and the best Portuguese teacher, Prof. Lise Lopez. Finally, I thank Luanda Lins for understanding the importance of this project and my motivation to do it. Thank you all!

Contact

contato@lexicodoportugues.com

Last update: 12/27/2019.
