Diego Rojas

Data Blog on Medium

Latest publication: Trabajando con millones de datos de las Declaraciones Anuales Anonimizadas del SAT en R (Working with millions of records from the SAT's Anonymized Annual Tax Returns in R)

Evaluation of the initiative to modify the excise tax on beer production

Using text mining techniques in PySpark and R, I analyzed millions of digital tax receipts (CFDI) to estimate the volume of beer produced, its alcohol content, and the tax paid per unit produced. With these estimates, I simulated how prices and excise tax revenue would change under a switch from an ad valorem tax to an ad quantum tax.
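As a rough illustration of the idea (not the project's actual code), the sketch below pulls a volume and an alcohol percentage out of a receipt description with regular expressions and then compares the two tax schemes for one unit. The description format, the regex patterns, the rate, and the quota are all hypothetical, and the ad quantum tax is assumed here to be a fixed amount per liter of pure alcohol.

```python
import re

# Hypothetical patterns: volume written as "<number> ml", alcohol as "<number>%".
VOLUME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ml\b", re.IGNORECASE)
ABV_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def parse_description(text):
    """Return (liters, alcohol_percent) from an item description, or None."""
    volume = VOLUME_RE.search(text)
    abv = ABV_RE.search(text)
    if volume is None or abv is None:
        return None
    return float(volume.group(1)) / 1000.0, float(abv.group(1))


def ad_valorem_tax(price, rate):
    """Tax proportional to the price of the unit."""
    return price * rate


def ad_quantum_tax(liters, alcohol_percent, quota_per_liter_alcohol):
    """Tax proportional to the liters of pure alcohol in the unit (assumed design)."""
    return liters * (alcohol_percent / 100.0) * quota_per_liter_alcohol


# Illustrative values only; none of these come from the project.
parsed = parse_description("CERVEZA CLARA 355 ML 4.5% ALC VOL")
if parsed is not None:
    liters, abv = parsed
    current = ad_valorem_tax(price=20.0, rate=0.25)
    proposed = ad_quantum_tax(liters, abv, quota_per_liter_alcohol=100.0)
    print(f"ad valorem: {current:.4f}, ad quantum: {proposed:.4f}")
```

At scale, the same extraction could be expressed with PySpark's `regexp_extract` over the description column, but that step is not shown here.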

The role of local governments in dealing with the effects of the COVID-19 pandemic in Mexico City

The objective of the project was to classify, by nature and type of instrument, the programs and actions implemented by the local government in Mexico City and its 16 municipalities. The project received an honorable mention in the contest "Datatón: Your money, your data 2021" of the Digital Agency for Public Innovation (ADIP).

Use of web scraping and NLP techniques in a social media project

In this project, I used web scraping, NLP, and supervised machine learning techniques to collect and classify digital media publications about top technology companies such as Google, Facebook, and Twitter.
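To give a sense of what supervised text classification involves, here is a minimal sketch of a multinomial Naive Bayes classifier written with the standard library only. The choice of Naive Bayes, the example headlines, and the labels are assumptions made for illustration; they are not the model or the data used in the project.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, made up for this sketch.
examples = [
    ("google launches new search feature", "product"),
    ("twitter releases new app update", "product"),
    ("facebook fined over privacy breach", "regulation"),
    ("regulators fined google over antitrust", "regulation"),
]
model = train(examples)
print(predict(model, "new feature launches"))
```

In practice, a library implementation with proper tokenization and a held-out test set would be the more usual route.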

The impact of communication campaigns on citizen commitment

The objective of the project is to provide empirical evidence on the elements that can encourage citizen participation in the fight against corruption. Through an experiment, we provide evidence on the type of message that makes citizens more likely to support the activities of the Citizen Participation Council (CPC) of the Aguascalientes Anticorruption System. Specifically, we measured the intention to support a law initiative in two randomly assigned groups: one was shown a video about institutional actions (control) and the other was shown a video about concrete actions (treatment).
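One common way to compare a binary outcome between a control and a treatment group is a two-proportion z-test. The sketch below shows that calculation under the assumption that support is recorded as yes/no per participant; the counts are invented, and the project may have used a different estimator.

```python
import math


def two_proportion_ztest(support_control, n_control, support_treatment, n_treatment):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_control = support_control / n_control
    p_treatment = support_treatment / n_treatment
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (support_control + support_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_treatment - p_control, z, p_value


# Hypothetical counts: 40 of 100 supporters in control, 55 of 100 in treatment.
diff, z, p_value = two_proportion_ztest(40, 100, 55, 100)
print(f"difference={diff:.3f}, z={z:.3f}, p={p_value:.4f}")
```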

Evaluation of the "Contacto Digital" (072) program of the Municipality of Aguascalientes

Using data analysis and NLP techniques, I assessed the performance of the 072 program in handling citizen reports. The reports were classified by type of request and by the secretariat they were assigned to. In addition, I estimated response and attention times for citizen reports to establish benchmarks.
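A benchmark of this kind can be as simple as the median time between a report being opened and closed, grouped by request type. The following sketch assumes each report is a dictionary with hypothetical `type`, `opened`, and `closed` fields holding ISO 8601 timestamps; the real 072 data may be structured differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_response_hours(reports):
    """Return the median hours from opening to closing, per request type."""
    hours_by_type = defaultdict(list)
    for report in reports:
        opened = datetime.fromisoformat(report["opened"])
        closed = datetime.fromisoformat(report["closed"])
        elapsed_hours = (closed - opened).total_seconds() / 3600
        hours_by_type[report["type"]].append(elapsed_hours)
    return {kind: median(hours) for kind, hours in hours_by_type.items()}


# Hypothetical reports, made up for this sketch.
reports = [
    {"type": "pothole", "opened": "2021-03-01T08:00:00", "closed": "2021-03-01T20:00:00"},
    {"type": "pothole", "opened": "2021-03-02T08:00:00", "closed": "2021-03-04T08:00:00"},
    {"type": "lighting", "opened": "2021-03-03T09:00:00", "closed": "2021-03-04T09:00:00"},
]
print(median_response_hours(reports))
```

The median is used here rather than the mean because a few very slow cases would otherwise dominate the benchmark.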