Data Blog on Medium
Latest publication: Trabajando con millones de datos de las Declaraciones Anuales Anonimizadas del SAT en R (Working with millions of records from the SAT's Anonymized Annual Tax Returns in R)
Using text mining techniques (in PySpark and R), I analyzed millions of digital tax receipts (CFDI), from which I estimated the volume of beer produced, its alcohol content, and the tax paid per unit produced. With this data I simulated how prices and excise tax collection would change under a switch from an ad valorem tax to an ad quantum tax.
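The switch from an ad valorem to an ad quantum tax can be illustrated with a small sketch. It is not the model used in the project: the product fields, the tax rate, and the choice of taxing liters of pure alcohol are all assumptions made for illustration. The sketch computes the per-liter-of-alcohol fee that would collect the same total revenue as a given ad valorem rate.

```python
from dataclasses import dataclass


@dataclass
class BeerProduct:
    # All fields are hypothetical inputs, not values from the CFDI data.
    name: str
    pretax_price_per_liter: float  # currency units per liter, before tax
    abv: float                     # alcohol by volume as a fraction (0.05 = 5%)
    liters_sold: float


def ad_valorem_tax(product: BeerProduct, rate: float) -> float:
    """Tax collected when the excise is a share of the pre-tax value."""
    return product.pretax_price_per_liter * rate * product.liters_sold


def ad_quantum_tax(product: BeerProduct, fee_per_liter_alcohol: float) -> float:
    """Tax collected when the excise is a fixed fee per liter of pure alcohol."""
    return product.abv * product.liters_sold * fee_per_liter_alcohol


def revenue_neutral_fee(products: list[BeerProduct], rate: float) -> float:
    """Fee per liter of pure alcohol that matches total ad valorem revenue."""
    total_revenue = sum(ad_valorem_tax(p, rate) for p in products)
    total_alcohol = sum(p.abv * p.liters_sold for p in products)
    return total_revenue / total_alcohol


# Illustrative market with two made-up products and a made-up 25% rate.
market = [
    BeerProduct("economy", 20.0, 0.05, 1000.0),
    BeerProduct("premium", 40.0, 0.05, 500.0),
]
fee = revenue_neutral_fee(market, rate=0.25)
```

With equal alcohol content, the fixed fee shifts the tax burden from the more expensive product to the cheaper one, which is the kind of price effect such a simulation is meant to surface.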
The objective of the project was to classify, by nature and type of instrument, the programs and actions implemented by the local governments of Mexico City and its 16 municipalities. Honorable mention in the contest "Datatón: Your money, your data 2021" of the Digital Agency for Public Innovation (ADIP).
In this project, I used web scraping, NLP, and supervised machine learning techniques to obtain and classify data from digital media publications about top technology companies such as Google, Facebook, and Twitter.
The objective of the project is to provide empirical evidence on the elements that can encourage citizen participation in the fight against corruption. Through an experiment, we offer information on the type of message that can make citizens more likely to support the activities of the Citizen Participation Council (CPC) of the Aguascalientes Anticorruption System. Specifically, participants were randomly assigned to two groups: one was shown a video about institutional actions (control) and the other a video about concrete actions (treatment). The intention to support a law initiative was then measured in each group.
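A comparison between control and treatment groups of this kind can be sketched with a two-proportion z-test. The source does not state which test was used, so this is only one plausible approach. It assumes the outcome is binary (supports the initiative or not), and the function name and the counts in the usage line are hypothetical.

```python
import math


def two_proportion_ztest(success_control: int, n_control: int,
                         success_treatment: int, n_treatment: int):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_control = success_control / n_control
    p_treatment = success_treatment / n_treatment
    # Pooled proportion under the null hypothesis of no treatment effect.
    pooled = (success_control + success_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_treatment - p_control, z, p_value


# Made-up counts for illustration only; not results from the experiment.
diff, z, p_value = two_proportion_ztest(40, 100, 60, 100)
```

Because assignment to the videos was random, a difference in support rates between the groups can be read as the effect of the message, provided the sample is large enough for the normal approximation to hold.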
Using data analysis and NLP techniques, I assessed the performance of the 072 program in handling citizen reports. The reports were classified by type of request and by the secretariat to which they were assigned. In addition, I estimated response and attention times for citizen reports to establish benchmarks.
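One simple way to turn response times into benchmarks is to take the median per request type, sketched below. The record fields (`request_type`, `opened`, `closed`) and the sample reports are hypothetical, and the project may have used other statistics.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_response_days(reports: list[dict]) -> dict[str, float]:
    """Median days between opening and closing a report, per request type."""
    days_by_type = defaultdict(list)
    for report in reports:
        opened = datetime.fromisoformat(report["opened"])
        closed = datetime.fromisoformat(report["closed"])
        elapsed_days = (closed - opened).total_seconds() / 86400
        days_by_type[report["request_type"]].append(elapsed_days)
    return {kind: median(days) for kind, days in days_by_type.items()}


# Made-up reports for illustration only.
sample_reports = [
    {"request_type": "pothole", "opened": "2021-01-01", "closed": "2021-01-03"},
    {"request_type": "pothole", "opened": "2021-01-01", "closed": "2021-01-05"},
    {"request_type": "lighting", "opened": "2021-01-02", "closed": "2021-01-03"},
]
benchmarks = median_response_days(sample_reports)
```

The median is used here rather than the mean because a few very slow cases would otherwise dominate the benchmark.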